DW CS5: RTL text snippets within LTR pages

DW CS5 on MacOS 10.7.x
I sometimes need to embed short snippets of right-to-left text in pages that are predominantly left-to-right.
Editing around these insertions is often very difficult.  Design View and Code View fall out of sync.  Once the snippets are inserted, successful adjustments often depend on placing the editing cursor at the opposite end of the text and editing there.  It is really confusing to edit text that isn't visible at the cursor.
Since most of the mixed material comes to me in word-processing document form, I've tried inserting
     <span DIR="RTL"> ... </span>
around each snippet and
    <span DIR="LTR"> ... </span>
around everything else in the original .doc, .rtf, etc., then copying and pasting from that document into Code View.  This seems to help.
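The wrapping step described above can be scripted rather than done by hand. The following is only a rough sketch, assuming the snippets are Hebrew or Arabic and the text is plain (no existing markup); the function name wrap_rtl_runs and the character ranges are my own choices, not anything DW provides.

```python
import re

# Hebrew (U+0590-U+05FF) and Arabic (U+0600-U+06FF) blocks; an assumption
# that covers the common cases, not every right-to-left script.
RTL = "\u0590-\u05FF\u0600-\u06FF"

# One RTL character, optionally followed by more RTL characters and
# whitespace, as long as the run also ends on an RTL character.
RTL_RUN = re.compile(rf"[{RTL}](?:[{RTL}\s]*[{RTL}])?")

def wrap_rtl_runs(text):
    """Wrap each run of RTL characters in <span dir="rtl">...</span>."""
    return RTL_RUN.sub(lambda m: f'<span dir="rtl">{m.group(0)}</span>', text)

if __name__ == "__main__":
    # "shalom" written with escapes so the source itself stays left-to-right
    sample = "Say \u05e9\u05dc\u05d5\u05dd now"
    print(wrap_rtl_runs(sample))
```

The result can then be pasted into Code View. Note that the sketch does not escape characters such as & or <, so it is only suitable for plain text input.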
Suggestions for best results?
TIA

Ken Binney:
Thanks for your response.    I DID ask for a workaround!  (A specific reply further down.)
I've continued to search and found some additional relevant material:
1) This page contains some well-thought-out  examples of inserting RTL snippets in LTR Latin English text.  See the 2 x 2  table about 1/3 of the way down,  just below the text, "This example shows how browsers that support bidirectional text behave with different DIR attribute settings."
Note the reversals (character order and right- versus left-alignment) in the right column due to the dir="LTR" and dir="RTL" attributes, which are shown in the left column.  What I see makes intuitive sense to me.  It looks the same in recent versions of Firefox, Safari, and Chrome, which I guess means that these modern browsers all support bidirectional text.
I  saved this page into a local file and opened it in DW.  Hmmm,  the reversals do NOT occur.  The two sentences in each of the right-hand boxes are identical.   Which, to me, implies: 
DW CS5 Design View does NOT support bidirectional text. 
(A possibly equivalent statement:  DW CS5 ignores "dir=" directives.)
2) I should have mentioned from the start that I'm using Unicode throughout.    I think MacOS uses Unicode, and that's why RTL snippets inserted in LTR Latin English text work correctly in the Mac apps I checked.   I'm guessing the underlying mechanism is described here.
Whew!  That's really a lot of industrial-strength tekkie material, but for this discussion, I think it boils down to this:  Unicode-capable apps insert and detect non-printable "Bidi Controls" indicating text direction.   I can maintain a Hebrew snippet in (say) LibreOffice: it is fully Unicode-capable.
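To see what a Unicode-capable app has to work with, here is a small sketch that prints the bidirectional class of each character in a string, using Python's standard unicodedata module. The helper name bidi_classes and the sample string are assumptions for illustration. One thing it makes visible: ordinary letters already carry a direction class (L for Latin, R for Hebrew), so the invisible controls are only part of the mechanism.

```python
import unicodedata

def bidi_classes(text):
    """Return (code point, bidi class) pairs for every character in text."""
    return [(f"U+{ord(ch):04X}", unicodedata.bidirectional(ch)) for ch in text]

if __name__ == "__main__":
    # Latin a, Hebrew alef, then four invisible characters:
    # LRM (U+200E), RLM (U+200F), RLE (U+202B), PDF (U+202C)
    sample = "a\u05d0\u200e\u200f\u202b\u202c"
    for code_point, bidi_class in bidi_classes(sample):
        print(code_point, bidi_class)
```

Running this over text copied out of a word processor is one way to check whether any invisible direction marks came along with the paste.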
Is DW CS5 fully Unicode-capable?
3) I found this article, which specifically addresses  your workaround.   Again, some high-powered tekkie material, but the relevant bottom line seems clear enough:  don't  use CSS to control text directionality, essentially because that's too far downstream.
4) The definitive document on this issue might be this one from W3C.
Much more dense tekkie material, unusually well-explained, but:  My head is spinning!   For the moment I'm going to concentrate on one section, The "missing space" phenomenon, here.
I saved the markup for the second example (circle-arrow link at right), which I presume is completely valid.  When I tried to edit in and around the Hebrew text in DW CS5, I observed the same problems:  putting the insertion point here, resulted in editing there.
Bottom line:  The evidence I've found so far is that DW CS5 does not have full bidirectional text support and/or does not fully support Unicode.   If so, I think that this is more than sufficient to explain the poor editing results I'm seeing.
Here's a desperate hope:   The problems I'm observing can be fixed by flipping a hidden DW switch to activate full bidirectional text/Unicode operation.
Am I interpreting the evidence correctly?   Please feel free to correct me, prove me wrong, and suggest alternatives.  Even better, please --anyone-- suggest a simple way I can get back to my actual work, which requires embedding RTL snippets in LTR text flows, with normal editing capabilities.
TIA

Similar Messages

  • How do i read a context of text file within Dynamic Page of portal ?

    Dear sir,
    I have a text file on the client computer. I would like to transfer the contents of this text file to an Oracle database. How do I code this in a dynamic page of Portal?
    Can an Oracle 9iAS script read and extract the contents of a document? Thank you very much!
    Ghia Liu

    Great questions, Rik, and I understand how this might seem bizarre. Here's the story... these 2500 files were authored with a built-in authoring tool of our current knowledge management system. This KMS is about to be replaced with a new one, and because the new KMS needs content within the <head> tag instead of where it was in the <body> tag and redefined as meta data, I repurposed the content in those files using Dreamweaver and regular expressions.  However, the one remaining <h3> and <p> content is in the middle of the newly tagged meta data and it must be moved from the <head> area and into the <body> area.
    Because these were authored in the built-in KMS authoring tool, there is no style sheet and no, we do not manage our content with a CMS.
    You are correct, if the <h3> and <p> were at the end of the data, I could simply move the </head> and <body> tags, but unfortunately that is not the case.  They are consistently in the same place in all 2500 files, but in the middle of the data. Following is an example:
    <html>
    <head>
    <title>Title here</title>
    <meta name="XYZ" content="Something here...">
    <h3> Blah blah</h3> <p>More blah blah</p>
    <meta name="123" content="More somthing here">
    <meta name="456" content="More somthing here">
    </head>
    <body>
    </body>
    </html>
    Again, I can find the block using a regular expression <h3>(.*?)</p>, but don't know what to do after that.
    I've been told that perl or grep might do the needed task, but that requires outside resources and I have no budget for that.
    Any suggestions are greatly appreciated.
    thanks,
    Rick
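One possible approach, sketched in Python (which is free, so no budget is needed): find the block with essentially the same regular expression, cut it out, and re-insert it right after the opening <body> tag. This is only a sketch under stated assumptions: each file contains exactly one <h3>...</p> block, it sits on its own line inside <head>, and the <body> tag has no attributes. The function name move_block_to_body is hypothetical.

```python
import re

# Same idea as <h3>(.*?)</p>, plus the surrounding indentation and newline
# so that no empty line is left behind in <head>.
BLOCK = re.compile(r"[ \t]*<h3>.*?</p>[ \t]*\n?", re.DOTALL)

def move_block_to_body(html):
    """Move the first <h3>...</p> block to just after the <body> tag."""
    match = BLOCK.search(html)
    if match is None:
        return html  # nothing to move; leave the file untouched
    block = match.group(0).strip()
    without_block = html[:match.start()] + html[match.end():]
    # Assumes a bare <body> tag, as in the example file above.
    return without_block.replace("<body>", "<body>\n" + block, 1)

if __name__ == "__main__":
    sample = (
        "<html>\n<head>\n<title>Title here</title>\n"
        '<meta name="XYZ" content="Something here...">\n'
        "<h3> Blah blah</h3> <p>More blah blah</p>\n"
        '<meta name="123" content="More something here">\n'
        "</head>\n<body>\n</body>\n</html>\n"
    )
    print(move_block_to_body(sample))
```

To run it over all 2500 files you would read each file, pass its text through the function, and write the result back; try it on copies first.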

  • Newbie needs InDesign CS5 help: text overflow into new page

    Hello...  so, I've been using InDesign CS5 to create a document, because it transfers nicely into the pdf format.  I had written most of the stuff a couple months ago, am transferring it to InDesign via copy and paste, and then have to go in and "re-do" the bolds and italics and hyperlinks etc, b/c it does not transfer those things to InDesign.  I can live with that.  However, with this 10+ page document, how can I copy all my text and paste it onto 10 pages+ of InDesign, rather than piece-by-piece on a page-by-page basis?
    As of now, I have to take the first page of my original, and then copy and paste and estimate, add and subtract until the sizing is right...  and then I have to do that with page 2...  and page 3... and page 4.. and... well, you get the idea.  More than 10 pages of this!  Now, the straw breaking this camel's back is the fact that I have to go in and do a revision in one part...  so that I have to move and shift part of page 2 to page 3, and page 3 to page 4, and page 4 to page 5, and...   ahhhhhhhhhhhhhhhhh!!! 
    Is there ANY way to get text to simply overflow from one page to the next?  When you try to drag a text box down to the next page you've added, it stops at the bottom of the page and will not let you flow into the next page.  I am not particularly tech savvy and don't know coding or html or etc.  I am looking for a way to allow text to flow from one page to the next as if you were writing a Word doc.
    I have 80 more pages and 7 more documents to do this with.  Please help!  I want to cry at the thought of dealing with this in every single 8 to 12 page document I'm recreating. 

    All is not lost for autoflow even using copy paste. Here's a nifty little trick that will help you:
    ID does not require a text frame before you paste, so if you don't start with one, or don't use the one you've already drawn, ID will conveniently draw a 2" square text frame in the center of the screen if you issue a paste command. This frame will contain all of the copied text, but most of it will be overset (you'll see a red plus sign in the "out port" on the lower right of the frame).
    Overset text can be picked up and flowed using autoflow, just as if you were placing a text file, so use the selection tool (black arrow) to select that small text frame if it isn't already selected, then click the red plus sign to pick up the overset. You should now have a loaded text cursor that looks sort of like the upper left corner of a page. DON'T click it anywhere yet.
    First press backspace to delete the small text frame (which should have remained selected). Now you can start your auto-flow holding the Shift key and clicking your loaded cursor wherever you like. All of the text, including what was visible in the now deleted frame is in the cursor waiting for you.

  • How do i send hyperlink to a text box within the same page

    how do i send hyperlink to a text box within the same page on iweb

    It's called an anchor. It's often discussed in this forum.
    2 days ago : anchor
    Here's a search of the past year : anchors

  • I did a COPY of some text from a web page, and then did a PASTE into notepad.exe (Windows). The text from each line was duplicated -- on the line! Instead of "Fred", it became "Fred Fred".

    I just recently installed Firefox for the first time. It seems nice and quick. The version is reported as: "10.0.1".
    I wanted to save some text from a web page, so navigated to that page, selected the text, and pressed the Control-C combination to COPY the selected text to the buffer. For example, the text I selected looked something like this:
    Harry
    Ron
    Hermione
    Hagrid
    Albus
    NOTE: Each line of text has a small icon to the left of the text.
    It is not reasonable to COPY and PASTE each line individually, as there can be hundreds of lines of data. I recall, however, that
    doing a COPY and PASTE on this data into Microsoft's Excel will produce cells which have the icons included in the cell, but unfortunately one can't get rid of them! At least I've never found a way to remove them, but that's another issue. :)
    Once I'd done the COPY operation I switched to a Notepad window and did a PASTE operation. To my surprise, the text from each line was duplicated. It looked like this:
    Harry Harry
    Ron Ron
    Hermione Hermione
    Hagrid Hagrid
    Albus Albus
    Thinking that there might be something unusual about the web page I looked at the source, but it appeared "normal" -- that is, as expected.
    Note: I have done this operation several times before, and have never seen this occur before.
    Note: In the actual data some of the lines have quoted text in them. Curiously there is weird behavior on these lines. In some cases the entire line is shown only once. (These occur at the top of the line, and the quoted text is at the beginning of the name.)
    When quoted text appears "later" in the name, in some cases the quoted text is duplicated, and in other cases the quoted text is missing altogether! I have also noticed an error with the quoted text, and so will be reporting that to the web site which generates the HTML.
    Note that each line of "text" is "anchor text", so if I click on a name the browser navigates to a page for that name.
    I believe that the problem is that the COPY operation in Firefox is not simply copying the visible text, but also the ALT= attribute text of each icon.
    Below is a sample of what the source HTML looks like:
    <a class="lnk" target="_blank" href="http://details.aspx?id=Harry">
    <img width="16" height="16" alt="Harry" class="tb_icon" src="http://.../Harry.gif"/>
    <span>Harry</span></a>
    <br/>
    (Because of the true length of the lines in the source HTML, I have stripped out the actual URL of the site.)
    To make sure I wasn't imagining this difference I repeated the process within Internet Explorer. In that browser I did not get duplicated data.
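The hypothesis above can at least be illustrated outside the browser. The sketch below, using Python's standard html.parser, separates the text nodes from the img alt attributes in markup shaped like the sample; each name shows up once in each list, which would account for "Harry Harry" if a copy operation emits both. It says nothing about what Firefox actually does internally, and the class name and the shortened URLs are assumptions.

```python
from html.parser import HTMLParser

class VisibleText(HTMLParser):
    """Collect text nodes and <img alt="..."> values separately."""

    def __init__(self):
        super().__init__()
        self.text = []
        self.alts = []

    def handle_starttag(self, tag, attrs):
        # Also reached for self-closing tags such as <img ... />
        if tag == "img":
            alt = dict(attrs).get("alt")
            if alt:
                self.alts.append(alt)

    def handle_data(self, data):
        if data.strip():
            self.text.append(data.strip())

def extract(html):
    parser = VisibleText()
    parser.feed(html)
    parser.close()
    return parser.text, parser.alts

if __name__ == "__main__":
    sample = (
        '<a class="lnk" href="#"><img width="16" height="16" alt="Harry" '
        'class="tb_icon" src="Harry.gif"/><span>Harry</span></a><br/>'
    )
    text, alts = extract(sample)
    print("text nodes:", text)
    print("alt values:", alts)
```

Saving the page source and keeping only the text nodes is one way to get a clean list of names without the duplicates.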

    Try:
    *Extended Copy Menu (fix version): https://addons.mozilla.org/firefox/addon/extended-copy-menu-fix-vers/

  • How do I copy text from a web page in Safari?

    I've searched up and down and can't find the answer to this simple question.
    There is a UI element that I want to copy to the clipboard and then paste into Excel. The UI element is:
    static text of group 104 of UI element 1 of scroll area 1 of group 4 of window "Account Summary"
    The contents of this static text on the web page is "$1,000.00"
    How can I copy this to the clipboard?
    I've tried:
    select static text of group 104 of UI element 1 of scroll area 1 of group 4 of window "Account Summary"
    keystroke "c" using command down
    keystroke "l" using command down
    keystroke "v" using command down
    but it doesn't work. "Select" doesn't actually seem to select anything. However, when I run this from within Script Editor, in the Results Window I get:
    {static text "$1000.00" of group 104 of UI element 1 of scroll area 1 of group 4 of window "Account Summary" of application process "Safari" of application "System Events"}
    ... I'm confused as to what this is telling me. All I want is to copy this value to the clipboard. Any suggestions???
    Thanks,
    Jeff

    Try this:
    set the clipboard to item 1 of (get name of (static text of group 104 of UI element 1 of scroll area 1 of group 4 of window "Account Summary" of application process "Safari"))

  • Can I embed a PDF file within a Pages document?

    I am working with Pages '09 and would like to embed a rather long PDF document file into my Pages text.  I want the reader to be able to click on the PDF file and have it open so that the reader can gather more information on the subject without making the actual Pages document text any longer.  I don't want the PDF file to be an external link that sends the reader away from the document -- I want to know if I can embed the PDF file WITHIN the pages document and have it open only when clicked.  I don't think I can do this.  Thanks.
    Small Town Gal.

    I found my answer within archived discussions - convert the file to a PDF and then embed the PDF file as an attachment.  Thanks.
    Barbara Smits

  • Creating a tab region within a page in APEX

    Hi there,
    Could someone please guide me to some examples on how i could create a tabbed region within a page?
    Thanks

    Here is what i do...
    In HTML Header I will add the below code
    <link rel="stylesheet" href="http://ajax.googleapis.com/ajax/libs/jqueryui/1.7.2/themes/redmond/jquery-ui.css" type="text/css" />
    <script src="http://ajax.googleapis.com/ajax/libs/jquery/1.4.2/jquery.js"> </script>
    <script src="http://ajax.googleapis.com/ajax/libs/jqueryui/1.7.2/jquery-ui.js"> </script>
    <script type="text/javascript">
    $(function() {
       $("#tabs").tabs();
       $x("tabs").appendChild( $x("tabs-1"));
       $x("tabs").appendChild( $x("tabs-2"));
    });
    </script>
    Then I will create 3 regions.
    Region 1 >>> Create an HTML Region ( REGION TEMPLATE = NO TEMPLATE) and add the below code in REGION SOURCE
    <div id="tabs">
    <ul>
       <li><a href="#tabs-1">Employee</a></li>
       <li><a href="#tabs-2">Chart</a></li>
    </ul>
    </div>
    Region 2 >>> Create HTML Region... Add two text fields to this region... now edit the region
    Add the below code in region header of REGION 2
    <div id="tabs-1">
    Add the below code in region footer of REGION 2
    </div>
    Region 3 >>> Create HTML Region... Add two text fields to this region... now edit the region
    Add the below code in region header of REGION 3
    <div id="tabs-2">
    Add the below code in region footer of REGION 3
    </div>
    Example : http://apex.oracle.com/pls/apex/f?p=12060:7
    I have used exactly the same code... except that my Region 2 contains a REPORT instead of two text fields and Region 3 contains a CHART instead of text fields.
    Regards,
    Shijesh
    Please reward the answer if it was helpful / correct

  • How do I alphabetize within a pages document?

    How do I alphabetize a list within a pages document?

    You sort a table there, but if you want a more permanent solution, OSX has a very useful but less well known method of extending application abilities:
    Services are little snippets of code that do simple tasks and can be accessed by any suitable application from under the Application menu > Services
    First download WordServices (free).
    Unzip the file and double click on the WordService.service file. If you have the default security on your Mac, you may have to do this 3 times before it lets you open it. Once you have opened it, it will be installed.
    In Pages:
    Menu > Pages > Services > Services Preferences… > Keyboard > Shortcuts > Services > Shortcuts > scroll about half way down and check Sort Lines Ascending/Descending and any other services that catch your eye
    Now when you select your text list > Menu > Pages > Services > Sort Lines Ascending/Descending
    This is available everywhere not just Pages..
    As it is now part of your menus, you can give it your own custom keyboard shortcut in System Preferences > Keyboard > Shortcuts
    Peter

  • I  Would like to create a page within a page feel. How do I do this in CS3?

    Hi
    I have a basic understanding of dreamweaver however I'm
    trying to learn more difficult parts of the software. I'm trying to
    build a site which has a layout quite similar to
    www.seasicksteve.com/. I want a page within a page feel, but I have
    no idea how to do this. Can anyone help me?

    It could be accomplished with the DW "Set text of container" (or layer) behavior.

  • Dragging to select multiple objects within a Pages document

    I am trying to select multiple, closely spaced objects (text boxes, lines, shapes, etc) within a Pages document. They are located ~1/3 of the way from the left margin of the document, and there are objects both to the left and the right of this group.
    I know that you can click outside of the Pages document with the mouse, then drag into the document to select multiple objects. However, is there any way to drag to select multiple objects without first clicking outside the document?
    Command click is a pain in the butt because I inevitably mis-click and have to start again to select all objects. I don't like having to drag in from the margin and then command-click to de-select objects I don't want.
    is there not an option to just shift-click to form a selection box around objects within the document? It seems like there should be a simple solution to this, but I have yet to find it.
    Thanks

    mattcass wrote:
    Command click is a pain in the butt
    Isn't it just! The people who created Pages should get a right spanking on this fundamental stuff up.
    All you can do is try again. I find myself repeating the same task over and over in Pages simply because you can't easily grab things.
    When you do command click, try and get the edges of objects, otherwise Pages thinks you are still trying to click inside the object.
    Once you have command-drag lassoed multiple objects you can shift command click to unselect the ones you don't want. Again a problem if they are close to each other.
    Peter

  • How to read text from a web page

    I want to read text from a web page. Can anybody tell me how to do it?

    OK, I'll give you the details. Visit the site "http://seriouswheels.com/" and you will see an index from A to Z, which is basically a car name index. I want to read each page, get the car name and its model, and store them in a database. If you can provide me the code I will be very thankful.

  • How to stop text box on iWeb page from automatically reverting to hyperlink when I click 'save'

    As far as I see, the main text box in the body section of my iWeb page reverts to a hyperlink (connecting to a file) when I click save.  I don't know how the hyperlink got activated to begin with--I must have done something, but I don't know what it was.  I tried this many times to fix it: went to Inspector "hyperlink" section, activated the text box in the page, un-clicked "make hyperlinks active" (so they are inactive), then un-clicked "Enable as a hyperlink."  That much works--the hyperlink (little white arrow in blue circle) disappears from the lower right corner of text box.  But as soon as I save (command S), the text box reverts to a hyperlink. 
    If I don't click save and move around the pages, it does not revert to a hyperlink.  It only reverts to a hyperlink when the file is saved.  Why is it doing this and how to stop this from happening--how to permanently return the text box to a non-hyperlink?
    The website page is: http://www.usronline.net/USR/VedicResearch.html.  If you click anywhere randomly in the main text, you'll get a prompt to download a PDF file.  This should not be, but I can't get this hyperlink to go away for good.
    Many thanks in advance.
    Vishnupriya

    Please pardon the reply delay--a big project deadline had to be met.  Now back to this work:
    Yes, the text in the oval object had a drop shadow (the oval object itself did not).  I removed the drop shadow from the text, but the image file icon stayed in the right corner of the oval object.  I deleted the text and then the image file icon disappeared.  But as soon as I typed text again inside the oval object (not even pasting the previous text) the image file icon appeared again.  I deleted the oval object entirely and simply typed the text inside the main text box itself.  The image file icon that was attached to that oval object is gone (along with the oval object).
    Still the image file icon is attached to the main text box, though.  I again tried to uncheck the Enable as hyperlink (same process as described in first posting) with that oval object gone, but when I "save" again it reverts to a hyperlink. 
    Next I tried this: I copied the text from the problem text box and pasted it into another text box (outside of the problem one).  Still in the new text box with the pasted text, the image file icon shows in the upper right corner. 
    I also removed all the drop shadow features from this main text box (there was one more, on some words typed in the main text box), but it is still showing as an image.
    I don't know how it became an image file, and I'm still stumped as to how to fix it (short of forgetting about trying to figure out how it became an image and just retyping all text into a new text box).
    The link you provided for "Web Safe Fonts" is returning this: "The server at www.ampsoft.net is taking too long to respond."  I'm using Arial and Georgia (Georgia only on the menu and header text) in the site, and I had verified before designing the site that these are web safe, so I had thought they were.
    If you or anyone have any further suggestions or suspicions on this matter, I'd be glad to hear them.
    Thank you.

  • Duplicate a Text Field on Every Page of a Document?

    Today I wrote my first vb.net/adobe app.  I managed to open the pdf doc in adobe and create a named text field.  I managed to insert a java script that gets the current date when the document is open and updates the text field.  I would like to duplicate the text field on all pages of the document.  Manually, I can right click on a text field and select duplicate.  I don't see a property or method that allows me to program this.  The field property "page" appears to be read only.  any help would be appreciated.  thanks.
    page
    The page number or an array of page numbers of a field. If the field has only one appearance in the document, the page property returns an integer representing the 0-based page number of the page on which the field appears. If the field has multiple appearances, it returns an array of integers, each member of which is a 0-based page number of an appearance of the field. The order in which the page numbers appear in the array is determined by the order in which the individual widgets of this field were created (and is unaffected by tab-order). If an appearance of the field is on a hidden template page, page returns a value of -1 for that appearance.
    Type: Integer | Array    Access: R    Fields: all
    Example 1
    Determine whether a particular field appears on one page, or more than one page.
    var f = this.getField("myField");
    if (typeof f.page == "number")
        console.println("This field only occurs once on page " + f.page);
    else
        console.println("This field occurs " + f.page.length + " times");

    I also tried the brute force method of moving from page to page and creating a text field on each page.   I used pageNum to change the page, but Adobe didn't seem to recognize that.  I saw the page change on the screen, but all of the text fields were created on the first page!

  • How ias integrate with Snacktory for getting main text from an html page

    Hi All,
    I am new to Endeca and IAS. I have a requirement to extract the main text from a whole HTML page before IAS saves the text to the Endeca_Document_Text property.
    Because IAS saves all of the text in a page to the Endeca_Document_Text property, the result is hard to read when shown in a web page. I use a third-party API (Snacktory) to filter the main text out of the original page,
    and now I want to save that text to the Endeca_Document_Text property.
    Another question:
    I get zero pages when I do the main-text filtering of the original HTML in a ParseFilter (HTMLMetatagFilter implements ParseFilter) using Snacktory.
    If the filter does only a little work, it runs fine; if it does more, the crawler fails to crawl the page. Does anyone know how to fix it?
    Log for the crawler:
    Successfully set recordstore configuration.
    INFO    2013-09-03 00:56:42,743    0    com.endeca.eidi.web.Main    [main]    Reading seed URLs from: /home/oracle/oracle/endeca/IAS/3.0.0/sample/myfirstcrawl/conf/endeca.lst
    INFO    2013-09-03 00:56:42,744    1    com.endeca.eidi.web.Main    [main]    Seed URLs: [http://www.liferay.com/community/forums/-/message_boards/category/]
    INFO    2013-09-03 00:56:43,497    754    com.endeca.eidi.web.db.CrawlDbFactory    [main]    Initialized crawldb: com.endeca.eidi.web.db.BufferedDerbyCrawlDb
    INFO    2013-09-03 00:56:43,498    755    com.endeca.eidi.web.Crawler    [main]    Using executor settings: numThreads = 100, maxThreadsPerHost=1
    INFO    2013-09-03 00:56:44,163    1420    com.endeca.eidi.web.Crawler    [main]    Fetching seed URLs.
    INFO    2013-09-03 00:56:46,519    3776    com.endeca.eidi.web.parse.HTMLMetatagFilter    [pool-1-thread-1]    come into EndecaHtmlParser getParse
    INFO    2013-09-03 00:56:46,519    3776    com.endeca.eidi.web.parse.HTMLMetatagFilter    [pool-1-thread-1]    come into HTMLMetatagFilter
    INFO    2013-09-03 00:56:46,519    3776    com.endeca.eidi.web.parse.HTMLMetatagFilter    [pool-1-thread-1]    meta tag viewport ==minimum-scale=1.0, width=device-width
    INFO    2013-09-03 00:56:52,889    10146    com.endeca.eidi.web.parse.HTMLMetatagFilter    [pool-1-thread-1]    come into EndecaHtmlParser getParse
    INFO    2013-09-03 00:56:52,889    10146    com.endeca.eidi.web.parse.HTMLMetatagFilter    [pool-1-thread-1]    come into HTMLMetatagFilter
    INFO    2013-09-03 00:56:52,890    10147    com.endeca.eidi.web.parse.HTMLMetatagFilter    [pool-1-thread-1]    meta tag viewport ==minimum-scale=1.0, width=device-width
    INFO    2013-09-03 00:56:59,184    16441    com.endeca.eidi.web.parse.HTMLMetatagFilter    [pool-1-thread-2]    come into EndecaHtmlParser getParse
    INFO    2013-09-03 00:56:59,185    16442    com.endeca.eidi.web.parse.HTMLMetatagFilter    [pool-1-thread-2]    come into HTMLMetatagFilter
    INFO    2013-09-03 00:56:59,185    16442    com.endeca.eidi.web.parse.HTMLMetatagFilter    [pool-1-thread-2]    meta tag viewport ==minimum-scale=1.0, width=device-width
    INFO    2013-09-03 00:57:07,057    24314    com.endeca.eidi.web.parse.HTMLMetatagFilter    [pool-1-thread-2]    come into EndecaHtmlParser getParse
    INFO    2013-09-03 00:57:07,057    24314    com.endeca.eidi.web.parse.HTMLMetatagFilter    [pool-1-thread-2]    come into HTMLMetatagFilter
    INFO    2013-09-03 00:57:07,057    24314    com.endeca.eidi.web.parse.HTMLMetatagFilter    [pool-1-thread-2]    meta tag viewport ==minimum-scale=1.0, width=device-width
    INFO    2013-09-03 00:57:07,058    24315    com.endeca.eidi.web.Crawler    [main]    Seeds complete.
    INFO    2013-09-03 00:57:07,090    24347    com.endeca.eidi.web.Crawler    [main]    Starting crawler shut down
    INFO    2013-09-03 00:57:07,095    24352    com.endeca.eidi.web.Crawler    [main]    Waiting for running threads to complete
    INFO    2013-09-03 00:57:07,095    24352    com.endeca.eidi.web.Crawler    [main]    Progress: Level: Cumulative crawl summary (level)
    INFO    2013-09-03 00:57:07,095    24352    com.endeca.eidi.web.Crawler    [main]    host-summary: www.liferay.com to depth 1
    host    depth    completed    total    blocks
    www.liferay.com    0    0    1    1
    www.liferay.com    1    0    0    0
    www.liferay.com    all    0    1    1
    INFO    2013-09-03 00:57:07,096    24353    com.endeca.eidi.web.Crawler    [main]    host-summary: total crawled: 0 completed. 1 total.
    INFO    2013-09-03 00:57:07,096    24353    com.endeca.eidi.web.Crawler    [main]    Shutting down CrawlDb
    INFO    2013-09-03 00:57:07,160    24417    com.endeca.eidi.web.Crawler    [main]    Progress: Host: Cumulative crawl summary (host)
    INFO    2013-09-03 00:57:07,162    24419    com.endeca.eidi.web.Crawler    [main]   Host: www.liferay.com:  0 fetched. 0.0 mB. 0 records. 0 redirected. 4 retried. 0 gone. 0 filtered.
    INFO    2013-09-03 00:57:07,162    24419    com.endeca.eidi.web.Crawler    [main]    Progress: Perf: All (cumulative) 23.6s. 0.0 Pages/s. 0.0 kB/s. 0 fetched. 0.0 mB. 0 records. 0 redirected. 4 retried. 0 gone. 0 filtered.
    INFO    2013-09-03 00:57:07,162    24419    com.endeca.eidi.web.Crawler    [main]    Crawl complete.
    ~/oracle/endeca
    -======================================
    source code for parsefilter
    package com.endeca.eidi.web.parse;
    import java.util.Map;
    import java.util.Properties;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.log4j.Logger;
    import org.apache.nutch.metadata.Metadata;
    import org.apache.nutch.parse.HTMLMetaTags;
    import org.apache.nutch.parse.Parse;
    import org.apache.nutch.parse.ParseData;
    import org.apache.nutch.parse.ParseFilter;
    import org.apache.nutch.protocol.Content;
    import de.jetwick.snacktory.ArticleTextExtractor;
    import de.jetwick.snacktory.JResult;
    public class HTMLMetatagFilter implements ParseFilter {
        public static String METATAG_PROPERTY_NAME_PREFIX = "Endeca.Document.HTML.MetaTag.";
        public static String CONTENT_TYPE = "text/html";
        private static final Logger logger = Logger.getLogger(HTMLMetatagFilter.class);

        public Parse filter(Content content, Parse parse) throws Exception {
            logger.info("come into EndecaHtmlParser getParse");
            logger.info("come into HTMLMetatagFilter");
            //update the content with the main text in html page
            //content.setContent(HtmlExtractor.extractMainContent(content));
            // Check for null before touching the parse data
            ParseData parseData = parse.getData();
            if (parseData == null) return parse;
            parseData.getParseMeta().add("FILTER-HTMLMETATAG", "ACTIVE");
            extractText(content, parse);
            logger.info("update the content with the main text content");
            return parse;
        }

        // Replace the stored document text with the main text found by Snacktory
        private void extractText(Content content, Parse parse) {
            try {
                ParseData parseData = parse.getData();
                if (parseData == null) return;
                Metadata md = parseData.getParseMeta();
                ArticleTextExtractor extractor = new ArticleTextExtractor();
                String sourceHtml = new String(content.getContent());
                JResult res = extractor.extractContent(sourceHtml);
                String text = res.getText();
                md.set("Endeca_Document_Text", text);
            } catch (Exception e) {
                // Log the failure instead of swallowing it silently
                logger.error("main text extraction failed", e);
            }
        }

        public static void log(String msg) {
            System.out.println(msg);
        }

        public Configuration getConf() {
            return null;
        }

        public void setConf(Configuration conf) {
        }
    }

    > but it only extracts URLs from <A> (anchor) tags. I want to be able to extract URLs from <MAP> tags as well
    Gee, do you think you could modify the code to check for "Map" attributes as well.
    > Can someone maybe point a page containing info on the HTML toolkit for me?
    It's called the API. Since you are using the HTMLEditorKit and an ElementIterator and an AttributeSet, I would start there.
    There is no such API that says "get me all the links", so you have to do a little work on your own.
    Maybe you could use a ParserCallback and every time you get a new tag you check for the "href" attribute.
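The callback idea, sketched here in Python rather than with the Swing HTMLEditorKit classes: the parser calls a method for every start tag, and the method keeps the "href" attribute whenever the tag is one that carries links. This is an analogue under assumptions, not the Java API; the class name LinkCollector is hypothetical, and image-map links are taken from the <area> elements inside <map>, since that is where the href attributes live.

```python
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    """Collect href values from <a> tags and image-map <area> tags."""

    LINK_TAGS = {"a", "area"}

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # Tag and attribute names arrive lowercased from the parser
        if tag in self.LINK_TAGS:
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def collect_links(html):
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links

if __name__ == "__main__":
    sample = (
        '<A HREF="/one">first</A>'
        '<MAP NAME="nav"><AREA SHAPE="rect" COORDS="0,0,10,10" HREF="/two"></MAP>'
    )
    print(collect_links(sample))
```

In the Java version the same check would go in the callback's start-tag and simple-tag handlers.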
