HTML DOM

I'm having trouble extracting form tags and their attributes from an HTML document. Which class should I use for this job? There is an HTML parser in Java SE, right?

It depends on whether the HTML is guaranteed to be well formed. If so, you can just use an XML parser, possibly the one built into the JDK.
Start with either SAXParserFactory or DocumentBuilderFactory.
If it is not guaranteed to be well formed, then you will need a third-party API.
You'll have to try different ones to see what fits your needs. Here's a starting point for looking:
http://java-source.net/open-source/html-parsers
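
For the well-formed case, the DocumentBuilderFactory route might look roughly like the sketch below. It assumes the markup parses as XML and uses lower-case tag names (the lookup is case-sensitive); the class name FormExtractor and the method extractForms are hypothetical and not from the thread.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class; the name and method are illustrative only.
class FormExtractor {

    // Returns one attribute map per <form> element found in well-formed markup.
    static List<Map<String, String>> extractForms(String wellFormedHtml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(wellFormedHtml)));
            // Case-sensitive lookup: matches <form>, not <FORM>.
            NodeList forms = doc.getElementsByTagName("form");
            List<Map<String, String>> result = new ArrayList<>();
            for (int i = 0; i < forms.getLength(); i++) {
                NamedNodeMap attrs = forms.item(i).getAttributes();
                Map<String, String> map = new LinkedHashMap<>();
                for (int j = 0; j < attrs.getLength(); j++) {
                    Node attr = attrs.item(j);
                    map.put(attr.getNodeName(), attr.getNodeValue());
                }
                result.add(map);
            }
            return result;
        } catch (Exception e) {
            // Thrown when the input is not well-formed XML (or cannot be read).
            throw new IllegalArgumentException("Input is not well-formed XML", e);
        }
    }

    public static void main(String[] args) {
        String page = "<html><body><form action=\"/login\" method=\"post\">"
                + "<input type=\"text\" name=\"user\"/></form></body></html>";
        System.out.println(extractForms(page));
    }
}
```

Real-world pages that are not well formed will make this throw, which is where the third-party parsers come in.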

Similar Messages

  • How to send HTML DOM to Servlet?

    How to send HTML DOM to Servlet?

    What exactly do you mean by sending the DOM to a servlet? If you want to post the entire HTML to a servlet, use the XMLHttp object to post it. You can get more info on XMLHttp at Microsoft's MSDN site.

  • Oddity with JAXP HTML DOM

    OK, here's the idea. For this web app I am supposed to be finding locations in a web page by letting a user "step" his way through. Basically, you highlight a section of text, I take the node you highlighted and walk backwards and up through the tree until I hit the root. Then I turn around and show you all the steps I took. Then I follow the steps all the way back down (it's supposed to show people how DOM trees work).
    To debug this I made a simple little tool that highlights the node we are on and shows all its siblings around it, so I can watch the program step its way backwards and forwards through the tree.
    Anyway, the issue is that I can get up to the body tag easily enough (because it is unique in the page, it is as high as I need to climb; I can just get tags by name to find the body tag again), but when I try to turn around and go back down, the tree is different. When I go up, the top "level" of the tree has only 2 nodes (HEAD and BODY) and there are 7 nodes in the next level down.
    When I walk down the tree there are 4 nodes in the top level (HEAD, BODY and 2 #text nodes), and the next level has 9 nodes: the same 7 plus 2 #text nodes.
    I don't get how this is happening. The page I am going up and down is exactly the same. I am using JAXP and HttpUnit and that's it. The page is saved and does not change at all between passes. How can there be differing numbers of nodes in places? Can anyone shed some light on why this could be happening?

    OK, I want to make an addendum to my last post in the hope that someone can tell me what I'm doing wrong. I tried using the exact same file to go up and down (literally), rather than using the one that had the highlighted section for one direction and fetching the other page from its actual web site as I am supposed to do, and it seems to work fine. I can only assume that there is something wrong with the way I am getting the file. I am using URLConnection to get a BufferedReader and reading the HTML file into a string. After I have finished adding whatever needs to be added (I add a tag so I know where the user highlighted), I write it out to our local server as a plain-text temp file so HttpUnit can find it and build a DOM out of it.
    Is there something about the way I'm getting the HTML that might damage it somehow? Can anyone suggest a better way?
    Thanks
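
    One possible explanation, which the thread itself does not confirm, is that the two copies of the page differ only in the whitespace between tags. A DOM parser without a DTD keeps line breaks and indentation between elements as #text nodes, so the same elements can yield different child counts. The sketch below illustrates this with the JDK's own parser on well-formed markup; it does not use HttpUnit, and the class name ChildCounter and its methods are hypothetical.

    ```java
    import java.io.StringReader;
    import javax.xml.parsers.DocumentBuilderFactory;
    import org.w3c.dom.Node;
    import org.w3c.dom.NodeList;
    import org.xml.sax.InputSource;

    // Hypothetical helper; names are illustrative and not from the thread.
    class ChildCounter {

        // Parses well-formed markup and returns the child nodes of its root element.
        static NodeList rootChildren(String wellFormedHtml) {
            try {
                return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                        .parse(new InputSource(new StringReader(wellFormedHtml)))
                        .getDocumentElement().getChildNodes();
            } catch (Exception e) {
                throw new IllegalArgumentException("Input is not well-formed XML", e);
            }
        }

        // Counts every child of the root, including whitespace-only #text nodes.
        static int countAllChildren(String wellFormedHtml) {
            return rootChildren(wellFormedHtml).getLength();
        }

        // Counts only element children, skipping #text and other node types.
        static int countElementChildren(String wellFormedHtml) {
            NodeList children = rootChildren(wellFormedHtml);
            int count = 0;
            for (int i = 0; i < children.getLength(); i++) {
                if (children.item(i).getNodeType() == Node.ELEMENT_NODE) {
                    count++;
                }
            }
            return count;
        }

        public static void main(String[] args) {
            String compact = "<html><head/><body/></html>";
            String indented = "<html>\n<head/>\n<body/>\n</html>";
            System.out.println(countAllChildren(compact) + " vs " + countAllChildren(indented));
            System.out.println(countElementChildren(compact) + " vs " + countElementChildren(indented));
        }
    }
    ```

    If this is the cause, comparing only element nodes while walking up and down should make both passes agree.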

  • HTML DOM Class!

    Hello all,
    Here is my requirement:
    My program should be able to save a web page (including images and style sheets) to a local drive, given the URL of any website.
    For this, I will get the content using the java.net.URL class. After getting the content, I have to search for any framesets. If framesets exist, I have to get their content as well. Next I have to search for any images. After getting the HTML content and images, I have to replace the image and frameset 'src' attributes in the HTML content so they point to the local drive, and store everything there.
    After that, if anybody opens the web page from my local drive, it should not contact the original site. I should store each and every entity on the local drive and replace the entity names to point to the local copies.
    Is there any Java class to achieve this functionality?
    For example, in a browser, after the web page loads, the browser builds the Document Object Model and exposes it to JavaScript. It automatically creates window, document, location, anchor, img, and form objects. Then it is very easy for me to change the 'src' attribute of img and frame tags.
    Waiting for your valuable reply with thousands of eyes.
    Thanks in advance,
    V.Thandava Krishna.

    Hi Thandava,
    There is a way to do this, however you would have to resort to using XML technology unless you resort to substringing all your pages. Not fun ;)
    Check out the Java XML APIs (Xerces, JAXP, JDOM etc) which should make the task a bit easier.
    Cheers,
    Anthony
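
    As a rough illustration of the XML-API route, the sketch below parses a page with JAXP, points every img 'src' at a local directory, and serializes the result. It assumes the page is well-formed (real pages often are not) and leaves out the downloading itself; the class name LocalSrcRewriter and its method are hypothetical.

    ```java
    import java.io.StringReader;
    import java.io.StringWriter;
    import javax.xml.parsers.DocumentBuilderFactory;
    import javax.xml.transform.OutputKeys;
    import javax.xml.transform.Transformer;
    import javax.xml.transform.TransformerFactory;
    import javax.xml.transform.dom.DOMSource;
    import javax.xml.transform.stream.StreamResult;
    import org.w3c.dom.Document;
    import org.w3c.dom.Element;
    import org.w3c.dom.NodeList;
    import org.xml.sax.InputSource;

    // Hypothetical helper; names are illustrative and not from the thread.
    class LocalSrcRewriter {

        // Rewrites each <img src="..."> to localDir/<file name> and returns the markup.
        static String rewriteImageSources(String wellFormedHtml, String localDir) {
            try {
                Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                        .parse(new InputSource(new StringReader(wellFormedHtml)));
                NodeList images = doc.getElementsByTagName("img");
                for (int i = 0; i < images.getLength(); i++) {
                    Element img = (Element) images.item(i);
                    String src = img.getAttribute("src");
                    if (src.isEmpty()) {
                        continue; // no src attribute to rewrite
                    }
                    // Keep only the part after the last '/' as the local file name.
                    String fileName = src.substring(src.lastIndexOf('/') + 1);
                    img.setAttribute("src", localDir + "/" + fileName);
                }
                Transformer serializer = TransformerFactory.newInstance().newTransformer();
                serializer.setOutputProperty(OutputKeys.METHOD, "xml");
                serializer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
                StringWriter out = new StringWriter();
                serializer.transform(new DOMSource(doc), new StreamResult(out));
                return out.toString();
            } catch (Exception e) {
                throw new IllegalArgumentException("Could not parse or serialize the page", e);
            }
        }

        public static void main(String[] args) {
            String page = "<html><body><img src=\"http://example.com/pics/logo.png\"/></body></html>";
            System.out.println(rewriteImageSources(page, "images"));
        }
    }
    ```

    The same loop could be repeated for frame tags. URLs with query strings would need a more careful file-name rule than the simple "last path segment" used here.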

  • Interacting with DOM objects using HTML

    Hi,
    I want to listen to "text selection" on a webpage in AS (i.e. as soon as a user highlights some text and releases the mouse button, an event gets fired). Events for cut, copy, and paste are available, but none for "text selection". I have read about cross-scripting between AS and JavaScript and understand it, although I do not understand yet how it can solve my problem. Also, is there any simpler way to do it directly in AS, without getting into JavaScript?
    Thank you very much
    Rehan.

    There's a non-standard onselectstart attribute (or "selectstart" event type) that looks to be the only HTML DOM event dispatched for text selection.
    You can listen for such DOM events from ActionScript like so:
    package {
        import flash.display.Sprite;
        import flash.display.StageAlign;
        import flash.display.StageScaleMode;
        import flash.events.Event;
        import flash.html.HTMLLoader;
        public class HTMLLoadderTest extends Sprite
        {
            private var html:HTMLLoader = new HTMLLoader();
            public function HTMLLoadderTest()
            {
                this.stage.scaleMode = StageScaleMode.NO_SCALE;
                this.stage.align = StageAlign.TOP_LEFT;
                html.loadString("<html><body><p>Something to select.</p></body></html>");
                html.width = this.stage.stageWidth;
                html.height = this.stage.stageHeight;
                html.addEventListener(Event.COMPLETE, completeHandler);
                addChild(html);
                stage.nativeWindow.activate();
            }
            // Once the page has loaded, listen for the DOM "selectstart" event on the body
            private function completeHandler(event:Event):void {
                event.target.window.document.body.addEventListener("selectstart", reportSelection);
            }
            // Note that you have to use Object as the parameter type because the JavaScript Event class is not the same as the ActionScript Event class
            private function reportSelection( event:Object ):void
            {
                trace(html.window.getSelection());
            }
        }
    }

  • A little problem getting the style tag of a html file separate from rest

    I'm making a program that will take in a URL and then search through that URL for all a, link, embed, frame, and img tags, find their sources, and download them. I also want to search through the style and find anything that uses a URL (ex. background-image:url('somepic.jpg')) and download that file. In the end, you should be able to go to the directory you saved it all in, open index.html, and see an exact replica of the original site. Now, my problem is that my program isn't getting the style tag's contents. Here's my code:
    import java.io.*;
    import java.util.*;
    import java.net.*;
    public class Test
    {
         //-->>>> MAIN <<<<--//
         public static void main(String...a)
         {
              try{
                   System.out.print("Enter URL: ");
                   String target = new Scanner(System.in).next();
                   URL url = null;
                   try{
                        url = new URL(target);
                   }catch(MalformedURLException x){
                        // No protocol given, so assume http
                        url = new URL("http://" + target);
                   }
                   // Split the page into chunks, one per '<'
                   Scanner scan = new Scanner(url.openStream());
                   scan.useDelimiter("<");
                   ArrayList<String> tokens = new ArrayList<String>();
                   while(scan.hasNext())
                   {
                        String str = scan.next();
                        str = str.trim();
                        // The first whitespace-delimited word of the chunk is treated as the tag name
                        Scanner tags = new Scanner(str);
                        if(tags.hasNext())
                        {
                             String tag = tags.next();
                             if(tag.equalsIgnoreCase("a") || tag.equalsIgnoreCase("img") || tag.equalsIgnoreCase("link") || tag.equalsIgnoreCase("embed") || tag.equalsIgnoreCase("frame"))
                                  tokens.add(str);
                             else if(tag.equalsIgnoreCase("style"))
                                  tokens.add(str);// This isn't adding anything
                        }
                   }
                   for(String str : tokens)
                        System.out.println(str);
              }catch(UnknownHostException x){
                   System.err.println("Host not found.");
              }catch(Exception x){
                   x.printStackTrace();
              }
         }
         //-->>>> FindURLAttributes <<<<--// <--- Under construction
         private static ArrayList<String> findURLAttributes(String tag)
         {
              ArrayList<String> tokens = new ArrayList<String>();
              tokens.add(tag);
              return tokens;
         }
    }

    I've never tried it, but it seems like using an existing html parser would be a lot easier. I've worked with xml dom parsers, and it's not really that hard. I don't imagine working with an html dom would be too difficult either, at least it wouldn't be as hard as doing it by hand. Google for java html parser and see if any of them suit your needs.
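
    If you do keep the hand-rolled scanner, one likely cause (a guess, not confirmed in the thread) is that the tag name is taken as the first whitespace-delimited word of each chunk. A bare `<style>` with no attributes then yields something like `style>body`, which never equals "style". A minimal sketch of a stricter tag-name extraction follows; the class TagNames and method tagName are hypothetical.

    ```java
    import java.util.Locale;

    // Hypothetical helper; names are illustrative and not from the thread.
    class TagNames {

        // Returns the lower-cased tag name at the start of a chunk that followed a '<'.
        // The name ends at the first whitespace, '>' or '/' character.
        // Closing tags such as "/style>" yield an empty string.
        static String tagName(String chunk) {
            int end = 0;
            while (end < chunk.length()) {
                char c = chunk.charAt(end);
                if (Character.isWhitespace(c) || c == '>' || c == '/') {
                    break;
                }
                end++;
            }
            return chunk.substring(0, end).toLowerCase(Locale.ROOT);
        }

        public static void main(String[] args) {
            System.out.println(tagName("style>body { color: red }"));
            System.out.println(tagName("style type=\"text/css\">"));
        }
    }
    ```

    In the posted loop, `String tag = tags.next();` could be replaced by a call such as `tagName(str)`. This only fixes the tokenizing; a real HTML parser is still the more robust choice.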

  • Display word/pdf document in the HTML region or report

    Hi,
    How to display blob content stored in a table in HTML region or report region.
    I already have a process to display the image content.
    But my question here is how to display word/pdf document within the html region so that the user can read the document without downloading it.
    Any suggestions/solutions would be of great help.
    Thanks in advance...
    Thanks,
    Ramesh P.

    I was dead wrong.
    The display of images from BLOB is a special case because APEX provides a Display Image item type.
    Moreover, HTP/HTF packages also do not provide for handling of BLOB content. So AJAX cannot be used.
    Which implies that the only way to get binary content, other than images, is with the use of a WPG_DOCLOAD.DOWNLOAD_FILE call.
    This in turn implies that it may not be feasible to "inject" the BLOB into an existing HTML DOM in the browser.
    Regards,

  • HTML to XML Conversion ?

    Developed a content presentation Java servlet implementing the xmlparser2.jar classes; it works well. We're storing content (in XML format) as a blob, then using the parser to transform the XML file to HTML for presentation.
    stream = null;
    String result = null;
    URL URLStream = new URL(xmlIn);
    ByteArrayOutputStream xbaos = new ByteArrayOutputStream();
    if(mStylesheet.startsWith("http"))
        stream = getURLInputStream(mStylesheet);
    else
        stream = new FileInputStream(mStylesheet);
    XSLProcessor processor = new XSLProcessor();
    DOMParser parser = new DOMParser();
    parser.setValidationMode(false);
    parser.setPreserveWhitespace(true);
    parser.parse(in);
    xdoc = parser.getDocument();
    XSLStylesheet xss = new XSLStylesheet(stream, URLStream);
    processor.processXSL(xss, xdoc, xbaos);
    result = xbaos.toString();
    parser.reset();
    return result; // HTML conversion
    We are evaluating using XSLT to convert the XML to a form-based medium for content maintenance. We are wondering whether, once an XML document has been transformed to HTML (DOM), it can be converted back to XML for a subsequent update of the value stored in the blob column. We are specifically interested in conversion (a parser) from HTML to XML.
    Simply put, can HTML (in DOM format, validated against an XSD) be transformed back to XML?

    Do you know of a method in the XDK that takes a well-formed HTML document and, using XSD/XSLT, converts it back to the original XML spec?
    Because you created (and as long as you create) the HTML from XML, it will be well formed (every tag will be closed with an end tag), and you can therefore transform it back into XML.
    Most of the time it will not be possible to convert HTML found on the 'internet' into XML, because that HTML is not well formed. For example, many people forget to end a paragraph of text in HTML with the </p> tag.
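
    A quick way to see the difference is to try parsing the markup as XML. The sketch below uses the JDK's JAXP parser rather than the Oracle XDK classes from the question, and the class name WellFormedCheck is hypothetical.

    ```java
    import java.io.StringReader;
    import javax.xml.parsers.DocumentBuilder;
    import javax.xml.parsers.DocumentBuilderFactory;
    import org.xml.sax.InputSource;
    import org.xml.sax.SAXException;
    import org.xml.sax.helpers.DefaultHandler;

    // Hypothetical helper; names are illustrative and not from the thread.
    class WellFormedCheck {

        // Returns true if the markup parses as XML, false if the parser rejects it.
        static boolean isWellFormed(String markup) {
            try {
                DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
                // DefaultHandler rethrows fatal errors without printing them to stderr.
                builder.setErrorHandler(new DefaultHandler());
                builder.parse(new InputSource(new StringReader(markup)));
                return true;
            } catch (SAXException e) {
                return false; // e.g. a <p> that is never closed
            } catch (Exception e) {
                throw new IllegalStateException("Parser setup or I/O failed", e);
            }
        }

        public static void main(String[] args) {
            System.out.println(isWellFormed("<html><body><p>text</p></body></html>"));
            System.out.println(isWellFormed("<html><body><p>text</body></html>"));
        }
    }
    ```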

  • Xml to html conversion using xslt

    The XML contains an exponential number, i.e. a number in scientific notation. When it is converted to HTML, we get NaN for that number. It happens with JDK 1.4, i.e. WLS 8.1 with the JDK 1.4 BEA JRockit JVM.
    It worked fine with WLS 7 using xalan-j_2_1_0/bin/xalan.jar
    Any solution?

  • HTML to Tiff conversion

    Hi,
    I want to perform HTML to TIFF conversion.
    The HTML file is on my system and I want to convert it into a TIFF file using my Java code. The HTML file contains some formatted text and 3-4 images.
    I have a (GUI) tool that takes an HTML file path, takes a snapshot of the HTML and converts it into TIFF. But I want to do this in my Java program.
    Does Java provide an API for doing this, or does any other vendor provide this as a jar, so that I can include the jar and call its API for conversion?
    Thanks,
    Manish

  • Downloading HTML page with picture like the IE save as function

    Hi everyone,
    I would like to download an HTML page in its entirety (path, pictures...), like the IE "Save As" function.
    Is it possible?
    Thanks

    It is.
    However, doing so will be a big task. You have to write an HTML parser (I did) which checks all possible ways of writing HTML. I based mine on the W3C standard for HTML 4.01. It takes a while to get through everything. Don't ask me for it; I'm in the middle of an update to make sure it has a somewhat decent structure.
    Then you must know all attributes which can possibly contain a link to an image. Find out if this link is not a URL with a protocol. Also find out if the link is not something like a CGI. And so on for more options.
    You can look on the net for a parser, but I didn't find one. There is of course HTML Tidy, but it reads the HTML code and makes a valid HTML document from it, which is almost never the same as what you wanted. For crappy HTML code this can result in the removal of a lot of your elements.
    http://www.w3.org/TR/html4/ for the latest version of html
    http://www.w3.org/People/Raggett/tidy/ for html tidy
    http://www.w3.org/TR/2002/WD-DOM-Level-3-Core-20020114/ for the latest dom structure
    http://www.w3.org/TR/DOM-Level-2-HTML/ for the html dom
    for the last 2 get the java language binding api.
    http://xml.apache.org to download the apache implementation of the dom (no htmlparser build in), xml, sax, ...

  • How to parse HTML page

    What API or package can I use to parse an HTML page and to obtain
    HTML DOM interfaces.

    Use JTidy to make the HTML well-formed, then use the DOM parser in the Xerces API:
    JTidy (recommended by W3C, so it's probably pretty good):
    http://www.w3.org/People/Raggett/tidy/
    http://sourceforge.net/projects/jtidy

  • HTML WINDOW APPEARANCE

    Happy Valentine's Day!
    QUESTION: How do I change the appearance of a pre-existent HTML document to appear as a framed picture.
    BACKGROUND: There are a very large number of these pre-existent documents, and each document will be called up by one or several other documents that have yet to be created. Associated with each pre-existent document is a folder of PNG files that are called into the document via three input buttons: home, next, previous, and close. The first PNG (home) file is automatically called into the document's <body> tag when the window opens.
    Upon browsing the properties and functions of the HTML DOM window object as described on W3Schools.com, I discovered the following statement:
    quote:
    Sets whether or not the browser's tool bar is visible or not (can only be set before the window is opened and you must have UniversalBrowserWrite privilege)
    Similar comments are associated with other crucial window properties. These statements suggest to me that I will be unable to achieve my goal.
    Roddy

    Another suggestion might be, in an effort to save yourself some typing responsibilities, to remove the QUESTION: at the beginning of your query and replace it with a "question mark" (?) at the end of the aforementioned question. I am certain anyone qualified to respond to your plight will be adept at the use of punctuation in a non-coding environment.
    Another point is that in the QUESTION in question, you mentioned a "pre-existent" document; however, that would imply that the document in fact does not already exist, hence the pre-. Perhaps you were trying to convey that the documents do in fact already exist, therefore making them existing documents. And not to belabor the point, if you have a document, it unto itself implies that it is already existing, so the use of existing document can be construed as being redundant.
    Lastly, I believe that you could in fact combine the QUESTION and the BACKGROUND into one succinct entity, thereby making your post more acceptable to the reading public, plus, and more important, bringing you closer to your goal of actually getting your question answered in a manner that lets you use the information (put forth in the form of an answer) to solve your problem.
    So perhaps a better way to have set forth your post would have been:
    Subject: Changing window appearance
    I have a large number of HTML documents that I would like to change to appear to have a picture frame around the png.
    Here is the link
    http://www.roddysproblem.com
    Can anyone suggest an easy solution as there are quite a few pages.
    Whaddaya think Roddy...can you give it a go?

  • Pass an argument from external jsx to html panel?

    Hi everyone,
    Can you please tell me if it's possible somehow to listen to an event from the external JSX in an HTML panel? What I'm trying to do is to pass an argument from JSX to the HTML panel and to update the panel with it.
    Many thanks,
    Sergey

    Hi Sergey!
    What I was suggesting privately to you (I report this here for the others' sake) is to use soon-to-be-released CEP5 technology: http://blogs.adobe.com/cssdk/2014/04/introducing-cep-5.html
    Chiefly, the part that says:
    Call from ExtendScript into HTML DOM: Most of the currently supported Adobe apps (including but not only Photoshop CC and Illustrator CC) will include a new ExternalObject which provides an API that allows developers to dispatch events from ExtendScript to the JavaScript/HTML5
    But we have to wait for the next update of CC apps to support it!
    Regards
    Davide Barranca
    www.davidebarranca.com
    www.cs-extensions.com

  • HTML component scrolling (verticalScrollPosition/scrollv)

    I'm trying to use an HTML component to do some of the rendering in my application and I need to be able to scroll the content to the bottom as things get added to it.  It looks like I should be able to use the verticalScrollPosition property of the HTML component (which should correctly set the scrollV property on the htmlLoader).  However, when I do this, I see the scrollbar jump to the bottom and then go immediately back to the top.
    I'm logging the scrollV value of the htmlLoader when I do this and it looks like it's being set correctly (contentHeight - htmlLoader.height).  On successive calls to set the verticalScrollPosition property, I can see that scrollV had been set correctly.
    Anyone know what might be going wrong or encountered this problem before?

    So, I think I figured this one out.  I was setting the htmlText property of my HTML component when I wanted to add content.  However, doing this causes the whole thing to redraw and scroll to the top.  Unfortunately, this meant that I had to wait until getting an HTML_RENDER event before I could rescroll to the bottom and that sometimes even then it didn't work.
    Instead, I now do everything using functions on the HTML DOM through ActionScript.  So, instead of saying:
          htmlText += "<p>blah</p>"
    I am now using:
         var p:* = html.htmlLoader.window.document.createElement('p');
         p.innerHTML = 'blah';
         html.htmlLoader.window.document.body.appendChild(p);
    This keeps the existing html content intact and allows scrolling to work much better.  It's just a little more lengthy in terms of amount of code.
