Parsing HTML to get DOM structure

I have been looking at various XML and HTML libraries such as JTidy, HotSax, Xalan, TagSoup, htmlparser, etc., trying to find one that would let me parse some HTML and retrieve the DOM structure of the document without trying to clean it up.
My goal is to write an application that goes through a large set of HTML templates and modifies parts of them. Since these can be footers, headers, or just pieces of content, I don't want HTML and BODY tags to be generated automatically.
Is there any way I could achieve that? Every library I tried ended up adding extra HTML to the DOM structure that I wasn't able to get rid of.

Well, what I'm doing is a program which can process existing HTML templates so that I can refactor some patterns we have targeted to make everything more uniform.
Thus I want to be able to read HTML code, alter it, and then produce the result without adding any extra tags guessed by a cleaner. The reason is simple, since the templates are only pieces of a final page, I don't want to end up with <html> tags inside every template piece!
Oh and it is true that TagSoup is SAX based, but I mixed it with Xalan so that it produces a DOM tree. Here's the resource I found which helped me do that:
http://www.hackdiary.com/archives/000041.html
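
One JDK-only angle on this, offered as a rough sketch rather than a full answer: the Swing HTML parser also invents html, head and body tags for a fragment, but it marks every tag it generated itself with the `HTMLEditorKit.ParserCallback.IMPLIED` attribute, so the invented ones can be skipped. The class and method names below (`ExplicitTags`, `explicitStartTags`) are made up for illustration, and the Swing parser only understands fairly old HTML, so this may not suit every template.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

class ExplicitTags {
    // Returns the names of the start tags that were actually written in the fragment,
    // skipping the ones the parser generated on its own (html, head, body, ...).
    static List<String> explicitStartTags(String fragment) {
        final List<String> tags = new ArrayList<>();
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleStartTag(HTML.Tag t, MutableAttributeSet a, int pos) {
                // Generated tags carry the IMPLIED attribute; real ones do not.
                if (!a.isDefined(HTMLEditorKit.ParserCallback.IMPLIED)) {
                    tags.add(t.toString());
                }
            }
        };
        try {
            // Third argument: ignore any charset directive found in the markup.
            new ParserDelegator().parse(new StringReader(fragment), callback, true);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return tags;
    }

    public static void main(String[] args) {
        System.out.println(explicitStartTags("<div id=\"footer\"><p>Contact us</p></div>"));
    }
}
```

The same check could be applied in `handleEndTag` so that a rewritten template is emitted without the wrapper elements.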

Similar Messages

  • Parsing HTML - best tool

    Hi guys, I'd like to know the best open source API to parse HTML and get the required data from it. Hopefully one that uses a SAX parser, but the HTML is not fully XML compliant, i.e. it is not XHTML.
    Thanks
    Abe

    Thanks, I found my answer: use the Jericho HTML Parser. Do any of you guys know of a better one?
    Thanks
    Abe

  • Parsing XML using java DOM

    hi
    I am trying to parse a document and change a specific text value within an element. When I run the program it changes the node's text, but when I check the XML file it doesn't show the change; the file remains the same. The code I am using is as follows. I would be grateful if anyone could help:
    // ReplaceText.java
    // Reads intro.xml and replaces a text node.
    // Java core packages
    import java.io.*;
    // Java extension packages
    import javax.xml.parsers.*;
    import javax.xml.transform.*;
    import javax.xml.transform.stream.*;
    import javax.xml.transform.dom.*;
    // third-party libraries
    import org.xml.sax.*;
    import org.w3c.dom.*;
    public class ReplaceText {
       private Document document;
       public ReplaceText()
       {
          // parse document, find/replace element, output result
          try {
             // obtain default parser
             DocumentBuilderFactory factory =
                DocumentBuilderFactory.newInstance();
             // set parser as validating
             factory.setValidating( true );
             // obtain object that builds Documents
             DocumentBuilder builder = factory.newDocumentBuilder();
             // set error handler for validation errors
             // (MyErrorHandler is not shown in this post)
             builder.setErrorHandler( new MyErrorHandler() );
             System.err.println( "reading" );
             // obtain document object from XML document
             File f = new File("D:/Documents and Settings/Administrator/Desktop/xml adv java bk/appC/intro.xml");
             System.err.println( "reading" );
             document = builder.parse(f);
             //document = builder.parse( new File( "intro.xml" ) );
             System.err.println( "reading document" );
             // retrieve the root node
             Node root = document.getDocumentElement();
             if ( root.getNodeType() == Node.ELEMENT_NODE ) {
                Element myMessageNode = ( Element ) root;
                NodeList messageNodes =
                   myMessageNode.getElementsByTagName( "message5" );
                if ( messageNodes.getLength() != 0 ) {
                   Node message = messageNodes.item( 0 );
                   System.out.println("iiiii");
                   // create text node
                   Text newText = document.createTextNode(
                      "New Changed Message!!" );
                   // get old text node
                   Text oldText =
                      ( Text ) message.getChildNodes().item( 0 );
                   // replace text
                   //message.removeChild(oldText);
                   message.replaceChild( newText, oldText );
                }
             }
             // output Document object
             // create DOMSource for source XML document
             Source xmlSource = new DOMSource( document );
             // create StreamResult for transformation result
             // (this writes to standard output only, not back to intro.xml)
             Result result = new StreamResult( System.out );
             // create TransformerFactory
             TransformerFactory transformerFactory =
                TransformerFactory.newInstance();
             // create Transformer for transformation
             Transformer transformer =
                transformerFactory.newTransformer();
             transformer.setOutputProperty( OutputKeys.INDENT, "yes" );
             transformer.setOutputProperty( OutputKeys.STANDALONE, "yes" );
             // transform and deliver content to client
             transformer.transform( xmlSource, result );
          }
          // handle exception creating DocumentBuilder
          catch ( ParserConfigurationException parserException ) {
             parserException.printStackTrace();
          }
          // handle exception parsing Document
          catch ( SAXException saxException ) {
             saxException.printStackTrace();
          }
          // handle exception reading/writing data
          catch ( IOException ioException ) {
             ioException.printStackTrace();
             System.exit( 1 );
          }
          // handle exception creating TransformerFactory
          catch (
             TransformerFactoryConfigurationError factoryError ) {
             System.err.println( "Error while creating " +
                "TransformerFactory" );
             factoryError.printStackTrace();
          }
          // handle exception transforming document
          catch ( TransformerException transformerError ) {
             System.err.println( "Error transforming document" );
             transformerError.printStackTrace();
          }
       }
       public static void main( String args[] )
       {
          ReplaceText replace = new ReplaceText();
       }
    }
    The XML file that I am using is as follows:
    <?xml version = "1.0"?>
    <!-- Fig. 28.10 : intro.xml             -->
    <!-- Simple introduction to XML markup   -->
    <!DOCTYPE myMessage [
         <!ELEMENT myMessage (message, message5)>
         <!ELEMENT message (#PCDATA)>
         <!ELEMENT message5 (#PCDATA)>
    ]>
    <myMessage>
         <message>welcome to the xml shhhhhushu</message>
         <message5>welcome to the xml shhhhhushu</message5>
    </myMessage>
    I would be grateful if someone could please help.

    See if the Text 'oldText' actually has any text within it. Sometimes in DOM parsing, you will get something like:
    Element
       Text (blank)
       Text (actual)
       Text (blank)
    Whereas you would expect to receive:
    Element
       Text (actual)
    See if that is the case. If yes, modify your logic to iterate through the child text nodes until one with actual text inside of it (getNodeValue()) is found.
    - Saish
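
The suggestion above could look roughly like the following sketch, which walks the children of the element and changes the first text node that is not just whitespace. The class name `ReplaceFirstText` and its helpers are invented for this example, and the XML is parsed from a string only to keep it self-contained.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class ReplaceFirstText {
    // Parses an XML string into a DOM Document.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Sets the first non-blank text child of the first <tagName> element.
    // Returns true if a text node was changed.
    static boolean replaceFirstText(Document doc, String tagName, String newValue) {
        NodeList matches = doc.getElementsByTagName(tagName);
        if (matches.getLength() == 0) {
            return false;
        }
        NodeList children = matches.item(0).getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.TEXT_NODE
                    && !child.getNodeValue().trim().isEmpty()) {
                child.setNodeValue(newValue);
                return true;
            }
        }
        return false;
    }

    // Serializes the document to a string without the XML declaration.
    static String serialize(Document doc) {
        try {
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Note also that the posted program sends its result to `System.out`, so the file on disk is never rewritten. Passing `new StreamResult(new File("intro.xml"))` to `transform` instead should write the modified document back to the file.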

  • JEditorPane parsing HTML

    Hi all,
    I am using JEditorPane and its ability to parse HTML, which, although relatively old and crusty, is certainly all I need for the job.
    Now, I understand there is a chain of classes involved in taking my .html file and turning it into something we can see in a JEditorPane. For example, an img tag is picked up by HTMLEditorKit and turned into an ImageView for display purposes.
    I want to do the following: I have subclassed HTMLEditorKit and have overridden the HTMLFactory (although at the moment it just defers everything to super). I want to be able to pick out all of the HTML comment tags as they go through the HTMLEditorKit:
    <!-- hey hey this is a comment -->
    ... and get to the comment text, "hey hey this is a comment", as a Java string. However, I've been digging around with Element for hours now, and although my HTMLFactory correctly digs out the comments from the rest of the elements:
    else if (kind == HTML.Tag.COMMENT)
    {
        System.out.println("I found a comment but don't know what it said!!");
    }
    ... as you can see, I don't know how to get to the comment text itself.
    The reason why I want access to the comment text is that I want to supplement the HTML code a little bit and add something in the comment that will affect the way it is rendered when I read it depending on the comment - so there's the reason if curious.
    Any help, and I do mean anything at all, would be much appreciated, as this is the last obstacle in my path to getting this thing working :)
    Thanks for your time!
    - Peter

    Here is some old code I have lying around that attempts to iterate through all the elements. If I remember correctly the comment text is found in the AttributeSet of the element:
    import java.io.*;
    import java.net.*;
    import java.util.*;
    import javax.swing.*;
    import javax.swing.text.*;
    import javax.swing.text.html.*;
    class GetHTML
    {
        public static void main(String[] args)
        {
            EditorKit kit = new HTMLEditorKit();
            Document doc = kit.createDefaultDocument();
            // The Document class does not yet handle charsets properly.
            doc.putProperty("IgnoreCharsetDirective", Boolean.TRUE);
            try
            {
                // Create a reader on the HTML content.
                Reader rd = getReader(args[0]);
                // Parse the HTML.
                kit.read(rd, doc, 0);
                System.out.println( doc.getText(0, doc.getLength()) );
                System.out.println("----");
                // Iterate through the elements of the HTML document.
                ElementIterator it = new ElementIterator(doc);
                Element elem = null;
                while ( (elem = it.next()) != null )
                {
                    AttributeSet as = elem.getAttributes();
                    System.out.println( "\n" + elem.getName() + " : " + as.getAttributeCount() );
                    if ( elem.getName().equals( HTML.Tag.IMG.toString() ) )
                    {
                        Object o = elem.getAttributes().getAttribute( HTML.Attribute.SRC );
                        System.out.println( o );
                    }
                    // "enum" is a reserved word since Java 5, so the variable is called "names"
                    Enumeration<?> names = as.getAttributeNames();
                    while( names.hasMoreElements() )
                    {
                        Object name = names.nextElement();
                        Object value = as.getAttribute( name );
                        System.out.println( "\t" + name + " : " + value );
                        if (value instanceof DefaultComboBoxModel)
                        {
                            DefaultComboBoxModel<?> model = (DefaultComboBoxModel<?>)value;
                            for (int j = 0; j < model.getSize(); j++)
                            {
                                Object o = model.getElementAt(j);
                                Object selected = model.getSelectedItem();
                                if ( o.equals( selected ) )
                                    System.out.println( o + " : selected" );
                                else
                                    System.out.println( o );
                            }
                        }
                    }
                    if ( elem.getName().equals( HTML.Tag.SELECT.toString() ) )
                    {
                        Object o = as.getAttribute( HTML.Attribute.ID );
                        System.out.println( o );
                    }
                    //  Weird, the text for each tag is stored in a 'content' element
                    if (elem.getElementCount() == 0)
                    {
                        int start = elem.getStartOffset();
                        int end = elem.getEndOffset();
                        System.out.println( "\t" + doc.getText(start, end - start) );
                    }
                }
            }
            catch (Exception e)
            {
                e.printStackTrace();
            }
            System.exit(1);
        }
        // Returns a reader on the HTML data. If 'uri' begins
        // with "http:", it's treated as a URL; otherwise,
        // it's assumed to be a local filename.
        static Reader getReader(String uri)
            throws IOException
        {
            // Retrieve from Internet.
            if (uri.startsWith("http:"))
            {
                URLConnection conn = new URL(uri).openConnection();
                return new InputStreamReader(conn.getInputStream());
            }
            // Retrieve from file.
            else
            {
                return new FileReader(uri);
            }
        }
    }
    To test it just use:
    java GetHTML somefile.html
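
If the goal is only to get at the comment text, another route (a sketch, not the approach used in the code above) is the callback parser: `HTMLEditorKit.ParserCallback` has a `handleComment(char[] data, int pos)` method that receives the text of each comment. The class name `CommentCollector` is invented for this example. Inside an `HTMLFactory`, I believe the same text is stored on the element under `HTML.Attribute.COMMENT`, but I have not verified that here.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

class CommentCollector {
    // Returns the trimmed text of every HTML comment found in the markup.
    static List<String> comments(String html) {
        final List<String> found = new ArrayList<>();
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleComment(char[] data, int pos) {
                // data holds the characters between "<!--" and "-->"
                found.add(new String(data).trim());
            }
        };
        try {
            new ParserDelegator().parse(new StringReader(html), callback, true);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return found;
    }

    public static void main(String[] args) {
        String html = "<html><body><p>Hi</p><!-- hey hey this is a comment --></body></html>";
        System.out.println(comments(html));
    }
}
```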

  • XML Parsing exception: org.w3c.dom.ls.LSException

    Hi All,
    We have a WSM (Web Service Management) product that generates a 'Proxy WSDL URL' for a 'Real WSDL URL'. At runtime it takes care of security, auditing, logging, routing and so on: it receives a web service request on the proxy WSDL and routes it to the functional endpoint (the real WSDL URL, on the application server where the actual web service is deployed).
    When the response comes back from the functional endpoint, WSM simply hands it to the user unless special policies are attached (for example a schema validation policy, which validates the response body against the schema XSD).
    While reading the response from the functional/application endpoint over the wire and creating the actual SoapResponse (XmlResponse) for the end user, xercesImpl.jar is used to parse the data with lsParser.parse(lsInput). That call throws "org.w3c.dom.ls.LSException: An invalid XML character (Unicode: 0x16) was found in the element content of the document" whenever the XML is not well formed for any reason (incompatible data or a special character).
    The exception does not say where the XML is corrupt or which data is incompatible, and since the parsing happens inside the third-party xercesImpl.jar, we cannot do anything about it from our application. It would be really helpful if we could
    > either get an option/boolean to turn off the validation logic that xercesImpl.jar applies internally during 'lsParser.parse(lsInput);'
    > or get additional information in the exception (the original cause, such as the offending character, or the full corrupted response) that would give us a clue to resolve the issue.
    <p/>
    Thanks in Advance
    Priya

    I did a search on Sun site, nothing came back.
    It is on http://xml.apache.org/xerces2-j though.
    You might need to go there and download it. or make sure your
    classpath includes the right jar file.
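
For what it is worth, U+0016 is not a legal character in XML 1.0, so the parser is reporting a well-formedness error rather than a schema validation failure. One workaround, sketched below under the assumption that silently dropping such characters is acceptable for the payload, is to locate or strip them before the text reaches the parser. The class and method names (`XmlCharFilter`, `stripInvalid`, `firstInvalidIndex`) are invented for this example.

```java
class XmlCharFilter {
    // True if the code point is allowed by the XML 1.0 "Char" production.
    static boolean isXml10Char(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
                || (cp >= 0x20 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    // Returns the index of the first disallowed character, or -1 if there is none.
    static int firstInvalidIndex(String s) {
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (!isXml10Char(cp)) {
                return i;
            }
            i += Character.charCount(cp);
        }
        return -1;
    }

    // Returns a copy of the input with every disallowed character removed.
    static String stripInvalid(String s) {
        StringBuilder out = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (isXml10Char(cp)) {
                out.appendCodePoint(cp);
            }
            i += Character.charCount(cp);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String response = "<a>x" + (char) 0x16 + "y</a>";
        System.out.println("first invalid index: " + firstInvalidIndex(response));
        System.out.println(stripInvalid(response));
    }
}
```

Logging the result of `firstInvalidIndex` before parsing also gives the location information that the exception message lacks.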

  • Parsing HTML files

    Hello,
    I have a question about parsing HTML files. Usually when I get an HTML file and need to find all the text in it, I use the code below. It collects all of the hyperlinks and ignores the HTML tags, keeping just the actual text. It's fine for smaller files, but occasionally I'll hit a large online text file; it works, but it's way too slow for large files. For plain text files I don't need any of this HTML tag stripping. Is there a way to still grab all the text without doing any tag searching, to make it faster?
    thanks,
    private void find() throws IOException
    {
        // Really slow for large text files. Need a way to just use a regular scanner on an internet text file
        new ParserDelegator().parse(new InputStreamReader(myBase.openStream()),
                new ParserListener(),
                true);
    }
    /**
     * Inner class for processing all "<a href.."> tags when reading a base URL.
     */
    private class ParserListener extends HTMLEditorKit.ParserCallback
    {
        final String IGNORED_LINKS = "^(http|mailto|\\W).*";
        public void handleStartTag (HTML.Tag t, MutableAttributeSet a, int pos)
        {
            if (t == HTML.Tag.A)
            {
                String href = (String)(a.getAttribute(HTML.Attribute.HREF));
                //System.out.println(href);
                //System.out.println(href.matches(IGNORED_LINKS) + "\t" + href);
                if (! (href == null || href.matches(IGNORED_LINKS)) && !myURLs.contains(href))
                    myURLs.add(href);
            }
            //TODO fix: the text of <title> arrives through handleText, not as an attribute
            if (t == HTML.Tag.TITLE)
            {
                String title = (String) (a.getAttribute(HTML.Attribute.TITLE));
                if (!(title == null))
                    myTitle = title;
                else myTitle = "No title was found";
            }
        }
        public void handleText (char[] data, int pos)
        {
            myText.append(" ");
            myText.append(data);
        }
    }

    JFactor2004 wrote:
    > My question is. If I know an html file is actually just a txt file
    This isn't a question. HTML files are text by definition.
    > is it possible to look through it (maybe use something similar to a regular scanner) without doing anything with html.
    That depends on what you mean by "doing something with HTML". You can certainly read it one line at a time.
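
Reading it one line at a time, without involving the HTML parser at all, might look like this sketch. The class name `PlainTextReader` is made up; with the code from the question, the reader would presumably be `new InputStreamReader(myBase.openStream())`.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;

class PlainTextReader {
    // Reads everything from the source line by line; no tag handling is done.
    static String readAll(Reader source) {
        StringBuilder text = new StringBuilder();
        try (BufferedReader in = new BufferedReader(source)) {
            String line;
            while ((line = in.readLine()) != null) {
                text.append(line).append('\n');
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return text.toString();
    }

    public static void main(String[] args) {
        System.out.print(readAll(new java.io.StringReader("first line\nsecond line")));
    }
}
```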

  • Parsing HTML using Swing's HTMLEditorKit

    Hi all,
    I posted this question on the "Java programming", but I think I posted on the wrong forum. So, please let me know if I have posted on the wrong forum, again.
    Anyway, I have read an article on parsing HTML using the Swing HTML Parser (http://java.sun.com/products/jfc/tsc/articles/bookmarks/index.html). However, I find that the HTMLEditorKit is unable to understand the <Meta> tag under the <Head> tag? Is this true? I am getting an error message:
    javax.swing.text.ChangedCharSetException
    at javax.swing.text.html.parser.DocumentParser.handleEmptyTag(DocumentParser.java:172)
    at javax.swing.text.html.parser.Parser.startTag(Parser.java:327)
    at javax.swing.text.html.parser.Parser.parseTag(Parser.java:1786)
    at javax.swing.text.html.parser.Parser.parseContent(Parser.java:1821)
    at javax.swing.text.html.parser.Parser.parse(Parser.java:1980)
    at javax.swing.text.html.parser.DocumentParser.parse(DocumentParser.java:109)
    at javax.swing.text.html.parser.ParserDelegator.parse(ParserDelegator.java:74)
    at URLReader.main(URLReader.java:58)
    Below is a simple code to write out the html file it reads in:
    public static void main(String[] args) throws Exception {
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback () {
            public void handleText(char[] data, int pos) {
                try {
                    System.out.println(data);
                } catch (Exception e) {
                    System.out.println("IOE: " + e);
                }
            }
        };
        Reader reader = new FileReader("myFile.html");
        new ParserDelegator().parse(reader, callback, false);
    }
    The html file that is having a problem reading in is:
    <html>
    <head>
    <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
    <title>NWS WSR-88D Radar System Transmit/Receive Status</title>
    </head>
    <p>A <foo>xx</foo>link</html>
    If I take away <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">, there is no problem.
    Any suggestions? Thanks in advance.

    Hi,
    Setting the third argument really works!!! Yee..... haa....!!!
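    WORKING SOLUTION: new ParserDelegator().parse(reader, callback, true);
    MANY... MANY THANKS for looking at the problem!!!
    Send the third argument of the parse method (ignoreCharSet) as true, so that the charset given in the <meta> tag is ignored instead of raising ChangedCharSetException.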
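
A small self-contained check of that fix, as a sketch: the markup below carries the same kind of `<meta>` charset directive as the failing file, and it is parsed with the third argument set to `true`. The class name `IgnoreCharsetDemo` is invented for the example, and the HTML is fed from a string rather than from myFile.html.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

class IgnoreCharsetDemo {
    // Collects all text handed to handleText, one chunk per line.
    static String textOf(String html) {
        final StringBuilder text = new StringBuilder();
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleText(char[] data, int pos) {
                text.append(data).append('\n');
            }
        };
        try {
            // true = ignoreCharSet: the <meta ... charset=...> directive is not acted on
            new ParserDelegator().parse(new StringReader(html), callback, true);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return text.toString();
    }

    public static void main(String[] args) {
        String html = "<html><head>"
                + "<meta http-equiv=\"Content-Type\" content=\"text/html; charset=iso-8859-1\">"
                + "<title>Status</title></head><body><p>A link</p></body></html>";
        System.out.print(textOf(html));
    }
}
```

Ignoring the directive is only safe when the Reader was already created with the right encoding, which is worth checking for files that are not plain ASCII.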

  • Parsing html files via an url

    Hi,
    I already have a Java program that is able to read in HTML files that are stored on my computer's hard drive. Now I would like to expand its functionality so that it can parse HTML files straight from the web.
    For example, when the program is run, I would like to be able to give it a URL for a given website and then parse the HTML file that the link points to.
    I've searched the forum, but have not been able to find anything of any real use. If you could offer an overview or point me towards a resource, I would be very grateful.

    If you've done things right, you have a HTML reader/parser that takes an InputStream. For Files, this would be a FileInputStream.
    For URLs, this would be the InputStream you get from URLConnection.getInputStream(). You can get a URLConnection by calling openConnection() on a URL instance (created from your input url of course).
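
A minimal sketch of that idea: the parsing method only sees an InputStream, and a small helper decides whether the stream comes from a file or from a URL. The names `HtmlSource`, `open` and `countLinks` are made up, counting links merely stands in for whatever the existing parser does, and UTF-8 is assumed as the encoding.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

class HtmlSource {
    // Opens a stream for either a web address or a local file name.
    static InputStream open(String location) throws IOException {
        if (location.startsWith("http:") || location.startsWith("https:")) {
            return URI.create(location).toURL().openConnection().getInputStream();
        }
        return new FileInputStream(location);
    }

    // Counts <a> start tags; the parser does not care where the stream came from.
    static int countLinks(InputStream in) {
        final int[] count = {0};
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleStartTag(HTML.Tag t, MutableAttributeSet a, int pos) {
                if (t == HTML.Tag.A) {
                    count[0]++;
                }
            }
        };
        // Assumption: the content is UTF-8 encoded.
        try (InputStreamReader reader = new InputStreamReader(in, StandardCharsets.UTF_8)) {
            new ParserDelegator().parse(reader, callback, true);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return count[0];
    }

    public static void main(String[] args) throws IOException {
        // Pass a file name or a URL as the first argument.
        System.out.println(countLinks(open(args[0])));
    }
}
```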

  • Dreamweaver stopped working. XML parsing fatal error: Invalid document structure, line1  Tried reloading DW. Didn't work.

    Dreamweaver (CS5) stopped working.  The error message says - XML parsing fatal error: Invalid document structure, line: 1, I tried to reload DW. No change. I also tried to reset the computer to an earlier date before I reloaded. Also didn't work. Can anybody shed some light?

    The first thing to try is Deleting Corrupted Cache.  Be sure to turn on Hidden Files & Folders in your file manager (Win Explorer or Mac Finder).
    http://forums.adobe.com/thread/494811
    If that doesn't help, try Restore Preferences
    http://helpx.adobe.com/dreamweaver/kb/restore-preferences-dreamweaver-cs4-cs5.html
    If all else fails, use the CC Cleaner Tools below to wipe DW from your system, followed by a software re-install.
    http://helpx.adobe.com/creative-suite/kb/cs5-cleaner-tool-installation-problems.html
    Keep us posted on your results.
    Nancy O.

  • Getting DTD structure from Oracle DOMParser

    I am having trouble getting DTD structure from DOMParser after I parse the xml file with external DTD.
    When I do:
    xmlDOMParser.parse(new FileInputStream(xmlFile));
    XMLDocument xmlDoc=xmlDOMParser.getDocument();
    DTD docType=xmlDOMParser.getDoctype();
    NamedNodeMap nodeMap=docType.getElementDecls();
    the nodeMap is equal to null.
    I need to get the element structure of DTD, how can I do that?

    The below example is working fine for me
    create table t1
       as
        select object_id id, object_name text
          from all_objects;
    Create table t2
      as
      select t1.*, 0 session_id
        from t1
       where 1=0;
    CREATE OR REPLACE TYPE t2_type
    AS OBJECT (
      id         number,
      text       varchar2(30),
      session_id number
    );
    /
    create or replace type t2_tab_type
    as table of t2_type;
    /
    create or replace
      function parallel_pipelined( l_cursor in sys_refcursor )
      return t2_tab_type
      pipelined
      parallel_enable ( partition l_cursor by any )
      is
          l_session_id number;
          l_rec        t1%rowtype;
      begin
          select sid into l_session_id
            from v$mystat
           where rownum =1;
          loop
              fetch l_cursor into l_rec;
              exit when l_cursor%notfound;
              -- complex process here
              pipe row(t2_type(l_rec.id,l_rec.text,l_session_id));
          end loop;
          close l_cursor;
          return;
      end;
      /
    And it's getting executed in parallel
    SQL> select DISTINCT session_id
      2    from table(parallel_pipelined
      3              (CURSOR(select /*+ parallel(t1) */ *
      4                        from t1 )
      5               ))
      6  ;
    SESSION_ID
           221
            76
            77
           241
           161
           152
           160
           302
           232
           313
            73
    SESSION_ID
           292
    12 rows selected.
    But why is it getting disconnected in my scenario?

  • Parse HTML behaviour

    Hi,
    Can anybody explain the behavior of the SunOne Web Server when parse HTML is enabled for all HTML files?
    If we have a valid HTML file on the web server, for instance http://servername/myhtml.html, the page is served by the web server. The same page is also served if we append anything to the URL. For a URL such as http://servername/myhtml.html/ahdjksad/asdhjsad/sdhjklsad/asjdksald (anything after the HTML file name), the web server still loads the myhtml.html page. If we disable parse HTML, then it won't work. I want server side includes to keep working without the server serving such wrong URLs.
    How does this come about? The path doesn't exist on the web server, so it should give a 404 error. Is this the way parse HTML works? Is there any way to restrict it, other than custom NSAPI, while keeping the parse HTML functionality enabled for server side includes?
    If anyone has noticed this behavior, please explain.
    Thanks,
    Rijesh.

    Hi,
    Actually, I load an external HTML file using the LoadVars class methods.
    var oLoad:LoadVars=new LoadVars();
    oLoad.load("external.html");
    I want to parse that HTML in Flash.
    I need some text from the HTML page (the HTML has about 200 lines of code).
    How can I parse that HTML and trace that particular text?

  • Parsed HTML/SSI not working in Web Server 7 on Ubuntu Server 9.10

    Please help. I have SJSWS 7.0u6 on Ubuntu Server 9.10. The HTML parsing is set to parse all HTML files.
    My HTML code is:
    <body>
      <!--#include file="includes/corner.html"-->
      <div id="maincontent">
         <!--#echo var="DATE_GMT"-->
      </div>
    </body>
    I added the echo command later to rule out an error with my include file. I even took out the include command to rule it out completely. If I "view/page source" from Firefox I always get the code as it is above, in its original form. The server is completely ignoring the include and the echo.
    In the virtual server settings under content handling / Parsed HTML/SSI I have tried "all HTML" and "executable HTML". Both return the same result, which is no parsing whatsoever. The log is set to "finest" and so far no errors have come up. Please tell me what I am doing wrong, did I miss a step, overlook some extra settings?
    I am happy to provide more detail. Just let me know what you need to see.
    Thank you.
    update: I tested another bare bones html and got the same results, no parsing.
    Seen here : [http://kenbuxton.net/test.html]
    Edited by: Ken_Buxton on Nov 17, 2009 7:53 PM

    Deploy the configuration? Is there something beyond clicking save and restarting the instance? I checked the server.xml config file and the log level was at "info" even though I set it to "finest" in the GUI. I am now getting the finest details in the logs after I changed the server.xml file manually. Here is what I am getting for test.html. ...
    [19/Nov/2009:10:48:37] finest ( 6266): for host 174.17.99.6 trying to GET /test.html, process-uri-objects reports: processing objects for URI /test.html
    [19/Nov/2009:10:48:37] finest ( 6266): for host 174.17.99.6 trying to GET /test.html, process-uri-objects reports: processing object name="default"
    [19/Nov/2009:10:48:37] finest ( 6266): for host 174.17.99.6 trying to GET /test.html, func_exec reports: executing fn="match-browser" browser="*MSIE*" ssl-unclean-shutdown="true" Directive="AuthTrans" magnus-internal="1"
    [19/Nov/2009:10:48:37] finest ( 6266): for host 174.17.99.6 trying to GET /test.html, func_exec reports: fn="match-browser" browser="*MSIE*" ssl-unclean-shutdown="true" Directive="AuthTrans" magnus-internal="1" returned -2 (REQ_NOACTION)
    [19/Nov/2009:10:48:37] finest ( 6266): for host 174.17.99.6 trying to GET /test.html, func_exec reports: executing fn="ntrans-j2ee" name="j2ee" Directive="NameTrans"
    [19/Nov/2009:10:48:37] finest ( 6266): for host 174.17.99.6 trying to GET /test.html, func_exec reports: fn="ntrans-j2ee" name="j2ee" Directive="NameTrans" returned -2 (REQ_NOACTION)
    [19/Nov/2009:10:48:37] finest ( 6266): for host 174.17.99.6 trying to GET /test.html, func_exec reports: executing fn="pfx2dir" from="/mc-icons" dir="/sun/webserver7/lib/icons" name="es-internal" Directive="NameTrans"
    [19/Nov/2009:10:48:37] finest ( 6266): for host 174.17.99.6 trying to GET /test.html, func_exec reports: fn="pfx2dir" from="/mc-icons" dir="/sun/webserver7/lib/icons" name="es-internal" Directive="NameTrans" returned -2 (REQ_NOACTION)
    [19/Nov/2009:10:48:37] finest ( 6266): for host 174.17.99.6 trying to GET /test.html, func_exec reports: executing fn="uri-clean" Directive="PathCheck"
    [19/Nov/2009:10:48:37] finest ( 6266): for host 174.17.99.6 trying to GET /test.html, func_exec reports: fn="uri-clean" Directive="PathCheck" returned 0 (REQ_PROCEED)
    [19/Nov/2009:10:48:37] finest ( 6266): for host 174.17.99.6 trying to GET /test.html, func_exec reports: executing fn="find-pathinfo" Directive="PathCheck"
    [19/Nov/2009:10:48:37] finest ( 6266): for host 174.17.99.6 trying to GET /test.html, func_exec reports: fn="find-pathinfo" Directive="PathCheck" returned -2 (REQ_NOACTION)
    [19/Nov/2009:10:48:37] finest ( 6266): for host 174.17.99.6 trying to GET /test.html, func_exec reports: executing fn="find-index-j2ee" Directive="PathCheck"
    [19/Nov/2009:10:48:37] finest ( 6266): for host 174.17.99.6 trying to GET /test.html, func_exec reports: fn="find-index-j2ee" Directive="PathCheck" returned -2 (REQ_NOACTION)
    [19/Nov/2009:10:48:37] finest ( 6266): for host 174.17.99.6 trying to GET /test.html, func_exec reports: executing fn="find-index" index-names="index.html,home.html,index.jsp" Directive="PathCheck"
    [19/Nov/2009:10:48:37] finest ( 6266): for host 174.17.99.6 trying to GET /test.html, func_exec reports: fn="find-index" index-names="index.html,home.html,index.jsp" Directive="PathCheck" returned -2 (REQ_NOACTION)
    [19/Nov/2009:10:48:37] finest ( 6266): for host 174.17.99.6 trying to GET /test.html, func_exec reports: executing fn="type-j2ee" Directive="ObjectType"
    [19/Nov/2009:10:48:37] finest ( 6266): for host 174.17.99.6 trying to GET /test.html, func_exec reports: fn="type-j2ee" Directive="ObjectType" returned 0 (REQ_PROCEED)
    [19/Nov/2009:10:48:37] finest ( 6266): for host 174.17.99.6 trying to GET /test.html, func_exec reports: executing fn="type-by-extension" Directive="ObjectType"
    [19/Nov/2009:10:48:37] finest ( 6266): for host 174.17.99.6 trying to GET /test.html, func_exec reports: fn="type-by-extension" Directive="ObjectType" returned 0 (REQ_PROCEED)
    [19/Nov/2009:10:48:37] finest ( 6266): for host 174.17.99.6 trying to GET /test.html, func_exec reports: executing fn="force-type" type="text/plain" Directive="ObjectType"
    [19/Nov/2009:10:48:37] finest ( 6266): for host 174.17.99.6 trying to GET /test.html, func_exec reports: fn="force-type" type="text/plain" Directive="ObjectType" returned 0 (REQ_PROCEED)
    [19/Nov/2009:10:48:37] finest ( 6266): for host 174.17.99.6 trying to GET /test.html, func_exec reports: executing method="(GET|HEAD|POST)" type="*~magnus-internal/*" fn="send-file" Directive="Service"
    [19/Nov/2009:10:48:37] finest ( 6266): for host 174.17.99.6 trying to GET /test.html, func_exec reports: method="(GET|HEAD|POST)" type="*~magnus-internal/*" fn="send-file" Directive="Service" returned -1 (REQ_ABORTED)
    [19/Nov/2009:10:48:37] finest ( 6266): for host 174.17.99.6 trying to GET /test.html, func_exec reports: executing fn="error-j2ee" Directive="Error"
    [19/Nov/2009:10:48:37] finest ( 6266): for host 174.17.99.6 trying to GET /test.html, func_exec reports: fn="error-j2ee" Directive="Error" returned -2 (REQ_NOACTION)
    [19/Nov/2009:10:48:37] finest ( 6266): for host 174.17.99.6 trying to GET /test.html, func_exec reports: executing fn="flex-log" Directive="AddLog"
    [19/Nov/2009:10:48:37] finest ( 6266): for host 174.17.99.6 trying to GET /test.html, func_exec reports: fn="flex-log" Directive="AddLog" returned 0 (REQ_PROCEED)

  • In java, can I parse HTML file

    and build a DOM tree? I think DOM Level 1 supports HTML, but does Java implement that part?
    It would be very helpful if you could provide some sample code.
    Thanks

    Java has a simple parser that can parse HTML 3.2. See this thread for an example:
    http://forum.java.sun.com/thread.jsp?forum=31&thread=266798
    It also has a callback parser. See this article:
    http://java.sun.com/products/jfc/tsc/articles/bookmarks/index.html

  • How to get tree structure navigation in module pool program

    please send me a sample code for getting tree structure navigation in a screen  in module pool program.
    ex.
    masters
    items

    do a chain and endchain on the fields.Then insert the fields in to the required database.

  • Parsing an XML using DOM parser in Java in Recursive fashion

    I need to parse an XML using DOM parser in Java. New tags can be added to the XML in future. Code should be written in such a way that even with new tags added there should not be any code change. I felt that parsing the XML recursively can solve this problem. Can any one please share sample Java code that parses XML recursively. Thanks in Advance.

    Actually, if you are planning to use DOM then you will be doing that task after you parse the data. But anyway, have you read any tutorials or books about how to process XML in Java? If not, my suggestion would be to start by doing that. You cannot learn that by fishing on forums. Try this one for example:
    http://www.cafeconleche.org/books/xmljava/chapters/index.html
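
As a starting point alongside the tutorial, here is a small sketch of a recursive walk over a parsed DOM tree. It never names a specific tag, so new elements in the XML would be visited without any code change. The class name `DomWalker` and the "depth:name" output format are choices made only for this example.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class DomWalker {
    // Visits the node and then each of its children, recording "depth:name" for elements.
    static void walk(Node node, int depth, List<String> out) {
        if (node.getNodeType() == Node.ELEMENT_NODE) {
            out.add(depth + ":" + node.getNodeName());
        }
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            walk(children.item(i), depth + 1, out);
        }
    }

    // Parses the XML string and returns every element in document order.
    static List<String> elementNames(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            List<String> out = new ArrayList<>();
            walk(doc.getDocumentElement(), 0, out);
            return out;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(elementNames("<a><b><c/></b><d/></a>"));
    }
}
```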
