Parsing HTML to get DOM structure

I have been looking at the various XML libraries such as JTidy, HotSax, Xalan, Tagsoup, htmlparser, etc. trying to find a library which would allow me to parse some HTML, retrieving the DOM structure of the document, without trying to make it any better.
My goal is to write an application that goes through a large set of HTML templates and modifies parts of them. Since these can be footers, headers, or just pieces of content, I don't want HTML and BODY tags to be generated automatically.
Is there any way I could achieve that? All the libraries I tried ended up generating some extra HTML in the DOM structure which I wasn't able to get rid of...

Well, what I'm doing is a program which can process existing HTML templates so that I can refactor some patterns we have targeted to make everything more uniform.
Thus I want to be able to read HTML code, alter it, and then produce the result without adding any extra tags guessed by a cleaner. The reason is simple, since the templates are only pieces of a final page, I don't want to end up with <html> tags inside every template piece!
Oh and it is true that TagSoup is SAX based, but I mixed it with Xalan so that it produces a DOM tree. Here's the resource I found which helped me do that:
http://www.hackdiary.com/archives/000041.html

Similar Messages

Parsing HTML - best tool

Hi guys, I'd like to know the best open source API to parse HTML and extract the required data from it. Ideally one that uses a SAX parser, but the HTML is not fully XML compliant, i.e. it is not XHTML.
Thanks
Abe

Thanks, I found my answer: use the Jericho HTML Parser. Do any of you know of a better one?
Thanks
Abe

Parsing XML using java DOM

hi
I am trying to parse a document and change a specific text value within an element. When I run the program it changes the node's text, but when I check the XML file it doesn't show the change; the file remains the same. The code I am using is as follows. I would be grateful if anyone could help:
// ReplaceText.java
// Reads intro.xml and replaces a text node.
// Java core packages
import java.io.*;
// Java extension packages
import javax.xml.parsers.*;
import javax.xml.transform.*;
import javax.xml.transform.stream.*;
import javax.xml.transform.dom.*;
// third-party libraries
import org.xml.sax.*;
import org.w3c.dom.*;

public class ReplaceText {
   private Document document;

   public ReplaceText()
   {
      // parse document, find/replace element, output result
      try {
         // obtain default parser
         DocumentBuilderFactory factory =
            DocumentBuilderFactory.newInstance();
         // set parser as validating
         factory.setValidating( true );
         // obtain object that builds Documents
         DocumentBuilder builder = factory.newDocumentBuilder();
         // set error handler for validation errors
         // (MyErrorHandler is the poster's own class; it is not shown in the post)
         builder.setErrorHandler( new MyErrorHandler() );
         System.err.println( "reading" );
         // obtain document object from XML document
         File f = new File("D:/Documents and Settings/Administrator/Desktop/xml adv java bk/appC/intro.xml");
         document = builder.parse(f);
         //document = builder.parse( new File( "intro.xml" ) );
         System.err.println( "reading document" );
         // retrieve the root node
         Node root = document.getDocumentElement();
         if ( root.getNodeType() == Node.ELEMENT_NODE ) {
            Element myMessageNode = ( Element ) root;
            NodeList messageNodes =
               myMessageNode.getElementsByTagName( "message5" );
            if ( messageNodes.getLength() != 0 ) {
               Node message = messageNodes.item( 0 );
               System.out.println("iiiii");
               // create text node
               Text newText = document.createTextNode(
                  "New Changed Message!!" );
               // get old text node
               Text oldText =
                  ( Text ) message.getChildNodes().item( 0 );
               // replace text
               //message.removeChild(oldText);
               message.replaceChild( newText, oldText );
            }
         }
         // output Document object
         // create DOMSource for source XML document
         Source xmlSource = new DOMSource( document );
         // create StreamResult for transformation result (here: standard output)
         Result result = new StreamResult( System.out );
         // create TransformerFactory
         TransformerFactory transformerFactory =
            TransformerFactory.newInstance();
         // create Transformer for transformation
         Transformer transformer =
            transformerFactory.newTransformer();
         transformer.setOutputProperty( OutputKeys.INDENT, "yes" );
         transformer.setOutputProperty( OutputKeys.STANDALONE, "yes" );
         // transform and deliver content to client
         transformer.transform( xmlSource, result );
      }
      // handle exception creating DocumentBuilder
      catch ( ParserConfigurationException parserException ) {
         parserException.printStackTrace();
      }
      // handle exception parsing Document
      catch ( SAXException saxException ) {
         saxException.printStackTrace();
      }
      // handle exception reading/writing data
      catch ( IOException ioException ) {
         ioException.printStackTrace();
         System.exit( 1 );
      }
      // handle exception creating TransformerFactory
      catch (
         TransformerFactoryConfigurationError factoryError ) {
         System.err.println( "Error while creating " +
            "TransformerFactory" );
         factoryError.printStackTrace();
      }
      // handle exception transforming document
      catch ( TransformerException transformerError ) {
         System.err.println( "Error transforming document" );
         transformerError.printStackTrace();
      }
   }

   public static void main( String args[] )
   {
      ReplaceText replace = new ReplaceText();
   }
}
The XML file that I am using is as follows:
<?xml version = "1.0"?>


<!DOCTYPE myMessage [
 <!ELEMENT myMessage (message, message5)>
 <!ELEMENT message (#PCDATA)>
 <!ELEMENT message5 (#PCDATA)>
]>
<myMessage>
 <message>welcome to the xml shhhhhushu</message>
 <message5>welcome to the xml shhhhhushu</message5>
</myMessage>
I would be grateful if someone could please help.

See if the Text 'oldText' actually has any text within it. Sometimes in DOM parsing, you will get something like:
Element
 Text (blank)
 Text (actual)
 Text (blank)
Whereas you would expect to receive:
Element
 Text (actual)
See if that is the case. If yes, modify your logic to iterate through the child text nodes until one with actual text inside of it (getNodeValue()) is found.
- Saish
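The advice above can be sketched as follows. This is a minimal illustration, not the poster's code: the class and method names (TextReplacer, replaceFirstText, save) are hypothetical. It skips blank text nodes when looking for the text to replace, and it also writes the result to a file, since the program in the question sends its output to System.out only and therefore never rewrites intro.xml.

```java
import java.io.File;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class, for illustration only.
class TextReplacer {

    // Replaces the first non-blank text child of the first <tagName> element.
    // Returns true if a text node was changed.
    static boolean replaceFirstText(Document doc, String tagName, String newValue) {
        NodeList matches = doc.getElementsByTagName(tagName);
        if (matches.getLength() == 0) {
            return false;
        }
        NodeList children = matches.item(0).getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            // Skip comments, elements and whitespace-only text nodes.
            if (child.getNodeType() == Node.TEXT_NODE
                    && !child.getNodeValue().trim().isEmpty()) {
                child.setNodeValue(newValue);
                return true;
            }
        }
        return false;
    }

    // Parses an XML string into a DOM Document (non-validating).
    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    // Serializes the in-memory DOM to the given file, so the change persists.
    static void save(Document doc, File target) throws Exception {
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.transform(new DOMSource(doc), new StreamResult(target));
    }
}
```

Changing a DOM tree only changes the copy in memory; the file on disk stays the same until the tree is serialized back to it, which is what save() does here.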

JEditorPane parsing HTML

Hi all,
I am using JEditorPane and its ability to parse HTML, which, although relatively old and crusty, is certainly all I need for the job.
Now, I understand there is a chain of classes involved in taking my .html file and turning it into something we can see in a JEditorPane. For example, an img tag is picked up by HTMLEditorKit and turned into an ImageView for display purposes.
I want to do the following: I have subclassed HTMLEditorKit, and have overridden the HTMLFactory (although at the moment it just defers everything to super). I want to be able to pick out all of the HTML comment tags as they go through the HTMLEditorKit:
<!-- hey hey this is a comment -->
... and get to the comment text, "hey hey this is a comment", as a Java string. However, I've been digging around with Element for hours now, and although my HTMLFactory correctly picks out the comments from the rest of the elements:
else if (kind == HTML.Tag.COMMENT)
{
    System.out.println("I found a comment but don't know what it said!!");
}
... as you can see, I don't know how to get to the comment text itself.
The reason why I want access to the comment text is that I want to supplement the HTML code a little bit and add something in the comment that will affect the way it is rendered when I read it depending on the comment - so there's the reason if curious.
Any help, and I do mean anything at all, would be much appreciated, as this is the last obstacle in my path to getting this thing working :)
Thanks for your time!
- Peter

Here is some old code I have lying around that attempts to iterate through all the elements. If I remember correctly the comment text is found in the AttributeSet of the element:
import java.io.*;
import java.net.*;
import java.util.*;
import javax.swing.*;
import javax.swing.text.*;
import javax.swing.text.html.*;
class GetHTML
{
    public static void main(String[] args)
    {
        EditorKit kit = new HTMLEditorKit();
        Document doc = kit.createDefaultDocument();
        // The Document class does not yet handle charset's properly.
        doc.putProperty("IgnoreCharsetDirective", Boolean.TRUE);
        try
        {
            // Create a reader on the HTML content.
            Reader rd = getReader(args[0]);
            // Parse the HTML.
            kit.read(rd, doc, 0);
            System.out.println( doc.getText(0, doc.getLength()) );
            System.out.println("----");
            // Iterate through the elements of the HTML document.
            ElementIterator it = new ElementIterator(doc);
            Element elem = null;
            while ( (elem = it.next()) != null )
            {
                AttributeSet as = elem.getAttributes();
                System.out.println( "\n" + elem.getName() + " : " + as.getAttributeCount() );
                if ( elem.getName().equals( HTML.Tag.IMG.toString() ) )
                {
                    Object o = elem.getAttributes().getAttribute( HTML.Attribute.SRC );
                    System.out.println( o );
                }
                // 'enum' is a reserved word since Java 5, so the variable is called 'names'
                Enumeration<?> names = as.getAttributeNames();
                while( names.hasMoreElements() )
                {
                    Object name = names.nextElement();
                    Object value = as.getAttribute( name );
                    System.out.println( "\t" + name + " : " + value );
                    if (value instanceof DefaultComboBoxModel)
                    {
                        DefaultComboBoxModel<?> model = (DefaultComboBoxModel<?>)value;
                        for (int j = 0; j < model.getSize(); j++)
                        {
                            Object o = model.getElementAt(j);
                            Object selected = model.getSelectedItem();
                            if ( o.equals( selected ) )
                                System.out.println( o + " : selected" );
                            else
                                System.out.println( o );
                        }
                    }
                }
                if ( elem.getName().equals( HTML.Tag.SELECT.toString() ) )
                {
                    Object o = as.getAttribute( HTML.Attribute.ID );
                    System.out.println( o );
                }
                // Weird, the text for each tag is stored in a 'content' element
                if (elem.getElementCount() == 0)
                {
                    int start = elem.getStartOffset();
                    int end = elem.getEndOffset();
                    System.out.println( "\t" + doc.getText(start, end - start) );
                }
            }
        }
        catch (Exception e)
        {
            e.printStackTrace();
        }
        System.exit(1);
    }

    // Returns a reader on the HTML data. If 'uri' begins
    // with "http:", it's treated as a URL; otherwise,
    // it's assumed to be a local filename.
    static Reader getReader(String uri)
        throws IOException
    {
        // Retrieve from Internet.
        if (uri.startsWith("http:"))
        {
            URLConnection conn = new URL(uri).openConnection();
            return new InputStreamReader(conn.getInputStream());
        }
        // Retrieve from file.
        else
        {
            return new FileReader(uri);
        }
    }
}
To test it just use:
java GetHTML somefile.html
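If only the comment text is needed, a different route from the element iteration above is the callback parser: HTMLEditorKit.ParserCallback has a handleComment(char[], int) method that receives the text of each comment. The sketch below is an alternative under that assumption, not the code from the reply; the class name CommentCollector is hypothetical.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical helper: collects the text of every HTML comment the parser reports.
class CommentCollector extends HTMLEditorKit.ParserCallback {
    private final List<String> comments = new ArrayList<>();

    @Override
    public void handleComment(char[] data, int pos) {
        // 'data' holds the characters between the comment delimiters.
        comments.add(new String(data).trim());
    }

    // Parses the HTML from 'html' and returns all comment texts in document order.
    static List<String> collect(Reader html) throws IOException {
        CommentCollector collector = new CommentCollector();
        // true = ignore any charset directive in the document
        new ParserDelegator().parse(html, collector, true);
        return collector.comments;
    }

    public static void main(String[] args) throws IOException {
        String html = "<html><body><!-- hey hey this is a comment --><p>text</p></body></html>";
        System.out.println(collect(new StringReader(html)));
    }
}
```

This runs outside the JEditorPane rendering chain, so it suits a pre-processing pass; picking the comment up inside a custom ViewFactory would still need the attribute lookup described in the reply.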

XML Parsing exception: org.w3c.dom.ls.LSException

Hi All,
We have a WSM (Web Service Management) product that generates a 'Proxy WSDL URL' for a 'Real WSDL URL'. At runtime it handles security, auditing, logging, routing and so on: it receives a web service request on the proxy WSDL and routes it to the functional endpoint (the real WSDL URL, i.e. the application server where the actual web service is deployed).
The response from the functional endpoint comes back to WSM, which simply passes it on to the user unless special policies are attached (for example a schema validation policy, which validates the response body against the schema XSD).

While reading the response (from the functional/application endpoint) over the wire and creating the actual SoapResponse (XmlResponse) for the end user,
xercesImpl.jar is used to parse the data with lsParser.parse(lsInput). This throws the exception "org.w3c.dom.ls.LSException: An invalid XML character (Unicode: 0x16) was found in the element content of the document" whenever the XML is not properly formed for any reason (incompatible data or a special character).

Because the exception does not say where the XML is corrupt, and xercesImpl.jar is a third-party jar, we have no way to handle this from our application/product side. It would be really helpful if we could
> either get an option/boolean to turn off the validation logic that xercesImpl.jar applies internally during 'lsParser.parse(lsInput);'
> or get additional information in the exception (the original cause, such as the offending character, or the full corrupted response) that would give us a clue to resolve the issue.

Thanks in Advance
Priya

I did a search on the Sun site; nothing came back.
It is on http://xml.apache.org/xerces2-j though.
You might need to go there and download it, or make sure your classpath includes the right jar file.
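On the error in the question: U+0016 is a control character that the XML 1.0 specification does not allow in document content, so the parser is rejecting text that is not well formed rather than applying optional validation. One possible workaround, sketched here with a hypothetical XmlCharFilter class, is to locate or strip such characters before the text reaches lsParser.parse. Whether silently dropping characters is acceptable depends on the application.

```java
// Hypothetical helper: finds and removes characters that XML 1.0 does not allow.
class XmlCharFilter {

    // True if the code point matches the XML 1.0 "Char" production.
    static boolean isValidXmlChar(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
                || (cp >= 0x20 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    // Returns the char index of the first disallowed character, or -1 if none.
    static int firstInvalidIndex(String xml) {
        for (int i = 0; i < xml.length(); ) {
            int cp = xml.codePointAt(i);
            if (!isValidXmlChar(cp)) {
                return i;
            }
            i += Character.charCount(cp);
        }
        return -1;
    }

    // Returns a copy of the input with every disallowed character removed.
    static String stripInvalid(String xml) {
        StringBuilder out = new StringBuilder(xml.length());
        xml.codePoints()
           .filter(XmlCharFilter::isValidXmlChar)
           .forEach(out::appendCodePoint);
        return out.toString();
    }

    public static void main(String[] args) {
        String body = "<a>x\u0016y</a>";
        System.out.println("first invalid index: " + firstInvalidIndex(body));
        System.out.println("cleaned: " + stripInvalid(body));
    }
}
```

Logging the index from firstInvalidIndex gives the position information the exception message lacks, which helps when tracing the bad data back to the service that produced it.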

Parsing HTML files

Hello,
I have a question about parsing HTML files. Usually, when I get an HTML file and need to find all the text in it, I use the code below. It collects all of the hyperlinks and ignores the HTML tags, keeping just the actual text. It's fine for smaller files, but occasionally I hit a large online text file; it works, but it's way too slow for large files. For plain text files I don't need any of this HTML tag stripping. Is there a way to still grab all the text without doing any tag searching, to make it faster?
thanks,
private void find() throws IOException
{
    //Really slow for large text files. Need a way to just use a regular scanner on an internet text file
    new ParserDelegator().parse(new InputStreamReader(myBase.openStream()),
                                new ParserListener(),
                                true);
}

/**
 * Inner class for processing all "<a href.."> tags when reading a base URL.
 */
private class ParserListener extends HTMLEditorKit.ParserCallback
{
    final String IGNORED_LINKS = "^(http|mailto|\\W).*";

    public void handleStartTag (HTML.Tag t, MutableAttributeSet a, int pos)
    {
        if (t == HTML.Tag.A)
        {
            String href = (String)(a.getAttribute(HTML.Attribute.HREF));
            //System.out.println(href);
            //System.out.println(href.matches(IGNORED_LINKS) + "\t" + href);
            if (! (href == null || href.matches(IGNORED_LINKS)) && !myURLs.contains(href))
                myURLs.add(href);
        }
        //TODO fix
        if (t == HTML.Tag.TITLE)
        {
            String title = (String) (a.getAttribute(HTML.Attribute.TITLE));
            if (!(title == null))
                myTitle = title;
            else myTitle = "No title was found";
        }
    }

    public void handleText (char[] data, int pos)
    {
        myText.append(" ");
        myText.append(data);
    }
}

JFactor2004 wrote:
"My question is. If I know an html file is actually just a txt file"
This isn't a question. HTML files are text by definition.
"is it possible to look through it (maybe use something similar to a regular scanner) without doing anything with html."
That depends on what you mean by "doing something with HTML". You can certainly read it one line at a time.
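Reading one line at a time, as suggested above, could look like the sketch below. The class name PlainTextReader is hypothetical, and the code assumes the content really is plain text, because no tag handling is done at all.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

// Hypothetical helper: reads all text from a Reader without any HTML parsing.
class PlainTextReader {

    // Reads the source line by line and returns the text, one '\n' after each line.
    static String readAll(Reader source) throws IOException {
        StringBuilder text = new StringBuilder();
        try (BufferedReader in = new BufferedReader(source)) {
            String line;
            while ((line = in.readLine()) != null) {
                text.append(line).append('\n');
            }
        }
        return text.toString();
    }

    public static void main(String[] args) throws IOException {
        System.out.print(readAll(new StringReader("first line\nsecond line")));
    }
}
```

For a remote file the same method can be given new InputStreamReader(url.openStream(), charset), where url is a java.net.URL and the charset is whatever the server actually uses.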

Parsing HTML using Swing's HTMLEditorKit

Hi all,
I posted this question on the "Java programming", but I think I posted on the wrong forum. So, please let me know if I have posted on the wrong forum, again.
Anyway, I have read an article on parsing HTML using the Swing HTML Parser (http://java.sun.com/products/jfc/tsc/articles/bookmarks/index.html). However, I find that the HTMLEditorKit is unable to understand the <Meta> tag under the <Head> tag? Is this true? I am getting an error message:
javax.swing.text.ChangedCharSetException
at javax.swing.text.html.parser.DocumentParser.handleEmptyTag(DocumentParser.java:172)
at javax.swing.text.html.parser.Parser.startTag(Parser.java:327)
at javax.swing.text.html.parser.Parser.parseTag(Parser.java:1786)
at javax.swing.text.html.parser.Parser.parseContent(Parser.java:1821)
at javax.swing.text.html.parser.Parser.parse(Parser.java:1980)
at javax.swing.text.html.parser.DocumentParser.parse(DocumentParser.java:109)
at javax.swing.text.html.parser.ParserDelegator.parse(ParserDelegator.java:74)
at URLReader.main(URLReader.java:58)
Below is a simple code to write out the html file it reads in:
public static void main(String[] args) throws Exception {
    HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback () {
        public void handleText(char[] data, int pos) {
            try {
                System.out.println(data);
            } catch (Exception e) {
                System.out.println("IOE: " + e);
            }
        }
    };
    Reader reader = new FileReader("myFile.html");
    new ParserDelegator().parse(reader, callback, false);
}
The html file that is having a problem reading in is:
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
<title>NWS WSR-88D Radar System Transmit/Receive Status</title>
</head>
A <foo>xx</foo>link</html>
If I take away <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">, there is no problem.
Any suggestions? Thanks in advance.

Send the third argument of the parse method as true.

Hi,
Setting the third argument really works!!! Yee..... haa....!!!
WORKING SOLUTION: new ParserDelegator().parse(reader, callback, true);
MANY... MANY THANKS for looking at the problem!!!
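The third parameter of ParserDelegator.parse is named ignoreCharSet. With false, the parser reports a charset declared in a meta tag by throwing ChangedCharSetException, as in the stack trace above; with true it carries on with the Reader it was given. A minimal sketch of the working call, with a hypothetical class name (MetaCharsetDemo) and a sample page made up for illustration:

```java
import java.io.IOException;
import java.io.StringReader;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical demo class: extracts text from HTML that declares a charset in a meta tag.
class MetaCharsetDemo {

    // Returns the text content reported by the parser, one chunk per line.
    static String extractText(String html) throws IOException {
        StringBuilder text = new StringBuilder();
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleText(char[] data, int pos) {
                text.append(data).append('\n');
            }
        };
        // Third argument (ignoreCharSet) is true, so the meta charset is not treated as an error.
        new ParserDelegator().parse(new StringReader(html), callback, true);
        return text.toString();
    }

    public static void main(String[] args) throws IOException {
        String html = "<html><head>"
                + "<meta http-equiv=\"Content-Type\" content=\"text/html; charset=iso-8859-1\">"
                + "<title>Status</title></head><body><p>hello</p></body></html>";
        System.out.print(extractText(html));
    }
}
```

Ignoring the directive is safe when the Reader was already opened with the right encoding; if it was not, the decoded text may contain wrong characters.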

Parsing html files via an url

Hi,
I already have a Java program that is able to read in HTML files that are stored on my computer's hard drive. Now I would like to expand its functionality by being able to parse HTML files straight from the web.
For example, when the program is run, I would like to be able to give it a URL for a given website. Then, I would like to be able to parse the HTML file that the link goes to.
I've searched the forum, but have not been able to find anything of any real use. If you could offer an overview or point me towards a resource, I would be very grateful.

If you've done things right, you have a HTML reader/parser that takes an InputStream. For Files, this would be a FileInputStream.
For URLs, this would be the InputStream you get from URLConnection.getInputStream(). You can get a URLConnection by calling openConnection() on a URL instance (created from your input url of course).
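The two steps described above (create a URL, then ask its URLConnection for an InputStream) can be sketched like this. The class and method names are hypothetical, and the example address is only a placeholder.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.net.URLConnection;

// Hypothetical helper: opens an InputStream for any URL the JDK can handle (http:, https:, file:).
class HtmlSource {

    // Returns a stream on the content at 'location'; the caller is responsible for closing it.
    static InputStream open(String location) throws IOException {
        URLConnection connection = URI.create(location).toURL().openConnection();
        return connection.getInputStream();
    }

    public static void main(String[] args) throws IOException {
        // Placeholder address; pass a real URL as the first argument.
        String location = args.length > 0 ? args[0] : "http://example.com/";
        try (InputStream in = open(location)) {
            System.out.println("first byte: " + in.read());
        }
    }
}
```

A parser that already accepts an InputStream can be handed the result of open() in place of its FileInputStream, so the parsing code itself does not need to change.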

Dreamweaver stopped working. XML parsing fatal error: Invalid document structure, line1 Tried reloading DW. Didn't work.

Dreamweaver (CS5) stopped working. The error message says - XML parsing fatal error: Invalid document structure, line: 1, I tried to reload DW. No change. I also tried to reset the computer to an earlier date before I reloaded. Also didn't work. Can anybody shed some light?

The first thing to try is Deleting Corrupted Cache. Be sure to turn on Hidden Files & Folders in your file manager (Win Explorer or Mac Finder).
http://forums.adobe.com/thread/494811
If that doesn't help, try Restore Preferences
http://helpx.adobe.com/dreamweaver/kb/restore-preferences-dreamweaver-cs4-cs5.html
If all else fails, use the CC Cleaner Tools below to wipe DW from your system, followed by a software re-install.
http://helpx.adobe.com/creative-suite/kb/cs5-cleaner-tool-installation-problems.html
Keep us posted on your results.
Nancy O.

Getting DTD structure from Oracle DOMParser

I am having trouble getting DTD structure from DOMParser after I parse the xml file with external DTD.
When I do:
xmlDOMParser.parse(new FileInputStream(xmlFile));
XMLDocument xmlDoc=xmlDOMParser.getDocument();
DTD docType=xmlDOMParser.getDoctype();
NamedNodeMap nodeMap=docType.getElementDecls();
the nodeMap is equal to null.
I need to get the element structure of DTD, how can I do that?

The below example is working fine for me
create table t1
   as
    select object_id id, object_name text
      from all_objects;
Create table t2
as
select t1.*, 0 session_id
    from t1
   where 1=0;
CREATE OR REPLACE TYPE t2_type
AS OBJECT (
id         number,
text       varchar2(30),
session_id number
)
/
create or replace type t2_tab_type
as table of t2_type
/
create or replace
function parallel_pipelined( l_cursor in sys_refcursor )
return t2_tab_type
pipelined
parallel_enable ( partition l_cursor by any )
is
      l_session_id number;
      l_rec        t1%rowtype;
begin
      select sid into l_session_id
        from v$mystat
       where rownum =1;
      loop
          fetch l_cursor into l_rec;
          exit when l_cursor%notfound;
          -- complex process here
          pipe row(t2_type(l_rec.id,l_rec.text,l_session_id));
      end loop;
      close l_cursor;
      return;
end;
/
And it is getting executed in parallel:
SQL> select DISTINCT session_id
2    from table(parallel_pipelined
3              (CURSOR(select /*+ parallel(t1) */ *
4                        from t1 )
5               ))
6 ;
SESSION_ID
       221
        76
        77
       241
       161
       152
       160
       302
       232
       313
        73
SESSION_ID
       292
12 rows selected.
But why is it getting disconnected in my scenario?

Parse HTML behaviour

Hi,
Can anybody explain the behavior of the SunOne Web Server when parse HTML is enabled for all HTML files?
If we have a valid HTML file on the web server, for instance http://servername/myhtml.html, the page is served by the web server. But if we append anything to the URL after that, the same page still gets served. Consider the URL http://servername/myhtml.html/ahdjksad/asdhjsad/sdhjklsad/asjdksald (anything after the HTML file name): the web server still serves the myhtml.html page. If we disable parse HTML, it won't work. I want server-side includes to work without the server serving such wrong URLs.
How does this happen? The path does not exist on the web server, so it should give a 404 error. Is this the way parse HTML works? Is there any way to restrict it, other than a custom NSAPI, while keeping the parse HTML functionality for server-side includes enabled?
If anyone has noticed this behavior, please explain.
Thanks,
Rijesh.

Hi,
Actually, I load an external HTML file using the LoadVars class methods.
var oLoad:LoadVars=new LoadVars();
oLoad.load("external.html");
I want to parse that HTML in Flash.
I need some text from the HTML page (the HTML has about 200 lines of code).
How can I parse that HTML and trace that particular text?

Parsed HTML/SSI not working in Web Server 7 on Ubuntu Server 9.10

Please help. I have SJSWS 7.0u6 on Ubuntu Server 9.10. The HTML parsing is set to parse all HTML files.
My HTML code is:
<body>

<div id="maincontent">

</div>
</body>
I added the echo command later to rule out an error with my include file. I even took out the include command to rule it out completely. If I "view/page source" from Firefox I always get the code as it is above in its original form. The server is completely ignoring the include and the echo.
In the virtual server settings under content handling / Parsed HTML/SSI I have tried "all HTML" and "executable HTML". Both return the same result, which is no parsing whatsoever. The log is set to "finest" and so far no errors have come up. Please tell me what I am doing wrong, did I miss a step, overlook some extra settings?
I am happy to provide more detail. Just let me know what you need to see.
Thank you.
update: I tested another bare bones html and got the same results, no parsing.
Seen here : [http://kenbuxton.net/test.html]
Edited by: Ken_Buxton on Nov 17, 2009 7:53 PM

Deploy the configuration? Is there something beyond clicking save and restarting the instance? I checked the server.xml config file and the log level was at "info" even though I set it to "finest" in the GUI. I am now getting the finest details in the logs after I changed the server.xml file manually. Here is what I am getting for test.html. ...
[19/Nov/2009:10:48:37] finest ( 6266): for host 174.17.99.6 trying to GET /test.html, process-uri-objects reports: processing objects for URI /test.html
[19/Nov/2009:10:48:37] finest ( 6266): for host 174.17.99.6 trying to GET /test.html, process-uri-objects reports: processing object name="default"
[19/Nov/2009:10:48:37] finest ( 6266): for host 174.17.99.6 trying to GET /test.html, func_exec reports: executing fn="match-browser" browser="*MSIE*" ssl-unclean-shutdown="true" Directive="AuthTrans" magnus-internal="1"
[19/Nov/2009:10:48:37] finest ( 6266): for host 174.17.99.6 trying to GET /test.html, func_exec reports: fn="match-browser" browser="*MSIE*" ssl-unclean-shutdown="true" Directive="AuthTrans" magnus-internal="1" returned -2 (REQ_NOACTION)
[19/Nov/2009:10:48:37] finest ( 6266): for host 174.17.99.6 trying to GET /test.html, func_exec reports: executing fn="ntrans-j2ee" name="j2ee" Directive="NameTrans"
[19/Nov/2009:10:48:37] finest ( 6266): for host 174.17.99.6 trying to GET /test.html, func_exec reports: fn="ntrans-j2ee" name="j2ee" Directive="NameTrans" returned -2 (REQ_NOACTION)
[19/Nov/2009:10:48:37] finest ( 6266): for host 174.17.99.6 trying to GET /test.html, func_exec reports: executing fn="pfx2dir" from="/mc-icons" dir="/sun/webserver7/lib/icons" name="es-internal" Directive="NameTrans"
[19/Nov/2009:10:48:37] finest ( 6266): for host 174.17.99.6 trying to GET /test.html, func_exec reports: fn="pfx2dir" from="/mc-icons" dir="/sun/webserver7/lib/icons" name="es-internal" Directive="NameTrans" returned -2 (REQ_NOACTION)
[19/Nov/2009:10:48:37] finest ( 6266): for host 174.17.99.6 trying to GET /test.html, func_exec reports: executing fn="uri-clean" Directive="PathCheck"
[19/Nov/2009:10:48:37] finest ( 6266): for host 174.17.99.6 trying to GET /test.html, func_exec reports: fn="uri-clean" Directive="PathCheck" returned 0 (REQ_PROCEED)
[19/Nov/2009:10:48:37] finest ( 6266): for host 174.17.99.6 trying to GET /test.html, func_exec reports: executing fn="find-pathinfo" Directive="PathCheck"
[19/Nov/2009:10:48:37] finest ( 6266): for host 174.17.99.6 trying to GET /test.html, func_exec reports: fn="find-pathinfo" Directive="PathCheck" returned -2 (REQ_NOACTION)
[19/Nov/2009:10:48:37] finest ( 6266): for host 174.17.99.6 trying to GET /test.html, func_exec reports: executing fn="find-index-j2ee" Directive="PathCheck"
[19/Nov/2009:10:48:37] finest ( 6266): for host 174.17.99.6 trying to GET /test.html, func_exec reports: fn="find-index-j2ee" Directive="PathCheck" returned -2 (REQ_NOACTION)
[19/Nov/2009:10:48:37] finest ( 6266): for host 174.17.99.6 trying to GET /test.html, func_exec reports: executing fn="find-index" index-names="index.html,home.html,index.jsp" Directive="PathCheck"
[19/Nov/2009:10:48:37] finest ( 6266): for host 174.17.99.6 trying to GET /test.html, func_exec reports: fn="find-index" index-names="index.html,home.html,index.jsp" Directive="PathCheck" returned -2 (REQ_NOACTION)
[19/Nov/2009:10:48:37] finest ( 6266): for host 174.17.99.6 trying to GET /test.html, func_exec reports: executing fn="type-j2ee" Directive="ObjectType"
[19/Nov/2009:10:48:37] finest ( 6266): for host 174.17.99.6 trying to GET /test.html, func_exec reports: fn="type-j2ee" Directive="ObjectType" returned 0 (REQ_PROCEED)
[19/Nov/2009:10:48:37] finest ( 6266): for host 174.17.99.6 trying to GET /test.html, func_exec reports: executing fn="type-by-extension" Directive="ObjectType"
[19/Nov/2009:10:48:37] finest ( 6266): for host 174.17.99.6 trying to GET /test.html, func_exec reports: fn="type-by-extension" Directive="ObjectType" returned 0 (REQ_PROCEED)
[19/Nov/2009:10:48:37] finest ( 6266): for host 174.17.99.6 trying to GET /test.html, func_exec reports: executing fn="force-type" type="text/plain" Directive="ObjectType"
[19/Nov/2009:10:48:37] finest ( 6266): for host 174.17.99.6 trying to GET /test.html, func_exec reports: fn="force-type" type="text/plain" Directive="ObjectType" returned 0 (REQ_PROCEED)
[19/Nov/2009:10:48:37] finest ( 6266): for host 174.17.99.6 trying to GET /test.html, func_exec reports: executing method="(GET|HEAD|POST)" type="*~magnus-internal/*" fn="send-file" Directive="Service"
[19/Nov/2009:10:48:37] finest ( 6266): for host 174.17.99.6 trying to GET /test.html, func_exec reports: method="(GET|HEAD|POST)" type="*~magnus-internal/*" fn="send-file" Directive="Service" returned -1 (REQ_ABORTED)
[19/Nov/2009:10:48:37] finest ( 6266): for host 174.17.99.6 trying to GET /test.html, func_exec reports: executing fn="error-j2ee" Directive="Error"
[19/Nov/2009:10:48:37] finest ( 6266): for host 174.17.99.6 trying to GET /test.html, func_exec reports: fn="error-j2ee" Directive="Error" returned -2 (REQ_NOACTION)
[19/Nov/2009:10:48:37] finest ( 6266): for host 174.17.99.6 trying to GET /test.html, func_exec reports: executing fn="flex-log" Directive="AddLog"
[19/Nov/2009:10:48:37] finest ( 6266): for host 174.17.99.6 trying to GET /test.html, func_exec reports: fn="flex-log" Directive="AddLog" returned 0 (REQ_PROCEED)

In java, can I parse HTML file

and build a DOM tree? I think DOM Level 1 supports HTML, but does Java implement it?
It would be very helpful if you could provide some sample code.
Thanks

Java has a simple parser that can parse HTML 3.2. See this thread for an example:
http://forum.java.sun.com/thread.jsp?forum=31&thread=266798
It also has a callback parser. See this article:
http://java.sun.com/products/jfc/tsc/articles/bookmarks/index.html

How to get tree structure navigation in module pool program

please send me a sample code for getting tree structure navigation in a screen in module pool program.
ex.
masters
items

Do a chain and endchain on the fields. Then insert the fields into the required database.

Parsing an XML using DOM parser in Java in Recursive fashion

I need to parse an XML using DOM parser in Java. New tags can be added to the XML in future. Code should be written in such a way that even with new tags added there should not be any code change. I felt that parsing the XML recursively can solve this problem. Can any one please share sample Java code that parses XML recursively. Thanks in Advance.

Actually, if you are planning to use DOM then you will be doing that task after you parse the data. But anyway, have you read any tutorials or books about how to process XML in Java? If not, my suggestion would be to start by doing that. You cannot learn that by fishing on forums. Try this one for example:
http://www.cafeconleche.org/books/xmljava/chapters/index.html
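As a starting point for the recursive approach asked about above, here is a minimal sketch. The class and method names (RecursiveWalker, walk, elementOutline) are hypothetical. Because the method visits whatever children a node has, new tags in the XML are picked up without any code change.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper: walks a DOM tree depth-first and records every element it meets.
class RecursiveWalker {

    // Adds "depth:name" for each element under 'node' (inclusive), in document order.
    static void walk(Node node, int depth, List<String> out) {
        if (node.getNodeType() == Node.ELEMENT_NODE) {
            out.add(depth + ":" + node.getNodeName());
        }
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            walk(children.item(i), depth + 1, out);
        }
    }

    // Parses the XML string and returns the outline of its elements.
    static List<String> elementOutline(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        List<String> out = new ArrayList<>();
        walk(doc.getDocumentElement(), 0, out);
        return out;
    }

    public static void main(String[] args) throws Exception {
        for (String entry : elementOutline("<a><b><c/></b><d>t</d></a>")) {
            System.out.println(entry);
        }
    }
}
```

As the reply notes, the recursion happens after parsing: the DOM parser builds the whole tree first, and the walk only reads it.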
