Parsing HTML in Java

I want to remove the style attribute and replace it with a unique class in HTML, using Java.
Input html:
<div style="A">
<div style="B">
</div>
<div style="C">
</div>
</div>
Output updated html:
<div class="class01">
<div class="class02">
</div>
<div class="class03">
</div>
</div>
Please tell me how I can do this easily in Java!
I am trying to do it using the code available at:
http://www.java2s.com/Tutorial/Java/0120__Development/ParseHTML.htm
If you know any other good way, please tell me! I don't have time for R&D and have to get this done soon.

Have you tried it yourself with any code yet? I don't believe you have tried even once.
It is not a tough job to find and replace a string in Java, even within HTML. But first you need to work out an algorithm for how you will find each style attribute and replace it, and that depends entirely on how your HTML is formatted, as in the example above.
Design an algorithm, try to implement it, and if you run into problems during the implementation, come back here.
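A minimal sketch of the find-and-replace approach the reply suggests, using `java.util.regex`. It assumes every `style` attribute is written with double quotes, that no element already has a `class` attribute, and that the text `style="..."` never appears outside a tag; real-world HTML breaks these assumptions, and a real HTML parser is safer there. The class name `StyleToClass` and the `classNN` naming scheme are illustrative.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class StyleToClass {

    // Matches style="..." and captures the value; assumes double-quoted attributes.
    private static final Pattern STYLE_ATTR = Pattern.compile("\\bstyle\\s*=\\s*\"([^\"]*)\"");

    /** Replaces each style="..." with class="classNN" and records classNN -> original style. */
    public static String replaceStyles(String html, Map<String, String> classToStyle) {
        Matcher m = STYLE_ATTR.matcher(html);
        StringBuffer out = new StringBuffer();
        int counter = 0;
        while (m.find()) {
            counter++;
            String className = String.format("class%02d", counter); // class01, class02, ...
            classToStyle.put(className, m.group(1));
            // quoteReplacement so '$' or '\' in the text are not treated as group references
            m.appendReplacement(out, Matcher.quoteReplacement("class=\"" + className + "\""));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String html = "<div style=\"A\">\n<div style=\"B\">\n</div>\n<div style=\"C\">\n</div>\n</div>";
        Map<String, String> styles = new LinkedHashMap<>();
        System.out.println(replaceStyles(html, styles));
        // One CSS rule per generated class, so the removed styles can go into a stylesheet.
        for (Map.Entry<String, String> e : styles.entrySet()) {
            System.out.println("." + e.getKey() + " { " + e.getValue() + " }");
        }
    }
}
```

Each occurrence gets its own class, as the question asks; if identical styles should share one class, you could look the style up in the map before creating a new name.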

Similar Messages

  • Parsing HTML from Java: how?

    Problem:
    I need to connect to a URL using Java.
    Then I need to detect/parse the HTML at that URL for image content,
    *so that I can replace each image with something else, such as the text [IMAGE].*
    I am already able to connect and read the HTML from the URL, but I need your help with parsing out the image links.
    I just need to parse the HTML page to find the image links/content on it.
    How can this be done?

    Hi shazzad,
    Could you please try this:
    /*
     * XsdReader.java
     * Created on September 12, 2008, 11:36 AM
     */
    package EDITool;

    import java.io.File;
    import java.io.IOException;
    import java.util.HashMap;
    import java.util.Map;
    import javax.xml.parsers.DocumentBuilder;
    import javax.xml.parsers.DocumentBuilderFactory;
    import javax.xml.parsers.ParserConfigurationException;
    import org.w3c.dom.Document;
    import org.w3c.dom.Element;
    import org.w3c.dom.Node;
    import org.w3c.dom.NodeList;
    import org.xml.sax.SAXException;

    /**
     * @author Rajesh
     */
    public class XsdReader {
        private DocumentBuilder docBuilder;
        private Document doc;
        private final DocumentBuilderFactory docBuilderFactory;
        private File xsdFile;
        private File xmlFile;
        // element name -> minOccurs/maxOccurs collected from the XSD (level 1)
        private final Map<String, Attr> dataList;

        /** Creates a new instance of XsdReader */
        public XsdReader() {
            docBuilder = null;
            doc = null;
            docBuilderFactory = DocumentBuilderFactory.newInstance();
            try {
                docBuilder = docBuilderFactory.newDocumentBuilder();
            } catch (ParserConfigurationException e) {
                System.out.println("Wrong parser configuration: " + e.getMessage());
            }
            this.xsdFile = null;
            this.xmlFile = null;
            dataList = new HashMap<>();
        }

        public static class Attr {
            private final String minOccurs;
            private final String maxOccurs;

            public Attr(String minOccurs, String maxOccurs) {
                this.minOccurs = minOccurs;
                this.maxOccurs = maxOccurs;
            }
        }

        // level 1: read minOccurs/maxOccurs from an XSD; level 2: copy them onto a matching XML file
        public void xsdParser(String xsdInputFileName, int level) {
            this.xsdFile = new File(xsdInputFileName);
            try {
                doc = docBuilder.parse(this.xsdFile);
                xsdRecursive(doc.getChildNodes(), level);
            } catch (SAXException e) {
                System.out.println("Wrong XML file structure: " + e.getMessage());
            } catch (IOException e) {
                System.out.println("Could not read file: " + e.getMessage());
            }
        }

        public void xsdRecursive(NodeList nodeList, int level) {
            for (int i = 0; i < nodeList.getLength(); i++) {
                xsdRecursive(nodeList.item(i), level);
            }
        }

        public void xsdRecursive(Node node, int level) {
            if (node.getNodeType() == Node.ELEMENT_NODE) {
                Element e = (Element) node;
                if (level == 1) {
                    String minOccurs = e.getAttribute("minOccurs");
                    if (!minOccurs.isEmpty()) {
                        String name = e.getAttribute("name");
                        if (name.isEmpty()) {
                            name = e.getAttribute("ref");
                        }
                        if (!name.isEmpty()) {
                            System.out.println(name);
                            dataList.put(name, new Attr(minOccurs, e.getAttribute("maxOccurs")));
                        }
                    }
                } else if (level == 2) {
                    // Use the W3C DOM API directly; casting these nodes to dom4j types fails at runtime.
                    Attr attr = dataList.get(e.getNodeName());
                    if (attr != null) {
                        e.setAttribute("min", attr.minOccurs);
                        e.setAttribute("max", attr.maxOccurs);
                    }
                }
            }
            if (node.hasChildNodes()) {
                xsdRecursive(node.getChildNodes(), level);
            }
        }

        public void display() {
            Attr attr = dataList.get("ISA01");
            if (attr != null) {
                System.out.println(attr.maxOccurs + "\t" + attr.minOccurs);
            }
        }

        public static void main(String[] s) {
            XsdReader xsdReader = new XsdReader();
            //String xsdInputFileName = "C:/Documents and Settings/vs73471/Documents/SAP/workspace/Sample/src/packages/com/sap/java/Copy of 850.xml";
            String xsdInputFileName = "C:/Documents and Settings/vs73471/Desktop/www.html";
            xsdReader.xsdParser(xsdInputFileName, 1);
            System.out.println(xsdReader.dataList.size());
            String xmlInputFileName = "C:/Documents and Settings/vs73471/Desktop/Reference XML Generate/850_Dummy.xml";
            xsdReader.xsdParser(xmlInputFileName, 2);
            // xsdReader.display();
        }
    }
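    The XsdReader code above reads XML with a DOM parser, which will usually reject real-world HTML. A minimal sketch closer to the original question uses the JDK's Swing HTML parser (`ParserDelegator` with an `HTMLEditorKit.ParserCallback`) to collect each `<img>` `src` and produce the page text with `[IMAGE]` where each image was. It assumes the old HTML 3.2-level parser copes with the pages involved; the class name `ImagePlaceholder` is illustrative, and fetching the HTML from the URL is left to your existing code.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

public class ImagePlaceholder {

    /**
     * Returns the text content of the page with "[IMAGE]" where each <img> was,
     * and adds every image src it finds to imageSources.
     */
    public static String replaceImages(String html, List<String> imageSources) {
        StringBuilder text = new StringBuilder();
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleText(char[] data, int pos) {
                text.append(data).append(' ');
            }

            @Override
            public void handleSimpleTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
                // <img> has no end tag, so the Swing parser reports it as a "simple" tag
                if (tag == HTML.Tag.IMG) {
                    Object src = attrs.getAttribute(HTML.Attribute.SRC);
                    if (src != null) {
                        imageSources.add(src.toString());
                    }
                    text.append("[IMAGE] ");
                }
            }
        };
        try {
            // true = ignore <meta> charset directives; a Reader is already decoded
            new ParserDelegator().parse(new StringReader(html), callback, true);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return text.toString().trim();
    }

    public static void main(String[] args) {
        List<String> sources = new ArrayList<>();
        String html = "<html><body><p>Logo: <img src=\"logo.png\"> and photo: "
                + "<img src=\"photo.jpg\"></p></body></html>";
        System.out.println(replaceImages(html, sources));
        System.out.println(sources);
    }
}
```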

  • In java, can I parse HTML file

    and build a DOM tree? I think DOM Level 1 supports HTML, but does Java implement it?
    It would be very helpful if you could provide some sample code.
    Thanks

    Java has a simple parser that can parse HTML 3.2. See this thread for an example:
    http://forum.java.sun.com/thread.jsp?forum=31&thread=266798
    It also has a callback parser. See this article:
    http://java.sun.com/products/jfc/tsc/articles/bookmarks/index.html
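    A minimal sketch of getting a tree out of that built-in parser: `HTMLEditorKit.read` fills an `HTMLDocument`, whose `javax.swing.text.Element` objects form a tree you can walk recursively. This is Swing's document model rather than a W3C `org.w3c.dom` tree, and it follows the old HTML 3.2 DTD; the class name `HtmlTreeDump` is illustrative.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.swing.text.BadLocationException;
import javax.swing.text.Element;
import javax.swing.text.html.HTMLDocument;
import javax.swing.text.html.HTMLEditorKit;

public class HtmlTreeDump {

    /** Parses html into an HTMLDocument and returns its element tree, one indented name per line. */
    public static String dump(String html) {
        HTMLEditorKit kit = new HTMLEditorKit();
        HTMLDocument doc = (HTMLDocument) kit.createDefaultDocument();
        // don't abort when a <meta> tag declares a charset
        doc.putProperty("IgnoreCharsetDirective", Boolean.TRUE);
        try {
            kit.read(new StringReader(html), doc, 0);
        } catch (IOException | BadLocationException e) {
            throw new IllegalStateException(e);
        }
        StringBuilder out = new StringBuilder();
        appendElement(doc.getDefaultRootElement(), 0, out);
        return out.toString();
    }

    private static void appendElement(Element elem, int depth, StringBuilder out) {
        for (int i = 0; i < depth; i++) {
            out.append("  ");
        }
        out.append(elem.getName()).append('\n');
        for (int i = 0; i < elem.getElementCount(); i++) {
            appendElement(elem.getElement(i), depth + 1, out);
        }
    }

    public static void main(String[] args) {
        System.out.print(dump("<html><body><p>Hello <b>world</b></p></body></html>"));
    }
}
```

    Leaf elements show up as `content`; the text itself lives in the document and is read with `doc.getText(start, length)` using the element's offsets.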

  • Embedding html in .java file comments

    For years now the convention has been to use HTML liberally in Java source code comments
    to beautify javadocs. For example, I have taken this out of the source code for java.lang.System:
    /**
     * The <code>System</code> class contains several useful class fields
     * and methods. It cannot be instantiated.
     * <p>
     * Among the facilities provided by the <code>System</code> class
     * are standard input, standard output, and error output streams;
     * access to externally defined "properties"; a means of
     * loading files and libraries; and a utility method for quickly
     * copying a portion of an array.
     * @author Arthur van Hoff
     * @version 1.125, 12/03/01
     * @since JDK1.0
     */
    The problem I see with this is that the tags must be stripped if one is producing
    documentation in formats other than HTML (a Swing doclet, a PDF doclet, or any
    application of XML).
    Aren't we forever tying ourselves to HTML? Is there any intention on the part of
    J2SE/J2EE developers to discontinue this practice? I've begun working on a XUL
    doclet and am having to enclose the documentation in CDATA to maintain the
    validity of the document I produce (I could also just strip out the tags, of course).
    Thanks, Eitan

    Hi Eitan,
    I see little problem as long as we provide a way to
    parse whatever format is being used.
    Please take a look at the commentdom package in
    the "Doclet Refactoring Design" under "What's New"
    on the right side of the Javadoc home page:
    http://java.sun.com/j2se/javadoc/
    Here's its description:
    com.sun.tools.doclet.toolkit.util.commentdom
    DocComment to DOM Translator - Documentation comment contains embedded HTML,
    Javadoc inline tags as well as custom inline tags. This package will contain classes that will help
    generate a DOM tree for the documentation comment. This can be of great help for doclets that
    generate documentation in formats other than HTML. Such doclets don't have to undergo the
    tedious task of parsing the documentation comments for all the tags. Doclets can then directly
    traverse the DOM tree and then convert the tags to appropriate format.
    The DOM tree will adhere to the doccomment DTD that will be published later. The DTD will
    allow required HTML tags along with their attributes as defined in HTML4.0 specifications. The
    DTD will also support Javadoc-defined inline tags as well as user-defined custom inline tags.
    We plan for the doclet toolkit other than this part to be in Tiger;
    we haven't committed dates to delivering this piece, but feel it
    would be an important part of the toolkit.
    -Doug Kramer
    Javadoc team
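    Eitan mentions simply stripping the tags. A minimal sketch of that fallback with the JDK's Swing HTML parser, keeping only the text between tags; it assumes the comment body has already been extracted without its leading `*` characters, and it discards all formatting. The class name `DocCommentText` is illustrative; the commentdom package Doug describes would be the structured alternative.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

public class DocCommentText {

    /** Returns only the text content of an HTML fragment, with runs of whitespace collapsed. */
    public static String stripTags(String htmlFragment) {
        StringBuilder text = new StringBuilder();
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleText(char[] data, int pos) {
                text.append(data).append(' ');
            }
        };
        try {
            new ParserDelegator().parse(new StringReader(htmlFragment), callback, true);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return text.toString().replaceAll("\\s+", " ").trim();
    }

    public static void main(String[] args) {
        String comment = "The <code>System</code> class contains several useful class fields\n"
                + "and methods. It cannot be instantiated.\n<p>\n"
                + "Among the facilities provided by the <code>System</code> class\n"
                + "are standard input, standard output, and error output streams;";
        System.out.println(stripTags(comment));
    }
}
```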

  • JEditorPane parsing HTML

    Hi all,
    I am using JEditorPane and its ability to parse HTML, which, although relatively old and crusty, is certainly all I need for the job.
    Now, I understand there is a chain of classes involved in taking my .html file and turning it into something we can see in a JEditorPane. For example, an img tag is picked up by HTMLEditorKit and turned into an ImageView for display purposes.
    I want to do the following: I have subclassed HTMLEditorKit and overridden the HTMLFactory (although at the moment it just defers everything to super). I want to be able to pick out all of the HTML comment tags as they go through the HTMLEditorKit:
    <!-- hey hey this is a comment -->
    ...and get to the comment text, "hey hey this is a comment", as a Java string. However, I've been digging around with Element for hours now, and although my HTMLFactory correctly picks out the comments from the rest of the elements:
        else if (kind == HTML.Tag.COMMENT) {
            System.out.println("I found a comment but don't know what it said!!");
        ...
    as you can see, I don't know how to get to the comment text itself.
    The reason I want access to the comment text is that I want to supplement the HTML code a little and put something in the comment that will affect how the page is rendered when I read it, so that's the reason, if you're curious.
    Any help, and I do mean anything at all, would be much appreciated, as this is the last obstacle in my path to getting this thing working :)
    Thanks for your time!
    - Peter

    Here is some old code I have lying around that attempts to iterate through all the elements. If I remember correctly the comment text is found in the AttributeSet of the element:
    import java.io.*;
    import java.net.*;
    import java.util.*;
    import javax.swing.*;
    import javax.swing.text.*;
    import javax.swing.text.html.*;

    class GetHTML {
        public static void main(String[] args) {
            EditorKit kit = new HTMLEditorKit();
            Document doc = kit.createDefaultDocument();
            // The Document class does not yet handle charsets properly.
            doc.putProperty("IgnoreCharsetDirective", Boolean.TRUE);
            try {
                // Create a reader on the HTML content.
                Reader rd = getReader(args[0]);
                // Parse the HTML.
                kit.read(rd, doc, 0);
                System.out.println(doc.getText(0, doc.getLength()));
                System.out.println("----");
                // Iterate through the elements of the HTML document.
                ElementIterator it = new ElementIterator(doc);
                Element elem;
                while ((elem = it.next()) != null) {
                    AttributeSet as = elem.getAttributes();
                    System.out.println("\n" + elem.getName() + " : " + as.getAttributeCount());
                    if (elem.getName().equals(HTML.Tag.IMG.toString())) {
                        Object o = as.getAttribute(HTML.Attribute.SRC);
                        System.out.println(o);
                    }
                    // "enum" is a keyword since Java 5, so use another variable name
                    Enumeration<?> names = as.getAttributeNames();
                    while (names.hasMoreElements()) {
                        Object name = names.nextElement();
                        Object value = as.getAttribute(name);
                        System.out.println("\t" + name + " : " + value);
                        if (value instanceof DefaultComboBoxModel) {
                            DefaultComboBoxModel<?> model = (DefaultComboBoxModel<?>) value;
                            Object selected = model.getSelectedItem();
                            for (int j = 0; j < model.getSize(); j++) {
                                Object o = model.getElementAt(j);
                                if (o.equals(selected)) {
                                    System.out.println(o + " : selected");
                                } else {
                                    System.out.println(o);
                                }
                            }
                        }
                    }
                    if (elem.getName().equals(HTML.Tag.SELECT.toString())) {
                        Object o = as.getAttribute(HTML.Attribute.ID);
                        System.out.println(o);
                    }
                    // Weird: the text for each tag is stored in a 'content' element
                    if (elem.getElementCount() == 0) {
                        int start = elem.getStartOffset();
                        int end = elem.getEndOffset();
                        System.out.println("\t" + doc.getText(start, end - start));
                    }
                }
            } catch (Exception e) {
                e.printStackTrace();
            }
            System.exit(0);
        }

        // Returns a reader on the HTML data. If 'uri' begins
        // with "http:", it's treated as a URL; otherwise,
        // it's assumed to be a local filename.
        static Reader getReader(String uri) throws IOException {
            if (uri.startsWith("http:")) {
                // Retrieve from the Internet.
                URLConnection conn = new URL(uri).openConnection();
                return new InputStreamReader(conn.getInputStream());
            } else {
                // Retrieve from a file.
                return new FileReader(uri);
            }
        }
    }
    To test it, just use:
    java GetHTML somefile.html
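    Building on that loop, a minimal sketch of pulling out comment text. In the JDK's `HTMLDocument`, a comment inside the body appears to be stored as an element whose name attribute is `HTML.Tag.COMMENT`, with the text under the `HTML.Attribute.COMMENT` key of its AttributeSet; in Peter's view factory, the same lookup on `elem.getAttributes()` should apply. Comments outside the body may instead end up in the `HTMLDocument.AdditionalComments` document property, so treat this as a sketch to verify against your own pages. The class name `CommentExtractor` is illustrative.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.AttributeSet;
import javax.swing.text.BadLocationException;
import javax.swing.text.Element;
import javax.swing.text.ElementIterator;
import javax.swing.text.StyleConstants;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLDocument;
import javax.swing.text.html.HTMLEditorKit;

public class CommentExtractor {

    /** Returns the text of each comment that HTMLDocument stored as a comment element. */
    public static List<String> bodyComments(String html) {
        HTMLEditorKit kit = new HTMLEditorKit();
        HTMLDocument doc = (HTMLDocument) kit.createDefaultDocument();
        doc.putProperty("IgnoreCharsetDirective", Boolean.TRUE);
        try {
            kit.read(new StringReader(html), doc, 0);
        } catch (IOException | BadLocationException e) {
            throw new IllegalStateException(e);
        }
        List<String> comments = new ArrayList<>();
        ElementIterator it = new ElementIterator(doc);
        for (Element elem = it.first(); elem != null; elem = it.next()) {
            AttributeSet as = elem.getAttributes();
            // comment elements carry HTML.Tag.COMMENT as their name attribute
            if (as.getAttribute(StyleConstants.NameAttribute) == HTML.Tag.COMMENT) {
                Object text = as.getAttribute(HTML.Attribute.COMMENT);
                if (text != null) {
                    comments.add(text.toString());
                }
            }
        }
        return comments;
    }

    public static void main(String[] args) {
        String html = "<html><body><p>Text</p><!-- hey hey this is a comment --><p>More</p></body></html>";
        System.out.println(bodyComments(html));
    }
}
```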

  • Parsing HTML using Swing's HTMLEditorKit

    Hi all,
    I posted this question on the "Java Programming" forum, but I think that was the wrong forum. So please let me know if I have posted in the wrong forum again.
    Anyway, I have read an article on parsing HTML using the Swing HTML parser (http://java.sun.com/products/jfc/tsc/articles/bookmarks/index.html). However, I find that the HTMLEditorKit is unable to understand the <meta> tag inside the <head> tag. Is this true? I am getting this error message:
    javax.swing.text.ChangedCharSetException
    at javax.swing.text.html.parser.DocumentParser.handleEmptyTag(DocumentParser.java:172)
    at javax.swing.text.html.parser.Parser.startTag(Parser.java:327)
    at javax.swing.text.html.parser.Parser.parseTag(Parser.java:1786)
    at javax.swing.text.html.parser.Parser.parseContent(Parser.java:1821)
    at javax.swing.text.html.parser.Parser.parse(Parser.java:1980)
    at javax.swing.text.html.parser.DocumentParser.parse(DocumentParser.java:109)
    at javax.swing.text.html.parser.ParserDelegator.parse(ParserDelegator.java:74)
    at URLReader.main(URLReader.java:58)
    Below is simple code that writes out the HTML file it reads in:
    public static void main(String[] args) throws Exception {
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            public void handleText(char[] data, int pos) {
                try {
                    System.out.println(data);
                } catch (Exception e) {
                    System.out.println("IOE: " + e);
                }
            }
        };
        Reader reader = new FileReader("myFile.html");
        new ParserDelegator().parse(reader, callback, false);
    }
    The HTML file that has a problem being read in is:
    <html>
    <head>
    <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
    <title>NWS WSR-88D Radar System Transmit/Receive Status</title>
    </head>
    <p>A <foo>xx</foo>link</html>
    If I take away <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">, there is no problem.
    Any suggestions? Thanks in advance.

    Hi,
    Setting the third argument really works!!! Yee..... haa....!!!
    WORKING SOLUTION: new ParserDelegator().parse(reader, callback, true);
    MANY... MANY THANKS for looking at the problem!!!
    Pass true as the third argument of the parse method.
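    The third argument of `ParserDelegator.parse` is `ignoreCharSet`. When it is `false`, a `<meta http-equiv="Content-Type">` tag makes the parser throw `ChangedCharSetException` (as in the stack trace above), which is meant to let the caller reopen the input in the declared charset. When the input is already a `Reader`, its characters are already decoded, so `true` is usually what you want. A minimal sketch that collects the text of the problem page with `true`; the class name `CharsetSafeParse` is illustrative.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

public class CharsetSafeParse {

    /** Collects every text chunk the parser reports. */
    public static List<String> collectText(String html) {
        List<String> chunks = new ArrayList<>();
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleText(char[] data, int pos) {
                chunks.add(new String(data));
            }
        };
        try {
            // ignoreCharSet = true: don't stop at <meta http-equiv="Content-Type" ...>
            new ParserDelegator().parse(new StringReader(html), callback, true);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return chunks;
    }

    public static void main(String[] args) {
        String html = "<html>\n<head>\n"
                + "<meta http-equiv=\"Content-Type\" content=\"text/html; charset=iso-8859-1\">\n"
                + "<title>NWS WSR-88D Radar System Transmit/Receive Status</title>\n"
                + "</head>\n<p>A <foo>xx</foo>link</html>";
        collectText(html).forEach(System.out::println);
    }
}
```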

  • Parsing html files via an url

    Hi,
    I already have a Java program that is able to read in HTML files that are stored on my computer's hard drive. Now I would like to extend it so that it can parse HTML files straight from the web.
    For example, when the program is run, I would like to be able to give it a URL for a given website and then parse the HTML file that the link points to.
    I've searched the forum but have not been able to find anything of real use. If you could offer an overview or point me towards a resource, I would be very grateful.

    If you've done things right, you have an HTML reader/parser that takes an InputStream. For files, this would be a FileInputStream.
    For URLs, this would be the InputStream you get from URLConnection.getInputStream(). You can get a URLConnection by calling openConnection() on a URL instance (created from your input URL, of course).
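    A minimal sketch of that idea: one method returns an InputStream from either a URL (via `URLConnection.getInputStream()`) or a local path (via `FileInputStream`), so the parsing code stays the same for both. It assumes anything starting with `http:`, `https:` or `file:` is a URL, and that pages are UTF-8 when read into a String; in practice the charset should come from the HTTP headers or a `<meta>` tag. The class name `HtmlSource` is illustrative.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.URI;
import java.net.URLConnection;
import java.nio.charset.StandardCharsets;

public class HtmlSource {

    /** Opens a URL (http:, https:, file:) or, otherwise, a local file path. */
    public static InputStream open(String location) throws IOException {
        if (location.startsWith("http:") || location.startsWith("https:") || location.startsWith("file:")) {
            URLConnection conn = URI.create(location).toURL().openConnection();
            return conn.getInputStream();
        }
        return new FileInputStream(location);
    }

    /** Reads the whole document into a String, assuming UTF-8. */
    public static String readAll(String location) {
        try (InputStream in = open(location)) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // e.g. java HtmlSource http://example.com/  or  java HtmlSource page.html
        System.out.println(readAll(args[0]));
    }
}
```

    Whatever parser you already use for files can then be fed `open(location)` wrapped in an InputStreamReader.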

  • Parse HTML document embedded in IFRAME

    Dear fellows:
    How can I access contents of an HTML document embedded in an IFRAME tag, by using java class HTMLEditorKit.Parser?
    It is well known that the contents of such an embedded HTML document can be accessed by JavaScript on the front end. However, I am more interested in processing it on the back end, using HTMLEditorKit.Parser or any Java Swing API.
    Thanks for help.

    The javax.swing.text.html framework barely supports HTML 3.2.

  • Parsing HTML from Google API results

    Hello,
    I just downloaded the Google API (http://www.google.com/apis) and I am trying to parse the HTML content which is returned so that it can be displayed in a TextArea or some other GUI component.
    Here are my questions:
    1. Is there a Java class that can parse HTML and display it correctly?
    2. If not, are there any third-party, preferably free, Java components that can do that?
    3. Has anyone tried out the Google API? Any interesting applications?
    Thank you.
    Hanxue

    To convert plain text to HTML, you can process the text with simple code like this:
    1.
    String inputText = getInputText(); // your own method
    StringBuffer htmlOutputText = new StringBuffer();
    java.util.StringTokenizer st = new java.util.StringTokenizer(inputText, "\n\r");
    while (st.hasMoreTokens()) {
        htmlOutputText.append(st.nextToken());
        htmlOutputText.append("<br>");
    }
    // insert the top-level HTML tags
    htmlOutputText.insert(0, "<HTML> <HEAD><TITLE> Some Title</TITLE></HEAD> <BODY>");
    htmlOutputText.append("</BODY> </HTML>");
    2. Even simpler, but as far as I know it doesn't display correctly in a JEditorPane:
    String inputText = getInputText();
    inputText = "<HTML> <HEAD><TITLE> Some Title</TITLE></HEAD> <BODY> <PRE> <TT>"
            + inputText + "</TT></PRE></BODY> </HTML>";
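    Neither variant escapes characters such as `<` and `&`, so text containing them would be read as markup, and the StringTokenizer version silently drops blank lines because consecutive delimiters are merged. A minimal sketch of the first variant with escaping and blank lines preserved; the class name `PlainTextToHtml` is illustrative. The result can be shown in a JEditorPane via `setContentType("text/html")` followed by `setText(...)`.

```java
public final class PlainTextToHtml {

    /** Escapes the characters that have special meaning in HTML text. */
    static String escape(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (char c : s.toCharArray()) {
            switch (c) {
                case '&': sb.append("&amp;"); break;
                case '<': sb.append("&lt;"); break;
                case '>': sb.append("&gt;"); break;
                case '"': sb.append("&quot;"); break;
                default: sb.append(c);
            }
        }
        return sb.toString();
    }

    /** Wraps plain text in a minimal HTML page, turning each line break into <br>. */
    public static String convert(String title, String text) {
        StringBuilder html = new StringBuilder();
        html.append("<html><head><title>").append(escape(title)).append("</title></head><body>");
        // split on \r\n, \n or \r; limit -1 keeps empty lines (and trailing ones)
        String[] lines = text.split("\r\n|\n|\r", -1);
        for (int i = 0; i < lines.length; i++) {
            if (i > 0) {
                html.append("<br>");
            }
            html.append(escape(lines[i]));
        }
        html.append("</body></html>");
        return html.toString();
    }

    public static void main(String[] args) {
        System.out.println(convert("Some Title", "a < b\n\nc & d"));
    }
}
```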

  • DocumentParser parsing HTML ...

    I am parsing the HTML of websites with this:
    HTMLEditorKit.Parser parser = new javax.swing.text.html.parser.ParserDelegator();
    I was able to parse www.yahoo.com.
    Its HTML code (first few lines):
    <html><head>
    <script language=javascript>
    var now=new Date,t1=0,t2=0,t3=0,t4=0,t5=0,t6=0,cc='',ylp='';t1=now.getTime();
    </script>
    <title>Yahoo!</title>
    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
    <meta http-equiv="PICS-Label" content='(PICS-1.1 "http://www.icra.org/ratingsv02.html" l r (cz 1 lz 1 nz 1 oz 1 vz 1) gen true for "http://www.yahoo.com" r (cz 1 lz 1 nz 1 oz 1 vz 1) "http://www.rsac.org/ratingsv01.html" l r (n 0 s 0 v 0 l 0) gen true for "http://www.yahoo.com" r (n 0 s 0 v 0 l 0))'>
    <base href="http://www.yahoo.com/_ylh=X3oDMTEwZGh2NmNjBF9TAzI3MTYxNDkEdGVzdAMwBHRtcGwDaW5kZXgtdGJs/" target=_top>
    <script language=javascript>------------
    and my corresponding log goes like this ....
    0 DEBUG  [main]  - Start :html
    15 DEBUG  [main]  - Start :head
    15 DEBUG  [main]  - Start :script
    15 DEBUG  [main]  - End :script
    15 DEBUG  [main]  - Start :title
    15 DEBUG  [main]  - End :title
    15 DEBUG  [main]  - meta -- http-equiv=Content-Type content=text/html; charset=UTF-8
    31 DEBUG  [main]  - meta -- http-equiv=PICS-Label content=(PICS-1.1 "http://www.icra.org/ratingsv02.html" l r (cz 1 lz 1 nz 1 oz 1 vz 1) gen true for "http://www.yahoo.com" r (cz 1 lz 1 nz 1 oz 1 vz 1) "http://www.rsac.org/ratingsv01.html" l r (n 0 s 0 v 0 l 0) gen true for "http://www.yahoo.com" r (n 0 s 0 v 0 l 0))
    31 INFO  [main]  - base http://www.yahoo.com/_ylh=X3oDMTEwZGh2NmNjBF9TAzI3MTYxNDkEdGVzdAMwBHRtcGwDaW5kZXgtdGJs/
    31 DEBUG  [main]  - Start :script
    31 DEBUG  [main]  - End :script
    31 DEBUG  [main]  - Start :script
    62 DEBUG  [main]  - End :script
    62 DEBUG  [main]  - Start :style
    62 DEBUG  [main]  - End :style
    62 DEBUG  [main]  - Start :script
    Next, I parsed www.java.sun.com/index.html.
    Its HTML code (first few lines) goes like this:
    <html>
    <head>
    <title>Java Technology</title>
    <meta name="keywords" content="Java, platform" />
    <meta name="description" content="Java technology is a portfolio of products that are based on the power of networks and the idea that the same software should run on many different kinds of systems and devices." />
    <meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1"/>
    <meta name="date" content="2003-11-23" />
    <link rel="stylesheet" href="/css/default_developer.css" />
    <script type="text/javascript" language="JavaScript" src="/js/popUp.js"></script>
    <script type="text/javascript" language="JavaScript" src="/js/support_incident.js"></script>
    <link href="http://developers.sun.com/rss/java.xml" rel="alternate" type="application/rss+xml" title="rss" />
    </head>
    <!--stopindex-->
    <body leftmargin="0"....-----
    and my corresponding log goes like this ...
    0 DEBUG  [main]  - Start :html
    16 DEBUG  [main]  - Start :head
    16 DEBUG  [main]  - Start :title
    16 DEBUG  [main]  - End :title
    16 INFO  [main]  - meta --- name=keywords content=Java, platform
    16 DEBUG  [main]  - End :head
    16 DEBUG  [main]  - Start :body
    16 DEBUG  [main]  - Simple Tag :link
    Now, as you can see from the logs, the parser reported both META tags of the Yahoo page, while it reported only one of the META tags of java.sun.com/index.html.
    One visible difference between the HTML of these two pages is that the META tags of the Yahoo page don't have a closing tag (aren't well-formed), whereas the META tags of java.sun.com are well-formed.
    Why are the META tags (of java.sun.com) being ignored by the parser?
    Is it because of this:
    javax.swing.text.html.parser.Parser.java, method boolean ignoreElement(Element elem), line 429,
    returns true for ignoring META tags in the HTML file...
    Is my problem due to this?
    How can I possibly overcome this? :-(
    The code for my callback class looks like this:
         HTMLEditorKit.ParserCallback parserCallback = new HTMLEditorKit.ParserCallback() {
             public void handleStartTag(HTML.Tag t, MutableAttributeSet a, int pos) {
                 try {
                     if (t == HTML.Tag.A) {
                         String hrefValue = (String) a.getAttribute(HTML.Attribute.HREF);
                         logger.log(Level.INFO, t + " " + hrefValue);
                     } else {
                         logger.log(Level.DEBUG, "Start :" + t);
                     }
                 } catch (Exception e) {
                     e.printStackTrace();
                 }
             }

             public void handleEndTag(HTML.Tag t, int pos) {
                 try {
                     logger.log(Level.DEBUG, "End :" + t);
                 } catch (Exception e) {
                     e.printStackTrace();
                 }
             }

             public void handleSimpleTag(HTML.Tag t, MutableAttributeSet a, int pos) {
                 try {
                     if (t == HTML.Tag.BASE) {
                         String hrefValue = (String) a.getAttribute(HTML.Attribute.HREF);
                         logger.log(Level.INFO, t + " " + hrefValue);
                     } else if (t == HTML.Tag.FRAME) {
                         String srcValue = (String) a.getAttribute(HTML.Attribute.SRC);
                         logger.log(Level.INFO, t + " " + srcValue);
                     } else if (t == HTML.Tag.META) {
                         String nm = (String) a.getAttribute(HTML.Attribute.NAME);
                         String content = (String) a.getAttribute(HTML.Attribute.CONTENT);
                         if ("keywords".equalsIgnoreCase(nm) || "description".equalsIgnoreCase(nm)) {
                             // I found it
                             logger.log(Level.INFO, t + " --- " + a);
                         } else {
                             logger.log(Level.DEBUG, t + " -- " + a);
                         }
                     } else {
                         logger.log(Level.DEBUG, "Simple Tag :" + t);
                     }
                 } catch (Exception e) {
                     e.printStackTrace();
                 }
             }
         };
    I want to read the values of the meta tag attributes "name" and "content", where the tag is <meta name="keywords" content="asdfasdfasdf"> or <meta name="description" content="asdfasdfasdf">. How can I do that?

    OK...
    Then, if there is some other way to read HTML tags such as meta, a (anchor), base and frame (only these tags matter to me) without worrying about how the HTML has been written, please tell me.
    Searching the Internet showed that there are HTML parsers that read HTML using StringTokenizer-like approaches.
    Has anyone here ever used anything like this?
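    A minimal sketch that records only the tags the poster cares about (meta, a, base, frame), handling them in both `handleStartTag` and `handleSimpleTag`, since which callback fires depends on whether the HTML 3.2 DTD treats the tag as empty. It does not explain or fix the skipped XHTML-style `<meta ... />` tags seen in the logs above; for pages written that way, a more lenient third-party HTML parser may be needed. The class name `TagExtractor` and its output format are illustrative.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

public class TagExtractor {

    /** Returns lines like "meta keywords=...", "a page.html", "base ...", "frame ...". */
    public static List<String> extract(String html) {
        List<String> found = new ArrayList<>();
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleStartTag(HTML.Tag t, MutableAttributeSet a, int pos) {
                record(t, a);
            }

            @Override
            public void handleSimpleTag(HTML.Tag t, MutableAttributeSet a, int pos) {
                record(t, a);
            }

            private void record(HTML.Tag t, MutableAttributeSet a) {
                if (t == HTML.Tag.META) {
                    Object name = a.getAttribute(HTML.Attribute.NAME);
                    Object content = a.getAttribute(HTML.Attribute.CONTENT);
                    if (name != null && content != null) {
                        found.add("meta " + name + "=" + content);
                    }
                } else if (t == HTML.Tag.A || t == HTML.Tag.BASE) {
                    Object href = a.getAttribute(HTML.Attribute.HREF);
                    if (href != null) {
                        found.add(t + " " + href);
                    }
                } else if (t == HTML.Tag.FRAME) {
                    Object src = a.getAttribute(HTML.Attribute.SRC);
                    if (src != null) {
                        found.add("frame " + src);
                    }
                }
            }
        };
        try {
            new ParserDelegator().parse(new StringReader(html), callback, true);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return found;
    }

    public static void main(String[] args) {
        String html = "<html><head><meta name=\"keywords\" content=\"Java, platform\">"
                + "<base href=\"http://example.com/\"></head>"
                + "<body><a href=\"page.html\">x</a></body></html>";
        extract(html).forEach(System.out::println);
    }
}
```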

  • I can't get Firefox to play background music on pages; it works in IE and other browsers but not in Firefox. What HTML or JavaScript do I need to use to make it work?

    I'm having a problem with Firefox playing background music on some sites. It works with IE and other browsers but not with Firefox. What HTML or JavaScript do I need to add to the pages so that they play the music?
    I develop my pages in FrontPage and I know that the music only works in IE, so I hope someone can tell me how to correct this problem; it is important to me because I use Firefox as my main browser.
    I thank you in advance

    If a website uses BGSOUND then it will only work in IE.
    BGSOUND is not compatible with other browsers like Firefox.
    * http://kb.mozillazine.org/Background_music_does_not_play

  • How to parse XML into Java objects... please help, really stuck

    Thank you for reading this email...
    If I have a **DTD** like:
    <!ELEMENT person (name, age)>
    <!ATTLIST person
         id ID #REQUIRED
    >
    <!ELEMENT name ((family, given) | (given, family))>
    <!ELEMENT age (#PCDATA)>
    <!ELEMENT family (#PCDATA)>
    <!ELEMENT given (#PCDATA)>
    the **XML** like:
    <person id="a1">
    <name>
         <family> Yoshi </family>
         <given> Samurai </given>
    </name>
    <age> 21 </age>
    </person>
    Could you help me write a simple parser to parse my DTD and XML into Java objects, and show how I can use those objects? Sorry if the problem is too basic; I am a beginner and very stuck... I am very confused by SAXParserFactory, SAXParser, ParserAdapter, and DOM having its own factory and parser, so confusing...
    Thank you for your help, Yo

    Hi, Yo,
    > Thank you very much for your help. And I wish you are there...
    I am. And I plan to stay; it's sunny and warm here in Honolulu and the waves are up :)
    > A bit more question for dear people:
    > In the notes, the main focus is on JAXB.
    > 1. Does that mean JAXB is the most popular parser for parsing XML into Java objects?
    With me, definitely. There are essentially three technologies that allow you to parse XML documents:
    1) "Callbacks" (e.g. SAX in JAXP): You write a class that overrides 3 methods that will be called i) whenever the parser encounters a start tag, ii) an end tag, or iii) PCDATA. Drawback: You have to figure out where the heck in the document hierarchy you are when such a callback happens, because the same method is called on EACH start tag and similarly for the end tag and the PCDATA. You have to create the objects and put them into your own data structure - it's very tedious, but you have complete control. (Well, more or less.)
    2) "Tree" (e.g. DOM in JAXP, or it's better cousin JDOM): You call a parser that in one swoop creates an entire hierarchy that corresponds to the XML document. You don't get called on each tag as with SAX, you just get the root of the resulting tree. Drawback: All the nodes in the tree have the same type! You probably want to know which tags are in the document, don't you? Well, you'll have to traverse the tree and ask each node: What tag do you represent? And what are your attributes? (You get only strings in response even though your attributes often represent numbers.) Unless you want to display the tree - that's a nice application, you can do it as a tree model for JTree -, or otherwise don't care about the individual tags, DOM is not of much help, because you have to keep track where in the tree you are while you traverse it.
    3) Enter JAXB (or Castor, or ...): You give it a grammar of the XML documents you want to parse, or "unmarshal", as fashion dictates we call it. (Actually the name isn't that bad, because "parsing" focuses on the input text while "unmarshalling" focuses on the objects you get, even though I'd argue that marshalling should be what converts into objects and unmarshalling what converts objects into something else, not vice versa; but that's just my opinion.) The JAXB compiler creates a bunch of source files, each with one (or now more) class(es) (and now interfaces) that correspond to the elements/tags of your grammar. (Now "compiler" is a true jewel of a misnomer: try explaining to students that after they run the "compiler", they still need to compile the sources it generated with the real Java compiler!) OK, you've got these sources compiled. Now you call one single method, unmarshal(), and as a result you get the root node of the hierarchy that corresponds to the XML document. Sounds like DOM, but it's much better: the objects in the resulting tree don't all have the same type; their type depends on the tag they represent. E.g., if there is a tag <ball-game>, there will be an object of type myPackage.BallGame in your data structure. It gets better: if there is a <score> inside <ball-game> and you have an object ballGame (of type BallGame), you can simply call ballGame.getScore() and get an object of type myPackage.Score. In other words, the child tags become properties of the parent object. Even better, the attributes become properties too, so as far as your program is concerned there is no difference whether a property value was originally a tag or an attribute. On top of that, you can state in your schema that a property has an int value, or another primitive type (that's how it works in 1.0; in the early-access release you have to do it in the additional .xjs file).
    So this is a very natural way to explore the data structure of the XML document. Of course there are drawbacks, but they are minor: daunting complexity and, as a consequence, a very steep learning curve; documentation that leaves much to the reader's imagination (read: trial and error), with a user's guide that is too simplistic and examples that are too primitive (e.g., they don't even tell you how to make a schema where a tag has only attributes); a reference manual of ~200 pages full of technicalities, where you have to search with a magnifying glass for the really useful stuff; a huge number of generated classes, some of which you may not need at all (and in 1.0 the number has doubled, because each class has an accompanying interface); etc., etc. But overall, all that pales compared to the drastically improved efficiency of the programmer's effort, i.e., your time. The time you spend learning the intricacies is well spent: you learn it once, and then it shortens your programming time every time you use it. It's like C versus Java: Java is an order of magnitude more complex, but you'll probably never be sorry you gave up C.
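    To make approaches 1) and 2) concrete, here is a minimal sketch that uses only the JAXP classes shipped with the JDK (SAXParserFactory with a DefaultHandler, and DocumentBuilderFactory). The class name XmlStyles, its method names, and the <ball-game>/<score> sample documents are made up for illustration. The SAX version has to maintain its own stack of open tags to know where it is; the DOM version gets the whole tree at once but still has to turn the attribute string into an int by hand. JAXB is left out here because it needs a set of generated classes (and, on Java 11 and later, a separate JAXB library, since javax.xml.bind is no longer part of the JDK).

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper class contrasting the SAX and DOM styles.
class XmlStyles {

    // Approach 1 (SAX callbacks): returns "path=value" for every element that
    // directly contains text. The handler keeps its own stack of open element
    // names, because the parser does not tell it where in the hierarchy it is.
    static List<String> saxLeafValues(String xml) {
        List<String> result = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            private final Deque<String> path = new ArrayDeque<>();
            private final StringBuilder text = new StringBuilder();

            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                path.addLast(qName); // remember where we are
                text.setLength(0);
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                // may be called several times for one run of text
                text.append(ch, start, length);
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                String value = text.toString().trim();
                if (!value.isEmpty()) {
                    result.add(String.join("/", path) + "=" + value);
                }
                path.removeLast();
                text.setLength(0);
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
        return result;
    }

    // Approach 2 (DOM tree): parse everything at once, then ask the nodes for
    // their names and attributes. Attribute values always come back as strings.
    static int domHomeScore(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            Element score = (Element) doc.getElementsByTagName("score").item(0);
            return Integer.parseInt(score.getAttribute("home")); // manual string-to-int
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
    }
}
```

    For example, saxLeafValues("<ball-game><team>Reds</team><score>7</score></ball-game>") yields the two entries "ball-game/team=Reds" and "ball-game/score=7". Note how much of the SAX code is bookkeeping (the path stack and the text buffer) rather than actual work, which is exactly the drawback described in 1).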
    Of course the above essay leaves out lots and lots of detail, but I think that it touches the most important points.
    A word about JAXB 1.0 vs. the Early Access (EA) release. If you have time, definitely learn 1.0; the two are quite different, and the main advantage of 1.0 is that the schema combines all the info you had to formulate in the DTD and in the .xjs file when using the EA version. I suggested EA because you already had a DTD, but in retrospect you'd better start from scratch with 1.0. The concepts in 1.0 are here to stay, and once you've surmounted the learning curve, you'll be glad you don't have to switch concepts.
    > When the parser's job is done, what kind of Java object will we get? (String, InputStream, or ...)
    See above: typically it's an object whose type is defined as a class (and an interface in 1.0) within the sources that JAXB generates. Or it can be a String or one of the primitive types; you tell the "compiler" in the schema (the .xjs file in EA) what you want!
    > 2. If we want to use JAXB, do we have to include an XJS file? Something like:
    In EA, yes. In 1.0, no; it's all in the schema.
    > I am very new to XML; is there any simpler way to get around all this? It has already taken me 4 days to find a simple parser that takes XML and a DTD and returns Java objects... I mean, if that kind of parser exists...
    It'll probably take you an order of magnitude longer than that to get really familiar with JAXB, but believe me, it's worth it. You'll save countless days, if not weeks, once you start developing serious software with it. How long did it take you to learn Java and its main APIs? You either invest the time learning how to use software others have written, or you invest it writing it yourself. I'll take the former any time. But that's only my opinion...
    Jan

  • Parsing HTML characters (e.g. &nbsp;)

    Hi
    Apologies if I'm missing something obvious, I haven't been able to find an answer searching the API or Forums...
    I'm parsing HTML documents (currently as Strings) to extract certain information. Is there an easy way to replace all special HTML character entities, such as &nbsp; and &lt;, with a space or < respectively, without having to do a string replace for every possible entity?
    I know there's an HTML parser in Swing, but it seems to be geared towards creating an HTML editor.
    Any help would be appreciated!

    There are also a number of open-source or shareware programs, such as TidyHTML, that clean up and parse existing HTML. Check out SourceForge or www.downloads.com.
    - Saish
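    On the Swing parser mentioned in the question: it is usable on its own, not only inside an editor. A minimal sketch, assuming the JDK's ParserDelegator with an HTMLEditorKit.ParserCallback, whose handleText method receives text in which entity references such as &lt; and &amp; have already been replaced by the characters they stand for, so no per-entity string replace is needed. HtmlText and extract() are hypothetical names. As far as I know, &nbsp; comes back as the non-breaking space U+00A0 rather than an ordinary space, which is why the sketch maps it to a plain space at the end.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical helper: returns the text content of an HTML string with
// character entities decoded by the JDK's built-in HTML parser.
class HtmlText {

    static String extract(String html) {
        StringBuilder out = new StringBuilder();
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleText(char[] data, int pos) {
                // separate text runs from different elements with a space
                out.append(data).append(' ');
            }
        };
        try {
            // third argument true: ignore any charset declared inside the HTML
            new ParserDelegator().parse(new StringReader(html), callback, true);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // turn non-breaking spaces into ordinary ones, as the question asks
        return out.toString().replace('\u00a0', ' ').trim();
    }
}
```

    Calling extract("<p>5 &lt; 6 &amp;&amp; 7 &gt; 3</p>") should give back the text with real <, & and > characters in place of the entity references.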

  • Can I use a C/C++ xml parser in my java program?

    Hi,
    How can I use a C/C++ XML parser in my Java program?
    Elvis.

    You would still need to convert the XML data structure into a Java data structure to import it into Java.
    Don't assume you need C++ to do anything. I would write it in Java first, then profile the application to see where the bottlenecks are, and then optimise them.
    Don't optimise code unless you have proof it needs to be optimised.
    If you want to improve the speed of reading XML, try XMLbooster

  • Parsing HTML documents

    I am trying to write an application that uses a parsed HTML document to perform some data retrieval. The problem I am having is that the parser in JDK 1.4.1 is unable to parse the document completely and correctly: some fields are skipped, and there are other problems as well. I believe it has to do with html32.bdtd. Is there a later version?

    Parsing an HTML document is a huge task; you shouldn't do it yourself. Instead, use javax.swing.text.html and javax.swing.text.html.parser, which already provide almost everything you'll ever need.
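    As a rough illustration of using that package instead of a hand-written parser, here is a sketch that collects link and image URLs with ParserDelegator and an HTMLEditorKit.ParserCallback. LinkCollector and collect() are hypothetical names. Keep in mind that this parser works from the HTML 3.2 DTD (the html32.bdtd mentioned in the question), so pages using newer markup may still be handled imperfectly.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical helper: collects href values of <a> tags and src values
// of <img> tags, in document order.
class LinkCollector {

    static List<String> collect(String html) {
        List<String> found = new ArrayList<>();
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleStartTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
                // <a> has an end tag, so it is reported here
                if (tag == HTML.Tag.A) {
                    addIfPresent(attrs.getAttribute(HTML.Attribute.HREF));
                }
            }

            @Override
            public void handleSimpleTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
                // <img> is an empty element, so it is reported as a "simple" tag
                if (tag == HTML.Tag.IMG) {
                    addIfPresent(attrs.getAttribute(HTML.Attribute.SRC));
                }
            }

            private void addIfPresent(Object value) {
                if (value != null) {
                    found.add(value.toString());
                }
            }
        };
        try {
            new ParserDelegator().parse(new StringReader(html), callback, true);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return found;
    }
}
```

    The callback style is the same as SAX: the parser walks the document and you only override the events you care about, which keeps data-retrieval code short.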

Maybe you are looking for

  • TS1538 iPhone and iPad not recognized by iTunes 11.1

    After an update to 11.1 iTunes stopped recognizing my iPhone and iPad. I tried everything I could find on Apple site and generated by a Google search. I tried reinstalling iTunes, restarting Mobile Device Service, uninstalling all the components - re

  • Report not display Int.meas Unit, always display commercial

    Hi, I have one question. In Bex Reports or Web Reports in our Portal we see some information about Unit of Measure, for example Quantity and the corresponding unit of measure. In one report I saw UOM "CJ", but internally the information is "CS", I rev

  • Help, my PS4 suddenly started smoking and I don't know what to do

    It could be worse, it could be on the Buckfast.

  • Edit Default Search help

    Hi Experts, Is there any way to personalize the settings available on search help? The default search help coming for my input field contains two fields. I want to hide one with my code. Please suggest if you have some idea to do this Regar

  • How to access .asl file in PSE 13 from old PSE 10?

    Can't access .asl files I had used with PSE10 in PSE 13.  Computer: Mac OS 10.10.1  with PSE 13 What I have done: 1- .ASL files from PSE 10 were copied to PSE 13:  Mac HD> applications > PSE13 > Support files  > Presets > styles 2- Closed PSE 13 and