Parsing html to text

I'm looking for a library which can remove all tags from an HTML document, i.e. ending up with just the 'content' of the HTML doc.
Does anyone know of such a library, or have some example code for doing it?

Hi slackman,
I don't know if this is what you are looking for...give more details if I misunderstand :)
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;
public class MyHandler extends DefaultHandler {
    // Called by the SAX parser for each run of character data between tags.
    @Override
    public void characters(char[] aCh, int aStart, int aLength) throws SAXException {
        System.out.println(new String(aCh, aStart, aLength));
    }
    public static void main(String[] aArgs) {
        try {
            // Note: a SAX parser only accepts well-formed (XHTML-style) markup.
            String oHTMLTest = "<html><body>This is a content </body></html>";
            StringReader oReader = new StringReader(oHTMLTest);
            InputSource oSource = new InputSource(oReader);
            SAXParserFactory oParserFactory = SAXParserFactory.newInstance();
            SAXParser oParser = oParserFactory.newSAXParser();
            MyHandler oHandler = new MyHandler();
            oParser.parse(oSource, oHandler);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
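
A SAX parser rejects markup that is not well-formed, which is common in real-world HTML. As a possible alternative, here is a minimal sketch that uses the Swing HTML parser (ParserDelegator with a ParserCallback) to keep only the text. The class name HtmlToText and the method extractText are made up for this illustration, and joining text chunks with a single space is an assumption about the desired output.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

public class HtmlToText {
    // Returns the text chunks reported by the parser, joined by single spaces.
    public static String extractText(String html) throws IOException {
        final StringBuilder text = new StringBuilder();
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleText(char[] data, int pos) {
                if (text.length() > 0) {
                    text.append(' ');
                }
                text.append(data);
            }
        };
        try (Reader reader = new StringReader(html)) {
            // true = ignore any charset declared in a <meta> tag
            new ParserDelegator().parse(reader, callback, true);
        }
        return text.toString();
    }

    public static void main(String[] args) throws IOException {
        System.out.println(extractText("<html><body>This is a content </body></html>"));
    }
}
```

To process a file or URL instead of a string, any java.io.Reader can be handed to parse in the same way.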

Similar Messages

  • XML parser fails to convert html encoded text nodes

    Under the strain of large documents this defect rears its ugly head more often. While parsing a text node containing html encoded chars i.e. < > &; etc...
    The parser will seemingly forget to change the chars at random, 99.9% of the time everything is ok and the proper conversions take place:
    < -> &#60;
    > -> >
    &; -> &#38;
    Once an error occurs it is reproducible until the text node is changed ( values and/or order ) then it is a crapshoot again.
    These tests were done using the default UTF-8 encoding, here is the exception thrown by the parser along with a portion of the text node before and after the first parsing.
    Let me be clear, the parser actually succeeds the first time but the transformation of the HTML encoded pieces possibly fails. It is on the parsing of the text node value as its own document that the parser fails.
    Error 0 : Error parsing XML string! Line :: 1, Column :: 65674
    Error 1 : End tag does not match start tag 'Project'.
    End tag does not match start tag 'Project'.
    at oracle/xml/parser/v2/XMLError.flushErrors (XMLError.java:233)
    at oracle/xml/parser/v2/NonValidatingParser.parseDocument (NonValidatingParser.java:248)
    at oracle/xml/parser/v2/XMLParser.parse (XMLParser.java:117)
    at pacificedge/xml/XMLUtilities.loadParserFromString (XMLUtilities.java:104)
    Preprocessing ::
    <Project stuff0="0" stuff1="0" stuff2="0" stuff3="1" stuff4="100167" stuff5="100213">
    <StuffA>100213</StuffA>
    <Name>I am a Name</Name>
    <StartDate>1998-08-10</StartDate>
    <FinishDate>2000-06-30</FinishDate>
    <Path>Folder1\Folder2</Path>
    </Project>
    Post processing:
    <Project stuff0="0" stuff1="0" stuff2="0" stuff3="1" stuff4="100167" stuff5="100213">
    <StuffA>100213</StuffA>
    <Name>I am a Name</Name>
    <StartDate>1998-08-10</StartDate> <-- Error is raised here when the value of the text node is used as an xml document
    <FinishDate>2000-06-30</FinishDate>
    <Path>Folder1\Folder2</Path>
    </Project>
    Please investigate this. It is a chronic problem for us and possibly many others.

    Sorry for the encoding issues in the previous message. Here are the pertinent pieces; I hope this shows up correctly.
    &lt; -> &#60;
    &gt; -> >
    &amp; -> &#38;
    Preprocessing ::
    &lt;Project stuff0="0" stuff1="0" stuff2="0" stuff3="1" stuff4="100167" stuff5="100213"&gt;
    &lt;StuffA&gt;100213&lt;/StuffA&gt;
    &lt;Name&gt;I am a Name&lt;/Name&gt;
    &lt;StartDate&gt;1998-08-10&lt;/StartDate&gt;
    &lt;FinishDate&gt;2000-06-30&lt;/FinishDate&gt;
    &lt;Path&gt;Folder1\Folder2&lt;/Path&gt;
    &lt;/Project&gt;
    Post processing:
    <Project stuff0="0" stuff1="0" stuff2="0" stuff3="1" stuff4="100167" stuff5="100213">
    <StuffA>100213</StuffA>
    <Name>I am a Name</Name>
    &lt;StartDate>1998-08-10</StartDate> <-- Error is raised here when the value of the text node is used as an xml document
    <FinishDate>2000-06-30</FinishDate>
    <Path>Folder1\Folder2</Path>
    </Project>

  • JEditorPane parsing HTML

    Hi all,
    I am using JEditorPane and its ability to parse HTML, which, although relatively old and crusty, is certainly all I need for the job.
    Now, I understand there is a chain of classes involved in taking my .html file and turning it into something we can see in a JEditorPane. For example, an img tag is picked up by HTMLEditorKit and turned into an ImageView for display purposes.
    I want to do the following: I have subclassed HTMLEditorKit and overridden the HTMLFactory (although at the moment it just defers everything to super). I want to be able to pick out all of the HTML comment tags as they go through the HTMLEditorKit:
    <!-- hey hey this is a comment -->
    ... and get to the comment text, "hey hey this is a comment", as a Java string. However, I've been digging around with Element for hours now, and although my HTMLFactory correctly digs out the comments from the rest of the elements:
    else if (kind == HTML.Tag.COMMENT)
    {
        System.out.println("I found a comment but don't know what it said!!");
    ... as you can see, I don't know how to get to the comment text itself.
    The reason why I want access to the comment text is that I want to supplement the HTML code a little bit and add something in the comment that will affect the way it is rendered when I read it depending on the comment - so there's the reason if curious.
    Any help, and I do mean anything at all, would be much appreciated, as this is the last obstacle in my path to getting this thing working :)
    Thanks for your time!
    - Peter

    Here is some old code I have lying around that attempts to iterate through all the elements. If I remember correctly the comment text is found in the AttributeSet of the element:
    import java.io.*;
    import java.net.*;
    import java.util.*;
    import javax.swing.*;
    import javax.swing.text.*;
    import javax.swing.text.html.*;
    class GetHTML
    {
        public static void main(String[] args)
        {
            EditorKit kit = new HTMLEditorKit();
            Document doc = kit.createDefaultDocument();
            // The Document class does not yet handle charsets properly.
            doc.putProperty("IgnoreCharsetDirective", Boolean.TRUE);
            try
            {
                // Create a reader on the HTML content.
                Reader rd = getReader(args[0]);
                // Parse the HTML.
                kit.read(rd, doc, 0);
                System.out.println( doc.getText(0, doc.getLength()) );
                System.out.println("----");
                // Iterate through the elements of the HTML document.
                ElementIterator it = new ElementIterator(doc);
                Element elem = null;
                while ( (elem = it.next()) != null )
                {
                    AttributeSet as = elem.getAttributes();
                    System.out.println( "\n" + elem.getName() + " : " + as.getAttributeCount() );
                    if ( elem.getName().equals( HTML.Tag.IMG.toString() ) )
                    {
                        Object o = elem.getAttributes().getAttribute( HTML.Attribute.SRC );
                        System.out.println( o );
                    }
                    // 'enum' is a reserved word since Java 5, so the variable is renamed.
                    Enumeration<?> names = as.getAttributeNames();
                    while( names.hasMoreElements() )
                    {
                        Object name = names.nextElement();
                        Object value = as.getAttribute( name );
                        System.out.println( "\t" + name + " : " + value );
                        if (value instanceof DefaultComboBoxModel)
                        {
                            DefaultComboBoxModel model = (DefaultComboBoxModel)value;
                            for (int j = 0; j < model.getSize(); j++)
                            {
                                Object o = model.getElementAt(j);
                                Object selected = model.getSelectedItem();
                                if ( o.equals( selected ) )
                                    System.out.println( o + " : selected" );
                                else
                                    System.out.println( o );
                            }
                        }
                    }
                    if ( elem.getName().equals( HTML.Tag.SELECT.toString() ) )
                    {
                        Object o = as.getAttribute( HTML.Attribute.ID );
                        System.out.println( o );
                    }
                    //  Weird, the text for each tag is stored in a 'content' element
                    if (elem.getElementCount() == 0)
                    {
                        int start = elem.getStartOffset();
                        int end = elem.getEndOffset();
                        System.out.println( "\t" + doc.getText(start, end - start) );
                    }
                }
            }
            catch (Exception e)
            {
                e.printStackTrace();
            }
            System.exit(1);
        }
        // Returns a reader on the HTML data. If 'uri' begins
        // with "http:", it's treated as a URL; otherwise,
        // it's assumed to be a local filename.
        static Reader getReader(String uri)
            throws IOException
        {
            // Retrieve from Internet.
            if (uri.startsWith("http:"))
            {
                URLConnection conn = new URL(uri).openConnection();
                return new InputStreamReader(conn.getInputStream());
            }
            // Retrieve from file.
            else
            {
                return new FileReader(uri);
            }
        }
    }
    To test it just use:
    java GetHTML somefile.html
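
If only the comment text is needed, one option that sidesteps the element tree is to listen for comments at the parser level. The sketch below is not the HTMLFactory approach from the question; it assumes that overriding handleComment on an HTMLEditorKit.ParserCallback is acceptable, and the class name CommentCollector is invented for the example.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

public class CommentCollector {
    // Returns the trimmed text of every HTML comment the parser reports.
    public static List<String> collect(String html) throws IOException {
        final List<String> comments = new ArrayList<>();
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleComment(char[] data, int pos) {
                comments.add(new String(data).trim());
            }
        };
        new ParserDelegator().parse(new StringReader(html), callback, true);
        return comments;
    }

    public static void main(String[] args) throws IOException {
        String html = "<html><body><!-- hey hey this is a comment --><p>text</p></body></html>";
        for (String comment : collect(html)) {
            System.out.println(comment);
        }
    }
}
```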

  • Parsing HTML documents

    I am trying to write an application that uses a parsed html document to perform some data retrieval. The problem that I am having is that the parser in JDK1.4.1 is unable to completely parse the document correctly. Some fields are skipped as well as other problems. I believe it has to do with the html32.bdtd. Is there a later version?

    Parsing an HTML document is a huge task. You shouldn't do it yourself; javax.swing.text.html and javax.swing.text.html.parser already provide almost everything you will need.

  • Parsing HTML files

    Hello,
    I have a question about parsing HTML files. Usually when I get an HTML file and need to find all the text in it, I do this. The code below collects all of the hyperlinks and ignores all the HTML tags, keeping just the actual text. It's fine for smaller files, but occasionally I'll hit a large online text file; it works, but it's way too slow for large files. I don't need to do all of this HTML tag stripping for text files, however. Is there a way to still grab all the text without doing any tag searching, to make it faster?
    thanks,
    private void find() throws IOException
    {
        //Really slow for large text files.  Need a way to just use a regular scanner on an internet text file
        new ParserDelegator().parse(new InputStreamReader(myBase.openStream()),
                new ParserListener(),
                true);
    }
    /**
     * Inner class for processing all "<a href.."> tags when reading a base URL.
     */
    private class ParserListener extends HTMLEditorKit.ParserCallback
    {
        final String IGNORED_LINKS = "^(http|mailto|\\W).*";
        public void handleStartTag (HTML.Tag t, MutableAttributeSet a, int pos)
        {
            if (t == HTML.Tag.A)
            {
                String href = (String)(a.getAttribute(HTML.Attribute.HREF));
                //System.out.println(href);
                //System.out.println(href.matches(IGNORED_LINKS) + "\t" + href);
                if (! (href == null || href.matches(IGNORED_LINKS)) && !myURLs.contains(href))
                    myURLs.add(href);
            }
            //TODO fix
            if (t == HTML.Tag.TITLE)
            {
                String title = (String) (a.getAttribute(HTML.Attribute.TITLE));
                if (!(title == null))
                    myTitle = title;
                else myTitle = "No title was found";
            }
        }
        public void handleText (char[] data, int pos)
        {
            myText.append(" ");
            myText.append(data);
        }
    }

    JFactor2004 wrote:
    "My question is. If I know an html file is actually just a txt file"
    This isn't a question. HTML files are text by definition.
    "is it possible to look through it (maybe use something similar to a regular scanner) without doing anything with html."
    That depends on what you mean by "doing something with HTML". You can certainly read it one line at a time.
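
Reading one line at a time could look roughly like the sketch below, which skips the HTML parser entirely. The class name PlainTextReader and the method readAll are made up for the example; for a remote file, the Reader would presumably be an InputStreamReader wrapped around the URL's openStream(), as in the question's code.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

public class PlainTextReader {
    // Reads everything from the source line by line, with no HTML processing.
    public static String readAll(Reader source) throws IOException {
        StringBuilder text = new StringBuilder();
        try (BufferedReader in = new BufferedReader(source)) {
            String line;
            while ((line = in.readLine()) != null) {
                text.append(line).append('\n');
            }
        }
        return text.toString();
    }

    public static void main(String[] args) throws IOException {
        System.out.print(readAll(new StringReader("first line\nsecond line")));
    }
}
```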

  • Parsing HTML using Swing's HTMLEditorKit

    Hi all,
    I posted this question on the "Java programming", but I think I posted on the wrong forum. So, please let me know if I have posted on the wrong forum, again.
    Anyway, I have read an article on parsing HTML using the Swing HTML Parser (http://java.sun.com/products/jfc/tsc/articles/bookmarks/index.html). However, I find that the HTMLEditorKit is unable to understand the <Meta> tag under the <Head> tag? Is this true? I am getting an error message:
    javax.swing.text.ChangedCharSetException
    at javax.swing.text.html.parser.DocumentParser.handleEmptyTag(DocumentParser.java:172)
    at javax.swing.text.html.parser.Parser.startTag(Parser.java:327)
    at javax.swing.text.html.parser.Parser.parseTag(Parser.java:1786)
    at javax.swing.text.html.parser.Parser.parseContent(Parser.java:1821)
    at javax.swing.text.html.parser.Parser.parse(Parser.java:1980)
    at javax.swing.text.html.parser.DocumentParser.parse(DocumentParser.java:109)
    at javax.swing.text.html.parser.ParserDelegator.parse(ParserDelegator.java:74)
    at URLReader.main(URLReader.java:58)
    Below is some simple code to write out the text of the HTML file it reads in:
    public static void main(String[] args) throws Exception {
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback () {
            public void handleText(char[] data, int pos) {
                try {
                    System.out.println(data);
                } catch (Exception e) {
                    System.out.println("IOE: " + e);
                }
            }
        };
        Reader reader = new FileReader("myFile.html");
        new ParserDelegator().parse(reader, callback, false);
    }
    The html file that is having a problem reading in is:
    <html>
    <head>
    <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
    <title>NWS WSR-88D Radar System Transmit/Receive Status</title>
    </head>
    <p>A <foo>xx</foo>link</html>
    If I take away <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">, there is no problem.
    Any suggestions? Thanks in advance.

    Hi,
    Setting the third argument really works!!! Yee..... haa....!!!
    WORKING SOLUTION: new ParserDelegator().parse(reader, callback, true);
    MANY... MANY THANKS for looking at the problem!!!
    Pass true as the third argument (ignoreCharSet) of the parse method.
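
For reference, a self-contained sketch of the working call, fed with a document that declares a charset in a meta tag. The class name IgnoreCharsetDemo and the sample markup are assumptions made for the example.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

public class IgnoreCharsetDemo {
    // Parses the HTML and returns each reported text chunk on its own line.
    public static String parseText(String html) throws IOException {
        final StringBuilder text = new StringBuilder();
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleText(char[] data, int pos) {
                text.append(data).append('\n');
            }
        };
        // The third argument is ignoreCharSet; with true, the charset in the
        // meta tag does not trigger a ChangedCharSetException.
        new ParserDelegator().parse(new StringReader(html), callback, true);
        return text.toString();
    }

    public static void main(String[] args) throws IOException {
        String html = "<html><head>"
                + "<meta http-equiv=\"Content-Type\" content=\"text/html; charset=iso-8859-1\">"
                + "<title>Status</title></head><body><p>A link</p></body></html>";
        System.out.print(parseText(html));
    }
}
```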

  • Parsing HTML, how to?

    I use a URL to get the HTML-formatted text. However, parsing the HTML file
    to get the info cleanly is a pain using StringTokenizer. For example,
    I want to get the stock prices from
    http://finance.yahoo.com/q/hp?s=YHOO&a=00&b=1&c=2005&d=01&e=20&f=2005&g=d
    How can I easily extract the prices and the dates, so my program can
    further process the data? Thank you.

    "problem is, most (I believe all) Java HTML parsers rely on the HTML being well-formed, which isn't true all the time."
    I think the default parser with Editor Kit is actually fairly lenient.
    "you can thank Microsoft Internet Explorer for allowing an HTML page to miss a closing tag."
    And Netscape.
    And W3C for not slapping them both.
    "if you know the field (string) that you are looking for, you can probably use indexOf to get to that position, and slowly parse the document from there."
    And watch it blow up in your face when the content changes.
    As a technical exercise, writing a parser is not too hard. As a solution to your problem, I'd recommend getting the data from a better source as suggested above.

  • How to parse html for and display hyperlinks?

    I'm trying to design a report that has a text field that contains html formatted text and includes links. These links need to be clickable in a thick client Java application I'm developing and the only way I can see in Crystal Developer to make a clickable link is to have a discrete text element with a hyperlink (set using the hyperlink tab in the element properties) for each link.
    The problem I have is that there can be any number of links in the html and I need a scalable way to extract and display these links.
    So far I have used formulas in Crystal Developer to parse the HTML and extract the links into a fixed-size array, and then display each link in a separate text element. I would like to have only one 'link' text element in the report design and create as many link elements as required, depending on how many links are present in the HTML.
    How is this possible? With some funky grouping voodoo maybe, or something else? Please help me achieve this.
    Cheers,
    Elliot.

    Hi Elliot,
    Follow these steps, and check the output.
    1.Right click on the field object and select "Format Editor".
    2.Click on the "Paragraph" tab and select the "Text interpretation" as "RTF Text" or "HTML Text".
    3.Click on "Ok" button.
    Hope this helps.

  • Parsing HTML from Google API results

    Hello,
    I just downloaded the Google API (http://www.google.com/apis) and I am trying to parse the HTML content which is returned so that it can be displayed in a TextArea or some other GUI component.
    Here are my questions:
    1. Is there a Java class that can parse HTML and display it correctly?
    2. If not, are there any third-party, preferably free, Java components that can do that?
    3. Has anyone tried out the Google API? Any interesting applications?
    Thank you.
    Hanxue

    To convert plain text to HTML, you can process the text with simple code like this:
    1.
    String inputText = getInputText(); //
    StringBuffer HTMLOutputText = new StringBuffer();
    java.util.StringTokenizer st = new java.util.StringTokenizer(inputText, "\n\r");
    while ( st.hasMoreTokens() ) {
        HTMLOutputText.append(st.nextToken());
        HTMLOutputText.append("<br>");
    }
    /// insert the top level HTML tags
    HTMLOutputText.insert(0, "<HTML> <HEAD><TITLE> Some Title</TITLE></HEAD> <BODY>");
    HTMLOutputText.insert( HTMLOutputText.length(), "</BODY> </HTML>" );
    2. even simpler, but as far as I know it doesn't display right in a JEditorPane
    String inputText = getInputText();
    inputText = "<HTML> <HEAD><TITLE> Some Title</TITLE></HEAD> <BODY> <PRE> <TT>"
    + inputText + "</TT></PRE></BODY> </HTML>";
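
Both variants insert the input unchanged, so any <, > or & in the text would be read as markup. A hedged sketch of escaping those characters before adding the line breaks follows; the class name TextToHtml and its methods are hypothetical, and unlike the StringTokenizer version it keeps blank lines and adds no trailing <br>.

```java
public class TextToHtml {
    // Replaces the characters that have special meaning in HTML.
    static String escape(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '&': out.append("&amp;"); break;
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                default:  out.append(c);
            }
        }
        return out.toString();
    }

    // Escapes the text, then turns each line break into a <br> tag.
    static String toHtmlBody(String text) {
        return escape(text).replaceAll("\r\n|\r|\n", "<br>");
    }

    public static void main(String[] args) {
        System.out.println("<HTML><BODY>" + toHtmlBody("a < b\nc & d") + "</BODY></HTML>");
    }
}
```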

  • How to validate html or parse html

    Hi,
    I am thinking of some way to parse text in which I can have simple HTML tags like <a>, <br>, <i> - I have a clearly specified list of them.
    Now, I would probably parse this text using dom4j and then in some way check the elements against my configurable list of allowed tags.
    But all this involves writing a parser - maybe you have an example of a free library that would do this for me? Or maybe you have already written such a parser to validate text entered by an application user that can contain specified HTML tags?

    user5970066 wrote:
    ..I am thinking of some way to parse text in which I can have simple html tags like <a>, <br>, <i> - I have clearly specified list of them.
    Quoted to change tags to show what the OP meant.
    See here for a way to validate against a DTD.
    Not sure if the linked technique is limited to validating XML.
    Edited by: Andrew Thompson on Apr 11, 2011 5:58 PM
    Added 'Not sure..'
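
As a possible alternative to dom4j or DTD validation, the check against a list of allowed tags could be sketched with the Swing HTML parser, which tolerates markup that is not well-formed. Everything named here (TagWhitelist, findDisallowedTags) is hypothetical, and the sketch assumes that tags the parser inserts on its own are marked with the ParserCallback.IMPLIED attribute and can be skipped on that basis.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.TreeSet;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

public class TagWhitelist {
    // Returns the names of all tags in the input that are not in 'allowed'.
    public static Set<String> findDisallowedTags(String html, final Set<String> allowed)
            throws IOException {
        final Set<String> disallowed = new TreeSet<>();
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            private void check(HTML.Tag t, MutableAttributeSet a) {
                // Skip tags the parser added itself (html, head, body, ...).
                if (a.isDefined(IMPLIED)) {
                    return;
                }
                String name = t.toString().toLowerCase();
                if (!allowed.contains(name)) {
                    disallowed.add(name);
                }
            }
            @Override
            public void handleStartTag(HTML.Tag t, MutableAttributeSet a, int pos) {
                check(t, a);
            }
            @Override
            public void handleSimpleTag(HTML.Tag t, MutableAttributeSet a, int pos) {
                check(t, a);
            }
        };
        new ParserDelegator().parse(new StringReader(html), callback, true);
        return disallowed;
    }

    public static void main(String[] args) throws IOException {
        Set<String> allowed = new HashSet<>(Arrays.asList("a", "br", "i"));
        String input = "Hello <i>x</i><br><b>bold</b> <a href=\"u\">link</a>";
        System.out.println(findDisallowedTags(input, allowed));
    }
}
```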

  • Parse HTML behaviour

    Hi,
    Can anybody explain the behavior of the SunOne Web Server when parse HTML is enabled for all HTML files?
    If we have a valid HTML file on the web server, for instance http://servername/myhtml.html, then the page will be loaded by the web server. If we put anything in the URL after that, the same page still gets served. Consider a URL such as http://servername/myhtml.html/ahdjksad/asdhjsad/sdhjklsad/asjdksald (anything after the .html): the web server still loads the myhtml.html page. If we disable parse HTML, then it won't work. I want server-side includes to work without the server serving such wrong URLs.
    Why does this happen? The path doesn't exist on the server, so it should give a 404 error. Is this the way parse HTML works? Is there any way to restrict it while keeping the parse HTML functionality for server-side includes enabled, other than a custom NSAPI?
    If anyone has noticed this behavior, please explain.
    Thanks,
    Rijesh.

    Hi,
    Actually, I load one external HTML file using the LoadVars class
    methods.
    var oLoad:LoadVars=new LoadVars();
    oLoad.load("external.html");
    I want to parse that HTML in Flash.
    I need some text from the HTML page (the HTML has 200 lines of code).
    How can I parse that HTML and trace that particular text?

  • DocumentParser parsing HTML ...

    i am parsing HTML of website through this
    HTMLEditorKit.Parser parser = new javax.swing.text.html.parser.ParserDelegator();
    i was able to parse www.yahoo.com
    its html code (first few lines)
    <html><head>
    <script language=javascript>
    var now=new Date,t1=0,t2=0,t3=0,t4=0,t5=0,t6=0,cc='',ylp='';t1=now.getTime();
    </script>
    <title>Yahoo!</title>
    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
    <meta http-equiv="PICS-Label" content='(PICS-1.1 "http://www.icra.org/ratingsv02.html" l r (cz 1 lz 1 nz 1 oz 1 vz 1) gen true for "http://www.yahoo.com" r (cz 1 lz 1 nz 1 oz 1 vz 1) "http://www.rsac.org/ratingsv01.html" l r (n 0 s 0 v 0 l 0) gen true for "http://www.yahoo.com" r (n 0 s 0 v 0 l 0))'>
    <base href="http://www.yahoo.com/_ylh=X3oDMTEwZGh2NmNjBF9TAzI3MTYxNDkEdGVzdAMwBHRtcGwDaW5kZXgtdGJs/" target=_top>
    <script language=javascript>------------
    and my corresponding log goes like this ....
    0 DEBUG  [main]  - Start :html
    15 DEBUG  [main]  - Start :head
    15 DEBUG  [main]  - Start :script
    15 DEBUG  [main]  - End :script
    15 DEBUG  [main]  - Start :title
    15 DEBUG  [main]  - End :title
    15 DEBUG  [main]  - meta -- http-equiv=Content-Type content=text/html; charset=UTF-8
    31 DEBUG  [main]  - meta -- http-equiv=PICS-Label content=(PICS-1.1 "http://www.icra.org/ratingsv02.html" l r (cz 1 lz 1 nz 1 oz 1 vz 1) gen true for "http://www.yahoo.com" r (cz 1 lz 1 nz 1 oz 1 vz 1) "http://www.rsac.org/ratingsv01.html" l r (n 0 s 0 v 0 l 0) gen true for "http://www.yahoo.com" r (n 0 s 0 v 0 l 0))
    31 INFO  [main]  - base http://www.yahoo.com/_ylh=X3oDMTEwZGh2NmNjBF9TAzI3MTYxNDkEdGVzdAMwBHRtcGwDaW5kZXgtdGJs/
    31 DEBUG  [main]  - Start :script
    31 DEBUG  [main]  - End :script
    31 DEBUG  [main]  - Start :script
    62 DEBUG  [main]  - End :script
    62 DEBUG  [main]  - Start :style
    62 DEBUG  [main]  - End :style
    62 DEBUG  [main]  - Start :script
    next I parsed www.java.sun.com/index.html
    its html code (first few lines ) goes like this ...
    <html>
    <head>
    <title>Java Technology</title>
    <meta name="keywords" content="Java, platform" />
    <meta name="description" content="Java technology is a portfolio of products that are based on the power of networks and the idea that the same software should run on many different kinds of systems and devices." />
    <meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1"/>
    <meta name="date" content="2003-11-23" />
    <link rel="stylesheet" href="/css/default_developer.css" />
    <script type="text/javascript" language="JavaScript" src="/js/popUp.js"></script>
    <script type="text/javascript" language="JavaScript" src="/js/support_incident.js"></script>
    <link href="http://developers.sun.com/rss/java.xml" rel="alternate" type="application/rss+xml" title="rss" />
    </head>
    <!--stopindex-->
    <body leftmargin="0"....-----
    and my corresponding log goes like this ...
    0 DEBUG  [main]  - Start :html
    16 DEBUG  [main]  - Start :head
    16 DEBUG  [main]  - Start :title
    16 DEBUG  [main]  - End :title
    16 INFO  [main]  - meta --- name=keywords content=Java, platform
    16 DEBUG  [main]  - End :head
    16 DEBUG  [main]  - Start :body
    16 DEBUG  [main]  - Simple Tag :link
    Now, as you can see from the logs, the META tag of Yahoo was read twice by the parser, while the META tag of java.sun.com/index.html was read only once.
    One visible difference between the HTML of these two pages is that the META tag of the Yahoo page doesn't have a closing slash (isn't well-formed), whereas the META tag of java.sun.com is well-formed.
    Why is the meta tag (of java.sun.com) being ignored by the parser?
    Is it because of this...
    javax.swing.text.html.parser.Parser.java , method boolean ignoreElement(Element elem) : line 429
    returns true for ignoring meta tag in html file...
    Is my problem due to this?
    How can I possibly overcome this? :-(
    Code for my Callback class looks like this ...
         HTMLEditorKit.ParserCallback  parserCallback = new HTMLEditorKit.ParserCallback()
         {
              public void handleStartTag(HTML.Tag t, MutableAttributeSet a , int pos)
              {
                   try {
                        if (t==HTML.Tag.A) {
                             String hrefValue = (String)a.getAttribute(HTML.Attribute.HREF);
                             logger.log(Level.INFO,t + " " + hrefValue);
                        }
                        else
                             logger.log(Level.DEBUG,"Start :"+t  );
                   }
                   catch(Exception e){ e.printStackTrace();     }
              }
              public void handleEndTag(HTML.Tag t, int pos)
              {
                   try {
                        logger.log(Level.DEBUG, "End :"+t);
                   }
                   catch(Exception e){ e.printStackTrace();     }
              }
              public void handleSimpleTag(HTML.Tag t , MutableAttributeSet a,int pos)
              {
                   try {
                        if (t== HTML.Tag.BASE ) {
                             String hrefValue = (String)a.getAttribute(HTML.Attribute.HREF);
                             logger.log(Level.INFO,t + " " + hrefValue);
                        }
                        else if (t == HTML.Tag.FRAME) {
                             String srcValue= (String)a.getAttribute(HTML.Attribute.SRC);
                             logger.log(Level.INFO, t +" "+ srcValue);
                        }
                        else if (t == HTML.Tag.META) {
                             String nm = (String)a.getAttribute(HTML.Attribute.NAME);
                             String content = (String)a.getAttribute(HTML.Attribute.CONTENT);
                             if ("keywords".equalsIgnoreCase(nm) || "description".equalsIgnoreCase(nm))
                                  // i found it
                                  logger.log(Level.INFO, t + " --- " + a);
                             else
                                  logger.log(Level.DEBUG,t + " -- " + a);
                        }
                        else
                             logger.log(Level.DEBUG,"Simple Tag :" + t);
                   }
                   catch(Exception e){ e.printStackTrace();     }
              }
         };
    I want to read the values of the meta tag attributes "name" and "content", where <meta name="keywords" content="asdfasdfasdf" > or <meta name="description" content="asdfasdfasdf">. How can I do this?

    ok ...
    then if there is some other way to read HTML tags such as meta, a (anchor), base and frame (only these tags matter to me) without being concerned about the way the HTML has been coded, then please tell me ...
    searching the internet showed that there are HTML parsers that use StringTokenizer-like approaches to read HTML ...
    has anyone here ever used anything like this?

  • Parsed HTML/SSI not working in Web Server 7 on Ubuntu Server 9.10

    Please help. I have SJSWS 7.0u6 on Ubuntu Server 9.10. The HTML parsing is set to parse all HTML files.
    My HTML code is:
    <body>
      <!--#include file="includes/corner.html"-->
      <div id="maincontent">
         <!--#echo var="DATE_GMT"-->
      </div>
    </body>
    I added the echo command later to rule out an error with my include file. I even took out the include command to rule it out completely. If I "view/page source" from Firefox I always get the code as it is above in its original form. The server is completely ignoring the include and the echo.
    In the virtual server settings under content handling / Parsed HTML/SSI I have tried "all HTML" and "executable HTML". Both return the same result, which is no parsing whatsoever. The log is set to "finest" and so far no errors have come up. Please tell me what I am doing wrong, did I miss a step, overlook some extra settings?
    I am happy to provide more detail. Just let me know what you need to see.
    Thank you.
    update: I tested another bare bones html and got the same results, no parsing.
    Seen here : [http://kenbuxton.net/test.html]
    Edited by: Ken_Buxton on Nov 17, 2009 7:53 PM

    Deploy the configuration? Is there something beyond clicking save and restarting the instance? I checked the server.xml config file and the log level was at "info" even though I set it for "finest" in the GUI. I am now getting the finest details in the logs after I changed the server.xml file manually. Here is what I am getting for test.html. ...
    [19/Nov/2009:10:48:37] finest ( 6266): for host 174.17.99.6 trying to GET /test.html, process-uri-objects reports: processing objects for URI /test.html
    [19/Nov/2009:10:48:37] finest ( 6266): for host 174.17.99.6 trying to GET /test.html, process-uri-objects reports: processing object name="default"
    [19/Nov/2009:10:48:37] finest ( 6266): for host 174.17.99.6 trying to GET /test.html, func_exec reports: executing fn="match-browser" browser="*MSIE*" ssl-unclean-shutdown="true" Directive="AuthTrans" magnus-internal="1"
    [19/Nov/2009:10:48:37] finest ( 6266): for host 174.17.99.6 trying to GET /test.html, func_exec reports: fn="match-browser" browser="*MSIE*" ssl-unclean-shutdown="true" Directive="AuthTrans" magnus-internal="1" returned -2 (REQ_NOACTION)
    [19/Nov/2009:10:48:37] finest ( 6266): for host 174.17.99.6 trying to GET /test.html, func_exec reports: executing fn="ntrans-j2ee" name="j2ee" Directive="NameTrans"
    [19/Nov/2009:10:48:37] finest ( 6266): for host 174.17.99.6 trying to GET /test.html, func_exec reports: fn="ntrans-j2ee" name="j2ee" Directive="NameTrans" returned -2 (REQ_NOACTION)
    [19/Nov/2009:10:48:37] finest ( 6266): for host 174.17.99.6 trying to GET /test.html, func_exec reports: executing fn="pfx2dir" from="/mc-icons" dir="/sun/webserver7/lib/icons" name="es-internal" Directive="NameTrans"
    [19/Nov/2009:10:48:37] finest ( 6266): for host 174.17.99.6 trying to GET /test.html, func_exec reports: fn="pfx2dir" from="/mc-icons" dir="/sun/webserver7/lib/icons" name="es-internal" Directive="NameTrans" returned -2 (REQ_NOACTION)
    [19/Nov/2009:10:48:37] finest ( 6266): for host 174.17.99.6 trying to GET /test.html, func_exec reports: executing fn="uri-clean" Directive="PathCheck"
    [19/Nov/2009:10:48:37] finest ( 6266): for host 174.17.99.6 trying to GET /test.html, func_exec reports: fn="uri-clean" Directive="PathCheck" returned 0 (REQ_PROCEED)
    [19/Nov/2009:10:48:37] finest ( 6266): for host 174.17.99.6 trying to GET /test.html, func_exec reports: executing fn="find-pathinfo" Directive="PathCheck"
    [19/Nov/2009:10:48:37] finest ( 6266): for host 174.17.99.6 trying to GET /test.html, func_exec reports: fn="find-pathinfo" Directive="PathCheck" returned -2 (REQ_NOACTION)
    [19/Nov/2009:10:48:37] finest ( 6266): for host 174.17.99.6 trying to GET /test.html, func_exec reports: executing fn="find-index-j2ee" Directive="PathCheck"
    [19/Nov/2009:10:48:37] finest ( 6266): for host 174.17.99.6 trying to GET /test.html, func_exec reports: fn="find-index-j2ee" Directive="PathCheck" returned -2 (REQ_NOACTION)
    [19/Nov/2009:10:48:37] finest ( 6266): for host 174.17.99.6 trying to GET /test.html, func_exec reports: executing fn="find-index" index-names="index.html,home.html,index.jsp" Directive="PathCheck"
    [19/Nov/2009:10:48:37] finest ( 6266): for host 174.17.99.6 trying to GET /test.html, func_exec reports: fn="find-index" index-names="index.html,home.html,index.jsp" Directive="PathCheck" returned -2 (REQ_NOACTION)
    [19/Nov/2009:10:48:37] finest ( 6266): for host 174.17.99.6 trying to GET /test.html, func_exec reports: executing fn="type-j2ee" Directive="ObjectType"
    [19/Nov/2009:10:48:37] finest ( 6266): for host 174.17.99.6 trying to GET /test.html, func_exec reports: fn="type-j2ee" Directive="ObjectType" returned 0 (REQ_PROCEED)
    [19/Nov/2009:10:48:37] finest ( 6266): for host 174.17.99.6 trying to GET /test.html, func_exec reports: executing fn="type-by-extension" Directive="ObjectType"
    [19/Nov/2009:10:48:37] finest ( 6266): for host 174.17.99.6 trying to GET /test.html, func_exec reports: fn="type-by-extension" Directive="ObjectType" returned 0 (REQ_PROCEED)
    [19/Nov/2009:10:48:37] finest ( 6266): for host 174.17.99.6 trying to GET /test.html, func_exec reports: executing fn="force-type" type="text/plain" Directive="ObjectType"
    [19/Nov/2009:10:48:37] finest ( 6266): for host 174.17.99.6 trying to GET /test.html, func_exec reports: fn="force-type" type="text/plain" Directive="ObjectType" returned 0 (REQ_PROCEED)
    [19/Nov/2009:10:48:37] finest ( 6266): for host 174.17.99.6 trying to GET /test.html, func_exec reports: executing method="(GET|HEAD|POST)" type="*~magnus-internal/*" fn="send-file" Directive="Service"
    [19/Nov/2009:10:48:37] finest ( 6266): for host 174.17.99.6 trying to GET /test.html, func_exec reports: method="(GET|HEAD|POST)" type="*~magnus-internal/*" fn="send-file" Directive="Service" returned -1 (REQ_ABORTED)
    [19/Nov/2009:10:48:37] finest ( 6266): for host 174.17.99.6 trying to GET /test.html, func_exec reports: executing fn="error-j2ee" Directive="Error"
    [19/Nov/2009:10:48:37] finest ( 6266): for host 174.17.99.6 trying to GET /test.html, func_exec reports: fn="error-j2ee" Directive="Error" returned -2 (REQ_NOACTION)
    [19/Nov/2009:10:48:37] finest ( 6266): for host 174.17.99.6 trying to GET /test.html, func_exec reports: executing fn="flex-log" Directive="AddLog"
    [19/Nov/2009:10:48:37] finest ( 6266): for host 174.17.99.6 trying to GET /test.html, func_exec reports: fn="flex-log" Directive="AddLog" returned 0 (REQ_PROCEED)
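    A trace this long is easier to read once it is filtered down to the return codes. The following is only a minimal sketch, assuming the line layout shown above (fn="...", Directive="...", then "returned N (CODE)"); the class and method names are made up for illustration and are not part of the web server.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: not part of the web server, just a filter for traces like the one above.
class TraceScan {
    // Captures the fn name, the directive, the numeric code and the symbolic code of a "returned" line.
    private static final Pattern RESULT = Pattern.compile(
        "fn=\"([^\"]+)\".*?Directive=\"([^\"]+)\" returned (-?\\d+) \\((\\w+)\\)");

    /** Returns "Directive fn -> CODE" for every result line whose symbolic code equals wantedCode. */
    static List<String> findByCode(List<String> lines, String wantedCode) {
        List<String> hits = new ArrayList<>();
        for (String line : lines) {
            Matcher m = RESULT.matcher(line);
            // "executing ..." lines carry no "returned" part, so they never match.
            if (m.find() && m.group(4).equals(wantedCode)) {
                hits.add(m.group(2) + " " + m.group(1) + " -> " + m.group(4));
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // Two lines copied from the trace above.
        List<String> sample = Arrays.asList(
            "[19/Nov/2009:10:48:37] finest ( 6266): for host 174.17.99.6 trying to GET /test.html, func_exec reports: fn=\"force-type\" type=\"text/plain\" Directive=\"ObjectType\" returned 0 (REQ_PROCEED)",
            "[19/Nov/2009:10:48:37] finest ( 6266): for host 174.17.99.6 trying to GET /test.html, func_exec reports: method=\"(GET|HEAD|POST)\" type=\"*~magnus-internal/*\" fn=\"send-file\" Directive=\"Service\" returned -1 (REQ_ABORTED)");
        System.out.println(findByCode(sample, "REQ_ABORTED"));
    }
}
```

    Run against the trace above with "REQ_ABORTED", this picks out the send-file step in the Service stage, which is the only step in the trace that reports an abort.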

  • EDGE and HTML dynamic text in a "box" with scroll bar

    I'm new to EDGE and am a Win7 Pro, Master Collection CS5.5 suite owner. I'm mainly in the film/video post-production field (mostly AE, PPro, Pshop, IA) but have been branching into web design the last couple of years. I use Dreamweaver, Fireworks, Flash. While I'm an expert user with all the film/video apps, I would say I only have intermediate ability with the web apps. While I understand a lot of programming logic building blocks, I'm not a coder.
    So since we're told "Flash is dead", my interest in Edge is to try to do some of the things that I can currently do in Flash in Edge. I was excited when Edge first came out but lost interest when it became obvious that Adobe was not going to offer Edge and Muse to "suite owners" but only through their force-feeding of the "Cloud". Better known as the "golden goose" for Adobe stockholders and a never-ending hole in the pocket for users. Anyway....
    I spent the last couple of days doing some of the tuts and messing with the UI. It's matured a lot since I was here last.
    I've been working on a Flash site for a sports team where one of the pages is a player profile page where college recruiters and other interested parties can view recruiting-relevant info/stats about players. This is how it works. While on the "Team" page a user clicks on a button labeled "Player Profiles". (Animation) A "page" flies in and unfurls from the upper right corner (3D page flip effect created in AE, played by Flash as a frame sequence). Once it lands, filling most of the center of the screen, there is a bright flash. As the brightness fades we see the "page" is a bordered box with a BG image of a ball field (End). (Animation) From behind the border, small pictures fly in (player head shots with name and jersey number). They stream in and form a circle like a wagon train, and the team logo zooms up from infinity to the center of the circle (End). As the user mouses over a player's pic it zooms up a little and gets brighter (like mouseover image nav thumbs for an image slider). If the user clicks on a player's head shot it flips over and scales up to become a text box with a scrollbar. The content of the box is a mix of images, static and dynamic text fields populated from data in a "player info database" XML file, and some hyperlinks. It's all kept updated dynamically with current stats, info and images from the XML file. There is also a "PDF" button that allows the user to open/save/print a PDF of the player's profile (the PDFs are static files for now, but the choice of which PDF to retrieve is dynamically supplied via the XML file).
    So.... Is Edge now able to do something like this? Would it need to be a collection of small animations? Could these be "assembled" and connected as an asset in Dreamweaver?
    I thought I would approach this from the end (i.e. click on an image and display a box with dynamic text fields), since that is the most important part, i.e. displaying the dynamically updated profile info. Sooooo....
    Can Edge display a scrolling text box with images, static text, and HTML dynamic text in it?
    Joel

    The code is in compositionReady. Click the filled {} icon to open it.

  • How Do I Display HTML Formatted Text From A Data Table In Crystal Reports?

    I'm creating reports in Crystal XI.  The information being displayed in the reports comes from data tables where the text is formatted in HTML.
    I've worked with Crystal Reports enough to know that HTML text pulled from a data table doesn't appear in Crystal the same way it does in a web browser.  Crystal Reports ignores all the tags (...unless I'm missing something...) and just displays the text.
    Someone far more Crystal savvy than I (...to whom I don't have access...) came up with a Formula Field workaround that tricks Crystal Reports into displaying some basic HTML tags.  Here's that workaround:
    <!--
    stringVar TableName := ;
    TableName := Replace (TableName, "<ul>","<br> <br>");
    TableName := Replace (TableName, "<li>", "<br>   &bull; ");
    TableName := Replace (TableName, "</li>", "");
    TableName := Replace (TableName, "</ul>","<br> <br>");
    TableName := Replace (TableName, "<a", "<u><font color='blue'");
    TableName := Replace (TableName, "</a>", "</font></u>");
    TableName
    -->
    QUESTION - Does any similar workaround exist so I can display an HTML Table in Crystal Reports?  If not, is there any way to display HTML formatted text from a data table in Crystal Reports as it would appear in a web browser?

    Hi Steven,
    To display HTML text in Crystal Reports, follow these steps.
    1. Right-click the field and select the Paragraph tab.
    2. Under 'Text Interpretation', select 'HTML Text' and click OK.
    I have tried this approach, but it never works. So please reply if there is any way to solve the issue.
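
    One possible direction for the table question, offered only as a sketch: if the table markup cannot be rendered as a table, the text could be flattened before it reaches the report, in the same spirit as the Replace chain above. The Java below assumes simple, non-nested tables and that the HTML can be pre-processed outside Crystal Reports; HtmlTableFlattener and flatten are hypothetical names, not a Crystal Reports API.

```java
// Hypothetical pre-processing helper; nothing here is a Crystal Reports API.
class HtmlTableFlattener {
    /** Rewrites simple table markup into " | "-separated cells and <br>-terminated rows. */
    static String flatten(String html) {
        // A cell close followed directly by another cell becomes a visible column separator.
        String out = html.replaceAll("(?i)</t[dh]\\s*>\\s*(?=<t[dh]\\b)", " | ");
        // Each row close becomes a line break, as in the <ul>/<li> workaround.
        out = out.replaceAll("(?i)</tr\\s*>", "<br>");
        // Drop whatever table tags are left; other tags are not touched.
        out = out.replaceAll("(?i)</?(table|thead|tbody|tfoot|tr|td|th)\\b[^>]*>", "");
        return out;
    }

    public static void main(String[] args) {
        String table = "<table><tr><th>Name</th><th>Qty</th></tr>"
                     + "<tr><td>Bolt</td><td>12</td></tr></table>";
        System.out.println(flatten(table));
    }
}
```

    This loses the column alignment a browser would give, and regular expressions will not cope with nested tables or unusual markup, so it is a fallback rather than a way to reproduce the browser rendering.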
