JEditorPane & HTML Parser

I am creating a Swing application. Its main purpose is to help people with disabilities browse web pages. The first half of the screen displays the web page, and the second half displays the URL links found in that page as big buttons, to help the patient navigate.
I have successfully displayed HTML content using JEditorPane in the first half. But I don't know how to handle JavaScript and style sheets, which are not supported by JEditorPane. They make the screen ugly.
One more thing: it seems that the Parser class is the key to getting the URL links for the second half. Once I have the links, displaying them in buttons is trivial. But I don't know how to use it. Any good examples?
I appreciate your help.
Ghiyoung

I think the Mozilla project has a JavaScript engine written in Java: Rhino.
www.mozilla.org
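
For the second half of the question (getting the links), a minimal sketch using the Swing callback parser might look like the following. The class name LinkExtractor and the method extractLinks are made up for illustration; only ParserDelegator, HTMLEditorKit.ParserCallback, HTML.Tag.A and HTML.Attribute.HREF come from the JDK.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical helper: collects the href of every <a> tag in a page.
class LinkExtractor extends HTMLEditorKit.ParserCallback {
    private final List<String> links = new ArrayList<>();

    @Override
    public void handleStartTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
        if (tag == HTML.Tag.A) {
            // Attribute keys are HTML.Attribute constants, not strings.
            Object href = attrs.getAttribute(HTML.Attribute.HREF);
            if (href != null) {
                links.add(href.toString());
            }
        }
    }

    // Parses the given HTML text and returns the links in document order.
    static List<String> extractLinks(String html) throws IOException {
        LinkExtractor callback = new LinkExtractor();
        // Third argument true: ignore any charset directive in the page.
        new ParserDelegator().parse(new StringReader(html), callback, true);
        return callback.links;
    }

    public static void main(String[] args) throws IOException {
        String html = "<html><body><a href=\"a.html\">A</a> "
                + "<a href=\"b.html\">B</a></body></html>";
        for (String link : extractLinks(html)) {
            System.out.println(link);
        }
    }
}
```

Each returned string could then be used as the label or action of one of the big buttons. Relative links would still need to be resolved against the page URL.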

Similar Messages

How to parse a HTML file using HTML parser in J2SE?

I want to parse an HTML file using an HTML parser. Can anybody help me by providing sample code to parse the HTML file?
Thanks and cheers,
Amaresh

Which HTML parser, and what does "parsing" mean to you?

STYLE tag problem in HTML Parser.

Hi,
I am trying to parse an HTML file. I am able to extract the content of various tags like Tag.SPAN, Tag.DIV and so on.
I want to extract the text content of Tag.STYLE. How can I do that? The problem is that the HTML parser currently does not support this tag, along with five more tags such as Tag.META and Tag.PARAM.
Please help me out.

Before responding to this posting, you may want to check out the discussion in the OP's previous posting on this topic:
http://forum.java.sun.com/thread.jspa?threadID=634938

Don't understand error message from HTML parser?

I've written a simple test program to parse a simple html file.
Everything works fine except for the <img src="test.gif"> tag.
It understands the img tag and the handleSimpleTag gets called.
I can even pick out the src attribute. But I get a very strange error message.
When I run the test program below on the test.html file (also below) I get the following output:
handleError(134) = req.att srcimg?
What does "req.att srcimg?" mean?!?!?
/John
This is my test program:
import javax.swing.text.html.*;
import javax.swing.text.*;
import javax.swing.text.html.parser.*;
import java.io.*;

public class htmltest extends HTMLEditorKit.ParserCallback {
    public htmltest() {
        super();
    }

    // Called by the parser for every problem it reports in the markup.
    public void handleError(String errorMsg, int pos) {
        System.err.println("handleError("+pos+") = " + errorMsg);
    }

    static public void main (String[] argv) throws Exception {
        Reader reader = new FileReader("test.html");
        new ParserDelegator().parse(reader, new htmltest(), false);
    }
}
This is the "test.html" file
<html>
<head>
</head>
<body>
This is a plain text.<br>
This is <b>bold</b> and this is <i>itallic</i>!<br>
<img src="test.gif">
"This >is also a plain test text."<br>
</body>
</html>
----------------------------------------------------------------------

The handleError() method is no better documented than the rest of the javax.swing.text.html package and its design. You can ignore what the method reports as long as the other results of the parser are correct and your HTML file is well formed.

Attempting to use HTML parser - getAttribute() not performing as expected.

How am I mis-using getAttribute()?
I am expecting (String)a.getAttribute((String)"name") to give me a value other than null in the example below. What am I doing wrong?
The HTML test source (missing headers/body, so yes, it's not proper):
<input name="unit_1" size=5 maxsize=5 value="hr">
<input name="qty_1" size=5 value=4>
<input name="unit_1" size=5 maxsize=5 value="hr">
<input name="partnumber_1" size=10 value="Java Work">
<input name="description_1" size=50 value="Slip shod work at outragous prices">
<input name="sellprice_1" size=9 value=185.00>
<input name="discount_1" size=3 value=>
What I'd like to see is this:
About to parse test
Parsing error: invalid.tagattmaxsizeinput? at 39
Tag start(<html>, 1 attrs)
Tag start(<head>, 1 attrs)
Tag end(</head>)
Tag start(<body>, 1 attrs)
Tag(<input>, 4 attrs)
found input
unit_1
hr
Tag(<input>, 3 attrs)
found input
qty_1
4
Rather than this:
About to parse test
Parsing error: invalid.tagattmaxsizeinput? at 39
Tag start(<html>, 1 attrs)
Tag start(<head>, 1 attrs)
Tag end(</head>)
Tag start(<body>, 1 attrs)
Tag(<input>, 4 attrs)
found input
null
null
Tag(<input>, 3 attrs)
found input
null
null
The code that reads the HTML and gives the output looks like this:
import java.io.*;
import java.net.*;
import javax.swing.text.*;
import javax.swing.text.html.*;
import javax.swing.text.html.parser.*;

/**
 * This small demo program shows how to use the
 * HTMLEditorKit.Parser and its implementing class
 * ParserDelegator in the Swing system.
 */
class DataSaved {
    String InputName;
    String InputValue;
    boolean IsHidden;
}

public class HtmlParseDemo {
    public static void main(String [] args) {
        DataSaved DataSet[];
        Reader r;
        if (args.length == 0) {
            System.err.println("Usage: java HTMLParseDemo [url | file]");
            System.exit(0);
        }
        String spec = args[0];
        try {
            if (spec.indexOf("://") > 0) {
                URL u = new URL(spec);
                Object content = u.getContent();
                if (content instanceof InputStream) {
                    r = new InputStreamReader((InputStream)content);
                }
                else if (content instanceof Reader) {
                    r = (Reader)content;
                }
                else {
                    throw new Exception("Bad URL content type.");
                }
            }
            else {
                r = new FileReader(spec);
            }
            HTMLEditorKit.Parser parser;
            System.out.println("About to parse " + spec);
            parser = new ParserDelegator();
            parser.parse(r, new HTMLParseLister(), true);
            r.close();
        }
        catch (Exception e) {
            System.err.println("Error: " + e);
            e.printStackTrace(System.err);
        }
    }
}

/**
 * HTML parsing proceeds by calling a callback for
 * each and every piece of the HTML document. This
 * simple callback class simply prints an indented
 * structural listing of the HTML data.
 */
class HTMLParseLister extends HTMLEditorKit.ParserCallback
{
    int indentSize = 0;

    protected void indent() {
        indentSize += 3;
    }

    protected void unIndent() {
        indentSize -= 3; if (indentSize < 0) indentSize = 0;
    }

    protected void pIndent() {
        for(int i = 0; i < indentSize; i++) System.out.print(" ");
    }

    public void handleText(char[] data, int pos) {
        pIndent();
        System.out.println("Text(" + data.length + " chars)");
    }

    public void handleComment(char[] data, int pos) {
        pIndent();
        System.out.println("Comment(" + data.length + " chars)");
    }

    public void handleStartTag(HTML.Tag t, MutableAttributeSet a, int pos) {
        pIndent();
        System.out.println("Tag start(<" + t.toString() + ">, " +
            a.getAttributeCount() + " attrs)");
        indent();
    }

    public void handleEndTag(HTML.Tag t, int pos) {
        unIndent();
        pIndent();
        System.out.println("Tag end(</" + t.toString() + ">)");
    }

    public void handleSimpleTag(HTML.Tag t, MutableAttributeSet a, int pos) {
        String name;
        String value;
        boolean hidden;
        pIndent();
        System.out.println("Tag(<" + t.toString() + ">, " +
            a.getAttributeCount() + " attrs)");
        if( t==HTML.Tag.INPUT) {
            System.out.println("found input");
            name = (String)a.getAttribute((String)"name");
            value = (String)a.getAttribute((String)"value");
            System.out.println(name);
            System.out.println(value);
        }
    }

    public void handleError(String errorMsg, int pos){
        System.out.println("Parsing error: " + errorMsg + " at " + pos);
    }
}

Use the HTML.Attribute constant as the key rather than a String:
System.out.println( a.getAttribute(HTML.Attribute.NAME) );
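
To see the two lookups side by side, here is a minimal sketch. The class name InputAttributeDemo and its fields are made up for illustration; the point is only that the parser stores known attributes under HTML.Attribute keys, so a String key finds nothing.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical demo class: records both lookup styles for each <input>.
class InputAttributeDemo extends HTMLEditorKit.ParserCallback {
    final List<Object> byConstant = new ArrayList<>();
    final List<Object> byString = new ArrayList<>();

    @Override
    public void handleSimpleTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
        if (tag == HTML.Tag.INPUT) {
            // Key is the HTML.Attribute constant: finds the value.
            byConstant.add(attrs.getAttribute(HTML.Attribute.NAME));
            // Key is a plain String: no such key in the set, so null.
            byString.add(attrs.getAttribute("name"));
        }
    }

    static InputAttributeDemo parse(String html) throws IOException {
        InputAttributeDemo demo = new InputAttributeDemo();
        new ParserDelegator().parse(new StringReader(html), demo, true);
        return demo;
    }

    public static void main(String[] args) throws IOException {
        InputAttributeDemo demo =
                parse("<input name=\"unit_1\" size=5 value=\"hr\">");
        System.out.println("constant key: " + demo.byConstant);
        System.out.println("string key:   " + demo.byString);
    }
}
```

The same applies to the value lookup in the question: HTML.Attribute.VALUE is the key to use there.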

Exception in html parser under Linux

Hi all,
The following code is copied from the Tech Tip of 23 Sep 1999. I have compiled it and run it under Win98, where it works fine for any URI. However, when I try to run it under Linux, it throws exceptions. I noticed that some web sites can be parsed with the program on Linux but some can't. I wonder what the difference between the platforms is. Can anyone tell me how to make the program work under Linux?
Rgds,
unplug
configuration
RedHat 7.1
JDK1.3.1
Failed: java GetLinks http://java.sun.com
Worked: java GetLinks http://www.apache.org
--beginning of code
import java.io.*;
import java.net.*;
import javax.swing.text.*;
import javax.swing.text.html.*;

class GetLinks {
    public static void main(String[] args) {
        EditorKit kit = new HTMLEditorKit();
        Document doc = kit.createDefaultDocument();
        // The Document class does not yet
        // handle charset's properly.
        doc.putProperty("IgnoreCharsetDirective",
            Boolean.TRUE);
        try {
            // Create a reader on the HTML content.
            Reader rd = getReader(args[0]);
            // Parse the HTML.
            kit.read(rd, doc, 0);
            // Iterate through the elements
            // of the HTML document.
            ElementIterator it = new ElementIterator(doc);
            javax.swing.text.Element elem;
            while ((elem = it.next()) != null) {
                SimpleAttributeSet s = (SimpleAttributeSet)
                    elem.getAttributes().getAttribute(HTML.Tag.A);
                if (s != null) {
                    System.out.println(
                        s.getAttribute(HTML.Attribute.HREF));
                }
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
        System.exit(1);
    }

    // Returns a reader on the HTML data. If 'uri' begins
    // with "http:", it's treated as a URL; otherwise,
    // it's assumed to be a local filename.
    static Reader getReader(String uri)
        throws IOException {
        if (uri.startsWith("http:")) {
            // Retrieve from Internet.
            URLConnection conn=
                new URL(uri).openConnection();
            return new
                InputStreamReader(conn.getInputStream());
        } else {
            // Retrieve from file.
            return new FileReader(uri);
        }
    }
}
--End of code
--Exception in Linux
Exception in thread "main" java.lang.NoClassDefFoundError
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:120)
at java.awt.Toolkit$2.run(Toolkit.java:512)
at java.security.AccessController.doPrivileged(Native Method)
at java.awt.Toolkit.getDefaultToolkit(Toolkit.java:503)
at javax.swing.text.html.CSS.getValidFontNameMapping(CSS.java:932)
at javax.swing.text.html.CSS$FontFamily.parseCssValue(CSS.java:1789)
at javax.swing.text.html.CSS.getInternalCSSValue(CSS.java:531)
at javax.swing.text.html.CSS.addInternalCSSValue(CSS.java:516)
at javax.swing.text.html.StyleSheet.addCSSAttribute(StyleSheet.java:436)
at javax.swing.text.html.HTMLDocument$HTMLReader$ConvertAction.start(HTMLDocument.java:2536)
at javax.swing.text.html.HTMLDocument$HTMLReader.handleStartTag(HTMLDocument.java:1992)
at javax.swing.text.html.parser.DocumentParser.handleStartTag(DocumentParser.java:145)
at javax.swing.text.html.parser.Parser.startTag(Parser.java:333)
at javax.swing.text.html.parser.Parser.parseTag(Parser.java:1786)
at javax.swing.text.html.parser.Parser.parseContent(Parser.java:1821)
at javax.swing.text.html.parser.Parser.parse(Parser.java:1980)
at javax.swing.text.html.parser.DocumentParser.parse(DocumentParser.java:109)
at javax.swing.text.html.parser.ParserDelegator.parse(ParserDelegator.java:74)
at javax.swing.text.html.HTMLEditorKit.read(HTMLEditorKit.java:239)
at GetLinks.main(GetLinks.java:23)

Support for CSS is not clearly defined. Also, Dictionary getDocumentProperties() is not properly explained, meaning it doesn't give methods to get all the properties an HTML document can have.
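
One plausible reading of the stack trace, offered as an assumption because the post does not say whether an X display was available: building the HTMLDocument triggers a CSS font lookup, which calls Toolkit.getDefaultToolkit(), and that can fail on a Linux machine without a usable display. On Java 1.4 and later the java.awt.headless system property lets AWT start without a display; it is not available on the JDK 1.3.1 listed in the question. The class name HeadlessSetup is made up for illustration.

```java
import java.awt.GraphicsEnvironment;

// Hypothetical helper: switches AWT to headless mode before any
// Toolkit or Swing class gets a chance to look for a display.
class HeadlessSetup {
    static boolean enableHeadless() {
        // Must run before the first AWT class reads the property.
        System.setProperty("java.awt.headless", "true");
        return GraphicsEnvironment.isHeadless();
    }

    public static void main(String[] args) {
        System.out.println("headless: " + enableHeadless());
        // ... the GetLinks parsing code would follow here ...
    }
}
```

The same setting can be passed on the command line as -Djava.awt.headless=true. Pages that carry no font-related markup never reach the CSS font lookup, which would fit the observation that some sites parse and others do not.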

HTML parser in J2ME

Hi all,
I'm stuck with the same problem too. I'm developing a J2ME (MIDlet) application in which I have to open an HTTP connection. I also want to parse the HTML response and display the contents using J2ME elements on the mobile device. I'm not able to solve this problem. Please help me if anyone has come across a solution.
Below links are the related threads:
http://forums.sun.com/thread.jspa?forumID=76&threadID=250460
http://forums.sun.com/thread.jspa?forumID=76&threadID=5235530
Thanks in advance
Nandy

> Hi All,
> I like to ask if anyone knows if there is a HTML
> parser available in J2ME? I am building an application
> that needs to display HTML on the client.
Try Google; a few do exist, but I don't know about free ones.
> Alternatively I may consider using XML, however I
> learnt that parsing XML is expensive in terms of
> computing power - is it the same for HTML?
If you are controlling the content returned, the two would be about the same, as XML and HTML have the same roots. Some XML parsers do exist, and are free to use.
You might be best off returning a custom format, designed around the limitations of the device you are using.

Putting an HTML file into JEditorPane

I want to load an HTML file from my local drive into a JEditorPane.
How do I do that?

Here is an example; try it.
/*
 * (c)pesilEX - 2007
 * [email protected]
 * Let's make an open source software world
 */
import java.awt.*;
import java.awt.event.*;
import javax.swing.*;
import java.net.URL;

public class pesilEX extends JFrame implements ActionListener{
    JPanel cp;
    // URL of the file to show;
    // this can also point to a page online
    public URL helpURL;
    JScrollPane scrol;
    JEditorPane htmlPane;
    JButton btn1;
    // This string is used to store the file name
    String fileName;

    public pesilEX(){
        cp = (JPanel)getContentPane();
        cp.setLayout(null);
        htmlPane = new JEditorPane();
        htmlPane.setEditable(false);
        scrol = new JScrollPane();
        scrol.getViewport().add(htmlPane);
        scrol.setBounds(10,10,370,300);
        cp.add(scrol);
        btn1 = new JButton("View!");
        btn1.setBounds(260,320,120,20);
        cp.add(btn1);
        btn1.addActionListener(this);
    }

    public void actionPerformed(ActionEvent e){
        if(e.getSource()==btn1){
            // Replace the file name with your own
            fileName = "pesilEX/help.htm";
            // Locate the HTML file on the classpath
            helpURL = getClass().getClassLoader().getResource(fileName);
            try{
                // Load the URL into the JEditorPane
                htmlPane.setPage(helpURL);
            }
            catch(Exception er){
                System.out.println(er.toString());
            }
        }
    }

    public static void main(String[] args){
        pesilEX pesil = new pesilEX();
        pesil.setSize(400,400);
        pesil.setTitle("pesilEX JHtml viewer");
        pesil.setResizable(false);
        pesil.setVisible(true);
    }
}
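
The example above loads the page from the classpath. For a file that sits on the local drive, as the question asks, a minimal sketch could convert the path to a file: URL first. The class name LocalHtmlViewer, the method toFileUrl and the default path "help.htm" are placeholders, not part of the original post.

```java
import java.io.File;
import java.io.IOException;
import java.net.URL;
import javax.swing.JEditorPane;
import javax.swing.JFrame;
import javax.swing.JScrollPane;
import javax.swing.SwingUtilities;

// Hypothetical viewer for an HTML file on the local drive.
class LocalHtmlViewer {
    // Converts a local path into a file: URL that setPage accepts.
    static URL toFileUrl(String path) throws IOException {
        return new File(path).toURI().toURL();
    }

    public static void main(String[] args) {
        // "help.htm" is only a placeholder path.
        final String path = args.length > 0 ? args[0] : "help.htm";
        SwingUtilities.invokeLater(() -> {
            JEditorPane pane = new JEditorPane();
            pane.setEditable(false);
            try {
                pane.setPage(toFileUrl(path));
            } catch (IOException e) {
                pane.setText("Could not load " + path + ": " + e.getMessage());
            }
            JFrame frame = new JFrame("Local HTML viewer");
            frame.setDefaultCloseOperation(JFrame.EXIT_ON_CLOSE);
            frame.add(new JScrollPane(pane));
            frame.setSize(400, 400);
            frame.setVisible(true);
        });
    }
}
```

Going through File.toURI() rather than building the URL string by hand takes care of spaces and other characters that need escaping in the path.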

Error on HTML Parser

Hi,
I'm trying to parse a HTML page but I always get the same error, which is the following exception:
javax.swing.text.ChangedCharSetException
In the class ParserCallback I'm using the method handleError and it shows:
req.att contentmeta?
ioexception???
just before the exception occurs.
The only line where this error occurs in the html page is:
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
and I know that the exact point is the attribute 'content'. If it is removed or changed to 'contenttype' the error disappears.
The problem is that I can't change the attribute because the HTML page is not mine; it is fetched from the Web. And I don't want to remove it.
Does anybody know what is happening?
Thanks!!

I am also having a problem with HTML parsing in Java.
I have given a complete description of the problem at this link, along with the log and my sample code:
http://forum.java.sun.com/thread.jspa?threadID=643683&tstart=0
Please have a look if you can.
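
ChangedCharSetException is what the Swing parser throws when it meets a charset declaration in a meta tag while the ignoreCharSet argument of parse() is false. A minimal sketch, assuming it is acceptable to keep whatever encoding the Reader was opened with, is to pass true as the third argument. The class name CharsetTolerantParse and the method extractText are made up for illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical helper: parses a page that declares its charset in a
// <meta> tag without triggering ChangedCharSetException.
class CharsetTolerantParse {
    static String extractText(String html) throws IOException {
        final StringBuilder text = new StringBuilder();
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleText(char[] data, int pos) {
                text.append(data);
            }
        };
        // Third argument true = ignoreCharSet: the meta charset
        // directive is skipped instead of raising an exception.
        new ParserDelegator().parse(new StringReader(html), callback, true);
        return text.toString();
    }

    public static void main(String[] args) throws IOException {
        String html = "<html><head><meta http-equiv=\"Content-Type\" "
                + "content=\"text/html; charset=iso-8859-1\"></head>"
                + "<body>Hello</body></html>";
        System.out.println(extractText(html));
    }
}
```

If the declared encoding matters, the alternative is to catch the exception, read the charset from it, and reopen the stream with that encoding before parsing again.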

JRE1.5 swing.html parser fails to parse data between script tags

Hi all...
I've written a class that extends the java-provided default HTML parser to parse for text inside a table. In JRE1.4.* the parser works fine and extracts data between <script> tags as text. However now that I've upgraded to 1.5, the data between <script> tags are no longer parsed. Any suggestion anyone?
Steve

According to the API docs, the 1.5 parser supports HTML 3.2, for which the spec states that the content of SCRIPT and STYLE tags should not be rendered. I assume it doesn't have a scripting engine, so it won't get executed either.

Webpage (HTML) parsing...

Any ideas on how to parse an HTML page? I'm trying to do it with a StreamTokenizer but with little success. I don't think this class was made to do this sort of thing, ordinarily anyway. Is there a better choice? StringTokenizer? Here's what I have so far:
URLConnection uc = url.openConnection();
BufferedReader br = new BufferedReader(new InputStreamReader
                                        (uc.getInputStream()));
StreamTokenizer stok = new StreamTokenizer(br);
stok.eolIsSignificant(false);
String inputLine;
for (int i=0; (stok.nextToken() != StreamTokenizer.TT_EOF); i++)
{
    System.out.println("token #" + i + stok.toString());
}
It gives me a result like this:
token #0Token['<'], line 3
token #1Token[script], line 3
token #2Token[language], line 3
token #3Token['='], line 3
token #4Token[javascript], line 3
token #5Token['>'], line 3
token #6Token['<'], line 4
token #7Token['!'], line 4
token #8Token['-'], line 4
token #9Token['-'], line 4
token #10Token[function], line 5
token #11Token[dojump], line 5
token #12Token['('], line 5
token #13Token[')'], line 5
token #14Token['{'], line 6
token #15Token[document.location.href], line 7
token #16Token['='], line 7
token #17Token[play247.asp?page=promo&id=72&r=R2], line 7
What I want is all the links that have "promo" as a parameter e.g. . Any suggestions?

Java has a callback parser, which notifies you when start/end tags are found. Then you can query the attributes and search for the desired string. Here's a sample to get you started:
import java.io.FileReader;
import java.io.IOException;
import java.io.Reader;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

public class TestParser extends HTMLEditorKit.ParserCallback
{
     boolean ignoreText;

     public static void main(String[] args)
     throws IOException
     {
          TestParser parser = new TestParser();
          // args[0] is the file to parse
          Reader reader = new FileReader(args[0]);
          try
          {
               new ParserDelegator().parse(reader, parser, false);
          }
          catch (IOException e)
          {
               System.out.println(e);
          }
     }

     public void handleComment(char[] data, int pos)
     {
          System.out.println(data);
     }

     public void handleEndOfLineString(String eol)
     {
     }

     public void handleEndTag(HTML.Tag tag, int pos)
     {
          System.out.println("/" + tag);
     }

     public void handleError(String errorMsg, int pos)
     {
          System.out.println(pos + ":" + errorMsg);
     }

     public void handleMutableTag(HTML.Tag tag, MutableAttributeSet a, int pos)
     {
          System.out.println("mutable:" + tag + ": " + pos + ": " + a);
     }

     public void handleSimpleTag(HTML.Tag tag, MutableAttributeSet a, int pos)
     {
          System.out.println( tag + ":" + a );
     }

     public void handleStartTag(HTML.Tag tag, MutableAttributeSet a, int pos)
     {
          System.out.println( tag + ":" + a );
     }

     public void handleText(char[] data, int pos)
     {
          System.out.println( data );
     }
}

APPLESCRIPT AND HTML PARSING.

Hi,
I'm new to AppleScript, so I'm not quite sure if what I want to do is actually called HTML parsing. Basically, I want a variable in AppleScript that is linked to the actual HTML, but I don't know how to make AppleScript access data inside HTML code. To give you a better idea, inside the HTML is something like this:
100
That value "100" changes, but its maximum is 100. I want to create a script that responds when the value starts to drop by loading another link.
Am I making sense? Again, what I'd like to achieve is to make AppleScript use that value INSIDE the HTML as its own variable (and perform the right actions as that value changes).
Any help would be appreciated.

First, you could open the site you mentioned in Safari and run a little JavaScript via AppleScript to get that value.
JavaScript is the "best" way to get a specific value out of an HTML element, but it only works in browsers.
e.g.
tell application "Safari"
open location "http://apple.com"
delay 6
set mypromo to do JavaScript "document.getElementById('promos').getElementsByTagName('a')[0].title" in document 1
display dialog "Title of first Promo is:" & return & mypromo
end tell
Or you could just download the raw source, convert it to text, and search for the phrase you are looking for
e.g.
set mysource_html to do shell script "curl http://mysite.org/bla.html"
set mysource_txt to do shell script "curl http://mysite.org/bla.html | textutil -stdin -convert txt -format html -stdout"
if mysource_html contains "<a>100</a>" then
display dialog "Hey, value of 100 is reached"
end if
--or something like
if mysource_txt contains "100" then
display dialog "Hey, value of 100 is reached"
end if

Java HTML Parser

What seems to be the best tool for parsing HTML, without converting to XHTML unless it's very robust and can handle any HTML page?
I've been looking at JTidy - http://java-source.net/open-source/html-parsers/jtidy but it has some problems converting from HTML to XHTML.
Jericho seems to parse HTML without converting to XHTML and looks reasonable:
http://jerichohtml.sourceforge.net/doc/index.html
Has anyone tried the other HTML parsers listed at http://java-source.net/open-source/html-parsers ?
I would like more information on other HTML parsers people have tried, preferably ones that convert to XHTML without any problems, so we can use a SAX parser to interpret the XML. Looking forward to your input.
Kind Regards
Abs

It kind of depends on what you need to use it for.
Rent me and I'll tell you more

Googled it ;-)
http://sourceforge.net/projects/htmlparser/

Problem with HTML Parser and multiple instances

I have a parser program which queries an online shopping comparison web page and extracts the information needed. I am trying to run this program with different search terms, which are created by splitting an entered sentence, so each word is sent separately. However, the outputs (text files) are the same for every word, even though the correct term and output file appear to be passed. I suspect the connection is not being closed each time, but I am not sure why this is happening.
If I create an identical copy of the program and run that after the first one, it works, but this is not an appropriate solution.
Any help would be much appreciated. Here is some of my code; if more is required I will post it.
To run the program:
StringTokenizer t = new StringTokenizer("red green yellow", " ");
        int c = 0;
        Parser1 p = new Parser1();
        while (t.hasMoreTokens()) {
            c++;
            String tok = t.nextToken();
            File tem = new File("C:/"+c+".txt");
                p.mainprog(tok, tem);
}
The parser:
import javax.swing.text.html.parser.*;
import javax.swing.text.html.*;
import javax.swing.text.*;
import java.awt.*;
import java.util.*;
import javax.swing.*;
import java.io.*;
import java.net.*;
public class Parser1 extends HTMLEditorKit.ParserCallback {
    variable declarations
   public void handleStartTag(HTML.Tag t, MutableAttributeSet a, int pos){
...methods
public void handleText(char[] data, int pos){
       ...methods
public void handleTitleTag(HTML.Tag t, char[] data){
public void handleEmptyTag(HTML.Tag t, char[] data){
public void handleSimpleTag(HTML.Tag t, MutableAttributeSet a, int pos){
...methods
      static void mainprog(String term, File file) {
...proxy and authentication methods
                    Authenticator.setDefault(new MyAuthenticator() );
                    HTMLEditorKit editorKit = new HTMLEditorKit();
                    HTMLDocument HTMLDoc;
                    Reader HTMLReader;
                  try {
                        String temp = new String(term);
                        String fullurl = new String(MainUrl+temp);
                        url = new URL(fullurl);
                        InputStream myInStream;
                        myInStream = url.openConnection().getInputStream();
                        HTMLReader = (new InputStreamReader(myInStream));
                        HTMLDoc = (HTMLDocument) editorKit.createDefaultDocument();
                        HTMLDoc.putProperty("IgnoreCharsetDirective", new Boolean(true));
                        ParserDelegator parser = new ParserDelegator();
                        HTMLEditorKit.ParserCallback callback = new Parser1();
                        parser.parse(HTMLReader, callback, true);
                        callback.flush();
                        HTMLReader.close();
                        myInStream.close();
                 catch (IOException IOE) {
                    IOE.printStackTrace();
                catch (Exception e) {
                    e.printStackTrace();
      try {
            FileWriter writer = new FileWriter(file);
            BufferedWriter bw = new BufferedWriter(writer);
            for (int i = 0; i < vect.size(); i++){
                bw.write((String)vect.elementAt(i));
                if (vect.elementAt(i)!=vect.lastElement()){
                    bw.newLine();
            bw.flush();
            bw.close();
            writer.close();
        catch (IOException IOE) {
                    IOE.printStackTrace();
                catch (Exception e) {
                    e.printStackTrace();
          }   catch (IOException IOE) {
                 System.out.println("User options not found.");
}

How many Directory Servers are you using?
Are both serverconfig.xml files of PS instances the same?
Set debug level to message in the appropriate AMConfig.properties of your portal instances and look into AM debug files.
For some reason amSDK seems not to get the correct service values.
-Bernhard
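
For the parser question itself, one possible cause, assuming the elided vect field is static or otherwise shared between runs (the post does not show its declaration), is that the results of the first search stay in the collection and get written out again for every later term. A minimal sketch of keeping the state per run follows; the class name TextCollector and the method collect are made up for illustration.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical callback whose results live in an instance field,
// so every search term starts with an empty list.
class TextCollector extends HTMLEditorKit.ParserCallback {
    private final List<String> texts = new ArrayList<>();

    @Override
    public void handleText(char[] data, int pos) {
        texts.add(new String(data));
    }

    // Creates a fresh collector per call and closes the reader afterwards.
    static List<String> collect(Reader source) throws IOException {
        TextCollector collector = new TextCollector();
        try (Reader in = source) {
            new ParserDelegator().parse(in, collector, true);
        }
        return collector.texts;
    }

    public static void main(String[] args) throws IOException {
        // Stand-in pages; the real program would open one URL per term.
        for (String term : new String[] {"red", "green", "yellow"}) {
            String page = "<html><body><p>" + term + "</p></body></html>";
            System.out.println(term + " -> " + collect(new StringReader(page)));
        }
    }
}
```

If vect has to stay static in the original program, clearing it at the start of mainprog would have a similar effect.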

Html Parser

Hi all,
I want to extract all the start tags, their attributes and values, and the links from an HTML page. Right now I can extract only the start tags, using the code below:
public void parse(File HTMLFile) throws IOException {
    DTD dtd = DTD.getDTD("jsp.dtd");
    Parser parser = new Parser(dtd) {
        @Override
        protected void startTag(TagElement element) throws ChangedCharSetException {
            System.out.println("Start tag: " + element.getElement().getName());
        }
    };
    try {
        parser.parse(new FileReader(HTMLFile));
    } catch (Exception e) {
        // Catch exception if any
        System.err.println("Error: " + e.getMessage());
    }
}

I think there was an article on this site somewhere about writing an HTML parser.
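
To get the attributes and values along with the tags, a minimal sketch with the callback API (rather than subclassing Parser directly) could look like this. The class name TagAttributeLister and the method list are made up for illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical callback: one output line per tag, listing name=value pairs.
class TagAttributeLister extends HTMLEditorKit.ParserCallback {
    private final List<String> lines = new ArrayList<>();

    private void record(HTML.Tag tag, MutableAttributeSet attrs) {
        StringBuilder line = new StringBuilder(tag.toString());
        Enumeration<?> names = attrs.getAttributeNames();
        while (names.hasMoreElements()) {
            Object name = names.nextElement();
            line.append(' ').append(name).append('=').append(attrs.getAttribute(name));
        }
        lines.add(line.toString());
    }

    @Override
    public void handleStartTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
        record(tag, attrs);   // tags with content, such as <a> or <p>
    }

    @Override
    public void handleSimpleTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
        record(tag, attrs);   // empty tags, such as <img> or <input>
    }

    static List<String> list(String html) throws IOException {
        TagAttributeLister lister = new TagAttributeLister();
        new ParserDelegator().parse(new StringReader(html), lister, true);
        return lister.lines;
    }

    public static void main(String[] args) throws IOException {
        String html = "<html><body><a href=\"x.html\">link</a>"
                + "<img src=\"p.gif\"></body></html>";
        for (String line : list(html)) {
            System.out.println(line);
        }
    }
}
```

Links are then just the href values on a tags and the src values on img tags in that listing.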
