Parse html (a href) using regex
Hello,
I would like to extract all the URLs on a web page that appear inside <a href="..."> tags.
I already have a regex:
String regex = "< *a.*href *= *['|\"]";
Could you please advise which method of the Pattern or Matcher classes I should use to get
*only* the URL between the quotation marks as output?
I have already tried the start and end methods, which return indexes, but I don't get the desired result.
Thanks in advance.
P.S. I have also tried HtmlParser, but I found it difficult to use and would prefer regex.
Please continue in your original thread.
[http://forums.sun.com/thread.jspa?threadID=5363751]
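For the Pattern/Matcher question above, one common approach is to wrap the wanted part in a capturing group and read it back with Matcher.group(int) after each find(). The sketch below is only an illustration: the class name HrefExtractor and the sample input are made up, and the pattern is a tightened variant of the one in the question (inside a character class, `|` is a literal character, and a greedy `.*` can run past the first tag). It is not a full HTML parser and will miss unquoted attribute values.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of any library.
class HrefExtractor {
    // Group 1 is the opening quote, group 2 is the URL between the quotes.
    // \1 requires the closing quote to be the same character as the opening one.
    private static final Pattern HREF = Pattern.compile(
            "<\\s*a\\b[^>]*?\\bhref\\s*=\\s*([\"'])(.*?)\\1",
            Pattern.CASE_INSENSITIVE);

    static List<String> extract(String html) {
        List<String> urls = new ArrayList<>();
        Matcher m = HREF.matcher(html);
        // find() advances to the next match; group(2) returns only the URL text.
        while (m.find()) {
            urls.add(m.group(2));
        }
        return urls;
    }

    public static void main(String[] args) {
        String html = "<a href=\"http://example.com/a\">A</a> "
                + "< a class='x' href = 'b.html'>B</a>";
        for (String url : extract(html)) {
            System.out.println(url);
        }
    }
}
```

start() and end() only report where the whole match sits in the input; group(n) is what returns the captured text itself.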
Similar Messages
-
Parsing HTML into DOM using HTMLEditorKit
I am trying to parse an HTML file using javax.swing.text.html.HTMLEditorKit. My limitations are that I cannot install new libraries like JTidy and I must use a .jsp file, not a servlet. I'm able to get the URL and parse it using ParserCallback, but my new handleText method will not write to the page. Furthermore, I cannot pass anything out of this method to use later because it is void. I want to get some data back from this method, or at least do something useful within it. Is that possible?
java.net.URL url = new java.net.URL("http://" + request.getServerName() + "/" + urls.get(i));
java.io.InputStream is = url.openStream();
java.io.InputStreamReader isr = new java.io.InputStreamReader(is);
java.io.BufferedReader br = new java.io.BufferedReader(isr);
javax.swing.text.html.HTMLEditorKit.ParserCallback callback =
    new javax.swing.text.html.HTMLEditorKit.ParserCallback() {
        public void handleText(char[] data, int pos) {
            out.println(data);
        }
    };
new javax.swing.text.html.parser.ParserDelegator().parse(br, callback, false);
Attempting to print from within this method gives this error:
Attempt to use a non-final variable out from a different method. From enclosing blocks, only final local variables are available.
Maybe I need to try and write the output xml file all from inside the parserCallback?
Those are rather stupid requirements. Okay, I can see the one about not using external libraries because nobody knows how to deal with the licences. But making you use a JSP instead of a servlet just gets in the way of writing the Java code, which you could probably do perfectly well if you didn't have to cram it into a JSP scriptlet. Stupid.
But anyway: the error message says you need a final local variable. So don't just sit there, give it a final local variable. I forget just what type "out" is supposed to be, but something like "final JspWriter fakeOut = out", followed by using "fakeOut" rather than "out", should work.
-
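To illustrate the final-local-variable advice in the HTMLEditorKit thread above, here is a minimal standalone sketch rather than JSP code. It assumes the goal is to get the text out of the void handleText callback; the class name TextCollector and the sample HTML are invented for the example. The anonymous class captures a final StringBuilder, which also answers the "cannot pass anything out" part of the question.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical helper class for illustration only.
class TextCollector {
    static String extractText(Reader reader) {
        // Declared final so the anonymous class below is allowed to capture it.
        final StringBuilder text = new StringBuilder();
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleText(char[] data, int pos) {
                // The callback returns void, so results are handed out via the captured builder.
                text.append(data).append('\n');
            }
        };
        try {
            new ParserDelegator().parse(reader, callback, true);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return text.toString();
    }

    public static void main(String[] args) {
        String html = "<html><body><p>Hello</p><p>World</p></body></html>";
        System.out.print(extractText(new StringReader(html)));
    }
}
```

In a JSP scriptlet the same idea applies to the implicit out object: copy it into a final local first and use that copy inside the callback.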
Parsing HTML using Swing's HTMLEditorKit
Hi all,
I posted this question in the "Java Programming" forum, but I think that was the wrong forum. Please let me know if I have posted in the wrong forum again.
Anyway, I have read an article on parsing HTML using the Swing HTML parser (http://java.sun.com/products/jfc/tsc/articles/bookmarks/index.html). However, I find that the HTMLEditorKit seems unable to understand the <meta> tag under the <head> tag. Is this true? I am getting an error message:
javax.swing.text.ChangedCharSetException
at javax.swing.text.html.parser.DocumentParser.handleEmptyTag(DocumentParser.java:172)
at javax.swing.text.html.parser.Parser.startTag(Parser.java:327)
at javax.swing.text.html.parser.Parser.parseTag(Parser.java:1786)
at javax.swing.text.html.parser.Parser.parseContent(Parser.java:1821)
at javax.swing.text.html.parser.Parser.parse(Parser.java:1980)
at javax.swing.text.html.parser.DocumentParser.parse(DocumentParser.java:109)
at javax.swing.text.html.parser.ParserDelegator.parse(ParserDelegator.java:74)
at URLReader.main(URLReader.java:58)
Below is some simple code that writes out the HTML file it reads in:
public static void main(String[] args) throws Exception {
    HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
        public void handleText(char[] data, int pos) {
            try {
                System.out.println(data);
            } catch (Exception e) {
                System.out.println("IOE: " + e);
            }
        }
    };
    Reader reader = new FileReader("myFile.html");
    new ParserDelegator().parse(reader, callback, false);
}
The HTML file that fails to parse is:
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
<title>NWS WSR-88D Radar System Transmit/Receive Status</title>
</head>
<p>A <foo>xx</foo>link</html>
If I take away <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">, there is no problem.
Any suggestions? Thanks in advance.
Hi,
Setting the third argument really works!!! Yee..... haa....!!!
WORKING SOLUTION: new ParserDelegator().parse(reader, callback, true);
MANY... MANY THANKS for looking at the problem!!!
Pass true as the third argument (ignoreCharSet) of the parse method.
-
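The third-argument fix in the thread above can be tried without a file. The sketch below is an assumption-laden illustration: the class CharsetDirectiveDemo, its method name, and the sample markup are invented. It passes the ignoreCharSet flag through to ParserDelegator.parse and, in main, catches javax.swing.text.ChangedCharSetException for the false case, which is the exception shown in the stack trace above.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.swing.text.ChangedCharSetException;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical demo class for illustration only.
class CharsetDirectiveDemo {
    // Parses the markup and returns the text it contains.
    static String parseText(String html, boolean ignoreCharSet) throws IOException {
        final StringBuilder text = new StringBuilder();
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleText(char[] data, int pos) {
                text.append(data).append('\n');
            }
        };
        new ParserDelegator().parse(new StringReader(html), callback, ignoreCharSet);
        return text.toString();
    }

    // Convenience wrapper: always ignores the charset directive in <meta>.
    static String textIgnoringCharset(String html) {
        try {
            return parseText(html, true);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        String html = "<html><head>"
                + "<meta http-equiv=\"Content-Type\" content=\"text/html; charset=iso-8859-1\">"
                + "<title>Status</title></head><body><p>Hello</p></body></html>";
        try {
            parseText(html, false);
        } catch (ChangedCharSetException e) {
            // With ignoreCharSet == false the parser reports the directive instead of continuing.
            System.out.println("Charset directive reported: " + e.getCharSetSpec());
        }
        System.out.print(textIgnoringCharset(html));
    }
}
```

Ignoring the directive is reasonable when the Reader was already created with the right character encoding; otherwise the exception is the parser's way of asking the caller to reopen the stream with the declared charset.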
Hello,
I have a question about parsing HTML files. Usually, when I get an HTML file and need to find all the text in it, I use the code below. It collects all of the hyperlinks and ignores the HTML tags, keeping only the actual text. It's fine for smaller files, but occasionally I hit a large online text file; it still works, but it's way too slow. For plain text files I don't need any of this HTML tag stripping. Is there a way to grab all the text without doing any tag searching, to make it faster?
thanks,
private void find() throws IOException {
    // Really slow for large text files. Need a way to just use a regular scanner on an internet text file.
    new ParserDelegator().parse(new InputStreamReader(myBase.openStream()),
                                new ParserListener(),
                                true);
}

/**
 * Inner class for processing all "<a href.."> tags when reading a base URL.
 */
private class ParserListener extends HTMLEditorKit.ParserCallback {
    final String IGNORED_LINKS = "^(http|mailto|\\W).*";

    public void handleStartTag(HTML.Tag t, MutableAttributeSet a, int pos) {
        if (t == HTML.Tag.A) {
            String href = (String) a.getAttribute(HTML.Attribute.HREF);
            //System.out.println(href);
            //System.out.println(href.matches(IGNORED_LINKS) + "\t" + href);
            if (!(href == null || href.matches(IGNORED_LINKS)) && !myURLs.contains(href))
                myURLs.add(href);
        }
        //TODO fix
        if (t == HTML.Tag.TITLE) {
            String title = (String) a.getAttribute(HTML.Attribute.TITLE);
            if (title != null)
                myTitle = title;
            else
                myTitle = "No title was found";
        }
    }

    public void handleText(char[] data, int pos) {
        myText.append(" ");
        myText.append(data);
    }
}
JFactor2004 wrote:
My question is. If I know an html file is actually just a txt file
This isn't a question. HTML files are text by definition.
is it possible to look through it (maybe use something similar to a regular scanner) without doing anything with html.
That depends on what you mean by "doing something with HTML". You can certainly read it one line at a time.
-
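As a rough illustration of the "read it one line at a time" reply above, the sketch below reads a plain text source with java.util.Scanner and does no tag handling at all. The class name PlainTextLines is invented, and decoding the stream as UTF-8 in main is an assumption; the real encoding depends on the server.

```java
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

// Hypothetical helper class for illustration only.
class PlainTextLines {
    // Reads every line from the source without looking for HTML tags.
    static List<String> readLines(Readable source) {
        List<String> lines = new ArrayList<>();
        try (Scanner in = new Scanner(source)) {
            while (in.hasNextLine()) {
                lines.add(in.nextLine());
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        // args[0] is expected to be the URL of a plain text file.
        try (Reader reader = new InputStreamReader(
                URI.create(args[0]).toURL().openStream(), StandardCharsets.UTF_8)) {
            for (String line : readLines(reader)) {
                System.out.println(line);
            }
        }
    }
}
```

Whether to take this path or the HTML parser could be decided from the Content-Type header of the response, if the server sets it reliably.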
Please see my small code ( parsing html)
The following code extracts URLs from a webpage. It is working fine for most of the URLs. But not for some, like
http://www.sun.com/java
http://www.kraftfoods.com/
http://www.kitchen-bath.com/
I observed that these URLs are being redirected. How can I overcome this?
Thanks.
Note : there is some redundant code in the program.
import java.io.*;
import java.util.regex.*;
import java.net.*;
import java.util.*;
import java.lang.reflect.*;
import javax.swing.text.html.*;
import javax.swing.text.*;
class Out {
    public static String[] getLinks(String uriStr) {
        List<String> result = new ArrayList<String>();
        try {
            URL locator = new URL(uriStr);
            URLConnection connection = locator.openConnection();
            connection.setRequestProperty("User-Agent", "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0; Q312461)");
            connection.connect();
            // BufferedReader in = new BufferedReader(new InputStreamReader(connection.getInputStream()));
            Reader rd = new InputStreamReader(connection.getInputStream());
            // Parse the HTML
            EditorKit kit = new HTMLEditorKit();
            HTMLDocument doc = (HTMLDocument) kit.createDefaultDocument();
            doc.putProperty("IgnoreCharsetDirective", Boolean.TRUE);
            kit.read(rd, doc, 0);
            // Find all the A elements in the HTML document
            HTMLDocument.Iterator it = doc.getIterator(HTML.Tag.A);
            while (it.isValid()) {
                SimpleAttributeSet s = (SimpleAttributeSet) it.getAttributes();
                String link = (String) s.getAttribute(HTML.Attribute.HREF);
                if (link != null) {
                    // Add the link to the result list
                    result.add(link);
                }
                it.next();
            }
        } catch (MalformedURLException e) {
            System.out.println("In Out.java");
            System.out.println(e);
            e.printStackTrace();
        } catch (BadLocationException e) {
            System.out.println("In Out.java");
            System.out.println(e);
            e.printStackTrace();
        } catch (IOException e) {
            System.out.println("In Out.java");
            System.out.println(e);
            e.printStackTrace();
        }
        // Return all found links
        return result.toArray(new String[result.size()]);
    }

    public static void main(String[] args) {
        String links[] = getLinks(args[0]);
        System.out.println(links.length);
        // Print each link, not the array reference.
        for (int i = 0; i < links.length; i++)
            System.out.println(links[i]);
    }
}
I made the following changes. It is still not working.
URL locator = new URL(uriStr);
HttpURLConnection connection = (HttpURLConnection)locator.openConnection();
connection.setInstanceFollowRedirects(true);
Can anyone help me??
Thanks.
-
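On the redirect problem in the thread above: HttpURLConnection follows redirects by default, but as far as I know it does not follow a redirect that switches protocol (for example from http to https), which may be why setInstanceFollowRedirects(true) changed nothing. Whether that is what happens for the listed sites is an assumption. The sketch below follows the Location header by hand; the class RedirectFollower and the hop limit are invented for the example.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;

// Hypothetical helper class for illustration only.
class RedirectFollower {
    // Location headers may be relative, so resolve them against the current address.
    static URI resolve(URI base, String location) {
        return base.resolve(location);
    }

    // Opens the address, following up to maxHops redirects manually.
    // Assumes every hop is an http or https URL.
    static HttpURLConnection open(URI start, int maxHops) throws IOException {
        URI current = start;
        for (int hop = 0; hop <= maxHops; hop++) {
            HttpURLConnection conn = (HttpURLConnection) current.toURL().openConnection();
            // Disable the built-in handling so every hop goes through this loop.
            conn.setInstanceFollowRedirects(false);
            int code = conn.getResponseCode();
            String location = conn.getHeaderField("Location");
            boolean redirect = code >= 300 && code < 400 && location != null;
            if (!redirect) {
                return conn;
            }
            conn.disconnect();
            current = resolve(current, location);
        }
        throw new IOException("Too many redirects starting from " + start);
    }

    public static void main(String[] args) throws IOException {
        HttpURLConnection conn = open(URI.create(args[0]), 5);
        System.out.println(conn.getURL() + " -> " + conn.getResponseCode());
    }
}
```

The returned connection can then be passed to the link extraction code in place of the one created by locator.openConnection().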
DocumentParser parsing HTML ...
I am parsing the HTML of a website with this parser:
HTMLEditorKit.Parser parser = new javax.swing.text.html.parser.ParserDelegator();
I was able to parse www.yahoo.com.
Its HTML code (first few lines):
<html><head>
<script language=javascript>
var now=new Date,t1=0,t2=0,t3=0,t4=0,t5=0,t6=0,cc='',ylp='';t1=now.getTime();
</script>
<title>Yahoo!</title>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
<meta http-equiv="PICS-Label" content='(PICS-1.1 "http://www.icra.org/ratingsv02.html" l r (cz 1 lz 1 nz 1 oz 1 vz 1) gen true for "http://www.yahoo.com" r (cz 1 lz 1 nz 1 oz 1 vz 1) "http://www.rsac.org/ratingsv01.html" l r (n 0 s 0 v 0 l 0) gen true for "http://www.yahoo.com" r (n 0 s 0 v 0 l 0))'>
<base href="http://www.yahoo.com/_ylh=X3oDMTEwZGh2NmNjBF9TAzI3MTYxNDkEdGVzdAMwBHRtcGwDaW5kZXgtdGJs/" target=_top>
<script language=javascript>
------------
and my corresponding log goes like this:
0 DEBUG [main] - Start :html
15 DEBUG [main] - Start :head
15 DEBUG [main] - Start :script
15 DEBUG [main] - End :script
15 DEBUG [main] - Start :title
15 DEBUG [main] - End :title
15 DEBUG [main] - meta -- http-equiv=Content-Type content=text/html; charset=UTF-8
31 DEBUG [main] - meta -- http-equiv=PICS-Label content=(PICS-1.1 "http://www.icra.org/ratingsv02.html" l r (cz 1 lz 1 nz 1 oz 1 vz 1) gen true for "http://www.yahoo.com" r (cz 1 lz 1 nz 1 oz 1 vz 1) "http://www.rsac.org/ratingsv01.html" l r (n 0 s 0 v 0 l 0) gen true for "http://www.yahoo.com" r (n 0 s 0 v 0 l 0))
31 INFO [main] - base http://www.yahoo.com/_ylh=X3oDMTEwZGh2NmNjBF9TAzI3MTYxNDkEdGVzdAMwBHRtcGwDaW5kZXgtdGJs/
31 DEBUG [main] - Start :script
31 DEBUG [main] - End :script
31 DEBUG [main] - Start :script
62 DEBUG [main] - End :script
62 DEBUG [main] - Start :style
62 DEBUG [main] - End :style
62 DEBUG [main] - Start :script
Next I parsed www.java.sun.com/index.html.
Its HTML code (first few lines) goes like this:
<html>
<head>
<title>Java Technology</title>
<meta name="keywords" content="Java, platform" />
<meta name="description" content="Java technology is a portfolio of products that are based on the power of networks and the idea that the same software should run on many different kinds of systems and devices." />
<meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1"/>
<meta name="date" content="2003-11-23" />
<link rel="stylesheet" href="/css/default_developer.css" />
<script type="text/javascript" language="JavaScript" src="/js/popUp.js"></script>
<script type="text/javascript" language="JavaScript" src="/js/support_incident.js"></script>
<link href="http://developers.sun.com/rss/java.xml" rel="alternate" type="application/rss+xml" title="rss" />
</head>
<!--stopindex-->
<body leftmargin="0"....
-----
and my corresponding log goes like this:
0 DEBUG [main] - Start :html
16 DEBUG [main] - Start :head
16 DEBUG [main] - Start :title
16 DEBUG [main] - End :title
16 INFO [main] - meta --- name=keywords content=Java, platform
16 DEBUG [main] - End :head
16 DEBUG [main] - Start :body
16 DEBUG [main] - Simple Tag :link
Now, as you can see from the logs, the parser read in both META tags of the Yahoo page, while only the first META tag of java.sun.com/index.html was read.
One visible difference between the HTML of these two pages is that the META tags of the Yahoo page are not closed (not well formed), whereas the META tags of java.sun.com are well formed.
Why are the other meta tags (of java.sun.com) being ignored by the parser?
Is it because of this:
javax.swing.text.html.parser.Parser.java, method boolean ignoreElement(Element elem), line 429
returns true for ignoring the meta tag in the HTML file?
Is my problem due to this?
How can I possibly overcome this? :-(
The code for my callback class looks like this:
HTMLEditorKit.ParserCallback parserCallback = new HTMLEditorKit.ParserCallback() {
    public void handleStartTag(HTML.Tag t, MutableAttributeSet a, int pos) {
        try {
            if (t == HTML.Tag.A) {
                String hrefValue = (String) a.getAttribute(HTML.Attribute.HREF);
                logger.log(Level.INFO, t + " " + hrefValue);
            } else {
                logger.log(Level.DEBUG, "Start :" + t);
            }
        } catch (Exception e) { e.printStackTrace(); }
    }

    public void handleEndTag(HTML.Tag t, int pos) {
        try {
            logger.log(Level.DEBUG, "End :" + t);
        } catch (Exception e) { e.printStackTrace(); }
    }

    public void handleSimpleTag(HTML.Tag t, MutableAttributeSet a, int pos) {
        try {
            if (t == HTML.Tag.BASE) {
                String hrefValue = (String) a.getAttribute(HTML.Attribute.HREF);
                logger.log(Level.INFO, t + " " + hrefValue);
            } else if (t == HTML.Tag.FRAME) {
                String srcValue = (String) a.getAttribute(HTML.Attribute.SRC);
                logger.log(Level.INFO, t + " " + srcValue);
            } else if (t == HTML.Tag.META) {
                String nm = (String) a.getAttribute(HTML.Attribute.NAME);
                String content = (String) a.getAttribute(HTML.Attribute.CONTENT);
                if ("keywords".equalsIgnoreCase(nm) || "description".equalsIgnoreCase(nm)) {
                    // i found it
                    logger.log(Level.INFO, t + " --- " + a);
                } else {
                    logger.log(Level.DEBUG, t + " -- " + a);
                }
            } else {
                logger.log(Level.DEBUG, "Simple Tag :" + t);
            }
        } catch (Exception e) { e.printStackTrace(); }
    }
};
I want to read the values of the "name" and "content" attributes of meta tags such as <meta name="keywords" content="asdfasdfasdf"> or <meta name="description" content="asdfasdfasdf">.
OK...
If there is some other way to read HTML tags such as meta, a (anchor), base and frame (only these tags matter to me) without being concerned about how the HTML has been coded, then please tell me.
Searching the internet showed that there are HTML parsers that use StringTokenizer-style approaches to read HTML.
Has anyone here ever used anything like this?
-
Hi all,
I am using JEditorPane and its ability to parse HTML, which, although relatively old and crusty, is certainly all I need for the job.
Now, I understand there is a chain of classes involved in taking my .html file and turning it into something we can see in a JEditorPane. For example, an img tag is picked up by HTMLEditorKit and turned into an ImageView for display purposes.
I want to do the following: I have subclassed HTMLEditorKit and have overridden the HTMLFactory (although at the moment it just defers everything to super). I want to be able to pick out all of the HTML comment tags as they go through the HTMLEditorKit:
<!-- hey hey this is a comment -->
... and get to the comment text, "hey hey this is a comment", as a Java string. However, I've been digging around with Element for hours now, and although my HTMLFactory correctly picks out the comments from the rest of the elements:
else if (kind == HTML.Tag.COMMENT)
{
    System.out.println("I found a comment but don't know what it said!!");
... as you can see, I don't know how to get to the comment text itself.
The reason I want access to the comment text is that I want to supplement the HTML code a little: I would add something in the comment that affects how the page is rendered when I read it. So there's the reason, if you're curious.
Any help, and I do mean anything at all, would be much appreciated, as this is the last obstacle in my path to getting this thing working :)
Thanks for your time!
- Peter
Here is some old code I have lying around that attempts to iterate through all the elements. If I remember correctly, the comment text is found in the AttributeSet of the element:
import java.io.*;
import java.net.*;
import java.util.*;
import javax.swing.*;
import javax.swing.text.*;
import javax.swing.text.html.*;
class GetHTML {
    public static void main(String[] args) {
        EditorKit kit = new HTMLEditorKit();
        Document doc = kit.createDefaultDocument();
        // The Document class does not yet handle charsets properly.
        doc.putProperty("IgnoreCharsetDirective", Boolean.TRUE);
        try {
            // Create a reader on the HTML content.
            Reader rd = getReader(args[0]);
            // Parse the HTML.
            kit.read(rd, doc, 0);
            System.out.println(doc.getText(0, doc.getLength()));
            System.out.println("----");
            // Iterate through the elements of the HTML document.
            ElementIterator it = new ElementIterator(doc);
            Element elem = null;
            while ((elem = it.next()) != null) {
                AttributeSet as = elem.getAttributes();
                System.out.println("\n" + elem.getName() + " : " + as.getAttributeCount());
                if (elem.getName().equals(HTML.Tag.IMG.toString())) {
                    Object o = elem.getAttributes().getAttribute(HTML.Attribute.SRC);
                    System.out.println(o);
                }
                // "enum" is a reserved word since Java 5, so the variable is called "names".
                Enumeration<?> names = as.getAttributeNames();
                while (names.hasMoreElements()) {
                    Object name = names.nextElement();
                    Object value = as.getAttribute(name);
                    System.out.println("\t" + name + " : " + value);
                    if (value instanceof DefaultComboBoxModel) {
                        DefaultComboBoxModel model = (DefaultComboBoxModel) value;
                        for (int j = 0; j < model.getSize(); j++) {
                            Object o = model.getElementAt(j);
                            Object selected = model.getSelectedItem();
                            if (o.equals(selected))
                                System.out.println(o + " : selected");
                            else
                                System.out.println(o);
                        }
                    }
                }
                if (elem.getName().equals(HTML.Tag.SELECT.toString())) {
                    Object o = as.getAttribute(HTML.Attribute.ID);
                    System.out.println(o);
                }
                // Weird, the text for each tag is stored in a 'content' element
                if (elem.getElementCount() == 0) {
                    int start = elem.getStartOffset();
                    int end = elem.getEndOffset();
                    System.out.println("\t" + doc.getText(start, end - start));
                }
            }
        } catch (Exception e) {
            e.printStackTrace();
            System.exit(1);
        }
    }

    // Returns a reader on the HTML data. If 'uri' begins
    // with "http:", it's treated as a URL; otherwise,
    // it's assumed to be a local filename.
    static Reader getReader(String uri) throws IOException {
        // Retrieve from Internet.
        if (uri.startsWith("http:")) {
            URLConnection conn = new URL(uri).openConnection();
            return new InputStreamReader(conn.getInputStream());
        }
        // Retrieve from file.
        else {
            return new FileReader(uri);
        }
    }
}
To test it just use:
java GetHTML somefile.html
-
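For the comment-text question in the thread above, a different route from the Element/AttributeSet approach in the reply is the parser callback: HTMLEditorKit.ParserCallback has a handleComment(char[] data, int pos) method that receives the comment text while the document is parsed. The sketch below swaps in that technique; it is not what the original poster's HTMLFactory does, and the class name CommentCollector and the sample markup are invented.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical helper class for illustration only.
class CommentCollector {
    // Returns the text of every HTML comment in the markup, trimmed of surrounding whitespace.
    static List<String> comments(String html) {
        final List<String> found = new ArrayList<>();
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleComment(char[] data, int pos) {
                found.add(new String(data).trim());
            }
        };
        try {
            new ParserDelegator().parse(new StringReader(html), callback, true);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return found;
    }

    public static void main(String[] args) {
        String html = "<html><body><!-- hey hey this is a comment --><p>text</p></body></html>";
        for (String comment : comments(html)) {
            System.out.println(comment);
        }
    }
}
```

This runs as a separate pass over the markup, so it suits cases where the comments can be read before or alongside loading the page into the JEditorPane.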
I am trying to write an application that uses a parsed HTML document to perform some data retrieval. The problem I am having is that the parser in JDK 1.4.1 is unable to parse the document completely and correctly. Some fields are skipped, among other problems. I believe it has to do with html32.bdtd. Is there a later version?
Parsing an HTML document is a huge task. You shouldn't do it yourself; javax.swing.text.html and javax.swing.text.html.parser already provide almost everything you need.
-
Html a href functionality in JLabel?
Hi,
I read that JLabels are able to parse HTML.
What about an <a href> link?
As far as I know, only JEditorPane has functionality for displaying hyperlinks, let alone clicking them. See the methods setPage(URL), setContentType(String), and addHyperlinkListener(HyperlinkListener) in class javax.swing.JEditorPane.
If I'm wrong, I'd love to know, because this would be quite handy!
-
HTML a href Attribute in Safari
I have set up my personal homepage using a number of HTML a href attributes inside a table, so that when I click on one of these, it will take me down to the appropriate anchor defined by HTML a name. This works fine in Firefox, but does not seem to work in Safari 3.2.1. In Safari, it simply stays put and does not take me down to the appropriate anchor.
Example of my HTML:
<tr>
<td>Favourites</td>
<td>Airlines</td>
<td>Architecture</td>
<td>Art, Images, Pictures, Photography & Posters</td>
<td>Books, Book Stores, Literature, Literary Societies</td>
<td>Cars</td>
<td>CCCV</td>
</tr>
Airlines
Cathay Pacific
Qantas
Frequent Flyer
Singapore Airlines
Virgin Blue Australia
<hr>
Architecture
Monument Environments
<hr>
Hi Alex,
Have you tried dropping the <base> tag in the header? That should also make the named anchors load quicker, as the browser won't attempt to re-download the page all over again.
Also, your DOCTYPE declaration seems a bit out of date. I don't think it's affecting anything as such, but you might want to update it. I only mention it as the W3C validator check [marks it as being problematic|http://validator.w3.org/check?uri=http%3A%2F%2Fmembers.optusnet.com.au%2Falexcywong%2F&charset=(detect+automatically)&doctype=Inline&group=0].
I should note that I can't actually test the recommended change in Safari as I'm not on a Mac at the moment, but it does help with a browser that uses WebKit on my current machine.
Hope that helps.
-
What API or package can I use to parse an HTML page and obtain
HTML DOM interfaces?
Use JTidy to make the HTML well-formed, then use the DOM parser in the Xerces API:
JTidy (recommended by W3C, so it's probably pretty good):
http://www.w3.org/People/Raggett/tidy/
http://sourceforge.net/projects/jtidy
-
Hi,
I already have a Java program that is able to read in HTML files that are stored on my computer's hard drive. Now I would like to expand its functionality by being able to parse HTML files straight from the web.
For example, when the program is run, I would like to be able to give it a URL for a given website. Then, I would like to be able to parse the HTML file that the link goes to.
I've searched the forum, but have not been able to find anything of any real use. If you could offer an overview or point me towards a resource, I would be very grateful.
If you've done things right, you have an HTML reader/parser that takes an InputStream. For files, this would be a FileInputStream.
For URLs, this would be the InputStream you get from URLConnection.getInputStream(). You can get a URLConnection by calling openConnection() on a URL instance (created from your input URL, of course).
-
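The InputStream advice in the reply above can be sketched as a single parsing method with two callers. Everything named here is hypothetical: the class LinkLister, the choice to list href values as the parsing task, and the UTF-8 decoding are assumptions made for the example. The point is only that the parser does not care whether the stream comes from a file or from a URLConnection.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical helper class for illustration only.
class LinkLister {
    // Works on any InputStream; the caller decides where the bytes come from.
    static List<String> links(InputStream in) {
        final List<String> hrefs = new ArrayList<>();
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleStartTag(HTML.Tag t, MutableAttributeSet a, int pos) {
                if (t == HTML.Tag.A) {
                    Object href = a.getAttribute(HTML.Attribute.HREF);
                    if (href != null) {
                        hrefs.add(href.toString());
                    }
                }
            }
        };
        try {
            // Assumes UTF-8 content; adjust if the source uses another encoding.
            new ParserDelegator().parse(
                    new InputStreamReader(in, StandardCharsets.UTF_8), callback, true);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return hrefs;
    }

    public static void main(String[] args) throws IOException {
        // A URL gets a network stream, anything else is treated as a local file name.
        try (InputStream in = args[0].startsWith("http")
                ? URI.create(args[0]).toURL().openConnection().getInputStream()
                : new FileInputStream(args[0])) {
            for (String href : links(in)) {
                System.out.println(href);
            }
        }
    }
}
```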
Hi guys, I'd like to know the best open source API to parse HTML and get the required data from it. Ideally one that uses a SAX parser but copes with HTML that is not fully XML compliant, i.e. not XHTML.
Thanks
Abe
Thanks, I found my answer: use Jericho HTML Parser. Do any of you know of a better one?
Thanks
Abe
-
Hi,
Can anybody explain the behavior of the SunOne Web Server when parse HTML is enabled for all HTML files?
If we have a valid HTML file on the web server, for instance http://servername/myhtml.html, then the page is loaded by the web server. The same page is also served if we append anything to the URL. For example, for the URL http://servername/myhtml.html/ahdjksad/asdhjsad/sdhjklsad/asjdksald (anything after the HTML file name), the web server still loads the myhtml.html page. If we disable parse HTML, this no longer works. I want server-side includes to work without the server serving such wrong URLs.
Why does this happen? That path does not exist on the web server, so it should give a 404 error. Is this the way parse HTML works? Is there any way to restrict it, other than a custom NSAPI, while keeping the parse HTML functionality enabled for server-side includes?
If anyone has noticed this behavior, please explain.
Thanks,
Rijesh.
Hi,
Actually, I load an external HTML file using the LoadVars class methods.
var oLoad:LoadVars = new LoadVars();
oLoad.load("external.html");
I want to parse that HTML in Flash.
I need some text from the HTML page (the HTML has 200 lines of code).
How can I parse that HTML and trace that particular text?
-
Counting lines of parsed HTML documents
Hello,
I am using an HTMLEditorKit.ParserCallback to handle data generated by a ParserDelegator.
Everything is OK, but I cannot find out how to catch ends of lines (I need to know on what line a tag or an attribute is found).
Thanks in advance for any hints.
I noticed that the parse() method of ParserDelegator creates a DocumentParser object to do the actual parsing of the HTML document. DocumentParser contains a method getCurrentLine(). So I tried extending ParserDelegator so I could access the DocumentParser. However, the getCurrentLine method is protected, so I ended up also extending DocumentParser.
You probably have code something like:
new ParserDelegator().parse(reader, this, false);
This should be replaced with:
parser = new MyParserDelegator();
parser.parse(reader, this, false);
where you have defined an instance variable: MyParserDelegator parser;
You can now use parser.getCurrentLine() in any of your parser callback methods.
Note that you may not always get the results you expect for the current line; many times I found the line to be 1 greater than I thought it should be. Anyway, you can decide if the code is of any value.
Following is the code for MyParserDelegator and its MyDocumentParser inner class. Good luck.
import java.io.IOException;
import java.io.Reader;
import java.io.Serializable;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.DTD;
import javax.swing.text.html.parser.DocumentParser;
import javax.swing.text.html.parser.ParserDelegator;
public class MyParserDelegator extends ParserDelegator implements Serializable {
    MyDocumentParser parser;

    public void parse(Reader r, HTMLEditorKit.ParserCallback cb, boolean ignoreCharSet) throws IOException {
        String name = "html32";
        DTD dtd = createDTD(DTD.getDTD(name), name);
        parser = new MyDocumentParser(dtd);
        parser.parse(r, cb, ignoreCharSet);
    }

    public int getCurrentLine() {
        return parser.getCurrentLine();
    }

    public class MyDocumentParser extends DocumentParser {
        public MyDocumentParser(DTD dtd) {
            super(dtd);
        }

        // Widens the protected method to public so the delegator can call it.
        public int getCurrentLine() {
            return super.getCurrentLine();
        }
    }
}