Parse HTML document embedded in IFRAME

Dear fellows:
How can I access the contents of an HTML document embedded in an IFRAME tag by using the Java class HTMLEditorKit.Parser?
It is well known that the contents of such an embedded HTML document can be accessed with JavaScript on the front end. However, I am more interested in processing it on the back end, using HTMLEditorKit.Parser or any other Java Swing API.
Thanks for help.

The javax.swing.text.html framework barely supports HTML 3.2.
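To make the limitation concrete, here is a minimal sketch (not a definitive implementation) of what the Swing parser can offer for this question. It only reports the IFRAME tag and its src attribute; the embedded document is a separate resource that would still have to be fetched and parsed on its own. The class name IframeSrcLister and the file name inner.html are made up for illustration, and the sketch assumes the parser reports IFRAME either as a start tag or as an unknown simple tag, so it handles both.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical helper: lists the src attribute of every IFRAME tag in a page.
class IframeSrcLister {

    static List<String> listIframeSources(Reader html) throws IOException {
        final List<String> sources = new ArrayList<>();
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleStartTag(HTML.Tag t, MutableAttributeSet a, int pos) {
                collect(t, a);
            }

            @Override
            public void handleSimpleTag(HTML.Tag t, MutableAttributeSet a, int pos) {
                // Assumption: tags unknown to the bundled DTD arrive here.
                collect(t, a);
            }

            private void collect(HTML.Tag t, MutableAttributeSet a) {
                if (!"iframe".equalsIgnoreCase(t.toString())) {
                    return;
                }
                // Compare attribute names as strings so the lookup works whether
                // the key is an HTML.Attribute constant or a plain String.
                for (Enumeration<?> names = a.getAttributeNames(); names.hasMoreElements();) {
                    Object name = names.nextElement();
                    if ("src".equalsIgnoreCase(name.toString())) {
                        sources.add(String.valueOf(a.getAttribute(name)));
                    }
                }
            }
        };
        new ParserDelegator().parse(html, callback, true);
        return sources;
    }

    public static void main(String[] args) throws IOException {
        String page = "<html><body><iframe src=\"inner.html\"></iframe></body></html>";
        for (String src : listIframeSources(new StringReader(page))) {
            System.out.println(src);
        }
    }
}
```

Each returned value is only a reference. To process the embedded document, resolve it against the page URL, open it, and run the parser again on that stream.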

Similar Messages

Parsing HTML documents

I am trying to write an application that uses a parsed html document to perform some data retrieval. The problem that I am having is that the parser in JDK1.4.1 is unable to completely parse the document correctly. Some fields are skipped as well as other problems. I believe it has to do with the html32.bdtd. Is there a later version?

Parsing an HTML document is a huge task. You shouldn't do it yourself; javax.swing.text.html and javax.swing.text.html.parser already provide almost everything you will need.
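As a rough illustration of data retrieval with those packages, the sketch below collects the href of every anchor tag through a ParserCallback. The class name AnchorLister and the sample markup are assumptions made for the example, and the parser's HTML 3.2 limits still apply.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical helper: returns the href value of each <a> tag in document order.
class AnchorLister {

    static List<String> listLinks(Reader html) throws IOException {
        final List<String> links = new ArrayList<>();
        new ParserDelegator().parse(html, new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleStartTag(HTML.Tag t, MutableAttributeSet a, int pos) {
                if (t == HTML.Tag.A) {
                    // Named anchors (<a name="...">) have no href and are skipped.
                    Object href = a.getAttribute(HTML.Attribute.HREF);
                    if (href != null) {
                        links.add(href.toString());
                    }
                }
            }
        }, true);
        return links;
    }

    public static void main(String[] args) throws IOException {
        String page = "<html><body><a href=\"http://example.com/\">x</a></body></html>";
        System.out.println(listLinks(new StringReader(page)));
    }
}
```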

Counting lines of parsed HTML documents

Hello,
I am using a HTMLEditorKit.ParserCallback to handle data generated by a ParserDelegator.
Everything is ok but I can not find how to catch end of lines (I need to know at what line a tag or an attribute is found).
Thanks in advance for any hints.

I noticed that the parse() method of ParserDelegator creates a DocumentParser object to do the actual parsing of the HTML document. DocumentParser contains a method getCurrentLine(). So I tried extending ParserDelegator so that I could access the DocumentParser. However, the getCurrentLine() method is protected, so I ended up also extending DocumentParser.
You probably have code something like:
new MyParserDelegator().parse(reader, this, false);
This should be replaced with:
parser = new MyParserDelegator();
parser.parse(reader, this, false);
where you have defined an instance variable: MyParserDelegator parser;
You can now use parser.getCurrentLine() in any of your parser callback methods.
Note that you may not always get the results you expect for the current line; many times I found the line to be 1 greater than I thought it should be. Anyway, you can decide if the code is of any value.
Following is the code for MyParserDelegator and its MyDocumentParser inner class. Good luck.
import java.io.IOException;
import java.io.Reader;
import java.io.Serializable;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.DTD;
import javax.swing.text.html.parser.DocumentParser;
import javax.swing.text.html.parser.ParserDelegator;

public class MyParserDelegator extends ParserDelegator implements Serializable {
     // Kept as a field so callbacks can ask for the current line while parsing.
     MyDocumentParser parser;

     public void parse(Reader r, HTMLEditorKit.ParserCallback cb, boolean ignoreCharSet) throws IOException {
          String name = "html32";
          DTD dtd = createDTD(DTD.getDTD(name), name);
          parser = new MyDocumentParser(dtd);
          parser.parse(r, cb, ignoreCharSet);
     }

     public int getCurrentLine() {
          return parser.getCurrentLine();
     }

     // Subclass whose only job is to widen the protected getCurrentLine() to public.
     public class MyDocumentParser extends DocumentParser {
          public MyDocumentParser(DTD dtd) {
               super(dtd);
          }

          public int getCurrentLine() {
               return super.getCurrentLine();
          }
     }
}
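A different technique that avoids subclassing, offered only as a sketch: every ParserCallback method receives a pos argument, and that offset can be mapped to a line number if the whole source is available as a String. The class name TagLineFinder is hypothetical. The sketch assumes "\n" line endings and that pos is a character offset into the same string; as with getCurrentLine(), the reported positions may not always be exactly where you expect.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical helper: derives line numbers from the pos argument of the callbacks.
class TagLineFinder {

    /** Offsets at which each line begins; element 0 belongs to line 1. */
    static int[] lineStarts(String text) {
        List<Integer> starts = new ArrayList<>();
        starts.add(0);
        for (int i = 0; i < text.length(); i++) {
            if (text.charAt(i) == '\n') {
                starts.add(i + 1);
            }
        }
        int[] result = new int[starts.size()];
        for (int i = 0; i < result.length; i++) {
            result[i] = starts.get(i);
        }
        return result;
    }

    /** Converts a character offset into a 1-based line number. */
    static int lineOf(int[] starts, int pos) {
        int idx = Arrays.binarySearch(starts, pos);
        // A miss yields -(insertionPoint) - 1; the insertion point equals the
        // number of line starts at or before pos, which is the line number.
        return idx >= 0 ? idx + 1 : -idx - 1;
    }

    /** Maps each start tag name to the line of its first occurrence. */
    static Map<String, Integer> firstLineOfEachTag(String html) throws IOException {
        final int[] starts = lineStarts(html);
        final Map<String, Integer> lines = new LinkedHashMap<>();
        new ParserDelegator().parse(new StringReader(html), new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleStartTag(HTML.Tag t, MutableAttributeSet a, int pos) {
                lines.putIfAbsent(t.toString(), lineOf(starts, pos));
            }
        }, true);
        return lines;
    }

    public static void main(String[] args) throws IOException {
        String html = "<html>\n<body>\n  text <b>hi</b> more\n</body></html>";
        System.out.println(firstLineOfEachTag(html));
    }
}
```

The trade-off is that the document must be held in memory as a String, whereas the subclassing approach works on any Reader.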

Fast Response Appreciated - Calling Edge Function from an HTML page embedded in iFrame within Comp

This is a tricky one, but it may just be that I don't understand the syntax,
Situation:
I have an Edge Animate composition that is acting as an interface and container into which other content is embedded using an iFrame. I have several functions in Stage > creationComplete, and for one of the embedded content pieces, I want to include a button that calls one of the Edge functions.
Challenge:
I have read about referencing elements within Edge when it is the Edge file that is embedded on an HTML page, but I cannot figure out how to reference Edge in the reverse.
I have tried these options, where headerSelect(page) is my function:
AdobeEdge.getComposition("EDGE-531849691").getStage().headerSelect("community");
window.top.Edge.getComposition("EDGE-531849691").getStage().headerSelect("community");
This is for a project that is time sensitive. Your immediate help is greatly appreciated!
Thanks!
Fred

Justin,
Looks like someone has already responded to your post. Did that answer your
question? In their suggestion, they indicated that you could actually
create the jplayer instance within the Edge composition, and therefore
have more direct access to its events. If that will not work for you, it
may help to know how your page is laid out. Where is the Edge file in
relation to the jplayer and how do you intend for them to interact? Does
the player need to be outside of Edge for some reason?
Let me know if you still need help. Thanks!
Fred

Parsing HTML characters (e.g. &nbsp)

Hi
Apologies if I'm missing something obvious, I haven't been able to find an answer searching the API or Forums...
I'm parsing HTML documents (currently as Strings) to extract certain information. Is there an easy way to replace all special HTML characters such as &nbsp;, &lt; etc. with a space or < respectively, without having to do a string replace on every possible HTML character?
I know there's an HTML parser in swing but that seems to be geared towards creating an HTML editor.
Any help would be appreciated!

There are also a number of open source or shareware programs, such as TidyHTML, that clean-up and parse existing HTML. Check out Sourceforge or www.downloads.com.
- Saish
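For the original question, one option that needs no extra library is to let the Swing parser decode the character entities while it extracts the text. The following is a sketch under that assumption; the class name EntityTextExtractor is made up, and the result is plain text with all tags dropped, which may or may not be what you need.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical helper: returns the text content of an HTML string with
// character entities such as &lt; and &amp; already decoded by the parser.
class EntityTextExtractor {

    static String extractText(String html) throws IOException {
        final StringBuilder text = new StringBuilder();
        new ParserDelegator().parse(new StringReader(html), new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleText(char[] data, int pos) {
                // The parser hands over text with entities already resolved.
                text.append(data);
            }
        }, true);
        // A decoded &nbsp; is the non-breaking space U+00A0; map it to a plain space.
        return text.toString().replace('\u00a0', ' ');
    }

    public static void main(String[] args) throws IOException {
        System.out.println(extractText("<html><body><p>a&nbsp;&lt;&nbsp;b &amp; c</p></body></html>"));
    }
}
```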

Problem parsing a html document

Hi all,
I need to parse a html document.
InputStream is = new java.io.FileInputStream(new File("c:/temp/htmldoc.html"));
DOMFragmentParser DOMparser = new DOMFragmentParser();
DocumentFragment doc = new HTMLDocumentImpl().createDocumentFragment();
DOMparser.parse(new InputSource(is), doc);
NodeList nl = doc.getChildNodes();
I get just 3 of the following nodes...... though the document htmldoc.html is a proper html doc..
#document-fragment
HTML
#text
Any suggestions/help are most welcome. Thanks

Here's an example showing how to do this via javax.xml:
import java.io.*;
import java.net.*;
import javax.xml.parsers.*;
import org.w3c.dom.*;

public class HTMLElementLister {
     public static void main(String[] args) throws Exception {
          URLConnection con = new URL("http://www.mywebsite.com/index.html").openConnection();
          con.connect();
          InputStream in = (InputStream)con.getContent();
          Document doc = null;
          try {
               DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
               DocumentBuilder db = dbf.newDocumentBuilder();
               doc = db.parse(in);
          } finally {
               in.close();
          }
          // Walk three levels: the document's children, the children of <html>, the children of <body>.
          NodeList nodes = doc.getChildNodes();
          for (int i=0; i<nodes.getLength(); i++) {
               Node node = nodes.item(i);
               String nodeName = node.getNodeName();
               System.out.println(nodeName);
               if ("html".equalsIgnoreCase(nodeName)) {
                    System.out.println("|");
                    NodeList grandkids = node.getChildNodes();
                    for (int j=0; j<grandkids.getLength(); j++) {
                         Node contentNode = grandkids.item(j);
                         nodeName = contentNode.getNodeName();
                         System.out.println("|- " + nodeName);
                         if ("body".equalsIgnoreCase(nodeName)) {
                              System.out.println("   |");
                              NodeList bodyNodes = contentNode.getChildNodes();
                              for (int k=0; k<bodyNodes.getLength(); k++) {
                                   node = bodyNodes.item(k);
                                   System.out.println("   |- " + node.getNodeName());
                              }
                         }
                    }
               }
          }
     }
}

Parsing an HTML document

I want to parse an HTML document and replace its anchor tags with my own on the fly. Can anybody suggest how to do it, please?
Ajay

If your HTML files are not well-formed (which is likely for most HTML files), for example because attribute values are not enclosed in quotation marks, most XML parsers will fail.
Anand from this forum introduced JTidy to me and it worked very well. It is an HTML parser that is able to tidy up your HTML code.

How to parse a html document?

I am trying to parse an HTML document that I load from a URL over the internet. The HTML is not well formed, but that's ok. The problem is that the document builder throws an exception because the document is not well formed.
Can I parse a html document using the document builder?
Please note that I set validating to false and the parse still has a fatal error saying the <meta> tag must have a corresponding </meta> tag.
I am using code like the following.....
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
factory.setValidating(false);
DocumentBuilder db = factory.newDocumentBuilder();
doc = db.parse(urlString);

> The html is not well formed but thats ok.
No, it isn't.
"Validation" means checking that the XML conforms to a schema or a DTD. Don't confuse that with checking whether the XML is well-formed, which means whether it follows the basic rules of XML like opening tags have to have matching closing tags. Which is what your message is telling you -- your file isn't well-formed XML.
So sure, you can parse HTML or anything else with an XML parser, just be prepared to be told it isn't well-formed XML.
If you want to clean up HTML so that it's well-formed XML, there are products like HTMLTidy and JTidy that will do that for you.
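The difference can be checked with a small sketch. The class name WellFormedCheck and the sample markup are assumptions made for illustration. With validation switched off, the parser still rejects an unclosed <meta> tag, while the same markup with the tag closed parses fine.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: reports whether a string is well-formed XML.
class WellFormedCheck {

    static boolean isWellFormed(String markup) throws ParserConfigurationException, IOException {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        // Only DTD validation is switched off; well-formedness is always enforced.
        factory.setValidating(false);
        DocumentBuilder builder = factory.newDocumentBuilder();
        // DefaultHandler rethrows fatal errors and ignores warnings and
        // recoverable errors, which keeps the default stderr message away.
        builder.setErrorHandler(new DefaultHandler());
        try {
            builder.parse(new ByteArrayInputStream(markup.getBytes(StandardCharsets.UTF_8)));
            return true;
        } catch (SAXException e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(isWellFormed("<html><head><meta name=\"a\"></head></html>"));
        System.out.println(isWellFormed("<html><head><meta name=\"a\"/></head></html>"));
    }
}
```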

Parsing an HTML document

Does anyone know how to parse an HTML document without having JEditorPane (in-the-neck) do it for you?

If you want to do a small amount of parsing, then it would make sense to write a custom program as described in the previous response. If, however, you want to do a lot of parsing, then it might make more sense to try to make use of an XML parser. If you are trying to parse html pages that are your own, then you might want to think about transforming them into xhtml so that an XML parser will be able to process them.
XML parsing is easy. I recently developed a web site for a set of mock exams for the Java Programmer Certification. Originally, I started developing the pages in HTML, but I quickly realized that I would have a hard time managing the exams in that format. I then organized the exam into a set of XML documents, one document for each topic. To publish a set of cross-topic exams, I use JDOM (with the help of SAX) to load all of the questions into the Java Collections Framework, where I can easily organize a set of four cross-topic exams. Also, I use JDOM to number the questions and answers before writing the new exams out to a new set of four XML files. Then I use XSLT to transform the four exam.xml documents into eight HTML files: four for the questions and four for the answers.
If you would like to take a look at the result, then please use the following link.
http://www.geocities.com/danchisholm2000/
If you own the html files that you want to parse, then I would try to find a way to transform them into valid xml. XHTML might be a good choice.
Dan Chisholm

MSWord embedded in iFrame?

I hope that this question isn't a repeat; I haven't been able to find an answer to my problem so far.
The idea is that I need to open a new Word file in MS Word embedded in an iFrame in an HTML page and then let the user edit the file. Afterwards, I need to get a reference to that Word file and upload it to the server. Any help would be appreciated :)

Hi,
Could you please tell me how you accomplished this? I am trying to load an MS Word document from the server and embed it in the browser using an iFrame. This works fine. Now the user can make changes to this document, and once he's done, at the click of a button we need to upload the file back to the server.
Your help is really appreciated.
Thanks

Indexing - one document embedded within another

Greetings,
I am testing KM indexing in terms of examining how Trex handles the indexing of a scenario where you have one document embedded within another document.
In this case I have chosen to embed a PowerPoint (.ppt document) within a Word document (.doc document).
I index the Word document, expecting to therefore also have the PowerPoint document within indexed. When I search for the text within the embedded .ppt document I do not get results and therefore assume that Trex cannot index documents which are embedded in other documents. Please advise if I am correct.
Regards,
Keith

Hi Keith,
yes, as far as I know that's exactly correct.
I had a similar issue with an HTML document that contained another HTML document as an iframe, and this is also not possible (I asked SAP about this, so I am sure of it).
In any case, TREX transforms all sorts of documents to HTML (so DOCs and PDFs are also first converted to HTML before they are indexed), and in doing so TREX never follows any embedded content.

Print html document

Hi, I'm trying to print an HTML document with the following Java code, but it doesn't work:
public void onActionPrintEncuestaPDF(com.sap.tc.webdynpro.progmodel.api.IWDCustomEvent wdEvent )
{
    //@@begin onActionPrintEncuestaPDF(ServerEvent)
     try{
          Robot robot = new Robot();
          // Ctrl+P
          robot.keyPress(KeyEvent.VK_CONTROL );
          robot.keyPress(KeyEvent.VK_P );
          robot.keyRelease(KeyEvent.VK_CONTROL);
          robot.keyRelease(KeyEvent.VK_P );
          Thread.sleep(500);
          // Alt+U
          robot.keyPress(KeyEvent.VK_ALT );
          robot.keyPress(KeyEvent.VK_U );
          robot.keyRelease(KeyEvent.VK_ALT );
          robot.keyRelease(KeyEvent.VK_U );
          Thread.sleep(500);
          // Enter
          robot.keyPress(KeyEvent.VK_ENTER);
          robot.keyRelease(KeyEvent.VK_ENTER );
          Thread.sleep(500);
     }
     catch(Exception e){ }
    //@@end
}
This does not work because java.awt.Robot is part of the AWT package and is intended for the client side. How can I create an action so that, when the client pushes the print button, the HTML document is printed automatically?
Regards,
Gabriel

I create into KM Content on portal a file called print.html and I put the following code:
<html>
<head />
<body onLoad="window.parent.focus(); window.parent.print();">
</body>
</html>
This works because window.parent.print() prints the content of the actual iFrame. In Web Dynpro I created a button that calls the following function:
public void onActionPrintEncuestaPDF(com.sap.tc.webdynpro.progmodel.api.IWDCustomEvent wdEvent )
{
    //@@begin onActionPrintEncuestaPDF(ServerEvent)
    wdContext.currentPrintElement().setAtrURL("/irj/go/km/docs/documents/print/print.html");
    //@@end
}

Why can't I make call to parse HTML from inside Thread?

This is driving me crazy. With a defined HTMLEditorKit.ParserCallback object "callback", I am attempting to parse an HTML document retrieved from a URL by using:
new ParserDelegator().parse(new InputStreamReader(url.openStream( )), callback, true);
It doesn't work if I initiate the call in any way from within the run method of a Thread subclass (the way I'd like to do it). If I make the call in the constructor of the Thread subclass, however, it runs fine. I know it must have something to do with the fact that parse runs in a Thread of its own, but the way to fix it isn't apparent to me.
I would appreciate some words from people who might know what's happening here... THANKS in advance.

Don't bother - figured it out - thanks.

A year later...same question: How can I get a Pages document embedded ???

How can I get a Pages document embedded into an email? (so that the recipient sees it immediately and does NOT have to open a PDF). Although I know a PDF is universal and would allow anyone (mac or not) to open, read and see my business newsletter, I want it to be instantaneous for viewing. So, which leads me to a question I've kicked around for a year now on this discussion board. How do I turn a Pages doc into html for email purposes? If there isn't a way with Pages, How about with Word (the newest version) for Mac? Someone has to know this art, I receive emails like this all the time, probably from Apple! ha ha .
Please help!
-dnorheim

dnorheim wrote:
> How can I get a Pages document embedded into an email? (so that the recipient sees it immediately and does NOT have to open a PDF).
We can't!
> Although I know a PDF is universal and would allow anyone (mac or not) to open, read and see my business newsletter, I want it to be instantaneous for viewing.
In Mail.app, single-page PDFs are directly readable, so you may try to
print your page #1 to a PDF
then
print your page #2 to a PDF
and so on.
> So, which leads me to a question I've kicked around for a year now on this discussion board. How do I turn a Pages doc into html for email purposes?
We can't!
As Apple removed the export-to-HTML feature, I assume it does not intend to reintroduce it one day.
Yvan KOENIG (VALLAURIS, France) Monday, 7 September 2009, 17:50:32
