Parsing the FRAME tag from HTML pages

Hello everybody,
I am trying to parse the A tags and the FRAME tags from HTML pages. I have developed the code below, which works for the A tags but does not work for the FRAME tags. Does anyone have an idea why?
private void getLinks() throws Exception {
    System.out.println(diskName);
    links = new ArrayList();
    frames = new ArrayList();
    BufferedReader rd = new BufferedReader(new FileReader(diskName));
    // Parse the HTML
    EditorKit kit = new HTMLEditorKit();
    HTMLDocument doc = (HTMLDocument) kit.createDefaultDocument();
    doc.putProperty("IgnoreCharsetDirective", Boolean.TRUE);
    try {
        kit.read(rd, doc, 0);
    } catch (RuntimeException e) {
        return;
    }
    // Find all the FRAME elements in the HTML document; it finds nothing
    HTMLDocument.Iterator it = doc.getIterator(HTML.Tag.FRAME);
    while (it.isValid()) {
        SimpleAttributeSet s = (SimpleAttributeSet) it.getAttributes();
        String frameSrc = (String) s.getAttribute(HTML.Attribute.SRC);
        frames.add(frameSrc);
        it.next(); // advance, otherwise the loop never terminates
    }
    // Find all the A elements in the HTML document; this works OK
    it = doc.getIterator(HTML.Tag.A);
    while (it.isValid()) {
        SimpleAttributeSet s = (SimpleAttributeSet) it.getAttributes();
        String link = (String) s.getAttribute(HTML.Attribute.HREF);
        int endOfSet = it.getEndOffset(),
            startOfSet = it.getStartOffset();
        String text = doc.getText(startOfSet, endOfSet - startOfSet);
        if (link != null)
            links.add(new Link(link, text));
        it.next();
    }
}
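
One possible alternative, offered as a sketch rather than a confirmed fix: instead of building an HTMLDocument and iterating over it, the FRAME tags can be collected while the parser runs, using an HTMLEditorKit.ParserCallback with ParserDelegator. The class name FrameSrcCollector and the sample markup are hypothetical and not part of the original post; both callbacks are overridden because I am not assuming which one the parser uses for FRAME.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical helper (not from the original post): records the SRC
// attribute of every FRAME tag reported by the Swing HTML parser.
public class FrameSrcCollector extends HTMLEditorKit.ParserCallback {
    private final List<String> frames = new ArrayList<>();

    @Override
    public void handleSimpleTag(HTML.Tag t, MutableAttributeSet a, int pos) {
        record(t, a);
    }

    @Override
    public void handleStartTag(HTML.Tag t, MutableAttributeSet a, int pos) {
        record(t, a);
    }

    // Keep the SRC value if the tag is a FRAME and the attribute is present.
    private void record(HTML.Tag t, MutableAttributeSet a) {
        if (t == HTML.Tag.FRAME) {
            Object src = a.getAttribute(HTML.Attribute.SRC);
            if (src != null) {
                frames.add(src.toString());
            }
        }
    }

    // Parses the input and returns the collected frame sources in document order.
    public static List<String> collect(Reader in) throws IOException {
        FrameSrcCollector callback = new FrameSrcCollector();
        // Third argument true = ignore charset directives in META tags.
        new ParserDelegator().parse(in, callback, true);
        return callback.frames;
    }

    public static void main(String[] args) throws IOException {
        // Sample markup is an assumption made for illustration only.
        String html = "<html><frameset cols=\"50%,50%\">"
                + "<frame src=\"a.html\"><frame src=\"b.html\">"
                + "</frameset></html>";
        System.out.println(collect(new StringReader(html)));
    }
}
```

In getLinks() this could be called with the same reader, e.g. frames.addAll(FrameSrcCollector.collect(new FileReader(diskName))), while the A tags keep using the existing iterator.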

Similar Messages

  • Let applet out without the frame of a html page?

    Hi,
    I want to run a Java applet like this:
    when someone logs into a web page, the applet contained in that page
    should open like a standalone application rather than run inside the frame
    of the web page. How can I do that?
    I appreciate any suggestions!
    Thanks a lot
    michael

    public void init() {
        YourClass application = new YourClass();
    }
    where YourClass looks like this:
    import java.awt.*;
    public class YourClass extends Frame {
        public YourClass() {
            setVisible(true); // java.awt.Frame has no getContentPane(); show the frame itself
        }
    ...

  • How to retrieve the font color, style and size of the copied text from html

    I have a requirement where I need to retrieve the font size and style of text copied from an HTML page. By copied text I mean text that we select and copy using either the Windows copy command or Ctrl+C.
    Please help me find a solution for this requirement.
    Thanks in advance,
    Amodnk.

    You can also try this, especially if you've got the Text Inspector and the Color Picker open already.
    Select the text to be colored (note that if the text is already multiple different colors the swatch under Color & Alignment section of the Text Inspector still only shows one out of the several)
    Find the color you want in the Color Picker
    Click and drag from the Color Picker into the swatch under Color & Alignment in the Text Inspector
    That will also change all the selected text to the chosen color.
    Also, regarding web safe colors, that SHOULD come as a part of the Color Picker. With the Color Picker open, select the third icon at the top (if you mouse over it, it should indicate Color Palettes). Click the popup menu button and you should see Web Safe Colors as one of the choices. With this and the Text Inspector open, you can drag and drop your way to identical colors in no time!
    That same drag and drop trick works for text on the slide as well. If you just created a bit of text and you want to apply a color, scroll until you find the color you want, then drag and drop over to the text (it will highlight in blue showing you what you're about to color).

  • Way to remove HTML tags from a page-scoped attribute using JSTL?

    Hi,
    I'm using JSTL 1.2 with Tomcat 6.0.26. Does anyone know of a way to remove HTML tags from a page attribute, "${myExpr}". I would prefer a solution that uses JSTL only, but ultimately whatever gets the job done is fine with me.
    Thanks, - Dave

    I'm sorry, I don't understand your requirement. What do you mean by "remove HTML tags from a page attribute"?
    If you are dealing with a value of an attribute, it is most likely a String, and should be treated as such. The best approach would probably be java coding.
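
    Following the "Java coding" suggestion above, here is a minimal sketch of one way to strip tags from a String on the Java side, assuming the Swing HTML parser that ships with the JDK is acceptable for this. The class name TagStripper is hypothetical; it keeps only the text the parser reports and drops the markup. How it gets wired into the JSP (scriptlet, custom EL function, servlet) is left open.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical helper: returns only the text content of an HTML fragment.
public class TagStripper {
    public static String stripTags(String html) throws IOException {
        final StringBuilder text = new StringBuilder();
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleText(char[] data, int pos) {
                // Called once per run of text between tags.
                text.append(data);
            }
        };
        // Third argument true = ignore charset directives in META tags.
        new ParserDelegator().parse(new StringReader(html), callback, true);
        return text.toString();
    }

    public static void main(String[] args) throws IOException {
        System.out.println(stripTags("<p>Hello <b>world</b></p>"));
    }
}
```

    Note that the parser also decodes character entities, so the result is plain text and should be escaped again before it is written back into a page.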

  • Error in parsing the taglib tag in the JSP page

    Hi
    We are trying to deploy and run a Web Application in CE 7.1 SP01. We are successful in deploying and running servlet based web pages, but when it comes to JSP's the taglibs are not parsed and we get the following error message
    Runtime error in processing of the JSP file E:\usr\sap\CE1\J01\j2ee\cluster\apps\sap.com\TestNWEAR\servlet_jsp\TestNW\root\admin\main.jsp.
    The error is: com.sap.engine.services.servlets_jsp.jspparser_api.exception.JspParseException: Error in parsing the taglib tag in the JSP page. Cannot resolve URI: [webwork]. Possible reason - validation failed. Check if your TLD is valid against its scheme.02004C4F4F5000190000004E000013400191D308B45
    Processing HTTP request to servlet [jsp] finished with error.
    The error is: java.io.FileNotFoundException: E:\usr\sap\CE1\J01\j2ee\cluster\apps\sap.com\TestNWEAR\servlet_jsp\TestNW\root\admin\webwork (The system cannot find the file specified)02004C4F4F50001900000051000013400191D308B45AF1AB
    We followed the below weblog to correct the TLD's in JAVA EE 5 @ SAP but it did not work for us.
    /people/community.user/blog/2006/10/13/porting-the-java-blueprint-solutions-catalogue-applications-to-sap-netweaver-application-server-java-ee-5-edition
    Any immediate help will be rewarded with full points
    Thanks in advance
    Lakshmi
    Edited by: lakshmi N Munnungi on May 5, 2008 11:36 PM
    Edited by: lakshmi N Munnungi on May 5, 2008 11:39 PM

    Hi Lakshmi,
    I also have the same problem. If you have found the solution, please post it.
    Thanks,
    Tariq

  • Read Text from HTML-Pages and want to solve "ChangedCharSetException"

    Hello,
    I have an app that connects to pages via threads, parses them, and gives me only the text version of each HTML page. It works fine, but when it finds a page where the text is within images, the whole app stops and gives me this message:
    javax.swing.text.ChangedCharSetException
            at javax.swing.text.html.parser.DocumentParser.handleEmptyTag(DocumentParser.java:169)
            at javax.swing.text.html.parser.Parser.startTag(Parser.java:372)
            at javax.swing.text.html.parser.Parser.parseTag(Parser.java:1846)
            at javax.swing.text.html.parser.Parser.parseContent(Parser.java:1881)
            at javax.swing.text.html.parser.Parser.parse(Parser.java:2047)
            at javax.swing.text.html.parser.DocumentParser.parse(DocumentParser.java:106)
            at javax.swing.text.html.parser.ParserDelegator.parse(ParserDelegator.java:78)
            at aufruf.main(aufruf.java:33)
    So I tried to catch it using "getCharSetSpec()" and "keyEqualsCharSet()" from the class "javax.swing.text.ChangedCharSetException", hoping that this would solve the problem. But it still doesn't work...
    Then I searched the web and found that I have to add the line:
    doc.putProperty("IgnoreCharsetDirective", new Boolean(true));
    "doc" is a new HTML document, created with the HTMLEditorKit. I do not know much about this, so I hope that someone can explain how I can solve the problem within my code.
    Here we go:
    import javax.swing.text.*;
    import java.lang.*;
    import java.util.*;
    import java.net.*;
    import java.io.*;
    import javax.swing.text.html.*;
    import javax.swing.text.html.parser.*;
    public class myParser extends Thread {
            private String name;
            public void run() {
                    try {
                            URL viele = new URL(name);   // "name" is a variable with a lot of links
                            URLConnection hs = viele.openConnection();
                            hs.connect();
                            if (hs.getContentType().startsWith("text/html")) {
                                    InputStream is = hs.getInputStream();
                                    InputStreamReader isr = new InputStreamReader(is);
                                    BufferedReader br = new BufferedReader(isr);
                                    Lesen los = new Lesen();
                                    ParserDelegator parser = new ParserDelegator();
                                    parser.parse(br,los, false);
                            }
                    } catch (MalformedURLException e) {
                            System.err.print("Doesn't work");
                    } catch (ChangedCharSetException e) {
                            e.getCharSetSpec();
                            e.keyEqualsCharSet();
                            e.printStackTrace();
                    } catch (Exception o) {
                    }
            }
            public void vowi(String n) {
                    name = n;
            }
    }
    And in case it is important, here is the class "Lesen":
    import java.net.*;
    import java.io.*;
    import javax.swing.text.*;
    import javax.swing.text.html.*;
    import javax.swing.text.html.parser.*;
    class Lesen extends HTMLEditorKit.ParserCallback {
            public void handleStartTag(HTML.Tag t, MutableAttributeSet a, int pos) {
                    try {
                            if ((t==HTML.Tag.P) || (t==HTML.Tag.H1) || (t==HTML.Tag.H2) || (t==HTML.Tag.H3) || (t==HTML.Tag.H4) || (t==HTML.Tag.H5) || (t==HTML.Tag.H6)) {
                                    System.out.println();
                            }
                    } catch (Exception q) {
                            System.out.println(q.getMessage());
                    }
            }
            public void handleSimpleTag(HTML.Tag t,MutableAttributeSet a, int pos) {
                    try {
                            if (t==HTML.Tag.BR) {
                                    System.out.println(); // new line
                                    System.out.println();
                            }
                    } catch (Exception qw) {
                            System.out.println(qw.getMessage());
                    }
            }
            public void handleText(char[] data, int pos) {
                    try {
                            System.out.print(data);   // prints the text from HTML pages
                    } catch (Exception ab) {
                            System.out.println(ab.getMessage());
                    }
            }
    }
    Thanks a lot for helping...
    Stephan

    Replace
    parser.parse(br,los, false);
    with
    parser.parse(br,los, true);
    The third argument of parse() is ignoreCharSet.
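
    To illustrate the ignoreCharSet argument of ParserDelegator.parse(), here is a small self-contained sketch. The class name IgnoreCharsetDemo and the sample page are hypothetical; the page contains a META charset directive, which is the kind of tag that triggers ChangedCharSetException when the third argument is false.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical demo class: extracts the text of a page that declares a charset.
public class IgnoreCharsetDemo {
    public static String extractText(Reader in) throws IOException {
        final StringBuilder text = new StringBuilder();
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleText(char[] data, int pos) {
                text.append(data);
            }
        };
        // ignoreCharSet = true: the parser skips the charset directive
        // instead of throwing ChangedCharSetException.
        new ParserDelegator().parse(in, callback, true);
        return text.toString();
    }

    public static void main(String[] args) throws IOException {
        // Sample markup is an assumption made for illustration only.
        String html = "<html><head>"
                + "<meta http-equiv=\"Content-Type\" content=\"text/html; charset=utf-8\">"
                + "</head><body><p>Hello</p></body></html>";
        System.out.println(extractText(new StringReader(html)));
    }
}
```

    With true, the reader's existing encoding is used as-is, so if the page's real charset matters, the InputStreamReader should be created with that charset explicitly.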

  • Incorrect functionality of Embedded tag in html page, displaying object on top of all layers of Adobe air application native window

    Title
    Incorrect functionality of embed tag in html loader for adobe air development
    Description
    Problem Description: If a youtube.com video URL is loaded in a view stack and we navigate to another index of the stack, or away from the UI screen within the same native window, the video (the embedded Flash Player or any other embedded object) is displayed on top of all screen layers, at the position where the object should appear inside the HTML loader only.
    Steps to Reproduce:
    1) Go to http://get.straweb.com/StraWebBrowser/StraWebBrowser.air, then download and install.
    2) Open two tabs, and in a third tab load the youtube.com video player with a sample video.
    Before the third tab finishes loading the video from youtube.com, navigate to tab 1 or 2. Within a few seconds, once the youtube.com Flash Player has loaded, it is displayed in the current tab or UI screen.
    3) Navigate to tabs other than the one with youtube.com loaded; you will see that the Flash Player stays on top.
    4) Navigate to tab 3 and then to another tab; you can observe that the Flash Player is now no longer shown there and is visible only in tab 3, which is correct.
    Actual Result: The embedded tag of the HTML page is displayed on top of all layers of the Adobe AIR application's native window.
    Expected Result: The embedded tag of the HTML page should be displayed only inside the HTML loader.
    This can be reproduced with any Adobe AIR plugin update and on any hardware and environment.
    Applicable to all SDK versions of Adobe AIR.

    Adobe Bugbase: Bug 3823839 Incorrect functionality of embed tag in html loader for adobe air development

  • How to send information from HTML page to JSP without reloading HTML page?

    Hello,
    Is it possible to send information(row number selected by user) from HTML page to JSP without reloading HTML page?
    Thanks.
    Oleg.

    Yes, you can do this with framesets and a hidden frame.
    You need a bit of JavaScript in the "visible" frame that
    sets the location of the hidden frame to the JSP.
    Add the user's choice as a parameter to the JSP URL.

  • Create accessible pdf from html page

    Hello,
    I am trying to create a 508 compliant Pdf from a simple HTML page using the HTML to Pdf feature in Livecycle ES4 server. I was able to configure the service to generate tabbed Pdf, but the created Pdf has multiple accessibility issues.
    Some of the issues encountered are:  incorrect tab order,  some links are not tabbable while others are,  link text is being read “Blank”,  some text is skipped while tabbing,  missing alt text for images,  page being rendered in the responsive(mobile) view,  etc.
    I have tried both the available approaches, of 1) providing a URL to create the PDF, and 2) to send the html document as a zipped file, with same results.
    Attached is a sample PDF generated from a simple HTML page I created for demo.
    My questions are:
    Is it possible to generate 508 compliant (accessible) Pdf documents using Html to Pdf service from Pdf Generator? If yes, which settings might I be missing?
    Is there any other service provided by Adobe Livecycle Server that can generate 508 compliant Pdf documents from a 508 compliant HTML page?
    Your help is much appreciated.
    Thanks,
    Anup

    I don't think it's possible to do this using a standard JavaScript script in Acrobat, since the newDoc function doesn't work with URLs.
    The only option I can think of is to use an external automator that will call the Create PDF From Web Page dialog, paste the address from a file, and after the PDF file is created will continue to the next line.
    I might be able to create such a tool for you. If you're interested, contact me by email (click my username for the address) or PM.

  • How to include non web pages to the "Create PDF from Web Page" feature?

    In Acrobat Pro (v. 10), when I use the "Create PDF from Web Page" feature, it works great for html pages, but it skips non-html links (doc, pdf, ppt, xls, etc). I need Acrobat Pro to convert those files and put them in the order as well. I don't see an option for this in settings. Is there ANY way I can do this? This is for an archiving purpose and I have 10,000 plus files to convert. Please help.

    This is a question I'm trying to answer too. My issue is that I have a PDF file which itself contains links to both DOC and PDF files. The end result is that I need one consolidated PDF containing all the linked files (in order).
    I can run the "create from web page" on this PDF file, and it'll download them, but not convert them. It just adds them as "jumbled" text to the end of the document. I need it to download, convert, and then append them.
    So, as isunshine3 asked above, is there any way to have Adobe convert the files that it finds linked when running the "create from web page"?
    Thanks
    Matt

  • Acquiring HTML Text From HTML Page

    I just want to make a simple HTML editor. I read the documentation, and yes, I know that not all pages conform to the specifications, but I just want to know how to get all that yummy HTML as text rather than rendering it as a web page. I am using JEditorPane, and I think the answer may lie in the javax.swing.text.* packages.

    Just trying to make a code editor by allowing the loading of an HTML page as a text file. Assuming you have the HTML source and now wish to get the plain text from it, the following piece of code can help you:
    private String getPlainText(String htmlText) {
        String str = null;
        JTextPane text = new JTextPane();
        text.setContentType("text/html");
        text.setText(htmlText);
        try {
            str = text.getDocument().getText(0, text.getDocument().getLength());
        } catch (BadLocationException e) {
            e.printStackTrace();
        }
        return str;
    }
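
    For the opposite direction asked about in the question (showing the raw HTML markup instead of a rendered page), a minimal sketch could look like the following. It assumes the source is read through a Reader and shown in a JEditorPane switched to "text/plain"; the class and method names are hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import javax.swing.JEditorPane;

// Hypothetical helper: loads HTML as plain text so the markup stays visible.
public class HtmlSourceViewer {
    // Reads everything from the reader without interpreting the markup.
    public static String readSource(Reader in) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (BufferedReader br = new BufferedReader(in)) {
            char[] buf = new char[4096];
            int n;
            while ((n = br.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        }
        return sb.toString();
    }

    // With content type "text/plain" the pane displays the tags literally
    // instead of rendering them.
    public static void showSource(JEditorPane pane, String source) {
        pane.setContentType("text/plain");
        pane.setText(source);
    }

    public static void main(String[] args) throws IOException {
        System.out.println(readSource(new StringReader("<html><b>x</b></html>")));
    }
}
```

    In an editor, readSource() would typically be given a FileReader or an InputStreamReader over a URL stream, and the result passed to showSource().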

  • XMII Login from Html page

    Hi,
    How can we log in to xMII from an HTML page? For example, if I enter a username and password in an HTML page, it should log in to xMII directly. How can this be done?
    - senthil

    Jeremy,
    When I use the following URL to open a specific page, it works.
    http://server/Lighthammer/Login.jsp?IllumLoginName=accountname&IllumLoginPassword=accountpassword&session=true&target=/Test/report.irpt
    Question:
    1) This URL opens the html or irpt page itself directly, without the associated xMII menu and navigation bar. Is it possible to open the page with the associated xMII menu/navigation through a URL?

  • Can i change the file extension from .html to .cfm in Muse?

    Can I change the file extension from .html to .cfm in Muse?  Then I could use my old file names.

    Muse only supports .html as a page extension at this time.
    Thanks,
    Vinayak

  • What is the equivalient tag in uix pages

    hi,
    I am handling exceptions using Struts in UIX pages. I just want to know what the equivalent tag for <html:errors/> is in UIX.
    any suggestions would be appreciated.
    with rgds
    parameswaran

    hi,
    I am handling exceptions using Struts in UIX pages. If the input page is a JSP page, I can use the following struts-html tag:
    <html:errors/>
    But I am using a UIX page as my input page. Could anyone please give me the equivalent tag for <html:errors/> in UIX?
    any suggestions would be appreciated.
    with rgds
    parameswaran

  • XMII Login from Html page under xMII Version 12

    Hi,
    I found this thread
    xMII Login from Html page
    but I'm not sure this will also work under xMII  Version 12.
    I have now this question:
    Is it possible to use a URL login with login name and password under xMII V12, similar to this example for Version 11.5:
    http://server/Lighthammer/Login.jsp?IllumLoginName=accountname&IllumLoginPassword=accountpassword&session=true&target=/Test/report.irpt
    Many thanks in advance

    Hi,
    Has anyone had any luck with this - displaying a v12 MII screen without requiring login?
    We need to be able to do this as well in order to display read-only screens on large screen monitors on the manufacturing floor without requiring login to MII.
    Under v11.5 it worked with no issues.  Under v12, we haven't figured out how to do it yet.
    I've waded through the NetWeaver UME documentation, have searched through the NetWeaver forums, etc. but to this point have had no luck in making it work.
    We've tried enabling the UME Guest account, assigning Guest to the anonymous group and guest role (and xMII Users role), creating a Navigation for the Guest user, but still the NetWeaver login screen is displayed.
    MII experts - if you are aware of how to do this can you please give detailed instructions instead of just referencing the NetWeaver / UME documentation?
    Thank you for your help!
