HTML DOM Class!

Hello all,
Here is my requirement:
My program should take the URL of any website and save the web page (including images and style sheets) to a local drive.
To do this, I will fetch the content using the java.net.URL class. After getting the content, I have to search it for framesets; if any exist, I have to fetch their content as well. Next, I have to search for images. Once I have the HTML content and the images, I have to rewrite the 'src' attributes of the image and frame tags in the HTML so that they point to the local drive, and store everything there.
Afterwards, if anybody opens that web page from my local drive, it should not contact the original site. I should store each and every entity on the local drive and replace the entity names so that they point to the local copies.
Is there any Java class that provides this functionality?
For example: in a browser, after the web page loads, the browser builds a Document Object Model that JavaScript can use. It automatically creates window, document, location, anchor, img, and form objects. With those, it would be very easy for me to change the 'src' attribute of the img and frame tags.
Waiting for your valuable reply with thousands of eyes...
Thanks in advance,
V.Thandava Krishna.

Hi Thandava,
There is a way to do this; however, you would have to use XML technology unless you want to resort to substringing all your pages. Not fun ;)
Check out the Java XML APIs (Xerces, JAXP, JDOM, etc.), which should make the task a bit easier.
Cheers,
Anthony
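
As a rough illustration of the DOM approach suggested above, here is a minimal Java sketch that rewrites the 'src' attributes of img and frame tags using the JAXP parser built into the JDK. It assumes the page is well-formed XHTML with no DOCTYPE to resolve (real-world HTML often is not, in which case a lenient third-party HTML parser would be needed first). The class name LocalSrcRewriter, the helper toLocalPath and its file-naming scheme are hypothetical and not from the thread; downloading the referenced files is left out.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class LocalSrcRewriter {

    // Hypothetical naming scheme: keep only the file name and place it under localDir.
    static String toLocalPath(String src, String localDir) {
        int slash = src.lastIndexOf('/');
        String name = (slash >= 0) ? src.substring(slash + 1) : src;
        return localDir + "/" + name;
    }

    // Parses well-formed XHTML, rewrites src on <img> and <frame>, and returns the new markup.
    static String rewriteSrc(String xhtml, String localDir) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xhtml)));
            for (String tag : new String[] {"img", "frame"}) {
                NodeList nodes = doc.getElementsByTagName(tag);
                for (int i = 0; i < nodes.getLength(); i++) {
                    Element e = (Element) nodes.item(i);
                    if (e.hasAttribute("src")) {
                        e.setAttribute("src", toLocalPath(e.getAttribute("src"), localDir));
                    }
                }
            }
            // Serialize the modified tree back to text with an identity transform.
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.METHOD, "xml");
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception ex) {
            throw new IllegalStateException("Could not rewrite page", ex);
        }
    }

    public static void main(String[] args) {
        String page = "<html><body><img src=\"http://example.com/images/logo.gif\"/></body></html>";
        System.out.println(rewriteSrc(page, "saved"));
    }
}
```

The same loop could be extended to other tags that carry URLs (for example link or script), if those are also to be stored locally.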

Similar Messages

  • How to send HTML DOM to Servlet?

    How to send HTML DOM to Servlet?

    What exactly do you mean by sending the DOM to a servlet? If you want to post the entire HTML to a servlet, use the XMLHttp object to post it. You can find more information on XMLHttp on Microsoft's MSDN site.

  • Custom HTML importer/exporter, how to preserve HTML id, class, style

    I need to implement a custom XHTML importer/exporter and CSSFormatResolver.
    If I for example have a p-element: <p id="p1" class="xyz" style="margin:0.7em; color:#3300FF">lorem ipsum</p>
    How do I map the HTML attributes id, class and style so that they are preserved in the ParagraphElement, both for use during the custom CSS cascade process and for custom HTML export (the exporter needs to spit out <p id="p1" class="xyz" style="margin:0.7em;  color:#3300FF">lorem ipsum</p> again)?
    1. Is this mapping correct, or will it interfere with internal TLF formatting?
       HTML attribute    ParagraphElement property
       id                id
       class             styleName
       style             userStyles
    2. What is the purpose of FlowElement.coreStyles (where are those styles applied)?
    3. What is the actual purpose of FlowElement.userStyles, are those styles just for non TLF (end developer custom) purposes or does TLF use them at any point to set the element format properties?
    4. Any other pointers or related (non-Flex framework) examples are welcome.
    Thanks.
    Cheers, Benny

    Your mapping looks correct to me for id and styleName.
    I think what you are saying about style mapping to userStyles is correct. Yes, userStyles is an object of key-value pairs holding all the non-TLF styles for a FlowElement. coreStyles holds all the TLF styles for a FlowElement. TLF does use userStyles itself for the linkHoverFormat, linkActiveFormat and linkNormalFormat styles.
    When you set FlowElement.userStyles, TLF replaces the current set of userStyles with those in the supplied Object. That Object is treated as a dictionary of style names and values.
    Normally I'd expect you to use the FlowElement.setStyle API, which figures out whether a particular style belongs to coreStyles or userStyles and then sets the new style appropriately without changing the others.
    Richard

  • HTML DOM

    I'm having trouble extracting form tags and their attributes from an HTML page. Which class should I use for this job? There is an HTML parser in Java SE, right?

    It depends on whether the HTML is guaranteed to be well formed. If so, you can just use an XML parser, possibly the one built into the JDK.
    Start with either
    SAXParserFactory or DocumentBuilderFactory.
    If it is not guaranteed to be well formed, then you will need a third-party API.
    You'll have to try different ones to see what fits your needs. Here's a starting point for looking:
    http://java-source.net/open-source/html-parsers
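
For the well-formed case, a minimal sketch with DocumentBuilderFactory might look like the following. It assumes the input parses as XML and that the tags are written in lower case (getElementsByTagName is case-sensitive for XML documents). The class name FormTagLister and the "name=value" output format are made up for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class FormTagLister {

    // Returns one "name=value" string per attribute of every <form> element.
    static List<String> formAttributes(String wellFormedHtml) {
        List<String> result = new ArrayList<>();
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(wellFormedHtml)));
            NodeList forms = doc.getElementsByTagName("form");
            for (int i = 0; i < forms.getLength(); i++) {
                NamedNodeMap attrs = forms.item(i).getAttributes();
                for (int j = 0; j < attrs.getLength(); j++) {
                    Node attr = attrs.item(j);
                    result.add(attr.getNodeName() + "=" + attr.getNodeValue());
                }
            }
        } catch (Exception ex) {
            // Malformed markup ends up here; that is the case that needs a third-party parser.
            throw new IllegalStateException("Input is not well formed", ex);
        }
        return result;
    }

    public static void main(String[] args) {
        String page = "<html><body><form action=\"/login\" method=\"post\">"
                + "<input name=\"user\"/></form></body></html>";
        for (String attribute : formAttributes(page)) {
            System.out.println(attribute);
        }
    }
}
```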

  • Where do i place my html files,classes and Java files

    Can any one plz help me out here. I'm using Tomcat3.2.1 to
    run my servlets, I have written the servlets and gotten
    tomcat to run as well. What confuses me now is where my
    files should be placed.
    - Do i have to create a special directory to contain my html
    files ?
    - another directory for my servlet classes ?
    - how about my .java files ?
    - and how should i call my servlet from the html code ?
    HEEEEEEEEEEEEEEEEEEEEELLLLLLLLLLLLLLPPPPPPPPPPPP!!!!!!!!!!!!!!!

    hi,
    you can see my post here: http://forum.java.sun.com/thread.jsp?forum=33&thread=302433

  • Can Some recommend a html book/class

    I would like to start learning the HTML basics (and I mean basics) :)
    Can anyone recommend a good book or online class?
    Is there an HTML for Dummies book? (I haven't looked anywhere yet; I just thought of it when I was making this thread.)
    Thanks.

    >"ggrant3" <[email protected]> wrote in
    message
    >news:em90i3$r83$[email protected]..
    >>I would like to start learning the html basic's (and
    I mean basic's) :)
    >>
    >> Can anyone recommend a good book or online class.
    >>
    On Tue, 19 Dec 2006 07:58:07 -0800, "Walt F. Schaefer"
    <[email protected]> wrote:
    >
    http://www.amazon.com/XHTML-Sixth-Visual-Quickstart-Guide/dp/0321430840/sr=1-1/qid=1166543853/ref=pd_bbs_1/103-7573837-5905435?ie=UTF8&s=books
    >
    >An excellent starting point.
    >
    I too would wholeheartedly recommend this book by Elizabeth Castro (HTML for the World Wide Web - with XHTML and CSS).
    And I see that the sixth edition is now out (I have the 5th). I've read their website to see what changes there are: apparently the sections on frames have been taken out, and there is much more on CSS.
    So I will have to think about upgrading to the 6th edition.
    Can anyone advise whether it would be good value to buy the 6th? Is the new material worth the money?
    Malcolm

  • Oddity with JAXP HTML DOM

    OK, here's the idea. For this web app I am supposed to find locations in a web page by letting a user "step" his way through. Basically, you highlight a section of text, and I take the node you highlighted and walk backwards and up through the tree until I hit the root. Then I turn around and show you all the steps I took. Then I follow the steps all the way back down (it's supposed to show people how DOM trees work).
    To debug this I made a simple little tool that highlights the node we are on and shows all its siblings around it, so I can watch the program step its way backwards and forwards through the tree.
    Anyway, the issue is that I can get up to the BODY tag easily enough (because it is unique in the page, it is as high as I need to climb; I can just use getElementsByTagName to get the body tag again), but when I try to turn around and go back down, the tree is different. When I go up, the top "level" of the tree has only 2 nodes (HEAD and BODY), and there are 7 nodes in the next level down.
    When I walk down the tree, there are 4 nodes at the top (HEAD, BODY and 2 #text nodes), and the next level has 9 nodes (the same 7 plus 2 #text nodes).
    I don't get how this is happening. The page I am going up and down is exactly the same. I am using JAXP and HttpUnit and that's it. The page I am going up and down is saved and does not change at all between passes. How can there be differing numbers of nodes in places? Can anyone shed some light on why this could be happening?

    OK, I want to add an addendum to my last post in the hope that someone can tell me what I'm doing wrong. I tried using the exact same file to go up and down (literally), rather than using the one that had the highlighted section and getting the other page off its actual website as I am supposed to do, and it seems to work fine. I can only assume that there is something wrong with the way I am getting the file. I am using URLConnection to get a BufferedReader and reading the HTML file into a string. After I have finished adding whatever needs to be added (I add a tag so I know where the user highlighted), I write it out to our local server as a temp file in plain text so HttpUnit can find it and build a DOM out of it.
    Is there something about the way I'm getting the HTML that might damage it somehow? Can anyone suggest a better way?
    Thanks
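
Since the extra nodes reported above are all #text nodes, one way to make the recorded steps independent of them is to count only element children when walking up and when walking back down. The following is a minimal sketch of that idea, not a confirmed fix for the setup described in the post; the class ElementPath and its method names are hypothetical, and it uses the plain JAXP parser rather than HttpUnit.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class ElementPath {

    // Element children only, so whitespace #text nodes never shift the indexes.
    static List<Element> elementChildren(Node parent) {
        List<Element> out = new ArrayList<>();
        for (Node c = parent.getFirstChild(); c != null; c = c.getNextSibling()) {
            if (c.getNodeType() == Node.ELEMENT_NODE) {
                out.add((Element) c);
            }
        }
        return out;
    }

    // Walks from target up to root and records the element index at each level, root first.
    static List<Integer> pathFromRoot(Element root, Element target) {
        Deque<Integer> path = new ArrayDeque<>();
        Node current = target;
        while (current != root) {
            Node parent = current.getParentNode();
            if (parent == null) {
                throw new IllegalArgumentException("target is not below root");
            }
            path.addFirst(elementChildren(parent).indexOf(current));
            current = parent;
        }
        return new ArrayList<>(path);
    }

    // Follows a recorded path back down from root.
    static Element follow(Element root, List<Integer> path) {
        Element current = root;
        for (int index : path) {
            current = elementChildren(current).get(index);
        }
        return current;
    }

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception ex) {
            throw new IllegalStateException("Could not parse input", ex);
        }
    }

    public static void main(String[] args) {
        Document doc = parse("<html>\n<head/>\n<body>\n<p>one</p>\n<p>two</p>\n</body>\n</html>");
        Element target = (Element) doc.getElementsByTagName("p").item(1);
        List<Integer> path = pathFromRoot(doc.getDocumentElement(), target);
        System.out.println(path + " leads to: " + follow(doc.getDocumentElement(), path).getTextContent());
    }
}
```

A path recorded this way can be replayed on a second copy of the page even if that copy differs in whitespace between tags.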

  • Custom HTML Store Class adobeDPS-Folio description

    Hi there,
    In the DPS portal for each folio we have the description field.
    But I can't access this field programmatically on iPad.
    As you can see here
    http://www.adobe.com/devnet-docs/digitalpublishingsuite/LibraryAndStoreAPI-2.22/symbols/adobeDPS-Folio.html
    there is no description field.
    But I can see all descriptions for my folios in the XML, so I can get them on desktop, and I do.
    http://edge.adobe-dcfs.com/ddp/issueServer/issues?accountId=261f49c6567d49d186cae9f6c9f92136&targetDimension=1024x768
    What am I doing wrong?
    Is the description field available via DPS API on iPad?
    Thank you!
    Best regards,
    Andrey

    Hi Neil.
    Thanks for reply.
    But sadly it doesn't work.
    Here you can see all the attributes for the folio (from Web Inspector)
    _processPropertyUpdate: function (newValue, prop) {
    _updateFromJSON: function (json) {
    archive: function () {
    broker: "noChargeStore"
    constructor: function Folio() {
    currentTransactions: Array[0]
    download: function () {
    downloadSize: 0
    filter: null
    folioNumber: "demo"
    getPreviewImage: function (width, height, isPortrait) {
    id: "3bf5e2bc-eaec-4341-b205-7ca53ea40de6"
    isArchivable: false
    isCompatible: true
    isFree: function () {
    isThirdPartyEntitled: false
    isUpdatable: false
    isViewable: false
    previewImageURL: null
    price: "FREE"
    productId: "com.condenast.gqrussia.0313d"
    publicationDate: Wed Feb 20 2013 00:00:00 GMT+0400 (MSK)
    purchase: function () {
    receipt: null
    state: 200
    targetDimensions: "1024x768"
    title: "GQ 03.13"
    toString: function () {
    update: function () {
    updatedSignal: Object
    view: function () {
    __proto__: Object

  • What is a html generation class in UIX?

    There's a site that uses OAF.
    But some pages are not rendered well.
    The problem is caused by the "<oa:messageFileUpload>" tag in the XML file.
    A PC browser renders it properly, but IE Mobile 5 can't render it well.
    However, in the HTML source it appears as an "<input type=file>" tag, and this HTML tag can be rendered in IE Mobile.
    So maybe UIX has a problem in part of its HTML code generation.
    I want to modify it and apply the change.
    Can someone help or advise me?

    Simply Singleton
    Use your singletons wisely
    When is a Singleton not a Singleton?
    Java Glossary : singleton
    How can I implement the Singleton pattern in the Java programming language?
    Singleton Pattern
    Java Singleton
    Using the Singleton Pattern
    Double-checked locking and the Singleton pattern

  • Unable to traverse children of child Nodes for w3c.dom class

    Hi Folks,
    I'm trying to traverse the children of child nodes, but unfortunately it's not working.
    Here is a snippet of my code:
    String xpath = "//*[starts-with(name(), 'PosRpt')]";
    String xpath1 = "/FIXML/Batch/PosRpt/*";
    try {
        // Get all matching elements
        NodeList nodelist = XPathAPI.selectNodeList(doc, xpath);
        // getLength() is never negative, so test for an empty list
        if (nodelist.getLength() == 0)
            System.out.println("Element not found");
        else
            System.out.println("xpath element found");
        // Process elements in nodelist
        for (int i = 0; i < nodelist.getLength(); i++) {
            Element element = (Element) nodelist.item(i);
            System.out.println("Element name - " + element.getTagName());
            GetAttributes(element);
            Node currentnode = (Node) nodelist.item(i);
            NodeList childlist = currentnode.getChildNodes();
            int xx = childlist.getLength();
            System.out.println("No of child elements = " + xx);
            for (int y = 0; y < childlist.getLength(); y++) {
                Element element1 = (Element) childlist.item(y);
                System.out.println(y + " Child Element name - " + element1.getTagName());
                GetAttributes(element1);
                if ("Pty".equals(element1.getTagName()) && element1.hasChildNodes()) {
                    // THIS IS WHERE I THINK THE PROBLEM IS *****
                    Node child = element1.getFirstChild();
                    System.out.println("Node name - " + child.getNodeName() +
                            " and node value is = " + child.getNodeValue());
                    NamedNodeMap attrs = child.getAttributes();
                    if (child.hasAttributes()) {
                        for (int x = 0; x < attrs.getLength(); x++) {
                            Attr attr = (Attr) attrs.item(x);
                            System.out.println("attribute Name - " + attr.getNodeName());
                            System.out.println("attribute value - " + attr.getNodeValue());
                        }
                    }
                }
            }
        }
    } catch (Exception e) {
        // catch block assumed; it is not shown in the snippet
        e.printStackTrace();
    }
    Sample of the XML (snippet):
    <PosRpt RptID="273558126" BizDt="2005-07-13" ReqTyp="0" Ccy="USD">
      <Pty ID="OCC" R="21" />
      <Pty ID="00299" R="4">
        <Sub ID="C" Typ="26" />
      </Pty>
    And this is a snippet of the result:
    Child Element name - Pty
    attribute Name - ID
    attribute value - 00299
    attribute Name - R
    attribute value - 4
    Node name - #text and node value is =
    WHERE DID THE #text NODE COME FROM?
    I'M BAFFLED.
    ANY IDEAS??
    THANKS IN ADVANCE.
    RAJ

    Alas, it is probably working fine.
    The "extra" text nodes you are seeing are newlines.
    Change your print statement from:
    System.out.println("Node name - " + child.getNodeName()+
    " and node value is = " + child.getNodeValue());
    to
    System.out.println("Node name - " + child.getNodeName()+
    " and node value is =` " + child.getNodeValue() + "'");
    You will see something like
    and node value is =`
    with the closing quote on the following line, because the node value starts with a newline.
    If I have
    <a>
    <b>xxx</b>
    </a>
    a will have three child nodes. The first and third will be text nodes with a newline, and the second will be the b child element.
    If I have
    <a><b>xxx</b></a>
    a will only have one child -- the element b.
    Also be aware that if you are processing XML data with SAX and have large
    blocks of text in an element, there is no requirement for all parsers to return a single text object. They are free to decide how many text blocks they create.
    Dave Patterson
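
The two cases described above can be checked with a few lines of Java. This is a small sketch using the JDK's default (non-validating) JAXP parser; the class name WhitespaceNodes and its helper methods are made up for the example.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

class WhitespaceNodes {

    // Parses the markup and returns its root element.
    static Element root(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (Exception ex) {
            throw new IllegalStateException("Could not parse input", ex);
        }
    }

    // Number of direct children of the root, text nodes included.
    static int childCount(String xml) {
        return root(xml).getChildNodes().getLength();
    }

    // Node name of the root's first child ("#text" for a text node).
    static String firstChildName(String xml) {
        return root(xml).getFirstChild().getNodeName();
    }

    public static void main(String[] args) {
        String withNewlines = "<a>\n<b>xxx</b>\n</a>";
        String compact = "<a><b>xxx</b></a>";
        System.out.println(childCount(withNewlines) + " children, first is " + firstChildName(withNewlines));
        System.out.println(childCount(compact) + " child, first is " + firstChildName(compact));
    }
}
```

When only elements matter, checking getNodeType() == Node.ELEMENT_NODE before casting a child to Element skips these text nodes.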

  • Interacting with DOM objects using HTML

    Hi,
    I want to listen to "text selection" on a webpage in AS (i.e. as soon as a user highlights some text and releases the mouse button, an event gets fired). Events for cut, copy, and paste are available, but none for "text selection". I have read about cross-scripting between AS and JavaScript and understand it, although I do not understand yet how it can solve my problem. Also, is there any simpler way to do it directly in AS, without getting into JavaScript?
    Thank you very much
    Rehan.

    There's a non-standard onselectstart attribute (or "selectstart" event type) that looks to be the only HTML DOM event dispatched for text selection.
    You can listen for such DOM events from ActionScript like so:
    package {
        import flash.display.Sprite;
        import flash.display.StageAlign;
        import flash.display.StageScaleMode;
        import flash.events.Event;
        import flash.html.HTMLLoader;
        public class HTMLLoadderTest extends Sprite
        {
            private var html:HTMLLoader = new HTMLLoader();
            public function HTMLLoadderTest()
            {
                this.stage.scaleMode = StageScaleMode.NO_SCALE;
                this.stage.align = StageAlign.TOP_LEFT;
                html.loadString("<html><body><p>Something to select.</p></body></html>");
                html.width = this.stage.stageWidth;
                html.height = this.stage.stageHeight;
                html.addEventListener(Event.COMPLETE, completeHandler);
                addChild(html);
                stage.nativeWindow.activate();
            }
            private function completeHandler(event:Event):void {
                event.target.window.document.body.addEventListener("selectstart", reportSelection);
            }
            // Note that you have to use Object as the parameter type because the JavaScript Event class is not the same as the ActionScript Event class
            private function reportSelection( event:Object ):void
            {
                trace(html.window.getSelection());
            }
        }
    }

  • Can I use multiple p class="logos" tag with the same name within the same html page?

    I was told not to use <div class> tags too many times. I was using them for text, images, to clear floats, I basically built my website using multiple <div class> tags. So if I can't use multiple <div class> tags could I use <p class> tags multiple times in the same html page?
    I have a string of logos at the bottom of my webpage which will all be using the same css characteristics for all logos. Would this be the proper way to write the code:
    HTML
    <p class="logos">Logo1<a href="..."></a></p>
    <p class="logos">Logo2<a href="..."></a></p>
    <p class="logos">Logo3<a href="..."></a></p>
    <p class="logos">Logo4<a href="..."></a></p>
    <p class="logos">Logo5<a href="..."></a></p>
    <p class="logos">Logo6<a href="..."></a></p>
    CSS
    .logos {
    margin-left:10px;
    }
    Here's my website: http://www.darbymanufacturing.com/test_website/index.html - this is the website built with all div class tags
    I restarted the website in order to write the code properly so that I don't come to errors when uploading on the server like I am having with the website link above.

    Instead of writing something like this -
    <p class="logos">Logo1<a href="..."></a></p>
    <p class="logos">Logo2<a href="..."></a></p>
    <p class="logos">Logo3<a href="..."></a></p>
    <p class="logos">Logo4<a href="..."></a></p>
    <p class="logos">Logo5<a href="..."></a></p>
    <p class="logos">Logo6<a href="..."></a></p>
    Why not have something like this -
    <div id="logodiv">
    <p>Logo1<a href="..."></a></p>
    <p>Logo2<a href="..."></a></p>
    <p>Logo3<a href="..."></a></p>
    <p>Logo4<a href="..."></a></p>
    <p>Logo5<a href="..."></a></p>
    <p>Logo6<a href="..."></a></p>
    </div>
    with CSS like this -
    #logodiv p { ... }

  • Implementing DOM Interface with existing Java classes

    I had planned on using some tree-like Java classes as a Document Object Model, which would give me access to all sorts of XML and DOM tools like parsers and XSLT transformers. Initially, I thought all that would be necessary is to implement all the DOM interfaces in org.w3c.dom and then I would have a set of classes that conformed to DOM Level 1. It was my understanding that interfaces such as DOMImplementation and Document would interface with various XML tools, allowing creation of a class that implements Document, and then Document would have its various factory methods that know how to create the various DOM nodes such as Element, Attr, Text, NamedNodeMap, NodeList, etc.
    The problem I'm seeing now is that the JAXP specification (which is what the latest Xerces and Xalan tools conform to) has something called a DocumentBuilder and DocumentBuilderFactory that appear to be necessary to tell the framework what type of class to instantiate that implements the Document DOM interface. Those appear to have a lot of methods that deal with parsing of XML documents and I didn't really want to write or even subclass any existing Parsers in order to get the functionality of traversing and transforming a set of classes that implement the DOM interface.
    Am I missing something here? Is it possible to plug in any (set of classes for) DOMImplementation and get them to work with the various DOM and XML tools out there?
    Is there an easier way to allow parts of an application access to internal data structures but have the more generic tools or APIs, such as XSL transformers, access that same set of classes as a DOM with the generic DOM interface methods?
    Can someone provide me with some guidance here? I'm in the process of finalizing some design on a system and need to know if this is possible or if I need to alter my design.
    Thanks.

    If I understand you correctly, I think I am working on a similar issue. I am unhappy with the methods the DOM provides for retrieving data from an XML file and for building a file. Our software has a bunch of code that uses these classes, and it is extremely ugly. My solution was to create a facade on top of the DOM model. Essentially, I have some simple classes that store all the pertinent info (for me) about the XML structure: the element or attribute name, its values and, in the case of an element, its children. This makes it easier for me to build and retrieve the data. What I then built was a loader class and a builder class. The loader takes an XML file, parses it using the DOM classes, builds a structure using my classes and returns the root element. The builder takes a root element and creates a DOM object out of it. This frees me from having to code around the DOM classes all over the place and makes it simple to upgrade our XML code if the DOM changes or a better DOM is released. I am using factories to facilitate this and to allow me to have loaders for specific types of XML documents, so that I can have a class for each, which further simplifies the XML-related tasks of other developers on my team.
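
To make the facade idea concrete, here is a minimal Java sketch of a loader and a builder around one simple element class. All names (DomFacade, SimpleElement, load, build) are invented for the example and are not the poster's actual classes; the sketch also simplifies by trimming text and ignoring the order of mixed text and child elements.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class DomFacade {

    // Plain data holder: element name, attributes, text and child elements.
    static class SimpleElement {
        final String name;
        final Map<String, String> attributes = new LinkedHashMap<>();
        final List<SimpleElement> children = new ArrayList<>();
        String text = "";

        SimpleElement(String name) {
            this.name = name;
        }
    }

    // Loader: parses XML with the DOM classes and returns the root as a SimpleElement.
    static SimpleElement load(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return fromDom(doc.getDocumentElement());
        } catch (Exception ex) {
            throw new IllegalStateException("Could not load XML", ex);
        }
    }

    private static SimpleElement fromDom(Element e) {
        SimpleElement s = new SimpleElement(e.getTagName());
        NamedNodeMap attrs = e.getAttributes();
        for (int i = 0; i < attrs.getLength(); i++) {
            s.attributes.put(attrs.item(i).getNodeName(), attrs.item(i).getNodeValue());
        }
        StringBuilder text = new StringBuilder();
        for (Node c = e.getFirstChild(); c != null; c = c.getNextSibling()) {
            if (c.getNodeType() == Node.ELEMENT_NODE) {
                s.children.add(fromDom((Element) c));
            } else if (c.getNodeType() == Node.TEXT_NODE) {
                text.append(c.getNodeValue());
            }
        }
        s.text = text.toString().trim(); // drops whitespace-only text between elements
        return s;
    }

    // Builder: creates a fresh DOM Document from a SimpleElement tree.
    static Document build(SimpleElement root) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            doc.appendChild(toDom(doc, root));
            return doc;
        } catch (Exception ex) {
            throw new IllegalStateException("Could not build DOM", ex);
        }
    }

    private static Element toDom(Document doc, SimpleElement s) {
        Element e = doc.createElement(s.name);
        for (Map.Entry<String, String> a : s.attributes.entrySet()) {
            e.setAttribute(a.getKey(), a.getValue());
        }
        if (!s.text.isEmpty()) {
            e.appendChild(doc.createTextNode(s.text));
        }
        for (SimpleElement child : s.children) {
            e.appendChild(toDom(doc, child));
        }
        return e;
    }
}
```

Because build returns an ordinary org.w3c.dom.Document, the result can still be handed to generic tools, for example wrapped in a DOMSource for an XSLT transformer.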

  • A little problem getting the style tag of a html file seperate from rest

    I'm making a program that will take in a URL and then search through that URL for all a, link, embed, frame, and img tags, find their sources, and download them. I also want to search through the style and find anything that uses a URL (ex. background-image:url('somepic.jpg')) and download that file. In the end, you should be able to go to the directory you saved it all in, open index.html, and see an exact replica of the original site. Now, my problem is that my program isn't getting the style tag's contents. Here's my code:
    import java.io.*;
    import java.util.*;
    import java.net.*;
    public class Test
    {
        //-->>>> MAIN <<<<--//
        public static void main(String...a)
        {
            try{
                System.out.print("Enter URL: ");
                String target = new Scanner(System.in).next();
                URL url = null;
                try{
                    url = new URL(target);
                }catch(MalformedURLException x){
                    url = new URL("http://" + target);
                }
                Scanner scan = new Scanner(url.openStream());
                scan.useDelimiter("<");
                ArrayList<String> tokens = new ArrayList<String>();
                while(scan.hasNext())
                {
                    String str = scan.next();
                    str = str.trim();
                    Scanner tags = new Scanner(str);
                    if(tags.hasNext())
                    {
                        String tag = tags.next();
                        if(tag.equalsIgnoreCase("a") || tag.equalsIgnoreCase("img") || tag.equalsIgnoreCase("link") || tag.equalsIgnoreCase("embed") || tag.equalsIgnoreCase("frame"))
                            tokens.add(str);
                        else if(tag.equalsIgnoreCase("style"))
                            tokens.add(str);// This isn't adding anything
                    }
                }
                for(String str : tokens)
                    System.out.println(str);
            }catch(UnknownHostException x){
                System.err.println("Host not found.");
            }catch(Exception x){
                x.printStackTrace();
            }
        }
        //-->>>> FindURLAttributes <<<<--// <--- Under construction
        private static ArrayList<String> findURLAttributes(String tag)
        {
            ArrayList<String> tokens = new ArrayList<String>();
            tokens.add(tag);
            return tokens;
        }
    }

    I've never tried it, but it seems like using an existing html parser would be a lot easier. I've worked with xml dom parsers, and it's not really that hard. I don't imagine working with an html dom would be too difficult either, at least it wouldn't be as hard as doing it by hand. Google for java html parser and see if any of them suit your needs.
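
One likely reason the style branch never fires, assuming the page writes a bare <style> tag: with "<" as the delimiter, that chunk starts with "style>", so tags.next() returns "style>" (plus whatever follows up to the first whitespace) rather than "style". Style sheets wrapped in <!-- ... --> are also cut off at the comment's "<". If a full parser is more than is needed, the style contents and their url(...) references can be pulled out with regular expressions, as in the rough sketch below. Regex matching on HTML is fragile, and the class name CssUrlFinder and both patterns are illustrative assumptions rather than a complete CSS grammar.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CssUrlFinder {

    // Contents of <style ...> ... </style>, across line breaks, in any letter case.
    private static final Pattern STYLE_BLOCK =
            Pattern.compile("<style[^>]*>(.*?)</style>", Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    // url(x), url('x') or url("x"); group 2 is the referenced path.
    private static final Pattern CSS_URL =
            Pattern.compile("url\\(\\s*(['\"]?)([^'\")]+)\\1\\s*\\)", Pattern.CASE_INSENSITIVE);

    // Returns the text of every style block in the page.
    static List<String> styleBlocks(String html) {
        List<String> blocks = new ArrayList<>();
        Matcher m = STYLE_BLOCK.matcher(html);
        while (m.find()) {
            blocks.add(m.group(1));
        }
        return blocks;
    }

    // Returns every path referenced through url(...) in the given CSS text.
    static List<String> urlsIn(String css) {
        List<String> urls = new ArrayList<>();
        Matcher m = CSS_URL.matcher(css);
        while (m.find()) {
            urls.add(m.group(2).trim());
        }
        return urls;
    }

    public static void main(String[] args) {
        String page = "<html><head><style>body { background-image:url('somepic.jpg'); }</style></head></html>";
        for (String css : styleBlocks(page)) {
            System.out.println(urlsIn(css));
        }
    }
}
```

The same urlsIn method could also be applied to the value of inline style attributes and to downloaded .css files.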

  • HTML to XML Conversion ?

    Developed a content presentation Java servlet implementing the xmlparser2.jar classes; it works well. We're storing content (in XML format) as a blob; then, using the parser, we are able to transform the XML file to HTML for presentation.
    stream = null;
    String result = null;
    URL URLStream = new URL(xmlIn);
    ByteArrayOutputStream xbaos = new ByteArrayOutputStream();
    if(mStylesheet.startsWith("http"))
    stream = getURLInputStream(mStylesheet);
    else
    stream = new FileInputStream(mStylesheet);
    XSLProcessor processor = new XSLProcessor();
    DOMParser parser = new DOMParser();
    parser.setValidationMode(false);
    parser.setPreserveWhitespace(true);
    parser.parse(in);
    xdoc = parser.getDocument();
    XSLStylesheet xss = new XSLStylesheet(stream, URLStream);
    processor.processXSL(xss, xdoc, xbaos);
    result = xbaos.toString();
    parser.reset();
    return result; // HTML conversion
    We are evaluating using XSLT to convert the XML to a form-based medium for content maintenance. We are wondering whether, once an XML document has been transformed to HTML (DOM), it can be converted back to XML for a subsequent update of the value stored in the blob column. We are specifically interested in conversion (a parser) from HTML to XML.
    Simply put, can HTML (in DOM format, validated against an XSD) be transformed back to XML?

    Do you know of a method in the xdk that takes a well formed HTML doc and using xsd / xslt convert back to original xml spec?
    Because you created (and as long as you create) the HTML from XML it will be well formed (every tag will be ended with an end-tag) and you can therefore transform it back into XML.
    Most times it will not be possible to convert HTML found on the 'internet' into XML because this HTML is not well formed. For example, many people forget to end a paragraph of text within HTML with the </p> tag.
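
As an illustration of the reverse direction, here is a minimal sketch that applies a stylesheet to well-formed HTML to produce XML again. It deliberately uses the standard JAXP javax.xml.transform API rather than the Oracle XDK classes from the question, and the target element names (article, title, para), the sample stylesheet and the class name HtmlToXml are all invented for the example.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class HtmlToXml {

    // Hypothetical stylesheet: maps <h1> to <title> and each <p> to <para>.
    static final String SAMPLE_XSLT =
            "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
            + "<xsl:output method=\"xml\" omit-xml-declaration=\"yes\"/>"
            + "<xsl:template match=\"/\">"
            + "<article>"
            + "<title><xsl:value-of select=\"/html/body/h1\"/></title>"
            + "<xsl:for-each select=\"/html/body/p\">"
            + "<para><xsl:value-of select=\".\"/></para>"
            + "</xsl:for-each>"
            + "</article>"
            + "</xsl:template>"
            + "</xsl:stylesheet>";

    // Applies the stylesheet to well-formed HTML and returns the resulting XML text.
    static String transform(String wellFormedHtml, String xslt) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(xslt)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(wellFormedHtml)),
                    new StreamResult(out));
            return out.toString();
        } catch (TransformerException ex) {
            // HTML that is not well formed fails here, as described above.
            throw new IllegalStateException("Transformation failed", ex);
        }
    }

    public static void main(String[] args) {
        String html = "<html><body><h1>Hello</h1><p>one</p><p>two</p></body></html>";
        System.out.println(transform(html, SAMPLE_XSLT));
    }
}
```

The reverse stylesheet has to be written by hand to mirror the forward one; nothing derives it automatically, and any information the forward transform discarded cannot be recovered.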
