HTML DOM Class!

Hello all,
Here is my requirement:
My program should take the URL of any website and save the web page (including images and style sheets) to a local drive.
For this, I will fetch the content using the java.net.URL class. After getting the content, I have to search it for framesets. If framesets exist, I have to fetch their content as well. Next, I have to search for images. After getting the HTML content and the images, I have to rewrite the 'src' attributes of the image and frame tags in the HTML so that they point to the local copies, and then store everything on the local drive.
Once everything is saved, anybody who opens the web page from my local drive should not cause any contact with the original site. I have to store each and every entity on the local drive and replace the entity references so that they point to the local copies.
Is there any Java class that provides this functionality?
For example: in a browser, once the web page has loaded, JavaScript can work with a Document Object Model. The browser automatically creates window, document, location, anchor, img, and form objects. With those, it is very easy to change the 'src' attribute of img and frame tags.
Waiting for your valuable reply with thousands of eyes..
Thanks in advance,
V.Thandava Krishna.

Hi Thandava,
There is a way to do this; however, you would have to use XML technology unless you want to resort to substringing all your pages. Not fun ;)
Check out the Java XML APIs (Xerces, JAXP, JDOM, etc.), which should make the task a bit easier.
Cheers,
Anthony
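
As a rough illustration of the XML-API route suggested above, here is a minimal sketch using JAXP from the standard JDK. It assumes the page is well-formed XHTML (a real-world page often is not, in which case a lenient HTML parser would be needed first). The class name LocalSrcRewriter, the localName helper, and the "local/" directory are hypothetical names made up for this example.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class LocalSrcRewriter {

    // Hypothetical mapping: keep only the last path segment and put it under "local/".
    static String localName(String src) {
        int slash = src.lastIndexOf('/');
        return "local/" + (slash >= 0 ? src.substring(slash + 1) : src);
    }

    // Parses well-formed XHTML, rewrites the src attribute of img and frame
    // elements, and returns the modified markup as a string.
    static String rewrite(String xhtml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(xhtml)));

        for (String tag : new String[] {"img", "frame"}) {
            NodeList nodes = doc.getElementsByTagName(tag);
            for (int i = 0; i < nodes.getLength(); i++) {
                Element e = (Element) nodes.item(i);
                if (e.hasAttribute("src")) {
                    e.setAttribute("src", localName(e.getAttribute("src")));
                }
            }
        }

        // Serialize the DOM back to text with an identity transform.
        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.METHOD, "xml");
        t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        t.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        String page = "<html><body><img src=\"http://example.com/a/logo.png\"/></body></html>";
        System.out.println(rewrite(page));
    }
}
```

Downloading the referenced files themselves (for instance with java.net.URL, as the question proposes) is left out; the sketch only covers the attribute rewriting step.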

Similar Messages

How to send HTML DOM to Servlet?

What exactly do you mean by sending the DOM to a servlet? If you want to post the entire HTML to a servlet, use the XMLHttp object and post the entire HTML to the servlet. You can get more info on XMLHttp at Microsoft's MSDN site.

Custom HTML importer/exporter, how to preserve HTML id, class, style

I need to implement a custom XHTML importer/exporter and CSSFormatResolver.
If I for example have a p-element: lorem ipsum
How do I map the HTML attributes id, class and style so that they are preserved in the ParagraphElement, both for use during the custom CSS cascade process and for custom HTML export purposes (the exporter needs to spit out lorem ipsum again)?
1. Is this mapping correct, or will it interfere with internal TLF formatting?
HTML attribute    ParagraphElement property
id                id
class             styleName
style             userStyles
2. What is the purpose of FlowElement.coreStyles (where are those styles applied)?
3. What is the actual purpose of FlowElement.userStyles, are those styles just for non TLF (end developer custom) purposes or does TLF use them at any point to set the element format properties?
4. Any other pointers or related (non-Flex-framework) examples are welcome.
Thanks.
Cheers, Benny

Your mapping looks correct to me for id and styleName.
I think what you are saying about style mapping to userStyles is correct. Yes, userStyles is an object of key-value pairs holding all the non-TLF styles for a FlowElement; coreStyles holds all the TLF styles for a FlowElement. TLF does use userStyles itself for the linkHoverFormat, linkActiveFormat and linkNormalFormat styles.
When you set FlowElement.userStyles, TLF replaces the current set of userStyles with those in the supplied Object. That Object is treated as a dictionary of style names and values.
Normally I'd expect you to use the FlowElement.setStyle API, which figures out whether a particular style belongs to coreStyles or userStyles and then sets the new style appropriately without changing the others.
Richard

HTML DOM

I'm having trouble extracting form tags and their attributes from an HTML page. Which class should I use for this job? There is an HTML parser in Java SE, right?

It depends on whether the HTML is guaranteed to be well formed. If so, you can just use an XML parser, possibly the one built into the JDK.
Start with either SAXParserFactory or DocumentBuilderFactory.
If it is not guaranteed to be well formed, then you will need a third-party API.
You'll have to try different ones to see what fits your needs. Here's a starting point for looking:
http://java-source.net/open-source/html-parsers
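
For the well-formed case, a minimal sketch with DocumentBuilderFactory might look like the following. The class name FormLister and the sample markup are made up for illustration, and the input is assumed to be well-formed XHTML; the order in which attributes come back from the DOM is not guaranteed to match the document.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class FormLister {

    // Returns one line per <form> element, listing its attributes as name=value pairs.
    static List<String> describeForms(String xhtml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xhtml)));

        List<String> result = new ArrayList<>();
        NodeList forms = doc.getElementsByTagName("form");
        for (int i = 0; i < forms.getLength(); i++) {
            NamedNodeMap attrs = forms.item(i).getAttributes();
            StringBuilder sb = new StringBuilder("form");
            for (int j = 0; j < attrs.getLength(); j++) {
                Node attr = attrs.item(j);
                sb.append(' ').append(attr.getNodeName())
                  .append('=').append(attr.getNodeValue());
            }
            result.add(sb.toString());
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        String page = "<html><body>"
                + "<form action=\"/login\" method=\"post\"><input name=\"user\"/></form>"
                + "</body></html>";
        for (String line : describeForms(page)) {
            System.out.println(line);
        }
    }
}
```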

Where do I place my HTML files, classes and Java files

Can anyone please help me out here? I'm using Tomcat 3.2.1 to run my servlets. I have written the servlets and gotten Tomcat to run as well. What confuses me now is where my files should be placed.
- Do I have to create a special directory to contain my HTML files?
- Another directory for my servlet classes?
- How about my .java files?
- And how should I call my servlet from the HTML code?
HEEEEEEEEEEEEEEEEEEEEELLLLLLLLLLLLLLPPPPPPPPPPPP!!!!!!!!!!!!!!!

hi,
You can see my post here: http://forum.java.sun.com/thread.jsp?forum=33&thread=302433

Can someone recommend an HTML book/class

I would like to start learning the HTML basics (and I mean basics) :)
Can anyone recommend a good book or online class?
Is there an HTML for Dummies book? (I haven't looked anywhere yet; I just thought of it when I was making this thread.)
Thanks.

>"ggrant3" <[email protected]> wrote in message
>news:em90i3$r83$[email protected]..
>> I would like to start learning the html basics (and I mean basics) :)
>>
>> Can anyone recommend a good book or online class.
>>
On Tue, 19 Dec 2006 07:58:07 -0800, "Walt F. Schaefer" <[email protected]> wrote:
>
> http://www.amazon.com/XHTML-Sixth-Visual-Quickstart-Guide/dp/0321430840/sr=1-1/qid=1166543853/ref=pd_bbs_1/103-7573837-5905435?ie=UTF8&s=books
>
> An excellent starting point.
>
I too would wholeheartedly recommend this book by Elizabeth Castro (HTML for the World Wide Web, with XHTML and CSS).
I see that the sixth edition is now out (I have the 5th). I've read their website to see what changes there are: apparently the sections on frames have been taken out, and there is much more on CSS.
So I will have to think about upgrading to the 6th edition.
Can anyone advise whether it would be good value to buy the 6th? Is the new material worth the money?
Malcolm

Oddity with JAXP HTML DOM

OK, here's the idea. For this web app I am supposed to find locations in a web page by letting a user "step" his way through. Basically, you highlight a section of text; I take the node you highlighted and walk backwards and up through the tree until I hit the root. Then I turn around and show you all the steps I took. Then I follow the steps all the way back down (it's supposed to show people how DOM trees work).
To debug this I made a simple little tool that highlights the node we are on and shows all its siblings around it, so I can watch the program step its way backwards and forwards through the tree.
Anyway, the issue is that I can get up to the BODY tag easily enough (which, because it is unique in the page, is as high as I need to climb; I can just use getElementsByTagName to get the body tag again), but when I try to turn around and go back down, the tree is different. When I go up, the top "level" of the tree has only 2 nodes (HEAD and BODY) and there are 7 nodes in the next level down.
When I walk down the tree, there are 4 nodes at the top (HEAD, BODY and 2 #text nodes), and the next level has 9 nodes: the same 7 plus 2 #text nodes.
I don't get how this is happening. The page I am going up and down is exactly the same. I am using JAXP and HttpUnit and that's it. The page is saved and does not change at all between passes. How can there be differing numbers of nodes in places? Can anyone shed some light on why this could be happening?

OK, I want to make an addendum to my last post in the hope that someone can tell me what I'm doing wrong. I tried using the exact same file to go up and down (literally), rather than using the file with the highlighted section for one direction and fetching the other page from its actual web site as I am supposed to do, and it seems to work fine. I can only assume that there is something wrong with the way I am getting the file. I am using URLConnection to get a BufferedReader and reading the HTML file into a string. After I have finished adding whatever needs to be added (I add a tag so I know where the user highlighted), I write it out to our local server as a temp file in plain text so HttpUnit can find it and build a DOM out of it.
Is there something about the way I'm getting the HTML that might damage it somehow? Can anyone suggest a better way?
Thanks

Custom HTML Store Class adobeDPS-Folio description

Hi there,
In the DPS portal for each folio we have the description field.
But I can't access this field programmatically on iPad.
As you can see here:
http://www.adobe.com/devnet-docs/digitalpublishingsuite/LibraryAndStoreAPI-2.22/symbols/adobeDPS-Folio.html
there is no description field.
But I can see all descriptions for my folios in the XML, so I can get them on desktop, and I do.
http://edge.adobe-dcfs.com/ddp/issueServer/issues?accountId=261f49c6567d49d186cae9f6c9f92136&targetDimension=1024x768
What am I doing wrong?
Is the description field available via DPS API on iPad?
Thank you!
Best regards,
Andrey

Hi Neil.
Thanks for reply.
But sadly it doesn't work.
Here you can see all the attributes for the folio (from Web Inspector)
_processPropertyUpdate: function (newValue, prop) {
_updateFromJSON: function (json) {
archive: function () {
broker: "noChargeStore"
constructor: function Folio() {
currentTransactions: Array[0]
download: function () {
downloadSize: 0
filter: null
folioNumber: "demo"
getPreviewImage: function (width, height, isPortrait) {
id: "3bf5e2bc-eaec-4341-b205-7ca53ea40de6"
isArchivable: false
isCompatible: true
isFree: function () {
isThirdPartyEntitled: false
isUpdatable: false
isViewable: false
previewImageURL: null
price: "FREE"
productId: "com.condenast.gqrussia.0313d"
publicationDate: Wed Feb 20 2013 00:00:00 GMT+0400 (MSK)
purchase: function () {
receipt: null
state: 200
targetDimensions: "1024x768"
title: "GQ 03.13"
toString: function () {
update: function () {
updatedSignal: Object
view: function () {
__proto__: Object

What is a html generation class in UIX?

There's a site that uses OAF, but some pages are not rendered well.
The problem is caused by the "<oa:messageFileUpload>" tag in the XML file.
A PC browser renders it properly, but IE Mobile 5 can't render it well.
However, in the HTML source it appears as an "<input type=file>" tag, and that HTML tag can be rendered in IE Mobile.
So maybe UIX has a problem in part of its HTML code generation.
I want to modify it and apply the change.
Can someone help or advise me?

Simply Singleton
Use your singletons wisely
When is a Singleton not a Singleton?
Java Glossary : singleton
How can I implement the Singleton pattern in the Java programming language?
Singleton Pattern
Java Singleton
Using the Singleton Pattern
Double-checked locking and the Singleton pattern

Unable to traverse children of child Nodes for w3c.dom class

Hi Folks,
I'm trying to traverse the children of child nodes, but unfortunately it's not working.
Here is a snippet of my code:
String xpath = "//*[starts-with(name(), 'PosRpt')]";
String xpath1 = "/FIXML/Batch/PosRpt/*";
try {
    // Get all matching elements
    NodeList nodelist = XPathAPI.selectNodeList(doc, xpath);
    // getLength() is never negative, so test for zero
    if (nodelist.getLength() == 0)
        System.out.println("Element not found");
    else
        System.out.println("xpath element found");
    // Process elements in nodelist
    for (int i = 0; i < nodelist.getLength(); i++) {
        Element element = (Element) nodelist.item(i);
        System.out.println("Element name - " + element.getTagName());
        GetAttributes(element);
        Node currentnode = (Node) nodelist.item(i);
        NodeList childlist = currentnode.getChildNodes();
        int xx = childlist.getLength();
        System.out.println("No of child elements = " + xx);
        for (int y = 0; y < childlist.getLength(); y++) {
            Element element1 = (Element) childlist.item(y);
            System.out.println(y + " Child Element name - " + element1.getTagName());
            GetAttributes(element1);
            if ("Pty".equals(element1.getTagName()) && element1.hasChildNodes()) {
                // THIS IS WHERE I THINK THE PROBLEM IS *****
                Node child = element1.getFirstChild();
                System.out.println("Node name - " + child.getNodeName() +
                    " and node value is = " + child.getNodeValue());
                NamedNodeMap attrs = child.getAttributes();
                if (child.hasAttributes()) {
                    for (int x = 0; x < attrs.getLength(); x++) {
                        Attr attr = (Attr) attrs.item(x);
                        System.out.println("attribute Name - " + attr.getNodeName());
                        System.out.println("attribute value - " + attr.getNodeValue());
                    }
                }
            }
        }
    }
} catch (Exception e) {
    e.printStackTrace();
}
Sample of XML..snippet
- <PosRpt RptID="273558126" BizDt="2005-07-13" ReqTyp="0" Ccy="USD">
<Pty ID="OCC" R="21" />
- <Pty ID="00299" R="4">

</Pty>
and this is what the result snippet is...
Child Element name - Pty
attribute Name - ID
attribute value - 00299
attribute Name - R
attribute value - 4
Node name - #text and node value is =
WHERE DID NODE #text COME FROM?
I'M BAFFLED.
ANY IDEAS??
THANKS IN ADVANCE.
RAJ

Alas, it is probably working fine.
What you are seeing in the "extra" text nodes are newlines.
Change your print statement from:
System.out.println("Node name - " + child.getNodeName()+
" and node value is = " + child.getNodeValue());
to
System.out.println("Node name - " + child.getNodeName()+
" and node value is =` " + child.getNodeValue() + "'");
You will see something like
and node value is =`
'
If I have
<a>
<b>xxx</b>
</a>
a will have three child nodes. The first and third will be text nodes containing a newline, and the second will be the b child element.
If I have
<a><b>xxx</b></a>
a will only have one child -- the element b.
Also be aware that if you are processing XML data with SAX and have large blocks of text in an element, there is no requirement for all parsers to return a single text object. They are free to decide how many text blocks they create.
Dave Patterson
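
The point about whitespace text nodes can be checked with a small sketch like the one below, which uses the JDK's built-in DOM parser without a DTD. The class and method names are made up for this example. It counts all children of the root and then only the element children, which is one common way to skip the #text nodes.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class WhitespaceNodes {

    // Parses the XML string and returns its root element.
    static Element parseRoot(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)))
                .getDocumentElement();
    }

    // Counts every child node, including whitespace-only text nodes.
    static int countChildren(Element e) {
        return e.getChildNodes().getLength();
    }

    // Counts only the children that are elements.
    static int countElementChildren(Element e) {
        NodeList kids = e.getChildNodes();
        int count = 0;
        for (int i = 0; i < kids.getLength(); i++) {
            if (kids.item(i).getNodeType() == Node.ELEMENT_NODE) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) throws Exception {
        Element spaced = parseRoot("<a>\n<b>xxx</b>\n</a>");
        Element compact = parseRoot("<a><b>xxx</b></a>");
        System.out.println("with newlines: " + countChildren(spaced)
                + " children, " + countElementChildren(spaced) + " element");
        System.out.println("without newlines: " + countChildren(compact) + " child");
    }
}
```

When walking a tree, checking getNodeType() before casting to Element avoids tripping over these text nodes.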

Interacting with DOM objects using HTML

Hi,
I want to listen to "text selection" on a webpage in AS (i.e. as soon as a user highlights some text and releases the mouse button, an event gets fired). Events for cut, copy, and paste are available, but none for "text selection". I have read about cross-scripting between AS and JavaScript and understand it, although I do not understand yet how it can solve my problem. Also, is there any simpler way to do it directly in AS, without getting into JavaScript?
Thank you very much
Rehan.

There's a non-standard onselectstart attribute (or "selectstart" event type) that looks to be the only HTML DOM event dispatched for text selection.
You can listen for such DOM events from ActionScript like so:
package {
    import flash.display.Sprite;
    import flash.display.StageAlign;
    import flash.display.StageScaleMode;
    import flash.events.Event;
    import flash.html.HTMLLoader;

    public class HTMLLoadderTest extends Sprite
    {
        private var html:HTMLLoader = new HTMLLoader();

        public function HTMLLoadderTest()
        {
            this.stage.scaleMode = StageScaleMode.NO_SCALE;
            this.stage.align = StageAlign.TOP_LEFT;
            html.loadString("<html><body>Something to select.</body></html>");
            html.width = this.stage.stageWidth;
            html.height = this.stage.stageHeight;
            html.addEventListener(Event.COMPLETE, completeHandler);
            addChild(html);
            stage.nativeWindow.activate();
        }

        private function completeHandler(event:Event):void {
            event.target.window.document.body.addEventListener("selectstart", reportSelection);
        }

        // Note that you have to use Object as the parameter type because the
        // JavaScript Event class is not the same as the ActionScript Event class
        private function reportSelection( event:Object ):void
        {
            trace(html.window.getSelection());
        }
    }
}

Can I use multiple <p class="logos"> tags with the same name within the same HTML page?

I was told not to use <div class> tags too many times. I was using them for text, for images, and to clear floats; I basically built my website using multiple <div class> tags. So if I can't use multiple <div class> tags, could I use <p class> tags multiple times in the same HTML page?
I have a string of logos at the bottom of my web page, which will all use the same CSS characteristics. Would this be the proper way to write the code?
HTML
<p class="logos">Logo1<a href="..."></a></p>
<p class="logos">Logo2<a href="..."></a></p>
<p class="logos">Logo3<a href="..."></a></p>
<p class="logos">Logo4<a href="..."></a></p>
<p class="logos">Logo5<a href="..."></a></p>
<p class="logos">Logo6<a href="..."></a></p>
CSS
.logos {
margin-left:10px;
}
Here's my website: http://www.darbymanufacturing.com/test_website/index.html - this is the website built with all div class tags
I restarted the website in order to write the code properly so that I don't come to errors when uploading on the server like I am having with the website link above.

Instead of writing something like this -
<p class="logos">Logo1<a href="..."></a></p>
<p class="logos">Logo2<a href="..."></a></p>
<p class="logos">Logo3<a href="..."></a></p>
<p class="logos">Logo4<a href="..."></a></p>
<p class="logos">Logo5<a href="..."></a></p>
<p class="logos">Logo6<a href="..."></a></p>
Why not have something like this -
<div id="logodiv">
<p>Logo1<a href="..."></a></p>
<p>Logo2<a href="..."></a></p>
<p>Logo3<a href="..."></a></p>
<p>Logo4<a href="..."></a></p>
<p>Logo5<a href="..."></a></p>
<p>Logo6<a href="..."></a></p>
</div>
with CSS like this -
#logodiv p { ... }

Implementing DOM Interface with existing Java classes

I had planned on using some tree-like Java classes as a Document Object Model, which would give me access to all sorts of XML and DOM tools like parsers and XSLT transformers. Initially, I thought all that would be necessary is to implement all the DOM interfaces in org.w3c.dom, and then I would have a set of classes that conformed to DOM Level 1. It was my understanding that interfaces such as DOMImplementation and Document would interface with various XML tools, allowing creation of a class that implements Document, and then Document would have its various factory methods that know how to create the various DOM nodes such as Element, Attr, Text, NamedNodeMap, NodeList, etc.
The problem I'm seeing now is that the JAXP specification (which is what the latest Xerces and Xalan tools conform to) has something called a DocumentBuilder and DocumentBuilderFactory that appear to be necessary to tell the framework what type of class to instantiate that implements the Document DOM interface. Those appear to have a lot of methods that deal with parsing of XML documents and I didn't really want to write or even subclass any existing Parsers in order to get the functionality of traversing and transforming a set of classes that implement the DOM interface.
Am I missing something here? Is it possible to plug in any (set of classes for) DOMImplementation and get them to work with the various DOM and XML tools out there?
Is there an easier way to allow parts of an application access to internal data structures but have the more generic tools or APIs, such as XSL transformers, access that same set of classes as a DOM with the generic DOM interface methods?
Can someone provide me with some guidance here? I'm in the process of finalizing some design on a system and need to know if this is possible or if I need to alter my design.
Thanks.

If I understand you correctly, I think I am working on a similar issue. I am unhappy with the methods the DOM provides for retrieving data from an XML file and for building a file. Our software has a bunch of code that uses these classes, and it is extremely ugly. My solution was to create a facade on top of the DOM model. Essentially, I have some simple classes that store all the pertinent info (for me) about the XML structure: the element or attribute name, its values and, in the case of an element, its children. This makes it easier for me to build and retrieve the data.
What I then built was a loader class and a builder class. The loader takes an XML file, parses it using the DOM classes, builds a structure using my classes, and returns the root element. The builder takes a root element and creates a DOM object out of it. This frees me from having to code around the DOM classes all over the place and makes it simple to upgrade our XML code if the DOM changes or a better DOM is released. I am using factories to facilitate this and to allow loaders for specific types of XML documents, so that I can have a class for each, which further simplifies the XML-related tasks of other developers on my team.
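
The poster did not share any code, so the following is only a rough sketch of what such a facade could look like, with a plain data class, a loader (DOM to simple objects) and a builder (simple objects back to DOM). All names (DomFacade, SimpleElement, load, build) are invented for this illustration, and it ignores details such as CDATA sections, comments and mixed content.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class DomFacade {

    // Plain data holder: element name, attributes, trimmed text, child elements.
    static class SimpleElement {
        final String name;
        final Map<String, String> attributes = new LinkedHashMap<>();
        final List<SimpleElement> children = new ArrayList<>();
        String text = "";

        SimpleElement(String name) { this.name = name; }
    }

    // Loader: parse XML with the DOM classes and return the simple structure.
    static SimpleElement load(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        return fromDom(doc.getDocumentElement());
    }

    private static SimpleElement fromDom(Element e) {
        SimpleElement s = new SimpleElement(e.getTagName());
        NamedNodeMap attrs = e.getAttributes();
        for (int i = 0; i < attrs.getLength(); i++) {
            s.attributes.put(attrs.item(i).getNodeName(), attrs.item(i).getNodeValue());
        }
        StringBuilder text = new StringBuilder();
        NodeList kids = e.getChildNodes();
        for (int i = 0; i < kids.getLength(); i++) {
            Node n = kids.item(i);
            if (n.getNodeType() == Node.ELEMENT_NODE) {
                s.children.add(fromDom((Element) n));
            } else if (n.getNodeType() == Node.TEXT_NODE) {
                text.append(n.getNodeValue());
            }
        }
        s.text = text.toString().trim();
        return s;
    }

    // Builder: turn the simple structure back into a DOM Document.
    static Document build(SimpleElement root) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        doc.appendChild(toDom(doc, root));
        return doc;
    }

    private static Element toDom(Document doc, SimpleElement s) {
        Element e = doc.createElement(s.name);
        for (Map.Entry<String, String> a : s.attributes.entrySet()) {
            e.setAttribute(a.getKey(), a.getValue());
        }
        if (!s.text.isEmpty()) {
            e.appendChild(doc.createTextNode(s.text));
        }
        for (SimpleElement child : s.children) {
            e.appendChild(toDom(doc, child));
        }
        return e;
    }

    public static void main(String[] args) throws Exception {
        SimpleElement order = load("<order id=\"7\"><item>pen</item></order>");
        System.out.println(order.name + " has " + order.children.size() + " child element(s)");
        Document doc = build(order);
        System.out.println("rebuilt root: " + doc.getDocumentElement().getTagName());
    }
}
```

Because the rest of the application only sees SimpleElement, the DOM-specific code stays in the two conversion methods, which is the isolation the reply describes.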

A little problem getting the style tag of a html file seperate from rest

I'm making a program that will take in a URL and then search through that URL for all a, link, embed, frame, and img tags, find their sources, and download them. I also want to search through the style and find anything that uses a URL (ex. background-image:url('somepic.jpg')) and download that file. In the end, you should be able to go to the directory you saved it all in, open index.html, and see an exact replica of the original site. Now, my problem is that my program isn't getting the style tag's contents. Here's my code:
import java.io.*;
import java.util.*;
import java.net.*;
public class Test {
    //-->>>> MAIN <<<<--//
    public static void main(String... a) {
        try {
            System.out.print("Enter URL: ");
            String target = new Scanner(System.in).next();
            URL url = null;
            try {
                url = new URL(target);
            } catch (MalformedURLException x) {
                url = new URL("http://" + target);
            }
            Scanner scan = new Scanner(url.openStream());
            scan.useDelimiter("<");
            ArrayList<String> tokens = new ArrayList<String>();
            while (scan.hasNext()) {
                String str = scan.next();
                str = str.trim();
                Scanner tags = new Scanner(str);
                if (tags.hasNext()) {
                    String tag = tags.next();
                    if (tag.equalsIgnoreCase("a") || tag.equalsIgnoreCase("img") || tag.equalsIgnoreCase("link") || tag.equalsIgnoreCase("embed") || tag.equalsIgnoreCase("frame")) {
                        tokens.add(str);
                    } else if (tag.equalsIgnoreCase("style")) {
                        tokens.add(str); // This isn't adding anything
                    }
                }
            }
            for (String str : tokens) {
                System.out.println(str);
            }
        } catch (UnknownHostException x) {
            System.err.println("Host not found.");
        } catch (Exception x) {
            x.printStackTrace();
        }
    }

    //-->>>> FindURLAttributes <<<<--// <--- Under construction
    private static ArrayList<String> findURLAttributes(String tag) {
        ArrayList<String> tokens = new ArrayList<String>();
        tokens.add(tag);
        return tokens;
    }
}

I've never tried it, but it seems like using an existing html parser would be a lot easier. I've worked with xml dom parsers, and it's not really that hard. I don't imagine working with an html dom would be too difficult either, at least it wouldn't be as hard as doing it by hand. Google for java html parser and see if any of them suit your needs.
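
An HTML parser would hand over the contents of the style element as plain text; pulling the url(...) references out of that text is a separate step. The sketch below shows one possible way to do that step with java.util.regex. It is not taken from the thread: the class name CssUrlFinder and the pattern are assumptions for illustration, and the pattern handles only simple quoted or unquoted url(...) values, not escapes or @import strings.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CssUrlFinder {

    // Matches url(...), with optional whitespace and optional single or double quotes.
    private static final Pattern URL_REF = Pattern.compile(
            "url\\(\\s*['\"]?([^'\")]+?)['\"]?\\s*\\)",
            Pattern.CASE_INSENSITIVE);

    // Returns every URL referenced through url(...) in the given CSS text.
    static List<String> findUrls(String css) {
        List<String> urls = new ArrayList<>();
        Matcher m = URL_REF.matcher(css);
        while (m.find()) {
            urls.add(m.group(1));
        }
        return urls;
    }

    public static void main(String[] args) {
        String css = "body { background-image:url('somepic.jpg'); }\n"
                   + "h1 { background: URL( img/bg.png ) no-repeat; }";
        for (String u : findUrls(css)) {
            System.out.println(u);
        }
    }
}
```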

HTML to XML Conversion ?

We developed a content presentation Java servlet implementing the xmlparser2.jar classes, and it works well. We're storing content (in XML format) as a blob; then, using the parser, we are able to transform the XML file to HTML for presentation.
stream = null;
String result = null;
URL URLStream = new URL(xmlIn);
ByteArrayOutputStream xbaos = new ByteArrayOutputStream();
if(mStylesheet.startsWith("http"))
stream = getURLInputStream(mStylesheet);
else
stream = new FileInputStream(mStylesheet);
XSLProcessor processor = new XSLProcessor();
DOMParser parser = new DOMParser();
parser.setValidationMode(false);
parser.setPreserveWhitespace(true);
parser.parse(in);
xdoc = parser.getDocument();
XSLStylesheet xss = new XSLStylesheet(stream, URLStream);
processor.processXSL(xss, xdoc, xbaos);
result = xbaos.toString();
parser.reset();
return result; // HTML conversion
We are evaluating using xslt to convert the XML to a form based medium for content maintenance. Wondering if once a XML document is parsed to HTML (DOM) can it be parsed back to XML for subsequent update to stored value in blob column. Specifically interested in conversion (parser) from HTML to XML
Simply can HTML (in DOM format validated against a xsd) be transformed back to XML ?

Do you know of a method in the xdk that takes a well formed HTML doc and using xsd / xslt convert back to original xml spec?
Because you created (and as long as you create) the HTML from XML it will be well formed (every tag will be ended with an end-tag) and you can therefore transform it back into XML.
Most of the time it will not be possible to convert HTML found on the 'internet' into XML, because that HTML is not well formed. For example, many people forget to end a paragraph of text within HTML with the </p> tag.
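
To illustrate the round trip described above, here is a minimal sketch that uses the standard JAXP transformation API (javax.xml.transform) rather than the Oracle XDK classes from the question. The two stylesheets, the article/title vocabulary and the class name are invented for this example; a real reverse stylesheet would have to mirror whatever the forward stylesheet produces.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class HtmlToXmlRoundTrip {

    // Hypothetical forward stylesheet: <article><title>..</title></article> to well-formed HTML.
    static final String TO_HTML =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
      + "<xsl:template match='/article'>"
      + "<html><body><h1><xsl:value-of select='title'/></h1></body></html>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    // Hypothetical reverse stylesheet: the generated HTML back to the original vocabulary.
    static final String TO_XML =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
      + "<xsl:template match='/html'>"
      + "<article><title><xsl:value-of select='body/h1'/></title></article>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    // Applies the given stylesheet to the given well-formed input document.
    static String transform(String xslt, String input) throws Exception {
        Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xslt)));
        StringWriter out = new StringWriter();
        t.transform(new StreamSource(new StringReader(input)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<article><title>Hello</title></article>";
        String html = transform(TO_HTML, xml);
        String back = transform(TO_XML, html);
        System.out.println(html);
        System.out.println(back);
    }
}
```

The reverse step only works because the HTML was generated as well-formed markup; HTML edited by hand or collected from arbitrary sites would usually need to be cleaned up first.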
