Get content from html page
Hey guys, I'm looking at accessing a web page, downloading the content, then stripping out the parts I want.
http://sunsolve.sun.com/search/document.do?assetkey=1-34-9-1
For example, I would like to be left with just the patches and their information, not the heading and intro. Where should I start?
Here is a class that can read a URL:
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLConnection;
import java.io.OutputStream;
import java.net.URLEncoder;
public class test {
    public static void main(String args[]) {
        if (args.length != 0) {
            new test(args);
        } else {
            new test();
        }
    }
    public test() {
        this.openURL("http://www.google.com", null);
    }
    public test(String[] args) {
        int i = 0;
        while (i < args.length) {
            // do something with the encoding, I am assuming UTF-8
            // but the openURL method can check the header for you
            try {
                System.out.println(new String(this.openURL(args[i], null), "UTF-8"));
            } catch (Exception e) {
                e.printStackTrace();
            }
            i++;
        }
    }
    public byte[] openURL(String urlpath, URL u) {
        // it is VERY important to read the entire response
        // if you want to connect to the same server again
        // this is because closing the inputstream does not close the socket
        // and response data from a previous request could be mixed up with the current
        InputStream is;
        byte[] buf = new byte[1024];
        URLConnection urlc = null;
        try {
            URL a = (u != null) ? u : new URL(urlpath);
            urlc = a.openConnection();
            urlc.setDoOutput(false);
            // either setDoOutput to false or POST some info
            // OutputStream os = urlc.getOutputStream();
            // String name = "key=" + URLEncoder.encode("value", "UTF-8");
            // os.write(name.getBytes("UTF-8"));
            // os.close();
            is = urlc.getInputStream();
            int len = 0;
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            while ((len = is.read(buf)) != -1) {
                bos.write(buf, 0, len);
            }
            // close the inputstream
            is.close();
            return bos.toByteArray();
        } catch (Exception e) {
            e.printStackTrace();
            try {
                // now failing to read the inputstream does not mean the server did not send
                // any data, here is how you can read that data, this is needed for the same
                // reason mentioned above.
                if (urlc instanceof HttpURLConnection) {
                    ((HttpURLConnection) urlc).getResponseCode();
                    InputStream es = ((HttpURLConnection) urlc).getErrorStream();
                    if (es != null) {
                        // read (and discard) the response body
                        while (es.read(buf) != -1) {
                        }
                        // close the errorstream
                        es.close();
                    }
                }
            } catch (IOException ex) {
                // deal with the exception
                ex.printStackTrace();
            }
        }
        return new byte[0];
    }
}
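Once the page has been downloaded, the "stripping out" step could be sketched roughly as follows. This is only a minimal sketch: it assumes the patch information sits in table cells (`<td>`), which I have not verified for the page in question, and the class name `CellExtractor` and the sample HTML are made up for illustration. It uses the HTML parser that ships with the JDK (`javax.swing.text.html.parser.ParserDelegator`) and ignores everything outside table cells, such as the heading and intro.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical helper: collects the text of every <td> cell in a page.
class CellExtractor extends HTMLEditorKit.ParserCallback {
    private final List<String> cells = new ArrayList<>();
    private final StringBuilder current = new StringBuilder();
    private boolean inCell = false;

    @Override
    public void handleStartTag(HTML.Tag t, MutableAttributeSet a, int pos) {
        if (t == HTML.Tag.TD) {
            inCell = true;          // start collecting text for this cell
            current.setLength(0);
        }
    }

    @Override
    public void handleText(char[] data, int pos) {
        if (inCell) {
            current.append(data);   // text outside cells is dropped
        }
    }

    @Override
    public void handleEndTag(HTML.Tag t, int pos) {
        if (t == HTML.Tag.TD && inCell) {
            inCell = false;
            cells.add(current.toString().trim());
        }
    }

    static List<String> extractCells(String html) {
        CellExtractor callback = new CellExtractor();
        try {
            // true = ignore charset directives in the page
            new ParserDelegator().parse(new StringReader(html), callback, true);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return callback.cells;
    }

    public static void main(String[] args) {
        // Sample input; in practice this would be new String(openURL(...), "UTF-8")
        String page = "<html><body><h1>Intro</h1><table>"
                + "<tr><td>123456-01</td><td>Kernel patch</td></tr>"
                + "</table></body></html>";
        for (String cell : extractCells(page)) {
            System.out.println(cell);
        }
    }
}
```

If the real page marks up the patches differently (lists, `<pre>` blocks), the tag checked in the three callbacks would have to change accordingly.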
Here is some code to set a proxy:
// IF YOUR PROXY NEEDS AUTHENTICATION
// The Base64Encoder is part of the W3C tools:
// download Jigsaw and look for the base64... file
// http://www.google.nl/search?hl=nl&q=site%3Aw3c.org+jigsaw&lr=
// compile it and put it in [jre home]/lib/ext
// and put this jar file in the classpath when you compile
String proxyUrl = "myproxy";
String user = "myUser";
String password = "myPassword";
// url is the java.net.URL you want to open
URLConnection conn = url.openConnection();
if (proxyUrl != null) {
    System.getProperties().put("proxySet", "true");
    System.getProperties().put("proxyHost", proxyUrl);
    System.getProperties().put("proxyPort", "80");
}
// optional: only needed if the proxy asks for authentication
String pwd = user + ":" + password;
// encode "user:password", not the password alone
Base64Encoder enc = new Base64Encoder(pwd);
String encodedPassword = enc.processString();
// Basic authentication expects the "Basic " prefix before the encoded credentials
conn.setRequestProperty("Proxy-Authorization", "Basic " + encodedPassword);
// start opening output or inputstream on the connection
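On Java 8 or later, the same idea can be sketched without the Jigsaw encoder, using `java.util.Base64` and a per-connection `java.net.Proxy` instead of system properties. This is a sketch under assumptions, not a drop-in replacement: the class name `ProxyAuth`, the host `myproxy`, port 80 and the credentials are placeholders, and whether a manually set `Proxy-Authorization` header is honoured can depend on the JRE version and its settings.

```java
import java.net.InetSocketAddress;
import java.net.Proxy;
import java.net.URL;
import java.net.URLConnection;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical helper for building a proxied connection with Basic credentials.
class ProxyAuth {
    // Builds the value of the Proxy-Authorization header: "Basic base64(user:password)"
    static String basicAuthHeader(String user, String password) {
        String credentials = user + ":" + password;
        return "Basic " + Base64.getEncoder()
                .encodeToString(credentials.getBytes(StandardCharsets.UTF_8));
    }

    // Describes an HTTP proxy; the host name is only resolved when connecting
    static Proxy httpProxy(String host, int port) {
        return new Proxy(Proxy.Type.HTTP, InetSocketAddress.createUnresolved(host, port));
    }

    public static void main(String[] args) throws Exception {
        URL url = new URL("http://www.google.com");
        // openConnection(Proxy) applies the proxy to this connection only
        URLConnection conn = url.openConnection(httpProxy("myproxy", 80));
        conn.setRequestProperty("Proxy-Authorization",
                basicAuthHeader("myUser", "myPassword"));
        // from here, open the input stream on conn as in openURL above
        System.out.println("Connection prepared for " + conn.getURL());
    }
}
```

Passing the `Proxy` to `openConnection` keeps the setting local to one connection, whereas the system properties above affect every connection made by the JVM.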
Similar Messages
-
Get parameters from html page from java application standalone ...
Hi all,
I am working on a solution where I have values in an HTML page, and I want to get the parameter values from the HTML and catch them in a standalone Java application.
The HTML page is on the same host as the Java application.
I want to know whether this is possible, that is, whether I can get the parameters from a plain HTML page without an HttpServlet.
Thanks in advance for any ideas,
Antonio.
Hi Abdul,
The problem is that my client wants a solution with one simple HTML page and one standalone Java application. The application runs on one machine, but we don't have a web server. So the question is: is it possible, without a web server, to get the parameter values that are inside the HTML page from the Java application? Remember that the Java application is a .jar that is run from crontab with the command line "java -jar teste.jar". -
Help with getting links from HTML page
Hello all. I found the Sun tutorial for getting HREF values from A tags in an HTML document at <http://java.sun.com/developer/TechTips/1999/tt0923.html>. My question now is: how would a person add the ability to get the text of the link to this code?
For example, given the HTML code:
<a href="link.html">example</a>
the returned result would be:
href=link.html text=example
I think the TechTip you've linked to is quite old (1999). I would write a simple SAX handler and use TagSoup (http://www.ccil.org/~cowan/XML/tagsoup/) as the parser feeding it. In your handler, simply set a flag and reset a StringBuffer to collect the contents of any <a>...</a> element. Simplified:
public void startElement(String uri, String localName, String qName, Attributes attributes) throws SAXException {
    if ("a".equals(localName)) {
        currentHref = attributes.getValue("href");
        if (currentHref != null && currentHref.length() > 0) {
            inLink = true;
            // reset the string buffer
            buffer.setLength(0);
        }
    }
}
public void characters(char[] ch, int start, int length) throws SAXException {
    if (inLink) buffer.append(ch, start, length);
}
public void endElement(String uri, String localName, String qName) throws SAXException {
    if ("a".equals(localName) && inLink) {
        inLink = false;
        // add link to the list
        links.add(new Link(currentHref, buffer.toString()));
    }
}
Completely untested, of course. Good luck... -
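The handler outlined above can be tried out in a self-contained form roughly like this. It is a sketch with two stated assumptions: the input is well-formed XHTML, so the SAX parser bundled with the JDK is used instead of TagSoup (for real-world HTML you would swap in TagSoup's parser), and the class name `LinkCollector` and the `String[]` pair standing in for the `Link` class are made up for illustration. Because the default JDK parser is not namespace-aware, the sketch checks `qName` rather than `localName`.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: records {href, text} for every <a href="..."> element.
class LinkCollector extends DefaultHandler {
    final List<String[]> links = new ArrayList<>();
    private final StringBuilder buffer = new StringBuilder();
    private String currentHref;
    private boolean inLink;

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes) {
        if ("a".equalsIgnoreCase(qName)) {
            currentHref = attributes.getValue("href");
            if (currentHref != null && !currentHref.isEmpty()) {
                inLink = true;
                buffer.setLength(0);   // reset the buffer for this link's text
            }
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (inLink) {
            buffer.append(ch, start, length);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if ("a".equalsIgnoreCase(qName) && inLink) {
            inLink = false;
            links.add(new String[] { currentHref, buffer.toString() });
        }
    }

    static List<String[]> collect(String xhtml) {
        try {
            LinkCollector handler = new LinkCollector();
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xhtml)), handler);
            return handler.links;
        } catch (Exception e) {
            throw new IllegalStateException("could not parse input", e);
        }
    }

    public static void main(String[] args) {
        for (String[] link : collect("<p><a href=\"link.html\">example</a></p>")) {
            System.out.println("href=" + link[0] + " text=" + link[1]);
        }
    }
}
```

Anchors without an `href` are skipped, matching the null check in the original outline.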
Values not getting displayed from first page of the report.
Values in the report are displayed only from the second page onwards.
The first page of the report displays only the report title and column names.
From the second page onwards, both data and column names are generated.
Can anyone please help me find the cause of the problem?
What reporting tool?
Interactive Reporting
Financial Reporting -
How do you get content from one iPod to a new one? My content is on an external hard drive, not on my PC, and I have run out of space on my 120 GB Classic. Can you get old iPod content onto a new one? My iTunes has only got shortcuts; the real content is on an external drive. Can this be done? Please help.
If the content is on an external drive, but your library knows where to find it, then it should all work. Connect your device, make some selections for what to put on it, and sync. If, on the other hand, your current iPod is the only place holding some of your media then see this user tip: Recover your iTunes library from your iPod or iOS device.
tt2 -
How can I get content from my iPad and my air book to show up on the tv screen using Apple TV, without going thru iTunes?
You will need to use AirPlay to see that.
Assuming both devices are on the same network and that AirPlay is not turned off on the Apple TV, then simply tap on the screen when you are watching content you wish to stream to your Apple TV, then tap the airplay icon that appears in the control bar, choose the Apple TV from the menu that appears.
When displaying the content you wish to mirror on the iPad 2 (or better), iPad Mini, iPhone 4S (or better), double tap the home button (quickly) and swipe the bottom row of apps to the right to reveal the playback controls, tap the AirPlay icon and select your Apple TV from the list of available devices. -
Read Text from HTML-Pages and want to solve "ChangedCharSetException"
Hello,
I have an app that connects to pages via threads, parses them, and gives me only the text version of an HTML page. It works fine, but if it finds a page where the text is within images, the whole app stops and gives me the message:
javax.swing.text.ChangedCharSetException
at javax.swing.text.html.parser.DocumentParser.handleEmptyTag(DocumentParser.java:169)
at javax.swing.text.html.parser.Parser.startTag(Parser.java:372)
at javax.swing.text.html.parser.Parser.parseTag(Parser.java:1846)
at javax.swing.text.html.parser.Parser.parseContent(Parser.java:1881)
at javax.swing.text.html.parser.Parser.parse(Parser.java:2047)
at javax.swing.text.html.parser.DocumentParser.parse(DocumentParser.java:106)
at javax.swing.text.html.parser.ParserDelegator.parse(ParserDelegator.java:78)
at aufruf.main(aufruf.java:33)
So I tried to catch it with "getCharSetSpec()" and "keyEqualsCharSet()" from the class "javax.swing.text.ChangedCharSetException" and hoped that this would solve the problem. But it still doesn't work...
Then I searched the web and found that I have to add the line:
doc.putProperty("IgnoreCharsetDirective", new Boolean(true));
"doc" is a new HTML document created with the HTMLEditorKit. I do not have much knowledge about that, so I hope someone can explain how I can solve this problem within my code.
Here we go:
import javax.swing.text.*;
import java.lang.*;
import java.util.*;
import java.net.*;
import java.io.*;
import javax.swing.text.html.*;
import javax.swing.text.html.parser.*;
public class myParser extends Thread
{
    private String name;
    public void run()
    {
        try
        {
            URL viele = new URL(name); // "name" is a variable holding one of many links
            URLConnection hs = viele.openConnection();
            hs.connect();
            if (hs.getContentType().startsWith("text/html"))
            {
                InputStream is = hs.getInputStream();
                InputStreamReader isr = new InputStreamReader(is);
                BufferedReader br = new BufferedReader(isr);
                Lesen los = new Lesen();
                ParserDelegator parser = new ParserDelegator();
                parser.parse(br,los, false);
            }
        }
        catch (MalformedURLException e)
        {
            System.err.print("Doesn't work");
        }
        catch (ChangedCharSetException e)
        {
            e.getCharSetSpec();
            e.keyEqualsCharSet();
            e.printStackTrace();
        }
        catch (Exception o)
        {
        }
    }
    public void vowi(String n)
    {
        name = n;
    }
}
And in case it is important, here is the class "Lesen":
import java.net.*;
import java.io.*;
import javax.swing.text.*;
import javax.swing.text.html.*;
import javax.swing.text.html.parser.*;
class Lesen extends HTMLEditorKit.ParserCallback
{
    public void handleStartTag(HTML.Tag t, MutableAttributeSet a, int pos)
    {
        try
        {
            if ((t==HTML.Tag.P) || (t==HTML.Tag.H1) || (t==HTML.Tag.H2) || (t==HTML.Tag.H3) || (t==HTML.Tag.H4) || (t==HTML.Tag.H5) || (t==HTML.Tag.H6))
                System.out.println();
        }
        catch (Exception q)
        {
            System.out.println(q.getMessage());
        }
    }
    public void handleSimpleTag(HTML.Tag t,MutableAttributeSet a, int pos)
    {
        try
        {
            if (t==HTML.Tag.BR)
            {
                System.out.println(); // new line
                System.out.println();
            }
        }
        catch (Exception qw)
        {
            System.out.println(qw.getMessage());
        }
    }
    public void handleText(char[] data, int pos)
    {
        try
        {
            System.out.print(data); // prints the text from HTML pages
        }
        catch (Exception ab)
        {
            System.out.println(ab.getMessage());
        }
    }
}
Thanks a lot for helping...
Stephan
Change
parser.parse(br,los, false);
to
parser.parse(br,los, true);
The third argument is ignoreCharSet; when it is true, the parser ignores the charset directive instead of throwing ChangedCharSetException. -
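The effect of the third argument can be checked in isolation with a small sketch like the one below. The class name `TextOnly` and the sample page are made up for illustration; the page contains a `meta` charset directive, which is the kind of tag that leads to the ChangedCharSetException when charset directives are not ignored.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical callback: collects only the text content of a page.
class TextOnly extends HTMLEditorKit.ParserCallback {
    private final StringBuilder text = new StringBuilder();

    @Override
    public void handleText(char[] data, int pos) {
        text.append(data).append('\n');
    }

    static String extract(String html) {
        TextOnly callback = new TextOnly();
        try {
            // third argument true = ignore charset directives in the document
            new ParserDelegator().parse(new StringReader(html), callback, true);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return callback.text.toString();
    }

    public static void main(String[] args) {
        String page = "<html><head>"
                + "<meta http-equiv=\"Content-Type\" content=\"text/html; charset=ISO-8859-1\">"
                + "</head><body><p>Hello</p></body></html>";
        System.out.print(extract(page));
    }
}
```

Ignoring the directive means the Reader's own encoding is trusted, so the InputStreamReader should be created with the charset the server actually reports if that matters for the pages being read.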
Parsing the FRAME tag from HTML pages
Hello to everybody,
I am trying to parse the A tags and the FRAME tags from HTML pages. I have developed the code below, which works for the A tags but does not work for the FRAME tags. Does anyone have an idea why?
private void getLinks() throws Exception {
    System.out.println(diskName);
    links=new ArrayList();
    frames=new ArrayList();
    BufferedReader rd = new BufferedReader(new FileReader(diskName));
    // Parse the HTML
    EditorKit kit = new HTMLEditorKit();
    HTMLDocument doc = (HTMLDocument)kit.createDefaultDocument();
    doc.putProperty("IgnoreCharsetDirective", new Boolean(true));
    try {
        kit.read(rd, doc, 0);
    }
    catch (RuntimeException e) {return;}
    // Find all the FRAME elements in the HTML document; it finds nothing
    HTMLDocument.Iterator it = doc.getIterator(HTML.Tag.FRAME);
    while(it.isValid()) {
        SimpleAttributeSet s = (SimpleAttributeSet)it.getAttributes();
        String frameSrc = (String)s.getAttribute(HTML.Attribute.SRC);
        frames.add(frameSrc);
        // advance the iterator, otherwise the loop never ends once a FRAME is found
        it.next();
    }
    // Find all the A elements in the HTML document; this works OK
    it = doc.getIterator(HTML.Tag.A);
    while (it.isValid()) {
        SimpleAttributeSet s = (SimpleAttributeSet)it.getAttributes();
        String link = (String)s.getAttribute(HTML.Attribute.HREF);
        int endOfSet=it.getEndOffset(),
            startOfSet=it.getStartOffset();
        String text=doc.getText(startOfSet,endOfSet-startOfSet);
        if (link != null)
            links.add(new Link(link,text));
        it.next();
    }
} -
Hi,
How can we log in to xMII from an HTML page? For example, if I enter a username and password in an HTML page, it needs to log in to xMII directly. How can this be done?
- senthil
Jeremy,
When I use the following URL to open a specific page, it works.
http://server/Lighthammer/Login.jsp?IllumLoginName=accountname&IllumLoginPassword=accountpassword&session=true&target=/Test/report.irpt
Question:
1) This URL opens the HTML or irpt page directly, without the associated xMII navigation/menu and navigation bar. Is it possible to open the page with the associated xMII menu/navigation through a URL? -
XMII Login from Html page under xMII Version 12
Hi,
I found this thread
xMII Login from Html page
but I'm not sure this will also work under xMII Version 12.
I have now this question:
Is it possible to use a URL login with login name and password under xMII V12, similar to this example for Version 11.5:
http://server/Lighthammer/Login.jsp?IllumLoginName=accountname&IllumLoginPassword=accountpassword&session=true&target=/Test/report.irpt
Many thanks in advance
Hi,
Has anyone had any luck with this - displaying a v12 MII screen without requiring login?
We need to be able to do this as well in order to display read-only screens on large screen monitors on the manufacturing floor without requiring login to MII.
Under v11.5 it worked with no issues. Under v12, we haven't figured out how to do it yet.
I've waded through the NetWeaver UME documentation, have searched through the NetWeaver forums, etc. but to this point have had no luck in making it work.
We've tried enabling the UME Guest account, assigning Guest to the anonymous group and guest role (and xMII Users role), creating a Navigation for the Guest user, but still the NetWeaver login screen is displayed.
MII experts - if you are aware of how to do this can you please give detailed instructions instead of just referencing the NetWeaver / UME documentation?
Thank you for your help! -
How do I get content from my old iPod Touch to my new iPhone 5? My iPod Touch is too old to use iCloud. I have activated the phone but am stuck at the point of the set-up where my 3 choices are:
1. Set up as new phone
2. Restore from iCloud Backup
3. Restore from iTunes Backup
Thanks!
If your iPod touch music is in your iTunes library, you just need to sync the new iPod nano with iTunes, and the songs will be transferred to your new iPod. If not, you can follow this guide to transfer the songs from the iPod touch to iTunes first, and then re-sync them to your new iPod. Hope it helps. Feel free to email me if you need further help.
-
Need to Copy a subform along with content from one page to another page
Hi All,
I am new to Adobe LiveCycle.
I am facing a problem in one particular scenario.
I have a growing list of items, i.e. the number of items is uncertain. I have put all these items in a subform.
Now I need a copy of this subform from the first page on the second page.
Basically, I want to copy a subform along with its content from one page to another.
Can anybody please help me?
In the source project, open the Tempo List (the one that is a list editor). Select all tempo changes and copy them (Command+C).
Close the project.
Open the destination project, open the Tempo List, delete all information and paste (Command+V). Remember that Logic should be stopped at the exact position where the first tempo event happens. This is usually 1.1.1.1, but check it in the source before closing it.
Hope this helps.
regards -
Hello,
Can I write a script in the ExtendScript Toolkit (this script places a document into the InDesign application) and run this script from an HTML page?
Or is there any integration between ExtendScript and HTML?
Thanks
HTML pages are usually displayed in web browsers, whose security model is designed specifically to prevent any access to local resources, especially calls to local applications.
Besides browsers, on the Mac the Dashboard allows you to write little applications ("widgets") exactly the way you're suggesting. It uses WebKit to display HTML.
http://www.apple.com/downloads/dashboard/
In order to communicate with extendscript, you'd have to go through the operating system's command line.
http://developer.apple.com/documentation/AppleApplications/Conceptual/Dashboard_ProgTopics/Articles/CommandLine.html
From there you'd invoke the scripting system with "OSAScript"
http://developer.apple.com/documentation/Darwin/Reference/Manpages/man1/osascript.1.html
then pass on your extendscript and arguments into InDesign via doScript.
Another alternative to the main HTML browser is AIR, which also has an instance of WebKit for HTML display.
http://livedocs.adobe.com/labs/air/1/quickstartshtml/
Apparently it is possible to invoke BridgeTalk - the Creative Suite's inter application communication used by extendscript - straight from AIR. I don't yet have references how to do that, found the link below just yesterday.
http://www.inthemod.com/bps/?p=165
Dirk -
How to send information from HTML page to JSP without reloading HTML page?
Hello,
Is it possible to send information(row number selected by user) from HTML page to JSP without reloading HTML page?
Thanks.
Oleg.
Yes, you can do this with framesets and a hidden frame.
You need a bit of JavaScript in the "visible" frame that sets the location of the hidden frame to the JSP.
Add the user's choice as a parameter to the JSP URL. -
How do i get content from my ipod touch to my new desk top computer
how do i get content from my ipod touch to my new desktop computer?
For iTunes purchases > iTunes Store: Transferring purchases from your iPhone, iPad, or iPod to a computer
Maybe you are looking for
-
My shared folder is not showing on the left side of my itunes.
I recently got a media server connected to my wireless router and was able to configure it to be used in itunes. My problem is that my Itunes in my mac book pro does not show the shared folders on the left side no matter what I do. In my wife's ima
-
Anyone facing problem while downloading ios 7, Numbers can't update to icloud
Numbers can't update ontime after downloading ios 7. Anyone have solution for this problem. Would appreciate your kind sharing.
-
Unable to Reply to Messages in Safari 4.0.2
I've been using Safari 4.0.2 (Mac OS X) to reply to messages in the Acrobat and InDesign forums. Suddenly, two days ago, I've lost that ability. (I can still reply in Firefox 3.5.1, so I can post this here. When I click Reply in Safari, the Post Repl
-
HT4623 I updated my iPhone 5 just now, what are the new features?
What are the new features in the recent update?
-
Why does currency type change in my template
Im using Pages in iWork-09 and though I specify a currency through the inspector (EUR GBP USD or Skr) before saving the document as a template - it still shows up as Skr when I open the template. I keep having to save the templates as .pages instead