How to extract elements from an HTML page?

Hi:
If we have the following returned page which is the web query result from a library: http://library.fsi.uiuc.edu/dbtw-wpd/exec/dbtwpub.dll?XC=%2Fdbtw-wpd%2Fexec%2Fdbtwpub.dll&BU=http%3A%2F%2Flibrary.fsi.uiuc.edu%2Fdbtw-wpd%2FnewWeb%2Fauthor.asp&submit=Submit+Query&QB0=AND&QF0=Author&QI0=william&QB1=AND&QF1=Corporate+Author&QI1=&MR=20&TN=Catalog&DF=Display-Web&RF=Brief+Title+-+Web&DL=1&RL=1&NP=1&AC=QBE_QUERY
If we want to store all the returned information in the local memory,
how can Java be used to extract the records out of the html page?
What kind of classes should be used?
For example, should I just use regular expression to do this (java.util.regex class),
build-in html parser (javax.swing.text.html.parser.Parser, or even javax.swing.text.html.HTMLEditorKit.Parser), or any other classes? Thanks.

I been working on a simular project but i might not be using a method that you would want.
I am reading the webpage code and then getting the data by looking for certain words. The problem with this is that it is reading the page "blindly" so it could read the wrong data or screw up on an error on the page. but here is a bit of code i am using. I am using ti to get data that appears in " " (quotes) so i can get urls, images, etc
This code isnt totaly mine, i amended it from some one and dont know the original owner :(
try {
        is =                    new BufferedInputStream(new URL(jTextField1.getText()).openStream());
      while ((record = is.read()) != -1) {
        buffer.append((char) record);
        length++;
        System.out.print((char)record);
      }     is.close();
} catch // error management herei am using a seperate thread running at same time to enter data
do {
        while (lengthposition != length) {
          //System.out.print(buffer.charAt(lengthposition));
          temp[0] = "" + buffer.charAt(lengthposition);
          if (buffer.charAt(lengthposition) != /*34*/64 && tolist == true)
            temp[1] += ""+ buffer.charAt(lengthposition);
          else if (buffer.charAt(lengthposition) == /*34*/64 && tolist == false)
            tolist = true;
          else if (buffer.charAt(lengthposition) == /*34*/64 && tolist == true) {
            tolist = false;
            list1.addItem(temp[1]);
            temp[1] = "";
          lengthposition++;
      } while (buffer.charAt(lengthposition) != 32);I hope you find out a solution to your problem :)

Similar Messages

  • How to extract elements from a document

    I'm new to Java and I'm using JDOM in a JSP page. I've a document like this:
    <OFFERTA>
    <FACOLTA idFacolta="F1">
    <CORSO value="xxx"/>
    <CORSO value="yyy"/>
    </FACOLTA>
    <FACOLTA idFacolta="F2">
    <CORSO value="zzz"/>
    </FACOLTA>
    <FACOLTA idFacolta="F3">
    </FACOLTA>
    <FACOLTA idFacolta="F4">
    </FACOLTA>
    </OFFERTA>
    I'd like to get a document with the same structure but with the only FACOLTA elements that match a requested value for the idFacolta attribute.
    For example, if the requested idFacolta is "F2", I'd like to get:
    <OFFERTA>
    <FACOLTA idFacolta="F2">
    <CORSO value="zzz"/>
    </FACOLTA>
    </OFFERTA>
    I check in a loop if the element matches the idFacolta requested but when I find and try to delete it I get:
    java.util.ConcurrentModificationException
         at org.jdom.ContentList$FilterListIterator.checkConcurrentModification(ContentList.java:1230)
         at org.jdom.ContentList$FilterListIterator.hasNext(ContentList.java:942)
    Here's my code:
    Document doc = builder.build(file);
    Element root = doc.getRootElement();
    List children = root.getChildren();
    Iterator facoltaIterator = children.iterator();
    // Check if there's a request
    if (id!=null) {
         while (facoltaIterator.hasNext()) {
              Element facolta = (Element) facoltaIterator.next();
              if (!facolta.getAttribute("idFacolta").getValue().equalsIgnoreCase(id)) {
                   // Delete
                   root.removeContent(facolta);
    All seems to be right, except the line
    root.removeContent(facolta);
    So, I tried to make a new document for the results, adding the element facolta:
    Document doc = builder.build(file);
    Element root = doc.getRootElement();
    // Create a new document with the same root
    Element rootRisultati = new Element("OFFERTA");
    Document docRisultati = new Document(rootRisultati);
    List children = root.getChildren();
    Iterator facoltaIterator = children.iterator();
    if (id!=null) {
         while (facoltaIterator.hasNext()) {
              Element facolta = (Element) facoltaIterator.next();
              if (facolta.getAttribute("idFacolta").getValue().equalsIgnoreCase(id)) {
                   // Add the element to the new document
                   rootRisultati.addContent(facolta);
    but this causes a different exception:
    org.jdom.IllegalAddException: The element already has an existing parent "OFFERTA"
         at org.jdom.ContentList.add(ContentList.java:190)
         at org.jdom.ContentList.add(ContentList.java:146)
         at java.util.AbstractList.add(AbstractList.java:84)
         at org.jdom.Element.addContent(Element.java:1062)
    I'm not able to find a solution.
    There's a commonly used procedure to select elements maintaing the structure of the document?
    Can anyone help please? Thanks in advance.
    Stefano

    maybe you could try posting this in a JDOM forum? JDOM is not a standard XML tool from Sun / for Java... therefore out of the scope of this forum!

  • How to extract content from a html-document

    Hi!
    I need to display the content of a news-feed
    like: http://news.bbc.co.uk/go/rss/-/1/hi/sci/tech/7662565.stm
    I'd like to display this in a JEditorPane with only the relevant content without the banner, menu and so on.
    Any idea on how I can do this?
    Cheers

    In this particular case, you don't want to parse the HTML but the RSS feed used to build this this page. The feed for this page is [http://newsrss.bbc.co.uk/rss/newsonline_uk_edition/sci/tech/rss.xml]
    RSS feeds are XML based so it's easy to parse them. You can do it by hand or use a specialized library. One of the well-known library is ROME ( [http://wiki.java.net/bin/view/Javawsxml/Rome] ).
    Example : [Parse a RSS XML file|http://www.rgagnon.com/javadetails/java-0557.html]
    Bye.

  • How ias integrate with Snacktory for getting main text from an html page

    Hi All,
    i am new to endeca and ias, i have an requirement, need to get main text from whole html page before ias save text to Endeca_Document_Text property,
    as ias save all text in page to endeca_document_text property, it is not ok for reading when show in web page, i use an third party API to filter out the main text from original page,
    now i want to save these text to endeca_document_text property,
    an another question,
    i get zero page when doing the logic of filtering main text from original html text in ParseFilter( HTMLMetatagFilter implements ParseFilter) using Snacktory.
    if only do little things, it will work fine, if do more thing, clawer fail to crawl page. any one know how to fix it.
    log for clawler.
    Successfully set recordstore configuration.
    INFO    2013-09-03 00:56:42,743    0    com.endeca.eidi.web.Main    [main]    Reading seed URLs from: /home/oracle/oracle/endeca/IAS/3.0.0/sample/myfirstcrawl/conf/endeca.lst
    INFO    2013-09-03 00:56:42,744    1    com.endeca.eidi.web.Main    [main]    Seed URLs: [http://www.liferay.com/community/forums/-/message_boards/category/]
    INFO    2013-09-03 00:56:43,497    754    com.endeca.eidi.web.db.CrawlDbFactory    [main]    Initialized crawldb: com.endeca.eidi.web.db.BufferedDerbyCrawlDb
    INFO    2013-09-03 00:56:43,498    755    com.endeca.eidi.web.Crawler    [main]    Using executor settings: numThreads = 100, maxThreadsPerHost=1
    INFO    2013-09-03 00:56:44,163    1420    com.endeca.eidi.web.Crawler    [main]    Fetching seed URLs.
    INFO    2013-09-03 00:56:46,519    3776    com.endeca.eidi.web.parse.HTMLMetatagFilter    [pool-1-thread-1]    come into EndecaHtmlParser getParse
    INFO    2013-09-03 00:56:46,519    3776    com.endeca.eidi.web.parse.HTMLMetatagFilter    [pool-1-thread-1]    come into HTMLMetatagFilter
    INFO    2013-09-03 00:56:46,519    3776    com.endeca.eidi.web.parse.HTMLMetatagFilter    [pool-1-thread-1]    meta tag viewport ==minimum-scale=1.0, width=device-width
    INFO    2013-09-03 00:56:52,889    10146    com.endeca.eidi.web.parse.HTMLMetatagFilter    [pool-1-thread-1]    come into EndecaHtmlParser getParse
    INFO    2013-09-03 00:56:52,889    10146    com.endeca.eidi.web.parse.HTMLMetatagFilter    [pool-1-thread-1]    come into HTMLMetatagFilter
    INFO    2013-09-03 00:56:52,890    10147    com.endeca.eidi.web.parse.HTMLMetatagFilter    [pool-1-thread-1]    meta tag viewport ==minimum-scale=1.0, width=device-width
    INFO    2013-09-03 00:56:59,184    16441    com.endeca.eidi.web.parse.HTMLMetatagFilter    [pool-1-thread-2]    come into EndecaHtmlParser getParse
    INFO    2013-09-03 00:56:59,185    16442    com.endeca.eidi.web.parse.HTMLMetatagFilter    [pool-1-thread-2]    come into HTMLMetatagFilter
    INFO    2013-09-03 00:56:59,185    16442    com.endeca.eidi.web.parse.HTMLMetatagFilter    [pool-1-thread-2]    meta tag viewport ==minimum-scale=1.0, width=device-width
    INFO    2013-09-03 00:57:07,057    24314    com.endeca.eidi.web.parse.HTMLMetatagFilter    [pool-1-thread-2]    come into EndecaHtmlParser getParse
    INFO    2013-09-03 00:57:07,057    24314    com.endeca.eidi.web.parse.HTMLMetatagFilter    [pool-1-thread-2]    come into HTMLMetatagFilter
    INFO    2013-09-03 00:57:07,057    24314    com.endeca.eidi.web.parse.HTMLMetatagFilter    [pool-1-thread-2]    meta tag viewport ==minimum-scale=1.0, width=device-width
    INFO    2013-09-03 00:57:07,058    24315    com.endeca.eidi.web.Crawler    [main]    Seeds complete.
    INFO    2013-09-03 00:57:07,090    24347    com.endeca.eidi.web.Crawler    [main]    Starting crawler shut down
    INFO    2013-09-03 00:57:07,095    24352    com.endeca.eidi.web.Crawler    [main]    Waiting for running threads to complete
    INFO    2013-09-03 00:57:07,095    24352    com.endeca.eidi.web.Crawler    [main]    Progress: Level: Cumulative crawl summary (level)
    INFO    2013-09-03 00:57:07,095    24352    com.endeca.eidi.web.Crawler    [main]    host-summary: www.liferay.com to depth 1
    host    depth    completed    total    blocks
    www.liferay.com    0    0    1    1
    www.liferay.com    1    0    0    0
    www.liferay.com    all    0    1    1
    INFO    2013-09-03 00:57:07,096    24353    com.endeca.eidi.web.Crawler    [main]    host-summary: total crawled: 0 completed. 1 total.
    INFO    2013-09-03 00:57:07,096    24353    com.endeca.eidi.web.Crawler    [main]    Shutting down CrawlDb
    INFO    2013-09-03 00:57:07,160    24417    com.endeca.eidi.web.Crawler    [main]    Progress: Host: Cumulative crawl summary (host)
    INFO    2013-09-03 00:57:07,162    24419    com.endeca.eidi.web.Crawler    [main]   Host: www.liferay.com:  0 fetched. 0.0 mB. 0 records. 0 redirected. 4 retried. 0 gone. 0 filtered.
    INFO    2013-09-03 00:57:07,162    24419    com.endeca.eidi.web.Crawler    [main]    Progress: Perf: All (cumulative) 23.6s. 0.0 Pages/s. 0.0 kB/s. 0 fetched. 0.0 mB. 0 records. 0 redirected. 4 retried. 0 gone. 0 filtered.
    INFO    2013-09-03 00:57:07,162    24419    com.endeca.eidi.web.Crawler    [main]    Crawl complete.
    ~/oracle/endeca
    -======================================
    source code for parsefilter
    package com.endeca.eidi.web.parse;
    import java.util.Map;
    import java.util.Properties;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.log4j.Logger;
    import org.apache.nutch.metadata.Metadata;
    import org.apache.nutch.parse.HTMLMetaTags;
    import org.apache.nutch.parse.Parse;
    import org.apache.nutch.parse.ParseData;
    import org.apache.nutch.parse.ParseFilter;
    import org.apache.nutch.protocol.Content;
    import de.jetwick.snacktory.ArticleTextExtractor;
    import de.jetwick.snacktory.JResult;
    public class HTMLMetatagFilter implements ParseFilter {
        public static String METATAG_PROPERTY_NAME_PREFIX = "Endeca.Document.HTML.MetaTag.";
        public static String CONTENT_TYPE = "text/html";
        private static final Logger logger = Logger.getLogger(HTMLMetatagFilter.class);
        public Parse filter(Content content, Parse parse) throws Exception {
            logger.info("come into EndecaHtmlParser getParse");
            logger.info("come into HTMLMetatagFilter");
            //update the content with the main text in html page
            //content.setContent(HtmlExtractor.extractMainContent(content));
            parse.getData().getParseMeta().add("FILTER-HTMLMETATAG", "ACTIVE");
            ParseData parseData = parse.getData();
            if (parseData == null) return parse;
            extractText(content, parse);
            logger.info("update the content with the main text content");
            return parse;
        private void extractText(Content content, Parse parse){
            try {
                ParseData parseData = parse.getData();
                if (parseData == null) return;
                 Metadata md = parseData.getParseMeta();
                ArticleTextExtractor extractor = new ArticleTextExtractor();
                String sourceHtml = new String(content.getContent());
                JResult res = extractor.extractContent(sourceHtml);
                String text = res.getText();
                md.set("Endeca_Document_Text", text);
            } catch (Exception e) {
                // TODO: handle exception
        public static void log(String msg){
            System.out.println(msg);
        public Configuration getConf() {
            return null;
        public void setConf(Configuration conf) {

    but it only extracts URLs from <A> (anchor) tags. I want to be able to extract URLs from <MAP> tags as wellGee, do you think you could modify the code to check for "Map" attributes as well.
    Can someone maybe point a page containing info on the HTML toolkit for me?It's called the API. Since you are using the HTMLEditorKit and an ElementIterator and an AttributeSet, I would start there.
    There is no such API that says "get me all the links", so you have to do a little work on your own.
    Maybe you could use a ParserCallback and every time you get a new tag you check for the "href" attribute.

  • How do I pass input values from a html page to a jsf page

    hi,
    In my project,for front view we have used html pages.how can I get input values from that html page into my jsf page.for back end purpose we have used EJB3.0
    how can I write jsf managed bean for accessing these entities.we have used session facade design pattern and the IDE is netbeans5.5.
    pls,help me,very urgent
    thanx in advance

    Simplest way is to rewrite html page into jsf page.
    You can use session bean in your managed bean like this:
    import javax.naming.Context;
    import javax.naming.InitialContext;
    public class ManagedBean {
    private Context  ctx;
    private Object res;
    // session bean interface
    private Service service;
              public ManagedBean() {
                try{
                     ctx = new InitialContext();
                     res = ctx.lookup("Service");
                     service = (Service) res;
               catch(Exeption e){
    }Message was edited by:
    m00dy

  • How to set a cookie in the browser from an html page called via an Iview

    How to set a cookie in the browser from an html page called via an Iview
    Hello all,
    I have an issue which is causing problems. I have a snap survey (html form with submit and cookie setting) which is embedded in a url iview.
    Although the submit and the form work fine, the portal will not allow the cookie to be set it seems.
    Is there a way to allow cookies to be set from an embedded page in a url iview??
    You will make my day if you know!
    System: EP7 SP13
    Kind regards
    Alex

    Hi,
    Check this:
    http://www.oracle.com/technology/products/ias/portal/html/same_cookie_domain_with_pdkv2.html
    Cookie Basics
    Web browsers have built in rules for receiving and sending cookies. When a browser makes a request to a web server and the web server returns cookies with the response, the browser will only accept a cookie if the domain associated with the cookie matches that of the original request. Similarly, when a browser makes a subsequent request, it will only send those cookies whose domain matches that of the target web server.
    These rules are designed to ensure that information encoded in cookies is only "seen" by the web server(s) that the originator of the cookie intended. These rules also ensure that the cookie cannot be corrupted or imitated by another server. By default, the domain associated with a cookie exactly matches that of the server that created it. However, it is possible to modify the domain at the time the cookie is created. Relaxing the cookie domain increases the scope of the cookie's visibility making it available to a wider "audience" of web servers.
    For example, if a cookie is created by a.us.oracle.com, it's domain will usually be set to a.us.oracle.com. This means that the browser will only send the cookie to a.us.oracle.com. It will never send it to any other servers. However, if at the time of creation, the domain of the cookie is set to .us.oracle.com, the browser will send the cookie to any server whose domain falls within .us.oracle.com. such as portal.us.oracle.com, provider.us.oracle.com, app.us.oracle.com etc
    Regards,
    Praveen Gudapati

  • How to extract data from web URL

    I was doing one project which need to extract data from web pages and then analyze these data. the question is how to extract data from there, using html parser? need help, thanks a lot

    I was doing one project which need to extract data
    from web pages and then analyze these data. the
    question is how to extract data from there, using
    html parser? need help, thanks a lotTry this:
    http://java.sun.com/docs/books/tutorial/networking/urls/readingURL.html
    Or, like you said yourself, use an HTML parser:
    http://java-source.net/open-source/html-parsers

  • How to call java program by HTML page

    Hi guys,
    I'm new java programer and want to build an HTML page to access to ORACLE database on NT server by JDBC, Can anyone give me a sample?
    I already know how to access database by JDBC, but I don't know how to call java program by HTML page.
    If you have small sample,pls send to me. [email protected], thanks in advance
    Jian

    This code goes with the tutorial from this web page
    http://java.sun.com/docs/books/tutorial/jdbc/basics/applet.html
    good luck.
    * This is a demonstration JDBC applet.
    * It displays some simple standard output from the Coffee database.
    import java.applet.Applet;
    import java.awt.Graphics;
    import java.util.Vector;
    import java.sql.*;
    public class OutputApplet extends Applet implements Runnable {
    private Thread worker;
    private Vector queryResults;
    private String message = "Initializing";
    public synchronized void start() {
         // Every time "start" is called we create a worker thread to
         // re-evaluate the database query.
         if (worker == null) {     
         message = "Connecting to database";
    worker = new Thread(this);
         worker.start();
    * The "run" method is called from the worker thread. Notice that
    * because this method is doing potentially slow databases accesses
    * we avoid making it a synchronized method.
    public void run() {
         String url = "jdbc:mySubprotocol:myDataSource";
         String query = "select COF_NAME, PRICE from COFFEES";
         try {
         Class.forName("myDriver.ClassName");
         } catch(Exception ex) {
         setError("Can't find Database driver class: " + ex);
         return;
         try {
         Vector results = new Vector();
         Connection con = DriverManager.getConnection(url,
                                  "myLogin", "myPassword");
         Statement stmt = con.createStatement();
         ResultSet rs = stmt.executeQuery(query);
         while (rs.next()) {
              String s = rs.getString("COF_NAME");
              float f = rs.getFloat("PRICE");
              String text = s + " " + f;
              results.addElement(text);
         stmt.close();
         con.close();
         setResults(results);
         } catch(SQLException ex) {
         setError("SQLException: " + ex);
    * The "paint" method is called by AWT when it wants us to
    * display our current state on the screen.
    public synchronized void paint(Graphics g) {
         // If there are no results available, display the current message.
         if (queryResults == null) {
         g.drawString(message, 5, 50);
         return;
         // Display the results.
         g.drawString("Prices of coffee per pound: ", 5, 10);
         int y = 30;
         java.util.Enumeration enum = queryResults.elements();
         while (enum.hasMoreElements()) {
         String text = (String)enum.nextElement();
         g.drawString(text, 5, y);
         y = y + 15;
    * This private method is used to record an error message for
    * later display.
    private synchronized void setError(String mess) {
         queryResults = null;     
         message = mess;     
         worker = null;
         // And ask AWT to repaint this applet.
         repaint();
    * This private method is used to record the results of a query, for
    * later display.
    private synchronized void setResults(Vector results) {
         queryResults = results;
         worker = null;
         // And ask AWT to repaint this applet.
         repaint();

  • How to extract text from a PDF file?

    Hello Suners,
    i need to know how to extract text from a pdf file?
    does anyone know what is the character encoding in pdf file, when i use an input stream to read the file it gives encrypted characters not the original text in the file.
    is there any procedures i should do while reading a pdf file,
    File f=new File("D:/File.pdf");
                   FileReader fr=new FileReader(f);
                   BufferedReader br=new BufferedReader(fr);
                   String s=br.readLine();any help will be deeply appreciated.

    jverd wrote:
    First, you set i once, and then loop without ever changing it. So your loop body will execute either 0 times or infinitely many times, writing the same byte every time. Actually, maybe it'll execute once and then throw an ArrayIndexOutOfBoundsException. That's basic java looping, and you're going to need a firm grip on that before you try to do anything as advanced as PDF reading. the case.oops you are absolutely right that was a silly mistake to forget that,
    Second, what do the docs for getPageContent say? Do they say that it simply gives you the text on the page as if the thing were a simple text doc? I'd be surprised if that's the case.getPageContent return array of bytes so the question will be:
    how to get text from this array? i was thinking of :
        private void jButton1_actionPerformed(ActionEvent e) {
            PdfReader read;
            StringBuffer buff=new StringBuffer();
            try {
                read = new PdfReader("d:/getjobid2727.pdf");
                read.getMetaData();
                byte[] data=read.getPageContent(1);
                int i=0;
                while(i>-1){ 
                    buff.append(data);
    i++;
    String str=buff.toString();
    FileOutputStream fos = new FileOutputStream("D:/test.txt");
    Writer out = new OutputStreamWriter(fos, "UTF8");
    out.write(str);
    out.close();
    read.close();
    } catch (Exception f) {
    f.printStackTrace();
    "D:/test.txt"  hasn't been created!! when i ran the program,
    is my steps right?                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       

  • How to extract data from planning book

    nu t apo. need help regarding how to extract data from planning book givn the planning book name , view name & some key figures.
    Total Demand (Key Figure DMDTO):
    o     Forecast
    o     Sales Order
    o     Distribution Demand (Planned)
    o     Distribution Demand (Confirmed)
    o     Distribution Demand (TLB-Confirmed)
    o     Dependent Demand
    Total Receipts (Key Figure RECTO):
    o     Distribution Receipt (Planned)
    o     Distribution Receipt (Confirmed)
    o     Distribution Receipt (TLB-Confirmed)
    o     In-Transit
    o     Production (Planned)
    o     Production (Confirmed)
    o     Manufacture of Co-Products
    Stock on Hand (Key Figure STOCK):
    o     Stock on Hand (Excluding Blocked stock)
    o     Stock on Hand (Including Blocked stock)
    In-Transit Inventory (Cross-company stock transfers)
    Work-in-progress (Planned orders / Process orders with start date in one period and finish date in another period)
    Production (Planned), Production (Confirmed) and Distribution Receipt elements need to be converted based on Goods Receipt date for projected inventory calculation.

    Hello Debadrita,
    Function Module BAPI_PBSRVAPS_GETDETAIL2 or BAPI_PBSRVAPS_GETDETAIL can help you.
    For BAPI BAPI_PBSRVAPS_GETDETAIL, the parameters are:
    1) PLANNINGBOOK - the name of your planning book
    2) DATA_VIEW - name of your data view
    3) KEY_FIGURE_SELECTION - list of key figures you want to read
    4)  SELECTION - selection parameters which describe the attributes of the data you want to read (e.g. the category or brand). This is basically a list of characteristics and characteristic values.
    BAPI_PBSRVAPS_GETDETAIL2 is very similar to BAPI_PBSRVAPS_GETDETAI but is only available from SCM 4.1 onwards.
    For the complete list of parameters, you can go to transaction SE37, enter the function module names above and check out the documentation.
    Please post again if you have questions.
    Hope this helps.

  • How to read the code of html page

    Hi,
    I want to know how to read the code of html page through Java? And if anyone know the link of full implementation of Page Rank Algorithm in Java.
    Please let me know. I have to do the project on that topic.
    Regard
    Vivek

    Vivek_NITT wrote:
    I want to know how to read the code of html page through Java? Get the input stream from an HttpUrlConnection. Read from it like from any other stream.
    And if anyone know the link of full implementation of Page Rank Algorithm in Java.Which one?

  • Extracting info from a web page

    Hi,
         I m not sure if i m asking this question at the right forum.
    Can anyone tell me if there is a way to extract data from a web page.
    This means, say for example a web site Yahoo displays stock quotes
    updated or NASDAQ values almost in real time.
    Now if i want to get that information from the web page into one
    of my applications ,say, something that uses that data. Is there
    a way to do it?
    Just curious

    Yes, it's possible. You can use the java.net.URL object to connect to websites and download the html. Doing the coding is not that easy, and you should also be mindful of not redistributing data you've gotten from another site without permission

  • How to extract text from a PDF file using php?

    How to extract text from a PDF file using php?
    thanks
    fabio

    > Do you know of any other way this can be done?
    There are many ways. But this out of scope of this forum. You can try this forum: http://forum.planetpdf.com/

  • How to delete elements from a cluster?

    Hello. I would like to know how to delete elements from a cluster. I got stuck with this problem. There is its own order of each element in a cluster. I tried to initiate an array but it seems like too complicated. In the attached file, I want to obtain only the data from "flow" not "pressure". I try to use array programming but it doesn't work. Would be nice if you help me to fix the file. Thanks
    Solved!
    Go to Solution.
    Attachments:
    test 1.vi ‏16 KB

    Bombbooo wrote:
    What about if I want to store data into the pressure array too? Do I have to create another loop or it can be done in the same loop as the flow loop?
    Try this, for example. Modify as needed.
    LabVIEW Champion . Do more with less code and in less time .
    Attachments:
    test1MOD2.vi ‏9 KB
    SelectFlowOrPressure.png ‏7 KB

  • Need suggestion on how to extract data from a table

    Many years ago, I wrote many Perl scripts to increase my work productivity.
    Now I am starting learning Java, Javascript and PHP for some purposes. Just wrote a simple html2txt java program.
    Now I need experts' suggestion for one specific purpose - extract data from a html table. One of the examples is here: http://mops.tse.com.tw/nas/t06sa18/200902/A02_2454_200902.htm
    I need to extract datum from the table and make some analysis. (Txt is in Chinese, sorry)
    What is the best way to code it? Java, PHP or other? If Java, any suggestion what classes/module and approach to use?
    Thanks very much.

    xcomme wrote:
    Many years ago, I wrote many Perl scripts to increase my work productivity.
    Now I am starting learning Java, Javascript and PHP for some purposes. Just wrote a simple html2txt java program.
    Now I need experts' suggestion for one specific purpose - extract data from a html table. One of the examples is here: http://mops.tse.com.tw/nas/t06sa18/200902/A02_2454_200902.htm
    I need to extract datum from the table and make some analysis. (Txt is in Chinese, sorry)
    What is the best way to code it? It this a one shot (one time only task)? Then the best way is whatever way you are most comfortable with and which works.
    If on going then I doubt language choice matters but using a html parser rather than attempting to parse it yourself is going to help in any language.
    And are you starting with html files or starting with a http server? The two are very different.

Maybe you are looking for