Parse a WebPage and Extract Data

Hi, I've figured out how to get the HTML source of a page using a Java class posted earlier on this forum. Now I want to find out the best and easiest way to extract information from it. I have been searching the web and have read a lot about using XML. Is that the best way to do it, or is there something else I can use? Thanks!

I would suggest writing your own extraction statements.
I am not sure which web site you are trying to get data from. However, the previous response was correct: to be read as XML, HTML has to be well formed, which it often is not. Even the most professional web sites have bad HTML code.
I am not sure whether you are looking to extract data from one web site or from several (please let me know).
Here is an example of how you 'might' wish to do it:
HTML CODE:
<td nowrap><font face="Arial" size="3"><b>Description</b></font></td>
<td width="100%"></td>
<td nowrap width="100%" align="right">WonderfulDescription</font>
</td>
CODE:
String description = null; // field to store the description
int startIndex;            // index of where the description starts
int endIndex;              // index of where the description ends
String readLine;
final String marker = "align=\"right\">"; // 14 characters
// HTMLpage is the BufferedReader from the previous code I gave
while ((readLine = HTMLpage.readLine()) != null) {
    if (readLine.indexOf("Description") != -1) {
        HTMLpage.readLine();            // skip line 1, i.e. <td width="100%"></td>
        readLine = HTMLpage.readLine(); // line 2 holds the value
        if (readLine == null) break;    // page ended early
        startIndex = readLine.indexOf(marker) + marker.length();
        endIndex = readLine.indexOf("</font>");
        description = readLine.substring(startIndex, endIndex);
    }
}
This is ideal if there is specific information in the page that you wish to obtain. But coding it takes time, and you have to study the web page layout carefully. I suggest putting a System.out.println(readLine) statement in the while loop, then copying and pasting the output into FrontPage and looking at it. Don't copy and paste the source from View Source in Internet Explorer, as it will not show you dynamic data.
Let me know how you get on. As always, get back to me if you have any questions.
Kind regards
Angus
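The line-by-line approach above depends on the value sitting exactly two lines below the label. A hedged alternative, assuming you first collect the whole page into one String (for example by appending each readLine() result plus "\n" to a StringBuilder), is to search it with java.util.regex, which does not care where the line breaks fall. The pattern is tailored to the sample HTML above and the class name DescriptionExtractor is made up; a different page layout needs a different pattern.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class DescriptionExtractor {
    // "<b>Description</b>", then anything (lazily, across line breaks), then the first
    // right-aligned cell; group 1 captures its text up to the closing </font>.
    private static final Pattern DESCRIPTION = Pattern.compile(
            "<b>Description</b>.*?align=\"right\">(.*?)</font>",
            Pattern.DOTALL | Pattern.CASE_INSENSITIVE);

    /** Returns the description text, or null if the page does not have the expected layout. */
    static String extractDescription(String html) {
        Matcher m = DESCRIPTION.matcher(html);
        return m.find() ? m.group(1).trim() : null;
    }

    public static void main(String[] args) {
        // the sample HTML from the post, as one string with its line breaks
        String html = "<td nowrap><font face=\"Arial\" size=\"3\"><b>Description</b></font></td>\n"
                + "<td width=\"100%\"></td>\n"
                + "<td nowrap width=\"100%\" align=\"right\">WonderfulDescription</font>\n"
                + "</td>";
        System.out.println(extractDescription(html)); // prints WonderfulDescription
    }
}
```

Returning null instead of throwing makes it easy to notice when the site changes its layout.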

Similar Messages

  • Parse XML file and extract data

    I'd like to parse an XML file and get some data extracted as columns.
    Input file country.xml:
    <?xml version="1.0" encoding="UTF-8"?>
    <MAS Action="Insert">
    <Country ObjectId="100000000000000009" VersionId="8"><Id>1</Id><NlTexts><Name Language="de">Land1</Name><Name Language="en">Country1</Name></NlTexts></Country>
    <Country ObjectId="100000000000000033" VersionId="2"><Id>2</Id><NlTexts><Name Language="de">Land2</Name><Name Language="en">Country2</Name></NlTexts></Country>
    </MAS>
    I'd like to parse the XML file in order to get the following output.
    Required result:
    col1  col2   col3
    1     Land1  Country1
    2     Land2  Country2
    or alternatively:
    col1  col2
    1     Land1
    1     Country1
    2     Land2
    2     Country2
    I tried the extract function:
    select extract((XMLTYPE(BFILENAME('XML_DAT_DIR', 'country.xml'),
               NLS_CHARSET_ID('AL32UTF8'))) , '/*/*/Id') as "xdata"
    from dual;
    xdata
    <Id>1</Id><Id>2</Id>
    and XMLTABLE (but how can I add the countries now?):
    SELECT *
        FROM XMLTABLE('/*/*/Id'
               PASSING XMLTYPE(BFILENAME('XML_DAT_DIR', 'country.xml'),
               NLS_CHARSET_ID('AL32UTF8')));
    COLUMN_VALUE
    <Id>1</Id>
    <Id>2</Id>
    DB version 11.2.0.3 on Windows 64bit
    Thanks,
    Tim

    Here are a few examples.
    For your required output :
    SELECT *
    FROM XMLTable(
           '/MAS/Country'
           passing XMLType(bfilename('TEST_DIR', 'country.xml'), nls_charset_id('AL32UTF8'))
           columns col1 number       path 'Id'
                 , col2 varchar2(30) path 'NlTexts/Name[1]'
                 , col3 varchar2(30) path 'NlTexts/Name[2]'
         );
    Or, if the Language attribute is significant:
    SELECT *
    FROM XMLTable(
           '/MAS/Country'
           passing XMLType(bfilename('TEST_DIR', 'country.xml'), nls_charset_id('AL32UTF8'))
           columns col1 number       path 'Id'
                 , col2 varchar2(30) path 'NlTexts/Name[@Language="de"]'
                 , col3 varchar2(30) path 'NlTexts/Name[@Language="en"]'
         );
    For your alternate output:
    SELECT x1.col1
         , x2.col2
         --, x2.col3
    FROM XMLTable(
           '/MAS/Country'
           passing XMLType(bfilename('TEST_DIR', 'country.xml'), nls_charset_id('AL32UTF8'))
           columns col1  number  path 'Id'
                 , names xmltype path 'NlTexts/Name'
         ) x1
       , XMLTable(
           '/Name'
           passing x1.names
           columns col2 varchar2(30) path '.'
                 --, col3 for ordinality
         ) x2;
    (uncomment col3 to see what it does)
    Or, in a shorter way:
    SELECT *
    FROM XMLTable(
           'for $i in /MAS/Country
              , $j in $i/NlTexts/Name
            return element r { $i/Id, $j }'
           passing XMLType(bfilename('TEST_DIR', 'country.xml'), nls_charset_id('AL32UTF8'))
           columns col1 number       path 'Id'
                 , col2 varchar2(30) path 'Name'
         );
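    For comparison, if the file is processed outside the database, the same relative paths used in the COLUMNS clauses above can be evaluated in Java with the JDK's built-in javax.xml.xpath API. This is a minimal sketch, assuming the whole file fits in memory; the class name CountryColumns is made up for illustration. Like the second query, it picks the names by their Language attribute rather than by position.

```java
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class CountryColumns {
    /** Returns one row {Id, German name, English name} per /MAS/Country element. */
    static List<String[]> rows(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        XPath xp = XPathFactory.newInstance().newXPath();
        // the row-generating expression, as in XMLTable('/MAS/Country' ...)
        NodeList countries = (NodeList) xp.evaluate("/MAS/Country", doc, XPathConstants.NODESET);
        List<String[]> result = new ArrayList<>();
        for (int i = 0; i < countries.getLength(); i++) {
            Node country = countries.item(i);
            // the column paths, evaluated relative to the current Country element
            result.add(new String[] {
                xp.evaluate("Id", country),
                xp.evaluate("NlTexts/Name[@Language='de']", country),
                xp.evaluate("NlTexts/Name[@Language='en']", country)
            });
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        // usage: java CountryColumns country.xml  (the file declares UTF-8)
        String xml = new String(Files.readAllBytes(Paths.get(args[0])), StandardCharsets.UTF_8);
        for (String[] row : rows(xml)) {
            System.out.println(String.join("\t", row));
        }
    }
}
```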

  • Can we connect one R/3 to two BW systems and extract data?

    Can we connect one R/3 system to two BW systems and extract data successfully? Kindly throw light on this issue.
    Thanks and regards.
    Ankush Shejul

    Of course you can.
    Just have to have the RFC destinations in place and use the
    CALL FUNCTION '....' DESTINATION '....'
    commands in ABAP, check out the ABAP forums for more info.

  • Does EBS license include tools for reporting and extracting data (queries)

    Hello,
    I'm recommending Oracle EBS to a customer, and I need to mention which tools for reporting and extracting data (queries) are included in an Oracle EBS license.
    I hope you can help me with that information, or at least with providing me the place where I can get the info.
    Thanks in advance,
    A/P Sergio Maestri

    Hi,
    "Thanks, I think the same... but the customer asked me about it. I suppose my answer will be that he can use Reports Builder for reporting and Toad or PL/SQL Developer to access the database and make queries. What about XML Publisher? Can it be used to make reports against an Oracle EBS database and integrate them into the EBS application?"
    I believe you need a developer license to use Oracle Developer (Forms/Reports) for customizing/creating reports.
    Global Pricing and Licensing
    http://www.oracle.com/corporate/pricing/index.html
    And if I'm not mistaken (IINM), the same thing applies to XML Publisher. See this thread for a similar discussion:
    eBiz & XML Publisher - Confusion
    http://forums.oracle.com/forums/thread.jspa?messageID=3834445
    Please contact your Oracle sales representative, he/she is the best one to answer such questions.
    Thanks,
    Hussein

  • Need to pre-populate and Extract data from static PDF form

    Hi Jasmin or Jayan or anyone else that can answer.
    I have a requirement to use digital signatures. Because of that, the forms must be static PDFs and the form variables will be “document form”. I want to pre-populate the form via an SQL query and a custom render process, and render it as a PDF so that the submitter can apply a digital signature when he/she is done and ready to submit for approval. Subsequent approvers will also digitally sign the form. I know that I will specify the custom render to render only once and thereby preserve the signature(s) on the form. I do, however, need to extract data from the form to control the business process. I cannot access the data in the form the same way I do with an XDP, and I also cannot pre-populate it the same way I do with an XDP.
    Any suggestions on how to attack this?

    Parth, one problem with your approach is that he will submit a PDF, and therefore you won't be able to put the PDF in a variable that's supposed to contain just XML.
    The prepopulation should be the same. If you start off with an xdp, then you will call a render service that merges data with your xdp to create a PDF.
    Now when you submit, you will submit the entire PDF back in the Document Form variable. In Workbench, you can use the FormDataIntegration service to extract data from that PDF that's being stored under Document Form var/object/document and put it in an xml variable. Then you can just use xPath to do your condition.
    I'm assuming you'll just pass that same Document Form variable to the next step, because if you make any change to the PDF it'll break the signature.
    Let me know if I missed anything.
    Jasmin

  • Parse text file and retrieve data

    Hi,
    I have a log file with comma-separated entries on each line. Each line has about 50 or more integer values logged for statistical analysis. I need to parse this text file line by line and retrieve, say, the nth entry in each row, basically like selecting a certain column from a table. The column selection may vary: the user may need the 10th entry, the 25th entry, or more than one entry.
    Please help me with suggestions for how to do this efficiently. Keep in mind the log file can be huge at certain times.
    Thanks
    -Anupma

    Here's something to get you started:
    import java.io.BufferedReader;
    import java.io.File;
    import java.io.FileReader;
    import java.io.IOException;
    public void parseFile(File file) throws IOException {
        String line = null; // holds each line of the text file
        try (BufferedReader in = new BufferedReader(new FileReader(file))) { // closes the file when done
            while ((line = in.readLine()) != null) {
                processLine(line);
            }
        }
    }
    public void processLine(String line) {
        String[] tokens = line.split(","); // tokenize the line; the delimiter is a comma
        for (int i = 0; i < tokens.length; i++) {
            try {
                int num = Integer.parseInt(tokens[i].trim());
                System.out.println("The integer is = " + num);
            } catch (NumberFormatException ee) {
                // not an integer: skip this token
            }
        }
    }
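    The code above prints every field. For the column-selection part of the question, here is a hedged sketch, assuming the user passes 1-based column numbers (e.g. 10 and 25). It still reads one line at a time, but hands each row's selected values to a callback instead of collecting them, so memory use does not grow with the size of the log. ColumnSelector and forEachRow are hypothetical names.

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.Arrays;
import java.util.function.Consumer;

final class ColumnSelector {
    /**
     * Passes the requested 1-based columns of every comma-separated line to the sink.
     * Lines with fewer fields than a requested column are skipped; a selected field that
     * is not an integer throws NumberFormatException.
     */
    static void forEachRow(BufferedReader in, int[] columns, Consumer<int[]> sink) throws IOException {
        String line;
        while ((line = in.readLine()) != null) {
            String[] tokens = line.split(",");
            int[] selected = new int[columns.length];
            boolean complete = true;
            for (int k = 0; k < columns.length && complete; k++) {
                int idx = columns[k] - 1; // "10th entry" is array index 9
                if (idx < tokens.length) {
                    selected[k] = Integer.parseInt(tokens[idx].trim());
                } else {
                    complete = false;
                }
            }
            if (complete) {
                sink.accept(selected);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // usage: java ColumnSelector stats.log 10 25
        int[] columns = new int[args.length - 1];
        for (int i = 1; i < args.length; i++) {
            columns[i - 1] = Integer.parseInt(args[i]);
        }
        try (BufferedReader in = new BufferedReader(new FileReader(args[0]))) {
            forEachRow(in, columns, row -> System.out.println(Arrays.toString(row)));
        }
    }
}
```

    The callback can just as well accumulate a running sum or write to another file; only the current line is ever held in memory.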

  • Stuck parsing an InDesign document, extracting data to an XML file

    Hi,
    I have an InDesign document of a newspaper page (1 spread + 1 page) with some articles (title, body, img, img caption) and ads (img). I need to parse it and create an XML file from which the data can be stored in a MySQL db. The db handling is already programmed.
    The xml file will have this structure (page = root):
    <page>
         <article>
              <title>Title</title>
              <content>Lorem ipsum</content>
              <img title=""></img>
         </article>
         <ad>
              <img title=""></img>
         </ad>
    </page>
    The .indd file is created using the Smart Layout plugin from WoodWing software. When I try to print "TextFrames>Texts>contents", the text frames that use the Smart Layout plugin do not show up; they seem not to be included in the app's data model. That's why I went another route and parsed "Stories>Texts>contents". That way I can get all the text objects of the page, including the articles that are created using the Smart Layout plugin. Among these objects are article titles and article bodies. The problem is that I don't know which title belongs to which body, so I can't create a new article node in the XML. I have exported the original file as IDML and browsed the spread and the stories, but I see no way to link article titles (text frames), article bodies (text frames) and images (contained by rectangles) together.
    Is there a way to solve this?
    Setup:
    InDesign CS4 .indd documents
    Win2k8 server
    JavaScript (via ExtendScript toolkit)
    InDesignServer CS5.5 (64bit) (want to use this)
    SmartLayout plugin (installs on InDesignServer CS5.5 32bit. This 32bit InDesignServer does not start; it has missing DLLs).

    I am building something similar, and I have much of it working. Respond here if you're still interested, and hopefully I'll see it!

  • Parse xml file and extract tags (not well formed)

    I'm writing an XML editor and I would like to extract tags from a string of XML. The string doesn't have to be well-formed XML.
    An example "</a><b asd="kjkj"></b>"
    Output
    end tag (a)
    start tag (b)
    end tag (b)
    Neither SAX nor StAX does the job, since they require a well-formed document, and I don't want to write a parser myself.
    Any suggestions?

    "I'm writing an XML editor and I would like to extract tags from a string of XML. And the string doesn't have to be well-formed XML."
    Then you aren't writing an XML editor. And as you observe, existing parsers don't work for you because they don't have the requirement to process garbage. Where did you get that requirement from, anyway?
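    If the goal is only to list tag-like tokens (for example, for syntax highlighting in an editor), a regular-expression scanner tolerates input that SAX and StAX reject. This is a hedged sketch, not an XML parser: it does not understand comments, CDATA sections, or a '>' inside a quoted attribute value, and LenientTagScanner is a made-up name. On the string from the question it reports end tag (a), start tag (b), end tag (b).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class LenientTagScanner {
    // '<', optional '/', a tag name, any attributes (no '>'), optional '/', '>'
    private static final Pattern TAG =
            Pattern.compile("<(/?)([A-Za-z_][\\w.:-]*)[^>]*?(/?)>");

    /** Lists every tag-like token in order, whether or not the tags are balanced. */
    static List<String> scan(String text) {
        List<String> out = new ArrayList<>();
        Matcher m = TAG.matcher(text);
        while (m.find()) {
            String name = m.group(2);
            if (!m.group(1).isEmpty()) {
                out.add("end tag (" + name + ")");
            } else if (!m.group(3).isEmpty()) {
                out.add("empty tag (" + name + ")"); // e.g. <br/>
            } else {
                out.add("start tag (" + name + ")");
            }
        }
        return out;
    }

    public static void main(String[] args) {
        scan("</a><b asd=\"kjkj\"></b>").forEach(System.out::println);
    }
}
```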

  • Parse a webpage and return all the URLs mms://***.wmv

    Hello,
    I would like to create a small script to parse http://jt.france2.fr/20h/ in order to extract the latest News program available:
    i.e.:
    mms://a988.v101995.c10199.e.vm.akamaistream.net/7/988/10199/3f97c7e6/ftvigrp.download.akamai.com/10199/cappuccino/production/publication/France_2/Autre/2009/S23/40692_HD_20h_20090602.wmv
    etc.
    I am not sure how to do that easily (Perl, Python, bash/awk, or ???).
    Thanks in advance for your help!
    Ludo

    It's actually straightforward, but I had to struggle a bit, because the url you gave is wrong. The actual player page can be found at http://jt.france2.fr/player/20h/index-fr.php - this is embedded in an iframe on the front page.
    Now, to get the url from this player page with bash and wget, do
    $ wget "http://jt.france2.fr/player/20h/index-fr.php" -O - 2>/dev/null | grep -o -E 'mms://.*\.wmv'
    mms://a988.v101995.c10199.e.vm.akamaistream.net/7/988/10199/3f97c7e6/ftvigrp.download.akamai.com/10199/cappuccino/production/publication/France_2/Autre/2009/S23/40692_HD_20h_20090602.wmv
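    The same idea in Java, for readers who prefer it to a shell pipeline, is sketched below; the player page URL is passed on the command line and WmvLinkExtractor is a made-up name. Instead of the greedy `.*` used with grep, which can merge two URLs that sit on the same line into one match, the pattern stops at whitespace, quotes and angle brackets and ends at the first `.wmv`. Decoding the page as UTF-8 is an assumption; it does not matter for the ASCII URLs themselves.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class WmvLinkExtractor {
    // mms:// followed by characters that cannot end an attribute value, up to the first .wmv
    private static final Pattern MMS_WMV = Pattern.compile("mms://[^\\s\"'<>]+?\\.wmv");

    /** Returns the distinct mms://...wmv URLs in page order. */
    static Set<String> extract(String page) {
        Set<String> urls = new LinkedHashSet<>();
        Matcher m = MMS_WMV.matcher(page);
        while (m.find()) {
            urls.add(m.group());
        }
        return urls;
    }

    public static void main(String[] args) throws IOException {
        // usage: java WmvLinkExtractor <player page URL>
        try (InputStream in = URI.create(args[0]).toURL().openStream()) {
            String page = new String(in.readAllBytes(), StandardCharsets.UTF_8);
            extract(page).forEach(System.out::println);
        }
    }
}
```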

  • How to pass selection filters to logic and extract data from BI to BPC

    Hi everybody. I checked the forum and found and tried some solutions, but none matched my main aim.
    Scenario:
    In the input schedule I have a button which runs a data package to upload data into BPC, coming from a BI cube.
    The pushbutton menu is OK. I also tried a solution suggested on the blogs, and I can upload all transaction data from BI to BPC successfully.
    But when I run the data load, I want to take from the BI cube ONLY the data matching the CVW selection, not all the data in the BI cube. Second, I need to upload 2 key figures from BI to BPC during the same run, with the same selection filter.
    Example:
    - I select in the current view ENTITY: ACME and CATEGORY: CON
    - I want to upload 0AMOUNT and 0QUANTITY from the BI cube to BPC for 0COMPANY "ACME" and 0VERSION "CON"
    As I understand it, I can use SELECTION in the Options of my transformation file to filter the data, but I want the filter to depend on the current view selection, with no extra prompts, and to upload the 2 key figures in the same process.
    Any idea is welcome.
    Thanks a lot in advance for any help.
    Edited by: Walter Cista on May 29, 2010 9:29 AM
    Edited by: Walter Cista on May 30, 2010 7:14 PM

    Hi,
    thanks for the answer, but my situation is a little bit different.
    I don't want to prompt the user again to choose ENTITY, because the main input schedule has a menu button to choose ENTITY, so the user will choose it only once.
    I also understand that the members selected in the current view can be read inside DEFAULT logic with [dimension].CurrentMember, but I do not work with default logic. I have my own ZLOGIC, and during the call I would like to pass the ENTITY chosen by the user to the logic. This way, when I run my transformation to upload data from a BI cube to the BPC cube, I select only a slice of the data, not all the data in the cube. In short, if I can manage [DIMENSION].CURRENTMEMBER, I will take only the data I want from the BI cube. As far as I know, if I pass values in SELECTION in the TRANSFORMATION OPTIONS, like ENTITY = ACME, only the BI data for company ACME is uploaded to BPC.
    So, I would like to do something like this.
    Any reply and help is welcome
    Walter

  • Parse SQL query and extract source tables and columns

    Hello,
    I have a set of SQL queries and I have to extract the source tables and columns from them.
    For example:
    Let's imagine that we have two tables
    CREATE TABLE T1 (col1 number, col2 number, col3 number)
    CREATE TABLE T2 (col1 number, col2 number, col3 number)
    We have the following query:
    SELECT
    T1.col1,
    T1.col2 + T1.col3 as field2
    FROM T1 INNER JOIN T2 ON T1.col2=T2.col2
    WHERE T2.col1 = 1
    So, as a result I would like to have:
    Order  Table  Column
    1      T1     col1
    2      T1     col2
    2      T1     col3
    Optionally, I would like to have a list of all dependency columns (columns used in the "ON", "WHERE" and "GROUP BY" clauses):
    Table  Column
    T1     col2
    T2     col1
    T2     col2
    I have tried different approaches but without any success. Any help is appreciated. Thank you in advance.
    Best regards,
    Beroetz

    "I have a set of SQL queries and I have to extract the source tables and columns from them."
    In a recent DB version you can use the approach from the thread "Re: sql injection question" for this.
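    Where that database feature is not available, a very naive Java sketch can handle queries shaped like the example above, in which every column is qualified with its table name. It is not a SQL parser: it misses unqualified columns, reports a table alias rather than the real table, and is confused by subqueries, string literals, or a FROM keyword inside a select expression. The class and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class NaiveColumnRefs {
    // table.column, both parts starting with a letter or underscore (so 1.5 is ignored)
    private static final Pattern QUALIFIED =
            Pattern.compile("\\b([A-Za-z_]\\w*)\\.([A-Za-z_]\\w*)\\b");

    /** "order table column" rows for the SELECT list, numbered by select item. */
    static List<String> selectColumns(String sql) {
        String selectList = sql.substring(indexOfWord(sql, "SELECT") + 6, indexOfWord(sql, "FROM"));
        List<String> rows = new ArrayList<>();
        int order = 0;
        for (String item : splitTopLevelCommas(selectList)) {
            order++;
            Matcher m = QUALIFIED.matcher(item);
            while (m.find()) {
                rows.add(order + " " + m.group(1) + " " + m.group(2));
            }
        }
        return rows;
    }

    /** Sorted, distinct "table column" pairs used after FROM (ON, WHERE, GROUP BY, ...). */
    static Set<String> dependencyColumns(String sql) {
        Set<String> deps = new TreeSet<>();
        Matcher m = QUALIFIED.matcher(sql.substring(indexOfWord(sql, "FROM")));
        while (m.find()) {
            deps.add(m.group(1) + " " + m.group(2));
        }
        return deps;
    }

    private static int indexOfWord(String sql, String word) {
        Matcher m = Pattern.compile("\\b" + word + "\\b", Pattern.CASE_INSENSITIVE).matcher(sql);
        if (!m.find()) {
            throw new IllegalArgumentException("no " + word + " in query");
        }
        return m.start();
    }

    // splits on commas that are not inside parentheses, e.g. not the one in NVL(a, b)
    private static List<String> splitTopLevelCommas(String s) {
        List<String> parts = new ArrayList<>();
        int depth = 0;
        int start = 0;
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '(') {
                depth++;
            } else if (c == ')') {
                depth--;
            } else if (c == ',' && depth == 0) {
                parts.add(s.substring(start, i));
                start = i + 1;
            }
        }
        parts.add(s.substring(start));
        return parts;
    }
}
```

    On the example query, selectColumns yields "1 T1 col1", "2 T1 col2", "2 T1 col3" and dependencyColumns yields "T1 col2", "T2 col1", "T2 col2", matching the tables in the question.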

  • Validating and Extracting Data from OBSSOCookie

    Hello, I am looking for references on how to validate an OBSSO cookie that is presented by a user (through their browser) and how to extract values from the cookie (using Java API calls in the Access Manager SDK).
    I am not able to locate any references in the developers guide - can someone please assist or point me to some documentation to start?
    Thanks in Advance.

    Here's a snippet from when I did this to get the username from the cookie:
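    import java.util.StringTokenizer;
    import javax.servlet.http.Cookie;
    import javax.servlet.http.HttpServletRequest;
    import javax.servlet.http.HttpServletResponse;
    import com.oblix.access.ObAccessException;
    import com.oblix.access.ObConfig;
    import com.oblix.access.ObUserSession;

    public String getUser(HttpServletRequest request, HttpServletResponse response) {
        String userID = null;
        try {
            // initialize the SDK from the location given in the oam.sdk.location system property
            String sdkLocation = System.getProperty("oam.sdk.location");
            ObConfig.initialize(sdkLocation);
        } catch (ObAccessException ie) {
            System.out.println("Initialize failed");
            ie.printStackTrace();
        }
        // getCookie() is a helper (not shown) that looks up a cookie in the request by name
        Cookie cookie = getCookie(request, "ObSSOCookie");
        ObUserSession userSession = null;
        try {
            userSession = new ObUserSession(cookie.getValue());
        } catch (Throwable t) {
            t.printStackTrace();
            System.out.println("Failed to create new user session");
        }
        if (userSession == null) {
            return null; // no cookie or no valid session
        }
        try {
            if (userSession.getStatus() == ObUserSession.LOGGEDIN) {
                // the identity is a DN; splitting on ',' and '=' alternates attribute names and values
                String userDN = userSession.getUserIdentity();
                StringTokenizer userST = new StringTokenizer(userDN, ",=");
                String[] DNarray = new String[userST.countTokens()];
                for (int i = 0; userST.hasMoreTokens(); i++) {
                    DNarray[i] = userST.nextToken();
                }
                userID = DNarray[1]; // value of the first RDN
                System.out.println("user ID is " + userID);
            } else {
                System.out.println("Login failed. Return status = " + userSession.getStatus());
            }
        } catch (ObAccessException obae) {
            obae.printStackTrace();
        }
        return userID;
    }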
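    The snippet above takes the user ID by splitting the DN on ',' and '=' and picking DNarray[1]. That works for simple DNs, but an escaped comma inside a value (for example cn=Doe\, John) would split the value in two. A hedged alternative that uses only the JDK is javax.naming.ldap.LdapName, which parses RFC 2253 distinguished names and unescapes values. Like the snippet, it assumes the user ID is the value of the leftmost RDN; DnUtil is a made-up name.

```java
import java.util.List;
import javax.naming.InvalidNameException;
import javax.naming.ldap.LdapName;
import javax.naming.ldap.Rdn;

final class DnUtil {
    /**
     * Returns the value of the leftmost RDN, e.g. "jdoe" for "uid=jdoe,ou=people,dc=example,dc=com".
     * LdapName indexes RDNs from right to left, so the leftmost RDN is the last in the list.
     */
    static String leftmostRdnValue(String dn) throws InvalidNameException {
        List<Rdn> rdns = new LdapName(dn).getRdns();
        if (rdns.isEmpty()) {
            throw new InvalidNameException("empty DN");
        }
        return rdns.get(rdns.size() - 1).getValue().toString();
    }
}
```

    In the snippet, the DNarray loop could then be replaced by userID = DnUtil.leftmostRdnValue(userDN), with InvalidNameException handled next to the existing ObAccessException catch.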

  • Error when extracting data from ETL - Missing messages and missing InfoIdoc

    Hi All,
    We are using BW 3.0 and extracting data from other source systems through Informatica PowerCenter (ETL). The system is working fine, but when we try to extract data on 31st Dec, we get the following error. Only this load gives the error; all the other loads to other data targets are working fine. All the data is mapped one-to-one from ETL to BW.
    Error messages from Monitor -Status tab:-
       "InfoIdocs missing; external system
       Diagnosis :- InfoIDocs are missing. The IDocs are created in BW with non-SAP systems as source    
       systems that transfer the data and metadata to BW using BAPIs. Since the IDocs are missing, a   
       short dump in BW probably occurred.
       System response:  No selection information arrived from the source system"
    Error messages from Monitor -Details tab:-
        Missing message: Number of sent records,   Missing message: Selection completed
    I would highly appreciate your suggestions.
    Vinod.CH

    Hi Rathy Moorthy,
    Thank you very much for your reply. The source system connections are OK and we are able to load data to other data targets from ETL; we have an issue only with this particular load. The load extracts data: I have loaded the data from ETL to PSA and checked the data content, and it is correct. But when I update the same data to the target, I get this error. I have tried updating from PSA to the target and also directly from ETL to the target.
    I would appreciate your suggestions.

  • How to Extract  data from a transaction..

    Hi All
    There are some transactions and report programs which are used daily to produce some results, and the requirement is to extract that data into BW.
    So how can we extract information from transactions and programs?
    Any help will be really appreciated and of course rewarded...
    Regards
    Lisa

    You can use the following methods
    Method 1:
    1> Create another ABAP program with logic to calculate all the fields, but instead of writing the result data to the screen, it should store it in an internal table and pass it to the extract structure.
    2> Create an InfoSet query based on your ABAP program and use that InfoSet.
    3> Create a DataSource based on this InfoSet query and extract data from the DataSource to your ODS/Cube.
    In this method no intermediate storage is used.
    Method 2:
    1> Modify the report/transaction program to store the required data in a transparent table (you can create one in SE11) in addition to displaying it on the screen.
    2> Create a View on this transparent table (from SE11)
    3> Create a DataSource on that view and extract data from that into your ODS/Cube.
    Here, the required data is stored in the Table in addition to the DataTarget.
    But, to implement either of these solution, you will need a decent ABAP knowledge. Else, it will give you a tough time!
    Good luck!
    Regards,
    Sree

  • How can I extract data from a sound file in carbon

    hello,
    I am a student and I recently started learning Carbon. I have to write an application which can read and extract data from a sound file and use that data to create some kind of visual representation of the file. I would like to know if someone can give me some directions, tutorials, code samples, etc.
    thank you for your help
    chenita7

    hello orangekay
    My idea is to create an application that can read a sound file (AIFF, MP3 or other) in order to extract some kind of data (numbers or any values) and use that data to represent the sound file visually.
    I don't know what kind of data can be extracted from a sound file, or how I can manipulate that data to become a visual representation of the sound file. This is what I want to do as my second assignment for a school subject, Introduction to Programming. It has to be done in Carbon.
    regards
    chenita7
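    To illustrate what data a sound file actually contains: an uncompressed file such as AIFF or WAV is essentially a sequence of PCM samples, one amplitude number per channel per time step, and a waveform display is just those numbers scaled to pixels. Carbon itself is a C API and is not shown here; the sketch below swaps in Java's javax.sound.sampled, which reads AIFF and WAV (the JDK has no built-in MP3 decoder). It assumes 16-bit signed PCM and uses only the first channel; WaveformPeaks is a made-up name.

```java
import java.io.File;
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioInputStream;
import javax.sound.sampled.AudioSystem;

final class WaveformPeaks {
    /**
     * Splits 16-bit signed PCM data into `buckets` equal time slices and returns the peak
     * absolute amplitude (0..32768) of the first channel in each slice.
     */
    static int[] peaks(byte[] pcm, boolean bigEndian, int channels, int buckets) {
        int frameSize = 2 * channels;                        // 2 bytes per sample, one per channel
        int frames = pcm.length / frameSize;
        int[] result = new int[buckets];
        for (int f = 0; f < frames; f++) {
            int base = f * frameSize;                        // first channel starts the frame
            int hi = pcm[base + (bigEndian ? 0 : 1)];        // signed high byte
            int lo = pcm[base + (bigEndian ? 1 : 0)] & 0xFF; // unsigned low byte
            int sample = (hi << 8) | lo;                     // -32768..32767
            int bucket = (int) ((long) f * buckets / frames);
            result[bucket] = Math.max(result[bucket], Math.abs(sample));
        }
        return result;
    }

    /** Reads an AIFF or WAV file; this sketch only accepts 16-bit signed PCM. */
    static int[] peaksOfFile(File file, int buckets) throws Exception {
        try (AudioInputStream in = AudioSystem.getAudioInputStream(file)) {
            AudioFormat fmt = in.getFormat();
            if (!AudioFormat.Encoding.PCM_SIGNED.equals(fmt.getEncoding())
                    || fmt.getSampleSizeInBits() != 16) {
                throw new IllegalArgumentException("only 16-bit signed PCM is handled: " + fmt);
            }
            return peaks(in.readAllBytes(), fmt.isBigEndian(), fmt.getChannels(), buckets);
        }
    }

    public static void main(String[] args) throws Exception {
        // one text row per time slice; bar length is proportional to the loudest sample in it
        for (int peak : peaksOfFile(new File(args[0]), 40)) {
            System.out.println("#".repeat(peak * 60 / 32768));
        }
    }
}
```

    The same two steps (decode samples, then reduce each time slice to one number such as its peak) carry over to any toolkit; only the file-reading and drawing calls change.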

Maybe you are looking for

  • Difference Smart Sync - Generic Sync

    Hi all, sorry for this simple and maybe stupid question, but I am a newbie in this topic. Can anybody tell me the difference between generic sync and smart sync, and when I have to use them? Can they be mixed, or only one of them used? Thanks in adva

  • BI Publisher PDF reports not displaying correctly from EPPM 8.2

    Hi, I have setup EPPM to run BI Publisher reports as specified in the documentation. I can see all the reports correctly and can run the reports ok after specifying the reqd parameters. However, when the report completes and tries to open the PDF out

  • How Do I Straighten Without Cropping?

    I haven't been able to figure out how to use the Straighten tool without cropping. I am starting with a full page image with Pixel Dimensions of 9750 x 13050 pixels, and a Document Size of 8.125 x 10.875 inches. I want to use the Straighten tool, and

  • HT4061 reet my ipad 2, disabled due to too many password attempts.

    I am silly and seem to have forgotten my password. I tried it too many times, and now it says it's disabled, but I need to use it... Can someone help me and let me know how to reset it or un-disable it? Thank you. Jessica Rzazewski PS: I cannot get into it to g

  • Why does my printer HP 3080 not work with Mac OS X VERSION 10.7.5

    Why does my printer HP 3080 not work with Mac OS X VERSION 10.7.5. Am I missing a driver? If so where do I go to download it? thanks