Extract article from HTML code

Hi,
I'm trying to build a search engine for an RSS feed. Thing is i'm trying to store every article as a BLOB field in the database. To optimize my search i'll need to extract the article only and nothing else (no unrelated hyperlinks or html code)
I'm using the HTMLEditoKit of swing to get the html content without the code, but that's not enough. I need to clean the page of things like headers and footers (they affect the search results)

i already parsed the XML in the RSS...i've got informa ; the problem is the article itself
Take this article for example: [http://news.bbc.co.uk/2/hi/europe/7572635.stm] , you've got the article, but you've things all around it from link to different articles to headers, footers menus.
All of this affects the search results dramatically so i just want the article itself.
that's the greatest challenge.

Similar Messages

  • Problem to extract text from HTML document

    I have to extract some text from HTML file to my database. (about 1000 files)
    The HTML files are get from ACM Digital Library. http://portal.acm.org/dl.cfm
    The HTML page is about the information of a paper. I only want to get the text of "Title" "Abstract" "Classification" "Keywords"
    The Problem is that I can't find any patten to parser the html files"
    EX: I need to get the Classification = "Theory of Computation","ANALYSIS OF ALGORITHMS AND PROBLEM COMPLEXITY","Numerical Algorithms and Problem","Mathematics of Computing","NUMERICAL ANALYSIS"......etc .
    The section code about "Classification" is below.
    Please give any idea to do this, or how to find patten to extract text from this.
    <div class="indterms"><a href="#CIT"><img name="top" src=
    "img/arrowu.gif" hspace="10" border="0" /></a><span class=
    "heading"><a name="IndexTerms">INDEX TERMS</a></span>
    <p class="Categories"><span class="heading"><a name=
    "GenTerms">Primary Classification:</a></span><br />
    � <b>F.</b> <a href=
    "results.cfm?query=CCS%3AF%2E%2A&coll=ACM&dl=ACM&CFID=22820732&CFTOKEN=38147335"
    target="_self">Theory of Computation</a><br />
    � <img src="img/tree.gif" border="0" height="20" width=
    "20" /> <b>F.2</b> <a href=
    "results.cfm?query=CCS%3A%22F%2E2%22&coll=ACM&dl=ACM&CFID=22820732&CFTOKEN=38147335"
    target="_self">ANALYSIS OF ALGORITHMS AND PROBLEM
    COMPLEXITY</a><br />
    � � � <img src="img/tree.gif" border="0" height=
    "20" width="20" /> <b>F.2.1</b> <a href=
    "results.cfm?query=CCS%3A%22F%2E2%2E1%22&coll=ACM&dl=ACM&CFID=22820732&CFTOKEN=38147335"
    target="_self">Numerical Algorithms and Problems</a><br />
    </p>
    <p class="Categories"><span class="heading"><a name=
    "GenTerms">Additional�Classification:</a></span><br />
    � <b>G.</b> <a href=
    "results.cfm?query=CCS%3AG%2E%2A&coll=ACM&dl=ACM&CFID=22820732&CFTOKEN=38147335"
    target="_self">Mathematics of Computing</a><br />
    � <img src="img/tree.gif" border="0" height="20" width=
    "20" /> <b>G.1</b> <a href=
    "results.cfm?query=CCS%3A%22G%2E1%22&coll=ACM&dl=ACM&CFID=22820732&CFTOKEN=38147335"
    target="_self">NUMERICAL ANALYSIS</a><br />
    � � � <img src="img/tree.gif" border="0" height=
    "20" width="20" /> <b>G.1.6</b> <a href=
    "results.cfm?query=CCS%3A%22G%2E1%2E6%22&coll=ACM&dl=ACM&CFID=22820732&CFTOKEN=38147335"
    target="_self">Optimization</a><br />
    � � � � � <img src="img/tree.gif" border=
    "0" height="20" width="20" /> <b>Subjects:</b> <a href=
    "results.cfm?query=CCS%3A%22Linear%20programming%22&coll=ACM&dl=ACM&CFID=22820732&CFTOKEN=38147335"
    target="_self">Linear programming</a><br />
    </p>
    <br />
    <p class="GenTerms"><span class="heading"><a name=
    "GenTerms">General Terms:</a></span><br />
    <a href=
    "results.cfm?query=genterm%3A%22Algorithms%22&coll=ACM&dl=ACM&CFID=22820732&CFTOKEN=38147335"
    target="_self">Algorithms</a>, <a href=
    "results.cfm?query=genterm%3A%22Theory%22&coll=ACM&dl=ACM&CFID=22820732&CFTOKEN=38147335"
    target="_self">Theory</a></p>
    <br />
    <p class="keywords"><span class="heading"><a name=
    "Keywords">Keywords:</a></span><br />
    <a href=
    "results.cfm?query=keyword%3A%22Simplex%20method%22&coll=ACM&dl=ACM&CFID=22820732&CFTOKEN=38147335"
    target="_self">Simplex method</a>, <a href=
    "results.cfm?query=keyword%3A%22complexity%22&coll=ACM&dl=ACM&CFID=22820732&CFTOKEN=38147335"
    target="_self">complexity</a>, <a href=
    "results.cfm?query=keyword%3A%22perturbation%22&coll=ACM&dl=ACM&CFID=22820732&CFTOKEN=38147335"
    target="_self">perturbation</a>, <a href=
    "results.cfm?query=keyword%3A%22smoothed%20analysis%22&coll=ACM&dl=ACM&CFID=22820732&CFTOKEN=38147335"
    target="_self">smoothed analysis</a></p>
    </div>

    One approach is to download Htmlparser from sourceforge
    http://htmlparser.sourceforge.net/ and write the rules to match title, abstract etc.
    Another approach is to write your own parser that extract only title, abstract etc.
    1. tokenize the html file. --> convert html into tokens (tag and value)
    2. write a simple parser to extract certain information
    find out about the pattern of text you want to extract. For instance "<class "abstract">.
    then writing a rule for extracting abstract such as
    if (tag is abstract ) then extract abstract text
    apply the same concept for other tags
    Attached is the sample parser that was used to extract title and abstract from acm html files. Please modify to include keyword and other fields.
    good luck
    import java.io.BufferedReader;
    import java.io.FileReader;
    import java.io.IOException;
    import java.io.InputStream;
    import java.io.InputStreamReader;
    import java.util.ArrayList;
    import java.util.List;
    public class ACMHTMLParser
         private String m_filename;
         private URLLexicalAnalyzer lexical;
         List urls = new ArrayList();
         public ACMHTMLParser(String filename)
              super();
              m_filename = filename;
          * parses only title and abstract
         public void parse() throws Exception
              lexical = new URLLexicalAnalyzer(m_filename);
              String word = lexical.getNextWord();
              boolean isabstract = false;
              while (null != word)
                   if (isTag(word))
                        if (isTitle(word))
                             System.out.println("TITLE: " + lexical.getNextWord());
                        else if (isAbstract(word) && !isabstract)
                             parseAbstract();
                             isabstract = true;
                   word = lexical.getNextWord();
              lexical.close();
         public static void main(String[] args) throws Exception
              ACMHTMLParser parser = new ACMHTMLParser("./acm_html.html");
              parser.parse();
         public static boolean isTag(String word)
              return ( word.startsWith("<") && word.endsWith(">"));
         public static boolean isTitle(String word)
              return ( "<title>".equals(word));
         //please modify according to the html source
         public static boolean isAbstract(String word)
              return ( "<p class=\"abstract\">".equals(word));
         private void parseAbstract() throws Exception
              while (true)
                   String abs = lexical.getNextWord();
                   if (!isTag(abs))
                        System.out.println(abs);
                        break;
         class URLLexicalAnalyzer
           private BufferedReader m_reader;
           private boolean isTag;
           public URLLexicalAnalyzer(String filename)
              try
                m_reader = new BufferedReader(new FileReader(filename));
              catch (IOException io)
                System.out.println("ERROR, file not found " + filename);
                System.exit(1);
           public URLLexicalAnalyzer(InputStream in)
              m_reader = new BufferedReader(new InputStreamReader(in));
           public void close()
              try {
                if (null != m_reader) m_reader.close();
              catch (IOException ignored) {}
           public String getNextWord() throws IOException
              int c = m_reader.read();   
              if (-1 == c) return null; 
              if (Character.isWhitespace((char)c))
                return getNextWord();
              if ('<' == c || isTag)
                return scanTag(c);
              else
                   return scanValue(c);
           private String scanTag(final int c)
              throws IOException
              StringBuffer result = new StringBuffer();
              if ('<' != c) result.append('<');
              result.append((char)c);
              int ch = -1;
              while (true)
                ch = m_reader.read();
                if (-1 == ch) throw new IllegalArgumentException("un-terminate tag");
                if ('>' == ch)
                     isTag = false;
                     break;
                result.append((char)ch);
              result.append((char)ch);
              return result.toString();
           private String scanValue(final int c) throws IOException
                StringBuffer result = new StringBuffer();
                result.append((char)c);
                int ch = -1;
                while (true)
                   ch = m_reader.read();
                   if (-1 == ch) throw new IllegalArgumentException("un-terminate value");
                   if ('<' == ch)
                        isTag = true;
                        break;
                   result.append((char)ch);
                return result.toString();
    }

  • Create a string from HTML code

    Hello to all,
    What i have already create is a web browser integrate into a VI project and from this i can read some values like temperature or humidity. Also i have put a HTML window in order to take this values and to export to a string (for example to check these values from the browser and to be able to see in separate constat). Can you help how to do this with an example??
    Thanks in advance

    Below i have attached the code and the HTML code from the sensor. If you have to recommend anything it will be usefull for me. I have another problem now. If you see in the HTML code you will see that i have three different values to export(temperature, humidity and dewpoint) but all of them have the same regular expression in order to request in my code. Do you know how is it possible to solve this problem because always i take the same temperature value for three of them.
    Thanks in advance
    Attachments:
    TempHumid.vi ‏41 KB
    HTMLcode.jpg ‏68 KB

  • Parametre from html code

    Its my first week in applet so i was trying 2 take a parametre from the html code and then 2 take this parametre and put it as a color ex:
    <APPLET code="xxxx.class" width=200 height=200 >
    <param name ="name1" value="green"/>
    and then in the java code i need 2 set this parametre as a color ex:
    g.setColor(Color.?);
    ?= the color in the parametre name
    how can i do this ?

    The following snippet of code assumes that the permissible values
    for the applet parameter "name1" are hexadecimal RGB values.
    "0xFF0000" would be red.
    "0x00FF00" would be green.
    "0x0000FF" would be blue.
    String s;
    Color c;
    // get the applet paramter from the html file
    s = getParameter("name1");
    try
        int aR, aG, aB; // RGB-Values
        aR = Integer.valueOf(s.substring(2,4),16).intValue();
        aG = Integer.valueOf(s.substring(4,6),16).intValue();
        aB = Integer.valueOf(s.substring(6,8),16).intValue();
        c = new Color(aR, aG, aB);
    catch( Exception e )
        c = new Color(0x00, 0xFF, 0x00);
    g.setColor(c);

  • To extract data from t.code mm02

    Hi,
       I want to write the program to extract the data from t.code mm02 for clsiification and unit of measure .
         In classification there are charactersitcs and values which i want to extract. And in unit of measure i want to extract the data .
    please if somebudy knows plz let me know.

    Hi Rahul,
    You can use following tables to extact default material characteristics.
    CABN,CABNT,CAWN & CAWNT
    And you can also refer following code where I have used FM's -
    *&--- Select class type available for material table (MARA)
        SELECT klart
                FROM tcla
                INTO TABLE t_klart
                WHERE obtab = c_mara AND
                      klart = '300'.    " GIves Material class type
    REFRESH t_objkeys.
        LOOP AT t_klart INTO w_klart.
          CLEAR : w_objkeys.
          w_objkeys-class_type     =  w_klart.
          w_objkeys-objectkey      =  w_matnr2.      "material number
          w_objkeys-objecttable    =  c_mara.          " MARA
          APPEND w_objkeys TO t_objkeys.
          CLEAR : w_klart,
                  w_objkeys.
        ENDLOOP.
    REFRESH : t_allocations, t_class_char1, t_class_char.
    *&---FM to get the Class Name for the material
        CALL FUNCTION 'CLBPAX_READ_CLASSIFICATIONS'
          EXPORTING
            t_object_keys = t_objkeys
            keydate       = sy-datum
          IMPORTING
            t_allocations     = t_allocations
            t_valuations_char = t_value_char
            t_valuations_num  = t_value_num.
      IF t_allocations[] IS NOT INITIAL.
        LOOP AT t_allocations INTO w_allocations WHERE class_type = '300'.
            CALL FUNCTION 'BAPI_CLASS_GETDETAIL'
              EXPORTING
                classtype            = w_allocations-class_type
                classnum             = w_allocations-classnum
              TABLES
                classcharacteristics = t_class_char1
                classcharvalues      = t_class_char.
          ENDLOOP.
        ENDIF.

  • Local information from html code

    I have built an application in which i required the web-sites which are exclusively built for us people.while retrieving the index/home page of web-sites some are in html, some are in jsp and some are in asp.
    so i need a code exclusively in java to know which area those websites belong to.
    can somebody there help me?
    thank you

    i already parsed the XML in the RSS...i've got informa ; the problem is the article itself
    Take this article for example: [http://news.bbc.co.uk/2/hi/europe/7572635.stm] , you've got the article, but you've things all around it from link to different articles to headers, footers menus.
    All of this affects the search results dramatically so i just want the article itself.
    that's the greatest challenge.

  • Extracting info from HTML documents

    My program returns the HTML of any web page entered by the user. The HTML documents that are returned all contain pricing infomration that I want to extract. Any idea of the best way to search an HTML document for specific infomration I require. Seems like a huge task to split it all into tokens and searching for � sign!!!!!

    This a nightmare of a problem........... the html
    files that I am retrieving are huge. All I need from
    them are a couple of lines of information. How do I
    find the specific infomration I need???Load the entire file, search for it. You find the information in the same way like you'd do when ouy look for it in the file's source code.
    Is it possible from a java program to open the HTML
    file in web broweser, search, then return the info?
    The html files seem really complex to search on.How would this help?

  • Extract textdata from HTML with AUTO_FILTER

    Hi,
    we're using Oracle AUTO_FILTER to extract text-information from DOC and PDF - Documents.
    Works fine.
    We also have data stored within a HTML structure.
    We use our filter with the following options:
    ctx_ddl.create_preference('SEARCH_iMT_ATTRIB_AFILTER', 'AUTO_FILTER');
    ctx_ddl.create_policy('SE_IMT_POLICY', 'SEARCH_iMT_ATTRIB_AFILTER');
    The filter itself is called within a loop:
    CTX_DOC.Policy_Filter('SE_IMT_POLICY', v_blobtab(i), v_ctmp, TRUE);
    It seems as if AUTO_FILTER converts our HTML to HTML again.
    When trying to insert the AUTO_FILTER, I get an ora-31167: XML nodes over 64K in size cannot be inserted
    How can I force the AUTO_FILTER only to return plain-text?
    Thanks in advance
    Message was edited by:
    user557708

    You need to create a section group preference employing HTML_SECTION_GROUP and then use it when creating your Policy.
    Faisal

  • Function to extract descr from given code

    Kindlhy help some one to get the following function. The prototype of the function is
    FUNCTION extract_descr(given_val VARCHAR2 , table_name VARCHAR2, field_name varchar2) RETURN BOOLEAN. This function
    get the above parameters and returning the description of from the table (table,field,value of the code is parameters). Thanks.

    Hello,
    Simply create a cursor on the USER_TAB_COLUMNS view.
    e.g.:
    Cursor c_cols is
    select column_name, data_type, data_length,data_precision, data_scale from user_tab_columns
    where table_name = 'EMP';Francois

  • Extract URL from HTML text

    Suppose you have the following String that is body text with HTML.
    String bodyText = " My name is Blake. I live in New York City. See my image here: <img href="http://www.blake.com/blake.jpg"/> isn't my picture awesome? Tata for now!"
    I want to extract the URL that contains the location of the image in this bodyText. The ideal would be to create a function called public String extractor(String bodyText) to be used
    String imageURL = extractor(bodyText);
    //imageURL should be "http://www.blake.com/blake.jpg"
    My first thoughts are using reg exp, yet the place i would find to use that would using the .replace in String class. I am by no means an expert on reg exp so I haven't taken too much time to try to figure it out with reg exp. I obviously could do a linear search of the bodyText and do a ton of if statements, but thats just poor coding. I want see if anyone came across this or has insight to this problem.
    Thanks for all the help,
    Blake

    How would the regexp change if there were multiple img tags within the String.I don't rightly know, I'm just a raw beginner in regexes.
    Would this regexp return all the img URLs found in the String.No, as it stands it would return only the last URL. But this will:String bodyText = " My name is Blake. " +
          "I live in New York City. See my image here: " +
          "<img href=\"http://www.blake.com/blake.jpg\"/>" +
          " isn't my picture awesome? Here's another: " +
          "<img href='http://www.blake.com/Vandelay.jpg'/>" +
          " Tata for now!";
    String regex = "(?<=<img\\shref=[\"'])http://.*?(?=[\"']/?>)";
    Pattern pattern = Pattern.compile (regex);
    Matcher matcher = pattern.matcher (bodyText);
    while (matcher.find ()) {
       System.out.println (matcher.group ());
    }Note the enhancement that takes into account that both single and double quotes are legal in HTML. But unlike the earlier example, this does not tolerate more than one space between <img and href=, I couldn't find a way to achieve that.
    Visit this thread later, there are some real regex experts around who may post a better solution.
    db

  • Tools for extracting strings from java code for internationalization

    For legacy code with lots and lots of strings what method is typically used to extract strings for internationalization?
    Am I right in thinking you couldnt simply grep for strings, it could get pretty complitcated, especially with escape characters, escaped quotes etc.,
    -SK

    When dealing with legacy code, it is nice to have an application that queries you as it extracts the strings. You have a choice whether to accept the string as a localizable entity or not.
    There are several tools that do this...including some IDEs like JBuilder. Although it isn't a fully supported or robust tool, Sun has a utility for extracting strings in Java source files:
    http://java.sun.com/products/jilkit/.
    Regards,
    John O'Conner

  • Read values from html response

    Hi,
    I am trying to make a call to an API using UTL_HTTP POST method over SSL and read the response html page and extract the values from the reponse.
    I am able to call and get a response back in html format. I have stored the html response in a clob variable.
    Now i want to parse this html and extract values from the form input items and send them out through OUT parameters.
    For example, from below reponse i want to extract the value '1111d7nhcwse30wq' from 'I4GO_UNIQUEID'
    Can anyone help me with the code to parse this html response and extract the values.
    Any help is greatly appreciated.
    Thanks
    Sharath
    sample Code:
    PROCEDURE get_token (
    p_requesterreference IN VARCHAR2,
    p_cardnumber IN VARCHAR2,
    p_cardtype IN VARCHAR2,
    p_cardholdername IN VARCHAR2,
    p_expirationmonth IN VARCHAR2,
    p_expirationyear IN VARCHAR2,
    p_streetaddress IN VARCHAR2,
    p_postalcode IN VARCHAR2,
    p_cvv2code IN VARCHAR2,
    po_uniqueid OUT VARCHAR2,
    po_errorindicator OUT VARCHAR2,
    po_primaryerrorcode OUT VARCHAR2,
    po_response OUT VARCHAR2,
    po_status_code OUT VARCHAR2,
    po_reason_phrase OUT VARCHAR2
    IS
    v_url VARCHAR2 (200);
    v_url_params VARCHAR2 (32767);
    v_resp_str VARCHAR2 (32767);
    l_http_req UTL_HTTP.req;
    l_http_resp UTL_HTTP.resp;
    v_requesterreference VARCHAR2 (12) := p_requesterreference;
    v_i4go_cardnumber VARCHAR2 (32) := p_cardnumber;
    v_i4go_streetaddress VARCHAR2 (30) := p_streetaddress;
    v_i4go_postalcode VARCHAR2 (9) := p_postalcode;
    v_i4go_expirationmonth VARCHAR2 (2) := p_expirationmonth; -- MM format
    v_i4go_expirationyear VARCHAR2 (2) := p_expirationyear; -- yy format
    v_i4go_cvv2code VARCHAR2 (3) := p_cvv2code;
    v_name VARCHAR2 (256);
    v_value VARCHAR2 (1024);
    l_clob CLOB;
    pv_amp CONSTANT CHAR (1) := CHR (38);
    CURSOR setup_cur
    IS
    SELECT interface_id, interface_name, interface_url, account_id, site_id
    FROM rsv.shift4_setup
    WHERE interface_name = 'I4GO';
    v_setup_rec setup_cur%ROWTYPE;
    BEGIN
    OPEN setup_cur;
    FETCH setup_cur
    INTO v_setup_rec;
    CLOSE setup_cur;
    v_url := 'https://certify.i4go.com//index.cfm?fuseaction=account.PostCardEntry';
    v_url_params :=
    pv_amp
    || 'i4GO_AccountID='
    || v_setup_rec.account_id
    || pv_amp
    || 'i4Go_SiteID='
    || v_setup_rec.site_id
    || pv_amp
    || 'i4Go_CardNumber='
    || v_i4go_cardnumber
    || pv_amp
    || 'i4Go_ExpirationMonth='
    || v_i4go_expirationmonth
    || pv_amp
    || 'i4Go_ExpirationYear='
    || v_i4go_expirationyear
    || pv_amp
    || 'i4Go_CVV2Code='
    || v_i4go_cvv2code
    || pv_amp
    || 'i4Go_PostalCode='
    || v_i4go_postalcode;
    -- begin request using POST method
    UTL_HTTP.set_response_error_check (FALSE);
    UTL_HTTP.set_transfer_timeout (180);
    UTL_HTTP.set_wallet ('file:/etc/ORACLE/WALLETS/oracle', 'welcome1');
    l_http_req := UTL_HTTP.begin_request (v_url, 'POST');
    UTL_HTTP.set_header (l_http_req, 'User-Agent', 'Mozilla/4.0');
    UTL_HTTP.set_header (l_http_req, 'Content-Type', 'application/x-www-form-urlencoded');
    UTL_HTTP.set_header (l_http_req, 'content-length', LENGTH (v_url_params));
    UTL_HTTP.write_text (l_http_req, v_url_params);
    -- get response
    l_http_resp := UTL_HTTP.get_response (l_http_req);
    po_status_code := l_http_resp.status_code;
    po_reason_phrase := l_http_resp.reason_phrase;
    -- read response into a clob
    DBMS_LOB.createtemporary (l_clob, FALSE);
    BEGIN
    LOOP
    UTL_HTTP.read_text (l_http_resp, v_resp_str, 32767);
    DBMS_LOB.writeappend (l_clob, LENGTH (v_resp_str), v_resp_str);
    END LOOP;
    EXCEPTION
    WHEN UTL_HTTP.end_of_body
    THEN
    -- end response
    UTL_HTTP.end_response (l_http_resp);
    END;
    -- Fre resources
    DBMS_LOB.freetemporary (l_clob);
    EXCEPTION
    WHEN OTHERS
    THEN
    DBMS_LOB.freetemporary (l_clob);
    DBMS_OUTPUT.put_line (UTL_HTTP.get_detailed_sqlerrm);
    RAISE;
    END;
    sample response:
    <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
    <html xmlns="http://www.w3.org/1999/xhtml">
    <head>
         <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" />
         <title>Return With Payment Token</title>
         <script src="js/jquery-1.6.4.min.js" type="text/javascript"></script>
         <script type="text/javascript"><!--
              picSpinner= new Image(40,40);
              picSpinner.src="images/loading040.gif";
              bodyOnLoad = function() {
                   $("#noScriptDiv").hide();
                   $("#scriptDiv").show();
                   $("#i4GoMainForm").submit();
         //--></script>
    </head>
    <body onload="bodyOnLoad();">
         <form name="i4GoMainForm" id="i4GoMainForm" action="http://google.com" method="POST" onsubmit="$('#i4Go_submit').attr('disabled','disabled');">
                   <input name="I4GO_RESPONSE" type="hidden" value="SUCCESS" />
                   <input name="I4GO_RESPONSECODE" type="hidden" value="1" />
                   <input name="I4GO_CARDTYPE" type="hidden" value="VS" />
                   <input name="I4GO_UNIQUEID" type="hidden" value="1111d7nhcwse30wq" />
                   <input name="I4GO_EXPIRATIONMONTH" type="hidden" value="12" />
                   <input name="I4GO_EXPIRATIONYEAR" type="hidden" value="2012" />
                   <input name="I4GO_CARDHOLDERNAME" type="hidden" value="" />
                   <input name="I4GO_STREETADDRESS" type="hidden" value="" />
                   <input name="I4GO_POSTALCODE" type="hidden" value="65000" />
              <div id="scriptDiv" style="font-family:Arial, Helvetica, sans-serif;font-size:18px;visibility:hidden;">
                   <img src="images/loading040.gif" alt="Spinner..." />  Loading...
              </div>
              <div id="noScriptDiv" style="font-family:Arial, Helvetica, sans-serif;">
                   <noscript>
                                       <h1>Statement of Tokenization</h1>
                                       <p>The payment information you have submitted has been securely stored in the Shift4 PCI-DSS certified data center and a token representing this information will be sent to the merchant for processing. Below is the information that will be returning to the originating merchant:</p>
                                       <ul>
                                            <li>Response: <strong>SUCCESS</strong></li>
                                            <li>Response Code: <strong>1</strong></li>
                                            <li>Card Type: <strong>VS</strong></li>
                                            <li>Token: <strong>1111d7nhcwse30wq</strong></li>
                                       </ul>
                   </noscript>
    <input type="submit" name="i4Go_submit" id="i4Go_submit" value="Continue" />
              </div>
         </form>
    </body>
    </html>
    Edited by: sgudipat on Apr 24, 2012 1:20 PM

    Here is working example for your HTML using xpath to extract values from html
    You can store your html response in clob variable and then extract the value with xpath
    declare
       l_clob clob;
       l_value varchar2(100);
       l_xml xmltype;
      begin
         l_clob :='<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
      <html xmlns="http://www.w3.org/1999/xhtml">
      <head>
      <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" />
      <title>Return With Payment Token</title>
      <script src="js/jquery-1.6.4.min.js" type="text/javascript"></script>
      <script type="text/javascript"><!--
       picSpinner= new Image(40,40);
       picSpinner.src="images/loading040.gif";
       bodyOnLoad = function() {
       $("#noScriptDiv").hide();
       $("#scriptDiv").show();
       $("#i4GoMainForm").submit();
      //--></script>
       </head>
       <body onload="bodyOnLoad();">
       <form name="i4GoMainForm" id="i4GoMainForm" action="http://google.com" method="POST" onsubmit="$(''#i4Go_submit'').attr(''disabled'',''disabled'');">
       <input name="I4GO_RESPONSE" type="hidden" value="SUCCESS" />
       <input name="I4GO_RESPONSECODE" type="hidden" value="1" />
       <input name="I4GO_CARDTYPE" type="hidden" value="VS" />
       <input name="I4GO_UNIQUEID" type="hidden" value="1111d7nhcwse30wq" />
       <input name="I4GO_EXPIRATIONMONTH" type="hidden" value="12" />
       <input name="I4GO_EXPIRATIONYEAR" type="hidden" value="2012" />
       <input name="I4GO_CARDHOLDERNAME" type="hidden" value="" />
       <input name="I4GO_STREETADDRESS" type="hidden" value="" />
       <input name="I4GO_POSTALCODE" type="hidden" value="65000" />
      <img src="images/loading040.gif" alt="Spinner..." />  Loading...
       <noscript>
       Statement of Tokenization
       The payment information you have submitted has been securely stored in the Shift4 PCI-DSS certified data center and a token representing this information will be sent to the merchant for processing. Below is the information that will be returning to the originating merchant:
           Response: SUCCESS
           Response Code: 1
           Card Type: VS
           Token: 1111d7nhcwse30wq
       </noscript>
       <input type="submit" name="i4Go_submit" id="i4Go_submit" value="Continue" />
       </form>
       </body>
       </html>';
         execute immediate 'alter session set events =''31156 trace name context forever, level 2''';
         l_xml := xmltype(l_clob);
         execute immediate 'alter session set events =''31156 trace name context off''';
         select extractvalue( l_xml
                            , '/html/body/form/input[@name="I4GO_CARDTYPE"]/@value'
                            , 'xmlns="http://www.w3.org/1999/xhtml"' )
         into l_value
         from dual;
         dbms_output.put_line(l_value);
       end;
    Problem when parsing html with xpath and xmltype
    Edited by: peterv6i.blogspot.com on Apr 26, 2012 9:38 AM

  • How extract properties from document saved in blob.

    I need extract all document properties from document (any file type) saved in blob.
    I've tried three ways...
    - I've tried using relational ORDDoc.getProperties() by example. But this method returned exception ORDSYS.ORDDOCEXCEPTIONS.DOC_PLUGIN_EXCEPTION for every blob.
    - I can extract some properties with ctxhx.exe and key Meta. But this method is very expensive: I have to 1)save blob to file, 2)convert binary file to html-file, and 3)extract properties from html <meta> and <title> tags.
    - I can extract all properties with COM Automation. But this way need to install a lot of applications for every document type. It's impossible.
    Has anybody did same task yearly? Help me oracle guru!

    ORDDOc getproperties will attempt to see if the media is a video, audio or image and set the appropriate properties.
    I am gussing you have a word style document or something like that?
    Can you tell me about your application adn what you are trying to do?
    Larry

  • Generate a thumbnail from HTML by pure Java on Linux without Graphics

    hi - we in a requirement where we have to generate thumbnails from HTML code. The solution must be implemented in pure Java on Linux where there is no graphics support.
    Options tried already are :--
    1. 3rd party websites - rolled out by our client.
    2. Paid products - rolled out by our client
    3. Media Tracker and other java API - no luck as there is no support after HTML 4.0
    4. Using any os dependent native library - rolled out by our client.
    5. Lobo browser - but having troubles like it opens the browser before screenshot is taken, sometimes. Gone through by putting Thread.sleep() in between and saving remote images into a local html file etc. We got some success in there but problem doesn't end here.
    Questions -
    1. In the point # 5 above, our Linux server had graphics support but in code we set the system property java.awt.headless= true before capturing and generating thumbnail. My question is, if we set this property in the code then does it mean 100% that our code will not use any graphics support, if present in the underlying OS?
    2. Is this really possible to generate images in java on Linux where there is no X window/X server installed? Are we just wasting time in order to achieve which is unachievable?
    Any suggestions are most welcome.
    Regards,
    Sanjeev

    Thanks for ur response! Yeah - we tried but requirements are little different. We have HTML that we have to first render. Whatever output comes, we have to take a screenshot. So in order to render the html we have to have a browser first and I believe every OS which is providing browser support is having Graphics capabilities because browser would have frames, windows, toolbars, menubars etc which fall under Graphics.
    The above way is the only way that I know. If there are another way which ofcourse doesn't require graphics support, please let me know.
    So the question basically is - if I follow above mentioned image (like opening browser and capture screenshot) then is it possible on Linux with no graphics support? Actually I read on internet that lobo browser (written in java) supports this kind of feature.

  • Trim filter for iPlanet - whitespace removing from html

    Hi,
    I've just published my Trim Filter NSAPI plugin for Sun One Web Server/iPlanet
    It is a plugin/filter which removes whitespaces from HTML code. Its LGPL and it can be found here: http://www.thrull.com/iplanet/
    Feel free to test it :-)
    BR,
    Igor

    Cool, thanks for sharing it!
    I took a quick look, and I think I spotted a bug in the write filter method:        /* workaround for a bug? in iPlanet, when returning empty content */
            if (!out_size && amount) {
                rv = net_write(layer->lower, NON_EMPTY_STRING, 1);
                out_size = 1;
            } else {
                rv = net_write(layer->lower, (const char *)buffer, out_size);
            return rv;According to the NSAPI Programmer's Guide (http://docs.sun.com/source/817-6252/npgnsapi.html#wp1004627), the write filter method should return the number of bytes consumed on success. It looks like your write filter method is returning the number of bytes written instead. Perhaps that's the reason you needed that work around?

Maybe you are looking for

  • Payment F110-  vendor credits-not clearing.

    Hi There Please help  me in this f110 payment I have  2 dcuments (doc type KG- vendor credit memo) each 300$. and 1 document (doc type- Invoice Gross)200$. When we run the f110 its not picking up the credits, error: "Due items with  currency USD, pmn

  • Message not saved in sent messages folder

    Hi everybody, When I send a mail by the Collaboration Launch Pad and check the option: "Save copy in Sent Messages Folder" I get this error: "Message has been sent, but could not be saved in the sent messages folder; check the mail server entry in yo

  • Problems with 40 height tables and form.

    Hi I have a a table about 40 in height.I have inserted a form and one textfield and button. When I upload the page the form seems to be at the bottom of the table instead of in the middle. I have centered the form but I am not sure how to center it h

  • Restrict "Create Purchase Order" Authorization using vendor number "LIFNR"

    Hello All, How can we restrict the creation of purchase order(ME21N) using the vendor number (LIFNR). Any helpful idea? Regards,

  • ALE Distribution for custom HR infotypes.

    Hi All, We have a scenario where we have to distibute custom infotypes. Here is the process i am following. Create Infotype 9004. Created segment type Z1P9004 with all the fields in infotype 9004. Attached Z1P9004 to custom infotype 9004 in table T77