Extract textdata from HTML with AUTO_FILTER

Hi,
we're using Oracle AUTO_FILTER to extract text-information from DOC and PDF - Documents.
Works fine.
We also have data stored within a HTML structure.
We use our filter with the following options:
ctx_ddl.create_preference('SEARCH_iMT_ATTRIB_AFILTER', 'AUTO_FILTER');
ctx_ddl.create_policy('SE_IMT_POLICY', 'SEARCH_iMT_ATTRIB_AFILTER');
The filter itself is called within a loop:
CTX_DOC.Policy_Filter('SE_IMT_POLICY', v_blobtab(i), v_ctmp, TRUE);
It seems as if AUTO_FILTER converts our HTML to HTML again.
When trying to insert the AUTO_FILTER, I get an ora-31167: XML nodes over 64K in size cannot be inserted
How can I force the AUTO_FILTER only to return plain-text?
Thanks in advance
Message was edited by:
user557708

You need to create a section group preference employing HTML_SECTION_GROUP and then use it when creating your Policy.
Faisal

Similar Messages

  • Extract data from Essbase with HAL to a flat file

    Hello,
    I would like to extract data from Essbase with HAL.
    I use the essbase Adapter with the "Extract Data" Method.
    All is working when I use only the essbase Adaptater and configure it.
    But I want to have a variable on my Years dimension.
    I have to fill "Years member port" and "Years Criteria port" of my essbase Adapter with variable.
    I have a "v_year" and "v_year_criteria" variable. I fill "v_year" with "FY05" but I don't know how fill my "v_year_criteria" variable.
    Thanks a lor for answer.

    If you are in 9.3x, I would recommend using the DATAEXPORT calc script function. You just FIX on what you want to export (including substitution variables), and then fill in some parameters and it writes a file. Look in the DBAG.

  • Table Header not Recurring - when Generate PDF from HTML with tables

    Hello,  It is not working as expected...and I'm not sure if the functionality is supported.  Want to create PDF from html document. The html document contains a html table that typically contains a large number of rows.  To make reading easier the html table used thead and tbody elements, and their children, so that when the table extends across pages when printed the header element recurs on each page.  However, the header element is not recurring in the generated PDF document (it only occurs in the first row of the table).  Just wondering if you have tried or used this functionality (created PDF from html with table with headers and the PDF included the table with recurring table header.  And if so, did you do anything special to make it work.  Thanks for any insight.

    If there's a problem with that package, I suggest you speak to the developer of that package and ask them to investigate.  It's not an Oracle supplied package so you are wrong to look for help here.

  • Problem to extract text from HTML document

    I have to extract some text from HTML file to my database. (about 1000 files)
    The HTML files are get from ACM Digital Library. http://portal.acm.org/dl.cfm
    The HTML page is about the information of a paper. I only want to get the text of "Title" "Abstract" "Classification" "Keywords"
    The Problem is that I can't find any patten to parser the html files"
    EX: I need to get the Classification = "Theory of Computation","ANALYSIS OF ALGORITHMS AND PROBLEM COMPLEXITY","Numerical Algorithms and Problem","Mathematics of Computing","NUMERICAL ANALYSIS"......etc .
    The section code about "Classification" is below.
    Please give any idea to do this, or how to find patten to extract text from this.
    <div class="indterms"><a href="#CIT"><img name="top" src=
    "img/arrowu.gif" hspace="10" border="0" /></a><span class=
    "heading"><a name="IndexTerms">INDEX TERMS</a></span>
    <p class="Categories"><span class="heading"><a name=
    "GenTerms">Primary Classification:</a></span><br />
    � <b>F.</b> <a href=
    "results.cfm?query=CCS%3AF%2E%2A&coll=ACM&dl=ACM&CFID=22820732&CFTOKEN=38147335"
    target="_self">Theory of Computation</a><br />
    � <img src="img/tree.gif" border="0" height="20" width=
    "20" /> <b>F.2</b> <a href=
    "results.cfm?query=CCS%3A%22F%2E2%22&coll=ACM&dl=ACM&CFID=22820732&CFTOKEN=38147335"
    target="_self">ANALYSIS OF ALGORITHMS AND PROBLEM
    COMPLEXITY</a><br />
    � � � <img src="img/tree.gif" border="0" height=
    "20" width="20" /> <b>F.2.1</b> <a href=
    "results.cfm?query=CCS%3A%22F%2E2%2E1%22&coll=ACM&dl=ACM&CFID=22820732&CFTOKEN=38147335"
    target="_self">Numerical Algorithms and Problems</a><br />
    </p>
    <p class="Categories"><span class="heading"><a name=
    "GenTerms">Additional�Classification:</a></span><br />
    � <b>G.</b> <a href=
    "results.cfm?query=CCS%3AG%2E%2A&coll=ACM&dl=ACM&CFID=22820732&CFTOKEN=38147335"
    target="_self">Mathematics of Computing</a><br />
    � <img src="img/tree.gif" border="0" height="20" width=
    "20" /> <b>G.1</b> <a href=
    "results.cfm?query=CCS%3A%22G%2E1%22&coll=ACM&dl=ACM&CFID=22820732&CFTOKEN=38147335"
    target="_self">NUMERICAL ANALYSIS</a><br />
    � � � <img src="img/tree.gif" border="0" height=
    "20" width="20" /> <b>G.1.6</b> <a href=
    "results.cfm?query=CCS%3A%22G%2E1%2E6%22&coll=ACM&dl=ACM&CFID=22820732&CFTOKEN=38147335"
    target="_self">Optimization</a><br />
    � � � � � <img src="img/tree.gif" border=
    "0" height="20" width="20" /> <b>Subjects:</b> <a href=
    "results.cfm?query=CCS%3A%22Linear%20programming%22&coll=ACM&dl=ACM&CFID=22820732&CFTOKEN=38147335"
    target="_self">Linear programming</a><br />
    </p>
    <br />
    <p class="GenTerms"><span class="heading"><a name=
    "GenTerms">General Terms:</a></span><br />
    <a href=
    "results.cfm?query=genterm%3A%22Algorithms%22&coll=ACM&dl=ACM&CFID=22820732&CFTOKEN=38147335"
    target="_self">Algorithms</a>, <a href=
    "results.cfm?query=genterm%3A%22Theory%22&coll=ACM&dl=ACM&CFID=22820732&CFTOKEN=38147335"
    target="_self">Theory</a></p>
    <br />
    <p class="keywords"><span class="heading"><a name=
    "Keywords">Keywords:</a></span><br />
    <a href=
    "results.cfm?query=keyword%3A%22Simplex%20method%22&coll=ACM&dl=ACM&CFID=22820732&CFTOKEN=38147335"
    target="_self">Simplex method</a>, <a href=
    "results.cfm?query=keyword%3A%22complexity%22&coll=ACM&dl=ACM&CFID=22820732&CFTOKEN=38147335"
    target="_self">complexity</a>, <a href=
    "results.cfm?query=keyword%3A%22perturbation%22&coll=ACM&dl=ACM&CFID=22820732&CFTOKEN=38147335"
    target="_self">perturbation</a>, <a href=
    "results.cfm?query=keyword%3A%22smoothed%20analysis%22&coll=ACM&dl=ACM&CFID=22820732&CFTOKEN=38147335"
    target="_self">smoothed analysis</a></p>
    </div>

    One approach is to download Htmlparser from sourceforge
    http://htmlparser.sourceforge.net/ and write the rules to match title, abstract etc.
    Another approach is to write your own parser that extract only title, abstract etc.
    1. tokenize the html file. --> convert html into tokens (tag and value)
    2. write a simple parser to extract certain information
    find out about the pattern of text you want to extract. For instance "<class "abstract">.
    then writing a rule for extracting abstract such as
    if (tag is abstract ) then extract abstract text
    apply the same concept for other tags
    Attached is the sample parser that was used to extract title and abstract from acm html files. Please modify to include keyword and other fields.
    good luck
    import java.io.BufferedReader;
    import java.io.FileReader;
    import java.io.IOException;
    import java.io.InputStream;
    import java.io.InputStreamReader;
    import java.util.ArrayList;
    import java.util.List;
    public class ACMHTMLParser
         private String m_filename;
         private URLLexicalAnalyzer lexical;
         List urls = new ArrayList();
         public ACMHTMLParser(String filename)
              super();
              m_filename = filename;
          * parses only title and abstract
         public void parse() throws Exception
              lexical = new URLLexicalAnalyzer(m_filename);
              String word = lexical.getNextWord();
              boolean isabstract = false;
              while (null != word)
                   if (isTag(word))
                        if (isTitle(word))
                             System.out.println("TITLE: " + lexical.getNextWord());
                        else if (isAbstract(word) && !isabstract)
                             parseAbstract();
                             isabstract = true;
                   word = lexical.getNextWord();
              lexical.close();
         public static void main(String[] args) throws Exception
              ACMHTMLParser parser = new ACMHTMLParser("./acm_html.html");
              parser.parse();
         public static boolean isTag(String word)
              return ( word.startsWith("<") && word.endsWith(">"));
         public static boolean isTitle(String word)
              return ( "<title>".equals(word));
         //please modify according to the html source
         public static boolean isAbstract(String word)
              return ( "<p class=\"abstract\">".equals(word));
         private void parseAbstract() throws Exception
              while (true)
                   String abs = lexical.getNextWord();
                   if (!isTag(abs))
                        System.out.println(abs);
                        break;
         class URLLexicalAnalyzer
           private BufferedReader m_reader;
           private boolean isTag;
           public URLLexicalAnalyzer(String filename)
              try
                m_reader = new BufferedReader(new FileReader(filename));
              catch (IOException io)
                System.out.println("ERROR, file not found " + filename);
                System.exit(1);
           public URLLexicalAnalyzer(InputStream in)
              m_reader = new BufferedReader(new InputStreamReader(in));
           public void close()
              try {
                if (null != m_reader) m_reader.close();
              catch (IOException ignored) {}
           public String getNextWord() throws IOException
              int c = m_reader.read();   
              if (-1 == c) return null; 
              if (Character.isWhitespace((char)c))
                return getNextWord();
              if ('<' == c || isTag)
                return scanTag(c);
              else
                   return scanValue(c);
           private String scanTag(final int c)
              throws IOException
              StringBuffer result = new StringBuffer();
              if ('<' != c) result.append('<');
              result.append((char)c);
              int ch = -1;
              while (true)
                ch = m_reader.read();
                if (-1 == ch) throw new IllegalArgumentException("un-terminate tag");
                if ('>' == ch)
                     isTag = false;
                     break;
                result.append((char)ch);
              result.append((char)ch);
              return result.toString();
           private String scanValue(final int c) throws IOException
                StringBuffer result = new StringBuffer();
                result.append((char)c);
                int ch = -1;
                while (true)
                   ch = m_reader.read();
                   if (-1 == ch) throw new IllegalArgumentException("un-terminate value");
                   if ('<' == ch)
                        isTag = true;
                        break;
                   result.append((char)ch);
                return result.toString();
    }

  • How to extract data from CRM with Data Service?

    Hi all,
                 I'll try to explain briefly the main issue; We need to read or extract data from SAP CRM with Data Service tool, then we'll show the data with webi, but now I would like to know if there are some standard tool as a rapid mart (i mean that it doesn't exists) but if some one knows how it works and if it's possible, it could be very helpful for us.
    BW option doesn't ok because the client needs a lot of queries for a report and the standard tool in our project is Data Service.
    Thanks in advance, and ask all u need if it can be a help for answer the question.
    Best Regards.
    Edited by: Cantabria on Jun 7, 2011 1:14 PM

    Hi,
    Your very familiar with LO data source even though your getting doubts on it.
    LBWE/SM13 belongs to only LO Data sources.
    CRM data sources also have delta settings based on data source.
    You can search on google about CRM data sources by using data source names.You may get some useful information.
    Before using CRM Data sources, CRM functional team need to set proper configurations at CRM system. those also you can easily get form google.
    use search term as " CRM - BW Adapter settings".
    Like other standard data sources for crm also,
    we need to activate from RAS5 and do the modification at RSA6.
    test at RSA3 and replicate into BW. Do the rest of the flow at bw system.
    Thanks

  • Extract article from HTML code

    Hi,
    I'm trying to build a search engine for an RSS feed. Thing is i'm trying to store every article as a BLOB field in the database. To optimize my search i'll need to extract the article only and nothing else (no unrelated hyperlinks or html code)
    I'm using the HTMLEditoKit of swing to get the html content without the code, but that's not enough. I need to clean the page of things like headers and footers (they affect the search results)

    i already parsed the XML in the RSS...i've got informa ; the problem is the article itself
    Take this article for example: [http://news.bbc.co.uk/2/hi/europe/7572635.stm] , you've got the article, but you've things all around it from link to different articles to headers, footers menus.
    All of this affects the search results dramatically so i just want the article itself.
    that's the greatest challenge.

  • Problems with extract metadata from Essbase with ODI 10.1.3.5.

    Hello there, I have a really annoying problem. I have 2 dimensions in essbase v11 with a space on it, (Itens Financeiros and Centros Financeiros), and this is causing a problem when I try to extract the metadata from essbace to a table. the problem occurs some times in 3 - loading - ss_0 - begin essbase metadata extract and some times in 4 - loading ss_0 - extract metadata, but in this 2 steps the error is the same:
    unknown member name [itens] in function [@idescendants]. I see in Johns blog someone sayng to add a '/"' in the front and at the end of the member filter criteria to allow ODI to parse the space as part of the member name. but I really don't understand this. Someone can helpme with this? Thanck you.

    It doesn't matter, I am able to replicate with an older version of the essbase adaptor...
    I take it this is the type of error you are receiving "com.hyperion.odi.essbase.ODIEssbaseException: Extract operation failed : Cannot query members by specs. Analytic Server Error(1200497): Error parsing formula for [REGION DEFINITION"
    If you download patch - Patch 8785893: ORACLE DATA INTEGRATOR 10.1.3.5.2_02 ONE-OFF PATCH
    It is easy to install just a matter of copying a jar file over an existing one in the \oracledi\drivers directory.
    It should resolve the issue.
    Ok?
    Cheers
    John
    http://john-goodwin.blogspot.com/                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           

  • Extract URL from HTML text

    Suppose you have the following String that is body text with HTML.
    String bodyText = " My name is Blake. I live in New York City. See my image here: <img href="http://www.blake.com/blake.jpg"/> isn't my picture awesome? Tata for now!"
    I want to extract the URL that contains the location of the image in this bodyText. The ideal would be to create a function called public String extractor(String bodyText) to be used
    String imageURL = extractor(bodyText);
    //imageURL should be "http://www.blake.com/blake.jpg"
    My first thoughts are using reg exp, yet the place i would find to use that would using the .replace in String class. I am by no means an expert on reg exp so I haven't taken too much time to try to figure it out with reg exp. I obviously could do a linear search of the bodyText and do a ton of if statements, but thats just poor coding. I want see if anyone came across this or has insight to this problem.
    Thanks for all the help,
    Blake

    How would the regexp change if there were multiple img tags within the String.I don't rightly know, I'm just a raw beginner in regexes.
    Would this regexp return all the img URLs found in the String.No, as it stands it would return only the last URL. But this will:String bodyText = " My name is Blake. " +
          "I live in New York City. See my image here: " +
          "<img href=\"http://www.blake.com/blake.jpg\"/>" +
          " isn't my picture awesome? Here's another: " +
          "<img href='http://www.blake.com/Vandelay.jpg'/>" +
          " Tata for now!";
    String regex = "(?<=<img\\shref=[\"'])http://.*?(?=[\"']/?>)";
    Pattern pattern = Pattern.compile (regex);
    Matcher matcher = pattern.matcher (bodyText);
    while (matcher.find ()) {
       System.out.println (matcher.group ());
    }Note the enhancement that takes into account that both single and double quotes are legal in HTML. But unlike the earlier example, this does not tolerate more than one space between <img and href=, I couldn't find a way to achieve that.
    Visit this thread later, there are some real regex experts around who may post a better solution.
    db

  • Extracting info from HTML documents

    My program returns the HTML of any web page entered by the user. The HTML documents that are returned all contain pricing infomration that I want to extract. Any idea of the best way to search an HTML document for specific infomration I require. Seems like a huge task to split it all into tokens and searching for � sign!!!!!

    This a nightmare of a problem........... the html
    files that I am retrieving are huge. All I need from
    them are a couple of lines of information. How do I
    find the specific infomration I need???Load the entire file, search for it. You find the information in the same way like you'd do when ouy look for it in the file's source code.
    Is it possible from a java program to open the HTML
    file in web broweser, search, then return the info?
    The html files seem really complex to search on.How would this help?

  • Extracting Audio from Video With QuickTime Pro?

    I read on another discussion board that QuickTime Pro could record/extract audio only from a video file. So I purchased the QuickTime Pro upgrade today (for Windows XP) but can't seem to find out how to do it, assumimg that post was correct and it can be done at all.
    The Nero video editing software I use can get the video file in MPEG-2, MPEG-1, AVI and other formats. I tried MPEG-1, but even the conversion to QuickTime's .mov format came up with no audio at all. I noticed a post here saying that was a limitation of the software. I could purchase the QuickTime MPEG-2 add-on but that seems to only allow playback and doesn't look like it will help any.
    Another post on the other discussion board said that virtualdub.org has a software program that could do this extraction, but silly me I went with Apple and was delighted to read about the no refund policy just now. :-/
    I'm guessing another possible approach is some kind of hardware solution -- a recording device of some sort?

    In order to extract a track (video or audio) the file first must have those as independent tracks.
    MPEG 1 and 2 are "muxed" (video and audio in the same track) so QuickTime Pro has nothing to extract.
    MPEG Streamclip (free) can convert .mpg and .mpeg files to QuickTime formats.
    If you export from Nero as .avi you can also use QuickTime Pro to extract tracks as it will have at least two.
    MPEG 1 and 2 are playback formats and are not intended to be edited.

  • Web from html with Jdeveloper (Application Server)

    I configured OAS and I could access to my project (Jdeveloper) at the local.
    For instance ;
    when the client users use this "http:\\localhost:7777\srdemo" link, they can access the project. But I can't configure our server to web like this. What can I do for configure and run at the web my project.
    Note: I use Windows Server 2003

    The Connection refused error is caused because:
    1.     The database SID specified is incorrect.
    2.     The database instance has not been started.

  • Read values from html response

    Hi,
    I am trying to make a call to an API using UTL_HTTP POST method over SSL and read the response html page and extract the values from the reponse.
    I am able to call and get a response back in html format. I have stored the html response in a clob variable.
    Now i want to parse this html and extract values from the form input items and send them out through OUT parameters.
    For example, from below reponse i want to extract the value '1111d7nhcwse30wq' from 'I4GO_UNIQUEID'
    Can anyone help me with the code to parse this html response and extract the values.
    Any help is greatly appreciated.
    Thanks
    Sharath
    sample Code:
    PROCEDURE get_token (
    p_requesterreference IN VARCHAR2,
    p_cardnumber IN VARCHAR2,
    p_cardtype IN VARCHAR2,
    p_cardholdername IN VARCHAR2,
    p_expirationmonth IN VARCHAR2,
    p_expirationyear IN VARCHAR2,
    p_streetaddress IN VARCHAR2,
    p_postalcode IN VARCHAR2,
    p_cvv2code IN VARCHAR2,
    po_uniqueid OUT VARCHAR2,
    po_errorindicator OUT VARCHAR2,
    po_primaryerrorcode OUT VARCHAR2,
    po_response OUT VARCHAR2,
    po_status_code OUT VARCHAR2,
    po_reason_phrase OUT VARCHAR2
    IS
    v_url VARCHAR2 (200);
    v_url_params VARCHAR2 (32767);
    v_resp_str VARCHAR2 (32767);
    l_http_req UTL_HTTP.req;
    l_http_resp UTL_HTTP.resp;
    v_requesterreference VARCHAR2 (12) := p_requesterreference;
    v_i4go_cardnumber VARCHAR2 (32) := p_cardnumber;
    v_i4go_streetaddress VARCHAR2 (30) := p_streetaddress;
    v_i4go_postalcode VARCHAR2 (9) := p_postalcode;
    v_i4go_expirationmonth VARCHAR2 (2) := p_expirationmonth; -- MM format
    v_i4go_expirationyear VARCHAR2 (2) := p_expirationyear; -- yy format
    v_i4go_cvv2code VARCHAR2 (3) := p_cvv2code;
    v_name VARCHAR2 (256);
    v_value VARCHAR2 (1024);
    l_clob CLOB;
    pv_amp CONSTANT CHAR (1) := CHR (38);
    CURSOR setup_cur
    IS
    SELECT interface_id, interface_name, interface_url, account_id, site_id
    FROM rsv.shift4_setup
    WHERE interface_name = 'I4GO';
    v_setup_rec setup_cur%ROWTYPE;
    BEGIN
    OPEN setup_cur;
    FETCH setup_cur
    INTO v_setup_rec;
    CLOSE setup_cur;
    v_url := 'https://certify.i4go.com//index.cfm?fuseaction=account.PostCardEntry';
    v_url_params :=
    pv_amp
    || 'i4GO_AccountID='
    || v_setup_rec.account_id
    || pv_amp
    || 'i4Go_SiteID='
    || v_setup_rec.site_id
    || pv_amp
    || 'i4Go_CardNumber='
    || v_i4go_cardnumber
    || pv_amp
    || 'i4Go_ExpirationMonth='
    || v_i4go_expirationmonth
    || pv_amp
    || 'i4Go_ExpirationYear='
    || v_i4go_expirationyear
    || pv_amp
    || 'i4Go_CVV2Code='
    || v_i4go_cvv2code
    || pv_amp
    || 'i4Go_PostalCode='
    || v_i4go_postalcode;
    -- begin request using POST method
    UTL_HTTP.set_response_error_check (FALSE);
    UTL_HTTP.set_transfer_timeout (180);
    UTL_HTTP.set_wallet ('file:/etc/ORACLE/WALLETS/oracle', 'welcome1');
    l_http_req := UTL_HTTP.begin_request (v_url, 'POST');
    UTL_HTTP.set_header (l_http_req, 'User-Agent', 'Mozilla/4.0');
    UTL_HTTP.set_header (l_http_req, 'Content-Type', 'application/x-www-form-urlencoded');
    UTL_HTTP.set_header (l_http_req, 'content-length', LENGTH (v_url_params));
    UTL_HTTP.write_text (l_http_req, v_url_params);
    -- get response
    l_http_resp := UTL_HTTP.get_response (l_http_req);
    po_status_code := l_http_resp.status_code;
    po_reason_phrase := l_http_resp.reason_phrase;
    -- read response into a clob
    DBMS_LOB.createtemporary (l_clob, FALSE);
    BEGIN
    LOOP
    UTL_HTTP.read_text (l_http_resp, v_resp_str, 32767);
    DBMS_LOB.writeappend (l_clob, LENGTH (v_resp_str), v_resp_str);
    END LOOP;
    EXCEPTION
    WHEN UTL_HTTP.end_of_body
    THEN
    -- end response
    UTL_HTTP.end_response (l_http_resp);
    END;
    -- Fre resources
    DBMS_LOB.freetemporary (l_clob);
    EXCEPTION
    WHEN OTHERS
    THEN
    DBMS_LOB.freetemporary (l_clob);
    DBMS_OUTPUT.put_line (UTL_HTTP.get_detailed_sqlerrm);
    RAISE;
    END;
    sample response:
    <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
    <html xmlns="http://www.w3.org/1999/xhtml">
    <head>
         <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" />
         <title>Return With Payment Token</title>
         <script src="js/jquery-1.6.4.min.js" type="text/javascript"></script>
         <script type="text/javascript"><!--
              picSpinner= new Image(40,40);
              picSpinner.src="images/loading040.gif";
              bodyOnLoad = function() {
                   $("#noScriptDiv").hide();
                   $("#scriptDiv").show();
                   $("#i4GoMainForm").submit();
         //--></script>
    </head>
    <body onload="bodyOnLoad();">
         <form name="i4GoMainForm" id="i4GoMainForm" action="http://google.com" method="POST" onsubmit="$('#i4Go_submit').attr('disabled','disabled');">
                   <input name="I4GO_RESPONSE" type="hidden" value="SUCCESS" />
                   <input name="I4GO_RESPONSECODE" type="hidden" value="1" />
                   <input name="I4GO_CARDTYPE" type="hidden" value="VS" />
                   <input name="I4GO_UNIQUEID" type="hidden" value="1111d7nhcwse30wq" />
                   <input name="I4GO_EXPIRATIONMONTH" type="hidden" value="12" />
                   <input name="I4GO_EXPIRATIONYEAR" type="hidden" value="2012" />
                   <input name="I4GO_CARDHOLDERNAME" type="hidden" value="" />
                   <input name="I4GO_STREETADDRESS" type="hidden" value="" />
                   <input name="I4GO_POSTALCODE" type="hidden" value="65000" />
              <div id="scriptDiv" style="font-family:Arial, Helvetica, sans-serif;font-size:18px;visibility:hidden;">
                   <img src="images/loading040.gif" alt="Spinner..." />  Loading...
              </div>
              <div id="noScriptDiv" style="font-family:Arial, Helvetica, sans-serif;">
                   <noscript>
                                       <h1>Statement of Tokenization</h1>
                                       <p>The payment information you have submitted has been securely stored in the Shift4 PCI-DSS certified data center and a token representing this information will be sent to the merchant for processing. Below is the information that will be returning to the originating merchant:</p>
                                       <ul>
                                            <li>Response: <strong>SUCCESS</strong></li>
                                            <li>Response Code: <strong>1</strong></li>
                                            <li>Card Type: <strong>VS</strong></li>
                                            <li>Token: <strong>1111d7nhcwse30wq</strong></li>
                                       </ul>
                   </noscript>
    <input type="submit" name="i4Go_submit" id="i4Go_submit" value="Continue" />
              </div>
         </form>
    </body>
    </html>
    Edited by: sgudipat on Apr 24, 2012 1:20 PM

    Here is working example for your HTML using xpath to extract values from html
    You can store your html response in clob variable and then extract the value with xpath
    declare
       l_clob clob;
       l_value varchar2(100);
       l_xml xmltype;
      begin
         l_clob :='<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
      <html xmlns="http://www.w3.org/1999/xhtml">
      <head>
      <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" />
      <title>Return With Payment Token</title>
      <script src="js/jquery-1.6.4.min.js" type="text/javascript"></script>
      <script type="text/javascript"><!--
       picSpinner= new Image(40,40);
       picSpinner.src="images/loading040.gif";
       bodyOnLoad = function() {
       $("#noScriptDiv").hide();
       $("#scriptDiv").show();
       $("#i4GoMainForm").submit();
      //--></script>
       </head>
       <body onload="bodyOnLoad();">
       <form name="i4GoMainForm" id="i4GoMainForm" action="http://google.com" method="POST" onsubmit="$(''#i4Go_submit'').attr(''disabled'',''disabled'');">
       <input name="I4GO_RESPONSE" type="hidden" value="SUCCESS" />
       <input name="I4GO_RESPONSECODE" type="hidden" value="1" />
       <input name="I4GO_CARDTYPE" type="hidden" value="VS" />
       <input name="I4GO_UNIQUEID" type="hidden" value="1111d7nhcwse30wq" />
       <input name="I4GO_EXPIRATIONMONTH" type="hidden" value="12" />
       <input name="I4GO_EXPIRATIONYEAR" type="hidden" value="2012" />
       <input name="I4GO_CARDHOLDERNAME" type="hidden" value="" />
       <input name="I4GO_STREETADDRESS" type="hidden" value="" />
       <input name="I4GO_POSTALCODE" type="hidden" value="65000" />
      <img src="images/loading040.gif" alt="Spinner..." />  Loading...
       <noscript>
       Statement of Tokenization
       The payment information you have submitted has been securely stored in the Shift4 PCI-DSS certified data center and a token representing this information will be sent to the merchant for processing. Below is the information that will be returning to the originating merchant:
           Response: SUCCESS
           Response Code: 1
           Card Type: VS
           Token: 1111d7nhcwse30wq
       </noscript>
       <input type="submit" name="i4Go_submit" id="i4Go_submit" value="Continue" />
       </form>
       </body>
       </html>';
         execute immediate 'alter session set events =''31156 trace name context forever, level 2''';
         l_xml := xmltype(l_clob);
         execute immediate 'alter session set events =''31156 trace name context off''';
         select extractvalue( l_xml
                            , '/html/body/form/input[@name="I4GO_CARDTYPE"]/@value'
                            , 'xmlns="http://www.w3.org/1999/xhtml"' )
         into l_value
         from dual;
         dbms_output.put_line(l_value);
       end;
    Problem when parsing html with xpath and xmltype
    Edited by: peterv6i.blogspot.com on Apr 26, 2012 9:38 AM

  • How extract properties from document saved in blob.

    I need extract all document properties from document (any file type) saved in blob.
    I've tried three ways...
    - I've tried using relational ORDDoc.getProperties() by example. But this method returned exception ORDSYS.ORDDOCEXCEPTIONS.DOC_PLUGIN_EXCEPTION for every blob.
    - I can extract some properties with ctxhx.exe and key Meta. But this method is very expensive: I have to 1)save blob to file, 2)convert binary file to html-file, and 3)extract properties from html <meta> and <title> tags.
    - I can extract all properties with COM Automation. But this way need to install a lot of applications for every document type. It's impossible.
    Has anybody did same task yearly? Help me oracle guru!

    ORDDOc getproperties will attempt to see if the media is a video, audio or image and set the appropriate properties.
    I am gussing you have a word style document or something like that?
    Can you tell me about your application adn what you are trying to do?
    Larry

  • How to extract data from XML file with JavaScript

    HI All
    I am new to this group.
    Can anybody help me regarding XML.
    I want to know How to extract data from XML file with JavaScript.
    And also how to use API for XML
    regards
    Nagaraju

    This is a Java forum.
    JavaScript is something entirely different than Java, even though the names are similar.
    Try another website with forums about JavaScript.
    For example here: http://www.webdeveloper.com/forum/forumdisplay.php?s=&forumid=3

  • Applescript or workflow to extract text from PDF and rename PDF with the results

    Hi Everyone,
    I get supplied hundreds of PDFs which each contain a stock code, but the PDFs themselves are not named consistantly, or they are supplied as multi-page PDFs.
    What I need to do is name each PDF with the code which is in the text on the PDF.
    It would work like this in an ideal world:
    1. Split PDF into single pages
    2. Extract text from PDF
    3. Rename PDF using the extracted text
    I'm struggling with part 3!
    I can get a textfile with just the code (using a call to BBEDIT I'm extracting the code)
    I did think about using a variable for the name, but the rename functions doesn't let me use variables.

    Hello
    You may also try the following applescript script, which is a wrapper of rubycocoa script. It will ask you choose source pdf files and destination directory. Then it will scan text of each page of pdf files for the predefined pattern and save the page as new pdf file with the name as extracted by the pattern in the destination directory. Those pages which do not contain string matching the pattern are ignored. (Ignored pages, if any, are reported in the result of script.)
    Currently the regex pattern is set to:
    /HB-.._[0-9]{6}/
    which means HB- followed by two characters and _ and 6 digits.
    Minimally tested under 10.6.8.
    Hope this may help,
    H
    _main()
    on _main()
        script o
            property aa : choose file with prompt ("Choose pdf files.") of type {"com.adobe.pdf"} ¬
                default location (path to desktop) with multiple selections allowed
            set my aa's beginning to choose folder with prompt ("Choose destination folder.") ¬
                default location (path to desktop)
            set args to ""
            repeat with a in my aa
                set args to args & a's POSIX path's quoted form & space
            end repeat
            considering numeric strings
                if (system info)'s system version < "10.9" then
                    set ruby to "/usr/bin/ruby"
                else
                    set ruby to "/System/Library/Frameworks/Ruby.framework/Versions/1.8/usr/bin/ruby"
                end if
            end considering
            do shell script ruby & " <<'EOF' - " & args & "
    require 'osx/cocoa'
    include OSX
    require_framework 'PDFKit'
    outdir = ARGV.shift.chomp('/')
    ARGV.select {|f| f =~ /\\.pdf$/i }.each do |f|
        url = NSURL.fileURLWithPath(f)
        doc = PDFDocument.alloc.initWithURL(url)
        path = doc.documentURL.path
        pcnt = doc.pageCount
        (0 .. (pcnt - 1)).each do |i|
            page = doc.pageAtIndex(i)
            page.string.to_s =~ /HB-.._[0-9]{6}/
            name = $&
            unless name
                puts \"no matching string in page #{i + 1} of #{path}\"
                next # ignore this page
            end
            doc1 = PDFDocument.alloc.initWithData(page.dataRepresentation) # doc for this page
            unless doc1.writeToFile(\"#{outdir}/#{name}.pdf\")
                puts \"failed to save page #{i + 1} of #{path}\"
            end
        end
    end
    EOF"
        end script
        tell o to run
    end _main

Maybe you are looking for