Extracting info from HTML documents

My program returns the HTML of any web page entered by the user. The HTML documents that are returned all contain pricing information that I want to extract. Any idea of the best way to search an HTML document for the specific information I require? It seems like a huge task to split it all into tokens and search for the £ sign!

This is a nightmare of a problem. The HTML files that I am retrieving are huge, and all I need from them are a couple of lines of information. How do I find the specific information I need?

Load the entire file and search it. You find the information the same way you would when you look for it in the file's source code.

Is it possible for a Java program to open the HTML file in a web browser, search it, and then return the info? The HTML files seem really complex to search.

How would this help?
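The "load the file and search it" suggestion can be sketched with a regular expression. This is only a rough illustration: it assumes prices are written as a pound sign (either the literal character or the `&pound;` entity) followed by a decimal amount, and the names `PriceFinder` and `findPrices` are made up for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PriceFinder {
    // A pound sign (literal, or the &pound; entity) followed by an amount such as 40.00
    private static final Pattern PRICE =
            Pattern.compile("(?:\u00A3|&pound;)\\s*(\\d+(?:\\.\\d{1,2})?)");

    /** Returns every amount found in the HTML, in document order. */
    static List<String> findPrices(String html) {
        List<String> prices = new ArrayList<>();
        Matcher m = PRICE.matcher(html);
        while (m.find()) {
            prices.add(m.group(1)); // group 1 is the number without the currency sign
        }
        return prices;
    }
}
```

The whole page is searched as a single string, so no tokenising is needed for this simple case. Deciding which price belongs to which item still requires looking at the surrounding markup.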

Similar Messages

  • Problem to extract text from HTML document

    I have to extract some text from HTML files (about 1000 of them) into my database.
    The HTML files come from the ACM Digital Library: http://portal.acm.org/dl.cfm
    Each HTML page holds the information about one paper. I only want to get the text of "Title", "Abstract", "Classification" and "Keywords".
    The problem is that I can't find any pattern for parsing the HTML files.
    For example, I need to get Classification = "Theory of Computation", "ANALYSIS OF ALGORITHMS AND PROBLEM COMPLEXITY", "Numerical Algorithms and Problems", "Mathematics of Computing", "NUMERICAL ANALYSIS", etc.
    The section of code about "Classification" is below.
    Please give me any ideas on how to do this, or on how to find a pattern for extracting this text.
    <div class="indterms"><a href="#CIT"><img name="top" src=
    "img/arrowu.gif" hspace="10" border="0" /></a><span class=
    "heading"><a name="IndexTerms">INDEX TERMS</a></span>
    <p class="Categories"><span class="heading"><a name=
    "GenTerms">Primary Classification:</a></span><br />
    &nbsp; <b>F.</b> <a href=
    "results.cfm?query=CCS%3AF%2E%2A&coll=ACM&dl=ACM&CFID=22820732&CFTOKEN=38147335"
    target="_self">Theory of Computation</a><br />
    &nbsp; <img src="img/tree.gif" border="0" height="20" width=
    "20" /> <b>F.2</b> <a href=
    "results.cfm?query=CCS%3A%22F%2E2%22&coll=ACM&dl=ACM&CFID=22820732&CFTOKEN=38147335"
    target="_self">ANALYSIS OF ALGORITHMS AND PROBLEM
    COMPLEXITY</a><br />
    &nbsp; &nbsp; &nbsp; <img src="img/tree.gif" border="0" height=
    "20" width="20" /> <b>F.2.1</b> <a href=
    "results.cfm?query=CCS%3A%22F%2E2%2E1%22&coll=ACM&dl=ACM&CFID=22820732&CFTOKEN=38147335"
    target="_self">Numerical Algorithms and Problems</a><br />
    </p>
    <p class="Categories"><span class="heading"><a name=
    "GenTerms">Additional&nbsp;Classification:</a></span><br />
    &nbsp; <b>G.</b> <a href=
    "results.cfm?query=CCS%3AG%2E%2A&coll=ACM&dl=ACM&CFID=22820732&CFTOKEN=38147335"
    target="_self">Mathematics of Computing</a><br />
    &nbsp; <img src="img/tree.gif" border="0" height="20" width=
    "20" /> <b>G.1</b> <a href=
    "results.cfm?query=CCS%3A%22G%2E1%22&coll=ACM&dl=ACM&CFID=22820732&CFTOKEN=38147335"
    target="_self">NUMERICAL ANALYSIS</a><br />
    &nbsp; &nbsp; &nbsp; <img src="img/tree.gif" border="0" height=
    "20" width="20" /> <b>G.1.6</b> <a href=
    "results.cfm?query=CCS%3A%22G%2E1%2E6%22&coll=ACM&dl=ACM&CFID=22820732&CFTOKEN=38147335"
    target="_self">Optimization</a><br />
    &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <img src="img/tree.gif" border=
    "0" height="20" width="20" /> <b>Subjects:</b> <a href=
    "results.cfm?query=CCS%3A%22Linear%20programming%22&coll=ACM&dl=ACM&CFID=22820732&CFTOKEN=38147335"
    target="_self">Linear programming</a><br />
    </p>
    <br />
    <p class="GenTerms"><span class="heading"><a name=
    "GenTerms">General Terms:</a></span><br />
    <a href=
    "results.cfm?query=genterm%3A%22Algorithms%22&coll=ACM&dl=ACM&CFID=22820732&CFTOKEN=38147335"
    target="_self">Algorithms</a>, <a href=
    "results.cfm?query=genterm%3A%22Theory%22&coll=ACM&dl=ACM&CFID=22820732&CFTOKEN=38147335"
    target="_self">Theory</a></p>
    <br />
    <p class="keywords"><span class="heading"><a name=
    "Keywords">Keywords:</a></span><br />
    <a href=
    "results.cfm?query=keyword%3A%22Simplex%20method%22&coll=ACM&dl=ACM&CFID=22820732&CFTOKEN=38147335"
    target="_self">Simplex method</a>, <a href=
    "results.cfm?query=keyword%3A%22complexity%22&coll=ACM&dl=ACM&CFID=22820732&CFTOKEN=38147335"
    target="_self">complexity</a>, <a href=
    "results.cfm?query=keyword%3A%22perturbation%22&coll=ACM&dl=ACM&CFID=22820732&CFTOKEN=38147335"
    target="_self">perturbation</a>, <a href=
    "results.cfm?query=keyword%3A%22smoothed%20analysis%22&coll=ACM&dl=ACM&CFID=22820732&CFTOKEN=38147335"
    target="_self">smoothed analysis</a></p>
    </div>

    One approach is to download HTML Parser from SourceForge (http://htmlparser.sourceforge.net/) and write rules that match the title, abstract, and so on.
    Another approach is to write your own parser that extracts only the title, abstract, etc.:
    1. Tokenize the HTML file, i.e. convert the HTML into tokens (tags and values).
    2. Write a simple parser that extracts the information you need.
    First find the pattern around the text you want to extract, for instance <p class="abstract">.
    Then write a rule for extracting the abstract, such as:
    if (tag is abstract) then extract abstract text
    Apply the same idea to the other tags.
    Below is a sample parser that was used to extract the title and abstract from ACM HTML files. Please modify it to include keywords and the other fields.
    Good luck.
    import java.io.BufferedReader;
    import java.io.FileReader;
    import java.io.IOException;
    import java.io.InputStream;
    import java.io.InputStreamReader;
    import java.util.ArrayList;
    import java.util.List;

    public class ACMHTMLParser {
        private final String m_filename;
        private URLLexicalAnalyzer lexical;
        List<String> urls = new ArrayList<>(); // not used yet; kept for further fields

        public ACMHTMLParser(String filename) {
            m_filename = filename;
        }

        /** Parses only the title and abstract. */
        public void parse() throws Exception {
            lexical = new URLLexicalAnalyzer(m_filename);
            String word = lexical.getNextWord();
            boolean isabstract = false;
            while (null != word) {
                if (isTag(word)) {
                    if (isTitle(word)) {
                        System.out.println("TITLE: " + lexical.getNextWord());
                    } else if (isAbstract(word) && !isabstract) {
                        parseAbstract();
                        isabstract = true;
                    }
                }
                word = lexical.getNextWord();
            }
            lexical.close();
        }

        public static void main(String[] args) throws Exception {
            ACMHTMLParser parser = new ACMHTMLParser("./acm_html.html");
            parser.parse();
        }

        public static boolean isTag(String word) {
            return word.startsWith("<") && word.endsWith(">");
        }

        public static boolean isTitle(String word) {
            return "<title>".equals(word);
        }

        // Please modify according to the HTML source.
        public static boolean isAbstract(String word) {
            return "<p class=\"abstract\">".equals(word);
        }

        // Prints the first non-tag token that follows the abstract tag.
        private void parseAbstract() throws Exception {
            while (true) {
                String abs = lexical.getNextWord();
                if (abs == null) {
                    break; // end of file reached before any abstract text
                }
                if (!isTag(abs)) {
                    System.out.println(abs);
                    break;
                }
            }
        }

        // Splits the input into tag tokens ("<...>") and value tokens (text between tags).
        static class URLLexicalAnalyzer {
            private BufferedReader m_reader;
            private boolean isTag; // true when scanValue() has already consumed a '<'

            public URLLexicalAnalyzer(String filename) {
                try {
                    m_reader = new BufferedReader(new FileReader(filename));
                } catch (IOException io) {
                    System.out.println("ERROR, file not found " + filename);
                    System.exit(1);
                }
            }

            public URLLexicalAnalyzer(InputStream in) {
                m_reader = new BufferedReader(new InputStreamReader(in));
            }

            public void close() {
                try {
                    if (null != m_reader) m_reader.close();
                } catch (IOException ignored) {}
            }

            public String getNextWord() throws IOException {
                int c;
                do { // skip whitespace between tokens (a loop instead of recursion)
                    c = m_reader.read();
                } while (-1 != c && Character.isWhitespace((char) c));
                if (-1 == c) return null;
                if ('<' == c || isTag) {
                    return scanTag(c);
                } else {
                    return scanValue(c);
                }
            }

            private String scanTag(final int c) throws IOException {
                StringBuilder result = new StringBuilder();
                if ('<' != c) result.append('<'); // the '<' was consumed by scanValue()
                result.append((char) c);
                int ch;
                while (true) {
                    ch = m_reader.read();
                    if (-1 == ch) throw new IllegalArgumentException("unterminated tag");
                    if ('>' == ch) {
                        isTag = false;
                        break;
                    }
                    result.append((char) ch);
                }
                result.append((char) ch);
                return result.toString();
            }

            private String scanValue(final int c) throws IOException {
                StringBuilder result = new StringBuilder();
                result.append((char) c);
                int ch;
                while (true) {
                    ch = m_reader.read();
                    if (-1 == ch) break; // trailing text at end of file is still a value
                    if ('<' == ch) {
                        isTag = true;
                        break;
                    }
                    result.append((char) ch);
                }
                return result.toString();
            }
        }
    }
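For the "Classification" part of the question, a regular-expression sketch may be enough. It assumes the markup looks like the snippet quoted in the question: each group sits in a `<p class="Categories">` paragraph and every term is the text of an `<a href=...>` link. The class and method names are hypothetical, and this is not how HTML Parser or the sample parser above work.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ClassificationExtractor {
    // One <p class="Categories"> ... </p> block; DOTALL lets '.' span line breaks
    private static final Pattern BLOCK = Pattern.compile(
            "<p\\s+class\\s*=\\s*\"Categories\"\\s*>(.*?)</p>", Pattern.DOTALL);
    // Text of a link that has an href; <a name="..."> headings are skipped
    private static final Pattern LINK = Pattern.compile(
            "<a\\s+href\\s*=[^>]*>(.*?)</a>", Pattern.DOTALL);

    static List<String> classifications(String html) {
        List<String> terms = new ArrayList<>();
        Matcher block = BLOCK.matcher(html);
        while (block.find()) {
            Matcher link = LINK.matcher(block.group(1));
            while (link.find()) {
                // Link text may be wrapped over several lines; collapse to single spaces
                terms.add(link.group(1).replaceAll("\\s+", " ").trim());
            }
        }
        return terms;
    }
}
```

The same two-step idea (find the enclosing block, then the links inside it) should carry over to the `GenTerms` and `keywords` paragraphs by changing the class name in the first pattern.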

  • Remarks info from marketing document to journal

    Hi All
    I need a solution for displaying the remarks info from marketing documents in the journal entry.
    Regards
    Bongani

    Hi,
    To achieve this you have to use a stored procedure (SP). You may try this:
    -- FOR DELIVERY JE (object type 15, table ODLN)
    IF @object_type = '15' AND (@transaction_type = 'A' OR @transaction_type = 'U')
    BEGIN
         UPDATE OJDT
         SET U_BD_Remarks = (SELECT Comments FROM ODLN WHERE DocEntry = @list_of_cols_val_tab_del)
         WHERE TransID = (SELECT TransId FROM ODLN WHERE DocEntry = @list_of_cols_val_tab_del)     
    END
    U_BD_Remarks is a UDF in my case.
    For every other document you have to change the code accordingly.
    Thanks
    Ashutosh T

  • Extracting info from a web page

    Hi,
    I'm not sure if I'm asking this question in the right forum.
    Can anyone tell me if there is a way to extract data from a web page?
    Say, for example, a web site such as Yahoo displays stock quotes
    or NASDAQ values updated almost in real time.
    Now suppose I want to get that information from the web page into one
    of my applications, say, something that uses that data. Is there
    a way to do it?
    Just curious.

    Yes, it's possible. You can use the java.net.URL class to connect to websites and download the HTML. Doing the coding is not that easy, and you should also be mindful of not redistributing data you've gotten from another site without permission.
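A minimal sketch of the download step with java.net.URL follows. The class name `PageFetcher` is hypothetical, and the code assumes the page is encoded as UTF-8; a real page may declare a different charset.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URI;
import java.net.URL;
import java.nio.charset.StandardCharsets;

class PageFetcher {
    /** Downloads the resource at the given address and returns it as one string. */
    static String fetch(String address) throws IOException {
        URL url = URI.create(address).toURL();
        StringBuilder html = new StringBuilder();
        // openStream() connects and returns the response body as a stream
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(url.openStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = in.readLine()) != null) {
                html.append(line).append('\n');
            }
        }
        return html.toString();
    }
}
```

Calling `PageFetcher.fetch(...)` with the address of the quotes page would return its HTML, which then still has to be searched for the values of interest.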

  • Merge option during assembly of PDF from html documents.

    Hello,
    Can LiveCycle create a combined PDF document by converting
    HTML documents to PDF with the option of merging them (eliminate whitespace) during ddx assembly.
    Here is a simple case. Combine three html documents such as
    html-1: contains text ONE
    html-2: contains text TWO
    html-3: contains text THREE
    The default assembled document appears to have three pages, each containing its single word of text. However, a combined document with one page containing the merged text is desired in some cases.
    Does LiveCycle handle this case. Thanks for any insight.
    Jesse

    Assembler will only deal with PDFs. PDF/G will take non-PDF content and make a PDF out of it. So in your case you would use PDF/G to change the HTML to PDF, then use Assembler to manipulate the three docs into a single doc.
    Hope that helps

  • Extract textdata from HTML with AUTO_FILTER

    Hi,
    we're using Oracle AUTO_FILTER to extract text-information from DOC and PDF - Documents.
    Works fine.
    We also have data stored within a HTML structure.
    We use our filter with the following options:
    ctx_ddl.create_preference('SEARCH_iMT_ATTRIB_AFILTER', 'AUTO_FILTER');
    ctx_ddl.create_policy('SE_IMT_POLICY', 'SEARCH_iMT_ATTRIB_AFILTER');
    The filter itself is called within a loop:
    CTX_DOC.Policy_Filter('SE_IMT_POLICY', v_blobtab(i), v_ctmp, TRUE);
    It seems as if AUTO_FILTER converts our HTML to HTML again.
    When trying to insert the AUTO_FILTER result, I get ORA-31167: XML nodes over 64K in size cannot be inserted
    How can I force the AUTO_FILTER only to return plain-text?
    Thanks in advance

    You need to create a section group preference employing HTML_SECTION_GROUP and then use it when creating your Policy.
    Faisal

  • Extracting info from webpages

    I am trying to create a hotel program that has various features, including finding the cheapest hotel prices on the web. My program searches the web and returns the appropriate web page results in HTML format. My problem is that I'm not sure of the best way to extract the information I want. Below is an example of a web page (I know the URL is long, but if you copy it into a web browser it does work, honest!). From this page I want to extract the hotel names and prices.
    http://www.bookings.org/searchresults.html?class_interval=2&country=gb&error_url=http%3A%2F%2Fwww.bookings.org%2Fcountry%2Fgb.html%3F&search_by=city&city=-2595386&region=Avon+Aberdeenshire&class_key=1&class=0&do_availability_check=on&checkin_monthday=21&checkin_year_month=2005-6&checkout_monthday=22&checkout_year_month=2005-6&newlangurl=%2Fcountry%2Fgb.en.html&x=77&y=14
    At present I can return the HTML of this page. However, can anyone suggest how I should go about extracting the specific info I require? The HTML file is huge!
    Regards
    Ross

    Sorry pasted the wrong url. This one should work.
    http://www.bookings.org/searchresults.html?class_interval=2&country=gb&error_url=http%3A%2F%2Fwww.bookings.org%2Fcountry%2Fgb.html%3F&search_by=city&city=-2595386&region=Avon+Aberdeenshire&class_key=1&class=0&do_availability_check=on&checkin_monthday=24&checkin_year_month=2005-3&checkout_monthday=27&checkout_year_month=2005-3&newlangurl=%2Fcountry%2Fgb.en.html&x=88&y=5

  • Find Info from HTML file

    I am trying to develop a program to read URLs and extract specific content from their source. So far my program returns the HTML of a URL and writes it to a file called Results.txt.
    I now need to write a program that opens this Results file and extracts the info that appears after certain tags. Some of these files are rather large, to say the least, and parsing HTML files is no simple task compared to files separated by simple white space.
    Can anyone advise how I can search an HTML file for a particular tag? Is tokenising the file the answer? If so, how can I define a token, since HTML does not always separate tokens by white space?
    Thanks for your help
    Ross
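On the question of how to define a token: one common choice is to treat each complete tag as one token and each run of text between tags as another. The sketch below does that with a single regular expression. The names `HtmlTokenizer`, `tokenize` and `textAfter` are invented for this example, and the approach ignores special cases such as comments, scripts, or a `>` inside an attribute value.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class HtmlTokenizer {
    // A token is either a whole tag (<...>) or the run of text up to the next '<'
    private static final Pattern TOKEN = Pattern.compile("<[^>]*>|[^<]+");

    static List<String> tokenize(String html) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(html);
        while (m.find()) {
            String token = m.group().trim();
            if (!token.isEmpty()) { // drop whitespace-only text between tags
                tokens.add(token);
            }
        }
        return tokens;
    }

    /** Returns the first text token that follows the given tag, or null if there is none. */
    static String textAfter(List<String> tokens, String tag) {
        int i = tokens.indexOf(tag);
        while (i >= 0 && ++i < tokens.size()) {
            if (!tokens.get(i).startsWith("<")) {
                return tokens.get(i);
            }
        }
        return null;
    }
}
```

With the token list in hand, "the info that appears after a certain tag" becomes a lookup such as `textAfter(tokens, "<b>")`, and file size matters little because the scan is a single pass.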

    Well, OK, I agree with what you say; however, I have designed my final-year university project around parsing HTML and that's what I'm committed to doing now. In hindsight I would have done things differently.
    I am having difficulty knowing how to parse the HTML, though. Basically, it's not nice to look at at all. For example, in the HTML below, how would I extract the info after the words "Double rooms from"?
    </td></tr>     <tr><td colspan="2"><hr size="1"/>
         <font size="3"><b>Orwell Lodge Hotel, Dalry</b></font>
    (2.6 miles / 3.6 km from the centre of Sighthill)
    </td></tr>
         <tr><td><img src="http://www.activehotels.com/photos/218697/AAB218697.jpg" border="0" width="96" height="72" alt="hotel" /></td>
         <td><font size="2">Single rooms from: &pound;40.00, Double rooms from: &pound;40.00</font>     
         <p /><font size="3"><b>For more details and online booking click here.</b></font>
         <p /><font size="2">Hotel details in other languages:
         <a href="http://www.orwelllodgehotel.activehotels.com/KNW&LANGUAGE=fr&subid=
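One way to pair each hotel name with its double-room price is sketched below. It relies on assumptions read off the snippet above: every hotel entry starts with an `<hr>` followed by the name in `<font><b>...</b></font>`, and the price appears as `Double rooms from: &pound;NN.NN`. The class and method names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RoomPriceExtractor {
    // Group 1: hotel name right after the <hr> separator.
    // Group 2: amount after "Double rooms from:". The (?!<hr) guard stops the
    // search from running on into the next hotel's entry.
    private static final Pattern HOTEL = Pattern.compile(
            "<hr[^>]*>\\s*<font[^>]*>\\s*<b>([^<]+)</b>"
            + "(?:(?!<hr).)*?Double rooms from:\\s*&pound;(\\d+(?:\\.\\d+)?)",
            Pattern.DOTALL | Pattern.CASE_INSENSITIVE);

    /** Maps hotel name to double-room price, in page order. */
    static Map<String, String> doublePrices(String html) {
        Map<String, String> result = new LinkedHashMap<>();
        Matcher m = HOTEL.matcher(html);
        while (m.find()) {
            result.put(m.group(1).trim(), m.group(2));
        }
        return result;
    }
}
```

A hotel entry without a double-room price is simply skipped. If the site changes its layout the pattern has to be adjusted, which is the usual weakness of scraping by pattern.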

  • Extract article from HTML code

    Hi,
    I'm trying to build a search engine for an RSS feed. The thing is, I'm trying to store every article as a BLOB field in the database. To optimize my search I need to extract only the article and nothing else (no unrelated hyperlinks or HTML code).
    I'm using the HTMLEditorKit from Swing to get the HTML content without the markup, but that's not enough. I need to clean the page of things like headers and footers (they affect the search results).

    I have already parsed the XML in the RSS (I've got Informa); the problem is the article itself.
    Take this article for example: [http://news.bbc.co.uk/2/hi/europe/7572635.stm]. You've got the article, but you've also got things all around it, from links to different articles to headers, footers and menus.
    All of this affects the search results dramatically, so I just want the article itself.
    That's the greatest challenge.

  • Extracting Values from XML-Document in pl/sql

    Hello!
    I need to extract the content of the following extract:
    <ns1:OXERPGetArticlesResponse xmlns:ns1="OXERPService">
    <ns1:OXERPGetArticlesResult>
    <ns1:OXERPType>
    <ns1:aResult>
    <ns1:ArrayOfString>
    <ns1:string>OXID</ns1:string>
    <ns1:string>531f91d4ab8bfb24c4d04e473d246d0b</ns1:string>
    </ns1:ArrayOfString>
    <ns1:ArrayOfString>
    <ns1:string>OXARTNUM</ns1:string>
    <ns1:string>0601-85-069</ns1:string>
    </ns1:ArrayOfString>
    <ns1:ArrayOfString>
    <ns1:string>OXPRICE</ns1:string>
    <ns1:string>100.5</ns1:string>
    </ns1:ArrayOfString>
    </ns1:aResult>
    <ns1:blResult>true</ns1:blResult>
    <ns1:sMessage/>
    </ns1:OXERPType>
    <ns1:OXERPType>
    <ns1:aResult>
    <ns1:ArrayOfString>
    <ns1:string>OXID</ns1:string>
    <ns1:string>531a8af7d9a9a5bb53b65a2b9a5356e5</ns1:string>
    </ns1:ArrayOfString>
    <ns1:ArrayOfString>
    <ns1:string>OXARTNUM</ns1:string>
    <ns1:string>0601-85-069-1</ns1:string>
    </ns1:ArrayOfString>
    <ns1:ArrayOfString>
    <ns1:string>OXPRICE</ns1:string>
    <ns1:string>89.9</ns1:string>
    </ns1:ArrayOfString>
    </ns1:aResult>
    <ns1:blResult>true</ns1:blResult>
    <ns1:sMessage/>
    </ns1:OXERPType>
    </ns1:OXERPGetArticlesResult>
    </ns1:OXERPGetArticlesResponse>
    The output should be:
    OXID OXARTNUM OXPRICE
    531f91d4ab8bfb24c4d04e473d246d0b 0601-85-069 100.5
    531a8af7d9a9a5bb53b65a2b9a5356e5 0601-85-069-1 89.9
    The count of rows and columns is variable.
    I want to do this by using xmltype.extract but I found no way to create a loop over the content of the xml document.
    Hopefully someone can help me!
    Regards
    Herbert

    OK, then you should be able to use something like :
    SQL> var xmldoc clob;
    SQL> begin
      2   :xmldoc := '<ns1:OXERPGetArticlesResponse xmlns:ns1="OXERPService">
      3  <ns1:OXERPGetArticlesResult>
      4  <ns1:OXERPType>
      5  <ns1:aResult>
      6  <ns1:ArrayOfString>
      7  <ns1:string>OXID</ns1:string>
      8  <ns1:string>531f91d4ab8bfb24c4d04e473d246d0b</ns1:string>
      9  </ns1:ArrayOfString>
    10  <ns1:ArrayOfString>
    11  <ns1:string>OXARTNUM</ns1:string>
    12  <ns1:string>0601-85-069</ns1:string>
    13  </ns1:ArrayOfString>
    14  <ns1:ArrayOfString>
    15  <ns1:string>OXPRICE</ns1:string>
    16  <ns1:string>100.5</ns1:string>
    17  </ns1:ArrayOfString>
    18  </ns1:aResult>
    19  <ns1:blResult>true</ns1:blResult>
    20  <ns1:sMessage/>
    21  </ns1:OXERPType>
    22  <ns1:OXERPType>
    23  <ns1:aResult>
    24  <ns1:ArrayOfString>
    25  <ns1:string>OXID</ns1:string>
    26  <ns1:string>531a8af7d9a9a5bb53b65a2b9a5356e5</ns1:string>
    27  </ns1:ArrayOfString>
    28  <ns1:ArrayOfString>
    29  <ns1:string>OXARTNUM</ns1:string>
    30  <ns1:string>0601-85-069-1</ns1:string>
    31  </ns1:ArrayOfString>
    32  <ns1:ArrayOfString>
    33  <ns1:string>OXPRICE</ns1:string>
    34  <ns1:string>89.9</ns1:string>
    35  </ns1:ArrayOfString>
    36  </ns1:aResult>
    37  <ns1:blResult>true</ns1:blResult>
    38  <ns1:sMessage/>
    39  </ns1:OXERPType>
    40  </ns1:OXERPGetArticlesResult>
    41  </ns1:OXERPGetArticlesResponse>';
    42  end;
    43  /
    PL/SQL procedure successfully completed.
    SQL> SELECT x1.rec_id
      2       , x2.col_name
      3       , x2.col_value
      4  FROM XMLTable(
      5        XMLNamespaces('OXERPService' as "ns1"),
      6        '/ns1:OXERPGetArticlesResponse/ns1:OXERPGetArticlesResult/ns1:OXERPType/ns1:aResult'
      7        passing xmltype(:xmldoc)
      8        columns rec_id for ordinality
      9              , rec_xml xmltype path 'ns1:ArrayOfString'
    10       ) x1,
    11       XMLTable(
    12        XMLNamespaces('OXERPService' as "ns1"),'/ns1:ArrayOfString'
    13        passing x1.rec_xml
    14        columns col_name  varchar2(30) path 'ns1:string[1]'
    15              , col_value varchar2(30) path 'ns1:string[2]'
    16       ) x2
    17  ;
        REC_ID COL_NAME                       COL_VALUE
             1 OXID                           531f91d4ab8bfb24c4d04e473d246d
             1 OXARTNUM                       0601-85-069
             1 OXPRICE                        100.5
             2 OXID                           531a8af7d9a9a5bb53b65a2b9a5356
             2 OXARTNUM                       0601-85-069-1
             2 OXPRICE                        89.9
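    6 rows selected.
    Note that col_value is declared as varchar2(30), so the 32-character OXID values are cut off in the output above; declare it as varchar2(32) or wider to get them in full.
    You mentioned that the number of columns is not known in advance. That's going to be a problem when presenting the data column-wise.
    Version 11g has the PIVOT feature, but you still have to know how many columns there will be in the result set.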
    How are you going to use the data after extraction?
    Maybe we could advise some other techniques more relevant for your requirement.

  • How to extract elements from a document

    I'm new to Java and I'm using JDOM in a JSP page. I have a document like this:
    <OFFERTA>
    <FACOLTA idFacolta="F1">
    <CORSO value="xxx"/>
    <CORSO value="yyy"/>
    </FACOLTA>
    <FACOLTA idFacolta="F2">
    <CORSO value="zzz"/>
    </FACOLTA>
    <FACOLTA idFacolta="F3">
    </FACOLTA>
    <FACOLTA idFacolta="F4">
    </FACOLTA>
    </OFFERTA>
    I'd like to get a document with the same structure but with only the FACOLTA elements that match a requested value for the idFacolta attribute.
    For example, if the requested idFacolta is "F2", I'd like to get:
    <OFFERTA>
    <FACOLTA idFacolta="F2">
    <CORSO value="zzz"/>
    </FACOLTA>
    </OFFERTA>
    I check in a loop if the element matches the idFacolta requested but when I find and try to delete it I get:
    java.util.ConcurrentModificationException
         at org.jdom.ContentList$FilterListIterator.checkConcurrentModification(ContentList.java:1230)
         at org.jdom.ContentList$FilterListIterator.hasNext(ContentList.java:942)
    Here's my code:
    Document doc = builder.build(file);
    Element root = doc.getRootElement();
    List children = root.getChildren();
    Iterator facoltaIterator = children.iterator();
    // Check if there's a request
    if (id!=null) {
         while (facoltaIterator.hasNext()) {
              Element facolta = (Element) facoltaIterator.next();
              if (!facolta.getAttribute("idFacolta").getValue().equalsIgnoreCase(id)) {
                   // Delete
                   root.removeContent(facolta);
              }
         }
    }
    All seems to be right, except the line
    root.removeContent(facolta);
    So, I tried to make a new document for the results, adding the element facolta:
    Document doc = builder.build(file);
    Element root = doc.getRootElement();
    // Create a new document with the same root
    Element rootRisultati = new Element("OFFERTA");
    Document docRisultati = new Document(rootRisultati);
    List children = root.getChildren();
    Iterator facoltaIterator = children.iterator();
    if (id!=null) {
         while (facoltaIterator.hasNext()) {
              Element facolta = (Element) facoltaIterator.next();
              if (facolta.getAttribute("idFacolta").getValue().equalsIgnoreCase(id)) {
                   // Add the element to the new document
                   rootRisultati.addContent(facolta);
              }
         }
    }
    but this causes a different exception:
    org.jdom.IllegalAddException: The element already has an existing parent "OFFERTA"
         at org.jdom.ContentList.add(ContentList.java:190)
         at org.jdom.ContentList.add(ContentList.java:146)
         at java.util.AbstractList.add(AbstractList.java:84)
         at org.jdom.Element.addContent(Element.java:1062)
    I'm not able to find a solution.
    Is there a commonly used procedure for selecting elements while maintaining the structure of the document?
    Can anyone help please? Thanks in advance.
    Stefano

    maybe you could try posting this in a JDOM forum? JDOM is not a standard XML tool from Sun / for Java... therefore out of the scope of this forum!
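For what it's worth, both exceptions have the same root: the list is modified behind the iterator's back in the first attempt, and an element that still belongs to the old document is added to a new one in the second. In JDOM the usual remedies are to remove through the iterator itself, or to add a clone (or a detached element) to the new document; treat that as a pointer rather than tested JDOM code. The sketch below shows the same filtering with the DOM API bundled with the JDK instead of JDOM, so it is a substitute technique, and the class name `FacoltaFilter` is made up.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class FacoltaFilter {
    /** Parses the XML and removes every FACOLTA child whose idFacolta differs from id. */
    static Document keepOnly(String xml, String id) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        Element root = doc.getDocumentElement();
        NodeList children = root.getChildNodes();
        // The NodeList is live, so walk it backwards: removing an item then
        // never shifts the positions of items that are still to be visited.
        for (int i = children.getLength() - 1; i >= 0; i--) {
            Node child = children.item(i);
            if (child instanceof Element
                    && "FACOLTA".equals(child.getNodeName())
                    && !((Element) child).getAttribute("idFacolta").equalsIgnoreCase(id)) {
                root.removeChild(child);
            }
        }
        return doc;
    }
}
```

The document keeps its OFFERTA root and only the matching FACOLTA element with its CORSO children, which is the structure asked for in the question.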

  • Extract info from RFH2 header in MQ message

    Hi All,
    I'm trying to send a MQ message with RFH header. Purpose is that JMS adapter extracts user data from RFH Header.
    A few questions:
    1.Format RFH2 header:
    -In the MQMD I assign the format: MQFMT_RF_HEADER_2
    -Apart from the mandatory fields in the RFH2 header I write user data to it as well and it looks like this:
    <usr><msgType dt="string">data</msgType></usr>
    Is this sufficient, or should there be an additional surrounding tag like <MQRFH2>?
    2.In PI, I use a JMS Adapter. On the Module-tab I have added AF_Modules/DynamicConfigurationBean (type: Local Enterprise Bean, Module Key: RFHHEADER), after ConvertBinaryToXMBMessage  and before CallSapAdapter.
    Under Module Configuration I added 2 entries:
    RFHHEADER     key.0     read http://sap.com/xi/XI/System/JMS DCJMSMessageProperty0
    RFHHEADER     value.0     msgType
    I want to extract the value of msgType and use it.
    On the Parameter-tab, I suppose I have to add Adapter-specific message attributes, but it is not exactly clear what I should put there.
    3.Dynamic Configuration Bean
    Is there anything I should do to activate that Dynamic Configuration Bean? When I send a message, I don't see anything in the monitor of the processed XML messages, not even an error.
    Kind Regards
    Edmond Paulussen

    Edmond Paulussen wrote:>
    > Hi All,
    >
    > I'm trying to send a MQ message with RFH header. Purpose is that JMS adapter extracts user data from RFH Header.
    >
    > A few questions:
    > 1.Format RFH2 header:
    > -In the MQMD I assign the format: MQFMT_RF_HEADER_2
    > -Apart from the mandatory fields in the RFH2 header I write user data to it as well and it looks like this:
    > <usr><msgType dt="string">data</msgType></usr>  
    > On the Parameter-tab, I suppose I have to add Adapter-specific message attributes, but it is not exactly clear what I should put there.
    The list of additional parameters is stored in the ASMA field.
    So here you enter the header value "msgType" with type String.
    The value "data" is stored in DCJMSMessageProperty0
    You do not need the DynamicConfigurationBean in this scenario.

  • Extract URL from HTML text

    Suppose you have the following String that is body text with HTML.
    String bodyText = " My name is Blake. I live in New York City. See my image here: <img href=\"http://www.blake.com/blake.jpg\"/> isn't my picture awesome? Tata for now!";
    I want to extract the URL that contains the location of the image in this bodyText. The ideal would be to create a function called public String extractor(String bodyText), to be used like this:
    String imageURL = extractor(bodyText);
    //imageURL should be "http://www.blake.com/blake.jpg"
    My first thought is to use a regular expression, yet the only place I know to use one is the replace methods of the String class. I am by no means an expert on regular expressions, so I haven't spent much time trying to figure it out that way. I could obviously do a linear search of bodyText with a ton of if statements, but that's just poor coding. I'd like to see if anyone has come across this or has insight into this problem.
    Thanks for all the help,
    Blake

    > How would the regexp change if there were multiple img tags within the String?
    I don't rightly know, I'm just a raw beginner in regexes.
    > Would this regexp return all the img URLs found in the String?
    No, as it stands it would return only the last URL. But this will:
    String bodyText = " My name is Blake. " +
          "I live in New York City. See my image here: " +
          "<img href=\"http://www.blake.com/blake.jpg\"/>" +
          " isn't my picture awesome? Here's another: " +
          "<img href='http://www.blake.com/Vandelay.jpg'/>" +
          " Tata for now!";
    String regex = "(?<=<img\\shref=[\"'])http://.*?(?=[\"']/?>)";
    Pattern pattern = Pattern.compile (regex);
    Matcher matcher = pattern.matcher (bodyText);
    while (matcher.find ()) {
       System.out.println (matcher.group ());
    }
    Note the enhancement that takes into account that both single and double quotes are legal in HTML. But unlike the earlier example, this does not tolerate more than one space between <img and href=; I couldn't find a way to achieve that.
    Visit this thread later, there are some real regex experts around who may post a better solution.
    db
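The single-space limitation comes from the lookbehind, which Java only accepts when its length is bounded. A possible way around it is to drop the lookarounds and read the URL from a capturing group instead. The sketch below assumes the address sits in either a `src` or an `href` attribute; `ImageUrlExtractor` and `extract` are names chosen for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ImageUrlExtractor {
    // <img, any attributes, then src= or href= with a quoted value.
    // Group 1 is the opening quote, \1 requires the same quote to close,
    // group 2 is the URL between the quotes.
    private static final Pattern IMG = Pattern.compile(
            "<img\\s+[^>]*?\\b(?:src|href)\\s*=\\s*([\"'])(.*?)\\1",
            Pattern.CASE_INSENSITIVE);

    /** Returns the URLs of all img tags in the text, in order of appearance. */
    static List<String> extract(String bodyText) {
        List<String> urls = new ArrayList<>();
        Matcher m = IMG.matcher(bodyText);
        while (m.find()) {
            urls.add(m.group(2));
        }
        return urls;
    }
}
```

Because `\s+` and `[^>]*?` are ordinary parts of the match rather than a lookbehind, any amount of whitespace and any other attributes may come between `<img` and the address.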

  • Extract element from PDF document from automatized process

    Hi
    I have never worked with PDF documents and I am looking for a solution for:
    extracting elements (simply text to begin with) from paragraphs and/or tables in a PDF document,
    so that I can then manipulate them in another process.
    Do you have any ideas or information about what the best way to do this would be?
    SDK package: is this possible with one, and if so, which one?
    Is there another solution than an SDK package?
    PDF versions supported: the latest ones?
    Any advice about the best development language: Java (which I prefer), or another?
    Thanks for all your advice!
    Lst

    Well, if you want Java - then Adobe only has server-side options for you. We don't offer desktop Java APIs.  Our server-side options are part of the Adobe LiveCycle family of products.
    For client-side, we have the Adobe Acrobat SDK (which also requires Adobe Acrobat to be installed) or the PDFLibrary SDK (for stand-alone applications).  Both are C/C++ based.

  • Extracting info from v$sqlarea

    Hi,
    I use Oracle 9i and I am trying to find out exactly what a session is doing at a given moment.
    So, in SQL*Plus I execute
    select a.sid,a.username,b.sql_text
    from v$session a inner join v$sqlarea b
    on a.sql_address=b.address and
    a.sql_hash_value=b.hash_value
    where sid=&sid;
    but the sql_text column is only 1000 characters long and I want the whole statement.
    Is there any way to extract this information using sqlplus? I know that Enterprise Manager shows me this information, but to get sql text at the right moment, I would like to use a script.
    Thanks in advance
    Alex

    Hi Arup,
    I am writing a Linux shell script that will automatically mine information from the archive logs in the database.
    I want to build a script that makes information from the archive logs accessible to users, i.e. automating LogMiner with either shell scripting or a set of
    PL/SQL procedures, so that a user can use his reporting tool to specify the start date, end date and all other parameters and spool the information.
    You could also help with any ideas you have.
    Best Regards
