XSL transformation from HTML to text

Hi, I want to transform an HTML source and produce text output. All I want to do is output the values of the input fields in my HTML source. Any ideas on how I should construct my XSL file?
Example:
HTML:
<html>
<body>
<input name="od" type="text" value="123">
<input name="id" type="text" value="456">
</body>
</html>
would simply give :
123
456
Thanks for your help !!!

Here is what I came up with. I changed the regular HTML into XHTML, then created a stylesheet that uses XPath to find and display the values of the value attributes:
test.xml (XHTML version of the HTML you posted)
======================================
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="test.xsl"?>
<html>
<body>
<input name="od" type="text" value="123"/>
<input name="id" type="text" value="456"/>
</body>
</html>
test.xsl
======
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="/">
<html>
<head>
<title>Input values</title>
</head>
<body>
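<xsl:for-each select="html/body/input">
<xsl:value-of select="@value"/>
<br/>
</xsl:for-each>
</body>
</html>
</xsl:template>
</xsl:stylesheet>
Opened in a browser, this shows each value on its own line (without the <br/> the values would run together as 123456). Note that the result is still an HTML page, not plain text.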
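
The question asked for plain text rather than an HTML page. Here is a minimal sketch of one way to get that, assuming the input is already well-formed XHTML as in test.xml and that the transform is run through the JDK's built-in JAXP processor. The class name InputValuesAsText and its transform method are made up for this illustration. The only real change to the stylesheet is <xsl:output method="text"/> plus an explicit newline after each value.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

/** Hypothetical helper, not from the thread: writes each input/@value on its own line. */
final class InputValuesAsText {

    // Same XPath as test.xsl, but method="text" means no markup is emitted.
    static final String XSL =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='text'/>"
      + "<xsl:template match='/'>"
      + "<xsl:for-each select='html/body/input'>"
      + "<xsl:value-of select='@value'/>"
      + "<xsl:text>&#10;</xsl:text>"   // line feed after every value
      + "</xsl:for-each>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    static String transform(String xhtml) {
        try {
            StringWriter out = new StringWriter();
            TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(XSL)))
                .transform(new StreamSource(new StringReader(xhtml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String xhtml = "<html><body>"
            + "<input name='od' type='text' value='123'/>"
            + "<input name='id' type='text' value='456'/>"
            + "</body></html>";
        System.out.print(transform(xhtml));   // 123 and 456 on separate lines
    }
}
```

Raw HTML with unclosed <input> tags is not well-formed XML, so it still has to be converted to XHTML first, as the answer above did by hand.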

Similar Messages

  • Xsl transformation from version1 to version2, problem with namespaces

    Guys!
    In my current project we need an interface in Oracle ESB that is built on, let's say, WSDL version 1, and an interface built on WSDL version 2.
    In ESB I need to define a transformation that maps a version 1 request to version 2. Because the XSD for the operation is really huge (1000+ items), I made some XSL templates to do most of the work. That works great, but I'm having a few issues now.
    To re-order items from source to target I do the following in a template:
    <nameGroep>
    <xsl:copy-of select="andhere the xpath from source"/>
    <xsl:copy-of select="andhere the xpath from source"/>
    <xsl:copy-of select="andhere the xpath from source"/>
    </nameGroep>
    The only problem with xsl:copy-of is that it copies the namespace along as well. So if my target document uses another namespace, it fails.
    To correct this I hoped I could use <xsl:namespace-alias>, but this doesn't work on a literal/text tag (I hope I'm explaining this correctly).
    The other option is to do something like this for every element:
    <elementname>
    <xsl:value-of select=""/>
    </elementname>
    But this always creates <elementname> in the target, whether or not it is in the source. You could check whether it is in the source, but that isn't a solution, because I would need a check for each of the 1000+ items in the source document, so we skip this idea.
    So I've reached a point where I'm still searching for a good solution, and I hoped you guys could help me a bit with it.
    If the problem isn't explained well, please say so and I will add extra info.
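
One commonly used alternative to xsl:copy-of for this situation is a generic template that rebuilds each element under the target namespace using local-name(). The sketch below is only an illustration under stated assumptions: the namespaces urn:example:v1 and urn:example:v2, the element names request, first and second, and the class NamespaceRewrite are all invented here, and the JDK's built-in JAXP processor stands in for Oracle ESB. Because the generic template matches any element, there is nothing to write per item, and an element that is absent from the source simply produces no output.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

/** Hypothetical sketch: re-order elements and move them from namespace v1 to v2. */
final class NamespaceRewrite {

    static final String XSL =
        "<xsl:stylesheet version='1.0'"
      + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'"
      + " xmlns:v1='urn:example:v1' exclude-result-prefixes='v1'>"
      + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
      // Re-ordering: select the source elements in the order the target needs.
      + "<xsl:template match='/v1:request'>"
      + "<nameGroep xmlns='urn:example:v2'>"
      + "<xsl:apply-templates select='v1:second' mode='to-v2'/>"
      + "<xsl:apply-templates select='v1:first' mode='to-v2'/>"
      + "</nameGroep>"
      + "</xsl:template>"
      // Generic rule: same local name, target namespace, recurse into children.
      + "<xsl:template match='*' mode='to-v2'>"
      + "<xsl:element name='{local-name()}' namespace='urn:example:v2'>"
      + "<xsl:copy-of select='@*'/>"
      + "<xsl:apply-templates mode='to-v2'/>"
      + "</xsl:element>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    static String transform(String xml) {
        try {
            StringWriter out = new StringWriter();
            TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(XSL)))
                .transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(transform(
            "<request xmlns='urn:example:v1'><first>1</first><second>2</second></request>"));
    }
}
```

Comments and processing instructions are dropped by this sketch, and attributes that are themselves namespace-qualified would still carry their original namespace.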

  • Error in generating a new XSL Transformer from large xslt File

    Good day to all,
    Currently I am facing a problem: whenever I try to create a Transformer object from TransformerFactory, a TransformerConfigurationException is thrown. I did some research on the net and understand that it is due to a bug related to a JVM limit of 64 KB. Is there any external package or project that has already addressed this problem? I have checked Apache, and they have already patched the problem in Xalan 2.7.1, but I couldn't find any release of 2.7.1.
    Please help
    Regards
    RollinMao

    If you have the transformation rules in a separate XSLT file, you can use the com.icl.saxon package to transform the XML files. I have used this package with large XSL files and it has worked well.

  • Switch from HTML to Text

    When I set up my yahoo email I indicated to use HTML. However in my messages I get all the code garbage. How can I switch to Text mode? Thanks
    Alan

    The ability to read messages formatted in HTML will be available when you update your OS to 4.5. You can either wait until your carrier releases it, or follow these steps to install 4.5 onto your phone:
    After you download and install the handheld software from a carrier's website that has the 4.5 OS, do a search on your computer to find and delete the vendor.xml file. The file is most likely in the following location: C:\Program Files\Common Files\Research In Motion\AppLoader. Afterwards just start the Desktop Manager and the software will be available for you to install.
    **NOTE** - the handheld software you have must be for the same model as you have. You cannot install software for any other model.
    Message Edited by jmrmb80 on 09-29-2008 07:48 PM

  • Process to Generate PDF from HTML Rich Text Editor

    Hi,
    I have a HTML form with the Yahoo Rich Text Editor.
    The Form posts the rich text to a Livecycles process. [REST]
    The input type is String. How could I convert that string into a PDF?
    TIA
    Michael

    Thank you,
    I saved the input in a temporary file and used the HTML2PDF component, then deleted the temp file,
    Cheers

  • XSL transformation not working

    Hi!
    I am having problems when trying to run an XSL transformation from XML to XML (where the XML output is actually XHTML). It always fails when <xsl:call-template name="something"/> is executed from another <xsl:template> that is itself invoked with <xsl:call-template>. The database version is 10.2.0.4.0, and the error received is: ORA-00604: invalid character value 'burek' for attribute 'name'.
    Transformation is working in Java and Altova XMLSpy.
    PL/SQL code:
    procedure process_xsl(p_xml in clob, p_xsl in clob, p_result out clob) is
    w_xsl_proc dbms_XSLProcessor.Processor;
    w_xsl_ss dbms_XSLProcessor.Stylesheet;
    w_dom_xsl dbms_xmldom.DOMDocument;
    w_dom_xml dbms_xmldom.DOMDocument;
    w_parser dbms_xmlparser.Parser;
    begin
    -- XML and XSL from CLOB into DOMDocument
    w_parser := dbms_xmlparser.newParser;
    dbms_xmlparser.parseClob(w_parser, p_xml);
    w_dom_xml := dbms_xmlparser.getDocument(w_parser);
    dbms_xmlparser.freeParser(w_parser);
    w_parser := dbms_xmlparser.newParser;
    dbms_xmlparser.parseClob(w_parser, p_xsl);
    w_dom_xsl := dbms_xmlparser.getDocument(w_parser);
    dbms_xmlparser.freeParser(w_parser);
    -- XSL processing
    w_xsl_proc := dbms_XSLProcessor.newProcessor;
    w_xsl_ss := dbms_XSLProcessor.newStylesheet(w_dom_xsl, null); -- the error is raised here
    END;
    Stylesheet:
    <?xml version="1.0" encoding="UTF-8"?>
    <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
         <xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes" doctype-public="-//W3C//DTD XHTML 1.0 Transitional//EN" doctype-system="http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"></xsl:output>
         <xsl:decimal-format name="dec" decimal-separator="," grouping-separator="."/>
         <!-- Predefined constants from einvoice xml schema -->
         <xsl:variable name="einvoiceIssuerCode" select="'II'"></xsl:variable>
         <xsl:variable name="einvoiceRecipientCode" select="'IV'"></xsl:variable>
         <xsl:variable name="einvoiceIssueLocationCode" select="91"></xsl:variable>
         <xsl:variable name="einvoiceIssueDateCode" select="137"></xsl:variable>
         <!-- Constants directly from document which is a part of transformation -->
         <xsl:variable name="einvoiceNumber" select="/IzdaniRacunEnostavni/Racun/GlavaRacuna/StevilkaRacuna/text()"></xsl:variable>
         <!-- Intro template -->
         <xsl:template name="burek"> <!-- Second template called with xsl:call-template -->
              <xsl:text>TEST</xsl:text>
         </xsl:template>
         <!-- Template in which we create html structure including css -->
         <xsl:template name="einvoice">
              <html xmlns="http://www.w3.org/1999/xhtml">
              <head>
                   <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
                   <title>Vizualizacija e-računa št. </title>
                   <xsl:call-template name="burek"></xsl:call-template>
              </head>
              <body>
              </body>
              </html>
         </xsl:template>
         <!-- Intro template -->
         <xsl:template match="/">
              <xsl:call-template name="einvoice"></xsl:call-template> <!-- This call is OK -->
         </xsl:template>
    </xsl:stylesheet>
    XML document
    <?xml version="1.0" encoding="UTF-8"?>
    <IzdaniRacunEnostavni>
    <Racun Id="data">
    <GlavaRacuna>
    <VrstaRacuna>380</VrstaRacuna>
    <StevilkaRacuna>1205019908211</StevilkaRacuna>
    <FunkcijaRacuna>9</FunkcijaRacuna>
    </GlavaRacuna>
    <DatumiRacuna>
    <VrstaDatuma>137</VrstaDatuma>
    <DatumRacuna>2012-05-07T00:00:00.0Z</DatumRacuna>
    </DatumiRacuna>
    <DatumiRacuna>
    <VrstaDatuma>263</VrstaDatuma>
    <DatumRacuna>2012-04-28T00:00:00.0Z</DatumRacuna>
    </DatumiRacuna>
    <DatumiRacuna>
    <VrstaDatuma>263</VrstaDatuma>
    <DatumRacuna>2012-05-27T00:00:00.0Z</DatumRacuna>
    </DatumiRacuna>
    <DatumiRacuna>
    <VrstaDatuma>263</VrstaDatuma>
    <DatumRacuna>2012-03-28T00:00:00.0Z</DatumRacuna>
    </DatumiRacuna>
    <DatumiRacuna>
    <VrstaDatuma>263</VrstaDatuma>
    <DatumRacuna>2012-04-26T00:00:00.0Z</DatumRacuna>
    </DatumiRacuna>
    <DatumiRacuna>
    <VrstaDatuma>263</VrstaDatuma>
    <DatumRacuna>2012-04-27T00:00:00.0Z</DatumRacuna>
    </DatumiRacuna>
    <Lokacije>
    <VrstaLokacije>91</VrstaLokacije>
    <NazivLokacije>Ljubljana</NazivLokacije>
    </Lokacije>
    </Racun>
    </IzdaniRacunEnostavni>
    Edited by: 938026 on 01-Jun-2012 00:35

    Hi,
    I think your problem lies in the <title>. You are using non-UTF-8 characters in the title (š), but you marked your XML as UTF-8. So change the title to use Unicode characters and it will work.
    Herald ten Dam
    http://htendam.wordpress.com

  • How do we stop the XSL transform creating an empty tag when there is no inp

    Here is a way to stop the XSL transform from creating an empty tag when there is no input.
    1. Open the XSL Map in Jdev
    2. Go to the Design Tab
    3. Right click the tag in the target tree and select "Add XSL node -> xsl:if"
    4. Create a new link from the source tag (the same that is linked to the target tag) to the newly created xsl:if
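
In the stylesheet source, the JDeveloper steps above amount to wrapping the target element in an xsl:if that tests for the source element. Here is a small sketch of that pattern, run with the JDK's built-in JAXP processor. The element names order, note and target and the class ConditionalElement are invented for the example and are not taken from any real map.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

/** Hypothetical sketch: emit <note> in the target only when the source has one. */
final class ConditionalElement {

    static final String XSL =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
      + "<xsl:template match='/order'>"
      + "<target>"
      + "<xsl:if test='note'>"                       // true only if a note child exists
      + "<note><xsl:value-of select='note'/></note>"
      + "</xsl:if>"
      + "</target>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    static String transform(String xml) {
        try {
            StringWriter out = new StringWriter();
            TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(XSL)))
                .transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(transform("<order><note>hi</note></order>"));
        System.out.println(transform("<order/>"));   // no <note> element in this result
    }
}
```

Testing for the element (test='note') skips it only when it is missing. To also skip an element that is present but empty, a test such as normalize-space(note) could be used instead.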

    For anyone coming in to find this, I located my answer here:
    [Special Applet Attributes|http://java.sun.com/javase/6/docs/technotes/guides/plugin/developer_guide/special_attributes.html#codebase]
    Thanks for reading.
    Sorry for the interruption.

  • XSL Transformation crashes DW CS3

    Whenever I select XSL Transformation from any of the menus,
    it crashes DW CS3 asking the typical, "do you want to tell
    microsoft about this error."
    Anyone have this problem with using XSL Transformations? I'm
    thinking about reinstalling DW to see if it fixes the problem.
    Thanks

    Forgot to mention that this is a Windows XP SP2 machine.

  • Extra texts got spits out from XML XSL transformation?

    Hi:
    I was trying to output transformed text from an XSL with an XML file. It seems that it spits out all the values from the XML file in between all the element tags, but all I really need is just a small chunk of it. Does anyone know how to get rid of the extra stuff in the generated text?
    Thanks

    If your XSL doesn't specify what to do with a particular node, the default (the built-in template rules) is to keep walking down the tree and copy the text of every text node to the output. That's what is happening to you. The remedy is for your XSL to specify a template for the root node ("/") that produces the output you want. Or something like that; your details are rather sketchy.
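
To make the effect concrete, here is a small sketch using the JDK's built-in JAXP processor. The document (doc, keep, skip) and the class BuiltInRules are invented for the illustration, since the original post gave no details. The first stylesheet only has a template for the wanted element, so the built-in rules still copy the other text. The second has a template for "/" and emits only what it selects.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

/** Hypothetical sketch of the built-in template rules leaking text. */
final class BuiltInRules {

    static final String HEAD =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='text'/>";

    // Only 'keep' has a template; text under 'skip' falls through to the built-in rules.
    static final String LEAKY = HEAD
      + "<xsl:template match='keep'><xsl:value-of select='.'/></xsl:template>"
      + "</xsl:stylesheet>";

    // A root template takes control, so nothing is emitted unless it is selected.
    static final String ROOTED = HEAD
      + "<xsl:template match='/'><xsl:value-of select='doc/keep'/></xsl:template>"
      + "</xsl:stylesheet>";

    static String transform(String xsl, String xml) {
        try {
            StringWriter out = new StringWriter();
            TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xsl)))
                .transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<doc><keep>wanted</keep><skip>noise</skip></doc>";
        System.out.println(transform(LEAKY, xml));    // wantednoise
        System.out.println(transform(ROOTED, xml));   // wanted
    }
}
```

Another option with the same effect is to keep the first stylesheet and add an empty <xsl:template match="text()"/>, which overrides the built-in rule for text nodes.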

  • XSL Fragment into HTML via Client-Side Transform

    I am designing a site for a school. I searched and found the
    post here from July 25, and I have also read the Dreamweaver
    help file till I'm blue in the face. They talk all around the
    answer but never definitively say if it's possible to do this.
    Dreamweaver help mentions:
    -- Workflow for performing client-side XSL transformations
    Do one of the following:
    In your Dreamweaver site, create an entire XSLT page. See
    Creating entire XSLT pages.
    Convert an existing HTML page to an entire XSLT page. See
    Converting HTML pages to XSLT pages.
    All the online tutorials show server-side transforms but I'm
    not skilled in that...nor do I know if the hosting entity will
    provide that level of access to their .NET server.
    ---- ok. that's the background of the situation. Now to my
    problem. ---
    We plan to have two mutually exclusive areas on the home
    page, such as news & events, that will be updated by a single
    school employee. The plan is to create two XML text files that one
    teacher can update.
    The XMLfiles will be manually uploaded to the web site and
    the home page will read that data into properly formatted
    information on the home page. I would greatly prefer to keep the
    entire process as a client-side procedure.
    I have created and linked XSL fragments to the XML data.
    If I try to copy and paste code from the XSL fragment into
    the index HTML page, I get nothing.
    Success comes only after converting the home page into an
    XSLT 1.0 file using Dreamweaver and copying and pasting the code
    from the XSL file into the newly created XSLT file.
    Hence my questions:
    1 Can I bring these XSL fragments into an HTML home page or
    do I have to convert it to XSLT?
    2. If I must convert the HTML file to an XSLT file, can
    people still type the website address in as www dot site dot com
    and the XSLT file will open without anyone knowing the difference?
    3. Can I even do this with a client-side transform?
    4. Is it possible for one page to reference two separate XSL
    fragments pointing to the two separate respective XML files?
    Thank you very much for your help.

    Hi Eric,
    these are the cache control headers of the request that serves the XSLT:
    GET http://www.carsten-leue.de/test/iframe_xslt/xslt.php HTTP/1.1
    Accept: */*
    Referer: http://www.carsten-leue.de/test/iframe_xslt/xslt.php
    Accept-Encoding: gzip, deflate
    User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; Trident/7.0; rv:11.0) like Gecko
    Host: www.carsten-leue.de
    DNT: 1
    Connection: Keep-Alive
    There does not seem to be a header involved that prevents caching.
    You mention the "legacy ActiveX" control. In which sense is this control involved in the usecase? In my scenario I am pointing the browser to the XML document that has an associated stylesheet and the browser automatically executes the transform.
    I am not explicitly triggering the transform via some script in the page.
    Does the ActiveX control still play a role in this scenario?
    Carsten

  • Xmltype.transform and xsl:output method="html"

    hi, 9.2.0.4 winxp,
    i wonder whether xmltype.transform regards any output instructions in the stylesheet. i requested any of xml, html and text and always got the same result?
    any ideas or hints to more info?
    regards peter

    Sorry for jumping in on this thread, but I have a question regarding your reply. I have an XSL stylesheet that performs XML to HTML conversion. Everything works correctly with the exception of those HTML tags that are not well-formed. Using your example, if I have something like:
    <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output method="html"/>
    <input type="text" name="{NAME}" size="{DISPLAY_LENGTH}" maxlength="{LENGTH}"></input>
    </xsl:stylesheet>
    It would render HTML in the format of
    <HTML>
    <input type="text" name="in1" size="10" maxlength="20"/>
    </HTML>
    While IE can handle this, Netscape cannot. Is there any way to generate completely cross-browser-compliant HTML with XSL?
    Thanks!
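
For comparison, this is how a processor that honours xsl:output serializes an empty <input>. The sketch uses the JDK's built-in JAXP processor, not Oracle's xmltype.transform, so it says nothing about how the database behaves. The source element names field, NAME and LENGTH and the class OutputMethodDemo are assumptions made for the example. The literal element is also placed inside a template, because a literal result element cannot sit directly under xsl:stylesheet.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

/** Hypothetical sketch: same template, serialized with method="html" and method="xml". */
final class OutputMethodDemo {

    static String stylesheet(String method) {
        return "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
             + "<xsl:output method='" + method + "' omit-xml-declaration='yes'/>"
             + "<xsl:template match='/field'>"
             + "<input type='text' name='{NAME}' maxlength='{LENGTH}'></input>"
             + "</xsl:template>"
             + "</xsl:stylesheet>";
    }

    static String transform(String method, String xml) {
        try {
            StringWriter out = new StringWriter();
            TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(stylesheet(method))))
                .transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<field><NAME>in1</NAME><LENGTH>20</LENGTH></field>";
        System.out.println(transform("html", xml));  // HTML style: start tag only, no "/>"
        System.out.println(transform("xml", xml));   // XML style: self-closed with "/>"
    }
}
```

If the database call really does ignore xsl:output, as the first post in this thread suggests, the serialization step would have to happen elsewhere. That part is outside what this sketch shows.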

  • How Do I Display HTML Formatted Text From A Data Table In Crystal Reports?

    I'm creating reports in Crystal XI.  The information being displayed in the reports comes from data tables where the text is formatted in HTML.
    I've worked with Crystal Reports enough to know that HTML text pulled from a data table doesn't appear in Crystal the same way it does in a web browser.  Crystal Reports ignores all the tags (...unless I'm missing something...) and just displays the text.
    Someone far more Crystal savvy than I (...who I don't have access to...) came up with a Formula Field workaround that tricks Crystal Reports into displaying some basic HTML tags.  Here's that workaround:
    <!--
    stringVar TableName := ;
    TableName := Replace (TableName, "<ul>","<br> <br>");
    TableName := Replace (TableName, "<li>", "<br>   &bull; ");
    TableName := Replace (TableName, "</li>", "");
    TableName := Replace (TableName, "</ul>","<br> <br>");
    TableName := Replace (TableName, "<a", "<u><font color='blue'");
    TableName := Replace (TableName, "</a>", "</font></u>");
    TableName
    -->
    QUESTION - Does any similar workaround exist so I can display an HTML Table in Crystal Reports?  If not, is there any way to display HTML formatted text from a data table in Crystal Reports as it would appear in a web browser?

    Hi Steven,
    To display HTML text in Crystal Reports, follow these steps.
    1. Right click on the field and select Paragraph tab.
    2. Under 'Text Interpretation' select 'HTML Text' and click OK.
    I have tried this way, but it never works. So please reply if there is any way to solve the issue.

  • Problem to extract text from HTML document

    I have to extract some text from HTML files into my database (about 1000 files).
    The HTML files come from the ACM Digital Library: http://portal.acm.org/dl.cfm
    Each HTML page holds the information about a paper. I only want to get the text of "Title", "Abstract", "Classification" and "Keywords".
    The problem is that I can't find any pattern for parsing the HTML files.
    E.g. I need to get Classification = "Theory of Computation", "ANALYSIS OF ALGORITHMS AND PROBLEM COMPLEXITY", "Numerical Algorithms and Problems", "Mathematics of Computing", "NUMERICAL ANALYSIS", etc.
    The section of code for "Classification" is below.
    Please give me any idea how to do this, or how to find a pattern to extract the text from this.
    <div class="indterms"><a href="#CIT"><img name="top" src=
    "img/arrowu.gif" hspace="10" border="0" /></a><span class=
    "heading"><a name="IndexTerms">INDEX TERMS</a></span>
    <p class="Categories"><span class="heading"><a name=
    "GenTerms">Primary Classification:</a></span><br />
    &nbsp; <b>F.</b> <a href=
    "results.cfm?query=CCS%3AF%2E%2A&coll=ACM&dl=ACM&CFID=22820732&CFTOKEN=38147335"
    target="_self">Theory of Computation</a><br />
    &nbsp; <img src="img/tree.gif" border="0" height="20" width=
    "20" /> <b>F.2</b> <a href=
    "results.cfm?query=CCS%3A%22F%2E2%22&coll=ACM&dl=ACM&CFID=22820732&CFTOKEN=38147335"
    target="_self">ANALYSIS OF ALGORITHMS AND PROBLEM
    COMPLEXITY</a><br />
    &nbsp; &nbsp; &nbsp; <img src="img/tree.gif" border="0" height=
    "20" width="20" /> <b>F.2.1</b> <a href=
    "results.cfm?query=CCS%3A%22F%2E2%2E1%22&coll=ACM&dl=ACM&CFID=22820732&CFTOKEN=38147335"
    target="_self">Numerical Algorithms and Problems</a><br />
    </p>
    <p class="Categories"><span class="heading"><a name=
    "GenTerms">Additional&nbsp;Classification:</a></span><br />
    &nbsp; <b>G.</b> <a href=
    "results.cfm?query=CCS%3AG%2E%2A&coll=ACM&dl=ACM&CFID=22820732&CFTOKEN=38147335"
    target="_self">Mathematics of Computing</a><br />
    &nbsp; <img src="img/tree.gif" border="0" height="20" width=
    "20" /> <b>G.1</b> <a href=
    "results.cfm?query=CCS%3A%22G%2E1%22&coll=ACM&dl=ACM&CFID=22820732&CFTOKEN=38147335"
    target="_self">NUMERICAL ANALYSIS</a><br />
    &nbsp; &nbsp; &nbsp; <img src="img/tree.gif" border="0" height=
    "20" width="20" /> <b>G.1.6</b> <a href=
    "results.cfm?query=CCS%3A%22G%2E1%2E6%22&coll=ACM&dl=ACM&CFID=22820732&CFTOKEN=38147335"
    target="_self">Optimization</a><br />
    &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <img src="img/tree.gif" border=
    "0" height="20" width="20" /> <b>Subjects:</b> <a href=
    "results.cfm?query=CCS%3A%22Linear%20programming%22&coll=ACM&dl=ACM&CFID=22820732&CFTOKEN=38147335"
    target="_self">Linear programming</a><br />
    </p>
    <br />
    <p class="GenTerms"><span class="heading"><a name=
    "GenTerms">General Terms:</a></span><br />
    <a href=
    "results.cfm?query=genterm%3A%22Algorithms%22&coll=ACM&dl=ACM&CFID=22820732&CFTOKEN=38147335"
    target="_self">Algorithms</a>, <a href=
    "results.cfm?query=genterm%3A%22Theory%22&coll=ACM&dl=ACM&CFID=22820732&CFTOKEN=38147335"
    target="_self">Theory</a></p>
    <br />
    <p class="keywords"><span class="heading"><a name=
    "Keywords">Keywords:</a></span><br />
    <a href=
    "results.cfm?query=keyword%3A%22Simplex%20method%22&coll=ACM&dl=ACM&CFID=22820732&CFTOKEN=38147335"
    target="_self">Simplex method</a>, <a href=
    "results.cfm?query=keyword%3A%22complexity%22&coll=ACM&dl=ACM&CFID=22820732&CFTOKEN=38147335"
    target="_self">complexity</a>, <a href=
    "results.cfm?query=keyword%3A%22perturbation%22&coll=ACM&dl=ACM&CFID=22820732&CFTOKEN=38147335"
    target="_self">perturbation</a>, <a href=
    "results.cfm?query=keyword%3A%22smoothed%20analysis%22&coll=ACM&dl=ACM&CFID=22820732&CFTOKEN=38147335"
    target="_self">smoothed analysis</a></p>
    </div>

    One approach is to download HTML Parser from SourceForge,
    http://htmlparser.sourceforge.net/, and write rules to match the title, abstract, etc.
    Another approach is to write your own parser that extracts only the title, abstract, etc.:
    1. Tokenize the HTML file, i.e. convert the HTML into tokens (tags and values).
    2. Write a simple parser to extract the information you need.
    Find the pattern of the text you want to extract, for instance <p class="abstract">.
    Then write a rule for extracting the abstract, such as
    if (tag is abstract) then extract abstract text
    and apply the same concept to the other tags.
    Attached is a sample parser that was used to extract the title and abstract from ACM HTML files. Please modify it to include keywords and other fields.
    Good luck
    import java.io.BufferedReader;
    import java.io.FileReader;
    import java.io.IOException;
    import java.io.InputStream;
    import java.io.InputStreamReader;
    import java.util.ArrayList;
    import java.util.List;

    public class ACMHTMLParser {
         private String m_filename;
         private URLLexicalAnalyzer lexical;
         List<String> urls = new ArrayList<String>();

         public ACMHTMLParser(String filename) {
              super();
              m_filename = filename;
         }

         /**
          * parses only title and abstract
          */
         public void parse() throws Exception {
              lexical = new URLLexicalAnalyzer(m_filename);
              String word = lexical.getNextWord();
              boolean isabstract = false;
              while (null != word) {
                   if (isTag(word)) {
                        if (isTitle(word)) {
                             System.out.println("TITLE: " + lexical.getNextWord());
                        } else if (isAbstract(word) && !isabstract) {
                             parseAbstract();
                             isabstract = true;
                        }
                   }
                   word = lexical.getNextWord();
              }
              lexical.close();
         }

         public static void main(String[] args) throws Exception {
              ACMHTMLParser parser = new ACMHTMLParser("./acm_html.html");
              parser.parse();
         }

         public static boolean isTag(String word) {
              return ( word.startsWith("<") && word.endsWith(">"));
         }

         public static boolean isTitle(String word) {
              return ( "<title>".equals(word));
         }

         //please modify according to the html source
         public static boolean isAbstract(String word) {
              return ( "<p class=\"abstract\">".equals(word));
         }

         // prints the first non-tag token that follows the abstract tag
         private void parseAbstract() throws Exception {
              while (true) {
                   String abs = lexical.getNextWord();
                   if (abs == null) break;   // end of file: nothing left to print
                   if (!isTag(abs)) {
                        System.out.println(abs);
                        break;
                   }
              }
         }

         // splits the file into tokens: either a complete tag or the text between tags
         class URLLexicalAnalyzer {
           private BufferedReader m_reader;
           private boolean isTag;   // true when the previous token ended at a '<'

           public URLLexicalAnalyzer(String filename) {
              try {
                m_reader = new BufferedReader(new FileReader(filename));
              } catch (IOException io) {
                System.out.println("ERROR, file not found " + filename);
                System.exit(1);
              }
           }

           public URLLexicalAnalyzer(InputStream in) {
              m_reader = new BufferedReader(new InputStreamReader(in));
           }

           public void close() {
              try {
                if (null != m_reader) m_reader.close();
              } catch (IOException ignored) {}
           }

           public String getNextWord() throws IOException {
              int c = m_reader.read();
              if (-1 == c) return null;
              if (Character.isWhitespace((char)c))
                return getNextWord();
              if ('<' == c || isTag)
                return scanTag(c);
              else
                   return scanValue(c);
           }

           private String scanTag(final int c) throws IOException {
              StringBuffer result = new StringBuffer();
              if ('<' != c) result.append('<');   // '<' was already consumed by scanValue
              result.append((char)c);
              int ch = -1;
              while (true) {
                ch = m_reader.read();
                if (-1 == ch) throw new IllegalArgumentException("un-terminate tag");
                if ('>' == ch) {
                     isTag = false;
                     break;
                }
                result.append((char)ch);
              }
              result.append((char)ch);
              return result.toString();
           }

           private String scanValue(final int c) throws IOException {
                StringBuffer result = new StringBuffer();
                result.append((char)c);
                int ch = -1;
                while (true) {
                   ch = m_reader.read();
                   if (-1 == ch) throw new IllegalArgumentException("un-terminate value");
                   if ('<' == ch) {
                        isTag = true;
                        break;
                   }
                   result.append((char)ch);
                }
                return result.toString();
           }
         }
    }

  • Read Text from HTML-Pages and want to solve "ChangedCharSetException"

    Hello,
    I have an app that connects to pages via threads, parses them and gives me only the text version of each HTML page. It works fine, but if it finds a page where the text is within images, the whole app stops and gives me this message:
    javax.swing.text.ChangedCharSetException
            at javax.swing.text.html.parser.DocumentParser.handleEmptyTag(DocumentParser.java:169)
            at javax.swing.text.html.parser.Parser.startTag(Parser.java:372)
            at javax.swing.text.html.parser.Parser.parseTag(Parser.java:1846)
            at javax.swing.text.html.parser.Parser.parseContent(Parser.java:1881)
            at javax.swing.text.html.parser.Parser.parse(Parser.java:2047)
            at javax.swing.text.html.parser.DocumentParser.parse(DocumentParser.java:106)
            at javax.swing.text.html.parser.ParserDelegator.parse(ParserDelegator.java:78)
            at aufruf.main(aufruf.java:33)
    So I tried to catch it with "getCharSetSpec()" and "keyEqualsCharSet()" from the class "javax.swing.text.ChangedCharSetException" and hoped that this would solve the problem. But it still doesn't work...
    Then I looked on the web and found that I have to add the line:
    doc.putProperty("IgnoreCharsetDirective", new Boolean(true));
    "doc" is a new HTML document, created with the HTMLEditorKit. I do not have much knowledge about that, so I hope that someone can explain to me how I can solve that problem within my code.
    Here we go:
    import javax.swing.text.*;
    import java.lang.*;
    import java.util.*;
    import java.net.*;
    import java.io.*;
    import javax.swing.text.html.*;
    import javax.swing.text.html.parser.*;

    public class myParser extends Thread {
            private String name;

            public void run() {
                    try {
                            URL viele = new URL(name);   // "name" is a variable with a lot of links
                            URLConnection hs = viele.openConnection();
                            hs.connect();
                            if (hs.getContentType().startsWith("text/html")) {
                                    InputStream is = hs.getInputStream();
                                    InputStreamReader isr = new InputStreamReader(is);
                                    BufferedReader br = new BufferedReader(isr);
                                    Lesen los = new Lesen();
                                    ParserDelegator parser = new ParserDelegator();
                                    parser.parse(br,los, false);
                            }
                    } catch (MalformedURLException e) {
                            System.err.print("Doesn't work");
                    } catch (ChangedCharSetException e) {
                            e.getCharSetSpec();
                            e.keyEqualsCharSet();
                            e.printStackTrace();
                    } catch (Exception o) {
                    }
            }

            public void vowi(String n) {
                    name = n;
            }
    }
    And in case it is important, here is the class "Lesen":
    import java.net.*;
    import java.io.*;
    import javax.swing.text.*;
    import javax.swing.text.html.*;
    import javax.swing.text.html.parser.*;
    class Lesen extends HTMLEditorKit.ParserCallback
    {
            public void handleStartTag(HTML.Tag t, MutableAttributeSet a, int pos)
            {
                    try
                    {
                            if ((t==HTML.Tag.P) || (t==HTML.Tag.H1) || (t==HTML.Tag.H2) || (t==HTML.Tag.H3) || (t==HTML.Tag.H4) || (t==HTML.Tag.H5) || (t==HTML.Tag.H6))
                            {
                                    System.out.println();
                            }
                    }
                    catch (Exception q)
                    {
                            System.out.println(q.getMessage());
                    }
            }
            public void handleSimpleTag(HTML.Tag t,MutableAttributeSet a, int pos)
            {
                    try
                    {
                            if (t==HTML.Tag.BR)
                            {
                                    System.out.println(); // new line
                                    System.out.println();
                            }
                    }
                    catch (Exception qw)
                    {
                            System.out.println(qw.getMessage());
                    }
            }
            public void handleText(char[] data, int pos)
            {
                    try
                    {
                            System.out.print(data);   // prints the text from HTML-pages
                    }
                    catch (Exception ab)
                    {
                            System.out.println(ab.getMessage());
                    }
            }
    }
    Thanks a lot for helping...
    Stephan

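    Change
    parser.parse(br,los, false);
    to
    parser.parse(br,los, true);
    The third argument is ignoreCharSet. Passing true stops the parser from throwing ChangedCharSetException when a page declares its charset in a <meta> tag.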
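
The third argument of ParserDelegator.parse is the ignoreCharSet flag. A minimal sketch of trying it in isolation, assuming the page is already available as a string (the class name IgnoreCharSetDemo, the helper extractText and the sample HTML are made up for illustration and are not part of the original post):

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

class IgnoreCharSetDemo {
    // Hypothetical helper: collects the text content of an HTML string,
    // one text chunk per line.
    static String extractText(String html) {
        StringBuilder out = new StringBuilder();
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleText(char[] data, int pos) {
                out.append(data).append('\n');
            }
        };
        try (Reader reader = new StringReader(html)) {
            // true = ignoreCharSet: a <meta> charset declaration is not
            // reported as a ChangedCharSetException.
            new ParserDelegator().parse(reader, callback, true);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Sample page (an assumption) that declares a charset in a meta tag.
        String html = "<html><head>"
                + "<meta http-equiv=\"Content-Type\" content=\"text/html; charset=iso-8859-1\">"
                + "</head><body><p>Hello</p></body></html>";
        System.out.print(extractText(html));
    }
}
```

With ignoreCharSet set to true the bytes are still decoded with whatever charset the Reader was created with, so if the encoding matters it may be worth constructing the InputStreamReader with an explicit charset.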

  • Creating an XML from a Deep Structure using XSL Transformation

    Hi ABAPers,
    I have a requirement to use XSL transformations on an ABAP deep structure.
    Currently I have an API that fills this deep structure, and by using CALL TRANSFORMATION ID... I get a big XML with hundreds of nodes. But from the deep structure I actually need only some nodes (say 50), so I tried writing an XSLT
    in transaction STRANS. When I use the transformation I wrote, I get an error message like INVALID XML.
    Am I on the right track, or is there a better solution?
    My sample transformation is below:
    <xsl:transform version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:strip-space elements="*"/>
    <xsl:template match="/">
    <xsl:value-of select="DATA/NODE_ELEMENTS/UUID_KEY/UUID"/>
    <xsl:value-of select="DATA/NODE_ELEMENTS/SEMANTICAL_NAME"/>
    <xsl:value-of select="DATA/NODE_ELEMENTS/STRUCT_CAT"/>
    <xsl:value-of select="DATA/NODE_ELEMENTS/USAGE_CAT"/>
    <xsl:value-of select="DATA/NODE_ELEMENTS/RESTRICTED_IND"/>
    <xsl:value-of select="VALUES/DATA/NODE_ID"/>.
    </xsl:template>
    </xsl:transform>
    Please help me in solving this issue....
    Thanks,
    Linda.

    Hi Linda,
    I am replying based on your sample code.
    Try the following suggestions.
    Here 'GrpHdr' is the node I am selecting the data from, and
    IGRPHDR is the name of the reference.
    First, call the transformation in your program:
    TYPES: BEGIN OF tl_hdr,
             msgid(20) TYPE c,
           END OF tl_hdr.
    DATA: t_hdr TYPE STANDARD TABLE OF tl_hdr.
    GET REFERENCE OF t_hdr INTO l_result_xml-value.
    l_result_xml-name = 'IGRPHDR'.
    APPEND l_result_xml TO t_result_xml.
    TRY.
        CALL TRANSFORMATION yfi_xml_read
          SOURCE XML it_xml_data
          RESULT (t_result_xml).
      CATCH cx_root INTO l_rif_ex.
        l_var_text = l_rif_ex->get_text( ).
        l_bapiret-type = 'E'.
        l_bapiret-message = l_var_text.
        APPEND l_bapiret TO errormsgs.
        EXIT.
    ENDTRY.
    In the XSL transformation:
    First write a block that specifies which node you are taking the data from.
    It does not matter whether it is a node or a sub-node.
    <xsl:transform xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
      <xsl:output encoding="iso-8859-1" indent="yes" method="xml" version="1.0"/>
      <xsl:strip-space elements="*"/>
      <xsl:template match="/">
        <asx:abap xmlns:asx="http://www.sap.com/abapxml" version="1.0">
          <asx:values>
            <IGRPHDR>  <!-- reference name of the internal table -->
              <xsl:apply-templates select="//GrpHdr"/>
            </IGRPHDR>
          </asx:values>
        </asx:abap>
      </xsl:template>
    Next, select the data from the elements under the node specified in the transformation.
    Here msgid is the field whose value I am selecting.
      <xsl:template match="GrpHdr">
        <item>
          <MSGID>  <!-- field in the internal table t_hdr that receives the data -->
            <xsl:value-of select="MsgId"/>
          </MSGID>
        </item>
      </xsl:template>
    </xsl:transform>
    Reply back if further clarification is needed.
    Thanks and regards,
    Kannan N
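
The stylesheet in the reply above can also be tried outside the SAP system to see the XML it produces before handing it to CALL TRANSFORMATION. A rough sketch using the XSLT 1.0 processor built into the JDK, under stated assumptions: the class name GrpHdrTransformDemo, the helper transform and the sample input document are invented for illustration, and the stylesheet is a condensed copy of the one in the reply without the encoding and indent settings.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class GrpHdrTransformDemo {
    // Condensed copy of the stylesheet from the reply: wraps every GrpHdr
    // element as an <item> inside the asXML envelope under <IGRPHDR>.
    static final String XSLT =
          "<xsl:transform xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\" version=\"1.0\">"
        + "<xsl:output method=\"xml\" omit-xml-declaration=\"yes\"/>"
        + "<xsl:strip-space elements=\"*\"/>"
        + "<xsl:template match=\"/\">"
        + "<asx:abap xmlns:asx=\"http://www.sap.com/abapxml\" version=\"1.0\">"
        + "<asx:values><IGRPHDR>"
        + "<xsl:apply-templates select=\"//GrpHdr\"/>"
        + "</IGRPHDR></asx:values>"
        + "</asx:abap>"
        + "</xsl:template>"
        + "<xsl:template match=\"GrpHdr\">"
        + "<item><MSGID><xsl:value-of select=\"MsgId\"/></MSGID></item>"
        + "</xsl:template>"
        + "</xsl:transform>";

    // Hypothetical helper: applies the stylesheet to an XML string.
    static String transform(String sourceXml) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(XSLT)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(sourceXml)),
                                  new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Sample input (an assumption): one GrpHdr node with a MsgId child.
        String sample = "<Document><GrpHdr><MsgId>ABC123</MsgId></GrpHdr></Document>";
        System.out.println(transform(sample));
    }
}
```

The result should be a single well-formed document with one root element (asx:abap), which is the shape the RESULT binding in the ABAP snippet expects. A template that only emits xsl:value-of text has no root element in its output.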

Maybe you are looking for