Database to text/ascii/html conversion

I need to convert an Oracle database to a text file. Preserving the exact format of the database is desirable but not critical; what matters is getting its entire contents (headers aside) into text form. Is there a filter or migration utility that will do this easily? Conversion to HTML or XML would also be acceptable. Thank you in advance.

This is not the right forum, and the question doesn't really make sense as asked: there is no simple mapping between a relational database and a flat text file.
You can generate XML from an individual table of the database; see the XML Developer's Kit (XDK) for more details.
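
For a per-table text dump, one common approach is to write each row as a delimited line. The sketch below is only an illustration and not an Oracle utility: the class name `TableToText` and the sample rows are made up, and in practice the rows would come from a query (for example a JDBC `ResultSet`) rather than a hard-coded list. It shows just the quoting step, which is where a naive dump tends to go wrong.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper: turns one table row into one comma-separated text line.
public class TableToText {

    // Quote a field only when it contains a comma, quote or line break;
    // embedded quotes are doubled. NULL columns become empty fields.
    static String escapeField(String value) {
        if (value == null) {
            return "";
        }
        boolean needsQuotes = value.contains(",") || value.contains("\"")
                || value.contains("\n") || value.contains("\r");
        String escaped = value.replace("\"", "\"\"");
        return needsQuotes ? "\"" + escaped + "\"" : escaped;
    }

    static String toCsvLine(List<String> row) {
        return row.stream()
                .map(TableToText::escapeField)
                .collect(Collectors.joining(","));
    }

    public static void main(String[] args) {
        // Sample data standing in for rows fetched from one table.
        List<List<String>> rows = Arrays.asList(
                Arrays.asList("1", "Smith, John", "said \"hi\""),
                Arrays.asList("2", null, "plain"));
        for (List<String> row : rows) {
            System.out.println(toCsvLine(row));
        }
    }
}
```

Running one such dump per table gives a set of text files; relationships between tables are not captured, which is the "no simple mapping" caveat above.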

Similar Messages

  • Transport Agent Text To HTML Conversion Problem

    I have been building a transport agent that works fine except when I have to convert a plain-text email to HTML. I have been looking for samples showing how to use the TextConverters and TextToHtml, but I'm not sure what they are really supposed to do. If
    I use one to convert the body, it converts the plain text to HTML as in the example below, but it never changes the actual body type to HTML, so the result is still a plain-text email whose body contains HTML markup. Therefore, when the message is read, it doesn't display
    properly. Are the converters supposed to change the mail body type as well? Can you change the mail body type?
    <html><head><meta http-equiv="Content-Type" content="text/html; charset=us-ascii">
    <meta name="Generator" content="Microsoft Exchange Server">
    <!-- converted from text -->
    <style><!-- .EmailQuote { margin- padding- border- } --></style></head>
    <body>
    <font size="2"><span style="font-size:10pt;"><div class="PlainText">Hello<br>
    </div></span></font>
    </body>
    </html>

    Hello, did you find an answer?

  • Smartform to HTML conversion

    Hi,
    I need to convert Smartform data stream into HTML format and
    pass the same to Webdynpro application where it will be displayed on the browser.
    I have specified Smartform output format as 'XSF output+HTML'.
    Use of BSP application is ruled out due to certain limitations.
    The FM 'CONVERT_OTF' returns data in ASCII or PDF format only.
    Can anyone suggest a function module that converts
    Smartform data to HTML format, or any other way to do this?
    Thanks.

    Check out this thread:
    Smartform to HTML conversion
    Thanks,
    Jaideep
    *Reward points if useful

  • How to include text as HTML elements (see DOMElement)

    I am working with Flash Pro CC v. 14.0 to convert my Flash website to HTML5/JavaScript.
    I have converted a file to the HTML5 Canvas,
    and I am very happy that the new Flash Pro can do this.
    HOWEVER:
    In my original .FLA project I use only one font, Copperplate Bold, in several sizes within the project/scene.
    In the original file, all text is static text with Letter spacing, AntiAlias, AutoKern and single line (LineType) set,
    none of which the HTML5 Canvas seems to allow or support.
    How do I maintain the font look that I have chosen in my original Flash project after I convert to HTML5 Canvas?
    Is there a way in the HTML Canvas to maintain the font look that I want?
    HTML5 Canvas will not allow font embedding.
    The device font destroys the look of my Copperplate Bold font.
    How do I include text as HTML elements (see DOMElement)?
    WARNINGS generated when I convert the original file into an HTML Canvas:
    Warnings generated while copying/importing in 140827a HTML test.fla:
    * AntiAlias is not supported in HTML5 Canvas document, and has been converted to DeviceFonts in an instance of Text.
    * AutoKern is not supported in HTML5 Canvas document, and has been removed in an instance of Text.
    * Frame Scripts have been commented
    * LetterSpacing is not supported in HTML5 Canvas document, and has been converted to 0.0 in an instance of Text.
    * LineType is not supported in HTML5 Canvas document, and has been converted to MultiLineNoWrap in an instance of Text.
    * Some artwork contains Hairline stroke, which is not supported in HTML5 Canvas document, and has been converted to Solid.
    * StaticText is not supported in HTML5 Canvas document, and has been converted to DynamicText in an instance of Text.
    New HTML Canvas Document created.
    NOTE:  So far the only way I have been able to maintain the font look is to convert the fonts to .png files
    This is painstaking work that I would like to avoid.
    Even then I still get a WARNING when I test my scene - (no doubt because I left the original FONT text  in guide layers)
    After conversion ON TEST SCENE:
    WARNINGS:
    Frame numbers in EaselJS start at 0 instead of 1. For example, this affects gotoAndStop and gotoAndPlay calls. (18)
    Only circular (not oval) radial gradients are supported. (85)
    Text support is limited. It is generally recommended to include text as HTML elements (see DOMElement). (6)
    Color effects are published as a filter and subject to the same limitations. (4)
    Filters are very expensive and are not updated once applied. Cache as bitmap is automatically enabled when a filter is applied. This can prevent animations from updating. (2)
    Content with both Bitmaps and Buttons may generate local security errors in some browsers if run from the local file system.
    HOW CAN I MAINTAIN the FONT LOOK that I have chosen for my project?
    How do I include text as HTML elements (see DOMElements)?
    ANY HELP will be appreciated
    A good, in depth, tutorial on the subject (FONTS) would be a BIG help to many using the convert to HTML5 canvas features.

    Google has a web font service:
    https://www.google.com/fonts
    Choose a font from that site;
    Google then generates instructions on how to embed it. For example, for
    Montserrat:
    3. Add this code to your website:
    <link href='http://fonts.googleapis.com/css?family=Montserrat:400,700' rel='stylesheet' type='text/css'>
    4. Integrate the fonts into your CSS:
    The Google Fonts API will generate the necessary browser-specific CSS to use the fonts. All you need to do is add the font name to your CSS styles. For example:
    font-family: 'Source Sans Pro', sans-serif;
    font-family: 'Ubuntu', sans-serif;
    font-family: 'Montserrat Alternates', sans-serif;
    font-family: 'Montserrat', sans-serif;
    font-family: 'Open Sans', sans-serif;

  • Problem to extract text from HTML document

    I have to extract some text from HTML files into my database (about 1000 files).
    The HTML files come from the ACM Digital Library: http://portal.acm.org/dl.cfm
    Each HTML page holds the information about one paper. I only want to get the text of "Title", "Abstract", "Classification" and "Keywords".
    The problem is that I can't find any pattern to use when parsing the HTML files.
    Example: I need to get Classification = "Theory of Computation", "ANALYSIS OF ALGORITHMS AND PROBLEM COMPLEXITY", "Numerical Algorithms and Problems", "Mathematics of Computing", "NUMERICAL ANALYSIS", etc.
    The section of the page source for "Classification" is below.
    Please give me any ideas on how to do this, or on how to find a pattern for extracting this text.
    <div class="indterms"><a href="#CIT"><img name="top" src=
    "img/arrowu.gif" hspace="10" border="0" /></a><span class=
    "heading"><a name="IndexTerms">INDEX TERMS</a></span>
    <p class="Categories"><span class="heading"><a name=
    "GenTerms">Primary Classification:</a></span><br />
    &nbsp; <b>F.</b> <a href=
    "results.cfm?query=CCS%3AF%2E%2A&coll=ACM&dl=ACM&CFID=22820732&CFTOKEN=38147335"
    target="_self">Theory of Computation</a><br />
    &nbsp; <img src="img/tree.gif" border="0" height="20" width=
    "20" /> <b>F.2</b> <a href=
    "results.cfm?query=CCS%3A%22F%2E2%22&coll=ACM&dl=ACM&CFID=22820732&CFTOKEN=38147335"
    target="_self">ANALYSIS OF ALGORITHMS AND PROBLEM
    COMPLEXITY</a><br />
    &nbsp; &nbsp; &nbsp; <img src="img/tree.gif" border="0" height=
    "20" width="20" /> <b>F.2.1</b> <a href=
    "results.cfm?query=CCS%3A%22F%2E2%2E1%22&coll=ACM&dl=ACM&CFID=22820732&CFTOKEN=38147335"
    target="_self">Numerical Algorithms and Problems</a><br />
    </p>
    <p class="Categories"><span class="heading"><a name=
    "GenTerms">Additional&nbsp;Classification:</a></span><br />
    &nbsp; <b>G.</b> <a href=
    "results.cfm?query=CCS%3AG%2E%2A&coll=ACM&dl=ACM&CFID=22820732&CFTOKEN=38147335"
    target="_self">Mathematics of Computing</a><br />
    &nbsp; <img src="img/tree.gif" border="0" height="20" width=
    "20" /> <b>G.1</b> <a href=
    "results.cfm?query=CCS%3A%22G%2E1%22&coll=ACM&dl=ACM&CFID=22820732&CFTOKEN=38147335"
    target="_self">NUMERICAL ANALYSIS</a><br />
    &nbsp; &nbsp; &nbsp; <img src="img/tree.gif" border="0" height=
    "20" width="20" /> <b>G.1.6</b> <a href=
    "results.cfm?query=CCS%3A%22G%2E1%2E6%22&coll=ACM&dl=ACM&CFID=22820732&CFTOKEN=38147335"
    target="_self">Optimization</a><br />
    &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <img src="img/tree.gif" border=
    "0" height="20" width="20" /> <b>Subjects:</b> <a href=
    "results.cfm?query=CCS%3A%22Linear%20programming%22&coll=ACM&dl=ACM&CFID=22820732&CFTOKEN=38147335"
    target="_self">Linear programming</a><br />
    </p>
    <br />
    <p class="GenTerms"><span class="heading"><a name=
    "GenTerms">General Terms:</a></span><br />
    <a href=
    "results.cfm?query=genterm%3A%22Algorithms%22&coll=ACM&dl=ACM&CFID=22820732&CFTOKEN=38147335"
    target="_self">Algorithms</a>, <a href=
    "results.cfm?query=genterm%3A%22Theory%22&coll=ACM&dl=ACM&CFID=22820732&CFTOKEN=38147335"
    target="_self">Theory</a></p>
    <br />
    <p class="keywords"><span class="heading"><a name=
    "Keywords">Keywords:</a></span><br />
    <a href=
    "results.cfm?query=keyword%3A%22Simplex%20method%22&coll=ACM&dl=ACM&CFID=22820732&CFTOKEN=38147335"
    target="_self">Simplex method</a>, <a href=
    "results.cfm?query=keyword%3A%22complexity%22&coll=ACM&dl=ACM&CFID=22820732&CFTOKEN=38147335"
    target="_self">complexity</a>, <a href=
    "results.cfm?query=keyword%3A%22perturbation%22&coll=ACM&dl=ACM&CFID=22820732&CFTOKEN=38147335"
    target="_self">perturbation</a>, <a href=
    "results.cfm?query=keyword%3A%22smoothed%20analysis%22&coll=ACM&dl=ACM&CFID=22820732&CFTOKEN=38147335"
    target="_self">smoothed analysis</a></p>
    </div>

    One approach is to download HTML Parser from SourceForge
    (http://htmlparser.sourceforge.net/) and write rules to match the title, abstract, etc.
    Another approach is to write your own parser that extracts only the title, abstract, etc.:
    1. Tokenize the HTML file, i.e. convert the HTML into tokens (tags and values).
    2. Write a simple parser to extract the information you need.
    Find the pattern that marks the text you want to extract, for instance <p class="abstract">,
    then write a rule for extracting the abstract, such as
    if (tag is abstract) then extract abstract text
    and apply the same concept to the other tags.
    Below is a sample parser that was used to extract the title and abstract from ACM HTML files. Please modify it to include keywords and the other fields.
    Good luck.
    import java.io.BufferedReader;
    import java.io.FileReader;
    import java.io.IOException;
    import java.io.InputStream;
    import java.io.InputStreamReader;

    public class ACMHTMLParser {
        private final String m_filename;
        private URLLexicalAnalyzer lexical;

        public ACMHTMLParser(String filename) {
            m_filename = filename;
        }

        /** Parses only the title and the abstract. */
        public void parse() throws IOException {
            lexical = new URLLexicalAnalyzer(m_filename);
            try {
                String word = lexical.getNextWord();
                boolean isabstract = false;
                while (null != word) {
                    if (isTag(word)) {
                        if (isTitle(word)) {
                            System.out.println("TITLE: " + lexical.getNextWord());
                        } else if (isAbstract(word) && !isabstract) {
                            parseAbstract();
                            isabstract = true;
                        }
                    }
                    word = lexical.getNextWord();
                }
            } finally {
                lexical.close();
            }
        }

        public static void main(String[] args) throws Exception {
            ACMHTMLParser parser = new ACMHTMLParser("./acm_html.html");
            parser.parse();
        }

        public static boolean isTag(String word) {
            return word.startsWith("<") && word.endsWith(">");
        }

        public static boolean isTitle(String word) {
            return "<title>".equals(word);
        }

        // please modify according to the html source
        public static boolean isAbstract(String word) {
            return "<p class=\"abstract\">".equals(word);
        }

        // Prints the first non-tag token that follows the abstract tag.
        private void parseAbstract() throws IOException {
            String abs = lexical.getNextWord();
            while (null != abs && isTag(abs)) {
                abs = lexical.getNextWord();
            }
            if (null != abs) {
                System.out.println(abs);
            }
        }
    }

    class URLLexicalAnalyzer {
        private BufferedReader m_reader;
        // true when the previous value scan already consumed the '<' of the next tag
        private boolean isTag;

        public URLLexicalAnalyzer(String filename) {
            try {
                m_reader = new BufferedReader(new FileReader(filename));
            } catch (IOException io) {
                System.out.println("ERROR, file not found " + filename);
                System.exit(1);
            }
        }

        public URLLexicalAnalyzer(InputStream in) {
            m_reader = new BufferedReader(new InputStreamReader(in));
        }

        public void close() {
            try {
                if (null != m_reader) m_reader.close();
            } catch (IOException ignored) {}
        }

        /** Returns the next tag or text token, or null at end of input. */
        public String getNextWord() throws IOException {
            int c = m_reader.read();
            // skip whitespace between tokens
            while (-1 != c && Character.isWhitespace((char) c)) {
                c = m_reader.read();
            }
            if (-1 == c) return null;
            if ('<' == c || isTag) {
                return scanTag(c);
            }
            return scanValue(c);
        }

        private String scanTag(final int c) throws IOException {
            StringBuilder result = new StringBuilder();
            if ('<' != c) result.append('<');   // the '<' was consumed by scanValue
            result.append((char) c);
            while (true) {
                int ch = m_reader.read();
                if (-1 == ch) throw new IllegalArgumentException("unterminated tag");
                result.append((char) ch);
                if ('>' == ch) {
                    isTag = false;
                    break;
                }
            }
            return result.toString();
        }

        private String scanValue(final int c) throws IOException {
            StringBuilder result = new StringBuilder();
            result.append((char) c);
            while (true) {
                int ch = m_reader.read();
                if (-1 == ch) break;            // trailing text with no tag after it
                if ('<' == ch) {
                    isTag = true;
                    break;
                }
                result.append((char) ch);
            }
            return result.toString();
        }
    }
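
For the "Classification" and "Keywords" sections in the question, every term appears as the text of a link ending in `target="_self">...</a>`, so a regular expression may be enough without a full tokenizer. The following is a rough sketch under that assumption only; the class name `IndexTermExtractor` is invented, the sample URLs in `main` are shortened, and it has not been checked against other ACM pages.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: collects the visible text of every target="_self" link.
public class IndexTermExtractor {

    // Captures the text between target="_self"> and the closing </a>.
    private static final Pattern LINK_TEXT =
            Pattern.compile("target=\"_self\">([^<]+)</a>");

    static List<String> extractTerms(String html) {
        List<String> terms = new ArrayList<>();
        Matcher m = LINK_TEXT.matcher(html);
        while (m.find()) {
            // Link text may be wrapped across lines; collapse runs of whitespace.
            terms.add(m.group(1).trim().replaceAll("\\s+", " "));
        }
        return terms;
    }

    public static void main(String[] args) {
        // Abbreviated fragment modelled on the page source in the question.
        String html = "<b>F.</b> <a href=\n\"results.cfm?query=CCS%3AF%2E%2A\"\n"
                + "target=\"_self\">Theory of Computation</a><br />\n"
                + "<b>F.2</b> <a href=\n\"results.cfm?query=CCS%3A%22F%2E2%22\"\n"
                + "target=\"_self\">ANALYSIS OF ALGORITHMS AND PROBLEM\nCOMPLEXITY</a><br />";
        System.out.println(extractTerms(html));
    }
}
```

To separate classification terms from keywords, the same method could be applied to each `<p class="Categories">` or `<p class="keywords">` block on its own.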

  • How to convert a Word document to text or html in an ABAP program

    Hi,
    At my client's site, for the recruitment system, they have the word processing system set to RTF, instead of SAP Script. This means that all the correspondence is in Word format. A standard SAP program takes the word letter, loads word, does the mail merge with the applicant's info and then sends the document to a printer.
    The program name is RPAPRT05. The program creates a document proxy (interface I_OI_DOCUMENT_PROXY) and manipulates the document using the methods of the interface.
    Now what we want to do is to instead of sending the document to a printer, we want to email the document contents to the applicant. But I don't know how to get the content from the Word document into text or html format so that I can make an email from it.
    I know I can send an email with the word document as an attachment, but we'd prefer not to do that.
    I would appreciate any help very much.
    Thanks

    Ok, here's what I ended up doing:
    First off, in order to call FM 'CONVERT_RTF_TO_ITF' you need the RTF document in a table with line length 256. The document is returned from FM 'DP_CREATE_URL' in a table with line length 132. So first I convert the table:
        " Transform data table from 132 character lines to
        " 256 character lines
          LOOP AT data_table INTO dataline.
            IF newrow = 'X'.
            " Add row to new table
              APPEND INITIAL LINE TO xdatatab ASSIGNING .
              newrow = space.
            ENDIF.
          " Convert the raw line of old table to characters
            ASSIGN dataline TO .
          " Check line lengths to determine how to add the
          " next line of old table
            newlinelen = STRLEN( newline ).
            ADD addspaces TO newlinelen.
            linepos = linemax - newlinelen.
            IF linepos > datalen.
            " Enough space available in new table line for all of old table line
              newline+newlinelen = oldline.
              oldlinelen = STRLEN( oldline ).
              addspaces = datalen - oldlinelen.
              CONTINUE.
            ELSE.
            " Fill up new table line
              newline+newlinelen(linepos) = oldline(linepos).
              ASSIGN newline TO .
              newrow = 'X'.
            " Save the remainder of old table to the new table line
              IF linepos < datalen.
                oldlinelen = STRLEN( oldline ).
                addspaces = datalen - oldlinelen.
                CLEAR newline.
                newline = oldline+linepos.
              ELSE.
                CLEAR newline.
              ENDIF.
            ENDIF.
          ENDLOOP.
        " Write the last line to the table
          IF newrow = 'X'.
            APPEND INITIAL LINE TO xdatatab ASSIGNING .
    Next I call FM 'CONVERT_RTF_TO_ITF' to get the document in SAPScript format:
        " Convert the RTF format to SAPScript
          CALL FUNCTION 'CONVERT_RTF_TO_ITF'
            EXPORTING
              header            = dochead
              x_datatab         = xdatatab
              x_size            = xsize
            IMPORTING
              with_tab_e        = withtab
            TABLES
              itf_lines         = itf_table
            EXCEPTIONS
              invalid_tabletype = 1
              missing_size      = 2
              OTHERS            = 4.
    This returns the document still containing the mail merge fields, which need to be filled in:
          LOOP AT itf_table INTO itf_line.
            WHILE itf_line CS '«'.
              startpos = sy-fdpos + 1.
              IF itf_line CS '»'.
                tokenlength = sy-fdpos - startpos.
              ENDIF.
              token = itf_line+startpos(tokenlength).
              REPLACE '_' IN token WITH '-'.
              ASSIGN (token) TO .
              ENDIF.
              MODIFY itf_table FROM itf_line.
            ENDWHILE.
          ENDLOOP.
    And finally I use FM 'CONVERT_ITF_TO_ASCII' to convert the SAPScript to text. I set the line lengths to 60, since that's a good length to format emails to.
        " Convert document to 60 char wide ascii document for emailing
          CALL FUNCTION 'CONVERT_ITF_TO_ASCII'
            EXPORTING
              formatwidth       = 60
            IMPORTING
              c_datatab         = asciidoctab
              x_size            = documentsize
            TABLES
              itf_lines         = itf_table
            EXCEPTIONS
              invalid_tabletype = 1
              OTHERS            = 2.
    And then the text document gets passed to FM 'SO_NEW_DOCUMENT_ATT_SEND_API1' as the email body.

  • How to convert plain text into html?

    Hi
    I'm looking for a nice method which converts any plain text to HTML. For example, given the text:
    "Me and you\nand a dog named boo."
    the conversion result should be:
    <html>
    <body>
    Me and you<br>
    and a dog named boo.
    </body>
    </html>
    I know I could write such code myself using regex, but I wonder whether something like this already exists in the Java API?
    Greetings from Switzerland
    Mickey

    Use a StringReader to read the lines and add the lines between <html><pre> ... </pre></html>
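
A possible way to follow that suggestion is sketched below. It reads the text line by line through a `StringReader` but, instead of `<pre>`, it emits `<br>` between lines so the result matches the output asked for in the question. The class name `PlainTextToHtml` and its methods are made up for this example, and the escaping covers only the basic HTML special characters.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;

// Hypothetical helper: wraps plain text in a minimal HTML document.
public class PlainTextToHtml {

    // Escape the characters that have a special meaning in HTML.
    static String escape(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '&': sb.append("&amp;"); break;
                case '<': sb.append("&lt;"); break;
                case '>': sb.append("&gt;"); break;
                case '"': sb.append("&quot;"); break;
                default: sb.append(c);
            }
        }
        return sb.toString();
    }

    static String convert(String text) {
        StringBuilder html = new StringBuilder("<html>\n<body>\n");
        try (BufferedReader reader = new BufferedReader(new StringReader(text))) {
            String line = reader.readLine();
            while (line != null) {
                String next = reader.readLine();
                html.append(escape(line));
                if (next != null) {
                    html.append("<br>");   // line break after every line but the last
                }
                html.append('\n');
                line = next;
            }
        } catch (IOException e) {
            // Not expected when reading from an in-memory string.
            throw new UncheckedIOException(e);
        }
        html.append("</body>\n</html>");
        return html.toString();
    }

    public static void main(String[] args) {
        System.out.println(convert("Me and you\nand a dog named boo."));
    }
}
```

Escaping matters as soon as the input can contain `<` or `&`; the `<pre>` variant from the answer would need the same escaping step.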

  • Xml to html conversion using xslt

    The XML contains exponential numbers, i.e. numbers in scientific notation. When it is converted to HTML, we get NaN for those numbers. This happens with JDK 1.4, i.e. WLS 8.1 with the JDK 1.4 BEA JRockit JVM.
    It worked fine with WLS 7 using xalan-j_2_1_0/bin/xalan.jar.
    Any solution?
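
One possible workaround, offered as an assumption rather than a confirmed fix for this setup: the XPath 1.0 number syntax has no exponent form, so a processor that follows the specification strictly converts a string such as 1.5E3 to NaN. If that is the cause here, the values could be rewritten in plain decimal notation before the transformation runs. The sketch below shows only that rewriting step; the class name `PlainNumber` is invented.

```java
import java.math.BigDecimal;

// Hypothetical helper: rewrites scientific notation as a plain decimal string.
public class PlainNumber {

    static String toPlain(String value) {
        // BigDecimal accepts exponent notation; toPlainString never emits one.
        return new BigDecimal(value.trim()).toPlainString();
    }

    public static void main(String[] args) {
        System.out.println(toPlain("1.5E3"));   // 1500
        System.out.println(toPlain("2.5E-3"));  // 0.0025
    }
}
```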

    Do you know of a method in the xdk that takes a well formed HTML doc and using xsd / xslt convert back to original xml spec?
    Because you created (and as long as you create) the HTML from XML it will be well formed (every tag will be ended with an end-tag) and you can therefore transform it back into XML.
    Most times it will not be possible to convert HTML found on the 'internet' into XML because this HTML is not well formed. For example, many people forget to end a paragraph of text within HTML with the </p> tag.
    We are evaluating using xslt to convert the XML to a form based medium for content maintenance. Wondering if once a XML document is parsed to HTML (DOM) can it be parsed back to XML for subsequent update to stored value in blob column. Specifically interested in conversion (parser) from HTML to XML
    Simply can HTML (in DOM format validated against a xsd) be transformed back to XML ?

  • Plain text ASCII format file

    This may be slightly off-topic, but I'm hoping maybe someone knows the answer:
    I received a license for the Messiah animation suite as part of a one-time offer, and it says to paste the License text "into a text file... (this file must be a plain text ASCII format file, NOT Rich Text or doc)."
    I have Microsoft Office 2008. It has a Plain Text (.txt) format, but from googling there seems to be some uncertainty about whether, on a Mac, that is plain text ASCII format. I'm not sure about the TextEdit app either; I'm pretty sure it isn't.
    Anyone got a solution?
    Thanks!

    The problem is whether that version of .txt or the MS Office 2008 is actually ASCII format or unicode.
    I don't think that is the issue, since ascii and unicode are identical for the usual 26 letters and 10 digits that are probably in the license text. The point is that it be .txt and not .doc or .rtf or .html, which has all kinds of other junk added to the real content. I would use TextEdit set to Plain text.
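
If there is any doubt about a particular piece of text, it can be checked directly. This is a minimal sketch with an invented class name, `AsciiCheck`, and a made-up sample string; it only tests whether every character can be encoded as US-ASCII and says nothing about what the licensing tool itself accepts.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper: reports whether a string contains only ASCII characters.
public class AsciiCheck {

    static boolean isPlainAscii(String text) {
        return StandardCharsets.US_ASCII.newEncoder().canEncode(text);
    }

    public static void main(String[] args) {
        System.out.println(isPlainAscii("LICENSE-KEY 1234 ABCD")); // true
        System.out.println(isPlainAscii("caf\u00e9"));             // false
    }
}
```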

  • Converting PDF CLOBS to text or HTML

    I would like to run through all the PDFs (stored as CLOBs) in a database table and copy them to a text or HTML CLOB. Doing this beforehand should allow me to rapidly index and snippet-ify these fields during queries.
    How exactly can I use the built-in facilities in Oracle Text to do this?
    Roger Ford has had some great input on my snippet performance problems and had this to say:
    "The key is to pre-convert before indexing. You can do that with a pl/sql procedure that uses ctxdoc.policy_filter or ctxdoc.ifilter."
    The Reference Manual, page B-2, has this to say:
    "This technology [AUTO_FILTER] also enables you to convert documents to HTML for document presentation with the CTX_DOC package."

    I apologize for posting prematurely....
    I should be able to use CTX_DOC.FILTER as Roger suggested.
    I think I can just loop through every PDF in the table and dump each converted PDF to the result table. I will set the query id to the key from the PDF table thus allowing me to get at the metadata.

  • Database error text: invalid number

    Hi Gurus,
    I am calling a procedure proxy from ECC and it is giving me a short dump:
    Error 339 has occurred while executing database procedure
      ""_SYS_BIC"."Krishna_Demo_Proj.Model/KC_GET_MARA"" on the
    current database connection "R/3".
    Database error text: invalid number: ''
    Triggering statement: "dsql_open_proc"
    I have created a table with only one field.
    Mapped the Data types after creating the Procedure Proxy
    Which data type I need to use? I tried with lots of combinations but, still the same error.
    Regards,
    Krishna Chauhan

    Hi Srinu,
    I have used NVARCHAR 18 and corresponding to that CHAR18 is used.
    Please see the attached screen shots.
    Regards,
    Krishna Chauhan

  • The database error text is: ORA-01843: not a valid month

    I am trying to use a date field as a query filter and I keep getting the
    following error:
    A database error occurred. The database error text is: ORA-01843: not a
    valid month. (WIS 10901).
    When I remove the query filter and run the query it works as
    expected. I want to be able to allow the users to use the date field in order
    to select a date range. Can someone provide me with some information on how to
    resolve this issue.

    SQL> SELECT (to_char(tO_date('09/29/2006', 'mm/dd/yyyy'))||':'||TO_CHAR(systimestamp,'hh24:mi:ss:ff6'))
      2  FROM dual;
    (TO_CHAR(TO_DATE('09/29/2006
    29-SEP-06:01:33:09:023000
    If you want the mm/dd/yyyy hh24:mi:ss:ff6 format, use the TO_CHAR function with a format specifier:
    SQL> SELECT to_char(to_timestamp((to_char(tO_date('09/29/2006', 'mm/dd/yyyy'))||':'||TO_CHAR(systimestamp,'hh24:mi:ss:ff6')), 'dd/mm/yyyy hh24:mi:ss:ff6'),'mm/dd/yyyy hh24:mi:ss:ff6')
      2  FROM DUAL
      3  /
    TO_CHAR(TO_TIMESTAMP((TO_CHAR
    09/29/0006 01:40:27:113000
    SQL> Khurram
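
The same kind of failure can be reproduced outside the database, which may help when checking what format a client tool actually sends. The sketch below is an analogy in Java, not Oracle behaviour: the class name `DateMaskDemo` is invented, and `java.time` patterns use `MM`/`dd` where Oracle masks use `mm`/`dd`. Parsing succeeds when the pattern matches the text and fails when day and month are swapped, because 29 is not a valid month.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;

// Hypothetical demo: a date string only parses with a matching explicit pattern.
public class DateMaskDemo {

    static LocalDate parse(String text, String pattern) {
        return LocalDate.parse(text, DateTimeFormatter.ofPattern(pattern));
    }

    static boolean parses(String text, String pattern) {
        try {
            parse(text, pattern);
            return true;
        } catch (DateTimeParseException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(parse("09/29/2006", "MM/dd/yyyy"));   // 2006-09-29
        System.out.println(parses("09/29/2006", "dd/MM/yyyy"));  // false: month 29
    }
}
```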

  • A database error occured. The database error text is: ORA-29275: partial multibyte character . (WIS 10901)

    Hi,
    My Webi report is failing with the error
    "A database error occured. The database error text is: ORA-29275: partial multibyte character . (WIS 10901)"
    May I know the root cause of this error and how to resolve it? I am using BO 3.1.
    It's very important to deliver the report. Please help urgently.
    Thanks in advance.
    Abid

    Hi Abid,
    Please see SAP Note 1556127.
    Symptom
    A database error occurs after refreshing a web intelligence report in java report panel or web intelligence in interactive mode
    The database error text is: ORA 29275 with partial multibyte character (WIS 10901)
    Environment
    windows 2003 Server
    Cause
    The environment variables LC_ALL, LANG and NLS_LANG are not set to a UTF-8 value.
    Resolution
    Set the following system environment variables to a UTF-8 value: LC_ALL, LANG and NLS_LANG. For example, LC_ALL=EN_US.UTF-8

  • Read Text from HTML-Pages and want to solve "ChangedCharSetException"

    Hello,
    I have an app that connects to pages via threads, parses them and gives me only the text version of each HTML page. It works fine, but if it finds a page where the text is within images, the whole app stops and gives me the message:
    javax.swing.text.ChangedCharSetException
            at javax.swing.text.html.parser.DocumentParser.handleEmptyTag(DocumentParser.java:169)
            at javax.swing.text.html.parser.Parser.startTag(Parser.java:372)
            at javax.swing.text.html.parser.Parser.parseTag(Parser.java:1846)
            at javax.swing.text.html.parser.Parser.parseContent(Parser.java:1881)
            at javax.swing.text.html.parser.Parser.parse(Parser.java:2047)
            at javax.swing.text.html.parser.DocumentParser.parse(DocumentParser.java:106)
            at javax.swing.text.html.parser.ParserDelegator.parse(ParserDelegator.java:78)
            at aufruf.main(aufruf.java:33)
    So I tried to catch it with "getCharSetSpec()" and "keyEqualsCharSet()" from the class "javax.swing.text.ChangedCharSetException" and hoped that this would solve the problem. But it still doesn't work...
    Then I searched the web and found that I have to add the line:
    doc.putProperty("IgnoreCharsetDirective", new Boolean(true));
    "doc" is a new HTML document, created with the HTMLEditorKit. I do not have much knowledge about that, so I hope that someone can explain to me how I can solve this problem within my code.
    Here we go:
    import javax.swing.text.*;
    import java.lang.*;
    import java.util.*;
    import java.net.*;
    import java.io.*;
    import javax.swing.text.html.*;
    import javax.swing.text.html.parser.*;

    public class myParser extends Thread {
        private String name;

        public void run() {
            try {
                URL viele = new URL(name);                       // "name" is a variable with a lot of links
                URLConnection hs = viele.openConnection();
                hs.connect();
                if (hs.getContentType().startsWith("text/html")) {
                    InputStream is = hs.getInputStream();
                    InputStreamReader isr = new InputStreamReader(is);
                    BufferedReader br = new BufferedReader(isr);
                    Lesen los = new Lesen();
                    ParserDelegator parser = new ParserDelegator();
                    parser.parse(br,los, false);
                }
            } catch (MalformedURLException e) {
                System.err.print("Doesn't work");
            } catch (ChangedCharSetException e) {
                e.getCharSetSpec();
                e.keyEqualsCharSet();
                e.printStackTrace();
            } catch (Exception o) {
            }
        }

        public void vowi(String n) {
            name = n;
        }
    }
    And in case it is important, here is the class "Lesen":
    import java.net.*;
    import java.io.*;
    import javax.swing.text.*;
    import javax.swing.text.html.*;
    import javax.swing.text.html.parser.*;

    class Lesen extends HTMLEditorKit.ParserCallback {
        public void handleStartTag(HTML.Tag t, MutableAttributeSet a, int pos) {
            try {
                if ((t==HTML.Tag.P) || (t==HTML.Tag.H1) || (t==HTML.Tag.H2) || (t==HTML.Tag.H3) || (t==HTML.Tag.H4) || (t==HTML.Tag.H5) || (t==HTML.Tag.H6)) {
                    System.out.println();
                }
            } catch (Exception q) {
                System.out.println(q.getMessage());
            }
        }

        public void handleSimpleTag(HTML.Tag t,MutableAttributeSet a, int pos) {
            try {
                if (t==HTML.Tag.BR) {
                    System.out.println(); // new line
                    System.out.println();
                }
            } catch (Exception qw) {
                System.out.println(qw.getMessage());
            }
        }

        public void handleText(char[] data, int pos) {
            try {
                System.out.print(data);                                           // prints the text from HTML-pages
            } catch (Exception ab) {
                System.out.println(ab.getMessage());
            }
        }
    }
    Thanks a lot for helping...
    Stephan

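    Change
    parser.parse(br,los, false);
    to
    parser.parse(br,los, true);
    The third argument of ParserDelegator.parse is ignoreCharSet; passing true makes the parser ignore the charset directive in the page instead of throwing ChangedCharSetException.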
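
A self-contained way to try this out is sketched below. The class name `IgnoreCharsetDemo` and the sample page are made up for the example; the page contains a charset directive in a `<meta>` tag, which is the kind of input the question describes, and the text is collected into a string rather than printed as in `Lesen`.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical demo: extracts the text of a page that declares its own charset.
public class IgnoreCharsetDemo {

    static String extractText(String html) {
        final StringBuilder text = new StringBuilder();
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleText(char[] data, int pos) {
                text.append(data).append('\n');
            }
        };
        try {
            // true = ignoreCharSet: do not react to the charset directive
            new ParserDelegator().parse(new StringReader(html), callback, true);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return text.toString();
    }

    public static void main(String[] args) {
        String page = "<html><head><meta http-equiv=\"Content-Type\" "
                + "content=\"text/html; charset=utf-8\"></head>"
                + "<body><p>Hello</p></body></html>";
        System.out.print(extractText(page));
    }
}
```

Ignoring the directive means the text is decoded with whatever charset the `Reader` was created with, so pages in another encoding may still show wrong characters unless the reader is built with the right charset.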
