Database to text/ascii/html conversion
I need to convert an Oracle db to a text file. Preserving the exact format of the db is desirable but not critical; what matters is that the entire contents of the db (headers aside) end up in text format. Is there a filter or migration utility that will easily do this? Conversion to html or xml would be acceptable. Thank you in advance.
This is not the right forum, and the question really doesn't make sense. There's no simple mapping between a relational database and a flat text file.
You can generate XML from an individual table of the database - see the XML developers kit for more details.
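For a single table, a plain JDBC loop is often enough to get a delimited text dump. The following is only a sketch under assumptions, not an Oracle utility: the connection URL, the user and password, the table name EMP and the output file name are placeholders, and the Oracle JDBC driver has to be on the classpath for main to connect. Columns are read with getString, which may not suit LOB or binary columns.

```java
import java.io.IOException;
import java.io.PrintWriter;
import java.io.Writer;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.ResultSetMetaData;
import java.sql.SQLException;
import java.sql.Statement;

class TableDump {

    // Quote a value for CSV: wrap it in quotes when it contains a comma, quote or line break.
    static String csv(String value) {
        if (value == null) return "";
        boolean needsQuotes = value.contains(",") || value.contains("\"")
                || value.contains("\n") || value.contains("\r");
        String escaped = value.replace("\"", "\"\"");
        return needsQuotes ? "\"" + escaped + "\"" : escaped;
    }

    // Write a header line followed by one line per row of the result set.
    static void dump(ResultSet rs, Writer out) throws SQLException, IOException {
        ResultSetMetaData meta = rs.getMetaData();
        int cols = meta.getColumnCount();
        StringBuilder line = new StringBuilder();
        for (int i = 1; i <= cols; i++) {
            if (i > 1) line.append(',');
            line.append(csv(meta.getColumnLabel(i)));
        }
        out.write(line.append('\n').toString());
        while (rs.next()) {
            line.setLength(0);
            for (int i = 1; i <= cols; i++) {
                if (i > 1) line.append(',');
                line.append(csv(rs.getString(i)));
            }
            out.write(line.append('\n').toString());
        }
    }

    public static void main(String[] args) throws Exception {
        // Placeholder URL, credentials and table name: replace with real values.
        String url = "jdbc:oracle:thin:@//localhost:1521/ORCL";
        try (Connection con = DriverManager.getConnection(url, "scott", "tiger");
             Statement st = con.createStatement();
             ResultSet rs = st.executeQuery("SELECT * FROM EMP");
             Writer out = new PrintWriter("emp.csv", "UTF-8")) {
            dump(rs, out);
        }
    }
}
```

Running the same loop once per table gives one text file per table, which sidesteps the problem of mapping the whole database to a single flat file.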
Similar Messages
-
Transport Agent Text To HTML Conversion Problem
I have been building a transport agent that works fine except when I have to convert a plain text email to html. I have been looking for samples on how to use the textconverters and texttohtml. However, I'm not sure what they really are supposed to do. If I use it to convert the body it will convert what was plain text to html as in the example below...but it never converts the actual body type to html so it's still a plain text email with a body that has html text in it. Therefore, when read...it doesn't display properly. Are the converters supposed to change the mail body type also? Can you change the mail body type?
<html><head><meta http-equiv="Content-Type" content="text/html; charset=us-ascii">
<meta name="Generator" content="Microsoft Exchange Server">
<!-- converted from text -->
<style><!-- .EmailQuote { margin- padding- border- } --></style></head>
<body>
<font size="2"><span style="font-size:10pt;"><div class="PlainText">Hello<br>
</div></span></font>
</body>
</html>
Hello, did you find an answer?
-
Hi,
I need to convert Smartform data stream into HTML format and
pass the same to Webdynpro application where it will be displayed on the browser.
I have specified Smartform output format as 'XSF output+HTML'.
Use of BSP application is ruled out due to certain limitations.
The FM CONVERT_OTF returns data in ASCII or PDF format only.
Can any one tell some Function Module name to convert
Smartform data to HTML format or any other way out?
thanks.
Check out this link:
Smartform to HTML conversion
thnks
jaideep
*reward points if useful -
How to include text as HTML elements (see DOMElement)
I am working with Flash PRO CC v. 14.0. to convert my Flash website to HTML5 / javascript
I have converted a file to the HTML5 Canvas
I am very happy that the new Flash Pro has the feature to convert to HTML5 canvas
HOWEVER:
In my original .FLA file project I use only one font: Copperplate Bold. I use several sizes of that font within the project / scene
In the original file for all text I use static text, Letter spacing, AntiAlias, AutoKern and single line (Linetype)
- none of which the HTML5 canvas seem to allow / support?
How do I maintain the FONT look that I have chosen in my original FLASH project, after I convert to HTML5 canvas?
Is there a way in the HTML canvas to maintain the FONT look that I want?
HTML5 canvas will not allow Font embedding
The device font destroys the LOOK of my Copperplate Bold font.
How do I include text as HTML elements (see DOMElements)?
WARNINGS generated when I convert the original file into an HTML Canvas:
Warnings generated while copying/importing in 140827a HTML test.fla:
* AntiAlias is not supported in HTML5 Canvas document, and has been converted to DeviceFonts in an instance of Text.
* AutoKern is not supported in HTML5 Canvas document, and has been removed in an instance of Text.
* Frame Scripts have been commented
* LetterSpacing is not supported in HTML5 Canvas document, and has been converted to 0.0 in an instance of Text.
* LineType is not supported in HTML5 Canvas document, and has been converted to MultiLineNoWrap in an instance of Text.
* Some artwork contains Hairline stroke, which is not supported in HTML5 Canvas document, and has been converted to Solid.
* StaticText is not supported in HTML5 Canvas document, and has been converted to DynamicText in an instance of Text.
New HTML Canvas Document created.
NOTE: So far the only way I have been able to maintain the font look is to convert the fonts to .png files
This is painstaking work that I would like to avoid.
Even then I still get a WARNING when I test my scene - (no doubt because I left the original FONT text in guide layers)
After conversion ON TEST SCENE:
WARNINGS:
Frame numbers in EaselJS start at 0 instead of 1. For example, this affects gotoAndStop and gotoAndPlay calls. (18)
Only circular (not oval) radial gradients are supported. (85)
Text support is limited. It is generally recommended to include text as HTML elements (see DOMElement). (6)
Color effects are published as a filter and subject to the same limitations. (4)
Filters are very expensive and are not updated once applied. Cache as bitmap is automatically enabled when a filter is applied. This can prevent animations from updating. (2)
Content with both Bitmaps and Buttons may generate local security errors in some browsers if run from the local file system.
HOW CAN I MAINTAIN the FONT LOOK that I have chosen for my project?
How do I include text as HTML elements (see DOMElements)?
ANY HELP will be appreciated
A good, in-depth tutorial on the subject (FONTS) would be a BIG help to many using the convert to HTML5 canvas features.
GOOGLE HAS
https://www.google.com/fonts
choose a font from above site
then:
google generates instructions on how to embed that font
Montserrat
3. Add this code to your website:
<link href='http://fonts.googleapis.com/css?family=Montserrat:400,700' rel='stylesheet' type='text/css'>
4. Integrate the fonts into your CSS:
The Google Fonts API will generate the necessary browser-specific CSS to use the fonts. All you need to do is add the font name to your CSS styles. For example:
font-family: 'Source Sans Pro', sans-serif;
font-family: 'Ubuntu', sans-serif;
font-family: 'Montserrat Alternates', sans-serif;
font-family: 'Montserrat', sans-serif;
font-family: 'Open Sans', sans-serif; -
Problem to extract text from HTML document
I have to extract some text from HTML files into my database (about 1000 files).
The HTML files come from the ACM Digital Library. http://portal.acm.org/dl.cfm
Each HTML page holds the information about one paper. I only want to get the text of "Title", "Abstract", "Classification" and "Keywords".
The problem is that I can't find any pattern to parse the HTML files.
EX: I need to get the Classification = "Theory of Computation","ANALYSIS OF ALGORITHMS AND PROBLEM COMPLEXITY","Numerical Algorithms and Problems","Mathematics of Computing","NUMERICAL ANALYSIS"......etc .
The section code about "Classification" is below.
Please give any idea how to do this, or how to find a pattern to extract text from this.
<div class="indterms"><a href="#CIT"><img name="top" src=
"img/arrowu.gif" hspace="10" border="0" /></a><span class=
"heading"><a name="IndexTerms">INDEX TERMS</a></span>
<p class="Categories"><span class="heading"><a name=
"GenTerms">Primary Classification:</a></span><br />
&nbsp; <b>F.</b> <a href=
"results.cfm?query=CCS%3AF%2E%2A&coll=ACM&dl=ACM&CFID=22820732&CFTOKEN=38147335"
target="_self">Theory of Computation</a><br />
&nbsp; <img src="img/tree.gif" border="0" height="20" width=
"20" /> <b>F.2</b> <a href=
"results.cfm?query=CCS%3A%22F%2E2%22&coll=ACM&dl=ACM&CFID=22820732&CFTOKEN=38147335"
target="_self">ANALYSIS OF ALGORITHMS AND PROBLEM
COMPLEXITY</a><br />
&nbsp; &nbsp; &nbsp; <img src="img/tree.gif" border="0" height=
"20" width="20" /> <b>F.2.1</b> <a href=
"results.cfm?query=CCS%3A%22F%2E2%2E1%22&coll=ACM&dl=ACM&CFID=22820732&CFTOKEN=38147335"
target="_self">Numerical Algorithms and Problems</a><br />
</p>
<p class="Categories"><span class="heading"><a name=
"GenTerms">Additional&nbsp;Classification:</a></span><br />
&nbsp; <b>G.</b> <a href=
"results.cfm?query=CCS%3AG%2E%2A&coll=ACM&dl=ACM&CFID=22820732&CFTOKEN=38147335"
target="_self">Mathematics of Computing</a><br />
&nbsp; <img src="img/tree.gif" border="0" height="20" width=
"20" /> <b>G.1</b> <a href=
"results.cfm?query=CCS%3A%22G%2E1%22&coll=ACM&dl=ACM&CFID=22820732&CFTOKEN=38147335"
target="_self">NUMERICAL ANALYSIS</a><br />
&nbsp; &nbsp; &nbsp; <img src="img/tree.gif" border="0" height=
"20" width="20" /> <b>G.1.6</b> <a href=
"results.cfm?query=CCS%3A%22G%2E1%2E6%22&coll=ACM&dl=ACM&CFID=22820732&CFTOKEN=38147335"
target="_self">Optimization</a><br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <img src="img/tree.gif" border=
"0" height="20" width="20" /> <b>Subjects:</b> <a href=
"results.cfm?query=CCS%3A%22Linear%20programming%22&coll=ACM&dl=ACM&CFID=22820732&CFTOKEN=38147335"
target="_self">Linear programming</a><br />
</p>
<br />
<p class="GenTerms"><span class="heading"><a name=
"GenTerms">General Terms:</a></span><br />
<a href=
"results.cfm?query=genterm%3A%22Algorithms%22&coll=ACM&dl=ACM&CFID=22820732&CFTOKEN=38147335"
target="_self">Algorithms</a>, <a href=
"results.cfm?query=genterm%3A%22Theory%22&coll=ACM&dl=ACM&CFID=22820732&CFTOKEN=38147335"
target="_self">Theory</a></p>
<br />
<p class="keywords"><span class="heading"><a name=
"Keywords">Keywords:</a></span><br />
<a href=
"results.cfm?query=keyword%3A%22Simplex%20method%22&coll=ACM&dl=ACM&CFID=22820732&CFTOKEN=38147335"
target="_self">Simplex method</a>, <a href=
"results.cfm?query=keyword%3A%22complexity%22&coll=ACM&dl=ACM&CFID=22820732&CFTOKEN=38147335"
target="_self">complexity</a>, <a href=
"results.cfm?query=keyword%3A%22perturbation%22&coll=ACM&dl=ACM&CFID=22820732&CFTOKEN=38147335"
target="_self">perturbation</a>, <a href=
"results.cfm?query=keyword%3A%22smoothed%20analysis%22&coll=ACM&dl=ACM&CFID=22820732&CFTOKEN=38147335"
target="_self">smoothed analysis</a></p>
</div>
One approach is to download Htmlparser from sourceforge
http://htmlparser.sourceforge.net/ and write the rules to match title, abstract etc.
Another approach is to write your own parser that extracts only title, abstract etc.
1. tokenize the html file. --> convert html into tokens (tag and value)
2. write a simple parser to extract certain information
find out the pattern of the text you want to extract. For instance <p class="abstract">.
then write a rule for extracting the abstract such as
if (tag is abstract ) then extract abstract text
apply the same concept for other tags
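For fields that appear as link text, such as the classification and keyword entries in the question, a regular expression may already be enough. A minimal sketch, assuming every wanted term sits inside an <a ... target="_self"> link as in the posted fragment; the class name AnchorTextExtractor is made up for illustration, and the pattern is not a general HTML parser:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AnchorTextExtractor {

    // Matches <a ... target="_self" ...>text</a>; DOTALL lets the link text span lines.
    private static final Pattern LINK = Pattern.compile(
            "<a\\b[^>]*\\btarget=\"_self\"[^>]*>(.*?)</a>",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    static List<String> extract(String html) {
        List<String> terms = new ArrayList<>();
        Matcher m = LINK.matcher(html);
        while (m.find()) {
            // Collapse line breaks and runs of spaces inside the link text.
            terms.add(m.group(1).replaceAll("\\s+", " ").trim());
        }
        return terms;
    }

    public static void main(String[] args) {
        String sample = "<b>F.2</b> <a href=\n\"results.cfm?query=x\"\n"
                + "target=\"_self\">ANALYSIS OF ALGORITHMS AND PROBLEM\nCOMPLEXITY</a><br />";
        System.out.println(extract(sample));
    }
}
```

Heading anchors such as <a name="GenTerms"> carry no target attribute, so they are skipped. To tell classification from keywords, run extract on each <p class="..."> section separately.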
Attached is the sample parser that was used to extract title and abstract from acm html files. Please modify to include keyword and other fields.
good luck
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.List;

public class ACMHTMLParser {
    private String m_filename;
    private URLLexicalAnalyzer lexical;
    List urls = new ArrayList();

    public ACMHTMLParser(String filename) {
        super();
        m_filename = filename;
    }

    /**
     * parses only title and abstract
     */
    public void parse() throws Exception {
        lexical = new URLLexicalAnalyzer(m_filename);
        String word = lexical.getNextWord();
        boolean isabstract = false;
        while (null != word) {
            if (isTag(word)) {
                if (isTitle(word)) {
                    System.out.println("TITLE: " + lexical.getNextWord());
                } else if (isAbstract(word) && !isabstract) {
                    parseAbstract();
                    isabstract = true;
                }
            }
            word = lexical.getNextWord();
        }
        lexical.close();
    }

    public static void main(String[] args) throws Exception {
        ACMHTMLParser parser = new ACMHTMLParser("./acm_html.html");
        parser.parse();
    }

    public static boolean isTag(String word) {
        return ( word.startsWith("<") && word.endsWith(">"));
    }

    public static boolean isTitle(String word) {
        return ( "<title>".equals(word));
    }

    //please modify according to the html source
    public static boolean isAbstract(String word) {
        return ( "<p class=\"abstract\">".equals(word));
    }

    // prints the first non-tag token that follows the abstract tag
    private void parseAbstract() throws Exception {
        while (true) {
            String abs = lexical.getNextWord();
            if (null == abs) break; // end of file reached
            if (!isTag(abs)) {
                System.out.println(abs);
                break;
            }
        }
    }
}

class URLLexicalAnalyzer {
    private BufferedReader m_reader;
    // true when the previous value token ended at a '<' that has already been consumed
    private boolean isTag;

    public URLLexicalAnalyzer(String filename) {
        try {
            m_reader = new BufferedReader(new FileReader(filename));
        } catch (IOException io) {
            System.out.println("ERROR, file not found " + filename);
            System.exit(1);
        }
    }

    public URLLexicalAnalyzer(InputStream in) {
        m_reader = new BufferedReader(new InputStreamReader(in));
    }

    public void close() {
        try {
            if (null != m_reader) m_reader.close();
        } catch (IOException ignored) {}
    }

    // returns the next token (a whole tag or a text value), or null at end of file
    public String getNextWord() throws IOException {
        int c = m_reader.read();
        // skip leading whitespace
        while (-1 != c && Character.isWhitespace((char)c)) {
            c = m_reader.read();
        }
        if (-1 == c) return null;
        if ('<' == c || isTag) {
            return scanTag(c);
        } else {
            return scanValue(c);
        }
    }

    private String scanTag(final int c) throws IOException {
        StringBuffer result = new StringBuffer();
        // the '<' was consumed by scanValue, so put it back
        if ('<' != c) result.append('<');
        result.append((char)c);
        int ch = -1;
        while (true) {
            ch = m_reader.read();
            if (-1 == ch) throw new IllegalArgumentException("un-terminate tag");
            if ('>' == ch) {
                isTag = false;
                break;
            }
            result.append((char)ch);
        }
        // append the closing '>'
        result.append((char)ch);
        return result.toString();
    }

    private String scanValue(final int c) throws IOException {
        StringBuffer result = new StringBuffer();
        result.append((char)c);
        int ch = -1;
        while (true) {
            ch = m_reader.read();
            if (-1 == ch) throw new IllegalArgumentException("un-terminate value");
            if ('<' == ch) {
                isTag = true;
                break;
            }
            result.append((char)ch);
        }
        return result.toString();
    }
}
-
How to convert a Word document to text or html in an ABAP program
Hi,
At my client's site, for the recruitment system, they have the word processing system set to RTF, instead of SAP Script. This means that all the correspondence is in Word format. A standard SAP program takes the word letter, loads word, does the mail merge with the applicant's info and then sends the document to a printer.
The program name is RPAPRT05. The program creates a document proxy (interface I_OI_DOCUMENT_PROXY) and manipulates the document using the methods of the interface.
Now what we want to do is to instead of sending the document to a printer, we want to email the document contents to the applicant. But I don't know how to get the content from the Word document into text or html format so that I can make an email from it.
I know I can send an email with the word document as an attachment, but we'd prefer not to do that.
I would appreciate any help very much.
Thanks
Ok, here's what I ended up doing:
First off, in order to call FM 'CONVERT_RTF_TO_ITF' you need the RTF document in a table with line length 256. The document is returned from FM 'DP_CREATE_URL' in a table with line length 132. So first I convert the table:
* Transform data table from 132 character lines to
* 256 character lines
LOOP AT data_table INTO dataline.
  IF newrow = 'X'.
*   Add row to new table
    APPEND INITIAL LINE TO xdatatab ASSIGNING .
    newrow = space.
  ENDIF.
* Convert the raw line of old table to characters
  ASSIGN dataline TO .
* Check line lengths to determine how to add the
* next line of old table
  newlinelen = STRLEN( newline ).
  ADD addspaces TO newlinelen.
  linepos = linemax - newlinelen.
  IF linepos > datalen.
*   Enough space available in new table line for all of old table line
    newline+newlinelen = oldline.
    oldlinelen = STRLEN( oldline ).
    addspaces = datalen - oldlinelen.
    CONTINUE.
  ELSE.
*   Fill up new table line
    newline+newlinelen(linepos) = oldline(linepos).
    ASSIGN newline TO .
    newrow = 'X'.
*   Save the remainder of old table to the new table line
    IF linepos < datalen.
      oldlinelen = STRLEN( oldline ).
      addspaces = datalen - oldlinelen.
      CLEAR newline.
      newline = oldline+linepos.
    ELSE.
      CLEAR newline.
    ENDIF.
  ENDIF.
ENDLOOP.
* Write the last line to the table
IF newrow = 'X'.
  APPEND INITIAL LINE TO xdatatab ASSIGNING .
ENDIF.
Next I call FM 'CONVERT_RTF_TO_ITF' to get the document in SAPScript format:
* Convert the RTF format to SAPScript
CALL FUNCTION 'CONVERT_RTF_TO_ITF'
EXPORTING
header = dochead
x_datatab = xdatatab
x_size = xsize
IMPORTING
with_tab_e = withtab
TABLES
itf_lines = itf_table
EXCEPTIONS
invalid_tabletype = 1
missing_size = 2
OTHERS = 4.
This returns the document still containing the mail merge fields which needs to be filled in:
LOOP AT itf_table INTO itf_line.
WHILE itf_line CS '«'.
startpos = sy-fdpos + 1.
IF itf_line CS '»'.
tokenlength = sy-fdpos - startpos.
ENDIF.
token = itf_line+startpos(tokenlength).
REPLACE '_' IN token WITH '-'.
ASSIGN (token) TO .
ENDIF.
MODIFY itf_table FROM itf_line.
ENDWHILE.
ENDLOOP.
And finally I use FM 'CONVERT_ITF_TO_ASCII' to convert the SAPScript to text. I set the line lengths to 60, since that's a good length to format emails to.
* Convert document to 60 char wide ascii document for emailing
CALL FUNCTION 'CONVERT_ITF_TO_ASCII'
EXPORTING
formatwidth = 60
IMPORTING
c_datatab = asciidoctab
x_size = documentsize
TABLES
itf_lines = itf_table
EXCEPTIONS
invalid_tabletype = 1
OTHERS = 2.
And then the text document gets passed to FM 'SO_NEW_DOCUMENT_ATT_SEND_API1' as the email body. -
How to convert plain text into html?
Hi
I'm looking for a nice method which converts any plain text to html. For example, the text: "Me and you\nand a dog named boo."
The conversion result should be:
<html>
<body>
Me and you<br>
and a dog named boo.
</body>
</html>
I know I could write such code myself using regex. But I just wonder whether something like this already exists in the Java API?
Greetings from Switzerland
Mickey
Use a StringReader to read the lines and add the lines between <html><pre> ... </pre></html>
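If the <br> output from the question is wanted rather than a <pre> block, a few lines of hand-written code will do it. This is a sketch, not a method from the Java API; the class and method names are made up for illustration. It also escapes &, <, > and " so that such characters in the text cannot break the markup.

```java
class TextToHtml {

    // Escape the characters that have special meaning in HTML.
    static String escape(String text) {
        StringBuilder sb = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '&': sb.append("&amp;"); break;
                case '<': sb.append("&lt;"); break;
                case '>': sb.append("&gt;"); break;
                case '"': sb.append("&quot;"); break;
                default: sb.append(c);
            }
        }
        return sb.toString();
    }

    // Wrap the escaped text in a minimal document, turning line breaks into <br>.
    static String toHtml(String text) {
        String body = escape(text).replaceAll("\r\n|\r|\n", "<br>\n");
        return "<html>\n<body>\n" + body + "\n</body>\n</html>";
    }

    public static void main(String[] args) {
        System.out.println(toHtml("Me and you\nand a dog named boo."));
    }
}
```

Escaping has to happen before the line breaks are replaced, otherwise the inserted <br> tags would be escaped as well.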
-
Xml to html conversion using xslt
The XML contains an exponential number, i.e. a number in scientific notation. When it is converted to HTML, we get NaN for that number. It happens in JDK 1.4, i.e. WLS 8.1 with the JDK 1.4 BEA JRockit JVM.
It worked fine with WLS 7 using xalan-j_2_1_0/bin/xalan.jar
Any solution?
Do you know of a method in the xdk that takes a well formed HTML doc and using xsd / xslt convert back to original xml spec?
Because you created (and as long as you create) the HTML from XML it will be well formed (every tag will be ended with an end-tag) and you can therefore transform it back into XML.
Most times it will not be possible to convert HTML found on the 'internet' into XML because this HTML is not well formed. For example, many people forget to end a paragraph of text within HTML with the </p> tag.
We are evaluating using xslt to convert the XML to a form based medium for content maintenance. Wondering if once a XML document is parsed to HTML (DOM) can it be parsed back to XML for subsequent update to stored value in blob column. Specifically interested in conversion (parser) from HTML to XML
Simply can HTML (in DOM format validated against a xsd) be transformed back to XML ? -
This may be slightly off-topic, but I'm hoping maybe someone knows the answer:
I received a license for the Messiah animation suite as part of a one-time offer, and it says to paste the License text "into a text file... (this file must be a plain text ASCII format file, NOT Rich Text or doc)."
I have Microsoft Office 2008. It has a Plain Text (.txt) format, but from googling there seems to be some uncertainty whether on the Mac it is plain text ASCII format. I'm not sure about the TextEdit app either.
Anyone got a solution?
Thanks!
The problem is whether that version of .txt from MS Office 2008 is actually ASCII format or Unicode.
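One way to settle that for a given file is to look at its bytes: a plain ASCII text file contains only bytes from 0x01 to 0x7F. A minimal sketch, with the class name made up for illustration and the file path passed as the first command-line argument:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

class AsciiCheck {

    // True when every byte is a non-NUL 7-bit ASCII value (0x01-0x7F).
    static boolean isAscii(byte[] data) {
        for (byte b : data) {
            // Bytes >= 0x80 are negative in Java; a NUL byte suggests UTF-16.
            if (b <= 0) return false;
        }
        return true;
    }

    public static void main(String[] args) throws IOException {
        // args[0] is the path of the file to check, e.g. the saved license file.
        byte[] data = Files.readAllBytes(Paths.get(args[0]));
        System.out.println(isAscii(data) ? "plain ASCII" : "contains non-ASCII bytes");
    }
}
```

A file saved as UTF-8 that happens to contain only ASCII characters and no byte-order mark passes this check, because its bytes are identical to the ASCII encoding.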
I don't think that is the issue, since ascii and unicode are identical for the usual 26 letters and 10 digits that are probably in the license text. The point is that it be .txt and not .doc or .rtf or .html, which has all kinds of other junk added to the real content. I would use TextEdit set to Plain text. -
Converting PDF CLOBS to text or HTML
I would like to run through all the PDFs (stored as CLOBS) in a database table and copy them to a text or HTML CLOB. Doing this beforehand should allow me to rapidly index and snippet-ify these fields during queries.
How exactly can I use the built-in facilities in Oracle Text to do this?
Roger Ford has had some great input on my snippet performance problems and had this to say:
"The key is to pre-convert before indexing. You can do that with a pl/sql procedure that uses ctxdoc.policy_filter or ctxdoc.ifilter."
The Reference Manual, page B-2, has this to say:
"This technology [AUTO_FILTER] also enables you to convert documents to HTML for document presentation with the CTX_DOC package."
I apologize for posting prematurely....
I should be able to use CTX_DOC.FILTER as Roger suggested.
I think I can just loop through every PDF in the table and dump each converted PDF to the result table. I will set the query id to the key from the PDF table thus allowing me to get at the metadata. -
Database error text: invalid number
Hi Gurus,
I am calling a procedure proxy from ECC and it is giving me a short dump:
Error 339 has occurred while executing database procedure
""_SYS_BIC"."Krishna_Demo_Proj.Model/KC_GET_MARA"" on the
current database connection "R/3".
Database error text: invalid number: ''
Triggering statement: "dsql_open_proc"
I have created a table with only one field.
Mapped the Data types after creating the Procedure Proxy
Which data type I need to use? I tried with lots of combinations but, still the same error.
Regards,
Krishna Chauhan
Hi Srinu,
I have used NVARCHAR 18 and corresponding to that CHAR18 is used.
Please see the attached screen shots.
Regards,
Krishna Chauhan -
The database error text is: ORA-01843: not a valid month
I am trying to use a date field as a query filter and I keep getting the
following error:
A database error occurred. The database error text is: ORA-01843: not a
valid month. (WIS 10901).
When I remove the query filter and run the query it works as
expected. I want to be able to allow the users to use the date field in order
to select a date range. Can someone provide me with some information on how to
resolve this issue.
SQL> SELECT (to_char(tO_date('09/29/2006', 'mm/dd/yyyy'))||':'||TO_CHAR(systimestamp,'hh24:mi:ss:ff6'))
2 FROM dual;
(TO_CHAR(TO_DATE('09/29/2006
29-SEP-06:01:33:09:023000
But you want mm/dd/yyyy hh24:mi:ss:ff6 format then use TO_CHAR function for format specifier
SQL> SELECT to_char(to_timestamp((to_char(tO_date('09/29/2006', 'mm/dd/yyyy'))||':'||TO_CHAR(systimestamp,'hh24:mi:ss:ff6')), 'dd/mm/yyyy hh24:mi:ss:ff6'),'mm/dd/yyyy hh24:mi:ss:ff6')
2 FROM DUAL
3 /
TO_CHAR(TO_TIMESTAMP((TO_CHAR
09/29/0006 01:40:27:113000
SQL> Khurram -
Hi,
My Webi report is failing with the error
"A database error occured. The database error text is: ORA-29275: partial multibyte character . (WIS 10901)"
may i know the root cause of the above error and how to resolve it. I am using BO 3.1.
Its very important to provide the report. Please help urgently.
Thanks in advance.
Abid
Hi Abid,
Please see SAP Note 1556127.
Symptom
A database error occurs after refreshing a web intelligence report in java report panel or web intelligence in interactive mode
The database error text is: ORA 29275 with partial multibyte character (WIS 10901)
Environment
windows 2003 Server
Cause
Environment variables are not set with value UTF-8: LC_ALL, LANG, and NLS_LANG
Resolution
Set following system environment variables: LC_ALL,LANG, and NLS_LANG with value UTF-8. For example, LC_ALL=EN_US.UTF-8 -
Read Text from HTML-Pages and want to solve "ChangedCharSetException"
Hello,
I have an app that connects to pages via threads, parses them, and gives me only the text version of an HTML page. It works fine, but if it finds a page where the text is within images, the whole app stops and gives me the message:
javax.swing.text.ChangedCharSetException
at javax.swing.text.html.parser.DocumentParser.handleEmptyTag(DocumentParser.java:169)
at javax.swing.text.html.parser.Parser.startTag(Parser.java:372)
at javax.swing.text.html.parser.Parser.parseTag(Parser.java:1846)
at javax.swing.text.html.parser.Parser.parseContent(Parser.java:1881)
at javax.swing.text.html.parser.Parser.parse(Parser.java:2047)
at javax.swing.text.html.parser.DocumentParser.parse(DocumentParser.java:106)
at javax.swing.text.html.parser.ParserDelegator.parse(ParserDelegator.java:78)
at aufruf.main(aufruf.java:33)
So I tried to catch it with "getCharSetSpec()" and "keyEqualsCharSet()" from the class "javax.swing.text.ChangedCharSetException" and hoped that this would solve the problem. But it still doesn't work...
Then I searched the web and found that I have to add the line:
doc.putProperty("IgnoreCharsetDirective", new Boolean(true));
"doc" is a new HTML document, created with the HTMLEditorKit. I do not have much knowledge about that, so I hope that someone can explain to me how I can solve that problem within my code.
Here we go:
import javax.swing.text.*;
import java.lang.*;
import java.util.*;
import java.net.*;
import java.io.*;
import javax.swing.text.html.*;
import javax.swing.text.html.parser.*;

public class myParser extends Thread {
    private String name;

    public void run() {
        try {
            URL viele = new URL(name); // "name" is a variable holding one of many links
            URLConnection hs = viele.openConnection();
            hs.connect();
            if (hs.getContentType().startsWith("text/html")) {
                InputStream is = hs.getInputStream();
                InputStreamReader isr = new InputStreamReader(is);
                BufferedReader br = new BufferedReader(isr);
                Lesen los = new Lesen();
                ParserDelegator parser = new ParserDelegator();
                parser.parse(br,los, false);
            }
        } catch (MalformedURLException e) {
            System.err.print("Doesn't work");
        } catch (ChangedCharSetException e) {
            e.getCharSetSpec();
            e.keyEqualsCharSet();
            e.printStackTrace();
        } catch (Exception o) {
        }
    }

    public void vowi(String n) {
        name = n;
    }
}
And in case it is important, here is the class "Lesen":
import java.net.*;
import java.io.*;
import javax.swing.text.*;
import javax.swing.text.html.*;
import javax.swing.text.html.parser.*;

class Lesen extends HTMLEditorKit.ParserCallback {
    public void handleStartTag(HTML.Tag t, MutableAttributeSet a, int pos) {
        try {
            if ((t==HTML.Tag.P) || (t==HTML.Tag.H1) || (t==HTML.Tag.H2) || (t==HTML.Tag.H3) || (t==HTML.Tag.H4) || (t==HTML.Tag.H5) || (t==HTML.Tag.H6)) {
                System.out.println();
            }
        } catch (Exception q) {
            System.out.println(q.getMessage());
        }
    }

    public void handleSimpleTag(HTML.Tag t,MutableAttributeSet a, int pos) {
        try {
            if (t==HTML.Tag.BR) {
                System.out.println(); // new line
                System.out.println();
            }
        } catch (Exception qw) {
            System.out.println(qw.getMessage());
        }
    }

    public void handleText(char[] data, int pos) {
        try {
            System.out.print(data); // prints the text from HTML-pages
        } catch (Exception ab) {
            System.out.println(ab.getMessage());
        }
    }
}
Thanks a lot for helping...
Stephan
Replace
parser.parse(br,los, false);
with
parser.parse(br,los, true);
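The third argument of ParserDelegator.parse is the ignoreCharSet flag. With false, the parser throws ChangedCharSetException as soon as it meets a charset declaration in a <meta> tag; with true, it ignores the declaration and keeps parsing. A small self-contained sketch that parses a page containing such a tag; the class name and the sample page are made up for illustration:

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

class IgnoreCharsetDemo {

    // Collects the text content of an HTML document, ignoring any charset <meta> tag.
    static String textOf(Reader html) throws IOException {
        final StringBuilder text = new StringBuilder();
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleText(char[] data, int pos) {
                text.append(data).append('\n');
            }
        };
        // Third argument is ignoreCharSet: true stops the parser from throwing
        // ChangedCharSetException when it meets a charset declaration.
        new ParserDelegator().parse(html, callback, true);
        return text.toString();
    }

    public static void main(String[] args) throws IOException {
        String page = "<html><head><meta http-equiv=\"Content-Type\" "
                + "content=\"text/html; charset=iso-8859-1\"></head>"
                + "<body><p>Hello</p></body></html>";
        System.out.print(textOf(new StringReader(page)));
    }
}
```

Ignoring the declaration means the text is decoded with whatever charset the Reader was created with, so pages in another encoding may show wrong characters unless the InputStreamReader is given the right charset.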