How to get text of an HTML file?

I wish to get the content of an HTML file (as is displayed in a browser). This is my code:
FileInputStream fis = new FileInputStream("myfile.html");
HTMLEditorKit kit = new HTMLEditorKit();
HTMLDocument document = new HTMLDocument();
kit.read(fis, document, 0);
String content = document.getText(0, document.getLength());
But content returns null.
Can someone tell me what is wrong? Thank you!

Yeah, open a URL and read the contents from that input stream. That'll bring the contents of the URL down to your machine.
This will do it:
package vampire;

import java.io.BufferedReader;
import java.io.FileWriter;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.PrintWriter;
import java.net.URL;

/**
 * Simple program for demonstrating the URL class.
 */
public class URLVampire {
    /**
     * Driver for reading the contents of a URL
     * @param args command line arguments: one or more URLs.
     */
    public static void main(String [] args) {
        if (args.length == 0) {
            System.err.println("Usage: java URLVampire <URL 1> <URL 2>...<URL n>");
            System.exit(0);
        }
        try {
            String [] urlContents = new String[args.length];
            for (int i = 0; i < args.length; ++i) {
                URL url = new URL(args[i]);
                System.out.println("Now reading the contents of URL " + args[i]);
                InputStream uis = url.openStream();
                InputStreamReader isr = new InputStreamReader(uis);
                BufferedReader br = new BufferedReader(isr);
                StringBuffer buffer = new StringBuffer();
                String line = null;
                while ((line = br.readLine()) != null) {
                    // readLine() strips the line terminator, so put one back
                    buffer.append(line).append('\n');
                }
                br.close();
                System.out.println("Read " + buffer.length() + " chars for URL " + args[i]);
                urlContents[i] = buffer.toString();
            }
            // Now write out all the URL contents
            for (int i = 0; i < urlContents.length; ++i) {
                PrintWriter pw = new PrintWriter(new FileWriter("url" + i + ".html"));
                pw.println("<!-- URL: " + args[i] + "-->");
                pw.println(urlContents[i]);
                pw.close();
            }
        } catch (Exception e) {
            e.printStackTrace(System.err);
        }
    }
}
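
The question asks for the text a browser would display rather than the raw markup. A minimal sketch of one way to get that with the HTML parser that ships with Swing is shown here. The class name HtmlText and the choice to put each text run on its own line are illustrative assumptions, not part of the original posts. Passing true as the last argument to parse() tells the parser to ignore any charset declared inside the document, which seems reasonable when the Reader has already decoded the bytes.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

public class HtmlText {
    /** Collects the text runs of an HTML document, one run per line. */
    public static String extract(Reader in) throws IOException {
        final StringBuilder text = new StringBuilder();
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleText(char[] data, int pos) {
                // Called once for each run of character data between tags
                text.append(data).append('\n');
            }
        };
        // true = ignore a charset declared in the document itself
        new ParserDelegator().parse(in, callback, true);
        return text.toString();
    }

    public static void main(String[] args) throws IOException {
        String html = "<html><body><h1>Title</h1><p>Hello <b>world</b></p></body></html>";
        System.out.print(extract(new StringReader(html)));
    }
}
```

For a file on disk, something like HtmlText.extract(new FileReader("myfile.html")) should work the same way. The Swing parser is old and lenient, so modern pages may not come out cleanly.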

Similar Messages

  • How to get source of remote html file.

    I want to read the source of a remote HTML file.
    I don't have any physical / original path of the file;
    I have only the URL path.
    Example URL: http://mydomain.com/myhomepage.html
    Using this URL, can I get the source of the file myhomepage.html?
    Thanks,
    senthil.

    You can use the java.io.* and java.net.* APIs.
    Here is some sample code:
    import java.io.*;
    import java.net.*;
    public class URLconnecting{
         public static void main(String[] args)throws Exception{
              URL url = new URL("http://www.yahoo.com");
              URLConnection conn = url.openConnection();
              BufferedReader reader = new BufferedReader(new InputStreamReader(conn.getInputStream()));
              String line;
              // Print the page source line by line
              while( (line= reader.readLine()) != null) System.out.println(line);
              reader.close();
         }
    }
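
A variation on the sample above, offered as a sketch rather than the poster's method: it returns the source as a String, names the charset explicitly instead of relying on the platform default, and closes the reader with try-with-resources. The class name PageSource, the ten-second timeouts, and the UTF-8 default in fetch() are assumptions made for illustration.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.net.URI;
import java.net.URLConnection;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class PageSource {
    /** Reads the whole stream as text in the given charset, keeping line breaks. */
    public static String readAll(InputStream in, Charset charset) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(in, charset))) {
            String line;
            while ((line = reader.readLine()) != null) {
                sb.append(line).append('\n');
            }
        }
        return sb.toString();
    }

    /** Downloads the page at the given address; UTF-8 is an assumption here. */
    public static String fetch(String address) throws IOException {
        URLConnection conn = URI.create(address).toURL().openConnection();
        conn.setConnectTimeout(10_000); // milliseconds
        conn.setReadTimeout(10_000);
        return readAll(conn.getInputStream(), StandardCharsets.UTF_8);
    }
}
```

A call such as PageSource.fetch("http://mydomain.com/myhomepage.html") would then return the page source as one String.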

  • Dreamweaver CC, how to get php coloring inside html files

    Before, I could do it by changing the extensions.txt file. Now when I try to change extensions.txt I can't save it; I get an access denied error.
    I am running with administrative privileges.
    Any ideas?

    Sorry, I did not explain myself.
    I used to be able to tweak the configuration file from DW (extensions.txt), and Dreamweaver would color the PHP code contained inside HTML files.
    I need the code coloring to apply in files with the extension html.
    I hope I am clearer now. And yes, @Rob Hecker2, you are right, I see now that it was very poorly explained.

  • How to get Text from (.txt) file to display in the JTextArea ?

    Is there any code for this? Please tell me. I am a beginner and am trying to get data from a text file to display in the JTextArea. Please help.

    public static void readText() {
      try {
        // WorkingDirectory is assumed to be a String field defined elsewhere
        File testFile = new File(WorkingDirectory + "ctrlFile.txt");
        if (testFile.exists()) {
          // Read the same file that was just checked for existence
          BufferedReader br = new BufferedReader(new FileReader(testFile));
          String s = br.readLine();
          while (s != null) {
            System.out.println(s);
            s = br.readLine();
          }
          br.close();
        }
      } catch (IOException ex) { ex.printStackTrace(); }
    }
    rykk
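
The reply above prints the file to the console. To put the text into a JTextArea instead, a sketch along these lines may help. The class name TextAreaLoader and the UTF-8 encoding are assumptions, and load() is meant to be called on the Swing event dispatch thread.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.swing.JTextArea;

public class TextAreaLoader {
    /** Reads the whole file as UTF-8 text (the encoding is an assumption). */
    public static String readAll(Path file) throws IOException {
        return new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
    }

    /** Replaces the contents of the given text area with the file's text. */
    public static void load(JTextArea area, Path file) throws IOException {
        area.setText(readAll(file));
    }
}
```

Typical use would be TextAreaLoader.load(myTextArea, Paths.get("ctrlFile.txt")), where myTextArea is whatever JTextArea the window already contains.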

  • How to get text when mousing over image?

    Hey there, I am extremely new to Flash, but I know other Adobe programs, so I tend to learn quickly.
    I desperately need to find out how to get text to pop up when I mouse over PART of the image, and I haven't been able to find any help online in the last two days (pulling my hair out in stress).
    I uploaded a rough mock-up I did in Photoshop, so you can see that when I mouse over the top layer of the cake I need a line to stretch out and the text to pop up at the end of it. Similarly, if I mouse over the cherry, another piece of text needs to come up in the same manner.
    Also, since I need it to be a website link, what format do I open it with when I go to FILE > NEW?
    Honestly, thank you so much in advance to anyone that helps.

    First you have to choose if you want to handle devices. If not and you want to stay in Flash I'll stick on topic.
    In the HTML version you could either use a good old fashioned image map (they're still fine in the HTML5 era) or you could use a layering technique (here's a random layering example).
    In Flash you can do it a few different ways. If you intend on keeping the image intact as a single object then you'll be essentially doing the same as an image map. You can draw invisible hitareas on the various parts of the object and have those areas trigger a specific function that will display your text. If you break up the image into the separate parts then you can directly assign those parts to fire off a function themselves.
    First I'd like to know your desired direction.

  • How to extract text from a PDF file?

    Hello Suners,
    I need to know how to extract text from a PDF file.
    Does anyone know what the character encoding in a PDF file is? When I use an input stream to read the file, it gives encrypted-looking characters, not the original text in the file.
    Are there any procedures I should follow while reading a PDF file?
    File f=new File("D:/File.pdf");
    FileReader fr=new FileReader(f);
    BufferedReader br=new BufferedReader(fr);
    String s=br.readLine();
    Any help will be deeply appreciated.

    jverd wrote:
    > First, you set i once, and then loop without ever changing it. So your loop body will execute either 0 times or infinitely many times, writing the same byte every time. Actually, maybe it'll execute once and then throw an ArrayIndexOutOfBoundsException. That's basic java looping, and you're going to need a firm grip on that before you try to do anything as advanced as PDF reading.
    Oops, you are absolutely right, that was a silly mistake.
    > Second, what do the docs for getPageContent say? Do they say that it simply gives you the text on the page as if the thing were a simple text doc? I'd be surprised if that's the case.
    getPageContent returns an array of bytes, so the question becomes: how do I get text from this array? I was thinking of:
    private void jButton1_actionPerformed(ActionEvent e) {
        PdfReader read;
        StringBuffer buff=new StringBuffer();
        try {
            read = new PdfReader("d:/getjobid2727.pdf");
            read.getMetaData();
            byte[] data=read.getPageContent(1);
            int i=0;
            while(i>-1){
                buff.append(data);
                i++;
            }
            String str=buff.toString();
            FileOutputStream fos = new FileOutputStream("D:/test.txt");
            Writer out = new OutputStreamWriter(fos, "UTF8");
            out.write(str);
            out.close();
            read.close();
        } catch (Exception f) {
            f.printStackTrace();
        }
    }
    "D:/test.txt" hasn't been created when I ran the program.
    Are my steps right?


  • How to get the values from html:select? tag..?

    i tried with this, but its not working...
    <html:select styleClass="text" name="querydefs" property="shortcut"
                 onchange="retrieveOptions()" styleId="firstBox" indexed="true">
    <html:options collection="advanced.choices" property="shortcut" labelProperty="label" />
    </html:select>
    <td align="left" class="rowcolor1">
    <script language="javascript" type="text/javascript">
        function retrieveOptions(){
            var sel = document.querydefs.options;
            var selectedOption = sel[sel.selectedIndex].value;
            document.write(selectedOption);
        }
    </script>

    This JavaScript is not working at all; document.write() is not printing anything.
    This is the code:
    <td class="rowcolor1" width="20%">
    <html:select styleClass="text" name="querydefs" property="shortcut"
                 onchange="retrieveSecondOptions()" styleId="firstBox"
                 indexed="true">
        <html:options collection="advanced.choices" property="shortcut"
                      labelProperty="label" />
    </html:select>
    I tried this as well, but no use. I'm not getting the selected option:
    function retrieveOptions(){
        firstBox = document.getElementById('firstBox');
        if(firstBox.selectedIndex==0){
            return;
        }
        selectedOption = firstBox.options[firstBox.selectedIndex].value;
    }
    Actually, how do I get the values from <html:select>?
    My idea is to find out which value is selected in the combo box (<html:select>); if that value is equal to some string, I have to enable a hyperlink that opens a popup window.

  • How 2 get the path of a file

    How do I get the path of a file using JSP?
    I have tried getPath, but I'm getting this error:
    The method getPath(String) is undefined for the type HttpServletRequest
    Any idea how to get the path of a file?

    You need ServletContext#getRealPath().
    API documentation: http://java.sun.com/javaee/5/docs/api/javax/servlet/ServletContext.html#getRealPath(java.lang.String)

  • How To: Get encoding of a remote file

    Java EE
    URL url = new URL ("http://www.someSite.com/myCsvFile.csv"); // comma separated
    InputStream is = url.openConnection().getInputStream();
    InputStreamReader reader = new InputStreamReader(is);
    System.out.println("reader.getEncoding(): " + reader.getEncoding());
    For both an ISO-8859-1 file and a UTF-8 file I get the following print out:
    reader.getEncoding(): Cp1252
    Could it have something to do with this warning during boot the .war?
    [WARNING] Using platform encoding (Cp1252 actually) to copy filtered resources, i.e. build is platform dependent!
    If I use local files it prints ISO8859_1 for one of the files.

    All of that is because the HTTP server attaches a charset to every response. That's what you are seeing. The server may be using some logic to determine the actual encoding of the file it returns, or it may simply be using a hard-coded charset which may or may not be suitable for reading the file. The latter is unfortunately more likely.
    By the way if you receive an XML file over HTTP, and the HTTP charset differs from the encoding declared in the XML document, there's a rule which says the HTTP charset takes precedence. (I don't know where that rule is documented, but I have encountered that situation in real life -- the data came from a Google application -- and that rule was indeed the right thing to do.)
    If you're still under the impression that there's something which can look at a file and determine what encoding was used to produce it, let me tell you that there isn't. Sure, there's that XML prolog thing which works for XML files (if they weren't botched by the producer), but for text files in general there's no way to determine their encoding. Short of asking the person who created them, that is.
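
One detail worth adding to the exchange above: an InputStreamReader created without a charset argument uses the platform default charset, and getEncoding() reports that charset, whatever the file contains. To use the charset the server declares, one option is to read it from the Content-Type header, which URLConnection.getContentType() returns. The following is a minimal sketch; the class name ContentTypeCharset and the fallback parameter are illustrative assumptions, and the declared charset is only as trustworthy as the server that sent it.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class ContentTypeCharset {
    /**
     * Extracts the charset parameter from a Content-Type header value such as
     * "text/csv; charset=UTF-8". Returns the fallback when the header is
     * missing, has no charset parameter, or names an unknown charset.
     */
    public static Charset charsetOf(String contentType, Charset fallback) {
        if (contentType == null) {
            return fallback;
        }
        for (String part : contentType.split(";")) {
            String p = part.trim();
            // Case-insensitive check for a "charset=" prefix
            if (p.regionMatches(true, 0, "charset=", 0, 8)) {
                String name = p.substring(8).trim().replace("\"", "");
                try {
                    return Charset.forName(name);
                } catch (IllegalArgumentException e) {
                    // Illegal or unsupported charset name
                    return fallback;
                }
            }
        }
        return fallback;
    }

    public static void main(String[] args) {
        System.out.println(charsetOf("text/csv; charset=UTF-8", StandardCharsets.ISO_8859_1));
    }
}
```

The result can be passed as the second argument to new InputStreamReader(is, charset), so the reader no longer depends on the platform default.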

  • Help: how  to get text from IFRAME

    <!-- The box where we type-->
    <IFRAME class="mytext" width="100%" ID="mytext" height="200">
    </IFRAME>
    Can someone tell me how I can get the text in my servlet from the
    <IFRAME>?
    Thanks in advance...

    Hmm. I think you are mixing something up here. Why would you use an IFrame for entering text? An IFrame is used for including content from other HTML pages inside your page.
    If you want to have a textbox for a user to enter text into and submit it to a server, you need a form and a textarea inside that. Like this:
    <form action="myServlet" method="post">
    <textarea name="myArea">
    </textarea>
    <input type="Submit" value="Ok">
    </form>
    Change the action in the form to reflect the mapping to your servlet.
    Then you can just do
    String enteredText = request.getParameter( "myArea" );
    inside your servlet.
    If you insist that you need to use an IFrame, I guess the only way to do it would be to write a JavaScript function that copies the contents from the IFrame to a hidden field before the page is submitted to your servlet. In your servlet you would read the value from the hidden field.
    .P.

  • How do I download and saves html files off my website to store and save using Dreamweaver CS3?

    I need to save all files from web and store onto a drive to reupload to a new domain name.  I use Dreamweaver CS3.

    First define your Local Site folder in Dreamweaver.  DW will use this folder to store your site files.
    Go to Manage Sites > New or Edit site.  See screenshots.
    Servers:  Enter your remote server's log-in details.  When complete, hit TEST to see if the connection is working.  If all is well, hit SAVE.
    From the Files Panel (F8), click on Remote Server to show the files that are currently there.  Click the green Down Arrow to GET files from remote server to your local site folder.
    Nancy O.

  • Plz help to get the tag of html file

    Hi,
    I want to convert an HTML file into a PDF file dynamically.
    So first I want to get the tags of the HTML file, given the path of the file.
    How do I do this? Please help me.

    Please tell me how I can get the HTML tags through Java code.

  • How to get the complete abap help file

    Hi,
    How do I get the complete ABAP help file? I mean the F1 file.
    Please provide me some links to download that file.

    Hi Kiran,
    If you want complete help on a particular topic, SAP provides a built-in transaction for help: ABAPDOCU.
    I have some links which will help you.
    SAP, ABAP interview questions and answers
    http://www.geocities.com/sap_interviewquestions/
    IMP Link for All
    http://www.geocities.com/SiliconValley/Campus/6345/abapindx.htm
    Common Links
    http://www.sappoint.com/abap.html
    http://www.sap-img.com/abap-function.htm
    http://www.easymarketplace.de/online-pdfs-q-s.php
    http://help.sap.com/
    http://sapassist.com/groups/groups.asp?v=sap-r3-dev&m=3&y=2004
    http://training.saptechies.com/sap-basis-certification-sample-questions/
    http://www.geocities.com/mpioud/Abap_programs.html
    http://cma.zdnet.com/book/abap/index.htm
    http://www.sapdevelopment.co.uk/
    http://www.sap-img.com/
    http://juliet.stfx.ca/people/fac/infosys/abap.htm
    http://www.thespot4sap.com
    http://www.sap-basis-abap.com/
    http://help.sap.com/saphelp_46c/helpdata/en/d3/2e974d35c511d1829f0000e829fbfe/frameset.htm
    http://help.sap.com/saphelp_46c/helpdata/en/d6/0db357494511d182b70000e829fbfe/frameset.htm
    http://www.henrikfrank.dk/abapexamples/SapScript/sapscript.htm
    http://www.sapgenie.com/abap/example_code.htm
    http://help.sap.com/printdocu/core/Print46c/en/Data/Index_en.htm
    http://help.sap.com/saphelp_40b/helpdata/en/4f/991f82446d11d189700000e8322d00/applet.htm
    http://www.sapgenie.com/abap/code/abap19.htm
    http://www.sap-img.com/abap/more-than-100-abap-interview-faqs.htm
    http://www.planetsap.com/Tips_and_Tricks.htm
    http://help.sap.com/saphelp_40b/helpdata/ru/d6/0dc169494511d182b70000e829fbfe/applet.htm
    http://www.henrikfrank.dk/abapexamples/SapScript/symbols.htm
    http://www.henrikfrank.dk/abapexamples/index.html
    http://sap.ittoolbox.com/documents/document.asp?i=752
    http://members.aol.com/_ht_a/skarkada/sap/
    http://sappoint.com/abap/
    http://members.tripod.com/abap4/SAP_Functions.html
    http://members.ozemail.com.au/~anmari/sap/index.html
    http://www.planetsap.com/Userexit_List.htm
    http://www.kabai.com/abaps/q.htm
    http://help.sap.com/saphelp_bw21c/helpdata/en/c4/3a8090505211d189550000e829fbbd/frameset.htm
    http://www.sapgenie.com/abap/bapi/example.htm
    http://help.sap.com/saphelp_45b/helpdata/en/65/897415dc4ad111950d0060b03c6b76/content.htm
    http://www.sap-basis-abap.com/index.htm
    http://help.sap.com/saphelp_40b/helpdata/en/fc/eb2c46358411d1829f0000e829fbfe/frameset.htm
    http://help.sap.com/saphelp_46c/helpdata/en/aa/aeb23789e95378e10000009b38f8cf/frameset.htm
    http://www.geocities.com/ResearchTriangle/1635/system.html
    http://www.sapdesignguild.org/resources/MiniSG/3_Managing/3_Functions_Table_Control.htm
    http://help.sap.com/saphelp_45b/helpdata/en/d1/801bdf454211d189710000e8322d00/content.htm
    http://www.sapfans.com/sapfans/repos/saprep.htm
    http://www.planetsap.com/howdo_a.htm
    http://help.sap.com/saphelp_util464/helpdata/en/69/c2516e4ba111d189750000e8322d00/content.htm
    http://www.sapgenie.com/abap/smartforms_detail.htm
    http://www.sap-img.com/abap.htm
    http://help.sap.com/saphelp_46c/helpdata/en/fc/eb2d67358411d1829f0000e829fbfe/content.htm
    http://www.geocities.com/victorav15/sapr3/abap.html
    http://abap4.tripod.com/Other_Useful_Tips.html
    http://help.sap.com/saphelp_45b/helpdata/en/cf/21ee2b446011d189700000e8322d00/content.htm
    http://www.sap-basis-abap.com/sapmm.htm
    http://sap.ittoolbox.com/nav/t.asp?t=303&p=448&h1=303&h2=322&h3=448
    http://sapfans.com/
    http://cma.zdnet.com/book/abap/ch03/ch03.htm
    http://www.henrikfrank.dk/abapuk.html
    http://www.sts.tu-harburg.de/teaching/sap_r3/ABAP4/abapindx.htm
    http://www.sapgenie.com/abap/index.htm
    http://www.sapdevelopment.co.uk/tips/tipshome.htm
    http://sap.ittoolbox.com/nav/t.asp?t=322&p=322&h1=322
    http://sap.ittoolbox.com/nav/t.asp?t=448&p=448&h1=448
    http://www.sapgenie.com/abap/tips_and_tricks.htm
    http://www.sapassist.com/code/d.asp?whichpage=1&pagesize=10&i=10&a=c&o=&t=&q=&qt=
    Mark helpful answers.
    Regards
    Manoj

  • How to get text to align to the very top of a DIV?

    There is a gap at the top. In this example I am using an H1 and a paragraph tag.
    <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
    <html xmlns="http://www.w3.org/1999/xhtml">
    <head>
    <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" />
    <title>Text Alignment</title>
    <style type="text/css">
    body {
    background-color: #4a7186;
    }
    p {
    font-family: Georgia, "Times New Roman", Times, serif;
    font-size: 2em;
    margin: 0px;
    padding: 0px;
    }
    h1 {
    font-family: Georgia, "Times New Roman", Times, serif;
    font-size: 3em;
    margin: 0px;
    padding: 0px;
    }
    #box1 {
    width: 450px;
    background-color: #FFFFFF;
    margin-right: auto;
    margin-left: auto;
    margin-top: 30px;
    padding: 0px;
    margin-bottom: 0px;
    }
    #box2 {
    width: 300px;
    background-color: #FFFFFF;
    margin-right: auto;
    margin-left: auto;
    margin-top: 30px;
    padding: 0px;
    margin-bottom: 0px;
    }
    </style>
    </head>
    <body>
    <div id="box1">
    <h1>Lorem ipsum H1</h1>
    </div>
    <div id="box2">
    <p>Lorem ipsum P</p>
    </div>
    </body>
    </html>

    <h1 style="margin-top:0;">Lorem ipsum H1</h1>
    It's a margin issue in SOME browsers.
    Murray --- ICQ 71997575
    Adobe Community Expert
