Copy html source

How can I, from a given URL, copy the content between the <body> tag and the </body> tag of the resulting HTML page into a JTextArea?

> Can I have a little example?
suhanduman's code, with a few modifications:
* URLReader
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;

public class URLReader {
    private static StringBuffer strb = new StringBuffer();

    /** main */
    public static void main(String[] args) {
        try {
            URL url = new URL("http://www.google.com/index.html");
            // Read the page line by line into the buffer
            BufferedReader inbound = new BufferedReader(new InputStreamReader(url.openStream()));
            String line;
            while ((line = inbound.readLine()) != null) {
                strb.append(line).append('\n');
            }
            inbound.close();
            getEverythingBetween("<body", "</body>");
        } catch (Exception e) {
            System.out.println(e);
        }
    }

    /** Get all text between s_1 and s_2 */
    private static void getEverythingBetween(String s_1, String s_2) {
        if (strb.indexOf(s_1) != -1 && strb.indexOf(s_2) != -1) {
            // Drop everything before the opening tag
            strb = strb.delete(0, strb.indexOf(s_1));
            // Drop the closing tag and everything after it
            strb = strb.delete(strb.indexOf(s_2), strb.length());
            // Drop the rest of the opening tag, up to and including its '>'
            strb = strb.delete(0, strb.indexOf(">") + 1);
            System.out.println(strb);
        }
    }
} // class URLReader

Good luck!
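The URLReader code above prints the body to the console, while the question asks for a JTextArea. A minimal sketch of that last step might look like the following. The class name BodyToTextArea and the helper names extractBody and download are made up for illustration, the regular expression is a swapped-in alternative to the indexOf/delete approach, and the URL is the one from the example above.

```java
import javax.swing.JFrame;
import javax.swing.JScrollPane;
import javax.swing.JTextArea;
import javax.swing.SwingUtilities;
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URL;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BodyToTextArea {

    // Matches <body ...> ... </body> in any letter case, across line breaks.
    private static final Pattern BODY =
            Pattern.compile("<body[^>]*>(.*?)</body>", Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    /** Returns the text between the opening body tag and </body>, or "" if not found. */
    static String extractBody(String html) {
        Matcher m = BODY.matcher(html);
        return m.find() ? m.group(1) : "";
    }

    /** Downloads the page at the given address into one string (hypothetical helper). */
    static String download(String address) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(new URL(address).openStream()))) {
            String line;
            while ((line = in.readLine()) != null) {
                sb.append(line).append('\n');
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        String body = extractBody(download("http://www.google.com/index.html"));
        // Swing components should be created on the event dispatch thread.
        SwingUtilities.invokeLater(() -> {
            JTextArea area = new JTextArea(body, 25, 80);
            JFrame frame = new JFrame("Body source");
            frame.setDefaultCloseOperation(JFrame.EXIT_ON_CLOSE);
            frame.add(new JScrollPane(area));
            frame.pack();
            frame.setVisible(true);
        });
    }
}
```

In an existing GUI you would probably skip the JFrame part and just call setText(extractBody(...)) on the JTextArea you already have.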

Similar Messages

  • Java app to save html source and copy to clipboard

    I'm new to Java, and I'm doing a research paper which requires gathering data from the internet. I basically need to learn commands to create an app (which I will execute on a daily basis using Windows scheduler) to do the following task:
    1. Access a specific URL
    2. Save the HTML source as a text file
    3. Copy some of the text in the saved text file to the clipboard in Windows
    4. Paste this data in Excel
    5. Run a macro in Excel to sort the data
    I know enough VBA and Excel functions to parse the data, but if it is easier to use Java to sort the data into columns and rows in Excel, I would like to learn how to do that too.
    I'm running Windows XP. My friend tells me to download Dr. Java to edit .java files and also the SDK to compile the files. He also told me to find modules from the internet which I can use, but I'm not sure how to go about it. (I tried searching on the net but didn't get the hits I wanted.)
    Any help at this point would be much appreciated.

    If your target HTML is fancy and complicated (for example, a table cell contains another table), or the table is missing some <tr> or <td> tags, dividing the table into rows and columns can be a difficult task.
    I recommend choosing a website with a high download speed. Otherwise, you will have to write more code to handle loading failures.
    I recommend using httpunit to fetch the web page from a Java program. (Universities usually have a proxy server; you can configure the proxy parameters in the Java program that calls httpunit.)
    http://httpunit.sourceforge.net/
    It will take some time to learn these Java techniques.
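For steps 1 to 3 of the task, a rough sketch using only the standard Java library (not httpunit) could look like this. The class name PageSaver, the file name page.html, the example.com URL, and the <table markers are all assumptions for illustration; the real page and the text you want to cut out will differ. Steps 4 and 5 (pasting and sorting in Excel) are left to Excel and VBA.

```java
import java.awt.Toolkit;
import java.awt.datatransfer.StringSelection;
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

class PageSaver {

    /** Steps 1-2: download the page and save it as a file, overwriting any old copy. */
    static void savePage(String address, Path target) throws IOException {
        try (InputStream in = new URL(address).openStream()) {
            Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
        }
    }

    /** Returns the text between the first startMarker and the next endMarker, or "" if not found. */
    static String between(String text, String startMarker, String endMarker) {
        int start = text.indexOf(startMarker);
        if (start == -1) return "";
        start += startMarker.length();
        int end = text.indexOf(endMarker, start);
        return end == -1 ? "" : text.substring(start, end);
    }

    /** Step 3: put text on the system clipboard (needs a desktop session, not a headless one). */
    static void copyToClipboard(String text) {
        StringSelection selection = new StringSelection(text);
        Toolkit.getDefaultToolkit().getSystemClipboard().setContents(selection, selection);
    }

    public static void main(String[] args) throws IOException {
        Path file = Paths.get("page.html");             // hypothetical output file
        savePage("http://www.example.com/", file);      // hypothetical URL
        String html = new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
        copyToClipboard(between(html, "<table", "</table>")); // hypothetical markers
    }
}
```

The page is assumed to be UTF-8 here; if the site uses another encoding, the charset passed to the String constructor would need to change.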

  • View the html source code of an apex page

    Hi everyone,
    I am looking for a way to view the HTML source code of an Apex page and to be able to modify it. That is why viewing the HTML source from the browser while the application is running doesn't suit me.
    Does anyone have an idea how this can be done?
    Best regards,

    Khadija Khalfallah wrote:
    > Hi everyone,
    > I am looking for a way to view the HTML source code of an Apex page and to be able to modify it. That is why viewing the HTML source from the browser while the application is running doesn't suit me.
    What do you mean?
    Do you want to be able to pull up the HTML source generated by Apex, modify that copy, and then feed it back into Apex with the changes you made? If so, you can't. Apex generates the HTML through its tools, and you have to modify the generation routines to get different HTML.
    Do you merely want to look at the generated HTML? In Internet Explorer all you have to do is right-click on the page and choose View Source to open a window with the HTML source in an editor. I sometimes find it useful to save a page and manually edit the copy to immediately see the effects of certain changes to the underlying HTML and/or JavaScript without permanently making the change in Apex.

  • How to edit the html source code for my site

    I have just started a blog, and am VERY new to it. I am trying to edit the HTML source code on my site (e.g., to insert Google AdSense search bars). I go to my blog site, open the page source, and see the HTML, but I am not able to edit it. Not sure what I am doing wrong. Thanks!

    You can use any editor you want; mine is set up to use notepad.exe.
    See http://dmcritchie.mvps.org/firefox/firefox.htm#notepad
    To invoke it, use "Ctrl+U" or View > Page Source.
    This is for sites that you maintain on your local drives or servers and then copy over to a website.

  • I have installed Copy HTML but cannot find it as an option under any Menu. Only copy is there.

    I have installed the Copy HTML plug-in but cannot find it as an option under any Menu. The only option is "copy". Where do I find this option so I can use it?

    Your screenshot shows that you have the page source visible, so you already have the source code including HTML code available to copy.
    You should see the Copy HTML entry if you right-click a normally rendered HTML page.

  • PDF outputs continually crash after editing an html source file

    I have English and multiple localized versions of a RoboHelp webhelp project. I have created several PDF outputs for the localized versions. I can generate the PDFs fine when I don't touch any of the html source files.
    However, I had to make a change to one of the HTML source files to remove a <dl> from the source file. After making the change to the source file in an outside text editor (Notepad++), I went back to RoboHelp to generate the PDF, and the PDF output crashed.
    The reason I was using an outside text editor is that when I tried to edit the <dl> HTML tags in RH, its HTML editor deleted all text within the tags being edited. There was no undo at this point. In another attempt, using the RH Design editor, RH didn't delete the <dd> or <dt> tags when those tag styles were changed to <h4> and <p>, respectively. RH inserted the new HTML wrapper inside the existing ones instead of replacing them.
    Also, during the PDF generation process, a Visual Basic error message will appear dozens of times stating that the application was interrupted.  I must press OK to continue for each error message.
    What gives with the PDF generation failing like this?

    I hate to say it but it sounds like something you are doing in the external editor is not liked by RoboHelp.
    It could just be the way that text editor codes the file. I use EditPad Pro and that has various options for text encoding. My guess is your editor is not set to what RoboHelp expects.
    After that, what I would do here is create a new one-topic project and recreate the problem with the minimum of text. It may help you spot the code RoboHelp does not like.
    See www.grainge.org for RoboHelp and Authoring tips
    @petergrainge

  • Help in creating a Java class to convert Html source to XML

    Hi everyone!
    I am using Selenium as my automation tool.
    I got the HTML source of the page using Selenium.
    Now I have to write a Java class which will convert the data (HTML source)
    and output a data structure in standard XML format.
    Can anyone give me an example (code/Java class) of how to achieve this, please?
    Please mail to
    [email protected]
    Many thanks

    getafix14 wrote:
    > Can anyone give me an example (code/Java class) of how to achieve this, please?
    > Please mail to
    > [email protected]
    Sorry Charlie, but this isn't a "mail me the codez" forum. Either use Google, or try to share your problem and attempted solutions in the forum. Besides, a little Googling will find you a solution.
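For readers who land here with the same question, one possible starting point is the lenient HTML parser that ships with Swing, which reports start tags, end tags, and text through callbacks that can be written back out in XML syntax. This is only a sketch under assumptions: the class name HtmlToXml and its helper methods are made up, comments and doctype declarations are dropped, and the output is not guaranteed to be well-formed XML for every real-world page. A dedicated HTML-cleaning library may be more robust.

```java
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Enumeration;

class HtmlToXml {

    /** Escapes the characters that are special in XML text and attribute values. */
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("\"", "&quot;");
    }

    /** Renders an attribute set as name="value" pairs, skipping the parser's "implied" marker. */
    static String attrs(MutableAttributeSet a) {
        StringBuilder sb = new StringBuilder();
        for (Enumeration<?> names = a.getAttributeNames(); names.hasMoreElements();) {
            Object name = names.nextElement();
            if (HTMLEditorKit.ParserCallback.IMPLIED.equals(name)) continue;
            sb.append(' ').append(name).append("=\"")
              .append(escape(String.valueOf(a.getAttribute(name)))).append('"');
        }
        return sb.toString();
    }

    /** Parses lenient HTML and writes the reported tags and text in XML syntax. */
    static String convert(String html) {
        final StringBuilder out = new StringBuilder();
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override public void handleStartTag(HTML.Tag t, MutableAttributeSet a, int pos) {
                out.append('<').append(t).append(attrs(a)).append('>');
            }
            @Override public void handleEndTag(HTML.Tag t, int pos) {
                out.append("</").append(t).append('>');
            }
            @Override public void handleSimpleTag(HTML.Tag t, MutableAttributeSet a, int pos) {
                out.append('<').append(t).append(attrs(a)).append("/>"); // e.g. <br> becomes <br/>
            }
            @Override public void handleText(char[] data, int pos) {
                out.append(escape(new String(data)));
            }
        };
        try {
            new ParserDelegator().parse(new StringReader(html), callback, true);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(convert("<p>Hello<br>world"));
    }
}
```

The string passed to convert would be the page source obtained from Selenium; the result could then be fed to an XML parser to check whether it is actually well-formed.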

  • How to get the html source for these web page ?

    My code works well for standard pages, but I'm unable to get the HTML source from these pages with my VB program:
    http://www.slashdot.org
    http://userfriendly.org
    http://segfault.org
    Here is my code:
    Private Sub commandgethtml_Click()
        Inet1.Cancel
        Inet1.Protocol = icHTTP
        Inet1.URL = theURL
        HTMLcode = Inet1.OpenURL(theURL, icString)
        RichTextBox1.Text = HTMLcode
    End Sub
    Thanks in advance.

    Hello Cyrano,
    This Developer Forum focuses on the National Instruments product "Measurement Studio for Visual Basic" (formerly known as ComponentWorks). Our goal is to help people to better integrate this product into their test, measurement, and automation applications. Your question directly pertains to the Microsoft Internet Transfer Control. I think you would find an increased number of responses that are better focused on your question if you would repost it to a forum that specializes in general VB and internet programming. Good luck!
    Jeremiah Cox
    Applications Engineer
    National Instruments
    http://www.ni.com/ask

  • Copy HTML Code into DW?

    In CS6 I create a simple rectangle and increase the roundness of the rectangle. I then select the shape and choose edit > copy HTML code and I open Dreamweaver CS6 and paste the code into the code window. I thought it was supposed to write code to replicate the shape I created in FW, though instead it is just referencing a graphic of the shape I made in FW that was exported. What did I do wrong?
    Thanks.

    Use the CSS Properties panel, not Edit > Copy HTML Code.
    Create the rectangle
    Make sure it's selected
    Open the CSS Properties panel
    Click the Copy All button
    Go to Dreamweaver
    Create a Div and apply a class to the div
    Create a CSS class and within the rule, press Ctrl or Command + V to paste the CSS properties from Fireworks.
    Refresh the page (and switch to Live View) and you should see the rectangle, with all the supported visual properties, displayed using CSS.
    HTH

  • New Safari 2.0.3 displays some Router Firmware pages as HTML Source text

    Hello Safari users and gurus.
    I installed the new 10.4.4 Combo Updater yesterday, preceded and followed by permissions repairs and other maintenance, including cleaning Safari Caches. Safari 2.0.3 works.
    My problem: 10.4.4's new Safari 2.0.3 displays HTML source text after I click "Save" from within my Linksys (latest firmware) router's web based administration pages that worked correctly with the previous version of Safari.
    When I check or change my router settings, the initial router settings pages appear as they did with the previous version of Safari. However, with 10.4.4's new Safari 2.0.3, as soon as I click "Save" to attempt to save a changed router setting, the confirming info from the router is displayed as HTML source code instead of the expected HTML page display.
    If I wait for the router lights to indicate activity has ceased and then reload the router's opening admin address, the page opens again normally, and I can navigate to other pages and see that the changes have been saved. However, any click on any "Save" changes button in the router's pages delivers another page of HTML source text.
    I will watch for future router firmware updates to see whether the issue is resolved from the Linksys side.
    Does anyone have any suggestions for improvement now? All assistance will be appreciated.
    EZ Jim

    Thanks glefand. As you suggested, I tried my old iBook G3 that is still running 10.3.9, and it works fine for me.
    My G5 DP also worked normally up through OS X 10.4.3. My problem only began after I installed the 10.4.4 update. I think the Safari update included with the 10.4.4 update is what is causing the issue with the firmware pages on my Linksys router.
    Hopefully a future router firmware update (or Safari update?) will restore proper operation. If not, I will continue to reload from the router's Start page. The need to work around this glitch is annoying, but the actual function of my router setup works without problem, even though the pages do not display properly.
    Thanks again for your helpful suggestion,
    Jim
    G5 DP 1.8, 4.5G RAM, 2x160GB Seagate, 1,000va UPS   Mac OS X (10.4.4)   20"ACD, iSight, AirportCard, Klipsch GMX A-2.1 Audio

  • CQ page is not rendering properly. It is not rendering HTML. It is showing HTML source code as is.

    On some of the pages, I am getting this weird behavior wherein page is not rendering properly. It is showing HTML source code as is. Could you please help me out? What could be the issue? And how can we get rid of the same?

    Check your component JSP page. It is possible that it is just a plain file without directives (<%@ ... %>), or you might have missed a closing tag, which causes the source to be rendered as text.
    Paste your JSP code in case you need further help.
    Thanks,
    Ajit

  • I see the license agreement as HTML source code

    Hello,
    I downloaded a trial version of Adobe Acrobat 9.0,
    but after I installed and started it, it asked me to accept the license agreement.
    My problem is that I can't click Agree or Disagree because I see only HTML source code;
    see the screenshot I added below.
    Does anyone know what is wrong, or what I have to install to use these HTML pages within Adobe Acrobat?
    I also downloaded and installed the software again, and the same thing happens with trial versions of Photoshop.
    I have not tested other software so far.
    Thanks in advance
    Kees
    PS: I am dutch so my english is not perfect.

    You seem to have missed this thread:
    http://www.adobeforums.com/webx/.3bc424c7/
    I understand Liam's suggestion has worked for many.

  • Characters missing in HTML source

    In ApEx 3.0.0.00.20 on Oracle Database 10g Enterprise Edition Release 10.2.0.1.0 - Prod I have problems viewing the HTML pages in frontend. I miss some characters in the HTML source. The contents displayed are loaded from the database where they are saved correctly. When reloading the page, the errors appear randomly in other places in the source.
    Example:
    <b>(Textb></b>
    instead of
    <b>(Text)</b>
    The missing characters are not acceptable if buttons do not appear or processes do not run. The application also runs on another instance without those bugs, so this is not an application error.
    What I would like to know is whether anyone else has had the same problem, and whether it is a performance error or the error lies in the database or in the ApEx installation.
    Thanks in advance for your replies,
    Patrick
    Edited by: Pennypacker on Sep 16, 2009 11:00 AM

    Sorry for my bad example; it might be confusing because I cannot add HTML tags here. The problem is not whether the text is bold. I will try to give more examples to make the problem clear.
    Example 1:
    input type="radio" name="f15" value="1" />1
    The < character is missing in front of the line, so no radio button is displayed and the HTML source code is shown as text in the browser.
    Example 2:
    Shown Text: zu helfennd einige Fragen zu beantworten
    Text in Database: zu helfen und einige Fragen zu beantworten
    2 characters missing
    In the first example above, I meant that the '<' character was also missing and the bold tag was not closed, so all the text was shown in bold.
    So the HTML code is not generated correctly by ApEx: 1 or 2 characters are randomly missing, which makes the site unusable. Any ideas?

  • Getting HTML source of current page

    Hi All,
    I generate an HTML report using an HTML DB application. Through the same application I want to get the source of this HTML page/report & send it as an inline HTML attachment.
    Can anybody tell me how I can get the HTML source of the current page?
    Thanks,
    Ayush

    Hi
    Follow this thread: PCD Location of Page Dynamically
    BR
    Satish Kumar

  • Html source

    I'm new to Apex. I wrote my HTML source code in the HTML Body Attribute section of the page. I created a link, and I want it to appear in the main region (the reports region), but it appears at the top of the page when I run it. What should I do about that?

    Hi,
    you have to create regions, position them on the page, and put your HTML source in the Region Source.
    The "HTML body attribute" field must contain only attributes for the <body> tag of the page (for example onload="...").
    I suggest you read the documentation.
    Yann.
