How to read the html source code of a webpage.

How can I read the HTML source code of a webpage with a Java application?
Is there a good way to do this?

>
How can I read the HTML source code of a webpage with a Java application?
Is there a good way to do this?
I don't know if this is a good idea, but it works.
1) Use a URL to identify the document's location.
2) Call openConnection() on the URL to get a URLConnection between your computer and the document server.
3) Connect to the server.
4) Get the InputStream of that connection.
5) Wrap the InputStream in an InputStreamReader and then in a BufferedReader, so that it can be read line by line.
At this point you can use a loop to read lines from the BufferedReader and append them to a TextArea or other suitable text component.
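The five steps above can be sketched in Java roughly as follows. This is a minimal sketch rather than the poster's code: the class name PageSource and the method readSource are made up for illustration, and the character set is assumed to be UTF-8 (a more careful client would take it from the Content-Type header).

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URI;
import java.net.URL;
import java.net.URLConnection;
import java.nio.charset.StandardCharsets;

public class PageSource {

    /** Reads the document at url line by line and returns it as one String. */
    public static String readSource(URL url) throws IOException {
        // Step 2: get a connection object for the server that holds the document.
        URLConnection connection = url.openConnection();
        // Step 3: connect (getInputStream() would also connect implicitly).
        connection.connect();
        StringBuilder html = new StringBuilder();
        // Steps 4 and 5: get the InputStream and wrap it in a BufferedReader
        // so it can be read line by line. UTF-8 is an assumption here.
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(connection.getInputStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                html.append(line).append('\n');
            }
        }
        return html.toString();
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.err.println("usage: java PageSource <url>");
            return;
        }
        // Step 1: a URL identifying the document.
        URL url = URI.create(args[0]).toURL();
        System.out.print(readSource(url));
    }
}
```

Because readSource accepts any URL, it also works with file: URLs, which makes it easy to try out without a network connection. The returned String can be passed straight to a text component's setText method.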

Similar Messages

  • How to get the HTML Source code from the active browser ?

    Hi All,
    I need to get the HTML source code from the active browser (IE). I tried the code below, but I cannot get the source code every time: it depends on the application (http or https), and for a few applications the user authentication has to be changed (_I don't know how to, or am not able to, provide that in the code below_). Moreover, this approach depends on the URL.
    Therefore I feel that getting the HTML source code from the given or active browser would be more consistent than using the URL, since the source code is already available in the browser (IE). Please help me with sample code to achieve this!
    HTMLEditorKit kit = new HTMLEditorKit();
    HTMLDocument doc = (HTMLDocument) kit.createDefaultDocument();
    doc.putProperty("IgnoreCharsetDirective", Boolean.TRUE);
    URL url = new URL(strURL);
    Reader htmlReader = new InputStreamReader(url.openConnection().getInputStream());
    kit.read(htmlReader, doc, 0);
    Thanks in advance,
    Regards,
    Jothi Venkatachalam
    Edited by: j0o on May 7, 2009 3:11 AM

    The simple answer is: you don't.
    Not only is it not possible, but the very concept of "the active browser" doesn't exist.
    You were on the right track with your code to retrieve the page directly from the server. Note, however, that java.net.URL is not limited to plain http: for an https URL, openConnection() returns an HttpsURLConnection, so the same code generally works as long as the server's certificate is trusted. Pages that require a login need the credentials to be sent with the request.
    For other protocols you will need appropriate libraries for each protocol. Something like Apache Commons (for example, Commons Net) can help you with that; it contains networking libraries for many commonly used protocols.

  • How to edit the html source code for my site

    I have just started a blog, and am VERY new to it. I am trying to edit the HTML source code on my site (i.e., to insert Google AdSense search bars). I go to my blog site, get to the page source and see the HTML, but I am not able to edit it. Not sure what I am doing wrong. Thanks!

    You can use any editor you want; mine is set up to use notepad.exe
    (see http://dmcritchie.mvps.org/firefox/firefox.htm#notepad).
    To invoke it, use "Ctrl+U" or View > Page Source.
    This is for sites that you maintain on your local drives or servers and then copy over to a website.

  • How to read the Java source code (in Netbeans)

    I use OS X10.5.5, NetBeans 6.1 and JSE 6 on a 64 bit mac.
    When I downloaded NB6.1 it had JSE 5 as its default (and only) Java platform. I ran Software Update to get Java 6 from Apple and used the Java Preferences utility to set JSE6 as the default. In NB I added the JDK6 platform, registered the JDK6 javadocs and noticed that I also have the option of registering the Java source code.
    I have three questions:
    1) How do I make JDK6 the default in NetBeans? JDK5 remains the default after the steps above, and I don't see anywhere to change that.
    2) How do I read the Java 6 source code? I can see that Sun provides source code (http://download.java.net/jdk6/) for their supported platforms, but I don't see Apple doing the same for its JDK port. What would I need to do to read the Java SE 6 sources? Or are they actually hiding somewhere in the /System/Library/Frameworks/JavaVM.framework/Versions/1.6.0/Home hierarchy?
    3) Where does the JVM look for the binary code to run when I make a call to, say, java.util.ArrayList or any other library class? In my naivety I would have assumed it would be a .class file somewhere in the Java home folder, but I don't see anything like it.
    thanks in advance,
    chris

    This is taken from the help included with NetBeans, in response to question 1.
    By default, the IDE uses the version of the Java SE platform (JDK) with which the IDE runs as the default Java platform
    for compilation, execution, and debugging. You can view your IDE's JDK version by choosing Help > About and clicking the
    Detail tab. The JDK version is listed in the Java field.
    You can run the IDE with a different JDK version by starting the IDE with the --jdkhome jdk-home-dir switch on the command line
    or in your IDE-HOME/etc/netbeans.conf file. For more information, see IDE Startup Parameters.
    In the IDE, you can register multiple Java platforms and attach Javadoc and source code to each platform. For example, if you
    want to work with the new features introduced in JDK 5.0, you would either run the IDE on JDK 5.0 or register JDK 5.0 as a
    platform and attach the source code and Javadoc to the platform.
    In standard projects, you can switch the target JDK in the Project Properties dialog box. In free-form projects, you have to set the target JDK in the Ant script itself,
    then specify the source/binary format in the Project Properties dialog box.
    To register a new Java platform:
    Choose Tools > Java Platforms from the main window.
    Click New Platform and select the directory that contains the Java platform. Java platform directories are marked with a special icon in the file chooser.
    Use the Sources and Javadoc tabs to attach Javadoc documentation and source code for debugging to the platform.
    Click Close.
    To set the default Java platform for a standard project:
    Right-click the project's root node in the Projects window and choose Properties.
    In the Project Properties dialog box, select the Libraries node in the left pane.
    Choose the desired Java platform in the Java Platform combo box.
    Switching the target JDK for a standard project does the following:
    Offers the new target JDK's classes for code completion.
    If available, displays the target JDK's source code and Javadoc documentation.
    Uses the target JDK's executables (javac and java) to compile and execute your application.
    Compiles your source code against the target JDK's libraries.
    If you want to register additional Java platforms with the IDE, you can do so by clicking the Manage Platforms button.
    Then click the Add Platform button and navigate to the desired platform.
    To set the target Java platform for a free-form project:
    In your Ant script, set the target JDK as desired in the javac, java, and javadoc tasks.
    Right-click the project's root node in the Projects window and choose Properties.
    In the Sources panel, set the level of JDK you want your application to be run on in the Source/Binary Format combo box.
    When you access Javadoc or source code for JDK classes, the IDE searches the Java platforms registered in the
    Java Platform Manager for a platform with a matching version number. If no matching platform is found, the IDE's default platform is used instead.
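    Question 3 is not covered by the help text. As a hedged sketch, a class can report where its compiled .class file was loaded from: on Java 6-8 this is usually an entry inside a jar such as rt.jar (the jar name and location vary by vendor), and on Java 9 and later a jrt:/ URL into the runtime image. The class name WhereIsClass is hypothetical.

```java
import java.net.URL;

public class WhereIsClass {

    /** Asks the class itself where its .class file was loaded from. */
    public static URL classFileLocation(Class<?> cls) {
        // A leading "/" makes the resource name absolute, e.g. "/java/util/ArrayList.class".
        String resource = "/" + cls.getName().replace('.', '/') + ".class";
        return cls.getResource(resource);
    }

    public static void main(String[] args) {
        // Typically jar:file:.../rt.jar!/java/util/ArrayList.class on Java 6-8,
        // and jrt:/java.base/java/util/ArrayList.class on Java 9 and later.
        System.out.println(classFileLocation(java.util.ArrayList.class));
    }
}
```

    This explains why no loose ArrayList.class file shows up in the Java home folder: the core classes are packed into a jar or, on newer JDKs, a runtime image.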

  • How do I delete the html source code

    I found the HTML source code and selected it, but I am not able to delete it so that I can put in my own code. I am using Firefox.

    Could you describe in more detail what file it is. Is this a page on a website that you are editing, or a page on your computer you use as your home page, or some other page?

  • View the html source code of an apex page

    Hi everyone,
    I am looking for a way to view the HTML source code of an Apex page and to be able to modify it. That's why viewing the HTML source code in the browser while the application is running doesn't suit me.
    Does anyone have an idea how this could be done?
    Best regards,

    Khadija Khalfallah wrote:
    Hi everyone,
    I am looking for a way to view the HTML source code of an Apex page and to be able to modify it. That's why viewing the HTML source code in the browser while the application is running doesn't suit me.
    What do you mean?
    Do you want to be able to pull up the HTML source generated by Apex, modify that copy, and then feed it back into Apex with the changes you made? If so, you can't. Apex generates the HTML through its tools, and you have to modify the generation routines to get different HTML.
    Do you merely want to look at the generated HTML? In Internet Explorer, all you have to do is right-click on the page and choose View Source to open a window with the HTML source in an editor. I sometimes find it useful to save a page and manually edit the copy to immediately see the effects of certain changes to the underlying HTML and/or JavaScript without permanently making the change in Apex.

  • Are we allowed to use the Web developer function in Firefox version 5.0 to edit the html source code associated with the Firefox home page?

    Locking at request of OP - https://support.mozilla.com/en-US/questions/844506
    Are we allowed to use the Web developer function, under the "Firefox" tab in Firefox version 5.0, to edit the html source code associated with the Firefox version 5.0 home page ( so that we can personalize the home page )? Is this legal?
    Sincerely in Christ,
    Russell E. Willis

    Solution: (Free Download Manager)
    Go here: http://codecpack.co/download/Free_Download_Manager.html and download Free Download Manager 3.8.1067 Beta 3, it works perfectly with Firefox 5.0.1
    Solution: (to Google mail aka Gmail)
    I have had this problem for a while since I did a previous Firefox update, where I had to force Gmail to load in Basic HTML else it's next to impossible to use it. The solution is this: simply update your Java, and Gmail will work without a problem using Standard HTML. To update your Java go here: http://www.java.com/en/ and select "Free Java Download".
    And beta normally, universally, means "the not quite there yet version of the version we're aiming for" NORMALLY used during production and testing of a type of software.

  • How to read the HTML code from a webpage

    Hi, I want to be able to read the HTML code of a web page
    In order to extract some info from some pages.
    How can I do that?
    Is it done using cl_http_client? I played with that class a bit, but with no success for what I need...

    Hi RagnaRock,
    You can use the following form, hope it helps you.
    Regards,
    Ozcan.
    form get_data_from_url using iv_url type clike changing iv_data type string.
    DATA: HTTP_CLIENT TYPE REF TO IF_HTTP_CLIENT .
      clear  iv_data.
      CALL METHOD CL_HTTP_CLIENT=>CREATE_BY_URL
           EXPORTING
            URL                = IV_URL
    *          PROXY_HOST         = '10.1.1.1'
    *          PROXY_SERVICE      = '1234'
    *       SSL_ID             =
           IMPORTING
             CLIENT             = HTTP_CLIENT
           EXCEPTIONS
             ARGUMENT_NOT_FOUND = 1
             PLUGIN_NOT_ACTIVE  = 2
             INTERNAL_ERROR     = 3
             OTHERS             = 4.
      CHECK SY-SUBRC = 0.
      CALL METHOD HTTP_CLIENT->SEND
        EXCEPTIONS
          HTTP_COMMUNICATION_FAILURE = 1
          HTTP_INVALID_STATE         = 2.
      CHECK SY-SUBRC = 0.
      CALL METHOD HTTP_CLIENT->RECEIVE
        EXCEPTIONS
          HTTP_COMMUNICATION_FAILURE = 1
          HTTP_INVALID_STATE         = 2
          HTTP_PROCESSING_FAILED     = 3.
      CHECK SY-SUBRC = 0.
      iv_data = HTTP_CLIENT->RESPONSE->GET_CDATA( ).
    endform.                    "get_data_from_url
    Edited by: Ozcan Gurdal on Aug 11, 2010 4:07 PM

  • How can I get HTML source code of dynamic Web page ?

    Hi,
    I would like to write a program that can get HTML source code of some Web pages, but there are some dynamic Web pages that I can't do it with URL class. For example: http://www.europlex.ch/web/3schedule/showAll.jsp?context=schedule.
    Thank you very much for any solution.

    Thank you for your attention. I'm sorry that I couldn't describe it clearly. If you don't mind, please follow these steps to understand my problem:
    In the browser, enter the URL: http://www.europlex.ch/web/main.jsp?locale=_english
    Click on Schedule, and you will see the page with the cinema name, film, time... I would like to write a Java program to get the HTML code of only this page (not the frame page).
    Do you have any idea?
    Thanks,
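    One likely explanation, assuming the schedule is loaded inside a frame (as the "not the frame page" remark suggests), is that the URL in the address bar only returns the frameset, while the content lives at each frame's src URL. A rough sketch of extracting frame URLs follows. The regular expression is an approximation, not a real HTML parser; the class name FrameLinks is hypothetical; and the frameset markup in main() is invented for illustration, not taken from the real site.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FrameLinks {

    // Matches <frame ... src="..."> and <iframe ... src='...'> (case-insensitive).
    // \b after "frame" keeps <frameset> from matching.
    private static final Pattern FRAME_SRC = Pattern.compile(
            "<i?frame\\b[^>]*?\\bsrc\\s*=\\s*[\"']([^\"']+)[\"']",
            Pattern.CASE_INSENSITIVE);

    /** Returns the absolute URLs of all frames in html, resolved against the page's own URL. */
    public static List<String> frameUrls(String html, String pageUrl) {
        URI base = URI.create(pageUrl);
        List<String> result = new ArrayList<>();
        Matcher m = FRAME_SRC.matcher(html);
        while (m.find()) {
            // Relative src values are resolved the same way a browser would.
            result.add(base.resolve(m.group(1).trim()).toString());
        }
        return result;
    }

    public static void main(String[] args) {
        // Invented frameset markup, for illustration only.
        String html = "<frameset rows=\"20%,80%\">"
                + "<frame name=\"menu\" src=\"menu.jsp\">"
                + "<FRAME SRC='/web/3schedule/showAll.jsp?context=schedule'>"
                + "</frameset>";
        for (String url : frameUrls(html, "http://www.europlex.ch/web/main.jsp?locale=_english")) {
            System.out.println(url);
        }
    }
}
```

    Each returned URL can then be fetched like any other page. If the content is instead produced by JavaScript or depends on a session cookie, this approach will not be enough.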

  • How to get the html source for these web page ?

    My code works well for standard pages, but I'm unable to get the HTML source from these pages with my VB program:
    http://www.slashdot.org
    http://userfriendly.org
    http://segfault.org
    Here is my code:
    private sub commandgethtml_Click ()
    Inet1.Cancel
    Inet1.Protocol = icHTTP
    Inet1.URL = theURL
    HTMLcode = Inet1.OpenURL(theURL, icString)
    RichTextBox1.Text = HTMLcode
    end sub
    thanks in advance.

    Hello Cyrano,
    This Developer Forum focuses on the National Instruments product "Measurement Studio for Visual Basic" (formerly known as ComponentWorks). Our goal is to help people to better integrate this product into their test, measurement, and automation applications. Your question directly pertains to the Microsoft Internet Transfer Control. I think you would find an increased number of responses that are better focused on your question if you would repost it to a forum that specializes in general VB and internet programming. Good luck!
    Jeremiah Cox
    Applications Engineer
    National Instruments
    http://www.ni.com/ask

  • How to read the html code from a specific page

    Is there a way to get the source code of a specific URL and display it in a textarea with Java?

    Sure. One thing you could do would be to have a servlet that fetches html from a given url and stores the html in a jsp bean. You can then use the data in this bean to populate the text area.

  • How to understand the Vector source code for function elements()?

    Hello all,
    The following code snippet is extracted from the java.util.Vector source code.
    Can anyone explain what the method nextElement does?
    Thank you
    public Enumeration elements() {
         return new Enumeration() {
             int count = 0;
             public boolean hasMoreElements() {
                 return count < elementCount;
             }
             public Object nextElement() {
                 synchronized (Vector.this) {   //???
                     if (count < elementCount) {//???
                         return elementData[count++];
                     }
                 }
                 throw new NoSuchElementException("Vector Enumeration");
             }
         };
    }

    Perhaps code would help more. This code
    import java.util.Vector;
    import java.util.Enumeration;
    import java.util.Iterator;
    public class Test {
        public static void main(String[] args) {
         Vector v = new Vector();
         Integer two = Integer.valueOf(2);
         // Fill the Vector
         v.add("One");
         v.add(two);
         v.add(new StringBuffer("Three"));
         // Enumerate through the objects in the vector
         System.out.println("---- Enumeration ----");
         Enumeration e = v.elements();
         while (e.hasMoreElements()) System.out.println(e.nextElement());
         // Loop through the objects in the vector
         System.out.println("---- Loop ----");
         for (int i=0; i<v.size(); i++) System.out.println(v.get(i));
         // Iterate through the objects in the vector
         System.out.println("---- Iterator ----");
         Iterator i = v.iterator();
         while (i.hasNext()) System.out.println(i.next());
        }
    }
    produces this output
    ---- Enumeration ----
    One
    2
    Three
    ---- Loop ----
    One
    2
    Three
    ---- Iterator ----
    One
    2
    Three
    So, for a Vector, all three do the same thing. However, Iterator is part of the Collections Framework (see http://java.sun.com/j2se/1.4.2/docs/guide/collections/index.html). This allows you to treat a whole bunch of objects that implement the Collection interface (see http://java.sun.com/j2se/1.4.2/docs/guide/collections/reference.html) all the same way. I know this is way more than you were asking, but I hope it at least clears up what you were asking.
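    To illustrate the last point, here is a small sketch (the class name IterateAnyCollection and the method join are made up) of a method written once against the Collection interface that works unchanged for a Vector, an ArrayList or a LinkedList, whereas elements() is only available on legacy classes such as Vector and Hashtable.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.Iterator;
import java.util.LinkedList;
import java.util.List;
import java.util.Vector;

public class IterateAnyCollection {

    /** Joins the elements of any Collection with ", ", using only its Iterator. */
    public static String join(Collection<?> items) {
        StringBuilder sb = new StringBuilder();
        Iterator<?> it = items.iterator();
        while (it.hasNext()) {
            sb.append(it.next());
            if (it.hasNext()) {
                sb.append(", ");
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<Object> source = Arrays.<Object>asList("One", 2, new StringBuffer("Three"));
        // The same method handles three different Collection implementations.
        System.out.println(join(new Vector<Object>(source)));     // One, 2, Three
        System.out.println(join(new ArrayList<Object>(source)));  // One, 2, Three
        System.out.println(join(new LinkedList<Object>(source))); // One, 2, Three
    }
}
```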

  • How to load the JSP Source Code from Browser

    I have made a program using JSP ( one file ) and Java Beans ( 2 files ).
    The processing is done in those JavaBeans files, and the output to the browser is produced by the JSP file.
    I'm using JBuilder7 to build my application.
    I have already tested my source in JBuilder (Applet Viewer) and it runs well, but I get a problem when I load that JSP file in a browser (IE 6) by URL.
    Can anybody help me with this ?
    Best Regards,
    Yeppy S.

    I have made a program using JSP ( one file ) and Java Beans ( 2 files ).
    The processing is done in those JavaBeans files, and the output to the browser is produced by the JSP file.
    I'm using JBuilder7 to build my application.
    I have already tested my source in JBuilder (Applet Viewer with the TOMCAT virtual machine) and it runs well, but I get a problem when I load that JSP file in a browser (IE 6) by URL.
    I put my source at F:\Source\Kiosk9 and my JSP file at F:\Source\KiosK9\defaultroot.
    And the JavaBeans Files at F:\Source\KiosK9\src\kiosk9.
    And the name of the package is kiosk9.
    My structure directory :
    F:\Source\KiosK9
    F:\Source\KiosK9\bak
    F:\Source\KiosK9\classess
    F:\Source\KiosK9\defaultroot
    F:\Source\KiosK9\src
    F:\Source\KiosK9\tomcat
    This directory structure was made automatically by JBuilder7 ( F:\JBuilder7 ).
    URL : http:\\localhost:8080\defaultroot\KiosK9Jsp.jsp
    And the error is :
    "Socket Error
    Connection refused by Remote Host"
    Can anybody help me with this ?
    Best Regards,
    Yeppy S.

  • How to Read the one Source Column data and Display the Results

    Hi All,
    I have a PR_ProjectType column in my master table. Based on that column, we need to read the column data and display the results.
    Ex:
    Pr_ProjectType
    AD,AM
    AD
    AM
    AD,AM,TS,CS.OT,TS
    AD,AM          
    The data looks like that. For row 1 (AD,AM) we need the same value; for row 2 (AD) also the same; for row 3 (AM) also the same;
    for row 4 (AD,AM,TS,CS.OT,TS) we need only AD,AM.
    We need this logic for thousands of rows in the table. Please help, this is an urgent issue.
    vasu

    Hi Vasu,
    Based on your description, you want to eliminate the substrings (separated by commas) that are not AD or AM in each value of the column. Personally, I don’t think this can be done with just an expression in the Derived Column. To achieve your goal, here are two approaches for your reference:
    Method 1: At the query level. Replace the target substrings with different integer characters, create a function to eliminate non-numeric characters, and then replace the integer characters with the corresponding substrings. The statements for the custom function are as follows:
    CREATE FUNCTION dbo.udf_GetNumeric
    (@strAlphaNumeric VARCHAR(256))
    RETURNS VARCHAR(256)
    AS
    BEGIN
    DECLARE @intAlpha INT
    SET @intAlpha = PATINDEX('%[^0-9]%', @strAlphaNumeric)
    BEGIN
    WHILE @intAlpha > 0
    BEGIN
    SET @strAlphaNumeric = STUFF(@strAlphaNumeric, @intAlpha, 1, '' )
    SET @intAlpha = PATINDEX('%[^0-9]%', @strAlphaNumeric )
    END
    END
    RETURN ISNULL(@strAlphaNumeric,0)
    END
    GO
    The SQL commands used in the OLE DB Source is like:
    SELECT
    ID, REPLACE(REPLACE(REPLACE(REPLACE(dbo.udf_GetNumeric(REPLACE(REPLACE(REPLACE(REPLACE([ProjectType],'AD,',1),'AM,',2),'AD',3),'AM',4)),4,'AM'),3,'AD'),2,'AM,'),1,'AD,')
    FROM MyTable
    Method 2: Using a Script Component. Add a Derived Column transformation to replace the target substrings as in method 1, use Regex in the script to remove all non-numeric characters from the string, then add another Derived Column to replace the integer characters with the corresponding substrings. The script is as follows:
    using System.Text.RegularExpressions;
    Row.OutProjectType = Regex.Replace(Row.ProjectType, "[^0-9]", "");
    References:
    http://blog.sqlauthority.com/2008/10/14/sql-server-get-numeric-value-from-alpha-numeric-string-udf-for-get-numeric-numbers-only/ 
    http://labs.kaliko.com/2009/09/c-remove-all-non-numeric-characters.html 
    Regards,
    Mike Yin
    TechNet Community Support

  • How to generate the Cobol Source part of the ApplicationViewer

    Hello,
    I currently generate the Cobol source part of the Application Viewer in a Windows environment.
    I'm trying to build it in a Unix environment, but I don't know how, because I can't find the same script in Unix that I use in Windows.
    Can someone tell me how to generate the Cobol source part of the Application Viewer, please?
    Thank you in advance.

