Java app to save html source and copy to clipboard

I'm new to Java, and I'm doing a research paper which requires gathering data from the internet. I basically need to learn commands to create an app (which I will execute on a daily basis using Windows scheduler) to do the following task:
1. Access a specific URL
2. Save the HTML source as a text file
3. Copy some of the text in the saved text file to the clipboard in Windows
4. Paste this data in Excel
5. Run a macro in Excel to sort the data
I know enough VBA and Excel functions to parse the data, but if it is easier to use Java to sort the data into columns and rows in Excel, I would like to learn how to do that too.
I'm running Windows XP. My friend tells me to download Dr. Java to edit .java files and also the SDK to compile the files. He also told me to find modules from the internet which I can use, but I'm not sure how to go about it. (I tried searching on the net but didn't get the hits I wanted.)
Any help at this point would be much appreciated.

If your target HTML is complicated (for example, a table cell that contains another table), or the table is missing some <tr> or <td> tags, dividing the table into rows and columns can be a difficult task.
I recommend choosing a website that downloads quickly and reliably. Otherwise, you will have to write more code to handle loading failures.
I recommend using HttpUnit to fetch the web page from a Java program. (Universities usually have a proxy server; you can configure the proxy parameters in the Java program that calls HttpUnit.)
It will take some time to learn these Java techniques.
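
As a rough illustration of steps 1 to 3 from the question, here is a minimal sketch that uses only the standard Java library rather than HttpUnit (a swapped-in technique, not what the reply recommends). The class name `PageSaver`, the output file `page.txt`, the `<table`/`</table>` markers and the UTF-8 encoding are all assumptions made for the example; a real page may need different markers and a different charset.

```java
import java.awt.Toolkit;
import java.awt.datatransfer.StringSelection;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.Writer;
import java.net.URL;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class; all names here are illustrative.
class PageSaver {

    // Reads everything the reader delivers into one string.
    static String readAll(Reader in) throws IOException {
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[4096];
        int n;
        while ((n = in.read(buf)) != -1) {
            sb.append(buf, 0, n);
        }
        return sb.toString();
    }

    // Step 1: download the HTML source (assumes the page is UTF-8).
    static String fetch(String address) throws IOException {
        try (Reader in = new InputStreamReader(
                new URL(address).openStream(), StandardCharsets.UTF_8)) {
            return readAll(in);
        }
    }

    // Step 2: save the text to a file.
    static void save(String text, String fileName) throws IOException {
        try (Writer out = new OutputStreamWriter(
                new FileOutputStream(fileName), StandardCharsets.UTF_8)) {
            out.write(text);
        }
    }

    // Returns the text between the first start marker and the next end
    // marker, or an empty string if either marker is missing.
    static String between(String text, String start, String end) {
        int from = text.indexOf(start);
        if (from < 0) {
            return "";
        }
        from += start.length();
        int to = text.indexOf(end, from);
        return to < 0 ? "" : text.substring(from, to);
    }

    // Step 3: put the text on the system clipboard (needs a desktop session).
    static void copyToClipboard(String text) {
        StringSelection selection = new StringSelection(text);
        Toolkit.getDefaultToolkit().getSystemClipboard()
               .setContents(selection, selection);
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 1) {
            System.err.println("usage: java PageSaver <url>");
            return;
        }
        String html = fetch(args[0]);
        save(html, "page.txt");
        copyToClipboard(between(html, "<table", "</table>"));
    }
}
```

Pasting into Excel and running the macro (steps 4 and 5) are outside what this sketch covers. The marker-based `between` method is only suitable for simple pages; for the nested or malformed tables mentioned above, a real HTML parser is likely to be more robust.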

Similar Messages

  • Help in creating a Java class to convert Html source to XML

    Hi everyone!
    I am using Selenium as my automation tool.
    I got the HTML source of the page using Selenium.
    Now I have to write a Java class which will convert the data (HTML source)
    and output a data structure in standard XML format.
    Can anyone give me example code (a Java class) showing how to achieve this, please?
    Please mail to
    [email protected]
    Many thanks

    getafix14 wrote:
    Can anyone give me example code (a Java class) showing how to achieve this, please?
    Please mail to
    [email protected]
    Sorry Charlie, but this isn't a "mail me the codez" forum. Either use Google, or try to share your problem and their solutions in the forum. Besides, a little Googling will find you a solution.

  • HP AiO Remote app will not recognize scan and copy function of HP Envy 120

    Good morning! The HP AiO Remote app is installed on my iPad 4, which is on the same WiFi network as my HP Envy 120 printer. The printer is recognized by the app and marked with a green LED. When I tap on scan or copy in the app menu, it tells me that these functions are available in the app only for printers which provide these features. But the HP Envy 120 has both a scanner and a copier, and last time it worked. Any idea what could have happened here? Thanks. UJ

  • Help Migrating Java app to Weblogic 9.2 and Workshop -- startup errors

    I need a little help getting my application up and running in Workshop. I recently migrated to Weblogic 9.2. Before I downloaded Workshop, I installed Weblogic 9.3 and have successfully set up and deployed the app on my PC outside of Workshop. The application is packaged as an EAR – I have extracted the WAR file and other contents from the EAR onto my PC in order to do the deployment.
    In Workshop, I have set up the application as a Java project, have pointed to the source code and required libraries, and have been able to successfully compile the application. I have set up a ‘run dialog’ with similar parameters as in the config.xml and other configuration files.
    The problem I am currently having is that when I start the application in Workshop, it asks for a username. I have tried to enter the username that I set up with the original deployment (before I installed Workshop) and it does not accept it – it always shuts down.
    How can I either make it not require a username, or else use (point to) the same startup configuration as my original deployment, from before installing Workshop?

    On Tue, 5 Jun 2007 05:30:09 -0700, Paul Harvey <> wrote:
    I've been unable to find clear, authoritative information on this...
    I have an app which requires Java 1.4.x. Should I be able to deploy
    it on WL 9.2, or does 9.2 only support apps written in 1.5?
    WL9.2 has a Java 1.5 JVM. That should run Java 1.4 programs just fine.
    Tim Slattery
    [email protected]

  • Applescript. Select cell range and copy to clipboard

    Hi everyone,
    I have been looking at trying to select a range of cells, eg B1 to H40, once selected I would like to copy the selection to the clipboard.
    I am having difficulty selecting a range of cells...
    any help would be very much appreciated.
    Many thanks

    Prepend my original suggestion to Jerry's and you have what you want:
    Make the Summary table reference only that data you want in your summary:
    select the table
    Command-C (Copy)
    Switch to Preview
    Command-N (New document from Clipboard)
    Command-S (Save)
    Another way is to place the summary table in another sheet. Then you can just print that sheet to PDF by selecting the sheet, then selecting the menu item "File > Print…", then the option in the "PDF" menu at the bottom right.

  • Read HTML Source

    How are all of you? I have a very simple but urgent question.
    I just want to know if it is possible to read the page's HTML
    source and store it in a String type variable. For example,
    If I have this function
    public function GetHTML():void
    HTMLService = new HTTPService();
    HTMLService.url = "";
    Now can you please let me know how to store the page source
    as a String. I tried to use var str:String =
    event.result.toString(); but it displays "[object][/object]"
    Any suggestion and help will be greatly appreciated.
    I shall be very thankful to you.
    Best Regards

    BIJ001 wrote:
    Did you try to pull the URL "by hand", that is, with a browser? The previous poster just asked you to give it a try. So sit down, use your web browser, type the URL in question and look at what happens. Do you also get a 403 FORBIDDEN? Then it is really forbidden. Even Java can't do it.

  • Adobe acrobat reader 9 problem saving forms and copying photos

    I used Adobe Acrobat Reader 6 at work, and I have a form that I save with new data. I am also able to copy text and photos using version 6.
    When I try to save edited forms and copy files from version 9, it won't allow me to do that.
    Is there any way that I can copy text and photos and save forms in Adobe version 9?
    When I try to save forms in version 9, it states that I am unable to save the form. When I try to copy text or photos, it will not allow me to copy.

    Are you talking about Reader or Acrobat. These are two different programs and Reader definitely has restrictions. Reader used to be called Acrobat Reader before version 6, but that has been gone a long time. The change was likely because of the confusion between Acrobat and Reader. The problems you indicate are suggestive of Reader, not of Acrobat.

  • How to get a Java app to run invisibly

    How do I get my Java app to run at startup and behind the scenes?

    Don't give the application a GUI. Make an executable and place it in the startup folder.

  • File movement and copying

    Where I work we use software whose files are stored on the server. But we also have a local copy of these files so that we can use the software when not in the office, or when the server is down. The problem is that many of my co-workers are not very computer literate, and the constant copying back and forth of files confuses them.
    So I am trying to write an updater application. Basically all they would have to do is click a button; it would tell them whether there had been any important changes, and allow them to update by pressing a single other button.
    I have the GUI written and all the checks done, but I cannot figure out how to make Java take files in one place and copy them to another place. I would think this is something that Java is capable of, but as of this moment I am drawing a blank.
    Any ideas?

    A quick and dirty method is to use Runtime.exec() to execute the copy/move commands directly, though this is platform-dependent.
    Another way is to use InputStream and OutputStream to read the original file and write out the copy of the file.
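
    The stream-based approach could look roughly like the sketch below. The class name `FileCopier` and the 8 KB buffer size are assumptions chosen for illustration; the source and target paths come from the command line.

```java
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// Hypothetical helper class; the name is illustrative.
class FileCopier {

    // Copies the source file to the target path, overwriting the target.
    static void copy(String source, String target) throws IOException {
        try (InputStream in = new FileInputStream(source);
             OutputStream out = new FileOutputStream(target)) {
            byte[] buffer = new byte[8192];
            int n;
            // read() returns -1 once the end of the file is reached
            while ((n = in.read(buffer)) != -1) {
                out.write(buffer, 0, n);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 2) {
            System.err.println("usage: java FileCopier <source> <target>");
            return;
        }
        copy(args[0], args[1]);
    }
}
```

    Unlike Runtime.exec(), this works the same way on every platform. On Java 7 and later, `java.nio.file.Files.copy` does the same job in a single call.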

  • I can not "save as source" an MP4 object web page in Firefox 6 for MAC

    I download MP4 music from VIMEO. When I click the link on their website to download the video, it downloads and opens in a Firefox web page. I used to be able to Control-click on the open VIMEO web page window on my Mac and then choose "Save as Source", and the video would then save on my desktop as an MP4 file. Now with Firefox 6.0, when I try to Control-click in the window, nothing happens. I do not get the little dialog box that allows me to access the Save as Source command.

    I am in the airport in Hong Kong right now and the Internet is very slow. I do not have time to download a full video, but I did start to access one and used the download helper add on and it appeared to be working. I will post the full result when I get home. At this point it appears to be a solution.

  • 5800XM need keyboard to enter data in java app

    I have downloaded an app to my 5800 but once in the app i need to enter my account details.
    There is no keyboard in the app to allow me to enter the details.
    any ideas please

    No ideas yet, but same issue. I get a D-Pad in my Java apps (not a number pad), and I have a few apps where clicking the custom edit box doesn't give you a choice to bring up a keyboard (the address bar in the Opera Mini app is an example; I filed a bug with Opera against that).

  • How to read and then extract HTMl source code using java program?

    Could someone tell me how to read and then extract the content of a certain tag from HTML source code? For example, given the URL http://.... , I would like to know what the <title> content </title> in that page is.
    Any help is greatly appreciated.

    Use a URLConnection to make the connection to the page at the needed URL. From the URLConnection, you can get an InputStream that is the stream of data from that page. Just search through the stream and find the <title> tags (don't forget to check for case sensitivity).
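
    A minimal sketch of that suggestion follows. The class name `TitleReader`, the UTF-8 encoding and the use of a regular expression for the search are assumptions made for the example; the case-insensitive flag takes care of `<TITLE>` versus `<title>`.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.net.URL;
import java.net.URLConnection;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class; the name is illustrative.
class TitleReader {

    // Matches <title>...</title> in any letter case, across line breaks.
    private static final Pattern TITLE = Pattern.compile(
            "<title[^>]*>(.*?)</title>",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    // Returns the trimmed title text, or null if the page has no title.
    static String extractTitle(String html) {
        Matcher m = TITLE.matcher(html);
        return m.find() ? m.group(1).trim() : null;
    }

    // Downloads the page through a URLConnection (assumes UTF-8 content).
    static String download(String address) throws IOException {
        URLConnection connection = new URL(address).openConnection();
        try (InputStream in = connection.getInputStream();
             Reader reader = new InputStreamReader(in, StandardCharsets.UTF_8)) {
            StringBuilder sb = new StringBuilder();
            char[] buf = new char[4096];
            int n;
            while ((n = reader.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
            return sb.toString();
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 1) {
            System.err.println("usage: java TitleReader <url>");
            return;
        }
        System.out.println(extractTitle(download(args[0])));
    }
}
```

    Reading the whole page into memory before searching is the simplest option and is usually fine for ordinary web pages.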

  • RoboHelp HTML 10 is taking one of my topic files and copying it multiple times.

    I am using RoboHelp HTML 10 to update a reference project with about 1,500 topics in it. Today, I created a new topic with several internal bookmark links. I have edited this topic file several times and saved it without any problems but late today, when I wanted to save the file and close the project, RH gave me an error message about the file name being too long and the TOC reference was too long. This error message repeated over and over so I used Task Manager to kill RH and I rebooted my PC.  When I got RH running again, I discovered 40 copies of the topic file listed in the Project Manager window.  Each file name is a unique creation made up of the original topic file name and one or more of the internal links. Here is a sample of the names:
    Ultimate_Data_Display.htm (topic file with several bookmarks)
    Ultimate_Data_Display.htm#View_Data (no .htm suffix - when I open one of these files using NotePad, the file contains an exact copy of the original html code)
    The file names get longer and longer until the 260-? character limit is reached.
    RH appears to be parsing the topic file, pulling out the bookmark code, concatenating various bits together and then saving copies of the original file under these new names. Perhaps this is occurring because of some interaction between RH and the version control system we are using. However, I don't know what version control system we have. I have a "source folder" on my local desktop and RH is doing some type of checking in and checking out of all the files that I work on.
    Any suggestions on how to stop this problem would be great, as I have several new topic files to create over the next few work-days.

    This is a new one on me.
    Have you tried deleting the CPD file and then working on the project?
    Where is the project located?
    Does your working environment have a backup tool running all the time, such as Memeo?

  • Read HTML tags and Save Images in web page

    I have a problem with reading HTML tags and saving all the images in a page. I can get the source code of the web page, but I don't know how to identify the image tag (IMG tag). I think I want to use the StringTokenizer class,
    but I don't know how to use it for my problem. If anyone knows how to do it, please reply.

    cnapagoda wrote:
    I have a problem with reading HTML tags and saving all the images in a page. I can get the source code of the web page, but I don't know how to identify the image tag (IMG tag). I think I want to use the StringTokenizer class,
    but I don't know how to use it for my problem. If anyone knows how to do it, please reply.
    If you have a big, long string with HTML content in it, you might try matching on a regex like so:
    String html = ...
    Matcher m = Pattern.compile("<img.*?>").matcher(html);
    while (m.find()) { String imgTag = m.group(); /* one image tag per match */ }
    to get your image tag data, and then parsing that to get the src attribute. (Note that html.split("<img.*?>") would return the text between the tags rather than the tags themselves.) You can either treat this as a big string-parsing problem, or get an HTML DOM library and use that to structure the page as a tree for easier access.
    If you want more help you'll have to show the code you have so far. We can't write this for you.
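
    To make the string-parsing route concrete, here is a minimal sketch that collects the src attribute of every image tag. The class name `ImageSources` and the pattern are assumptions for illustration; the pattern only handles quoted attribute values and is not a substitute for a real HTML parser.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class; the name is illustrative.
class ImageSources {

    // Group 1 is the opening quote, group 2 is the value of src.
    private static final Pattern IMG_SRC = Pattern.compile(
            "<img\\b[^>]*?\\bsrc\\s*=\\s*([\"'])(.*?)\\1",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    // Returns the src values of all <img> tags, in document order.
    static List<String> find(String html) {
        List<String> sources = new ArrayList<String>();
        Matcher m = IMG_SRC.matcher(html);
        while (m.find()) {
            sources.add(m.group(2));
        }
        return sources;
    }

    public static void main(String[] args) {
        String html = "<p><img alt=\"logo\" src=\"logo.png\"><IMG SRC='photo.jpg' /></p>";
        for (String src : find(html)) {
            System.out.println(src);
        }
    }
}
```

    Each returned value may be a relative path, so it would still have to be resolved against the page address before the image itself can be downloaded.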

  • Safari is unable to save in HTML (source page option) offline webarchives

    Some years ago I made a very difficult and prolonged research: I found over 1500 web pages of medical researches and I stored it on the hard disk, using safari.
    I didn't know that its default format file is .webarchive.
    Now I have these 1500 .webarchive files and I need to use it on other PCs (also Windows and Linux systems), and they take 800 MB on my pendrive! (webarchive are bigger than imageless html files).
    I need to convert them in HTML format.
    Only 3 methods exist in the world:
    1) from terminal: textutil -convert html FILENAME.webarchive. I drag and drop 10 files a time in the webarchive window after the command string, but it converts about 5-6 files on 10: for the other it gives an error. Some files are impossible to convert using this method.
    2) WebArchive folderizer: I could drag and drop all the webarchive files on its small window: it will convert each webarchive file in a folder with the same name of the file. Then I could select all the folders, use FINDER's search function letting it find just HTML files and then move them in a folder and delete all the subfolders.
    But the HTML files extracted do not keep the original filename! This is important for me, because I need that each file must have the name of the medical research. I'll have to find (reading a list made in neooffice) the files and read them: over 1500 files, and it will need lots of time make an internal file text search method each time!
    3) Webarcher: it converts the files to .war format, not HTML. It is compatible with Windows, but it keeps the images, like a webarchive, and the space required is high too.
    The method that I could use is this:
    Open the file with Safari in OSX, and re-save it in the "source page" (HTML) format... but.... SURPRISE!!!
    If I open a web-page with Safari, and I choose "save as...", it let me choose the format (html or webarchive).
    If I open a page stored on the hard disk with Safari, and I choose "save as...", the small menu in which I should choose the format... disappears!!!
    This happened years ago... now the OS X is newer... the Safari is newer... but the problem is the same!
    How could I solve it?
    Is this a bug?
    Thank you!

    Since Safari is able to save as HTML file a page if it is opened online (from web), making "Save as...", and it is unable to make so if the webarchive file is on the hard disk...I tried to upload on a web server some .webarchive files.
    Then I went online in the web page with the list of all the uploaded .webarchive files... (ex.: http://....filename1.webarchive, etc...)...
    My goal was to open ONLINE (not on hard disk) the files and then make "save as...",... but... SURPRISE!!! Safari is UNABLE to open a webarchive file from the web, it asks me where to save the file... it can only save it, it writes that it is an "application"...
    I think this happens because .webarchive is not a real web page but a "package"... but my displeasure is this:
    Once opened an hard disk's webarchive file, its content is loaded in ram and it appears like a normal web page (html + images + etc...)... so, in this moment, Safari should be able to save it "as...." HTML file too.... discarding images and other objects....
    Any advice?
    Thank you!!!
