Extracting information from web pages

I have chosen a really difficult topic for my dissertation at uni and need some urgent advice as I am running out of time.
I am trying to develop a program to find the cheapest hotels available in the Edinburgh area. My program will take user input (e.g. dates, price range, star preference) and search the web for the closest matches. I am having difficulty working out how to approach this, though. Any advice would be greatly appreciated.
My biggest concern is how to actually access the information I want and return it to my program. Also, do you think it would be best to have a fixed list of, say, 10 - 15 web sites that my program uses every time, and search each one individually? I am concerned that I have chosen a really difficult project that is not going to be achievable. If I could get even something really simple working to begin with, that would give me some encouragement. Any ideas?
Many Thanks
Andy

I have managed to set up a simple program whereby you enter the URL of any webpage and it returns the HTML code of that webpage. This is exactly what I will need to do. However, can I send requests to a page with my own inputs (e.g. selecting a date, number of rooms etc.) and get back the HTML of the result? How can this be achieved? If I can work out how to do this, then I can simply parse the HTML result for the info I am looking for.
I appreciate some websites won't allow me to do this, but as long as I can get it working on a few sites I will be happy.
Andy
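
On the "select a date, number of rooms and get the result" question: a search form normally just sends its field values to the server, either in the URL query string (GET) or in the request body (POST), so a program can send the same values without a browser. Below is a minimal sketch in Python using only the standard library. The endpoint `hotels.example.com` and the field names (`checkin`, `checkout`, `rooms`, `max_price`) are made up for illustration; the real ones would have to be read from each site's form markup, and some sites will not allow this kind of access at all.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Hypothetical endpoint: replace with the "action" URL of the real search form.
SEARCH_URL = "https://hotels.example.com/search"


def build_search_request(checkin, checkout, rooms, max_price, use_post=False):
    """Build a request carrying the same fields a search form would submit.

    The field names are assumptions; inspect the site's <form> element
    to find the names it actually uses.
    """
    params = urlencode({
        "checkin": checkin,
        "checkout": checkout,
        "rooms": rooms,
        "max_price": max_price,
    })
    if use_post:
        # Supplying a request body makes urllib send a POST.
        return Request(
            SEARCH_URL,
            data=params.encode("ascii"),
            headers={"Content-Type": "application/x-www-form-urlencoded"},
        )
    # Otherwise the fields travel in the query string of a GET.
    return Request(SEARCH_URL + "?" + params)


def fetch_html(request, timeout=10):
    """Send the request and return the response body as text."""
    with urlopen(request, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")
```

Calling `fetch_html(build_search_request("2024-06-01", "2024-06-03", 1, 120))` would then return the result page as a string, ready to be parsed. Keeping one such builder per site fits the idea of working from a fixed list of 10 - 15 sites.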

Similar Messages

  • How to extract data from web URL

    I am working on a project that needs to extract data from web pages and then analyze that data. My question is how to extract the data: should I use an HTML parser? Any help is appreciated, thanks a lot.

    Try this:
    http://java.sun.com/docs/books/tutorial/networking/urls/readingURL.html
    Or, like you said yourself, use an HTML parser:
    http://java-source.net/open-source/html-parsers
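
As a rough illustration of the HTML parser route, here is a small sketch in Python (the links above are for Java, so this is a swapped-in language, not what the reply refers to). It uses the standard library's `html.parser` on a made-up page fragment. The `name` and `price` class names and the sample markup are assumptions; a real page needs its own selectors.

```python
from html.parser import HTMLParser


class HotelPriceParser(HTMLParser):
    """Collect (name, price) pairs from spans marked class="name"/"price".

    The class names are hypothetical and stand in for whatever markup
    the target page really uses.
    """

    def __init__(self):
        super().__init__()
        self._field = None    # field we are currently inside, if any
        self._current = {}    # fields gathered for the hotel in progress
        self.results = []     # finished (name, price) tuples

    def handle_starttag(self, tag, attrs):
        css_class = dict(attrs).get("class")
        if css_class in ("name", "price"):
            self._field = css_class

    def handle_data(self, data):
        if self._field:
            self._current[self._field] = data.strip()

    def handle_endtag(self, tag):
        if self._field is None:
            return
        self._field = None
        if "name" in self._current and "price" in self._current:
            # Strip the pound sign (U+00A3) before converting to a number.
            price = float(self._current["price"].lstrip("\u00a3"))
            self.results.append((self._current["name"], price))
            self._current = {}


# Made-up fragment standing in for a downloaded result page.
SAMPLE_HTML = """
<ul>
  <li><span class="name">Castle View</span> <span class="price">&pound;85.00</span></li>
  <li><span class="name">Old Town Inn</span> <span class="price">&pound;62.50</span></li>
</ul>
"""


def cheapest_first(html):
    """Parse the page and return the hotels sorted by ascending price."""
    parser = HotelPriceParser()
    parser.feed(html)
    parser.close()
    return sorted(parser.results, key=lambda pair: pair[1])
```

`cheapest_first(SAMPLE_HTML)` lists Old Town Inn before Castle View. Real pages are messier than this, so a more forgiving third-party parser may be a better fit once the basic idea works.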

  • I need an addon to extract music files from web pages like myspace.

    I'm running Firefox 23 on Ubuntu 12.04. I would like to be able to extract/download music files from web pages like MySpace. I searched through the add-ons but didn't find any. Is there an add-on to download music from MySpace?

    There's a Search box at addons.mozilla.org which you can use.
    [https://addons.mozilla.org/en-US/firefox/search/?q=MySpace+downloads&appver=23.0&platform=windows MySpace downloads addons search]
    The first one listed there says it works with MySpace.

  • Create PDF From Web Page Does Not Create Image Maps Properly

    I have a website that contains image maps. When I "Create PDF From Web Page", these image maps are not rendered properly. From what I can tell, this is a BUG within Acrobat, and I'm hoping that people can confirm this for me.
    I created a test page at http://www.quantumdynamix.net/clients/image-map-test/. I placed the image maps by manually coding the coordinate information, so the maps sit PRECISELY over the squares. Each image map navigates to an anchor corresponding to the number on the red square.
    When I create the PDF using the "Create PDF From Web Page" feature, the image maps are rendered improperly. The file can be viewed at http://www.quantumdynamix.net/clients/image-map-test/ImageMapTest.pdf. To view the outlines of the image maps, select "Tools" -> "Advanced Editing" -> "Link Tool". You can see the outlines are substantially incorrect.
    Please confirm that others can replicate this problem. Any solution to this issue would be very helpful!

    I tested this in Acrobat X and the exact same issue occurs:
    http://www.quantumdynamix.net/clients/image-map-test/ImageMapTest-AcrobatX.pdf
    This has to be considered a legitimate bug, especially since IMAGE MAPS is listed as one of the supported HTML features in the help files.

  • I am no longer able to open or download most PDFs from web pages. I have the pop-up blocker turned off in Preferences.

    In recent months I have found that I am unable to download or open PDF documents from web pages. Download links either open a blank page in a new tab or do nothing. PDFs that I am unable to view in Firefox can be viewed and/or downloaded in Safari most of the time and Chrome all of the time so I assume the problem is not my computer or operating system. I recently turned off the pop-up blocker but that did not make a difference. The problem has worsened over the past couple months, it appears that more and more web sites are moving to a different protocol for viewing/downloading PDFs. It used to happen occasionally but now I find that I am unable to view a majority - about 75% - of PDFs. I am using Firefox v18 for Mac with Mac OS X 10.6.8.

    hello ahegarty, you could try using the [https://addons.mozilla.org/firefox/addon/pdfjs/ PDF Viewer] addon, for more details also see [[PDF files are blank and can't be downloaded on Mac]].

  • Error message says need Adobe Reader 8 or 9 installed to open PDFs from web pages, yet Reader 9 is already installed

    An error message says I need Adobe Reader 8 or 9 installed to open PDFs from web pages, yet Reader 9 is already installed on the computer. Is this a 64-bit issue? I am sure, though, that I did not have this problem before a replacement hard drive was installed.

    What is your operating system, browser?
    What is the exact message you are getting?

  • Where are files downloaded to on the mac when creating a pdf from web pages?

    Where on the Mac HD are files downloaded to when creating a PDF from web pages?
    I'm creating PDFs from a whole site, which makes for a large document, so I wondered where these files are stored on the Mac so that I can delete them when finished, as I don't want to clog up the hard drive.

    Look at the LiveCycle server products.

  • Automate Create a PDF From Web Page

    I am trying to create PDFs from a list of 100+ web pages / URLs.
    I would like to automate the process to save time.
    (I have been doing some research, but I am a newbie when it comes to programming / scripting / coding.)
    My questions are:
    1) Is it possible to write a script that runs Acrobat's Create PDF from Web Page option over a long list of URLs?
    Something along the lines of a .bat file (reading something like)
    acrobat.exe "http://www.google.com" "%CD%\GoogleTest.pdf"
    acrobat.exe "http://www.yahoo.com" "%CD%\YahooTest.pdf"
    2) If it is possible to script / automate Acrobat's Create PDF from Web Page, can you point me in the right direction as to what programs I need to write a script (if any)? Any examples of how the script would be written would be extremely helpful.
    Thank you for your time

    You can post your question in the forum for Acrobat Scripting or Acrobat SDK.

  • When I try to print from web page I get "printer not activated - error code 30", printer works on Internet Explorer

    When I try to print from web page I get a dialogue box "Printer not activated - error code 30" followed by a dialogue box "An unknown error occurred while printing". The printer works on websites when I use Internet Explorer. Also, if I copy the web page and paste it into a Word document, I can print a copy. This issue just started this week; I have not encountered this problem prior to this week. Please advise.

    What does that error message say?
    See this: http://kb.mozillazine.org/Problems_printing_web_pages

  • How can I carry information from one page to another page???

    Hi, I have two different pages, each with its own table: a city page with a city table and a house page with a house table. I want to add a new record to the house table that depends on the city table, and the two must be on different pages; I mean I can't use master-detail tables or master-detail forms.
    So this is the problem: the two tables share a column called cityId, and I have a button on the city page called "create new house". How can I carry the cityId information from the city page to the house page when I push the button? Thanks for your help.

    Hi,
    I would suggest using something like what eax23 proposed, but using pageFlowScope instead. So in the 1st page:
    <af:commandButton .....>
      <af:setActionListener from="cityID" to="#{pageFlowScope.cityID}"/>
    </af:commandButton>
    In the second page, have your managed bean read the cityID from the page flow scope directly. This avoids the bean-scope issues that eax23's solution could sometimes cause.
    Regards,
    ~ Simon

  • How to include non web pages to the "Create PDF from Web Page" feature?

    In Acrobat Pro (v. 10), when I use the "Create PDF from Web Page" feature, it works great for html pages, but it skips non-html links (doc, pdf, ppt, xls, etc). I need Acrobat Pro to convert those files and put them in the order as well. I don't see an option for this in settings. Is there ANY way I can do this? This is for an archiving purpose and I have 10,000 plus files to convert. Please help.

    This is a question I'm trying to answer too. My issue is that I have a PDF file which itself contains links to both DOC and PDF files. The end result I need is one consolidated PDF containing all the linked files (in order).
    I can run "Create PDF from Web Page" on this PDF file, and it will download the linked files but not convert them. It just adds them as "jumbled" text at the end of the document. I need it to download, convert, and then append them.
    So, as isunshine3 asked above, is there any way to have Acrobat convert the files that it finds linked when running "Create PDF from Web Page"?
    Thanks
    Matt

  • Create PDF from Web page using Acrobat X - Page Order

    I have a structured web site that is in fact Program Help. The web pages are structured as follows:
    index.html - Main Topic Index page with links to all topic subject index pages
    topic/index.html - Topic Subject Index Page with links to all subject pages
    topic/subject.html - Subject page
    .....etc
    Using Acrobat 5, "Create PDF from Web Page" created a perfectly logical PDF page structure that followed the page order of the web site. In Acrobat 5, page 1 was the Main Topic Index Page, page 2 was the 1st Topic Subject Index Page, page 3 was the 1st Subject Page, then the 2nd subject of the 1st Topic, etc., until the Topic's Subjects were exhausted, after which came the 2nd Topic Subject Index Page, and so it went on. As a result the bookmark structure was sensible. The page order was as follows:
    Main Topic Contents
    Topic 1 Contents
    Subject 1 of Topic 1
    Subject 2 of Topic 1
    Topic N Contents
    Subject 1 of Topic N
    Subject 2 of Topic N
    Acrobat X (just purchased) produces a differently structured PDF from the same HTML pages. The order is:
    Main Topic Contents
    Topic 1 Contents
    Topic 2 Contents
    Topic N Contents
    Main Topic Contents (a second time)
    Subject 1 of Topic 1
    Subject 2 of Topic 1
    Subject N of Topic 1
    Subject 1 of Topic 2
    Subject 2 of Topic 2
    Subject N of Topic N
    Question: Is there any way I can get back with Acrobat X the same page order I got with Acrobat 5?
    Any help appreciated. The website is www.caliach.com/caliach/vision/help/index.html
    Chris

    Acrobat is using the underlying mark up of the rendered HTML page.
    This may or may not provide an adequate input to Acrobat when it is noodling out how and what to tag.
    I suspect you may find that, to obtain an adequately tagged PDF, you may have to capture the web page content with the create bookmarks and create tags options off.
    Once you have the PDF make working copies.
    Try letting Acrobat tag this already created PDF to see what happens.
    You may have to manually tag the PDF.
    n.b., the default read order for Western languages can be altered by user selections in the accessibility setup or by selection in the PDF page(s) Page Properties.
    Be well...

  • File, create PDF, from web page, entire site...questions

    I am new to adobe, is this the proper forum for Adobe Acrobat 9 Pro for macintosh?  I think yes...
    Created a PDF from:  File, create PDF, from web page, entire site.
    Is there a way to print this without the background color? If you printed from a browser, you could choose not to print the background color. I know exactly which color it is.
    Is there a way to make this PDF look like the web, with no page breaks? I have tried various things, but the page breaks are always displayed.
    Is there a way to create bookmarks from something other than the title tag in the web site? The title tag is a one-sentence SEO summary of the page, which makes for very long bookmark names.
    Thanks for your help.
    bob
    www.answerstat.net

    I don't use v9, but what I would do is click the FILE--PRINT option, print to PDF, and enter to print one page (default is first page)

  • Extracting information from a table based on different criteria

    Post Author: shineysideup
    CA Forum: Formula
    Hi Folks
    I have a bit of a strange one here.
    I need to extract information from a single table based on different criteria.
    Sounds simple enough but here's the tricky part.
    This table is a table that contains the build of a product. All the parts that are used to make the product and also the sub-parts that are used to make the primary product parts.
    Example:
    I have a part that is in the product and the part no is 1111. This part is actually part of another part that is part no 1112
    What I need to do is display part no 1111 with all of its details but then also show that it is also part of part no 1112.
    The way the table holds this information is as follows.
    Seq_No      Parent_Seq_No     Part_No
    The seq_no is item no that is given to the part number. If the part is a member of another part then there is also a parent_seq_no.
    Everything needs to tie back to the seq_no and the parent_seq_no, as the part itself can be used in a parent or it can be used on its own. This way you can actually have the same part appearing in the list several times, but the seq_no will be different for each one. If the part can be used in two different sub-builds (with each part being used twice in each sub-build) and also on its own once, then you would have 5 different seq_nos and two parent_seq_nos.
    What I need to do is list all of the parts and, when a part has a parent_seq_no, display both the parent seq_no and the part_no of that parent, as the parent would also be listed as an individual item in the part list.
    At the moment, listing the part_no, seq_no and parent_seq_no is easy, but when I try to list the part_no for the parent I just keep getting the original sub-part again. I can do this with a sub-report, but given what I need to do with the data after listing the parts, a sub-report is not an option for me.
    This make sense?
    Thanks

    Post Author: Charliy
    CA Forum: Formula
    As long as the chain only goes one link deep, you should be able to Alias the table and link it (left outer) from the child part to the parent part. Then build a Detail B (or Group Footer if that's where you're printing) and conditionally suppress it if there is no "Parent Part".
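
The alias-and-left-outer-join idea can be tried outside the report designer. The sketch below is not Crystal Reports; it uses Python's built-in `sqlite3` module to run the equivalent SQL self-join. The table name `build` and the sample rows are invented for illustration, and only the three columns named in the question are assumed.

```python
import sqlite3


def parts_with_parent(rows):
    """Return (seq_no, part_no, parent_seq_no, parent_part_no) for every part.

    The same table is joined to itself under two aliases, child and parent.
    The LEFT OUTER JOIN keeps parts with no parent; their parent columns
    come back as None.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE build ("
        "seq_no INTEGER PRIMARY KEY, parent_seq_no INTEGER, part_no TEXT)"
    )
    conn.executemany("INSERT INTO build VALUES (?, ?, ?)", rows)
    result = conn.execute(
        """
        SELECT child.seq_no, child.part_no, child.parent_seq_no, parent.part_no
        FROM build AS child
        LEFT OUTER JOIN build AS parent
               ON child.parent_seq_no = parent.seq_no
        ORDER BY child.seq_no
        """
    ).fetchall()
    conn.close()
    return result


# Invented sample: part 1111 appears once inside part 1112 and once on its own.
SAMPLE_ROWS = [
    (1, None, "1112"),
    (2, 1, "1111"),
    (3, None, "1111"),
]
```

With `SAMPLE_ROWS`, the row for seq_no 2 carries the parent's part_no 1112, while seq_no 1 and 3 have no parent. As noted in the reply, this only resolves one level; deeper chains would need one more alias per level or a recursive query.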

  • "Create PDF from Web Page" Yields Authorization Failure

    Acrobat 9 Pro Extended running on Windows XP Service Pack 3:
    When using "Create PDF from Web Page," certain linked pages result in an "Authorization Failure" error message. Is there any way to instruct Acrobat to disregard pages that are not downloadable and continue creating the PDF?

    I am having the same issue AND none of my pages or files require a UserID or Password. My issue appears to be something with the domain because a and b work just fine and produce a PDF file while item c does not work and produces the error msg.
    http://www.dot.wi.gov/projects/neregion/151/index.htm works just fine and produces a PDF file.
    http://www.dot.state.wi.us/projects/neregion/151/index.htm works just fine and produces a PDF file.
    http://www.wisconsindot.gov/projects/neregion/151/index.htm produces an error msg: 'Nothing done'. Error info: Authorization Failure http://www.wisconsindot.gov/projects/neregion/151/index.htm
