Web bot/crawler for sites that show dynamic content

Hello
I need to create a web bot/crawler/spider that visits different websites, collects data for us, and stores it in a database. The crawler needs to 'READ' the options on a website (from drop-downs, radio buttons, or check boxes) to generate some input itself, OR use generic predefined words that we provide.
For example, a web page might be structured with a text field and some drop-downs. Typically, if the user enters the case number of a court case, the website displays its status, and there might also be different legal documents that can be retrieved through drop-down options such as 'Industry Permits', 'Civil Cases', 'Criminal cases', etc. So the crawler should be able to read the page, self-generate a list of suitable options, and use them to get the data. We want to create a bot/crawler/spider that automatically enters the information for multiple cases, i.e. case numbers (text field) and case types (from drop-downs), and retrieves the data about the relevant cases available on the website.
What is the best approach to achieve this? We could write an individual bot for each website, but we are trying to come up with a more intelligent bot or crawler that can be used to crawl multiple websites.
We are not doing anything illegal; everything is perfectly legal. Please advise on how we can achieve this.
Regards
Krishna

Google for JSpider. Very good, very extensible.
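The "read the options, then generate input" step described in the question can be sketched in plain Java. This is not JSpider's API; it is a minimal, hypothetical helper (`FormOptionCrawler`, `extractSelectOptions` and `buildQueries` are illustrative names) that collects the `value` of each `<option>` inside each named `<select>`, then builds every combination of field values, including case numbers you supply yourself, as URL-encoded query strings. It assumes simple, well-formed markup and uses regular expressions only; radio buttons, check boxes, options generated by JavaScript, and forms that need a session or POST are not handled, and a real HTML parser would be more robust.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Sketch only: regex-based form reading for simple, well-formed HTML. */
final class FormOptionCrawler {

    // <select ... name="x" ...> ... </select>; group 1 = name, group 2 = inner markup
    private static final Pattern SELECT = Pattern.compile(
            "<select[^>]*\\bname\\s*=\\s*[\"']([^\"']+)[\"'][^>]*>(.*?)</select>",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    // <option ... value="y" ...>; group 1 = value
    private static final Pattern OPTION_VALUE = Pattern.compile(
            "<option[^>]*\\bvalue\\s*=\\s*[\"']([^\"']*)[\"']",
            Pattern.CASE_INSENSITIVE);

    /** Maps each <select> name to its non-empty option values, in page order. */
    static Map<String, List<String>> extractSelectOptions(String html) {
        Map<String, List<String>> fields = new LinkedHashMap<>();
        Matcher select = SELECT.matcher(html);
        while (select.find()) {
            List<String> values = new ArrayList<>();
            Matcher option = OPTION_VALUE.matcher(select.group(2));
            while (option.find()) {
                // Skip placeholder entries such as <option value="">-- choose --</option>
                if (!option.group(1).isEmpty()) {
                    values.add(option.group(1));
                }
            }
            fields.put(select.group(1), values);
        }
        return fields;
    }

    /** Returns every combination of field values (Cartesian product) as "a=1&b=2" strings. */
    static List<String> buildQueries(Map<String, List<String>> fields) {
        List<String> queries = new ArrayList<>();
        queries.add(""); // start with one empty query and extend it field by field
        for (Map.Entry<String, List<String>> field : fields.entrySet()) {
            List<String> next = new ArrayList<>();
            for (String prefix : queries) {
                for (String value : field.getValue()) {
                    String pair = encode(field.getKey()) + "=" + encode(value);
                    next.add(prefix.isEmpty() ? pair : prefix + "&" + pair);
                }
            }
            queries = next;
        }
        return queries;
    }

    // URLEncoder.encode(String, Charset) requires Java 10 or later; spaces become "+".
    private static String encode(String s) {
        return URLEncoder.encode(s, StandardCharsets.UTF_8);
    }
}
```

For example, put a hypothetical `caseNumber` field holding your own case numbers into a `LinkedHashMap` first, then `putAll` the result of `extractSelectOptions(html)`. For a form that submits with GET, each string returned by `buildQueries` could then be appended to the form's action URL after `?` and fetched; the per-site part shrinks to locating the form and its action URL.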

Similar Messages

  • Looking for a site that shows problems with Windows patches or updates.

    Looking for a site that shows problems with Windows patches or updates, so I know what not to install.
    Plenty of sites list and summarize individual patches; I want one that consolidates problems and complaints so I can better assess the risk. Currently I wait several days before installing patches and run Google searches. That works, but I'd rather find a place that does it with expertise.
    Which forums would be best for finding such problems?
    Thanks in advance for suggestions.

    Hi,
    I agree with Cyber and Rick.
    Windows update helps to keep your PC safer—and your software current—by fetching the latest security and feature updates from Microsoft via the Internet.
    Although there might be some problems when installing it, Windows update is not the one to blame.
    For troubleshooting Windows updates, if needed:
    Fix Microsoft Windows Update Issues
    http://support2.microsoft.com/gp/windows-update-issues
    Best regards
    Michael Shao
    TechNet Community Support

  • Firefox is not remembering passwords for sites that I visit regularly. I have checked the box "remember passwords for sites" in the Security tab under Tools, but the passwords are not being remembered when I return to the sites.

    Firefox is not remembering passwords for sites that I visit regularly. I have checked the box "remember passwords for sites" in the Security tab under Tools, but the passwords are not being remembered when I return to the sites. I am running version 3.6.18.

    The information that lets websites remember you and log you in automatically is stored in a cookie.
    * Create an allow cookie exception (Tools > Options > Privacy > Cookies: Exceptions) to keep such a cookie, especially for secure websites and if cookies expire when Firefox is closed.
    Make sure that you do not run Firefox in Private Browsing mode.
    * https://support.mozilla.com/kb/Private+Browsing
    * In [[Private Browsing]] mode all cookies are session cookies that expire if that session is ended, so websites won't remember you.
    * Do not use [[Clear Recent History]] to clear the "Cookies" and the "Site Preferences"
    Clearing "Site Preferences" clears all exceptions for cookies, images, pop-up windows, software installation, and passwords.
    * http://kb.mozillazine.org/Cookies

  • How can I force Safari 6.1 to save passwords for sites that request that passwords "not be saved"?

    Since the last OS X update, when Safari was updated to version 6.1, I have been having problems with passwords not being saved for specific sites that request that passwords "not be saved".
    Example: Adobe.com
    I did not have this problem with Safari 5.1.
    When I go to Safari Preferences and click the box "Allow AutoFill even for websites that request passwords not be saved", a pop-up box opens and refers me to "Security & Privacy", indicating that a "screen lock" must be engaged to allow this action.
    I have my firewall on.
    I don't understand what they are asking me to lock or how to do it.
    Anyone who understands what I am being asked to do -or- how I can re-enable password saving, please email me.
    Thanks,
    <Email Edited by Host>

    Actually, Linc's reply is EXACTLY the answer that is required. 
    If you go to the setting he mentions and enable the password after screen saver or sleep, then Safari 6.1 will allow you to save all the autofill login information you like, even the stuff for sites that request not to save them.

  • API for tools that show differences between two files in an applet

    I am looking for an API or tool that shows the differences between two data files
    that are held in memory as byte[] in an applet.
    The applet is not signed.

    I got it.
    File f=new File("\\\\"+"Linshuaibing"+"\\card\\DSC00134.jpg");
    Thank you very much!
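For the byte[] comparison asked about in this thread, plain Java needs no special API: the code below only reads arrays already in memory, with no file or network access, so it should not need a signed applet. The hypothetical `ByteDiff.differingRanges` is a minimal sketch that compares the two arrays position by position and reports runs of differing offsets as half-open `{start, end}` pairs; bytes past the end of the shorter array count as different. It is not a true diff: a single inserted or deleted byte shifts everything after it, and detecting that would need a longest-common-subsequence style algorithm instead.

```java
import java.util.ArrayList;
import java.util.List;

/** Position-by-position comparison of two byte arrays (not an LCS-based diff). */
final class ByteDiff {

    /** Returns runs of differing offsets as half-open {start, end} pairs, in ascending order. */
    static List<int[]> differingRanges(byte[] a, byte[] b) {
        List<int[]> ranges = new ArrayList<>();
        int common = Math.min(a.length, b.length);
        int longer = Math.max(a.length, b.length);
        int start = -1; // start offset of the currently open differing run, or -1 if none
        for (int i = 0; i < common; i++) {
            if (a[i] != b[i]) {
                if (start < 0) {
                    start = i;
                }
            } else if (start >= 0) {
                ranges.add(new int[] {start, i});
                start = -1;
            }
        }
        if (start >= 0) {
            // A run still open at the end of the shared part also absorbs any extra tail bytes.
            ranges.add(new int[] {start, longer});
        } else if (common < longer) {
            // Bytes that exist in only one of the arrays are reported as differing.
            ranges.add(new int[] {common, longer});
        }
        return ranges;
    }
}
```

Each pair can then be turned into a readable report, for example by printing the offsets together with the bytes of both arrays in that range.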

  • How to have a web-based interface for Lumira that also performs Ad-Hoc visualizations on data that should be loaded live from HANA.

    How to have a web-based interface for Lumira that also performs Ad-Hoc visualizations on data that should be loaded live from HANA. I have another tool that puts data into HANA, so I don't want to reload this new data into Lumira every time I want to run a report.
    So, do I have the ability to create polished ad hoc dashboards, reports, infographics and storyboards? Apart from ad hoc reports, I also need a dashboard with some fixed reports that update with the live data.
    Please suggest how I can accomplish this task.
    Thanks and regards
    Shashi kiran

    Please have a look at Ludek's document here which contains links: SAP Lumira Family Supported Versions Matrix
    Ludek has also attached the PAMs as zipped files; Lumira comes in many flavors, so I encourage you to research the options.
    Also see this "HANA Live" document: [SAP HANA Academy] Visualized: Lumira & HANA

  • How do I stop the Master Password box from activating for sites that are not included in the list?

    The Master Password box requests the password for any site that I open, including sites for which I have no stored password.
    == This happened ==
    Every time Firefox opened
    == I entered Gmail or Yahoo

    Start Firefox in [[Safe Mode]] to check if one of your add-ons is causing your problem (switch to the DEFAULT theme: Tools > Add-ons > Themes).
    See [[Troubleshooting extensions and themes]] and [[Troubleshooting plugins]]
    If it does work in Safe-mode then disable all your extensions and then try to find which is causing it by enabling one at a time until the problem reappears.
    You can use "Disable all add-ons" on the [[Safe mode]] start window to disable all extensions.
    You have to close and restart Firefox after each change via "File > Exit" (Mac: "Firefox > Quit"; Linux: "File > Quit")

  • Adding PHP pages, not showing dynamic content?

    Hi all,
    I'm new and hoping I don't get ripped apart for a question that seems simple. I've looked a lot though and can't seem to find an exact scenario like this. I recently took over a PHP site for a friend that was built in Dreamweaver CS3.
    There's a master page (main.php) that includes a header and footer and middle area of dynamic content. Thus, all of the pages appear like: http://www.SITENAME.com/main.php?mod=about (for the 'about' page) or www.SITENAME.com/main.php?mod=welcome (for the landing/home page), etc.
    In main.php, there's a section of: // Include Multiple Static Pages that looks like this:
    $mxiObj->IncludeStatic("about", "about.php");
    for every static page.
    All of this makes sense to me. Here's my problem:
    When adding a new page now, I can't get it to show any of the dynamic content on the live site. I just get the static header and footer to show.
    In the main.php file, there's this section:
    </div>
      <div id="mainContent">
        <p>
          <?php
      $incFileName = $mxiObj->getCurrentInclude();
      if ($incFileName !== null) {
        mxi_includes_start($incFileName);
        require(basename($incFileName)); // require the page content
        mxi_includes_end();
      }
    ?></p>
    </div>
    that's supposed to pull in the dynamic content. It's doing this just fine on the older pages, but not on the ones I'm trying to build now.
    I guess I'm asking if there's some type of file in an 'includes' folder that I'm missing, where I need to make sure the file name is also listed (not just in the static section of main.php)? What's the missing link that will get this dynamic content to show up?
    Thanks for any help in advance and for reading this!

    The quickest way to add new pages is to
    copy the template (index.php) to a new document (newdoc.php or similar)
    remove the PHP stuff - stuff that is not required for the new content
    if there are includes for the menu, header, footer or similar, then you can link those back into the document using standard includes
    add the new content to the newly created document
    This will give you a stand-alone document. At a later stage you can convert these documents into a DW-template system.

  • How do I translate alts and titles for images in dynamic content?

    Hi,
    I have a classic report with a column that shows different images based on a DECODE expression for that column in the report SQL. Is there a way to translate the alt and title for those images displayed dynamically?
    Thanks.

    Hello Edward,
    >> Is there a way to translate alt and title for those images displayed dynamically?
    Yes. You can use the APEX Dynamic Translation option:
    http://download.oracle.com/docs/cd/E17556_01/doc/user.40/e15517/global.htm#BABFHCJA
    Regards,
    Arie.
    ♦ Please remember to mark appropriate posts as correct/helpful. For the long run, it will benefit us all.
    ♦ Author of Oracle Application Express 3.2 – The Essentials and More

  • Alerts are not working for Site collections in a content database

    Hi,
    We have run into a strange issue: alerts are not working for site collections in one particular content database. When users subscribe to alerts, they receive the email confirming their subscription, but no further emails when new items or documents are added or changes are made in the list or document library.
    We have tried both immediate and scheduled alerts; neither works.
    FYI, Alerts for all other site collections from different content database for the same web application are working without any issues.
    Any help would be appreciated.
    Thanks
    Ramkumar

    This looks like a timer job issue for your web application. Please check whether these jobs are enabled, and check the error log for the alert timer jobs. In particular, check whether the "Immediate Alerts" job is enabled for your web application:
    job-immediate-alerts
    job-daily-alerts
    job-weekly-alerts
    Please check this wiki -
    Troubleshooting Steps for SharePoint Alert Email Does Not Go Out
    Thanks
    Ganesh Jat [My Blog | LinkedIn | Twitter]
    Please click 'Mark As Answer' if a post solves your problem or 'Vote As Helpful' if it was useful.

  • PDFs that cause Digital Editions to crash / PDFs that show no content...

    I've been happily using Digital Editions with lots of my PDFs and have been very pleased with the first version of this product. However, I've run into two issues that I was hoping someone could help me with...
    1) I've found a PDF that causes Digital Editions to just crash when you try to view it. The PDF is InfoQ's free version of "Getting Started with Grails" (downloadable at http://www.infoq.com/resource/minibooks/grails/en/pdf/grails-getting-started.pdf if you register). The PDF is secured so it can't be printed. I loaded this into Acrobat Reader 8, no problem. I loaded it into Digital Editions, opened it, got a blank page. Tried to go to page 2, Digital Editions quits with the Windows standard dialog "Digital Editions has encountered a problem and has closed".
    2) Some older PDFs (such as O'Reilly's free X Windows books) import but show no content at all (every page is blank). They work fine in Reader 8.
    Has anyone else run into this or found a solution? I'm using Digital Editions 1.0.467.
    Mark Davidson

    Dang! You're right. It does crash. I have logged a bug and we will fix it for the next release. The blank files may well be JPEG2000 images, which we do not yet support.

  • Why can a site that has Following Content disabled be followed from My Site?

    Hi, this is my question.
    I have a site where the Following Content feature is deactivated, so the Follow icon doesn't appear.
    Now, if I search for the site from My Site, I can find it, click the ellipsis and click Follow. If the site has this feature disabled, why can I follow it?

    Hi Javi,
    According to your description, my understanding is that after you disable the Following Content feature for a site, you can still follow the site via the Follow button in the hover panel of a search result.
    Following your description, I could reproduce this issue.
    I also found that:
    I searched for some documents in the site where I had disabled the Following Content feature; the Follow button was also displayed in the hover panel. When I clicked it, nothing happened.
    I went to the Search Center and disabled the Following Content feature there. Then I searched for another site where the feature was enabled; the Follow button was also displayed in the hover panel. When I clicked it, nothing happened.
    It seems that the Following Content feature applies to the Follow button of the current site. If the feature is enabled, the button works normally. If the feature is disabled, the button in the search result hover panel does not respond.
    So, if you don't want users to follow sites from the Search Center, please disable the Following Content feature in the Search Center.
    Best Regards,
    Wendy
    Forum Support
    Please remember to mark the replies as answers if they help and unmark them if they provide no help. If you have feedback for TechNet Subscriber Support, contact
    [email protected]
    Wendy Li
    TechNet Community Support

  • Problem creating custom tags for site that has no outside internet connecti

    I've created a set of custom tags that work fine until we install our app at the customer site. The customer site has no outside Internet access, and so the DOCTYPE is failing since it references the web-jsptaglibrary_1_1.dtd located on Sun's site.
    I tried copying the dtd locally and got it to work, but the solution sucks because this web-jsptaglibrary_1_1.dtd file is referenced in both my taglib.tld file AND the web-jsptaglibrary_1_1.dtd itself. Soooo....I can put in a URL that references it on the local machine, e.g.,
    In the taglib.tld file:
    <!DOCTYPE taglib PUBLIC "-//Sun Microsystems, Inc.// DTD JSP Tag Library 1.1//EN"
    "http://ClientAAA/web-jsptaglibrary_1_1.dtd">
    In the web-jsptaglibrary_1_1.dtd file:
    <!ATTLIST taglib id ID #IMPLIED
    xmlns CDATA #FIXED
    "http://ClientAAA/AdProduction/web-jsptaglibrary_1_1.dtd"
    >
    but that means for every client that uses this app (and we have several) I have to change that URL inside both these files.
    I tried simply changing it to the relative "web-jsptaglibrary_1_1.dtd", e.g.,
    taglib.tld:
    <!DOCTYPE taglib PUBLIC "-//Sun Microsystems, Inc.// DTD JSP Tag Library 1.1//EN"
    "web-jsptaglibrary_1_1.dtd">
    web-jsptaglibrary_1_1.dtd:
    <!ATTLIST taglib id ID #IMPLIED
    xmlns CDATA #FIXED
    "web-jsptaglibrary_1_1.dtd"
    >
    but then it is requiring me to put the dtd in both my web app root directory AND my jakarta/bin directory. I get the following error:
    XML parsing error on file ../vtaglib.tld: java.io.FileNotFoundException: D:\jakarta\jakarta-tomcat-4.1.29\bin\web-jsptaglibrary_1_1.dtd (The system cannot find the file specified)
    It seems like I must be missing something here. This shouldn't be this hard. And it seems funny that to use custom tags, you have to have Internet access in the first place.
    Help!!! :)
    Thanks.

    Yeah, I think it's a bit ridiculous that in order to make all the tag library examples and instructions work, you have to have access to the Internet. I haven't seen a single example on how to make it work if there is no Internet access. That's very limiting. And I've tried all sorts of other ways of doing it, such as
    <!DOCTYPE taglib SYSTEM "web-jsptaglibrary_1_1.dtd">
    but even then it won't work because I get an error message saying:
    XML parsing error on file /assets/../vtaglib.tld: java.io.FileNotFoundException: D:\jakarta\jakarta-tomcat-4.1.29\bin\web-jsptaglibrary_1_1.dtd (The system cannot find the file specified)
    I just don't think I should have to place this file in the bin directory. There has to be another way. Do I need to modify the dtd somehow? Cuz the dtd has the following line...is this messing it up??
    <!ATTLIST taglib id ID #IMPLIED xmlns CDATA #FIXED "web-jsptaglibrary_1_1.dtd">
    I sure could use some help.
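Tomcat (Jasper) reads `taglib.tld` with its own parser, so the following does not change how Tomcat loads a TLD; it applies when your own code parses XML that declares a remote DTD. In that case the standard SAX mechanism is an `org.xml.sax.EntityResolver` that hands the parser a local copy of the DTD instead of letting it fetch the system ID over the network. A minimal sketch, assuming the DTD text is already available as a string (in practice it would typically be read from the classpath); the class name `LocalDtdResolver` and the rule of matching on the file name are illustrative choices.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;

/** Serves a bundled DTD instead of fetching it over the network (illustrative sketch). */
final class LocalDtdResolver implements EntityResolver {
    private final String dtdFileName; // e.g. "web-jsptaglibrary_1_1.dtd"
    private final String dtdContent;  // the DTD text itself

    LocalDtdResolver(String dtdFileName, String dtdContent) {
        this.dtdFileName = dtdFileName;
        this.dtdContent = dtdContent;
    }

    @Override
    public InputSource resolveEntity(String publicId, String systemId) {
        // Match on the file name only, so any URL or path ending in it is served locally.
        if (systemId != null && systemId.endsWith(dtdFileName)) {
            InputSource source = new InputSource(new StringReader(dtdContent));
            source.setSystemId(systemId);
            return source;
        }
        return null; // null tells the parser to resolve the entity as it normally would
    }

    /** Parses an XML string, consulting the resolver for external entities such as the DTD. */
    static Document parse(String xml, EntityResolver resolver) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        DocumentBuilder builder = factory.newDocumentBuilder();
        builder.setEntityResolver(resolver);
        return builder.parse(new InputSource(new StringReader(xml)));
    }
}
```

Because the resolver is consulted before any network or file lookup, the same XML parses identically on a machine with no Internet access, and the DOCTYPE can keep its original remote URL.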

  • Firefox 3.6.3 sending web email - waiting for 'site'. Works OK in IE

    I have created a new profile and run Firefox in Safe Mode. In Safe Mode, the first message sends OK, but subsequent ones hang with "waiting for mail.bluebottle.com". Webmail works fine in IE 8.
    == This happened ==
    Not sure how often
    == I moved to Firefox 3.6.3

    A possible cause is security software (firewall) that blocks or restricts Firefox without informing you about that, possibly after detecting changes (update) to the Firefox program.
    Remove all rules for Firefox from the permissions list in the firewall and let your firewall ask again for permission to get full unrestricted access to internet for Firefox.
    See [[Server not found]] and [[Firewalls]] and http://kb.mozillazine.org/Firewalls

  • Can I add a custom thumbnail for sites that do not have one?

    I frequent a Boardhost discussion board. Apparently it has no thumbnail, so pinning the tab is useless, as I only see an empty square. Is there a way to choose a thumbnail for it, or upload one?
    Thank you
    Mac Mountain Lion
    Firefox 19.0.2
    site is public board: http://members7.boardhost.com/VADARE/

    Use this add-on:
    https://addons.mozilla.org/en-US/firefox/addon/bookmark-favicon-changer/

Maybe you are looking for

  • 12 days of Christmas app

    I accidentally deleted one of the 2 movies that came with the Toy Story bundle available from the 12 days of Christmas app (which is great by the way). I tried to redownload the movie but I can't as it tells me it is part of a bundle and I already ha

  • Sync contacts BACK to exchange server?

    My Exchange contacts are showing up on my phone, but I was hoping that when I add contacts on my iPhone, they'd sync back to my Exchange server. Is this not possible? If it is, how do I do it? Thanks.

  • Connect a front panel hea

    I have audio jacks on the front of my computer case with a cable that says HD Audio and AC '97. I don't know where to put the cables into. I have the Xi-Fi Xtreme Gamer Fatality Pro. Wow... so I've done some searching and it appears this card does no

  • BLOB/filter problem

    Hello, I am using 8.1.7 on Solaris. I store a Word document in a blob column and index it without errors. However when I try to select from this column using "select...contains" it always results in zero rows being returned. Changing the filter to NU

  • Can't update/install quicktime plugin

    I have uninstalled and reinstalled the plugin and the whole version even the quicktime alternative and it still doesn't want to play mov files. I am going to the apple.com/trailers page to watch some trailers and it gives me an error that I need to u