Creating an alert from search results on web content (*DEEPWEB*)?

I hope this is an appropriate place to post this question.
I'm looking for suggestions on how to approach this problem [hoping for an ArchWay approach]:
1. Log in to an SSL-secured website with a user ID and password [CURL?]
2. Perform a search for an article/award/bid notification with a set of keywords [CURL?]
3. Parse (tokenize) the results to identify amounts greater than $5,000 [AeroText, LingPipe, NetOwl, Inxight... others?]
4. Produce an alert with the results and links to specific items [DB and RSS feed, or some other alerting mechanism]
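For step 3, if the amounts appear as plain dollar figures such as "$12,500", a regular expression may be enough before reaching for one of the entity extractors listed. The sketch below assumes every amount is written with a leading "$" and optional thousands separators; `dollar_amounts` and `over_threshold?` are hypothetical helper names, not part of any library.

```ruby
# Minimal sketch: pull dollar figures such as "$12,500" or "$7,250.50"
# out of result text and check them against a threshold.
# Assumption: amounts always start with "$" and may use "," separators.
AMOUNT_PATTERN = /\$\s?(\d{1,3}(?:,\d{3})+|\d+)(?:\.(\d{1,2}))?/

# Returns every dollar amount found in +text+ as a Float.
def dollar_amounts(text)
  text.scan(AMOUNT_PATTERN).map do |whole, cents|
    # drop thousands separators, re-attach cents (or ".0" if none)
    "#{whole.delete(',')}.#{cents || '0'}".to_f
  end
end

# True when any amount in +text+ is strictly greater than +threshold+.
def over_threshold?(text, threshold = 5000)
  dollar_amounts(text).any? { |amount| amount > threshold }
end

snippet = 'Contract awarded for $12,500.00; bid bond of $500 required.'
p dollar_amounts(snippet)
p over_threshold?(snippet)
```

Amounts written without a "$" (or in words) would be missed, which is where a proper entity extractor starts to pay off.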
I've looked at cURL, but I'm not sure whether there is another framework I could leverage (I can start from scratch, but I'm trying not to reinvent the wheel).
Any ideas or suggestions from the community are greatly appreciated.
Thanks,
Dave

The Mechanize gem (Ruby) is perfect for automating sites like this:
gem install mechanize
The (untested) example below:
  - logs into an SSL website (handling cookies, etc.)
  - searches for a list of keywords
  - stores the results in a hash for storage, email, etc.
```ruby
#!/usr/bin/env ruby
# encoding: utf-8
require 'rubygems'
require 'mechanize'

# create a browser agent and set its user-agent alias
agent = Mechanize.new { |a| a.user_agent_alias = 'Mac Safari' }

words  = %w( i can haz taco? )          # list of keywords
result = Hash.new { |h, k| h[k] = [] }  # hash default is an empty array

agent.get('https://www.tacos-vs-poutine.com/') do |page|
  # log in to the site (Mechanize keeps the session cookies)
  form = page.form_with(:name => 'login')
  form.email    = 'vegemite'
  form.password = 'sandwich'
  page = agent.submit(form)

  # search the site for each word
  words.each do |word|
    search_result = page.form_with(:name => 'search') do |search|
      search.q = word
    end.submit

    # keep links whose text contains a dollar amount over 5,000:
    # extract the figure after "$" and drop thousands separators,
    # since "$12,500".to_i would return 0
    search_result.links.each do |link|
      amount = link.text[/\$\s*([\d,]+(?:\.\d+)?)/, 1].to_s.delete(',').to_f
      result[word] << [link.text, link.href] if amount > 5000
    end
  end

  # store results in DB, convert to RSS feed, email etc
  # eg: Sequel + Sqlite3, RubyRSS, Mail gems
end
```
** EDIT: fixed syntax
Last edited by awkwood (2010-09-23 01:31:05)
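For a simple feed, the RSS step mentioned in the last comment can be done without extra gems; a feed reader polling the generated file then acts as the alert. A minimal sketch, assuming the result hash maps each keyword to an array of [title, url] pairs (for example, by pushing [link.text, link.href] rather than just the text). `results_to_rss`, the channel title and the URLs are hypothetical.

```ruby
# Minimal RSS 2.0 builder for keyword => [[title, url], ...] results.
# Assumption: plain string templating is good enough for a small feed.
XML_ESCAPES = { '&' => '&amp;', '<' => '&lt;', '>' => '&gt;', '"' => '&quot;' }.freeze

# Escape the characters that would break XML text or attributes.
def xml_escape(text)
  text.to_s.gsub(/[&<>"]/, XML_ESCAPES)
end

def results_to_rss(results, channel_title:, channel_link:)
  # one <item> per hit, tagged with the keyword that found it
  items = results.flat_map do |keyword, hits|
    hits.map do |title, url|
      <<~ITEM
        <item>
          <title>#{xml_escape(title)}</title>
          <link>#{xml_escape(url)}</link>
          <category>#{xml_escape(keyword)}</category>
        </item>
      ITEM
    end
  end

  <<~RSS
    <?xml version="1.0" encoding="UTF-8"?>
    <rss version="2.0">
    <channel>
    <title>#{xml_escape(channel_title)}</title>
    <link>#{xml_escape(channel_link)}</link>
    <description>Search results above the alert threshold</description>
    #{items.join}</channel>
    </rss>
  RSS
end

# Usage with hypothetical data:
results = { 'taco' => [['Award: $12,500', 'https://www.tacos-vs-poutine.com/awards/1']] }
puts results_to_rss(results, channel_title: 'Bid alerts',
                             channel_link: 'https://www.tacos-vs-poutine.com/')
```

Writing the string to a file served by any web server is enough for most feed readers; a library such as the RSS gem becomes worthwhile once you need dates, GUIDs or Atom output.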

Similar Messages

  • Create PDF from Search results...

    Hello all.
    I am currently creating PDFs of all my company's websites. We are going to use these PDFs as sources of information.
    Basically, we are going to use Acrobat's search functionality to search across a large number of PDFs of the websites. Let's say I search for "Cancer".
    The search results come back with the PDF document title link and a link to the exact page where the word "Cancer" is found.
    What I would like to do is create one PDF from all the pages located in the search results.
    Is it possible to automate this process without having to physically open every page and extract it?
    Cheers
    John

    Hi,
    You can use Adobe services to create PDFs using Web Dynpro.
    You can display your search results from SAP R/3 in a PDF file.
    Check out this link:
    http://help.sap.com/saphelp_nw04/helpdata/en/1a/ff773f12f14a18e10000000a114084/frameset.htm
    Thanks
    Senthil
    P.S. If you find the answer useful, allocate points.

  • How to Filter list/library view pages from Search Results?

    Hi All,
    Currently my search configuration searches everything in a site collection. I have created a custom scope for that.
    But I would like to remove the search results for list/library views. The search should only show documents and pages (but not list/library view pages).
    Please guide me on this.
    Appreciate your help.
    Thanks,
    Rahul Babar
    ASP.NET, C# 4.0, Sharepoint 2007/2010, Infopath 2007/2010 Developer http://sharepoint247.wordpress.com/

    Navigate to the site that contains the list or library that you want to change.
    Locate and click the list or library you want to customize.
    Click Site Actions, and then click Site Settings.
    Under Site Administration, click Site Libraries and lists.
    Click an item from the list, for example, Customize “Shared Documents.”
    On the List Settings page, under General Settings, click Advanced settings.
    In the Search section, under Allow items from this document library to appear in search results, select Yes to include all of the items in the list or library in search results, or No to exclude all items from search results.
    http://office.microsoft.com/en-in/office365-sharepoint-online-enterprise-help/enable-content-to-be-searchable-HA010379092.aspx

  • UCM 11g: file missing from search results although the file is accessible

    When I run a search without any criteria in UCM 11g, some files are missing from the search results, although I know the files were already checked into UCM, since I'm able to see them using a URL similar to:
    http://ucm/cs/idcplg?idcService=GET_FILE&dID=12345
    or by using the following URL to get the document information:
    http://ucm/cs/idcplg?idcService=DOC_INFO&dID=12345
    The file is not in the search results, even when searching by ID.
    Also, the search results say "displaying 1-20" but only display a few files (e.g. 2 files, fewer than 20).
    Is this a known problem? The same search worked perfectly in UCM 10.
    Thanks

    The query seems able to retrieve the missing document. For example, when I search by content ID, the audit log (enabled as you mentioned) gives the following information:
    fusionappsattachments/6     09.13 06:22:03.878     IdcServer-3474     --- @ResultSet SearchResults ---
    fusionappsattachments/6     09.13 06:22:03.878     IdcServer-3474     numFields=66,*numRows=1*,currentRow=0
    Also, the information following the above gives the detailed field information, which matches the missing document.
    It looks like UCM just did not bring it to the UI as part of the search results.
    This seems to match my other finding, mentioned earlier: the search results say "displaying 1-20" but only display a few files (e.g. 2 files, fewer than 20).
    It seems the query did find the documents, but the UI did not show them.
    Not sure if this is a known bug.

  • How to remove Forms/DispForm.aspx from search results

    Hi,
    I configured enterprise search in our public-facing SharePoint portal.
    When a user searches for any content, why do the search results include the links below, which ask for authentication when the user clicks them?
    /Pages/Forms/DispForm.aspx?ID=477
    /PublishingImages/Forms/DispForm.aspx?ID=3
    /SiteAssets/Forms/DispForm.aspx?ID=1
    /Documents/Forms/DispForm.aspx?ID=1
    And how can I remove these from the search results?
    adil

    To remove them from search results:
    Central Administration > Application Management > Manage service applications > Search Service Application > Crawl Rules
    In the Path enter: *://*/DispForm.aspx* and for the Crawl Configuration check "Exclude all items in this path" and check "Exclude complex URLs (URLs that contain question marks - ?)"
    You will then need to do a crawl of your content to remove any URLs that match this path.
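Before recrawling, it can help to sanity-check which URLs the pattern covers. The sketch below uses Ruby's shell-style `File.fnmatch` as a rough stand-in; it assumes SharePoint's `*` wildcard behaves similarly (any run of characters, including "/"), which may not hold in every detail. The host name is a hypothetical placeholder; the paths are the ones from the question, plus one ordinary page for contrast.

```ruby
# Rough check of which URLs the crawl-rule pattern would exclude.
# Assumption: SharePoint's "*" acts like File.fnmatch's "*", which also
# matches "/" because File::FNM_PATHNAME is not passed.
CRAWL_RULE = '*://*/DispForm.aspx*'

def excluded_by_rule?(url, pattern = CRAWL_RULE)
  File.fnmatch(pattern, url)
end

%w[
  /Pages/Forms/DispForm.aspx?ID=477
  /PublishingImages/Forms/DispForm.aspx?ID=3
  /SiteAssets/Forms/DispForm.aspx?ID=1
  /Documents/Forms/DispForm.aspx?ID=1
  /Pages/Home.aspx
].each do |path|
  url = "https://portal.example.com#{path}" # hypothetical host
  puts "#{excluded_by_rule?(url) ? 'excluded' : 'crawled'}: #{url}"
end
```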

  • Remove material from search results

    Is anyone aware of a way to remove a material from search results?  I have a large contingent of materials that are no longer valid for sale and the search results create a lot of clutter.   I need to retain the records for historical sales transactions but since there is no stock and the materials are discontinued, they don't need to be displayed on search.

    Hi Craig,
    You can do this through customising: SPRO > Logistics - General > Material Master >  Tools > Maintain Search Helps
    There, double-click the search help MAT1_A.
    Then select the "Include search helps" tab and select the search help where you want to add a field, e.g. MAT1V - Material by Supply Area.
    Once you are there, check whether you can add the field LVORM - DF at client level.
    Then save, and this search will give you the option to see deleted or non-deleted materials.
    By the way, Narla Rama Krishna Mohan has a document about "Material Master Search Help" and it can help you too.
    This is the link:
    http://www.sdn.sap.com/irj/scn/go/portal/prtroot/docs/library/uuid/8047f84e-7973-2c10-548a-ab4cb1c3c498?quicklink=index&overridelayout=true
    Hope this helps.
    Kind regards,
    SP

  • I am getting an alert window that says: "SAFARI web content has quit unexpectedly"

    I am getting an alert window that says: "SAFARI web content has quit unexpectedly", but Safari hasn't shut down or anything. It happens a few times a day.
    QUESTION: How do I fix this?
    A HUGE block of text appears; here are the opening lines:
    Process:         com.apple.WebKit.WebContent [386]
    Path:            /System/Library/StagedFrameworks/Safari/WebKit.framework/Versions/A/XPCServices/com.apple.WebKit.WebContent.xpc/Contents/MacOS/com.apple.WebKit.WebContent
    Identifier:      com.apple.WebKit.WebContent
    Version:         9600 (9600.1.17)
    Build Info:      WebKit2-7600001017000000~8
    Code Type:       X86-64 (Native)
    Parent Process:  ??? [1]
    Responsible:     Safari [364]
    User ID:         501
    Date/Time:       2014-10-22 11:08:40.126 -0600
    OS Version:      Mac OS X 10.9.5 (13F34)
    Report Version:  11
    Anonymous UUID:  C4AFA704-2C2F-2CAB-2A2D-CD5F959444A5
    Crashed Thread:  4  JavaScriptCore::Marking
    Exception Type:  EXC_BAD_ACCESS (SIGSEGV)
    Exception Codes: KERN_INVALID_ADDRESS at 0x0000000000800028
    VM Regions Near 0x800028:
    -->
    And here's what it says in Console > All Messages > Diagnostic and Usage Information:
    10/22/14 11:08:40.836 AM Safari[364]: CFPropertyListCreateFromXMLData(): Old-style plist parser: missing semicolon in dictionary on line 3. Parsing will be abandoned. Break on _CFPropertyListMissingSemicolon to debug.

    Hi Mike,
    Reinstall OS X. That will also reinstall a new copy of Safari for you.
    Start up your Mac while holding down the Command + R keys.
    From there you should be able to access the built-in utilities to reinstall OS X using OS X Recovery.

  • How to: Create a custom search result screen?

    Hello,
    Is it possible to create a custom search result screen?
    I need it because users must be able to directly e-mail the items, or a selection of the items, returned by the search.
    For this, modifying the standard result page isn't sufficient.
    Thanks,
    Steven.

    Steven, please see:
    http://technet.oracle.com/products/iportal/files/pdk/plsql/doc/sdk23pkg.htm

  • Remove K-Tel from search results?

    I'm doubtful, but wondering if there's a way to remove K-Tel from search results, or add a column showing record label so they can be easily ignored?

    Robert -
    Is your custom search doing an auto-query? If yes, check the Results Display tab in the portlet defaults and ensure there is no <Blank Line> in the Displayed Attributes section.
    If you are not doing an auto-query, check the properties on the Search Results pages in the Portal Design Time Pages Page Group.
    Regards,
    Candace

  • Outlook 2013 - moving emails from search results to folders.

    In Outlook 2013, when I try to move a few emails found in search results to various folders, the emails don't move to the folder I specify.
    No error message!

    Hi,
    What type of email account are you using?
    How did you move the emails? Via dragging/dropping or right-clicking and selecting Move? Please try both methods to see the result.
    I have seen a similar issue caused by a problematic add-in. Please try starting Outlook in safe mode to see if this issue continues. To do this, press Windows key + R to open the Run command, type outlook /safe and press Enter. If the problem is gone in safe mode, please go to File > Options > Add-ins and disable suspicious add-ins to fix the issue.
    Please let me know the result.
    Regards,
    Steve Fan
    TechNet Community Support
    It's recommended to download and install the Office Configuration Analyzer Tool (OffCAT), which is developed by Microsoft Support teams. Once the tool is installed, you can run it at any time to scan for hundreds of known issues in Office programs.
    Please remember to mark the replies as answers if they help, and unmark the answers if they provide no help. If you have feedback for TechNet Support, contact [email protected]

  • Re: How can I filter out email and website results from search results?

    I can't seem to add a further question to my original question.
    Although with your help I found how to select (and drag!) Spotlight categories, the next time I searched, it used different parameters from those I'd selected. For example, I'd unchecked Mail and web pages entirely, yet mail appeared second in the search results and web pages at the bottom.
    If I don't click that you helped me this time, it's because I want to keep the discussion open until I am SURE I've fixed it.
    Thank you.

    Setting 'Spotlight' preferences and using it to search seems to be the best answer, because it sorts results by type.
    Apple menu > System Preferences > Personal > Spotlight > Search Results (or clicking Spotlight Preferences at the bottom of an active search) lets you control search results by file type quite specifically.
    Message was edited by: kostby

  • Sorry, something went wrong - Open Office File from Search Results Page with Office Web Apps

    Hi,
    I'm getting a "Sorry, something went wrong" error when I try to open any Office document from the Search Results page with Office Web Apps; the same error also appears in the document preview.
    The error in the SharePoint logs says that the file cannot be found.
    Please note that this error only occurs when the file name of the document is not written in English (in my case it's written in Arabic characters).
    If I open it from the document library, it opens properly with no errors.
    The only difference between the two URLs (the document URL in Search Results and in the Document Library) is the value of the "sourcedoc" attribute:
    In the Search Results page, the file name in the "sourcedoc" attribute is kept as is, with Arabic characters.
    In the Document Library, the file name in the "sourcedoc" attribute is converted into different characters (something like: "B9%D9%85%D9%8").
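    The converted value looks like percent-encoding: each UTF-8 byte of the Arabic name is written as %XX (for instance, %D9%85 is the Arabic letter meem, U+0645). A small sketch of that conversion using Ruby's standard uri library; that SharePoint encodes exactly this set of characters is an assumption, and the sample file name is hypothetical.

```ruby
require 'uri'

# Percent-encode a file name as a URL-safe "sourcedoc" value might be:
# every byte outside A-Z a-z 0-9 * - . _ becomes %XX.
# encode_www_form_component turns spaces into "+", so switch those to %20.
def encode_sourcedoc(file_name)
  URI.encode_www_form_component(file_name).gsub('+', '%20')
end

name    = "\u062A\u0642\u0631\u064A\u0631.docx" # hypothetical Arabic file name
encoded = encode_sourcedoc(name)
puts encoded
puts URI.decode_www_form_component(encoded) == name # round-trips back
```

    Comparing the "sourcedoc" value in both URLs against such an encoded form may help confirm whether the Search Results link is simply missing this encoding step.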
    Anybody have an idea on how to overcome this issue, implement a workaround or modify the "sourcedoc" behavior?
    Thanks in advance.
    Hamza AlSughier

    Dear Wendy,
    Thanks for your efforts, I already tried your last suggestion before, but this didn't solve my problem.
    Actually my end users are accessing this portal using ADFS and HTTPS.
    Finally I solved this. Here is what I did to get the overall solution working:
    - First, I configured Windows Authentication and ADFS Authentication on the same zone, which is the Default zone.
    - The issue with opening Office documents with Arabic file names disappeared as a result of this first change.
    - Then I faced an issue: we were not able to crawl content under the Default zone, which we needed to do. After a lot of effort, I found that it was related to the load balancer/proxy, so I made the crawler server crawl itself (http://CrawlServerName:PortNumber).
    - A change to Alternate Access Mapping was also needed: I set one of the extended zones (which runs Windows Authentication only) as the Internal URL for the Default zone, and this was the URL I used for crawling.
    I also configured Server Name Mappings to make sure we get proper URLs in search results.
    - Then we faced another issue: the authentication selection on the login page (how to bypass this page and authenticate automatically using ADFS). I used this solution (set a custom sign-in page):
    http://0ut0ftheb0x.wordpress.com/2014/01/04/skip-the-authentication-selection-page-at-_logindefault-aspx-in-a-mixed-authentication-environment/.
    - I faced one more issue as a result of the above workaround: Sign Out wasn't working well, and users were logged back in automatically whenever Sign Out was clicked. I solved this by modifying the "Sign Out" control under _layouts so that it redirects users to the ADFS sign-out page instead of the SharePoint sign-out page (I know it's not recommended, but I don't have any other option).
    Hamza AlSughier

  • How to exclude Web App from search results

    Hi
    Search results link to an unstyled web app item instead of the actual page it resides in.
    Please do this:
    1. go to: http://kinship.businesscatalyst.com/
    2. search for "Michael" on the top global search
    3. on the search results page click on the name (link).
    4. You will see Michael's web app item, not the actual page it resides in (http://kinship.businesscatalyst.com/About/the-team)
    How to avoid getting web app results in search?
    Thanks
    Micha

    Hi Micha
    Just add "&OT=35" to the end of the action URL in your search form:
    Example:
    <form name="xxxx" method="post" action="/Default.aspx?SiteSearchID=3566&amp;ID=/results&OT=35">
    <div class="search-box"><input type="text" class= ............../>
    <input type="submit" class="cat_button" value="search" />
    </div>
    </form>
    Here are the rest of the content types IDs, should you come across similar situations in the future:
    Web Pages = 1
    Literature = 6
    Announcements = 7
    FAQs = 9
    Forums = 43
    Blogs = 55
    Web Apps = 35
    Catalogs = 26
    Bookings = 48
    You can exclude multiple areas from a search by listing them with commas: &OT=35,1,6
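    If the form action is generated in code, keeping the IDs above in a lookup table makes the excluded areas readable. A small sketch; `CONTENT_TYPE_IDS` and `exclusion_param` are hypothetical names, and the IDs are the ones listed in this reply.

```ruby
# Content-type IDs as listed above (hypothetical constant name).
CONTENT_TYPE_IDS = {
  'Web Pages' => 1, 'Literature' => 6, 'Announcements' => 7, 'FAQs' => 9,
  'Forums' => 43, 'Blogs' => 55, 'Web Apps' => 35, 'Catalogs' => 26,
  'Bookings' => 48
}.freeze

# Build the "&OT=" suffix; fetch raises KeyError on an unknown name.
def exclusion_param(*names)
  ids = names.map { |name| CONTENT_TYPE_IDS.fetch(name) }
  "&OT=#{ids.join(',')}"
end

# Raw URL; inside an HTML attribute, "&" should be written as "&amp;".
action = '/Default.aspx?SiteSearchID=3566&ID=/results' + exclusion_param('Web Apps')
puts action
```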

  • SharePoint 2010 Content from top level site hidden from search results on subsite.

    Hi Everyone,
    The issue I am having is that when a group of people are in their site, which we'll call site B, and they use the search function, they only get results from site B and subsites of site B, but nothing from the top-level site, site A. Does anyone know of any way to get results from site A when searching in site B? This would really help me out a lot.
    Thanks!
    Best regards, Mike

    Hi Mike,
    According to your description, my understanding is that you want to display the content in top level site in search results page when doing search in subsites.
    By default, the search scope is set to This Site in all sites, so the search scope in subsites is limited to the subsites.
    I recommend creating a Search Center and then editing the Search settings in the top-level site settings.
    Thanks,
    Victoria
    Forum Support
    Victoria Xia
    TechNet Community Support

  • Index created but no search results

    After a bit of trouble I dropped and then recreated the interMedia indexes for the documents in my portal content areas. I had too many documents to use the web interface, so I used the PL/SQL scripts (ctxdrind.sql and ctxcrind.sql) to drop and recreate the indexes. The indexes show up in my database with 1.33 million rows of data, but nothing is ever returned in the search results from the portal. The search was working but stopped updating itself a few weeks ago because the process was killed during a server move. Since it had been a while since it was synced, I dropped and recreated the indexes, but now nothing is returned. Any ideas?
    Thanks,
    Ben

    interMedia Text is now Oracle Text; you will want to post this question in the Oracle Text forum:
    http://forums.oracle.com/forums/forum.jsp?id=477576
    Someone from Oracle Text may drop in here every now and then, but that forum is a safer bet.
    You may want to post it in the portal forum as well.
    Larry

Maybe you are looking for

  • Password change fails in SQL Developer with verify function...

    A couple of months ago I enforced a password verify function on our 11.2.0.3 databases and also one legacy 10.2.0.4 database. At the time I tested on my account (which had elevated privileges...doh!).   Now some users are hitting expiry, they can't c

  • How to catch ALL Exception in ONE TIME

    Let me explain my issue: I'm making a program with classes, Swing, threads... For every action in my graphical application I use a new thread, and I want to capture, in my startup program, all unknown exceptions and then display them with a JOptionPa

  • How can I build a virtual network on Mavericks, please?

    I'd like to run Mavericks and OS X Server on my Mac mini and be able to connect them to simulate a network. I bought VMware Fusion 6 for this and think I got it wrong. I can run both as separate virtual machines but I can't get them to connect to each

  • Spot Removal VERY slow in LR CC

    Doing my tests and the spot removal is atrociously slow...what gives?

  • JScrollPane policy

    I have a GuiPanel which I add to a JScrollPane policy. If I have the following (where p is my GuiPanel): add(new JScrollPane(p)) The vertical scroll bar never appears. If I have: add(new JScrollPane(p,JScrollPane.VERTICAL_SCROLLBAR_AS_NEEDED,JScrollP