Parser for experimental search engine

We are building an experimental search engine at the moment and are looking for a good parser. The HTML parser needs to extract the content from HTML pages while keeping the tag data. The results have to be stored in a MySQL database.
Who can help?

The JDBC tutorial...
http://java.sun.com/docs/books/tutorial/jdbc/index.html
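
To make the two halves of the question concrete, here is a rough sketch, not a definitive implementation. It assumes the HTML parser that ships with the JDK (javax.swing.text.html.parser.ParserDelegator) is good enough for the pages in question, and it records each run of text together with the tag that encloses it. The class name TaggedTextExtractor, the table page_text and its columns url, tag and content are all made up for illustration; only the JDBC calls themselves come from the standard API covered in the tutorial above.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

class TaggedTextExtractor {
    /** One run of page text together with the name of the tag that encloses it. */
    static final class Chunk {
        final String tag;
        final String text;
        Chunk(String tag, String text) { this.tag = tag; this.text = text; }
    }

    /** Parses the HTML and returns every non-blank text run with its enclosing tag. */
    static List<Chunk> extract(String html) {
        final List<Chunk> chunks = new ArrayList<>();
        final Deque<String> openTags = new ArrayDeque<>();
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleStartTag(HTML.Tag t, MutableAttributeSet a, int pos) {
                openTags.push(t.toString());
            }
            @Override
            public void handleEndTag(HTML.Tag t, int pos) {
                // Pop down to the matching open tag so sloppy nesting does not corrupt the stack.
                String name = t.toString();
                if (openTags.contains(name)) {
                    while (!openTags.pop().equals(name)) { /* discard unclosed inner tags */ }
                }
            }
            @Override
            public void handleText(char[] data, int pos) {
                String text = new String(data).trim();
                if (!text.isEmpty()) {
                    String tag = openTags.isEmpty() ? "" : openTags.peek();
                    chunks.add(new Chunk(tag, text));
                }
            }
        };
        try {
            new ParserDelegator().parse(new StringReader(html), callback, true);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return chunks;
    }

    /** Stores the chunks through JDBC; the table and column names are hypothetical. */
    static void store(Connection conn, String url, List<Chunk> chunks) throws SQLException {
        String sql = "INSERT INTO page_text (url, tag, content) VALUES (?, ?, ?)";
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            for (Chunk c : chunks) {
                ps.setString(1, url);
                ps.setString(2, c.tag);
                ps.setString(3, c.text);
                ps.addBatch();
            }
            ps.executeBatch();
        }
    }
}
```

The store method needs a Connection obtained from DriverManager with a MySQL JDBC driver on the classpath, which is not shown here. Keeping one row per text run is only one possible layout; it makes it easy to weight text by tag later (for example, ranking h1 text above p text).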

Similar Messages

  • I like to use Google for my search engine but it has been replaced with Amazon. How do I get google back?

    In the upper right corner of the Firefox screen, there is a box for a search engine search. I used to have google there, but recently it somehow changed to Amazon.com. How do I switch it back to google?

    OK, might want to make that change in your profile to get faster answers to future questions. If you leave the "iOS6" that's now in your profile, helpers will have to ask the OS question every time you post, slowing your getting help you deserve. Note the level of detail I was able to cram in my Equipment profile (at the bottom of this post).
    I don't  have 10.7 on any of our Macs but I suspect a system update may have removed that iPhoto option. However, you can still get to your iPhoto files if 10.7 is anything like 10.8.
    In System Preferences > Desktop and Screensaver > Screensaver tab, there is a "Source" dropdown menu directly under the preview pane. Click it and select "Choose Folder..":
    That gives you a dialog box that should point to your "Pictures" folder. Within that is your iPhoto library. Select it and see if that gives you access to your images.

  • Adding keywords and phrases for high search engine hits

    I'm new to iWeb and most things seem to be working. I have one question and one problem, hopefully you guys can help out.
    Question: how do I add html code that is not visible on the page but that will help Google and other search engines find my page using keywords, phrases, etc.? Please assume I know very little about this and need the holding hand explanation.
    Problem: I have a media file portion on my website that displays pictures (working very well) and one that is supposed to show a movie (not working at all). The movie was made with iMovie and has an .m4v extension. When you click on the link for the movie it shows a blank screen on Firefox and a Quicktime logo with a big question mark on Safari. I'm attempting to view the website on my Mac with the latest Quicktime software on it but it doesn't work. Any ideas on how to fix it?
    Some additional info: the hosting provider is 1and1.com and I have a Windows server package (unfortunately). Also, if I publish the site to a folder the movie area works flawlessly, it just does not work on the hosted site.
    Thanks!!

    You don't need to do post-publication manual editing of your site to add keywords and other metadata. You can use the free iWeb SEO Tool. If you're publishing to MobileMe you can add them after publication. If you're publishing to a commercial hosting server you'll need to publish to a folder on the desktop, run SEO Tool and then use a 3rd party FTP client like the free Cyberduck to upload your site.
    If you have to republish SEO remembers what you added and can quickly add them again before uploading.
    OT

  • Safari 6 dropdown menu for switching search engines gone? Nooo!

    Safari 6 no longer has dropdown menu for switching quickly between search engines? I know how to set/change the default search engine but I like/use some features on Google and others on Bing.  I do research all day - this stinks!  (They got rid of the snapback feature too.  Are they nuts?  The Reader feature is nice, but doesn't make up for losing these other things - would go back to previous version if it weren't such a PITA to do it) 
    Is there a quick workaround - something faster than loading the alternative search engine main page and inputting search field info all over again?

    download the latest version of Glims... now works with Mountain Lion and Safari 6
    http://wiki.machangout.com/howdoi/glims-development-build
    haven't had any problems with it .... brings back favicons, too

  • Optimising video for web search engine results

    I have little experience of producing video, so by default I have little experience or knowledge of Premiere Pro. I am here to ask a question on the back of a conversation I have just had with a client who is fairly video savvy, but not entirely Premiere Pro savvy.
    He is under the impression that Premiere Pro has a facility to analyse the content of a video and then automatically create a metadata file which can be used for search engine ranking. Is this so? Does Premiere Pro have this capability?
    Thanks in advance.
    Mat

    > Is that what's being asked for, though?  I understood the OP wants Google to scan the video itself for metadata and use that for rankings.
    Yep. The tutorials that I linked to show how to embed the speech-to-text metadata in the video and then extract it for search within Flash Player.
    > Actually whether or not any Adobe app is capable of such, I think the bigger question is can a search engine actually scan video files, or do they just scan web pages?
    The search engine companies are all moving toward using metadata within video files, but at different speeds and with different degrees of transparency. For now, I don't think that it's much used. But the time is coming (and is already here in a minimal, primitive form.)
    The reason that I then suggested using the metadata as something to put as text on an HTML page was that I thought that this might be more valuable for the current state of search engine technology.

  • Today's update seems to have modified the interface for the search engine bar to icons rather than a list. Is there any way I can change it back?

    I know everyone says they "hate" changes to familiar interfaces at first before growing accustomed to them, but I just personally don't care for the icon-tray look in anything I use (it seems to defeat at least some of the point of using Internet on an actual computer rather than on a mobile phone). I also sort of have a lot of search engines up there, not all of which I can immediately recognize by icon. I know it's a minor inconvenience, but having to scroll over an icon to see the name of its associated search engine is pretty annoying after being used to having a nice list that just told you all of their names. Any way that I can change it back to the old list-interface?

    Thank you, much appreciated!

  • Trouble integrating SAX parser in a servlet for mini search engine.

    OK, what I'm trying to do is create a servlet that searches RSS feeds for titles that match the query, parses them using SAX, and writes them out as HTML so that the user sees them. The problem is getting Ant to compile it: syntax errors are holding me back, for example around the out statements in endElement, and I still need an algorithm that checks whether the search query occurs in the contents of the title tag, so that only the matching titles are printed. Can someone please help? I tried searching for this and came up empty-handed.
    P.S.: I started working with servlets about a month ago, so my knowledge is pretty limited. Also, how would I go about searching multiple feeds for particular titles?
    Here's the code:
    import javax.servlet.*;
    import javax.servlet.http.*;
    import java.net.URL;
    import java.io.*;
    import java.util.*;
    import org.xml.sax.SAXException;
    import org.xml.sax.XMLReader;
    import org.xml.sax.helpers.XMLReaderFactory;
    import org.xml.sax.*;
    public class SearchAndDeliver extends HttpServlet implements ContentHandler {
        String queryString=null;
        boolean inTitleElement;
        boolean inLinkElement;
        String title=null;
        String link=null;
        String comparison=null;
        java.io.PrintWriter out = response.getWriter();
        public void startElement(String namespaceURI, String localName, String qualifiedName, Attributes atts)
        throws SAXException {
            if(localName.equals("title")) {
                title="";
                inTitleElement=true;
            }
            if(localName.equals("link")) {
                link="";
                inLinkElement=true;
            }
        }
        public void endElement(String namespaceURI, String localName, String qualifiedName) {
            if(localName.equals("title")) {
                System.out.println(title);
                inTitleElement=false;
            }
            if(localName.equals("link")) {
                System.out.println(link);
                inLinkElement=false;
            }
            comparison=title.indexOf(queryString);
            try {
                if(queryString.equals(comparison))
                    out.println("<a href=\""+link+"\""+">"+title+"</a><br>");
                else
                    out.println(" Your search of "+queryString+" produced no results.");
            } catch (IOException e) {
                // TODO Auto-generated catch block
                e.printStackTrace();
            }
        }
        public void characters(char[] text, int start, int length)
        throws SAXException {
            if(inTitleElement)
                title+=String.valueOf(text,start,length);
            if(inLinkElement)
                link+=String.valueOf(text,start,length);
        }
        public void setDocumentLocator(Locator locator) {}
        public void startDocument()
        throws SAXException {
            inTitleElement=false;
            inLinkElement=false;
        }
        public void endDocument() {}
        public void startPrefixMapping(String prefix, String uri) {}
        public void endPrefixMapping(String prefix) {}
        public void ignorableWhitespace(char[] text, int start,int length) throws SAXException {}
        public void processingInstruction(String target, String data){}
        public void skippedEntity(String name){}
        public void doPost(HttpServletRequest request, HttpServletResponse response)
        throws ServletException, java.io.IOException {
            //XMLReader parser=new SAXParser();
            XMLReader parser=XMLReaderFactory.createXMLReader();
            queryString = request.getParameter("query");
            response.setContentType("text/html");
            out.println("<html>");
            out.println("<head>");
            out.println("</head>");
            out.println("<body>");
            out.println("<h2>Headlines from The New York Times Arts Section.</h2>");
            try {
                parser.setContentHandler(this);
                parser.parse("http://www.nytimes.com/services/xml/rss/nyt/Arts.xml");
            } catch(SAXException e) {
                //System.out.println(args[0]+" is not well formed");
                System.out.println(e.getMessage());
            } catch(IOException e) {
                System.out.println("Due to an IOException, the parser could not check"+parser);//I don't know what to make of this as well.
            }
            // do-nothing methods
            out.println("</body>");
            out.println("</html>");
            out.close();
        }
        public void doGet(HttpServletRequest request, HttpServletResponse response)
        throws ServletException, java.io.IOException {
            doPost(request,response);
        }
    }

    Here are the errors that I'm getting; the rest are within the code, since there's something wrong with the syntax and structure itself.
    ant
    Buildfile: build.xml
    prepare:
    compile:
    [javac] Compiling 1 source file to C:\Program Files\Apache Software Foundation\Tomcat 5.0\webapps\searchanddeliver\build\WEB-INF\classes
    [javac] C:\Program Files\Apache Software Foundation\Tomcat 5.0\webapps\searchanddeliver\src\SearchAndDeliver.java:20: cannot resolve symbol
    [javac] symbol : variable response
    [javac] location: class SearchAndDeliver
    [javac] java.io.PrintWriter out = response.getWriter();
    [javac] ^
    [javac] C:\Program Files\Apache Software Foundation\Tomcat 5.0\webapps\searchanddeliver\src\SearchAndDeliver.java:48: incompatible types
    [javac] found : int
    [javac] required: java.lang.String
    [javac] comparison=title.indexOf(queryString);
    [javac] ^
    [javac] 2 errors
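
    Both javac errors have direct causes. The response object only exists as a parameter of doPost/doGet, so response.getWriter() has to be called inside those methods rather than in a field initializer. And String.indexOf returns an int position (-1 when the text is absent), so it cannot be assigned to a String; the match test is indexOf(...) >= 0. Below is a rough sketch of the matching part on its own, without the servlet API so that it runs with just the JDK. The class name RssTitleSearch and the method search are made up for illustration, and it assumes the feed puts each headline in an item element with title and link children.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

class RssTitleSearch extends DefaultHandler {
    private final String query;
    private final List<String> hits = new ArrayList<>();
    private final StringBuilder text = new StringBuilder();
    private boolean inItem;
    private String title;
    private String link;

    RssTitleSearch(String query) {
        this.query = query.toLowerCase();
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if (qName.equals("item")) {
            inItem = true;
            title = null;
            link = null;
        }
        text.setLength(0); // start collecting the text of this element
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length); // may be called several times per element
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (!inItem) {
            return; // ignore the channel-level title and link
        }
        if (qName.equals("title")) {
            title = text.toString().trim();
        } else if (qName.equals("link")) {
            link = text.toString().trim();
        } else if (qName.equals("item")) {
            // indexOf yields an int; >= 0 means the query occurs in the title
            if (title != null && title.toLowerCase().indexOf(query) >= 0) {
                // Values are not HTML-escaped in this sketch.
                hits.add("<a href=\"" + link + "\">" + title + "</a><br>");
            }
            inItem = false;
        }
    }

    /** Parses one feed and returns an HTML link line for every matching title. */
    static List<String> search(InputSource feed, String query) {
        RssTitleSearch handler = new RssTitleSearch(query);
        try {
            SAXParserFactory.newInstance().newSAXParser().parse(feed, handler);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Could not parse feed", e);
        }
        return handler.hits;
    }
}
```

    In the servlet, doPost would then obtain the writer locally with response.getWriter(), call search(new InputSource(feedUrl), request.getParameter("query")) and print each returned line, with the "no results" message printed once when the list is empty. For multiple feeds, the same call can simply be repeated in a loop over the feed URLs, since each call uses a fresh handler.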

  • Optimise galleries for web search engines

    This is a re-post. Apologies, but I had little response earlier and the subject is important to me (and to many others too, I would have thought).
    I have IT skills but am no web designer. I had someone create a simple photography web site using Flash, with the page design looking similar to a Lightroom Flash web gallery template. (See www.duncangrove.com) I am now able to easily update my galleries using no specialist skills, which is great. My concern though is that the galleries do not seem to be "search friendly". The site itself always shows up in Google but the images do not when searching for a specific phrase which is relevant to the title/keywords. I suspect I could do something to each of the gallery's Index.html page to add meta tags but since a new index.html page is created every time I add a new image, this would be very time consuming.
    The issue really came to light when a client approached me to purchase an image for a large sum of money. He had found the image on my old Picasa site that I have not bothered with for years (and am ashamed of since it looks amateurish.) A Google search for the subject matter lists it fairly high up the first search page. My "proper site" does not even show up in the same Google search, even though the image titles contain the key words that were searched for. Does Google focus on key words embedded in the images or on the image titles? Would an html version alongside the flash version help with search results?
    I would appreciate any comments/suggestions.
    Regards
    Duncan Grove ARPS

    Google focuses on image titles.
    I couldn't find any flash pages on www.duncangrove.com to look at.

  • Code for a search engine front end

    I am implementing lucene on my site, i wrote some code that shows something like this:
    displaying results 11-20 of 3434
    1. url / description
    2. url / description
    page 1, 2, _3_, 4 .. 25 [next] [prev]
    for some reason i found this not so easy to write as i thought. calculating all those ranges etc. (the results per page is user defined). Now i am looking at my code and it seems ugly.
    so im wondering if anyone knows of an example implementation of that in java / jsp?
    i looked at nutch but its search front end is a mess of jsp java includes and is plain ugly.

    here is the code that figures out the first and last result on the page based on how many results are found, how many are on a page, and what page we are on.
    totalResults  = results.length();
    double totalPagesDbl = ((double)totalResults)/((double)resultsPerPage);
    totalPages = (int)Math.ceil(totalPagesDbl);
    if(currentPage >= totalPages) {
         currentPage = totalPages;
    }
    resultsPerPage = (resultsPerPage <= totalResults)?resultsPerPage:totalResults;//how many on this page
    endingResult = resultsPerPage * currentPage;
    if(endingResult > totalResults) {
         startingResult = endingResult - resultsPerPage;
         endingResult = totalResults;
    } else {
         startingResult = endingResult - resultsPerPage;
    }
    if(startingResult > endingResult) {
         startingResult = endingResult -1;
    }
    it used to be worse, i rewrote it
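
    One way to tidy this up is to keep the range arithmetic in a small value class and use integer math throughout. This is only a sketch under assumptions: the class name PageRange and its fields are made up, pages are numbered from 1, and first and last are the 1-based positions shown to the user (both 0 when there are no results).

```java
class PageRange {
    final int totalResults;
    final int totalPages;
    final int currentPage; // 1-based, clamped into [1, totalPages]
    final int first;       // 1-based index of the first result on the page
    final int last;        // 1-based index of the last result on the page

    private PageRange(int totalResults, int totalPages, int currentPage, int first, int last) {
        this.totalResults = totalResults;
        this.totalPages = totalPages;
        this.currentPage = currentPage;
        this.first = first;
        this.last = last;
    }

    static PageRange of(int totalResults, int resultsPerPage, int requestedPage) {
        if (resultsPerPage < 1 || totalResults < 0) {
            throw new IllegalArgumentException("resultsPerPage must be >= 1 and totalResults >= 0");
        }
        // Ceiling division without floating point; an empty result set still has one (empty) page.
        int totalPages = Math.max(1, (totalResults + resultsPerPage - 1) / resultsPerPage);
        // Clamp out-of-range page requests instead of failing.
        int page = Math.min(Math.max(requestedPage, 1), totalPages);
        int first = totalResults == 0 ? 0 : (page - 1) * resultsPerPage + 1;
        int last = Math.min(page * resultsPerPage, totalResults);
        return new PageRange(totalResults, totalPages, page, first, last);
    }

    String summary() {
        return "displaying results " + first + "-" + last + " of " + totalResults;
    }

    boolean hasPrevious() { return currentPage > 1; }

    boolean hasNext() { return currentPage < totalPages; }
}
```

    With 3434 results and 10 per page, PageRange.of(3434, 10, 2).summary() gives "displaying results 11-20 of 3434". Note that Lucene hits are indexed from 0, so the loop that fetches documents would run from first - 1 up to, but not including, last.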

  • The URL bar can also be used for a search engine, but when I search something a page pops up reading "The URL is not valid."

    When I click on the URL bar it says to insert address or search, yet when I search it says the URL is not valid, which is obvious because I didn't put in a URL, I searched something. I never installed an add on or anything. It just quit working one time and hasn't worked since.

    Does it happen to all the search words or particular words?
    It may be a malware issue too.
    The Reset Firefox feature can fix many issues by restoring Firefox to its factory default state while saving your essential information.
    Note: This will cause you to lose any Extensions, Open websites, and some Preferences.
    To Reset Firefox do the following:
    1. Go to Firefox > Help > Troubleshooting Information.
    2. Click the "Reset Firefox" button.
    3. Firefox will close and reset. After Firefox is done, it will show a window with the information that is imported. Click Finish.
    4. Firefox will open with all factory defaults applied.
    Further information can be found in the "Reset Firefox – easily fix most problems" article.
    Did this fix your problems? Please report back to us!
    Thank you.

  • Search engine within a site

    I know it has to do with the meta keywords in the html file, but how do i parse it. Do i have to make my own meta-keywords parser?
    Or is there an easier way to do this?

    So you want to parse HTML pages and index them for a search engine?
    There are HTML parsers out there that can transform an HTML page into some kind of object model that you can then traverse.
    However, in practice many web pages are coded incorrectly, and this causes many parsers to hiccup from time to time. You could search for some Java HTML parsers on www.google.com, but if I were you, I'd load the HTML page into a String, search for the <META> tag, and then parse its attributes manually.
    Let me know if you want more detailed help.
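
    As a rough illustration of the manual approach, here is a sketch that pulls the keywords out of a page already loaded into a String. It is not a full HTML parser: it assumes the attribute values are quoted and contain no ">" character, and the class name MetaKeywords is made up for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MetaKeywords {
    // Matches a whole <meta ...> tag, in any letter case.
    private static final Pattern META_TAG =
            Pattern.compile("<meta\\b[^>]*>", Pattern.CASE_INSENSITIVE);
    // Matches name="value" or name='value' inside a tag.
    private static final Pattern ATTRIBUTE =
            Pattern.compile("([a-zA-Z-]+)\\s*=\\s*(?:\"([^\"]*)\"|'([^']*)')");

    /** Returns the comma-separated entries of every meta tag whose name is "keywords". */
    static List<String> keywords(String html) {
        List<String> result = new ArrayList<>();
        Matcher tag = META_TAG.matcher(html);
        while (tag.find()) {
            String name = null;
            String content = null;
            Matcher attr = ATTRIBUTE.matcher(tag.group());
            while (attr.find()) {
                String value = attr.group(2) != null ? attr.group(2) : attr.group(3);
                if (attr.group(1).equalsIgnoreCase("name")) {
                    name = value;
                } else if (attr.group(1).equalsIgnoreCase("content")) {
                    content = value;
                }
            }
            if ("keywords".equalsIgnoreCase(name) && content != null) {
                for (String keyword : content.split(",")) {
                    String trimmed = keyword.trim();
                    if (!trimmed.isEmpty()) {
                        result.add(trimmed);
                    }
                }
            }
        }
        return result;
    }
}
```

    The returned list can then be fed into whatever index the site search uses. Pages with unquoted attribute values would need a more forgiving attribute pattern or a real HTML parser.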

  • How can I add a search engine to Safari in Mavericks?

    Safari only gives me three choices for a search engine:
       Google
       Yahoo
       Bing
    I would like it to default to StartPage, or maybe even Duck Duck Go.
    How do I add sites to the drop down menu?

    Glims is a 3rd party addon that will let you add many different search engines to Safari's menu bar:
    Just check those you want to use in the Safari/Glims preference pane:
    OT

  • Is the meta data in google search engine changeable?

    Hi there,
    I have made a website using iweb, published with mobile me and using web forwarding. The website when seen in google comes up with the following meta data to describe the site:
    GALLERY shapeimage10_link0. ABOUT shapeimage11_link0. HIRE shapeimage12_link0. CONTACT shapeimage15_link0.
    the website is catebakescake.co.uk
    How can I change this? I have been onto iwebFAQ and it seems to say you cannot change the meta data and google looks at your page and describes it from the info on the page; but I am sure there must be a way to change the information as for some reason it is looking at the shape files of the page and not the text! Obviously I just want to put information about the site which explains what the site is about. I thought I had done this as I put the descriptions in when publishing the site with the web hosting company.
    Thanks in advance

    You can use iWeb SEO Tool to add tags and meta data but what you really need to do is write some interesting and relevant text for the search engine spiders to read.
    The problem with your landing page is that there is no actual text for the spiders to read. Your "text" is an image so all they can pick up is the image titles and not the text.
    Read more about SEO here....
    http://www.iwebformusicians.com/SearchEngines/SEO.html
    ... and SEO Tool here....
    http://www.iwebformusicians.com/SearchEngines/Tags.html
    "I may receive some form of compensation, financial or otherwise, from my recommendation or link."

  • Web Site Search Engine

    Hi,
    I was wondering if someone could help me locate code for a search engine for my web site. My dilemma is that most of the search engines I've located search the web. This is not what I want. I'm looking for a search engine where you can select from a menu display and then hit search and it will return results in a text format from my web site only.
    If you're not sure what I'm looking for, an example can be found at www.westernvirtualairlines.com
    They use the Quick Flight Search which is exactly what I would like to use.
    If you enter Salt Lake City in the "from" box, and Los Angeles in the "to" box it will return the results in the same format I wish to use.
    Any help would be awesome and greatly appreciated.
    Thanks for any help and assistance.
    jak62562

    Well, you can use Google to search your own web pages, but I don't think that's quite what you're asking.
    That site you pointed to undoubtedly has its own database set up, so when you click search, it searches through their database and returns any relevant data to your search. If this is what you're attempting to accomplish, there's not going to be any script you can just cut and paste and have everything magically work for you.
    While you may not see it, those aren't simply HTML pages you're looking at. That site uses Active Server Pages, or ASP, Microsoft's parallel to JSP (Java Server Pages). These pages query the database, organize the result and display it on the page. If you want something this complex, you'll either need to invest some serious time into learning a technology that can handle it, or hire a developer to do it for you.

  • 6.1 search engine customization doc error

    I'm not sure who to report this to, but there seems to be an error in the documentation for the search engine customization examples.
    The page is at http://docs.sun.com/source/817-1831-10/agsearch.html
    The section in question says, "The following sample code lists the top ten articles on Java Web Services on a site"
    The first line of HTML/JSP says:
    <s1ws:search Collection="Articles" Query="Java Web Services" />
    Note that "Collection" and "Query" are uppercase.
    If you copy that whole example block into a JSP and run it, you get server errors.
    On my machine (Win 2000), the error messages say the attributes are invalid according to the TLD. In the TLD, they're lowercase and if I change them to lowercase, the page runs (after having changed the collection to match my local config.)
    If that saves anybody any headache time, so be it.
    Dave

    Thanks for the note, David. It's been forwarded along.
