Indexed substring search

Let's say I have lots of instances of the following class:
   public class Bob {
      public final String name;
      public Bob(String name) {
         if (name == null) throw new IllegalArgumentException();
         this.name = name;
      }
   }
The name of a bob can be pretty much anything, but usually it is 5-50 characters long and consists of ordinary words.
E.g.
   List<Bob> bobs = new ArrayList<Bob>();
   bobs.add(new Bob("I am bob"));
   bobs.add(new Bob("foo bar"));
   bobs.add(new Bob("once upon a time there was a bob with a long name, and he's me"));
How do I index the bobs so that I can search for the set of all bobs whose names contain a given substring? E.g., searching for "am" should result in a set containing the first ("I am bob") and the third ("...long name...") of the three bobs above.
If the substring is a whole word or at the beginning of one then it's easy; just keep a list of ⟨String word, Bob[] bobs⟩ tuples (each one listing all bobs whose name contains the word), sorted by word. Then you can do a binary search for the words you're interested in and put all matching bobs in a set that you'll then return.
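The word-based index described above could be sketched roughly as follows. This is only an illustration under a few assumptions: names are split on whitespace, plain strings stand in for Bob instances, and a TreeMap plays the role of the sorted tuple list (its range lookup replaces the hand-written binary search). The class and method names (WordPrefixIndex, findByWordPrefix) are made up for this sketch.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical sketch of the sorted word -> bobs index; all names are illustrative.
class WordPrefixIndex {
    // TreeMap keeps the words sorted, standing in for the sorted tuple list.
    private final TreeMap<String, List<String>> byWord = new TreeMap<>();

    // Splits the name on whitespace and records the name under each of its words.
    void add(String name) {
        for (String word : name.split("\\s+")) {
            if (word.isEmpty()) continue; // leading whitespace yields an empty token
            byWord.computeIfAbsent(word, w -> new ArrayList<>()).add(name);
        }
    }

    // Returns every name containing a word that starts with the given prefix.
    Set<String> findByWordPrefix(String prefix) {
        Set<String> result = new LinkedHashSet<>();
        // Keys in [prefix, prefix + '\uffff') are the words starting with the prefix
        // (ignoring the corner case of a word containing '\uffff' itself).
        SortedMap<String, List<String>> range =
                byWord.subMap(prefix, prefix + Character.MAX_VALUE);
        for (List<String> names : range.values()) {
            result.addAll(names);
        }
        return result;
    }

    public static void main(String[] args) {
        WordPrefixIndex index = new WordPrefixIndex();
        index.add("I am bob");
        index.add("foo bar");
        index.add("once upon a time there was a bob with a long name, and he's me");
        System.out.println(index.findByWordPrefix("am"));
        System.out.println(index.findByWordPrefix("bo"));
    }
}
```

With the three example bobs, looking up "am" this way finds only "I am bob": the "am" inside "name" is not at a word boundary, which is exactly the limitation raised next.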
However, if the substring is not at a word boundary it gets a lot harder!
The fact that there are so many bobs (perhaps a million) also places some constraints on the relative size of the indexes. After all, we don't want the index data structures to consume many times more memory than the bobs themselves do.
Any ideas?
- Marcus Sundman

I came up with this. I don't know if it's a good solution for you or at all but it seems to work and minimizes the nodes down to the size of the alphabet.
It can surely be optimized and cleaned up. Feel free to modify. Let me know what you think:
import java.util.*;

public class FancyTree {
    public static void main(String[] args) {
        FancyTree tree = new FancyTree();
        tree.add("amble");
        tree.add("same");
        tree.add("name");
        tree.add("mean");
        tree.add("earn");
        tree.add("glean");
        tree.add("same meaning");
        tree.add("meanie");
        System.out.println("a: " + tree.findStrings("a"));
        System.out.println("i: " + tree.findStrings("i"));
        System.out.println("am: " + tree.findStrings("am"));
        System.out.println("sa: " + tree.findStrings("sa"));
        System.out.println("me: " + tree.findStrings("me"));
        System.out.println("samean: " + tree.findStrings("samean"));
        System.out.println("mean: " + tree.findStrings("mean"));
        System.out.println("ean: " + tree.findStrings("ean"));
        System.out.println("ea: " + tree.findStrings("ea"));
    }

    // One node per character code 0..255 (256 slots, so that code 255 fits).
    private Node[] nodes = new Node[256];

    // Creates one Link per character; each Link points back to the Link
    // of the preceding character in the same string.
    public void add(String s) {
        Link parent = null;
        for (int i = 0; i < s.length(); i++) {
            Link link = new Link(s, parent);
            getNode(s.charAt(i)).add(link);
            parent = link;
        }
    }

    private Node getNode(char c) {
        if (c > 255) throw new IllegalArgumentException();
        Node node = nodes[c];
        if (node == null) {
            node = new Node();
            nodes[c] = node;
        }
        return node;
    }

    // Walks the substring character by character, keeping only the links
    // whose predecessor matched the previous character.
    public Set<String> findStrings(String subString) {
        Collection<Link> links = getNode(subString.charAt(0)).getLinks();
        for (int i = 1; i < subString.length() && links.size() > 0; i++) {
            links = getNode(subString.charAt(i)).getLinks(links);
        }
        Set<String> strings = new HashSet<String>();
        for (Link link : links) {
            strings.add(link.getStringParent());
        }
        return strings;
    }
}

class Node {
    // Keyed by the link's parent: the previous character's Link, or the
    // String itself for the first character of a string.
    private Map<Object, Link> links = new HashMap<Object, Link>();

    public void add(Link link) {
        links.put(link.getParent(), link);
    }

    public Collection<Link> getLinks() {
        return links.values();
    }

    // Returns the links in this node that directly follow one of the given links.
    public Set<Link> getLinks(Collection<Link> parents) {
        Set<Link> set = new HashSet<Link>();
        for (Link parent : parents) {
            Link link = links.get(parent);
            if (link != null) set.add(link);
        }
        return set;
    }
}

class Link {
    private String stringParent;
    private Link linkParent;

    Link(String stringParent, Link linkParent) {
        this.stringParent = stringParent;
        this.linkParent = linkParent;
    }

    public Object getParent() {
        return linkParent == null ? (Object) stringParent : linkParent;
    }

    public String getStringParent() {
        return stringParent;
    }
}
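
Since the code above creates one Link object and one HashMap entry per character, it may be heavy for a million bobs. For comparison, here is a rough sketch of a different, well-known technique, a suffix array; it is not the FancyTree approach and was not proposed in this thread. It assumes plain strings instead of Bob instances, and the names (SuffixArrayIndex, find) are made up. The index itself costs one long per character of name text.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch: a suffix array over all names (not the FancyTree approach).
class SuffixArrayIndex {
    private final List<String> names;
    // Each entry packs (name index << 32 | offset) and stands for one suffix.
    private final long[] suffixes;

    SuffixArrayIndex(List<String> input) {
        this.names = new ArrayList<>(input);
        List<Long> all = new ArrayList<>();
        for (int n = 0; n < names.size(); n++) {
            for (int off = 0; off < names.get(n).length(); off++) {
                all.add(((long) n << 32) | off);
            }
        }
        // Plain comparison sort that builds substrings on every comparison:
        // acceptable for a sketch, too slow and wasteful for production use.
        all.sort((a, b) -> suffix(a).compareTo(suffix(b)));
        suffixes = new long[all.size()];
        for (int i = 0; i < suffixes.length; i++) {
            suffixes[i] = all.get(i);
        }
    }

    // Decodes a packed entry back into the suffix text it represents.
    private String suffix(long packed) {
        return names.get((int) (packed >>> 32)).substring((int) packed);
    }

    // Returns all names that contain the pattern anywhere.
    Set<String> find(String pattern) {
        // Binary search for the first suffix that is >= pattern.
        int lo = 0, hi = suffixes.length;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (suffix(suffixes[mid]).compareTo(pattern) < 0) lo = mid + 1;
            else hi = mid;
        }
        // Suffixes starting with the pattern form one contiguous run from here.
        Set<String> result = new TreeSet<>();
        for (int i = lo; i < suffixes.length
                && suffix(suffixes[i]).startsWith(pattern); i++) {
            result.add(names.get((int) (suffixes[i] >>> 32)));
        }
        return result;
    }

    public static void main(String[] args) {
        SuffixArrayIndex index = new SuffixArrayIndex(Arrays.asList(
                "I am bob",
                "foo bar",
                "once upon a time there was a bob with a long name, and he's me"));
        System.out.println(index.find("am"));
    }
}
```

A lookup is a binary search followed by a scan over the matching run, so a very short pattern such as "a" still touches every occurrence; whether that trade-off suits a million bobs would need measuring.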

Similar Messages

  • Substring search with Oracle context indexes

    Hi,
    I would like to know if it is possible to do a substring search with one of the options offered by the context indexes
    (ctxcat, ctxrule, context).
    Example:
    I would like to search for the word 'berub' in column A of table_example.
    The values in column A are:
    The betther
    berube
    A.berube
    berub
    Berub
    BERUB
    R berube
    S tartif
    Y Thibeault
    The rows returned should be:
    berube
    A.berube
    berub
    Berub
    BERUB
    R berube
    A simple SQL query could be
    select * from table_example where upper(a) like upper('%berub%' );
    How can I do the same thing with the context indexes and a select (catsearch, contains, matches), if it is possible?
    An example would be welcome.
    Thanks

    I know how to do an explain plan.
    My point is not the query I posted; it's just an example.
    I have many queries in production that we have optimized many times (they went from 3 min to 15 sec with optimisation, but we want better results). At this point we are looking at implementing the context indexes to make them more efficient.
    To make this SQL more efficient we have to deal with like '%xxxxxx%', and the context indexes look like an option, but we have to be able to do substring searches with them.
    Is it possible to do it, and how?
    This is my question and why I posted it here. The query is just a simple example to illustrate what I want.
    Thanks to anyone who can answer my question.

  • Spotlight searching no longer working - indexing and search disabled.

    I've been searching the web and tried everything:
    Server 10.5.8
    In Server Admin - the attached drive is a SharePoint with Spotlight search on.
    I've used mdutil to enable Spotlight.
    I've checked permissions.
    I can search the Boot Drive. I can't search the attached drive.
    mdutil returns indexing and search disabled when used to turn it on.
    very frustrating.
    Anyone out there have a clue?
    Thanks,
    Mark

    HI James,
    Open System Preferences/Spotlight and click the Privacy tab. Where you see; Delete any locations listed, Quit System Preferences and restart your Mac and see if you can use Spotlight.
    Spotlight Tips
    Spotlight: How to re-index folders or volumes
    Carolyn

  • How to show only Indexes in Search Scope

    Hi All,
    Is it possible to allow users to choose only Indexes in the search scope?
    I tried editing the Search Options set. There I found that I can enable and disable the search scope, but when the search scope is enabled both the Indexes and the folders are visible.
    I want to show only Indexes to the user. How can I hide just the folder options in the search scope?
    Thanks and Regards
    George

    Instead of using the Search Options set in the KM Search iView, use the Search Component Set.
    Create your own Search Component Set by going to System Administration > System Configuration > Knowledge management > Content Management > User Interface > Search > Search Components
    While creating the Search Component Set, specify Component for Search Options as search_input_indexes.
    This should help you.
    Or you could do the following.
    In the Search Options set, Do the following.
    1) Uncheck Enable Search Scope Selection.
    2) Provide the various index names in Search Index IDs (csv)
    The above option would by default search the selected indexes for the search term and would not provide the user with the functionality of searching in folders.
    Pradeep.

  • If Image is Saved As a Text, Would the Image Text, As a Link, Be Indexed by Search Engines.

    Hi, I want to put a text along the height of the brown box.
    The text would be vertical as one looking at the page.
    I believe with page make the text can be rotated.
    If Image is Saved As a Text, Would the Image Text, As a Link, Be Indexed by Search Engines.
    or how else to do this?
    an

    If Image is Saved As a Text
    Impossible. It's either image or text.  Save a JPEG as a Word document.  Try it.  Seriously, change the file extension, open it in Word and see what happens.
    Would the Image Text, As a Link,
    Huh?  You lost me here.
    Be Indexed by Search Engines.
    Indexing of search engines.  Yes, Google sees links and Google sees alt text for images.  Google does not see images nor Flash.

  • WebHelp Pro and FlashHelp Pro not allowing Substring Search?

    The parameter to enable the substring search is not showing up in the Webhelp Pro or FlashHelp Pro output format?
    I read through the Adobe help file and it states that it should be there.  We are trying out the RoboServer and am finding out that search results are not that accurate.  I have switched from server side to client side with no real changes.
    Thank you.
    JC

    When you report a problem you should always try to include as much relevant information as possible. Saying "it doesn't work, help me!" is not very helpful to us, because there could be a million reasons for it not working... What exactly happens when you try to convert a PDF to Word? Are there any error messages? How exactly are you trying to do it? Also, what operating system do you use? What version of Word? What exactly version of Acrobat? etc.

  • Problem with hiding TOC, Index and Search

    Hello,
    I want to generate a chm where the TOC, Index and Search are
    hidden when I open it.
    I can't find a way to do this.
    I tried creating a new "Window", but it doesn't work.
    If someone has a solution, please tell me,
    because I'm going to lose a lot of time on this...

    Hi vlavergne,
    In theory you should be able to hide the TOC, Index and
    Search simply by selecting
    Hide Nav Pane on Startup in the Window properties. However,
    after a bit of experimentation it seems that you also have to
    deselect the
    Remember Window Size and Position option.
    Hope this helps
    Anne

  • Getting the following error while trying to do BUILD INDEX for search in BCC

    Hi I am getting the following error while I am trying to build index through search administration from BCC.
    I am using windows7, Weblogic 10.3.2 and ATG9.2
    On BCC screen I am getting the message as : An unexpected error has occurred. Please try again later or contact system administrator.
    In the console logs, the following error is occurring:
    2011-10-06 18:26:58,319;;;org.apache.commons.digester.Digester.sax;;;DEBUG;endDocument()
    **** Error Thu Oct 06 18:27:35 CEST 2011 1317918455348 /atg/searchadmin/repository/service/SyncService No partition_step step found in task '700001' of type 'check' atg.search.exception.ObjectNotFoundException: SyncStepDefinition not found.. id=null, item-desciptor=null
    **** Error Thu Oct 06 18:27:35 CEST 2011 1317918455348 /atg/searchadmin/repository/service/SyncService at atg.searchadmin.repository.beans.methods.BaseSyncTaskMethods.setSyncStepDefinitionOption(BaseSyncTaskMethods.java:307)
    **** Error Thu Oct 06 18:27:35 CEST 2011 1317918455348 /atg/searchadmin/repository/service/SyncService at atg.searchadmin.repository.beans.methods.BaseSyncTaskMethods.setPartitionReuseType(BaseSyncTaskMethods.java:101)
    **** Error Thu Oct 06 18:27:35 CEST 2011 1317918455348 /atg/searchadmin/repository/service/SyncService at atg.searchadmin.repository.beans.methods.BaseSyncTaskMethods.setPartitionReuseType(BaseSyncTaskMethods.java:91)
    **** Error Thu Oct 06 18:27:35 CEST 2011 1317918455348 /atg/searchadmin/repository/service/SyncService at atg.searchadmin.repository.beans._SyncTaskDefinition_Impl.setPartitionReuseType(_SyncTaskDefinition_Impl.java:107)
    **** Error Thu Oct 06 18:27:35 CEST 2011 1317918455348 /atg/searchadmin/repository/service/SyncService at atg.searchadmin.adminui.formhandlers.EstimateIndexSummaryFormHandler.createTask(EstimateIndexSummaryFormHandler.java:144)
    **** Error Thu Oct 06 18:27:35 CEST 2011 1317918455348 /atg/searchadmin/repository/service/SyncService at atg.searchadmin.adminui.formhandlers.EstimateIndexSummaryFormHandler.handlePerformSyncTask(EstimateIndexSummaryFormHandler.java:236)
    **** Error Thu Oct 06 18:27:35 CEST 2011 1317918455348 /atg/searchadmin/repository/service/SyncService at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    **** Error Thu Oct 06 18:27:35 CEST 2011 1317918455348 /atg/searchadmin/repository/service/SyncService at sun.reflect.NativeMethodAcc

    Please check:
    1) Values of engineDir and deployShare in the LaunchingService component
    2) Search environment name
    3) To create a full index, the indexing engine requires a clean partition, a file from which all indexes are created: /atg/search/routing/RoutingSystemService
    You need to identify the location of the clean partition by creating a /localconfig/atg/search/routing/RoutingSystemService.properties file. Use the cleanPhysicalPartitionPath property
    to identify the full path to the clean partition. There is a copy of the clean partition located at <Searchdir>/SearchEngine/operatingsystem/data/initial.index. To resolve the path
    correctly, use a relative path to identify the clean partition location as a local copy. For example: cleanPhysicalPartitionPath =../data/initial.index
    Thanks and regards,
    Anuj

  • Dynamic website indexable by search engine

    Hi,
    i was looking on the web and apparently I can make my dynamic
    webpages indexable by search engine spiders by using a "/"
    character instead of the standard "?" when passing a URL Query
    String. I am using Coldfusion MX7. I do not know where to
    reconfigure this, can anyone help? My email address is
    [email protected]
    Thanks
    Senator

    Search Engine Safe URL's - theres an article over on
    CommunityMX you could
    use to do this.
    http://www.communitymx.com/abstract.cfm?cid=7BFFE
    Brendon

  • Substring search in HtmlHelp?

    Hello again!
    Is it possible to enable the substring search in HtmlHelp? If so: how?
    Or is there any other possibility to show the topics "Standardeinstellungen", "Sondereinstellungen" and "Einstellungen" if somebody searches for "Einstellungen"?
    Janine

    Hi, Janine,
    Yes, you can locate all three terms by entering the following in the "Type in the keyword to find" box on the Search tab:
    *einstellungen
    Here, the asterisk is a wildcard character that stands for zero or more characters.
    The following page provides more information on the use of wildcards and describes other advanced search facilities that are available in HTML Help files:
    http://helpware.net/htmlhelp/hhfindingtext.htm
    If you want to give your users instructions on how to conduct effective searches, you can obtain a file called "ViewHlp.chm" from the Microsoft site, decompile it, and include the appropriate topics in your own Help file. You can get the ViewHlp.chm file from here:
    http://msdn.microsoft.com/en-us/library/ms669985(VS.85).aspx
    Pete

  • Need some advice with indexing and search server in multihomed environment.

    Hi,
    I want to introduce the JISS (Java indexing and search server). We have a multihomed environment with two frontends for convergence and imap/pop proxy service and two mail stores in a cluster HA environment
    (Sun cluster 3.2, messaging server 7u3-15.01, convergence 1.0-12.01 running on glassfish 2.1.1). The directory servers (multimaster) are on separate servers.
    I viewed the jiss deployment pages in the wiki (http://wikis.sun.com/display/CommSuite/Indexing+and+Search+Service+Deployment+Planning), but they are more confusing than helpful.
    My questions are as follows:
    Can I put the jiss web service on the convergence server (to share the same glassfish server)?
    Is it better to put the indexing part of JISS on a separate server, on the convergence server, or on the mail store servers?
    Can I run the JMQ broker in an HA environment on the cluster? Is it possible to run JMQ together with messaging server in the same cluster group?
    Can JISS index two mail stores (I didn't find anything in the config guide)?
    Best Regards,
    Ruediger

    ruediger_kunze wrote:
    > I want to introduce the JISS (java indexing and search server). We have an multihomed environment with two frontends for convergence and imap/pop proxy service and two mail stores in a cluster HA environment
    I would recommend holding off until the next release (Communication Suite 7 update 1) as ISS update 1 provides a large number of useful enhancements.
    > I viewed the jiss deployment pages in the wiki (http://wikis.sun.com/display/CommSuite/Indexing+and+Search+Service+Deployment+Planning), but they are more confusing than helpful.
    > My questions are as follows:
    > Can I put the jiss web service on the convergence server (to share the same glassfish server?
    Yes. This is the scenario used in the single-host-install guide:
    http://wikis.sun.com/display/CommSuite7/Sun+Java+Communications+Suite+7+on+a+Single+Host
    > Is it better to put the indexing part of JISS on a seperate server or on the convergence server or better on the mail store servers?
    This is answered in the Deployment Planning guide:
    "Indexing requires significant CPU resources, thus, it is best to install the indexing service on a separate host dedicated to an ISS single server installation. If this is not an option, then install ISS on the back-end host as a single server installation, and install GlassFish Server as well for ISS."
    > Can I run the JMQ broker in an HA environment on the cluster? Is it possible to run JMQ together with messaging server in the same cluster group?
    This article may help:
    http://wikis.sun.com/display/CommSuite/Deploying+GlassFish+Message+Queue+in+a+Highly+Available+Environment
    > Can JISS index two mail stores (I didn't find anything in the config guide)?
    When you bootstrap the account you point at the mailhost that the account resides on:
    http://wikis.sun.com/display/CommSuite/Administering+Indexing+and+Search+Service
    "Creating New ISS Accounts"
    Regards,
    Shane.

  • Webinar: Understanding TREX Indexing and Search Options

    SAP NetWeaver Know-How Network Webinar:
    Understanding TREX Indexing and Search Options
    Wednesday 25 August 2004
    11 a.m. EDT
    On Wednesday 25 August, Larry Brambrut, an EP RIG Consultant, hosts the webinar titled Understanding TREX Indexing and Search Options as part of the ongoing SAP NetWeaver Know-How Network Webinar Series.
    Here’s how Larry describes his webinar presentation:
    “This session will describe the enhancements to "Search and Classification"(TREX) in NetWeaver '04 and EP 6.0 SP2 Patch 6. The session will include a discussion of the CM enhancements such as new crawlers, new search UI options and plug-ins, and TREX enhancements such as the new TREX architecture, delta indexing, and new TREX Admin Tool.”
    SDN invites you to post your questions to the presenter prior to the webinar and continue the online discussion afterward.
    How to Participate
    (Please go to the SDN webinar schedule page to find more information)
    Dial-in Information:
    Date: Wednesday 25 August 2004
    Time: 11 a.m. EDT
    Within the U.S., call: +1.888.428.4473
    Outside the U.S., call: +1.651.291.0618
    Password: NetWeaver04
    WebEx Information:
    Topic: SAP NetWeaver Know-How Network
    Date: Wednesday 25 August 2004
    Time: 11 a.m. EDT
    Meeting Number: 742391500
    Meeting Password: netweaver04 (lowercase)
    WebEx Link: sap.webex.com
    Replay Information:
    A recorded replay of this call will be available for approximately three months after the webinar. Access this recording by dialing the appropriate number and using the replay access code 720155.
    Toll-free: +1.800.475.6701
    International: +1.320.365.3844
    About the SAP NetWeaver Know-How Webinar Series
    The SAP NetWeaver Know-How Webinar Series is driven by the SAP NetWeaver Regional Implementation Group (RIG), part of the SAP Development organization. The mission of the SAP NetWeaver RIG is to enable customers, employees, and partners to successfully implement the SAP NetWeaver solution. This SAP RIG has expertise in BI, EP, XI, and WebAS. They contribute their implementation expertise to the SDN implementation forums as well as to the SAP NetWeaver Know-How Webinar Series.
    Disclaimer
    SDN is not responsible for any changes to the webinar schedule. The webinar schedule may be changed or cancelled without prior notice.

    Hi there,
    I just read this thread, and maybe someone here can answer my current trex question:
    I have created an ordinary CM repository, and created an index with this repository as source. Now the problem: I would like to exclude files in the repository with specific mimetypes from the TREX indexing process.
    I have verified that the TrexValidMimetypes.ini does not contain any reference to the mimetypes I'm creating, but nevertheless the document titles are searchable and are returned when searching.
    How do I get around this issue?
    Is it possible in NW04 or EP6.0 SP3 PXXX??
    Regards,
    Hco

  • InterMedia indexing and searching of zipped files

    Hello, I have interMedia successfully configured to index and query a repository of files (MS Word, Excel, PPT, PDFs, txt files)which are located on a file system. My issue is with zip files. I cannot successfully index and search zip files. I've tried zips that contain both ascii(text) and formatted files (doc, ppt), but interMedia seems not to recognize this particular MIME type. Is there a way to have interMedia index and search zip files? Thanks in advance for any assistance.

    You will have more luck with this question if you post it in the Oracle Text forum. This forum is for interMedia (image, audio video).

  • Indexing and searching excel file

    Hi friends,
    I need to index and search the records from an Excel file using Lucene for Java.
    If you have any code for that, please share it.
    Thank you in advance.

    gimbal2 wrote:
    > I'm not even going to try and tell you just how wrong your post is.
    But I will! ;-)
    Ok, checking the items from How To Ask Questions The Smart Way (http://www.catb.org/~esr/faqs/smart-questions.html):
    - Write in clear, grammatical, correctly-spelled language (http://www.catb.org/~esr/faqs/smart-questions.html#writewell)
    - Be precise and informative about your problem (http://www.catb.org/~esr/faqs/smart-questions.html#beprecise) (especially the third item)
    - Be explicit about your question (http://www.catb.org/~esr/faqs/smart-questions.html#explicit)

  • Sun Java Indexing and Search Service - services not starting (maintenance)

    I installed Comm Suite 7 on a single Solaris host. I installed jiss as described in the wiki*. Installation was ok but the jiss index and search services won't start up (they go into maintenance).
    --------------------------------- /var/iss/logs/iss-indexsvc.log.0---------------------------
    Wed Nov 04 16:16:16 IST 2009 com.sun.comms.iss.indexapi.IndexService startService WARNING: Starting index service.
    Wed Nov 04 16:16:17 IST 2009 com.sun.comms.iss.indexapi.IndexService startService SEVERE: JMS Exception: com.sun.messaging.jms.JMSSecurityException: [C4060]: Login failed: user=jmquser, broker=webmail.example.com:7676(39599)
    -----------------------/var/svc/log/application-jiss-indexSvc:default.log-----------------
    webmail.example.com:389 (tcp) => Active
    webmail.example.com:7676 (tcp) => Active
    Nov 4, 2009 4:16:16 PM com.sun.comms.iss.indexapi.IndexService main
    INFO: Begin checking write.lock files.
    Nov 4, 2009 4:16:16 PM com.sun.comms.iss.indexapi.IndexService startService
    WARNING: Starting index service.
    Nov 4, 2009 4:16:17 PM com.sun.messaging.jmq.jmsclient.ExceptionHandler logCaughtException
    WARNING: [I500]: Caught JVM Exception: java.io.EOFException
    Nov 4, 2009 4:16:17 PM com.sun.comms.iss.indexapi.IndexService startService
    SEVERE: JMS Exception: com.sun.messaging.jms.JMSSecurityException: [C4060]: Login failed: user=jmquser, broker=webmail.example.com:7676(39599)
    Error getting IndexService instance: com.sun.comms.iss.common.IssException: JMS Exception: Service startup failed
    [ Nov  4 16:16:26 Method "start" exited with status 1 ]
    I ran the setup.sh file several times with different values, but the problem remains. I checked the troubleshooting page too.
    Any help appreciated.
    wiki:
    (http://wikis.sun.comdisplayCommSuite7Communications+Suite+7+Installation+Scenario+-Indexingand+Search+Service)
    Thusith.

    Thusith.M wrote:
    =============================
    # ./imqusermgr list
    User repository for broker instance: imqbroker
    User Name Group Active State
    admin admin true
    guest anonymous true
    jmquser user true
    ============================
    > The instance name above should be change i guess? am i correct?
    Given that the Application Server JMQ instance runs on port 7676 by default, you were most likely changing the wrong instance.
    Try adding the jmquser to the Application Server JMQ instance and perform the login test again e.g.
    /opt/SUNWappserver/imq/bin/imqusermgr add -u jmquser -p adminpass -g user
    /opt/SUNWappserver/imq/bin/imqcmd -b webmail.example.com:7676 list dst
    => login with user "jmquser" and password "adminpass"
    If you see the following message it means the "jmquser" user exists and the password is correct (the jmquser doesn't have enough rights to see the destinations by default):
    com.sun.messaging.jms.JMSSecurityException: [C4084]: User authentication failed:  user=jmquser, broker=webmail.example.com:7676(38692)
    Please check your security configurations.
    Listing destinations failed.
    Once that is verified try starting the indexSvc again and see if the original error persists.
    Regards,
    Shane.
