Better word-boundary matches in text searches

Hi All,
The performance I've had from contains() searches in text-laden elements indexed with node-element-substring-string has been good, but I read somewhere else on this forum that regex searches using matches() don't make use of the same optimisations. So, as I found, performance plummets when you try approximating word-boundary or start-of-word matches with things like
//textyElement[matches(., "\bKEYWORD\b")]
or
//textyElement[matches(., "\bKEYWORD")]
The solution (at least as it applied in my case) was pretty obvious, but not as immediately obvious as I'd have liked it to be, so I thought I'd post it here for those others who are fairly new to XQuery and haven't found a better solution. Put a contains() first and get the benefits of its optimisations for literal searches...
//textyElement[contains(., "KEYWORD") and matches(., "\bKEYWORD")]
Of course there may be other/better ways of doing this -- and if there are I'd love to hear about them -- but on a pure performance level this took my test query from around 220 sec back down to around 28 ms, and such news is too good to keep to myself.
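
For readers who want to see the shape of the trick outside XQuery, here is a minimal sketch in Python. It only illustrates the ordering (cheap literal test first, regular expression second); the function name `starts_word` is made up for this example, and in the database the real gain comes from the substring index serving contains(), which this sketch does not model.

```python
import re

def starts_word(text, keyword):
    """Return True if keyword occurs in text at the start of a word."""
    # Cheap literal test first: most non-matching texts are rejected here
    # without ever reaching the regular expression engine.
    if keyword not in text:
        return False
    # Only the surviving candidates pay for the word-boundary regex.
    return re.search(r"\b" + re.escape(keyword), text) is not None
```

The second test can only succeed where the first one does, so putting the literal test in front never changes the result, only the cost.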

Tim,
Thank you very much for sharing your tip. It's a great idea for handling many types of regular expressions.
Regards,
George

Similar Messages

  • How to execute exact match & contains text search simultaneously in Oracle 10g

    Hi,
    We have a scenario where a table holds more than 50 million rows, with a description column up to 1000 characters long. From a web interface we generate a rule of comma-separated keywords such as
    "Standard", Single, Cancel, "deal"    and so on. Words in quotes need to be checked for an exact match; words without quotes are searched for using contains.
    The problem is that such a rule can be as large as 4000 characters of inclusion keywords and 2000 characters of exclusion keywords (a description matching an exclusion keyword must not be returned). Run against the table with millions of rows, this search does not work using Oracle regular expressions, although it works with a smaller number of search keywords.
    Is there a better way to do this kind of search in Oracle or, if not, outside Oracle using another tool?
    Thanks,
    AP
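
The rule format described above can be sketched as follows. This is a minimal illustration in Python, not Oracle code; the names `parse_rule` and `rule_matches` are hypothetical, "exact match" is assumed to mean "whole word", and the keywords of one rule are assumed to be alternatives (any one of them is enough).

```python
import re

def parse_rule(rule):
    """Split a comma-separated rule into (exact_terms, contains_terms)."""
    exact, contains = [], []
    for raw in rule.split(","):
        term = raw.strip()
        if not term:
            continue
        if len(term) >= 2 and term[0] == '"' and term[-1] == '"':
            exact.append(term[1:-1].lower())   # quoted: exact (whole-word) match
        else:
            contains.append(term.lower())      # unquoted: plain substring match
    return exact, contains

def rule_matches(description, rule):
    """Return True if any keyword of the rule matches the description."""
    text = description.lower()
    exact, contains = parse_rule(rule)
    if any(term in text for term in contains):
        return True
    for term in exact:
        # Whole word: not directly preceded or followed by a letter or digit.
        pattern = r"(?<![A-Za-z0-9])" + re.escape(term) + r"(?![A-Za-z0-9])"
        if term in text and re.search(pattern, text):
            return True
    return False
```

Unquoted keywords never need a regular expression at all, and quoted ones only reach the regex after a plain substring test has succeeded.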

    Please find below the table script, few insert statements along with the SP & function. Please help.
    -- Create Table
    CREATE TABLE Roomdescriptionmaster (
    ID NUMBER,
    ROOMDESCRIPTION NVARCHAR2(1000), --- max 1000 characters
    Createddate DATE
    );
    ----- Insert statements
    INSERT INTO ROOMDESCRIPTION (ID, ROOMDESCRIPTION, Createddate ) VALUES (1, 'Double Room (2 Adults + 2 Children) | FREE cancellation before Mar 16, 2014 PAY LATER All-Inclusive [ Included:10 % VAT] Meals:All meals and select beverages are included in the room rate.Cancellation:If canceled or modified up to 2 days before date of arrival,no fee will be charged.If canceled or modified later,100 percent of the first two nights will be charged.In case of no-show, the total price of the reservation will be charged. Prepayment:No deposit will be charged', sysdate)
    INSERT INTO ROOMDESCRIPTION (ID, ROOMDESCRIPTION, Createddate ) VALUES (1, 'Double or Twin Room | FREE cancellation before Feb 1, 2014 PAY LATER All-Inclusive [ Included:10 % VAT] Meals:All meals and select beverages are included in the room rate.Cancellation:If canceled or modified up to 2 days before date of arrival,no fee will be charged.If canceled or modified later,100 percent of the first two nights will be charged.In case of no-show, the total price of the reservation will be charged. Prepayment:No deposit will be charged', sysdate)
    INSERT INTO ROOMDESCRIPTION (ID, ROOMDESCRIPTION, Createddate ) VALUES (1, 'Quadruple Room (3 Adults + 1 Child) | FREE cancellation before Mar 16, 2014 PAY LATER Full board included [ Included:10 % VAT] Meals:Breakfast, lunch & dinner included.Cancellation:If canceled or modified up to 2 days before date of arrival,no fee will be charged.If canceled or modified later,100 percent of the first two nights will be charged.In case of no-show, the total price of the reservation will be charged. Prepayment:No deposit will be charged', sysdate)
    INSERT INTO ROOMDESCRIPTION (ID, ROOMDESCRIPTION, Createddate ) VALUES (1, 'Triple Room with Lateral Sea View (2 Adults + 1 Child) | FREE cancellation before Dec 6, 2013 PAY LATER All-Inclusive [ Included:10 % VAT] Meals:All meals and select beverages are included in the room rate.Cancellation:If canceled or modified up to 2 days before date of arrival,no fee will be charged.If canceled or modified later,100 percent of the first two nights will be charged.In case of no-show, the total price of the reservation will be charged. Prepayment:No deposit will be charged', sysdate)
    INSERT INTO ROOMDESCRIPTION (ID, ROOMDESCRIPTION, Createddate ) VALUES (1, 'Single Room with Lateral Sea View | FREE cancellation before Dec 6, 2013 PAY LATER All-Inclusive [ Included:10 % VAT] Meals:All meals and select beverages are included in the room rate.Cancellation:If canceled or modified up to 2 days before date of arrival,no fee will be charged.If canceled or modified later,100 percent of the first two nights will be charged.In case of no-show, the total price of the reservation will be charged. Prepayment:No deposit will be charged', sysdate)
    --SP
    CREATE OR REPLACE PROCEDURE
    SP_PGHGETROOMDESCRIPTION(v_BId                number,
                               v_DaysOfData         integer,
                               v_Incl1              nvarchar2,
                               v_Incl2              nvarchar2,
                               v_Incl3              nvarchar2,
                               v_Excl1              nvarchar2,
                               v_CurrentIndex       integer,
                               v_RecordPerPage      integer,
                               v_IndexMultiplier    integer,
                               ref_recordset        out sys_refcursor
                               ) as
    start_index        integer;
    end_index          integer;
    Incl1              nvarchar2(2000);
    Incl2              nvarchar2(2000);
    Incl3              nvarchar2(2000);
    Excl1              nvarchar2(2000);
    v_desc_utf_value   VARCHAR2(10);
    begin
    v_desc_utf_value:= 'utf8';
      if v_incl1 is null or trim(v_incl1) = '' then
         --dbms_output.put_line('include 1 is null or blank');
         Incl1 := '';
      else
         Incl1 := lower(v_Incl1);
      end if;
      if v_incl2 is null or trim(v_incl2) = '' then
         --dbms_output.put_line('include 2 is null or blank');
         Incl2 := '';
      else
         Incl2 := lower(v_Incl2);
      end if;
      if v_incl3 is null or trim(v_incl3) = '' then
         --dbms_output.put_line('include 3 is null or blank');
         Incl3 := '';
      else
         Incl3 := lower(v_Incl3);
      end if;
      if v_Excl1 is null or trim(v_Excl1) = '' then
         --dbms_output.put_line('Exclude 1 is null or blank');
         Excl1 := '';
      else
         Excl1 := lower(v_Excl1);
      end if;
       -- Old code
       --     and regexp_like(lower(ROOMDESCRIPTION), Incl1, 'i')
       --    and regexp_like(lower(ROOMDESCRIPTION), Incl2, 'i')
       --     and regexp_like(lower(ROOMDESCRIPTION), Incl3, 'i')
       --     and not regexp_like(lower(ROOMDESCRIPTION), Excl1, 'i')
    --- First call to SP
      if v_CurrentIndex = 1 then
          start_index := v_RecordPerPage * v_IndexMultiplier;
          end_index   := (v_CurrentIndex - 1 + v_IndexMultiplier) * v_RecordPerPage;
          open ref_recordset for
         select * from (
          select ROOMDESCRIPTION, Createddate,  rownum as rn
          from roomdescriptionmaster
          where BID = v_BId
          and TO_NUMBER(trunc(sysdate) - to_date(to_char(createddate, 'yyyy-mm-dd'),'yyyy-mm-dd')) <= v_DaysOfData
          and length(FN_GET_RESTRICTION(lower(ROOMDESCRIPTION),Incl1,Incl2,Incl3,Excl1,v_desc_utf_value)) > 0
          )
          where rn <= v_RecordPerPage * v_IndexMultiplier
          order by rn;
      else
      --- Subsequent calls to SP using paging from UI
          start_index := (v_CurrentIndex - 1) * v_RecordPerPage + 1;
          end_index   := (v_CurrentIndex - 1 + v_IndexMultiplier) * v_RecordPerPage;
          open ref_recordset for
          select * from (
          select ROOMDESCRIPTION, Createddate,  rownum as rn
          from roomdescriptionmaster
          where BID = v_BId
          and TO_NUMBER(trunc(sysdate) - to_date(to_char(createddate, 'yyyy-mm-dd'),'yyyy-mm-dd')) <= v_DaysOfData
          and length(FN_GET_RESTRICTION(lower(ROOMDESCRIPTION),Incl1,Incl2,Incl3,Excl1,v_desc_utf_value)) > 0
          order by roomdescriptionmasterid desc
          )
          where rn >= start_index
          and rn <= end_index
          order by rn;
      end if;
      commit;
    end SP_PGHGETROOMDESCRIPTION;
    --Function
    CREATE OR REPLACE FUNCTION FN_GET_RESTRICTION(
    v_rate_description IN NVARCHAR2,
    v_include_1 IN NVARCHAR2,
    v_include_2 IN NVARCHAR2,
    v_include_3 IN NVARCHAR2,
    v_exclude IN NVARCHAR2, v_desc_utf_value IN VARCHAR2)
      RETURN NVARCHAR2
    IS
    CURSOR include_1_cur IS
    select regexp_substr(str, '[^,]+', 1, level) str
    from (select v_include_1 str from dual)
    connect by level <= length(str)-length(replace(str,','))+1;
    CURSOR include_2_cur IS
    select regexp_substr(str, '[^,]+', 1, level) str
    from (select v_include_2 str from dual)
    connect by level <= length(str)-length(replace(str,','))+1;
    CURSOR include_3_cur IS
    select regexp_substr(str, '[^,]+', 1, level) str
    from (select v_include_3 str from dual)
    connect by level <= length(str)-length(replace(str,','))+1;
    CURSOR exclude_cur IS
    select regexp_substr(str, '[^,]+', 1, level) str
    from (select v_exclude str from dual)
    connect by level <= length(str)-length(replace(str,','))+1;
    include_1_rec include_1_cur%rowtype;
    include_2_rec include_2_cur%rowtype;
    include_3_rec include_3_cur%rowtype;
    exclude_rec exclude_cur%rowtype;
    tmp_var NVARCHAR2(200);
    tmp_var_int NUMBER;
    tmp_flag_int NUMBER;
    return_str NVARCHAR2(200);
    tmp_length NUMBER;
    tmp_length_include_1 NUMBER;
    tmp_length_include_2 NUMBER;
    tmp_length_include_3 NUMBER;
    tmp_length_exclude NUMBER;
    tmp_regex_pattern VARCHAR2(1000);
    flag_include_1_match INTEGER;
    flag_include_2_match INTEGER;
    flag_include_3_match INTEGER;
    flag_exclude_match INTEGER;
    BEGIN
      tmp_length_include_1 := nvl(length(v_include_1),0);
      tmp_length_include_2 := nvl(length(v_include_2),0);
      tmp_length_include_3 := nvl(length(v_include_3),0);
      tmp_length_exclude := nvl(length(v_exclude),0);
      flag_include_1_match := 0;
    flag_include_2_match := 0;
    flag_include_3_match := 0;
    flag_exclude_match := 0;
      IF tmp_length_include_1>0 OR tmp_length_include_2 >0
      OR tmp_length_include_3 >0 OR tmp_length_exclude >0 THEN
      IF v_desc_utf_value ='utf8' THEN
        ----------------------------------------------------- UTF 8 STARTED --------------
    -----------------------------------------   INCLUDE 1
      tmp_length := tmp_length_include_1;
      IF tmp_length > 0 THEN
      tmp_flag_int :=0;
    FOR include_1_rec in include_1_cur
    LOOP
      tmp_var := trim('' || include_1_rec.str);
      --dbms_output.put_line(tmp_var);
      tmp_regex_pattern := '[^[:alnum:]]'||tmp_var||'[^[:alnum:]]|^'||tmp_var||'$|^'||tmp_var||'[^[:alnum:]]|[^[:alnum:]]'||tmp_var||'$';
      tmp_var_int := nvl(regexp_instr(v_rate_description,tmp_regex_pattern,1,1),0);
      IF (tmp_var_int <> 0) THEN
      tmp_flag_int :=1;
      flag_include_1_match := 1;
      EXIT;
      END IF;
    END LOOP;
    ELSE
      flag_include_1_match := 1;
    END IF;
    --------------------------------------------  INCLUDE 2
    tmp_length := tmp_length_include_2;
      IF tmp_length > 0 THEN
        tmp_flag_int :=0;
    IF flag_include_1_match =1 THEN
      FOR include_2_rec in include_2_cur
    LOOP
      tmp_var := trim('' || include_2_rec.str);
    tmp_regex_pattern := '[^[:alnum:]]'||tmp_var||'[^[:alnum:]]|^'||tmp_var||'$|^'||tmp_var||'[^[:alnum:]]|[^[:alnum:]]'||tmp_var||'$';
      tmp_var_int := nvl(regexp_instr(v_rate_description,tmp_regex_pattern,1,1),0);
      IF (tmp_var_int <> 0) THEN
      tmp_flag_int :=1;
      flag_include_2_match := 1;
      EXIT;
      END IF;
    END LOOP;
    END IF;
    ELSE
      flag_include_2_match := 1;
    END IF;
    -------------------------------------------- INCLUDE 3
    tmp_length := tmp_length_include_3;
      IF tmp_length > 0 THEN
      tmp_flag_int :=0;
       IF flag_include_2_match =1 THEN
      FOR include_3_rec in include_3_cur
    LOOP
      tmp_var := trim('' || include_3_rec.str);
    tmp_regex_pattern := '[^[:alnum:]]'||tmp_var||'[^[:alnum:]]|^'||tmp_var||'$|^'||tmp_var||'[^[:alnum:]]|[^[:alnum:]]'||tmp_var||'$';
      tmp_var_int := nvl(regexp_instr(v_rate_description,tmp_regex_pattern,1,1),0);
      IF (tmp_var_int <> 0) THEN
      tmp_flag_int :=1;
      flag_include_3_match := 1;
      EXIT;
      END IF;
    END LOOP;
    END IF;
    ELSE
      flag_include_3_match := 1;
    END IF;
    -------------------------------------------- EXCLUDE
    tmp_length := tmp_length_exclude;
    IF tmp_length > 0 and flag_include_3_match =1 THEN
      FOR exclude_rec in exclude_cur
    LOOP
      tmp_var := trim('' || exclude_rec.str);
    tmp_regex_pattern := '[^[:alnum:]]'||tmp_var||'[^[:alnum:]]|^'||tmp_var||'$|^'||tmp_var||'[^[:alnum:]]|[^[:alnum:]]'||tmp_var||'$';
      tmp_var_int := nvl(regexp_instr(v_rate_description,tmp_regex_pattern,1,1),0);
      IF (tmp_var_int <> 0) THEN
      tmp_flag_int := -1;
      return_str := '';
      EXIT;
      END IF;
    END LOOP;
    END IF;
      ELSE
      ----------------------------------------------------- UTF 16 STARTED --------------
      -----------------------------------------   INCLUDE 1
      tmp_length := tmp_length_include_1;
      IF tmp_length > 0 THEN
      tmp_flag_int :=0;
      FOR include_1_rec in include_1_cur
    LOOP
      tmp_var := trim('' || include_1_rec.str);
      --dbms_output.put_line(tmp_var);
    tmp_var_int := nvl(INSTR(v_rate_description,tmp_var,1,1),0);
      IF (tmp_var_int <> 0) THEN
      tmp_flag_int :=1;
      flag_include_1_match := 1;
      EXIT;
      END IF;
    END LOOP;
    ELSE
      flag_include_1_match := 1;
    END IF;
    --------------------------------------------  INCLUDE 2
    tmp_length := tmp_length_include_2;
      IF tmp_length > 0 THEN
        tmp_flag_int :=0;
    IF flag_include_1_match =1 THEN
      FOR include_2_rec in include_2_cur
    LOOP
      tmp_var := trim('' || include_2_rec.str);
    tmp_var_int := nvl(INSTR(v_rate_description,tmp_var,1,1),0);
      IF (tmp_var_int <> 0) THEN
      tmp_flag_int :=1;
      flag_include_2_match := 1;
      EXIT;
      END IF;
    END LOOP;
    END IF;
    ELSE
      flag_include_2_match := 1;
    END IF;
    -------------------------------------------- INCLUDE 3
    tmp_length := tmp_length_include_3;
      IF tmp_length > 0 THEN
      tmp_flag_int :=0;
       IF flag_include_2_match =1 THEN
      FOR include_3_rec in include_3_cur
    LOOP
      tmp_var := trim('' || include_3_rec.str);
    tmp_var_int := nvl(INSTR(v_rate_description,tmp_var,1,1),0);
      IF (tmp_var_int <> 0) THEN
      tmp_flag_int :=1;
      flag_include_3_match := 1;
      EXIT;
      END IF;
    END LOOP;
    END IF;
    ELSE
      flag_include_3_match := 1;
    END IF;
    -------------------------------------------- EXCLUDE
    tmp_length := tmp_length_exclude;
    IF tmp_length > 0 and flag_include_3_match =1 THEN
      FOR exclude_rec in exclude_cur
    LOOP
      tmp_var := trim('' || exclude_rec.str);
      tmp_var_int := nvl(INSTR(v_rate_description,tmp_var,1,1),0);
      IF (tmp_var_int <> 0) THEN
      tmp_flag_int := -1;
      return_str := '';
      EXIT;
      END IF;
    END LOOP;
    END IF;
    END IF;
    IF tmp_flag_int = 1 THEN
    return_str := 'truely matched';
    ELSE
    return_str := '';
    END IF;
    ELSE
    return_str := '';
    END IF;
    return return_str;
    EXCEPTION
      WHEN OTHERS THEN
      --dbms_output.put_line('Exception');
        RAISE;
    END FN_GET_RESTRICTION;
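
Two observations on the posted function, with a sketch. The four-way pattern built in tmp_regex_pattern says "keyword with a non-alphanumeric character or the string boundary on each side", which can be written as a single pattern; and because tmp_var is concatenated into the pattern unescaped, a keyword containing a regex metacharacter such as "(" or "+" changes the pattern's meaning. The Python below is an illustration only, not a drop-in PL/SQL replacement: `compile_whole_word` is a made-up name, and the ASCII class `[^a-z0-9]` is an assumption standing in for `[^[:alnum:]]` on lower-cased text.

```python
import re

def compile_whole_word(keywords):
    """Return one compiled pattern matching any keyword as a whole word, or None."""
    # Escape every keyword so that characters like "(" or "+" are matched literally.
    escaped = [re.escape(k.strip().lower()) for k in keywords if k.strip()]
    if not escaped:
        return None
    body = "|".join(escaped)
    # (start or non-alphanumeric) keyword (non-alphanumeric or end)
    return re.compile(r"(?:^|[^a-z0-9])(?:" + body + r")(?:[^a-z0-9]|$)")
```

Compiling all keywords of a list into one alternation means each description is scanned once per list rather than once per keyword.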

  • Problems using and configuring Oracle 10gR2 database full-text search

    I am having problems trying to set up full-text indexing and search with Universal Content Management (UCM). I followed the Oracle Content Server Installation Guide for windows at [http://download-west.oracle.com/docs/cd/E10316_01/cs/cs_doc_10/documentation/integrator/install_cserver_win_10en.pdf].
    What I did was:
    1. Modify E:\oracle\ucm\server\config\config.cfg by adding SearchIndexerEngineName=DATABASE.FULLTEXT to the end of the file.
    2. Restart the content server.
    3. Rebuild the search indexing using Repository Manager.
    However, I keep seeing the following error when I query by entering words in the "Full-Text Search" box.
    Unable to retrieve search results. Unable to retrieve search results. Unable to create result set for query 'SELECT IdcColl1.dID, dDocName, dDocTitle, dDocType, dRevisionID, dSecurityGroup, dDocAuthor, dDocAccount, dRevLabel, dFormat, dOriginalName, dExtension, dWebExtension, dInDate, dOutDate, dCreateDate, dPublishType, dRendition1, dRendition2, VaultFileSize, WebFileSize, URL, dFullTextFormat, dFullTextCharset, DocMeta.*
    FROM IdcColl1, DocMeta
    WHERE IdcColl1.dID=DocMeta.dID AND (((CONTAINS(dDocFullText,'test') > 0 ))) ORDER BY dInDate Desc'. ORA-20000: Oracle Text error:
    DRG-10599: column is not indexed
    Some web searches suggested the following (all of which I have tried but not resolved this problem).
    1. Publish the schema using Configuration Manager (applet) and then rebuild index
    2. Set the dDocFullText as a "zone field". This is not possible, because dDocFullText does not show up under the list of fields under "Database" or "DatabaseFullText" for the Search Engine drop down (when using Zone Fields Configuration).
    3. Reboot the server (did not work either).
    I logged onto the Oracle database and checked the IdcColl1 table. There is indeed, no index for the field, dDocFullText. There is only 1 index for the field, did. The field, dDocFullText, is a BLOB. The question is, if I am supposed to create an index manually for this field, how would I do it? A web search has not been fruitful in answering this question.
    Here are my server settings.
    For UCM:
    Operating System: Windows 2003 Enterprise
    UCM : 10gR3
    Memory: 1 GB
    Web Server: Apache 2.2.11
    For Oracle:
    Operating System: Windows 2003 Enterprise
    Oracle: 10gR2
    Memory: 1 GB
    Thanks.

    I found out what the problem was. The problem was that I had to create the role, stellent_role, as described in the installation manual. After I created this role and assigned the database user to this role, a restart of the Content Server services and collection rebuild of the index fixed the problem.
    However, I did notice one thing. I checked in 3 PDF files, and when I used Repository Manager to do a collection rebuild, I noticed that for Indexer Counters, the count for Full Text was 0 and the count for Meta Only was 3.
    Anyone have any ideas? Is there something else that I missed? From reading the installation manual, it was not clear how database full-text indexing/searching would handle PDF files.

  • Is it possible to ignore noise words conditionally in working with Full text search containstable

    I have a question about the stoplist file. I need to search for an exact phrase ("this is the incident") that contains noise words. During parsing, the full-text search engine eliminates the noise words and searches only on the remaining words of the given phrase.
    Let us say the full-text table has 10 rows that contain the term "incident", and 1 row that has the exact phrase "this is the incident".
    If we use containstable() to search for "this is the incident", we get 10 rows instead of 1 row.
    To resolve the issue, we have 3 solutions:
    1. Modify the stoplist file to remove the words (this, is, the).
    2. Set stoplist = OFF.
    3. Use an empty stoplist.
    Apart from the above, is there any better solution that does not touch the noise-word list?
    Ideally, the solution would give us the flexibility to ignore noise words in some searches and keep them in others.
    Please provide your suggestions.
    kkprasad

    One question that I ask is:  Why would I want to exclude noise words?
    Noise words were created to limit the size of the full text indexes and avoid processing the many 'this', 'is', and 'the' common words.  But the disadvantage of doing so is that you cannot find some things as you would like.
    My feeling is that computers are more powerful and have more storage and it is often better to just index everything.  As long as your search does not include 'the', then the large number of 'the's in the system will pretty much be ignored. 
    NOTE: If you change the noise words, including SET STOPLIST = OFF, you have to rebuild the index in order for it to implement your decision.
    Of course, for very, very large full text indexes you would need to test.
    Is your full text search on relational database columns, e.g. Description NVARCHAR(1000), or are you searching Word, Excel, and other more complex data?
    If your full text is relational columns, it might be that you could:
    1. Select only the fulltextkey into a temp table (e.g. #FTSfulltextkey) from the full text index using noise words.  That would give you 10 rows.
    2. Then directly query the table to find the string as you define above.  (But remember that punctuation and symbols are generally ignored by Full Text Indexing, but would still be there in the string of text.)
    SELECT *
    FROM MyTextTable T
    JOIN #FTSfulltextkey K
    ON T.fulltextkey = k.fulltextkey
    WHERE T.Description like '%this is the incident%'
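
The two-step idea above (collect candidates from the index, then check the literal phrase on those candidates only) can be modelled in a few lines of Python. This is a toy sketch, not SQL Server behaviour: the in-memory index, the stop-word set and the names `build_index` and `phrase_search` are all assumptions made for the illustration.

```python
STOP_WORDS = {"this", "is", "the"}  # hypothetical noise-word list

def build_index(rows):
    """Map each non-noise word to the set of row keys containing it."""
    index = {}
    for key, text in rows.items():
        for word in text.lower().split():
            if word not in STOP_WORDS:
                index.setdefault(word, set()).add(key)
    return index

def phrase_search(rows, index, phrase):
    """Step 1: candidates from the index. Step 2: literal phrase check."""
    words = [w for w in phrase.lower().split() if w not in STOP_WORDS]
    if not words:
        return []
    candidates = set.intersection(*(index.get(w, set()) for w in words))
    return sorted(k for k in candidates if phrase.lower() in rows[k].lower())
```

The index narrows the table down to the rows mentioning "incident"; only those rows are then compared against the full phrase, noise words included.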
    Full text search is powerful, but it has limits. And the behaviour changes depending on the Language of the search.
    RLF

  • Search for a phrase rather than a single word in speech analysis text?

    Is it possible to search for a phrase rather than a single word in speech analysis text?

    Did you try Apache POI?
    It's here:
    http://jakarta.apache.org/poi/

  • SQL Server Free Text Search with multiple search words inside a stored procedure

    I am trying to do a free text search. basically the search string is being sent to a stored procedure where it executes the free text search and returns the result.
    If I search for red flag, I want to return the results that match both the "red" and "flag" text.
    Below is the query I use to return the results.
    select * from customer where FREETEXT (*, '"RED" and "flag"')
    This doesn't give me the desired result. Instead, this one gives the desired result.
    select * from customer where FREETEXT (*, 'RED') AND FREETEXT (*, 'FLAG')
    My problem is since it's inside a stored procedure, I will not be able to create the second query where clause. I thought both query should return the same result. Am I doing something wrong here?
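
One possible way around this, offered as a sketch rather than a confirmed fix for this particular setup: FREETEXT treats its argument as free text and discards the word "and" as a noise word, whereas CONTAINS interprets AND as an operator, so a condition such as '"red" AND "flag"' can be passed to the procedure as a single string parameter. The helper below builds that string on the application side; its name is made up, and it simply drops double quotes from the user input instead of trying to escape them.

```python
def contains_condition(search_text):
    """Build a CONTAINS search condition requiring every word of search_text."""
    # Drop double quotes from user input so each word stays one quoted term.
    words = [w.replace('"', "") for w in search_text.split()]
    words = [w for w in words if w]
    return " AND ".join('"' + w + '"' for w in words)
```

Inside the procedure the parameter would then be used as CONTAINS(*, @condition), with @condition being whatever parameter name the procedure already uses for the search string.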

    I am moving it to Search.
    Kalman Toth Database & OLAP Architect

  • Need sample code to do text search using boolean operators AND OR + -

    I'm looking for an algorithm for doing text searches in files.
    I need it to support the AND, OR, + and - keywords (for example "ejb AND persistence").
    Does anyone know where I can find this kind of algorithm with the full source?
    Of course I can adapt C or C++ to Java.
    In fact my target language is server-side JavaScript (sorry), so I prefer rather low-level solutions!
    Any help will be greatly appreciated, and the resulting code will be posted
    here and on my website: http://www.tips4dev.com

    Firstly, a note on the practical solution: what you probably need most is speed. It may sound strange, but personally I am convinced that, if you could use system tools, a naive algorithm such as
    for i:=1 to m do
      grep (word)
    od;
    would in 99% of cases be much faster than any implementation of the algorithm below. Its complexity is O(m.n), where m is the number of words to be processed and n somehow represents the size of the text to be searched, while the complexity of the algorithm below is O(m+n). But the grep routine (O(n)) would be optimized, and m will be low (who queries 153 words at once?)
    Anyway, you asked for an algorithm and you'll have it. It is quite elegant.
    Aho, A.V. - Corasick, M.J.: Efficient String Matching: An Aid to Bibliographic Search, Communications of the ACM, 1975 (vol. 18), No. 6, pg. 333-340
    The task: let's have an alphabet X, a string x = d1d2...dn (the d's are characters from X) and a set K = {y1, ... ym} of words, where yj = tj,1 ... tj,l(j) (the t's are again characters from X).
    Now we search for all pairs <i, yp> where yp is a suffix of d1...di (that is, the occurrences of the word yp in x that end at position i).
    (note: if you want to search for whole words only, tj,1 and tj,l(j) must be blanks)
    The idea of the algorithm is that we first process the words yp to construct a search machine, and with this machine we then loop through x once to search for occurrences of all the words at the same time.
    Example:
    K = {he, she, his, hers}
    x = ushers
    search machine M(Q - set of states, g - "step forward" function, f - "step back" function, out - reporting function):
    (function g)
    state 0 (initial state) h-> state 1 e-> state 2 r-> state 8 s-> state 9 ... for {he, hers}
    state 1 i-> state 6 s-> state 7 ... for {his}
    state 0 s-> state 3 h-> state 4 e-> state 5 ... for {she}
    And for every other character c the step 0 c-> 0 is defined
    (function out)
    state 2: report {he}
    state 5: report {she, he}
    state 7: report {his}
    state 9: report {hers}
    The "step back" function f for this particular set of words would be:
    9 -> 3, 7 -> 3, 5 -> 2, 4 -> 1; from any other state the machine would return to the initial state 0
    Processing of ushers will look like this (each pair is <number of characters read, state>):
    <0,0> u-stay in state 0 <1,0> s-move to state 3 <2,3>, <3,4>, <4,5> state 5-report {he, she}, r-cannot move forward -> must step back (as if only "he" had been received) <4,2> r-move to state 8, <5,8>, <6,9> state 9-report {hers}
    Before we show how to construct the searching machine M (Q,g,f,out) let's consider the algorithm how to use it:
    Alg 1.
    begin
      state:= 0;
      for i = 1 to n do
        //if cannot move forward, move back
        while g(state, di) not defined do state:=f(state) od;
        //move forward to a new state
        state:=g(state, di);
        //report all the words represented by the new state
        for all y from out(state) do Report(i,y) od;
      od
    end.
    Alg 2. - build of the "step forward" function g and an auxiliary function o that will later be used for the construction of out
    var q:integer;
    procedure Enter(T1...Tm);
    begin
      s:=0; j:=1;
      //processing a prefix of the new word that is also a prefix of an already processed word
      while j<=m and g(s,Tj) defined do
        s:=g(s,Tj); j:=j+1;
      od;
      while j<=m do
        q:=q+1; //a new state - global variable
        define g(s,Tj) = q; //definition of a single step forward
        s:=q;
        j:=j+1;
      od;
      //the last state must be a state in which at least the processed word is reported
      define o(s) = [T1, ... Tm];
    end;
    begin
      q:=0; //initial state
      for p:= 1 to k do Enter(yp) od;
      for all d from the alphabet X do
        if g(0,d) not defined then define g(0,d) = 0 fi
      od
    end.
    Alg 3. - build of the "step back" function f and the reporting function out
    create an empty queue
    define f(0) = 0; out(0) = {} //an empty set - we expect words of the length 1 at least
    for all d from X do
      //process children of the initial state
      s:=g(0,d);
      if s!=0 then
        define f(s) = 0; //1-character states, if we throw away the first character we return to the initial
        define out(s):=o(s); //report 1-character words, if any
        move s at the end of the queue
      fi
    od
    while queue not empty do
      r:= the first member of the queue; remove r from the queue;
      for all d from X do //process all the children of r
        if g(r,d) defined then
          s:= g(r,d); //get a child of r
          t:= f(r); //f(r) has already been defined
          while g(t,d) not defined do t:=f(t) od;
          //we found a state from which g(t,d) has sense
          define f(s) = g(t,d);
          define out(s) = o(s) UNION with out(f(s));
          move s at the end of the queue;
        fi
      od
    od
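
The three procedures above can be condensed into a short runnable sketch. The Python below is one possible rendering, not the code from the paper: the names `build_machine` and `search` are invented, states are list indices, and instead of defining g(0,d) = 0 for every character of the alphabet it treats a missing transition out of state 0 as "stay in state 0".

```python
from collections import deque

def build_machine(words):
    """Build the goto (g), failure (f) and output (out) tables for a word set."""
    goto = [{}]      # goto[state][char] -> next state ("step forward")
    out = [set()]    # out[state] -> words reported on entering the state
    for word in words:                       # Alg 2: insert each word into the trie
        state = 0
        for ch in word:
            if ch not in goto[state]:
                goto.append({})
                out.append(set())
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        out[state].add(word)
    fail = [0] * len(goto)                   # Alg 3: "step back" function
    queue = deque(goto[0].values())          # depth-1 states fall back to state 0
    while queue:
        r = queue.popleft()
        for ch, s in goto[r].items():
            queue.append(s)
            t = fail[r]
            while t != 0 and ch not in goto[t]:
                t = fail[t]
            fail[s] = goto[t].get(ch, 0)
            out[s] |= out[fail[s]]           # inherit words ending at the fallback state
    return goto, fail, out

def search(text, machine):
    """Alg 1: return (end position, word) for every occurrence, 1-based positions."""
    goto, fail, out = machine
    state, hits = 0, []
    for i, ch in enumerate(text, start=1):
        while state != 0 and ch not in goto[state]:
            state = fail[state]              # cannot move forward: step back
        state = goto[state].get(ch, 0)       # move forward (or stay in state 0)
        hits.extend((i, word) for word in sorted(out[state]))
    return hits
```

With the words inserted in the order he, she, his, hers, the states get the same numbers as in the example above, so the tables can be compared against it directly.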
    Processing of a query - normal forms
    Until now we have solved the problem of how to search for multiple words in a text at once. The algorithm returns not only whether a word was found, but also where exactly it can be found: all the occurrences and their locations.
    However, the initial task was slightly different: processing of a query like "x contains (y1 AND/OR y2 ... yn)". To decide a question like that it might not be necessary to find all the occurrences of the given words, or even an occurrence of every word (e.g. word1 OR word2 is fulfilled as soon as either word1 or word2 is found).
    Let's suppose that a search query is given in its disjunctive normal form (DNF):
    A1 OR A2 OR ...Ak where each of Ax = B1 AND B2 AND ...Bkx and Byz is a statement "x contains yp"
    Now, the query is successful whenever any of Ax is fulfilled.
    (I don't know how much you know about the transformation of a logical formula to its disjunctive form - it is quite a famous algorithm and can be found in any textbook of logic or NP-completeness. I hope that evaluation of the formula, which is what happens in the procedure Report of the algorithm Alg. 1, is trivial.)
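
Evaluating such a DNF query against the set of words that were actually found can be sketched in a couple of lines. The representation is an assumption made for the example: the query is a list of conjunctions, each conjunction a list of words, and only positive statements ("x contains yp") are handled, so the "-" (exclusion) operator from the question is left out.

```python
def evaluate_dnf(dnf, found_words):
    """Return True if at least one conjunction has all of its words found."""
    # dnf: list of conjunctions (A1 OR A2 ...), each a list of words (B1 AND B2 ...).
    return any(all(word in found_words for word in conj) for conj in dnf)
```

Because any() and all() stop at the first decisive element, the evaluation ends as soon as one conjunction is fulfilled, which mirrors the remark that not every word needs to be found.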

  • Inconsistent  Full Text Search Results

    I have built quite a comprehensive JavaHelp system, but seem to be having problems with the full text searching.
    E.g. typing in "Start" will bring back "Starting Transformation Manager" but not "StartsWith". Both HTML files seem to have the same structure.
    I have been generating my help using the Helen software. I downloaded JavaHelp 1.1.3 this afternoon and also generated the index database using jhindexer. This did not solve the problem.
    Has anyone had a similar problem?
    Helen

    Hello...
    I am at the point where I too have this problem of text searching. I put in hex and only exact matches of "words" of hex are displayed. "hexidecimal" is not.
    How did you get around this problem? Any hints or suggestions would be greatly appreciated.
    Thank you.
    Mike

  • Multiple words don't work in search field

    When I enter multiple keywords in any search field nothing shows up.
    Let's say I'm looking for pink flowers. If I enter 'flowers' I see all my flower shots, as soon as I enter a comma, all photos disappear. If I go ahead and add 'pink', I still don't see anything. This happens in any search field whether in the browser or within a HUD.
    If I use a HUD, add text fields and enter 'flowers' and 'pink', then I see pink flowers. If I use the keyword checklist in the HUD, I see pink flowers. But I can never use multiple words separated by commas.
    What's wrong? I've checked the Aperture Manual and can't figure out why the multiple word method doesn't work.

    I always use the keyword search panel in the HUD when I am looking for keyworded images...multiples work fine there. I suspect the "text" search field is parsed EXACTLY as it says, examining a string and looking for a match anywhere in the metadata. My suspicion is that the comma is being parsed as JUST THAT, a comma, rather than a search field separator.
    My 2¢
    cheers,
    david

  • Different behaviour of word boundary pattern \b in JDK1.4 and JDK1.5

    I noticed that the behaviour of the word-boundary pattern '\b' in jdk1.5.0 beta 1 differs from the standard, expected regular-expression behaviour seen, for example, in jdk1.4.
    Here is a simple test, BoundaryTest.java:
    import java.util.regex.*;
    public class BoundaryTest {
         public static void main(String[] args) {
              String testString = new String("word1 word2 word3");
              System.out.println("Test string: " + testString);
              Pattern p = Pattern.compile("\\b");
              Matcher m = p.matcher(testString.subSequence(0,testString.length()));
              int position = 0;
              int start = 0;
              // find a boundary at or after 'position', then the next boundary after it
              while (m.find(position)){
                   start = m.start();
                   if (start == testString.length() ) break;
                   if (m.find(start+1)){
                        position = m.start();
                   } else {
                        position = testString.length();
                   }
                   // print the text between the two boundaries
                   System.out.println(testString.substring(start, position));
              }
         }
    }
    After compiling (in jdk1.5 or jdk1.4) one gets the following results:
    >...\jdk1.4\bin\java BoundaryTest
    Test string: word1 word2 word3
    word1
    word2
    word3
    That is the usual behaviour, but in jdk1.5 we have:
    >..\jdk1.5\bin\java BoundaryTest
    Test string: word1 word2 word3
    w
    o
    r
    d
    1
    w
    o
    r
    d
    2
    w
    o
    r
    d
    3
    It seems that '\b' works just like '\w' in JDK1.5.
    Is this a bug in the new JDK or some new feature?

    To be honest, I have no idea if that is the case. I am new to Java, so I only had 1.4.2 for about a week before removing it and installing 1.5.0 beta... But the code below now shows the output you wanted...
    import java.util.regex.*;
    public class BoundaryTest {
         public static void main(String[] args) {
              String testString = new String("word1 word2 word3");
              System.out.println("Test string: " + testString);
              Pattern p = Pattern.compile("\\b");
              Matcher m = p.matcher(testString.subSequence(0,testString.length()));
              int position = 0;
              int start = 0;
              String datastring = ""; // initializes with nothing
              while (m.find(position)){
                   start = m.start();
                   if (start == testString.length() ) break;
                   if (m.find(start+1)){
                        position = m.start();
                   } else {
                        position = testString.length();
                   }
                   datastring += testString.substring(start,position); // adds to string
              }
              System.out.println(datastring); // outputs legible string
         }
    }
    Feel free to ignore anything I say on this matter since, as mentioned above, I cannot test it with an earlier version; that one is now gone...
    -MaxxDmg...
    - ' Aye, Its Bright out... Where me put me Ale...'
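
    For comparison, here is a minimal sketch of two other ways to look at the same string: listing the offsets where '\b' matches using the no-argument find(), and extracting the words directly with '\w+'. The class and method names (WordBoundaryDemo, boundaries, words) are made up for illustration, and the sketch assumes a current Java release; it does not try to reproduce or explain what the 1.5 beta did.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of the original BoundaryTest.
public class WordBoundaryDemo {

    // Collects every offset at which the zero-width pattern \b matches.
    // The no-argument find() continues after the previous match, so the
    // matcher is not reset on each call the way find(int) resets it.
    public static List<Integer> boundaries(String text) {
        List<Integer> offsets = new ArrayList<>();
        Matcher m = Pattern.compile("\\b").matcher(text);
        while (m.find()) {
            offsets.add(m.start());
        }
        return offsets;
    }

    // Extracts the words themselves by matching runs of word characters,
    // which avoids reasoning about boundary positions at all.
    public static List<String> words(String text) {
        List<String> result = new ArrayList<>();
        Matcher m = Pattern.compile("\\w+").matcher(text);
        while (m.find()) {
            result.add(m.group());
        }
        return result;
    }

    public static void main(String[] args) {
        String testString = "word1 word2 word3";
        System.out.println("Boundaries: " + boundaries(testString));
        System.out.println("Words: " + words(testString));
    }
}
```

    If the goal is simply to split the text into words, the '\w+' variant is probably the less fragile of the two, since it does not depend on how a given JDK treats find(int) together with '\b'.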

  • Full text search for web ? Yes or no ?

    Hi,
    I have a DB with more than 1.8 million records in a single table, and I would like to implement full text search or some sort of caching for quicker web search.
    Let me describe what I have. The table that holds the 1.8 million records is made up of 30 CLOB columns, each holding text. These are alphabetic columns: words that start with 'A' are in the first CLOB, 'B' in the second, 'C' in the third, and so forth.
    Searching is always done first by customerID and CreateDate, which are both indexed columns, and then the CLOBs are searched using instr.
    The execution plan was good, but search times have started to increase.
    So I would like to improve the search by implementing some sort of caching mechanism.
    I read a lot about this and found an example where I would create a table of unique words and a table of occurrences of those words. But that would mean 1.8 million articles of approximately 500 words each, with words repeating across articles. There would be fewer than 50,000 unique words (in our language), but the occurrences table would grow dramatically, because every word in an article needs a row in it: roughly 1.8 million x 500 = 900 million records in one table.
    Is it at all possible to have so many records in a single table and still keep it quick?
    Is Oracle full text search the only right way in this situation?
    Any suggestions? Has anyone implemented anything like this?
    Thanks,
    Kris
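
    The words/occurrences design described above is essentially an inverted index. As a rough in-memory sketch (the class InvertedIndexSketch and its methods are invented for illustration and say nothing about how Oracle Text stores its data), it could look like this. Note that it keeps one entry per distinct word per article rather than one per word occurrence, which is one way the occurrences count might end up below the 900 million estimate.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical illustration of a "unique words" + "occurrences" structure.
public class InvertedIndexSketch {

    // Key: a unique word. Value: ids of the articles that contain it.
    private final Map<String, Set<Integer>> postings = new HashMap<>();

    // Splits the article text into lower-cased words and records, for each
    // distinct word, that this article contains it.
    public void add(int articleId, String text) {
        for (String word : text.toLowerCase().split("\\W+")) {
            if (!word.isEmpty()) {
                postings.computeIfAbsent(word, k -> new TreeSet<>()).add(articleId);
            }
        }
    }

    // Returns the ids of all articles containing the given word.
    public Set<Integer> search(String word) {
        return postings.getOrDefault(word.toLowerCase(), Collections.<Integer>emptySet());
    }

    // Number of unique words seen so far.
    public int uniqueWords() {
        return postings.size();
    }

    public static void main(String[] args) {
        InvertedIndexSketch index = new InvertedIndexSketch();
        index.add(1, "Pink flowers and more flowers");
        index.add(2, "Red flowers");
        System.out.println(index.search("flowers"));
        System.out.println(index.uniqueWords());
    }
}
```

    A lookup then touches only the entries for the searched word instead of scanning every article, which is the same trade-off a database full-text index makes for you.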

    Let's start with your Oracle version. Please specify which version you run because Text capabilities vary dramatically between releases.
    >
    I tried using Oracle Text as suggested ... now if I understand correctly ....
    CTXCAT - would be great because when new records are added, index is updated automatically .... but doesn't support CLOBs ... so no go
    >
    CTXCAT is a concatenated transactional index that is supposed to optimize combined searches on text and other columns. No go for you as it indeed does not support CLOB columns.
    >
    CONTEXT - supports CLOBs, but I need to explicitly synchronize the index ....
    There are like 4000 inserts per day ..... and they all need to be indexed in real time ...
    >
    Not true, at least since 10g: the SYNC (ON COMMIT) parameter makes this index type transactional (it is synchronized automatically on commit with this parameter set).
    >
    If the CTX_DDL.SYNC_INDEX procedure synchronizes the whole table, which is now 1.8mil records, this can take a while ... so it can't be run after inserts ....
    >
    It does not; it only synchronizes data changed since the last sync operation.
    So CONTEXT is actually perfectly suited to your needs (just redesign those 30 columns into one document column and index it). Note that you need to maintain CONTEXT indexes regularly by scheduling CTX_DDL.OPTIMIZE_INDEX to run at off-hours, to purge stale/removed data and rebuild the index's internal bitmaps for better performance. Otherwise you will see performance degrade as changes to the indexed data accumulate. You might also want to tweak the initial indexing parameters, especially the MEMORY parameter, as it greatly affects the resulting index fragmentation: the more memory you give to initial indexing or optimization, the less fragmented and the more performant the index will be, all other things being equal.

  • Full-Text search is not working with PDF files - SQL Server 2012 64 bit

    Hi,
    We are in the process of storing PDF files in SQL Server 2012 with Full-Text search capability.
    I followed the steps below and it works fine with Word documents but not with PDF files. I tried PDF iFilter 11 and 9, and both were unsuccessful.
    Server/DB Level Settings:
    1) Enable FileStream
    2) Install Full-Text, then restart
    3) Use [specific db]
       alter database [db name] add filegroup Files contains filestream;
       alter database [db name] add file (name = N'Files', filename = N'D:\SQL\DATA') to filegroup [Files];
    3) Database level settings:
       FileStream:
       FileStream Directory name: [Set the name]
       FileStream non-transacted Access: [set Appropriate]
    3a) Add a datafile to DB with filestreamdata filetype.
    4) Share D:\SQL\DATA directory and add specific accounts with read/write access
    5) Give bulkadmin access to those specific accounts at server level
    6) From the page (link) download and install the *.pdf IFilter for FTS. Link: http://www.adobe.com/support/downloads/detail.jsp?ftpID=5542
    7) To the PATH global system variable add path to the catalog, where you installed the plugin. Default for this version is: C:\Program Files\Adobe\Adobe PDF iFilter 9 for 64-bit platforms\bin
    8) From the page (link) download a FilterPackx64.exe and install it. Link: http://www.microsoft.com/en-us/download/confirmation.aspx?id=20109
    9) Now from SSMS execute the following procedures:
       - sp_fulltext_service 'load_os_resources',1
       - sp_fulltext_service 'verify_signature', 0
       EXEC sp_fulltext_service 'update_languages'; -- update language list
       EXEC sp_fulltext_service 'restart_all_fdhosts'; -- restart daemon
       reconfigure with override;
    10) Restart the server
    11) select document_type, path from sys.fulltext_document_types where document_type = '.pdf'
       - select document_type, path from sys.fulltext_document_types where document_type = '.docx'
    12) Results are OK.
    Following is my table/index/catalog script:
    CREATE TABLE dbo.DocumentFilesTest
    (
        DocumentId INT IDENTITY(1,1) NOT NULL PRIMARY KEY,
        AddDate datetime NOT NULL,
        Name nvarchar(50) NOT NULL,
        Extension nvarchar(10) NOT NULL,
        Description nvarchar(1000) NULL,
        FileStream_Id UNIQUEIDENTIFIER ROWGUIDCOL NOT NULL UNIQUE DEFAULT NEWSEQUENTIALID(),
        FileSource varbinary(MAX) FILESTREAM DEFAULT(0x)
    )
    go
    --Add default add date for document
    ALTER TABLE dbo.DocumentFilesTest ADD CONSTRAINT DF_DocumentFilesTest_AddDate DEFAULT sysdatetime() FOR AddDate
    EXEC sp_fulltext_database 'enable'
    GO
    IF NOT EXISTS (SELECT TOP 1 1 FROM sys.fulltext_catalogs WHERE name = 'Ducuments_Catalog_test')
    BEGIN
        EXEC sp_fulltext_catalog 'Ducuments_Catalog_test', 'create', 'D:\SQL\PDFBlob';
    END
    --EXEC sp_fulltext_catalog 'Ducuments_Catalog_test', 'drop'
    DECLARE @indexName nvarchar(255) = (SELECT Top 1 i.Name from sys.indexes i Join sys.tables t on i.object_id = t.object_id WHERE t.Name = 'DocumentFilesTest' AND i.type_desc = 'CLUSTERED')
    PRINT @indexName
    EXEC sp_fulltext_table 'DocumentFilesTest', 'create', 'Ducuments_Catalog_test', @indexName
    EXEC sp_fulltext_column 'DocumentFilesTest', 'FileSource', 'add', 0, 'Extension'
    EXEC sp_fulltext_table 'DocumentFilesTest', 'activate'
    EXEC sp_fulltext_catalog 'Ducuments_Catalog_test', 'start_full'
    ALTER FULLTEXT INDEX ON [dbo].[DocumentFilesTest] ENABLE
    ALTER FULLTEXT INDEX ON [dbo].[DocumentFilesTest] SET CHANGE_TRACKING = AUTO
    ALTER FULLTEXT CATALOG Ducuments_Catalog_test REBUILD WITH ACCENT_SENSITIVITY=OFF;
    INSERT INTO DocumentFilesTest(Extension, Name, FileSource)
    SELECT 'pdf', 'BOL12006553.pdf', * FROM OPENROWSET(BULK 'd:\SQL\PDFBlob\BOL12006553.pdf', SINGLE_BLOB) AS BLOB;
    GO
    INSERT INTO DocumentFilesTest(Extension, Name, FileSource)
    SELECT 'docx', 'test.docx', * FROM OPENROWSET(BULK 'd:\SQL\PDFBlob\test.docx', SINGLE_BLOB) AS Document;
    GO
    SELECT d.* FROM dbo.DocumentFilesTest d WHERE Contains(d.FileSource, 'BILL')
    Returns nothing. It should return the row for the PDF file.
    SELECT d.* FROM dbo.DocumentFilesTest d WHERE Contains(d.FileSource, 'TEST')
    Returns the row for the Word document, as follows:
    2           2014-06-04 10:11:41.393            test.docx docx           NULL   [BINARY Value]  [Binary Value]
    Any help is appreciated. It's been a long wait.
    Thanks,
    Vel
    Vel Thavasi

    Hello,
    Did you check the full-text log files for more details about the errors? If the filter isn’t working, there should be errors in the error log file.
    The following thread is about similar issue, please refer to:
    http://social.msdn.microsoft.com/forums/sqlserver/en-US/69535dbc-c7ef-402d-a347-d3d3e4860d72/sql-server-2008-64bit-fulltext-indexing-pdf-not-working-cant-find-ifilter
    Regards,
    Fanny Liu
    TechNet Community Support

  • Apex 3.1, Interactive Report Row Text Search, image bitmap as TEXT?

    I think this IR thing is powerful which could save me lots of time in development.
    One question: does the row text search (default: all columns) treat an image column as regular text (a string)? I tried the following search:
    In SAMPLE APPLICATION --> Products, I put 300 in the search field (to search for a $300 list price). The search returns 3 rows when it should return only 2. The third row's list price is $1999. I looked at it in SQL*Plus and saw that its image bitmap (a long string) includes "300", so I believe the default all-columns search treats the image as a regular string.
    How can I exclude the image bitmap from the IR search? These bitmap strings are very long for each image and can EASILY match search conditions meant for something like PRODUCT DESCRIPTION or PRODUCT PRICE in our product data (about 25000). Thanks
    sean

    Sean / Russell,
    Thanks for reporting this, it's certainly a bug.
    By the way, the search is performed in SQL, on whatever column values are being displayed (run the page in debug mode to see the full SQL). So in the case of the sample application, it is not matching the image bitmap, but the image size, which is selected in the SQL. The bug is that the full search should not include columns which have filtering disabled or one of the special image format masks. We'll try to fix this for an upcoming patch.
    Thanks,
    Marco

  • Oracle text search - special characters issue

    Hi.
    I'm facing a real annoying problem with text search query, and everything I've tried failed...
    I have a table with a varchar column indexed by a text index. The column contains special characters like '&', ',' and, mainly, '-'. Since I want to disregard these special characters in searches, I have created a basic lexer with skipjoins for the column index. So now the phrase 'aaa-bbb something', for example, can be searched without the '-', like this: 'aaabbb'. But I want this phrase to be searchable both with and without the '-', so that when the user enters 'aaabbb' he gets the same results as when he enters 'aaa-bbb'.
    In other words, This condition:
    WHERE CONTAINS(column, '<query> <textquery grammar="context"> <progression><seq>'
    ||'aaabbb'
    ||'</seq></progression> </textquery> </query> ' ,1)> 2
    Will return the same results as this condition:
    WHERE CONTAINS(r.POI_NAME, '<query> <textquery grammar="context"> <progression><seq>'
    ||'aaa-bbb'
    ||'</seq></progression> </textquery> </query> ' ,1)> 2
    Since text query treats the '-' sign as a minus sign and searches for 'aaa' which doesn't contain 'bbb', the only way I found to fix this was to wrap the search text with {}. like this:
    WHERE CONTAINS(r.POI_NAME, '<query> <textquery grammar="context"> <progression><seq>'
    ||'{aaa-bbb}'
    ||'</seq></progression> </textquery> </query> ' ,1)> 2
    This all went very well, until I wanted to create a relaxation query. like this:
    WHERE CONTAINS(r.POI_NAME, '<query> <textquery grammar="context"> <progression><seq>'
    ||'{aaab}'
    ||'</seq><seq>'
    ||'{aaab}'
    ||'%</seq></progression> </textquery> </query> ' ,1)> 2
    In this case, I would expect the first part of the query to return no results (since it's not the whole word) but the second part, using '%' should have returned the record of 'aaa-bbb'. It doesn't. It will only return my result if I remove the '{}' for the second part. I can't do that, because the exact same search, when containing '-', will not return the expected results when I remove the braces (the sign is treated as minus sign):
    WHERE CONTAINS(r.POI_NAME, '<query> <textquery grammar="context"> <progression><seq>'
    ||'{aaab}'
    ||'</seq><seq>'
    ||'aaa-b'
    ||'%</seq></progression> </textquery> </query> ' ,1)> 2
    So I now have no solution. My question is- How can I create a query that will disregard the minus sign and treat it as a regular sign, but would still handle percentage sign as a special sign. So that I could run a query like the last example and will get the results of searching the phrase 'aaa-b%'?
    In short, and to simplify my question, I'm looking for a way to escape all characters (not only the minus sign) except for a specific character. Kind of like 'unescaping' a specific character (the '%' sign) within braces {}. Or, another way would be to remove the space that is added to the phrase inside the braces at the end of the word, preventing me from adding "%" at the end of the word, outside the braces.
    Thank you,
    Nili

    >
    I'm looking for a way to escape all characters (not only the minus sign) except for a specific character. Kind of like 'unescaping' a specific character (the '%' sign) within braces {}
    >
    What about applying a function like regexp_replace to escape all known special characters, and then unescaping the one particular character again, e.g. as in
    SQL>  select 'a.da-df%df*' str,
                 replace (
                    regexp_replace ('a.da-df%df*', '([[:punct:]])', '\\\1'),
                    '\%', '%'
                 ) str2
            from dual
    STR         STR2
    a.da-df%df* a\.da\-df%df\*
    1 row selected.
    i.e. don't escape with curly brackets but with the backslash character.
    You can then use this string in your query like in
    WHERE CONTAINS(r.POI_NAME, '<query> <textquery grammar="context"> <progression><seq>'
    ||'aaab'
    ||'</seq><seq>'
    ||'aaa\-b'
    ||'%</seq></progression> </textquery> </query> ' ,1)> 2
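
    If the search term is assembled in application code before it reaches the database, the same idea can be sketched on the client side. The following is only an illustration under that assumption: the class TextQueryEscaper and the method escapeExceptPercent are invented names, and the set of escaped characters is simply Java's ASCII punctuation class minus '%'.

```java
import java.util.regex.Pattern;

// Hypothetical client-side helper mirroring the regexp_replace approach.
public class TextQueryEscaper {

    // Every ASCII punctuation character except '%', which must remain
    // usable as a wildcard in the text query.
    private static final Pattern ESCAPABLE = Pattern.compile("[\\p{Punct}&&[^%]]");

    // Prefixes each matched punctuation character with a backslash.
    // In the replacement string, "\\\\" yields one literal backslash and
    // "$0" re-inserts the matched character.
    public static String escapeExceptPercent(String term) {
        return ESCAPABLE.matcher(term).replaceAll("\\\\$0");
    }

    public static void main(String[] args) {
        System.out.println(escapeExceptPercent("a.da-df%df*"));
        System.out.println(escapeExceptPercent("aaa-b%"));
    }
}
```

    The escaped term would then replace the literal between the <seq> tags. Passing it as a bind variable rather than concatenating it into the SQL text is likely safer, because quote characters in user input would otherwise need separate handling.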

  • Full-Text Search has not worked since we upgraded to 2012

    I have a filestream database and table. Our full-text searches always worked until we upgraded to SQL 2012 in December. Now, no file that has been uploaded since December is searchable. What has gone wrong here? It should have been a clean upgrade. We are not getting any error messages. We are just not getting any records returned when we search on a word that we know is in the documents we've uploaded since December (for instance, the word 'aluminum').
    Filestream is enabled for the instance.
    A full-text catalog exists and contains a full-text index (the same one we've always had).  Full-text indexing is ENABLED.
    I've tried rebuilding the catalog and the index.  I've tried to do a FULL POPULATION on the table.
    We haven't changed our queries nor the way the files are uploaded.
    Nothing works.  I have been a database administrator since the SQL 2005 days and I have never seen anything like this.
    Please help.

    Hi GINGER PIERCE,
    Since the issue concerns SQL Server Search, I will help you post the question in the related forum, where more experts can assist you.
    According to your description, in theory, if full-text search worked in SQL Server 2008, the full-text indexing feature should also run well in the SQL Server 2012 databases after the upgrade. If not, you can try restoring your database from SQL Server 2008 to SQL Server 2012, creating a new full-text catalog and index on the table or view in the database, and then using the full-text index to search for words, phrases and multiple forms of a word or phrase via FREETEXT() and CONTAINS() with “and” or “or” operators. Also check that the full-text search feature is enabled in the SQL Server 2012 instance. For more information, see:
    Full Text Search step by step in SQL Server 2012.
    Note: In SQL Server 2012 SP1, the server will report that Full Text Search is not supported in this edition of SQL Server when it clearly is. The workaround is to create the initial catalog by using a T-SQL query:
    CREATE FULLTEXT CATALOG
    In addition, since it is a FileStream database, we need to verify whether you run full-text searches on documents in FileTables. If so, you should enable FileStream for your SQL Server instance and enable the FileTable options for the database. For more information, see:
    Full Text Searches on Documents in FileTables.
    Regards,
    Sofiya Li
    TechNet Community Support
