Fuzzy Searches

Is there anywhere I can find the algorithm Oracle uses for the CONTEXT fuzzy search, as in:
SELECT surname from person_source where contains(surname,'?Humphrey') > 0;
I would like to build a function for use outside CONTEXT that incorporates the same algorithm.
Fran
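
The thread does not say which algorithm Oracle Text uses for fuzzy expansion, so the following is only a stand-in: a minimal Python sketch of Levenshtein edit distance, a common way to score spelling similarity outside the database. The names `edit_distance` and `similarity` are made up for illustration, and the results should not be expected to match what CONTAINS returns.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum number of single-character
    insertions, deletions, or substitutions turning a into b."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, 1):
        cur = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute or match
        prev = cur
    return prev[-1]


def similarity(a, b):
    """Scale the distance to 0..1, where 1.0 means identical strings."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(a, b) / longest


print(similarity("Humphrey".upper(), "Humphry".upper()))
```

Upper-casing both inputs before comparing mimics a case-insensitive index; a threshold on `similarity` then plays a role loosely comparable to a fuzzy score cutoff.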

This is more information about our scenario:
We have two groups in the datastore:
concat:
1.) hierarchy:(example text) 321826 325123 543123
2.) page: Actual document text.
321826 325123 543123 are IDs in a hierarchy structure. Moving from left to right, each ID occurs in fewer pages, so it should produce fewer exact matches.
Example: in this index all pages have 321826 as the first value. A few pages have 543123 as the last value, and all others have some other number there.
If I run this query:
contains(concat,(321826 within hierarchy ) and ('personnel') within page)
it takes about 10 seconds because 321826 hits all pages.
If I run this query:
contains(concat,(543123 within hierarchy ) and ('personnel') within page)
it takes only about 1 second because 543123 hits just a few pages.
But with fuzzy search:
If I run this query:
search A.) contains(concat,(321826 within hierarchy ) and ?('personnel') within page)
it takes about 30 seconds because 321826 hits all pages. That performance is acceptable for this case.
BUT if I run this query:
search B.) contains(concat,(543123 within hierarchy ) and ?('personnel') within page)
it also takes about 30 seconds, even though 543123 hits only a few pages.
This should be faster than 30 seconds, because the fuzzy part only needs to search a fraction of the material.
We've played with different variations on the () and the '' but nothing seems to change this.
Any advice on how to make search B.) faster?
We don't understand why we see different speeds with the exact match but DON'T see different speeds with the fuzzy search.
I can send you some test data with the index and query scripts if you want.
Our indexes are on large tables (2,000,000) rows.
TIA
Colleen Geislinger.

Similar Messages

Trying to pass a variable to fuzzy search

I'm trying to write code like this:
   for x in 1 .. 6 loop
       v_searchword := listgetat(replace(p_searchphrase,',',' '),x,' ');
       for c1 in (select * from
                      (select score(1) as score, searchterms, suggestions from suggestions_table
                       where contains(searchterms,'fuzzy({'||v_searchword||'},,,weight)',1)>0
                       order by score desc)
                   where rownum < 10) loop
        end loop;
   end loop;
Someone passes in a long search phrase. I separate it into words and take up to the first 6. The set of words is looped through. Each word in turn is assigned to the v_searchword variable. I then do an Oracle Text fuzzy search on that word. The above code, however, gives me an Oracle Text parser error (DRG-50901: text query parser syntax error...).
I've modified the code so that the all-important line reads *where contains(searchterms,'fuzzy({v_searchword},,,weight)',1)>0*, and whilst that doesn't produce a syntax error, it doesn't produce any results, either! Words that I know will generate suggestions when I do a manual fuzzy search in plain SQL (such as "womman" and "tomartoe") don't generate anything in this case, because (I think) instead of searching for 'womman' or 'tomartoe' it's actually just searching for the word 'v_searchword' each time.
Could someone tell me how to write my code so that the correct word is passed into the contains function each time, please? It seems syntactically not very difficult, but I'm stumped!

If any value for v_searchword is null, the result is invalid syntax, because the query searches for {}. This happens when there is no such element, such as no sixth word in a string of five words. You might also want to remove duplicate spaces from the string. Please see the demonstration below, which first reproduces and then corrects the error simply by adding a condition that v_searchword is not null.
SCOTT@orcl_11g> create table suggestions_table
2    (searchterms varchar2 (30),
3      suggestions varchar2 (20))
4 /
Table created.
SCOTT@orcl_11g> insert all
2 into suggestions_table values ('woman', null)
3 into suggestions_table values ('women', null)
4 into suggestions_table values ('tomato', null)
5 into suggestions_table values ('tomatoes', null)
6 select * from dual
7 /
4 rows created.
SCOTT@orcl_11g> create index your_index
2 on suggestions_table (searchterms)
3 indextype is ctxsys.context
4 /
Index created.
SCOTT@orcl_11g> CREATE OR REPLACE FUNCTION listgetat
2      (p_string    VARCHAR2,
3       p_element   INTEGER,
4       p_separator VARCHAR2 DEFAULT ' ')
5      RETURN          VARCHAR2
6 AS
7    v_string      VARCHAR2 (32767);
8 BEGIN
9    -- ensure there are starting and ending separators:
10    v_string := p_separator || p_string || p_separator;
11    -- remove all double separators:
12    WHILE INSTR (v_string, p_separator || p_separator) > 0 LOOP
13       v_string := REPLACE (v_string, p_separator || p_separator, p_separator);
14    END LOOP;
15    -- check if element exists:
16    IF LENGTH (v_string) - LENGTH (REPLACE (v_string, p_separator, '')) >
17        LENGTH (p_separator) * p_element
18    THEN
19       v_string := SUBSTR (v_string,
20                     INSTR (v_string, p_separator, 1, p_element)
21                     + LENGTH (p_separator));
22       RETURN SUBSTR (v_string, 1, INSTR (v_string, p_separator) - 1);
23    ELSE
24       RETURN NULL;
25    END IF;
26 END listgetat;
27 /
Function created.
SCOTT@orcl_11g> -- reproduction of error:
SCOTT@orcl_11g> create or replace procedure test_proc
2    (p_searchphrase     in varchar2)
3 as
4    v_searchword    varchar2 (100);
5 begin
6      for x in 1 .. 6 loop
7          v_searchword := listgetat(replace(p_searchphrase,',',' '),x,' ');
8
9          for c1 in (select * from
10                   (select score(1) as score, searchterms, suggestions from suggestions_table
11                    where contains(searchterms,'fuzzy({'||v_searchword||'},,,weight)',1)>0
12                    order by score desc)
13                where rownum < 10) loop
14             dbms_output.put_line
15            (lpad (c1.score, 3) || ' ' ||
16             rpad (c1.searchterms, 30) || ' ' ||
17             v_searchword);
18           end loop;
19      end loop;
20 end test_proc;
21 /
Procedure created.
SCOTT@orcl_11g> show errors
No errors.
SCOTT@orcl_11g> exec test_proc ('womman,and,tomartoe')
38 woman                          womman
25 women                          womman
29 tomato                         tomartoe
26 tomatoes                       tomartoe
BEGIN test_proc ('womman,and,tomartoe'); END;
ERROR at line 1:
ORA-29902: error in executing ODCIIndexStart() routine
ORA-20000: Oracle Text error:
DRG-50901: text query parser syntax error on line 1, column 8
ORA-06512: at "SCOTT.TEST_PROC", line 9
ORA-06512: at line 1
SCOTT@orcl_11g> -- correction of error:
SCOTT@orcl_11g> create or replace procedure test_proc
2    (p_searchphrase     in varchar2)
3 as
4    v_searchword    varchar2 (100);
5 begin
6      for x in 1 .. 6 loop
7          v_searchword := listgetat(replace(p_searchphrase,',',' '),x,' ');
8          -- check if xth word exists:
9          if v_searchword is not null then
10            for c1 in (select * from
11                     (select score(1) as score, searchterms, suggestions from suggestions_table
12                      where contains(searchterms,'fuzzy({'||v_searchword||'},,,weight)',1)>0
13                      order by score desc)
14                  where rownum < 10) loop
15            dbms_output.put_line
16              (lpad (c1.score, 3) || ' ' ||
17               rpad (c1.searchterms, 30) || ' ' ||
18               v_searchword);
19             end loop;
20          end if;
21      end loop;
22 end test_proc;
23 /
Procedure created.
SCOTT@orcl_11g> show errors
No errors.
SCOTT@orcl_11g> exec test_proc ('womman,and,tomartoe')
38 woman                          womman
25 women                          womman
29 tomato                         tomartoe
26 tomatoes                       tomartoe
PL/SQL procedure successfully completed.
SCOTT@orcl_11g>
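
The fix above comes down to never sending an empty `fuzzy({})` term to the parser. As a language-neutral illustration, here is a small Python sketch of the same string handling. `fuzzy_queries` is a hypothetical helper, not part of Oracle Text; it only builds the query strings that the PL/SQL loop would pass to CONTAINS.

```python
def fuzzy_queries(phrase, max_words=6):
    """Split a search phrase on commas and spaces and build one
    fuzzy query string per word, skipping empty elements."""
    # str.split() with no argument collapses repeated whitespace and
    # never yields empty strings, so no 'fuzzy({},,,weight)' is produced.
    words = phrase.replace(",", " ").split()
    return ["fuzzy({" + word + "},,,weight)" for word in words[:max_words]]


for query in fuzzy_queries("womman,and,tomartoe"):
    print(query)
```

With only three words in the phrase the list has three entries, so there is no fourth, fifth, or sixth iteration that could hit the null case.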

Help with fuzzy search (doesn't work if change order of certain 2 letters)

Hi,
I need some help with fuzzy search. It's pretty simple: we use fuzzy search on varchar2 columns that contain first names and last names. The problem is that I don't really understand why it can't find a name in some cases.
Say I want to search for 'Taekpaul'. Then:
where CONTAINS(first_name,'fuzzy(TAEKPAUL)',1) > 0 - works
where CONTAINS(first_name,'fuzzy(TAEKPALU)',1) > 0 - works (changed order of the 2 last letters)
where CONTAINS(first_name,'fuzzy(TEAKPAUL)',1) > 0 - doesn't work, finds 'Tejpaul' that is completely unrelated (changed 2nd, 3rd order)
How can I make it find 'Taekpaul' even if I search for TEAKPAUL? Is it related to the index? Should the Text index be created with different parameters?
Thanks!
Edited by: Maitreya2 on Mar 3, 2010 2:08 PM

Thanks, adding '!' worked :)
Do you know where I can read more about '!' and other special characters? I don't think I saw anything like that here: http://download.oracle.com/docs/cd/B14117_01/text.101/b10730/cqoper.htm#BABBJGFJ
I also started using the JARO_WINKLER_SIMILARITY function, which I think is actually better for what I do. But it's very buggy: sometimes Oracle crashes and kills the connection when you try to use it.
Ahha, it's here: http://download.oracle.com/docs/cd/B19306_01/text.102/b14218/cqspcl.htm
So, ! is soundex. Whatever it means..
Edited by: Maitreya2 on Mar 5, 2010 12:14 PM
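
For anyone wondering what soundex means here: it maps a word to a short code based on how its consonants sound, so words with swapped or different vowels can still match. Below is a minimal Python sketch of the classic American Soundex rules. It is an illustration only; the function name `soundex` is my own and Oracle Text's `!` operator may differ in details.

```python
def soundex(word):
    """Classic American Soundex: first letter plus three digits."""
    codes = {}
    for letters, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                           ("L", "4"), ("MN", "5"), ("R", "6")):
        for ch in letters:
            codes[ch] = digit

    letters = [c for c in word.upper() if c.isalpha()]
    if not letters:
        return ""

    result = letters[0]
    prev = codes.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W do not separate consonants with the same code
        code = codes.get(ch, "")  # vowels get no code and reset prev
        if code and code != prev:
            result += code
        prev = code
    return (result + "000")[:4]  # pad or truncate to four characters


for name in ("TAEKPAUL", "TEAKPAUL", "Tejpaul"):
    print(name, soundex(name))
```

Because vowels are ignored, TAEKPAUL and TEAKPAUL get the same code, which is consistent with '!' finding the name. The same rules also give 'Tejpaul' that code, so soundex matching is fairly coarse.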

Fuzzy search not returning results?

I'm executing a phonetic search on the nm_resource column. My application allows a call center employee to search on the resource name (nm_resource); if the resource is not found, they enter a new one. The problem is that someone may have already entered the resource name but spelled it incorrectly, resulting in duplicate records for the same resource name. To let the call center retrieve records that sound the same but are spelled differently, we have implemented the fuzzy search capability of Oracle Text. Things have been going very nicely for the most part, with the exception of this one issue we're trying to understand.
Using the query below, we're searching for the resource name "rosies"; the actual record in the database was entered as "rosy's". The search returns (rosies, rosie's, rosys) but does not return "rosy's", the record I'm interested in.
Is it reasonable to expect rosy's to be returned in the result set? My query should return the maximum number of fuzzy expansions and all fuzzy scores.
select score(1), nm_resource, ADDR_RSRC_ST_LN_1, id_resource, ADDR_RSRC_CITY FROM caps_resource where
CONTAINS (nm_resource,'fuzzy(rosies, 0, 5000, weight)',1)>0
union
select /*+index(caps_resource ind_caps_resource_8)*/ 10, nm_resource, ADDR_RSRC_ST_LN_1, id_resource, ADDR_RSRC_CITY from caps_resource
where NM_RESOURCE_UPPER like upper(replace(replace('%' || 'rosies' || '%',' '), '-'))
and rownum<500 order by 1 DESC;
Any help explaining this is much appreciated.
Regards,

When you index "Rosy's", by default the lexer sees the apostrophe as a delimiter and tokenizes and indexes "Rosy" and "s" separately. So you could only find it by searching for the singular form, or for the singular form obtained by stemming. However, if you set the apostrophe as a skipjoin, then it tokenizes and indexes "Rosys" as one token, which you can then find by searching for "rosies". Please see the demonstration below. You might also be interested in soundex, which can be used with Oracle Text, or the functions in the utl_match package, or metaphone.
SCOTT@orcl_11g> CREATE TABLE caps_resource
2    (nm_resource VARCHAR2 (30))
3 /
Table created.
SCOTT@orcl_11g> INSERT ALL
2 INTO caps_resource VALUES ('Rosy''s')
3 SELECT * FROM DUAL
4 /
1 row created.
SCOTT@orcl_11g> SELECT * FROM caps_resource
2 /
NM_RESOURCE
Rosy's
SCOTT@orcl_11g> CREATE INDEX your_text_idx ON caps_resource (nm_resource)
2 INDEXTYPE IS CTXSYS.CONTEXT
3 PARAMETERS
4      ('STOPLIST CTXSYS.EMPTY_STOPLIST')
5 /
Index created.
SCOTT@orcl_11g> SELECT token_text FROM dr$your_text_idx$i
2 /
TOKEN_TEXT
ROSY
S
SCOTT@orcl_11g> SELECT * FROM caps_resource
2 WHERE CONTAINS (nm_resource, 'FUZZY (rosies, 0, 5000, weight)') > 0
3 /
no rows selected
SCOTT@orcl_11g> DROP INDEX your_text_idx
2 /
Index dropped.
SCOTT@orcl_11g> BEGIN
2    CTX_DDL.CREATE_PREFERENCE ('your_lexer', 'BASIC_LEXER');
3    CTX_DDL.SET_ATTRIBUTE ('your_lexer', 'SKIPJOINS', '''');
4 END;
5 /
PL/SQL procedure successfully completed.
SCOTT@orcl_11g> CREATE INDEX your_text_idx ON caps_resource (nm_resource)
2 INDEXTYPE IS CTXSYS.CONTEXT
3 PARAMETERS
4    ('STOPLIST CTXSYS.EMPTY_STOPLIST
5       LEXER       your_lexer')
6 /
Index created.
SCOTT@orcl_11g> SELECT token_text FROM dr$your_text_idx$i
2 /
TOKEN_TEXT
ROSYS
SCOTT@orcl_11g> SELECT * FROM caps_resource
2 WHERE CONTAINS (nm_resource, 'FUZZY (rosies, 0, 5000, weight)') > 0
3 /
NM_RESOURCE
Rosy's
SCOTT@orcl_11g>
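
The token lists in the demonstration can be reproduced with a toy model. The Python sketch below is not how BASIC_LEXER is implemented; `tokenize` and its `skipjoins` parameter are hypothetical names that only imitate the two behaviours shown: non-alphanumeric characters split tokens, unless they are listed as skipjoins, in which case they are dropped and the surrounding text is joined.

```python
import re


def tokenize(text, skipjoins=""):
    """Toy lexer: remove skipjoin characters, then split on anything
    that is not a letter or digit, and upper-case the tokens."""
    for ch in skipjoins:
        text = text.replace(ch, "")  # join the text around the character
    return [token.upper() for token in re.findall(r"[A-Za-z0-9]+", text)]


print(tokenize("Rosy's"))                 # apostrophe acts as a delimiter
print(tokenize("Rosy's", skipjoins="'"))  # apostrophe acts as a skipjoin
```

The first call yields the two tokens seen in the default index, and the second yields the single token that the fuzzy query on "rosies" was able to match.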

How to implement fuzzy search in Query variables

Dear Experts,
Fuzzy search is easily implemented in ABAP, but I do not know how to implement fuzzy search in Query variables.
Our company has a report with an input variable for customer code, and the user wants to enter only the first 3 characters as a fuzzy search. For example,
the customer code has 10 characters, and she wants to enter only the first 3, such as EAE*,
and hopes the matching results will be displayed. If you have any solution, please advise.
Many thanks.
Best Regards.
Steve

closed

No active external product for the fuzzy search (FBL1N)

Hi,
I am in transaction FBL1N and want to search the vendor by F4 help.
1. When I hit F-4, the Search window appears.
2. It gives a pop-up message "No active external product for the fuzzy search".
3. When we open the help for the pop-up message it says :
No active external product for the fuzzy search
Message no. F2807
Diagnosis
The connection of an external product is required for the fuzzy search.
For more information, see Note 176559.
I have looked through SAP Note 176559, but it was not really relevant.
Regards,
Rohidas Shinde

Passing parameters for fuzzy search

Hello,
I am using Oracle 11.2 and do fuzzy search as following:
Create table tb_test(Nm varchar2(32));
create index fuzzy_idx on tb_test(Nm) indextype is ctxsys.context parameters(' Wordlist STEM_FUZZY_PREF');
select * from tb_test where contains(Nm, 'fuzzy(Wndy,,,weight)',1) >0;
The query works fine for hardcoded string 'Wndy'. I just wonder how can I use parameter to pass the match string in PLSQL?
Thanks,

try this (not tested):
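CREATE OR REPLACE PROCEDURE findMatchNm (nmStr IN VARCHAR2)
IS
  oraCursor SYS_REFCURSOR;
  str_val   VARCHAR2(100);
  v_nm      tb_test.Nm%TYPE;
BEGIN
  -- build the Oracle Text query string; it is passed as a bind variable
  str_val := 'fuzzy(' || nmStr || ',,,weight)';
  OPEN oraCursor FOR
    'SELECT Nm FROM tb_test WHERE contains(Nm, :s, 1) > 0' USING str_val;
  LOOP
    FETCH oraCursor INTO v_nm;
    EXIT WHEN oraCursor%NOTFOUND;
    DBMS_OUTPUT.PUT_LINE (v_nm);  -- replace with your own processing
  END LOOP;
  CLOSE oraCursor;
END;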
Edited by: stefan nebesnak on Jan 17, 2013 12:49 PM
using bind variable

How score() function works in HANA fuzzy search

Hi, I am confused by the value score() returns when I use it in a fuzzy search in HANA.
CREATE COLUMN TABLE test_similar_calculation_mode
( id INTEGER PRIMARY KEY, s text);
INSERT INTO test_similar_calculation_mode VALUES ('1','stringg');
INSERT INTO test_similar_calculation_mode VALUES ('2','string theory');
INSERT INTO test_similar_calculation_mode VALUES ('3','this is a very very very long string');
INSERT INTO test_similar_calculation_mode VALUES ('4','this is another very long string');
SELECT TO_INT(SCORE()*100)/100 AS score, id, s FROM test_similar_calculation_mode WHERE CONTAINS(s, 'theory', FUZZY(0.9, 'similarCalculationMode=compare')) ORDER BY score DESC;
the returned list is just as below
SCORE ID S
0.84 2 string theory
Why is this row returned with a score of 0.84 when I set 0.9 as the threshold in the fuzzy function?
From my understanding, the S field is a text data type, so the string is actually split into a list of separate words, and the score should be 1.0. Is that right?
Any hints are much appreciated, thanks.

Hi William,
By default the score() function returns a TF/IDF score for text data types.
To get back the fuzzy score, you have to use the search option 'textSearch=compare' (or 'ts=compare'). Without other options, this gives an average score of all tokens ('string' and 'theory') and you get a score of 0.7 as a result.
To ignore the additional token 'string' in the database, you have to specify another option that tells the score function to use the tokens from the user input only ('cnmt=input').
So you should use
SELECT TO_INT(SCORE()*100)/100 AS score, id, s
FROM test_similar_calculation_mode
WHERE CONTAINS(s, 'theory', FUZZY(0.9, 'scm=compare, ts=compare, cnmt=input'))
ORDER BY score DESC;
to get the expected results.
Regards,
Jörg

E-Recruiting TREX and fuzzy search

Hello,
Is fuzzy search available for the e-Recruiting module? We are using TREX 7.0 and e-Recruiting 6.0.
If that functionality is available, how does it exactly work and is it customizable?
thanks
Koen

Hi Guys
Sorry to jump on this thread with nothing to add except my problem.
We are currently in the process of implementing E-Recruitment 3.0 as an extension to ECC 5. I have configured the system and have been doing some unit testing, and have found the following:
- When I try to "Apply Directly" in the tab "Career and Job" and enter the reference details of a specific job, it finds that job. If I leave the input field blank and hit the search button, it returns all jobs posted. However, if I use the "Search for Jobs" link option, I get a consistent error: "An internal error occurred. Please try again later"
When I check transaction SLG1, the error log says the termination occurred in a program called CL_HRRCF_ABDTRACT_CONTROLLER==CM001 line 56 and CL_HRRCF_SEARCH_MASK_GROUP====CM00M line 15.
The same thing happens when searching for candidates to assign requisitions to.
Has anyone come across this problem before? Any help will be greatly appreciated, as we have a tight deadline.
Cheers

Text 10g fuzzy search performance

Hello to everybody in this community,
I'm new to this and I have a question about Oracle Text 10g.
My Setup:
Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - 64bit
8 Cores with each 2,5 GHz
64 GB RAM
What I'd like to do:
I'd like to compare a large amount of row sets with each other in a way that human caused mistakes (eg spelling, typing mistakes) will be tolerated.
So my TEXT CONTEXT setup is as follows:
MULTI_COLUMN_DATASTORE with each Column to compare.
begin
ctx_ddl.create_preference('my_datastore', 'MULTI_COLUMN_DATASTORE');
ctx_ddl.set_attribute('my_datastore', 'columns', 'column1, ...');
end;
BASIC_LEXER - with GERMAN settings:
begin
   ctx_ddl.create_preference('my_lexer', 'BASIC_LEXER');
   ctx_ddl.set_attribute('my_lexer', 'index_themes', 'NO');
   ctx_ddl.set_attribute('my_lexer', 'index_text', 'YES');
   ctx_ddl.set_attribute('my_lexer', 'alternate_spelling', 'GERMAN');
   ctx_ddl.set_attribute('my_lexer', 'composite', 'GERMAN');
   ctx_ddl.set_attribute('my_lexer', 'index_stems', 'GERMAN');
   ctx_ddl.set_attribute('my_lexer', 'new_german_spelling', 'YES');
end;
BASIC_WORDLIST - with GERMAN settings:
begin
   ctx_ddl.create_preference('my_wordlist', 'BASIC_WORDLIST');
   ctx_ddl.set_attribute('my_wordlist','FUZZY_MATCH','GERMAN');
   ctx_ddl.set_attribute('my_wordlist','FUZZY_SCORE','60'); --defaults
   ctx_ddl.set_attribute('my_wordlist','FUZZY_NUMRESULTS','100'); --defaults
   --ctx_ddl.set_attribute('my_wordlist','SUBSTRING_INDEX','TRUE'); --commented out due to long creation time of index
   ctx_ddl.set_attribute('my_wordlist','STEMMER','GERMAN');
end;
And a BASIC_SECTION_GROUP with a field_section for each column.
begin
ctx_ddl.create_section_group(
    group_name => 'my_section_group',
    group_type => 'BASIC_SECTION_GROUP'
);
ctx_ddl.add_field_section(
    group_name   => 'my_section_group',
    section_name => 'column1',
    tag          => 'column1'
);
end;
I create the index with
create index idx_myfulltextindex on fulltexttest(column1)
indextype is ctxsys.context
parameters ('datastore my_datastore
             section group my_section_group
             lexer my_lexer
             wordlist my_wordlist
             stoplist ctxsys.empty_stoplist')
Everything works functionally fine.
In my test scenario I have a table with around 100,000 rows; its primary key is not in the CONTEXT index.
The Problem:
I do a query like:
SELECT SCORE(1), a.*
FROM fulltexttest a
WHERE CONTAINS(a.column1, 'FUZZY(({TEST}),,,W) WITHIN COLUMN1', 1) > 0
AND a.primkey BETWEEN 1000 AND 4000
This will do a fulltext search in a set of 3000 rows. The response time here is nearly immediate. Maybe a second.
If I do the same in a cursor many times (>1000) with different search terms, it takes a long time, of course. On average it does 1 query per second.
I thought this should not be that slow, so I tested the same with:
SELECT SCORE(1), a.*
FROM fulltexttest a
WHERE CONTAINS(a.column1, '({TEST}) WITHIN COLUMN1', 1) > 0
AND a.primkey BETWEEN 1000 AND 4000
NOTE there is no Fuzzy search anymore...
With this it is up to 20 times faster.
The cpu of the server reaches about 15% load while processing the fuzzy query.
So:
If I do a fuzzy search, it seems not to access the index. I thought I was telling the index to compute the results of 100 expansions in advance.
Am I doing it wrong? Or is it not possible to build an Index especially for fuzzy search ?
Are there any suggestions to increase the performance? Note that I have already read the guide (7 Tuning Oracle Text). None of the hints there helped.
I would appreciate if anyone is able to help me in this case... Or just give a hint.
Thank you,
Dominik

Here is a simplified example, first without, then with SDATA. Please note the differences in indexes, queries, and execution plans.
SCOTT@orcl12c> CREATE TABLE fulltexttest
2    (primkey NUMBER PRIMARY KEY,
3      column1 VARCHAR2(30))
4 /
Table created.
SCOTT@orcl12c> CREATE SEQUENCE seq
2 /
Sequence created.
SCOTT@orcl12c> INSERT INTO fulltexttest
2 SELECT seq.NEXTVAL, object_name
3 FROM all_objects
4 /
89826 rows created.
SCOTT@orcl12c> create index idx_myfulltextindex
2 on fulltexttest(column1)
3 indextype is ctxsys.context
4 /
Index created.
SCOTT@orcl12c> SET AUTOTRACE ON EXPLAIN
SCOTT@orcl12c> SELECT SCORE(1), a.*
2 FROM fulltexttest a
3 WHERE CONTAINS
4            (a.column1,
5            'FUZZY(({TEST}),,,W)',
6            1) > 0
7 AND    a.primkey BETWEEN 1 AND 4000
8 /
SCORE(1)    PRIMKEY COLUMN1
        53        247 SQL$TEXT
        53        248 I_SQL$TEXT_PKEY
        53        249 I_SQL$TEXT_HANDLE
3 rows selected.
Execution Plan
Plan hash value: 2971213997
| Id | Operation                          | Name                | Rows | Bytes | Cost (%CPU)| Time    |
| 0 | SELECT STATEMENT                    |                    |    1 |    42 |    13 (0)| 00:00:01 |
| 1 | TABLE ACCESS BY INDEX ROWID BATCHED| FULLTEXTTEST        |    1 |    42 |    13 (0)| 00:00:01 |
| 2 | BITMAP CONVERSION TO ROWIDS      |                    |      |      |            |          |
| 3 |    BITMAP AND                      |                    |      |      |            |          |
| 4 |    BITMAP CONVERSION FROM ROWIDS |                    |      |      |            |          |
| 5 |      SORT ORDER BY                  |                    |      |      |            |          |
|* 6 |      DOMAIN INDEX                  | IDX_MYFULLTEXTINDEX | 2500 |      |    4 (0)| 00:00:01 |
| 7 |    BITMAP CONVERSION FROM ROWIDS |                    |      |      |            |          |
| 8 |      SORT ORDER BY                  |                    |      |      |            |          |
|* 9 |      INDEX RANGE SCAN              | SYS_C0035980        | 2500 |      |    9 (0)| 00:00:01 |
Predicate Information (identified by operation id):
6 - access("CTXSYS"."CONTAINS"("A"."COLUMN1",'FUZZY(({TEST}),,,W)',1)>0)
9 - access("A"."PRIMKEY">=1 AND "A"."PRIMKEY"<=4000)
Note
- dynamic statistics used: dynamic sampling (level=2)
SCOTT@orcl12c> SET AUTOTRACE OFF
SCOTT@orcl12c> DROP INDEX idx_myfulltextindex
2 /
Index dropped.
SCOTT@orcl12c> create index idx_myfulltextindex
2 on fulltexttest(column1)
3 indextype is ctxsys.context
4 FILTER BY primkey
5 /
Index created.
SCOTT@orcl12c> SET AUTOTRACE ON EXPLAIN
SCOTT@orcl12c> SELECT SCORE(1), a.*
2 FROM fulltexttest a
3 WHERE CONTAINS
4            (a.column1,
5            'FUZZY(({TEST}),,,W) AND SDATA (primkey BETWEEN 1 AND 4000)',
6            1) > 0
7 /
SCORE(1)    PRIMKEY COLUMN1
        53        247 SQL$TEXT
        53        248 I_SQL$TEXT_PKEY
        53        249 I_SQL$TEXT_HANDLE
3 rows selected.
Execution Plan
Plan hash value: 1298620335
| Id | Operation                  | Name                | Rows | Bytes | Cost (%CPU)| Time    |
| 0 | SELECT STATEMENT            |                    |    41 | 1722 |    12 (0)| 00:00:01 |
| 1 | TABLE ACCESS BY INDEX ROWID| FULLTEXTTEST        |    41 | 1722 |    12 (0)| 00:00:01 |
|* 2 | DOMAIN INDEX              | IDX_MYFULLTEXTINDEX |      |      |    4 (0)| 00:00:01 |
Predicate Information (identified by operation id):
2 - access("CTXSYS"."CONTAINS"("A"."COLUMN1",'FUZZY(({TEST}),,,W) AND SDATA (primkey
              BETWEEN 1 AND 4000)',1)>0)
Note
- dynamic statistics used: dynamic sampling (level=2)
SCOTT@orcl12c>

How to find a data including% using fuzzy searching

Hello,
I have a column that has a value "wwt%abc" . How can I find this data using SQL*PLUS and fuzzy searching condition? I wish to use
the statement like this:
select name from mytable where name like ...
thanks for any help

Susan,
I think you could use the ESCAPE clause to specify that % is to be interpreted literally.
select * from mytable where name like '%wwt\%abc%' escape '\';
I hope this helps.
Cheema
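
To see the ESCAPE clause in action without an Oracle instance, here is a small Python sketch that runs the same LIKE pattern against an in-memory SQLite table. SQLite is only a stand-in here (it also supports LIKE ... ESCAPE), and the helper name `find_literal_percent` and the sample rows are made up for illustration.

```python
import sqlite3


def find_literal_percent(names):
    """Return the names that contain the literal text 'wwt%abc'."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE mytable (name TEXT)")
    conn.executemany("INSERT INTO mytable VALUES (?)", [(n,) for n in names])
    # '\%' matches a real percent sign; the unescaped '%' are wildcards.
    rows = conn.execute(
        r"SELECT name FROM mytable "
        r"WHERE name LIKE '%wwt\%abc%' ESCAPE '\' ORDER BY name"
    ).fetchall()
    conn.close()
    return [row[0] for row in rows]


print(find_literal_percent(["wwt%abc", "wwtXabc", "wwtabc", "x wwt%abc y"]))
```

Only the rows that really contain a percent sign between "wwt" and "abc" come back; without the ESCAPE clause the middle % would act as a wildcard and all four rows would match.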

Concatenated datastore fuzzy searches and performance...

Oracle 8.1.7:
I am using the concatenated datastore and indexing two columns.
The query I am executing includes an exact match on one column and a fuzzy match on the second column.
When I execute the query, performance should improve as the exact match column is set to return fewer values.
This is the case when we execute an exact match search on both columns.
However, when one column is an exact match and the second column is a fuzzy match this is not true.
Is this normal processing??? and why??? Is this a bug??
If you need more information please let me know.
We are under a deadline and this is our final road block.
TIA
Colleen Geislinger

Email address validation, is there a way to use Regex or other fuzzy searching?

I would like to use PL/SQL for email address validation. Is there a way to use regex (regular expressions) or some other fuzzy searching for that? Using % and _ wildcards only takes you so far...
I need something that will verify alphanumeric characters (no ",'.:#@&*^ etc.). Any ideas?
Current code:
if email not like '_%@_%.__%' or email like '%@%@%' or email like '% %' or email like '%"%' or email like '%''%' or email like '%
%' then
The last line is to make sure there are no line breaks in the middle of the email address. Is there a better way to signify a line break, like \n or an ASCII equivalent?
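
As a rough idea of what a regex version of those LIKE checks could look like, here is a minimal Python sketch. The pattern is deliberately strict and is an assumption of mine, not a full RFC-compliant validator; `looks_like_email` is a hypothetical name. In Oracle 10g and later a similar pattern could be tried with REGEXP_LIKE, and a line feed can be written as CHR(10) instead of a literal line break.

```python
import re

# Local part: letters, digits, dot, underscore, hyphen.
# Domain: one or more labels, then a final label of at least two letters.
EMAIL_RE = re.compile(
    r"[A-Za-z0-9._-]+@[A-Za-z0-9-]+(\.[A-Za-z0-9-]+)*\.[A-Za-z]{2,}"
)


def looks_like_email(value):
    """True if the whole string matches the simple pattern above."""
    return EMAIL_RE.fullmatch(value) is not None


for candidate in ("fran@example.com", "a@@b.com", "a b@example.com",
                  "line\nbreak@example.com"):
    print(repr(candidate), looks_like_email(candidate))
```

Using `fullmatch` means spaces, quotes, a second @, or a line break anywhere in the string cause a rejection, which covers the separate LIKE conditions in the current code.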

Michael:
As noted in the previous post, DBI is a Perl package that allows Perl to talk to various databases, including Oracle. We use DBI on several UNIX servers, and it does not require ODBC, and I have always found it to be extremely quick. Things may be different in the Windows world.
If you are spooling files out to run through Perl anyway, you may want to take a look at DBI. You could probably modify your existing scripts to use DBI fairly easily. The basic structure using DBI is like:
use DBI;
my $dbh;       # A database handle
my $sth;       # A statement handle
my $sqlstr;    # SQL statement
my $db_vars;   # Variables for your db columns
# Connect to the database
$dbh = DBI->connect( "dbi:Oracle:service_name","user/password");
$sqlstr = 'SELECT * FROM emp WHERE id = ?'; # even takes bind variables
# Prepare statement
$sth = $dbh->prepare($sqlstr);
$sth->execute(12345); # Execute with values for bind if desired
# Walk the "cursor"
while (($db_vars) = $sth->fetchrow_array()) {
   # your processing here
}

Fuzzy searching and concatenated datastore query performance problems.

I am using the concatenated datastore and indexing two columns.
The query I am executing includes an exact match on one column and a fuzzy match on the second column.
When I execute the query, performance should improve as the exact match column is set to return fewer values.
This is the case when we execute an exact match search on both columns.
However, when one column is an exact match and the second column is a fuzzy match this is not true.
Is this normal processing??? and why??? Is this a bug??
If you need more information please let me know.
We are under a deadline and this is our final road block.
TIA
Colleen Geislinger

I see that you have posted the message in the Oracle text forum, good! You should get a better, more timely answer there.
Larry

[WTA] Perform Fuzzy/Matching/Search of Similarity Text

This is my sample data:
With
vCAR_MODEL AS (
Select '1' AS MODEL_ID, 'CITY' AS CAR_MODEL FROM DUAL UNION ALL
Select '2' AS MODEL_ID, 'HOOOONDA' AS CAR_MODEL FROM DUAL UNION ALL
Select '3' AS MODEL_ID, 'CRUZE' AS CAR_MODEL FROM DUAL UNION ALL
Select '5' AS MODEL_ID, 'HONDA CRUZE' AS CAR_MODEL FROM DUAL
),
vCAR_MODEL_DETAIL AS (
Select '1' AS MODEL_DETAIL_ID, 'HONDA @ CITY' AS CAR_MODEL , SYSTIMESTAMP + 1 AS UPDATE_DATE FROM DUAL UNION ALL
Select '2' AS MODEL_DETAIL_ID, 'HONDA,CITY' AS CAR_MODEL, SYSTIMESTAMP + 2 AS UPDATE_DATE FROM DUAL UNION ALL
Select '3' AS MODEL_DETAIL_ID, 'HONDA|| CITY' AS CAR_MODEL, SYSTIMESTAMP + 3 AS UPDATE_DATE FROM DUAL UNION ALL
Select '4' AS MODEL_DETAIL_ID, 'CIIIITY @ HOOOONDA' AS CAR_MODEL, SYSTIMESTAMP + 4 AS UPDATE_DATE FROM DUAL UNION ALL
Select '5' AS MODEL_DETAIL_ID, 'HONDA' AS CAR_MODEL,SYSTIMESTAMP + 5 AS UPDATE_DATE FROM DUAL UNION ALL
Select '6' AS MODEL_DETAIL_ID, 'CHEVY @ CRUZE' AS CAR_MODEL,SYSTIMESTAMP + 6 AS UPDATE_DATE FROM DUAL UNION ALL
Select '7' AS MODEL_DETAIL_ID, 'CRUZE' AS CAR_MODEL,SYSTIMESTAMP + 7 AS UPDATE_DATE FROM DUAL UNION ALL
Select '8' AS MODEL_DETAIL_ID, 'HONDA CRUZE' AS CAR_MODEL,SYSTIMESTAMP + 8 AS UPDATE_DATE FROM DUAL
)
Select * from vCAR_MODEL_DETAIL
------------------------------------------------------------------------------------
CAR_MODEL_ID     CAR_MODEL     UPDATE_DATE
1     HONDA @ CITY     6-May-13
2     HONDA, CITY     7-May-13
3     HONDA|| CITY     8-May-13
4     CIIIITY @ HOOOONDA     9-May-13
5     HONDA     10-May-13
6     CHEVY @ CRUZE     11-May-13
7     CRUZE     12-May-13
8     HONDA CRUZE     13-May-13
and what I want actually is:
With
vCAR_MODEL AS (
Select '1' AS MODEL_ID, 'CITY' AS CAR_MODEL FROM DUAL UNION ALL
Select '2' AS MODEL_ID, 'HOOOONDA' AS CAR_MODEL FROM DUAL UNION ALL
Select '3' AS MODEL_ID, 'CRUZE' AS CAR_MODEL FROM DUAL UNION ALL
Select '5' AS MODEL_ID, 'HONDA CRUZE' AS CAR_MODEL FROM DUAL
),
vCAR_MODEL_DETAIL AS (
--Select '1' AS MODEL_DETAIL_ID, 'HONDA @ CITY' AS CAR_MODEL , SYSTIMESTAMP + 1 AS UPDATE_DATE FROM DUAL UNION ALL
--Select '2' AS MODEL_DETAIL_ID, 'HONDA,CITY' AS CAR_MODEL, SYSTIMESTAMP + 2 AS UPDATE_DATE FROM DUAL UNION ALL
Select '3' AS MODEL_DETAIL_ID, 'HONDA|| CITY' AS CAR_MODEL, SYSTIMESTAMP + 3 AS UPDATE_DATE FROM DUAL UNION ALL
Select '4' AS MODEL_DETAIL_ID, 'CIIIITY @ HOOOONDA' AS CAR_MODEL, SYSTIMESTAMP + 4 AS UPDATE_DATE FROM DUAL UNION ALL
Select '5' AS MODEL_DETAIL_ID, 'HONDA' AS CAR_MODEL,SYSTIMESTAMP + 5 AS UPDATE_DATE FROM DUAL UNION ALL
--Select '6' AS MODEL_DETAIL_ID, 'CHEVY @ CRUZE' AS CAR_MODEL,SYSTIMESTAMP + 6 AS UPDATE_DATE FROM DUAL UNION ALL
Select '7' AS MODEL_DETAIL_ID, 'CRUZE' AS CAR_MODEL,SYSTIMESTAMP + 7 AS UPDATE_DATE FROM DUAL UNION ALL
Select '8' AS MODEL_DETAIL_ID, 'HONDA CRUZE' AS CAR_MODEL,SYSTIMESTAMP + 8 AS UPDATE_DATE FROM DUAL
)
Select * from vCAR_MODEL_DETAIL
------------------------------------------------------------------------------------
CAR_MODEL_ID     CAR_MODEL     UPDATE_DATE
3     HONDA|| CITY     8-May-13
4     CIIIITY @ HOOOONDA     9-May-13
5     HONDA     10-May-13
7     CRUZE     12-May-13
8     HONDA CRUZE     13-May-13
The main table is "vCAR_MODEL" and the detail table is "vCAR_MODEL_DETAIL"; the purpose is to fuzzy search "vCAR_MODEL_DETAIL" based on "vCAR_MODEL".
The detail row is picked by the MAX of the "UPDATE_DATE" column.
My problem is: how do I perform a fuzzy search over those symbols when cross joining with the main table?
Any ideas?

From Text Area, I got an answer
