Concatenated datastore fuzzy searches and performance...

Oracle 8.1.7:
I am using the concatenated datastore and indexing two columns.
The query I am executing includes an exact match on one column and a fuzzy match on the second column.
When I execute the query, performance should improve as the exact match column is set to return fewer values.
This is the case when we execute an exact match search on both columns.
However, when one column is an exact match and the second column is a fuzzy match this is not true.
Is this normal processing, and if so, why? Or is this a bug?
If you need more information please let me know.
We are under a deadline and this is our final road block.
TIA
Colleen Geislinger

This is more information about our scenario:
We have two groups in the datastore:
concat:
1.) hierarchy:(example text) 321826 325123 543123
2.) page: Actual document text.
321826 325123 543123 represents IDs in a hierarchy structure. As you move from left to right, each number occurs in fewer pages, so there should be fewer exact matches.
Example: In this index all pages have 321826 as the first value. A few pages have 543123 and all others will have some other number as the last value.
if I do this query:
contains(concat,(321826 within hierarchy ) and ('personnel') within page)
it takes about 10 seconds because 321826 will hit all pages.
if I do this query:
contains(concat,(543123 within hierarchy ) and ('personnel') within page)
it takes only about 1 second because 543123 will hit just a few pages.
BUT:
Fuzzy search....
if I do this query:
search A.) contains(concat,(321826 within hierarchy ) and ?('personnel') within page)
it takes about 30 seconds because 321826 will hit all pages. That performance is acceptable for this case.
BUT if I do this query:
search B.) contains(concat,(543123 within hierarchy ) and ?('personnel') within page)
it takes about 30 seconds even though 543123 will hit only a few pages.
This should be faster than 30 seconds because you're searching over only a fraction of material for the fuzzy search part.
We've played with different variations on the () and the '' but nothing seems to change this.
Any advice on how to make search B.) faster??
We don't understand why we see different speeds with the exact match but DON'T see different speeds with the fuzzy search...
I can send you some test data with the index and query scripts if you want.
Our indexes are on large tables (2,000,000 rows).
TIA
Colleen Geislinger.
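
One way to investigate where the time goes is to ask Oracle Text how it parses and expands the query. The following is a minimal sketch using CTX_QUERY.EXPLAIN; the index name concat_idx and the result table qry_explain are hypothetical, and the explain-table layout should be checked against the Oracle Text reference for your release. If the output shows the fuzzy term expanded into many equivalent tokens, that expansion work may be done regardless of how selective the hierarchy term is, which would be consistent with search A and search B taking about the same time.

```sql
-- Hypothetical names: concat_idx (the CONTEXT index), qry_explain (result table).
CREATE TABLE qry_explain (
  explain_id   VARCHAR2(30),
  id           NUMBER,
  parent_id    NUMBER,
  operation    VARCHAR2(30),
  options      VARCHAR2(30),
  object_name  VARCHAR2(64),
  position     NUMBER,
  cardinality  NUMBER
);

BEGIN
  -- Parse search B without running it and store the parse tree.
  CTX_QUERY.EXPLAIN(
    index_name    => 'concat_idx',
    text_query    => '(543123 within hierarchy) and (?personnel within page)',
    explain_table => 'qry_explain',
    sharelevel    => 0,
    explain_id    => 'search_b'
  );
END;
/

-- Each row is one node of the parse tree; expanded fuzzy tokens appear as
-- separate rows under an equivalence node.
SELECT id, parent_id, operation, options, object_name
FROM   qry_explain
WHERE  explain_id = 'search_b'
ORDER  BY id;
```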

Similar Messages

Fuzzy searching and concatenated datastore query performance problems.

I am using the concatenated datastore and indexing two columns.
The query I am executing includes an exact match on one column and a fuzzy match on the second column.
When I execute the query, performance should improve as the exact match column is set to return fewer values.
This is the case when we execute an exact match search on both columns.
However, when one column is an exact match and the second column is a fuzzy match this is not true.
Is this normal processing, and if so, why? Or is this a bug?
If you need more information please let me know.
We are under a deadline and this is our final road block.
TIA
Colleen Geislinger

I see that you have posted the message in the Oracle text forum, good! You should get a better, more timely answer there.
Larry

Text 10g fuzzy search performance

Hello to everybody in this community,
I'm new to this and I have a question about Oracle Text 10g.
My Setup:
Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - 64bit
8 cores at 2.5 GHz each
64 GB RAM
What I'd like to do:
I'd like to compare a large number of row sets with each other in a way that tolerates human errors (e.g. spelling or typing mistakes).
So my TEXT CONTEXT setup is as follows:
MULTI_COLUMN_DATASTORE with each Column to compare.
begin
ctx_ddl.create_preference('my_datastore', 'MULTI_COLUMN_DATASTORE');
ctx_ddl.set_attribute('my_datastore', 'columns', 'column1, ...');
end;
BASIC_LEXER - with GERMAN settings:
begin
 ctx_ddl.create_preference('my_lexer', 'BASIC_LEXER');
 ctx_ddl.set_attribute('my_lexer', 'index_themes', 'NO');
 ctx_ddl.set_attribute('my_lexer', 'index_text', 'YES');
 ctx_ddl.set_attribute('my_lexer', 'alternate_spelling', 'GERMAN');
 ctx_ddl.set_attribute('my_lexer', 'composite', 'GERMAN');
 ctx_ddl.set_attribute('my_lexer', 'index_stems', 'GERMAN');
 ctx_ddl.set_attribute('my_lexer', 'new_german_spelling', 'YES');
end;
BASIC_WORDLIST - with GERMAN settings:
begin
 ctx_ddl.create_preference('my_wordlist', 'BASIC_WORDLIST');
 ctx_ddl.set_attribute('my_wordlist','FUZZY_MATCH','GERMAN');
 ctx_ddl.set_attribute('my_wordlist','FUZZY_SCORE','60'); --defaults
 ctx_ddl.set_attribute('my_wordlist','FUZZY_NUMRESULTS','100'); --defaults
 --ctx_ddl.set_attribute('my_wordlist','SUBSTRING_INDEX','TRUE'); --commented out due to long creation time of index
 ctx_ddl.set_attribute('my_wordlist','STEMMER','GERMAN');
end;
And a BASIC_SECTION_GROUP with a field_section for each column.
begin
 ctx_ddl.create_section_group(
  group_name => 'my_section_group',
  group_type => 'BASIC_SECTION_GROUP'
 );
 ctx_ddl.add_field_section(
  group_name => 'my_section_group',
  section_name => 'column1',
  tag => 'column1'
 );
end;
I create the index with
create index idx_myfulltextindex on fulltexttest(column1)
indextype is ctxsys.context
parameters ('datastore my_datastore
 section group my_section_group
 lexer my_lexer
 wordlist my_wordlist
 stoplist ctxsys.empty_stoplist')
Everything works functionally fine.
In my test scenario I have a table with around 100,000 rows, with a primary key that is not in the CONTEXT index.
The Problem:
I do a query like:
SELECT SCORE(1), a.*
FROM fulltexttest a
WHERE CONTAINS(a.column1, 'FUZZY(({TEST}),,,W) WITHIN COLUMN1', 1) > 0
AND a.primkey BETWEEN 1000 AND 4000
This runs a full-text search over a set of about 3000 rows. The response time is nearly immediate, maybe a second.
If I run the same query in a cursor many times (>1000) with different search terms, it of course takes a long time. On average it does 1 query per second.
I thought it could not be that slow, so I tested the same with:
SELECT SCORE(1), a.*
FROM fulltexttest a
WHERE CONTAINS(a.column1, '({TEST}) WITHIN COLUMN1', 1) > 0
AND a.primkey BETWEEN 1000 AND 4000
NOTE: there is no fuzzy search anymore...
With this it is up to 20 times faster.
The cpu of the server reaches about 15% load while processing the fuzzy query.
So:
If I do a fuzzy search, it seems not to access the index. I thought I was telling the index to compute the results of 100 expansions in advance.
Am I doing it wrong? Or is it not possible to build an Index especially for fuzzy search ?
Are there any suggestions for increasing the performance? Note that I have already read the guide (7 Tuning Oracle Text). None of the hints helped.
I would appreciate it if anyone could help me with this, or just give a hint.
Thank you,
Dominik

Here is a simplified example, first without, then with SDATA. Please note the differences in indexes, queries, and execution plans.
SCOTT@orcl12c> CREATE TABLE fulltexttest
2 (primkey NUMBER PRIMARY KEY,
3 column1 VARCHAR2(30))
4 /
Table created.
SCOTT@orcl12c> CREATE SEQUENCE seq
2 /
Sequence created.
SCOTT@orcl12c> INSERT INTO fulltexttest
2 SELECT seq.NEXTVAL, object_name
3 FROM all_objects
4 /
89826 rows created.
SCOTT@orcl12c> create index idx_myfulltextindex
2 on fulltexttest(column1)
3 indextype is ctxsys.context
4 /
Index created.
SCOTT@orcl12c> SET AUTOTRACE ON EXPLAIN
SCOTT@orcl12c> SELECT SCORE(1), a.*
2 FROM fulltexttest a
3 WHERE CONTAINS
4 (a.column1,
5 'FUZZY(({TEST}),,,W)',
6 1) > 0
7 AND a.primkey BETWEEN 1 AND 4000
8 /
SCORE(1) PRIMKEY COLUMN1
 53 247 SQL$TEXT
 53 248 I_SQL$TEXT_PKEY
 53 249 I_SQL$TEXT_HANDLE
3 rows selected.
Execution Plan
Plan hash value: 2971213997
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
| 0 | SELECT STATEMENT | | 1 | 42 | 13 (0)| 00:00:01 |
| 1 | TABLE ACCESS BY INDEX ROWID BATCHED| FULLTEXTTEST | 1 | 42 | 13 (0)| 00:00:01 |
| 2 | BITMAP CONVERSION TO ROWIDS | | | | | |
| 3 | BITMAP AND | | | | | |
| 4 | BITMAP CONVERSION FROM ROWIDS | | | | | |
| 5 | SORT ORDER BY | | | | | |
|* 6 | DOMAIN INDEX | IDX_MYFULLTEXTINDEX | 2500 | | 4 (0)| 00:00:01 |
| 7 | BITMAP CONVERSION FROM ROWIDS | | | | | |
| 8 | SORT ORDER BY | | | | | |
|* 9 | INDEX RANGE SCAN | SYS_C0035980 | 2500 | | 9 (0)| 00:00:01 |
Predicate Information (identified by operation id):
6 - access("CTXSYS"."CONTAINS"("A"."COLUMN1",'FUZZY(({TEST}),,,W)',1)>0)
9 - access("A"."PRIMKEY">=1 AND "A"."PRIMKEY"<=4000)
Note
- dynamic statistics used: dynamic sampling (level=2)
SCOTT@orcl12c> SET AUTOTRACE OFF
SCOTT@orcl12c> DROP INDEX idx_myfulltextindex
2 /
Index dropped.
SCOTT@orcl12c> create index idx_myfulltextindex
2 on fulltexttest(column1)
3 indextype is ctxsys.context
4 FILTER BY primkey
5 /
Index created.
SCOTT@orcl12c> SET AUTOTRACE ON EXPLAIN
SCOTT@orcl12c> SELECT SCORE(1), a.*
2 FROM fulltexttest a
3 WHERE CONTAINS
4 (a.column1,
5 'FUZZY(({TEST}),,,W) AND SDATA (primkey BETWEEN 1 AND 4000)',
6 1) > 0
7 /
SCORE(1) PRIMKEY COLUMN1
 53 247 SQL$TEXT
 53 248 I_SQL$TEXT_PKEY
 53 249 I_SQL$TEXT_HANDLE
3 rows selected.
Execution Plan
Plan hash value: 1298620335
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
| 0 | SELECT STATEMENT | | 41 | 1722 | 12 (0)| 00:00:01 |
| 1 | TABLE ACCESS BY INDEX ROWID| FULLTEXTTEST | 41 | 1722 | 12 (0)| 00:00:01 |
|* 2 | DOMAIN INDEX | IDX_MYFULLTEXTINDEX | | | 4 (0)| 00:00:01 |
Predicate Information (identified by operation id):
2 - access("CTXSYS"."CONTAINS"("A"."COLUMN1",'FUZZY(({TEST}),,,W) AND SDATA (primkey
 BETWEEN 1 AND 4000)',1)>0)
Note
- dynamic statistics used: dynamic sampling (level=2)
SCOTT@orcl12c>

Concatenated datastore performance with other predicates

Hi
I am using context indexes with a concatenated datastore.
The query is like this -
select *
from my_table
where contains ( my_column, 'token_1 within xx or token_2 within yy ', 1 ) > 0
and some_other_column = 'xxx'
There is no index on "some_other_column".
Would it help to include "some_other_column" in the concatenated datastore? Will this increase the performance of the query, or does it always depend on the type of data we have?
How is the query of a concatenated datastore fired? Is the $I table queried for each token in the query?
Thanks and regards
Pratap

Yes, it should generally be faster to include "some_other_column" in the
list for the concatenated datastore.
The query would then be
select * from my_table where contains
( my_column, '(token_1 within xx or token_2 within yy) and (xxx within some_other_column)', 1 ) > 0
Note that this is not exactly the same as your query - for example if some_other_column contained "abc xxx xyz" then my query would be a hit but yours would not. If you know the column will only ever contain one word, then they are identical.
- Roger
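
A minimal sketch of the setup that answer implies, assuming a release that provides MULTI_COLUMN_DATASTORE and assuming xx and yy in the question are columns of my_table. The preference, section group, and index names (my_ds, my_sg, my_idx) are hypothetical.

```sql
BEGIN
  -- Concatenate all three columns into one indexed document per row.
  CTX_DDL.CREATE_PREFERENCE('my_ds', 'MULTI_COLUMN_DATASTORE');
  CTX_DDL.SET_ATTRIBUTE('my_ds', 'columns', 'xx, yy, some_other_column');

  -- One field section per column, so WITHIN can target each of them.
  CTX_DDL.CREATE_SECTION_GROUP('my_sg', 'BASIC_SECTION_GROUP');
  CTX_DDL.ADD_FIELD_SECTION('my_sg', 'xx', 'xx');
  CTX_DDL.ADD_FIELD_SECTION('my_sg', 'yy', 'yy');
  CTX_DDL.ADD_FIELD_SECTION('my_sg', 'some_other_column', 'some_other_column');
END;
/

CREATE INDEX my_idx ON my_table(my_column)
  INDEXTYPE IS CTXSYS.CONTEXT
  PARAMETERS ('datastore my_ds section group my_sg');

-- The structured predicate now becomes part of the text query.
SELECT *
FROM   my_table
WHERE  CONTAINS(my_column,
         '(token_1 within xx or token_2 within yy) and (xxx within some_other_column)',
         1) > 0;
```

One caveat with this design: the index is maintained when the indexed column (my_column here) changes, so an update that touches only some_other_column may not be picked up until my_column is also updated.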

Firefox clears the searchbox on ebay when i perform a search and nothing happens!

Firefox clears the search-box on ebay when I perform a search and nothing happens! The page just refreshes and I can't find any items. It works as normal when I try with IE though.
This happened every time Firefox opened, starting about 3-4 months ago.

Start Firefox in [[Safe Mode]] to check if one of the extensions is causing the problem (switch to the DEFAULT theme: Firefox (Tools) > Add-ons > Appearance/Themes).
*Don't make any changes on the Safe mode start window.
*https://support.mozilla.com/kb/Safe+Mode

Perform search and replace in a custom command?

Hello,
Does anyone know how to perform a search and replace for selected text in a custom command?
My specific dilemma is this: I have a large document containing [very poorly formatted] songs. For example:
Verse1 Verse1 Verse1 Verse1 Verse1
Verse1 Verse1 Verse1 Verse1 Verse1
Verse1 Verse1 Verse1 Verse1 Verse1
Chorus Chorus Chorus Chorus
Chorus Chorus Chorus Chorus
Chorus Chorus Chorus Chorus
Verse2 Verse2 Verse2 Verse2 Verse2
Verse2 Verse2 Verse2 Verse2 Verse2
Verse2 Verse2 Verse2 Verse2 Verse2
Verse3 Verse3 Verse3 Verse3 Verse3
Verse3 Verse3 Verse3 Verse3 Verse3
Verse3 Verse3 Verse3 Verse3 Verse3
I would like to be able to select all the text in a verse and run my custom command on it that would remove the "" and replace "" with " ".
Thanks,
Kelso

replaceAll method in String is available only from 1.4 onwards.
http://java.sun.com/j2se/1.4.1/docs/api/java/lang/String.html#replaceAll(java.lang.String, java.lang.String).
You could use a method to search and replace a String with another...
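public String replace (String source, String search, String replace) {
 // An empty search string would match at every position and never advance.
 if (search.length() == 0) {
  return source;
 }
 StringBuffer result = new StringBuffer();
 int start = 0;
 int index = 0;
 // For each occurrence, copy the text before it, then the replacement.
 while ((index = source.indexOf(search, start)) >= 0) {
  result.append(source.substring(start, index));
  result.append(replace);
  start = index + search.length();
 }
 // Append whatever follows the last occurrence.
 result.append(source.substring(start));
 return result.toString();
}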

E-Recruiting TREX and fuzzy search

Hello,
Is fuzzy search available for the e-Recruiting module? We are using TREX 7.0 and e-Recruiting 6.0.
If that functionality is available, how exactly does it work, and is it customizable?
thanks
Koen

Hi Guys
Sorry to jump on this thread with nothing to add except my problem.
We are currently in the process of implementing E-Recruitment 3.0 as an extension to ECC 5. I have configured the system and have been doing some unit testing, and have found the following:
- When I try to "Apply Directly" in the tab "Career and Job" and enter the reference details of a specific job, it finds that job. If I leave the input field blank and hit the search button, it returns all jobs posted. However, if I use the "Search for Jobs" link option, I get a consistent error: "An internal error occurred. Please try again later"
When I check transaction SLG1, the error log says the termination occurred in a program called CL_HRRCF_ABDTRACT_CONTROLLER==CM001 line 56 and CL_HRRCF_SEARCH_MASK_GROUP====CM00M line 15.
The same thing happens when searching for candidates to assign requisitions to.
Has anyone come across this problem before? Any help will be greatly appreciated, as we have a tight deadline.
Cheers

Difference in WS performance between Search and Retrieve operations?

All,
We are currently working on a new repository and planning to use MDM webservices on top of that repository for searching and retrieving the data.
Now I'm curious about the difference in performance between the Search and the Retrieve operations, and also, within the Retrieve operation, between the different identification methods (internal ID, auto ID, remote key, unique field and display field).
The web services guide states that the identification methods are listed in order of best performance, but what are the performance differences between these methods (e.g. a retrieve on internal ID is x times faster than a retrieve on remote key, which in turn is x times faster than a retrieve on display fields, which in turn is x times faster than a search operation on the same display field)?
Of course the performance depends on a lot of other things as well, but I just want to get a feeling for the performance of these operations relative to each other (keeping all other variables that can influence the performance the same!).
I hope that some of you have experience with all of these possibilities and can share performance measurements comparing the different operations. Thanks in advance.
Regards,
Marcel Herber

Hi,
Did you implement web services at your site?
We have a similar scenario where we have to search for records in MDM from SAP PI based on certain criteria. I am concerned about SAP MDM performance, since we have a heavy amount of data being loaded every 30 minutes.
Please let me know about the performance aspects of using web services.
Thanks
Ganesh Kotti

Store XML on Oracle and perform search

Hi,
I need to be able to store 10,000 XML documents in Oracle so
I can perform attribute searches against these documents.
Is Oracle 8i a must? Are relational tables the way to go?
What would be a good way to store XML documents and retrieve them?
We are currently storing them as BLOBs, where we can't
perform any searching.
Many Thanks.
Kevin

Oracle XML Team wrote:
: Kevin Lu (guest) wrote:
: : Hi,
: : I need to be able to store 10,000 XML documents in Oracle so
: : I can performance attribute search against these documents
: : Is Oracle 8i a must? Is relational tables the way to go?
: : What would a good way to store XML document and retrieve them.
: : We are currently storing them as BLOB, where we can't do
: : preform searching functionality.
: : Many Thanks.
: : Kevin
: 8i is the way to go but interMedia's support of XML attribute
: searching is not in the current release but has been announced
: for 8.1.6.
I posted this as a follow-up to a previous query, but I need to
accomplish the same thing. That is, store XML data in a CLOB but search
(and select rows) based on XML element or attribute values of the
XML documents in the CLOB column. Where can I learn more about
the interMedia search? Thanks again.
Prasad

I searched and searched (fuzzy logic 4)

Hello!
Right now I have been reading for over an hour and I cannot find the right answer about Fuzzy Logic 4.
I used Live Update to get the most recent drivers for my mobo, an MSI 845 PE Max2.
(Just to be complete: I have the HT option in the BIOS while having a P4 2.4GHz, for those who want to know; BIOS 1.20.)
My problem:
I used the "auto" function in Fuzzy Logic and waited 3 minutes; the FSB was boosted to 164. Then the system hangs and restarts (that's normal, right?). The CPU temp is 64C when it hangs.
Then after the reboot, nothing has changed and the FSB is back to 133???
What is wrong? How can I overclock my system with Fuzzy Logic?
Sander

I would NOT use Core-Center with your board either. Since it is an 845-series board, I don't believe that you even have the "Core-Cell" chip on your motherboard. I suggest that you use "Speed-Fan", as this seems to be the utility that most of the members have success with, and if you want a utility to overclock your FSB with, download Clock-Generator at http://www.cpuid.com .....Sean REILLY875

Scoring messed up using concatenated datastore Index

Hi,
Here is my table structure....
CREATE TABLE SRCH_KEYWORD_SEARCH_SME
(
SYS_ID NUMBER(10) NOT NULL,
PAPER_NO VARCHAR2(10),
PRODIDX_ID VARCHAR2(10),
RESULT_TITLE VARCHAR2(255),
RESULT_DESCR VARCHAR2(1000) NOT NULL,
ABSTRACT CLOB,
SRSLT_CATEGORY_ID VARCHAR2(10) NOT NULL,
SRSLT_SUB_CATEGORY_ID VARCHAR2(10) NOT NULL,
ACTIVE_FLAG VARCHAR2(1) DEFAULT 'Y' NOT NULL,
EVENT_START_DATE DATE,
EVENT_END_DATE DATE
);
Here is the Concatenated Datastore preference...
 -- Drop any existing storage preference.
 CTX_DDL.drop_preference('SEARCH_STORAGE_PREF');
 -- Create new storage preference.
 CTX_DDL.create_preference('SEARCH_STORAGE_PREF', 'BASIC_STORAGE');
 CTX_DDL.set_attribute('SEARCH_STORAGE_PREF', 'I_TABLE_CLAUSE', 'tablespace searchidx');
 CTX_DDL.set_attribute('SEARCH_STORAGE_PREF', 'K_TABLE_CLAUSE', 'tablespace searchidx');
 CTX_DDL.set_attribute('SEARCH_STORAGE_PREF', 'R_TABLE_CLAUSE', 'tablespace searchidx lob (data) store as (disable storage in row cache)');
 CTX_DDL.set_attribute('SEARCH_STORAGE_PREF', 'N_TABLE_CLAUSE', 'tablespace searchidx');
 CTX_DDL.set_attribute('SEARCH_STORAGE_PREF', 'I_INDEX_CLAUSE', 'tablespace searchidx compress 2');
 CTX_DDL.set_attribute('SEARCH_STORAGE_PREF', 'P_TABLE_CLAUSE', 'tablespace searchidx');
 -- Drop any existing datastore preference.
 CTX_DDL.drop_preference('SEARCH_DATA_STORE');
 CTX_DDL.DROP_SECTION_GROUP('SEARCH_DATA_STORE_SG');
 -- Create new multi-column datastore preference.
 CTX_DDL.create_preference('SEARCH_DATA_STORE','MULTI_COLUMN_DATASTORE');
 CTX_DDL.set_attribute('SEARCH_DATA_STORE','columns','abstract, srslt_category_id, srslt_sub_category_id, active_flag');
 CTX_DDL.set_attribute('SEARCH_DATA_STORE', 'FILTER','N,N,N,N');
 -- Create new section group preference.
 CTX_DDL.create_section_group ('SEARCH_DATA_STORE_SG','BASIC_SECTION_GROUP');
 CTX_DDL.add_field_section('SEARCH_DATA_STORE_SG', 'abstract', 'abstract', TRUE);
 CTX_DDL.add_field_section('SEARCH_DATA_STORE_SG', 'srslt_category_id', 'srslt_category_id', TRUE);
 CTX_DDL.add_field_section('SEARCH_DATA_STORE_SG', 'srslt_sub_category_id', 'srslt_sub_category_id',TRUE);
 CTX_DDL.add_field_section('SEARCH_DATA_STORE_SG', 'active_flag', 'active_flag', TRUE);
Here is the context Index
CREATE INDEX SRCH_KEYWORD_SEARCH_I ON SRCH_KEYWORD_SEARCH_SME(ABSTRACT)
 INDEXTYPE IS CTXSYS.CONTEXT
 PARAMETERS('STORAGE search_storage_pref DATASTORE SEARCH_DATA_STORE SECTION GROUP SEARCH_DATA_STORE_SG' )
Here is the Query # 1 I am trying out...
SELECT /*+ FIRST_ROWS(10) */
 SCORE(1) score_nbr,
 k.SYS_ID,
 k.RESULT_TITLE,
FROM SRCH_KEYWORD_SEARCH_SME k
WHERE CONTAINS (k.ABSTRACT, '<query><textquery><progression><seq>{hitchhiker} WITHIN abstract</seq></progression></textquery></query>',1) > 0
ORDER BY SCORE(1) DESC;
Here is the result for Query # 1...
score_nbr sys_id result_title
54 99220 SME Releases New Book The Hitchhiker's Guide to Lean 72
43 116583 Lean Leadership Package 72
32 132392 The Hitchhikers Guide to Lean: Lessons from the Road 72
11 132017 Lean Manufacturing A Plant Floor Guide Book Summary 72
11 137106 Managing Factory Maintenance, Second Edition 72
11 132082 Lean Pocket Guide
Here is the Query # 2 I am trying out...
SELECT /*+ FIRST_ROWS(10) */
 SCORE(1) score_nbr,
 k.SYS_ID,
 k.RESULT_TITLE,
FROM SRCH_KEYWORD_SEARCH_SME k
WHERE CONTAINS (k.ABSTRACT, '<query><textquery><progression><seq>{hitchhiker} WITHIN abstract AND Y WITHIN active_flag</seq></progression></textquery></query>',1) > 0
ORDER BY SCORE(1) DESC
Here is the result for Query # 2...
score_nbr sys_id result_title
3 132017 Lean Manufacturing: A Plant Floor Guide Book Summary 72
3 137106 Managing Factory Maintenance, Second Edition 72
3 132082 Lean Pocket Guide 72
3 132083 The Toyota Way: 14 Management Principles From the World's Greatest... 72
3 132417 Lean Manufacturing: A Plant Floor Guide 72
3 132091 Breaking the Cost Barrier: A Proven Approach to Managing and... 72
3 99318 Conflicting pairs 72
3 132393 One-Piece Flow: Cell Design for Transforming the Production Process 72
3 137091 Learning to See: Value Stream Mapping to Create Value & Eliminate MUDA 72
3 137090 The Purchasing Machine: How the Top 10 Companies Use Best Practices... 72
3 137393 Passion for Manufacturing
My question is: why did the score drop all the way to 3 for ALL the results the above query returned when I used the AND clause
and added the 2nd column used in the datastore to my query condition?
Also I want to use progressive relaxation technique in the queries to use stemming & fuzzy search option too.
Help me out please....
Thanks in advance.
- Richard.
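
A sketch of how the progressive relaxation mentioned above might be written, assuming the index and section group defined in this post. Each seq step is tried in order: exact term first, then the stem operator ($), then the fuzzy operator (?). The exact scores returned per step are something to verify on your own data.

```sql
SELECT /*+ FIRST_ROWS(10) */
       SCORE(1) score_nbr,
       k.sys_id,
       k.result_title
FROM   srch_keyword_search_sme k
WHERE  CONTAINS(k.abstract,
  '<query>
     <textquery>
       <progression>
         <seq>{hitchhiker} WITHIN abstract</seq>
         <seq>$hitchhiker WITHIN abstract</seq>
         <seq>?hitchhiker WITHIN abstract</seq>
       </progression>
     </textquery>
   </query>', 1) > 0
ORDER  BY SCORE(1) DESC;
```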

Yes, it's in the doc - it's known as the weight operator.
http://download.oracle.com/docs/cd/B28359_01/text.111/b28304/cqoper.htm#i998379
"term*n Returns documents that contain term. Calculates score by multiplying the raw score of term by n, where n is a number from 0.1 to 10."
We're just using the operator twice as the limit on "n" is 10 (for no obvious reason I know of!). This is perfectly safe, and common practice.
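
A sketch of what using the weight operator twice could look like for the query in this thread. It relies on the AND operator returning the lower of its operands' scores: a term such as Y that occurs in nearly every row scores very low, so boosting it (by 10 * 10 = 100, with scores capped at 100) should let the abstract term determine the final score. The placement of the weight on the parenthesized WITHIN expression is an assumption to check against your release.

```sql
SELECT SCORE(1) score_nbr,
       k.sys_id,
       k.result_title
FROM   srch_keyword_search_sme k
WHERE  CONTAINS(k.abstract,
         '({hitchhiker} WITHIN abstract) AND ((Y WITHIN active_flag)*10*10)',
         1) > 0
ORDER  BY SCORE(1) DESC;
```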

Fuzzy Searches

Is there anywhere I can find the algorithm Oracle uses for the CONTEXT fuzzy search (as in SELECT surname from person_source where
contains(surname,'?Humphrey') > 0;)?
I would like to build a function for use outside CONTEXT that incorporates the same algorithm.
Fran

Drilldown Searches and Free-Form Searches

Hi All
Can you please explain the concepts of Drilldown Searches and Free-Form Searches?
I read the documentation in the help portal.
Could you kindly shed more light on these concepts with some examples?
Thanks in advance
Mugdha Kulkarni

Hi Mugdha,
MDM provides two types of searches:
Drilldown Search:
With drilldown search, you can make selections from each search tab, where each tab corresponds to a lookup field in the table.
You can also make selections for each of the attributes linked to the selected category, and each of the qualifiers of a qualified table record.
Freeform Search:
With free-form search, you can perform searches on any field that does not look up its values from a sub table.
Free-form search also allows you to do fuzzy searches with a variety of search operators.
It accepts typed values for one or more fields (like a traditional DBMS query form) and a keyword search that can match keywords in any or all of the fields in a table.
At each step along the way, the system narrows down the choice of values for each search dimension to show only those that are valid given the current result set based on the previous search selections.
The result is an extremely flexible and powerful search capability, delivered through an exceptionally smooth and intuitive process.
Hope this clears your doubts.
Regards,
Rashmi Jadhav

How Fuzzy score and Score() function works in HANA?

Hi,
I read the fuzzy search developer guide for HANA, but I don't understand how HANA calculates score() and the fuzzy score.
According to the developer guide, score() is calculated using TF/IDF. I also tried to calculate TF/IDF as described on the wiki page, but it gives different values, and the score() value changes with the x value of fuzzy(x).
See example
select score() as sc, *
from COMPANIES2
where contains(Companyname,'IBM',fuzzy(0.7))
it returns
SC; ID; COMPANYNAME; CONTACT
0.7599999904632568; 6; IBM Corp; M. Master
and for
select score() as sc, *
from COMPANIES2
where contains(Companyname,'IBM',fuzzy(0.2))
it returns
SC; ID; COMPANYNAME; CONTACT
0.16945946216583252; 2; SAP in Walldorf Corp; Master Mister
0.8392000198364258; 6; IBM Corp; M. Master
and table content of Companies2 is
ID; Companyname; contact
1; SAP Corp; Mister Master
2; SAP in Walldorf Corp; Master Mister
3; ASAP; Nister Naster
4; ASAP Corp; Mixter Maxter
5; BSAP orp; Imster Marter
6; IBM Corp; M. Master
Please provide any formula or algorithm for above.
Thanks,
Somnath A. Kadam

Hi Somnath,
It seems that the column "Companyname" has data type "SHORTTEXT" and here is the quote from SAP HANA Developer Guide Ch. 10.2.4.8 (p659)
"Text types support a more sophisticated kind of fuzzy search. Texts are tokenized (split into terms), and the fuzzy comparison is performed term by term.
When searching with 'SAP' for example, a record like 'SAP Deutschland AG & Co. KG' gets a high score, because the term 'SAP' exists in both texts. A record like 'SAPPHIRE NOW Orlando' gets a lower score, because 'SAP' is just a part of the longer term 'SAPPHIRE' (3 of 8 characters match)."
So for text columns the score calculation is much more complex than tf-idf.
As for the different fuzzy score, there is an explanation in the FAQ section ( Ch. 10.2.4.14, p736 "Is the score between request and result always stable for TEXT columns?")
Basically, for each token, its similarity score will be used to calculate the overall result only if it is higher than the threshold given in fuzzy(). Any token with a lower similarity score will be excluded. Therefore, a slight change in the threshold may influence the overall score greatly.
Here is an example.
I added id 7 "SAP ASAP" to the data you used.
Note that the similarity score between "ASAP" and "BSAP" is slightly over 0.74 and similarity score between "SAP" and "BSAP" is 0.75:
For
 select score() as sc, * from COMPANIES2 where contains(COMPANYNAME,'BSAP',fuzzy(0.74))
We get:
<...omitted...>
0.7474510073661804; 7; SAP ASAP; M. Master
Now change the threshold to 0.75 and the result is:
<...omitted...>
0.5588234663009644; 7; SAP ASAP; M. Master
ID 7 now gets a lower score because "ASAP" is excluded and only "SAP" is used to calculate the overall result.
As for tf-idf, it is used in the so-called freestyle search across multiple columns.
An example from the same guide:
 select score() as sc, * from companies2 where contains((companyname,contact), 'IBM Master', FUZZY(0.7));
Result:
0.8103122115135193; 6; IBM Corp; M. Master
Regards
Roger Tao

Oracle Text Concatenated Datastore

I have read this:
http://www.oracle.com/technology/sample_code/products/text/htdocs/concatenated_text_datastore/cdstore_readme.html
I've been trying to follow the 'Installation' section.
I've downloaded cdstore.sql but I get error ORA-00942 (table does not exist) because ctx_user_cdstore_cols does not exist (at line 618 in the file).
Indeed, the table created is 'ctx_cdstore_cols' and not 'ctx_user_cdstore_cols'.
I've changed it to ctx_cdstore_cols and now get ORA-00904 because CDSTORE_NAME is not a column of ctx_cdstores.
Anyway, I believe that this code should work as is so there is something big I must be missing.
Has anyone managed to install this package and how please?

It's not a problem with the concatenated datastore, it's about operator precedence.
If you search for 'A or B within SECTION', "within" has a higher precedence than "or", so this becomes 'A or (B within SECTION)'. What you need to say is '(A or B) within SECTION', or in your case '(BROOKS or BONDS) within name'
Hope this helps.
Roger
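
To make the precedence point concrete, here is a short sketch. The table people and its indexed column name_doc are hypothetical; the section name and search terms come from the reply above.

```sql
-- Parsed as: BROOKS or (BONDS within name).
-- BROOKS is matched anywhere in the document, not only in the name section.
SELECT *
FROM   people
WHERE  CONTAINS(name_doc, 'BROOKS or BONDS within name') > 0;

-- Parentheses restrict both terms to the name section.
SELECT *
FROM   people
WHERE  CONTAINS(name_doc, '(BROOKS or BONDS) within name') > 0;
```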
