Substring search with Oracle context indexes

Hi,
I would like to know whether it is possible to do a substring search with one of the options offered by the Oracle Text index types
(ctxcat, ctxrule, context).
Example:
I would like to search for the word 'berub' in column a of table_example.
The values in column a are:
The betther
berube
A.berube
berub
Berub
BERUB
R berube
S tartif
Y Thibeault
The rows returned should be:
berube
A.berube
berub
Berub
BERUB
R berube
A simple SQL statement could be:
select * from table_example where upper(a) like upper('%berub%' );
How can I do the same thing with a context index and a select (catsearch, contains, matches), if it is possible?
An example would be welcome.
Thanks

I know how to run an explain plan.
My point is not the query I posted; it is just an example.
We have many production queries that we have already optimized several times (they went from 3 minutes to 15 seconds, but we want better results). At this point we are looking at implementing context indexes to make them more efficient.
To make this SQL more efficient we have to deal with like '%xxxxxx%'. Context indexes look like an option, but we need to be able to do substring searches with them.
Is it possible, and how?
That is my question and why I posted it here. The query is just a simple example to illustrate what I want.
Thanks to anyone who can answer my question.
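
One possible approach, sketched here under assumptions rather than taken from a reply in this thread: a CONTEXT index built with a BASIC_WORDLIST preference that has SUBSTRING_INDEX enabled, queried with CONTAINS and a wildcard on both sides of the term. The preference name berub_wl and the index name table_example_ctx are hypothetical; table_example and column a come from the question. With the default BASIC_LEXER settings the index is not case-sensitive, so the upper() calls should not be needed.

```sql
-- Hypothetical names: berub_wl (wordlist preference), table_example_ctx (index).
begin
  ctx_ddl.create_preference('berub_wl', 'BASIC_WORDLIST');
  -- SUBSTRING_INDEX is meant to speed up wildcard queries that start with %
  ctx_ddl.set_attribute('berub_wl', 'SUBSTRING_INDEX', 'TRUE');
end;
/

create index table_example_ctx on table_example (a)
  indextype is ctxsys.context
  parameters ('wordlist berub_wl');

-- Rough counterpart of: upper(a) like upper('%berub%')
select *
from   table_example
where  contains(a, '%berub%') > 0;
```

Two differences from LIKE are worth keeping in mind. CONTAINS matches individual tokens, so a pattern cannot span a word boundary, and a CONTEXT index only reflects new or changed rows after it has been synchronized (for example with ctx_ddl.sync_index, or with SYNC (ON COMMIT) in the index parameters).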

Similar Messages

Oracle 10g – Performance with BIG CONTEXT indexes

I would like to use Oracle XE 10.2.0.1.0 only for full-text searching of files that reside outside the database on an FTP server.
I recently found out that the files to be indexed total 5 GB.
As I read somewhere on this forum, the index size should be 30-40% of the indexed text (so even less for formatted documents such as PDF or DOC).
Let's say the CONTEXT index over these files will be 1.5-2 GB.
There will be at most 5 concurrent users.
I cannot easily test this myself yet.
Does anybody have experience with the performance of Oracle XE, or another Oracle Database edition, with a CONTEXT index this big?
Will the hardware resource limits of the Oracle XE license be sufficient to handle one CONTEXT index this big?
(Oracle XE license limitations: 1 GB RAM and 1 CPU)
Regards.

That depends on at least three things:
(1) what is the range of words that will appear in the document set (wide range of documents = smaller resultsets = better performance)
(2) how precise are the user's queries likely to be (more precise = smaller resultsets = better performance)
(3) how many milliseconds are your users willing to wait for results
So, unfortunately, you'll probably have to experiment a bit before you'll know...

Problem with oracle text indexes during import

We have a 9.2.0.6 database using oracle text features on a server with windows 2000 5.00.2195 SP4.
We need to export its data ( user ARIANE only ) and then import the result into another 9.2.0.6 database.
The import never comes to an end.
The only way to make it work is to use the "indexes=n" clause.
Then ( without the indexes ), we tried to create manually the oracle text indexes.
We get this error :
CREATE INDEX ARIANE.DOSTEXTE_DTTEXTE_CTXIDX ON ARIANE.DOSTEXTE (DTTEXTE)
INDEXTYPE IS CTXSYS.CONTEXT PARAMETERS('lexer ariane_lexer stoplist ctxsys.default_stoplist storage ariane_storage');
ORA-29855: error occurred in the execution of ODCIINDEXCREATE routine
ORA-20000: Oracle Text error:
DRG-10700: preference does not exist: ariane_lexer
ORA-06512: at "CTXSYS.DRUE", line 157
ORA-06512: at "CTXSYS.TEXTINDEXMETHODS", line 219
We then tried to uninstall Oracle text and install it ( My Oracle Support [ID 275689.1] ). The index creation above still fails.
We also checked our Text installation and setup through My Oracle Support FAQ ( ID 153264.1 ) and everything seems ok.
Do we have to create the ARIANE* lexer preferences through specific PL/SQL (ctx_report*?) before importing anything from the ARIANE user?
What exactly do we need to do when moving data with Oracle Text features from one database to another, given that until now we restored the database by copying the entire set of Windows files?
Is there a specific order to follow for the import to succeed?
Thank you for your help.
Jean-michel, Nemours, FRANCE

Hi
Index preferences (i.e. ariane_lexer and ariane_storage) are not exported, only the Text index metadata, hence the DRG-10700 raised by the index DDL on the target/import DB.
I recommend using ctx_report.create_index_script on the source/export DB (see Doc ID 189819.1 for details), exporting with indexes=N, and then creating the text indexes manually after the data import.
-Edwin
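
As a minimal sketch of that recommendation, assuming a SQL*Plus session connected as the index owner (ARIANE) on the source database: ctx_report.create_index_script returns a CLOB, so the LONG display size has to be raised to see the whole script. The value 500000 is an arbitrary assumption.

```sql
-- Run on the source/export database as the owner of the index.
set long 500000
set pagesize 0

-- The result should contain the preference definitions and the CREATE INDEX statement.
select ctx_report.create_index_script('DOSTEXTE_DTTEXTE_CTXIDX') from dual;
```

The generated script can then be saved and run on the target database after the data has been imported with indexes=N.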

Performance issue with Oracle Text index

Hi Experts,
We are on Oracle 11.2.0.3 on Solaris 10. I have implemented Oracle Text in our environment and I am facing a strange performance issue.
One sql having CONTAINS clause is taking forever - more than 20 minutes and still does not complete. This sql has a contains clause and an exists clause and a not exists clause.
Now if I remove the exists clause and a not exists clause , it completes fast. but with those two clauses it is just taking forever. It is late night so i am not able to post the table and sql query details and will do so tomorrow but based on this general description, are there any pointers for me to review?
sql query doing fine:
SELECT
    U.CLNT_OID, U.USR_OID, S.MAILADDR
FROM
    access_usr U
    INNER JOIN access_sia S
        ON S.USR_OID = U.USR_OID AND S.CLNT_OID = U.CLNT_OID
    WHERE U.CLNT_OID = 'ABCX32S'
    AND CONTAINS(LAST_NAME , 'TO%' ) >0
--sql query that hangs forever:
SELECT
    U.CLNT_OID, U.USR_OID, S.MAILADDR
FROM
    access_usr U
    INNER JOIN access_sia S
        ON S.USR_OID = U.USR_OID AND S.CLNT_OID = U.CLNT_OID
    WHERE U.CLNT_OID = 'ABCX32S'
    AND CONTAINS(LAST_NAME , 'TO%' ) >0
and exists (--one clause here with a few table joins)
and not exists (--one clause here with a few table joins);
--Now another strange thing I found is if instead of 'TO%' in this sql, if I were to use 'ZZ%' or 'L1%' it works fast but for 'TO%' it goes slow with those two exists not exists clauses!
I will be most thankful for the inputs.
OrauserN

Hi Barbara,
First of all, thanks a lot for reviewing the issue.
Unluckily making the change to empty_stoplist did not work out. I am today copying the entire sql here that has this issue and will be most thankful for more insights/pointers on what can be done.
Here is the entire sql:
SELECT U.CLNT_OID,
       U.USR_OID,
       S.EMAILADDRESS,
       U.FIRST_NAME,
       U.LAST_NAME,
       S.JOBCODE,
       S.LOCATION,
       S.DEPARTMENT,
       S.ASSOCIATEID,
       S.ENTERPRISECOMPANYCODE,
       S.EMPLOYEEID,
       S.PAYGROUP,
       S.PRODUCTLOCALE
FROM    ACCESS_USR U
       INNER JOIN
          ACCESS_SIA S
       ON S.USR_OID = U.USR_OID AND S.CLNT_OID = U.CLNT_OID
WHERE     U.CLNT_OID = 'G39NY3D25942TXDA'
       AND EXISTS
              (SELECT 1
                 FROM ACCESS_USR_GROUP_XREF UGX
                      INNER JOIN ACCESS_GROUP RELG
                         ON     RELG.CLNT_OID = UGX.CLNT_OID
                            AND RELG.GROUP_OID = UGX.GROUP_OID
                      INNER JOIN ACCESS_GROUP G
                         ON     G.CLNT_OID = RELG.CLNT_OID
                            AND G.GROUP_TYPE_OID = RELG.GROUP_TYPE_OID
                WHERE     UGX.CLNT_OID = U.CLNT_OID
                      AND UGX.USR_OID = U.USR_OID
                      AND G.GROUP_OID = 920512943
                      AND UGX.INCLUDED = 1)
       AND NOT EXISTS
                  (SELECT 1
                     FROM    ACCESS_USR_GROUP_XREF UGX
                          INNER JOIN
                             ACCESS_GROUP G
                          ON     G.CLNT_OID = UGX.CLNT_OID
                             AND G.GROUP_OID = UGX.GROUP_OID
                    WHERE     UGX.CLNT_OID = U.CLNT_OID
                          AND UGX.USR_OID = U.USR_OID
                          AND G.GROUP_OID = 920512943
                          AND UGX.INCLUDED = 1)
       AND CONTAINS (U.LAST_NAME, 'Bon%') > 0;
As I said before, if the EXISTS and NOT EXISTS clauses are removed, it runs in under a second. But with those EXISTS and NOT EXISTS clauses it takes anywhere from 25 minutes to more than one hour.
Note also that it was not TO% but Bon% in the CONTAINS clause that causes the issue; sorry, that was a mistake on my part.
Also please see below the Oracle Text indexes defined on the table ACCESS_USR:
--definition of preferences used in the index:
SET SERVEROUTPUT ON size unlimited
WHENEVER SQLERROR EXIT SQL.SQLCODE
DECLARE
   v_err       VARCHAR2 (1000);
   v_sqlcode   NUMBER;
   v_count     NUMBER;
BEGIN
   ctxsys.ctx_ddl.create_preference ('cust_lexer', 'BASIC_LEXER');
   ctxsys.ctx_ddl.set_attribute ('cust_lexer', 'base_letter', 'YES'); -- removes diacritics
EXCEPTION
   WHEN OTHERS
   THEN
      v_err := SQLERRM;
      v_sqlcode := SQLCODE;
      v_count := INSTR (v_err, 'DRG-10701');
      IF v_count > 0
      THEN
         DBMS_OUTPUT.put_line (
            'The required preference named CUST_LEXER with BASIC LEXER is already set up');
      ELSE
         RAISE;
      END IF;
END;
/
DECLARE
   v_err       VARCHAR2 (1000);
   v_sqlcode   NUMBER;
   v_count     NUMBER;
BEGIN
   ctxsys.ctx_ddl.create_preference ('cust_wl', 'BASIC_WORDLIST');
   ctxsys.ctx_ddl.set_attribute ('cust_wl', 'SUBSTRING_INDEX', 'true'); -- to improve performance
EXCEPTION
   WHEN OTHERS
   THEN
      v_err := SQLERRM;
      v_sqlcode := SQLCODE;
      v_count := INSTR (v_err, 'DRG-10701');
      IF v_count > 0
      THEN
         DBMS_OUTPUT.put_line (
            'The required preference named CUST_WL with BASIC WORDLIST is already set up');
      ELSE
         RAISE;
      END IF;
END;
/
--now below is the code of the index:
CREATE INDEX ACCESS_USR_IDX3 ON ACCESS_USR
(FIRST_NAME)
INDEXTYPE IS CTXSYS.CONTEXT
PARAMETERS('LEXER cust_lexer WORDLIST cust_wl SYNC (ON COMMIT)');
CREATE INDEX ACCESS_USR_IDX4 ON ACCESS_USR
(LAST_NAME)
INDEXTYPE IS CTXSYS.CONTEXT
PARAMETERS('LEXER cust_lexer WORDLIST cust_wl SYNC (ON COMMIT)');
The strange thing is that, like I said, If I remove the exists clause the query returns very fast. Also if I modify the query to use only one NOT EXISTS clause and remove the other EXISTS clause it returns in less than one second. Also if I remove the EXISTS clause and use only the NOT EXISTS clause it returns in less than 4 seconds. But with both clauses it runs forever!
When I tried to get dbms_xplan.display_cursor to get the query plan (for the case of both exists and not exists clause in the query), it said that previous statement's sql id was 0 or something like that so that I was not able to see the query plan. I will keep trying to get this plan (it takes 25 minutes to one hour each time but will get this info soon). Again any pointers are most helpful.
Regards
OrauserN

Smart search with oracle SES

Hi,
I'm working on a demo and I'd like to set up smart search. I'm trying to make it work with Oracle Secure Enterprise Search (google onebox isn't free), but I'm having a hard time finding some documentation on-line, and the configuration process is unclear to me.
Has someone already done this? Could you please give me a link, or guide me through Oracle SES Search Mode Configuration ?
Thanks,
Cyril

Hi,
I found a good tutorial: http://st-curriculum.oracle.com/tutorial/SESAdminTutorial/index.htm
Thanks for reading ;) !
Cyril

Offline Search with Oracle Documentation Library?

I have downloaded the Oracle Database Documentation Library 10g for reading at home.
It all works fine, but I cannot use the Search function offline
(see Quick Search or the Search tab: enter a word or phrase and press the Search button).
It has to work offline, because I have no internet connection at home.
Question: is it possible to configure Search to work offline somehow?

Maybe you can use Google Desktop...

Oracle 9.2 ConText index alternate_spelling problem

Hello everybody!
I'm having problems with a ConText index in Oracle 9.2, using the alternate_spelling attribute...
Here is my code
CREATE TABLE U2000P.TEST_FICHIER_INT
(ID NUMBER(6) NOT NULL,
NOM_FICHIER VARCHAR2(90) NULL,
MIME VARCHAR2(90) NULL,
FICHIER BLOB DEFAULT empty_blob(),
LNG VARCHAR2(3) NULL,
KEY_WORDS VARCHAR2(500) NULL,
CONSTRAINT PK_TEST_FICHIER_INT PRIMARY KEY (ID));
EXECUTE CTX_DDL.CREATE_PREFERENCE('ENGLISH_LEXER','BASIC_LEXER');
EXECUTE CTX_DDL.SET_ATTRIBUTE('ENGLISH_LEXER', 'INDEX_THEMES', 'YES');
EXECUTE CTX_DDL.SET_ATTRIBUTE('ENGLISH_LEXER', 'THEME_LANGUAGE', 'ENGLISH');
EXECUTE CTX_DDL.SET_ATTRIBUTE('ENGLISH_LEXER', 'BASE_LETTER', 'NO');
EXECUTE CTX_DDL.CREATE_PREFERENCE('FRENCH_LEXER','BASIC_LEXER');
EXECUTE CTX_DDL.SET_ATTRIBUTE('FRENCH_LEXER', 'INDEX_THEMES', 'NO');
EXECUTE CTX_DDL.SET_ATTRIBUTE('FRENCH_LEXER', 'BASE_LETTER', 'NO');
EXECUTE CTX_DDL.CREATE_PREFERENCE('GERMAN_LEXER','BASIC_LEXER');
EXECUTE CTX_DDL.SET_ATTRIBUTE('GERMAN_LEXER', 'INDEX_THEMES', 'NO');
EXECUTE CTX_DDL.SET_ATTRIBUTE('GERMAN_LEXER', 'BASE_LETTER', 'NO');
EXECUTE CTX_DDL.SET_ATTRIBUTE('GERMAN_LEXER', 'ALTERNATE_SPELLING', 'GERMAN');
EXECUTE CTX_DDL.CREATE_PREFERENCE('GLOBAL_LEXER','MULTI_LEXER');
EXECUTE CTX_DDL.ADD_SUB_LEXER('GLOBAL_LEXER', 'FRENCH', 'FRENCH_LEXER', '1');
EXECUTE CTX_DDL.ADD_SUB_LEXER('GLOBAL_LEXER', 'DEFAULT', 'GERMAN_LEXER');
EXECUTE CTX_DDL.ADD_SUB_LEXER('GLOBAL_LEXER', 'ENGLISH', 'ENGLISH_LEXER', '5');
CREATE INDEX IDX_F_TEST_FICHIER_INT
ON TEST_FICHIER_INT(FICHIER)
INDEXTYPE IS CTXSYS.CONTEXT
PARAMETERS('DATASTORE CTXSYS.DIRECT_DATASTORE
FILTER CTXSYS.INSO_FILTER
LEXER GLOBAL_LEXER LANGUAGE COLUMN LNG');
In one of the files that I load, I have the word 'paläontologie'
Here are my searches
select nom_fichier, score(1) from test_fichier_int where contains(fichier, 'paläontologie', 1) > 0;
-> no rows selected
select nom_fichier, score(1) from test_fichier_int where contains(fichier, 'palaontologie', 1) > 0;
-> Finds my document
Why does the first search not work?
If I don't use the 'alternate_spelling' parameter, both searches don't work, why is that???
Thanks in advance for your help
Best regards
Neil.

I found my error!!!! Thanks Neil... lol
In fact, my SQL*Plus must be badly configured, and I am having problems with accented characters... If I search through a browser, it works!!!
Sorry about that...
Best regards
Neil.

Searching using Oracle Text instead of LIKE '%'

Hello all,
I hope you can help me with this:
I have a table that looks like this:
create table subscribers (
id number(10),
first_name varchar2(30),
father_name varchar2(30),
grandfather_name varchar2(30),
last_name varchar2(30));
The application is built using Oracle Forms. End users are often unsure of the spelling of a name, so they use the "%" wildcard in the name fields. This is reflected in the queries that the application sends to the Oracle server.
We have the following queries
1) select *
from subscribers
where last_name like '%family_name%';
2) select *
from subscribers
where last_name like 'family_name%';
3) select *
from subscribers
where last_name like '%family_name%' and first_name like '%first_name%';
4) select *
from subscribers
where last_name like 'family_name%' and first_name like 'first_name%';
There are also searches on the father_name and grandfather_name fields, but most searches are on first_name and last_name.
These queries are killing the server since we have millions of records. B-tree indexes will not help here because of the LIKE with a leading "%".
I am thinking of using Oracle Text here, but I am not sure whether I have to create a CONTEXT index on each individual column or whether I can use MULTI_COLUMN_DATASTORE indexing.
Any ideas will be appreciated.

The ctxcat index and catsearch operator are generally intended for usage with one text column and one or more columns of structured data. You would have to pick just one of your columns as the text column and the others as structured columns. I would be more inclined to use the multi_column_datastore with a context index and contains operator, so that you can search all of your columns as text columns.
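
A minimal sketch of the multi_column_datastore suggestion, under assumptions: the names subs_ds, subs_sg and subscribers_name_idx are hypothetical, and 'smith' and 'joh' are made-up search values. The field sections are an addition to the suggestion above so that a term can be restricted to one column with WITHIN. Depending on the Oracle version, creating a MULTI_COLUMN_DATASTORE preference may require extra privileges.

```sql
-- Hypothetical names: subs_ds, subs_sg, subscribers_name_idx.
begin
  ctx_ddl.create_preference('subs_ds', 'MULTI_COLUMN_DATASTORE');
  ctx_ddl.set_attribute('subs_ds', 'columns',
    'first_name, father_name, grandfather_name, last_name');

  -- Field sections let a query target a single column with WITHIN
  ctx_ddl.create_section_group('subs_sg', 'BASIC_SECTION_GROUP');
  ctx_ddl.add_field_section('subs_sg', 'first_name', 'first_name', true);
  ctx_ddl.add_field_section('subs_sg', 'last_name',  'last_name',  true);
end;
/

create index subscribers_name_idx on subscribers (last_name)
  indextype is ctxsys.context
  parameters ('datastore subs_ds section group subs_sg');

-- Rough counterpart of query 3 (substring match on both names)
select *
from   subscribers
where  contains(last_name,
         '(%smith% within last_name) and (%joh% within first_name)') > 0;
```

One design caveat: the index is attached to last_name only, so a change to one of the other name columns is not picked up unless last_name is also updated in the same statement.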

Slow performance for context index

Hi, I'm a newbie on this forum and I would like to ask for your expertise on Oracle context indexes. My SQL uses wildcards ('%...%') for searching.
I used the SQL below with a context index (ctxsys.context) in order to avoid a full table scan for the wildcard search.
SELECT BODY_ID
                    TITLE, trim(upper(title)) as title_sort,
                    SUM(JAN) as JAN,
                    SUM(FEB) as FEB,
                    SUM(MAR) as MAR,
                    SUM(APR) as APR,
                    SUM(MAY) as MAY,
                    SUM(JUN) as JUN,
                    SUM(JUL) as JUL,
                    SUM(AUG) as AUG,
                    SUM(SEP) as SEP,
                    SUM(OCT) as OCT,
                    SUM(NOV) as NOV,
                    SUM(DEC) AS DEC
                    FROM APP_REPCBO.CBO_TURNAWAY_REPORT
                    WHERE contains (BODY_ID,'%240103%') >0 and
PERIOD BETWEEN '1201' AND '1212'
                    GROUP BY BODY_ID, trim(upper(title))
But I was surprised that performance was very slow; the explain plan shows an estimated time of more than 2 hours.
plan FOR succeeded.
PLAN_TABLE_OUTPUT
Plan hash value: 814472363
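-------------------------------------------------------------------------------------------------------------
| Id  | Operation                    | Name                | Rows  | Bytes |TempSpc| Cost (%CPU) | Time     |
-------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |                     | 1052K |   97M |       |   805K  (1) | 02:41:12 |
|   1 |  HASH GROUP BY               |                     | 1052K |   97M |  137M |   805K  (1) | 02:41:12 |
|*  2 |   TABLE ACCESS BY INDEX ROWID| CBO_TURNAWAY_REPORT | 1052K |   97M |       |   782K  (1) | 02:36:32 |
|*  3 |    DOMAIN INDEX              | CBO_REPORT_BID_IDX  |       |       |       |   663K  (0) | 02:12:41 |
-------------------------------------------------------------------------------------------------------------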
Predicate Information (identified by operation id):
2 - filter("PERIOD">='1201' AND "PERIOD"<='1212')
3 - access("CTXSYS"."CONTAINS"("BODY_ID",'%240103%')>0)
16 rows selected
oracle version: Oracle Database 11g Release 11.1.0.7.0 - 64bit Production
Thanks,
Zack

Hi Rod,
Thanks for the reply. Yes, I have already gathered stats on that table and rebuilt the index.
But strangely, performance varies when I use another body_id.
SQL> EXPLAIN PLAN FOR
2 SELECT BODY_ID
3 TITLE, trim(upper(title)) as title_sort,
4 SUM(JAN) as JAN,
5 SUM(FEB) as FEB,
6 SUM(MAR) as MAR,
7 SUM(APR) as APR,
8 SUM(MAY) as MAY,
9 SUM(JUN) as JUN,
10 SUM(JUL) as JUL,
11 SUM(AUG) as AUG,
12 SUM(SEP) as SEP,
13 SUM(OCT) as OCT,
14 SUM(NOV) as NOV,
15 SUM(DEC) as DEC
16 FROM WEB_REPCBO.CBO_TURNAWAY_REPORT
17 WHERE contains (BODY_ID,'%119915311%')> 0 and
18 PERIOD BETWEEN '1201' AND '1212'
19 GROUP BY BODY_ID, trim(upper(title));
SELECT * FROM TABLE (dbms_xplan.display);
Explained.
SQL>
Explained.
SQL>
PLAN_TABLE_OUTPUT
Plan hash value: 814472363
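-----------------------------------------------------------------------------------------------------
| Id  | Operation                    | Name                | Rows  | Bytes | Cost (%CPU) | Time     |
-----------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |                     |   990 | 96030 |   1477  (1) | 00:00:18 |
|   1 |  HASH GROUP BY               |                     |   990 | 96030 |   1477  (1) | 00:00:18 |
|*  2 |   TABLE ACCESS BY INDEX ROWID| CBO_TURNAWAY_REPORT |   990 | 96030 |   1475  (0) | 00:00:18 |
|*  3 |    DOMAIN INDEX              | CBO_REPORT_BID_IDX  |       |       |    647  (0) | 00:00:08 |
-----------------------------------------------------------------------------------------------------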
Predicate Information (identified by operation id):
2 - filter("PERIOD">='1201' AND "PERIOD"<='1212')
3 - access("CTXSYS"."CONTAINS"("BODY_ID",'%119915311%')>0)
16 rows selected.

Searching for words with dot in the context index.

Hi ,
We have a context index that uses the following BASIC_LEXER:
ctx_ddl.drop_preference (c_lexer_pref);
ctx_ddl.create_preference(c_lexer_pref, 'basic_lexer');
ctx_ddl.set_attribute(c_lexer_pref, 'base_letter', 'YES');
ctx_ddl.set_attribute(c_lexer_pref, 'whitespace', '()"`/');
ctx_ddl.set_attribute (c_lexer_pref, 'index_stems', 'english');
When a user searches for "OAR", Oracle does not find "O.A.R".
Can someone help me understand why Oracle is not able to find "O.A.R" when OAR is searched for?
Thanks for your help.

You need to use skipjoins to tell it to skip the periods, as shown below.
scott@ORA92> drop table your_table
2 /
Table dropped.
scott@ORA92> exec ctx_ddl.drop_preference ('c_lexer_pref')
PL/SQL procedure successfully completed.
scott@ORA92> begin
2 ctx_ddl.create_preference ('c_lexer_pref', 'basic_lexer');
3 ctx_ddl.set_attribute ('c_lexer_pref', 'base_letter', 'YES');
4 ctx_ddl.set_attribute ('c_lexer_pref', 'whitespace', '()"`/');
5 ctx_ddl.set_attribute ('c_lexer_pref', 'index_stems', 'english');
6 ctx_ddl.set_attribute ('c_lexer_pref', 'skipjoins', '.');
7 end;
8 /
PL/SQL procedure successfully completed.
scott@ORA92> create table your_table
2 (your_column varchar2(30))
3 /
Table created.
scott@ORA92> insert into your_table values ('OAR')
2 /
1 row created.
scott@ORA92> insert into your_table values ('O.A.R')
2 /
1 row created.
scott@ORA92> create index your_index
2 on your_table (your_column)
3 indextype is ctxsys.context
4 parameters ('lexer c_lexer_pref')
5 /
Index created.
scott@ORA92> select * from your_table
2 where contains (your_column, 'OAR') > 0
3 /
YOUR_COLUMN
O.A.R
OAR
scott@ORA92>

Problem with blob column index created using Oracle Text.

Hi,
I'm running Oracle Database 10g 10.2.0.1.0 standard edition one, on windows server 2003 R2 x64.
I have a table with a BLOB column that contains PDF documents.
I then created an index using the following script so that I can do full-text search using Oracle Text.
CREATE INDEX DMCS.T_DMCS_FILE_DF_FILE_IDX ON DMCS.T_DMCS_FILE
(DF_FILE)
INDEXTYPE IS CTXSYS.CONTEXT
PARAMETERS('DATASTORE CTXSYS.DEFAULT_DATASTORE');
However, the index is not searchable, and when I checked the following tables that the database created for my index, I found them to be empty as well!
DR$T_DMCS_FILE_DF_FILE_IDX$I
DR$T_DMCS_FILE_DF_FILE_IDX$K
DR$T_DMCS_FILE_DF_FILE_IDX$N
DR$T_DMCS_FILE_DF_FILE_IDX$R
I wonder what's wrong with it.
My user has been granted the CTXAPP role, and other tables that store plain text and use Oracle Text work fine. I even extracted the BLOB column contents and saved them as PDF files, and they are fine.
However, the database does not seem to be indexing my BLOB column, although the index is created without error.
Please advise.
Really appreciate anyone who can help.
Thank you.

The situation is that I have already loaded a few PDF documents into the table's BLOB column.
After creating the Oracle Text index on this BLOB column, I find that the system-generated index tables listed in my earlier post are empty, except for the 4th table.
Normally these tables would contain the words that Oracle Text indexed from my documents.
As a result, no matter how I query the index using a select statement with the contains operator, I get no results.
I find it odd that the BLOB is not indexed. The BLOB contents are valid: I tested this by exporting them back to PDF, and I can still view and search within the PDF.
Regards,
Jap.
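
A diagnostic sketch, offered as an assumption about where to look rather than as a confirmed cause: when a CONTEXT index is created without error but its token table stays empty, the per-row failures (for example from document filtering) are normally recorded in the CTX_USER_INDEX_ERRORS view, and rows not yet indexed show up in CTX_USER_PENDING. Both queries assume a session connected as the index owner (DMCS).

```sql
-- Errors recorded for individual rows while the index was being built
select err_timestamp, err_textkey, err_text
from   ctx_user_index_errors
where  err_index_name = 'T_DMCS_FILE_DF_FILE_IDX'
order  by err_timestamp desc;

-- Rows that are still waiting to be indexed
select pnd_index_name, pnd_rowid
from   ctx_user_pending
where  pnd_index_name = 'T_DMCS_FILE_DF_FILE_IDX';
```

If the first query returns rows, the err_text column usually points at the step that failed for each document.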

Wildcard search with catsearch on Oracle 10g

The wildcard search problem has been discussed many times; however, the solutions provided did not solve my problem.
I am using catsearch to take advantage of its faster response times. The aim is to simulate like '%abc%' using catsearch in 10g.
The following steps reproduce the problem.
CREATE TABLE test
   (name VARCHAR2(60));
INSERT ALL
INTO test VALUES ('VCL Master')
INTO test VALUES ('VCL Master S.')
INTO test VALUES ('VCL Master S.A. Compartment 1')
INTO test VALUES ('VCL MasterS.A. Compartment 2')
INTO test VALUES ('VCL Master., S.A.')
INTO test VALUES ('KCL Master Corp.')
SELECT * FROM DUAL;
begin
ctx_ddl.create_preference('Jylex', 'basic_lexer');
ctx_ddl.set_attribute('Jylex','SKIPJOINS','.-=[];\,/~!@#$%^&*+{}:"|<>?`§´¨½¼¾¤£€©®''');
end;
/
begin
    Ctx_Ddl.Create_Preference('wildcard_Jylex', 'BASIC_WORDLIST');
    ctx_ddl.set_attribute('wildcard_Jylex', 'wildcard_maxterms', 15000) ;
end;
/
CREATE INDEX test_inx ON test(NAME)
INDEXTYPE IS CTXSYS.CTXCAT
PARAMETERS('STOPLIST CTXSYS.EMPTY_STOPLIST
LEXER     Jylex
WORDLIST wildcard_Jylex');
problem1:
select * from test where catsearch(name, 'CL Mast*', NULL)>0 --- no results returned
problem 2:
when I run the following query on the actual column of my table with 3 million records,
select * from XXXX where catsearch (NAME, 'VCL Master S*', NULL) > 0
I get the following error.
DRG-51030: wildcard query expansion resulted in too many terms
I have tried () and "". I did a lot of research and I am still not able to find a solution that resolves both problems.
Suggestions will be much appreciated.
Thanks.

This post has nothing to do with Oracle Objects. Perhaps some moderator will move it to the Oracle Text sub-forum/space, where it belongs.
Your search for 'CL Mast*' did not find any rows because there aren't any rows that match that criteria. If you want to return rows that have that string, then you need to add a leading wildcard. Technically CTXCAT indexes and CATSEARCH do not support leading wildcards, but you can use two asterisks as a workaround, so you can search for '**CL Mast*'.
Your wildcard_maxterms is set to 15000, so if there are more than 15000 words that begin with 's' in your 3 million records, then it will result in an error. If you upgrade to Oracle 11g, then you can set the wildcard_maxterms higher or to unlimited by setting it to 0, but that may cause your system to run out of memory. Most applications trap these errors and return a simple message to the user, indicating that the search for s* is too broad and to narrow the search.
I would use a CONTEXT index with CONTAINS instead of CTXCAT and CATSEARCH. It supports leading wildcards and you can index substrings for faster searches. If you want it to be like the CTXCAT index then you can make it transactional.
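
The two points above (the double-asterisk workaround and trapping the "too many terms" error) can be sketched together as follows. This is a minimal illustration against the test table from the question, not production code; the message texts are made up. The check uses dbms_utility.format_error_stack because the DRG error is normally nested under other ORA errors in the error stack.

```sql
set serveroutput on
declare
  v_found number;
begin
  -- Double asterisk: the leading-wildcard workaround described above
  select count(*)
  into   v_found
  from   test
  where  catsearch(name, '**CL Mast*', null) > 0;

  dbms_output.put_line(v_found || ' row(s) found');
exception
  when others then
    -- Only handle the wildcard-expansion error; re-raise everything else
    if instr(dbms_utility.format_error_stack, 'DRG-51030') > 0 then
      dbms_output.put_line('The search is too broad; please narrow it.');
    else
      raise;
    end if;
end;
/
```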

Stop words handling with CONTEXT index - weird behavior

I have a context index with the following output from the report (describe index report).
CTX_REPORT.DESCRIBE_INDEX('KWTI10569_20121010115054')
===========================================================================
INDEX DESCRIPTION
===========================================================================
index name: "METCALF_T"."KWTI10569_20121010115054"
index id: 1524
index type: context
base table: "METCALF_T"."KWTD10569_20121010115054"
primary key column:
text column: MESSAGE_CONTENT
text column type: RAW(2000)
language column:
format column: FMT
charset column: CSET
===========================================================================
INDEX OBJECTS
===========================================================================
datastore: DIRECT_DATASTORE
filter: CHARSET_FILTER
charset: UTF8
section group: NULL_SECTION_GROUP
lexer: BASIC_LEXER
punctuations: .?!
skipjoins: _-"'`~!@#$%^&*()+=|}{[]\:;<>?/,
continuation: \-
index_stems: NONE
wordlist: BASIC_WORDLIST
stemmer: ENGLISH
fuzzy_match: GENERIC
stoplist: BASIC_STOPLIST
stop_word: how
stop_word: however
stop_word: i
stop_word: if
<trimmed for brevity of message......but all default stop words provided by Oracle has been added here>
storage: BASIC_STORAGE
i_table_clause: tablespace TEXT_INDEX storage (initial 10M next 10M)
k_table_clause: tablespace TEXT_INDEX storage (initial 10M next 10M)
r_table_clause: tablespace TEXT_INDEX storage (initial 1M) lob (data) store as (cache)
n_table_clause: tablespace TEXT_INDEX storage (initial 1M)
i_index_clause: tablespace TEXT_INDEX storage (initial 1M) compress 2
DB: 10g (10.2.0.4)
DB characterset: UTF8
Distinct tokens from index:
SQL> select distinct token_text from dr$KWTI10569_20121010115054$i;
TOKEN_TEXT
BLAH
EXPIRE
OFFER
My text content:
SQL>
SQL> select distinct utl_raw.cast_to_varchar2(message_content) from KWTD10569_20121010115054;
UTL_RAW.CAST_TO_VARCHAR2(MESSAGE_CONTENT)
blah blah offer will expire blah blah
offer expire
this offer shall expire
offer to expire
offer expire
blah blah offer expire blah blah
blah blah offer to expire blah blah
blah blah offer expire blah blah
offer will expire
blah blah this offer shall expire blah blah
10 rows selected.
Now, when I run some CONTAINS queries I get behavior that I can't understand.
When I search for "this offer will expire" I don't get every row (10 rows). Why is that?
SQL> select UTL_RAW.CAST_TO_VARCHAR2(MESSAGE_CONTENT) from KWTD10569_20121010115054 where contains(message_content,'this offer will expire')>0;
UTL_RAW.CAST_TO_VARCHAR2(MESSAGE_CONTENT)
blah blah offer will expire blah blah
this offer shall expire
blah blah offer to expire blah blah
blah blah this offer shall expire blah blah
Also, when I search for "offer expire" I get the following:
SQL> select UTL_RAW.CAST_TO_VARCHAR2(MESSAGE_CONTENT) from KWTD10569_20121010115054 where contains(message_content,'offer expire')>0;
UTL_RAW.CAST_TO_VARCHAR2(MESSAGE_CONTENT)
offer expire
blah blah offer expire blah blah
blah blah offer expire blah blah
offer expire
I was assuming that stop words are ignored when searching with the CONTEXT grammar, so I would get all my rows back. Isn't that correct?
What I really want to achieve is that all these stop words are stripped from both the content AND the keywords when I run the query, so that I get 100% matches. Any pointers on how that can be accomplished?

Roger-
Thanks again. Is there any place in the Oracle documentation that covers these two facts?
Please see the example below. Does the number of words also matter? My search phrase was "the offer will expire", so why didn't I get rows like "offer to expire" back?
SQL> select distinct utl_raw.cast_to_varchar2(message_content) from KWTD10569_20121010115054;
UTL_RAW.CAST_TO_VARCHAR2(MESSAGE_CONTENT)
offer expire
blah blah offer expire blah blah
blah blah offer will expire blah blah
this offer shall expire
offer expire
offer to expire
blah blah offer to expire blah blah
blah blah offer expire blah blah
offer will expire
blah blah this offer shall expire blah blah
10 rows selected.
SQL> select UTL_RAW.CAST_TO_VARCHAR2(MESSAGE_CONTENT) from KWTD10569_20121010115054 where contains(message_content,'the offer will expire')>0;
UTL_RAW.CAST_TO_VARCHAR2(MESSAGE_CONTENT)
blah blah offer will expire blah blah
this offer shall expire
blah blah offer to expire blah blah
blah blah this offer shall expire blah blah
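
A possible reading of these results, assuming the behaviour described in the Oracle Text documentation where a stopword inside a query phrase matches any single word in that position: 'the offer will expire' is then evaluated as a four-word pattern (any word, offer, any word, expire). That would be consistent with the output above, where every returned row has a word before "offer" and exactly one word between "offer" and "expire". If the goal is every row that contains both keywords regardless of position, one option is to drop the stopwords from the query string in the application and combine the remaining terms with AND:

```sql
-- Matches rows containing both tokens, in any position
select utl_raw.cast_to_varchar2(message_content)
from   KWTD10569_20121010115054
where  contains(message_content, 'offer AND expire') > 0;
```

The trade-off is that AND no longer requires the words to be close together; whether that is acceptable depends on how strict the "100% match" requirement is meant to be.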

Oracle Text - CTX Context Index Soundex Problem

Hi,
I'm running into a problem with Oracle Text when searching using the ! (soundex) option. I've created a simple test example to highlight the issue.
Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - 64bit
Windows 2008 Server 64-bit
create table test_tab (test_col varchar2(200));
insert all
into test_tab (test_col) values ('ab-tönes')
into test_tab (test_col) values ('ab-tones')
into test_tab (test_col) values ('abtones')
into test_tab (test_col) values ('ab tones')
into test_tab (test_col) values ('ab-tanes')
select * from dual;
select * from test_tab;
begin
      ctx_ddl.create_preference ('test_lex1', 'basic_lexer');
      ctx_ddl.set_attribute ('test_lex1', 'whitespace', '/\|-_+&''');
      ctx_ddl.set_attribute('test_lex1','base_letter','YES');
      -- ctx_ddl.set_attribute('test_lex1','skipjoins','-');
end;
/
create index test_idx on test_tab (test_col)
indextype is ctxsys.context
    parameters
      ('lexer        test_lex1');
select token_text from dr$test_idx$i;
TOKEN_TEXT
AB
ABTONES
TANES
TONES
select * from test_tab where contains (test_col, '!ab tones') > 0;
TEST_COL
ab-tönes
ab-tones
ab tones
select * from test_tab where soundex(test_col) = soundex('ab tones');
TEST_COL
ab-tönes
ab-tones
abtones
ab tones
ab-tanes
So my question is: can anyone suggest an approach that makes the Oracle Text CONTEXT index (or a CTXCAT index, if that is more appropriate) return all 5 rows, as the plain SOUNDEX() query does?
I can't really use SOUNDEX() because this search query will be part of a search screen for a multi-language application. SOUNDEX() is limited to English-sounding words, so I need a solution that can compare strings that may not "sound" English.
It must be an attribute of the BASIC_LEXER. I've tried skipjoins, start/end joins, and stoplists, but I just cannot get the soundex feature of Oracle Text to behave like the SOUNDEX() function!
Looking at how the tokens are stored in dr$test_idx$i, I need Oracle Text to more or less concatenate 'AB' and 'TONES' and search them as a single string.
Any help greatly appreciated.
Thanks,
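To make the tokenization issue concrete, here is a rough Python model of what the lexer settings in the post appear to do: split on the configured separator characters, strip diacritics (base_letter), and uppercase. It is only a sketch, not Oracle Text itself; the function names are made up for illustration, and the real BASIC_LEXER has more rules than this.

```python
import unicodedata

# Characters the post registers as BASIC_LEXER "whitespace" (token separators).
SEPARATORS = "/\\|-_+&'"


def base_letter(text: str) -> str:
    """Strip diacritics, roughly what base_letter = YES does (ö -> o)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def tokenize(text: str) -> list[str]:
    """Split on blanks and the configured separators, then uppercase."""
    for sep in SEPARATORS:
        text = text.replace(sep, " ")
    return [base_letter(tok).upper() for tok in text.split()]


rows = ["ab-tönes", "ab-tones", "abtones", "ab tones", "ab-tanes"]

# Distinct tokens across all rows, comparable to the dr$test_idx$i listing.
tokens = sorted({tok for row in rows for tok in tokenize(row)})
print(tokens)
```

Under these assumptions 'ab-tones' becomes two tokens (AB, TONES) while 'abtones' stays one token (ABTONES), which is why a per-token soundex match cannot treat them as the same word.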

I am not getting the same problem that you are getting with the umlaut, but I don't see what is different. Please post the result of:
select ctx_report.create_index_script ('test_idx') from dual;
Here are the results on my system. Perhaps you can spot the difference. I added an empty_stoplist, so that it won't print out a long list of stopwords.
SCOTT@orcl12c> create table test_tab (test_col    varchar2(200))
2 /
Table created.
SCOTT@orcl12c> insert all
2    into test_tab (test_col) values ('ab-tönes')
3    into test_tab (test_col) values ('ab-tones')
4    into test_tab (test_col) values ('abtones')
5    into test_tab (test_col) values ('ab tones')
6    into test_tab (test_col) values ('ab-tanes')
7 select * from dual
8 /
5 rows created.
SCOTT@orcl12c> select * from test_tab
2 /
TEST_COL
ab-tönes
ab-tones
abtones
ab tones
ab-tanes
5 rows selected.
SCOTT@orcl12c> begin
2    ctx_ddl.create_preference ('test_lex1', 'basic_lexer');
3    ctx_ddl.set_attribute('test_lex1','base_letter','YES');
4 end;
5 /
PL/SQL procedure successfully completed.
SCOTT@orcl12c> create or replace procedure test_proc
2    (p_rowid in          rowid,
3      p_clob    in out nocopy clob)
4 as
5 begin
6    select replace (translate (test_col, '/\|-_+&''', '      '), ' ', '')
7    into   p_clob
8    from   test_tab
9    where rowid = p_rowid;
10 end test_proc;
11 /
Procedure created.
SCOTT@orcl12c> show errors
No errors.
SCOTT@orcl12c> begin
2    ctx_ddl.create_preference ('test_ds', 'user_datastore');
3    ctx_ddl.set_attribute ('test_ds', 'procedure', 'test_proc');
4 end;
5 /
PL/SQL procedure successfully completed.
SCOTT@orcl12c> create index test_idx on test_tab (test_col)
2    indextype is ctxsys.context
3    parameters
4       ('lexer    test_lex1
5         datastore    test_ds
6         stoplist    ctxsys.empty_stoplist')
7 /
Index created.
SCOTT@orcl12c> select token_text from dr$test_idx$i
2 /
TOKEN_TEXT
ABTANES
ABTONES
2 rows selected.
SCOTT@orcl12c> variable search_string varchar2(100)
SCOTT@orcl12c> exec :search_string := 'ab tones'
PL/SQL procedure successfully completed.
SCOTT@orcl12c> select * from test_tab
2 where contains
3            (test_col,
4             '!' || replace (:search_string, ' ', ' !') ||
5             ' or !' || replace (:search_string, ' ', '')) > 0
6 /
TEST_COL
ab-tönes
ab-tones
abtones
ab tones
ab-tanes
5 rows selected.
SCOTT@orcl12c> exec :search_string := 'abtones'
PL/SQL procedure successfully completed.
SCOTT@orcl12c> /
TEST_COL
ab-tönes
ab-tones
abtones
ab tones
ab-tanes
5 rows selected.
SCOTT@orcl12c> exec :search_string := 'ab tönes'
PL/SQL procedure successfully completed.
SCOTT@orcl12c> /
TEST_COL
ab-tönes
ab-tones
abtones
ab tones
ab-tanes
5 rows selected.
SCOTT@orcl12c> select ctx_report.create_index_script ('test_idx') from dual
2 /
CTX_REPORT.CREATE_INDEX_SCRIPT('TEST_IDX')
begin
ctx_ddl.create_preference('"TEST_IDX_DST"','USER_DATASTORE');
ctx_ddl.set_attribute('"TEST_IDX_DST"','PROCEDURE','"SCOTT"."TEST_PROC"');
end;
begin
ctx_ddl.create_preference('"TEST_IDX_FIL"','NULL_FILTER');
end;
begin
ctx_ddl.create_section_group('"TEST_IDX_SGP"','NULL_SECTION_GROUP');
end;
begin
ctx_ddl.create_preference('"TEST_IDX_LEX"','BASIC_LEXER');
ctx_ddl.set_attribute('"TEST_IDX_LEX"','BASE_LETTER','YES');
end;
begin
ctx_ddl.create_preference('"TEST_IDX_WDL"','BASIC_WORDLIST');
ctx_ddl.set_attribute('"TEST_IDX_WDL"','STEMMER','ENGLISH');
ctx_ddl.set_attribute('"TEST_IDX_WDL"','FUZZY_MATCH','GENERIC');
end;
begin
ctx_ddl.create_stoplist('"TEST_IDX_SPL"','BASIC_STOPLIST');
end;
begin
ctx_ddl.create_preference('"TEST_IDX_STO"','BASIC_STORAGE');
ctx_ddl.set_attribute('"TEST_IDX_STO"','R_TABLE_CLAUSE','lob (data) store as (cache)');
ctx_ddl.set_attribute('"TEST_IDX_STO"','I_INDEX_CLAUSE','compress 2');
end;
begin
ctx_output.start_log('TEST_IDX_LOG');
end;
create index "SCOTT"."TEST_IDX"
on "SCOTT"."TEST_TAB"
      ("TEST_COL")
indextype is ctxsys.context
parameters('
    datastore       "TEST_IDX_DST"
    filter          "TEST_IDX_FIL"
    section group   "TEST_IDX_SGP"
    lexer           "TEST_IDX_LEX"
    wordlist        "TEST_IDX_WDL"
    stoplist        "TEST_IDX_SPL"
    storage         "TEST_IDX_STO"
')
begin
ctx_output.end_log;
end;
1 row selected.
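For anyone who wants to follow the two moving parts of the session above without a database, here is a small Python sketch. It mimics (a) what test_proc hands to the indexer and (b) the search string that the CONTAINS call builds from :search_string. The helper names are hypothetical, and the token step only approximates the lexer (base letters, uppercase), so treat it as an illustration rather than a reproduction of Oracle Text.

```python
import unicodedata


def oracle_translate(text, from_chars, to_chars):
    """Mimic Oracle TRANSLATE: map by position, drop chars with no counterpart."""
    table = {}
    for i, ch in enumerate(from_chars):
        table.setdefault(ord(ch), to_chars[i] if i < len(to_chars) else None)
    return text.translate(table)


def normalize(value):
    """What test_proc feeds the indexer: separators and blanks removed."""
    # Six blanks as the target string, as printed in the post.
    spaced = oracle_translate(value, "/\\|-_+&'", " " * 6)
    return spaced.replace(" ", "")


def index_token(value):
    """Approximate the stored token: base letters, uppercased."""
    decomposed = unicodedata.normalize("NFD", normalize(value))
    return "".join(c for c in decomposed if not unicodedata.combining(c)).upper()


def build_query(search_string):
    """Same string the CONTAINS call assembles from :search_string."""
    return ("!" + search_string.replace(" ", " !")
            + " or !" + search_string.replace(" ", ""))


rows = ["ab-tönes", "ab-tones", "abtones", "ab tones", "ab-tanes"]
print(sorted({index_token(r) for r in rows}))
print(build_query("ab tones"))
print(build_query("abtones"))
```

The point of the design is that every stored value collapses to a single token, and the query searches both the separate words and their concatenation with soundex, so either spelling of the search string can reach the collapsed tokens.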

Help with creating oracle text index on 2 columns with partial html data

Hi,
I need to create an Oracle Text index on 2 columns.
TITLE - varchar(255) = contains plain text data
DESCRIPTION - CLOB = contains partial HTML data
This is what I created.
begin
ctx_ddl.create_preference ('Title_Description_Pref', 'MULTI_COLUMN_DATASTORE');
ctx_ddl.set_attribute('Title_Description_Pref', 'columns', 'TITLE, DESCRIPTION');
end;
/
begin
ctx_ddl.create_preference ('bid_lexer', 'BASIC_LEXER');
ctx_ddl.set_attribute('bid_lexer', 'index_stems', 'ENGLISH');
ctx_ddl.create_section_group('htmgroup', 'HTML_SECTION_GROUP');
end;
/
create index Bid_Title_Index on Bid(title) indextype is ctxsys.context parameters ('LEXER bid_lexer sync (every "sysdate+(1/24)")');
create index Bid_Title_Desc_Index on Bid(description) indextype is ctxsys.context parameters ('LEXER bid_lexer DATASTORE Title_Description_Pref sync (every "sysdate+(1/24)") filter ctxsys.null_filter section group htmgroup');
The problem shows up when I run CONTAINS(description, '$(auction)') > 0. I get rows whose description contains the word "auction", which is correct. But the results also include rows where the search word is inside an IMG tag, e.g. <img src="http://auction.de/120483" alt="Auction Logo"/>.
What I would like is to exclude rows where the search word is inside HTML tag attributes. The expected results are rows containing <a>Auction</a> or <p>For Auction</p>, etc. Basically, strip the HTML tags and keep only the text content.
I'd appreciate some input.
Thanks,
Amiel
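To pin down the expected outcome described above, here is a minimal Python sketch that keeps only the text between tags and ignores tag attributes. It is not Oracle Text and says nothing about how the section group or filter should be configured; the class and function names are invented for the example, and it only shows which rows should and should not match.

```python
from html.parser import HTMLParser


class TextOnly(HTMLParser):
    """Collect character data and ignore tags and their attributes."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def visible_text(fragment):
    """Return the text content of an HTML fragment, whitespace-normalized."""
    parser = TextOnly()
    parser.feed(fragment)
    parser.close()
    return " ".join(" ".join(parser.parts).split())


def mentions(fragment, word):
    """True if the word occurs in the visible text (case-insensitive)."""
    return word.lower() in visible_text(fragment).lower().split()


img_only = '<img src="http://auction.de/120483" alt="Auction Logo"/>'
paragraph = "<p>For Auction</p>"

print(mentions(img_only, "auction"))
print(mentions(paragraph, "auction"))
```

The IMG-only fragment should not count as a hit, while the paragraph should, which is the behavior wanted from the index.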
