Beginning Oracle Text...

Could someone perhaps point to a good online source of basic information about how to USE Oracle Text in searches?
I'm specifically looking for information about how to do searches like {woman NOT man}, or whether "woman" will select "women" or whether "$woman" will select "man" and so on. What switches are there to control what is searched for? What booleans are allowed and how must they be presented, and so on.
I'm doing OK with the official Oracle documentation, but something snappier and more condensed would be good to find!
Any good book recommendations would be appreciated, too. (Especially since doing a search at Amazon for "oracle text" brings up a lot of textbooks about Oracle, but not many obviously about the specific database feature!)
In the meantime, could someone answer a simple question I've not been able to find a simple answer to so far? Can Oracle Text do 'NOT' searches (as in 'man not boy')?

Most of what you are asking about is covered in the section of the Text Reference on Contains Query Operators:
http://download.oracle.com/docs/cd/B28359_01/text.111/b28304/cqoper.htm#CCREF0300
Here are some examples regarding the specific questions you asked:
SCOTT@orcl_11g> CREATE TABLE test_tab (test_col VARCHAR2 (60))
2 /
Table created.
SCOTT@orcl_11g> INSERT ALL
2 INTO test_tab (test_col) VALUES ('woman')
3 INTO test_tab (test_col) VALUES ('man woman')
4 INTO test_tab (test_col) VALUES ('women')
5 INTO test_tab (test_col) VALUES ('men women')
6 INTO test_tab (test_col) VALUES ('man boy')
7 INTO test_tab (test_col) VALUES ('man')
8 SELECT * FROM DUAL
9 /
6 rows created.
SCOTT@orcl_11g> CREATE INDEX test_idx ON test_tab (test_col) INDEXTYPE IS CTXSYS.CONTEXT
2 /
Index created.
SCOTT@orcl_11g> SELECT * FROM test_tab WHERE CONTAINS (test_col, 'woman NOT man') > 0
2 /
TEST_COL
woman
SCOTT@orcl_11g> SELECT * FROM test_tab WHERE CONTAINS (test_col, 'woman') > 0
2 /
TEST_COL
woman
man woman
SCOTT@orcl_11g> SELECT * FROM test_tab WHERE CONTAINS (test_col, '$woman') > 0
2 /
TEST_COL
woman
man woman
women
men women
SCOTT@orcl_11g> SELECT * FROM test_tab WHERE CONTAINS (test_col, 'man NOT boy') > 0
2 /
TEST_COL
man woman
man
SCOTT@orcl_11g>
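As a rough sketch of some of the other operators described in that section of the reference, the queries below run against the same test_tab table and test_idx index. They are illustrative only; no output is shown, because results depend on the lexer, stoplist and wordlist settings of the index.

```sql
-- Sketch only: assumes the test_tab table and test_idx index created above.

-- AND: rows that contain both words
SELECT * FROM test_tab WHERE CONTAINS (test_col, 'woman AND man') > 0;

-- OR: rows that contain either word
SELECT * FROM test_tab WHERE CONTAINS (test_col, 'woman OR boy') > 0;

-- NOT can also be written with the ~ symbol
SELECT * FROM test_tab WHERE CONTAINS (test_col, 'woman ~ man') > 0;

-- Wildcards: % matches any number of characters, _ matches exactly one
SELECT * FROM test_tab WHERE CONTAINS (test_col, 'wom%') > 0;

-- Fuzzy (?): words spelled similarly to the search term
SELECT * FROM test_tab WHERE CONTAINS (test_col, '?womn') > 0;

-- NEAR: both words within a span of 5 words; SCORE(1) reflects proximity
SELECT SCORE (1), test_col
FROM   test_tab
WHERE  CONTAINS (test_col, 'NEAR ((man, woman), 5)', 1) > 0;
```

Wrapping a term in braces, as in '{woman}', escapes it so that reserved words and symbols inside it are treated as plain text.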

Similar Messages

Searching from beginning of a line/string with Oracle Text ...

Oracle Database 10.2.0.3, Solaris
Hi,
What sounds very easy with the LIKE operator seems to be impossible with the Oracle Text CONTAINS operator.
Searching for 'Deutsche%' with LIKE returns:
'Deutsche Bank'
'Deutsche Post'
'Deutsche Oracle Community'
But with the CONTAINS operator, which is token-based, the result for '$Deutsche%' is:
'Deutsche Bank'
'Armin Deutscher'
We want to get only results starting with 'Deutscher...', but I did not find a way to do that with CONTAINS and some configuration. Combining LIKE with CONTAINS did not help either, because CONTAINS expands the word into more instances than LIKE.
The indexed columns are of type VARCHAR2.
Any idea?
kind regards
Karl

Have you evaluated the query rewrite template with CONTAINS?
http://download.oracle.com/docs/cd/B19306_01/text.102/b14218/csql.htm#sthref122
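Independently of the rewrite template, one pattern sometimes used for an anchored search is to let CONTAINS find the token and let LIKE enforce the position. The sketch below assumes a hypothetical table companies with a CONTEXT-indexed VARCHAR2 column name; neither name comes from the original post. The key assumption is that the CONTAINS term is the plain token, without the stem ($) or wildcard (%) operators, so that it is not expanded to 'Deutscher'.

```sql
-- Sketch only: companies and name are hypothetical names.
-- CONTAINS narrows the candidates through the text index,
-- LIKE then keeps only the rows that really start with the word.
SELECT name
FROM   companies
WHERE  CONTAINS (name, '{Deutsche}') > 0
AND    name LIKE 'Deutsche%';
```

Note that LIKE is case-sensitive, while a CONTEXT index built with the default lexer settings is not, so the two predicates can disagree on rows such as 'deutsche bank'.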

Error while running the Oracle Text optimize index procedure (even as a dba user too)

Hi Experts,
I am on Oracle 11.2.0.2 on Linux. I have implemented Oracle Text. My Oracle Text indexes are fragmented, but I am getting an error while running optimize_index. The error is as follows:
begin
ctx_ddl.optimize_index(idx_name=>'ACCESS_T1',optlevel=>'FULL');
end;
ERROR at line 1:
ORA-20000: Oracle Text error:
ORA-06512: at "CTXSYS.DRUE", line 160
ORA-06512: at "CTXSYS.CTX_DDL", line 941
ORA-06512: at line 1
I then tried running this as a DBA user too, and it failed the same way:
begin
ctx_ddl.optimize_index(idx_name=>'BVSCH1.ACCESS_T1',optlevel=>'FULL');
end;
ERROR at line 1:
ORA-20000: Oracle Text error:
ORA-06512: at "CTXSYS.DRUE", line 160
ORA-06512: at "CTXSYS.CTX_DDL", line 941
ORA-06512: at line 1
The CTXAPP role is granted to my schema and I am still getting this error. I would be thankful for any suggestions.
One other important observation: we have this issue in ONLY one database; in the other two databases I don't see any problem at all.
I am unable to figure out what the issue is with this one database!
Thanks,
OrauserN

How about checking the following?
Bug 10626728 - CTX_DDL.optimize_index "full" fails with an empty ORA-20000 since 11.2.0.2 upgrade (DOCID 10626728.8)

Pre-loading Oracle text in memory with Oracle 12c

There is a white paper from Roger Ford that explains how to load the Oracle index in memory : http://www.oracle.com/technetwork/database/enterprise-edition/mem-load-082296.html
In our application, on Oracle 12c, we are indexing a big XML field (stored as XMLType with SecureFile storage) with the PATH_SECTION_GROUP. If I don't load the I table (DR$..$I) into memory using the technique explained in the white paper, then I cannot get decent performance (and especially not predictable performance; it looks as if performance can fall sharply when the blocks from the TOKEN_INFO column are not in memory).
But after migrating to oracle 12c, I got a different problem, which I can reproduce: when I create the index it is relatively small (as seen with ctx_report.index_size) and by applying the technique from the whitepaper, I can pin the DR$ I table into memory. But as soon as I do a ctx_ddl.optimize_index('Index','REBUILD') the size becomes much bigger and I can't pin the index in memory. Not sure if it is bug or not.
What I found as work-around is to build the index with the following storage options:
ctx_ddl.create_preference('TEST_STO','BASIC_STORAGE');
ctx_ddl.set_attribute ('TEST_STO', 'BIG_IO', 'YES' );
ctx_ddl.set_attribute ('TEST_STO', 'SEPARATE_OFFSETS', 'NO' );
so that the token_info column will be stored in a secure file. Then I can change the storage of that column to put it in the keep buffer cache, and write a procedure to read the LOB so that it will be loaded in the keep cache. The size of the LOB column is more or less the same as when creating the index without the BIG_IO option, but it remains constant even after a ctx_ddl.optimize_index. The procedure to read the LOB and to load it into the cache is very similar to the loaddollarR procedure from the white paper.
Because of the SDATA section, there is a new DR table (S table) and an IOT on top of it. This is not documented in the white paper (the white paper was written for Oracle 10g). In my case this DR$ S table is heavily used, and the IOT also, but putting it in the keep cache is not as important as the token_info column of the DR I table. A final note: setting SEPARATE_OFFSETS = 'YES' was very bad in my case; the combined size of the two columns is much bigger than having only the TOKEN_INFO column, and both columns are read.
Here is an example of how to reproduce the problem with the size increasing when running ctx_ddl.optimize_index:
1. create the table
drop table test;
CREATE TABLE test
(ID NUMBER(9,0) NOT NULL ENABLE,
XML_DATA XMLTYPE
)
XMLTYPE COLUMN XML_DATA STORE AS SECUREFILE BINARY XML (tablespace users disable storage in row);
2. insert a few records
insert into test values(1,'<Book><TITLE>Tale of Two Cities</TITLE>It was the best of times.<Author NAME="Charles Dickens"> Born in England in the town, Stratford_Upon_Avon </Author></Book>');
insert into test values(2,'<BOOK><TITLE>The House of Mirth</TITLE>Written in 1905<Author NAME="Edith Wharton"> Wharton was born to George Frederic Jones and Lucretia Stevens Rhinelander in New York City.</Author></BOOK>');
insert into test values(3,'<BOOK><TITLE>Age of innocence</TITLE>She got a prize for it.<Author NAME="Edith Wharton"> Wharton was born to George Frederic Jones and Lucretia Stevens Rhinelander in New York City.</Author></BOOK>');
3. create the text index
drop index i_test;
exec ctx_ddl.create_section_group('TEST_SGP','PATH_SECTION_GROUP');
begin
CTX_DDL.ADD_SDATA_SECTION(group_name => 'TEST_SGP',
                            section_name => 'SData_02',
                            tag => 'SData_02',
                            datatype => 'varchar2');
end;
exec ctx_ddl.create_preference('TEST_STO','BASIC_STORAGE');
exec ctx_ddl.set_attribute('TEST_STO','I_TABLE_CLAUSE','tablespace USERS storage (initial 64K)');
exec ctx_ddl.set_attribute('TEST_STO','I_INDEX_CLAUSE','tablespace USERS storage (initial 64K) compress 2');
exec ctx_ddl.set_attribute ('TEST_STO', 'BIG_IO', 'NO' );
exec ctx_ddl.set_attribute ('TEST_STO', 'SEPARATE_OFFSETS', 'NO' );
create index I_TEST
on TEST (XML_DATA)
indextype is ctxsys.context
parameters('
    section group   "TEST_SGP"
    storage         "TEST_STO"
') parallel 2;
4. check the index size
select ctx_report.index_size('I_TEST') from dual;
it says :
TOTALS FOR INDEX TEST.I_TEST
TOTAL BLOCKS ALLOCATED:                                                104
TOTAL BLOCKS USED:                                                      72
TOTAL BYTES ALLOCATED:                                 851,968 (832.00 KB)
TOTAL BYTES USED:                                      589,824 (576.00 KB)
5. optimize the index
exec ctx_ddl.optimize_index('I_TEST','REBUILD');
and now recompute the size; it says:
TOTALS FOR INDEX TEST.I_TEST
TOTAL BLOCKS ALLOCATED:                                               1112
TOTAL BLOCKS USED:                                                    1080
TOTAL BYTES ALLOCATED:                                 9,109,504 (8.69 MB)
TOTAL BYTES USED:                                      8,847,360 (8.44 MB)
which shows that it went from 576KB to 8.44MB. With a big index the difference is not so big, but still from 14G to 19G.
6. Workaround: use the BIG_IO option, so that the token_info column of the DR$ I table will be stored in a secure file and the size will stay relatively small. Then you can load this column in the cache using a procedure similar to
alter table DR$I_TEST$I storage (buffer_pool keep);
alter table dr$i_test$i modify lob(token_info) (cache storage (buffer_pool keep));
rem: now we must read the lob so that it will be loaded in the keep buffer pool, use the procedure below
create or replace procedure loadTokenInfo is
type c_type is ref cursor;
c2 c_type;
s varchar2(2000);
b blob;
buff varchar2(100);
siz number;
off number;
cntr number;
begin
    s := 'select token_info from DR$i_test$I';
    open c2 for s;
    loop
       fetch c2 into b;
       exit when c2%notfound;
       siz := 10;
       off := 1;
       cntr := 0;
       if dbms_lob.getlength(b) > 0 then
         begin
           loop
             dbms_lob.read(b, siz, off, buff);
             cntr := cntr + 1;
             off := off + 4096;
           end loop;
         exception when no_data_found then
           if cntr > 0 then
             dbms_output.put_line('4K chunks fetched: '||cntr);
           end if;
         end;
       end if;
    end loop;
end;
Rgds, Pierre

I have been working a lot on that issue recently, I can give some more info.
First I totally agree with you, I don't like to use the keep_pool and I would love to avoid it. On the other hand, we have a specific use case : 90% of the activity in the DB is done by queuing and dbms_scheduler jobs where response time does not matter. All those processes are probably filling the buffer cache. We have a customer facing application that uses the text index to search the database : performance is critical for them.
What kind of performance do you have with your application ?
In my case, I have learned the hard way that having the index in memory (the DR$I table in fact) is the key: if it is not, then performance is poor. I find it reasonable to pin the DR$I table in memory, and if you look at competitors this is what they do. MongoDB explicitly says that the index must be in memory. Elasticsearch uses JVMs that are also in memory. And effectively, if you look at the AWR report, you will see that Oracle is continuously accessing the DR$I table; there is a SQL statement similar to
SELECT /*+ DYNAMIC_SAMPLING(0) INDEX(i) */
TOKEN_FIRST, TOKEN_LAST, TOKEN_COUNT, ROWID
FROM DR$idxname$I
WHERE TOKEN_TEXT = :word AND TOKEN_TYPE = :wtype
ORDER BY TOKEN_TEXT, TOKEN_TYPE, TOKEN_FIRST
which is continuously done.
I think that the algorithm used by Oracle to keep blocks in cache is too complex. I just realized that in 12.1.0.2 (released last week) there is finally a "killer" functionality, the in-memory parameters, with which you can pin tables or columns in memory with compression, etc. This looks ideal for the text index; I hope that R. Ford will finally update his white paper :-)
But my other problem was that optimize_index in REBUILD mode caused the DR$I table to double in size. It seems crazy that this was closed as not a bug, but it was, and I can't do anything about it. It is a bug in my opinion, because the create index command and the "alter index rebuild" command both result in a much smaller index, so why would the people who developed the optimize function (is it another team, using another algorithm?) make the index two times bigger?
The track I have been following for that is to put the index in a 16K tablespace: in this case the space used by the index remains more or less flat (it increases, but much more reasonably). The difficulty here is to pin the index in memory, because the trick of R. Ford was not working anymore.
What worked:
First set the keep pool to zero and set db_16k_cache_size instead. Then change the storage preference to make sure that everything you want to cache (mostly the DR$I table) goes into the tablespace with the non-standard block size of 16K.
Then comes the tricky part: the pre-loading of the data into the buffer cache. The problem is that with Oracle 12c, Oracle will use direct path reads for full table scans, which basically means that it bypasses the cache and reads directly from file into the PGA! There is an event to avoid that; I was lucky to find it on a blog (I can't remember which, sorry for the missing credit).
I ended up doing the following. Event 10949 is there to avoid the direct path reads issue.
alter session set events '10949 trace name context forever, level 1';
alter table DR#idxname0001$I cache;
alter table DR#idxname0002$I cache;
alter table DR#idxname0003$I cache;
SELECT /*+ FULL(ITAB) CACHE(ITAB) */ SUM(TOKEN_COUNT), SUM(LENGTH(TOKEN_INFO)) FROM DR#idxname0001$I ITAB;
SELECT /*+ FULL(ITAB) CACHE(ITAB) */ SUM(TOKEN_COUNT), SUM(LENGTH(TOKEN_INFO)) FROM DR#idxname0002$I ITAB;
SELECT /*+ FULL(ITAB) CACHE(ITAB) */ SUM(TOKEN_COUNT), SUM(LENGTH(TOKEN_INFO)) FROM DR#idxname0003$I ITAB;
SELECT /*+ INDEX(ITAB) CACHE(ITAB) */ SUM(LENGTH(TOKEN_TEXT)) FROM DR#idxname0001$I ITAB;
SELECT /*+ INDEX(ITAB) CACHE(ITAB) */ SUM(LENGTH(TOKEN_TEXT)) FROM DR#idxname0002$I ITAB;
SELECT /*+ INDEX(ITAB) CACHE(ITAB) */ SUM(LENGTH(TOKEN_TEXT)) FROM DR#idxname0003$I ITAB;
It worked. With great relief I expected to take some time out, but there was a last surprise. The command
exec ctx_ddl.optimize_index(idx_name=>'idxname',part_name=>'partname',optlevel=>'REBUILD');
gave the following:
ERROR at line 1:
ORA-20000: Oracle Text error:
DRG-50857: oracle error in drftoptrebxch
ORA-14097: column type or size mismatch in ALTER TABLE EXCHANGE PARTITION
ORA-06512: at "CTXSYS.DRUE", line 160
ORA-06512: at "CTXSYS.CTX_DDL", line 1141
ORA-06512: at line 1
This is described almost exactly in MetaLink note 1645634.1, but for a non-partitioned index. The work-around given there seemed very logical, but it did not work for a partitioned index. After experimenting, I found out that the bug occurs when the partitioned index is created with the dbms_pclxutil.build_part_index procedure (which enables intra-partition parallelism in the index creation process). This is a very annoying and stupid bug; maybe there is a work-around, but I did not find one on MetaLink.
Other points of attention with text index creation (stuff that surprised me at first!):
- If you use the dbms_pclxutil package, then the ctx_output logging does not work, because the index is created immediately and then populated in the background via dbms_jobs.
- This, combined with the fact that on a RAC you may not see any activity on the box, can be very frightening: Oracle can choose to start the workers on the other node.
I now understand much better how text indexing works, and I think it is a great technology which can scale via partitioning. But as always the design of the application is crucial; most of our problems come from the fact that we did not choose the right sectioning (we chose PATH_SECTION_GROUP, while XML_SECTION_GROUP is so much better IMO). Maybe later I can convince the developers to change the sectioning, especially because SDATA and MDATA sections are not supported with PATH_SECTION_GROUP (although it seems to work, we had one occurrence of a bad result linked to the existence of SDATA in the index definition). Also, the whole problem of mixed structured/unstructured searches is completely tackled if one uses XML_SECTION_GROUP with MDATA/SDATA (but of course the app was written for Oracle 10...)
Regards, Pierre
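To check whether a pre-load like the one described above actually left the blocks in the buffer cache, a query along these lines could be used. This is only a sketch: it assumes SELECT access to V$BH and DBA_OBJECTS, and 'DR#IDXNAME%' is a placeholder pattern standing in for the real index table names.

```sql
-- Sketch only: count buffer-cache blocks per segment of the text index.
-- Replace the placeholder pattern 'DR#IDXNAME%' with the real table prefix.
SELECT o.owner,
       o.object_name,
       o.object_type,
       COUNT(*) AS cached_blocks
FROM   v$bh b
JOIN   dba_objects o ON o.data_object_id = b.objd
WHERE  o.object_name LIKE 'DR#IDXNAME%'
AND    b.status <> 'free'
GROUP  BY o.owner, o.object_name, o.object_type
ORDER  BY cached_blocks DESC;
```

Comparing cached_blocks with the block counts reported by ctx_report.index_size gives a rough idea of how much of the index is resident; running it before and after an optimize would show whether the rebuild pushed the index out of the cache.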

ERROR at line 1: ORA-29855: error occurred in the execution of ODCIINDEXCREATE routine ORA-20000: Oracle Text error: DRG-10700: preference does not exist: global_lexer ORA-06512: at "CTXSYS.DRUE", line 160 ORA-06512: at "CTXSYS.TEXTINDEXMETHODS", line 366

database version 11.2.0.4
rac two node
CREATE INDEX MAXIMO.ACTCI_NDX3 ON MAXIMO.ACTCI
(DESCRIPTION)
INDEXTYPE IS CTXSYS.CONTEXT
PARAMETERS('lexer global_lexer language column LANGCODE')
ERROR at line 1:
ORA-29855: error occurred in the execution of ODCIINDEXCREATE routine
ORA-20000: Oracle Text error:
DRG-10700: preference does not exist: global_lexer
ORA-06512: at "CTXSYS.DRUE", line 160
ORA-06512: at "CTXSYS.TEXTINDEXMETHODS", line 366

Like the error message says, you don't have a global_lexer. So, you need to create a global_lexer and that lexer must have at least a default sub_lexer, then you can use that global_lexer in your index parameters. Please see the demonstration below, including reproduction of the error and solution.
SCOTT@orcl12c> -- reproduction of problem:
SCOTT@orcl12c> CREATE TABLE actci
2    (description VARCHAR2(60),
3      langcode     VARCHAR2(30))
4 /
Table created.
SCOTT@orcl12c> CREATE INDEX ACTCI_NDX3 ON ACTCI (DESCRIPTION)
2 INDEXTYPE IS CTXSYS.CONTEXT
3 PARAMETERS('lexer global_lexer language column LANGCODE')
4 /
CREATE INDEX ACTCI_NDX3 ON ACTCI (DESCRIPTION)
ERROR at line 1:
ORA-29855: error occurred in the execution of ODCIINDEXCREATE routine
ORA-20000: Oracle Text error:
DRG-10700: preference does not exist: global_lexer
ORA-06512: at "CTXSYS.DRUE", line 160
ORA-06512: at "CTXSYS.TEXTINDEXMETHODS", line 366
SCOTT@orcl12c> -- solution:
SCOTT@orcl12c> DROP INDEX actci_ndx3
2 /
Index dropped.
SCOTT@orcl12c> BEGIN
2    CTX_DDL.CREATE_PREFERENCE ('global_lexer', 'multi_lexer');
3    CTX_DDL.CREATE_PREFERENCE ('english_lexer', 'basic_lexer');
4    CTX_DDL.ADD_SUB_LEXER ('global_lexer', 'default', 'english_lexer');
5 END;
6 /
PL/SQL procedure successfully completed.
SCOTT@orcl12c> CREATE INDEX ACTCI_NDX3 ON ACTCI (DESCRIPTION)
2 INDEXTYPE IS CTXSYS.CONTEXT
3 PARAMETERS('lexer global_lexer language column LANGCODE')
4 /
Index created.
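The demonstration above only defines the mandatory default sub-lexer. If the LANGCODE column really holds several languages, further sub-lexers would normally be added to the multi-lexer before the index is created. The following is a sketch under assumptions: the preference name german_lexer, the attribute choices and the alternate value 'ger' are illustrative and do not come from the original post.

```sql
-- Sketch only: run after creating global_lexer and before CREATE INDEX.
BEGIN
  -- Hypothetical German sub-lexer; attribute settings are illustrative.
  CTX_DDL.CREATE_PREFERENCE ('german_lexer', 'basic_lexer');
  CTX_DDL.SET_ATTRIBUTE ('german_lexer', 'composite', 'german');
  CTX_DDL.SET_ATTRIBUTE ('german_lexer', 'alternate_spelling', 'german');
  -- Rows whose LANGCODE is 'german' (or the alternate value 'ger')
  -- are tokenized with german_lexer; all others fall back to the default.
  CTX_DDL.ADD_SUB_LEXER ('global_lexer', 'german', 'german_lexer', 'ger');
END;
/
```

The language argument has to be a language name or abbreviation that the database recognizes; the optional fourth argument lets the language column hold an application-specific code instead.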

Oracle text related internal procedure taking a lot of time in our Production database

Hi,
I am on Oracle 11.2.0.2 on Linux. I have Oracle Text implemented in all my databases for fuzzy search. I am seeing the following Oracle Text-specific internal procedure among the Top SQL in my AWR in production. This is during business hours.
SQL ordered by Elapsed Time
Resources reported for PL/SQL code includes the resources used by
all SQL statements called by the code.
% Total DB Time is the Elapsed Time of the SQL statement divided
into the Total Database Time multiplied by 100
%Total - Elapsed Time as a percentage of Total DB time
%CPU - CPU Time as a percentage of Elapsed Time
%IO - User I/O Time as a percentage of Elapsed Time
Captured SQL account for 59.3% of Total DB Time (s): 120,379
Captured PL/SQL account for 33.8% of Total DB Time (s): 120,379
Elapsed Time (s)  Executions  Elapsed Time per Exec (s)  %Total  %CPU   %IO   SQL Id         SQL Module  SQL Text
       23,476.22     205,095                       0.11   19.50  16.21  7.88  ddr8uck5s5kp3              begin ctxsys.drvdml.com_sync_i...
Note that the sql id ddr8uck5s5kp3 has this sql:
begin ctxsys.drvdml.com_sync_index(:idxname, :idxmem, :partname); end;
Also note that I have a procedure that optimizes the indexes (ctx_ddl.optimize_index in FULL mode) scheduled to run every night at 3 am for all our Oracle Text indexes. Is there anything else needed? I don't know why the procedure I showed above in the AWR report takes so much time and why it is among our Top SQL.
I will be very thankful for guidance in this regard.
Thanks,
OrauserN

This is the internal call which drives the SYNC call for a text index. Effectively all the indexing of new and updated data in your text-indexed table is contained within this call.
If you're using parallel SYNC you will see this call contained with a SELECT query - that select is executed as a parallel query on a table function, which is the way we divide up the work between parallel slaves.
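Since the call is driven by SYNC, a first step could be to look at how the indexes are synchronized and how much work is queued. The sketch below uses the CTX_USER_INDEXES and CTX_USER_PENDING views and must be run as the index owner; 'MY_TEXT_IDX' is a placeholder index name, not one from the original post.

```sql
-- Sketch only: inspect the sync settings of the text indexes in this schema.
SELECT idx_name, idx_sync_type, idx_sync_interval
FROM   ctx_user_indexes;

-- Rows waiting to be indexed, per index.
SELECT pnd_index_name, COUNT(*) AS pending_rows
FROM   ctx_user_pending
GROUP  BY pnd_index_name;

-- Manual sync of one index ('MY_TEXT_IDX' is a placeholder name).
BEGIN
  CTX_DDL.SYNC_INDEX ('MY_TEXT_IDX');
END;
/
```

If the indexes turn out to be synchronized on every commit, the 205,095 executions in the report would be consistent with one sync call per committing transaction, and a less frequent sync schedule might be worth evaluating.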

Oracle text in 9.2

Hi
I am trying to make a query run against a table as follows
SQL> desc FILES_INCLUDED;
Name Null? Type
PID NOT NULL VARCHAR2(16)
FILENAME NOT NULL VARCHAR2(240)
SQL> select count(*) from FILES_INCLUDED;
COUNT(*)
5719417
SQL>
where PID and FILENAME essentially contain all the files delivered by a particular patch, i.e.
123456-01 would be the PID and FILENAME might be
/usr/bin/ls, so 123456-01 might have multiple entries, one row for each deliverable, i.e.
SQL> select count(*) from FILES_INCLUDED where PID='xxxxxx-xx';
COUNT(*)
969
SQL>
I know this is poor really, but that is what this looks like so short of rewriting apps, that is the way it is.
Anyway users want to do
select distinct PID from FILES_INCLUDED where FILENAME like '%bin/ls%';
now this just takes up to one minute to run ( not unexpected that really )
So I recommended oracle text as perhaps a better way,
I set this up via
begin
ctx_ddl.create_preference('mywordlist', 'BASIC_WORDLIST');
ctx_ddl.set_attribute('mywordlist','PREFIX_INDEX','true');
ctx_ddl.set_attribute('mywordlist','SUBSTRING_INDEX', 'YES');
ctx_ddl.set_attribute('mywordlist','WILDCARD_MAXTERMS', '15000');
end;
and this works to a degree, but is there a way to know upfront, how to avoid getting
DRG-51030: wildcard query expansion resulted in too many terms
that is what is the minimum number of chars that one can use ( how to determine this )
ie
select distinct pid from files_included where (contains (filename,'/bin/%',1)>0)
gets a DRG-51030
I can query
select count(*) from dr$FILE_NAME_INX$i;
select count(*) from dr$FILE_NAME_INX$p;
to get an idea of the number of tokens, but not sure how to make a query run on this kind of data that is foolproof.
The use case is for users to enter
%ls% and get back the PID's that deliver %ls%.
Enda

It looks like you are wanting to search by sub-directory names or a combination of sub-directory names. By default, Oracle Text views the directory delimiter / as white space, so the individual sub-directories are tokenized. Therefore, you don't need the wildcards or / to do your searches. Please see the example below.
SCOTT@orcl_11g> create table files_included
2    (pid       varchar2 (16) not null,
3      filename varchar2 (40) not null)
4 /
Table created.
SCOTT@orcl_11g> insert all
2 into files_included values
3    ('123456-01', '/usr/bin/ls/a')
4 into files_included values
5    ('123456-02', '/usr/bin/ls/b')
6 into files_included values
7    ('123456-03', '/usr/x/ls/a')
8 into files_included values
9    ('123456-02', '/usr/bin/x/b')
10 into files_included values
11    ('654321', '/usr/bin/other')
12 select * from dual
13 /
5 rows created.
SCOTT@orcl_11g> create index myindex
2 on files_included (filename)
3 indextype is ctxsys.context
4 /
Index created.
SCOTT@orcl_11g> select token_text
2 from   dr$myindex$i
3 /
TOKEN_TEXT
B
BIN
LS
OTHER
USR
X
6 rows selected.
SCOTT@orcl_11g> select * from files_included
2 where contains (filename, 'bin ls') > 0
3 /
PID              FILENAME
123456-01        /usr/bin/ls/a
123456-02        /usr/bin/ls/b
SCOTT@orcl_11g> select * from files_included
2 where contains (filename, 'bin') > 0
3 /
PID              FILENAME
123456-01        /usr/bin/ls/a
123456-02        /usr/bin/ls/b
123456-02        /usr/bin/x/b
654321           /usr/bin/other
SCOTT@orcl_11g> select * from files_included
2 where contains (filename, 'ls') > 0
3 /
PID              FILENAME
123456-01        /usr/bin/ls/a
123456-02        /usr/bin/ls/b
123456-03        /usr/x/ls/a
SCOTT@orcl_11g>
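On the question of knowing up front whether a wildcard will expand to too many terms, one rough approach is to count the matching tokens in the index's $I table before running the CONTAINS query. The sketch below uses the myindex example from above and assumes that ordinary text tokens are stored with TOKEN_TYPE 0 and in upper case, as the token listing suggests for the default lexer.

```sql
-- Sketch only: estimate how many terms the wildcard query 'l%' would expand to.
-- Assumes the default lexer (upper-cased tokens) and TOKEN_TYPE 0 for normal text.
SELECT COUNT (DISTINCT token_text) AS matching_tokens
FROM   dr$myindex$i
WHERE  token_type = 0
AND    token_text LIKE 'L%';
```

If matching_tokens is above the WILDCARD_MAXTERMS setting of the wordlist, the corresponding CONTAINS query would be expected to fail with DRG-51030, so the application could ask the user for a longer search string instead.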

Using Oracle Text to search through WORD, EXCEL and PDF documents

Hello again,
What I would like to know is if I have a WORD or PDF document stored in a table. Is it possible to use Oracle Text to search through the actual WORD or PDF document?
Thanks
Doug

Yes you can do context sensitive searches on both PDF and Word docs. With the PDF you need to make sure they are text and not images. Some scanners will create PDFs that are nothing more than images of document.
Below is a code sample that I made some time back to demonstrate the searching capabilities of Oracle Text. Note that the example makes use of the inso_filter, which is no longer shipped with Oracle beginning with patch set 10.1.0.4. See MetaLink note 298017.1 for the changes. See the following link for more information on developing with Oracle Text.
http://download-west.oracle.com/docs/cd/B14117_01/text.101/b10729/toc.htm
begin example.
-- The following needs to be executed
-- as sys.
DROP DIRECTORY docs_dir;
CREATE OR REPLACE DIRECTORY docs_dir
AS 'C:\sql\oracle_text\documents';
GRANT READ ON DIRECTORY docs_dir TO text;
-- End sys ran SQL
DROP TABLE db_docs CASCADE CONSTRAINTS PURGE;
CREATE TABLE db_docs (
id NUMBER,
format VARCHAR2(10),
location VARCHAR2(50),
document BLOB,
CONSTRAINT i_db_docs_p PRIMARY KEY(id)
);
-- Several notes need to be made about this anonymous block.
-- First the 'DOCS_DIR' parameter is a directory object name.
-- This directory object name must be in upper case.
DECLARE
f_lob BFILE;
b_lob BLOB;
document_name VARCHAR2(50);
BEGIN
document_name := 'externaltables.doc';
INSERT INTO db_docs
VALUES (1, 'binary', 'C:\sql\oracle_text\documents\externaltables.doc', empty_blob())
RETURN document INTO b_lob;
f_lob := BFILENAME('DOCS_DIR', document_name);
DBMS_LOB.FILEOPEN(f_lob, DBMS_LOB.FILE_READONLY);
DBMS_LOB.LOADFROMFILE(b_lob, f_lob, DBMS_LOB.GETLENGTH(f_lob));
DBMS_LOB.FILECLOSE(f_lob);
COMMIT;
END;
-- build the index
-- Note that this index differs from one on files stored in the file system
-- in that the datastore parameter is ctxsys.default_datastore and not
-- ctxsys.file_datastore. FILE_DATASTORE is for documents that
-- exist on the file system. DEFAULT_DATASTORE is for documents
-- that are stored in the column.
create index db_docs_ctx on db_docs(document)
indextype is ctxsys.context
parameters (
'datastore ctxsys.default_datastore
filter ctxsys.inso_filter
format column format');
--search for something that is known to not be in the document.
SELECT SCORE(1), id, location
FROM db_docs
WHERE CONTAINS(document, 'Jenkinson', 1) > 0;
--search for something that is known to be in the document.
SELECT SCORE(1), id, location
FROM db_docs
WHERE CONTAINS(document, 'Albright', 1) > 0;
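On releases where inso_filter is no longer available, the same index can be sketched with the AUTO_FILTER preference instead. This variant was not part of the original example and is untested here; it assumes the db_docs table from above and a release in which the system-defined preference ctxsys.auto_filter exists.

```sql
-- Sketch only: recreate the index with AUTO_FILTER in place of INSO_FILTER.
-- DROP INDEX raises an error if the index does not exist yet; skip it in that case.
DROP INDEX db_docs_ctx;

CREATE INDEX db_docs_ctx ON db_docs (document)
INDEXTYPE IS ctxsys.context
PARAMETERS ('datastore ctxsys.default_datastore
             filter ctxsys.auto_filter
             format column format');
```

The two SELECT statements above can then be rerun unchanged to confirm that the binary documents are still being filtered and indexed.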

Using oracle text on a non-materialized view

I'm having trouble tracking down an error when using oracle text on a non-materialized view (indexes are on the referenced columns). My database has a users table and a user history table which saves the old values when a user profile changes. My view performs a "union all" so I can select from both at once.
I would like to use oracle text to perform a "contains" on the view whenever someone signs up to see if any current users or historical entries contain the desired username.
The following works fine:
contains(user_history_view, 'bill')
but when I reference anything in the contains clause, I get a "column is not indexed" error:
contains(user_history_view, signup.user_name) --signup.username is 'bill'
Here is a stripped-down demonstration (I am using version 10.2.0.4.0)
create table signup (
signup_id   number(19,0) not null,
signup_name varchar2(255),
primary key (signup_id)
);
create table users (
user_id   number(19,0) not null,
user_name varchar2(255),
primary key (user_id)
);
create table user_history (
history_id number(19,0) not null,
user_id    number(19,0) not null,
user_name varchar2(255),
primary key (history_id),
foreign key (user_id) references users on delete set null
);
create index user_name_index on users(user_name)
indextype is ctxsys.context parameters ('sync (on commit)');
create index user_hist_name_index on user_history(user_name)
indextype is ctxsys.context parameters ('sync (on commit)');
create index signup_name_index on signup(signup_name)
indextype is ctxsys.context parameters ('sync (on commit)');
create or replace force view user_history_view
(user_id, user_name, flag_history) as
select user_id, user_name, 'N' from users
union all
select user_id, user_name, 'Y' from user_history;
--user bill changed his name to bob, and there is a pending signup for another bill
insert into users(user_id, user_name) values (1, 'bob');
insert into user_history(history_id, user_id, user_name) values (1, 1, 'bill');
insert into signup(signup_id, signup_name) values(1, 'bill');
commit;
--works
select * from user_history_view users, signup new_user
where new_user.signup_id = 1
and contains(users.user_name, 'bill')>0;
--fails
select * from user_history_view users, signup new_user
where new_user.signup_id = 1
and contains(users.user_name, new_user.signup_name)>0;
I could move everything into a materialized view, but querying against real-time data like this would be ideal. Any help would be greatly appreciated.

Hi,
To my knowledge this is not possible. It is hard for Oracle to do: think of a table with many rows, where the column value of every row would have to be checked. So I think only a single varchar2 value is possible. Maybe a function will work for you; it is possible to pass a function as the second parameter.
create or replace function return_signup
return varchar2
is
l_signup_name signup.signup_name%type;
begin
select signup_name
into l_signup_name
from signup
where signup_id = 1
and rownum = 1;
return l_signup_name;
exception
when no_data_found
then
    l_signup_name := 'abracadabra'; -- hope does not exist
    return l_signup_name;
end;
Now you can use the above function in the contains.
select * from user_history_view users --, signup new_user
--where new_user.signup_id = 1
where contains(users.user_name, return_signup)>0;
I didn't test the code! Maybe you have to adjust the function for your needs, but it is an idea of how this can be done.
Otherwise you have to do the check by comparing the columns with a simple join:
select * from user_history_view users, signup new_user
where new_user.signup_id = 1
and users.user_name = new_user.signup_name;
Herald ten Dam
htendam.wordpress.com

Is Oracle Text the right solution for this need of a specific search!

Hi ,
We are on Oracle 11.2.0.2 on Solaris 10. We need to be able to search data that contains diacritical marks, and the search should ignore those marks. That is the requirement. I heard that Oracle Text has a preference type called BASIC_LEXER which can ignore diacritical marks, so I implemented Oracle Text solely for this feature, just for this diacritical search and for no other need.
I mean I set up preference like this:
ctxsys.ctx_ddl.create_preference ('cust_lexer', 'BASIC_LEXER');
ctxsys.ctx_ddl.set_attribute ('cust_lexer', 'base_letter', 'YES'); -- removes diacritics
With this I set up like this:
CREATE TABLE TEXT_TEST
(
NAME VARCHAR2(255 BYTE)
);
--created Oracle Text index
CREATE INDEX TEXT_TEST_IDX1 ON TEXT_TEST
(NAME)
INDEXTYPE IS CTXSYS.CONTEXT
PARAMETERS('LEXER cust_lexer WORDLIST cust_wl SYNC (ON COMMIT)');
--sample data to illustrate the problem
Insert into TEXT_TEST
   (NAME)
Values
   ('muller');
Insert into TEXT_TEST
   (NAME)
Values
   ('müller');
Insert into TEXT_TEST
   (NAME)
Values
   ('MULLER');
Insert into TEXT_TEST
   (NAME)
Values
   ('MÜLLER');
Insert into TEXT_TEST
   (NAME)
Values
   ('PAUL HERNANDEZ');
Insert into TEXT_TEST
   (NAME)
Values
   ('CHRISTOPHER Phil');
COMMIT;
--There is an alternative to Oracle Text: the plain function below, which seems to work neatly for my simple need of ignoring diacritical characters in a search.
--I need to evaluate which is better for my specific needs: the function below or Oracle Text.
CREATE OR REPLACE FUNCTION remove_dia(p_value IN VARCHAR2, p_doUpper IN VARCHAR2 := 'Y')
RETURN VARCHAR2 DETERMINISTIC
IS
OUTPUT_STR VARCHAR2(4000);
begin
IF (p_doUpper = 'Y') THEN
   OUTPUT_STR := UPPER(p_value);
ELSE
   OUTPUT_STR := p_value;
END IF;
OUTPUT_STR := TRANSLATE(OUTPUT_STR,'ÀÁÂÃÄÅÇÈÉÊËÌÍÎÏÑÒÓÔÕÖØÙÚÛÜÝàáâãäåçèéêëìíîïñòóôõöøùúûüýÿ', 'AAAAAACEEEEIIIINOOOOOOUUUUYaaaaaaceeeeiiiinoooooouuuuyy');
RETURN (OUTPUT_STR);
end;
--Now I query for names that start with P:
--The query below returns an unexpected result of two rows, because Oracle Text parses each word as a token for CONTAINS...
SQL> select * from text_test where contains(name,'P%')>0;
NAME
PAUL HERNANDEZ
CHRISTOPHER Phil
--Below query gets me the right and expected result of one row...
SQL> select * from text_test where name like 'P%';
NAME
PAUL HERNANDEZ
--Below query gets me the right and expected result of one row...
SQL> select * from text_test where remove_dia(name) like remove_dia('P%');
NAME
PAUL HERNANDEZ
My only requirement was a search that ignores diacritical characters, and I am wondering whether implementing Oracle Text for that reason was the right choice. All the more so now that I find the behaviour of LIKE is not available in Oracle Text: Oracle Text searches are based on tokens (words), so their results differ from those of the LIKE operator. So maybe I should just have used a simple function like remove_dia above instead of Oracle Text.
The function only removes the diacritical characters, and for my need that may be all that is required. Can someone confirm that, given my need, I am better off not using Oracle Text? I need to keep the behaviour of the LIKE operator and also ignore diacritical characters; the simple function meets that need, whereas Oracle Text changes the behaviour of my search queries.
Thanks,
OrauserN
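For anyone who wants to reproduce the difference described above, here is a minimal sketch. The names demo_lexer, demo_names and demo_names_idx are made up for the illustration and are not from the original post; the wordlist preference is left out on purpose because its definition was not posted. With base_letter set to YES, the query term and the indexed tokens should both be reduced to their base letters, while a wildcard such as P% is matched against every token rather than against the start of the column value.

```sql
-- Sketch only: demo_lexer, demo_names and demo_names_idx are hypothetical names.
BEGIN
  ctx_ddl.create_preference('demo_lexer', 'BASIC_LEXER');
  ctx_ddl.set_attribute('demo_lexer', 'base_letter', 'YES');  -- ignore diacritics
END;
/

CREATE TABLE demo_names (name VARCHAR2(255));

INSERT INTO demo_names (name) VALUES ('müller');
INSERT INTO demo_names (name) VALUES ('MULLER');
INSERT INTO demo_names (name) VALUES ('PAUL HERNANDEZ');
INSERT INTO demo_names (name) VALUES ('CHRISTOPHER Phil');
COMMIT;

CREATE INDEX demo_names_idx ON demo_names (name)
  INDEXTYPE IS CTXSYS.CONTEXT
  PARAMETERS ('LEXER demo_lexer');

-- Token search: the term is compared with each indexed word,
-- after base-letter conversion on both sides.
SELECT name FROM demo_names WHERE CONTAINS(name, 'muller') > 0;

-- Wildcard search: P% is expanded against all tokens, so a word
-- starting with P anywhere in the value qualifies.
SELECT name FROM demo_names WHERE CONTAINS(name, 'P%') > 0;

-- LIKE anchors the pattern at the start of the whole column value.
SELECT name FROM demo_names WHERE name LIKE 'P%';
```

This is why CONTAINS with P% is expected to return both PAUL HERNANDEZ and CHRISTOPHER Phil, whereas LIKE 'P%' only looks at the beginning of the whole string.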

If all you need is LIKE functionality and you do not need any of the complex search capabilities of Oracle Text, then I would not use Oracle Text. I would create a function-based index on your name column that uses your function that removes the diacritical marks, so that your searches will be faster. Please see the demonstration below.
SCOTT@orcl_11gR2> CREATE TABLE TEXT_TEST
2    (NAME VARCHAR2(255 BYTE))
3 /
Table created.
SCOTT@orcl_11gR2> Insert all
2 into TEXT_TEST (NAME) Values ('muller')
3 into TEXT_TEST (NAME) Values ('müller')
4 into TEXT_TEST (NAME) Values ('MULLER')
5 into TEXT_TEST (NAME) Values ('MÜLLER')
6 into TEXT_TEST (NAME) Values ('PAUL HERNANDEZ')
7 into TEXT_TEST (NAME) Values ('CHRISTOPHER Phil')
8 select * from dual
9 /
6 rows created.
SCOTT@orcl_11gR2> CREATE OR REPLACE FUNCTION remove_dia
2    (p_value   IN VARCHAR2,
3      p_doUpper IN VARCHAR2 := 'Y')
4    RETURN VARCHAR2 DETERMINISTIC
5 IS
6    OUTPUT_STR VARCHAR2(4000);
7 begin
8    IF (p_doUpper = 'Y') THEN
9       OUTPUT_STR := UPPER(p_value);
10    ELSE
11       OUTPUT_STR := p_value;
12    END IF;
13    RETURN
14       TRANSLATE
15         (OUTPUT_STR,
16          'ÀÁÂÃÄÅÇÈÉÊËÌÍÎÏÑÒÓÔÕÖØÙÚÛÜÝàáâãäåçèéêëìíîïñòóôõöøùúûüýÿ',
17          'AAAAAACEEEEIIIINOOOOOOUUUUYaaaaaaceeeeiiiinoooooouuuuyy');
18 end;
19 /
Function created.
SCOTT@orcl_11gR2> show errors
No errors.
SCOTT@orcl_11gR2> CREATE INDEX text_test_remove_dia_name
2 ON text_test (remove_dia (name))
3 /
Index created.
SCOTT@orcl_11gR2> set autotrace on explain
SCOTT@orcl_11gR2> select * from text_test
2 where remove_dia (name) like remove_dia ('mü%')
3 /
NAME
muller
müller
MULLER
MÜLLER
4 rows selected.
Execution Plan
Plan hash value: 3139591283
| Id | Operation                   | Name                      | Rows | Bytes | Cost (%CPU)| Time     |
|   0 | SELECT STATEMENT            |                           |     1 | 2131 |     2   (0)| 00:00:01 |
|   1 | TABLE ACCESS BY INDEX ROWID| TEXT_TEST                 |     1 | 2131 |     2   (0)| 00:00:01 |
|* 2 |   INDEX RANGE SCAN          | TEXT_TEST_REMOVE_DIA_NAME |     1 |       |     1   (0)| 00:00:01 |
Predicate Information (identified by operation id):
   2 - access("SCOTT"."REMOVE_DIA"("NAME") LIKE "REMOVE_DIA"('mü%'))
       filter("SCOTT"."REMOVE_DIA"("NAME") LIKE "REMOVE_DIA"('mü%'))
Note
   - dynamic sampling used for this statement (level=2)
SCOTT@orcl_11gR2> select * from text_test
2 where remove_dia (name) like remove_dia ('P%')
3 /
NAME
PAUL HERNANDEZ
1 row selected.
Execution Plan
Plan hash value: 3139591283
| Id | Operation                   | Name                      | Rows | Bytes | Cost (%CPU)| Time     |
|   0 | SELECT STATEMENT            |                           |     1 | 2131 |     2   (0)| 00:00:01 |
|   1 | TABLE ACCESS BY INDEX ROWID| TEXT_TEST                 |     1 | 2131 |     2   (0)| 00:00:01 |
|* 2 |   INDEX RANGE SCAN          | TEXT_TEST_REMOVE_DIA_NAME |     1 |       |     1   (0)| 00:00:01 |
Predicate Information (identified by operation id):
   2 - access("SCOTT"."REMOVE_DIA"("NAME") LIKE "REMOVE_DIA"('P%'))
       filter("SCOTT"."REMOVE_DIA"("NAME") LIKE "REMOVE_DIA"('P%'))
Note
   - dynamic sampling used for this statement (level=2)
SCOTT@orcl_11gR2>

Full-text search problem on a BLOB column indexed with Oracle Text

Hi,
I'm running Oracle Database 10g (10.2) on Solaris.
I have configured Oracle Text. Searching a VARCHAR2 column works, but searching a BLOB column does not.
I have a table with a BLOB column that contains documents. I load the documents with Oracle UCM (Stellent).
My index script is:
CREATE INDEX ORAUCM.FT_IDCCOLL1 ON ORAUCM.IDCCOLL1
(DDOCFULLTEXT)
INDEXTYPE IS CTXSYS.CONTEXT
PARAMETERS('DATASTORE CTXSYS.DEFAULT_DATASTORE FILTER CTXSYS.AUTO_FILTER FORMAT COLUMN DFULLTEXTFORMAT CHARSET
COLUMN DFULLTEXTCHARSET LEXER OCS_IDCCOLL1_LEXER SYNC (ON COMMIT)')
NOPARALLEL;
And my select returns 0 rows, although many documents should match:
SELECT IdcColl2.dID, dDocName, dDocTitle, dDocType, dRevisionID, dSecurityGroup, dDocAuthor,
dDocAccount, dRevLabel, dFormat, dOriginalName, dExtension, dWebExtension, dInDate, dOutDate,
dPublishType, dRendition1, dRendition2, VaultFileSize, WebFileSize, URL, dFullTextFormat,
dFullTextCharset, DocMeta.*
FROM IdcColl1, DocMeta
WHERE IdcColl1.dID=DocMeta.dID AND (CONTAINS(dDocFullText,'SUBIR') > 0 )
ORDER BY dInDate Desc
Thanks in advance.

Thank you for your answer.
To answer your questions:
- Yes, DDOCFULLTEXT is a BLOB column.
- The documents are Word files, Excel files and so on. We load them with UCM (Universal Content Management)
because I need full-text search from the UCM tool.
- Yes, the Word document contains 'subir'.
- select * from CTX_USER_INDEX_ERRORS ;
No rows returned.
- SELECT TOKEN_TEXT FROM DR$FT_IDCCOLL1$I
No rows returned.
- I tried creating a simplified index, and it doesn't work either.
I created a table and a context index on Oracle 10.2.0.3 (test database), and that works fine.
I compared the two setups (test database and UCM database) and saw one difference:
The UCM database has these preferences ("analyze text"):
BEGIN ctx_ddl.create_preference('ORAUCM.', 'WORLD_LEXER'); end;
BEGIN ctx_ddl.create_preference('ORAUCM.', 'DETAIL_DATASTORE'); end;
I don't know whether this difference matters.
Please tell me if you need more information.
Thanks for your time.
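Since the token table is empty, the index itself appears to hold no documents, so it may help to look at the index state before looking at the query. The following is only a sketch of checks that could be run as the index owner (ORAUCM in the post above); it uses the standard Oracle Text dictionary views and the index, table and column names from the post, and it assumes the index has not been renamed. Rows whose format column contains IGNORE are skipped at indexing time, which is why the last query groups by that column.

```sql
-- Sketch only: run as the owner of the index (ORAUCM in the post).

-- Index status and number of documents Oracle Text believes it has indexed.
SELECT idx_name, idx_status, idx_docid_count
FROM   ctx_user_indexes
WHERE  idx_name = 'FT_IDCCOLL1';

-- Rows that are queued for indexing but not yet synchronized.
SELECT pnd_index_name, COUNT(*) AS pending_rows
FROM   ctx_user_pending
GROUP  BY pnd_index_name;

-- Errors recorded while indexing individual rows.
SELECT err_timestamp, err_textkey, err_text
FROM   ctx_user_index_errors
WHERE  err_index_name = 'FT_IDCCOLL1'
ORDER  BY err_timestamp DESC;

-- Values in the FORMAT COLUMN: rows marked IGNORE are not indexed.
SELECT dFullTextFormat, COUNT(*) AS row_count
FROM   IdcColl1
GROUP  BY dFullTextFormat;

-- Process anything still pending (SQL*Plus syntax).
EXEC CTX_DDL.SYNC_INDEX('FT_IDCCOLL1');
```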

About the Oracle Text USER_LEXER

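I would like to use Oracle Text with a USER_LEXER.
I tried to create a procedure based on the site referenced below,
but it failed with the following error and could not be created.
If you know how to deal with this, could you please tell me?
@Error
Errors for PROCEDURE QUERY_OT:
LINE/COL ERROR
0/0 PL/SQL: Compilation unit analysis terminated
4/20 PLS-00904:
insufficient privilege to access object 'PUBLIC.CTX_ULEXER'
@SQL statement executed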
CREATE OR REPLACE PROCEDURE query_ot
(
p_target IN VARCHAR2,
p_tab IN CTX_ULEXER.WILDCARD_TAB,
p_result IN OUT VARCHAR2
)
AUTHID CURRENT_USER
IS
cursor cur_oti is select token from oti;
BEGIN
p_result := '<TOKENS>';
for now in cur_oti loop
if instr(p_target, now.token) > 0 then
p_result := p_result || '<WORD>' || now.token || '</WORD>';
end if;
end loop;
p_result := p_result || '</TOKENS>';
END query_ot;
@Reference site
http://www.nacky.info/wiki/index.php?OracleText
@Environment
OS: Ubuntu 12.04LTS
Oracle: 11.2.0-1.0 (64-bit)
Edited by: 955082 on 2012/08/26 20:42

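Thank you for your reply.
I tried FORMAT COLUMN and USER_DATASTORE
and confirmed that they give the behaviour I wanted.
(I will check the materialized view later.)
However, after updating the data being searched, the index is not synchronized,
and I am stuck again.
Sorry to keep asking, but could you also tell me how to synchronize it?
The SQL that implements the USER_DATASTORE and the verification steps are as follows.
@SQL to create the data and procedures
--table to be searched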
drop table ot;
create table ot
( id number primary key,
text varchar2(80),
del_flg number,
type varchar2(80) );
insert into ot ( id, text, del_flg, type ) values ( 1, 'The cat sat on the mat', 0, 'TEXT' );
insert into ot ( id, text, del_flg, type ) values ( 2, 'The dog barked like a dog', 0, 'TEXT' );
insert into ot ( id, text, del_flg, type ) values ( 3, '日本オラクル株式会社', 0, 'IGNORE' );
insert into ot ( id, text, del_flg, type ) values ( 4, '全日本自動車協会', 1, 'TEXT' );
commit;
-- dictionary table
drop table oti;
create table oti
( token varchar2(80) );
insert into oti ( token ) values ( '日本' );
insert into oti ( token ) values ( 'オラクル' );
insert into oti ( token ) values ( '日本オラクル' );
insert into oti ( token ) values ( '自動車協会' );
insert into oti ( token ) values ( 'Ora' );
commit;
-- indexing procedure
CREATE OR REPLACE PROCEDURE ind_ngram
(
v_a IN VARCHAR2,
v_b IN OUT VARCHAR2,
v_c IN BOOLEAN
)
IS
cursor cur_oti is select token from oti;
BEGIN
v_b := '<tokens>';
for now in cur_oti loop
if instr(v_a, now.token) > 0 then
v_b := v_b || '<word>' || now.token || '</word>';
end if;
end loop;
v_b := v_b || '</tokens>';
END ind_ngram;
--query procedure
CREATE OR REPLACE PROCEDURE que_ngram
(
v_a IN VARCHAR2,
v_b IN CTX_ULEXER.WILDCARD_TAB,
v_c IN OUT VARCHAR2
)
IS
cursor cur_oti is select token from oti;
BEGIN
v_c := '<tokens>';
for now in cur_oti loop
if instr(v_a, now.token) > 0 then
v_c := v_c || '<word>' || now.token || '</word>';
end if;
end loop;
v_c := v_c || '</tokens>';
END que_ngram;
--datastore procedure
create or replace procedure myproc(rid in rowid, ret in out nocopy varchar2) is
begin
ret := null;
for c1 in (select text from ot
where rowid = rid and del_flg = 0)
loop
ret := c1.text;
end loop;
end;
--CTX_DDL
BEGIN
CTX_DDL.drop_PREFERENCE('my_lexer');
CTX_DDL.CREATE_PREFERENCE('my_lexer', 'user_lexer');
CTX_DDL.SET_ATTRIBUTE('my_lexer', 'index_procedure', 'ind_ngram');
CTX_DDL.SET_ATTRIBUTE('my_lexer', 'input_type', 'varchar2');
CTX_DDL.SET_ATTRIBUTE('my_lexer', 'query_procedure', 'que_ngram');
END;
BEGIN
CTX_DDL.drop_STOPLIST('my_stoplist');
CTX_DDL.CREATE_STOPLIST('my_stoplist', 'basic_stoplist');
END;
BEGIN
ctx_ddl.drop_preference('my_datastore');
ctx_ddl.create_preference('my_datastore', 'user_datastore');
ctx_ddl.set_attribute('my_datastore', 'procedure', 'myproc');
ctx_ddl.set_attribute('my_datastore', 'output_type', 'varchar2');
END;
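--create the index
CREATE INDEX a ON ot(text)
INDEXTYPE IS ctxsys.context
PARAMETERS ('
LEXER my_lexer
STOPLIST my_stoplist
datastore my_datastore
');
@Check step 1: display the index
# Before USER_DATASTORE was implemented, the data for id=3 and id=4 was stored in the index,
# but because id=4 was inserted with del_flg set to 1, only the data for id=3 is stored (as expected)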
SQL> select token_text from dr$a$i;
TOKEN_TEXT
オラクル
日本
日本オラク
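@Check step 2: turn on del_flg for id=3
SQL> update ot set del_flg=1 where id=3;
1 row updated.
SQL> commit;
Commit complete.
@Check step 3: run the synchronization commands
SQL> exec CTX_DDL.SYNC_INDEX('a');
PL/SQL procedure successfully completed.
SQL> exec CTX_DDL.OPTIMIZE_INDEX('a','full');
PL/SQL procedure successfully completed.
@Check step 4: display the index again
# del_flg for id=3 is now 1, so nothing should be in the index, but the tokens are still listed. The index does not appear to have been updated (unexpected)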
SQL> select token_text from dr$a$i;
TOKEN_TEXT
オラクル
日本
日本オラクル
---
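A possible explanation, offered as a sketch and not as a confirmed diagnosis: the index is created on ot(text), and Oracle Text only queues a row for re-indexing when the indexed column itself appears in the UPDATE. An update that touches only del_flg would then leave nothing for SYNC_INDEX to process, even though the datastore procedure myproc reads del_flg. Under that assumption, including the indexed column in the update (even with an unchanged value) should queue the row. The table, index and procedure names are the ones from the post above.

```sql
-- Sketch only: assumes index "a" on ot(text) with the USER_DATASTORE
-- procedure myproc from the post above.

-- Touch the indexed column in the same UPDATE so the row is queued.
UPDATE ot
SET    del_flg = 1,
       text    = text
WHERE  id = 3;
COMMIT;

-- The row should now be visible in the pending queue.
SELECT pnd_index_name, pnd_rowid
FROM   ctx_user_pending
WHERE  pnd_index_name = 'A';

-- Re-index pending rows; myproc now returns NULL for id=3.
EXEC CTX_DDL.SYNC_INDEX('a');

-- Old tokens stay in the token table until a FULL optimize
-- removes the entries of deleted documents.
EXEC CTX_DDL.OPTIMIZE_INDEX('a', 'FULL');

SELECT token_text FROM dr$a$i;
```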

Is an Oracle Text index on a view possible?

Is it possible to create an Oracle Text index on a view?

ok,
My table name is T_DOC :
ID----------------> NUMBER(30)
DESCRIPTION-------> VARCHAR2(2000 BYTE)
DOC---------------> BLOB
FILENAME----------> VARCHAR2(2000 BYTE)
MIMETYPE----------> VARCHAR2(2000 BYTE)
LAST_UPDATE_DATE--> DATE
T_DOC
| Id | DESCRIPTION | DOC | FILENAME | MIMETYPE | LAST_UPDATE_DATE |
| 1 | THE DOG | *(!BLOB) | THE_CAT.PDF | application/pdf | 20/05/2010 15:06:15 |
| 2 | THE BIRD | **(!BLOB) | THE_BIRD.PDF | application/pdf | 20/05/2010 15:06:15 |
| 3 | THE HUMAN AND CAT | ***(!BLOB) | THE_HUMAN.PDF | application/pdf | 20/05/2010 15:06:15 |
* is a PDF document with the content: "the dog and cat"
** is a PDF document with the content: "the bird in house"
*** is a PDF document with the content: "the human from USA"
I index the columns DESCRIPTION, DOC (document content) and FILENAME:
begin
ctx_ddl.create_preference('idxDoc_lx', 'BASIC_LEXER');
ctx_ddl.set_attribute ('idxDoc_lx', 'MIXED_CASE', 'NO');
end;
begin
ctx_ddl.create_preference('idxDoc_ds', 'MULTI_COLUMN_DATASTORE');
ctx_ddl.set_attribute ('idxDoc_ds', 'COLUMNS', 'DOC, FILENAME, DESCRIPTION');
end;
CREATE INDEX IDX_DOC
ON T_DOC (FILENAME)
INDEXTYPE IS CTXSYS.CONTEXT
PARAMETERS ('lexer idxDoc_lx
datastore idxDoc_ds
filter CTXSYS.AUTO_FILTER
sync (on commit)');
Search query:
select ID
from T_DOC
where CONTAINS (DOCUMENTO, 'CAT', 1) > 0
Result: ID = 1
Why is ID 3 not returned as well?

Oracle text search query

Dear Professionals,
I am using Oracle Text (11g).
Is there any way to treat '-' as a space, so that I can search for the full text '18005-12220' as well as for either part (18005 or 12220) as a keyword?
case 1)
select * from search_table where CONTAINS(searchdata,'18005\-12220')>0;
o/p=>18005-12220   xyz abc   address 145
case 2)
select * from search_table where CONTAINS(searchdata,'18005')>0;
o/p=>no rows
case 3)
select * from search_table where CONTAINS(searchdata,'12220')>0;
o/p=>no rows
BEGIN
ctx_ddl.create_preference ('SUBSTRING_PREF', 'BASIC_WORDLIST');
ctx_ddl.set_attribute      ('SUBSTRING_PREF', 'substring_index',   'YES');
ctx_ddl.set_attribute      ('SUBSTRING_PREF', 'prefix_index',      'YES');
ctx_ddl.set_attribute      ('SUBSTRING_PREF', 'prefix_min_length', 1);
ctx_ddl.set_attribute      ('SUBSTRING_PREF', 'prefix_max_length', 10);
ctx_ddl.set_attribute ('SUBSTRING_PREF', 'WILDCARD_MAXTERMS', 10000);
ctx_ddl.create_preference('mylex', 'BASIC_LEXER');
ctx_ddl.set_attribute('mylex', 'printjoins', '_-');
Ctx_Ddl.Set_Attribute ( 'mylex', 'index_themes', 'FALSE');
Ctx_Ddl.Create_Preference('my_text_storage', 'BASIC_STORAGE');
ctx_ddl.set_attribute('my_text_storage','I_TABLE_CLAUSE', 'tablespace users storage (initial 10M next 10M)');
ctx_ddl.set_attribute('my_text_storage', 'K_TABLE_CLAUSE', 'tablespace users storage (initial 10M next 10M)');
ctx_ddl.set_attribute('my_text_storage', 'R_TABLE_CLAUSE', 'tablespace users storage (initial 10M) lob (data) store as (cache)');
ctx_ddl.set_attribute('my_text_storage', 'N_TABLE_CLAUSE', 'tablespace users storage (initial 1M)');
ctx_ddl.set_attribute('my_text_storage', 'I_INDEX_CLAUSE', 'tablespace users storage (initial 1M) compress 2');
ctx_ddl.set_attribute('my_text_storage', 'P_TABLE_CLAUSE', 'tablespace users storage (initial 1M)');
END;

Thanks, Roger Ford, for your valuable suggestion. The problem is resolved now.
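One way to think about the three cases above, as a sketch and not as the fix that was actually applied (the suggestion itself is not quoted in this thread): with '-' listed in printjoins, BASIC_LEXER keeps 18005-12220 as a single token, so neither half is indexed on its own. If the hyphen is left out of printjoins, it should act as a token break, and both halves become searchable. The names demo_break_lexer, demo_search and demo_search_idx are hypothetical.

```sql
-- Sketch only: hypothetical names, default wordlist and storage.
BEGIN
  ctx_ddl.create_preference('demo_break_lexer', 'BASIC_LEXER');
  -- '_' stays part of a token; '-' is deliberately NOT a printjoin,
  -- so it separates 18005 and 12220 into two tokens.
  ctx_ddl.set_attribute('demo_break_lexer', 'printjoins', '_');
END;
/

CREATE TABLE demo_search (searchdata VARCHAR2(200));

INSERT INTO demo_search (searchdata)
VALUES ('18005-12220   xyz abc   address 145');
COMMIT;

CREATE INDEX demo_search_idx ON demo_search (searchdata)
  INDEXTYPE IS CTXSYS.CONTEXT
  PARAMETERS ('LEXER demo_break_lexer');

-- Either half on its own.
SELECT * FROM demo_search WHERE CONTAINS(searchdata, '18005') > 0;
SELECT * FROM demo_search WHERE CONTAINS(searchdata, '12220') > 0;

-- The full value: braces escape the hyphen so it is not read as the
-- MINUS operator; the lexer then treats the term as the phrase
-- "18005 12220".
SELECT * FROM demo_search WHERE CONTAINS(searchdata, '{18005-12220}') > 0;
```

The trade-off is that a search for the full value becomes a phrase search, so it would also match text where the two numbers are separated by a space or other punctuation.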

NEAR operator alternative when not using Oracle Text?

Hi,
I'm working on a project where I need something like the Oracle Text NEAR operator.
Here is my scenario.
In the database we have customers, and every customer has criteria such as search words (names, towns, cars, etc.), so for every customer I can build an SQL query from those criteria.
A criterion can be something like WHERE fulltext LIKE 'john%', or even a proximity search such as NEAR inside CONTAINS, but the latter needs an Oracle Text index.
The only table with a Text index is our storage table, which holds more than 4 million records and is growing.
My question: is there any way to write a query that does the same thing as NEAR without a Text index?
Here is how the process starts.
I get the full text of a newspaper article from our OCR library.
Then I need to check each customer's criteria against this text to see which article belongs to which customer, and bind the article to that customer.
I could do it outside Oracle with regular expressions, but the criteria can get really complicated: a customer may want only specific media, a specific category or type, only articles from media in a specific country, and many more. All of this can be wrapped in brackets with ANDs, ORs and NOTs.
So the only way to do it is to put the article into Oracle, execute the right query and let Oracle decide whether the result is true or false. But because of the NEAR operator I need Oracle Text.
So if I first insert the article into our storage table, which has the Oracle Text index, so that I can run the search, how fast will this be?
Will the search become slower when there are 6 million records? I know I can use FILTER BY to help the Text index search better and faster, and I know how to optimize the index, but still.
I keep asking myself: why insert the article into a table that already holds 6 million articles and run the query there, when I only need to check one single article that I already know?
I see two solutions:
- If there is an alternative to NEAR that needs no Oracle Text index, I would insert the data into a temporary table that always contains only this one article and run the query on that table. One option might be a 'temp' table with its own Oracle Text index: I insert the one article, run the search against it with Oracle Text, and clear the index daily or whenever the article is removed from the table. But this would mean having two Oracle Text indexes, because we already have one on our storage table anyway.
- The other is to insert the article into our storage table, use its Oracle Text index and hope for quick results.
Maybe I'm exaggerating, and a query like WHERE id=1234 AND CONTAINS(...) will execute faster than I think.
If anyone has another suggestion, I will be happy to try it.
thanks,
Kris
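As a rough illustration of the first option, a proximity check on a single article can be approximated with REGEXP_LIKE, which needs no Text index. This is only a sketch under stated assumptions: the table article_check, the sample sentence and the limit of three intervening words are made up for the example, and the pattern only imitates NEAR for two plain words. It does not reproduce Oracle Text tokenization, stopwords or scoring.

```sql
-- Sketch only: article_check is a hypothetical staging table that
-- holds the one article currently being checked.
CREATE GLOBAL TEMPORARY TABLE article_check
  (article_text CLOB)
  ON COMMIT DELETE ROWS;

INSERT INTO article_check (article_text)
VALUES ('Yesterday John met Mary Smith in the city centre.');

-- "john" and "smith" with at most three words between them,
-- in either order, case-insensitive ('i').
--   (^|\W)      start of text or a non-word character before the first word
--   (\W+\w+){0,3}  up to three intervening words
--   \W+         separator before the second word
--   (\W|$)      non-word character or end of text after the second word
SELECT COUNT(*) AS hits
FROM   article_check
WHERE  REGEXP_LIKE(article_text,
                   '(^|\W)john(\W+\w+){0,3}\W+smith(\W|$)', 'i')
   OR  REGEXP_LIKE(article_text,
                   '(^|\W)smith(\W+\w+){0,3}\W+john(\W|$)', 'i');
```

Because the predicate is ordinary SQL, it can be combined with the other customer criteria through AND, OR and NOT in the same WHERE clause. For criteria with many terms or nested operators the patterns become unwieldy quickly, which is where a Text index on a small staging table may be the simpler route.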

