Oracle text clustering

Hi,
I'm trying out the ctx_cluster package.
My question is:
Is it possible to create the themes table for a whole collection in advance, and then apply cluster analysis to the whole collection or to a subset?
With the ctx_cluster package it seems that the themes must be extracted every time you start the clustering process. For large collections this could take a long time.
Giorgio

Similar Messages

Oracle text clustering - use of Stemming

Hi,
I am using K-means to cluster a set of documents, but I am unable to set the clustering parameters so that the tokens are stemmed. The clustering algorithm treats 'move', 'moving', 'moves' and 'moved' as separate words and puts the documents into different clusters. I would like to group all documents that contain 'move', 'moving', 'moves' or 'moved' into a single group via the stem 'move'. So far I have not managed to do this; any ideas are welcome.
I use the following preferences and attributes to create a text index:
BEGIN
CTX_DDL.DROP_PREFERENCE ('test_lex');
CTX_DDL.CREATE_PREFERENCE ('test_lex', 'BASIC_LEXER');
CTX_DDL.SET_ATTRIBUTE ('test_lex', 'INDEX_STEMS', 'ENGLISH');
END;
/
drop index temp_idx;
CREATE index temp_idx ON temp(text1) indextype is CTXSYS.CONTEXT parameters ('WORDLIST CTXSYS.BASIC_WORDLIST LEXER test_lex SYNC (ON COMMIT)');
And below is the code I use to cluster the documents:
create table temp0 (docid NUMBER, clusterid NUMBER, score NUMBER);
create table temp1 (clusterid NUMBER, descript varchar2(4000), label varchar2(200), sze number, quality_score number, parent number);
begin
ctx_ddl.drop_preference('my_cluster');
ctx_ddl.create_preference('my_cluster','KMEAN_CLUSTERING');
ctx_ddl.set_attribute('my_cluster','CLUSTER_NUM','10');
ctx_ddl.set_attribute('my_cluster','STEM_ON','FALSE');
ctx_output.start_log('my_log');
ctx_cls.clustering('temp_idx','seq','temp0','temp1','my_cluster');
ctx_output.end_log;
end;
/
Thanks!

Set STEM_ON to TRUE, i.e. change
ctx_ddl.set_attribute('my_cluster','STEM_ON','FALSE');
to
ctx_ddl.set_attribute('my_cluster','STEM_ON','TRUE');
and then create the clusters. Also, as you have already done, the lexer should have INDEX_STEMS set.
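To see why STEM_ON changes the result, here is a toy illustration in Python. It uses a deliberately naive suffix-stripping stemmer (an assumption for illustration, not Oracle Text's English stemmer) and shows that without stemming the four word forms are four separate clustering features, while with stemming they collapse into one.

```python
from collections import Counter

# Toy suffix list, checked in order. This is NOT Oracle Text's stemmer;
# it only demonstrates the effect of stemming on clustering features.
SUFFIXES = ("ing", "es", "ed", "s", "e")


def toy_stem(word):
    """Strip the first matching suffix, keeping a stem of at least 3 characters."""
    w = word.lower()
    for suffix in SUFFIXES:
        if w.endswith(suffix) and len(w) - len(suffix) >= 3:
            return w[: -len(suffix)]
    return w


def features(tokens, stem_on):
    """Count features the way a clusterer would see them, with or without stemming."""
    return Counter(toy_stem(t) if stem_on else t.lower() for t in tokens)


tokens = ["move", "moving", "moves", "moved"]

if __name__ == "__main__":
    print(features(tokens, stem_on=False))  # four distinct features
    print(features(tokens, stem_on=True))   # a single stemmed feature
```

With one shared feature, documents that differ only in the inflection of "move" no longer look dissimilar to the clustering algorithm.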

Oracle Text Classification/Clustering

Is anyone using the Oracle Text classification/clustering technology? I am working on a project where we are doing research on using this type of technology for our Oracle Text searches.

Thanks for your help. I will talk to the person I am working with to see if he thinks we can go this route. I know he is the contact person for the thesaurus. If I have any more questions, I'll post to this thread. It will be early next week before I can meet with him.
Thanks,
--Sandra :->

Oracle Text and Real Application Clusters

Hello,
I know it's a simple question, but I haven't found anyone who can answer it.
We want to use Oracle Text on a 2-node Oracle RAC system.
Is this possible? And if it is, do I have to do any special configuration? Are there any differences between using Oracle Text in a normal environment and in a RAC environment?
Thank you and best Regards from Berlin
M. Wuttke

It is possible to use Oracle Text with RAC.
As for whether special configuration is needed, and how Oracle Text differs between a normal and a RAC environment, I'll get back to you with more information later.

Justification for Using Oracle Text

Hello,
Can someone give me good cause (justification) for utilizing Oracle Text over other tools out there that are not tied directly to Oracle?
Apparently it is possible to identify metadata within text and do keyfield and keyword searches this way with other tools, but I question the accuracy, speed, or value in terms of data relationships with this approach. I feel the relationships belong in the database along with the indexes but can't convince anyone of this.
Does anyone have experience with Oracle Text where relationships help drive the search, and can you give me good reasons for this approach?
thanks

Hi,
Justification depends on your use. For starters:
1) It is included in both standard and enterprise editions of the db at no added charge
2) Uses SQL to query and maintain
3) Includes a number of built-ins for maintenance and optimization
4) It has 4 different index types for various uses
5) It can index any data type
6) UltraSearch is included in both standard and enterprise editions of the db at no additional charge (this is a crawler built on Oracle Text).
As for the integration - it is optimized for Oracle. If you were to build a standalone indexing solution you would probably design it a bit differently, but Oracle Text takes into account the optimizer and database structure.
It has other features (same as some of the other tools) like a knowledge base, classification, clustering, theme extraction, language-specific features, ability to index documents in and out of the database, stopwords, stemming, wildcard, progressive relaxation, and the list goes on.
I guess my question would be, what is the reason for NOT using it? That might give me a better line on the reasoning so that I can respond with something a bit more specific.
Thanks,
Ron

Manual installation and deinstallation of Oracle Text 10g

Product: ORACLE SERVER
Date written: 2006-03-17
Manual installation and deinstallation of Oracle Text 10g
===================================
PURPOSE
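This document describes how to manually install Oracle Text 10gR1, verify the installation, and deinstall it.
This information should be useful to database administrators and technical support staff who configure Text in Oracle 10g Release 1 (10.1.0.2).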
Explanation
* Notes
If the Oracle database was created with the Database Configuration Assistant (DBCA), Text is installed by default and there is no need to follow the installation procedure described below.
Oracle Text is available at no additional license cost in every database edition (Oracle Database Standard Edition One,
Oracle Database Standard Edition (SE), Oracle Database Enterprise Edition (EE),
Oracle Database Personal Edition).
For Oracle Database Enterprise Edition (EE), it is advisable to enable Oracle Data Mining (ODM) before installing Oracle Text. Doing so makes the SVM classifier and KMEANS clustering features available. Other features, such as the RULE classifier and TEXTK clustering, work even when ODM is not installed.
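* Installation procedure
If you created the database manually, or want to install Text separately after the database has been created, follow the steps below.
Note: in SQL*Plus, '?' is used in place of $ORACLE_HOME.
1. Connect to SQL*Plus as SYSDBA and call the script as shown below; this creates the Text dictionary in the CTXSYS schema.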
SQL> connect SYS/password as SYSDBA
SQL> spool text_install.txt
SQL>@?/ctx/admin/catctx.sql CTXSYS SYSAUX TEMP LOCK
In the command above:
CTXSYS - password for the ctxsys user
SYSAUX - default tablespace for the ctxsys user
TEMP - temporary tablespace for the ctxsys user
LOCK|NOLOCK - whether to lock or unlock the ctxsys user account
2. When this finishes, create the default preferences appropriate for your language.
The per-language default preference scripts supported by Oracle Text are in the $ORACLE_HOME/ctx/admin/defaults directory.
They include, for example, English (US), Danish (DK), Dutch (NL), Finnish (SF), French (F), German (D), Italian (IT),
Portuguese (PT), Spanish (E) and Swedish (S).
These scripts are named drdefXX.sql, where XX is the code for the language you want to use. For example, to install the US default preferences manually, log in to SQL*Plus as CTXSYS and run 'drdefus.sql'.
Likewise, to install the Korean default preferences manually, log in to SQL*Plus as CTXSYS and run 'drdefko.sql' as follows:
SQL> connect CTXSYS/password
SQL>@?/ctx/admin/defaults/drdefko.sql
SQL> spool off
*** Caution ***
If Oracle Data Mining (ODM) was installed before Text, the text_install.txt file will show ORA-955 errors related to public synonyms, for example errors involving dm_svm_build. These errors can be ignored. The installation creates dummy packages in the CTXSYS schema that mimic the ODM API and tries to create public synonyms for them.
When ODM is already installed, creating those public synonyms fails and the existing public synonyms keep pointing to the ODM objects, which is the correct behavior.
3. Verifying that Text 10gR1 (10.1.0.x) is installed correctly
a. Check that all Text objects were installed correctly in the CTXSYS schema.
b. Check that no objects in the CTXSYS schema are invalid.
On a correct installation, the invalid-objects query returns "no rows selected".
If there are invalid objects, compile them manually.
------------------- cut here ------------------------------
connect SYS/password as SYSDBA
set pages 1000
col object_name format a40
col object_type format a20
col comp_name format a30
column library_name format a8
column file_spec format a60 wrap
spool text_install_verification.log
-- check on setup
select comp_name, status, substr(version,1,10) as version
from dba_registry
where comp_id = 'CONTEXT';
select * from ctxsys.ctx_version;
select substr(ctxsys.dri_version,1,10) VER_CODE from dual;
select count(*)
from dba_objects where owner='CTXSYS';
-- Get a summary count
select object_type, count(*)
from dba_objects where owner='CTXSYS'
group by object_type;
-- Any invalid objects
select object_name, object_type, status
from dba_objects
where owner='CTXSYS'
and status != 'VALID'
order by object_name;
spool off
------------------- cut here ------------------------------
On a correct installation, the output should look like this:
SQL> select comp_name, status, substr(version,1,10) as version
from dba_registry
where comp_id = 'CONTEXT';
COMP_NAME STATUS VERSION
Oracle Text VALID 10.1.0.2.0
SQL> select * from ctxsys.ctx_version;
VER_DICT VER_CODE
10.1.0.2.0 10.1.0.2.0
SQL> select substr(ctxsys.dri_version,1,10) VER_CODE from dual;
VER_CODE
10.1.0.2.0
SQL> select count(*)
from dba_objects where owner='CTXSYS';
COUNT(*)
338
SQL> select object_type, count(*)
from dba_objects where owner='CTXSYS'
group by object_type;
OBJECT_TYPE COUNT(*)
FUNCTION 5
INDEX 46
INDEXTYPE 4
LIBRARY 1
LOB 1
OPERATOR 6
PACKAGE 71
PACKAGE BODY 58
PROCEDURE 3
SEQUENCE 3
TABLE 37
TYPE 42
TYPE BODY 7
VIEW 54
14 rows selected.
SQL> select object_name, object_type, status
from dba_objects
where owner='CTXSYS'
and status != 'VALID'
order by object_name;
no rows selected
SQL>
4. Manual deinstallation procedure
*** Caution ***
Before deinstalling Oracle Text, it is advisable to drop all Text indexes created in schemas other than CTXSYS.
The Text dictionary in the CTXSYS schema is removed by connecting to SQL*Plus as SYSDBA and running the script as follows:
SQL> connect SYS/password as SYSDBA
SQL> spool textdeinstall.log
SQL>@?/ctx/admin/catnoctx.sql
SQL> spool off
Review the output file textdeinstall.log for errors.
Deinstallation of Oracle Text 10.1.0.x is complete.
Example
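A hypothetical Python helper that renders the SQL*Plus commands for steps 1 and 2 above might look like this. Only the catctx.sql argument order (password, default tablespace, temporary tablespace, LOCK|NOLOCK) and the drdefXX.sql naming rule come from the note; the function names, defaults and the SYS credential placeholder are illustrative.

```python
# Hypothetical helper (not part of Oracle Text): renders the SQL*Plus
# commands from steps 1 and 2. Only the catctx.sql argument order and the
# drdefXX.sql naming rule come from the note; names and defaults are examples.


def catctx_command(password, default_ts="SYSAUX", temp_ts="TEMP", lock=True):
    """catctx.sql takes: CTXSYS password, default TS, temp TS, LOCK|NOLOCK."""
    for value in (password, default_ts, temp_ts):
        if not value or " " in value:
            raise ValueError("arguments must be non-empty single tokens")
    mode = "LOCK" if lock else "NOLOCK"
    return f"@?/ctx/admin/catctx.sql {password} {default_ts} {temp_ts} {mode}"


def defaults_script(lang_code):
    """Map a language code such as 'us' or 'ko' to its drdefXX.sql script."""
    code = lang_code.strip().lower()
    if not code.isalpha():
        raise ValueError("language code must be alphabetic")
    return f"@?/ctx/admin/defaults/drdef{code}.sql"


def install_script(password, lang_code="us", **catctx_args):
    """Full script; 'password' is used for catctx.sql and to log in as CTXSYS."""
    return [
        "connect SYS/password as SYSDBA",  # placeholder SYS credentials
        "spool text_install.txt",
        catctx_command(password, **catctx_args),
        f"connect CTXSYS/{password}",
        defaults_script(lang_code),
        "spool off",
    ]


if __name__ == "__main__":
    print("\n".join(install_script("CTXSYS", lang_code="ko")))
```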
Reference Documents
Oracle Text Reference 10g Release 1 (10.1) Part Number B10730-02
Note 280713.1 Manual installation, deinstallation of Oracle Text 10gR1

How do I get Oracle Text to index files on a file server?

I am new to Oracle (I'm a MS-SQL DBA looking for a Full-Text Search solution that is better than linking to a MS index server.)
So - Here's the objective:
I have Oracle Server(Express) installed on a Windows server.
I would like for Oracle to build a Full-Text Catalog of the files on a separate file server based on file paths in a table in the database.
(No desire to store terabytes of images and documents inside the database)
I can get Oracle text up and running, using the URL_Datastore:
CREATE TABLE files (id NUMBER PRIMARY KEY, issue_id NUMBER, path VARCHAR(255) UNIQUE, ot_format VARCHAR(6), ot_version VARCHAR(10));
The Compaq server is a remote windows server on my local workgroup, so the fully qualified path is just "compaq" and the URL is valid:
INSERT INTO files VALUES (9,9,'file://Compaq/FTQ/00000003.pdf',NULL,NULL);
INSERT INTO files VALUES (13,13,'file://Compaq/FTQ/01.txt',NULL,NULL);
CREATE INDEX file_index ON files(path) INDEXTYPE IS ctxsys.context
PARAMETERS ('datastore ctxsys.URL_DATASTORE format column ot_format');
but when I enter:
Select * from CTX_User_Index_errors, I see the following errors:
DRG-11609: URL store: unable to open local file specified by file://Compaq/FTQ/00000003.pdf
DRG-11609: URL store: unable to open local file specified by file://Compaq/FTQ/01.txt
Did I miss something?
Do I need to install anything on the file server?
I would like to convince my company that Oracle can be much quicker than Microsoft's Indexing Service because it can avoid joining two large result sets (one result set from Full_text (indexing service) and one for specific data contained in fields in the MS-SQL database.) Full Text Searches commonly take 40 - 60 seconds where there are 1.5 million multi-page PDF files for a particular set that I sample search on. Without this massive join, I believe I can get the search to run in under 10 seconds.

Thank you!
File_Datastore worked fine.
I was staying away from File_Datastore because the information I gathered from googling suggested that file_datastore would only work locally.
Now I just have to get Oracle to pull data out of tables in a MS-SQL database on the local network (don't have a clue yet), and then have it index compiled file paths.
Then MS-SQL can query Oracle with index and full-text criteria, and Oracle can send back a result set.
It may sound like a bad way of performing full-text queries, but anything will be better than the way things are currently running. We are currently performing full-text searches on a table that is rebuilt nightly, so the table containing millions of file paths is not live.
It would be so much better if we just migrated to Oracle, but we currently do not have the resources.
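Switching from URL_DATASTORE to FILE_DATASTORE means the path column holds file paths rather than file:// URLs. The sketch below converts the URLs from the INSERT statements above into Windows UNC paths, assuming (not confirmed in the thread) that FILE_DATASTORE on a Windows database server reads remote files via UNC paths such as \\Compaq\FTQ\01.txt.

```python
from urllib.parse import unquote, urlparse

# Sketch, assuming FILE_DATASTORE on a Windows database server can read
# UNC paths on the remote file server; verify this in your environment.
# Converts the file:// URLs that were stored for URL_DATASTORE.


def file_url_to_unc(url):
    """Turn file://Server/share/dir/file into a UNC path."""
    parts = urlparse(url)
    if parts.scheme.lower() != "file":
        raise ValueError(f"not a file:// URL: {url}")
    if not parts.netloc:
        raise ValueError(f"no server name in URL: {url}")
    # Drop the leading slash and switch to Windows separators.
    path = unquote(parts.path).lstrip("/").replace("/", "\\")
    return "\\\\" + parts.netloc + "\\" + path


if __name__ == "__main__":
    for url in ("file://Compaq/FTQ/00000003.pdf", "file://Compaq/FTQ/01.txt"):
        print(file_url_to_unc(url))
```

The converted values could then be stored in the path column and the index recreated with `datastore ctxsys.FILE_DATASTORE`.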

Error while running the Oracle Text optimize index procedure (even as a dba user too)

Hi Experts,
I am on Oracle 11.2.0.2 on Linux. I have implemented Oracle Text. My Oracle Text indexes are fragmented, but I am getting an error when running optimize_index. Following is the error:
begin
ctx_ddl.optimize_index(idx_name=>'ACCESS_T1',optlevel=>'FULL');
end;
ERROR at line 1:
ORA-20000: Oracle Text error:
ORA-06512: at "CTXSYS.DRUE", line 160
ORA-06512: at "CTXSYS.CTX_DDL", line 941
ORA-06512: at line 1
Then I tried to run this as a DBA user too, and it failed the same way!
begin
ctx_ddl.optimize_index(idx_name=>'BVSCH1.ACCESS_T1',optlevel=>'FULL');
end;
ERROR at line 1:
ORA-20000: Oracle Text error:
ORA-06512: at "CTXSYS.DRUE", line 160
ORA-06512: at "CTXSYS.CTX_DDL", line 941
ORA-06512: at line 1
Now CTXAPP role is granted to my schema and still I am getting this error. I will be thankful for the suggestions.
Also one other important observation: We have this issue ONLY in one database and in the other two databases, I don't see any problem at all.
I am unable to figure out what the issue is with this one database!
Thanks,
OrauserN

How about checking the following?
Bug 10626728 - CTX_DDL.optimize_index "full" fails with an empty ORA-20000 since 11.2.0.2 upgrade (DOCID 10626728.8)

Getting error while importing schema with ORACLE TEXT

IMP-00003: ORACLE error 20000 encountered
ORA-20000: Oracle Text error:
DRG-52204: error while registering index
DRG-10507: duplicate index name: WORKORDER_Q, owner: SYS
ORA-06512: at "CTXSYS.DRUE", line 160
ORA-06512: at "CTXSYS.DRIIMP", line 115
ORA-06512: at line 2
IMP-00088: Problem importing metadata for index WORKORDER_Q. Index creation will be skipped
Database version - Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
Os version - Linux nlxs1012.slb.atosorigin-asp.com 2.6.18-308.el5 #1 SMP Fri Jan 27 17:17:51 EST 2012 x86_64 x86_64 x86_64 GNU/Linux
We took an export of the schema from the production DB and are now importing the data into the QA environment.
During the import we get the error above.

I am importing objects from P20_MAXIMO into Q25_MAXIMO in another database.
Below is the import parfile:
USERID='/ as sysdba'
FILE=exp_P20_MAXIMO_C2364781.dmp
LOG=imp_P20_MAXIMO__Q25_MAXIMO_C2364781_1.log
FROMUSER=P20_MAXIMO
TOUSER=Q25_MAXIMO
buffer=1000000
feedback=100000
Export parfile
userid='/ as sysdba'
owner=P20_MAXIMO
FILE=exp_P20_MAXIMO_C2364781.dmp
LOG=exp_P20_MAXIMO_C2364781.log
buffer=10000000
feedback=100000
statistics=none

Pre-loading Oracle text in memory with Oracle 12c

There is a white paper from Roger Ford that explains how to load the Oracle index in memory : http://www.oracle.com/technetwork/database/enterprise-edition/mem-load-082296.html
In our application, Oracle 12c, we are indexing a big XML field (which is stored as XMLType with storage secure file) with the PATH_SECTION_GROUP. If I don't load the I table (DR$..$I) into memory using the technique explained in the white paper, I cannot get decent performance (and especially not predictable performance: it looks like when the blocks of the TOKEN_INFO column are not in memory, performance can fall sharply).
But after migrating to Oracle 12c, I got a different problem, which I can reproduce: when I create the index it is relatively small (as seen with ctx_report.index_size), and by applying the technique from the white paper I can pin the DR$ I table in memory. But as soon as I run ctx_ddl.optimize_index('Index','REBUILD'), the size becomes much bigger and I can't pin the index in memory. I am not sure whether this is a bug.
What I found as work-around is to build the index with the following storage options:
ctx_ddl.create_preference('TEST_STO','BASIC_STORAGE');
ctx_ddl.set_attribute ('TEST_STO', 'BIG_IO', 'YES' );
ctx_ddl.set_attribute ('TEST_STO', 'SEPARATE_OFFSETS', 'NO' );
so that the token_info column is stored in a SecureFile LOB. Then I can change the storage of that column to put it in the keep buffer cache, and write a procedure that reads the LOB so that it is loaded into the keep cache. The size of the LOB column is more or less the same as when creating the index without the BIG_IO option, but it remains constant even after a ctx_ddl.optimize_index. The procedure that reads the LOB and loads it into the cache is very similar to the loaddollarR procedure from the white paper.
Because of the SDATA section, there is a new DR table (S table) and an IOT on top of it. This is not documented in the white paper (the white paper was written for Oracle 10g). In my case this DR$ S table is much used, and the IOT also, but putting it in the keep cache is not as important as the token_info column of the DR I table. A final note: doing SEPARATE_OFFSETS = 'YES' was very bad in my case, the combined size of the two columns is much bigger than having only the TOKEN_INFO column and both columns are read.
Here is an example of how to reproduce the size increase when running ctx_ddl.optimize_index:
1. create the table
drop table test;
CREATE TABLE test
(ID NUMBER(9,0) NOT NULL ENABLE,
XML_DATA XMLTYPE)
XMLTYPE COLUMN XML_DATA STORE AS SECUREFILE BINARY XML (tablespace users disable storage in row);
2. insert a few records
insert into test values(1,'<Book><TITLE>Tale of Two Cities</TITLE>It was the best of times.<Author NAME="Charles Dickens"> Born in England in the town, Stratford_Upon_Avon </Author></Book>');
insert into test values(2,'<BOOK><TITLE>The House of Mirth</TITLE>Written in 1905<Author NAME="Edith Wharton"> Wharton was born to George Frederic Jones and Lucretia Stevens Rhinelander in New York City.</Author></BOOK>');
insert into test values(3,'<BOOK><TITLE>Age of innocence</TITLE>She got a prize for it.<Author NAME="Edith Wharton"> Wharton was born to George Frederic Jones and Lucretia Stevens Rhinelander in New York City.</Author></BOOK>');
3. create the text index
drop index i_test;
exec ctx_ddl.create_section_group('TEST_SGP','PATH_SECTION_GROUP');
begin
CTX_DDL.ADD_SDATA_SECTION(group_name => 'TEST_SGP',
                            section_name => 'SData_02',
                            tag => 'SData_02',
                            datatype => 'varchar2');
end;
/
exec ctx_ddl.create_preference('TEST_STO','BASIC_STORAGE');
exec ctx_ddl.set_attribute('TEST_STO','I_TABLE_CLAUSE','tablespace USERS storage (initial 64K)');
exec ctx_ddl.set_attribute('TEST_STO','I_INDEX_CLAUSE','tablespace USERS storage (initial 64K) compress 2');
exec ctx_ddl.set_attribute ('TEST_STO', 'BIG_IO', 'NO' );
exec ctx_ddl.set_attribute ('TEST_STO', 'SEPARATE_OFFSETS', 'NO' );
create index I_TEST
on TEST (XML_DATA)
indextype is ctxsys.context
parameters('
    section group   "TEST_SGP"
    storage         "TEST_STO"
') parallel 2;
4. check the index size
select ctx_report.index_size('I_TEST') from dual;
it says :
TOTALS FOR INDEX TEST.I_TEST
TOTAL BLOCKS ALLOCATED:                                                104
TOTAL BLOCKS USED:                                                      72
TOTAL BYTES ALLOCATED:                                 851,968 (832.00 KB)
TOTAL BYTES USED:                                      589,824 (576.00 KB)
5. optimize the index
exec ctx_ddl.optimize_index('I_TEST','REBUILD');
and now recompute the size, it says
TOTALS FOR INDEX TEST.I_TEST
TOTAL BLOCKS ALLOCATED:                                               1112
TOTAL BLOCKS USED:                                                    1080
TOTAL BYTES ALLOCATED:                                 9,109,504 (8.69 MB)
TOTAL BYTES USED:                                      8,847,360 (8.44 MB)
which shows that it went from 576KB to 8.44MB. With a big index the difference is not so big, but still from 14G to 19G.
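To track this growth across many optimize runs, the totals can be pulled out of the `ctx_report.index_size` text automatically. The sketch below is a hypothetical Python parser; it assumes the report ends with the "TOTALS FOR INDEX" section in the layout shown above, and the sample strings are abridged copies of the two reports in this post.

```python
import re

# Matches the "TOTAL BYTES USED:" line and captures the comma-grouped number.
_BYTES_USED = re.compile(r"^TOTAL BYTES USED:\s+([\d,]+)", re.MULTILINE)


def bytes_used(report):
    """Return TOTAL BYTES USED from the 'TOTALS FOR INDEX' section of a report."""
    start = report.rfind("TOTALS FOR INDEX")
    if start < 0:
        raise ValueError("no 'TOTALS FOR INDEX' section found")
    match = _BYTES_USED.search(report, start)
    if match is None:
        raise ValueError("no 'TOTAL BYTES USED' line found")
    return int(match.group(1).replace(",", ""))


def growth_factor(before, after):
    """How many times larger the index is after optimize_index."""
    return bytes_used(after) / bytes_used(before)


BEFORE = """TOTALS FOR INDEX TEST.I_TEST
TOTAL BYTES ALLOCATED:                                 851,968 (832.00 KB)
TOTAL BYTES USED:                                      589,824 (576.00 KB)"""

AFTER = """TOTALS FOR INDEX TEST.I_TEST
TOTAL BYTES ALLOCATED:                                 9,109,504 (8.69 MB)
TOTAL BYTES USED:                                      8,847,360 (8.44 MB)"""

if __name__ == "__main__":
    print(f"growth factor: {growth_factor(BEFORE, AFTER):.1f}x")
```

For the two reports above this gives a factor of 15 (8,847,360 / 589,824 bytes).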
6. Workaround: use the BIG_IO option, so that the token_info column of the DR$ I table is stored in a SecureFile LOB and the size stays relatively small. Then you can load this column into the cache using something similar to:
alter table DR$I_TEST$I storage (buffer_pool keep);
alter table dr$i_test$i modify lob(token_info) (cache storage (buffer_pool keep));
rem: now we must read the LOB so that it is loaded into the keep buffer pool; use the procedure below
create or replace procedure loadTokenInfo is
  type c_type is ref cursor;
  c2   c_type;
  s    varchar2(2000);
  b    blob;
  buff raw(100);      -- dbms_lob.read on a BLOB returns RAW data
  siz  integer;
  off  integer;
  cntr number;
begin
    s := 'select token_info from DR$i_test$I';
    open c2 for s;
    loop
       fetch c2 into b;
       exit when c2%notfound;
       siz := 10;
       off := 1;
       cntr := 0;
       if dbms_lob.getlength(b) > 0 then
         begin
           -- read a few bytes from each 4K chunk so that its block gets cached
           loop
             dbms_lob.read(b, siz, off, buff);
             cntr := cntr + 1;
             off := off + 4096;
           end loop;
         exception when no_data_found then
           -- raised by dbms_lob.read once off is past the end of the LOB
           if cntr > 0 then
             dbms_output.put_line('4K chunks fetched: '||cntr);
           end if;
         end;
       end if;
    end loop;
    close c2;
end;
/
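The procedure above relies on two things: it touches one small read every 4096 bytes, and `dbms_lob.read` raises NO_DATA_FOUND once the offset passes the end of the LOB. The Python model below (an illustration, not Oracle code) computes which offsets get read and therefore what "4K chunks fetched" would print for a LOB of a given length; the 4096 step is the procedure's own assumption about chunk size.

```python
# Models the read pattern of loadTokenInfo: read a few bytes at offset 1,
# then every 4096 bytes, until DBMS_LOB.READ raises NO_DATA_FOUND because
# the offset is past the end of the LOB. The 4096 step is the procedure's
# assumption about chunk size, not a value read from the database.


def read_offsets(lob_length, step=4096):
    """1-based offsets that the loop reads successfully for one LOB."""
    if lob_length <= 0:
        return []  # the procedure skips empty LOBs
    return list(range(1, lob_length + 1, step))


def chunks_fetched(lob_length, step=4096):
    """Value printed as '4K chunks fetched' for a LOB of this length."""
    return len(read_offsets(lob_length, step))


if __name__ == "__main__":
    for length in (0, 100, 4096, 4097, 10000):
        print(length, chunks_fetched(length))
```

Reading only a few bytes per chunk keeps the procedure cheap while still forcing every block of the LOB through the buffer cache.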
Rgds, Pierre

I have been working a lot on that issue recently, I can give some more info.
First I totally agree with you, I don't like to use the keep_pool and I would love to avoid it. On the other hand, we have a specific use case : 90% of the activity in the DB is done by queuing and dbms_scheduler jobs where response time does not matter. All those processes are probably filling the buffer cache. We have a customer facing application that uses the text index to search the database : performance is critical for them.
What kind of performance do you have with your application ?
In my case, I have learned the hard way that having the index in memory (in fact the DR$I table) is the key: if it is not, performance is poor. I find it reasonable to pin the DR$I table in memory, and if you look at competitors, this is what they do. MongoDB explicitly says that the index must be in memory. Elasticsearch uses JVMs that are also in memory. And indeed, if you look at the AWR report, you will see that Oracle is continuously accessing the DR$I table with SQL similar to
SELECT /*+ DYNAMIC_SAMPLING(0) INDEX(i) */
TOKEN_FIRST, TOKEN_LAST, TOKEN_COUNT, ROWID
FROM DR$idxname$I
WHERE TOKEN_TEXT = :word AND TOKEN_TYPE = :wtype
ORDER BY TOKEN_TEXT, TOKEN_TYPE, TOKEN_FIRST
which is continuously done.
I think that the algorithm used by Oracle to keep blocks in cache is too complex. I just realized that in 12.1.0.2 (released last week) there is finally a "killer" feature, the in-memory parameters, with which you can pin tables or columns in memory with compression, etc. This looks ideal for the text index; I hope R. Ford will finally update his white paper :-)
But my other problem was that optimize_index in REBUILD mode caused the DR$I table to double in size: it seems crazy, but this was closed as not a bug and I can't do anything about it. In my opinion it is a bug, because the create index command and the "alter index rebuild" command both result in a much smaller index, so why would the developers of the optimize function (is it another team, using another algorithm?) make the index twice as big?
And for that the track I have been following is to put the index in a 16K tablespace : in this case the space used by the index remains more or less flat (increases but much more reasonably). The difficulty here is to pin the index in memory because the trick of R. Ford was not working anymore.
What worked:
First, set the keep pool to zero and set db_16k_cache_size instead. Then change the storage preference so that everything you want to cache (mostly the DR$I table) goes into the tablespace with the non-standard 16K block size.
Then comes the tricky part: pre-loading the data into the buffer cache. The problem is that with Oracle 12c, Oracle uses direct path reads for full table scans, which basically means that it bypasses the buffer cache and reads directly from the file into the PGA! There is an event to avoid that; I was lucky to find it on a blog (I can't remember which, sorry for the missing credit).
I ended up doing the following. Event 10949 is set to avoid the direct path reads issue.
alter session set events '10949 trace name context forever, level 1';
alter table DR#idxname0001$I cache;
alter table DR#idxname0002$I cache;
alter table DR#idxname0003$I cache;
SELECT /*+ FULL(ITAB) CACHE(ITAB) */ SUM(TOKEN_COUNT), SUM(LENGTH(TOKEN_INFO)) FROM DR#idxname0001$I ITAB;
SELECT /*+ FULL(ITAB) CACHE(ITAB) */ SUM(TOKEN_COUNT), SUM(LENGTH(TOKEN_INFO)) FROM DR#idxname0002$I ITAB;
SELECT /*+ FULL(ITAB) CACHE(ITAB) */ SUM(TOKEN_COUNT), SUM(LENGTH(TOKEN_INFO)) FROM DR#idxname0003$I ITAB;
SELECT /*+ INDEX(ITAB) CACHE(ITAB) */ SUM(LENGTH(TOKEN_TEXT)) FROM DR#idxname0001$I ITAB;
SELECT /*+ INDEX(ITAB) CACHE(ITAB) */ SUM(LENGTH(TOKEN_TEXT)) FROM DR#idxname0002$I ITAB;
SELECT /*+ INDEX(ITAB) CACHE(ITAB) */ SUM(LENGTH(TOKEN_TEXT)) FROM DR#idxname0003$I ITAB;
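The warm-up statements above repeat for every index partition. A hypothetical Python generator could emit them for any number of partitions; it assumes the DR#<index><nnnn>$I naming seen in this post (four-digit partition number), so check USER_TABLES for the real table names before running the output.

```python
# Hypothetical generator for the warm-up script above, one set of statements
# per index partition. Assumes the DR#<index><nnnn>$I table naming seen in
# the post; verify actual names in USER_TABLES.


def warmup_statements(index_name, partitions):
    tables = [f"DR#{index_name}{n:04d}$I" for n in range(1, partitions + 1)]
    # Event 10949 avoids direct path reads so the scans populate the cache.
    stmts = ["alter session set events '10949 trace name context forever, level 1';"]
    stmts += [f"alter table {t} cache;" for t in tables]
    # Full scans load the table blocks, including TOKEN_INFO.
    stmts += [
        f"SELECT /*+ FULL(ITAB) CACHE(ITAB) */ SUM(TOKEN_COUNT), "
        f"SUM(LENGTH(TOKEN_INFO)) FROM {t} ITAB;"
        for t in tables
    ]
    # Index access loads the index blocks on TOKEN_TEXT.
    stmts += [
        f"SELECT /*+ INDEX(ITAB) CACHE(ITAB) */ SUM(LENGTH(TOKEN_TEXT)) FROM {t} ITAB;"
        for t in tables
    ]
    return stmts


if __name__ == "__main__":
    print("\n".join(warmup_statements("idxname", 3)))
```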
It worked. With a big relief I expected to take some time out, but there was a last surprise. The command
exec ctx_ddl.optimize_index(idx_name=>'idxname',part_name=>'partname',optlevel=>'REBUILD');
gave the following:
ERROR at line 1:
ORA-20000: Oracle Text error:
DRG-50857: oracle error in drftoptrebxch
ORA-14097: column type or size mismatch in ALTER TABLE EXCHANGE PARTITION
ORA-06512: at "CTXSYS.DRUE", line 160
ORA-06512: at "CTXSYS.CTX_DDL", line 1141
ORA-06512: at line 1
This is described almost exactly in Metalink note 1645634.1, but for a non-partitioned index. The workaround given there seemed very logical, but it did not work for a partitioned index. After experimenting, I found that the bug occurs when the partitioned index is created with the dbms_pclxutil.build_part_index procedure (which enables intra-partition parallelism in the index creation process). This is a very annoying and stupid bug; maybe there is a workaround, but I did not find it on Metalink.
Other points of attention with the text index creation (stuff that surprised me at first !) ;
- if you use the dbms_pclxutil package, then the ctx_output logging does not work, because the index is created immediately and then populated in the background via dbms_jobs.
- Combined with RAC, this can be very frightening: you may see no activity at all on your node, because Oracle can choose to start the workers on the other node.
I understand much better how text indexing works now, and I think it is a great technology which can scale via partitioning. But as always, the design of the application is crucial: most of our problems come from the fact that we did not choose the right sectioning (we chose PATH_SECTION_GROUP, while XML_SECTION_GROUP is much better IMO). Maybe later I can convince the developers to change the sectioning, especially because SDATA and MDATA sections are not supported with PATH_SECTION_GROUP (although it seems to work, we had one occurrence of a bad result linked to the existence of SDATA in the index definition). Also, the whole problem of mixed structured/unstructured searches is completely solved if one uses XML_SECTION_GROUP with MDATA/SDATA (but of course the app was written for Oracle 10...)
Regards, Pierre

Suggestion: Oracle text CONTEXT index on one or more columns ?

Hi,
I'm implementing Oracle text using CONTEXT ..... and would like to ask you for performance suggestion ...
I have a table of Articles .... with columns .. TITLE, SUBTITLE , BODY ...
Now is it better from performance point of view to move all three columns into one dummy column ... with name like FULLTEXT ... and put index on this single column,
and then use CONTAINS(FULLTEXT,'...')>0
Or is it almost the same for oracle if i put indexes on all three columns and then call:
CONTAINS(TITLE,'...')>0 OR CONTAINS(SUBTITLE,'...')>0 OR CONTAINS(BODY,'...')>0
I actually don't care if the result is a match in TITLE OR SUBTITLE OR BODY ....
So if I move everything into a FULLTEXT column, then I have duplicate data in each article row ... but if I create indexes for each column, then Oracle has 2x more to index, optimize and search ... am I right?
Table has 1.8mil records ...
Thank you.
Kris

mackrispi wrote:
Now is it better from performance point of view to move all three columns into one dummy column ... with name like FULLTEXT ... and put index on this single column,
and then use CONTAINS(FULLTEXT,'...')>0
What version of Oracle are you on? If 11, then you could use a virtual column to do this; otherwise you'd have to write code to maintain the column, which can get messy.
mackrispi wrote:
Or is it almost the same for oracle if i put indexes on all three columns and then call:
CONTAINS(TITLE,'...')>0 OR CONTAINS(SUBTITLE,'...')>0 OR CONTAINS(BODY,'...')>0
Benchmark it and find out :)
Another option would be something like this.
http://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUESTION_ID:9455353124561
If I were you, I would try out those 3 approaches, see which meet your performance requirements, and weigh that against the ease of implementation and administration.
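For a fair benchmark, both query shapes should be run with the same search term. A hypothetical Python helper can render the two WHERE clauses from the question as SQL text with a bind variable; the bind name `:q` and the table name `articles` in the usage line are assumptions.

```python
# Hypothetical helper: renders the two WHERE-clause shapes from the question
# so both can be benchmarked with the same search term. The bind name :q and
# the table name used in the example are assumptions.


def or_contains(columns, bind=":q"):
    """One CONTAINS per separately indexed column, OR'ed together."""
    if not columns:
        raise ValueError("need at least one column")
    return " OR ".join(f"CONTAINS({col}, {bind}) > 0" for col in columns)


def single_contains(column="FULLTEXT", bind=":q"):
    """A single CONTAINS on one combined, indexed column."""
    return f"CONTAINS({column}, {bind}) > 0"


if __name__ == "__main__":
    for where in (or_contains(["TITLE", "SUBTITLE", "BODY"]), single_contains()):
        print(f"SELECT id FROM articles WHERE {where}")
```

The OR form needs three CONTEXT indexes and three index probes per query, while the single-column form needs one index over duplicated data; timing both on the 1.8 million rows is the only reliable way to choose.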

ERROR at line 1: ORA-29855: error occurred in the execution of ODCIINDEXCREATE routine ORA-20000: Oracle Text error: DRG-10700: preference does not exist: global_lexer ORA-06512: at "CTXSYS.DRUE", line 160 ORA-06512: at "CTXSYS.TEXTINDEXMETHODS", line 366

database version 11.2.0.4
rac two node
CREATE INDEX MAXIMO.ACTCI_NDX3 ON MAXIMO.ACTCI
(DESCRIPTION)
INDEXTYPE IS CTXSYS.CONTEXT
PARAMETERS('lexer global_lexer language column LANGCODE')
ERROR at line 1:
ORA-29855: error occurred in the execution of ODCIINDEXCREATE routine
ORA-20000: Oracle Text error:
DRG-10700: preference does not exist: global_lexer
ORA-06512: at "CTXSYS.DRUE", line 160
ORA-06512: at "CTXSYS.TEXTINDEXMETHODS", line 366

Like the error message says, you don't have a global_lexer. So, you need to create a global_lexer and that lexer must have at least a default sub_lexer, then you can use that global_lexer in your index parameters. Please see the demonstration below, including reproduction of the error and solution.
SCOTT@orcl12c> -- reproduction of problem:
SCOTT@orcl12c> CREATE TABLE actci
2    (description VARCHAR2(60),
3      langcode     VARCHAR2(30))
4 /
Table created.
SCOTT@orcl12c> CREATE INDEX ACTCI_NDX3 ON ACTCI (DESCRIPTION)
2 INDEXTYPE IS CTXSYS.CONTEXT
3 PARAMETERS('lexer global_lexer language column LANGCODE')
4 /
CREATE INDEX ACTCI_NDX3 ON ACTCI (DESCRIPTION)
ERROR at line 1:
ORA-29855: error occurred in the execution of ODCIINDEXCREATE routine
ORA-20000: Oracle Text error:
DRG-10700: preference does not exist: global_lexer
ORA-06512: at "CTXSYS.DRUE", line 160
ORA-06512: at "CTXSYS.TEXTINDEXMETHODS", line 366
SCOTT@orcl12c> -- solution:
SCOTT@orcl12c> DROP INDEX actci_ndx3
2 /
Index dropped.
SCOTT@orcl12c> BEGIN
2    CTX_DDL.CREATE_PREFERENCE ('global_lexer', 'multi_lexer');
3    CTX_DDL.CREATE_PREFERENCE ('english_lexer', 'basic_lexer');
4    CTX_DDL.ADD_SUB_LEXER ('global_lexer', 'default', 'english_lexer');
5 END;
6 /
PL/SQL procedure successfully completed.
SCOTT@orcl12c> CREATE INDEX ACTCI_NDX3 ON ACTCI (DESCRIPTION)
2 INDEXTYPE IS CTXSYS.CONTEXT
3 PARAMETERS('lexer global_lexer language column LANGCODE')
4 /
Index created.
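Before running CREATE INDEX, you can check that the preferences actually exist for the user creating the index. The query below is a minimal sketch that assumes it is run as the index owner and that the preferences were created in that same schema. CTX_USER_PREFERENCES lists the current user's Oracle Text preferences.

```sql
-- Sketch: run as the user that will own the index
-- (assumption: the preferences were created in that same schema).
-- CTX_USER_PREFERENCES lists the current user's Oracle Text preferences;
-- UPPER() guards against case differences in the stored names.
SELECT pre_name, pre_class, pre_object
FROM   ctx_user_preferences
WHERE  UPPER(pre_name) IN ('GLOBAL_LEXER', 'ENGLISH_LEXER');
```

If GLOBAL_LEXER does not appear in the result, CREATE INDEX fails with DRG-10700 as shown above. With the preferences from the demonstration in place, PRE_OBJECT should show MULTI_LEXER and BASIC_LEXER.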

Upgrading Oracle Text - Post upgrade step 10.2 to 11.2

I have already upgraded my 10.2.0.4 database to 11.2.0.1 and now have to do the post-upgrade steps. Step 39 of the manual upgrade guide (Note 837570.1) is not clear to me. I would appreciate it if someone could explain it further. When I check my source ORACLE_HOME/ctx/admin/ctxf102.txt or ctxf102.sql
Step 39
Upgrading Oracle Text
Copy the following files from the previous Oracle home to the new Oracle home:
* Stemming user-dictionary files
* User-modified KOREAN_MORPH_LEXER dictionary files
* USER_FILTER executables
To obtain a list of the above files, use:
$ORACLE_HOME/ctx/admin/ctxf<version>.txt
$ORACLE_HOME/ctx/admin/ctxf<version>.sql
where <version> is 920, 101, or 102.
For instance, if upgrading from 10.2.0:
1. For dictionary files, check
$ORACLE_HOME/ctx/admin/ctxf102.txt
2. Execute the script as database user SYS, SYSTEM, or CTXSYS:
$ORACLE_HOME/ctx/admin/ctxf102.sql
If your Oracle Text index uses KOREAN_LEXER, which was deprecated in Oracle 9i and desupported in Oracle 10g Release 2, see the Note below for further information on manual migration from KOREAN_LEXER to KOREAN_MORPH_LEXER.
Note 300172.1 Obsolescence of KOREAN_LEXER Lexer Type

Hi Srini,
Thank you very much. Now I get it.
Oracle asked me to identify the CTXCAT indexes with KOREAN_LEXER by executing the following queries as user CTXSYS; if nothing is returned, I can skip this step.
SELECT idx_name
FROM ctxsys.ctx_indexes
WHERE idx_type = 'CTXCAT'
AND idx_name IN
(SELECT ixo_index_name
FROM ctxsys.ctx_index_objects
WHERE ixo_class = 'LEXER'
AND ixo_object = 'KOREAN_MORPH_LEXER');
SELECT isl_index_owner,isl_index_name,isl_language
FROM CTXSYS.ctx_index_sub_lexers
WHERE isl_object = 'KOREAN_MORPH_LEXER';

Using Oracle Text in Apex

Hi,
from what I've read about it, the following has to be done.
e.g. CREATE index ticket_keywords_index ON ticket(keywords) indextype IS ctxsys.context;
CREATE index ticket_solution_index ON ticket(solution) indextype IS ctxsys.context;
SELECT * from ticket where ctxsys.contain(:P12Value_to_find);
But I wonder: how does it know which index it has to look at?
Is there any way to specify where it should look?
If yes, any idea how one goes about that?
If no, any idea how to avoid getting results back from both columns when one only needs one?
Could it be done by adding a checkbox at the top of the page in APEX that says whether to include this column in the search or not, or is that not a good way to do it?
Or am I missing a point?
Thanks for the help,
Floris

Floris,
Your query should be of the form:
SELECT   *
FROM   ticket
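WHERE   contains(indexed_col, :P12_VALUE_TO_FIND) > 0
where indexed_col is the name of the column on which you have built your Oracle Text index, and :P12_VALUE_TO_FIND is the page item that contains the search string. Because CONTAINS names the column, Oracle Text uses the index built on that column, so a column is only searched when you reference it in a CONTAINS clause.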
Andy
http://atulley.wordpress.com/
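Because each CONTAINS call names its own column, the checkbox idea from the question can be handled in the WHERE clause. A minimal sketch, assuming two hypothetical checkbox page items, :P12_SEARCH_KEYWORDS and :P12_SEARCH_SOLUTION, that return 'Y' when ticked (these item names are illustrative and not from the thread):

```sql
-- Hypothetical page items (illustrative names):
--   :P12_VALUE_TO_FIND    search string
--   :P12_SEARCH_KEYWORDS  'Y' when the keywords checkbox is ticked
--   :P12_SEARCH_SOLUTION  'Y' when the solution checkbox is ticked
-- Each CONTAINS uses the CONTEXT index on the column it names.
SELECT t.*
FROM   ticket t
WHERE  (:P12_SEARCH_KEYWORDS = 'Y'
        AND CONTAINS(t.keywords, :P12_VALUE_TO_FIND, 1) > 0)
   OR  (:P12_SEARCH_SOLUTION = 'Y'
        AND CONTAINS(t.solution, :P12_VALUE_TO_FIND, 2) > 0);
```

The distinct labels 1 and 2 are only needed if you also want to select SCORE(1) or SCORE(2) for ranking.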

Using Oracle Text in Oracle XML DB

Hi all,
The idea is simple: I need to index all files stored in Oracle XML DB, and the index should stay in the Oracle database. Using third-party indexing software is also possible, but then you need to write a mapping to move the index files into the Oracle database.
So I thought of using Oracle Text, but I am not sure how to retrieve such a document from Oracle XML DB, say over FTP or HTTP. And if these documents are password-protected, how can Oracle Text handle this?

The 11gR2 XML DB Developer's Guide, "Full-Text Search over XML Data" (http://download.oracle.com/docs/cd/E11882_01/appdev.112/e10492/xdb09sea.htm#i1006756), would be the first place to start.
For document display, there are a bunch of potential solutions: you can look at "XML DB Repository" (http://download.oracle.com/docs/cd/E11882_01/appdev.112/e10492/xdb03usg.htm#insertedID18), or at "Presenting Documents in Oracle Text" in the Text Application Developer's Guide (http://download.oracle.com/docs/cd/B28359_01/text.111/b28303/view.htm#i1006687).
Password-protected documents can't be indexed using AUTO_FILTER.
