Performance of context index with sorting

Dear All,
I have a problem and don't know how to solve it.
There is a table with an XMLTYPE column that stores unstructured XML, and a CONTEXT index has been created on it.
When I select records using contains(res, '[searchingfield]') > 0, the response is quick, but when I also order by another column of the same table, the response time gets worse. For example: select id, path, res, update_date from testingtbl where contains(res, 'shopper') > 0 order by update_date desc
There is a CONTEXT index on the column 'res' and a separate index on the column 'update_date'. Without 'order by update_date' the CONTEXT index is used, but the update_date index is not used even when the ordering clause is present.
Can any expert tell me how to solve this? How can I keep the performance while still sorting?
Thanks and Regards
Raymond

Thanks for your quick reply.
I will provide the information you mentioned once I am back in the office. What I really want to know is whether there is any method that uses the CONTEXT index (with the contains keyword) together with sorting without slowing down the query.
Thanks and Regards
Raymond

Similar Messages

DML operations performance on table indexed with CTXCAT

Hi,
I have a table with 2M records. The table is batch updated once a day, and the number of record movements (update/delete/insert) should be around 100K.
The table is indexed with CTXCAT.
If I create the index from scratch, it takes 5 minutes.
If I perform delete/insert/update operations involving 40K records, it takes much longer (especially for delete and update operations, around 30 minutes).
In this particular case I can drop the index and recreate it from scratch every night. The problem is that the 2M-record table is only the first step in adopting Oracle Text. The next step will be a 40M-record table, on which the initial index creation takes about 2 hours (so I can't rebuild it every night).
Do you have any suggestions?
Thanks.
-- table DDL
CREATE TABLE TAHZVCON_TEXT (
CONSUMER_ID NUMBER(10) NOT NULL,
COMPANY_NAME VARCHAR2(4000 CHAR),
CITY VARCHAR2(30 BYTE),
PROVINCE VARCHAR2(3 CHAR),
POST_CODE VARCHAR2(10 BYTE)
);
CREATE UNIQUE INDEX TAHZVCON_TEXT_PK ON TAHZVCON_TEXT (CONSUMER_ID);
begin
ctx_ddl.drop_preference('mylex');
ctx_ddl.create_preference('mylex', 'BASIC_LEXER');
ctx_ddl.set_attribute('mylex', 'printjoins', '.#');
ctx_ddl.set_attribute('mylex', 'base_letter', 'YES');
ctx_ddl.set_attribute('mylex', 'index_themes','NO');
ctx_ddl.set_attribute('mylex', 'index_text','YES');
ctx_ddl.set_attribute('mylex', 'prove_themes','NO');
ctx_ddl.drop_preference('mywordlist');
ctx_ddl.create_preference('mywordlist', 'BASIC_WORDLIST');
ctx_ddl.set_attribute('mywordlist','stemmer','NULL');
ctx_ddl.set_attribute('mywordlist','SUBSTRING_INDEX', 'NO');
ctx_ddl.set_attribute('mywordlist','PREFIX_INDEX','NO');
ctx_ddl.drop_index_set('tahzvcon_iset');
ctx_ddl.create_index_set('tahzvcon_iset');
ctx_ddl.add_index('tahzvcon_iset','city');
ctx_ddl.add_index('tahzvcon_iset','province');
ctx_ddl.add_index('tahzvcon_iset','post_code');
end;
CREATE INDEX TAHZVCON_TEXT_TI01 ON TAHZVCON_TEXT(COMPANY_NAME)
INDEXTYPE IS CTXSYS.CTXCAT
PARAMETERS ('lexer mylex wordlist mywordlist index set tahzvcon_iset')
PARALLEL 8;
Andrea

Hi kevinUCB,
I've decided to use CTXCAT indexes because I had to perform queries involving different columns (company name, city, region, etc.). So I thought CTXCAT was the right index for me.
Now I've discovered that if I index an XML with CONTEXT, I can perform a search on single XML fields, so CONTEXT is suitable for my needs.
Preliminary tests on the 2M-record table look very good.
Bye,
Andrea

Slow performance for context index

Hi, I'm new to this forum and would like to ask for your expertise on Oracle CONTEXT indexes. My SQL uses the wildcard characters '%%' for searching.
I used the SQL below with a CONTEXT index (ctxsys.context) in order to avoid a full table scan for the wildcard search.
SELECT BODY_ID
                    TITLE, trim(upper(title)) as title_sort,
                    SUM(JAN) as JAN,
                    SUM(FEB) as FEB,
                    SUM(MAR) as MAR,
                    SUM(APR) as APR,
                    SUM(MAY) as MAY,
                    SUM(JUN) as JUN,
                    SUM(JUL) as JUL,
                    SUM(AUG) as AUG,
                    SUM(SEP) as SEP,
                    SUM(OCT) as OCT,
                    SUM(NOV) as NOV,
                    SUM(DEC) AS DEC
                    FROM APP_REPCBO.CBO_TURNAWAY_REPORT
                    WHERE contains (BODY_ID,'%240103%') >0 and
PERIOD BETWEEN '1201' AND '1212'
                    GROUP BY BODY_ID, trim(upper(title))
But I was surprised that performance was very slow, and when I run an explain plan the estimated time runs to hours (02:41:12 in the plan below).
plan FOR succeeded.
PLAN_TABLE_OUTPUT
Plan hash value: 814472363
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time |
| 0 | SELECT STATEMENT | | 1052K| 97M| | 805K (1)| 02:41:12 |
| 1 | HASH GROUP BY | | 1052K| 97M| 137M| 805K (1)| 02:41:12 |
|* 2 | TABLE ACCESS BY INDEX ROWID| CBO_TURNAWAY_REPORT | 1052K| 97M| | 782K (1)| 02:36:32 |
|* 3 | DOMAIN INDEX | CBO_REPORT_BID_IDX | | | | 663K (0)| 02:12:41 |
Predicate Information (identified by operation id):
2 - filter("PERIOD">='1201' AND "PERIOD"<='1212')
3 - access("CTXSYS"."CONTAINS"("BODY_ID",'%240103%')>0)
16 rows selected
oracle version: Oracle Database 11g Release 11.1.0.7.0 - 64bit Production
Thanks,
Zack

Hi Rod,
Thanks for the reply. Yes, I have already gathered statistics on that table and rebuilt the index.
But strangely, when I use another body_id the performance varies.
SQL> EXPLAIN PLAN FOR
2 SELECT BODY_ID
3 TITLE, trim(upper(title)) as title_sort,
4 SUM(JAN) as JAN,
5 SUM(FEB) as FEB,
6 SUM(MAR) as MAR,
7 SUM(APR) as APR,
8 SUM(MAY) as MAY,
9 SUM(JUN) as JUN,
10 SUM(JUL) as JUL,
11 SUM(AUG) as AUG,
12 SUM(SEP) as SEP,
13 SUM(OCT) as OCT,
14 SUM(NOV) as NOV,
15 SUM(DEC) as DEC
16 FROM WEB_REPCBO.CBO_TURNAWAY_REPORT
17 WHERE contains (BODY_ID,'%119915311%')> 0 and
18 PERIOD BETWEEN '1201' AND '1212'
19 GROUP BY BODY_ID, trim(upper(title));
Explained.
SQL> SELECT * FROM TABLE (dbms_xplan.display);
PLAN_TABLE_OUTPUT
Plan hash value: 814472363
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
| 0 | SELECT STATEMENT | | 990 | 96030 | 1477 (1)| 00:00:18 |
| 1 | HASH GROUP BY | | 990 | 96030 | 1477 (1)| 00:00:18 |
|* 2 | TABLE ACCESS BY INDEX ROWID| CBO_TURNAWAY_REPORT | 990 | 96030 | 1475 (0)| 00:00:18 |
|* 3 | DOMAIN INDEX | CBO_REPORT_BID_IDX | | | 647 (0)| 00:00:08 |
Predicate Information (identified by operation id):
2 - filter("PERIOD">='1201' AND "PERIOD"<='1212')
3 - access("CTXSYS"."CONTAINS"("BODY_ID",'%119915311%')>0)
16 rows selected.
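One plausible reason the two estimates differ so much is that a pattern with a leading wildcard gives the index nothing to seek on: every distinct token has to be tested, and the cost then depends on how many tokens and rows the pattern expands to. The sketch below models that with a toy inverted index in Python. It is not Oracle's implementation; the token data and the helpers `expand_wildcard` and `matching_rows` are made up for illustration, and only the '%' wildcard is modelled.

```python
import re

# Toy inverted index: token -> ids of the rows that contain it (made-up data).
index = {
    "X240103A": [1, 2, 3, 4, 5, 6],
    "240103": [7, 8, 9],
    "B2401037": [10, 11],
    "119915311": [12],
    "55501": [13, 14],
}

def expand_wildcard(pattern, index):
    """Return the tokens matching a LIKE-style pattern ('%' = any run of characters)."""
    regex = re.compile("^" + ".*".join(re.escape(p) for p in pattern.split("%")) + "$")
    # With a leading '%' there is no prefix to seek on, so every token is tested.
    return [tok for tok in index if regex.match(tok)]

def matching_rows(pattern, index):
    """Union of the row ids of all tokens the pattern expands to."""
    rows = set()
    for tok in expand_wildcard(pattern, index):
        rows.update(index[tok])
    return sorted(rows)

common = matching_rows("%240103%", index)
rare = matching_rows("%119915311%", index)
print(len(common), len(rare))
```

In this toy data the first pattern expands to three tokens and eleven rows, the second to a single token and a single row, which is the kind of difference that would show up as very different row and cost estimates.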

Help with context index with /, -, @

Hi all!
I have just started working with Oracle and have a problem with a CONTEXT index. Please help me. My problem is:
I have two columns, 'name' and 'address', and each has its own CONTEXT index (say 'Index1' on 'name' and 'Index2' on 'address'). I set the parameter ('STOPLIST ctxsys.empty_stoplist') and inserted four rows: ('A','80/3 cong hoa'), ('B','80-3 cong hoa'), ('C','80@3 cong hoa'), ('D','80 3 cong hoa'). But when I execute this select:
select * from tablename where contains(address, '3 cong hoa') > 0
it returns all 4 rows, but I want only one row: ('D', '80 3 cong hoa').
I know Oracle converts the characters '/', '-' and '@' to spaces, which is why 4 rows are returned, but I don't know how to make Oracle keep '/', '-' and '@' when it indexes. I want to apply this only to 'Index2' on the column 'address', not to 'Index1' on the column 'name'.
Please help me, and thanks for your attention.

So you want "/", "-" and "@" to link tokens, but you want "." to break numeric tokens?
OK, we can do that - though it seems a slightly odd requirement.
There are two special characters NUMJOIN and NUMGROUP which are used for purely numeric tokens. The default will vary by locale, but for English-speaking locales the defaults are "." and "," - so a number such as 1,234,567.89 will be treated as a single token. In French (and other) speaking locales, they are reversed since numbers are normally written as 1.234.567,89.
If you want to disable the NUMJOIN and NUMGROUP characters, so that numbers are always split into component tokens, then you can set both of them to the space character (it won't allow NULL or '', which would be more logical in my opinion).
drop table foo;
create table foo (bar varchar2(200));
insert into foo values ('80/3 cong hoa');
insert into foo values ('80-3 cong hoa');
insert into foo values ('80@3 cong hoa');
insert into foo values ('80 3 cong hoa');
insert into foo values ('80.3 cong hoa');
exec ctx_ddl.drop_preference('foo_lexer')
exec ctx_ddl.create_preference('foo_lexer', 'basic_lexer')
exec ctx_ddl.set_attribute('foo_lexer', 'PRINTJOINS', '/-@')
exec ctx_ddl.set_attribute('foo_lexer', 'NUMJOIN', ' ')
exec ctx_ddl.set_attribute('foo_lexer', 'NUMGROUP', ' ')
create index foo_index on foo(bar) indextype is ctxsys.context
parameters ('lexer foo_lexer');
select * from foo where contains (bar, '3 cong hoa') > 0;
Output is:
BAR
80 3 cong hoa
80.3 cong hoa
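To see why these two rows come back, here is a rough Python model of the tokenization described above. It is only a sketch under simplifying assumptions, not the BASIC_LEXER itself: the functions `tokenize` and `contains_phrase` are hypothetical, and NUMJOIN is treated as a character that simply stays inside a token.

```python
import re

def tokenize(text, printjoins="", numjoin="."):
    """Simplified lexer model: letters and digits form tokens, printjoins
    characters stay inside tokens, and the numjoin character (if not blank)
    is kept inside tokens as well."""
    keep = "A-Za-z0-9" + re.escape(printjoins)
    if numjoin.strip():
        keep += re.escape(numjoin)
    tokens = []
    for chunk in text.split():
        # Any character outside the "keep" set breaks the chunk into tokens.
        tokens.extend(t for t in re.split("[^" + keep + "]+", chunk) if t)
    return [t.upper() for t in tokens]

def contains_phrase(text, phrase, **lexer):
    """True if the phrase tokens appear consecutively in the text tokens."""
    doc = tokenize(text, **lexer)
    query = tokenize(phrase, **lexer)
    n = len(query)
    return any(doc[i:i + n] == query for i in range(len(doc) - n + 1))

rows = ["80/3 cong hoa", "80-3 cong hoa", "80@3 cong hoa",
        "80 3 cong hoa", "80.3 cong hoa"]

# Default-like settings: '/', '-' and '@' break tokens, '.' joins digits.
default_hits = [r for r in rows if contains_phrase(r, "3 cong hoa")]
# Settings from the reply: '/-@' are printjoins, NUMJOIN is a blank.
custom_hits = [r for r in rows
               if contains_phrase(r, "3 cong hoa", printjoins="/-@", numjoin=" ")]
print(default_hits)
print(custom_hits)
```

With the default-like settings the first four rows match (the situation in the question); with the printjoins and blank NUMJOIN only '80 3 cong hoa' and '80.3 cong hoa' match, as in the output above.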

Stop words handling with CONTEXT index - weird behavior

I have a context index with the following output from the report (describe index report).
CTX_REPORT.DESCRIBE_INDEX('KWTI10569_20121010115054')
===========================================================================
INDEX DESCRIPTION
===========================================================================
index name: "METCALF_T"."KWTI10569_20121010115054"
index id: 1524
index type: context
base table: "METCALF_T"."KWTD10569_20121010115054"
primary key column:
text column: MESSAGE_CONTENT
text column type: RAW(2000)
language column:
format column: FMT
charset column: CSET
===========================================================================
INDEX OBJECTS
===========================================================================
datastore: DIRECT_DATASTORE
filter: CHARSET_FILTER
charset: UTF8
section group: NULL_SECTION_GROUP
lexer: BASIC_LEXER
punctuations: .?!
skipjoins: _-"'`~!@#$%^&*()+=|}{[]\:;<>?/,
continuation: \-
index_stems: NONE
wordlist: BASIC_WORDLIST
stemmer: ENGLISH
fuzzy_match: GENERIC
stoplist: BASIC_STOPLIST
stop_word: how
stop_word: however
stop_word: i
stop_word: if
<trimmed for brevity of message......but all default stop words provided by Oracle has been added here>
storage: BASIC_STORAGE
i_table_clause: tablespace TEXT_INDEX storage (initial 10M next 10M)
k_table_clause: tablespace TEXT_INDEX storage (initial 10M next 10M)
r_table_clause: tablespace TEXT_INDEX storage (initial 1M) lob (data) store as (cache)
n_table_clause: tablespace TEXT_INDEX storage (initial 1M)
i_index_clause: tablespace TEXT_INDEX storage (initial 1M) compress 2
DB: 10g (10.2.0.4)
DB characterset: UTF8
Distinct tokens from index:
SQL> select distinct token_text from dr$KWTI10569_20121010115054$i;
TOKEN_TEXT
BLAH
EXPIRE
OFFER
My text content:
SQL>
SQL> select distinct utl_raw.cast_to_varchar2(message_content) from KWTD10569_20121010115054;
UTL_RAW.CAST_TO_VARCHAR2(MESSAGE_CONTENT)
blah blah offer will expire blah blah
offer expire
this offer shall expire
offer to expire
offer expire
blah blah offer expire blah blah
blah blah offer to expire blah blah
blah blah offer expire blah blah
offer will expire
blah blah this offer shall expire blah blah
10 rows selected.
Now, when I run some contains queries I get behavior that I can't understand.
When I search for "this offer will expire" I don't get every row (10 rows). Why is that?
SQL> select UTL_RAW.CAST_TO_VARCHAR2(MESSAGE_CONTENT) from KWTD10569_20121010115054 where contains(message_content,'this offer will expire')>0;
UTL_RAW.CAST_TO_VARCHAR2(MESSAGE_CONTENT)
blah blah offer will expire blah blah
this offer shall expire
blah blah offer to expire blah blah
blah blah this offer shall expire blah blah
Also, when I search for "offer expire" I get the following:
SQL> select UTL_RAW.CAST_TO_VARCHAR2(MESSAGE_CONTENT) from KWTD10569_20121010115054 where contains(message_content,'offer expire')>0;
UTL_RAW.CAST_TO_VARCHAR2(MESSAGE_CONTENT)
offer expire
blah blah offer expire blah blah
blah blah offer expire blah blah
offer expire
I thought that stop words would be ignored while searching with the CONTEXT grammar, so I would get all my rows back. Isn't that correct?
What I really want to achieve is that all these stop words are stripped from both the content AND the keywords when I run the query, so that I get 100% matches. Any pointers on how that can be accomplished?

Roger-
Thanks again. Is there any place in the Oracle documentation that describes these two facts?
Please see the example below. Does the number of words also matter? My search phrase was "the offer will expire", so why didn't I get rows like "offer to expire" back?
SQL> select distinct utl_raw.cast_to_varchar2(message_content) from KWTD10569_20121010115054;
UTL_RAW.CAST_TO_VARCHAR2(MESSAGE_CONTENT)
offer expire
blah blah offer expire blah blah
blah blah offer will expire blah blah
this offer shall expire
offer expire
offer to expire
blah blah offer to expire blah blah
blah blah offer expire blah blah
offer will expire
blah blah this offer shall expire blah blah
10 rows selected.
SQL> select UTL_RAW.CAST_TO_VARCHAR2(MESSAGE_CONTENT) from KWTD10569_20121010115054 where contains(message_content,'the offer will expire')>0;
UTL_RAW.CAST_TO_VARCHAR2(MESSAGE_CONTENT)
blah blah offer will expire blah blah
this offer shall expire
blah blah offer to expire blah blah
blah blah this offer shall expire blah blah
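The result set above is consistent with two rules, as far as I understand the documented Oracle Text behaviour: a stop word inside a query phrase acts as a placeholder for exactly one word (any word), and stop words in the document still occupy a position even though they are not indexed. The Python sketch below is only a model of those two rules, not Oracle code; `phrase_matches` and the small `STOPWORDS` set are hypothetical.

```python
# A small subset of stop words, enough for the example queries.
STOPWORDS = {"this", "the", "will", "to", "how", "however", "i", "if"}

def phrase_matches(document, query, stopwords=STOPWORDS):
    """Phrase match where a stop word in the query matches any single word."""
    doc = document.lower().split()
    qry = query.lower().split()
    n = len(qry)
    for start in range(len(doc) - n + 1):
        window = doc[start:start + n]
        # Stop words in the query are one-word placeholders; other words must be equal.
        if all(q in stopwords or q == d for q, d in zip(qry, window)):
            return True
    return False

rows = [
    "offer expire",
    "blah blah offer expire blah blah",
    "blah blah offer will expire blah blah",
    "this offer shall expire",
    "offer expire",
    "offer to expire",
    "blah blah offer to expire blah blah",
    "blah blah offer expire blah blah",
    "offer will expire",
    "blah blah this offer shall expire blah blah",
]

hits = [r for r in rows if phrase_matches(r, "the offer will expire")]
for r in hits:
    print(r)
```

Under this model "the offer will expire" means "any word, offer, any word, expire". A row such as "offer to expire" has no word in front of "offer", so it cannot match, while "this offer shall expire" matches even though "shall" need not be a stop word. The same model also explains why "offer expire" only finds rows where the two words are adjacent.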

Trying to understand context indexes and contains-help

Hi,
I am using the thread "Achieving functionality of many preferences using one context index" to understand CONTEXT indexes and contains, and I get the following:
Error starting at line 1 in command:
begin
ctx_ddl.create_preference ('nd_mcds', 'multi_column_datastore');
ctx_ddl.set_attribute ('nd_mcds', 'columns', 'text nd, text text');
ctx_ddl.create_section_group ('nd_sg', 'basic_section_group');
ctx_ddl.add_ndata_section ('nd_sg', 'nd', 'nd');
ctx_ddl.create_preference ('test_lex', 'basic_lexer');
ctx_ddl.set_attribute ('test_lex', 'whitespace', '/\|-_+');
end;
Error report:
ORA-06550: line 5, column 15:
PLS-00302: component 'ADD_NDATA_SECTION' must be declared
ORA-06550: line 5, column 7:
PL/SQL: Statement ignored
06550. 00000 - "line %s, column %s:\n%s"
*Cause:    Usually a PL/SQL compilation error.
*Action:
So I am using the following to check for the error:
http://docs.oracle.com/cd/E18283_01/text.112/e16593/cddlpkg.htm#BABCBFCB
plus
oracle text application developer's guide
plus
oracle text reference
but none of these lists that error (I have even googled it in vain).
Background: we were actually using catsearch, but because of its downsides I want to implement this.
Is "Achieving functionality of many preferences using one context index" a good place to start when one does not know about CONTEXT and contains?
Please post any other useful links for contains and CONTEXT indexes that also explain:
1) fuzzy
2) stem
3) synonym
4) near
5) soundex
6) ndata
7) lexer
Thanks in advance.

Ndata is new to Oracle 11g. Your other posts indicate that you are using Oracle 10g, so you don't have ndata, so you get an error when you try to use it. If you want to use the 11g features that enable context indexes with contains to do all of the things that ctxcat indexes with catsearch do, then you need to upgrade to 11g.
The online documentation is searchable. Most things regarding Oracle Text are contained in either the Oracle Text Reference or the Oracle Text Application Developer's guide.
I suggest that you start with something very simple, then build from there.
The following is similar to your other post that used catsearch:
SCOTT@orcl_11gR2> CREATE TABLE mv_cat_seg_reg_prod
2    (cat_ids       VARCHAR2 ( 7),
3      act_status    VARCHAR2 (10),
4      name           VARCHAR2 ( 1),
5      email           VARCHAR2 ( 1),
6      address1      VARCHAR2 ( 1),
7      address2      VARCHAR2 ( 1),
8      contact_name VARCHAR2 ( 1),
9      mobile           VARCHAR2 ( 1),
10      telephone     VARCHAR2 ( 1))
11 /
Table created.
SCOTT@orcl_11gR2> INSERT ALL
2 INTO mv_cat_seg_reg_prod VALUES
3    ('1', 'Y', 'A', 'B', 'C', 'D', 'E', 'F', 'G')
4 INTO mv_cat_seg_reg_prod VALUES
5    ('2', 'N', 'H', 'I', 'J', 'K', 'L', 'M', 'N')
6 SELECT * FROM DUAL
7 /
2 rows created.
SCOTT@orcl_11gR2> CREATE INDEX mv_cat_seg_reg_prod_idx
2 ON mv_cat_seg_reg_prod (cat_ids)
3 INDEXTYPE IS CTXSYS.CONTEXT
4 /
Index created.
SCOTT@orcl_11gR2> SELECT token_text FROM dr$mv_cat_seg_reg_prod_idx$i
2 /
TOKEN_TEXT
1
2
2 rows selected.
SCOTT@orcl_11gR2> SELECT *
2 FROM   (SELECT SCORE (1), name, email, address1, address2, contact_name, mobile, telephone
3           FROM   mv_cat_seg_reg_prod
4           WHERE CONTAINS (cat_ids, '1', 1) > 0
5           AND    act_status = 'Y'
6           ORDER BY DBMS_RANDOM.VALUE)
7 WHERE ROWNUM < 8
8 /
SCORE(1) N E A A C M T
         4 A B C D E F G
1 row selected.

Privileges required for a user to create CONTEXT indexes

Hi all,
   RDBMS: 11.2.0.3
   OS.......: OEL 6.3
   What privileges must be granted to a user so that he can create CONTEXT indexes? I have granted CTXAPP to my user, but when I tried to create the CONTEXT index with the command below, I got an "insufficient privilege" error message.
   CREATE INDEX USR_DOCS.IDX_CTX_TAB_DOCUMENTOS_01 ON USR_DOCS.TAB_DOCUMENTOS(DOCUMENTO) INDEXTYPE IS CTXSYS.CONTEXT PARAMETERS ('SYNC (ON COMMIT)');

It depends on whether the user is trying to create the index on his own table in his own schema or on somebody else's table in somebody else's schema. The following demonstrates minimal privileges (quota could be smaller) for user usr_docs to create the index on his own table in his own schema and for my_user to create the index on usr_docs table in usr_docs schema.
SCOTT@orcl> -- version:
SCOTT@orcl> SELECT banner FROM v$version
2 /
BANNER
Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - 64bit Production
PL/SQL Release 11.2.0.1.0 - Production
CORE    11.2.0.1.0    Production
TNS for 64-bit Windows: Version 11.2.0.1.0 - Production
NLSRTL Version 11.2.0.1.0 - Production
5 rows selected.
SCOTT@orcl> -- usr_docs privileges:
SCOTT@orcl> CREATE USER usr_docs IDENTIFIED BY usr_docs
2 /
User created.
SCOTT@orcl> ALTER USER usr_docs QUOTA UNLIMITED ON users
2 /
User altered.
SCOTT@orcl> GRANT CREATE SESSION, CREATE TABLE TO usr_docs
2 /
Grant succeeded.
SCOTT@orcl> -- my_user privileges:
SCOTT@orcl> CREATE USER my_user IDENTIFIED BY my_user
2 /
User created.
SCOTT@orcl> GRANT CREATE SESSION, CREATE ANY INDEX TO my_user
2 /
Grant succeeded.
SCOTT@orcl> -- user_docs:
SCOTT@orcl> CONNECT usr_docs/usr_docs
Connected.
USR_DOCS@orcl> CREATE TABLE tab_documentos (documento CLOB)
2 /
Table created.
USR_DOCS@orcl> INSERT ALL
2 INTO tab_documentos VALUES ('test data')
3 INTO tab_documentos VALUES ('other stuff')
4 SELECT * FROM DUAL
5 /
2 rows created.
USR_DOCS@orcl> CREATE INDEX USR_DOCS.IDX_CTX_TAB_DOCUMENTOS_01
2 ON USR_DOCS.TAB_DOCUMENTOS(DOCUMENTO)
3 INDEXTYPE IS CTXSYS.CONTEXT
4 PARAMETERS ('SYNC (ON COMMIT)')
5 /
Index created.
USR_DOCS@orcl> DROP INDEX usr_docs.idx_ctx_tab_documentos_01
2 /
Index dropped.
USR_DOCS@orcl> -- my_user:
USR_DOCS@orcl> CONNECT my_user/my_user
Connected.
MY_USER@orcl> CREATE INDEX USR_DOCS.IDX_CTX_TAB_DOCUMENTOS_01
2 ON USR_DOCS.TAB_DOCUMENTOS(DOCUMENTO)
3 INDEXTYPE IS CTXSYS.CONTEXT
4 PARAMETERS ('SYNC (ON COMMIT)')
5 /
Index created.

Limitations of Path Section Group Indexing with alphabet I

Hi,
I created an Oracle Text CONTEXT index with a path section group and tried to retrieve an element using the query below, but it returns no rows:
select /*+FIRST_ROWS*/ rowid, abccolumn, xyzcolumn, efgcolumn
from xyztable
where contains(xyzcolumn, 'I INPATH (/a:xyz/b:abc/c:efg/d:hij)', 1) > 0
With the same query, if I replace I with O, I get output.
I want to know the significance of I, since I need to retrieve all the other columns based on this search criterion.
That node only ever contains one of two values, I or O.
With O I can retrieve rows, so I want to know how I can retrieve them with I.

begin
ctx_ddl.drop_section_group('xyz_path_group');
end;
begin
ctx_ddl.drop_preference('xyz_wildcard_pref1');
end;
begin
ctx_ddl.drop_preference('xyz_word_PREF');
end;
begin
ctx_ddl.drop_preference('xyz_lexer_PREF1');
end;
begin
ctx_ddl.create_section_group('xyz_path_group','PATH_SECTION_GROUP');
end;
begin
ctx_ddl.create_preference('xyz_word_PREF','BASIC_WORDLIST');
ctx_ddl.set_attribute('xyz_word_PREF','SUBSTRING_INDEX','TRUE');
ctx_ddl.set_attribute('xyz_word_PREF','PREFIX_INDEX','YES');
end;
begin
ctx_ddl.create_preference('xyz_lexer_pref1','BASIC_LEXER');
ctx_ddl.set_attribute('xyz_word_PREF','WILDCARD_MAXTERMS','15000');
end;
drop index xyz;
create index xyz on abc(pqr) indextype is ctxsys.context parameters
('datastore ctxsys.direct_datastore wordlist xyz_word_PREF FILTER ctxsys.null_filter lexer xyz_lexer_pref1
sync(on commit)SECTION GROUP xyz_path_group MEMORY 500M ');
This is how I created the index. The table has several columns, but the column pqr that I am indexing holds large XML with 15 namespaces and 3 prefixes.

Oracle 10g – Performance with BIG CONTEXT indexes

I would like to use Oracle XE 10.2.0.1.0 only for full-text searching of files that reside outside the database on an FTP server.
Recently I found out that the total size of the files to be indexed is 5GB.
As I have read somewhere on this forum, the size of the index should be 30-40% of the indexed text files (and even less with formatted documents like PDF or DOC).
Let's say that the CONTEXT index over these files will be 1.5-2GB.
The number of concurrent users will be at most 5.
I cannot easily test it myself yet.
Does anybody have any experience with the performance of Oracle XE, or another Oracle Database edition, with a CONTEXT index this big?
Will the hardware resources allowed by the Oracle XE license be sufficient to handle one CONTEXT index this big?
(Oracle XE license limitations: 1 GB RAM and 1 CPU)
Regards.

That depends on at least three things:
(1) what is the range of words that will appear in the document set (wide range of documents = smaller resultsets = better performance)
(2) how precise are the user's queries likely to be (more precise = smaller resultsets = better performance)
(3) how many milliseconds are your users willing to wait for results
So, unfortunately, you'll probably have to experiment a bit before you'll know...

Performance - composite index with 'OR' in 'WHERE' clause

I have a problem with the performance of the following query:
select /*+ index_asc(omschact oma_index1) */ knr, projnr, actnr from omschact where ((knr = 100 and actnr > 30) or knr > 100)
and rownum = 1;
(rownum used only for test purpose)
index:
create index oma_index1 on omschact (knr, projnr);
Execution plan:
Id Operation
0 SELECT STATEMENT
1 COUNT STOPKEY
2 TABLE ACCESS BY INDEX ROWID
3 INDEX FULL SCAN
If I'm correct, the 'OR' in the 'WHERE' clause is responsible for the INDEX FULL SCAN, which makes the query slow.
One solution would be to split the 'WHERE' clause into 2 separate selects (one with 'knr = 100 and actnr > 30' and one with 'knr > 100') and combine the results with a UNION ALL.
Since all rows must come back in ascending order (oma_index1), I still have to use an ORDER BY to make sure the order of the rows is correct. This again results in performance that is too low.
Another solution that does the trick is to create an index on the 2 fields (knr, projnr) concatenated and to use the same expression in the 'WHERE' clause:
create index oma_index2 on omschact (knr || projnr);
select /*+ index_asc(omschact oma_index2) */ knr, projnr, actnr from omschact where (knr || projnr) > 10030;
I just can't believe this work-around is the only solution, so I was hoping that someone here knows of a better way to solve this.
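A note on the concatenation workaround: the OR predicate is really a lexicographic comparison on the index columns, and a plain `knr || projnr` only preserves that ordering if every part has a fixed width. The Python sketch below checks this on a few made-up key values. It assumes the predicate is meant on (knr, projnr), the two indexed columns (the query above mixes actnr and projnr), and the function names are hypothetical.

```python
from itertools import product

def or_predicate(knr, projnr):
    """The OR form of the condition, written on the two indexed columns."""
    return (knr == 100 and projnr > 30) or knr > 100

def tuple_predicate(knr, projnr):
    # Lexicographic comparison: first by knr, then by projnr.
    return (knr, projnr) > (100, 30)

def unpadded_concat(knr, projnr):
    # Mirrors (knr || projnr) > 10030: concatenate the digits, compare as a number.
    return int(f"{knr}{projnr}") > 10030

def padded_concat(knr, projnr, width=8):
    # Fixed-width, zero-padded keys sort the same way as the tuple.
    return f"{knr:0{width}d}{projnr:0{width}d}" > f"{100:0{width}d}{30:0{width}d}"

sample = list(product([1, 99, 100, 101, 1000], [0, 5, 30, 31, 500]))

# Rows where the unpadded concatenation disagrees with the OR predicate.
mismatches = [(k, p) for k, p in sample if unpadded_concat(k, p) != or_predicate(k, p)]
print(mismatches)
print(all(tuple_predicate(k, p) == or_predicate(k, p) for k, p in sample))
print(all(padded_concat(k, p) == or_predicate(k, p) for k, p in sample))
```

For example, knr = 99 with projnr = 500 concatenates to 99500 and wrongly passes, while knr = 101 with projnr = 0 concatenates to 1010 and is wrongly rejected. Padding each part to a fixed width avoids both cases.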

padders,
I'll give the real data instead of the example. The index I really use consists of 4 fields. In this table the fields are just numbers, but in other tables I need to use char fields in indexes, which is why I concatenate instead of using formulas (although I would prefer the latter).
SQL> desc omschact
Name Null? Type
KNR NOT NULL NUMBER(8)
PROJNR NOT NULL NUMBER(8)
ACTNR NOT NULL NUMBER(8)
REGELNR NOT NULL NUMBER(3)
REGEL CHAR(60)
first method:
SQL> create index oma_key_001 on omschact(knr,projnr,actnr,regelnr);
Index created.
SQL> select /*+ index_asc(omschact oma_key_001) */ * from omschact where
2 (knr > 100 or
3 (knr = 100 and projnr > 30) or
4 (knr = 100 and projnr = 30 and actnr > 100000) or
5 (knr = 100 and projnr = 30 and actnr = 100000 and regelnr >= 0));
Execution Plan
Plan hash value: 1117430516
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
| 0 | SELECT STATEMENT | | 11M| 822M| 192K (1)| 00:38:26 |
| 1 | TABLE ACCESS BY INDEX ROWID| OMSCHACT | 11M| 822M| 192K (1)| 00:38:26 |
|* 2 | INDEX FULL SCAN | OMA_KEY_001 | 11M| | 34030 (1)| 00:06:49 |
Predicate Information (identified by operation id):
2 - filter("KNR">100 OR "KNR"=100 AND "PROJNR">30 OR "KNR"=100 AND "PROJNR"=30
AND "ACTNR">100000 OR "ACTNR"=100000 AND "KNR"=100 AND "PROJNR"=30 AND
"REGELNR">=0)
second method (same index):
SQL> select * from (
2 select /*+ index_asc(omschact oma_key_001) */ * from omschact where knr > 100
3 union all
4 select /*+ index_asc(omschact oma_key_001) */ * from omschact where knr = 100 and projnr > 30
5 union all
6 select /*+ index_asc(omschact oma_key_001) */ * from omschact where knr = 100 and projnr = 30 and actnr > 100000
7 union all
8 select /*+ index_asc(omschact oma_key_001) */ * from omschact where knr = 100 and projnr = 30 and actnr = 100000 and regelnr > 0)
9 order by knr, projnr, actnr, regelnr;
Execution Plan
Plan hash value: 292918786
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time |
| 0 | SELECT STATEMENT | | 11M| 1203M| | 477K (1)| 01:35:31 |
| 1 | SORT ORDER BY | | 11M| 1203M| 2745M| 477K (1)| 01:35:31 |
| 2 | VIEW | | 11M| 1203M| | 192K (1)| 00:38:29 |
| 3 | UNION-ALL | | | | | | |
| 4 | TABLE ACCESS BY INDEX ROWID| OMSCHACT | 11M| 822M| | 192K (1)| 00:38:26 |
|* 5 | INDEX RANGE SCAN | OMA_KEY_001 | 11M| | | 33966 (1)| 00:06:48 |
| 6 | TABLE ACCESS BY INDEX ROWID| OMSCHACT | 16705 | 1272K| | 294 (1)| 00:00:04 |
|* 7 | INDEX RANGE SCAN | OMA_KEY_001 | 16705 | | | 54 (0)| 00:00:01 |
| 8 | TABLE ACCESS BY INDEX ROWID| OMSCHACT | 47 | 3666 | | 4 (0)| 00:00:01 |
|* 9 | INDEX RANGE SCAN | OMA_KEY_001 | 47 | | | 3 (0)| 00:00:01 |
| 10 | TABLE ACCESS BY INDEX ROWID| OMSCHACT | 1 | 78 | | 4 (0)| 00:00:01 |
|* 11 | INDEX RANGE SCAN | OMA_KEY_001 | 1 | | | 3 (0)| 00:00:01 |
Predicate Information (identified by operation id):
5 - access("KNR">100)
7 - access("KNR"=100 AND "PROJNR">30)
9 - access("KNR"=100 AND "PROJNR"=30 AND "ACTNR">100000)
11 - access("KNR"=100 AND "PROJNR"=30 AND "ACTNR"=100000 AND "REGELNR">0)
third method:
SQL> create index oma_test on omschact(to_char(knr,'00000000')||to_char(projnr,'00000000')||to_char(actnr,'00000000')||to_char(regelnr,'000'));
Index created.
SQL> select /*+ index_asc(omschact oma_test) */ * from omschact where
2 (to_char(knr,'00000000')||to_char(projnr,'00000000')||
3 to_char(actnr,'00000000')||to_char(regelnr,'000')) >=
4 (to_char(100,'00000000')||to_char(30,'00000000')||
5* to_char(100000,'00000000')||to_char(0,'000'))
Execution Plan
Plan hash value: 424961364
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
| 0 | SELECT STATEMENT | | 553K| 55M| 1712 (1)| 00:00:21 |
| 1 | TABLE ACCESS BY INDEX ROWID| OMSCHACT | 553K| 55M| 1712 (1)| 00:00:21 |
|* 2 | INDEX RANGE SCAN | OMA_TEST | 99543 | | 605 (1)| 00:00:08 |
Predicate Information (identified by operation id):
2 - access(TO_CHAR("KNR",'00000000')||TO_CHAR("PROJNR",'00000000')||TO_CHAR("
ACTNR",'00000000')||TO_CHAR("REGELNR",'000')>=TO_CHAR(100,'00000000')||TO_CHAR(3
0,'00000000')||TO_CHAR(100000,'00000000')||TO_CHAR(0,'000'))

CONTEXT index creation - performance!

Hi,
I have a table with about 5 million rows. The content that needs to be indexed is of RAW datatype. The average size (length) of this field is about 50 characters (it could be more).
I am trying to index this column to perform a keyword search. Details are given below.
table:
SQL> desc kwtai
Name Null? Type
TSD_HH24 DATE
COUNTRY_CODE_ALPHA_2 VARCHAR2(2)
ONETWORK NUMBER(6)
OADDRESS VARCHAR2(25)
DNETWORK NUMBER(6)
DADDRESS VARCHAR2(25)
MESSAGE_LENGTH NUMBER
MESSAGE_CONTENT RAW(2000)
Preferences:-
begin
Ctx_Ddl.Create_Preference('mc_storage', 'BASIC_STORAGE');
ctx_ddl.set_attribute('mc_storage','I_TABLE_CLAUSE',
'tablespace large_index storage (initial 10M next 10M)');
ctx_ddl.set_attribute('mc_storage', 'K_TABLE_CLAUSE',
'tablespace large_index storage (initial 10M next 10M)');
ctx_ddl.set_attribute('mc_storage', 'R_TABLE_CLAUSE',
'tablespace large_index storage (initial 1M) lob (data) store as (cache)');
ctx_ddl.set_attribute('mc_storage', 'N_TABLE_CLAUSE',
'tablespace large_index storage (initial 1M)');
ctx_ddl.set_attribute('mc_storage', 'I_INDEX_CLAUSE',
'tablespace large_index storage (initial 1M) compress 2');
ctx_ddl.create_preference('mc_lex', 'BASIC_LEXER');
ctx_ddl.set_attribute('mc_lex', 'skipjoins', '_-"''`~!@#$%^&*()+=|}{[]\:;<>?/.,');
ctx_ddl.set_attribute('mc_lex', 'INDEX_STEMS','NONE');
end;
create index kwtaidx on kwtai (message_content) indextype is ctxsys.context
parameters (' lexer mc_lex storage mc_storage memory 500M ')
parallel 16;
This create index takes about 4 hours to complete on an 8-CPU dual-core machine.
This is on Oracle 10g (10.2.0.4).
I create the index from scratch rather than syncing it because the data is loaded into this table only once a day and is cleared once my keyword analysis is done.
Any pointers to speed up the index creation would be really appreciated! Thanks in advance!

My base table stores the text that needs to be indexed in the MESSAGE_CONTENT column, which for now is of RAW data type. The data stored in this table is in hex representation.
Some examples -
MESSAGE_CONTENT
616C70686120626574612067616D6D612064656C746120657073696C6F6E207A657461206E69F16F
616C70686120626574612067616D6D612064656C746120657073696C6F6E207A657461
616C70686120626574612067616D6D612064656C746120657073696C6F6E207A657461206E69C3B16F
6865792E2C2C77686174277320676F696E67206F6E2E2E2E7066206368616E67277320697320736F6D652072657374617572616E742E2074686579206172652070736564756F2D636F6F6C
54686520477265656B20616C7068616265742069732074686520736372697074207468617420686173206265656E
54686520477265656B20616C7068616265742069732074686520736372697074207468617420686173206265656E20706F73742D64617461
Following your suggestion, I tried to bypass this: I added a format column to my base table and set it to "TEXT". My database character set is UTF8.
Now when I create the index with the following preferences it takes less than a minute.
begin
  -- Storage preference: keep the index tables in the TEXT_INDEX tablespace
  ctx_ddl.create_preference('kwta_storage', 'BASIC_STORAGE');
  ctx_ddl.set_attribute('kwta_storage', 'I_TABLE_CLAUSE',
    'tablespace TEXT_INDEX storage (initial 10M next 10M)');
  ctx_ddl.set_attribute('kwta_storage', 'K_TABLE_CLAUSE',
    'tablespace TEXT_INDEX storage (initial 10M next 10M)');
  ctx_ddl.set_attribute('kwta_storage', 'R_TABLE_CLAUSE',
    'tablespace TEXT_INDEX storage (initial 1M) lob (data) store as (cache)');
  ctx_ddl.set_attribute('kwta_storage', 'N_TABLE_CLAUSE',
    'tablespace TEXT_INDEX storage (initial 1M)');
  ctx_ddl.set_attribute('kwta_storage', 'I_INDEX_CLAUSE',
    'tablespace TEXT_INDEX storage (initial 1M) compress 2');

  -- Lexer preference: join/skip characters, sentence punctuation, no stemming
  ctx_ddl.create_preference('mylex', 'BASIC_LEXER');
  ctx_ddl.set_attribute('mylex', 'skipjoins', '_-"''`~!@#$%^&*()+=|}{[]\:;<>?/,');
  ctx_ddl.set_attribute('mylex', 'punctuations', '.?!');
  ctx_ddl.set_attribute('mylex', 'INDEX_STEMS', 'NONE');
  ctx_ddl.set_attribute('mylex', 'continuation', '\-');

  -- Stoplist: words that are not indexed
  ctx_ddl.create_stoplist('mystop');
  ctx_ddl.add_stopword('mystop', 'is');
  ctx_ddl.add_stopword('mystop', 'has');
  ctx_ddl.add_stopword('mystop', 'the');
  ctx_ddl.add_stopword('mystop', 'that');
end;
/

create index kwtaidx on kwtai (message_content) indextype is ctxsys.context
  parameters ('filter ctxsys.auto_filter format column fmt stoplist mystop lexer mylex storage kwta_storage memory 500M')
  parallel 16;
When I select the distinct tokens from the $I table, I get the following:
TOKEN_TEXT
RESTAURANT
GREEK
WHATS
ARE
DELTA
ZETA
ALPHA
ALPHABET
EPSILON
PF
PSEDUOCOOL
SOME
CHANGS
NIÃO
ON
POSTDATA
SCRIPT
BEEN
GAMMA
GOING
HEY
NI
O
THEY
BETA
What I am also wondering is whether the text (the message_content column) is being converted to UTF8 (the database character set) by AUTO_FILTER. Is my assumption correct? I am not sure how to validate this.
Also, would you kindly explain why RAW must not be used in this case?
Thanks for all your pointers!
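One way to check what the stored bytes actually contain, independent of the filter, is to decode the RAW values under more than one character set and compare. In the samples above, the first row ends in 6E 69 F1 6F (F1 is "ñ" in ISO-8859-1) while the third ends in 6E 69 C3 B1 6F (C3 B1 is "ñ" in UTF-8), so the rows do not all appear to be in the same encoding, which may be what produces the NI, O and NIÃO tokens. The query below is only a sketch: it assumes the kwtai table and message_content column from the post, and the two character set names are guesses based on those bytes.

```sql
-- Sketch only: kwtai and message_content are taken from the post above;
-- the source character sets passed to RAW_TO_CHAR are assumptions to compare.
select rawtohex(message_content)                             as hex_bytes,
       -- reinterprets the bytes as-is, with no character set conversion
       utl_raw.cast_to_varchar2(message_content)             as bytes_as_is,
       -- converts from the named source character set to the database character set
       utl_i18n.raw_to_char(message_content, 'WE8ISO8859P1') as read_as_latin1,
       utl_i18n.raw_to_char(message_content, 'AL32UTF8')     as read_as_utf8
from   kwtai;
```

Rows that look right only in the read_as_latin1 column would need converting before indexing in a UTF8 database; rows that look right in read_as_utf8 are already in the database encoding.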

Context Index and performance

Hi,
I want to create a context index on a column that contains large text. The table contains millions of records, and inserts into it happen daily. My questions are:
1. Do we need to run any procedures after inserting the records each day?
2. Is there any problem, from a performance point of view, with creating a context index on the table?
Thanks,
Sri

sri333 wrote:
Hi,
I want to create a context index on one column which contains large text. And the table contains millions of records and daily inserts happen into the same table. My question is
1.Do we need to run any procedures after inserting the records daily?
Not for what you describe. But you didn't describe much. I guess you will do something with this table data later, and it depends on that. But since you only mentioned that you insert, then no, there is nothing to do after that.
2.Is there any problem from performance point of view creating context index on the table
Sure. Creating the index takes time. If the index is there, new inserts will take more time.
Edited by: Sven W. on Oct 10, 2012 12:02 PM
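
One caveat if the newly inserted rows need to be searchable: with default settings a CONTEXT index is not updated by the insert itself. New rows are queued and only become visible to CONTAINS after the index is synchronized. The sketch below shows the usual options; docs, body and docs_idx are hypothetical names, and the memory setting is just an example value.

```sql
-- Hypothetical names throughout: table docs, column body, index docs_idx.

-- Option 1: create the index so that it synchronizes at commit time.
create index docs_idx on docs (body)
  indextype is ctxsys.context
  parameters ('sync (on commit)');

-- Option 2: keep the default (manual) sync and run this after the daily load.
begin
  ctx_ddl.sync_index('docs_idx', '50M');   -- second argument: memory for the sync
end;
/

-- Frequent syncs fragment the index; an occasional optimize compacts it.
begin
  ctx_ddl.optimize_index('docs_idx', 'FULL');
end;
/
```

For a table that is loaded once a day, a single sync after the load followed by a periodic optimize is usually cheaper than syncing on every commit.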

Can you help with RoboHelp Version 11: WebHelp Index Keyword Sorting?

I'm new to RoboHelp 11, and I am finding it difficult to alphabetize topics listed under my Index Keywords. When I look at the keyword topics in my RoboHelp HTML editor, they are listed in alphabetical order (see the Tools topic in the first image), but when I generate WebHelp the Tools topics are not in the correct order (second image). I believe that the problem pertains to new entries made to a converted RoboHelp Version 6 WebHelp application. Basically, I have been adding content to several old version 6-generated html files in the new RoboHelp HTML editor.
Another issue that's perplexing is the fact that the Move Up and Move Down icons at the top of the Index editor pod, or whatever it's called, are grayed-out (not functioning). I remember with the Version 6 application, they worked fine.
Can anyone offer any suggestions on how to get the index alphabetized? I appreciate your help.

Hi, pweb248
Just an expansion of what Rick has suggested. Binary index is only used when Microsoft HTML Help "CHM" is your primary layout. So, because WebHelp is your primary layout, Binary should definitely be unchecked. Selecting the Index file (HHK) is fairly standard for WebHelp use and it is sorted numerically and alphabetically by default. The HHK file contains all the Index keywords and their topic associations all tucked into one file, whereas adding Index Keywords using the "Topic" radio button embeds the coded reference right in the topic html file itself.
This online help topic explains a little more about the Sorting options depending on the primary layouts and whether Binary is selected.
Adobe RoboHelp 11 * Edit index keywords
This is the key paragraph:
>>Note: The Sort command is unavailable with a binary index. The sort function is enabled only when the primary layout is HTML Help and the Index is set to Index File with no Binary Index. In all other layouts, the index remains sorted but for HTML output, the sorting of the index can be changed. Sorting enables the up and down keys on Index Pod.<<
As you have noticed, your Index Designer view is apparently working as documented. I share your puzzlement about the out-of-sort listing in the WebHelp output shown in your screenshot. I wonder if there is some left-over crud from the ancient RoboHelp 6 code that is not converted properly and gumming up the works? Also curious if you have more than one Index in your project and if you are selecting the right one in the WebHelp Settings > Content dialog. Maybe Rick, Peter or Willam can shed some light on this?
John

Substring search with Oracle context indexes

Hi,
I would like to know whether it is possible to do a substring search with one of the options offered by the Oracle Text index types
(ctxcat, ctxrule, context).
Example:
I would like to search for the string 'berub' in column A of table_example.
The values in column A are:
The betther
berube
A.berube
berub
Berub
BERUB
R berube
S tartif
Y Thibeault
The rows returned should be:
berube
A.berube
berub
Berub
BERUB
R berube
A simple SQL statement could be
select * from table_example where upper(a) like upper('%berub%' );
How can I do the same thing with the Oracle Text indexes and a select (catsearch, contains, matches), if it is possible?
An example would be welcome.
Thanks

I know how to do an explain plan.
My point is not the query I posted; it's just an example.
I have many queries in production that we have optimized many times (they went from 3 min to 15 sec with optimization, but we want better results). At this point we are looking to implement context indexes to make them more efficient.
To make this SQL more efficient we have to deal with like '%xxxxxx%', and context indexes look like an option, but we have to be able to do substring searches with them.
Is it possible, and how?
This is my question and why I posted it here. The query is just a simple example to illustrate what I want.
Thanks to anyone who can answer my question.
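
For what it is worth, a CONTEXT index does accept % wildcards on both sides of a term inside CONTAINS, and a BASIC_WORDLIST preference with SUBSTRING_INDEX enabled is the documented way to make left-truncated and double-truncated searches faster. The sketch below assumes the table_example table and column a from the question; the preference and index names are made up for illustration.

```sql
-- table_example and column a come from the question;
-- substr_wordlist and table_example_a_ctx are hypothetical names.
begin
  ctx_ddl.create_preference('substr_wordlist', 'BASIC_WORDLIST');
  -- Builds an extra substring index to speed up '%term' and '%term%' queries.
  ctx_ddl.set_attribute('substr_wordlist', 'SUBSTRING_INDEX', 'TRUE');
end;
/

create index table_example_a_ctx on table_example (a)
  indextype is ctxsys.context
  parameters ('wordlist substr_wordlist');

-- Matching is on indexed tokens, which the default lexer stores in uppercase,
-- so no UPPER() is needed around the column or the search term.
select *
from   table_example
where  contains(a, '%berub%') > 0;
```

Two things to keep in mind with this approach: the wildcard is expanded against individual tokens, so it will not match a substring that spans two words, and the substring index makes index creation and synchronization slower and the index larger.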

Content has been indexed with Info only. Resubmit should only be performed

Hi All,
I'm using Oracle Content Server (OCS). When I try to check in a new document, I get the error message below. Can anyone please tell me what the problem is?
Error Message:
Text conversion of the file failed.
Content has been indexed with Info only. Resubmit should only be performed if the problem has been resolved.
Text conversion of the file '//awusrp04/PortalStg/oracle/inetucmstg/weblayout/groups/public/@enterprise/@hr/documents/document/s_013020.pdf' failed.
Content has been indexed with Info only. Resubmit should only be performed if the problem has been resolved.

Hello Experts,
I am facing the same issue. Does anybody know the solution?
Thanks in advance.
