Performance of Oracle Text
Hi,
I'm tasked to help design an application that will have Oracle Text powering the searching logic. The application will have millions of records (in the 30 million to 50 million range), but there's a restriction that 95% of all searches must be able to complete in 1 second or less (!)
So, my question is, is it possible for Oracle Text to meet this criteria? Assuming we have the best hardware, etc. Or should I look for another solution/approach?
Regards,
Roy
Hi Roy,
It's pretty hard to give a yes/no answer based on the limited information. I will say that Oracle's method of indexing is fairly standard (dictionary/postings list) so you are not likely to find a better solution if your records are stored in Oracle. The Oracle Text advantage - tight integration with the database already. You have storage and query optimization features of Oracle when using Oracle Text. < 1 sec response time is pretty tight for any search, but I don't think you'd have any better chance at hitting it with another solution.
Thanks,
Ron
Similar Messages
-
Bad Query Performance in Oracle Text
Hello everyone, I have the following problem:
I have a table, TABLE_A from now on, a table of more or less 1,000.000 rows, with a CONTEXT index, using FILE_DATASTORE, CTXSYS.DEFAULT_STORAGE, CTXSYS.NULL_FILTER, CTXSYS.BASIC_LEXER and querying the index in the following way:
SELECT /*+FIRST_ROWS*/ A.ID, B.ID2, SCORE(1) FROM TABLE_A A, TABLE_B WHERE A.ID = B.ID AND CONTAINS(A.PATH, '<SOME KW>', 1) > 0 ORDER BY SCORE(1) DESC
where TABLE_B has another 1,000.000 rows.
The problem is that the query response time is much higher after some time of inactivity regarding those tables. How can I avoid this behavior?. The fact is that those inactivity times (not more than 20min) are core to my application, so I always get long long response times for my queries.
Is there any cache or cache time parameter that affects this behavior? I have checked the Oracle Text documentation without finding anything about that...
More data: I am using Oracle 9.2.0.1, but I have tested with the latest patches an the behavior is the same...
Thank you very much in advance.Pablo,
This appears to be a generic database or OS issue, not a Text specific issue. It really depends on what your application is doing.
If your application is doing some other database activity such as queries or DMLs on other non-text tables, chances are Oracle Text related data blocks are being aged out of cache. You can either increase the db_cache_size init
parmater or try to keep the text tables and index tables blocks in cache using ALTER TABLE commands.
If your app is doing NON-database activity, then chances are your application is taking up much of the machine's physical memory such that OS is swapping ORACLE out of the memory. In which case, you may want to consider to add more memory to the machine or have ORACLE run on a separate machine by itself. -
Poor performance using Oracle Text
We are currently using wildcard queries and we have setup substring or prefix index which is recommended in Oracle's documentation in order to improve performance but we have not seen any improvements. Does anyone have any other recommendations?
We are currently using wildcard queries and we have setup substring or prefix index which is recommended in Oracle's documentation in order to improve performance but we have not seen any improvements. Does anyone have any other recommendations?
-
Oracle text Improve perfomance
What is the best way to improve performance in Oracle text?
I retrieve some data in a csv file, i verify the data in a table (CUSTOMER) to
know if must make a INSERT or no.
It make 10 seconds to process only two rows.
How can i improve performance?What are you exactly doing? Do you have an Oracle Text index on that table, or are you just manipulating text in the table?
Give us the code what you are doing and we can help you further.
Herald ten Dam -
Performance issues and options to reduce load with Oracle text implementation
Hi Experts,
My database on Oracle 11.2.0.2 on Linux. We have Oracle Text implemented for fuzzy search. Our oracle text indexes are defined as sync on commit as we can not afford to have stale data. Now our application does literally thousands of inserts/updates/deletes to those columns where we have these Oracle text indexes defined. As a result, we are seeing a lot of performance impact due to the oracle text sync routines being called on each commit. We are doing the index optimization every night (full optimization every night at 3 am). The oracle text index related internal operations are showing up as top sql in our AWR report and there are concerns that it is causing lot of load on the DB. Since we do the full index optimization only once at night, I am thinking should I change that , and if I do so, will it help us?
For example here are some data from my one day's AWR report:
Elapsed Time (s)
Executions
Elapsed Time per Exec (s)
%Total
%CPU
%IO
SQL Id
SQL Module
SQL Text
27,386.25
305,441
0.09
16.50
15.82
9.98
ddr8uck5s5kp3
begin ctxsys.drvdml.com_sync_i...
14,618.81
213,980
0.07
8.81
8.39
27.79
02yb6k216ntqf
begin ctxsys.syncrn(:idxownid,...
Full Text of above top sql:
ddr8uck5s5kp3
begin ctxsys.drvdml.com_sync_index(:idxname, :idxmem, :partname);
end
02yb6k216ntqf
begin ctxsys.syncrn(:idxownid, :idxoname, :idxid, :ixpid, :rtabnm, :flg); end;
Now if I do the full index optimization more often and not just once at night 3 PM, will that mean, the load on DB due to sync on commit will decrease? If yes how often should I optimized and doesn't the optimization itself lead to some load? Can someone suggest?
Thanks,
OrauserNYou can query the ctx_parameters view to see what your default and maximum memory values are:
SCOTT@orcl12c> COLUMN bytes FORMAT 9,999,999,999
SCOTT@orcl12c> COLUMN megabytes FORMAT 9,999,999,999
SCOTT@orcl12c> SELECT par_name AS parameter,
2 TO_NUMBER (par_value) AS bytes,
3 par_value / 1048576 AS megabytes
4 FROM ctx_parameters
5 WHERE par_name IN ('DEFAULT_INDEX_MEMORY', 'MAX_INDEX_MEMORY')
6 ORDER BY par_name
7 /
PARAMETER BYTES MEGABYTES
DEFAULT_INDEX_MEMORY 67,108,864 64
MAX_INDEX_MEMORY 1,073,741,824 1,024
2 rows selected.
You can set the memory value in your index parameters:
SCOTT@orcl12c> CREATE INDEX EMPLOYEE_IDX01
2 ON EMPLOYEES (EMP_NAME)
3 INDEXTYPE IS CTXSYS.CONTEXT
4 PARAMETERS ('SYNC (ON COMMIT) MEMORY 1024M')
5 /
Index created.
You can also modify the default and maximum values using CTX_ADM.SET_PARAMETER:
http://docs.oracle.com/cd/E11882_01/text.112/e24436/cadmpkg.htm#CCREF2096
The following contains general guidelines for what to set the max_index_memory parameter and others to:
http://docs.oracle.com/cd/E11882_01/text.112/e24435/aoptim.htm#CCAPP9274 -
hi all.
i have useing orcle text for the indexing purposes ..
the below query goes for an wildcard search and it performance is very poor ..
/* Formatted on 2009/08/11 16:06 (Formatter Plus v4.8.5) */
SELECT *
FROM (SELECT z.*, ROWNUM r
FROM (SELECT *
FROM (SELECT *
FROM (SELECT score (1) score,
SUBSTR (note, 1, 200) note, tmplt_id,
appl_name, appl_use, appl_reference,
appl_entity, effctv_start_date,
effctv_end_date, eq_name, note_id,
Bfcommon.getvaluebasedontemplate
(tmplt_id,
note_id
) VALUE,
(SELECT em_name
FROM ip_eq
WHERE eq_name = a.eq_name) em_name,
Bfcommon.is_edit_allowed_note1
(a.note_id,
'PACRIM1\E317329',
SYSDATE,
SYSDATE
) LOCKED
FROM bf_note a
WHERE tmplt_id NOT IN (19, 14, 16)
AND effctv_start_date >=
TO_DATE ('08/12/2008 00:00:00',
'MM/DD/YYYY HH24:MI:SS'
AND ( effctv_end_date <=
TO_DATE
('08/12/2009 00:00:00',
'MM/DD/YYYY HH24:MI:SS'
OR effctv_end_date IS NULL
AND contains (note, '%test%', 1) > 0) r
UNION
SELECT score (1) score, SUBSTR (note, 1, 200) note,
tmplt_id, appl_name, appl_use,
appl_reference, appl_entity,
effctv_start_date, effctv_end_date, eq_name,
a.note_id, b.trgt_name VALUE,
(SELECT em_name
FROM ip_eq
WHERE eq_name = a.eq_name) em_name,
Bfcommon.is_edit_allowed_note1
(a.note_id,
'PACRIM1\E317329',
SYSDATE,
SYSDATE
) LOCKED
FROM bf_note a, om_limit_hist b
WHERE ( date_changed =
(SELECT MAX (date_changed)
FROM om_limit_hist h1
WHERE date_changed <
(SELECT MAX (date_changed)
FROM om_limit_hist h2
WHERE h2.trgt_name =
b.trgt_name)
AND h1.trgt_name = b.trgt_name)
OR date_changed =
(SELECT MAX (date_changed)
FROM om_limit_hist h1
WHERE h1.trgt_name = b.trgt_name)
AND b.note_id = a.note_id(+)
AND tmplt_id = 19
AND effctv_start_date >=
TO_DATE ('08/12/2008 00:00:00',
'MM/DD/YYYY HH24:MI:SS'
AND ( effctv_end_date <=
TO_DATE ('08/12/2009 00:00:00',
'MM/DD/YYYY HH24:MI:SS'
OR effctv_end_date IS NULL
AND contains (note, '%test%', 1) > 0)
ORDER BY 1 DESC) z
WHERE ROWNUM <= 50)
WHERE r >= 1here it goes for the wild card search for the string test ...Plz tell me how to index it in this case ..
and its plan is
Execution Plan
Plan hash value: 3535478881
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time |
| 0 | SELECT STATEMENT | | 50 | 128K| | 632 (1)| 00:00:08 |
|* 1 | VIEW | | 50 | 128K| | 632 (1)| 00:00:08 |
|* 2 | COUNT STOPKEY | | | | | | |
| 3 | VIEW | | 69 | 177K| | 632 (1)| 00:00:08 |
|* 4 | SORT ORDER BY STOPKEY | | 69 | 177K| 376K| 632 (1)| 00:00:08 |
| 5 | VIEW | | 69 | 177K| | 590 (1)| 00:00:08 |
| 6 | SORT UNIQUE | | 69 | 7281 | | 590 (2)| 00:00:08 |
| 7 | UNION-ALL | | | | | | |
|* 8 | TABLE ACCESS BY INDEX ROWID | BF_NOTE | 68 | 7140 | | 585 (1)| 00:00:08 |
|* 9 | DOMAIN INDEX | BF_NOTE_TEXT_SEARCH | | | | 221 (0)| 00:00:03 |
|* 10 | FILTER | | | | | | |
| 11 | NESTED LOOPS | | 1 | 141 | | 3 (0)| 00:00:01 |
| 12 | TABLE ACCESS FULL | OM_LIMIT_HIST | 1 | 36 | | 2 (0)| 00:00:01 |
|* 13 | TABLE ACCESS BY INDEX ROWID| BF_NOTE | 1 | 105 | | 1 (0)| 00:00:01 |
|* 14 | INDEX UNIQUE SCAN | BF_NOTE_PK | 1 | | | 1 (0)| 00:00:01 |
| 15 | SORT AGGREGATE | | 1 | 23 | | | |
|* 16 | INDEX RANGE SCAN | OM_LIMIT_HIST_IDX1 | 1 | 23 | | 0 (0)| 00:00:01 |
| 17 | SORT AGGREGATE | | 1 | 23 | | | |
|* 18 | INDEX RANGE SCAN | OM_LIMIT_HIST_IDX1 | 1 | 23 | | 0 (0)| 00:00:01 |
| 19 | SORT AGGREGATE | | 1 | 23 | | | |
|* 20 | INDEX RANGE SCAN | OM_LIMIT_HIST_IDX1 | 1 | 23 | | 0 (0)| 00:00:01 |
Predicate Information (identified by operation id):
1 - filter("R">=1)
2 - filter(ROWNUM<=50)
4 - filter(ROWNUM<=50)
8 - filter("EFFCTV_START_DATE">=TO_DATE(' 2008-08-12 00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND
"TMPLT_ID"19 AND "TMPLT_ID"14 AND "TMPLT_ID"16 AND ("EFFCTV_END_DATE" IS NULL OR
"EFFCTV_END_DATE"<=TO_DATE(' 2009-08-12 00:00:00', 'syyyy-mm-dd hh24:mi:ss')))
9 - access("CTXSYS"."CONTAINS"("NOTE",'%test%',1)>0)
10 - filter("DATE_CHANGED"= (SELECT /*+ */ MAX("DATE_CHANGED") FROM "OM_LIMIT_HIST" "H1" WHERE
"H1"."TRGT_NAME"=:B1 AND "DATE_CHANGED"< (SELECT /*+ */ MAX("DATE_CHANGED") FROM "OM_LIMIT_HIST" "H2" WHER
E
"H2"."TRGT_NAME"=:B2)) OR "DATE_CHANGED"= (SELECT /*+ */ MAX("DATE_CHANGED") FROM "OM_LIMIT_HIST" "H1"
WHERE "H1"."TRGT_NAME"=:B3))
13 - filter("TMPLT_ID"=19 AND "EFFCTV_START_DATE">=TO_DATE(' 2008-08-12 00:00:00', 'syyyy-mm-dd
hh24:mi:ss') AND "CTXSYS"."CONTAINS"("NOTE",'%test%',1)>0 AND ("EFFCTV_END_DATE" IS NULL OR
"EFFCTV_END_DATE"<=TO_DATE(' 2009-08-12 00:00:00', 'syyyy-mm-dd hh24:mi:ss')))
14 - access("B"."NOTE_ID"="A"."NOTE_ID")
16 - access("H1"."TRGT_NAME"=:B1 AND "DATE_CHANGED"< (SELECT /*+ */ MAX("DATE_CHANGED") FROM
"OM_LIMIT_HIST" "H2" WHERE "H2"."TRGT_NAME"=:B2))
filter("DATE_CHANGED"< (SELECT /*+ */ MAX("DATE_CHANGED") FROM "OM_LIMIT_HIST" "H2" WHERE
"H2"."TRGT_NAME"=:B1))
18 - access("H2"."TRGT_NAME"=:B1)
20 - access("H1"."TRGT_NAME"=:B1)What version of Oracle are you using?
Your sample query is a good example of the benefits and costs of 'mixed' queries, and the challenges of mixed query performance. Oracle 11g has some very helpful new features (search for SDATA) that can really improve performance. (Specifically, it looks like your query does some useful date-range bounding. You need to get that into the FT index).
In the end, it's not going to be easy to look at your reasonably complex query, understand the data and relationships, and wave the magic wand to make the thing go fast. -
Oracle text performance with context search indexes
Search performance using context index.
We are intending to move our search engine to a new one based on Oracle Text, but we are meeting some
bad performance issues when using search.
Our application allows the user to search stored documents by name, object identifier and annotations(formerly set on objects).
For example, suppose I want to find a document named ImportSax2.c: according to user set parameters, our search engine format the following
search queries :
1) If the user explicitely ask for a search by document name, the query is the following one =>
select objid FROM ADSOBJ WHERE CONTAINS( OBJFIELDURL , 'ImportSax2.c WITHIN objname' , 1 ) > 0;
2) If the user don't specify any extra parameters, the query is the following one =>
select objid FROM ADSOBJ WHERE CONTAINS( OBJFIELDURL , 'ImportSax2.c' , 1 ) > 0;
Oracle text only need around 7 seconds to answer the second query, whereas it need around 50 seconds to give an answer for the first query.
Here is a part of the sql script used for creating the Oracle Text index on the column OBJFIELDURL
(this column stores a path to an xml file containing properties that have to be indexed for each object) :
begin
Ctx_Ddl.Create_Preference('wildcard_pref', 'BASIC_WORDLIST');
ctx_ddl.set_attribute('wildcard_pref', 'wildcard_maxterms', 200) ;
ctx_ddl.set_attribute('wildcard_pref','prefix_min_length',3);
ctx_ddl.set_attribute('wildcard_pref','prefix_max_length',6);
ctx_ddl.set_attribute('wildcard_pref','STEMMER','AUTO');
ctx_ddl.set_attribute('wildcard_pref','fuzzy_match','AUTO');
ctx_ddl.set_attribute('wildcard_pref','prefix_index','TRUE');
ctx_ddl.set_attribute('wildcard_pref','substring_index','TRUE');
end;
begin
ctx_ddl.create_preference('doc_lexer_perigee', 'BASIC_LEXER');
ctx_ddl.set_attribute('doc_lexer_perigee', 'printjoins', '_-');
ctx_ddl.set_attribute('doc_lexer_perigee', 'BASE_LETTER', 'YES');
ctx_ddl.set_attribute('doc_lexer_perigee','index_themes','yes');
ctx_ddl.create_preference('english_lexer','basic_lexer');
ctx_ddl.set_attribute('english_lexer','index_themes','yes');
ctx_ddl.set_attribute('english_lexer','theme_language','english');
ctx_ddl.set_attribute('english_lexer', 'printjoins', '_-');
ctx_ddl.set_attribute('english_lexer', 'BASE_LETTER', 'YES');
ctx_ddl.create_preference('german_lexer','basic_lexer');
ctx_ddl.set_attribute('german_lexer','composite','german');
ctx_ddl.set_attribute('german_lexer','alternate_spelling','GERMAN');
ctx_ddl.set_attribute('german_lexer','printjoins', '_-');
ctx_ddl.set_attribute('german_lexer', 'BASE_LETTER', 'YES');
ctx_ddl.set_attribute('german_lexer','NEW_GERMAN_SPELLING','YES');
ctx_ddl.set_attribute('german_lexer','OVERRIDE_BASE_LETTER','TRUE');
ctx_ddl.create_preference('japanese_lexer','JAPANESE_LEXER');
ctx_ddl.create_preference('global_lexer', 'multi_lexer');
ctx_ddl.add_sub_lexer('global_lexer','default','doc_lexer_perigee');
ctx_ddl.add_sub_lexer('global_lexer','german','german_lexer','ger');
ctx_ddl.add_sub_lexer('global_lexer','japanese','japanese_lexer','jpn');
ctx_ddl.add_sub_lexer('global_lexer','english','english_lexer','en');
end;
begin
ctx_ddl.create_section_group('axmlgroup', 'AUTO_SECTION_GROUP');
end;
drop index ADSOBJ_XOBJFIELDURL force;
create index ADSOBJ_XOBJFIELDURL on ADSOBJ(OBJFIELDURL) indextype is ctxsys.context
parameters
('datastore ctxsys.file_datastore
filter ctxsys.inso_filter
sync (on commit)
lexer global_lexer
language column OBJFIELDURLLANG
charset column OBJFIELDURLCHARSET
format column OBJFIELDURLFORMAT
section group axmlgroup
Wordlist wildcard_pref
Oracle created a table named DR$ADSOBJ_XOBJFIELDURL$I which now contains around 25 millions records.
ADSOBJ is the table contaings information for our documents,OBJFIELDURL is the field that contains the path to the xml file containing
data to index. That file looks like this :
<?xml version="1.0" encoding="UTF-8" ?>
<fields>
<OBJNAME><![CDATA[NomLnk_177527o.jpgp]]></OBJNAME>
<OBJREM><![CDATA[Z_CARACT_141]]></OBJREM>
<OBJID>295926o.jpgp</OBJID>
</fields>
Can someone tell me how I can make that kind of request
"select objid FROM ADSOBJ WHERE CONTAINS( OBJFIELDURL , 'ImportSax2.c WITHIN objname' , 1 ) > 0;"
run faster ?Below are the execution plan for both the 2 requests :
select objid FROM ADSOBJ WHERE CONTAINS( OBJFIELDURL , 'ImportSax2.c WITHIN objname' , 1 ) > 0
PLAN_TABLE_OUTPUT
| Id | Operation |Name |Rows |Bytes |Cost (%CPU)|
| 0 | SELECT STATEMENT | |1272 |119K | 4 (0) |
| 1 | TABLE ACCESS BY INDEX ROWID |ADSOBJ |1272 |119K | 4 (0) |
| 2 | DOMAIN INDEX |ADSOBJ_XOBJFIELDURL | | | 4 (0) |
Note
- 'PLAN_TABLE' is old version
Executed in 2 seconds
select objid FROM ADSOBJ WHERE CONTAINS( OBJFIELDURL , 'ImportSax2.c' , 1 ) > 0
PLAN_TABLE_OUTPUT
| Id |Operation |Name |Rows |Bytes |Cost (%CPU)|
| 0 | SELECT STATEMENT | |1272 |119K | 4 (0) |
| 1 | TABLE ACCESS BY INDEX ROWID |ADSOBJ |1272 |119K | 4 (0) |
| 2 | DOMAIN INDEX |ADSOBJ_XOBJFIELDURL | | | 4 (0) |
Sorry for the result formatting, I can't get it "easily" readable :( -
Oracle Text Indexing performance in Unicode database
Forum folks,
I'm looking for overall performance thoughts in Text Indexing within a Unicode database. Part of our internal testing suites includes searching on values using contains filters over indexed binary and text documents. We've architected these tests such that they could be run in a suite or on their own, thus, the data is loaded at the beginning of each test and then the text indexes are created and populated prior to running any of the actual testing.
We have the same tests running on non-unicode instances of Oracle 11gR2 just fine, but when we run them against a Unicode instance, we are almost always seeing timing issues where the indexes haven't finished populating, thus our tests are saying we've only found n number of hits when we are expecting n+ 50 or in some cases n + 150 records to be returned.
We are just looking for some general information in regards to text indexing performance in a unicode database. Will we need to add sleep time to the testing to allow for the indexes to populate? How much time? We would rather not get into having to create different tests for unicode vs non-unicode, but perhaps that is necessary.
Any insight you could provide would be most appreciated.
Thanks in advance,
DanRoger,
Thanks much for your quick reply...
When you talk about Unicode, do you mean AL32UTF8?
--> Yes, this is the Unicode charset we are using.
Is the data the same in both cases, or are you indexing simple 7-bit ascii data in the one database, and foreign text (maybe Chinese?) in the UTF8 database?
With the same data, there should be virtually no difference in performance due to the AL32UTF8 database character set.
--> We have a data generation tool we utilize. For non-unicode data, we generate using all 256 characters in the ISO-8859-1 set. With our Unicode data for clobs, we generate using only the first 1,000 characters of UTF8 by setting up an array of code points...0 - 1000. For Blobs, we have sets of sample word documents and pdfs that are inserted, then indexed.
I'm not sure I understand your testing methodology. Do you run ( load-data, index-data, run-queries ) sequentially?
--> That is correct. We utilize the ctx_ddl package to populate the pending table and then to sync the index....The following is an example of the ddl we generate to create and populate the index:
create index "DBMEARSPARK_ORA80"."RESRESUMEDOC" on "DBMEARSPARK_ORA80"."RESUME" ("RESUMEDOC") indextype is CTXSYS.CONTEXT parameters(' nopopulate sync (every "SYSTIMESTAMP + INTERVAL ''30'' MINUTE" PARALLEL 2) filter ctxsys.auto_filter ') PARALLEL 2;
execute ctx_ddl.populate_pending('"DBMEARSPARK_ORA80"."RESRESUMEDOC"',null);
execute ctx_ddl.sync_index('"DBMEARSPARK_ORA80"."RESRESUMEDOC"',null,null,2);
If so, there should be no way that the indexes can be half-created. If not, don't you have some check to see if the index creation has finished before running the query test?
--> Excellent question....is there such a check? I have not found a way to do that yet...
Were you just lucky with the "non-unicode" tests that the indexing just happened to have always finished by the time you ran the queries?
--> This is quite possible as well. If there is a check to see if the index is ready, then we could add that into our infrastructure.
--> Thanks, again, for responding so quickly.
Edited by: djulson on Feb 12, 2013 7:13 AM -
Performance issue with Oracle Text index
Hi Experts,
We are on Oracle 11.2..0.3 on Solaris 10. I have implemented Oracle Text in our environment and I am facing a strange performance issue that is happening in our environment.
One sql having CONTAINS clause is taking forever - more than 20 minutes and still does not complete. This sql has a contains clause and an exists clause and a not exists clause.
Now if I remove the exists clause and a not exists clause , it completes fast. but with those two clauses it is just taking forever. It is late night so i am not able to post the table and sql query details and will do so tomorrow but based on this general description, are there any pointers for me to review?
sql query doing fine:
SELECT
U.CLNT_OID, U.USR_OID, S.MAILADDR
FROM
access_usr U
INNER JOIN access_sia S
ON S.USR_OID = U.USR_OID AND S.CLNT_OID = U.CLNT_OID
WHERE U.CLNT_OID = 'ABCX32S'
AND CONTAINS(LAST_NAME , 'TO%' ) >0
--sql query that hangs forever:
SELECT
U.CLNT_OID, U.USR_OID, S.MAILADDR
FROM
access_usr U
INNER JOIN access_sia S
ON S.USR_OID = U.USR_OID AND S.CLNT_OID = U.CLNT_OID
WHERE U.CLNT_OID = 'ABCX32S'
AND CONTAINS(LAST_NAME , 'TO%' ) >0
and exists (--one clause here wiht a few table joins)
and not exists (--one clause here wiht a few table joins);
--Now another strange thing I found is if instead of 'TO%' in this sql, if I were to use 'ZZ%' or 'L1%' it works fast but for 'TO%' it goes slow with those two exists not exists clauses!
I will be most thankful for the inputs.
OrauserNHi Barbara,
First of all, thanks a lot for reviewing the issue.
Unluckily making the change to empty_stoplist did not work out. I am today copying the entire sql here that has this issue and will be most thankful for more insights/pointers on what can be done.
Here is the entire sql:
SELECT U.CLNT_OID,
U.USR_OID,
S.EMAILADDRESS,
U.FIRST_NAME,
U.LAST_NAME,
S.JOBCODE,
S.LOCATION,
S.DEPARTMENT,
S.ASSOCIATEID,
S.ENTERPRISECOMPANYCODE,
S.EMPLOYEEID,
S.PAYGROUP,
S.PRODUCTLOCALE
FROM ACCESS_USR U
INNER JOIN
ACCESS_SIA S
ON S.USR_OID = U.USR_OID AND S.CLNT_OID = U.CLNT_OID
WHERE U.CLNT_OID = 'G39NY3D25942TXDA'
AND EXISTS
(SELECT 1
FROM ACCESS_USR_GROUP_XREF UGX
INNER JOIN ACCESS_GROUP RELG
ON RELG.CLNT_OID = UGX.CLNT_OID
AND RELG.GROUP_OID = UGX.GROUP_OID
INNER JOIN ACCESS_GROUP G
ON G.CLNT_OID = RELG.CLNT_OID
AND G.GROUP_TYPE_OID = RELG.GROUP_TYPE_OID
WHERE UGX.CLNT_OID = U.CLNT_OID
AND UGX.USR_OID = U.USR_OID
AND G.GROUP_OID = 920512943
AND UGX.INCLUDED = 1)
AND NOT EXISTS
(SELECT 1
FROM ACCESS_USR_GROUP_XREF UGX
INNER JOIN
ACCESS_GROUP G
ON G.CLNT_OID = UGX.CLNT_OID
AND G.GROUP_OID = UGX.GROUP_OID
WHERE UGX.CLNT_OID = U.CLNT_OID
AND UGX.USR_OID = U.USR_OID
AND G.GROUP_OID = 920512943
AND UGX.INCLUDED = 1)
AND CONTAINS (U.LAST_NAME, 'Bon%') > 0;
Like I said before if the EXISTS and NOT EXISTS clause are removed it works in sub-second. But with those EXISTS and NOT EXISTS CLAUSE IT TAKES ANY WHERE FROM 25 minutes to more than one hour.
NOte also that it was not TO% but Bon% in the CONTAINS clause that is giving the issue - sorry that was wrong on my part.
Also please see below the ORACLE TEXT index defined on the table ACCESS_USER:
--definition of preferences used in the index:
SET SERVEROUTPUT ON size unlimited
WHENEVER SQLERROR EXIT SQL.SQLCODE
DECLARE
v_err VARCHAR2 (1000);
v_sqlcode NUMBER;
v_count NUMBER;
BEGIN
ctxsys.ctx_ddl.create_preference ('cust_lexer', 'BASIC_LEXER');
ctxsys.ctx_ddl.set_attribute ('cust_lexer', 'base_letter', 'YES'); -- removes diacritics
EXCEPTION
WHEN OTHERS
THEN
v_err := SQLERRM;
v_sqlcode := SQLCODE;
v_count := INSTR (v_err, 'DRG-10701');
IF v_count > 0
THEN
DBMS_OUTPUT.put_line (
'The required preference named CUST_LEXER with BASIC LEXER is already set up');
ELSE
RAISE;
END IF;
END;
DECLARE
v_err VARCHAR2 (1000);
v_sqlcode NUMBER;
v_count NUMBER;
BEGIN
ctxsys.ctx_ddl.create_preference ('cust_wl', 'BASIC_WORDLIST');
ctxsys.ctx_ddl.set_attribute ('cust_wl', 'SUBSTRING_INDEX', 'true'); -- to improve performance
EXCEPTION
WHEN OTHERS
THEN
v_err := SQLERRM;
v_sqlcode := SQLCODE;
v_count := INSTR (v_err, 'DRG-10701');
IF v_count > 0
THEN
DBMS_OUTPUT.put_line (
'The required preference named CUST_WL with BASIC WORDLIST is already set up');
ELSE
RAISE;
END IF;
END;
--now below is the code of the index:
CREATE INDEX ACCESS_USR_IDX3 ON ACCESS_USR
(FIRST_NAME)
INDEXTYPE IS CTXSYS.CONTEXT
PARAMETERS('LEXER cust_lexer WORDLIST cust_wl SYNC (ON COMMIT)');
CREATE INDEX ACCESS_USR_IDX4 ON ACCESS_USR
(LAST_NAME)
INDEXTYPE IS CTXSYS.CONTEXT
PARAMETERS('LEXER cust_lexer WORDLIST cust_wl SYNC (ON COMMIT)');
The strange thing is that, like I said, If I remove the exists clause the query returns very fast. Also if I modify the query to use only one NOT EXISTS clause and remove the other EXISTS clause it returns in less than one second. Also if I remove the EXISTS clause and use only the NOT EXISTS clause it returns in less than 4 seconds. But with both clauses it runs forever!
When I tried to get dbms_xplan.display_cursor to get the query plan (for the case of both exists and not exists clause in the query), it said that previous statement's sql id was 0 or something like that so that I was not able to see the query plan. I will keep trying to get this plan (it takes 25 minutes to one hour each time but will get this info soon). Again any pointers are most helpful.
Regards
OrauserN -
Regular expression vs oracle text performance
Does anyone have experience with comparig performance of regular expression vs oracle text?
We need to implement a text search on a large volume table, 100K-500K rows.
The select stmt will select from a VL, a view joining 2 tables, B and _TL.
We need to search 2 text columns from this _VL view.
Using regex seems less complex, but the deciding factor is of course performace.
Would oracle text search perform better than regular expression in general?
Thanks,
MargaretHi Dominc,
Thanks, we'll try both...
Would you be able to validate our code to create the multi-table index:
CREATE OR REPLACE PACKAGE requirements_util AS
PROCEDURE concat_columns(i_rowid IN ROWID, io_text IN OUT NOCOPY VARCHAR2);
END requirements_util;
CREATE OR REPLACE PACKAGE BODY requirements_util AS
PROCEDURE concat_columns(i_rowid IN ROWID, io_text IN OUT NOCOPY VARCHAR2)
AS
tl_req pjt_requirements_tl%ROWTYPE;
b_req pjt_requirements_b%ROWTYPE;
CURSOR cur_req_name (i_rqmt_id IN pjt_requirements_tl.rqmt_id%TYPE) IS
SELECT rqmt_name FROM pjt_requirements_tl
WHERE rqmt_id = i_rqmt_id;
PROCEDURE add_piece(i_add_str IN VARCHAR2) IS
lx_too_big EXCEPTION;
PRAGMA EXCEPTION_INIT(lx_too_big, -6502);
BEGIN
io_text := io_text||' '||i_add_str;
EXCEPTION WHEN lx_too_big THEN NULL; -- silently don't add the string.
END add_piece;
BEGIN
BEGIN
SELECT * INTO b_req FROM pjt_requirements_b WHERE ROWID = i_rowid;
EXCEPTION
WHEN NO DATA_FOUND THEN
RETURN;
END;
add_piece(b_req.req_code);
FOR tl_req IN cur_req_name(b_req.rqmt_id) LOOP
add_piece(tl_req.rqmt_name);
END concat_columns;
END requirements_util;
EXEC ctx_ddl.drop_section_group('rqmt_sectioner');
EXEC ctx_ddl.drop_preference('rqmt_user_ds');
BEGIN
ctx_ddl.create_preference('rqmt_user_ds', 'USER_DATASTORE');
ctx_ddl.set_attribute('rqmt_user_ds', 'procedure', sys_context('userenv','current_schema')||'.'||'requirements_util.concat_columns');
ctx_ddl.set_attribute('rqmt_user_ds', 'output_type', 'VARCHAR2');
END;
CREATE INDEX rqmt_cidx ON pjt_requirements_b(req_code)
INDEXTYPE IS CTXSYS.CONTEXT
PARAMETERS ('DATASTORE rqmt_user_ds
SYNC (ON COMMIT)'); -
Oracle Text ALTER INDEX Performance
Greetings,
We have encountered some ehancement issues with Oracle Text and really need assistance.
We are using Oracle 9i (Release 9.0.1) Standard Edition
We are using a very simple Oracle text environmet, with CTXSYS.CONTEXT indextype on Domain Indexes.
We have indexed two text columns in one table, one of these columns is CLOB.
Currently if one of these columns is modified, we are using a trigger to automatically ALTER the index.
This is very slow, it is just like dropping the index and creating it again.
Is this right? should it be this slow?
We are also trying to use the ONLINE parameter for ALTER INDEX and CREATE INDEX, but it gives an error saying this feature is not enabled.
How can we enable it?
Is there any way in improving the performance of this automatic update of the indexes?
Would using a trigger be the best way to do this?
How can we optimize it to a more satifactory performance level?
Also, are we able to use the language lexers for indexes with the Standard Edition. If so, how do you enable the CTX_DLL?
Many thanks for any assistance.
Chi-Shyan WangIf you are going to sync your index on every update, you need to make sure that you are optmizing it on a regular basis to remove index fragmentation and remove deleted rows.
you can set up a dmbs_job to do a ctx_ddl.optmize and run a full optmize periodically.
Also, depending on the number of rows you have, and also the size of the data, you might want to look at using a CTXCAT index, which is transactional, stays in sync automatically and does not need to be optimized. CTXCAT indexes do not work well on large text objects (they are good for a couple lines of text at most) so they may not suit your dataset. -
Oracle Text performance -- failed attempts
We are trying to implement a simple search of text data stored in a heavily used table (inserts/updates). There are 3 columns to index --
Headline (varchar2(255))
Subheadline (varchar2(255))
Teaser (varchar2(4000))
The first attempt to implement Oracle text w/ CATSEARCH
begin
ctx_ddl.create_index_set('cms_iset');
ctx_ddl.add_index('cms_iset','poolid_cp, mediaid_cp'); /* sub-index A */
end;
---- We knew we were going to filter on poolid_cp and mediaid_cp ---
CREATE INDEX cms_headlineidx ON con_properties (headline)
INDEXTYPE IS ctxsys.CTXCAT
PARAMETERS ('index set cms_iset');
CREATE INDEX cms_subheadlineidx ON con_properties (subheadline)
INDEXTYPE IS ctxsys.CTXCAT
PARAMETERS ('index set cms_iset');
CREATE INDEX cms_teaseridx ON con_properties (teaser)
INDEXTYPE IS ctxsys.CTXCAT
PARAMETERS ('index set cms_iset');
*********THE RESULTS*************
Our application server would spin up threads that would appear to be hanging. The load on the DB servers (RAC) were higher than normal. This implementation would have saved on having to do resync's manually.
The next attempt was implementing w/ CONTEXT:
alter table con_properties add (dummy varchar2(1));
begin
ctx_ddl.create_preference('con_propsearch', 'MULTI_COLUMN_DATASTORE');
ctx_ddl.set_attribute('con_propsearch', 'columns', 'headline,subheadline,teaser');
end;
CREATE INDEX con_properties_searchidx
ON con_properties(dummy)
INDEXTYPE IS CTXSYS.CONTEXT
PARAMETERS ('datastore CTXSYS.con_propsearch')
Records getting put into the ctx_user_pending table a few hundred per hour.
********THE RESULTS*************
Same issue with the application servers spinning off threads that seem to be hung. Spikey load on the DB servers (RAC).
NOTE: In both implementations, running search querys ran OK. However, dropping the text index in BOTH cases caused the application servers to behave normally.
Can anyone tell me what's going on internally with Oracle TEXT when a table is heavily inserted and updated? What is going on in the background. Is there some sort of lock that the app servers are waiting on? I know there is "overhead" with inserts on a normal b-tree index. Is it "exponential" with Oracle Text?
Thank you!When documents in the base table are inserted, updated, or deleted, their ROWIDs are held in a DML queue until you synchronize the index. You can view this queue with the CTX_USER_PENDING view. Apparently, you are not synchronizing your context index, so the queue is building infinitely. You need to establish some method of synchronizing your index. You can use parameters('sync(on commit)') in your index creation or create an after insert or update statement level trigger, not row trigger, that uses dbms_job.submit to schedule ctx_ddl.sync_index to synchronize the index upon commit of the dml or you can manually run ctx_ddl.sync_index periodically or schedule it or you can alter and rebuild your index periodically or you can drop and recreate it periodically. Which method you choose depends on how current the information that you query needs to be. If your data needs to be current up to the moment, the you should sync on commit. Otherwise it may be better to do it in periodic batches.
-
How do I get Oracle Text to index files on a file server?
I am new to Oracle (I'm a MS-SQL DBA looking for a Full-Text Search solution that is better than linking to a MS index server.)
So - Here's the objective:
I have Oracle Server(Express) installed on a Windows server.
I would like for Oracle to build a Full-Text Catalog of the files on a separate file server based on file paths in a table in the database.
(No desire to store terabytes of images and documents inside the database)
I can get Oracle text up and running, using the URL_Datastore:
CREATE TABLE files (id NUMBER PRIMARY KEY, issue_id NUMBER, path VARCHAR(255) UNIQUE, ot_format VARCHAR(6), ot_version VARCHAR(10));
The Compaq server is a remote windows server on my local workgroup, so the fully qualified path is just "compaq" and the URL is valid:
INSERT INTO files VALUES (9,9,'file://Compaq/FTQ/00000003.pdf',NULL,NULL);
INSERT INTO files VALUES (13,13,'file://Compaq/FTQ/01.txt',NULL,NULL);
CREATE INDEX file_index ON files(path) INDEXTYPE IS ctxsys.context
PARAMETERS ('datastore ctxsys.URL_DATASTORE format column ot_format');
but when I enter:
Select * from CTX_User_Index_errors, I see the following errors:
DRG-11609: URL store: unable to open local file specified by file://Compaq/FTQ/00000003.pdf
DRG-11609: URL store: unable to open local file specified by file://Compaq/FTQ/01.txt
Did I miss something?
Do I need to install anything on the file server?
I would like to convince my company that Oracle can be much quicker than Microsoft's Indexing Service because it can avoid joining two large result sets (one result set from Full_text (indexing service) and one for specific data contained in fields in the MS-SQL database.) Full Text Searches commonly take 40 - 60 seconds where there are 1.5 million multi-page PDF files for a particular set that I sample search on. Without this massive join, I believe I can get the search to run in under 10 seconds.Thank you!
File_Datastore worked fine.
I was staying away from File_Datastore because the information I gathered from googling suggested that file_datastore would only work locally.
Now I just have to get Oracle to pull data out of tables in a MS-SQL database on the local network (don't have a clue yet), and then have it index compiled file paths.
Then MS-SQL can query Oracle with index and full-text criteria and Oracle can send back a result set
It may sound like a bad way of performing Full-Text Queries, but anything will be better than the way things are currently running. We are currently performing Full Text Searches on a table that is rebuilt nightly, so the table containing millions of file paths is not live..
It would be so much better if we just migrated to Oracle, but we currently do not have the resources. -
Pre-loading Oracle text in memory with Oracle 12c
There is a white paper from Roger Ford that explains how to load the Oracle index in memory : http://www.oracle.com/technetwork/database/enterprise-edition/mem-load-082296.html
In our application, Oracle 12c, we are indexing a big XML field (which is stored as XMLType with storage secure file) with the PATH_SECTION_GROUP. If I don't load the I table (DR$..$I) into memory using the technique explained in the white paper then I cannot have decent performance (and especially not predictable performance, it looks like if the blocks from the TOKEN_INFO columns are not memory then performance can fall sharply)
But after migrating to oracle 12c, I got a different problem, which I can reproduce: when I create the index it is relatively small (as seen with ctx_report.index_size) and by applying the technique from the whitepaper, I can pin the DR$ I table into memory. But as soon as I do a ctx_ddl.optimize_index('Index','REBUILD') the size becomes much bigger and I can't pin the index in memory. Not sure if it is bug or not.
What I found as work-around is to build the index with the following storage options:
ctx_ddl.create_preference('TEST_STO','BASIC_STORAGE');
ctx_ddl.set_attribute ('TEST_STO', 'BIG_IO', 'YES' );
ctx_ddl.set_attribute ('TEST_STO', 'SEPARATE_OFFSETS', 'NO' );
so that the token_info column will be stored in a secure file. Then I can change the storage of that column to put it in the keep buffer cache, and write a procedure to read the LOB so that it will be loaded in the keep cache. The size of the LOB column is more or less the same as when creating the index without the BIG_IO option but it remains constant even after a ctx_dll.optimize_index. The procedure to read the LOB and to load it into the cache is very similar to the loaddollarR procedure from the white paper.
Because of the SDATA section, there is a new DR table (S table) and an IOT on top of it. This is not documented in the white paper (the white paper was written for Oracle 10g). In my case this DR$ S table is much used, and the IOT also, but putting it in the keep cache is not as important as the token_info column of the DR I table. A final note: doing SEPARATE_OFFSETS = 'YES' was very bad in my case, the combined size of the two columns is much bigger than having only the TOKEN_INFO column and both columns are read.
Here is an example on how to reproduce the problem with the size increasing when doing ctx_optimize
1. create the table
drop table test;
CREATE TABLE test
(ID NUMBER(9,0) NOT NULL ENABLE,
XML_DATA XMLTYPE
XMLTYPE COLUMN XML_DATA STORE AS SECUREFILE BINARY XML (tablespace users disable storage in row);
2. insert a few records
insert into test values(1,'<Book><TITLE>Tale of Two Cities</TITLE>It was the best of times.<Author NAME="Charles Dickens"> Born in England in the town, Stratford_Upon_Avon </Author></Book>');
insert into test values(2,'<BOOK><TITLE>The House of Mirth</TITLE>Written in 1905<Author NAME="Edith Wharton"> Wharton was born to George Frederic Jones and Lucretia Stevens Rhinelander in New York City.</Author></BOOK>');
insert into test values(3,'<BOOK><TITLE>Age of innocence</TITLE>She got a prize for it.<Author NAME="Edith Wharton"> Wharton was born to George Frederic Jones and Lucretia Stevens Rhinelander in New York City.</Author></BOOK>');
3. create the text index
drop index i_test;
exec ctx_ddl.create_section_group('TEST_SGP','PATH_SECTION_GROUP');
begin
CTX_DDL.ADD_SDATA_SECTION(group_name => 'TEST_SGP',
section_name => 'SData_02',
tag => 'SData_02',
datatype => 'varchar2');
end;
exec ctx_ddl.create_preference('TEST_STO','BASIC_STORAGE');
exec ctx_ddl.set_attribute('TEST_STO','I_TABLE_CLAUSE','tablespace USERS storage (initial 64K)');
exec ctx_ddl.set_attribute('TEST_STO','I_INDEX_CLAUSE','tablespace USERS storage (initial 64K) compress 2');
exec ctx_ddl.set_attribute ('TEST_STO', 'BIG_IO', 'NO' );
exec ctx_ddl.set_attribute ('TEST_STO', 'SEPARATE_OFFSETS', 'NO' );
create index I_TEST
on TEST (XML_DATA)
indextype is ctxsys.context
parameters('
section group "TEST_SGP"
storage "TEST_STO"
') parallel 2;
4. check the index size
select ctx_report.index_size('I_TEST') from dual;
it says :
TOTALS FOR INDEX TEST.I_TEST
TOTAL BLOCKS ALLOCATED: 104
TOTAL BLOCKS USED: 72
TOTAL BYTES ALLOCATED: 851,968 (832.00 KB)
TOTAL BYTES USED: 589,824 (576.00 KB)
4. optimize the index
exec ctx_ddl.optimize_index('I_TEST','REBUILD');
and now recompute the size, it says
TOTALS FOR INDEX TEST.I_TEST
TOTAL BLOCKS ALLOCATED: 1112
TOTAL BLOCKS USED: 1080
TOTAL BYTES ALLOCATED: 9,109,504 (8.69 MB)
TOTAL BYTES USED: 8,847,360 (8.44 MB)
which shows that it went from 576KB to 8.44MB. With a big index the difference is not so big, but still from 14G to 19G.
5. Workaround: use the BIG_IO option, so that the token_info column of the DR$ I table will be stored in a secure file and the size will stay relatively small. Then you can load this column in the cache using a procedure similar to
alter table DR$I_TEST$I storage (buffer_pool keep);
alter table dr$i_test$i modify lob(token_info) (cache storage (buffer_pool keep));
rem: now we must read the lob so that it will be loaded in the keep buffer pool, use the prccedure below
create or replace procedure loadTokenInfo is
type c_type is ref cursor;
c2 c_type;
s varchar2(2000);
b blob;
buff varchar2(100);
siz number;
off number;
cntr number;
begin
s := 'select token_info from DR$i_test$I';
open c2 for s;
loop
fetch c2 into b;
exit when c2%notfound;
siz := 10;
off := 1;
cntr := 0;
if dbms_lob.getlength(b) > 0 then
begin
loop
dbms_lob.read(b, siz, off, buff);
cntr := cntr + 1;
off := off + 4096;
end loop;
exception when no_data_found then
if cntr > 0 then
dbms_output.put_line('4K chunks fetched: '||cntr);
end if;
end;
end if;
end loop;
end;
Rgds, PierreI have been working a lot on that issue recently, I can give some more info.
First I totally agree with you, I don't like to use the keep_pool and I would love to avoid it. On the other hand, we have a specific use case : 90% of the activity in the DB is done by queuing and dbms_scheduler jobs where response time does not matter. All those processes are probably filling the buffer cache. We have a customer facing application that uses the text index to search the database : performance is critical for them.
What kind of performance do you have with your application ?
In my case, I have learned the hard way that having the index in memory (the DR$I table in fact) is the key : if it is not, then performance is poor. I find it reasonable to pin the DR$I table in memory and if you look at competitors this is what they do. With MongoDB they explicitly says that the index must be in memory. With elasticsearch, they use JVM's that are also in memory. And effectively, if you look at the awr report, you will see that Oracle is continuously accessing the DR$I table, there is a SQL similar to
SELECT /*+ DYNAMIC_SAMPLING(0) INDEX(i) */
TOKEN_FIRST, TOKEN_LAST, TOKEN_COUNT, ROWID
FROM DR$idxname$I
WHERE TOKEN_TEXT = :word AND TOKEN_TYPE = :wtype
ORDER BY TOKEN_TEXT, TOKEN_TYPE, TOKEN_FIRST
which is continuously done.
I think that the algorithm used by Oracle to keep blocks in cache is too complex. A just realized that in 12.1.0.2 (was released last week) there is finally a "killer" functionality, the in-memory parameters, with which you can pin tables or columns in memory with compression, etc. this looks ideal for the text index, I hope that R. Ford will finally update his white paper :-)
But my other problem was that the optimize_index in REBUILD mode caused the DR$I table to double in size : it seems crazy that this was closed as not a bug but it was and I can't do anything about it. It is a bug in my opinion, because the create index command and "alter index rebuild" command both result in a much smaller index, so why would the guys that developped the optimize function (is it another team, using another algorithm ?) make the index two times bigger ?
And for that the track I have been following is to put the index in a 16K tablespace : in this case the space used by the index remains more or less flat (increases but much more reasonably). The difficulty here is to pin the index in memory because the trick of R. Ford was not working anymore.
What worked:
first set the keep_pool to zero and set the db_16k_cache_size to instead. Then change the storage preference to make sure that everything you want to cache (mostly the DR$I) table come in the tablespace with the non-standard block size of 16k.
Then comes the tricky part : the pre-loading of the data in the buffer cache. The problem is that with Oracle 12c, Oracle will use direct_path_read for FTS which basically means that it bypasses the cache and read directory from file to the PGA !!! There is an event to avoid that, I was lucky to find it on a blog (I can't remember which, sorry for the credit).
I ended-up doing that. the events to 10949 is to avoid the direct path reads issue.
alter session set events '10949 trace name context forever, level 1';
alter table DR#idxname0001$I cache;
alter table DR#idxname0002$I cache;
alter table DR#idxname0003$I cache;
SELECT /*+ FULL(ITAB) CACHE(ITAB) */ SUM(TOKEN_COUNT), SUM(LENGTH(TOKEN_INFO)) FROM DR#idxname0001$I;
SELECT /*+ FULL(ITAB) CACHE(ITAB) */ SUM(TOKEN_COUNT), SUM(LENGTH(TOKEN_INFO)) FROM DR#idxname0002$I;
SELECT /*+ FULL(ITAB) CACHE(ITAB) */ SUM(TOKEN_COUNT), SUM(LENGTH(TOKEN_INFO)) FROM DR#idxname0003$I;
SELECT /*+ INDEX(ITAB) CACHE(ITAB) */ SUM(LENGTH(TOKEN_TEXT)) FROM DR#idxname0001$I ITAB;
SELECT /*+ INDEX(ITAB) CACHE(ITAB) */ SUM(LENGTH(TOKEN_TEXT)) FROM DR#idxname0002$I ITAB;
SELECT /*+ INDEX(ITAB) CACHE(ITAB) */ SUM(LENGTH(TOKEN_TEXT)) FROM DR#idxname0003$I ITAB;
It worked. With a big relief I expected to take some time out, but there was a last surprise. The command
exec ctx_ddl.optimize_index(idx_name=>'idxname',part_name=>'partname',optlevel=>'REBUILD');
gqve the following
ERROR at line 1:
ORA-20000: Oracle Text error:
DRG-50857: oracle error in drftoptrebxch
ORA-14097: column type or size mismatch in ALTER TABLE EXCHANGE PARTITION
ORA-06512: at "CTXSYS.DRUE", line 160
ORA-06512: at "CTXSYS.CTX_DDL", line 1141
ORA-06512: at line 1
Which is very much exactly described in a metalink note 1645634.1 but in the case of a non-partitioned index. The work-around given seemed very logical but it did not work in the case of a partitioned index. After experimenting, I found out that the bug occurs when the partitioned index is created with dbms_pclxutil.build_part_index procedure (this enables enables intra-partition parallelism in the index creation process). This is a very annoying and stupid bug, maybe there is a work-around, but did not find it on metalink
Other points of attention with the text index creation (stuff that surprised me at first !) ;
- if you use the dbms_pclxutil package, then the ctx_output logging does not work, because the index is created immediately and then populated in the background via dbms_jobs.
- this in combination with the fact that if you are on a RAC, you won't see any activity on the box can be very frightening : this is because oracle can choose to start the workers on the other node.
I understand much better how the text indexing works, I think it is a great technology which can scale via partitioning. But like always the design of the application is crucial, most of our problems come from the fact that we did not choose the right sectioning (we choosed PATH_SECTION_GROUP while XML_SECTION_GROUP is so much better IMO). Maybe later I can convince the dev to change the sectionining, especially because SDATA and MDATA section are not supported with PATCH_SECTION_GROUP (although it seems to work, even though we had one occurence of a bad result linked to the existence of SDATA in the index definition). Also the whole problematic of mixed structured/unstructured searches is completly tackled if one use XML_SECTION_GROUP with MDATA/SDATA (but of course the app was written for Oracle 10...)
Regards, Pierre -
Suggestion: Oracle text CONTEXT index on one or more columns ?
Hi,
I'm implementing Oracle text using CONTEXT ..... and would like to ask you for performance suggestion ...
I have a table of Articles .... with columns .. TITLE, SUBTITLE , BODY ...
Now is it better from performance point of view to move all three columns into one dummy column ... with name like FULLTEXT ... and put index on this single column,
and then use CONTAINS(FULLTEXT,'...')>0
Or is it almost the same for oracle if i put indexes on all three columns and then call:
CONTAINS(TITLE,'...')>0 OR CONTAINS(SUBTITLE,'...')>0 OR CONTAINS(BODY,'...')>0
I actually don't care if the result is a match in TITLE OR SUBTITLE OR BODY ....
So if i move into some FULLTEXT column, then i have duplicate data in a article row ... but if i create indexes for each column, than oracle has 2x more to index,optimize and search ... am I wright ?
Table has 1.8mil records ...
Thank you.
Krismackrispi wrote:
Now is it better from performance point of view to move all three columns into one dummy column ... with name like FULLTEXT ... and put index on this single column,
and then use CONTAINS(FULLTEXT,'...')>0What version of Oracle are you on? If 11 then you could use a virtual column to do this, otherwise you'd have to write code to maintain the column which can get messy.
mackrispi wrote:
Or is it almost the same for oracle if i put indexes on all three columns and then call:
CONTAINS(TITLE,'...')>0 OR CONTAINS(SUBTITLE,'...')>0 OR CONTAINS(BODY,'...')>0Benchmark it and find out :)
Another option would be something like this.
http://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUESTION_ID:9455353124561
Were i you, i would try out those 3 approaches and see which meet your performance requirements and weigh that with the ease of implementation and administration.
Maybe you are looking for
-
Search help is not working at variable
Hi when i am aexecuting the report. at variable screen when i am selectiong like compy code , heer lot of company code are there, there in search tab i need to search perticual comp code, please let me know any note i need to apply?
-
I have a windows 7 64 bit os . Downloaded and installed iTunes for 64 bit os.Try to open ,it tells me , The file "iTunes Library.itl" cannot be read because it was created by a newer version of iTunes . Any suggestions ?
-
Hi, I am facing a problem with the Screen Painter Editor. When I am changing from the Module to the Layout, the editor is opening in the old fashion editor, so it is too difficult for me to design a screen, espacially when I have to design a StepLoop
-
Global Error Handling - Possible?
Hi there! Does anyone know if it's possible to catch unhandled errors that occur in a flex application globally? The Application.error event allows you to capture Error Events that arise when communication with the outside world fails for one reason
-
Differences: Outlooksoft 4.x Vs. SAP BPC 7.0 MS
Hello, I need urgently the following information: - What were the specifications of version 4 of OutlookSoft? - What are the improvements and differences involved in SAP BPC 7.0 MS compared to OutlookSoft 4 in all possible areas (security, architectu