Performance of Oracle Text

Hi,
I've been tasked with helping design an application that will have Oracle Text powering the search logic. The application will have millions of records (in the 30 million to 50 million range), but there's a requirement that 95% of all searches must complete in 1 second or less (!)
So, my question is: can Oracle Text meet this requirement, assuming we have the best hardware, etc.? Or should I look for another solution/approach?
Regards,
Roy

Hi Roy,
It's pretty hard to give a yes/no answer based on the limited information. I will say that Oracle's method of indexing is fairly standard (dictionary/postings list), so you are not likely to find a better solution if your records are stored in Oracle. The Oracle Text advantage is tight integration with the database: you get Oracle's storage and query optimization features when using Oracle Text. A sub-second response time is pretty tight for any search, but I don't think you'd have a better chance of hitting it with another solution.
Thanks,
Ron
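For readers new to Oracle Text, here is a minimal sketch of the kind of index and query being discussed (the table, column and index names are illustrative, not from Roy's application):

create table docs (id number primary key, body varchar2(4000));

-- CONTEXT index: builds the dictionary/postings structures in the DR$ tables
create index docs_body_ctx on docs (body)
  indextype is ctxsys.context;

-- query through the index; SCORE(1) is the relevance score for CONTAINS label 1
select id, score(1)
  from docs
 where contains(body, 'oracle AND text', 1) > 0
 order by score(1) desc;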

Similar Messages

  • Bad Query Performance in Oracle Text

    Hello everyone, I have the following problem:
    I have a table, TABLE_A from now on, of more or less 1,000,000 rows, with a CONTEXT index using FILE_DATASTORE, CTXSYS.DEFAULT_STORAGE, CTXSYS.NULL_FILTER and CTXSYS.BASIC_LEXER, and I query the index in the following way:
    SELECT /*+FIRST_ROWS*/ A.ID, B.ID2, SCORE(1) FROM TABLE_A A, TABLE_B B WHERE A.ID = B.ID AND CONTAINS(A.PATH, '<SOME KW>', 1) > 0 ORDER BY SCORE(1) DESC
    where TABLE_B has another 1,000,000 rows.
    The problem is that the query response time is much higher after some period of inactivity on those tables. How can I avoid this behavior? Those inactivity periods (no more than 20 min) are core to my application, so I always get very long response times for my queries.
    Is there any cache or cache-timing parameter that affects this behavior? I have checked the Oracle Text documentation without finding anything about it...
    More data: I am using Oracle 9.2.0.1, but I have tested with the latest patches and the behavior is the same...
    Thank you very much in advance.

    Pablo,
    This appears to be a generic database or OS issue, not a Text-specific issue. It really depends on what your application is doing.
    If your application is doing some other database activity, such as queries or DML on other non-Text tables, chances are the Oracle Text related data blocks are being aged out of the cache. You can either increase the db_cache_size init
    parameter or try to keep the Text tables' and index tables' blocks in cache using ALTER TABLE commands, as sketched below.
    If your app is doing non-database activity, then chances are your application is taking up much of the machine's physical memory, such that the OS is swapping Oracle out of memory. In that case, you may want to consider adding more memory to the machine or having Oracle run on a separate machine by itself.
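    For illustration, one way to keep those blocks cached is to move the Text internal tables to the KEEP buffer pool. This is only a sketch: the index here is assumed to be called MYIDX (so its token table is DR$MYIDX$I), and db_keep_cache_size must be sized to hold it.
    -- assumption: a CONTEXT index named MYIDX; adjust the DR$ names to your own index
    alter system set db_keep_cache_size = 512M scope=both;   -- requires an spfile
    alter table dr$myidx$i storage (buffer_pool keep);
    alter index dr$myidx$x rebuild storage (buffer_pool keep);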

  • Poor performance using Oracle Text

    We are currently using wildcard queries, and we have set up substring and prefix indexing, which is recommended in Oracle's documentation to improve performance, but we have not seen any improvement. Does anyone have any other recommendations?


  • Oracle Text: improve performance

    What is the best way to improve performance in Oracle Text?
    I read some data from a CSV file and check it against a table (CUSTOMER) to decide whether I need to do an INSERT or not.
    It takes 10 seconds to process only two rows.
    How can I improve performance?

    What exactly are you doing? Do you have an Oracle Text index on that table, or are you just manipulating text in the table?
    Show us the code you are running and we can help you further.
    Herald ten Dam

  • Performance issues and options to reduce load with Oracle text implementation

    Hi Experts,
    My database is Oracle 11.2.0.2 on Linux. We have Oracle Text implemented for fuzzy search. Our Oracle Text indexes are defined as sync on commit, as we cannot afford to have stale data. Now our application does literally thousands of inserts/updates/deletes to the columns where these Oracle Text indexes are defined. As a result, we are seeing a lot of performance impact from the Oracle Text sync routines being called on each commit. We do the index optimization every night (a full optimization at 3 AM). The Oracle Text internal operations are showing up as top SQL in our AWR report, and there are concerns that they are causing a lot of load on the DB. Since we do the full index optimization only once at night, I am wondering whether I should change that, and if I do, whether it will help us.
    For example, here is some data from one day's AWR report:
    | Elapsed Time (s) | Executions | Elapsed Time per Exec (s) | %Total | %CPU  | %IO   | SQL Id        | SQL Module | SQL Text                          |
    | 27,386.25        | 305,441    | 0.09                      | 16.50  | 15.82 | 9.98  | ddr8uck5s5kp3 |            | begin ctxsys.drvdml.com_sync_i... |
    | 14,618.81        | 213,980    | 0.07                      | 8.81   | 8.39  | 27.79 | 02yb6k216ntqf |            | begin ctxsys.syncrn(:idxownid,... |
    Full Text of above top sql:
    ddr8uck5s5kp3
    begin ctxsys.drvdml.com_sync_index(:idxname, :idxmem, :partname);
    end
    02yb6k216ntqf
    begin ctxsys.syncrn(:idxownid, :idxoname, :idxid, :ixpid, :rtabnm, :flg); end;
    Now if I do the full index optimization more often, and not just once at 3 AM, will the load on the DB due to sync on commit decrease? If yes, how often should I optimize, and doesn't the optimization itself add some load? Can someone suggest?
    Thanks,
    OrauserN

    You can query the ctx_parameters view to see what your default and maximum memory values are:
    SCOTT@orcl12c> COLUMN bytes    FORMAT 9,999,999,999
    SCOTT@orcl12c> COLUMN megabytes FORMAT 9,999,999,999
    SCOTT@orcl12c> SELECT par_name AS parameter,
      2          TO_NUMBER (par_value) AS bytes,
      3          par_value / 1048576 AS megabytes
      4  FROM   ctx_parameters
      5  WHERE  par_name IN ('DEFAULT_INDEX_MEMORY', 'MAX_INDEX_MEMORY')
      6  ORDER  BY par_name
      7  /
    PARAMETER                               BYTES      MEGABYTES
    DEFAULT_INDEX_MEMORY               67,108,864             64
    MAX_INDEX_MEMORY                1,073,741,824          1,024
    2 rows selected.
    You can set the memory value in your index parameters:
    SCOTT@orcl12c> CREATE INDEX EMPLOYEE_IDX01
      2  ON EMPLOYEES (EMP_NAME)
      3  INDEXTYPE IS CTXSYS.CONTEXT
      4  PARAMETERS ('SYNC (ON COMMIT) MEMORY 1024M')
      5  /
    Index created.
    You can also modify the default and maximum values using CTX_ADM.SET_PARAMETER:
    http://docs.oracle.com/cd/E11882_01/text.112/e24436/cadmpkg.htm#CCREF2096
    The following contains general guidelines for what to set the max_index_memory parameter and others to:
    http://docs.oracle.com/cd/E11882_01/text.112/e24435/aoptim.htm#CCAPP9274
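    For example, a sketch of raising both values with CTX_ADM.SET_PARAMETER (run as CTXSYS; the sizes below are placeholders, not recommendations):
    begin
      ctx_adm.set_parameter('max_index_memory', '2G');
      ctx_adm.set_parameter('default_index_memory', '256M');
    end;
    /
    A larger index memory lets each sync write bigger, less fragmented chunks of the $I table, which in turn reduces how much work the nightly optimization has to do.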

  • Oracle text performance

    Hi all,
    I am using Oracle Text for indexing purposes.
    The query below does a wildcard search and its performance is very poor.
    /* Formatted on 2009/08/11 16:06 (Formatter Plus v4.8.5) */
    SELECT *
      FROM (SELECT z.*, ROWNUM r
              FROM (SELECT   *
                        FROM (SELECT *
                                FROM (SELECT score (1) score,
                                             SUBSTR (note, 1, 200) note, tmplt_id,
                                             appl_name, appl_use, appl_reference,
                                             appl_entity, effctv_start_date,
                                             effctv_end_date, eq_name, note_id,
                                             Bfcommon.getvaluebasedontemplate
                                                                 (tmplt_id,
                                                                  note_id
                                                                 ) VALUE,
                                             (SELECT em_name
                                                FROM ip_eq
                                               WHERE eq_name = a.eq_name) em_name,
                                             Bfcommon.is_edit_allowed_note1
                                                       (a.note_id,
                                                        'PACRIM1\E317329',
                                                        SYSDATE,
                                                        SYSDATE
                                                       ) LOCKED
                                        FROM bf_note a
                                       WHERE tmplt_id NOT IN (19, 14, 16)
                                          AND effctv_start_date >=
                                                 TO_DATE ('08/12/2008 00:00:00',
                                                          'MM/DD/YYYY HH24:MI:SS')
                                          AND (   effctv_end_date <=
                                                     TO_DATE
                                                           ('08/12/2009 00:00:00',
                                                            'MM/DD/YYYY HH24:MI:SS')
                                               OR effctv_end_date IS NULL)
                                          AND contains (note, '%test%', 1) > 0) r
                              UNION
                              SELECT score (1) score, SUBSTR (note, 1, 200) note,
                                     tmplt_id, appl_name, appl_use,
                                     appl_reference, appl_entity,
                                     effctv_start_date, effctv_end_date, eq_name,
                                     a.note_id, b.trgt_name VALUE,
                                     (SELECT em_name
                                        FROM ip_eq
                                       WHERE eq_name = a.eq_name) em_name,
                                     Bfcommon.is_edit_allowed_note1
                                                       (a.note_id,
                                                        'PACRIM1\E317329',
                                                        SYSDATE,
                                                        SYSDATE
                                                       ) LOCKED
                                FROM bf_note a, om_limit_hist b
                               WHERE (   date_changed =
                                            (SELECT MAX (date_changed)
                                               FROM om_limit_hist h1
                                              WHERE date_changed <
                                                       (SELECT MAX (date_changed)
                                                          FROM om_limit_hist h2
                                                         WHERE h2.trgt_name =
                                                                       b.trgt_name)
                                                AND h1.trgt_name = b.trgt_name)
                                      OR date_changed =
                                               (SELECT MAX (date_changed)
                                                  FROM om_limit_hist h1
                                                  WHERE h1.trgt_name = b.trgt_name))
                                  AND b.note_id = a.note_id(+)
                                 AND tmplt_id = 19
                                  AND effctv_start_date >=
                                         TO_DATE ('08/12/2008 00:00:00',
                                                  'MM/DD/YYYY HH24:MI:SS')
                                  AND (   effctv_end_date <=
                                             TO_DATE ('08/12/2009 00:00:00',
                                                      'MM/DD/YYYY HH24:MI:SS')
                                       OR effctv_end_date IS NULL)
                                  AND contains (note, '%test%', 1) > 0)
                    ORDER BY 1 DESC) z
             WHERE ROWNUM <= 50)
    WHERE r >= 1
    Here it goes for a wildcard search on the string 'test'. Please tell me how to index it in this case.
    and its plan is
    Execution Plan
    Plan hash value: 3535478881
    | Id  | Operation                            | Name                | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
    |   0 | SELECT STATEMENT                     |                     |    50 |   128K|       |   632   (1)| 00:00:08 |
    |*  1 |  VIEW                                |                     |    50 |   128K|       |   632   (1)| 00:00:08 |
    |*  2 |   COUNT STOPKEY                      |                     |       |       |       |         |     |
    |   3 |    VIEW                              |                     |    69 |   177K|       |   632   (1)| 00:00:08 |
    |*  4 |     SORT ORDER BY STOPKEY            |                     |    69 |   177K|   376K|   632   (1)| 00:00:08 |
    |   5 |      VIEW                            |                     |    69 |   177K|       |   590   (1)| 00:00:08 |
    |   6 |       SORT UNIQUE                    |                     |    69 |  7281 |       |   590   (2)| 00:00:08 |
    |   7 |        UNION-ALL                     |                     |       |       |       |         |     |
    |*  8 |         TABLE ACCESS BY INDEX ROWID  | BF_NOTE             |    68 |  7140 |       |   585   (1)| 00:00:08 |
    |*  9 |          DOMAIN INDEX                | BF_NOTE_TEXT_SEARCH |       |       |       |   221   (0)| 00:00:03 |
    |* 10 |         FILTER                       |                     |       |       |       |         |     |
    |  11 |          NESTED LOOPS                |                     |     1 |   141 |       |     3   (0)| 00:00:01 |
    |  12 |           TABLE ACCESS FULL          | OM_LIMIT_HIST       |     1 |    36 |       |     2   (0)| 00:00:01 |
    |* 13 |           TABLE ACCESS BY INDEX ROWID| BF_NOTE             |     1 |   105 |       |     1   (0)| 00:00:01 |
    |* 14 |            INDEX UNIQUE SCAN         | BF_NOTE_PK          |     1 |       |       |     1   (0)| 00:00:01 |
    |  15 |          SORT AGGREGATE              |                     |     1 |    23 |       |         |     |
    |* 16 |           INDEX RANGE SCAN           | OM_LIMIT_HIST_IDX1  |     1 |    23 |       |     0   (0)| 00:00:01 |
    |  17 |            SORT AGGREGATE            |                     |     1 |    23 |       |         |     |
    |* 18 |             INDEX RANGE SCAN         | OM_LIMIT_HIST_IDX1  |     1 |    23 |       |     0   (0)| 00:00:01 |
    |  19 |            SORT AGGREGATE            |                     |     1 |    23 |       |         |     |
    |* 20 |             INDEX RANGE SCAN         | OM_LIMIT_HIST_IDX1  |     1 |    23 |       |     0   (0)| 00:00:01 |
    Predicate Information (identified by operation id):
       1 - filter("R">=1)
       2 - filter(ROWNUM<=50)
       4 - filter(ROWNUM<=50)
       8 - filter("EFFCTV_START_DATE">=TO_DATE(' 2008-08-12 00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND
                  "TMPLT_ID"19 AND "TMPLT_ID"14 AND "TMPLT_ID"16 AND ("EFFCTV_END_DATE" IS NULL OR
                  "EFFCTV_END_DATE"<=TO_DATE(' 2009-08-12 00:00:00', 'syyyy-mm-dd hh24:mi:ss')))
       9 - access("CTXSYS"."CONTAINS"("NOTE",'%test%',1)>0)
      10 - filter("DATE_CHANGED"= (SELECT /*+ */ MAX("DATE_CHANGED") FROM "OM_LIMIT_HIST" "H1" WHERE
                  "H1"."TRGT_NAME"=:B1 AND "DATE_CHANGED"< (SELECT /*+ */ MAX("DATE_CHANGED") FROM "OM_LIMIT_HIST" "H2" WHER
    E
                  "H2"."TRGT_NAME"=:B2)) OR "DATE_CHANGED"= (SELECT /*+ */ MAX("DATE_CHANGED") FROM "OM_LIMIT_HIST" "H1"
                  WHERE "H1"."TRGT_NAME"=:B3))
      13 - filter("TMPLT_ID"=19 AND "EFFCTV_START_DATE">=TO_DATE(' 2008-08-12 00:00:00', 'syyyy-mm-dd
                  hh24:mi:ss') AND "CTXSYS"."CONTAINS"("NOTE",'%test%',1)>0 AND ("EFFCTV_END_DATE" IS NULL OR
                  "EFFCTV_END_DATE"<=TO_DATE(' 2009-08-12 00:00:00', 'syyyy-mm-dd hh24:mi:ss')))
      14 - access("B"."NOTE_ID"="A"."NOTE_ID")
      16 - access("H1"."TRGT_NAME"=:B1 AND "DATE_CHANGED"< (SELECT /*+ */ MAX("DATE_CHANGED") FROM
                  "OM_LIMIT_HIST" "H2" WHERE "H2"."TRGT_NAME"=:B2))
           filter("DATE_CHANGED"< (SELECT /*+ */ MAX("DATE_CHANGED") FROM "OM_LIMIT_HIST" "H2" WHERE
                  "H2"."TRGT_NAME"=:B1))
      18 - access("H2"."TRGT_NAME"=:B1)
      20 - access("H1"."TRGT_NAME"=:B1)

    What version of Oracle are you using?
    Your sample query is a good example of the benefits and costs of 'mixed' queries, and the challenges of mixed query performance. Oracle 11g has some very helpful new features (search for SDATA) that can really improve performance. (Specifically, it looks like your query does some useful date-range bounding. You need to get that into the FT index).
    In the end, it's not going to be easy to look at your reasonably complex query, understand the data and relationships, and wave the magic wand to make the thing go fast.
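    To give a concrete shape to the SDATA suggestion: in 11g the date-range bounding can be pushed into the Text index itself with a composite domain index (FILTER BY). The sketch below assumes the same BF_NOTE columns as above and is only an outline; rebuilding a production index this way needs its own testing.
    create index bf_note_text_search on bf_note (note)
      indextype is ctxsys.context
      filter by effctv_start_date, effctv_end_date
      parameters ('sync (on commit)');
    With FILTER BY in place, the optimizer can apply the EFFCTV_START_DATE/EFFCTV_END_DATE predicates inside the domain index (as SDATA) instead of fetching every text hit and filtering afterwards.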

  • Oracle text performance with context search indexes

    Search performance using context index.
    We intend to move our search engine to a new one based on Oracle Text, but we are running into some
    bad performance issues when searching.
    Our application allows the user to search stored documents by name, object identifier and annotations (previously set on objects).
    For example, suppose I want to find a document named ImportSax2.c. Depending on the parameters the user sets, our search engine builds the following
    queries:
    1) If the user explicitly asks for a search by document name, the query is =>
         select objid FROM ADSOBJ WHERE CONTAINS( OBJFIELDURL , 'ImportSax2.c WITHIN objname' , 1 ) > 0;
    2) If the user doesn't specify any extra parameters, the query is =>
         select objid FROM ADSOBJ WHERE CONTAINS( OBJFIELDURL , 'ImportSax2.c' , 1 ) > 0;
    Oracle Text needs only around 7 seconds to answer the second query, whereas it needs around 50 seconds to answer the first one.
    Here is a part of the sql script used for creating the Oracle Text index on the column OBJFIELDURL
    (this column stores a path to an xml file containing properties that have to be indexed for each object) :
    begin
    Ctx_Ddl.Create_Preference('wildcard_pref', 'BASIC_WORDLIST');
    ctx_ddl.set_attribute('wildcard_pref', 'wildcard_maxterms', 200) ;
    ctx_ddl.set_attribute('wildcard_pref','prefix_min_length',3);
    ctx_ddl.set_attribute('wildcard_pref','prefix_max_length',6);
    ctx_ddl.set_attribute('wildcard_pref','STEMMER','AUTO');
    ctx_ddl.set_attribute('wildcard_pref','fuzzy_match','AUTO');
    ctx_ddl.set_attribute('wildcard_pref','prefix_index','TRUE');
    ctx_ddl.set_attribute('wildcard_pref','substring_index','TRUE');
    end;
    begin
    ctx_ddl.create_preference('doc_lexer_perigee', 'BASIC_LEXER');
    ctx_ddl.set_attribute('doc_lexer_perigee', 'printjoins', '_-');
    ctx_ddl.set_attribute('doc_lexer_perigee', 'BASE_LETTER', 'YES');
    ctx_ddl.set_attribute('doc_lexer_perigee','index_themes','yes');
    ctx_ddl.create_preference('english_lexer','basic_lexer');
    ctx_ddl.set_attribute('english_lexer','index_themes','yes');
    ctx_ddl.set_attribute('english_lexer','theme_language','english');
    ctx_ddl.set_attribute('english_lexer', 'printjoins', '_-');
    ctx_ddl.set_attribute('english_lexer', 'BASE_LETTER', 'YES');
    ctx_ddl.create_preference('german_lexer','basic_lexer');
    ctx_ddl.set_attribute('german_lexer','composite','german');
    ctx_ddl.set_attribute('german_lexer','alternate_spelling','GERMAN');
    ctx_ddl.set_attribute('german_lexer','printjoins', '_-');
    ctx_ddl.set_attribute('german_lexer', 'BASE_LETTER', 'YES');
    ctx_ddl.set_attribute('german_lexer','NEW_GERMAN_SPELLING','YES');
    ctx_ddl.set_attribute('german_lexer','OVERRIDE_BASE_LETTER','TRUE');
    ctx_ddl.create_preference('japanese_lexer','JAPANESE_LEXER');
    ctx_ddl.create_preference('global_lexer', 'multi_lexer');
    ctx_ddl.add_sub_lexer('global_lexer','default','doc_lexer_perigee');
    ctx_ddl.add_sub_lexer('global_lexer','german','german_lexer','ger');
    ctx_ddl.add_sub_lexer('global_lexer','japanese','japanese_lexer','jpn');
    ctx_ddl.add_sub_lexer('global_lexer','english','english_lexer','en');
    end;
    begin
         ctx_ddl.create_section_group('axmlgroup', 'AUTO_SECTION_GROUP');
    end;
    drop index ADSOBJ_XOBJFIELDURL force;
    create index ADSOBJ_XOBJFIELDURL on ADSOBJ(OBJFIELDURL) indextype is ctxsys.context
    parameters
    ('datastore ctxsys.file_datastore
    filter ctxsys.inso_filter
    sync (on commit)
    lexer global_lexer
    language column OBJFIELDURLLANG
    charset column OBJFIELDURLCHARSET
    format column OBJFIELDURLFORMAT
    section group axmlgroup
    Wordlist wildcard_pref');
    Oracle created a table named DR$ADSOBJ_XOBJFIELDURL$I which now contains around 25 million records.
    ADSOBJ is the table containing the information for our documents; OBJFIELDURL is the field that contains the path to the XML file containing the
    data to index. That file looks like this:
    <?xml version="1.0" encoding="UTF-8" ?>
    <fields>
    <OBJNAME><![CDATA[NomLnk_177527o.jpgp]]></OBJNAME>
    <OBJREM><![CDATA[Z_CARACT_141]]></OBJREM>
    <OBJID>295926o.jpgp</OBJID>
    </fields>
    Can someone tell me how I can make that kind of request
    "select objid FROM ADSOBJ WHERE CONTAINS( OBJFIELDURL , 'ImportSax2.c WITHIN objname' , 1 ) > 0;"
    run faster ?

    Below are the execution plans for the two queries:
    select objid FROM ADSOBJ WHERE CONTAINS( OBJFIELDURL , 'ImportSax2.c WITHIN objname' , 1 ) > 0
    PLAN_TABLE_OUTPUT
    | Id | Operation                    | Name                | Rows | Bytes | Cost (%CPU) |
    |  0 | SELECT STATEMENT             |                     | 1272 |  119K |       4 (0) |
    |  1 |  TABLE ACCESS BY INDEX ROWID | ADSOBJ              | 1272 |  119K |       4 (0) |
    |  2 |   DOMAIN INDEX               | ADSOBJ_XOBJFIELDURL |      |       |       4 (0) |
    Note
    - 'PLAN_TABLE' is old version
    Executed in 2 seconds
    select objid FROM ADSOBJ WHERE CONTAINS( OBJFIELDURL , 'ImportSax2.c' , 1 ) > 0
    PLAN_TABLE_OUTPUT
    | Id | Operation                    | Name                | Rows | Bytes | Cost (%CPU) |
    |  0 | SELECT STATEMENT             |                     | 1272 |  119K |       4 (0) |
    |  1 |  TABLE ACCESS BY INDEX ROWID | ADSOBJ              | 1272 |  119K |       4 (0) |
    |  2 |   DOMAIN INDEX               | ADSOBJ_XOBJFIELDURL |      |       |       4 (0) |
    Sorry for the result formatting, I can't get it "easily" readable :(

  • Oracle Text Indexing performance in Unicode database

    Forum folks,
    I'm looking for overall performance thoughts in Text Indexing within a Unicode database. Part of our internal testing suites includes searching on values using contains filters over indexed binary and text documents. We've architected these tests such that they could be run in a suite or on their own, thus, the data is loaded at the beginning of each test and then the text indexes are created and populated prior to running any of the actual testing.
    We have the same tests running fine on non-Unicode instances of Oracle 11gR2, but when we run them against a Unicode instance, we almost always see timing issues where the indexes haven't finished populating, so our tests report only n hits when we are expecting n + 50 or in some cases n + 150 records to be returned.
    We are just looking for some general information in regards to text indexing performance in a unicode database. Will we need to add sleep time to the testing to allow for the indexes to populate? How much time? We would rather not get into having to create different tests for unicode vs non-unicode, but perhaps that is necessary.
    Any insight you could provide would be most appreciated.
    Thanks in advance,
    Dan

    Roger,
    Thanks much for your quick reply...
    When you talk about Unicode, do you mean AL32UTF8?
    --> Yes, this is the Unicode charset we are using.
    Is the data the same in both cases, or are you indexing simple 7-bit ascii data in the one database, and foreign text (maybe Chinese?) in the UTF8 database?
    With the same data, there should be virtually no difference in performance due to the AL32UTF8 database character set.
    --> We have a data generation tool we utilize. For non-unicode data, we generate using all 256 characters in the ISO-8859-1 set. With our Unicode data for clobs, we generate using only the first 1,000 characters of UTF8 by setting up an array of code points...0 - 1000. For Blobs, we have sets of sample word documents and pdfs that are inserted, then indexed.
    I'm not sure I understand your testing methodology. Do you run ( load-data, index-data, run-queries ) sequentially?
    --> That is correct. We utilize the ctx_ddl package to populate the pending table and then to sync the index....The following is an example of the ddl we generate to create and populate the index:
    create index "DBMEARSPARK_ORA80"."RESRESUMEDOC" on "DBMEARSPARK_ORA80"."RESUME" ("RESUMEDOC") indextype is CTXSYS.CONTEXT parameters(' nopopulate sync (every "SYSTIMESTAMP + INTERVAL ''30'' MINUTE" PARALLEL 2) filter ctxsys.auto_filter ') PARALLEL 2;
    execute ctx_ddl.populate_pending('"DBMEARSPARK_ORA80"."RESRESUMEDOC"',null);
    execute ctx_ddl.sync_index('"DBMEARSPARK_ORA80"."RESRESUMEDOC"',null,null,2);
    If so, there should be no way that the indexes can be half-created. If not, don't you have some check to see if the index creation has finished before running the query test?
    --> Excellent question....is there such a check? I have not found a way to do that yet...
    Were you just lucky with the "non-unicode" tests that the indexing just happened to have always finished by the time you ran the queries?
    --> This is quite possible as well. If there is a check to see if the index is ready, then we could add that into our infrastructure.
    --> Thanks, again, for responding so quickly.
    Edited by: djulson on Feb 12, 2013 7:13 AM
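    One simple check along those lines is to poll the DML pending queue and treat the index as caught up only when nothing is queued for it. A sketch (the index name comes from the example above; a still-running sync job would need to be accounted for separately):
    select count(*) as rows_pending
      from ctx_user_pending
     where pnd_index_name = 'RESRESUMEDOC';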

  • Performance issue with Oracle Text index

    Hi Experts,
    We are on Oracle 11.2.0.3 on Solaris 10. I have implemented Oracle Text in our environment, and I am facing a strange performance issue.
    One SQL with a CONTAINS clause is taking forever - more than 20 minutes and still does not complete. This SQL has a CONTAINS clause, an EXISTS clause and a NOT EXISTS clause.
    Now if I remove the EXISTS clause and the NOT EXISTS clause, it completes fast, but with those two clauses it just takes forever. It is late at night, so I am not able to post the table and SQL query details and will do so tomorrow, but based on this general description, are there any pointers for me to review?
    sql query doing fine:
    SELECT
        U.CLNT_OID, U.USR_OID, S.MAILADDR
    FROM
        access_usr U
        INNER JOIN access_sia S
            ON S.USR_OID = U.USR_OID AND S.CLNT_OID = U.CLNT_OID
        WHERE U.CLNT_OID = 'ABCX32S'
        AND CONTAINS(LAST_NAME , 'TO%' ) >0
    --sql query that hangs forever:
    SELECT
        U.CLNT_OID, U.USR_OID, S.MAILADDR
    FROM
        access_usr U
        INNER JOIN access_sia S
            ON S.USR_OID = U.USR_OID AND S.CLNT_OID = U.CLNT_OID
        WHERE U.CLNT_OID = 'ABCX32S'
        AND CONTAINS(LAST_NAME , 'TO%' ) >0
    and exists (--one clause here with a few table joins)
    and not exists (--one clause here with a few table joins);
    --Now another strange thing I found: if instead of 'TO%' in this SQL I use 'ZZ%' or 'L1%', it works fast, but for 'TO%' it goes slow with those two EXISTS/NOT EXISTS clauses!
    I will be most thankful for the inputs.
    OrauserN

    Hi Barbara,
    First of all, thanks a lot for reviewing the issue.
    Unfortunately, making the change to an empty stoplist did not work out. Today I am copying the entire SQL here that has this issue, and will be most thankful for more insights/pointers on what can be done.
    Here is the entire sql:
    SELECT U.CLNT_OID,
           U.USR_OID,
           S.EMAILADDRESS,
           U.FIRST_NAME,
           U.LAST_NAME,
           S.JOBCODE,
           S.LOCATION,
           S.DEPARTMENT,
           S.ASSOCIATEID,
           S.ENTERPRISECOMPANYCODE,
           S.EMPLOYEEID,
           S.PAYGROUP,
           S.PRODUCTLOCALE
      FROM    ACCESS_USR U
           INNER JOIN
              ACCESS_SIA S
           ON S.USR_OID = U.USR_OID AND S.CLNT_OID = U.CLNT_OID
    WHERE     U.CLNT_OID = 'G39NY3D25942TXDA'
           AND EXISTS
                  (SELECT 1
                     FROM ACCESS_USR_GROUP_XREF UGX
                          INNER JOIN ACCESS_GROUP RELG
                             ON     RELG.CLNT_OID = UGX.CLNT_OID
                                AND RELG.GROUP_OID = UGX.GROUP_OID
                          INNER JOIN ACCESS_GROUP G
                             ON     G.CLNT_OID = RELG.CLNT_OID
                                AND G.GROUP_TYPE_OID = RELG.GROUP_TYPE_OID
                    WHERE     UGX.CLNT_OID = U.CLNT_OID
                          AND UGX.USR_OID = U.USR_OID
                          AND G.GROUP_OID = 920512943
                          AND UGX.INCLUDED = 1)
           AND NOT EXISTS
                      (SELECT 1
                         FROM    ACCESS_USR_GROUP_XREF UGX
                              INNER JOIN
                                 ACCESS_GROUP G
                              ON     G.CLNT_OID = UGX.CLNT_OID
                                 AND G.GROUP_OID = UGX.GROUP_OID
                        WHERE     UGX.CLNT_OID = U.CLNT_OID
                              AND UGX.USR_OID = U.USR_OID
                              AND G.GROUP_OID = 920512943
                              AND UGX.INCLUDED = 1)
           AND CONTAINS (U.LAST_NAME, 'Bon%') > 0;
    Like I said before, if the EXISTS and NOT EXISTS clauses are removed, it runs in under a second. But with those EXISTS and NOT EXISTS clauses it takes anywhere from 25 minutes to more than one hour.
    Note also that it was not TO% but Bon% in the CONTAINS clause that is giving the issue - sorry, that was wrong on my part.
    Also please see below the Oracle Text index defined on the table ACCESS_USR:
    --definition of preferences used in the index:
    SET SERVEROUTPUT ON size unlimited
    WHENEVER SQLERROR EXIT SQL.SQLCODE
    DECLARE
       v_err       VARCHAR2 (1000);
       v_sqlcode   NUMBER;
       v_count     NUMBER;
    BEGIN
       ctxsys.ctx_ddl.create_preference ('cust_lexer', 'BASIC_LEXER');
       ctxsys.ctx_ddl.set_attribute ('cust_lexer', 'base_letter', 'YES'); -- removes diacritics
    EXCEPTION
       WHEN OTHERS
       THEN
          v_err := SQLERRM;
          v_sqlcode := SQLCODE;
          v_count := INSTR (v_err, 'DRG-10701');
          IF v_count > 0
          THEN
             DBMS_OUTPUT.put_line (
                'The required preference named CUST_LEXER with BASIC LEXER is already set up');
          ELSE
             RAISE;
          END IF;
    END;
    DECLARE
       v_err       VARCHAR2 (1000);
       v_sqlcode   NUMBER;
       v_count     NUMBER;
    BEGIN
       ctxsys.ctx_ddl.create_preference ('cust_wl', 'BASIC_WORDLIST');
       ctxsys.ctx_ddl.set_attribute ('cust_wl', 'SUBSTRING_INDEX', 'true'); -- to improve performance
    EXCEPTION
       WHEN OTHERS
       THEN
          v_err := SQLERRM;
          v_sqlcode := SQLCODE;
          v_count := INSTR (v_err, 'DRG-10701');
          IF v_count > 0
          THEN
             DBMS_OUTPUT.put_line (
                'The required preference named CUST_WL with BASIC WORDLIST is already set up');
          ELSE
             RAISE;
          END IF;
    END;
    --now below is the code of the index:
    CREATE INDEX ACCESS_USR_IDX3 ON ACCESS_USR
    (FIRST_NAME)
    INDEXTYPE IS CTXSYS.CONTEXT
    PARAMETERS('LEXER cust_lexer WORDLIST cust_wl SYNC (ON COMMIT)');
    CREATE INDEX ACCESS_USR_IDX4 ON ACCESS_USR
    (LAST_NAME)
    INDEXTYPE IS CTXSYS.CONTEXT
    PARAMETERS('LEXER cust_lexer WORDLIST cust_wl SYNC (ON COMMIT)');
    The strange thing is that, like I said, if I remove the EXISTS clause the query returns very fast. Also, if I modify the query to use only one NOT EXISTS clause and remove the other EXISTS clause, it returns in less than one second. And if I remove the EXISTS clause and use only the NOT EXISTS clause, it returns in less than 4 seconds. But with both clauses it runs forever!
    When I tried to get dbms_xplan.display_cursor to show the query plan (for the case with both the EXISTS and NOT EXISTS clauses), it said the previous statement's SQL id was 0 or something like that, so I was not able to see the plan. I will keep trying to get this plan (it takes 25 minutes to one hour each time, but I will get this info soon). Again, any pointers are most helpful.
    Regards
    OrauserN
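    To capture the plan of the long-running statement, one option (a sketch only; the LIKE pattern below is an assumption and must be adapted to the actual statement text) is to look up its SQL_ID in V$SQL and pass it to DBMS_XPLAN explicitly instead of relying on the last statement in the session:
    select sql_id, child_number
      from v$sql
     where sql_text like 'SELECT U.CLNT_OID,%CONTAINS%';
    select *
      from table(dbms_xplan.display_cursor('&sql_id', 0, 'ALLSTATS LAST'));
    The ALLSTATS LAST format only shows actual row counts if the statement was run with the gather_plan_statistics hint or statistics_level = all.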

  • Regular expression vs oracle text performance

    Does anyone have experience comparing the performance of regular expressions vs Oracle Text?
    We need to implement a text search on a large-volume table, 100K-500K rows.
    The select statement will select from a VL, a view joining 2 tables, B and _TL.
    We need to search 2 text columns from this _VL view.
    Using regex seems less complex, but the deciding factor is of course performance.
    Would Oracle Text search perform better than regular expressions in general?
    Thanks,
    Margaret

    Hi Dominc,
    Thanks, we'll try both...
    Would you be able to validate our code to create the multi-table index:
    CREATE OR REPLACE PACKAGE requirements_util AS
    PROCEDURE concat_columns(i_rowid IN ROWID, io_text IN OUT NOCOPY VARCHAR2);
    END requirements_util;
    CREATE OR REPLACE PACKAGE BODY requirements_util AS
    PROCEDURE concat_columns(i_rowid IN ROWID, io_text IN OUT NOCOPY VARCHAR2)
    AS
    tl_req pjt_requirements_tl%ROWTYPE;
    b_req pjt_requirements_b%ROWTYPE;
    CURSOR cur_req_name (i_rqmt_id IN pjt_requirements_tl.rqmt_id%TYPE) IS
    SELECT rqmt_name FROM pjt_requirements_tl
    WHERE rqmt_id = i_rqmt_id;
    PROCEDURE add_piece(i_add_str IN VARCHAR2) IS
    lx_too_big EXCEPTION;
    PRAGMA EXCEPTION_INIT(lx_too_big, -6502);
    BEGIN
    io_text := io_text||' '||i_add_str;
    EXCEPTION WHEN lx_too_big THEN NULL; -- silently don't add the string.
    END add_piece;
    BEGIN
         BEGIN
              SELECT * INTO b_req FROM pjt_requirements_b WHERE ROWID = i_rowid;
              EXCEPTION
          WHEN NO_DATA_FOUND THEN
              RETURN;
         END;
         add_piece(b_req.req_code);
         FOR tl_req IN cur_req_name(b_req.rqmt_id) LOOP
         add_piece(tl_req.rqmt_name);
         END LOOP;
    END concat_columns;
    END requirements_util;
    EXEC ctx_ddl.drop_section_group('rqmt_sectioner');
    EXEC ctx_ddl.drop_preference('rqmt_user_ds');
    BEGIN
    ctx_ddl.create_preference('rqmt_user_ds', 'USER_DATASTORE');
    ctx_ddl.set_attribute('rqmt_user_ds', 'procedure', sys_context('userenv','current_schema')||'.'||'requirements_util.concat_columns');
    ctx_ddl.set_attribute('rqmt_user_ds', 'output_type', 'VARCHAR2');
    END;
    CREATE INDEX rqmt_cidx ON pjt_requirements_b(req_code)
    INDEXTYPE IS CTXSYS.CONTEXT
    PARAMETERS ('DATASTORE rqmt_user_ds
    SYNC (ON COMMIT)');
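    If the datastore procedure compiles cleanly, a query against the combined document would look something like this (a sketch; the search term is arbitrary):
    select b.rqmt_id, score(1)
      from pjt_requirements_b b
     where contains(b.req_code, 'safety', 1) > 0
     order by score(1) desc;
    A hit is returned whenever the term appears either in REQ_CODE or in any of the RQMT_NAME values concatenated in by the user datastore.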

  • Oracle Text ALTER INDEX Performance

    Greetings,
    We have encountered some enhancement issues with Oracle Text and really need assistance.
    We are using Oracle 9i (Release 9.0.1) Standard Edition
    We are using a very simple Oracle Text environment, with the CTXSYS.CONTEXT indextype on domain indexes.
    We have indexed two text columns in one table, one of these columns is CLOB.
    Currently, if one of these columns is modified, we use a trigger to automatically ALTER the index.
    This is very slow; it is just like dropping the index and creating it again.
    Is this right? should it be this slow?
    We are also trying to use the ONLINE parameter for ALTER INDEX and CREATE INDEX, but it gives an error saying this feature is not enabled.
    How can we enable it?
    Is there any way in improving the performance of this automatic update of the indexes?
    Would using a trigger be the best way to do this?
    How can we optimize it to a more satisfactory performance level?
    Also, are we able to use the language lexers for indexes with Standard Edition? If so, how do you enable CTX_DDL?
    Many thanks for any assistance.
    Chi-Shyan Wang

    If you are going to sync your index on every update, you need to make sure that you are optimizing it on a regular basis to remove index fragmentation and remove deleted rows.
    You can set up a dbms_job to run ctx_ddl.optimize_index, and run a FULL optimize periodically.
    Also, depending on the number of rows you have and the size of the data, you might want to look at using a CTXCAT index, which is transactional, stays in sync automatically and does not need to be optimized. CTXCAT indexes do not work well on large text objects (they are good for a couple of lines of text at most), so they may not suit your dataset.
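    A rough sketch of the CTXCAT alternative mentioned above (the table, columns and index-set contents are illustrative only):
    begin
      ctx_ddl.create_index_set('doc_iset');
      ctx_ddl.add_index('doc_iset', 'doc_type');   -- a structured column you also filter on
    end;
    /
    create index doc_title_cat on docs (title)
      indextype is ctxsys.ctxcat
      parameters ('index set doc_iset');
    -- CTXCAT indexes are queried with CATSEARCH rather than CONTAINS
    select * from docs
     where catsearch(title, 'oracle text', 'doc_type = ''FAQ''') > 0;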

  • Oracle Text performance -- failed attempts

    We are trying to implement a simple search of text data stored in a heavily used table (inserts/updates). There are 3 columns to index --
    Headline (varchar2(255))
    Subheadline (varchar2(255))
    Teaser (varchar2(4000))
    The first attempt was to implement Oracle Text with CATSEARCH:
    begin
    ctx_ddl.create_index_set('cms_iset');
    ctx_ddl.add_index('cms_iset','poolid_cp, mediaid_cp'); /* sub-index A */
    end;
    ---- We knew we were going to filter on poolid_cp and mediaid_cp ---
    CREATE INDEX cms_headlineidx ON con_properties (headline)
    INDEXTYPE IS ctxsys.CTXCAT
    PARAMETERS ('index set cms_iset');
    CREATE INDEX cms_subheadlineidx ON con_properties (subheadline)
    INDEXTYPE IS ctxsys.CTXCAT
    PARAMETERS ('index set cms_iset');
    CREATE INDEX cms_teaseridx ON con_properties (teaser)
    INDEXTYPE IS ctxsys.CTXCAT
    PARAMETERS ('index set cms_iset');
    *********THE RESULTS*************
    Our application server would spin up threads that appeared to be hanging. The load on the DB servers (RAC) was higher than normal. This implementation would have saved us from having to do resyncs manually.
    The next attempt was implementing it with CONTEXT:
    alter table con_properties add (dummy varchar2(1));
    begin
    ctx_ddl.create_preference('con_propsearch', 'MULTI_COLUMN_DATASTORE');
    ctx_ddl.set_attribute('con_propsearch', 'columns', 'headline,subheadline,teaser');
    end;
    CREATE INDEX con_properties_searchidx
    ON con_properties(dummy)
    INDEXTYPE IS CTXSYS.CONTEXT
    PARAMETERS ('datastore CTXSYS.con_propsearch')
    Records were getting put into the ctx_user_pending view at a rate of a few hundred per hour.
    ********THE RESULTS*************
    Same issue, with the application servers spinning off threads that seem to be hung, and spiky load on the DB servers (RAC).
    NOTE: In both implementations, the search queries themselves ran OK. However, dropping the text index in BOTH cases caused the application servers to behave normally again.
    Can anyone tell me what's going on internally with Oracle Text when a table is heavily inserted into and updated? What is going on in the background? Is there some sort of lock that the app servers are waiting on? I know there is "overhead" with inserts on a normal b-tree index. Is it "exponential" with Oracle Text?
    Thank you!

    When documents in the base table are inserted, updated, or deleted, their ROWIDs are held in a DML queue until you synchronize the index. You can view this queue with the CTX_USER_PENDING view. Apparently you are not synchronizing your CONTEXT index, so the queue is building up indefinitely. You need to establish some method of synchronizing the index. You can use parameters('sync (on commit)') in your index creation; or create an AFTER INSERT OR UPDATE statement-level trigger (not a row trigger) that uses dbms_job.submit to schedule ctx_ddl.sync_index upon commit of the DML; or manually run ctx_ddl.sync_index periodically, or schedule it; or alter and rebuild the index periodically; or drop and recreate it periodically. Which method you choose depends on how current the information you query needs to be. If your data needs to be current up to the moment, then sync on commit; otherwise it may be better to do it in periodic batches. The main options are sketched below.
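    For illustration, the main sync options described above look roughly like this (a sketch; the table, column and index names are hypothetical, and you would pick only one of the CREATE INDEX variants):
    -- 1) transactional: sync as part of every commit
    create index docs_ctx on docs (body) indextype is ctxsys.context
      parameters ('sync (on commit)');
    -- 2) scheduled: let Oracle sync at an interval
    create index docs_ctx on docs (body) indextype is ctxsys.context
      parameters ('sync (every "SYSDATE + 1/24")');
    -- 3) manual or batch: run/schedule the sync yourself, e.g. from dbms_job or dbms_scheduler
    exec ctx_ddl.sync_index('DOCS_CTX', '64M');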

  • How do I get Oracle Text to index files on a file server?

    I am new to Oracle (I'm a MS-SQL DBA looking for a Full-Text Search solution that is better than linking to a MS index server.)
    So - Here's the objective:
    I have Oracle Server(Express) installed on a Windows server.
    I would like for Oracle to build a Full-Text Catalog of the files on a separate file server based on file paths in a table in the database.
    (No desire to store terabytes of images and documents inside the database)
    I can get Oracle text up and running, using the URL_Datastore:
    CREATE TABLE files (id NUMBER PRIMARY KEY, issue_id NUMBER, path VARCHAR(255) UNIQUE, ot_format VARCHAR(6), ot_version VARCHAR(10));
    The Compaq server is a remote windows server on my local workgroup, so the fully qualified path is just "compaq" and the URL is valid:
    INSERT INTO files VALUES (9,9,'file://Compaq/FTQ/00000003.pdf',NULL,NULL);
    INSERT INTO files VALUES (13,13,'file://Compaq/FTQ/01.txt',NULL,NULL);
    CREATE INDEX file_index ON files(path) INDEXTYPE IS ctxsys.context
    PARAMETERS ('datastore ctxsys.URL_DATASTORE format column ot_format');
    but when I enter:
    Select * from CTX_User_Index_errors, I see the following errors:
    DRG-11609: URL store: unable to open local file specified by file://Compaq/FTQ/00000003.pdf
    DRG-11609: URL store: unable to open local file specified by file://Compaq/FTQ/01.txt
    Did I miss something?
    Do I need to install anything on the file server?
    I would like to convince my company that Oracle can be much quicker than Microsoft's Indexing Service because it can avoid joining two large result sets (one result set from Full_text (indexing service) and one for specific data contained in fields in the MS-SQL database.) Full Text Searches commonly take 40 - 60 seconds where there are 1.5 million multi-page PDF files for a particular set that I sample search on. Without this massive join, I believe I can get the search to run in under 10 seconds.

    Thank you!
    File_Datastore worked fine.
    I was staying away from File_Datastore because the information I gathered from googling suggested that file_datastore would only work locally.
    Now I just have to get Oracle to pull data out of tables in a MS-SQL database on the local network (don't have a clue yet), and then have it index compiled file paths.
    Then MS-SQL can query Oracle with index and full-text criteria and Oracle can send back a result set
    It may sound like a bad way of performing Full-Text Queries, but anything will be better than the way things are currently running. We are currently performing Full Text Searches on a table that is rebuilt nightly, so the table containing millions of file paths is not live..
    It would be so much better if we just migrated to Oracle, but we currently do not have the resources.
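    For reference, the FILE_DATASTORE setup that replaced the URL_DATASTORE approach would look roughly like the sketch below (the preference name is a placeholder; without the optional PATH attribute, the indexed column must hold full file-system paths readable by the Oracle server's OS user, e.g. \\Compaq\FTQ\00000003.pdf):
    begin
      ctx_ddl.create_preference('my_file_ds', 'FILE_DATASTORE');
    end;
    /
    create index file_index on files (path)
      indextype is ctxsys.context
      parameters ('datastore my_file_ds format column ot_format');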

  • Pre-loading Oracle text in memory with Oracle 12c

    There is a white paper from Roger Ford that explains how to load the Oracle index in memory : http://www.oracle.com/technetwork/database/enterprise-edition/mem-load-082296.html
    In our application, on Oracle 12c, we are indexing a big XML field (stored as XMLType with SECUREFILE storage) with the PATH_SECTION_GROUP. If I don't load the $I table (DR$..$I) into memory using the technique explained in the white paper, then I cannot get decent performance (and especially not predictable performance; it looks like if the blocks from the TOKEN_INFO column are not in memory, performance can fall sharply).
    But after migrating to Oracle 12c, I got a different problem, which I can reproduce: when I create the index it is relatively small (as seen with ctx_report.index_size), and by applying the technique from the white paper I can pin the DR$ I table into memory. But as soon as I do a ctx_ddl.optimize_index('Index','REBUILD'), the size becomes much bigger and I can't pin the index in memory any more. Not sure if it is a bug or not.
    What I found as work-around is to build the index with the following storage options:
    ctx_ddl.create_preference('TEST_STO','BASIC_STORAGE');
    ctx_ddl.set_attribute ('TEST_STO', 'BIG_IO', 'YES' );
    ctx_ddl.set_attribute ('TEST_STO', 'SEPARATE_OFFSETS', 'NO' );
    so that the TOKEN_INFO column will be stored in a SecureFile. Then I can change the storage of that column to put it in the KEEP buffer cache, and write a procedure to read the LOB so that it will be loaded into the KEEP cache. The size of the LOB column is more or less the same as when creating the index without the BIG_IO option, but it remains constant even after a ctx_ddl.optimize_index. The procedure to read the LOB and load it into the cache is very similar to the loaddollarR procedure from the white paper.
    Because of the SDATA section, there is a new DR table (S table) and an IOT on top of it. This is not documented in the white paper (the white paper was written for Oracle 10g). In my case this DR$ S table is much used, and the IOT also, but putting it in the keep cache is not as important as the token_info column of the DR I table. A final note: doing SEPARATE_OFFSETS = 'YES' was very bad in my case, the combined size of the two columns is much bigger than having only the TOKEN_INFO column and both columns are read.
    Here is an example of how to reproduce the problem with the size increasing when doing ctx_ddl.optimize_index:
    1. create the table
    drop table test;
    CREATE TABLE test
    (ID NUMBER(9,0) NOT NULL ENABLE,
    XML_DATA XMLTYPE)
    XMLTYPE COLUMN XML_DATA STORE AS SECUREFILE BINARY XML (tablespace users disable storage in row);
    2. insert a few records
    insert into test values(1,'<Book><TITLE>Tale of Two Cities</TITLE>It was the best of times.<Author NAME="Charles Dickens"> Born in England in the town, Stratford_Upon_Avon </Author></Book>');
    insert into test values(2,'<BOOK><TITLE>The House of Mirth</TITLE>Written in 1905<Author NAME="Edith Wharton"> Wharton was born to George Frederic Jones and Lucretia Stevens Rhinelander in New York City.</Author></BOOK>');
    insert into test values(3,'<BOOK><TITLE>Age of innocence</TITLE>She got a prize for it.<Author NAME="Edith Wharton"> Wharton was born to George Frederic Jones and Lucretia Stevens Rhinelander in New York City.</Author></BOOK>');
    3. create the text index
    drop index i_test;
      exec ctx_ddl.create_section_group('TEST_SGP','PATH_SECTION_GROUP');
    begin
      CTX_DDL.ADD_SDATA_SECTION(group_name => 'TEST_SGP', 
                                section_name => 'SData_02',
                                tag => 'SData_02',
                                datatype => 'varchar2');
    end;
    exec ctx_ddl.create_preference('TEST_STO','BASIC_STORAGE');
    exec  ctx_ddl.set_attribute('TEST_STO','I_TABLE_CLAUSE','tablespace USERS storage (initial 64K)');
    exec  ctx_ddl.set_attribute('TEST_STO','I_INDEX_CLAUSE','tablespace USERS storage (initial 64K) compress 2');
    exec  ctx_ddl.set_attribute ('TEST_STO', 'BIG_IO', 'NO' );
    exec  ctx_ddl.set_attribute ('TEST_STO', 'SEPARATE_OFFSETS', 'NO' );
    create index I_TEST
      on TEST (XML_DATA)
      indextype is ctxsys.context
      parameters('
        section group   "TEST_SGP"
        storage         "TEST_STO"
      ') parallel 2;
    4. check the index size
    select ctx_report.index_size('I_TEST') from dual;
    it says :
    TOTALS FOR INDEX TEST.I_TEST
    TOTAL BLOCKS ALLOCATED:                                                104
    TOTAL BLOCKS USED:                                                      72
    TOTAL BYTES ALLOCATED:                                 851,968 (832.00 KB)
    TOTAL BYTES USED:                                      589,824 (576.00 KB)
    5. optimize the index
    exec ctx_ddl.optimize_index('I_TEST','REBUILD');
    and now recompute the size, it says
    TOTALS FOR INDEX TEST.I_TEST
    TOTAL BLOCKS ALLOCATED:                                               1112
    TOTAL BLOCKS USED:                                                    1080
    TOTAL BYTES ALLOCATED:                                 9,109,504 (8.69 MB)
    TOTAL BYTES USED:                                      8,847,360 (8.44 MB)
    which shows that it went from 576KB to 8.44MB. With a big index the difference is not so big, but still from 14G to 19G.
    6. Workaround: use the BIG_IO option, so that the TOKEN_INFO column of the DR$ I table will be stored in a SecureFile and the size will stay relatively small. Then you can load this column into the cache using a procedure similar to
    alter table DR$I_TEST$I storage (buffer_pool keep);
    alter table dr$i_test$i modify lob(token_info) (cache storage (buffer_pool keep));
    rem: now we must read the lob so that it will be loaded in the keep buffer pool, use the procedure below
    create or replace procedure loadTokenInfo is
      type c_type is ref cursor;
      c2 c_type;
      s varchar2(2000);
      b blob;
      buff raw(100);   -- RAW buffer for DBMS_LOB.READ on a BLOB
      siz number;
      off number;
      cntr number;
    begin
        s := 'select token_info from  DR$i_test$I';
        open c2 for s;
        loop
           fetch c2 into b;
           exit when c2%notfound;
           siz := 10;
           off := 1;
           cntr := 0;
           if dbms_lob.getlength(b) > 0 then
             begin
               loop
                 dbms_lob.read(b, siz, off, buff);
                 cntr := cntr + 1;
                 off := off + 4096;
               end loop;
             exception when no_data_found then
               if cntr > 0 then
                 dbms_output.put_line('4K chunks fetched: '||cntr);
               end if;
             end;
           end if;
        end loop;
    end;
    Rgds, Pierre

    I have been working a lot on that issue recently, I can give some more info.
    First, I totally agree with you; I don't like to use the keep pool and I would love to avoid it. On the other hand, we have a specific use case: 90% of the activity in the DB is done by queuing and dbms_scheduler jobs where response time does not matter. All those processes are probably filling the buffer cache. We have a customer-facing application that uses the Text index to search the database: performance is critical for it.
    What kind of performance do you have with your application ?
    In my case, I have learned the hard way that having the index in memory (the DR$I table, in fact) is the key: if it is not, then performance is poor. I find it reasonable to pin the DR$I table in memory, and if you look at competitors this is what they do. MongoDB explicitly says that the index must be in memory. Elasticsearch uses JVMs that are also in memory. And effectively, if you look at the AWR report, you will see that Oracle is continuously accessing the DR$I table; there is SQL similar to
    SELECT /*+ DYNAMIC_SAMPLING(0) INDEX(i) */    
    TOKEN_FIRST, TOKEN_LAST, TOKEN_COUNT, ROWID    
    FROM DR$idxname$I
    WHERE TOKEN_TEXT = :word AND TOKEN_TYPE = :wtype    
    ORDER BY TOKEN_TEXT,  TOKEN_TYPE,  TOKEN_FIRST
    which is continuously done.
    I think that the algorithm used by Oracle to keep blocks in cache is too complex for this. I just realized that 12.1.0.2 (released last week) finally has a "killer" feature, the In-Memory parameters, with which you can pin tables or columns in memory with compression, etc. This looks ideal for the Text index; I hope that R. Ford will finally update his white paper :-)
    But my other problem was that optimize_index in REBUILD mode caused the DR$I table to double in size: it seems crazy that this was closed as not a bug, but it was, and I can't do anything about it. It is a bug in my opinion, because the CREATE INDEX command and the ALTER INDEX REBUILD command both result in a much smaller index, so why would the people who developed the optimize function (is it another team, using another algorithm?) make the index two times bigger?
    So the track I have been following is to put the index in a 16K tablespace: in this case the space used by the index remains more or less flat (it increases, but much more reasonably). The difficulty then is to pin the index in memory, because the trick from R. Ford's paper was not working any more.
    What worked:
    First set the keep pool to zero and set db_16k_cache_size instead. Then change the storage preference to make sure that everything you want to cache (mostly the DR$I table) goes into the tablespace with the non-standard block size of 16k (a sketch of these two steps follows).
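    A minimal sketch of those two steps, reusing the TEST_STO preference from the earlier example; the tablespace name and sizes are placeholders, and the 16k buffer cache must exist before the 16k tablespace can be created:
    alter system set db_keep_cache_size = 0 scope=both;
    alter system set db_16k_cache_size = 4G scope=both;
    create tablespace text16k datafile size 10g blocksize 16k;
    begin
      -- redirect the $I table and its index to the 16k tablespace via the storage preference
      ctx_ddl.set_attribute('TEST_STO', 'I_TABLE_CLAUSE', 'tablespace text16k');
      ctx_ddl.set_attribute('TEST_STO', 'I_INDEX_CLAUSE', 'tablespace text16k compress 2');
    end;
    /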
    Then comes the tricky part: the pre-loading of the data into the buffer cache. The problem is that with Oracle 12c, Oracle will use direct path reads for full table scans, which basically means that it bypasses the cache and reads directly from the files into the PGA!!! There is an event to avoid that; I was lucky to find it on a blog (I can't remember which one, sorry for the missing credit).
    I ended up doing the following; event 10949 is set to avoid the direct path reads issue.
    alter session set events '10949 trace name context forever, level 1';
    alter table DR#idxname0001$I cache;
    alter table DR#idxname0002$I cache;
    alter table DR#idxname0003$I cache;
    SELECT /*+ FULL(ITAB) CACHE(ITAB) */ SUM(TOKEN_COUNT),  SUM(LENGTH(TOKEN_INFO)) FROM DR#idxname0001$I;
    SELECT /*+ FULL(ITAB) CACHE(ITAB) */ SUM(TOKEN_COUNT),  SUM(LENGTH(TOKEN_INFO)) FROM DR#idxname0002$I;
    SELECT /*+ FULL(ITAB) CACHE(ITAB) */ SUM(TOKEN_COUNT),  SUM(LENGTH(TOKEN_INFO)) FROM DR#idxname0003$I;
    SELECT /*+ INDEX(ITAB) CACHE(ITAB) */  SUM(LENGTH(TOKEN_TEXT)) FROM DR#idxname0001$I ITAB;
    SELECT /*+ INDEX(ITAB) CACHE(ITAB) */  SUM(LENGTH(TOKEN_TEXT)) FROM DR#idxname0002$I ITAB;
    SELECT /*+ INDEX(ITAB) CACHE(ITAB) */  SUM(LENGTH(TOKEN_TEXT)) FROM DR#idxname0003$I ITAB;
    It worked. With a big relief I expected to take some time out, but there was a last surprise. The command
    exec ctx_ddl.optimize_index(idx_name=>'idxname',part_name=>'partname',optlevel=>'REBUILD');
    gave the following:
    ERROR at line 1:
    ORA-20000: Oracle Text error:
    DRG-50857: oracle error in drftoptrebxch
    ORA-14097: column type or size mismatch in ALTER TABLE EXCHANGE PARTITION
    ORA-06512: at "CTXSYS.DRUE", line 160
    ORA-06512: at "CTXSYS.CTX_DDL", line 1141
    ORA-06512: at line 1
    This is very much exactly what is described in MOS note 1645634.1, but for a non-partitioned index. The work-around given seemed very logical, but it did not work in the case of a partitioned index. After experimenting, I found out that the bug occurs when the partitioned index is created with the dbms_pclxutil.build_part_index procedure (which enables intra-partition parallelism in the index creation process). This is a very annoying and stupid bug; maybe there is a work-around, but I did not find it on Metalink.
    Other points of attention with Text index creation (things that surprised me at first!):
    - if you use the dbms_pclxutil package, then ctx_output logging does not work, because the index is created immediately and then populated in the background via dbms_jobs.
    - this, in combination with the fact that on a RAC you may not see any activity on the local box, can be very frightening: Oracle can choose to start the workers on the other node.
    I understand much better now how text indexing works; I think it is a great technology which can scale via partitioning. But as always the design of the application is crucial: most of our problems come from the fact that we did not choose the right sectioning (we chose PATH_SECTION_GROUP while XML_SECTION_GROUP is so much better IMO). Maybe later I can convince the developers to change the sectioning, especially because SDATA and MDATA sections are not supported with PATH_SECTION_GROUP (although it seems to work, even though we had one occurrence of a bad result linked to the existence of SDATA in the index definition). Also the whole problem of mixed structured/unstructured searches is completely tackled if one uses XML_SECTION_GROUP with MDATA/SDATA (but of course the app was written for Oracle 10...)
    Regards, Pierre

  • Suggestion: Oracle text CONTEXT index on one or more columns ?

    Hi,
    I'm implementing Oracle Text using CONTEXT and would like to ask for a performance suggestion.
    I have a table of articles with the columns TITLE, SUBTITLE and BODY.
    Now, is it better from a performance point of view to move all three columns into one dummy column, named something like FULLTEXT, put the index on this single column,
    and then use CONTAINS(FULLTEXT,'...')>0
    Or is it almost the same for Oracle if I put indexes on all three columns and then call:
    CONTAINS(TITLE,'...')>0 OR CONTAINS(SUBTITLE,'...')>0 OR CONTAINS(BODY,'...')>0
    I actually don't care whether the match is in TITLE, SUBTITLE or BODY.
    So if I move everything into a FULLTEXT column, I have duplicate data in each article row; but if I create indexes for each column, then Oracle has twice as much to index, optimize and search... am I right?
    The table has 1.8 million records.
    Thank you.
    Kris

    mackrispi wrote:
    Now is it better from performance point of view to move all three columns into one dummy column ... with name like FULLTEXT ... and put index on this single column, and then use CONTAINS(FULLTEXT,'...')>0
    What version of Oracle are you on? If 11, then you could use a virtual column to do this; otherwise you'd have to write code to maintain the column, which can get messy.
    mackrispi wrote:
    Or is it almost the same for oracle if i put indexes on all three columns and then call: CONTAINS(TITLE,'...')>0 OR CONTAINS(SUBTITLE,'...')>0 OR CONTAINS(BODY,'...')>0
    Benchmark it and find out :)
    Another option would be something like this:
    http://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUESTION_ID:9455353124561
    Were I you, I would try out those 3 approaches, see which meets your performance requirements, and weigh that against the ease of implementation and administration.
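    One common way to get a single index across several columns without maintaining a physical FULLTEXT column is a MULTI_COLUMN_DATASTORE, the same technique used in the "Oracle Text performance -- failed attempts" thread above. A sketch with illustrative names:
    begin
      ctx_ddl.create_preference('article_ds', 'MULTI_COLUMN_DATASTORE');
      ctx_ddl.set_attribute('article_ds', 'columns', 'title, subtitle, body');
    end;
    /
    -- index one (small) column; the datastore pulls in all three at indexing time
    create index articles_ctx on articles (title)
      indextype is ctxsys.context
      parameters ('datastore article_ds');
    select id from articles
     where contains(title, 'some search term', 1) > 0;
    One caveat: with this approach, updates that touch only SUBTITLE or BODY do not mark the row for re-indexing unless the indexed column is also updated, which is why the earlier thread indexed a dummy column.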
