Oracle Text + thesaurus

Hi, I use Oracle Text with the "DEFAULT" thesaurus and everything is OK.
I have defined a different thesaurus with "Oracle Text Manager". If I want to use it, where do I have to specify it? Can I choose the thesaurus at query (SELECT) time?
Select ...
where ...
contains(SYN ...)
(using the medical thesaurus)
Thanks ;-)

Solution found:
select ...
from ...
where contains(.....,'SYN(term,THES_NAME)' ) > 0
Thanks ;-)
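Because the thesaurus name is just the second argument of SYN, it can be chosen per query rather than fixed at index time. A minimal Python sketch of building such a query in a client application, assuming a driver with named bind variables (python-oracledb style) and a hypothetical DOCS table with a TEXT column; the thesaurus name MEDICAL is also an assumption. The whole CONTAINS expression is sent as one bind value, so the SQL text itself never changes.

```python
import re

# Thesaurus operators of the form OP(term[, thesaurus]) that this sketch allows.
THES_OPERATORS = {"SYN", "PT", "RT", "TT"}

# Characters that would break the OP(term, thesaurus) syntax if left in the term.
_UNSAFE = re.compile(r"[(),{}]")


def thesaurus_expr(term: str, thesaurus: str = "DEFAULT", op: str = "SYN") -> str:
    """Build a CONTAINS expression such as 'SYN(fever, MEDICAL)'."""
    op = op.upper()
    if op not in THES_OPERATORS:
        raise ValueError(f"unsupported thesaurus operator: {op}")
    if _UNSAFE.search(term):
        raise ValueError(f"term contains reserved characters: {term!r}")
    if not re.fullmatch(r"[A-Za-z][A-Za-z0-9_$#]*", thesaurus):
        raise ValueError(f"not a plain thesaurus name: {thesaurus!r}")
    return f"{op}({term.strip()}, {thesaurus})"


# The SQL text is fixed; only the bind value changes between calls.
QUERY = "SELECT id FROM docs WHERE CONTAINS(text, :expr, 1) > 0 ORDER BY SCORE(1) DESC"


def thesaurus_query(term: str, thesaurus: str = "DEFAULT") -> tuple[str, dict]:
    """Return (sql, binds), ready for cursor.execute(sql, binds)."""
    return QUERY, {"expr": thesaurus_expr(term, thesaurus)}


if __name__ == "__main__":
    sql, binds = thesaurus_query("fever", "MEDICAL")
    print(sql)
    print(binds)  # {'expr': 'SYN(fever, MEDICAL)'}
```

Keeping the thesaurus name in the bind value means switching between DEFAULT and a domain thesaurus needs no new SQL text.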

Similar Messages

  • Oracle text thesaurus management

    Does anyone know if Oracle supports a user interface for managing the thesaurus used in text search, i.e. browsing and editing the thesaurus? (I know indexes etc. will need to be re-created to use the changed thesaurus.) Has anyone heard of another product that does this?

    I don't think there's a UI available at the moment for modifying the thesaurus, but it's all held in a straightforward manner in tables, so it wouldn't be hard to build one. I know people have done so in the past.
    You would NOT need to recreate indexes after loading / modifying the thesaurus. The thesaurus is used only at query time - it has no effect on index builds.

  • Pre-built Thesaurus for Oracle Text

    Could anyone point me to where I can get a pre-built thesaurus for Oracle Text that is specific to pharmaceutical science companies?
    Any advice would be appreciated.

    There are two ways you can use a pre-built thesaurus in Oracle.
    1) As in an earlier reply you got, use the Thesaurus Management System (TMS) product, which comes with a prebuilt MedDRA database for medical and other related text mining.
    2) Import your own medical thesaurus and use it to supplement the thesaurus that comes pre-loaded with the database, using the provided APIs. For example, you could download the UMLS thesaurus content from the web, convert the data into the format the API requires, and load the UMLS data into the database. You can then cluster and classify documents/abstracts based on themes or subject matter.
    Regards.

  • Error in accessing Knowledge base  + oracle text 11g

    Hi All,
    I am using Oracle Text 11g.
    I had installed a knowledge base and it was working fine (I was able to generate themes and gists). I wanted to load a 'default' thesaurus, so I created a 'default' thesaurus and ran the 'ctxload' command from the command prompt. It asked for the user (I logged in with the CTXSYS credentials), the name (thes default) and the file (thes path), but then I got the error below:
    connecting...
    DRG-11510: encountered unrecoverable error on line1
    ORA-00931: missing identifier
    disconnected
    Without realizing the consequences, I then ran the 'ctxkbtc' command, and I cancelled it partway through by closing the command prompt.
    Since then I am not able to create themes or gists; I get the error below when creating themes:
    Error report:
    ORA-20000: Oracle Text error:
    DRG-11422: linguistic initialization failed
    DRG-00100: internal error, arguments : [52100],[drxa.c],[357],[gxtopen],[1]
    DRG-00100: internal error, arguments : [52100],[gxt.c],[186],[gxl err],[7]
    DRG-11432: file error while accessing knowledge base
    ORA-06512: at "CTXSYS.DRUE", line 160
    ORA-06512: at "CTXSYS.CTX_DOC", line 210
    ORA-06512: at line 2
    20000. 00000 - "%s"
    *Cause:    The stored procedure 'raise_application_error'
    was called which causes this error to be generated.
    *Action:   Correct the problem as described in the error message or contact
    the application administrator or DBA for more information.
    Please let me know the solution.
    Thanks in advance.

    Hi,
    You can try running ctxkbtc once more with the -revert option:
    -revert
    Reverts the extended knowledge base to the default knowledge base provided by
    Oracle Text.
    Make sure to set the NLS_LANG environment variable before running it.
    If you have already solved this issue, please share the solution in this forum too.
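    A minimal sketch of scripting that revert from Python, so that NLS_LANG is always set and the run is not interrupted halfway. It assumes ctxkbtc is on the PATH of the database server and uses its -user and -revert options; the user, password and NLS_LANG value are placeholders, and the right NLS_LANG for your system is an assumption you need to check against your database character set.

```python
import os
import subprocess


def ctxkbtc_revert_command(user: str, password: str, nls_lang: str) -> tuple[list[str], dict[str, str]]:
    """Build the argv and environment for 'ctxkbtc -user user/password -revert'."""
    argv = ["ctxkbtc", "-user", f"{user}/{password}", "-revert"]
    env = dict(os.environ)
    env["NLS_LANG"] = nls_lang  # set explicitly, as the reply advises
    return argv, env


def run_revert(user: str, password: str, nls_lang: str) -> int:
    """Run the revert and wait for it to finish.

    The problem in this thread started after a ctxkbtc run was cancelled,
    so this waits for completion instead of running it in the background.
    """
    argv, env = ctxkbtc_revert_command(user, password, nls_lang)
    completed = subprocess.run(argv, env=env, check=False)
    return completed.returncode
```

    For example, run_revert("ctxsys", "<password>", "AMERICAN_AMERICA.AL32UTF8") on the database server, with your own values substituted; a non-zero return code means the revert should be checked before generating themes again.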

  • Running Oracle Text Manager w/o DBA

    We've been using the Oracle Text KB and our own thesaurus to index documents on help.unc.edu for two years now. We upload new thesaurus files with ctxload each time new terms need to be added, and I have to ask my DBA to run it in the shell on the DB server. I'd really like to manage the thesaurus and KB directly using the "Oracle Text Manager" application. Can it be run without DBA privileges, for example by granting select privileges to some other user? Can it be run from a client instead of on the server itself?
    Does anyone have recommendations in this regard?
    Many thanks,
    Greg Jansen
    ITS Knowledge Management
    UNC Chapel Hill


  • Deciding between Oracle Text  v/s PL/SQL

    In the Oracle Text technical documentation it is mentioned that the Standard (CONTEXT) and Catalog (CTXCAT) index types are used to index large, coherent documents and to perform mixed queries, respectively.
    As I read further, I understand that if the requirement is not heavily document-centric, then Oracle Text may not be an ideal candidate. If most of the data is going to reside primarily in tables, then standard PL/SQL queries and joins are the way to go. On the other hand, using standard SQL for name matches with the LIKE operator, for example, may not work reliably, or may be complex to implement, when trying wildcard or theme matches.
    So the question is: do we use Oracle Text irrespective of the type of content being indexed, i.e. table data vs. documents? How do we make that judgement?

    What type of data do you have and what types of searches do you want to be able to do? If you need features that are only available in Text, then you need Text. For example, if you will be searching documents that are stored in operating system files or in BLOB columns and you want to do stem searches or fuzzy searches or use a thesaurus, then you will need Oracle Text. If, on the other hand, the data that you have and the searches that you want can be done with or without Text, then you have a choice to make, with the major issue being which is more efficient. When in doubt, a little testing can help you decide. Set up a realistic test environment, test some queries both ways, and see which is fastest. If you are just doing standard searches on varchar2 columns, you may get better performance without Text.
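    A small timing harness along those lines, sketched in Python. Each candidate is a hypothetical zero-argument callable you would write yourself, for example one running a LIKE query and one running the equivalent CONTAINS query through your driver and fetching every row, so the full result is timed and not just the first fetch. The harness only measures and ranks them.

```python
import statistics
import time
from typing import Callable


def median_runtime(run_query: Callable[[], object], repeats: int = 5, warmup: int = 1) -> float:
    """Median wall-clock seconds of run_query() after a few warm-up calls."""
    for _ in range(warmup):
        run_query()  # warm caches so neither variant is penalised by a cold start
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_query()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


def compare(candidates: dict[str, Callable[[], object]], repeats: int = 5) -> list[tuple[str, float]]:
    """Return (name, median seconds) pairs, fastest first."""
    timings = [(name, median_runtime(fn, repeats)) for name, fn in candidates.items()]
    return sorted(timings, key=lambda pair: pair[1])
```

    The median is used rather than the mean so that one slow outlier (a checkpoint, a cold disk read) does not decide the comparison.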

  • Oracle Text Query of abbreviated word / name

    I'm new to Oracle Text, so please excuse the (probably) simple question. I want to be able to create a search that ignores (or includes?) special characters and/or spaces within an abbreviated name. I'm not sure if it's possible, but I would like to return all of the results below if someone queries for "ABC" in one form or another.
    Would this be something I'd add to a thesaurus? I see there is a STOPLIST, but I'm not sure whether there is an opposite of a stoplist.
    Thanks in advance!
    Regards,
    Rich
    set def off;
    drop table docs;
    CREATE TABLE docs (id NUMBER PRIMARY KEY, text VARCHAR2(200));
    INSERT INTO docs VALUES(1, 'ABC are my favorite letters.');
    INSERT INTO docs VALUES(2, 'My favorite letters are A,B,C');
    INSERT INTO docs VALUES(3, 'The best letters are A.B.C.');
    INSERT INTO docs VALUES(4, 'Three of the word letters are A-B-C.');
    INSERT INTO docs VALUES(5, 'A B C are great letters.');
    INSERT INTO docs VALUES(6, 'AB and C are easy letters to remember');
    INSERT INTO docs VALUES(7, 'What if we used A, B, & C?');
    commit;
    -- each PL/SQL block below ends with '/' so that SQL*Plus executes it
    begin
    ctx_ddl.drop_preference('english_lexar');
    end;
    /
    begin
    ctx_ddl.create_preference('english_lexar', 'BASIC_LEXER');
    ctx_ddl.set_attribute('english_lexar', 'printjoins', '_-');
    ctx_ddl.set_attribute('english_lexar', 'skipjoins', '-.');
    --ctx_ddl.set_attribute ( 'english_lexar', 'index_themes', 'YES');
    ctx_ddl.set_attribute ( 'english_lexar', 'index_text', 'YES');
    ctx_ddl.set_attribute ( 'english_lexar', 'index_stems', 'SPANISH');
    ctx_ddl.set_attribute ( 'english_lexar', 'mixed_case', 'YES');
    ctx_ddl.set_attribute ( 'english_lexar', 'base_letter', 'YES');
    end;
    /
    begin
    ctx_ddl.drop_preference('STEM_FUZZY_PREF');
    end;
    /
    begin
      ctx_ddl.create_preference('STEM_FUZZY_PREF', 'BASIC_WORDLIST');
      ctx_ddl.set_attribute('STEM_FUZZY_PREF','FUZZY_MATCH','ENGLISH');
      ctx_ddl.set_attribute('STEM_FUZZY_PREF','FUZZY_SCORE','0');
      ctx_ddl.set_attribute('STEM_FUZZY_PREF','FUZZY_NUMRESULTS','5000');
      ctx_ddl.set_attribute('STEM_FUZZY_PREF','SUBSTRING_INDEX','TRUE');
      ctx_ddl.set_attribute('STEM_FUZZY_PREF','PREFIX_INDEX','TRUE');
      ctx_ddl.set_attribute('STEM_FUZZY_PREF','STEMMER','ENGLISH');
    end;
    /
    begin
    ctx_ddl.drop_preference('wildcard_pref');
    end;
    /
    begin
        Ctx_Ddl.create_Preference('wildcard_pref', 'BASIC_WORDLIST');
        ctx_ddl.set_attribute('wildcard_pref', 'wildcard_maxterms', 100) ;
    end;
    /
    DROP index myindex;
    create index myindex on docs (text)
      indextype is ctxsys.context
      parameters ( 'LEXER english_lexar Wordlist wildcard_pref' );
    EXEC CTX_DDL.SYNC_INDEX('myindex', '2M');
    SELECT SCORE(1), id, text FROM docs WHERE CONTAINS(text, 'ABC', 1) > 0;
    It may be that my SQL statement isn't taking advantage of the Text options, i.e. I'm forgetting something obvious :)

    Indexes are case-insensitive by default, so let's ignore that.
    You can make wal-mart and wal*mart match walmart by defining "-" and "*" as SKIPJOINS characters. However, you cannot make "wal mart" match "walmart" other than by using NDATA.
    NDATA does seem to work: any variation of wal mart, walmart, wal*mart and wal-mart manages to match both walmart and wal mart. See this example:
    SQL> create table testcase (text varchar2(2000));
    Table created.
    SQL> insert into testcase values ('<nd>walmart</nd>');
    1 row created.
    SQL> insert into testcase values ('<nd>wal mart</nd>');
    1 row created.
    SQL> exec ctx_ddl.drop_section_group('tcsg')
    PL/SQL procedure successfully completed.
    SQL> exec ctx_ddl.create_section_group('tcsg', 'xml_section_group')
    PL/SQL procedure successfully completed.
    SQL> exec ctx_ddl.add_ndata_section('tcsg', 'nd', 'nd')
    PL/SQL procedure successfully completed.
    SQL> create index testcase_index on testcase(text)
      2  indextype is ctxsys.context
      3  parameters ('section group tcsg')
      4  /
    Index created.
    SQL> select * from testcase where contains (text, 'ndata(nd, wal mart)') > 0;
    TEXT
    <nd>walmart</nd>
    <nd>wal mart</nd>
    SQL> select * from testcase where contains (text, 'ndata(nd, wal-mart)') > 0;
    TEXT
    <nd>walmart</nd>
    <nd>wal mart</nd>
    SQL> select * from testcase where contains (text, 'ndata(nd, wal*mart)') > 0;
    TEXT
    <nd>walmart</nd>
    <nd>wal mart</nd>
    SQL> select * from testcase where contains (text, 'ndata(nd, walmart)') > 0;
    TEXT
    <nd>walmart</nd>
    <nd>wal mart</nd>
    Edited by: Roger Ford on Jun 21, 2012 10:22 AM
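    NDATA is one route. A different technique, which is not Oracle Text functionality, is to normalise spelled-out abbreviations yourself and index the result. The Python sketch below collapses runs of single capital letters (A,B,C / A.B.C. / A-B-C / A B C) into ABC. It assumes a hypothetical extra column, say TEXT_NORM, that your application fills with this value and that carries the CONTEXT index. It would not catch "AB and C", and a lone capital such as "I" next to other initials would be merged too.

```python
import re

_TOKEN = re.compile(r"[A-Za-z0-9]+")


def collapse_initials(text: str) -> str:
    """Join runs of single capital letters (A.B.C, A-B-C, A B C) into one token (ABC).

    Punctuation is dropped, so the result is meant for a separate search
    column, not for display.
    """
    out: list[str] = []
    run: list[str] = []
    for token in _TOKEN.findall(text):
        if len(token) == 1 and token.isupper():
            run.append(token)  # part of a spelled-out abbreviation
            continue
        if run:
            out.append("".join(run))
            run = []
        out.append(token)
    if run:
        out.append("".join(run))
    return " ".join(out)


if __name__ == "__main__":
    samples = [
        "ABC are my favorite letters.",
        "My favorite letters are A,B,C",
        "The best letters are A.B.C.",
        "Three of the word letters are A-B-C.",
        "A B C are great letters.",
        "AB and C are easy letters to remember",
        "What if we used A, B, & C?",
    ]
    for row in samples:
        print(collapse_initials(row))
```

    With the normalised text indexed, a plain CONTAINS(text_norm, 'ABC') > 0 would match every sample row except the "AB and C" one.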

  • Oracle Text Classification/Clustering

    Is anyone using the Oracle Text classification/clustering technology? I am working on a project where we are doing research on using this type of technology for our Oracle Text searches.

    Thanks for your help. I will talk to the person I am working with to see if he thinks we can go this route; he is the contact person for the thesaurus. If I have any more questions, I'll post them to this thread. I won't be able to meet with him until early next week.
    Thanks,
    --Sandra :->

  • Ultra Search/ Oracle Text capabilities

    Our decision to go forward with Oracle9i is contingent upon the extensible use of Ultra Search and Oracle Text in our planned endeavors.
    Basically we are to build a system to do the following:
    1) download information (HTML files, links, documents) from web sites and accessible disk archives. The URLs are specific to one domain.
    2) place the downloaded file information into our Oracle database or download to local system with appropriate links in database.
    3) perform queries on the downloaded information through the database to isolate files for analysis.
    4) analyze and perform extraction on the information. For example, query based on a defined hierarchy of vulnerability terms.
    I've demoed Ultra Search and Oracle Text. I believe that Ultra Search can handle step 1, and possibly step 2 and that Oracle Text can help in step 4. Step 3 is satisfied by the Oracle database.
    I need to know details concerning Ultra Search and Oracle Text before committing:
    o When Ultra Search crawls, how is the information it finds represented in the database? Is a whole HTML file or document downloaded, or are references to these documents stored in the database? If references are stored, does Ultra Search provide a way to download these files for analysis?
    o Is Oracle Text the right tool to provide robust analysis of the downloaded documents?
    o I have used the sample JSP that came with Ultra Search. Are there any more detailed examples that match my steps above, in particular performing robust analysis on the documents downloaded in step 1?
    We have explored, and are still exploring, other COTS products to find a solution. Our main goal in this phase of the project is to have the retrieved documents and the analysis information resident in the database. We find that other COTS products can perform the web crawling but lack analysis, or vice versa, and that their solutions are so vendor-specific that in many cases their services would be required to build a suitable solution, which would not be very extensible.
    Thanks for any feedback.

    Ultra Search does not keep documents in the database permanently. We bring them in for indexing purposes, but remove them after
    the indexing is completed. However, we keep the URLs of each unique document that was found during the crawling. You would
    have to do the downloading yourself. However, we are thinking about providing a mechanism, maybe in the form of an API, that
    would allow customers to retrieve documents. Please contact me on this issue if you are interested in discussing it: (650)-506-8173.
    Generally speaking you will find that Oracle Text is a very powerful tool for analysis of textual documents, especially since it is
    driven through the SQL language, has extensive functionality (themes, user-defined knowledge base, thesaurus, and many useful linguistic
    functions like segmentation, stemming, and globalisation support).
    The philosophy of Ultra Search is to provide you with an out-of-the-box solution for crawling and searching your data without the
    need for programming. Ultra Search is built on top of Text, so I would advise you to use Text to do the further analysis of your
    documents after they have been located by the crawler.
    Best Regards,
    Stefan Buchta
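    Since only the URLs are kept, the downloading step falls to you. A minimal Python sketch using only the standard library: it fetches one URL into a local directory under a name derived from a hash of the URL and returns the path, which you could store next to the URL in your own table (for example for a FILE_DATASTORE index). How you obtain the list of crawled URLs is not shown, and the table layout is left to you.

```python
import hashlib
import shutil
import urllib.parse
import urllib.request
from pathlib import Path


def download_to(url: str, target_dir: Path, timeout: float = 30.0) -> Path:
    """Fetch url into target_dir and return the local file path.

    The file name is a hash of the URL plus the original extension, so the
    same URL always maps to the same file and a re-download overwrites it.
    """
    target_dir.mkdir(parents=True, exist_ok=True)
    suffix = Path(urllib.parse.urlsplit(url).path).suffix  # e.g. ".pdf", ".html"
    name = hashlib.sha256(url.encode("utf-8")).hexdigest()[:16] + suffix
    path = target_dir / name
    with urllib.request.urlopen(url, timeout=timeout) as response, open(path, "wb") as out:
        shutil.copyfileobj(response, out)  # stream to disk instead of loading into memory
    return path
```

    Keeping the extension in the file name helps if the index later relies on filtering by document type.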

  • Loading Oracle Text Thesaurus

    Is it possible to load the thesaurus below into Oracle without human intervention, or is command-line ctxload the only option?
    John
      RT Jon
      BT John
    Jon
      RT John
      BT Jonathan
    Jonathan
      RT Jon
      BT Jonathan
    I'm working on extending search functionality to use Oracle Text. At the moment, Continuous Integration requires human intervention to load the thesaurus.
    Ant executes SQL and PL/SQL scripts that wipe the database and rebuild the test schema, but so far I have not been able to get it to load the thesaurus.

    You can create and build a thesaurus using the CTX_THES package. You could create a PL/SQL procedure that reads the thesaurus from the file system using a BFILE. Your procedure can then parse the external file and make the appropriate calls to the CTX_THES package.
    If you can store your thesaurus in XML format, then you can leverage SQL/XML and the Oracle XML DB APIs to parse the file and load the thesaurus.
    Hope this helps.
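    One way to take the manual step out of CI, sketched in Python rather than with the BFILE approach suggested above: convert the ctxload-style file into a PL/SQL script of CTX_THES.CREATE_THESAURUS and CTX_THES.CREATE_RELATION calls, which Ant can run like any other SQL script. The parsing rules (a line starting with a relation keyword is a relation of the last phrase; any other line starts a new phrase) and the keyword list are assumptions; check the CTX_THES documentation for your release, including whether CREATE_RELATION creates missing phrases for you.

```python
# Relation keywords handled by this sketch (assumed accepted by CTX_THES.CREATE_RELATION).
RELATIONS = {"BT", "BTG", "BTP", "BTI", "NT", "NTG", "NTP", "NTI", "RT", "SYN", "PT"}


def parse_thesaurus(text: str) -> list[tuple[str, str, str]]:
    """Turn ctxload-style text into (phrase, relation, related_phrase) triples."""
    triples = []
    phrase = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        parts = line.split(None, 1)  # split on the first run of spaces or tabs
        keyword = parts[0].upper()
        rest = parts[1].strip() if len(parts) > 1 else ""
        if keyword in RELATIONS and rest:
            if phrase is None:
                raise ValueError(f"relation before any phrase: {line!r}")
            triples.append((phrase, keyword, rest))
        else:
            phrase = line  # a new head phrase
    return triples


def sql_literal(value: str) -> str:
    """Quote a string for PL/SQL, doubling embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def to_plsql(thesaurus: str, triples: list[tuple[str, str, str]]) -> str:
    """Emit one anonymous block that creates the thesaurus and all relations."""
    name = sql_literal(thesaurus)
    lines = ["BEGIN", f"  CTX_THES.CREATE_THESAURUS({name}, FALSE);"]
    for phrase, rel, related in triples:
        lines.append(
            f"  CTX_THES.CREATE_RELATION({name}, {sql_literal(phrase)}, "
            f"{sql_literal(rel)}, {sql_literal(related)});"
        )
    lines += ["END;", "/"]
    return "\n".join(lines)
```

    In the build, something like to_plsql("NAMES", parse_thesaurus(open("names.ths").read())) would be written to a .sql file before the schema scripts run (file and thesaurus names are placeholders). Running it twice fails because the thesaurus already exists, so a CI script would typically call CTX_THES.DROP_THESAURUS first. Ant's sql task also needs to be told that blocks end with "/" on its own line (its delimiter and delimitertype attributes).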

  • How do I get Oracle Text to index files on a file server?

    I am new to Oracle (I'm an MS-SQL DBA looking for a full-text search solution that is better than linking to an MS index server.)
    So - Here's the objective:
    I have Oracle Server(Express) installed on a Windows server.
    I would like for Oracle to build a Full-Text Catalog of the files on a separate file server based on file paths in a table in the database.
    (No desire to store terabytes of images and documents inside the database)
    I can get Oracle text up and running, using the URL_Datastore:
    CREATE TABLE files (id NUMBER PRIMARY KEY, issue_id NUMBER, path VARCHAR(255) UNIQUE, ot_format VARCHAR(6), ot_version VARCHAR(10));
    The Compaq server is a remote windows server on my local workgroup, so the fully qualified path is just "compaq" and the URL is valid:
    INSERT INTO files VALUES (9,9,'file://Compaq/FTQ/00000003.pdf',NULL,NULL);
    INSERT INTO files VALUES (13,13,'file://Compaq/FTQ/01.txt',NULL,NULL);
    CREATE INDEX file_index ON files(path) INDEXTYPE IS ctxsys.context
    PARAMETERS ('datastore ctxsys.URL_DATASTORE format column ot_format');
    but when I enter:
    Select * from CTX_User_Index_errors, I see the following errors:
    DRG-11609: URL store: unable to open local file specified by file://Compaq/FTQ/00000003.pdf
    DRG-11609: URL store: unable to open local file specified by file://Compaq/FTQ/01.txt
    Did I miss something?
    Do I need to install anything on the file server?
    I would like to convince my company that Oracle can be much quicker than Microsoft's Indexing Service because it can avoid joining two large result sets (one from the full-text search in Indexing Service and one for specific data held in fields of the MS-SQL database). Full-text searches commonly take 40-60 seconds where there are 1.5 million multi-page PDF files for a particular set that I sample search on. Without this massive join, I believe I can get the search to run in under 10 seconds.

    Thank you!
    File_Datastore worked fine.
    I was staying away from FILE_DATASTORE because the information I gathered from googling suggested that it would only work locally.
    Now I just have to get Oracle to pull data out of tables in an MS-SQL database on the local network (I don't have a clue how yet), and then have it index the compiled file paths.
    Then MS-SQL can query Oracle with index and full-text criteria, and Oracle can send back a result set.
    It may sound like a bad way of performing full-text queries, but anything will be better than the way things currently run. We currently perform full-text searches on a table that is rebuilt nightly, so the table containing millions of file paths is not live.
    It would be so much better if we just migrated to Oracle, but we currently do not have the resources.
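    Switching an existing table from URL_DATASTORE to FILE_DATASTORE means the column must hold file paths instead of file:// URLs. A minimal Python sketch of that conversion for Windows, assuming UNC paths such as \\Compaq\FTQ\01.txt for files on another server. It also assumes the account running the Oracle service can read that share, since it is the database server that opens the files.

```python
from urllib.parse import unquote, urlsplit


def file_url_to_path(url: str) -> str:
    r"""Convert a file:// URL into a Windows path for a FILE_DATASTORE column.

    file://Compaq/FTQ/01.txt -> \\Compaq\FTQ\01.txt  (UNC path to a share)
    file:///C:/docs/01.txt   -> C:\docs\01.txt       (local drive)
    """
    parts = urlsplit(url)
    if parts.scheme.lower() != "file":
        raise ValueError(f"not a file URL: {url!r}")
    path = unquote(parts.path).replace("/", "\\")  # %20 -> space, / -> \
    if parts.netloc and parts.netloc.lower() != "localhost":
        return "\\\\" + parts.netloc + path  # \\server\share\...
    return path.lstrip("\\")  # drop the slash in front of the drive letter
```

    The converted values could be written to a new column, leaving the original URLs in place until the FILE_DATASTORE index is confirmed to work.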

  • Error while running the Oracle Text optimize index procedure (even as a dba user too)

    Hi Experts,
    I am on Oracle 11.2.0.2 on Linux and have implemented Oracle Text. My Oracle Text indexes are fragmented, but I get an error when running the optimize_index procedure. This is the error:
    begin
      ctx_ddl.optimize_index(idx_name=>'ACCESS_T1',optlevel=>'FULL');
    end;
    ERROR at line 1:
    ORA-20000: Oracle Text error:
    ORA-06512: at "CTXSYS.DRUE", line 160
    ORA-06512: at "CTXSYS.CTX_DDL", line 941
    ORA-06512: at line 1
    Then I tried running it as a DBA user too, and it failed the same way:
    begin
      ctx_ddl.optimize_index(idx_name=>'BVSCH1.ACCESS_T1',optlevel=>'FULL');
    end;
    ERROR at line 1:
    ORA-20000: Oracle Text error:
    ORA-06512: at "CTXSYS.DRUE", line 160
    ORA-06512: at "CTXSYS.CTX_DDL", line 941
    ORA-06512: at line 1
    The CTXAPP role is granted to my schema and I still get this error. I would be thankful for any suggestions.
    One other important observation: we have this issue ONLY in one database; in the other two databases I don't see any problem at all.
    I am unable to figure out what the issue is with this one database!
    Thanks,
    OrauserN

    How about checking the following?
    Bug 10626728 - CTX_DDL.optimize_index "full" fails with an empty ORA-20000 since 11.2.0.2 upgrade (DOCID 10626728.8)

  • Getting error while importing schema with ORACLE TEXT

    IMP-00003: ORACLE error 20000 encountered
    ORA-20000: Oracle Text error:
    DRG-52204: error while registering index
    DRG-10507: duplicate index name: WORKORDER_Q, owner: SYS
    ORA-06512: at "CTXSYS.DRUE", line 160
    ORA-06512: at "CTXSYS.DRIIMP", line 115
    ORA-06512: at line 2
    IMP-00088: Problem importing metadata for index WORKORDER_Q. Index creation will be skipped
    Database version - Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
    Os version - Linux nlxs1012.slb.atosorigin-asp.com 2.6.18-308.el5 #1 SMP Fri Jan 27 17:17:51 EST 2012 x86_64 x86_64 x86_64 GNU/Linux
    We took an export of the schema from the production DB and are now importing the data into the QA environment.
    The import fails with the error above.

    I am importing objects from P20_MAXIMO into Q25_MAXIMO in another database.
    Below is the import parfile:
    USERID='/ as sysdba'
    FILE=exp_P20_MAXIMO_C2364781.dmp
    LOG=imp_P20_MAXIMO__Q25_MAXIMO_C2364781_1.log
    FROMUSER=P20_MAXIMO
    TOUSER=Q25_MAXIMO
    buffer=1000000
    feedback=100000
    Export parfile:
    userid='/ as sysdba'
    owner=P20_MAXIMO
    FILE=exp_P20_MAXIMO_C2364781.dmp
    LOG=exp_P20_MAXIMO_C2364781.log
    buffer=10000000
    feedback=100000
    statistics=none

  • Pre-loading Oracle text in memory with Oracle 12c

    There is a white paper from Roger Ford that explains how to load the Oracle Text index into memory: http://www.oracle.com/technetwork/database/enterprise-edition/mem-load-082296.html
    In our application (Oracle 12c) we are indexing a big XML field (stored as XMLType with SecureFile storage) with the PATH_SECTION_GROUP. If I don't load the I table (DR$..$I) into memory using the technique explained in the white paper, I cannot get decent performance, and especially not predictable performance: it looks like if the blocks of the TOKEN_INFO column are not in memory, performance can fall sharply.
    But after migrating to Oracle 12c I hit a different problem, which I can reproduce. When I create the index it is relatively small (as seen with ctx_report.index_size), and by applying the technique from the white paper I can pin the DR$ I table in memory. But as soon as I run ctx_ddl.optimize_index('Index','REBUILD'), the size becomes much bigger and I can't pin the index in memory. I'm not sure whether it is a bug or not.
    What I found as a workaround is to build the index with the following storage options:
    exec ctx_ddl.create_preference('TEST_STO','BASIC_STORAGE');
    exec ctx_ddl.set_attribute('TEST_STO', 'BIG_IO', 'YES');
    exec ctx_ddl.set_attribute('TEST_STO', 'SEPARATE_OFFSETS', 'NO');
    so that the TOKEN_INFO column is stored as a SecureFile LOB. Then I can change the storage of that column to put it in the keep buffer pool, and write a procedure that reads the LOB so that it gets loaded into the keep cache. The size of the LOB column is more or less the same as when creating the index without the BIG_IO option, but it stays constant even after a ctx_ddl.optimize_index. The procedure to read the LOB and load it into the cache is very similar to the loaddollarR procedure from the white paper.
    Because of the SDATA section there is a new DR table (the S table) with an IOT on top of it. This is not documented in the white paper (which was written for Oracle 10g). In my case this DR$ S table and its IOT are heavily used, but putting them in the keep cache is not as important as the TOKEN_INFO column of the DR I table. A final note: setting SEPARATE_OFFSETS = 'YES' was very bad in my case; the combined size of the two columns is much bigger than the TOKEN_INFO column alone, and both columns are read.
    Here is an example of how to reproduce the size increase when running ctx_ddl.optimize_index:
    1. create the table
    drop table test;
    CREATE TABLE test
    (ID NUMBER(9,0) NOT NULL ENABLE,
    XML_DATA XMLTYPE)
    XMLTYPE COLUMN XML_DATA STORE AS SECUREFILE BINARY XML (tablespace users disable storage in row);
    2. insert a few records
    insert into test values(1,'<Book><TITLE>Tale of Two Cities</TITLE>It was the best of times.<Author NAME="Charles Dickens"> Born in England in the town, Stratford_Upon_Avon </Author></Book>');
    insert into test values(2,'<BOOK><TITLE>The House of Mirth</TITLE>Written in 1905<Author NAME="Edith Wharton"> Wharton was born to George Frederic Jones and Lucretia Stevens Rhinelander in New York City.</Author></BOOK>');
    insert into test values(3,'<BOOK><TITLE>Age of innocence</TITLE>She got a prize for it.<Author NAME="Edith Wharton"> Wharton was born to George Frederic Jones and Lucretia Stevens Rhinelander in New York City.</Author></BOOK>');
    3. create the text index
    drop index i_test;
      exec ctx_ddl.create_section_group('TEST_SGP','PATH_SECTION_GROUP');
    begin
      CTX_DDL.ADD_SDATA_SECTION(group_name => 'TEST_SGP', 
                                section_name => 'SData_02',
                                tag => 'SData_02',
                                datatype => 'varchar2');
    end;
    exec ctx_ddl.create_preference('TEST_STO','BASIC_STORAGE');
    exec  ctx_ddl.set_attribute('TEST_STO','I_TABLE_CLAUSE','tablespace USERS storage (initial 64K)');
    exec  ctx_ddl.set_attribute('TEST_STO','I_INDEX_CLAUSE','tablespace USERS storage (initial 64K) compress 2');
    exec  ctx_ddl.set_attribute ('TEST_STO', 'BIG_IO', 'NO' );
    exec  ctx_ddl.set_attribute ('TEST_STO', 'SEPARATE_OFFSETS', 'NO' );
    create index I_TEST
      on TEST (XML_DATA)
      indextype is ctxsys.context
      parameters('
        section group   "TEST_SGP"
        storage         "TEST_STO"
      ') parallel 2;
    4. check the index size
    select ctx_report.index_size('I_TEST') from dual;
    it says:
    TOTALS FOR INDEX TEST.I_TEST
    TOTAL BLOCKS ALLOCATED:                                                104
    TOTAL BLOCKS USED:                                                      72
    TOTAL BYTES ALLOCATED:                                 851,968 (832.00 KB)
    TOTAL BYTES USED:                                      589,824 (576.00 KB)
    5. optimize the index
    exec ctx_ddl.optimize_index('I_TEST','REBUILD');
    and now recompute the size; it says:
    TOTALS FOR INDEX TEST.I_TEST
    TOTAL BLOCKS ALLOCATED:                                               1112
    TOTAL BLOCKS USED:                                                    1080
    TOTAL BYTES ALLOCATED:                                 9,109,504 (8.69 MB)
    TOTAL BYTES USED:                                      8,847,360 (8.44 MB)
    which shows that it went from 576 KB to 8.44 MB. With a big index the relative difference is smaller, but it still went from 14 GB to 19 GB.
    6. Workaround: use the BIG_IO option, so that the TOKEN_INFO column of the DR$ I table is stored as a SecureFile LOB and the size stays relatively small. Then you can load this column into the cache using a procedure similar to the one below:
    alter table DR$I_TEST$I storage (buffer_pool keep);
    alter table dr$i_test$i modify lob(token_info) (cache storage (buffer_pool keep));
    rem: now we must read the LOB so that it is loaded into the keep buffer pool; use the procedure below
    create or replace procedure loadTokenInfo is
      type c_type is ref cursor;
      c2 c_type;
      s varchar2(2000);
      b blob;
      buff varchar2(100);
      siz number;
      off number;
      cntr number;
    begin
        s := 'select token_info from  DR$i_test$I';
        open c2 for s;
        loop
           fetch c2 into b;
           exit when c2%notfound;
           siz := 10;
           off := 1;
           cntr := 0;
           if dbms_lob.getlength(b) > 0 then
             begin
               loop
                 dbms_lob.read(b, siz, off, buff);
                 cntr := cntr + 1;
                 off := off + 4096;
               end loop;
             exception when no_data_found then
               if cntr > 0 then
                 dbms_output.put_line('4K chunks fetched: '||cntr);
               end if;
             end;
           end if;
        end loop;
    end;
    /
    Rgds, Pierre
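    The growth can be computed directly from the two ctx_report.index_size reports quoted above: 589,824 bytes used before and 8,847,360 after is a factor of 15 on this small test index, against roughly 1.36 (14 GB to 19 GB) on the large one. A small Python sketch that extracts the TOTAL BYTES USED figure from the index totals section and returns the ratio; it assumes the report layout shown in this post.

```python
import re

_BYTES_USED = re.compile(r"TOTAL BYTES USED:\s*([\d,]+)")


def bytes_used(report: str) -> int:
    """Return TOTAL BYTES USED from the 'TOTALS FOR INDEX' part of the report."""
    start = report.rfind("TOTALS FOR INDEX")  # -1 if absent: search the whole text
    match = _BYTES_USED.search(report, max(start, 0))
    if match is None:
        raise ValueError("no TOTAL BYTES USED line in the index totals")
    return int(match.group(1).replace(",", ""))


def growth(before: str, after: str) -> float:
    """Ratio of bytes used after optimize_index to bytes used before."""
    return bytes_used(after) / bytes_used(before)


BEFORE = """TOTALS FOR INDEX TEST.I_TEST
TOTAL BLOCKS USED:                                                      72
TOTAL BYTES USED:                                      589,824 (576.00 KB)"""

AFTER = """TOTALS FOR INDEX TEST.I_TEST
TOTAL BLOCKS USED:                                                    1080
TOTAL BYTES USED:                                      8,847,360 (8.44 MB)"""

if __name__ == "__main__":
    print(growth(BEFORE, AFTER))  # 15.0
```

    Tracking this ratio after each scheduled optimize_index run would show quickly whether a storage change such as BIG_IO keeps the index size flat.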

    I have been working a lot on this issue recently, so I can give some more info.
    First, I totally agree with you: I don't like using the keep pool and I would love to avoid it. On the other hand, we have a specific use case: 90% of the activity in the DB is done by queuing and dbms_scheduler jobs, where response time does not matter, and all those processes are probably filling the buffer cache. We also have a customer-facing application that uses the text index to search the database, and performance is critical for it.
    What kind of performance do you get with your application?
    In my case, I have learned the hard way that having the index (in fact the DR$I table) in memory is the key: if it is not, performance is poor. I find it reasonable to pin the DR$I table in memory, and if you look at competitors, this is what they do. MongoDB explicitly says that the index must be in memory, and Elasticsearch runs in JVMs that also keep their data in memory. And indeed, if you look at the AWR report, you will see that Oracle continuously accesses the DR$I table with a SQL statement similar to
    SELECT /*+ DYNAMIC_SAMPLING(0) INDEX(i) */    
    TOKEN_FIRST, TOKEN_LAST, TOKEN_COUNT, ROWID    
    FROM DR$idxname$I
    WHERE TOKEN_TEXT = :word AND TOKEN_TYPE = :wtype    
    ORDER BY TOKEN_TEXT,  TOKEN_TYPE,  TOKEN_FIRST
    which is executed continuously.
    I think the algorithm Oracle uses to keep blocks in the cache is too complex. I just realized that 12.1.0.2 (released last week) finally has a "killer" feature, the In-Memory column store, with which you can pin tables or columns in memory with compression, etc. This looks ideal for the text index; I hope R. Ford will finally update his white paper :-)
    But my other problem was that optimize_index in REBUILD mode caused the DR$I table to double in size. It seems crazy, but this was closed as not a bug and I can't do anything about it. It is a bug in my opinion, because the create index and "alter index rebuild" commands both produce a much smaller index, so why would the people who developed the optimize function (is it another team, using another algorithm?) make the index two times bigger?
    For that, the track I have been following is to put the index in a 16K tablespace: in this case the space used by the index stays more or less flat (it increases, but much more reasonably). The difficulty here is pinning the index in memory, because R. Ford's trick no longer worked.
    What worked:
    first set the keep_pool to zero and set the db_16k_cache_size to instead. Then change the storage preference to make sure that everything you want to cache (mostly the DR$I) table come in the tablespace with the non-standard block size of 16k.
    Then comes the tricky part: preloading the data into the buffer cache. The problem is that with Oracle 12c, large full table scans (FTS) use direct path reads, which basically means that they bypass the buffer cache and read directly from the data files into the PGA!!! There is an event to avoid that; I was lucky to find it on a blog (I can't remember which one, sorry for the missing credit).
    I ended up doing the following. Event 10949 is the one that avoids the direct path reads issue.
    alter session set events '10949 trace name context forever, level 1';
    alter table DR#idxname0001$I cache;
    alter table DR#idxname0002$I cache;
    alter table DR#idxname0003$I cache;
    SELECT /*+ FULL(ITAB) CACHE(ITAB) */ SUM(TOKEN_COUNT), SUM(LENGTH(TOKEN_INFO)) FROM DR#idxname0001$I ITAB;
    SELECT /*+ FULL(ITAB) CACHE(ITAB) */ SUM(TOKEN_COUNT), SUM(LENGTH(TOKEN_INFO)) FROM DR#idxname0002$I ITAB;
    SELECT /*+ FULL(ITAB) CACHE(ITAB) */ SUM(TOKEN_COUNT), SUM(LENGTH(TOKEN_INFO)) FROM DR#idxname0003$I ITAB;
    SELECT /*+ INDEX(ITAB) CACHE(ITAB) */  SUM(LENGTH(TOKEN_TEXT)) FROM DR#idxname0001$I ITAB;
    SELECT /*+ INDEX(ITAB) CACHE(ITAB) */  SUM(LENGTH(TOKEN_TEXT)) FROM DR#idxname0002$I ITAB;
    SELECT /*+ INDEX(ITAB) CACHE(ITAB) */  SUM(LENGTH(TOKEN_TEXT)) FROM DR#idxname0003$I ITAB;
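The same preload can be written once for every partition instead of table by table. A minimal sketch, assuming a local index whose $I tables follow the DR#IDXNAME0001$I naming used above (IDXNAME is a placeholder) and that it runs as the index owner. DBMS_LOB.GETLENGTH is used on the BLOB column TOKEN_INFO instead of LENGTH.

```sql
-- Sketch: IDXNAME is a placeholder; run as the index owner.
-- Avoid direct path reads so full scans go through the buffer cache
ALTER SESSION SET EVENTS '10949 trace name context forever, level 1';

DECLARE
  l_total NUMBER;
BEGIN
  FOR t IN (SELECT table_name
              FROM user_tables
             WHERE table_name LIKE 'DR#IDXNAME%$I'
             ORDER BY table_name)
  LOOP
    -- CACHE attribute: blocks read by full scans are not aged out first
    EXECUTE IMMEDIATE 'ALTER TABLE ' || t.table_name || ' CACHE';
    -- Full scan of the table blocks, touching the inline TOKEN_INFO data
    EXECUTE IMMEDIATE 'SELECT /*+ FULL(ITAB) */ '
                      || 'SUM(TOKEN_COUNT + DBMS_LOB.GETLENGTH(TOKEN_INFO)) FROM '
                      || t.table_name || ' ITAB' INTO l_total;
    -- Scan through the index on TOKEN_TEXT (the $X index) to load its blocks
    EXECUTE IMMEDIATE 'SELECT /*+ INDEX(ITAB) */ SUM(LENGTH(TOKEN_TEXT)) FROM '
                      || t.table_name || ' ITAB' INTO l_total;
  END LOOP;
END;
/
```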
    It worked. Greatly relieved, I expected to take some time off, but there was one last surprise. The command
    exec ctx_ddl.optimize_index(idx_name=>'idxname',part_name=>'partname',optlevel=>'REBUILD');
    gave the following:
    ERROR at line 1:
    ORA-20000: Oracle Text error:
    DRG-50857: oracle error in drftoptrebxch
    ORA-14097: column type or size mismatch in ALTER TABLE EXCHANGE PARTITION
    ORA-06512: at "CTXSYS.DRUE", line 160
    ORA-06512: at "CTXSYS.CTX_DDL", line 1141
    ORA-06512: at line 1
    This is described almost exactly in MetaLink note 1645634.1, but for a non-partitioned index. The workaround given there seemed very logical, but it did not work for a partitioned index. After experimenting, I found out that the bug occurs when the partitioned index is created with the dbms_pclxutil.build_part_index procedure (which enables intra-partition parallelism in the index creation process). This is a very annoying and stupid bug; maybe there is a workaround, but I did not find it on MetaLink.
    Other points to watch out for when creating the text index (things that surprised me at first!):
    - If you use the dbms_pclxutil package, ctx_output logging does not work, because the index is created immediately and then populated in the background via DBMS_JOB jobs.
    - Combined with the fact that on RAC you may see no activity at all on your box, this can be very frightening: Oracle can choose to start the workers on the other node.
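For reference, the dbms_pclxutil creation path discussed above looks roughly like this, together with queries to follow the background jobs when ctx_output shows nothing. A sketch with hypothetical names (table DOCS, column DOC, index IDXNAME); the parallelism values are arbitrary.

```sql
-- Sketch: DOCS, DOC and IDXNAME are hypothetical names.
-- Create the local text index empty (UNUSABLE), then build partitions in parallel.
CREATE INDEX idxname ON docs (doc)
  INDEXTYPE IS CTXSYS.CONTEXT
  LOCAL
  UNUSABLE;

BEGIN
  DBMS_PCLXUTIL.BUILD_PART_INDEX(
    jobs_per_batch => 4,          -- concurrent partition-wise build jobs
    procs_per_job  => 2,          -- parallel query processes per job
    tab_name       => 'DOCS',
    idx_name       => 'IDXNAME',
    force_opt      => TRUE);      -- rebuild all partitions, not only UNUSABLE ones
END;
/

-- Background jobs (INSTANCE shows which RAC node runs them)
SELECT job, instance, this_date, what FROM user_jobs;

-- Per-partition status of the text index
SELECT partition_name, status, domidx_opstatus
  FROM user_ind_partitions
 WHERE index_name = 'IDXNAME';
```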
    I now understand much better how text indexing works, and I think it is a great technology that can scale via partitioning. But as always, the design of the application is crucial: most of our problems come from the fact that we did not choose the right sectioning (we chose PATH_SECTION_GROUP, while XML_SECTION_GROUP is so much better IMO). Maybe later I can convince the developers to change the sectioning, especially because SDATA and MDATA sections are not supported with PATH_SECTION_GROUP (it seems to work, although we had one occurrence of a bad result linked to the presence of SDATA in the index definition). Also, the whole problem of mixed structured/unstructured searches is completely solved if one uses XML_SECTION_GROUP with MDATA/SDATA (but of course the app was written for Oracle 10...).
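A minimal sketch of the XML_SECTION_GROUP + MDATA/SDATA combination mentioned above, where one CONTAINS call mixes a full-text condition with exact-match metadata and a typed range filter. The section group, tag names, and the DOCS/DOC/ID names are hypothetical; SDATA sections require 11g or later.

```sql
-- Sketch only: all names are hypothetical.
BEGIN
  CTX_DDL.CREATE_SECTION_GROUP('docs_xml_sg', 'XML_SECTION_GROUP');
  -- Searchable field section for <title>...</title>
  CTX_DDL.ADD_FIELD_SECTION('docs_xml_sg', 'title', 'title', TRUE);
  -- MDATA: exact-match metadata, e.g. <category>report</category>
  CTX_DDL.ADD_MDATA_SECTION('docs_xml_sg', 'category', 'category');
  -- SDATA: typed, range-searchable data, e.g. <pubyear>2014</pubyear>
  CTX_DDL.ADD_SDATA_SECTION('docs_xml_sg', 'pubyear', 'pubyear', 'NUMBER');
END;
/

CREATE INDEX docs_txt_idx ON docs (doc)
  INDEXTYPE IS CTXSYS.CONTEXT
  PARAMETERS ('SECTION GROUP docs_xml_sg');

-- Structured and unstructured criteria in a single CONTAINS call
SELECT id
  FROM docs
 WHERE CONTAINS(doc,
         '(oracle WITHIN title) AND MDATA(category, report) AND SDATA(pubyear >= 2010)') > 0;
```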
    Regards, Pierre

  • Suggestion: Oracle text CONTEXT index on one or more columns ?

    Hi,
    I'm implementing Oracle Text using a CONTEXT index and would like to ask you for a performance suggestion.
    I have a table of articles with the columns TITLE, SUBTITLE and BODY.
    From a performance point of view, is it better to move all three columns into one dummy column, named something like FULLTEXT, put the index on this single column,
    and then use CONTAINS(FULLTEXT,'...')>0
    Or is it almost the same for Oracle if I put indexes on all three columns and then call:
    CONTAINS(TITLE,'...')>0 OR CONTAINS(SUBTITLE,'...')>0 OR CONTAINS(BODY,'...')>0
    I actually don't care whether the match is in TITLE, SUBTITLE or BODY.
    So if I move everything into a FULLTEXT column, I have duplicate data in each article row; but if I create an index on each column, Oracle has 2x more work to index, optimize and search. Am I right?
    The table has 1.8 million records.
    Thank you.
    Kris

    mackrispi wrote:
    Now is it better from performance point of view to move all three columns into one dummy column ... with name like FULLTEXT ... and put index on this single column,
    and then use CONTAINS(FULLTEXT,'...')>0
    What version of Oracle are you on? If 11g, you could use a virtual column to do this; otherwise you'd have to write code to maintain the column, which can get messy.
    mackrispi wrote:
    Or is it almost the same for oracle if i put indexes on all three columns and then call:
    CONTAINS(TITLE,'...')>0 OR CONTAINS(SUBTITLE,'...')>0 OR CONTAINS(BODY,'...')>0
    Benchmark it and find out :)
    Another option would be something like this.
    http://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUESTION_ID:9455353124561
    Were I you, I would try out those 3 approaches, see which ones meet your performance requirements, and weigh that against the ease of implementation and administration.
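One more candidate that could go into that benchmark, not mentioned in the thread, is Oracle Text's MULTI_COLUMN_DATASTORE, which indexes several columns as one document without duplicating data in the table. A sketch with hypothetical names (ARTICLES, ID, articles_ds, articles_txt_idx); depending on the Oracle version, creating this datastore preference may require extra privileges.

```sql
-- Sketch only: table, column and object names are hypothetical.
BEGIN
  CTX_DDL.CREATE_PREFERENCE('articles_ds', 'MULTI_COLUMN_DATASTORE');
  CTX_DDL.SET_ATTRIBUTE('articles_ds', 'COLUMNS', 'title, subtitle, body');
END;
/

-- The index is declared on one column (BODY), but the datastore
-- concatenates TITLE, SUBTITLE and BODY into a single document per row.
CREATE INDEX articles_txt_idx ON articles (body)
  INDEXTYPE IS CTXSYS.CONTEXT
  PARAMETERS ('DATASTORE articles_ds');

-- A single CONTAINS call searches all three columns
SELECT id, title
  FROM articles
 WHERE CONTAINS(body, 'oracle AND text') > 0;
```

The trade-off: only changes to the indexed column (BODY here) mark a row for re-indexing, so an update that touches only TITLE or SUBTITLE must also update BODY (even to its own value).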
