About Oracle Text

Hi,
I would like to clarify a few points about Oracle Text.
I have used Oracle Text to extract just the text content from BLOBs (PDF, XLS, ...) stored in the database.
1) Can Oracle Text also be used to extract the content directly from a file, so that I can store the text in the database instead of BLOBs?
2) What is the main area where Oracle Text is used?
Thanks

Oracle Text is mainly used to search large text columns such as CLOB or VARCHAR2(4000).
A LIKE search performs poorly on such columns, especially with a leading wildcard, because it typically has to scan every row; Oracle Text gives much better search performance.
It does this by creating a text index on those columns.
You may find more details at http://www.stanford.edu/dept/itss/docs/oracle/10g/text.101/b10730.pdf
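
As a rough illustration of the difference described above, here is a minimal sketch. The table `docs`, its columns, and the index name are hypothetical and not taken from the original question; the index type and the CONTAINS and SCORE operators are standard Oracle Text.

```sql
-- Hypothetical table used only for this sketch.
CREATE TABLE docs (
  id   NUMBER PRIMARY KEY,
  body CLOB
);

-- A CONTEXT index tokenizes the column so keyword lookups can use the index.
CREATE INDEX docs_body_idx ON docs (body)
  INDEXTYPE IS CTXSYS.CONTEXT;

-- LIKE with a leading wildcard cannot use an ordinary index on a large text column.
SELECT id FROM docs WHERE body LIKE '%invoice%';

-- CONTAINS is answered from the Text index; SCORE(1) is the relevance for label 1.
SELECT id, SCORE(1) AS relevance
FROM   docs
WHERE  CONTAINS(body, 'invoice', 1) > 0
ORDER  BY relevance DESC;
```

Note that a CONTEXT index is not updated automatically on every insert or update by default, so new rows only become searchable after the index is synchronized (for example with CTX_DDL.SYNC_INDEX).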

Similar Messages

  • Oracle Text in installing Oracle 10g without licence!!

    Hi. Everyone.
    I've read some threads, but I am still confused about "Oracle Text".
    Now, I am testing an Oracle 10g database.
    I downloaded the 10g software from www.oracle.com, and installed it successfully
    on Windows XP.
    When I was trying to import a dump file from Oracle 9i into
    the unlicensed Oracle 10g database, I got the error IMP-00017, which
    is related to "Oracle Text".
    I checked the "dba_users" dictionary view, and the ctxsys user is locked and expired.
    I read some threads on this site, and following the advice, I tried to
    enable Oracle Text using "DBCA".
    However, every database option in DBCA is disabled, so I was not able to
    select Oracle Text.
    Lastly, how can I enable "Oracle Text" with unlicensed Oracle 10g?
    Is this possible without a licence?
    I am very confused about this.
    I am looking forward to hearing your experience and advice.
    Have a nice day.
    Best Regards.
    Ho.

    Well, instead of being confused, you could go to http://www.oracle.com/pls/db102/portal.portal_db?selected=1 and look at
    1) the licensing document, which would tell you whether you need a separate license, and
    2) under the 'Books' tab, look at the Text Application Developer's Guide or the Text Reference manuals for details.
    You could also look for the Oracle Text forum (from the http://forums.oracle.com page, under Database - More, or Text) and ask the people who concentrate on that set of features.
    In general, Oracle Text is a set of extensions, the definitions for which are stored under user ctxsys. You would use these extensions by creating your own objects that are based on the extensions.
    For example, suppose your tables contain varchar2 columns. Create indexes that are based on ctxsys's 'context index type', and your application can then use the 'CONTAINS' keyword search capability (which is effectively a ctxsys-owned extension to SELECT).
    However, you would never log on as ctxsys and do anything with that schema, as you risk changing the template code that Oracle has supplied.
    Message was edited by:
    Hans Forbrich
    PS: Yes, Oracle Text is included as part of the base database. Most of it is even included in the free Oracle XE database.
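
    The advice above (use the ctxsys extensions from your own schema rather than logging on as ctxsys) could be set up roughly as follows. This is only a sketch: the schema name `app_owner` is hypothetical, and the statements are meant to be run by a DBA.

    ```sql
    -- Is the Oracle Text component installed and valid in this database?
    SELECT comp_name, version, status
    FROM   dba_registry
    WHERE  comp_id = 'CONTEXT';

    -- Let an application schema (hypothetical user APP_OWNER) create Text
    -- preferences and indexes without ever logging on as CTXSYS.
    GRANT CTXAPP TO app_owner;
    GRANT EXECUTE ON ctxsys.ctx_ddl TO app_owner;
    ```

    With these grants in place the ctxsys account itself can stay locked.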

  • Oracle 9i(9.2.0.5.0) - Oracle Text - Indexing files on FTP Server

    I am using Oracle 9i(9.2.0.5.0) and I am unable to upgrade to a newer version of Oracle DB.
    I am new to this technology and I have not tried it yet myself.
    I have been reading articles, documents, and references about Oracle Text, and I have found out that Oracle Text should be able to create a context index over files that reside on an FTP server.
    I have also found out that a "URL_DATASTORE" should be used for this purpose.
    I would be pleased if someone can answer my question before I decide to start using this technology:
    - Is there any limitation which I should be aware of when creating a context index over files which reside on an FTP server? (file size limit, supported file types limitation)
    - During index creation process are the indexed files downloaded and copied to the Oracle database permanently or only temporary until index is created?
    - Is any incremental indexing possible(when I add new files to the datastore I do not have to rebuild the whole index)?
    - Is there any formula between context index disk size and indexed files disk size?
    Regards,
    Michal

    - Is there any limitation which I should be aware of when creating a context index over files which reside on an FTP server? (file size limit, supported file types limitation)
    Max file size is configurable up to 2GB. There is no limitation on the file type from the datastore itself, but if you want to process binary files the normal list of supported filter file formats will apply (see the appendix in the admin guide).
    - During index creation process are the indexed files downloaded and copied to the Oracle database permanently or only temporary until index is created?
    Only temporarily
    - Is any incremental indexing possible(when I add new files to the datastore I do not have to rebuild the whole index)?
    From the question, I suspect you're seeing this as a crawler - you expect to provide the address of an FTP site and have it fetch all the documents. That's not how it works. Rather, you must put all the URLs into a table, and Text will index those URLs (and only those URLs).
    If new files are added, you must arrange somehow to have the new rows added to your table. Then Text will do an incremental update; it won't have to rebuild the whole index.
    - Is there any formula between context index disk size and indexed files disk size?
    It varies quite a lot depending on types of data and indexing options chosen, but a typical result is that the index will be 40% of the total file size. However, if the documents are formatted (eg Word, PDF) the percentage will be much smaller.
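
    The table-of-URLs approach described in this answer might look like the sketch below. The table, preference, and index names and the example URLs are hypothetical. `timeout` and `maxdocsize` are URL_DATASTORE attributes; `CTXSYS.INSO_FILTER` is the predefined filter preference name in 9i (later releases call it `CTXSYS.AUTO_FILTER`).

    ```sql
    -- One row per document; Text fetches exactly these URLs and nothing else.
    CREATE TABLE ftp_docs (
      id  NUMBER PRIMARY KEY,
      url VARCHAR2(2000)
    );
    INSERT INTO ftp_docs VALUES (1, 'ftp://ftp.example.com/pub/manual.pdf');
    COMMIT;

    BEGIN
      ctx_ddl.create_preference('ftp_url_ds', 'URL_DATASTORE');
      ctx_ddl.set_attribute('ftp_url_ds', 'timeout', '60');            -- seconds
      ctx_ddl.set_attribute('ftp_url_ds', 'maxdocsize', '20000000');   -- bytes
    END;
    /

    CREATE INDEX ftp_docs_idx ON ftp_docs (url)
      INDEXTYPE IS CTXSYS.CONTEXT
      PARAMETERS ('DATASTORE ftp_url_ds FILTER ctxsys.inso_filter');

    -- Incremental update: add a row for the new file, then sync the index.
    INSERT INTO ftp_docs VALUES (2, 'ftp://ftp.example.com/pub/release-notes.doc');
    COMMIT;
    EXEC ctx_ddl.sync_index('ftp_docs_idx')
    ```

    Only the rows added or changed since the last sync are fetched and indexed by the final call.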

  • Querying Oracle Text using phrase with equivalence operator and NEAR

    Hello,
    I have two queries I'm running that are returning puzzling results. One query is a subset of the other. The queries use a NEAR operator and an equivalence operator.
    Query 1:
    NEAR((sister,father,mother=yo mama=mi madre),20) This is returning 3 results
    I believe Query 1 should return all records containing the words sister AND father AND (mother OR yo mama OR mi madre) that are within 20 words of each other.
    Query 2 (a subset of Query 1):
    NEAR((sister,father,mother=yo mama),20) This is returning 5 results
    I believe Query 2 should return all records containing the words sister AND father AND (mother OR yo mama) that are within 20 words of each other.
    Why would Query 1 be returning fewer results than Query 2, when Query 2 is a subset of Query 1? Shouldn't Query 1 return at least the same amount or more results than Query 2?
    ~Mimi

    For future questions about Oracle Text, you can try the Oracle Text forum.
    There you have a better chance of receiving an answer.

  • Where I can find Oracle Text limitations?

    Hi,
    Are there any release notes about Oracle Text limitations?
    I have read somewhere that an index can only hold around two thousand million documents.
    Is that correct? Where can I find information about this kind of limitation?
    Thanks in advance
    Rosa

    Hi Vladimir
    Workflow 2.6.4 is included with Oracle 10g Release 2
    1) Could you check it by doing a custom installation of the database product?
    2) Maybe you will need the Oracle 10g R2 Companion CD or Oracle 10g R2 Client to install Workflow Builder 2.6.4 (client side)
    Some useful links:
    http://download-west.oracle.com/docs/cd/B19306_01/install.102/b15664/install_sw.htm
    http://download-west.oracle.com/docs/cd/B19306_01/install.102/b15664/getting_started.htm#BABGIFAF
    And if you have access to metalink, take a look at note 351873.1

  • Oracle text catsearch sub index query

    Hello,
    I wonder if you can help me with a query about Oracle Text Catsearch.
    I have a database which has 10Gb of data.
    There is a text column in the database on which I have to find a partial match on the data contained in it.
    I have indexed this column with a CTXSYS.CTXCAT index.
    In addition I have added a sub index to the index set for a Date field and ran EXEC DBMS_STATS.GATHER_TABLE_STATS to make sure the query execution path is optimised.
    Here's my question:
    How can I make sure that the Date sub query always runs before finding the partial match on the text column?
    Caveat I am a programmer not a DBA, but I've ended up doing some databasey type stuff, apologies if question is thick.
    Cheers
    Mark
    p.s. Performance is good, but I have a feeling that the Date subquery is not being used as efficiently as it should be (the subquery should massively reduce the result set to be searched for the partial match)

    You can't - ctxcat doesn't support the "functional invocation" which would be needed if another index is used first. So reducing the set of docs to index doesn't help.
    If you can find a way to denormalize the information used in the sub-query such that it can be included in the main query index set, that should help performance considerably.
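
    One way to read the denormalization suggestion is sketched below. Everything here is an assumption for illustration: the table `items`, the numeric column `created_ym` (a year-month value such as 201001 copied from the date used in the sub-query), and the index and index set names. The point is only that the filtering column lives in the CTXCAT index set, so CATSEARCH can apply it together with the text match.

    ```sql
    -- Hypothetical table; created_ym is a denormalized copy of the date as YYYYMM.
    CREATE TABLE items (
      id          NUMBER PRIMARY KEY,
      description VARCHAR2(4000),
      created_ym  NUMBER(6)
    );

    BEGIN
      ctx_ddl.create_index_set('items_iset');
      ctx_ddl.add_index('items_iset', 'created_ym');
    END;
    /

    CREATE INDEX items_cat_idx ON items (description)
      INDEXTYPE IS CTXSYS.CTXCAT
      PARAMETERS ('INDEX SET items_iset');

    -- The third argument is the structured part of the query, answered
    -- from the sub-index in the index set.
    SELECT id
    FROM   items
    WHERE  CATSEARCH(description, 'widget*', 'created_ym >= 201001') > 0;
    ```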

  • About Multi_language features of Oracle Text.

    I have a customer who has to store into one table docs in different languages and
    use contains index to perform some text search.
    He would like to use the multi_language feature of Oracle Text.
    The database we are using is Oracle 10gR2
    We create a table with doc and language column, and then we have to create the context index.
    In the documentation I found some information about how to set a different lexer (MULTI_LEXER) for each language that needs its own lexer,
    and a different stoplist (MULTI_STOPLIST) for each language's stop words,
    but I don't understand whether it
    is possible to use the stemmer features for different languages, and whether there are other features that I can set for multi-language use.
    Thank you in advance
    Paola

    According to the online documentation: "The Oracle Text stemmer, licensed from Xerox Corporation's XSoft Division, supports the following languages with the BASIC_LEXER: English, French, Spanish, Italian, German, and Dutch. Japanese stemming is supported with the JAPANESE_LEXER."
    Please see the demonstration below. Also, if you are using 10g, you can specify the language in the query, instead of changing the language for the session. 10g also has a world_lexer.
    scott@ORA92> CREATE TABLE your_table
      2    (id         NUMBER,
      3       doc         CLOB,
      4       lang         VARCHAR2 (3),
      5       CONSTRAINT  your_table_id_pk PRIMARY KEY (id))
      6  /
    Table created.
    scott@ORA92> INSERT ALL
      2  INTO your_table VALUES (1, 'They say only the good die young.', 'eng')
      3  INTO your_table VALUES (2, 'The dogs like the cats.',          'eng')
      4  INTO your_table VALUES (3, 'cats and dogs',               'eng')
      5  INTO your_table VALUES (4, 'cat and dog',                    'eng')
      6  INTO your_table VALUES (5, 'chats et chiens',               'fre')
      7  INTO your_table VALUES (6, 'chat et chien',               'fre')
      8  INTO your_table VALUES (7, 'Die Hunde mögen die Katzen',          'ger')
      9  INTO your_table VALUES (8, 'Katzen und Hunde',               'ger')
    10  INTO your_table VALUES (9, 'Katze und Hund',               'ger')
    11  SELECT * FROM DUAL
    12  /
    9 rows created.
    scott@ORA92> BEGIN
      2    ctx_ddl.create_preference ('english_lexer','basic_lexer');
      3    ctx_ddl.set_attribute      ('english_lexer','index_themes','yes');
      4    ctx_ddl.set_attribute      ('english_lexer','theme_language','english');
      5 
      6    ctx_ddl.create_preference ('french_lexer','basic_lexer');
      7    ctx_ddl.set_attribute      ('french_lexer','index_themes','yes');
      8    ctx_ddl.set_attribute      ('french_lexer','theme_language','french');
      9 
    10    ctx_ddl.create_preference ('german_lexer','basic_lexer');
    11    ctx_ddl.set_attribute      ('german_lexer','composite','german');
    12    ctx_ddl.set_attribute      ('german_lexer','alternate_spelling','german');
    13 
    14    CTX_DDL.CREATE_PREFERENCE ('global_lexer', 'MULTI_LEXER');
    15    ctx_ddl.add_sub_lexer      ('global_lexer','english','english_lexer', 'eng');
    16    ctx_ddl.add_sub_lexer      ('global_lexer','french','french_lexer', 'fre');
    17    ctx_ddl.add_sub_lexer      ('global_lexer','german','german_lexer','ger');
    18    ctx_ddl.add_sub_lexer      ('global_lexer','default','english_lexer');
    19 
    20    CTX_DDL.CREATE_STOPLIST ('global_stoplist', 'MULTI_STOPLIST');
    21    CTX_DDL.ADD_STOPWORD    ('global_stoplist', 'and', 'english');
    22    CTX_DDL.ADD_STOPWORD    ('global_stoplist', 'und', 'german');
    23    CTX_DDL.ADD_STOPWORD    ('global_stoplist', 'et', 'french');
    24    CTX_DDL.ADD_STOPWORD    ('global_stoplist', 'the', 'ALL');
    25    CTX_DDL.ADD_STOPWORD    ('global_stoplist', 'die', 'german');
    26  END;
    27  /
    PL/SQL procedure successfully completed.
    scott@ORA92> CREATE INDEX your_table_doc_idx
      2  ON your_table (doc)
      3  INDEXTYPE IS CTXSYS.CONTEXT
      4  PARAMETERS
      5    ('LEXER           global_lexer
      6        LANGUAGE COLUMN lang
      7        STOPLIST      global_stoplist')
      8  /
    Index created.
    scott@ORA92> ALTER SESSION SET NLS_LANGUAGE = 'AMERICAN'
      2  /
    Session altered.
    scott@ORA92> SELECT * FROM your_table WHERE CONTAINS (doc, 'die') > 0
      2  /
            ID DOC                                                                              LAN
             1 They say only the good die young.                                                eng
    scott@ORA92> SELECT * FROM your_table WHERE CONTAINS (doc, 'cat AND dog') > 0
      2  /
            ID DOC                                                                              LAN
             4 cat and dog                                                                      eng
    scott@ORA92> SELECT * FROM your_table WHERE  CONTAINS (doc, '$cat AND $dog') > 0
      2  /
            ID DOC                                                                              LAN
             4 cat and dog                                                                      eng
             3 cats and dogs                                                                    eng
             2 The dogs like the cats.                                                          eng
    scott@ORA92> ALTER SESSION SET NLS_LANGUAGE = 'FRENCH'
      2  /
    Session altered.
    scott@ORA92> SELECT * FROM your_table WHERE CONTAINS (doc, 'chat AND chien') > 0
      2  /
            ID DOC                                                                              LAN
             6 chat et chien                                                                    fre
    scott@ORA92> SELECT * FROM your_table WHERE  CONTAINS (doc, '$chat AND $chien') > 0
      2  /
            ID DOC                                                                              LAN
             6 chat et chien                                                                    fre
             5 chats et chiens                                                                  fre
    scott@ORA92> ALTER SESSION SET NLS_LANGUAGE = 'GERMAN'
      2  /
    Session altered.
    scott@ORA92> SELECT * FROM your_table WHERE CONTAINS (doc, 'Die') > 0
      2  /
    no rows selected
    scott@ORA92> SELECT * FROM your_table WHERE CONTAINS (doc, 'Katze AND Hund') > 0
      2  /
            ID DOC                                                                              LAN
             9 Katze und Hund                                                                   ger
    Message was edited by:
    Barbara Boehmer
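
    The remark that 10g lets you specify the language in the query, instead of changing the session language, refers to the query template syntax. A minimal sketch against the demonstration table above, assuming the `lang` attribute of `<textquery>` takes the same language names as NLS_LANGUAGE:

    ```sql
    -- No ALTER SESSION needed: the query itself names the language,
    -- so the French sub-lexer and stemmer are used for this search.
    SELECT *
    FROM   your_table
    WHERE  CONTAINS (doc,
             '<query>
                <textquery lang="FRENCH" grammar="CONTEXT">$chat AND $chien</textquery>
              </query>') > 0;

    -- The WORLD_LEXER mentioned above is created like any other lexer preference
    -- (the preference name here is hypothetical).
    BEGIN
      ctx_ddl.create_preference ('my_world_lexer', 'WORLD_LEXER');
    END;
    /
    ```

    Against the index built in the demonstration, the first query would be expected to behave like the stemmed French query run after ALTER SESSION SET NLS_LANGUAGE = 'FRENCH'.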

  • About index memory parameter for Oracle text indexes

    Hi Experts,
    I am on Oracle 11.2.0.3 on Linux and have implemented Oracle Text. I am not an expert in this subject and need help about one issue. I created Oracle Text indexes with default setting. However in an oracle white paper I read that the default setting may not be right. Here is the excerpt from the white paper by Roger Ford:
    URL:http://www.oracle.com/technetwork/database/enterprise-edition/index-maintenance-089308.html
    "(Part of this white paper below....)
    Index Memory
    As mentioned above, cached $I entries are flushed to disk each time the indexing memory is exhausted. The default index memory at installation is a mere 12MB, which is very low. Users can specify up to 50MB at index creation time, but this is still pretty low.
    This would be done by a CREATE INDEX statement something like:
    CREATE INDEX myindex ON mytable(mycol) INDEXTYPE IS ctxsys.context PARAMETERS ('index memory 50M'); 
    To allow index memory settings above 50MB, the CTXSYS user must first increase the value of the MAX_INDEX_MEMORY parameter, like this:
    begin ctx_adm.set_parameter('max_index_memory', '500M'); end; 
    The setting for index memory should never be so high as to cause paging, as this will have a serious effect on indexing speed. On smaller dedicated systems, it is sometimes advantageous to temporarily decrease the amount of memory consumed by the Oracle SGA (for example by decreasing DB_CACHE_SIZE and/or SHARED_POOL_SIZE) during the index creation process. Once the index has been created, the SGA size can be increased again to improve query performance."
    (End here from the white paper excerpt)
    My question is:
    1) To apply this procedure (ctx_adm.set_parameter) required me to login as CTXSYS user. Is that right? or can it be avoided and be done from the application schema? This user CTXSYS is locked by default and I had to unlock it. Is that ok to do in production?
    2) What is the value that I should use for the max_index_memory should it be 500 mb - my SGA is 2 GB in Dev/ QA and 3GB in production. Also in the index creation what is the value I should set for index memory parameter  - I had left that at default but how should I change now? Should it be 50MB as shown in example above?
    3) The white paper also refer to rebuilding an index at some interval like once in a month:   ALTER INDEX DR$index_name$X REBUILD ONLINE;
    --Is this correct advice? i would like to ask the experts once before doing that.  We are on Oracle 11g and the white paper was written in 2003.
    Basically while I read the paper, I am still not very clear on several aspects and need help to understand this.
    Thanks,
    OrauserN

    Perhaps it's time I updated that paper.
    1.  To change max_index_memory you must be a DBA user OR ctxsys. As you say, the ctxsys account is locked by default. It's usually easiest to log in as a DBA and run something like
    exec ctxsys.ctx_adm.set_parameter('MAX_INDEX_MEMORY', '10G')
    2.  Index memory is allocated from PGA memory, not SGA memory. So the size of SGA is not relevant. If you use too high a setting your index build may fail with an error saying you have exceeded PGA_AGGREGATE_LIMIT.  Of course, you can increase that parameter if necessary. Also be aware that when indexing in parallel, each parallel process will allocate up to the index memory setting.
    What should it be set to?  It's really a "safety" setting to prevent users grabbing too much machine memory when creating indexes. If you don't have ad-hoc users, then just set it as high as you need. In 10.1 it was limited to just under 500M, in 10.2 you can set it to any value.
    The actual amount of memory used is not governed by this parameter, but by the MEMORY setting in the parameters clause of the CREATE INDEX statement. eg:
    create index fooindex on foo(bar) indextype is ctxsys.context parameters ('memory 1G')
    What's a good number to use for memory?  Somewhere in the region of 100M to 200M is usually good.
    3.  No - that's out of date.  To optimize your index use CTX_DDL.OPTIMIZE_INDEX.  You can do that in FULL mode daily or weekly, and REBUILD mode perhaps once a month.
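
    Putting the three answers together, a session might look like the sketch below. The names `foo`, `bar`, and `fooindex` follow the example above; the specific sizes and the 60-minute time limit are illustrative assumptions, not recommendations from the original answer.

    ```sql
    -- Any user can inspect the current limits (values are reported in bytes).
    SELECT par_name, par_value
    FROM   ctx_parameters
    WHERE  par_name IN ('DEFAULT_INDEX_MEMORY', 'MAX_INDEX_MEMORY');

    -- As a DBA (no need to unlock CTXSYS): raise the ceiling.
    EXEC ctxsys.ctx_adm.set_parameter('MAX_INDEX_MEMORY', '1G')

    -- As the application schema: request memory for this particular build.
    CREATE INDEX fooindex ON foo (bar)
      INDEXTYPE IS CTXSYS.CONTEXT
      PARAMETERS ('MEMORY 200M');

    -- Routine maintenance instead of ALTER INDEX ... REBUILD on the $X index:
    EXEC ctx_ddl.optimize_index('fooindex', 'FULL', maxtime => 60)   -- e.g. daily or weekly
    EXEC ctx_ddl.optimize_index('fooindex', 'REBUILD')               -- e.g. monthly
    ```

    The `maxtime` argument is in minutes; a FULL optimization that runs out of time picks up where it stopped on the next run.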

  • About detection of entities using Oracle Text

    Hi,
    Can anyone explain how Oracle Text can help me detect
    named entities such as names of persons, names of countries, email addresses, and others in text?
    Is that possible?
    thanks

    Hi,
    it is possible, but only starting with version 11.2.0.2, the latest version available now. You can start by reading the manual: Extracting entities in Oracle Text (http://download.oracle.com/docs/cd/E11882_01/text.112/e16594/entity.htm#sthref873). Before that version, you have to do it yourself with the help of tokens from an Oracle Text index.
    Herald ten Dam
    http://htendam.wordpress.com

  • Oracle text (basic question about availability)

    Dear sirs,
    I want to confirm what I think to be true:
    Oracle Text is a standard/built-in component of Oracle 11g (Enterprise Edition). Anybody who is licensed
    to use Oracle 11g (Enterprise Edition) should be licensed to use Oracle Text and have access to this product.

    Hi,
    Yes, it is true. In fact, all editions of the Oracle Database include Oracle Text, and you may use it if you have licensed the database.
    Overview editions: http://www.oracle.com/us/products/database/enterprise-edition/comparisons/index.html
    Herald ten Dam
    http://htendam.wordpress.com

  • Error while running the Oracle Text optimize index procedure (even as a dba user too)

    Hi Experts,
    I am on Oracle 11.2.0.2 on Linux. I have implemented Oracle Text. My Oracle Text indexes are fragmented, but I am getting an error while running the optimize_index procedure. Following is the error:
    begin
      ctx_ddl.optimize_index(idx_name=>'ACCESS_T1',optlevel=>'FULL');
    end;
    ERROR at line 1:
    ORA-20000: Oracle Text error:
    ORA-06512: at "CTXSYS.DRUE", line 160
    ORA-06512: at "CTXSYS.CTX_DDL", line 941
    ORA-06512: at line 1
    Now I tried then to run this as DBA user too and it failed the same way!
    begin
      ctx_ddl.optimize_index(idx_name=>'BVSCH1.ACCESS_T1',optlevel=>'FULL');
    end;
    ERROR at line 1:
    ORA-20000: Oracle Text error:
    ORA-06512: at "CTXSYS.DRUE", line 160
    ORA-06512: at "CTXSYS.CTX_DDL", line 941
    ORA-06512: at line 1
    Now CTXAPP role is granted to my schema and still I am getting this error. I will be thankful for the suggestions.
    Also one other important observation: We have this issue ONLY in one database and in the other two databases, I don't see any problem at all.
    I am unable to figure out what the issue is with this one database!
    Thanks,
    OrauserN

    How about checking the following?
    Bug 10626728 - CTX_DDL.optimize_index "full" fails with an empty ORA-20000 since 11.2.0.2 upgrade (DOCID 10626728.8)

  • Pre-loading Oracle text in memory with Oracle 12c

    There is a white paper from Roger Ford that explains how to load the Oracle index in memory : http://www.oracle.com/technetwork/database/enterprise-edition/mem-load-082296.html
    In our application, on Oracle 12c, we are indexing a big XML field (stored as XMLType with SecureFile storage) with the PATH_SECTION_GROUP. If I don't load the $I table (DR$..$I) into memory using the technique explained in the white paper, then I cannot get decent performance (and especially not predictable performance; it looks as if performance can fall sharply when the blocks from the TOKEN_INFO column are not in memory).
    But after migrating to oracle 12c, I got a different problem, which I can reproduce: when I create the index it is relatively small (as seen with ctx_report.index_size) and by applying the technique from the whitepaper, I can pin the DR$ I table into memory. But as soon as I do a ctx_ddl.optimize_index('Index','REBUILD') the size becomes much bigger and I can't pin the index in memory. Not sure if it is bug or not.
    What I found as work-around is to build the index with the following storage options:
    ctx_ddl.create_preference('TEST_STO','BASIC_STORAGE');
    ctx_ddl.set_attribute ('TEST_STO', 'BIG_IO', 'YES' );
    ctx_ddl.set_attribute ('TEST_STO', 'SEPARATE_OFFSETS', 'NO' );
    so that the token_info column will be stored in a secure file. Then I can change the storage of that column to put it in the keep buffer cache, and write a procedure to read the LOB so that it will be loaded in the keep cache. The size of the LOB column is more or less the same as when creating the index without the BIG_IO option, but it remains constant even after a ctx_ddl.optimize_index. The procedure to read the LOB and to load it into the cache is very similar to the loaddollarR procedure from the white paper.
    Because of the SDATA section, there is a new DR table (S table) and an IOT on top of it. This is not documented in the white paper (the white paper was written for Oracle 10g). In my case this DR$ S table is much used, and the IOT also, but putting it in the keep cache is not as important as the token_info column of the DR I table. A final note: doing SEPARATE_OFFSETS = 'YES' was very bad in my case, the combined size of the two columns is much bigger than having only the TOKEN_INFO column and both columns are read.
    Here is an example of how to reproduce the problem of the size increasing when running ctx_ddl.optimize_index:
    1. create the table
    drop table test;
    CREATE TABLE test
    (ID NUMBER(9,0) NOT NULL ENABLE,
    XML_DATA XMLTYPE)
    XMLTYPE COLUMN XML_DATA STORE AS SECUREFILE BINARY XML (tablespace users disable storage in row);
    2. insert a few records
    insert into test values(1,'<Book><TITLE>Tale of Two Cities</TITLE>It was the best of times.<Author NAME="Charles Dickens"> Born in England in the town, Stratford_Upon_Avon </Author></Book>');
    insert into test values(2,'<BOOK><TITLE>The House of Mirth</TITLE>Written in 1905<Author NAME="Edith Wharton"> Wharton was born to George Frederic Jones and Lucretia Stevens Rhinelander in New York City.</Author></BOOK>');
    insert into test values(3,'<BOOK><TITLE>Age of innocence</TITLE>She got a prize for it.<Author NAME="Edith Wharton"> Wharton was born to George Frederic Jones and Lucretia Stevens Rhinelander in New York City.</Author></BOOK>');
    3. create the text index
    drop index i_test;
    exec ctx_ddl.create_section_group('TEST_SGP','PATH_SECTION_GROUP');
    begin
      CTX_DDL.ADD_SDATA_SECTION(group_name => 'TEST_SGP',
                                section_name => 'SData_02',
                                tag => 'SData_02',
                                datatype => 'varchar2');
    end;
    /
    exec ctx_ddl.create_preference('TEST_STO','BASIC_STORAGE');
    exec  ctx_ddl.set_attribute('TEST_STO','I_TABLE_CLAUSE','tablespace USERS storage (initial 64K)');
    exec  ctx_ddl.set_attribute('TEST_STO','I_INDEX_CLAUSE','tablespace USERS storage (initial 64K) compress 2');
    exec  ctx_ddl.set_attribute ('TEST_STO', 'BIG_IO', 'NO' );
    exec  ctx_ddl.set_attribute ('TEST_STO', 'SEPARATE_OFFSETS', 'NO' );
    create index I_TEST
      on TEST (XML_DATA)
      indextype is ctxsys.context
      parameters('
        section group   "TEST_SGP"
        storage         "TEST_STO"
      ') parallel 2;
    4. check the index size
    select ctx_report.index_size('I_TEST') from dual;
    it says :
    TOTALS FOR INDEX TEST.I_TEST
    TOTAL BLOCKS ALLOCATED:                                                104
    TOTAL BLOCKS USED:                                                      72
    TOTAL BYTES ALLOCATED:                                 851,968 (832.00 KB)
    TOTAL BYTES USED:                                      589,824 (576.00 KB)
    5. optimize the index
    exec ctx_ddl.optimize_index('I_TEST','REBUILD');
    and now recompute the size, it says
    TOTALS FOR INDEX TEST.I_TEST
    TOTAL BLOCKS ALLOCATED:                                               1112
    TOTAL BLOCKS USED:                                                    1080
    TOTAL BYTES ALLOCATED:                                 9,109,504 (8.69 MB)
    TOTAL BYTES USED:                                      8,847,360 (8.44 MB)
    which shows that it went from 576KB to 8.44MB. With a big index the relative difference is smaller, but it still went from 14G to 19G.
    6. Workaround: use the BIG_IO option, so that the token_info column of the DR$ I table will be stored in a secure file and the size will stay relatively small. Then you can load this column in the cache using a procedure similar to
    alter table DR$I_TEST$I storage (buffer_pool keep);
    alter table dr$i_test$i modify lob(token_info) (cache storage (buffer_pool keep));
    rem: now we must read the lob so that it will be loaded in the keep buffer pool, use the procedure below
    create or replace procedure loadTokenInfo is
      type c_type is ref cursor;
      c2 c_type;
      s varchar2(2000);
      b blob;
      buff varchar2(100);
      siz number;
      off number;
      cntr number;
    begin
        s := 'select token_info from  DR$i_test$I';
        open c2 for s;
        loop
           fetch c2 into b;
           exit when c2%notfound;
           -- read 10 bytes at the start of every 4K chunk so that each chunk of the lob is touched
           siz := 10;
           off := 1;
           cntr := 0;
           if dbms_lob.getlength(b) > 0 then
             begin
               loop
                 dbms_lob.read(b, siz, off, buff);
                 cntr := cntr + 1;
                 off := off + 4096;
               end loop;
             exception when no_data_found then
               -- dbms_lob.read raises no_data_found once the offset passes the end of the lob
               if cntr > 0 then
                 dbms_output.put_line('4K chunks fetched: '||cntr);
               end if;
             end;
           end if;
        end loop;
        close c2;
    end;
    /
    Rgds, Pierre
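
    To check whether the pre-load had any effect, one option is to count the cached blocks per segment after running the procedure. This is a sketch only: it assumes the index is named I_TEST as in the example above and that the session has SELECT privilege on V$BH.

    ```sql
    EXEC loadTokenInfo

    -- Blocks currently in the buffer cache for the DR$ tables of I_TEST
    -- and for the LOB segment(s) of the $I table.
    SELECT o.object_name, COUNT(*) AS cached_blocks
    FROM   v$bh b
    JOIN   user_objects o ON o.data_object_id = b.objd
    WHERE  b.status <> 'free'
    AND   (o.object_name LIKE 'DR$I_TEST$%'
           OR o.object_name IN (SELECT segment_name
                                FROM   user_lobs
                                WHERE  table_name = 'DR$I_TEST$I'))
    GROUP  BY o.object_name
    ORDER  BY o.object_name;
    ```

    Comparing the counts before and after the procedure call gives a rough idea of how much of the TOKEN_INFO data actually ended up in memory.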

    I have been working a lot on that issue recently, I can give some more info.
    First I totally agree with you, I don't like to use the keep_pool and I would love to avoid it. On the other hand, we have a specific use case : 90% of the activity in the DB is done by queuing and dbms_scheduler jobs where response time does not matter. All those processes are probably filling the buffer cache. We have a customer facing application that uses the text index to search the database : performance is critical for them.
    What kind of performance do you have with your application ?
    In my case, I have learned the hard way that having the index in memory (the DR$I table in fact) is the key: if it is not, then performance is poor. I find it reasonable to pin the DR$I table in memory, and if you look at competitors this is what they do. With MongoDB they explicitly say that the index must be in memory. With Elasticsearch, they use JVMs that are also in memory. And effectively, if you look at the AWR report, you will see that Oracle is continuously accessing the DR$I table; there is a SQL statement similar to
    SELECT /*+ DYNAMIC_SAMPLING(0) INDEX(i) */    
    TOKEN_FIRST, TOKEN_LAST, TOKEN_COUNT, ROWID    
    FROM DR$idxname$I
    WHERE TOKEN_TEXT = :word AND TOKEN_TYPE = :wtype    
    ORDER BY TOKEN_TEXT,  TOKEN_TYPE,  TOKEN_FIRST
    which is continuously done.
    I think that the algorithm used by Oracle to keep blocks in cache is too complex. I just realized that in 12.1.0.2 (released last week) there is finally a "killer" functionality, the in-memory parameters, with which you can pin tables or columns in memory with compression, etc. This looks ideal for the text index; I hope that R. Ford will finally update his white paper :-)
    But my other problem was that optimize_index in REBUILD mode caused the DR$I table to double in size: it seems crazy that this was closed as not a bug, but it was and I can't do anything about it. It is a bug in my opinion, because the create index command and the "alter index rebuild" command both result in a much smaller index, so why would the people who developed the optimize function (is it another team, using another algorithm?) make the index two times bigger?
    And for that the track I have been following is to put the index in a 16K tablespace : in this case the space used by the index remains more or less flat (increases but much more reasonably). The difficulty here is to pin the index in memory because the trick of R. Ford was not working anymore.
    What worked:
    First set the keep_pool to zero and set db_16k_cache_size instead. Then change the storage preference to make sure that everything you want to cache (mostly the DR$I table) goes into the tablespace with the non-standard block size of 16K.
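    A minimal sketch of this kind of setup, under stated assumptions: the tablespace name text16k, the datafile path, the cache and file sizes, the preference name text_store_16k and the table docs(text_col) are all placeholders, and the index is shown non-partitioned for brevity (a partitioned index would add LOCAL).

```sql
-- All names and sizes below are assumptions; adjust to your environment.
-- The 16K cache must exist before a 16K tablespace can be created.
ALTER SYSTEM SET db_keep_cache_size = 0 SCOPE = BOTH;
ALTER SYSTEM SET db_16k_cache_size = 4G SCOPE = BOTH;

CREATE TABLESPACE text16k
  DATAFILE '/u01/oradata/ORCL/text16k_01.dbf' SIZE 10G
  BLOCKSIZE 16K;

-- Storage preference that sends the $I table and its $X index to text16k.
BEGIN
  ctx_ddl.create_preference('text_store_16k', 'BASIC_STORAGE');
  ctx_ddl.set_attribute('text_store_16k', 'I_TABLE_CLAUSE',
                        'tablespace text16k');
  ctx_ddl.set_attribute('text_store_16k', 'I_INDEX_CLAUSE',
                        'tablespace text16k compress 2');
END;
/

CREATE INDEX idxname ON docs (text_col)
  INDEXTYPE IS ctxsys.context
  PARAMETERS ('storage text_store_16k');
```

    Objects stored in a 16K tablespace are cached in the 16K buffer pool rather than the default or keep pool, which is why the keep_pool is no longer needed here.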
    Then comes the tricky part: pre-loading the data into the buffer cache. The problem is that in Oracle 12c, Oracle will use direct path reads for full table scans (FTS), which basically means that it bypasses the buffer cache and reads directly from file into the PGA! There is an event to avoid that; I was lucky to find it on a blog (I can't remember which, sorry for the missing credit).
    I ended up doing the following. Event 10949 is set to avoid the direct path read issue.
    alter session set events '10949 trace name context forever, level 1';
    alter table DR#idxname0001$I cache;
    alter table DR#idxname0002$I cache;
    alter table DR#idxname0003$I cache;
    SELECT /*+ FULL(ITAB) CACHE(ITAB) */ SUM(TOKEN_COUNT),  SUM(LENGTH(TOKEN_INFO)) FROM DR#idxname0001$I ITAB;
    SELECT /*+ FULL(ITAB) CACHE(ITAB) */ SUM(TOKEN_COUNT),  SUM(LENGTH(TOKEN_INFO)) FROM DR#idxname0002$I ITAB;
    SELECT /*+ FULL(ITAB) CACHE(ITAB) */ SUM(TOKEN_COUNT),  SUM(LENGTH(TOKEN_INFO)) FROM DR#idxname0003$I ITAB;
    SELECT /*+ INDEX(ITAB) CACHE(ITAB) */  SUM(LENGTH(TOKEN_TEXT)) FROM DR#idxname0001$I ITAB;
    SELECT /*+ INDEX(ITAB) CACHE(ITAB) */  SUM(LENGTH(TOKEN_TEXT)) FROM DR#idxname0002$I ITAB;
    SELECT /*+ INDEX(ITAB) CACHE(ITAB) */  SUM(LENGTH(TOKEN_TEXT)) FROM DR#idxname0003$I ITAB;
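    One way to check whether such a preload took effect is to count the buffers held per object. This is only a sketch: it assumes the index is named idxname (so the dictionary names start with DR#IDXNAME) and that the session can query v$bh and dba_objects.

```sql
-- Count buffer cache blocks currently held for the $I tables and $X indexes.
-- 'DR#IDXNAME%' is an assumed name pattern; adjust it to your index name.
SELECT o.object_name,
       COUNT(*) AS cached_blocks
FROM   v$bh b
       JOIN dba_objects o ON o.data_object_id = b.objd
WHERE  o.object_name LIKE 'DR#IDXNAME%'
AND    b.status <> 'free'
GROUP  BY o.object_name
ORDER  BY o.object_name;
```

    Comparing cached_blocks with the segment's block count in dba_segments gives a rough idea of how much of each object is in memory.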
    It worked. Greatly relieved, I expected to take some time off, but there was one last surprise. The command
    exec ctx_ddl.optimize_index(idx_name=>'idxname',part_name=>'partname',optlevel=>'REBUILD');
    gave the following
    ERROR at line 1:
    ORA-20000: Oracle Text error:
    DRG-50857: oracle error in drftoptrebxch
    ORA-14097: column type or size mismatch in ALTER TABLE EXCHANGE PARTITION
    ORA-06512: at "CTXSYS.DRUE", line 160
    ORA-06512: at "CTXSYS.CTX_DDL", line 1141
    ORA-06512: at line 1
    This is described almost exactly in Metalink note 1645634.1, but for a non-partitioned index. The workaround given there seemed very logical, but it did not work for a partitioned index. After experimenting, I found out that the bug occurs when the partitioned index is created with the dbms_pclxutil.build_part_index procedure (which enables intra-partition parallelism in the index creation process). This is a very annoying and stupid bug; maybe there is a workaround, but I did not find one on Metalink.
    Other points of attention with text index creation (stuff that surprised me at first!):
    - if you use the dbms_pclxutil package, then the ctx_output logging does not work, because the index is created immediately and then populated in the background via dbms_job.
    - this, combined with the fact that on RAC you may not see any activity on the box, can be very frightening: Oracle can choose to start the workers on the other node.
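    For reference, a sketch of the dbms_pclxutil approach discussed above. The names docs_tab, text_col and docs_idx and the job counts are hypothetical, and the table is assumed to be partitioned already; the index is created UNUSABLE first, then its partitions are built by background jobs.

```sql
-- Step 1: create the local text index as an empty, unusable shell.
CREATE INDEX docs_idx ON docs_tab (text_col)
  INDEXTYPE IS ctxsys.context
  LOCAL UNUSABLE;

-- Step 2: build the partitions in the background.
-- Arguments, in order: jobs per batch, parallel processes per job,
-- table name, index name, force rebuild of all partitions.
BEGIN
  dbms_pclxutil.build_part_index(3, 2, 'DOCS_TAB', 'DOCS_IDX', TRUE);
END;
/

-- Since the work runs as background jobs, the job queue is one place
-- to look for activity (on RAC, possibly from another instance).
SELECT job, what, this_date
FROM   user_jobs;
```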
    I now understand much better how text indexing works, and I think it is a great technology that can scale via partitioning. But as always, the design of the application is crucial: most of our problems come from the fact that we did not choose the right sectioning (we chose PATH_SECTION_GROUP, while XML_SECTION_GROUP is so much better IMO). Maybe later I can convince the developers to change the sectioning, especially because SDATA and MDATA sections are not supported with PATH_SECTION_GROUP (although it seems to work, we had one occurrence of a bad result linked to the existence of SDATA in the index definition). Also, the whole problem of mixed structured/unstructured searches is completely tackled if one uses XML_SECTION_GROUP with MDATA/SDATA (but of course the app was written for Oracle 10...).
    Regards, Pierre

  • Using Oracle Text in Apex

    Hi,
    from what I've read about it, the following has to be done.
    e.g. CREATE index ticket_keywords_index ON ticket(keywords) indextype IS ctxsys.context;
    CREATE index ticket_solution_index ON ticket(solution) indextype IS ctxsys.context;
    SELECT * from ticket where ctxsys.contain(:P12Value_to_find);
    But I wonder, how does it know which index it has to look at?
    Is there any way to specify which one it should look at?
    If yes, any idea how one goes about that?
    If no, any idea how to avoid getting information back from both columns if one only needs one?
    Could it be done by adding a checkbox in Apex, at the top, to say whether or not to include this column in the search, or is this not a good way to do it?
    Or am I missing a point?
    Thanks for the help,
    Floris

    Floris,
    Your query should be of the form:
    SELECT   *
    FROM   ticket
    WHERE   contains(indexed_col,:P12_VALUE_TO_FIND) > 0
    where indexed_col is the name of the column on which you have built your Oracle Text index and :P12_VALUE_TO_FIND is the page item that contains the search string.
    Andy
    http://atulley.wordpress.com/
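    Because CONTAINS names the indexed column explicitly, the checkbox idea from the question can be sketched as below. The page items P12_IN_KEYWORDS and P12_IN_SOLUTION are hypothetical checkboxes assumed to return 'Y' when ticked; this is one possible shape, not necessarily the fastest one.

```sql
-- Search only the columns whose (hypothetical) checkbox is ticked.
-- The third argument of CONTAINS is just a label distinguishing the two calls.
SELECT *
FROM   ticket
WHERE  (:P12_IN_KEYWORDS = 'Y'
        AND CONTAINS(keywords, :P12_VALUE_TO_FIND, 1) > 0)
   OR  (:P12_IN_SOLUTION = 'Y'
        AND CONTAINS(solution, :P12_VALUE_TO_FIND, 2) > 0);
```

    Each CONTAINS call uses the index built on the column it names, so ticket_keywords_index and ticket_solution_index are consulted independently.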

  • Using Oracle Text in Oracle XML DB .

    Hi all ,
    The idea is simple: I need to index all files stored in Oracle XML DB, and the index should stay in the Oracle DB. Using some third-party indexing software is also possible, but then you need to write a mapping to move the index file into the Oracle DB.
    So I thought of using Oracle Text, but I am not sure how to retrieve such a document from Oracle XML DB, let's say over FTP or HTTP. And if these documents are password protected, how can Oracle Text handle this?

    [11gR2 XMLDB Developers Guide -- Full-Text Search over XML Data|http://download.oracle.com/docs/cd/E11882_01/appdev.112/e10492/xdb09sea.htm#i1006756] would be the first place to start.
    For document display, there are a bunch of potential solutions; you can look at [XML DB Repository|http://download.oracle.com/docs/cd/E11882_01/appdev.112/e10492/xdb03usg.htm#insertedID18], or the Text Application Developers Guide [Presenting Documents in Oracle Text|http://download.oracle.com/docs/cd/B28359_01/text.111/b28303/view.htm#i1006687].
    Password protected documents can't be indexed using the auto_filter.

  • Index rules in oracle text and query using matches

    Dear All,
    I would like to ask about rules and matches function in oracle text.
    I followed an example in oracle text application developer's guide.
    I have a rule table like this :
    1 oracle
    2 larry or ellison
    3 oracle and text
    4 market share
    then, I create an index to that table. This is needed for calling matches function. Here is the syntax :
    create index queryx on queries(query_string)
    indextype is ctxsys.ctxrule;
    then, I noticed that the result on DR$QUERYX$I table as follows :
    LARRY 0 2 2 1 (BLOB)
    MARKET 0 4 4 1 (BLOB) {MARKET} {SHARE}
    ORACLE 0 1 1 1 (BLOB)
    ORACLE 0 3 3 1 (BLOB) {TEXT}
    ELLISON 0 2 2 1 (BLOB)
    What I want to ask is: why don't the words 'share' and 'text' appear in the DR$QUERYX$I table?
    When we use the matches function, it searches the index and consequently won't find the word 'share'. So when, for example, I run a query like this:
    select query_id from queries where matches(query_string,' It only share ten percent of all products sold')>0
    it gives 0 results, since no word in ' It only share ten percent of all products sold' is in the index table. But it could arguably be assigned to category 4, whose rule is 'market share'.
    I tried this on a larger set of data and got the same result.
    Here is my generated rules from my document collection :
    1 {REQUIREMENTS} & {ELICITATION}
    1 {REQUIREMENTS} ~ {ELICITATION} & {ACTOR}
    1 {REQUIREMENTS} ~ {ELICITATION} ~ {ACTOR} & {FURPS}
    1 {REQUIREMENTS} ~ {ELICITATION} ~ {ACTOR} ~ {FURPS} ~ {OUTLINE}
    1 {REQUIREMENTS} ~ {ELICITATION} ~ {ACTOR} ~ {FURPS} ~ {OUTLINE} & {PROC}
    1 {REQUIREMENTS} ~ {ELICITATION} ~ {ACTOR} ~ {FURPS} ~ {OUTLINE} ~ {PROC} & {SPEED}
    1 {REQUIREMENTS} ~ {ELICITATION} ~ {ACTOR} ~ {FURPS} ~ {OUTLINE} ~ {PROC} ~ {SPEED} & {DOCUME}
    1 {REQUIREMENTS} ~ {ELICITATION} ~ {ACTOR} ~ {FURPS} ~ {OUTLINE} ~ {PROC} ~ {SPEED} ~ {DOCUME} & {PLACED}
    1 {REQUIREMENTS} ~ {ELICITATION} ~ {ACTOR} ~ {FURPS} ~ {OUTLINE} ~ {PROC} ~ {SPEED} ~ {DOCUME} ~ {PLACED} & {UNNECESSARY}
    1 {REQUIREMENTS} ~ {ELICITATION} ~ {ACTOR} ~ {FURPS} ~ {OUTLINE} ~ {PROC} ~ {SPEED} ~ {DOCUME} ~ {PLACED} ~ {UNNECESSARY} & {MISUSE}
    1 {INTERPRETATION} ~ {REQUIREMENTS}
    2 {DESIGN} & {REPRESENTATION}
    2 {DESIGN} ~ {REPRESENTATION} & {MAY} & {FOUNDATI} & {OCTOBER}
    2 {DESIGN} ~ {REPRESENTATION} & {MAY} & {FOUNDATI} ~ {OCTOBER} & {PROCEDURAL}
    2 {DESIGN} ~ {REPRESENTATION} & {MAY} & {FOUNDATI} ~ {OCTOBER} ~ {PROCEDURAL} & {STRICT}
    2 {DESIGN} ~ {REPRESENTATION} & {MAY} & {FOUNDATI} ~ {OCTOBER} ~ {PROCEDURAL} ~ {STRICT} & {GRASP}
    2 {DESIGN} ~ {REPRESENTATION} & {MAY} & {FOUNDATI} ~ {OCTOBER} ~ {PROCEDURAL} ~ {STRICT} ~ {GRASP} & {MANY} & {LAYER}
    2 {DESIGN} ~ {REPRESENTATION} ~ {MAY}
    3 {PM} & {TESTING} & {ATTRIBUTI}
    And this is the index table result with ctxrule :
    (only the token_text column shown)
    PM
    DESIGN
    DESIGN
    DESIGN
    DESIGN
    DESIGN
    DESIGN
    DESIGN
    REQUIREMENTS
    REQUIREMENTS
    REQUIREMENTS
    REQUIREMENTS
    REQUIREMENTS
    REQUIREMENTS
    REQUIREMENTS
    REQUIREMENTS
    REQUIREMENTS
    REQUIREMENTS
    INTERPRETATION
    So when I try to classify a document containing the word 'outline', it should produce category 1 (based on the rules), but since there is no word 'outline' in the index table, matches returns 0, meaning that the document is not classified into any category. I don't understand why this happens. Does anybody know about this? I would really appreciate any help.
    Thank you very much.
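    For comparison, here is a sketch of the small example from the start of this question as a complete script. The column sizes and the sample document are assumptions; the table, index and rule strings are the ones given above.

```sql
CREATE TABLE queries (
  query_id     NUMBER,
  query_string VARCHAR2(80)
);

INSERT INTO queries VALUES (1, 'oracle');
INSERT INTO queries VALUES (2, 'larry or ellison');
INSERT INTO queries VALUES (3, 'oracle and text');
INSERT INTO queries VALUES (4, 'market share');
COMMIT;

CREATE INDEX queryx ON queries (query_string)
  INDEXTYPE IS ctxsys.ctxrule;

-- Classify one document: each returned query_id is a rule it satisfies.
SELECT query_id
FROM   queries
WHERE  MATCHES(query_string,
               'Oracle announced that its market share in databases increased over the last year.') > 0;
```

    This document contains 'oracle' and the phrase 'market share', so it should match rules 1 and 4. Rule 4 is a phrase query: both words have to appear together, so a document that contains only 'share' is not expected to match it, whatever is stored in the TOKEN_TEXT column.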

    Hm, I see. It does make sense. So nice to know.
    But then in the second example I gave, where I used a larger table, as shown below:
    Here is my generated rules from my document collection :
    1 {REQUIREMENTS} & {ELICITATION}
    1 {REQUIREMENTS} ~ {ELICITATION} & {ACTOR}
    1 {REQUIREMENTS} ~ {ELICITATION} ~ {ACTOR} & {FURPS}
    1 {REQUIREMENTS} ~ {ELICITATION} ~ {ACTOR} ~ {FURPS} ~ {OUTLINE}
    1 {REQUIREMENTS} ~ {ELICITATION} ~ {ACTOR} ~ {FURPS} ~ {OUTLINE} & {PROC}
    1 {REQUIREMENTS} ~ {ELICITATION} ~ {ACTOR} ~ {FURPS} ~ {OUTLINE} ~ {PROC} & {SPEED}
    1 {REQUIREMENTS} ~ {ELICITATION} ~ {ACTOR} ~ {FURPS} ~ {OUTLINE} ~ {PROC} ~ {SPEED} & {DOCUME}
    1 {INTERPRETATION} ~ {REQUIREMENTS}
    2 {DESIGN} & {REPRESENTATION}
    2 {DESIGN} ~ {REPRESENTATION} & {MAY} & {FOUNDATI} & {OCTOBER}
    2 {DESIGN} ~ {REPRESENTATION} & {MAY} & {FOUNDATI} ~ {OCTOBER} & {PROCEDURAL}
    2 {DESIGN} ~ {REPRESENTATION} & {MAY} & {FOUNDATI} ~ {OCTOBER} ~ {PROCEDURAL} & {STRICT}
    2 {DESIGN} ~ {REPRESENTATION} ~ {MAY}
    3 {PM} & {TESTING} & {ATTRIBUTI}
    As far as I know, the sign ' ~ ' means 'OR' and '&' means 'and' . So based on the 4th line in my table :
    1 {REQUIREMENTS} ~ {ELICITATION} ~ {ACTOR} ~ {FURPS} ~ {OUTLINE}
    it can be concluded that if any of the words stated there is queried, category '1' will appear as a result. But before we can use 'matches' to query it, we need to create an index on the rules table. I did so, and the result was:
    (only the token_text column shown)
    PM
    DESIGN
    DESIGN
    DESIGN
    DESIGN
    DESIGN
    DESIGN
    DESIGN
    REQUIREMENTS
    REQUIREMENTS
    REQUIREMENTS
    REQUIREMENTS
    REQUIREMENTS
    REQUIREMENTS
    REQUIREMENTS
    REQUIREMENTS
    REQUIREMENTS
    REQUIREMENTS
    INTERPRETATION
    There were no words other than PM, DESIGN, REQUIREMENTS and INTERPRETATION. Why don't all of the words REQUIREMENTS, ELICITATION, ACTOR, FURPS, OUTLINE appear in the index result?
