Oracle Text Example

Can someone post a quick example of an Oracle Text query?

Ben,
Thanks for the quick answer! I was teaching an APEX class and encouraging them to use the forum. I said "I bet someone answers this in an hour or less". You did it in 13 minutes! I tried to ask a question that didn't require any research, so I hope you didn't invest much time in it.
Thanks again,
Tyler
Tyler Muth
http://tylermuth.wordpress.com
"Applied Oracle Security: Developing Secure Database and Middleware Environments": http://sn.im/aos.book

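For readers looking for the kind of quick example requested above, here is a minimal sketch of an Oracle Text query. It is not the reply Tyler is thanking Ben for, which is not preserved in this thread; the table DOCS, its columns, and the index name DOCS_TEXT_IDX are hypothetical.

```sql
-- Hypothetical table holding the text to search.
create table docs (
  id   number primary key,
  body varchar2(4000)
);

insert into docs values (1, 'Oracle Text supports full-text search');
insert into docs values (2, 'APEX applications can call CONTAINS too');
commit;

-- A CONTEXT index is what makes the CONTAINS operator usable on BODY.
create index docs_text_idx on docs (body)
  indextype is ctxsys.context;

-- SCORE(1) returns the relevance for the CONTAINS call labelled 1.
select score(1) as relevance, id, body
from   docs
where  contains(body, 'search', 1) > 0
order  by score(1) desc;
```

Note that a CONTEXT index is not maintained transactionally: rows inserted after the index is created only become searchable once the index has been synchronized (for example with ctx_ddl.sync_index).
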
Similar Messages

  • Oracle Text -- OBE example

    I have been following the Oracle Text -- OBE example and am running into a problem when Creating a Database Access Descriptor (DAD) in the HTTP Server.
    Environment: W2k on a standalone machine with no network connection, running 9i v2.
    When trying to access <hostname>:80 I get a page not found error.
    The HTTP server is running and I can access OEM through the web browser (port 3339).
    Any suggestions would be appreciated.
    thanks

    Try using port number 7778 (http://yourmachinename:7778). You should see the Oracle HTTP server index page.

  • Pre-loading Oracle text in memory with Oracle 12c

    There is a white paper from Roger Ford that explains how to load the Oracle index in memory : http://www.oracle.com/technetwork/database/enterprise-edition/mem-load-082296.html
    In our application, on Oracle 12c, we are indexing a big XML field (stored as XMLType with SecureFile storage) with the PATH_SECTION_GROUP. If I don't load the I table (DR$..$I) into memory using the technique explained in the white paper, then I cannot get decent performance (and especially not predictable performance: it looks like performance can fall sharply if the blocks from the TOKEN_INFO column are not in memory).
    But after migrating to Oracle 12c, I got a different problem, which I can reproduce: when I create the index it is relatively small (as seen with ctx_report.index_size), and by applying the technique from the white paper I can pin the DR$ I table into memory. But as soon as I do a ctx_ddl.optimize_index('Index','REBUILD') the size becomes much bigger and I can't pin the index in memory. Not sure if it is a bug or not.
    What I found as a workaround is to build the index with the following storage options:
    ctx_ddl.create_preference('TEST_STO','BASIC_STORAGE');
    ctx_ddl.set_attribute ('TEST_STO', 'BIG_IO', 'YES' );
    ctx_ddl.set_attribute ('TEST_STO', 'SEPARATE_OFFSETS', 'NO' );
    so that the token_info column will be stored in a secure file. Then I can change the storage of that column to put it in the keep buffer cache, and write a procedure to read the LOB so that it will be loaded in the keep cache. The size of the LOB column is more or less the same as when creating the index without the BIG_IO option, but it remains constant even after a ctx_ddl.optimize_index. The procedure to read the LOB and to load it into the cache is very similar to the loaddollarR procedure from the white paper.
    Because of the SDATA section, there is a new DR table (S table) and an IOT on top of it. This is not documented in the white paper (which was written for Oracle 10g). In my case this DR$ S table is heavily used, and the IOT also, but putting it in the keep cache is not as important as the token_info column of the DR$ I table. A final note: setting SEPARATE_OFFSETS = 'YES' was very bad in my case; the combined size of the two columns is much bigger than having only the TOKEN_INFO column, and both columns are read.
    Here is an example of how to reproduce the problem of the size increasing when doing ctx_ddl.optimize_index:
    1. create the table
    drop table test;
    CREATE TABLE test
    (ID NUMBER(9,0) NOT NULL ENABLE,
    XML_DATA XMLTYPE)
    XMLTYPE COLUMN XML_DATA STORE AS SECUREFILE BINARY XML (tablespace users disable storage in row);
    2. insert a few records
    insert into test values(1,'<Book><TITLE>Tale of Two Cities</TITLE>It was the best of times.<Author NAME="Charles Dickens"> Born in England in the town, Stratford_Upon_Avon </Author></Book>');
    insert into test values(2,'<BOOK><TITLE>The House of Mirth</TITLE>Written in 1905<Author NAME="Edith Wharton"> Wharton was born to George Frederic Jones and Lucretia Stevens Rhinelander in New York City.</Author></BOOK>');
    insert into test values(3,'<BOOK><TITLE>Age of innocence</TITLE>She got a prize for it.<Author NAME="Edith Wharton"> Wharton was born to George Frederic Jones and Lucretia Stevens Rhinelander in New York City.</Author></BOOK>');
    3. create the text index
    drop index i_test;
    exec ctx_ddl.create_section_group('TEST_SGP','PATH_SECTION_GROUP');
    begin
      CTX_DDL.ADD_SDATA_SECTION(group_name => 'TEST_SGP',
                                section_name => 'SData_02',
                                tag => 'SData_02',
                                datatype => 'varchar2');
    end;
    /
    exec ctx_ddl.create_preference('TEST_STO','BASIC_STORAGE');
    exec  ctx_ddl.set_attribute('TEST_STO','I_TABLE_CLAUSE','tablespace USERS storage (initial 64K)');
    exec  ctx_ddl.set_attribute('TEST_STO','I_INDEX_CLAUSE','tablespace USERS storage (initial 64K) compress 2');
    exec  ctx_ddl.set_attribute ('TEST_STO', 'BIG_IO', 'NO' );
    exec  ctx_ddl.set_attribute ('TEST_STO', 'SEPARATE_OFFSETS', 'NO' );
    create index I_TEST
      on TEST (XML_DATA)
      indextype is ctxsys.context
      parameters('
        section group   "TEST_SGP"
        storage         "TEST_STO"
      ') parallel 2;
    4. check the index size
    select ctx_report.index_size('I_TEST') from dual;
    it says :
    TOTALS FOR INDEX TEST.I_TEST
    TOTAL BLOCKS ALLOCATED:                                                104
    TOTAL BLOCKS USED:                                                      72
    TOTAL BYTES ALLOCATED:                                 851,968 (832.00 KB)
    TOTAL BYTES USED:                                      589,824 (576.00 KB)
    5. optimize the index
    exec ctx_ddl.optimize_index('I_TEST','REBUILD');
    and now recompute the size; it says
    TOTALS FOR INDEX TEST.I_TEST
    TOTAL BLOCKS ALLOCATED:                                               1112
    TOTAL BLOCKS USED:                                                    1080
    TOTAL BYTES ALLOCATED:                                 9,109,504 (8.69 MB)
    TOTAL BYTES USED:                                      8,847,360 (8.44 MB)
    which shows that it went from 576KB to 8.44MB. With a big index the difference is not so big, but still from 14G to 19G.
    6. Workaround: use the BIG_IO option, so that the token_info column of the DR$ I table will be stored in a secure file and the size will stay relatively small. Then you can load this column in the cache using a procedure similar to
    alter table DR$I_TEST$I storage (buffer_pool keep);
    alter table dr$i_test$i modify lob(token_info) (cache storage (buffer_pool keep));
    rem: now we must read the lob so that it will be loaded in the keep buffer pool, use the procedure below
    create or replace procedure loadTokenInfo is
      type c_type is ref cursor;
      c2 c_type;
      s varchar2(2000);
      b blob;
      buff varchar2(100);
      siz number;
      off number;
      cntr number;
    begin
        s := 'select token_info from  DR$i_test$I';
        open c2 for s;
        loop
           fetch c2 into b;
           exit when c2%notfound;
           siz := 10;
           off := 1;
           cntr := 0;
           if dbms_lob.getlength(b) > 0 then
             begin
               loop
                 dbms_lob.read(b, siz, off, buff);
                 cntr := cntr + 1;
                 off := off + 4096;
               end loop;
             exception when no_data_found then
               if cntr > 0 then
                 dbms_output.put_line('4K chunks fetched: '||cntr);
               end if;
             end;
           end if;
        end loop;
        close c2;
    end;
    /
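    To run the loader and confirm that the storage changes took effect, something like the following could be used. This is only a sketch: it assumes the index I_TEST from the steps above and that you are connected as the index owner; the dictionary views USER_SEGMENTS and USER_LOBS are standard, but the check itself is not part of the white paper.

    ```sql
    -- Run the loader defined above and show its dbms_output messages.
    set serveroutput on
    exec loadTokenInfo

    -- Which buffer pool do the $I table and its TOKEN_INFO LOB segment use?
    select s.segment_name, s.segment_type, s.buffer_pool, s.bytes
    from   user_segments s
    where  s.segment_name = 'DR$I_TEST$I'
       or  s.segment_name in (select l.segment_name
                              from   user_lobs l
                              where  l.table_name  = 'DR$I_TEST$I'
                              and    l.column_name = 'TOKEN_INFO');
    ```

    If the ALTER TABLE statements worked, BUFFER_POOL should show KEEP for both segments.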
    Rgds, Pierre

    I have been working a lot on that issue recently, I can give some more info.
    First, I totally agree with you: I don't like to use the keep pool and I would love to avoid it. On the other hand, we have a specific use case: 90% of the activity in the DB is done by queuing and dbms_scheduler jobs, where response time does not matter. All those processes are probably filling the buffer cache. We have a customer-facing application that uses the text index to search the database, and performance is critical for it.
    What kind of performance do you have with your application?
    In my case, I have learned the hard way that having the index in memory (the DR$I table in fact) is the key: if it is not, then performance is poor. I find it reasonable to pin the DR$I table in memory, and if you look at competitors this is what they do. MongoDB explicitly says that the index must be in memory. Elasticsearch uses JVMs that are also in memory. And indeed, if you look at the AWR report, you will see that Oracle is continuously accessing the DR$I table; there is a SQL statement similar to
    SELECT /*+ DYNAMIC_SAMPLING(0) INDEX(i) */    
    TOKEN_FIRST, TOKEN_LAST, TOKEN_COUNT, ROWID    
    FROM DR$idxname$I
    WHERE TOKEN_TEXT = :word AND TOKEN_TYPE = :wtype    
    ORDER BY TOKEN_TEXT,  TOKEN_TYPE,  TOKEN_FIRST
    which is executed continuously.
    I think that the algorithm used by Oracle to keep blocks in cache is too complex. I just realized that in 12.1.0.2 (released last week) there is finally a "killer" feature, the in-memory parameters, with which you can pin tables or columns in memory with compression, etc. This looks ideal for the text index; I hope that R. Ford will finally update his white paper :-)
    But my other problem was that optimize_index in REBUILD mode caused the DR$I table to double in size. It seems crazy that this was closed as not a bug, but it was, and I can't do anything about it. It is a bug in my opinion, because the create index command and the "alter index rebuild" command both result in a much smaller index, so why would the people who developed the optimize function (is it another team, using another algorithm?) make the index two times bigger?
    For that, the track I have been following is to put the index in a 16K tablespace: in this case the space used by the index remains more or less flat (it increases, but much more reasonably). The difficulty here is to pin the index in memory, because R. Ford's trick was not working anymore.
    What worked:
    First set the keep pool to zero and set db_16k_cache_size instead. Then change the storage preference to make sure that everything you want to cache (mostly the DR$I table) goes in the tablespace with the non-standard block size of 16K.
    Then comes the tricky part: pre-loading the data into the buffer cache. The problem is that with Oracle 12c, Oracle will use direct path reads for full table scans (FTS), which basically means that it bypasses the cache and reads directly from file into the PGA! There is an event to avoid that; I was lucky to find it on a blog (I can't remember which, sorry for the missing credit).
    I ended up doing the following. Event 10949 is there to avoid the direct path read issue.
    alter session set events '10949 trace name context forever, level 1';
    alter table DR#idxname0001$I cache;
    alter table DR#idxname0002$I cache;
    alter table DR#idxname0003$I cache;
    SELECT /*+ FULL(ITAB) CACHE(ITAB) */ SUM(TOKEN_COUNT),  SUM(LENGTH(TOKEN_INFO)) FROM DR#idxname0001$I ITAB;
    SELECT /*+ FULL(ITAB) CACHE(ITAB) */ SUM(TOKEN_COUNT),  SUM(LENGTH(TOKEN_INFO)) FROM DR#idxname0002$I ITAB;
    SELECT /*+ FULL(ITAB) CACHE(ITAB) */ SUM(TOKEN_COUNT),  SUM(LENGTH(TOKEN_INFO)) FROM DR#idxname0003$I ITAB;
    SELECT /*+ INDEX(ITAB) CACHE(ITAB) */  SUM(LENGTH(TOKEN_TEXT)) FROM DR#idxname0001$I ITAB;
    SELECT /*+ INDEX(ITAB) CACHE(ITAB) */  SUM(LENGTH(TOKEN_TEXT)) FROM DR#idxname0002$I ITAB;
    SELECT /*+ INDEX(ITAB) CACHE(ITAB) */  SUM(LENGTH(TOKEN_TEXT)) FROM DR#idxname0003$I ITAB;
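    One way to check whether the pre-load actually left blocks in the buffer cache is to count buffers per object in V$BH. This is a rough sketch, not something from the white paper; it assumes SELECT privilege on V$BH, that it is run as the index owner, and it reuses the placeholder table name from the statements above.

    ```sql
    -- Count buffers currently cached for one $I table.
    -- V$BH.OBJD is the data object id of the block's segment.
    select o.object_name, count(*) as cached_blocks
    from   v$bh b
    join   user_objects o on o.data_object_id = b.objd
    where  o.object_name = 'DR#IDXNAME0001$I'
    and    b.status <> 'free'
    group  by o.object_name;
    ```

    Comparing CACHED_BLOCKS with the BLOCKS value for the same segment in USER_SEGMENTS gives an idea of how much of the table is resident.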
    It worked. With great relief I expected to take some time out, but there was one last surprise. The command
    exec ctx_ddl.optimize_index(idx_name=>'idxname',part_name=>'partname',optlevel=>'REBUILD');
    gave the following
    ERROR at line 1:
    ORA-20000: Oracle Text error:
    DRG-50857: oracle error in drftoptrebxch
    ORA-14097: column type or size mismatch in ALTER TABLE EXCHANGE PARTITION
    ORA-06512: at "CTXSYS.DRUE", line 160
    ORA-06512: at "CTXSYS.CTX_DDL", line 1141
    ORA-06512: at line 1
    This is described almost exactly in Metalink note 1645634.1, but for a non-partitioned index. The workaround given seemed very logical, but it did not work in the case of a partitioned index. After experimenting, I found out that the bug occurs when the partitioned index is created with the dbms_pclxutil.build_part_index procedure (which enables intra-partition parallelism in the index creation process). This is a very annoying and stupid bug; maybe there is a workaround, but I did not find it on Metalink.
    Other points of attention with the text index creation (stuff that surprised me at first!):
    - if you use the dbms_pclxutil package, then the ctx_output logging does not work, because the index is created immediately and then populated in the background via dbms_job.
    - this, combined with the fact that on a RAC you may not see any activity on the box, can be very frightening: Oracle can choose to start the workers on the other node.
    I understand much better how text indexing works now; I think it is a great technology which can scale via partitioning. But as always the design of the application is crucial: most of our problems come from the fact that we did not choose the right sectioning (we chose PATH_SECTION_GROUP while XML_SECTION_GROUP is so much better IMO). Maybe later I can convince the devs to change the sectioning, especially because SDATA and MDATA sections are not supported with PATH_SECTION_GROUP (although it seems to work, even though we had one occurrence of a bad result linked to the existence of SDATA in the index definition). Also, the whole problem of mixed structured/unstructured searches is completely tackled if one uses XML_SECTION_GROUP with MDATA/SDATA (but of course the app was written for Oracle 10...)
    Regards, Pierre

  • Issues using Oracle Text conditions

    Hi all,
    I'm facing an issue executing a query on a VIEW using Oracle Text Indexes.
    The DB version I'm using is "Enterprise 9.2.0.5".
    TEST_VIEW is an sql-view that has a query on several tables where one of them has two Oracle Text indexes, one on field FIELD2 and another on FIELD3
    executing this query I get 10 rows:
    select *
    from TEST_VIEW
    where FIELD1 = 1001 -- regular condition
    and (contains(FIELD2, 'Blitz') > 0 ) -- Oracle text condition
    But if I add another condition on an existing Oracle Text index, I get only 1 row:
    select *
    from TEST_VIEW
    where FIELD1 = 1001
    and (contains(FIELD2, 'Blitz') > 0 OR contains(FIELD3, 'Blitz') > 0)
    As you can see, the third condition was added using a logical OR, so I should get at least 10 rows ...
    Can anyone help me?
    Thanks in advance.
    Eduardo.

    Eduardo,
    Without a full test case, it is hard to see if there is something wrong or not. I did the following, and all worked fine on my 10g instance. I had to assume some things, but I at least think I have the basic gist of your inquiry in this example. Run it/change it to match your situation, and post back when you can.
    Thanks,
    Ron
    CREATE TABLE Z_TEST1 (
    FIELD1 VARCHAR2(30));
    INSERT INTO Z_TEST1
    VALUES ('QUICK');
    INSERT INTO Z_TEST1
    VALUES ('BROWN');
    INSERT INTO Z_TEST1
    VALUES ('FOX');
    INSERT INTO Z_TEST1
    VALUES ('QUICK');
    INSERT INTO Z_TEST1
    VALUES ('BROWN');
    INSERT INTO Z_TEST1
    VALUES ('FOX');
    CREATE TABLE Z_TEST2 (
    FIELD2 VARCHAR2(30));
    INSERT INTO Z_TEST2
    VALUES ('JUMPED');
    INSERT INTO Z_TEST2
    VALUES ('OVER');
    INSERT INTO Z_TEST2
    VALUES ('LAZY');
    INSERT INTO Z_TEST2
    VALUES ('DOG');
    INSERT INTO Z_TEST2
    VALUES ('QUICK');
    INSERT INTO Z_TEST2
    VALUES ('BROWN');
    INSERT INTO Z_TEST2
    VALUES ('FOX');
    commit;
    CREATE VIEW TEST_VIEW
    AS
    SELECT Z_TEST1.FIELD1 AS "FIELD1", Z_TEST2.FIELD2 AS "FIELD2"
    FROM Z_TEST1, Z_TEST2;
    CREATE INDEX Z_TEST1_IDX ON Z_TEST1(FIELD1)
    INDEXTYPE IS CTXSYS.CONTEXT;
    CREATE INDEX Z_TEST2_IDX ON Z_TEST2(FIELD2)
    INDEXTYPE IS CTXSYS.CONTEXT;
    select *
    from TEST_VIEW
    where CONTAINS(FIELD1, 'FOX') > 0;
    14 rows
    select *
    from TEST_VIEW
    where (CONTAINS(FIELD1, 'FOX') > 0 OR CONTAINS(FIELD2, 'FOX') > 0);
    18 rows
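    The row counts follow from TEST_VIEW being a cartesian product of the two tables (6 x 7 = 42 rows). As a rough cross-check that does not involve Oracle Text at all, plain equality predicates on the same view should give the same counts, since every test value here is a single word:

    ```sql
    -- 2 'FOX' rows in Z_TEST1 x 7 rows in Z_TEST2 = 14
    select count(*) from test_view where field1 = 'FOX';

    -- the 14 rows above, plus 4 non-'FOX' rows in Z_TEST1 x 1 'FOX' row in Z_TEST2 = 18
    select count(*) from test_view where field1 = 'FOX' or field2 = 'FOX';
    ```

    If the CONTAINS version of the OR query returns fewer rows than the single-condition query on your own view, comparing against a non-Text equivalent like this can help narrow down whether the view or the Text predicate is responsible.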

  • Index rules in oracle text and query using matches

    Dear All,
    I would like to ask about rules and matches function in oracle text.
    I followed an example in oracle text application developer's guide.
    I have a rule table like this :
    1 oracle
    2 larry or ellison
    3 oracle and text
    4 market share
    then, I create an index to that table. This is needed for calling matches function. Here is the syntax :
    create index queryx on queries(query_string)
    indextype is ctxsys.ctxrule;
    then, I noticed that the result on DR$QUERYX$I table as follows :
    LARRY 0 2 2 1 (BLOB)
    MARKET 0 4 4 1 (BLOB) {MARKET} {SHARE}
    ORACLE 0 1 1 1 (BLOB)
    ORACLE 0 3 3 1 (BLOB) {TEXT}
    ELLISON 0 2 2 1 (BLOB)
    What I want to ask is why the words 'share' and 'text' don't appear as entries in the DR$QUERYX$I table.
    When we use the matches function, it searches the index and consequently it won't find the word 'share'. So when, for example, I run a query like this:
    select query_id from queries where matches(query_string,' It only share ten percent of all products sold')>0
    it will give 0 results, since no word in ' It only share ten percent of all products sold' is in the index table. But actually it could possibly be categorized as category 4, whose rule is 'market share'.
    I tried this on a larger set of data and got the same result.
    Here are the rules generated from my document collection:
    1 {REQUIREMENTS} & {ELICITATION}
    1 {REQUIREMENTS} ~ {ELICITATION} & {ACTOR}
    1 {REQUIREMENTS} ~ {ELICITATION} ~ {ACTOR} & {FURPS}
    1 {REQUIREMENTS} ~ {ELICITATION} ~ {ACTOR} ~ {FURPS} ~ {OUTLINE}
    1 {REQUIREMENTS} ~ {ELICITATION} ~ {ACTOR} ~ {FURPS} ~ {OUTLINE} & {PROC}
    1 {REQUIREMENTS} ~ {ELICITATION} ~ {ACTOR} ~ {FURPS} ~ {OUTLINE} ~ {PROC} & {SPEED}
    1 {REQUIREMENTS} ~ {ELICITATION} ~ {ACTOR} ~ {FURPS} ~ {OUTLINE} ~ {PROC} ~ {SPEED} & {DOCUME}
    1 {REQUIREMENTS} ~ {ELICITATION} ~ {ACTOR} ~ {FURPS} ~ {OUTLINE} ~ {PROC} ~ {SPEED} ~ {DOCUME} & {PLACED}
    1 {REQUIREMENTS} ~ {ELICITATION} ~ {ACTOR} ~ {FURPS} ~ {OUTLINE} ~ {PROC} ~ {SPEED} ~ {DOCUME} ~ {PLACED} & {UNNECESSARY}
    1 {REQUIREMENTS} ~ {ELICITATION} ~ {ACTOR} ~ {FURPS} ~ {OUTLINE} ~ {PROC} ~ {SPEED} ~ {DOCUME} ~ {PLACED} ~ {UNNECESSARY} & {MISUSE}
    1 {INTERPRETATION} ~ {REQUIREMENTS}
    2 {DESIGN} & {REPRESENTATION}
    2 {DESIGN} ~ {REPRESENTATION} & {MAY} & {FOUNDATI} & {OCTOBER}
    2 {DESIGN} ~ {REPRESENTATION} & {MAY} & {FOUNDATI} ~ {OCTOBER} & {PROCEDURAL}
    2 {DESIGN} ~ {REPRESENTATION} & {MAY} & {FOUNDATI} ~ {OCTOBER} ~ {PROCEDURAL} & {STRICT}
    2 {DESIGN} ~ {REPRESENTATION} & {MAY} & {FOUNDATI} ~ {OCTOBER} ~ {PROCEDURAL} ~ {STRICT} & {GRASP}
    2 {DESIGN} ~ {REPRESENTATION} & {MAY} & {FOUNDATI} ~ {OCTOBER} ~ {PROCEDURAL} ~ {STRICT} ~ {GRASP} & {MANY} & {LAYER}
    2 {DESIGN} ~ {REPRESENTATION} ~ {MAY}
    3 {PM} & {TESTING} & {ATTRIBUTI}
    And this is the index table result with ctxrule :
    (only the token_text column shown)
    PM
    DESIGN
    DESIGN
    DESIGN
    DESIGN
    DESIGN
    DESIGN
    DESIGN
    REQUIREMENTS
    REQUIREMENTS
    REQUIREMENTS
    REQUIREMENTS
    REQUIREMENTS
    REQUIREMENTS
    REQUIREMENTS
    REQUIREMENTS
    REQUIREMENTS
    REQUIREMENTS
    INTERPRETATION
    So when I try to classify a document with the word 'outline' in it, it should produce category 1 (based on the rules), but since there is no word 'outline' in the index table, matches will return 0, meaning that the document is not classified into any category. I don't understand why this happens. Does anybody know about this? I would really appreciate any help.
    Thank you very much.
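    For the first, four-rule example, the dump of DR$QUERYX$I above suggests that each rule is stored under its first token only, with the remaining words ({MARKET} {SHARE}, {TEXT}) kept in the last column. Under that reading, 'market share' is a phrase rule, so a document that contains only 'share' would not be expected to match it. A minimal sketch to try this out, using the table and column names from the queries above (the document text is made up for illustration):

    ```sql
    create table queries (
      query_id     number,
      query_string varchar2(80)
    );

    insert into queries values (1, 'oracle');
    insert into queries values (2, 'larry or ellison');
    insert into queries values (3, 'oracle and text');
    insert into queries values (4, 'market share');
    commit;

    create index queryx on queries (query_string)
      indextype is ctxsys.ctxrule;

    -- The document contains the whole phrase "market share" and the word
    -- "Oracle", so rules 4 and 1 are the ones expected to match.
    select query_id
    from   queries
    where  matches(query_string,
                   'Oracle announced that its market share in databases increased') > 0;
    ```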

    Hm, I see. It does make sense. So nice to know.
    But then in the second example I gave, where I used a larger table, as shown below:
    Here are the rules generated from my document collection:
    1 {REQUIREMENTS} & {ELICITATION}
    1 {REQUIREMENTS} ~ {ELICITATION} & {ACTOR}
    1 {REQUIREMENTS} ~ {ELICITATION} ~ {ACTOR} & {FURPS}
    1 {REQUIREMENTS} ~ {ELICITATION} ~ {ACTOR} ~ {FURPS} ~ {OUTLINE}
    1 {REQUIREMENTS} ~ {ELICITATION} ~ {ACTOR} ~ {FURPS} ~ {OUTLINE} & {PROC}
    1 {REQUIREMENTS} ~ {ELICITATION} ~ {ACTOR} ~ {FURPS} ~ {OUTLINE} ~ {PROC} & {SPEED}
    1 {REQUIREMENTS} ~ {ELICITATION} ~ {ACTOR} ~ {FURPS} ~ {OUTLINE} ~ {PROC} ~ {SPEED} & {DOCUME}
    1 {INTERPRETATION} ~ {REQUIREMENTS}
    2 {DESIGN} & {REPRESENTATION}
    2 {DESIGN} ~ {REPRESENTATION} & {MAY} & {FOUNDATI} & {OCTOBER}
    2 {DESIGN} ~ {REPRESENTATION} & {MAY} & {FOUNDATI} ~ {OCTOBER} & {PROCEDURAL}
    2 {DESIGN} ~ {REPRESENTATION} & {MAY} & {FOUNDATI} ~ {OCTOBER} ~ {PROCEDURAL} & {STRICT}
    2 {DESIGN} ~ {REPRESENTATION} ~ {MAY}
    3 {PM} & {TESTING} & {ATTRIBUTI}
    As far as I know, the sign '~' means 'OR' and '&' means 'AND'. So based on the 4th line in my table:
    1 {REQUIREMENTS} ~ {ELICITATION} ~ {ACTOR} ~ {FURPS} ~ {OUTLINE}
    it can be concluded that if any of the words stated there is queried, category '1' will appear as a result. But before we can use 'matches' to query it, we need to create an index on the rules table. I did, and the result was:
    (only the token_text column shown)
    PM
    DESIGN
    DESIGN
    DESIGN
    DESIGN
    DESIGN
    DESIGN
    DESIGN
    REQUIREMENTS
    REQUIREMENTS
    REQUIREMENTS
    REQUIREMENTS
    REQUIREMENTS
    REQUIREMENTS
    REQUIREMENTS
    REQUIREMENTS
    REQUIREMENTS
    REQUIREMENTS
    INTERPRETATION
    There were no words other than PM, DESIGN, REQUIREMENTS and INTERPRETATION. Why don't the words REQUIREMENTS, ELICITATION, ACTOR, FURPS, OUTLINE all appear in the index result?

  • Oracle text in 9.2

    Hi
    I am trying to make a query run against a table as follows
    SQL> desc FILES_INCLUDED;
    Name Null? Type
    PID NOT NULL VARCHAR2(16)
    FILENAME NOT NULL VARCHAR2(240)
    SQL> select count(*) from FILES_INCLUDED;
    COUNT(*)
    5719417
    SQL>
    where PID and FILENAME essentially contain all the files delivered by a particular patch, i.e.
    123456-01 would be the PID and FILENAME might be
    /usr/bin/ls, so 123456-01 might have multiple entries, one row for each deliverable, i.e.
    SQL> select count(*) from FILES_INCLUDED where PID='xxxxxx-xx';
    COUNT(*)
    969
    SQL>
    I know this is a poor design, but that is how it is, short of rewriting the apps.
    Anyway, users want to do
    select distinct PID from FILES_INCLUDED where FILENAME like '%bin/ls%';
    Now this takes up to one minute to run (not really unexpected).
    So I recommended Oracle Text as perhaps a better way.
    I set this up via
    begin
    ctx_ddl.create_preference('mywordlist', 'BASIC_WORDLIST');
    ctx_ddl.set_attribute('mywordlist','PREFIX_INDEX','true');
    ctx_ddl.set_attribute('mywordlist','SUBSTRING_INDEX', 'YES');
    ctx_ddl.set_attribute('mywordlist','WILDCARD_MAXTERMS', '15000');
    end;
    and this works to a degree, but is there a way to know up front how to avoid getting
    DRG-51030: wildcard query expansion resulted in too many terms
    that is, what is the minimum number of characters that one can use (and how to determine this)?
    For example,
    select distinct pid from files_included where (contains (filename,'/bin/%',1)>0)
    gets a DRG-51030
    I can query
    select count(*) from dr$FILE_NAME_INX$i;
    select count(*) from dr$FILE_NAME_INX$p;
    to get an idea of the number of tokens, but I am not sure how to make a query on this kind of data foolproof.
    The use case is for users to enter
    %ls% and get back the PIDs that deliver %ls%.
    Enda
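    The post does not show the CREATE INDEX statement that uses the wordlist. A minimal sketch, assuming the index is the FILE_NAME_INX implied by the dr$FILE_NAME_INX$i query above, together with a rough way to gauge a wildcard term before running it. The LIKE count against the $I table is only an approximation of how many terms the wildcard expands to, not the exact logic Oracle Text uses:

    ```sql
    create index file_name_inx on files_included (filename)
      indextype is ctxsys.context
      parameters ('wordlist mywordlist');

    -- Roughly how many distinct indexed tokens would '%ls%' expand to?
    -- Tokens are stored in upper case by default. Compare the count with
    -- WILDCARD_MAXTERMS (15000 in the preference above).
    select count(distinct token_text)
    from   dr$file_name_inx$i
    where  token_text like '%LS%';
    ```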

    It looks like you are wanting to search by sub-directory names or a combination of sub-directory names. By default, Oracle Text views the directory delimiter / as white space, so the individual sub-directories are tokenized. Therefore, you don't need the wildcards or / to do your searches. Please see the example below.
    SCOTT@orcl_11g> create table files_included
      2    (pid       varchar2 (16) not null,
      3       filename  varchar2 (40) not null)
      4  /
    Table created.
    SCOTT@orcl_11g> insert all
      2  into files_included values
      3    ('123456-01', '/usr/bin/ls/a')
      4  into files_included values
      5    ('123456-02', '/usr/bin/ls/b')
      6  into files_included values
      7    ('123456-03', '/usr/x/ls/a')
      8  into files_included values
      9    ('123456-02', '/usr/bin/x/b')
     10  into files_included values
     11    ('654321', '/usr/bin/other')
     12  select * from dual
     13  /
    5 rows created.
    SCOTT@orcl_11g> create index myindex
      2  on files_included (filename)
      3  indextype is ctxsys.context
      4  /
    Index created.
    SCOTT@orcl_11g> select token_text
      2  from   dr$myindex$i
      3  /
    TOKEN_TEXT
    B
    BIN
    LS
    OTHER
    USR
    X
    6 rows selected.
    SCOTT@orcl_11g> select * from files_included
      2  where  contains (filename, 'bin ls') > 0
      3  /
    PID              FILENAME
    123456-01        /usr/bin/ls/a
    123456-02        /usr/bin/ls/b
    SCOTT@orcl_11g> select * from files_included
      2  where  contains (filename, 'bin') > 0
      3  /
    PID              FILENAME
    123456-01        /usr/bin/ls/a
    123456-02        /usr/bin/ls/b
    123456-02        /usr/bin/x/b
    654321           /usr/bin/other
    SCOTT@orcl_11g> select * from files_included
      2  where  contains (filename, 'ls') > 0
      3  /
    PID              FILENAME
    123456-01        /usr/bin/ls/a
    123456-02        /usr/bin/ls/b
    123456-03        /usr/x/ls/a
    SCOTT@orcl_11g>
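    Since the original requirement was a list of distinct PIDs rather than full rows, the same phrase search can be wrapped accordingly. A short sketch against the sample table above:

    ```sql
    -- 'bin ls' is a phrase search: the token BIN immediately followed by LS.
    select distinct pid
    from   files_included
    where  contains (filename, 'bin ls') > 0;
    ```

    With the five sample rows above this returns 123456-01 and 123456-02, the same two patches shown in the 'bin ls' result.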

  • Using Oracle Text to search through WORD, EXCEL and PDF documents

    Hello again,
    What I would like to know is this: if I have a Word or PDF document stored in a table, is it possible to use Oracle Text to search through the actual Word or PDF document?
    Thanks
    Doug

    Yes, you can do context-sensitive searches on both PDF and Word docs. With PDFs you need to make sure they contain text and not images. Some scanners will create PDFs that are nothing more than images of the document.
    Below is a code sample that I made some time back to demonstrate the searching capabilities of Oracle Text. Note that the example makes use of the inso_filter, which is no longer shipped with Oracle beginning with patch set 10.1.0.4. See Metalink note 298017.1 for the changes. See the following link for more information on developing with Oracle Text.
    http://download-west.oracle.com/docs/cd/B14117_01/text.101/b10729/toc.htm
    begin example.
    -- The following needs to be executed
    -- as sys.
    DROP DIRECTORY docs_dir;
    CREATE OR REPLACE DIRECTORY docs_dir
    AS 'C:\sql\oracle_text\documents';
    GRANT READ ON DIRECTORY docs_dir TO text;
    -- End sys ran SQL
    DROP TABLE db_docs CASCADE CONSTRAINTS PURGE;
    CREATE TABLE db_docs (
    id NUMBER,
    format VARCHAR2(10),
    location VARCHAR2(50),
    document BLOB,
    CONSTRAINT i_db_docs_p PRIMARY KEY(id)
    );
    -- Several notes need to be made about this anonymous block.
    -- First the 'DOCS_DIR' parameter is a directory object name.
    -- This directory object name must be in upper case.
    DECLARE
    f_lob BFILE;
    b_lob BLOB;
    document_name VARCHAR2(50);
    BEGIN
    document_name := 'externaltables.doc';
    INSERT INTO db_docs
    VALUES (1, 'binary', 'C:\sql\oracle_text\documents\externaltables.doc', empty_blob())
    RETURN document INTO b_lob;
    f_lob := BFILENAME('DOCS_DIR', document_name);
    DBMS_LOB.FILEOPEN(f_lob, DBMS_LOB.FILE_READONLY);
    DBMS_LOB.LOADFROMFILE(b_lob, f_lob, DBMS_LOB.GETLENGTH(f_lob));
    DBMS_LOB.FILECLOSE(f_lob);
    COMMIT;
    END;
    /
    -- build the index
    -- Note that this index differs from the one for files stored on the file system
    -- in that the datastore parameter is ctxsys.default_datastore and not
    -- ctxsys.file_datastore. FILE_DATASTORE is for documents that
    -- exist on the file system. DEFAULT_DATASTORE is for documents
    -- that are stored in the column.
    create index db_docs_ctx on db_docs(document)
    indextype is ctxsys.context
    parameters (
    'datastore ctxsys.default_datastore
    filter ctxsys.inso_filter
    format column format');
    --search for something that is known to not be in the document.
    SELECT SCORE(1), id, location
    FROM db_docs
    WHERE CONTAINS(document, 'Jenkinson', 1) > 0;
    --search for something that is known to be in the document.  
    SELECT SCORE(1), id, location
    FROM db_docs
    WHERE CONTAINS(document, 'Albright', 1) > 0;
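    On releases where inso_filter is no longer available, the index above can be rebuilt with the AUTO_FILTER preference instead. This is a sketch under the assumption that you are on 10g Release 2 or later, where CTXSYS.AUTO_FILTER is the system-defined replacement; the rest of the example stays the same.

    ```sql
    -- Same index as above, but with AUTO_FILTER instead of INSO_FILTER.
    drop index db_docs_ctx;

    create index db_docs_ctx on db_docs (document)
      indextype is ctxsys.context
      parameters ('datastore ctxsys.default_datastore
                   filter ctxsys.auto_filter
                   format column format');
    ```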

  • Oracle Text in installing Oracle 10g without licence!!

    Hi. Everyone.
    I've read some threads, but I am still confused about "Oracle Text".
    Now, I am testing an Oracle 10g database.
    I downloaded the 10g software from www.oracle.com, and installed it successfully
    on Windows XP.
    When I was trying to import a dump file from Oracle 9i into
    the unlicensed Oracle 10g database, I got the error IMP-00017, which
    is related to "Oracle Text".
    I checked the "dba_users" dictionary view and found that the ctxsys user is locked and expired.
    I read some threads on this site and, following the advice, I tried to
    enable Oracle Text using "DBCA".
    However, every database option in DBCA is disabled, so I was not able to
    check Oracle Text.
    Lastly, how can I enable "Oracle Text" with unlicensed Oracle 10g?
    Is this possible without a licence?
    I am very confused about this.
    I am looking forward to hearing about your experience and advice.
    Have a nice day.
    Best Regards.
    Ho.

    Well, instead of being confused, you could go to http://www.oracle.com/pls/db102/portal.portal_db?selected=1 and look at
    1) the licensing document, which would tell you whether you need a separate license, and
    2) under the 'Books' tab, look at the Text Application Developer's Guide or the Text Reference manuals for details.
    You could also look for the Oracle Text forum (from the http://forums.oracle.com page, under Database - More, or Text) and ask the people who concentrate on that set of features.
    In general, Oracle Text is a set of extensions, the definitions for which are stored under user ctxsys. You would use these extensions by creating your own objects that are based on the extensions.
    For example, suppose your tables contain varchar2 columns. Create indexes that are based on ctxsys's 'context index type' and your application can then use the 'CONTAINS' keyword search capability (which is effectively a ctxsys-owned extension to the select)
    However, you would never log on to ctxsys and do anything with that, as you risk changing the template code that Oracle has supplied.
    Message was edited by:
    Hans Forbrich
    PS: Yes, Oracle Text is included as part of the base database. Most of it is even included in the free Oracle XE database.
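    To see whether Oracle Text is actually installed, and to let your own schema use it without touching ctxsys, something along these lines can be run as a DBA. This is a sketch; APP_USER is a hypothetical schema name.

    ```sql
    -- Is the Oracle Text component installed and valid?
    select comp_name, version, status
    from   dba_registry
    where  comp_id = 'CONTEXT';

    -- Let an application schema use Oracle Text without logging on as CTXSYS.
    grant ctxapp to app_user;
    grant execute on ctxsys.ctx_ddl to app_user;
    ```

    The ctxsys account being locked and expired is not in itself a problem for this, since objects are created in the application schema.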

  • Oracle Text in XMLType Table

    I have successfully created (at least I think) oracle text indexes on my XMLType table:
    EXEC ctx_ddl.create_section_group('contract_xmlgroup', 'XML_SECTION_GROUP');
    EXEC CTX_DDL.Add_Zone_Section (group_name => 'contract_xmlgroup', section_name => 'complete_entry', tag => 'complete_entry')
    CREATE INDEX complete_entry ON boss_contracts INDEXTYPE IS ctxsys.context
    parameters('section group contract_xmlgroup');
    However, I am unsure how to search using CONTAINS with this index. I tried this at first:
    SELECT count(*) FROM boss_contracts b
    WHERE CONTAINS(value(b), 'string WITHIN complete_entry') > 0;
    this just gave me the error:
    ERROR at line 1:
    ORA-20000: Oracle Text error:
    DRG-10599: column is not indexed
    any help would be appreciated
    Paul

    It looks like you are wanting to search by sub-directory names or a combination of sub-directory names. By default, Oracle Text views the directory delimiter / as white space, so the individual sub-directories are tokenized. Therefore, you don't need the wildcards or / to do your searches. Please see the example below.
    SCOTT@orcl_11g> create table files_included
      2    (pid       varchar2 (16) not null,
      3       filename  varchar2 (40) not null)
      4  /
    Table created.
    SCOTT@orcl_11g> insert all
      2  into files_included values
      3    ('123456-01', '/usr/bin/ls/a')
      4  into files_included values
      5    ('123456-02', '/usr/bin/ls/b')
      6  into files_included values
      7    ('123456-03', '/usr/x/ls/a')
      8  into files_included values
      9    ('123456-02', '/usr/bin/x/b')
    10  into files_included values
    11    ('654321', '/usr/bin/other')
    12  select * from dual
    13  /
    5 rows created.
    SCOTT@orcl_11g> create index myindex
      2  on files_included (filename)
      3  indextype is ctxsys.context
      4  /
    Index created.
    SCOTT@orcl_11g> select token_text
      2  from   dr$myindex$i
      3  /
    TOKEN_TEXT
    B
    BIN
    LS
    OTHER
    USR
    X
    6 rows selected.
    SCOTT@orcl_11g> select * from files_included
      2  where  contains (filename, 'bin ls') > 0
      3  /
    PID              FILENAME
    123456-01        /usr/bin/ls/a
    123456-02        /usr/bin/ls/b
    SCOTT@orcl_11g> select * from files_included
      2  where  contains (filename, 'bin') > 0
      3  /
    PID              FILENAME
    123456-01        /usr/bin/ls/a
    123456-02        /usr/bin/ls/b
    123456-02        /usr/bin/x/b
    654321           /usr/bin/other
    SCOTT@orcl_11g> select * from files_included
      2  where  contains (filename, 'ls') > 0
      3  /
    PID              FILENAME
    123456-01        /usr/bin/ls/a
    123456-02        /usr/bin/ls/b
    123456-03        /usr/x/ls/a
    SCOTT@orcl_11g>

  • Using Oracle Text to Data Mine

    Can someone give me an idea of how to data mine using just Oracle Text and not the Data Mining option? I need to search a column of customer complaints and then put each complaint in a category based on its text. It would be best if the categories were auto-generated. It has to be done in PL/SQL.
    Thanks,

    You cannot have the categories created automatically without data mining. However, if you are willing to create the categories and queries that determine them, then you can do it with just Oracle Text. I posted an example on the 2nd page of the following thread:
    Re: New to Oracle Text search
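    The example from that thread is not reproduced here. As a rough sketch of one way to do it with only Oracle Text, you can store your own category queries in a table, put a CTXRULE index on them, and classify text with the MATCHES operator. The table name, categories, and rules below are invented for illustration:
    ```sql
    -- Hypothetical rule table: each row is a category plus the query that defines it.
    create table complaint_categories (
      category varchar2(30) primary key,
      rule     varchar2(200)
    );
    insert into complaint_categories values ('BILLING',  'invoice or refund or overcharged');
    insert into complaint_categories values ('DELIVERY', 'late or damaged or missing');
    commit;
    -- A CTXRULE index indexes the queries rather than the documents.
    create index complaint_categories_idx on complaint_categories (rule)
      indextype is ctxsys.ctxrule;
    -- PL/SQL: list every category whose rule matches one complaint.
    set serveroutput on
    declare
      l_text varchar2(4000) := 'My invoice was wrong and I want a refund';
    begin
      for r in (select category
                from   complaint_categories
                where  matches(rule, l_text) > 0) loop
        dbms_output.put_line(r.category);
      end loop;
    end;
    /
    ```
    In a real application the loop body would update a category column on the complaints table instead of printing.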

  • Highlight Oracle Text search terms

    I have a report that I set up using the instructions for the Oracle Text application in APEX. It works very well. However, the report links to the actual document, and I would like the search terms highlighted in that document. Is there a way to do that in APEX?
    I use this Region Source:
    select score(1) relevance, filename, dbms_lob.getlength("DOCUMENT") Document, code_id
    from documents
    where contains (document, :P10_SEARCH, 1) > 0
    order by 1 desc
    I read something about using ctx_doc.snippet to highlight, but I can't get that to work.
    Any suggestions? Can APEX highlight terms when the actual document is used?

    '8265490,
    Take a look at the ctx_doc.markup procedure. I think it will do what you want.
    http://download.oracle.com/docs/cd/B19306_01/text.102/b14217/view.htm#sthref599
    My home server is on a moving truck, so I can only point you to some old forum posts for examples:
    Re: Using Oracle Text with Apex
    Re: Use apex to display email
    Doug
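    Until those examples are at hand, here is a hedged sketch of calling ctx_doc.markup with an in-memory CLOB result. The index name documents_ctx and the key value '42' are assumptions; the textkey is taken to be the primary key of the documents table, which is the default key type:
    ```sql
    set serveroutput on
    declare
      l_markup clob;
    begin
      ctx_doc.markup(
        index_name => 'documents_ctx',   -- hypothetical name of the context index
        textkey    => '42',              -- hypothetical primary key of the document
        text_query => 'oracle',          -- same query string as used in CONTAINS
        restab     => l_markup,          -- marked-up text is returned in this CLOB
        tagset     => 'HTML_DEFAULT');   -- wraps each hit in <B>...</B>
      -- Show the start of the result; in APEX you would print it with htp.p instead.
      dbms_output.put_line(dbms_lob.substr(l_markup, 2000, 1));
      dbms_lob.freetemporary(l_markup);
    end;
    /
    ```
    In an APEX page the same call could sit in a PL/SQL region that receives the document key and the search string as page items.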

  • How to use Oracle text

    I'm storing files in a BLOB column in a 9i database. Sometimes I need to query using the details stored in the database about the file, and sometimes I need to search the file contents for matching text (like a search engine). I was told that Oracle Text can help me accomplish this, but I don't know whether it supports Arabic text, and I don't know how to use it from my application developed in 9i.
    Regards.

    Friend, by following these steps you can easily set up Oracle Text (interMedia Text).
    j a h a n z e b
    Oracle Developer
    6th Floor, State Bank of Pakistan
    I.I.Chundrigar Road, Karachi.
    Please note that in SQL*Plus you can use '?' instead of $ORACLE_HOME, and this works on both Unix and Windows. So if you want to execute $ORACLE_HOME/rdbms/admin/catalog.sql you can simply use:
    on Unix sql> @?/rdbms/admin/catalog.sql
    on Windows sql> @?\rdbms\admin\catalog.sql
    5.2.1 Explanation of installation steps
    1. Connect to the database as SYSDBA and create the CTXSYS user:
    The CTXSYS user is created by calling the following script:
    @?/ctx/admin/dr0csys.sql change_on_install DRSYS TEMP
    Where:
    change_on_install - is the ctxsys user password
    DRSYS - is the default tablespace for ctxsys
    TEMP - is the temporary tablespace for ctxsys
    This will create the user CTXSYS and grant it the privileges it needs to create and insert into result tables, execute callbacks, rewrite queries, and perform system cleanup. At this point CTXSYS will not own any objects.
    2. Connect to the database as CTXSYS and create all necessary objects:
    All necessary objects are created by calling the following script:
    connect CTXSYS/change_on_install
    @?/ctx/admin/dr0inst <replace with $ORACLE_HOME>/ctx/lib/libctxx9.so
    Please note that you have to pass the full path to the library under your ORACLE_HOME as the parameter, for example:
    On Solaris/Aix/Linux with $ORACLE_HOME of /u01/app/oracle/product/8.1.7
    @?/ctx/admin/dr0inst.sql /u01/app/oracle/product/8.1.7/ctx/lib/libctxx8.so
    On HP-UX with $ORACLE_HOME of /u01/app/oracle/product/8.1.7
    @?/ctx/admin/dr0inst.sql /u01/app/oracle/product/8.1.7/ctx/lib/libctxx8.sl
    Windows NT/2000 with D:\oracle\product\8.1.7
    @?/ctx/admin/dr0inst.sql D:\oracle\product\8.1.7\bin\oractxx8.dll
    This will install all Oracle database objects required by the Oracle Text system. This includes:
    a) Data dictionary tables, views, sequence, packages
    b) Server management tables, views and packages
    c) Dispatcher packages
    d) Service queue objects
    3. Install appropriate language-specific default preferences.
    When you use CREATE INDEX to create an index or ALTER INDEX to manage an index, you can optionally specify indexing preferences in the parameter string. There are seven preference classes:
    - Lexer, defines the language being indexed. (language specific)
    - Wordlist, defines the expansion of stem and fuzzy queries. (language specific)
    - Stoplist, defines words and themes that are not to be indexed. (language specific)
    - Datastore, defines document storage.
    - Filter, defines how documents are converted to plain text.
    - Storage, defines the storage of the index tables.
    - Section group, defines document sections.
    The <ORACLE_HOME>/ctx/admin/defaults directory contains a script that creates language-specific default preferences for each language Oracle Text supports, such as English(US), Danish(DK), Dutch(NL), Finnish(SF), French(FR), German(DE), Italian(IT), Portuguese(PR), Spanish(ES), and Swedish(S). The scripts are named in the form drdefXX.sql, where XX is the language code. To manually install the US default preferences, for example, log into sqlplus as CTXSYS and run 'drdefus.sql' as described below:
    @?/ctx/admin/defaults/drdefus.sql
    create user textuser identified by textuser
    default tablespace users
    temporary tablespace temp;
    -- You must grant 'ctxapp' role to textuser
    grant connect, resource, ctxapp to textuser;
    connect textuser/textuser
    drop table quick;
    create table quick (
    quick_id number
    constraint quick_pk primary key,
    text varchar2(80) );
    insert into quick ( quick_id, text ) values (1,'The cat sat on the mat');
    insert into quick ( quick_id, text ) values (2,'The quick brown fox jumps over the lazy dog' );
    insert into quick ( quick_id, text ) values (3,'The dog barked like a dog');
    commit;
    create index quick_text on quick ( text )
    indextype is ctxsys.context;
    col text format a45
    col s format 999
    select text, score(42) s from quick
    where contains ( text, 'dog', 42 ) > 0
    order by s desc;
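    To show how the preference classes from step 3 end up in the parameter string, here is a small sketch that rebuilds the quick_text index with its own lexer and an empty stoplist. The preference name quick_lexer is made up; BASIC_LEXER, its MIXED_CASE attribute, and ctxsys.empty_stoplist are supplied by Oracle Text:
    ```sql
    -- Run as textuser (the ctxapp role allows calls to ctx_ddl).
    begin
      ctx_ddl.create_preference('quick_lexer', 'BASIC_LEXER');      -- hypothetical name
      ctx_ddl.set_attribute('quick_lexer', 'MIXED_CASE', 'YES');    -- keep upper/lower case in tokens
    end;
    /
    -- Replace the default index created by the script above.
    drop index quick_text;
    create index quick_text on quick ( text )
      indextype is ctxsys.context
      parameters ('lexer quick_lexer stoplist ctxsys.empty_stoplist');
    -- With an empty stoplist, common words such as "The" are indexed and can be searched.
    select text from quick
    where  contains ( text, 'The' ) > 0;
    ```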

  • Oracle Text with Numbers

    Hello,
    I need to search in a number column for particular "subnumbers". For
    example, I have a column with 3453454 in it and I'd like to search for, e.g., the
    number "53" within it. I know I could use
    select * from table where number_column like '%53%'
    but since the table is rather big I'd like to use Oracle Text for it to avoid a full table scan and query like
    select * from table where contains(number_column, '53') > 0
    but above query would return NULL after converting the number column
    to a varchar2 column! Only full numbers are indexed and therefore only
    a search on the full number 3453454 would yield a result. What are my
    options to make above query with "contains" clause work?
    Thanks in advance

    You can configure Text to do substring searches.
    Do this:
    begin
      ctx_ddl.create_preference( 'SUBSTR_SUPPORT_PREF', 'basic_wordlist' );
      ctx_ddl.set_attribute( 'SUBSTR_SUPPORT_PREF', 'SUBSTRING_INDEX', 'YES' );
    end;
    /
    Then you can do something like:
    where contains(col,'%53%',1) > 0
    Tom Best
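    One step worth spelling out: the wordlist preference only takes effect if the index is created with it. A fuller sketch under assumed names (the table num_demo, its columns, and the preference name num_substr_wordlist are all hypothetical):
    ```sql
    -- The number is stored as text so that Oracle Text can tokenize it.
    create table num_demo (
      id      number primary key,
      num_txt varchar2(20)
    );
    insert into num_demo values (1, '3453454');
    insert into num_demo values (2, '1000001');
    commit;
    begin
      ctx_ddl.create_preference('num_substr_wordlist', 'BASIC_WORDLIST');
      ctx_ddl.set_attribute('num_substr_wordlist', 'SUBSTRING_INDEX', 'YES');
    end;
    /
    -- Attach the wordlist to the index through the parameter string.
    create index num_demo_idx on num_demo (num_txt)
      indextype is ctxsys.context
      parameters ('wordlist num_substr_wordlist');
    -- Wildcards on both sides search for tokens that contain 53.
    select id, num_txt
    from   num_demo
    where  contains(num_txt, '%53%') > 0;
    ```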

  • How to compute a global SCORE over a few oracle text indexed tables?

    Dear experts!
    I want to search a website with Oracle Text. The website consists of four tables:
    - site
    - chapter
    - text
    - binaries
    Each table has two or three columns which should be indexed with oracle text. So I have created a MULTI_COLUMN_DATASTORE oracle text index on each table - So I have four indexes on my website.
    When I want to search over the website I have to join my 4 tables (4 contain clauses). So how do I get a global SCORE over these 4 contains clauses?
    The next question is can I change the weight of my text indexes (useful for the search hit list)? For example the highest weight has the site index, the second highest weight the chapter index and so on?
    Thanks
    Markus

    If it's a simple JOIN, then you could just add the scores for each CONTAINS clause:
    select score(1)+score(2)+score(3)+score(4)
    from table1 t1, table2 t2, table3 t3, table4 t4
    where [join conditions]
    and ( contains(t1.col, 'xxx', 1) > 0 or
          contains(t2.col, 'xxx', 2) > 0 or
          ... etc )
    Then, to change the weight, you just add a multiplying factor to each score.
    Can't help thinking it's probably more complex than this, though.
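    As a sketch of the multiplying-factor idea, using two of the four tables. The column names (site_id, title) and the weights 4 and 3 are assumptions chosen only to show the shape of the query:
    ```sql
    -- score(n) is 0 when the CONTAINS clause labelled n did not match,
    -- so a weighted sum ranks site hits above chapter hits.
    select s.site_id,
           c.chapter_id,
           4 * score(1) + 3 * score(2) as global_score
    from   site s
    join   chapter c on c.site_id = s.site_id
    where  contains(s.title, 'xxx', 1) > 0
    or     contains(c.title, 'xxx', 2) > 0
    order  by global_score desc;
    ```
    The text and binaries tables would be joined in the same way with labels 3 and 4 and smaller factors.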

  • Parsing the word file using oracle text having tables within it............

    Hi,
    I was going through this document. Actually, I am going to implement something like full-text search functionality in our system.
    We get the info as .doc file.
    Earlier what we used to do is, we used to parse the file and store it into the database and then searched using PL/SQL.
    But what I understand from this article that this can be done using oracle text also.
    One concern is that whether the oracle text is able to parse the .doc file having tables embedded within it.
    Please let me know about this.(Whether oracle text will be able to parse the files having tables embedded within it).
    I am attaching an example file for this.
    Please let me know about this as early as possible.

    Yes, Oracle Text has this capability. Use AUTO_FILTER or USER_FILTER when you create the index.
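    A minimal sketch of the AUTO_FILTER route, assuming the .doc files are loaded into a BLOB column. The table, column, and index names are hypothetical; ctxsys.auto_filter is the predefined filter preference in recent releases (older releases call the equivalent INSO_FILTER):
    ```sql
    -- Hypothetical table holding the Word files as binary documents.
    create table incoming_docs (
      doc_id   number primary key,
      filename varchar2(200),
      doc_body blob
    );
    -- The filter converts each binary document to text before it is indexed.
    create index incoming_docs_idx on incoming_docs (doc_body)
      indextype is ctxsys.context
      parameters ('filter ctxsys.auto_filter');
    -- Search the filtered text like any other indexed column.
    select doc_id, filename
    from   incoming_docs
    where  contains(doc_body, 'invoice') > 0;
    ```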
