Name search. Oracle Text?

Hi. I have a table containing surnames and forenames, many of which are foreign and easy to misspell. I wish to write a search application that will be intelligent and allow for misspellings. Am I right in assuming I need to use Oracle Text for this? And, if so, would it be a Context or Catalog search?
I have not used Oracle Text before and would appreciate it if answers are detailed or point me in the right direction for further reading.
Thanks.

If you are on 10g you may find UTL_MATCH.jaro_winkler_similarity useful:
Re: SQL Query to find out similar names in two tables

Similar Messages

  • Ultra Search/ Oracle Text capabilities

    Our decision to go forward with Oracle9i is contingent upon the extensible use of Ultra Search and Oracle Text in our planned endeavors.
    Basically we are to build a system to do the following:
    1) download information (html files, links, documents) from web sites and accessible disk archives. The url sites are particular to a domain.
    2) place the downloaded file information into our Oracle database or download to local system with appropriate links in database.
    3) perform queries on the downloaded information through the database to isolate files for analysis.
    4) analyze and perform extraction on the information. For example, query based on a defined hierarchy of vulnerability terms.
    I've demoed Ultra Search and Oracle Text. I believe that Ultra Search can handle step 1, and possibly step 2 and that Oracle Text can help in step 4. Step 3 is satisfied by the Oracle database.
    I need to know details concerning Ultra Search and Oracle Text before committing:
    o when Ultra Search performs its crawling, how is found information represented in the database. Is a whole html file or document downloaded or are references to these documents stored in the database? If references are stored does Ultra Search embed the capability to download these files to be analyzed?
    o is Oracle Text the right tool to provide the capability for robust analysis of downloaded documents.
    o I have used the sample JSP that came with Ultra Search. Are there any more detailed examples which my above steps. In particular, performing robust analysis on downloaded documents from step 1.
    We have and are still exploring other COTS products to find a solution. Are main goal is to have the retrieved documents and analysis information resident in the database in this phase of our project. We find other COTS can perform the web crawling, but lack analysis, or vice versa and that their solutions are so vendor specific that in many times their services would be required to build a suitable solution that is not very extensible.
    Thanks for any feedback.

    Ultra Search does not keep documents in the database permanently. We bring them in for indexing purposes, but remove them after
    the indexing is completed. However, we keep the URLs of each unique document that was found during the crawling. You would
    have to do the downloading yourself. However, we are thinking about providing a mechanism, maybe in the form of an API, that
    would allow customers to retrieve documents. Please contact me on this issue if you are interested to discuss this: (650)-506-8173.
    Generally speaking you will find that Oracle Text is a very powerful tool for analysis of textual documents, especially since it is
    driven through the SQL language, has extensive functionality (themes, user-defined knowledge base, thesaurus, and many useful linguistic
    functions like segmentation, stemming, and globalisation support).
    The philosophy of Ultra Search is to provide you with an out-of-the-box solution for crawling and searching your data without the
    need for programming. Ultra Search is built on top of Text, so I would advise you to use Text to do the further analysis of your
    documents after they have been located by the crawler.
    Best Regards,
    Stefan Buchta

  • Using Oracle Text to search through WORD, EXCEL and PDF documents

    Hello again,
    What I would like to know is if I have a WORD or PDF document stored in a table. Is it possible to use Oracle Text to search through the actual WORD or PDF document?
    Thanks
    Doug

    Yes you can do context sensitive searches on both PDF and Word docs. With the PDF you need to make sure they are text and not images. Some scanners will create PDFs that are nothing more than images of document.
    Below is code sample that I made some time back to demonstrate the searching capabilities of Oracle Text. Note that the example makes use of the inso_filter that is no longer shipped with Oracle begging with Patch set 10.1.0.4. See metalink note 298017.1 for the changes. See the following link for more information on developing with Oracle Text.
    http://download-west.oracle.com/docs/cd/B14117_01/text.101/b10729/toc.htm
    begin example.
    -- The following needs to be executed
    -- as sys.
    DROP DIRECTORY docs_dir;
    CREATE OR REPLACE DIRECTORY docs_dir
    AS 'C:\sql\oracle_text\documents';
    GRANT READ ON DIRECTORY docs_dir TO text;
    -- End sys ran SQL
    DROP TABLE db_docs CASCADE CONSTRAINTS PURGE;
    CREATE TABLE db_docs (
    id NUMBER,
    format VARCHAR2(10),
    location VARCHAR2(50),
    document BLOB,
    CONSTRAINT i_db_docs_p PRIMARY KEY(id)
    -- Several notes need to be made about this anonymous block.
    -- First the 'DOCS_DIR' parameter is a directory object name.
    -- This directory object name must be in upper case.
    DECLARE
    f_lob BFILE;
    b_lob BLOB;
    document_name VARCHAR2(50);
    BEGIN
    document_name := 'externaltables.doc';
    INSERT INTO db_docs
    VALUES (1, 'binary', 'C:\sql\oracle_text\documents\externaltables.doc', empty_blob())
    RETURN document INTO b_lob;
    f_lob := BFILENAME('DOCS_DIR', document_name);
    DBMS_LOB.FILEOPEN(f_lob, DBMS_LOB.FILE_READONLY);
    DBMS_LOB.LOADFROMFILE(b_lob, f_lob, DBMS_LOB.GETLENGTH(f_lob));
    DBMS_LOB.FILECLOSE(f_lob);
    COMMIT;
    END;
    -- build the index
    -- Note that this index differs than the file system stored file
    -- in that paramter datastore is ctxsys.defautl_datastore and not
    -- ctxsys.file_datastore. FILE_DATASTORE is for documents that
    -- exist on the file system. DEFAULT_DATASTORE is for documents
    -- that are stored in the column.
    create index db_docs_ctx on db_docs(document)
    indextype is ctxsys.context
    parameters (
    'datastore ctxsys.default_datastore
    filter ctxsys.inso_filter
    format column format');
    --search for something that is known to not be in the document.
    SELECT SCORE(1), id, location
    FROM db_docs
    WHERE CONTAINS(document, 'Jenkinson', 1) > 0;
    --search for something that is known to be in the document.  
    SELECT SCORE(1), id, location
    FROM db_docs
    WHERE CONTAINS(document, 'Albright', 1) > 0;

  • Is Oracle Text the right solution for this need of a specific search!

    Hi ,
    We are on Oracle 11.2.0.2 on Solaris 10. We have the need to be able to do search on data that are having diacritical marks and we should be able to do the serach ignoring this diacritical marks. That is the requirement. Now I got to hear that Oracle Text has a preference called BASIC_LEXER which can bypass the diacritical marks and so solely due to this feature I implemented Oracle Text and just for this diacritical search and no other need.
    I mean I set up preference like this:
      ctxsys.ctx_ddl.create_preference ('cust_lexer', 'BASIC_LEXER');
      ctxsys.ctx_ddl.set_attribute ('cust_lexer', 'base_letter', 'YES'); -- removes diacritics
    With this I set up like this:
    CREATE TABLE TEXT_TEST
      NAME  VARCHAR2(255 BYTE)
    --created Oracle Text index
    CREATE INDEX TEXT_TEST_IDX1 ON TEXT_TEST
    (NAME)
    INDEXTYPE IS CTXSYS.CONTEXT
    PARAMETERS('LEXER cust_lexer WORDLIST cust_wl SYNC (ON COMMIT)');
    --sample data to illustrate the problem
    Insert into TEXT_TEST
       (NAME)
    Values
       ('muller');
    Insert into TEXT_TEST
       (NAME)
    Values
       ('müller');
    Insert into TEXT_TEST
       (NAME)
    Values
       ('MULLER');
    Insert into TEXT_TEST
       (NAME)
    Values
       ('MÜLLER');
    Insert into TEXT_TEST
       (NAME)
    Values
       ('PAUL HERNANDEZ');
    Insert into TEXT_TEST
       (NAME)
    Values
       ('CHRISTOPHER Phil');
    COMMIT;
    --Now there is an alternative solution that is there,  instead of thee Oracle Text which is just a plain function given below (and it seems to work neat for my simple need of removing diacritical characters effect in search)
    --I need to evaluate which is better given my specific needs -the function below or Oracle Text.
    CREATE OR REPLACE FUNCTION remove_dia(p_value IN VARCHAR2, p_doUpper IN VARCHAR2 := 'Y')
    RETURN VARCHAR2 DETERMINISTIC
    IS
    OUTPUT_STR VARCHAR2(4000);
    begin
    IF (p_doUpper = 'Y') THEN
       OUTPUT_STR := UPPER(p_value);
    ELSE
       OUTPUT_STR := p_value;
    END IF;
    OUTPUT_STR := TRANSLATE(OUTPUT_STR,'ÀÁÂÃÄÅÇÈÉÊËÌÍÎÏÑÒÓÔÕÖØÙÚÛÜÝàáâãäåçèéêëìíîïñòóôõöøùúûüýÿ', 'AAAAAACEEEEIIIINOOOOOOUUUUYaaaaaaceeeeiiiinoooooouuuuyy');
    RETURN (OUTPUT_STR);
    end;
    --now I query for which name stats with  a P%:
    --Below query gets me unexpected result of one row as I am using Oracle Text where each word is parsed for search using CONTAINS...
    SQL> select * from text_test where contains(name,'P%')>0;
    NAME
    PAUL HERNANDEZ
    CHRISTOPHER Phil
    --Below query gets me the right and expected result of one row...
    SQL> select * from text_test where name like 'P%';
    NAME
    PAUL HERNANDEZ
    --Below query gets me the right and expected result of one row...
    SQL>  select * from text_test where remove_dia(name) like remove_dia('P%');
    NAME
    PAUL HERNANDEZMy entire need was only to be able to do a search that bypasses diacritical characters. To implement Oracle Text for that reason, I am wondering if that was the right choice! More so when I am now finding that the functionality of LIKE is not available in Oracle Text - the Oracle text search are based on tokens or words and they are different from output of the LIKE operator. So may be should I have just used a simple function like below and used that for my purpose instead of using Oracle Text:
    This function (remove_dia) just removes the diacritical characters and may be for my need this is all that is needed. Can someone help to review that given my need I am better of not using Oracle Text? I need to continue using the functionality of Like operator and also need to bypass diacritical characters so the simple function that I have meets my need whereas Oracle Text causes a change in behaviour of search queries.
    Thanks,
    OrauserN

    If all you need is LIKE functionality and you do not need any of the complex search capabilities of Oracle Text, then I would not use Oracle Text. I would create a function-based index on your name column that uses your function that removes the diacritical marks, so that your searches will be faster. Please see the demonstration below.
    SCOTT@orcl_11gR2> CREATE TABLE TEXT_TEST
      2    (NAME  VARCHAR2(255 BYTE))
      3  /
    Table created.
    SCOTT@orcl_11gR2> Insert all
      2  into TEXT_TEST (NAME) Values ('muller')
      3  into TEXT_TEST (NAME) Values ('müller')
      4  into TEXT_TEST (NAME) Values ('MULLER')
      5  into TEXT_TEST (NAME) Values ('MÜLLER')
      6  into TEXT_TEST (NAME) Values ('PAUL HERNANDEZ')
      7  into TEXT_TEST (NAME) Values ('CHRISTOPHER Phil')
      8  select * from dual
      9  /
    6 rows created.
    SCOTT@orcl_11gR2> CREATE OR REPLACE FUNCTION remove_dia
      2    (p_value   IN VARCHAR2,
      3       p_doUpper IN VARCHAR2 := 'Y')
      4    RETURN VARCHAR2 DETERMINISTIC
      5  IS
      6    OUTPUT_STR VARCHAR2(4000);
      7  begin
      8    IF (p_doUpper = 'Y') THEN
      9        OUTPUT_STR := UPPER(p_value);
    10    ELSE
    11        OUTPUT_STR := p_value;
    12    END IF;
    13    RETURN
    14        TRANSLATE
    15          (OUTPUT_STR,
    16           'ÀÁÂÃÄÅÇÈÉÊËÌÍÎÏÑÒÓÔÕÖØÙÚÛÜÝàáâãäåçèéêëìíîïñòóôõöøùúûüýÿ',
    17           'AAAAAACEEEEIIIINOOOOOOUUUUYaaaaaaceeeeiiiinoooooouuuuyy');
    18  end;
    19  /
    Function created.
    SCOTT@orcl_11gR2> show errors
    No errors.
    SCOTT@orcl_11gR2> CREATE INDEX text_test_remove_dia_name
      2  ON text_test (remove_dia (name))
      3  /
    Index created.
    SCOTT@orcl_11gR2> set autotrace on explain
    SCOTT@orcl_11gR2> select * from text_test
      2  where  remove_dia (name) like remove_dia ('mü%')
      3  /
    NAME
    muller
    müller
    MULLER
    MÜLLER
    4 rows selected.
    Execution Plan
    Plan hash value: 3139591283
    | Id  | Operation                   | Name                      | Rows  | Bytes | Cost (%CPU)| Time     |
    |   0 | SELECT STATEMENT            |                           |     1 |  2131 |     2   (0)| 00:00:01 |
    |   1 |  TABLE ACCESS BY INDEX ROWID| TEXT_TEST                 |     1 |  2131 |     2   (0)| 00:00:01 |
    |*  2 |   INDEX RANGE SCAN          | TEXT_TEST_REMOVE_DIA_NAME |     1 |       |     1   (0)| 00:00:01 |
    Predicate Information (identified by operation id):
       2 - access("SCOTT"."REMOVE_DIA"("NAME") LIKE "REMOVE_DIA"('mü%'))
           filter("SCOTT"."REMOVE_DIA"("NAME") LIKE "REMOVE_DIA"('mü%'))
    Note
       - dynamic sampling used for this statement (level=2)
    SCOTT@orcl_11gR2> select * from text_test
      2  where  remove_dia (name) like remove_dia ('P%')
      3  /
    NAME
    PAUL HERNANDEZ
    1 row selected.
    Execution Plan
    Plan hash value: 3139591283
    | Id  | Operation                   | Name                      | Rows  | Bytes | Cost (%CPU)| Time     |
    |   0 | SELECT STATEMENT            |                           |     1 |  2131 |     2   (0)| 00:00:01 |
    |   1 |  TABLE ACCESS BY INDEX ROWID| TEXT_TEST                 |     1 |  2131 |     2   (0)| 00:00:01 |
    |*  2 |   INDEX RANGE SCAN          | TEXT_TEST_REMOVE_DIA_NAME |     1 |       |     1   (0)| 00:00:01 |
    Predicate Information (identified by operation id):
       2 - access("SCOTT"."REMOVE_DIA"("NAME") LIKE "REMOVE_DIA"('P%'))
           filter("SCOTT"."REMOVE_DIA"("NAME") LIKE "REMOVE_DIA"('P%'))
    Note
       - dynamic sampling used for this statement (level=2)
    SCOTT@orcl_11gR2>

  • Searching using Oracle Text instead of LIKE '%'

    Hello all,
    I hope you help me in this:
    I have a table looks like this
    create table subscribers (
    id numer(10),
    first_name varchar2(30),
    father_name varchar2(30),
    grandfather_name varchar2(30),
    last_name varchar2(30))
    The application is built using Oracle Forms. Many times, the end users are not so sure of the spelling of the name, therefore they use the "%" wildcard with name fields. This will be reflected to the queries the application will send them to the Oracle Server.
    We have the following queries
    1) select *
    from subscribers
    where last_name like '%family_name%';
    2) select *
    from subscribers
    where last_name like 'family_name%';
    3) select *
    from subscribers
    where last_name like '%family_name%' and first_name like '%first_name%';
    4) select *
    from subscribers
    where last_name like 'family_name%' and first_name like 'first_name%';
    As well as searching on the father_name and grandfather_name fields. But most of the search are on the first_name and the last_name.
    These queries are killing the server since we have millions of records. BTree indexes will not help here because of the LIKE and the "%"
    I am thinking to use Oracle Text here, but I am not sure whether I have to go for a CONTEXT index on each individual column, or I can use the MULTI_COLUMN_DATASTORE indexing.
    Any idea will be appreciated

    The ctxcat index and catsearch operator are generally intended for usage with one text column and one or more columns of structured data. You would have to pick just one of your columns as the text column and the others as structured columns. I would be more inclined to use the multi_column_datastore with a context index and contains operator, so that you can search all of your columns as text columns.

  • Product Search Using Oracle Text or By Any Other Methods using PL/SQL

    Hi All,
    I have requirement for product search using the product table which has around 5 million products. I Need to show top 100 disitnct products searched  in the following order
    1. = ProductDescription
    2. ProductDescription_%
    3. %_ProductDescription_%
    4. %_ProductDescription
    5. ProductDescription%
    6. %ProductDescription
    Where '_' is space.  If first two/three/or any criteria itslef gives me 100 records then i need not search for another patterns
    Table Structure Is as follows
    Create Table Tbl_Product_Lookup
        Barcode_number                Varchar2(9),
        Product_Description Varchar2(200),
        Product_Start_Date Date,
        Product_End_Date Date,
        Product_Price Number(12,4)
    Could you please help me implementing this one ? SLA for the search result is 2 seconds
    Thanks,
    Varun

    You could use an Oracle Text context index with a wordlist to speed up substring searches and return all rows that match any of your criteria, combined with a case statement to provide a ranking that can be ordered by within an inner query, then use rownum to limit the rows in an outer query.  You could also use the first_rows(n) hint to speed up the return of limited rows.  Please see the demonstration below.  If you decide to use Oracle Text, you may want to ask further questions in the Oracle Text sub-forum on this forum or space or whatever they call it now.
    SCOTT@orcl_11gR2> -- table:
    SCOTT@orcl_11gR2> Create Table Tbl_Product_Lookup
      2    (
      3       Barcode_number       Varchar2(9),
      4       Product_Description  Varchar2(200),
      5       Product_Start_Date   Date,
      6       Product_End_Date     Date,
      7       Product_Price          Number(12,4)
      8    )
      9  /
    Table created.
    SCOTT@orcl_11gR2> -- sample data:
    SCOTT@orcl_11gR2> insert all
      2  into tbl_product_lookup (product_description) values ('test product')
      3  into tbl_product_lookup (product_description) values ('test product and more')
      4  into tbl_product_lookup (product_description) values ('another test product and more')
      5  into tbl_product_lookup (product_description) values ('another test product')
      6  into tbl_product_lookup (product_description) values ('test products')
      7  into tbl_product_lookup (product_description) values ('selftest product')
      8  select * from dual
      9  /
    6 rows created.
    SCOTT@orcl_11gR2> insert into tbl_product_lookup (product_description) select object_name from all_objects
      2  /
    75046 rows created.
    SCOTT@orcl_11gR2> -- wordlist:
    SCOTT@orcl_11gR2> begin
      2    ctx_ddl.create_preference('mywordlist', 'BASIC_WORDLIST');
      3    ctx_ddl.set_attribute('mywordlist','PREFIX_INDEX','TRUE');
      4    ctx_ddl.set_attribute('mywordlist','PREFIX_MIN_LENGTH', '3');
      5    ctx_ddl.set_attribute('mywordlist','PREFIX_MAX_LENGTH', '4');
      6    ctx_ddl.set_attribute('mywordlist','SUBSTRING_INDEX', 'YES');
      7    ctx_ddl.set_attribute('mywordlist', 'wildcard_maxterms', 0) ;
      8  end;
      9  /
    PL/SQL procedure successfully completed.
    SCOTT@orcl_11gR2> -- context index that uses wordlist:
    SCOTT@orcl_11gR2> create index prod_desc_text_idx
      2  on tbl_product_lookup (product_description)
      3  indextype is ctxsys.context
      4  parameters ('wordlist mywordlist')
      5  /
    Index created.
    SCOTT@orcl_11gR2> -- gather statistics:
    SCOTT@orcl_11gR2> exec dbms_stats.gather_table_stats (user, 'TBL_PRODUCT_LOOKUP')
    PL/SQL procedure successfully completed.
    SCOTT@orcl_11gR2> -- query:
    SCOTT@orcl_11gR2> variable productdescription varchar2(100)
    SCOTT@orcl_11gR2> exec :productdescription := 'test product'
    PL/SQL procedure successfully completed.
    SCOTT@orcl_11gR2> column product_description format a45
    SCOTT@orcl_11gR2> set autotrace on explain
    SCOTT@orcl_11gR2> set timing on
    SCOTT@orcl_11gR2> select /*+ FIRST_ROWS(100) */ *
      2  from   (select /*+ FIRST_ROWS(100) */ distinct
      3              case when product_description = :productdescription            then 1
      4               when product_description like :productdescription || ' %'       then 2
      5               when product_description like '% ' || :productdescription || ' %' then 3
      6               when product_description like '% ' || :productdescription       then 4
      7               when product_description like :productdescription || '%'       then 5
      8               when product_description like '%' || :productdescription       then 6
      9              end as ranking,
    10              product_description
    11           from   tbl_product_lookup
    12           where  contains (product_description, '%' || :productdescription || '%') > 0
    13           order  by ranking)
    14  where  rownum <= 100
    15  /
       RANKING PRODUCT_DESCRIPTION
             1 test product
             2 test product and more
             3 another test product and more
             4 another test product
             5 test products
             6 selftest product
    6 rows selected.
    Elapsed: 00:00:00.10
    Execution Plan
    Plan hash value: 459057338
    | Id  | Operation                      | Name               | Rows  | Bytes | Cost (%CPU)| Time     |
    |   0 | SELECT STATEMENT               |                    |    38 |  3990 |    13  (16)| 00:00:01 |
    |*  1 |  COUNT STOPKEY                 |                    |       |       |            |          |
    |   2 |   VIEW                         |                    |    38 |  3990 |    13  (16)| 00:00:01 |
    |*  3 |    SORT UNIQUE STOPKEY         |                    |    38 |   988 |    12   (9)| 00:00:01 |
    |   4 |     TABLE ACCESS BY INDEX ROWID| TBL_PRODUCT_LOOKUP |    38 |   988 |    11   (0)| 00:00:01 |
    |*  5 |      DOMAIN INDEX              | PROD_DESC_TEXT_IDX |       |       |     4   (0)| 00:00:01 |
    Predicate Information (identified by operation id):
       1 - filter(ROWNUM<=100)
       3 - filter(ROWNUM<=100)
       5 - access("CTXSYS"."CONTAINS"("PRODUCT_DESCRIPTION",'%'||:PRODUCTDESCRIPTION||'%')>0)
    SCOTT@orcl_11gR2>

  • Oracle Text And Oracle Ultra search

    Hi all,
    I have a problem. I have some files in the server file system e.g C:/docs. I want to search these MS Office word files in order to see if they contain a word. I tried oracle ultra search but when i put File data source it provides a form file:/// and when i put C:/docs it gives me file://localhost/C:/docs. Can i configure Oracle ultra search to search into the server file system?Where is the directory that oracle ultra search searches a file?
    How can index and search the files in the file systems exept oracle ultra search? Can Oracle Text do my job?
    Sorry to bother and thank you in advance for your help
    Antonis.

    Hi,
    Searching a word in a document (MS Word) can definitely be done using Oracle Text.
    But, you other requirements can make things a bit complicated.
    Oracle Text have something called file_datastore, where you need to mention the file_path/s and individual file names too.
    Once you mention them, i.e. path and file names, oracle text can read the file , index them and you can query using simple queries.
    If the number of files are huge, entering all the file names can be difficult. In that case you can give the following a try.
    -- You can use a perl script to read the file name in a directory, put it into a file in a particular format (1 file name per line) and then load that text file into a table using ctxloader.
    You can go through the FILE_DATASTORE and CTXLOADER examples from oracle text reference document.

  • Oracle Text Query of abbreviated word / name

    I'm new to Oracle Text so please excuse the (probably) simple question. I want to be able to create a search that excludes (includes?) special characters and/or spaces between an abbreviated name. I'm not sure if it's possible but I would like to be able to return all of the below results if someone queried for "ABC" in one form or another.
    Would this be something I'd add to a thesaurus? I see there is a STOPLIST but I'm not sure if there is the opposite of a stoplist.
    Thanks in advance!
    Regards,
    Rich
    set def off;
    drop table docs;
    CREATE TABLE docs (id NUMBER PRIMARY KEY, text VARCHAR2(200));
    INSERT INTO docs VALUES(1, 'ABC are my favorite letters.');
    INSERT INTO docs VALUES(2, 'My favorite letters are A,B,C');
    INSERT INTO docs VALUES(3, 'The best letters are A.B.C.');
    INSERT INTO docs VALUES(4, 'Three of the word letters are A-B-C.');
    INSERT INTO docs VALUES(5, 'A B C are great letters.');
    INSERT INTO docs VALUES(6, 'AB and C are easy letters to remember');
    INSERT INTO docs VALUES(7, 'What if we used A, B, & C?');
    commit;
    begin
    ctx_ddl.drop_preference('english_lexar');
    end;
    begin
    ctx_ddl.create_preference('english_lexar', 'BASIC_LEXER');
    ctx_ddl.set_attribute('english_lexar', 'printjoins', '_-');
    ctx_ddl.set_attribute('english_lexar', 'skipjoins', '-.');
    --ctx_ddl.set_attribute ( 'english_lexar', 'index_themes', 'YES');
    ctx_ddl.set_attribute ( 'english_lexar', 'index_text', 'YES');
    ctx_ddl.set_attribute ( 'english_lexar', 'index_stems', 'SPANISH');
    ctx_ddl.set_attribute ( 'english_lexar', 'mixed_case', 'YES');
    ctx_ddl.set_attribute ( 'english_lexar', 'base_letter', 'YES');
    end;
    begin
    ctx_ddl.drop_preference('STEM_FUZZY_PREF');
    end;
    begin
      ctx_ddl.create_preference('STEM_FUZZY_PREF', 'BASIC_WORDLIST');
      ctx_ddl.set_attribute('STEM_FUZZY_PREF','FUZZY_MATCH','ENGLISH');
      ctx_ddl.set_attribute('STEM_FUZZY_PREF','FUZZY_SCORE','0');
      ctx_ddl.set_attribute('STEM_FUZZY_PREF','FUZZY_NUMRESULTS','5000');
      ctx_ddl.set_attribute('STEM_FUZZY_PREF','SUBSTRING_INDEX','TRUE');
      ctx_ddl.set_attribute('STEM_FUZZY_PREF','PREFIX_INDEX','TRUE');
      ctx_ddl.set_attribute('STEM_FUZZY_PREF','STEMMER','ENGLISH');
    end;
    begin
    ctx_ddl.drop_preference('wildcard_pref');
    end;
    begin
        Ctx_Ddl.create_Preference('wildcard_pref', 'BASIC_WORDLIST');
        ctx_ddl.set_attribute('wildcard_pref', 'wildcard_maxterms', 100) ;
    end;
    DROP index myindex;
    create index myindex on docs (text)
      indextype is ctxsys.context
      parameters ( 'LEXER english_lexar Wordlist wildcard_pref' );
    EXEC CTX_DDL.SYNC_INDEX('myindex', '2M');
    SELECT SCORE(1), id, text FROM docs WHERE CONTAINS(text, 'ABC', 1) > 0;It may be that my SQL statement isn't taking advantage of the Text options -- i.e. I'm forgetting something obvious :)

    Indexes are case-insensitive by default, so let's ignore that.
    You can make wal-mart and wal*mart match walmart by defining "-" and "*" as SKIPJOINS characters. However, you cannot make wal mart match walmart, other than by using NDATA.
    NDATA does seem to work - any variation of wal mart walmart wal*mart and wal-mart do manage to match both walmart and wal mart. See example:
    SQL> create table testcase (text varchar2(2000));
    Table created.
    SQL> insert into testcase values ('<nd>walmart</nd>');
    1 row created.
    SQL> insert into testcase values ('<nd>wal mart</nd>');
    1 row created.
    SQL> exec ctx_ddl.drop_section_group('tcsg')
    PL/SQL procedure successfully completed.
    SQL> exec ctx_ddl.create_section_group('tcsg', 'xml_section_group')
    PL/SQL procedure successfully completed.
    SQL> exec ctx_ddl.add_ndata_section('tcsg', 'nd', 'nd')
    PL/SQL procedure successfully completed.
    SQL> create index testcase_index on testcase(text)
      2  indextype is ctxsys.context
      3  parameters ('section group tcsg')
      4  /
    Index created.
    SQL> select * from testcase where contains (text, 'ndata(nd, wal mart)') > 0;
    TEXT
    <nd>walmart</nd>
    <nd>wal mart</nd>
    SQL> select * from testcase where contains (text, 'ndata(nd, wal-mart)') > 0;
    TEXT
    <nd>walmart</nd>
    <nd>wal mart</nd>
    SQL> select * from testcase where contains (text, 'ndata(nd, wal*mart)') > 0;
    TEXT
    <nd>walmart</nd>
    <nd>wal mart</nd>
    SQL> select * from testcase where contains (text, 'ndata(nd, walmart)') > 0;
    TEXT
    <nd>walmart</nd>
    <nd>wal mart</nd>Edited by: Roger Ford on Jun 21, 2012 10:22 AM

  • Oracle text performance with context search indexes

    Search performance using context index.
    We are intending to move our search engine to a new one based on Oracle Text, but we are meeting some
    bad performance issues when using search.
    Our application allows the user to search stored documents by name, object identifier and annotations(formerly set on objects).
    For example, suppose I want to find a document named ImportSax2.c: according to user set parameters, our search engine format the following
    search queries :
    1) If the user explicitely ask for a search by document name, the query is the following one =>
         select objid FROM ADSOBJ WHERE CONTAINS( OBJFIELDURL , 'ImportSax2.c WITHIN objname' , 1 ) > 0;
    2) If the user don't specify any extra parameters, the query is the following one =>
         select objid FROM ADSOBJ WHERE CONTAINS( OBJFIELDURL , 'ImportSax2.c' , 1 ) > 0;
    Oracle text only need around 7 seconds to answer the second query, whereas it need around 50 seconds to give an answer for the first query.
    Here is a part of the sql script used for creating the Oracle Text index on the column OBJFIELDURL
    (this column stores a path to an xml file containing properties that have to be indexed for each object) :
    begin
    Ctx_Ddl.Create_Preference('wildcard_pref', 'BASIC_WORDLIST');
    ctx_ddl.set_attribute('wildcard_pref', 'wildcard_maxterms', 200) ;
    ctx_ddl.set_attribute('wildcard_pref','prefix_min_length',3);
    ctx_ddl.set_attribute('wildcard_pref','prefix_max_length',6);
    ctx_ddl.set_attribute('wildcard_pref','STEMMER','AUTO');
    ctx_ddl.set_attribute('wildcard_pref','fuzzy_match','AUTO');
    ctx_ddl.set_attribute('wildcard_pref','prefix_index','TRUE');
    ctx_ddl.set_attribute('wildcard_pref','substring_index','TRUE');
    end;
    begin
    ctx_ddl.create_preference('doc_lexer_perigee', 'BASIC_LEXER');
    ctx_ddl.set_attribute('doc_lexer_perigee', 'printjoins', '_-');
    ctx_ddl.set_attribute('doc_lexer_perigee', 'BASE_LETTER', 'YES');
    ctx_ddl.set_attribute('doc_lexer_perigee','index_themes','yes');
    ctx_ddl.create_preference('english_lexer','basic_lexer');
    ctx_ddl.set_attribute('english_lexer','index_themes','yes');
    ctx_ddl.set_attribute('english_lexer','theme_language','english');
    ctx_ddl.set_attribute('english_lexer', 'printjoins', '_-');
    ctx_ddl.set_attribute('english_lexer', 'BASE_LETTER', 'YES');
    ctx_ddl.create_preference('german_lexer','basic_lexer');
    ctx_ddl.set_attribute('german_lexer','composite','german');
    ctx_ddl.set_attribute('german_lexer','alternate_spelling','GERMAN');
    ctx_ddl.set_attribute('german_lexer','printjoins', '_-');
    ctx_ddl.set_attribute('german_lexer', 'BASE_LETTER', 'YES');
    ctx_ddl.set_attribute('german_lexer','NEW_GERMAN_SPELLING','YES');
    ctx_ddl.set_attribute('german_lexer','OVERRIDE_BASE_LETTER','TRUE');
    ctx_ddl.create_preference('japanese_lexer','JAPANESE_LEXER');
    ctx_ddl.create_preference('global_lexer', 'multi_lexer');
    ctx_ddl.add_sub_lexer('global_lexer','default','doc_lexer_perigee');
    ctx_ddl.add_sub_lexer('global_lexer','german','german_lexer','ger');
    ctx_ddl.add_sub_lexer('global_lexer','japanese','japanese_lexer','jpn');
    ctx_ddl.add_sub_lexer('global_lexer','english','english_lexer','en');
    end;
    begin
         ctx_ddl.create_section_group('axmlgroup', 'AUTO_SECTION_GROUP');
    end;
    drop index ADSOBJ_XOBJFIELDURL force;
    create index ADSOBJ_XOBJFIELDURL on ADSOBJ(OBJFIELDURL) indextype is ctxsys.context
    parameters
    ('datastore ctxsys.file_datastore
    filter ctxsys.inso_filter
    sync (on commit)
    lexer global_lexer
    language column OBJFIELDURLLANG
    charset column OBJFIELDURLCHARSET
    format column OBJFIELDURLFORMAT
    section group axmlgroup
    Wordlist wildcard_pref
    Oracle created a table named DR$ADSOBJ_XOBJFIELDURL$I which now contains around 25 millions records.
    ADSOBJ is the table contaings information for our documents,OBJFIELDURL is the field that contains the path to the xml file containing
    data to index. That file looks like this :
    <?xml version="1.0" encoding="UTF-8" ?>
    <fields>
    <OBJNAME><![CDATA[NomLnk_177527o.jpgp]]></OBJNAME>
    <OBJREM><![CDATA[Z_CARACT_141]]></OBJREM>
    <OBJID>295926o.jpgp</OBJID>
    </fields>
    Can someone tell me how I can make that kind of request
    "select objid FROM ADSOBJ WHERE CONTAINS( OBJFIELDURL , 'ImportSax2.c WITHIN objname' , 1 ) > 0;"
    run faster ?

    Below are the execution plan for both the 2 requests :
    select objid FROM ADSOBJ WHERE CONTAINS( OBJFIELDURL , 'ImportSax2.c WITHIN objname' , 1 ) > 0
    PLAN_TABLE_OUTPUT
    |     Id     | Operation                              |Name                         |Rows     |Bytes     |Cost (%CPU)|
    |     0     | SELECT STATEMENT                    |                              |1272     |119K     |     4     (0)     |
    |     1      | TABLE ACCESS BY INDEX ROWID     |ADSOBJ      |1272     |119K     |     4     (0)     |
    |     2      |     DOMAIN INDEX                    |ADSOBJ_XOBJFIELDURL     |          |          |     4     (0)     |
    Note
    - 'PLAN_TABLE' is old version
    Executed in 2 seconds
    select objid FROM ADSOBJ WHERE CONTAINS( OBJFIELDURL , 'ImportSax2.c' , 1 ) > 0
    PLAN_TABLE_OUTPUT
    |     Id     |Operation                              |Name                         |Rows     |Bytes     |Cost (%CPU)|
    |     0     | SELECT STATEMENT                    |                              |1272     |119K     |     4     (0)     |
    |     1     | TABLE ACCESS BY INDEX ROWID     |ADSOBJ                         |1272     |119K     |     4     (0)     |
    |     2     | DOMAIN INDEX                    |ADSOBJ_XOBJFIELDURL     |          |          |     4     (0)     |
    Sorry for the result formatting, I can't get it "easily" readable :(

  • How to limit the number of search results returned by oracle text

    Hello All,
    I am running an oracle text search which returned the following error to my java program.
    ORA-20000: Oracle Text error:
    DRG-51030: wildcard query expansion resulted in too many terms
    #### ORA-29902: Fehler bei der Ausführung von Routine ODCIIndexStart()
    ORA-20000: Oracle Text error:
    DRG-51030: wildcard query expansion resulted in too many terms
    java.sql.SQLException: ORA-29902: Fehler bei der Ausführung von Routine ODCIIndexStart()
    ORA-20000: Oracle Text error:
    DRG-51030: wildcard query expansion resulted in too many terms
    at oracle.jdbc.dbaccess.DBError.throwSqlException(DBError.java:169)
    When i looked up the net, one suggestion that was given is to narrow the wildcard query, which i cannot in my search page. Hence I am left with the only alternative of limiting the number of results returned by oracle text search query.
    Please let me know how to limit the number of search results returned by oracle text so that this error can be avoided.
    Thanks in advance
    krk

    Hi,
    If not set explicitly, the default value for WILDCARD_MAXTERMS is 5000. This is set as a wordlist preference. This means that if your wildcard query matches more than 5000 terms the message is returned. Exceeding that would give the user a lot to sift through.
    My suggestion: trap the error and return a meaningful message to the user indicating that the search needs to be refined. Although it is possible to increase the number of terms before hitting maxterms (increase wildcard_maxterms preference value for your wordlist), 5000 records is generally too much for a user to deal with anyway. The search is not a good one since it is not restricting rows adequately. If it happens frequently, get the query log and see the terms that are being searched that generate this message. It may be necessary to add one or more words to a stoplist if they are too generic.
    Example: The word mortgage might be a great search term for a local business directory. It might be a terrible search term for a national directory of mortgage lenders though (since 99% of them have the term in their name). In the case of the national directory, that term would be a candidate for inclusion in the stoplist.
    Also remember that full terms do not need a wildcard. Search for "car %" is not necessary and will give you the error you mentioned. A search for "car" will yield the results you need even if it is only part of a bigger sentence because everything is based on the token. You may already know all of this, but without an example query I figured I'd mention it to be sure.
    As for limiting the results - the best way to do that is to allow the user to limit the results through their query. This will ensure accurate and more meaningful results for them.
    -Ron

  • Using oracle text in apex report search

    I am trying to use oracle text in apex, integrating it in an existing application. The idea is that it will allow to do a search in bigger textfields. Thats how I want it to get to work. In one of the oracle packaged applications oracle text is used as well, so I will have a look to that as well. I've addapted this search. I've added
    AND t. contains(oplossing, :P15_OPLOSSING)
    AND t.contains(sleutelwoorden, :P15_SLEUTELWOORDEN)
    That didn't work, so I changed those two to:
    AND t.oplossing = (t.contains(oplossing, :P15_OPLOSSING)>0)
    AND t.sleutelwoorden = (t.contains(sleutelwoorden, :P15_SLEUTELWOORDEN)>0)
    which didn't work either, which I expected to be the case. Clearly I'm not doing it correctly, I intend to look it up tonight in the packaged applications as I do want to findt it myself to.
    But does anyone can give a hint, on what I am doing wrong ?
    SELECT t.ticketid ticketnr, t.ticketid,
    g.voornaam||' '||g.naam aangemaaktdoor,
    t.credt, t.applicatiecd, t.titel,
    s.statusdefoms,
    si.statusdefoms instat,
    NVL2(t.toegekend,'Y','N') toegekend,
    sleutelwoorden, klantprioriteitid, oplossing, s.htmlkleur, si.htmlkleur inthtmlkleur
    FROM ticket t,
    gebruiker g,
    status s,
    status si
    WHERE t.gebruikerid = g.gebruikerid
    AND t.statusid = s.statusid
    AND t.statusinternid = si.statusid (+)
    AND t.applicatiecd = NVL(:P0_APPLICATIECD, :F101_APPLICATIECD)
    AND (t.categorieid = :P15_CATEGORIEID OR NVL(:P15_CATEGORIEID, 0) = 0)
    AND (t.moduleid = :P15_MODULEID OR NVL(:P15_MODULEID, 0) = 0)
    AND (t.statusid = :P15_STATUSID OR NVL(:P15_STATUSID, 0) = 0)
    AND (t.statusinternid = :P15_INTSTATUSID OR NVL(:P15_INTSTATUSID, 0) = 0)
    AND (t.versieid = :P15_VERSIEID OR NVL(:P15_VERSIEID, 0) = 0)
    AND t.ticketid LIKE '%'||:P15_TICKETID||'%'
    AND t.gebruikerid = DECODE(NVL(:P15_GEBRUIKERID,0), 0, t.gebruikerid, :P15_GEBRUIKERID)
    AND t.credt BETWEEN NVL(:P15_DATUMVAN, To_Date('01-01-1900', 'DD-MM-YYYY')) AND NVL(To_Date(:P15_DATUMTOT, 'DD-MM-YYYY'), sysdate) +1
    AND t.titel LIKE '%'||:P15_TITEL||'%'
    AND t. contains(oplossing, :P15_OPLOSSING)
    AND t.contains(sleutelwoorden, :P15_SLEUTELWOORDEN)
    AND PCK$Ticket_Admin.getklantid(t.gebruikerid) = DECODE(Pck$Ticket_Admin.isklantadminroleN(:APP_USER,NVL(:P0_APPLICATIECD, :F101_APPLICATIECD)), 1, PCK$Ticket_Admin.getklantid(:APP103_GEBRUIKERID), PCK$Ticket_Admin.getklantid(t.gebruikerid))
    AND (:APP103_GEBRUIKERID IN (t.voor_gebruikerid, t.gebruikerid)
    OR Pck$Ticket_Admin.isintern(:APP_USER,:P0_APPLICATIECD) = 1)
    changed to:
    AND t.oplossing = (t.contains(oplossing, :P15_OPLOSSING)>0)
    AND t.sleutelwoorden = (t.contains(sleutelwoorden, :P15_SLEUTELWOORDEN)>0)

    I have worked it further out now, and looked at the search of the packaged application. It turned out to be a pl/sql block . I used what I found in there to adapt the previous search. I added the following:
    OR (CONTAINS(t.oplossing, :P15_OPLOSSING)>0)
    OR (CONTAINS(t.sleutelwoorden, :P15_SLEUTELWOORDEN)>0)
         OR (CONTAINS(t.titel,:P15_SEARCH_T_O_S)>0 OR
         CONTAINS (t.oplossing, :P15_SEARCH_T_O_S)>0 OR
         CONTAINS(t.sleutelwoorden, :P15_SEARCH_T_O_S)>0 )
    OR (CONTAINS(t.titel,:P15_SEARCH_T_O_S)>0 AND
         CONTAINS (t.oplossing, :P15_SEARCH_T_O_S)>0 AND
         CONTAINS(t.sleutelwoorden, :P15_SEARCH_T_O_S)>0 )
    oplossing means solution
    sleutelwoorden means keywords
    titel means title
    Yet this doesn't work yet. It gives an error message:
    failed to parse SQL query:
    ORA-01719: outer join operator (+) not allowed in operand of OR or IN
    I've tried adding the addition in a different place, yet that gives the same error message. I'm not sure now.

  • Oracle iRecruitment: Keyword Search within Resumes using Oracle Text

    Dear All,
    As per my understanding (and Note: 247064.1) simple Keyword searches can be performed in iRecruitment if oracle Text is installed. However searching for Keywords within resumes is not possible using Oracle Text and is possible ONLY if Resume Parsing is enabled via a third party (non-oracle) service provider.
    Can you please let me know if my understanding is correct and if not provide further inputs on this.
    Thanks,
    Subrat

    Got this confirmation from Oracle via SR:
    Resume searching is independent of resume parsing and not required to search resumes.
    Oracle Text is the text engine that allows you to search documents using content-based queries. Oracle Text allows you to upload documents, search documents, parse resumes, etc.
    Hence to conclude - Installation of Oracle Text will allow Keyword Searches on resumes.
    Thanks,
    Subrat

  • Using Oracle Text for searching with UCM 10g

    I am using Oracle text with UCM 10gR3 and Site Studio 10gR4 and I am trying to sort the search results by relevancy and to also include a snippet of the retrieved document. I have the fields that the SS_GET_SEARCH_RESULTS service returns but the relevancy score is always equals 5 and the snippet contains characters such as &lt; idcnull, /p, etc., which you can see are XML/HTML/UCM tags but which result sin even more strangeness in the snippet if I try to remove them programmatically.
    I have read the Oracle Text documentation and there appear to be ways you can configure Oracle Text but I am not clear at all on what I can do from UCM. It looks like the configuration is either done in database tables or in the query itself, neither of which are readily configurable to me.
    Is anyone experienced in this or know of any documentation this might help?
    Bill

    Hi
    If I remember correctly then this issue was seen with an older version of OTS component and Core Update patch / bundle . Upgrade the UCM instance with the latest CS10gr35 update bundle patchset 6907073 and also upgrade OTS component from the same patchset .
    Let me know how it goes after this .
    Thanks
    Srinath

  • Highlite oracle text search terms

    I have a report that I set up using the instructions for Oracle Text Application in APEX. It works very well however I have the actual document as a link and I would like the search terms highlighted in the actual document. Is there a way to do that in APEX?
    I use this Region Source:
    select score(1) relevance, filename, dbms_lob.getlength("DOCUMENT") Document, code_id
    from documents
    where contains (document, :P10_SEARCH, 1) > 0
    order by 1 desc
    I read something about using ctx_doc.snippet to highlight but can get that to work.
    Any suggestions or can APEX highlight terms when the actual document is used?

    '8265490,
    Take a look at the ctx_doc.markup procedure. I think it will do what you want.
    http://download.oracle.com/docs/cd/B19306_01/text.102/b14217/view.htm#sthref599
    My home server is on a moving truck, so I can only point you to some old forum posts for examples:
    Re: Using Oracle Text with Apex
    Re: Use apex to display email
    Doug

  • How to configure what is displayed in srfDocSnippet (Oracle Text Search)?

    I am using UCM 10gR3 with SiteStudio 10gR4. I installed the Oracle Text Search component and it is working well. One of the requirements for a search results page is the specific layout of the data displayed. srfDocSnippet returns a really ugly display if the search result return is just metadata. I didn't find any documentation in UCM on how to configure this display.
    How to you configure what is displayed in srfDocSnippet?
    Thank you,
    Ken

    How To get the Summary from CIS on Document Objects (Doc ID 837740.1)
    What is the Closest Equivalant toVdksummary option In Oracle Full Text Search (Doc ID 787173.1)
    those notes on MOS may help.

Maybe you are looking for

  • SAP standard  AP reporting

    Dear All, We would like to have some information on AP reporting.Our requirement are as follows. 1.Ageing analysis not readily available, in one report.Due date analysis is available in the SAP standard report  S_ALR_87012078 but we are in need of li

  • Need help in formatting the report created using SSRS

    I have created an image and used it as the body background. But I want that image to come at right Set bodyRepeat as RepeatY.  Currently its showing as below              I need this blue image at the right most side of the report as below Also I nee

  • How do I get other tabs to automatically load my home screen, "google", instead of a blank?

    When I used a previous browser the tabs opened to the home page? Can this be done with Firefox and if so how? == This happened == Every time Firefox opened

  • Migration from weblogic 9.2 to 11.g

    Hi I am trying to migrate from weblogic 9.2 to 11.g and I am getting the follwoing error : 'weblogic.kernel.Default (self-tuning)'] - java.lang.NoSuchMethodError: org/apache/commons/pool/impl/GenericObjectPoolFactory.<init>(Lorg/apache/commons/pool/P

  • The quality of streaming a live recorded feed

    Hello, I'm asking this as a general post in regards to the quality of streaming video a flash based app provides. When you compair it to say the Flash Media Live Encoder, it's compression of the camera is terrible. Is there ever going to be something