Differences Oracle Text Soundex Search & Standar Soundex

Hi all,
I want to ask some question:
1. Is Oracle Text soundex searching using soundex matching algorithm invented
by Donald Knuth?
2. Why Oracle Text soundex search returns different results to a standard
soundex?
3. Can anybody describe how Oracle Text soundex searching process?
Thanx,
Robby

Hi Ron,
thank for your reply.
I've already read the thread and soundex matching algorithm invented by Donald Knuth.
but sorry i still don't understand about oracle soundex searching.
According to Knuth's algorithm the first letter is the important key to searching.
i.e with standard soundex a word "PEEL" will find "PILE" or "P???" and so on.
but with oracle text soundex search a word "PEEL" will find "PILE", "BEEL", "BELL", "FEEL", "VERE" etc.
Is oracle text soundex search not using Knuth's algorithm? if is then how the process work?
Thanks,
Robby

Similar Messages

  • Oracle Text - CTX Context Index Soundex Problem

    Hi,
    I'm running into a problem with Oracle Text when searching using the ! (soundex) option. I've created a simple test example to highlight the issue.
    Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - 64bit
    Windows 2008 Server 64-bit
    create table test_tab (test_col  varchar2(200));
    insert all
      into test_tab (test_col) values ('ab-tönes')
      into test_tab (test_col) values ('ab-tones')
      into test_tab (test_col) values ('abtones')
      into test_tab (test_col) values ('ab tones')
      into test_tab (test_col) values ('ab-tanes')
      select * from dual
    select * from test_tab
    begin
          ctx_ddl.create_preference ('test_lex1', 'basic_lexer');
          ctx_ddl.set_attribute ('test_lex1', 'whitespace', '/\|-_+&''');
          ctx_ddl.set_attribute('test_lex1','base_letter','YES');
          -- ctx_ddl.set_attribute('test_lex1','skipjoins','-');
    end;
    create index test_idx on test_tab (test_col)
      indextype is ctxsys.context
        parameters
          ('lexer        test_lex1'     
    select token_text from dr$test_idx$i;
    TOKEN_TEXT
    AB
    ABTONES
    TANES
    TONES
    select * from test_tab where contains (test_col, '!ab tones') > 0;
    TEST_COL
    ab-tönes
    ab-tones
    ab tones
    select * from test_tab where soundex(test_col) = soundex('ab tones');
    TEST_COL
    ab-tönes
    ab-tones
    abtones
    ab tones
    ab-tanes
    So my question is, can anyone suggest an approach whereby I can get the Oracle Text Context index (or CTXCAT index if it's more appropriate) to return all 5 rows like the simple Soundex is doing?
    I can't really use soundex as this search query will form part of a search screen for a multi-language application. Soundex is limited to English sounding words, so I need the solution to be able to compare strings that may not "sound" English.
    It must be an attribute of the BASIC_LEXER, and I've tried skipjoins, start/end-joins, stop lists, but I just cannot get the Soundex feature of Oracle Text to function like the SOUNDEX() function!
    Looking at how the tokens are stored dr$test_idx$i I need Oracle Text to almost concat 'AB' and 'TONES' to search as a single string.
    Any help greatly appreciated.
    Thanks,

    I am not getting the same problem that you are getting with the umlat, but I don't see what is different.  Please post the result of:
    select ctx_report.create_index_script ('test_idx') from dual;
    Here are the results on my system.  Perhaps you can spot the difference.  I added an empty_stoplist, so that it won't print out a long list of stopwords.
    SCOTT@orcl12c> create table test_tab (test_col    varchar2(200))
      2  /
    Table created.
    SCOTT@orcl12c> insert all
      2    into test_tab (test_col) values ('ab-tönes')
      3    into test_tab (test_col) values ('ab-tones')
      4    into test_tab (test_col) values ('abtones')
      5    into test_tab (test_col) values ('ab tones')
      6    into test_tab (test_col) values ('ab-tanes')
      7  select * from dual
      8  /
    5 rows created.
    SCOTT@orcl12c> select * from test_tab
      2  /
    TEST_COL
    ab-tönes
    ab-tones
    abtones
    ab tones
    ab-tanes
    5 rows selected.
    SCOTT@orcl12c> begin
      2    ctx_ddl.create_preference ('test_lex1', 'basic_lexer');
      3    ctx_ddl.set_attribute('test_lex1','base_letter','YES');
      4  end;
      5  /
    PL/SQL procedure successfully completed.
    SCOTT@orcl12c> create or replace procedure test_proc
      2    (p_rowid in          rowid,
      3      p_clob    in out nocopy clob)
      4  as
      5  begin
      6    select replace (translate (test_col, '/\|-_+&''', '      '), ' ', '')
      7    into   p_clob
      8    from   test_tab
      9    where  rowid = p_rowid;
    10  end test_proc;
    11  /
    Procedure created.
    SCOTT@orcl12c> show errors
    No errors.
    SCOTT@orcl12c> begin
      2    ctx_ddl.create_preference ('test_ds', 'user_datastore');
      3    ctx_ddl.set_attribute ('test_ds', 'procedure', 'test_proc');
      4  end;
      5  /
    PL/SQL procedure successfully completed.
    SCOTT@orcl12c> create index test_idx on test_tab (test_col)
      2    indextype is ctxsys.context
      3    parameters
      4       ('lexer    test_lex1
      5         datastore    test_ds
      6         stoplist    ctxsys.empty_stoplist')
      7  /
    Index created.
    SCOTT@orcl12c> select token_text from dr$test_idx$i
      2  /
    TOKEN_TEXT
    ABTANES
    ABTONES
    2 rows selected.
    SCOTT@orcl12c> variable search_string varchar2(100)
    SCOTT@orcl12c> exec :search_string := 'ab tones'
    PL/SQL procedure successfully completed.
    SCOTT@orcl12c> select * from test_tab
      2  where  contains
      3            (test_col,
      4             '!' || replace (:search_string, ' ', ' !') ||
      5             ' or !' || replace (:search_string, ' ', '')) > 0
      6  /
    TEST_COL
    ab-tönes
    ab-tones
    abtones
    ab tones
    ab-tanes
    5 rows selected.
    SCOTT@orcl12c> exec :search_string := 'abtones'
    PL/SQL procedure successfully completed.
    SCOTT@orcl12c> /
    TEST_COL
    ab-tönes
    ab-tones
    abtones
    ab tones
    ab-tanes
    5 rows selected.
    SCOTT@orcl12c> exec :search_string := 'ab tönes'
    PL/SQL procedure successfully completed.
    SCOTT@orcl12c> /
    TEST_COL
    ab-tönes
    ab-tones
    abtones
    ab tones
    ab-tanes
    5 rows selected.
    SCOTT@orcl12c> select ctx_report.create_index_script ('test_idx') from dual
      2  /
    CTX_REPORT.CREATE_INDEX_SCRIPT('TEST_IDX')
    begin
      ctx_ddl.create_preference('"TEST_IDX_DST"','USER_DATASTORE');
      ctx_ddl.set_attribute('"TEST_IDX_DST"','PROCEDURE','"SCOTT"."TEST_PROC"');
    end;
    begin
      ctx_ddl.create_preference('"TEST_IDX_FIL"','NULL_FILTER');
    end;
    begin
      ctx_ddl.create_section_group('"TEST_IDX_SGP"','NULL_SECTION_GROUP');
    end;
    begin
      ctx_ddl.create_preference('"TEST_IDX_LEX"','BASIC_LEXER');
      ctx_ddl.set_attribute('"TEST_IDX_LEX"','BASE_LETTER','YES');
    end;
    begin
      ctx_ddl.create_preference('"TEST_IDX_WDL"','BASIC_WORDLIST');
      ctx_ddl.set_attribute('"TEST_IDX_WDL"','STEMMER','ENGLISH');
      ctx_ddl.set_attribute('"TEST_IDX_WDL"','FUZZY_MATCH','GENERIC');
    end;
    begin
      ctx_ddl.create_stoplist('"TEST_IDX_SPL"','BASIC_STOPLIST');
    end;
    begin
      ctx_ddl.create_preference('"TEST_IDX_STO"','BASIC_STORAGE');
      ctx_ddl.set_attribute('"TEST_IDX_STO"','R_TABLE_CLAUSE','lob (data) store as (
    cache)');
      ctx_ddl.set_attribute('"TEST_IDX_STO"','I_INDEX_CLAUSE','compress 2');
    end;
    begin
      ctx_output.start_log('TEST_IDX_LOG');
    end;
    create index "SCOTT"."TEST_IDX"
      on "SCOTT"."TEST_TAB"
          ("TEST_COL")
      indextype is ctxsys.context
      parameters('
        datastore       "TEST_IDX_DST"
        filter          "TEST_IDX_FIL"
        section group   "TEST_IDX_SGP"
        lexer           "TEST_IDX_LEX"
        wordlist        "TEST_IDX_WDL"
        stoplist        "TEST_IDX_SPL"
        storage         "TEST_IDX_STO"
    begin
      ctx_output.end_log;
    end;
    1 row selected.

  • Using Oracle Text to search through WORD, EXCEL and PDF documents

    Hello again,
    What I would like to know is if I have a WORD or PDF document stored in a table. Is it possible to use Oracle Text to search through the actual WORD or PDF document?
    Thanks
    Doug

    Yes you can do context sensitive searches on both PDF and Word docs. With the PDF you need to make sure they are text and not images. Some scanners will create PDFs that are nothing more than images of document.
    Below is code sample that I made some time back to demonstrate the searching capabilities of Oracle Text. Note that the example makes use of the inso_filter that is no longer shipped with Oracle begging with Patch set 10.1.0.4. See metalink note 298017.1 for the changes. See the following link for more information on developing with Oracle Text.
    http://download-west.oracle.com/docs/cd/B14117_01/text.101/b10729/toc.htm
    begin example.
    -- The following needs to be executed
    -- as sys.
    DROP DIRECTORY docs_dir;
    CREATE OR REPLACE DIRECTORY docs_dir
    AS 'C:\sql\oracle_text\documents';
    GRANT READ ON DIRECTORY docs_dir TO text;
    -- End sys ran SQL
    DROP TABLE db_docs CASCADE CONSTRAINTS PURGE;
    CREATE TABLE db_docs (
    id NUMBER,
    format VARCHAR2(10),
    location VARCHAR2(50),
    document BLOB,
    CONSTRAINT i_db_docs_p PRIMARY KEY(id)
    -- Several notes need to be made about this anonymous block.
    -- First the 'DOCS_DIR' parameter is a directory object name.
    -- This directory object name must be in upper case.
    DECLARE
    f_lob BFILE;
    b_lob BLOB;
    document_name VARCHAR2(50);
    BEGIN
    document_name := 'externaltables.doc';
    INSERT INTO db_docs
    VALUES (1, 'binary', 'C:\sql\oracle_text\documents\externaltables.doc', empty_blob())
    RETURN document INTO b_lob;
    f_lob := BFILENAME('DOCS_DIR', document_name);
    DBMS_LOB.FILEOPEN(f_lob, DBMS_LOB.FILE_READONLY);
    DBMS_LOB.LOADFROMFILE(b_lob, f_lob, DBMS_LOB.GETLENGTH(f_lob));
    DBMS_LOB.FILECLOSE(f_lob);
    COMMIT;
    END;
    -- build the index
    -- Note that this index differs than the file system stored file
    -- in that paramter datastore is ctxsys.defautl_datastore and not
    -- ctxsys.file_datastore. FILE_DATASTORE is for documents that
    -- exist on the file system. DEFAULT_DATASTORE is for documents
    -- that are stored in the column.
    create index db_docs_ctx on db_docs(document)
    indextype is ctxsys.context
    parameters (
    'datastore ctxsys.default_datastore
    filter ctxsys.inso_filter
    format column format');
    --search for something that is known to not be in the document.
    SELECT SCORE(1), id, location
    FROM db_docs
    WHERE CONTAINS(document, 'Jenkinson', 1) > 0;
    --search for something that is known to be in the document.  
    SELECT SCORE(1), id, location
    FROM db_docs
    WHERE CONTAINS(document, 'Albright', 1) > 0;

  • Using Oracle Text for searching with UCM 10g

    I am using Oracle text with UCM 10gR3 and Site Studio 10gR4 and I am trying to sort the search results by relevancy and to also include a snippet of the retrieved document. I have the fields that the SS_GET_SEARCH_RESULTS service returns but the relevancy score is always equals 5 and the snippet contains characters such as < idcnull, /p, etc., which you can see are XML/HTML/UCM tags but which result sin even more strangeness in the snippet if I try to remove them programmatically.
    I have read the Oracle Text documentation and there appear to be ways you can configure Oracle Text but I am not clear at all on what I can do from UCM. It looks like the configuration is either done in database tables or in the query itself, neither of which are readily configurable to me.
    Is anyone experienced in this or know of any documentation this might help?
    Bill

    Hi
    If I remember correctly then this issue was seen with an older version of OTS component and Core Update patch / bundle . Upgrade the UCM instance with the latest CS10gr35 update bundle patchset 6907073 and also upgrade OTS component from the same patchset .
    Let me know how it goes after this .
    Thanks
    Srinath

  • Oracle Text Web Search

    Can someone point me to some tutorials for a simple web search app using oracle text?
    Thanks in advance,
    New-B

    Check the Oracle Text Application Developers Guide Appendix for an example. The guide is available from tahiti.oracle.com

  • Oracle10g: Oracle text & Ultra search??

    Dear All,
    Can anyone explain or direct me to any data which describes how powerful the Oracle full text search engine is??
    Also can i install Oracle Full text search on a separate server? If yes, how can I failsafe it??
    please reply as this is urgent.
    Regards
    Mandeep

    1: http://www.oracle.com/technology/products/text/index.html
    2: What do you mean with "Oracle Full text search"?

  • Index document with Oracle Text from an ECM without saving the content

    Hi,
    I have documents in a ECM (Alfresco, UCM and more) and I would like Oracle Text to index the document without saving the content. I want to save space and not have redundant information. I would use Oracle Text to search for document's identification (ID) and fetch the document from the ECM using the ID.
    Is it possible ?
    Do I have to use Secure Enterprise Search ?
    Thanks
    Simon

    I want to save space and not have redundant information.The database space or the disk space (in OS)?
    If it the database space, it is not possible to index/serach without storing the file conetents.
    using , FILE_DATASTORE you can save the file in the disk (OS) and index them.
    When you remove the file, you need to re-index it.
    I donot see any other ways.
    Do I have to use Secure Enterprise Search ?SES also uses Oracle Text as its base. It also uses FILE_DATASTORE. But the re-indexing part is automated using crawlers.

  • Oracle text query

    Hi,
    I have a View object with various attributes (eg, name1, name2, name3, address1, address2, address3 etc). A query/table component based on this view object works just fine. However, I wish to replace name1, name2, name3 and other attributes in the query with just 'name'. These attributes are still to be shown in the result table. This new 'name' attribute will be used in an Oracle Text query clause, instead of individual searches on each attribute.
    My plan was to simply make the various name1, name2 etc attributes non-'queryable' in the View def to hide them from the query. Then I'd add a transient 'name' attribute. My hope was, that I could override the getWhereClause() in the ViewObjectImpl and simply tack on the oracle text clause to the WHERE (example below):
    WHERE CONTAINS (
    SOMECOLUMN,
                '<query>
       <textquery lang="ENGLISH" grammar="CONTEXT">TRANSIENT_ATTR_VALUE
    ..... Oracle Text query grammar stuff here ....  </query>') > 0How do I access the transient value in the ViewObjectImpl to add the above SQL? Or am I going about this in completely the wrong way?
    thanks,
    Barry.

    Based on what I found in
    http://www.oracle.com/technology/oramag/oracle/09-nov/o69frame.html?_template=/ocom/print
    and
    http://blogs.oracle.com/smuenchadf/examples/
    136.     Introducing a Checkbox to Toggle a Custom SQL Predicate on an LOV's Search Form. [11.1.1.0.0] 19-NOV-2008
    I have the following implementation, which seems to work. Does anyone see any problems with this?
    With regard to SQL injection, does ViewCriteriaItem sanitise the 'val' from the query, or should I do that manually here myself?
        @Override
        public java.lang.String getCriteriaItemClause(ViewCriteriaItem vci) {
            if ("OraTextTransientAttrib".equals(vci.getAttributeDef().getName())) {
                if (vci.getViewCriteria().isCriteriaForQuery()) {
                    String val = (String)vci.getValue();
                    logger.debug("Doing oracle text name search on '" + val + "'");
                    // simplified version of my oracle text query
                    return "CONTAINS ('<query>..... " + val + "....</query>') > 0 ";
                } else {
                    // SQL predicate for no changes to the results
                    // spaces needed if you have several of these blocks
                    return " 1=1 ";
            // other blocks for other similar oracle text attribs
            return super.getCriteriaItemClause(vci);
        }

  • Oracle text for varchar2

    Hi ...
    can i use oracle Text for searching in varchar2 field ....
    IF yes , plz give me the details ....
    Thanks ....

    SELECT OD OID, TAB Layer, COLUM Field, TEX Result,
    score(22) Score FROM VIEW_MASTER
    WHERE CONTAINS ( TEXT_VALUE, SEARCH_TERMS, 22 ) > 0
    ORDER BY Score;
    The search_terms are an inbound parameter. Not sure
    what the 22 does, i think its just an alias name. I
    don't know what the score coming back means.
    Sometimes I get 16, sometimes 12, sometimes 7.
    I could use some help on this myself.Yes, 22 is just an alias. You can use any number here since it is just a label which is used to correlate the CONTAINS function with its corresponding SCORE function.
    The details of how the score is computed are available in the Oracle Text Reference book, Appendix "F The Oracle Text Scoring Algorithm".
    Faisal

  • Oracle Text with Hibernate & Spring.

    Hi,
    I am looking for some code samples of Oracle text based search using Hibernate & Spring. Can the three of these technologies be used in a J2EE application.
    --Irshad.                                                                                                                                                                                                                                                                                                                                                               

    TimesTen doesn't support the CONTEXT indextype or CONTAINS clause (or other domain indexes/operators), so you can't create Oracle Text indexes in it.

  • Beginning Oracle Text...

    Could someone perhaps point to a good online source of basic information about how to USE oracle text in searches?
    I'm specifically looking for information about how to do searches like {woman NOT man}, or whether "woman" will select "women" or whether "$woman" will select "man" and so on. What switches are there to control what is searched for? What booleans are allowed and how must they be presented, and so on.
    I'm doing OK with the official oracle documentation, but something snappier and abstracted would be good to find!
    Any good book recommendations would be appreciated, too. (Especially since doing a search at Amazon for "oracle text" brings up a lot of textbooks about Oracle, but not many obviously about the specific database feature!)
    In the meantime, could someone answer a simple question I've not been able to find a simple answer to so far? Can Oracle text do 'NOT' searches? (As in 'man not boy')?

    Most of what you are asking about is covered in the section of the Text Reference on Contains Query Operators:
    http://download.oracle.com/docs/cd/B28359_01/text.111/b28304/cqoper.htm#CCREF0300
    Here are some examples regarding the specific questions you asked:
    SCOTT@orcl_11g> CREATE TABLE test_tab (test_col VARCHAR2 (60))
      2  /
    Table created.
    SCOTT@orcl_11g> INSERT ALL
      2  INTO test_tab (test_col) VALUES ('woman')
      3  INTO test_tab (test_col) VALUES ('man woman')
      4  INTO test_tab (test_col) VALUES ('women')
      5  INTO test_tab (test_col) VALUES ('men women')
      6  INTO test_tab (test_col) VALUES ('man boy')
      7  INTO test_tab (test_col) VALUES ('man')
      8  SELECT * FROM DUAL
      9  /
    6 rows created.
    SCOTT@orcl_11g> CREATE INDEX test_idx ON test_tab (test_col) INDEXTYPE IS CTXSYS.CONTEXT
      2  /
    Index created.
    SCOTT@orcl_11g> SELECT * FROM test_tab WHERE CONTAINS (test_col, 'woman NOT man') > 0
      2  /
    TEST_COL
    woman
    SCOTT@orcl_11g> SELECT * FROM test_tab WHERE CONTAINS (test_col, 'woman') > 0
      2  /
    TEST_COL
    woman
    man woman
    SCOTT@orcl_11g> SELECT * FROM test_tab WHERE CONTAINS (test_col, '$woman') > 0
      2  /
    TEST_COL
    woman
    man woman
    women
    men women
    SCOTT@orcl_11g> SELECT * FROM test_tab WHERE CONTAINS (test_col, 'man NOT boy') > 0
      2  /
    TEST_COL
    man woman
    man
    SCOTT@orcl_11g>

  • Oracle Text:Problems in starting

    hi all
    i am working on Oracle 10g in windows and i want to do Text Mining,but i am having some problems.when i use the JDeveloper and start the text wizard it createa a jsp file but it is not loading properly.is there any document from which i can learn how to do it.i think i missed out some configurations for the http server.i really need it very soon.
    thanks in advance

    hello
    i had followed the tutorial "Building JSP Applications that Use Oracle Text to Search Content in the Database using JDeveloper" and using that now i am able to load the jsp file but cant search as it shows the following error.
    The requested method POST is not allowed for the URL /textsearch/mysearch.jsp.
    i dont know what is the problem.can u help me.i am using Oracle 10g and i have added a Alias in the httpd.conf instead of the ojsp.conf file that has been mentioned in the tutorial as there is no such file in 10g.can u help me

  • Differences between oracle text in enterprise and express editions

    are there any differences between oracle text features found in express edition and ent. edition. if so what are they?

    There is a list of features available in the online documentation. The only thing that it mentions as being missing from Oracle Text are the english and french knowledge bases. There isn't any Intermedia or Ultra Search or Data Mining. Here is a link to the 10g Express Edition features guide:
    http://download-west.oracle.com/docs/cd/B25329_01/doc/license.102/b25456/toc.htm#BABDDIAE
    There is a separate discussion group for 10g Express Edition, that requires a separate free registration:
    Oracle Database Express Edition (XE)
    There is a thread on this subject in the 10g Express discussion group:
    Oracle Text
    The above thread includes the following quick comparison:
    If it's not in Standard Edition One, it is not in Express Edition. (Therefore everything that is in Enterprise but not in Standard Edition, is also not in Express Edition. That includes all EE-only options.)
    If it requires Java in the database, it is not in Express Edition.
    Other than that, it's mainly the size limits: 1 CPU (no parallel processing), 1 GB RAM, and 4 GB user-related tablespaces.

  • Difference between Oracle Text and XML DB?

    We are currently storing XML's in Oracle text. I understand that XML DB is faster to retrieve XML's based on conditional search within XML.
    Is there a place I could find the difference between these two?

    Text offer xpath like searching, assuming that you don't need to worry about little things like namespaces :). Oracle XML DB offers native XML storage, indexing and searching fully compliant with rellevant XML standards.

  • Oracle Secure enterprise Search versus Oracle Text

    I'm involeved in a project where we're using Oracle text for its text search capability. Yesterday during a meeting Oracles Secure Enterprise search engine came up. I see similar functionality offered in both products - Oracle text comes with 10g - not sure if SES comes with additional cost. Has anyone done analysis on why one would implement one over the other - I understand that SES gives the customer a federated option and some internet search capabilities but since I'm not concerned with that for this project does it make a difference?

    SES is a complete seaerch application with connectors to many different data sources, such as email systems and document management systems.
    Oracle Text, on the other hand, is a toolkit for building applications (and is used as such by SES).
    Oracle Text comes free with the database. SES is chargable, but comes with a free database (though it's restricted to use by SES only!)
    Generally speaking, if your data is in the database and you want fine control over how to search it, Oracle Text is a better option.
    If your data is scattered around diverse enterprise sources, and you want a ready-built application to collect, index and search that data, SES is the proper choice.
    Here's a slide from my OpenWorld presentation, which I guess says much the same thing:
    Oracle Text is the toolkit and platform for building sophisticated Information Retrieval applications and services
    - Fine control over indexes, partitioning, etc
    Oracle Secure Enterprise Search is a stand-alone application built on the foundation of Oracle Text
    - Includes its own database
    - No programming needed
    - Includes crawlers and an end-user UI

Maybe you are looking for