Differences between Oracle Text Soundex Search & Standard Soundex
Hi all,
I'd like to ask a few questions:
1. Does Oracle Text soundex searching use the soundex matching algorithm described
by Donald Knuth?
2. Why does an Oracle Text soundex search return different results from a standard
soundex?
3. Can anybody describe how the Oracle Text soundex search process works?
Thanks,
Robby
Hi Ron,
Thanks for your reply.
I've already read the thread and the soundex matching algorithm described by Donald Knuth,
but sorry, I still don't understand Oracle Text soundex searching.
According to Knuth's algorithm, the first letter is the key to the search:
with standard soundex, the word "PEEL" will find "PILE" or "P???" and so on.
But with an Oracle Text soundex search, the word "PEEL" will find "PILE", "BEEL", "BELL", "FEEL", "VERE", etc.
Does the Oracle Text soundex search not use Knuth's algorithm? If it does, how does the process work?
Thanks,
Robby
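One way to see what standard soundex does with these words is to run them through the SQL SOUNDEX function, which Oracle documents as the phonetic representation from Knuth's The Art of Computer Programming. This is only a minimal sketch of the standard function; it does not show how the Oracle Text ! operator matches internally, which this thread does not establish.

```sql
-- Standard SOUNDEX keeps the first letter as a letter and encodes
-- the remaining consonants as digits, so words that differ in
-- their first letter get different codes.
SELECT word, SOUNDEX(word) AS code
FROM (SELECT 'PEEL' AS word FROM dual UNION ALL
      SELECT 'PILE' FROM dual UNION ALL
      SELECT 'BELL' FROM dual UNION ALL
      SELECT 'FEEL' FROM dual);
-- PEEL and PILE both give P400; BELL gives B400 and FEEL gives F400
```

If an Oracle Text search for "PEEL" also returns "BELL" and "FEEL", that would be consistent with the first letter not being kept as a literal, but that is an inference from the results quoted above, not documented behavior.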
Similar Messages
-
Oracle Text - CTX Context Index Soundex Problem
Hi,
I'm running into a problem with Oracle Text when searching using the ! (soundex) option. I've created a simple test example to highlight the issue.
Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - 64bit
Windows 2008 Server 64-bit
create table test_tab (test_col varchar2(200));
insert all
into test_tab (test_col) values ('ab-tönes')
into test_tab (test_col) values ('ab-tones')
into test_tab (test_col) values ('abtones')
into test_tab (test_col) values ('ab tones')
into test_tab (test_col) values ('ab-tanes')
select * from dual;
select * from test_tab;
begin
ctx_ddl.create_preference ('test_lex1', 'basic_lexer');
ctx_ddl.set_attribute ('test_lex1', 'whitespace', '/\|-_+&''');
ctx_ddl.set_attribute('test_lex1','base_letter','YES');
-- ctx_ddl.set_attribute('test_lex1','skipjoins','-');
end;
/
create index test_idx on test_tab (test_col)
indextype is ctxsys.context
parameters
('lexer test_lex1');
select token_text from dr$test_idx$i;
TOKEN_TEXT
AB
ABTONES
TANES
TONES
select * from test_tab where contains (test_col, '!ab tones') > 0;
TEST_COL
ab-tönes
ab-tones
ab tones
select * from test_tab where soundex(test_col) = soundex('ab tones');
TEST_COL
ab-tönes
ab-tones
abtones
ab tones
ab-tanes
So my question is, can anyone suggest an approach whereby I can get the Oracle Text Context index (or CTXCAT index if it's more appropriate) to return all 5 rows like the simple Soundex is doing?
I can't really use soundex as this search query will form part of a search screen for a multi-language application. Soundex is limited to English sounding words, so I need the solution to be able to compare strings that may not "sound" English.
It must be an attribute of the BASIC_LEXER, and I've tried skipjoins, start/end-joins, stop lists, but I just cannot get the Soundex feature of Oracle Text to function like the SOUNDEX() function!
Looking at how the tokens are stored in dr$test_idx$i, I need Oracle Text to effectively concatenate 'AB' and 'TONES' and search them as a single string.
Any help greatly appreciated.
Thanks,
I am not getting the same problem that you are getting with the umlaut, but I don't see what is different. Please post the result of:
select ctx_report.create_index_script ('test_idx') from dual;
Here are the results on my system. Perhaps you can spot the difference. I added an empty_stoplist, so that it won't print out a long list of stopwords.
SCOTT@orcl12c> create table test_tab (test_col varchar2(200))
2 /
Table created.
SCOTT@orcl12c> insert all
2 into test_tab (test_col) values ('ab-tönes')
3 into test_tab (test_col) values ('ab-tones')
4 into test_tab (test_col) values ('abtones')
5 into test_tab (test_col) values ('ab tones')
6 into test_tab (test_col) values ('ab-tanes')
7 select * from dual
8 /
5 rows created.
SCOTT@orcl12c> select * from test_tab
2 /
TEST_COL
ab-tönes
ab-tones
abtones
ab tones
ab-tanes
5 rows selected.
SCOTT@orcl12c> begin
2 ctx_ddl.create_preference ('test_lex1', 'basic_lexer');
3 ctx_ddl.set_attribute('test_lex1','base_letter','YES');
4 end;
5 /
PL/SQL procedure successfully completed.
SCOTT@orcl12c> create or replace procedure test_proc
2 (p_rowid in rowid,
3 p_clob in out nocopy clob)
4 as
5 begin
6 select replace (translate (test_col, '/\|-_+&''', ' '), ' ', '')
7 into p_clob
8 from test_tab
9 where rowid = p_rowid;
10 end test_proc;
11 /
Procedure created.
SCOTT@orcl12c> show errors
No errors.
SCOTT@orcl12c> begin
2 ctx_ddl.create_preference ('test_ds', 'user_datastore');
3 ctx_ddl.set_attribute ('test_ds', 'procedure', 'test_proc');
4 end;
5 /
PL/SQL procedure successfully completed.
SCOTT@orcl12c> create index test_idx on test_tab (test_col)
2 indextype is ctxsys.context
3 parameters
4 ('lexer test_lex1
5 datastore test_ds
6 stoplist ctxsys.empty_stoplist')
7 /
Index created.
SCOTT@orcl12c> select token_text from dr$test_idx$i
2 /
TOKEN_TEXT
ABTANES
ABTONES
2 rows selected.
SCOTT@orcl12c> variable search_string varchar2(100)
SCOTT@orcl12c> exec :search_string := 'ab tones'
PL/SQL procedure successfully completed.
SCOTT@orcl12c> select * from test_tab
2 where contains
3 (test_col,
4 '!' || replace (:search_string, ' ', ' !') ||
5 ' or !' || replace (:search_string, ' ', '')) > 0
6 /
TEST_COL
ab-tönes
ab-tones
abtones
ab tones
ab-tanes
5 rows selected.
SCOTT@orcl12c> exec :search_string := 'abtones'
PL/SQL procedure successfully completed.
SCOTT@orcl12c> /
TEST_COL
ab-tönes
ab-tones
abtones
ab tones
ab-tanes
5 rows selected.
SCOTT@orcl12c> exec :search_string := 'ab tönes'
PL/SQL procedure successfully completed.
SCOTT@orcl12c> /
TEST_COL
ab-tönes
ab-tones
abtones
ab tones
ab-tanes
5 rows selected.
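The CONTAINS expression used in the three searches above turns a search string such as 'ab tones' into '!ab !tones or !abtones', so the query covers both the separate words and the joined word. As a sketch, the same expansion could be wrapped in a function. The name soundex_query is hypothetical and not part of the original post, and the function assumes words are separated by single spaces.

```sql
-- Hypothetical helper (not in the original post): builds the same
-- query string as the inline expression in the transcript.
CREATE OR REPLACE FUNCTION soundex_query (p_search IN VARCHAR2)
  RETURN VARCHAR2
AS
BEGIN
  -- prefix every word with the soundex operator (!), then add an
  -- alternative with the spaces removed
  RETURN '!' || REPLACE (p_search, ' ', ' !')
      || ' or !' || REPLACE (p_search, ' ', '');
END soundex_query;
/
-- soundex_query('ab tones') yields: !ab !tones or !abtones
SELECT * FROM test_tab
WHERE  CONTAINS (test_col, soundex_query (:search_string)) > 0;
```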
SCOTT@orcl12c> select ctx_report.create_index_script ('test_idx') from dual
2 /
CTX_REPORT.CREATE_INDEX_SCRIPT('TEST_IDX')
begin
ctx_ddl.create_preference('"TEST_IDX_DST"','USER_DATASTORE');
ctx_ddl.set_attribute('"TEST_IDX_DST"','PROCEDURE','"SCOTT"."TEST_PROC"');
end;
begin
ctx_ddl.create_preference('"TEST_IDX_FIL"','NULL_FILTER');
end;
begin
ctx_ddl.create_section_group('"TEST_IDX_SGP"','NULL_SECTION_GROUP');
end;
begin
ctx_ddl.create_preference('"TEST_IDX_LEX"','BASIC_LEXER');
ctx_ddl.set_attribute('"TEST_IDX_LEX"','BASE_LETTER','YES');
end;
begin
ctx_ddl.create_preference('"TEST_IDX_WDL"','BASIC_WORDLIST');
ctx_ddl.set_attribute('"TEST_IDX_WDL"','STEMMER','ENGLISH');
ctx_ddl.set_attribute('"TEST_IDX_WDL"','FUZZY_MATCH','GENERIC');
end;
begin
ctx_ddl.create_stoplist('"TEST_IDX_SPL"','BASIC_STOPLIST');
end;
begin
ctx_ddl.create_preference('"TEST_IDX_STO"','BASIC_STORAGE');
ctx_ddl.set_attribute('"TEST_IDX_STO"','R_TABLE_CLAUSE','lob (data) store as (
cache)');
ctx_ddl.set_attribute('"TEST_IDX_STO"','I_INDEX_CLAUSE','compress 2');
end;
begin
ctx_output.start_log('TEST_IDX_LOG');
end;
create index "SCOTT"."TEST_IDX"
on "SCOTT"."TEST_TAB"
("TEST_COL")
indextype is ctxsys.context
parameters('
datastore "TEST_IDX_DST"
filter "TEST_IDX_FIL"
section group "TEST_IDX_SGP"
lexer "TEST_IDX_LEX"
wordlist "TEST_IDX_WDL"
stoplist "TEST_IDX_SPL"
storage "TEST_IDX_STO"
')
/
begin
ctx_output.end_log;
end;
1 row selected.
-
Using Oracle Text to search through WORD, EXCEL and PDF documents
Hello again,
What I would like to know is this: if I have a Word or PDF document stored in a table, is it possible to use Oracle Text to search through the actual Word or PDF document?
Thanks
Doug
Yes, you can do context-sensitive searches on both PDF and Word docs. With PDFs you need to make sure they contain text and not images. Some scanners create PDFs that are nothing more than images of the document.
Below is a code sample that I made some time back to demonstrate the searching capabilities of Oracle Text. Note that the example makes use of the inso_filter, which is no longer shipped with Oracle beginning with patch set 10.1.0.4. See MetaLink note 298017.1 for the changes. See the following link for more information on developing with Oracle Text.
http://download-west.oracle.com/docs/cd/B14117_01/text.101/b10729/toc.htm
Begin example.
-- The following needs to be executed
-- as sys.
DROP DIRECTORY docs_dir;
CREATE OR REPLACE DIRECTORY docs_dir
AS 'C:\sql\oracle_text\documents';
GRANT READ ON DIRECTORY docs_dir TO text;
-- End of SQL run as sys
DROP TABLE db_docs CASCADE CONSTRAINTS PURGE;
CREATE TABLE db_docs (
id NUMBER,
format VARCHAR2(10),
location VARCHAR2(50),
document BLOB,
CONSTRAINT i_db_docs_p PRIMARY KEY(id)
);
-- Several notes need to be made about this anonymous block.
-- First the 'DOCS_DIR' parameter is a directory object name.
-- This directory object name must be in upper case.
DECLARE
f_lob BFILE;
b_lob BLOB;
document_name VARCHAR2(50);
BEGIN
document_name := 'externaltables.doc';
INSERT INTO db_docs
VALUES (1, 'binary', 'C:\sql\oracle_text\documents\externaltables.doc', empty_blob())
RETURN document INTO b_lob;
f_lob := BFILENAME('DOCS_DIR', document_name);
DBMS_LOB.FILEOPEN(f_lob, DBMS_LOB.FILE_READONLY);
DBMS_LOB.LOADFROMFILE(b_lob, f_lob, DBMS_LOB.GETLENGTH(f_lob));
DBMS_LOB.FILECLOSE(f_lob);
COMMIT;
END;
/
-- build the index
-- Note that this index differs from one built on files stored on the
-- file system in that the datastore parameter is ctxsys.default_datastore
-- and not ctxsys.file_datastore. FILE_DATASTORE is for documents that
-- exist on the file system. DEFAULT_DATASTORE is for documents
-- that are stored in the column.
create index db_docs_ctx on db_docs(document)
indextype is ctxsys.context
parameters (
'datastore ctxsys.default_datastore
filter ctxsys.inso_filter
format column format');
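Since the note above says inso_filter is no longer shipped in later patch sets, here is a sketch of the same index using the AUTO_FILTER type instead. It assumes a release where the predefined ctxsys.auto_filter preference is available (10g Release 2 or later, as far as I know); check the Oracle Text Reference for your version before relying on it.

```sql
-- Assumption: ctxsys.auto_filter exists in your release.
-- Drop the earlier index first if it was already created.
DROP INDEX db_docs_ctx;
CREATE INDEX db_docs_ctx ON db_docs (document)
  INDEXTYPE IS ctxsys.context
  PARAMETERS ('datastore ctxsys.default_datastore
               filter ctxsys.auto_filter
               format column format');
```

The searches that follow work the same way with either filter, since the filter only affects how the binary document is converted to text at indexing time.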
--search for something that is known to not be in the document.
SELECT SCORE(1), id, location
FROM db_docs
WHERE CONTAINS(document, 'Jenkinson', 1) > 0;
--search for something that is known to be in the document.
SELECT SCORE(1), id, location
FROM db_docs
WHERE CONTAINS(document, 'Albright', 1) > 0;
-
Using Oracle Text for searching with UCM 10g
I am using Oracle Text with UCM 10gR3 and Site Studio 10gR4, and I am trying to sort the search results by relevancy and to include a snippet of the retrieved document. I have the fields that the SS_GET_SEARCH_RESULTS service returns, but the relevancy score is always 5 and the snippet contains characters such as < idcnull, /p, etc., which are clearly XML/HTML/UCM tags but which result in even more strangeness in the snippet if I try to remove them programmatically.
I have read the Oracle Text documentation and there appear to be ways to configure Oracle Text, but I am not at all clear on what I can do from UCM. It looks like the configuration is done either in database tables or in the query itself, neither of which is readily configurable by me.
Is anyone experienced in this, or does anyone know of any documentation that might help?
Bill
Hi,
If I remember correctly, this issue was seen with an older version of the OTS component and Core Update patch/bundle. Upgrade the UCM instance with the latest CS10gr35 update bundle, patch set 6907073, and also upgrade the OTS component from the same patch set.
Let me know how it goes after this.
Thanks
Srinath
-
Can someone point me to some tutorials for a simple web search app using oracle text?
Thanks in advance,
New-B
Check the Oracle Text Application Developer's Guide appendix for an example. The guide is available from tahiti.oracle.com
-
Oracle10g: Oracle text & Ultra search??
Dear All,
Can anyone explain, or direct me to any material that describes, how powerful the Oracle full text search engine is?
Also, can I install Oracle full text search on a separate server? If yes, how can I make it failsafe?
Please reply, as this is urgent.
Regards
Mandeep
1: http://www.oracle.com/technology/products/text/index.html
2: What do you mean by "Oracle Full text search"?
-
Index document with Oracle Text from an ECM without saving the content
Hi,
I have documents in an ECM (Alfresco, UCM and more) and I would like Oracle Text to index the documents without saving their content. I want to save space and not have redundant information. I would use Oracle Text to search for a document's identifier (ID) and then fetch the document from the ECM using the ID.
Is it possible?
Do I have to use Secure Enterprise Search?
Thanks
Simon
"I want to save space and not have redundant information."
The database space or the disk space (in the OS)?
If it is the database space, it is not possible to index/search without storing the file contents.
Using FILE_DATASTORE, you can save the file on disk (OS) and index it.
When you remove the file, you need to re-index it.
I do not see any other way.
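A rough sketch of the FILE_DATASTORE approach described above follows. The directory path, table name and preference name are hypothetical, the files must be readable from the database server's file system, and ctxsys.auto_filter is assumed to be available in your release.

```sql
-- Assumptions: path, table and preference names are made up for
-- illustration; ctxsys.auto_filter requires a release that ships it.
BEGIN
  ctx_ddl.create_preference ('ecm_files', 'FILE_DATASTORE');
  ctx_ddl.set_attribute ('ecm_files', 'PATH', '/u01/ecm_export');
END;
/
-- The table stores only the ECM identifier and a file name,
-- not the document content itself.
CREATE TABLE ecm_docs (
  ecm_id    VARCHAR2(64) PRIMARY KEY,
  file_name VARCHAR2(255)
);
CREATE INDEX ecm_docs_ctx ON ecm_docs (file_name)
  INDEXTYPE IS ctxsys.context
  PARAMETERS ('datastore ecm_files filter ctxsys.auto_filter');
-- The search returns the identifier used to fetch the document from the ECM.
SELECT ecm_id FROM ecm_docs
WHERE  CONTAINS (file_name, 'invoice') > 0;
```

Note that the index tables still hold the extracted tokens inside the database, so this saves the space of the document content but not of the index itself.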
"Do I have to use Secure Enterprise Search?"
SES also uses Oracle Text as its base. It also uses FILE_DATASTORE, but the re-indexing part is automated using crawlers.
-
Hi,
I have a View object with various attributes (eg, name1, name2, name3, address1, address2, address3 etc). A query/table component based on this view object works just fine. However, I wish to replace name1, name2, name3 and other attributes in the query with just 'name'. These attributes are still to be shown in the result table. This new 'name' attribute will be used in an Oracle Text query clause, instead of individual searches on each attribute.
My plan was to simply make the various name1, name2 etc attributes non-'queryable' in the View def to hide them from the query. Then I'd add a transient 'name' attribute. My hope was, that I could override the getWhereClause() in the ViewObjectImpl and simply tack on the oracle text clause to the WHERE (example below):
WHERE CONTAINS (
SOMECOLUMN,
'<query>
<textquery lang="ENGLISH" grammar="CONTEXT">TRANSIENT_ATTR_VALUE
..... Oracle Text query grammar stuff here .... </query>') > 0
How do I access the transient value in the ViewObjectImpl to add the above SQL? Or am I going about this in completely the wrong way?
thanks,
Barry.
Based on what I found in
http://www.oracle.com/technology/oramag/oracle/09-nov/o69frame.html?_template=/ocom/print
and
http://blogs.oracle.com/smuenchadf/examples/
136. Introducing a Checkbox to Toggle a Custom SQL Predicate on an LOV's Search Form. [11.1.1.0.0] 19-NOV-2008
I have the following implementation, which seems to work. Does anyone see any problems with this?
With regard to SQL injection, does ViewCriteriaItem sanitise the 'val' from the query, or should I do that manually here myself?
@Override
public java.lang.String getCriteriaItemClause(ViewCriteriaItem vci) {
    if ("OraTextTransientAttrib".equals(vci.getAttributeDef().getName())) {
        if (vci.getViewCriteria().isCriteriaForQuery()) {
            String val = (String)vci.getValue();
            logger.debug("Doing oracle text name search on '" + val + "'");
            // simplified version of my oracle text query
            return "CONTAINS ('<query>..... " + val + "....</query>') > 0 ";
        } else {
            // SQL predicate for no changes to the results
            // spaces needed if you have several of these blocks
            return " 1=1 ";
        }
    }
    // other blocks for other similar oracle text attribs
    return super.getCriteriaItemClause(vci);
}
-
Hi ...
Can I use Oracle Text for searching in a VARCHAR2 field?
If yes, please give me the details.
Thanks
SELECT OD OID, TAB Layer, COLUM Field, TEX Result,
score(22) Score FROM VIEW_MASTER
WHERE CONTAINS ( TEXT_VALUE, SEARCH_TERMS, 22 ) > 0
ORDER BY Score;
The search_terms are an inbound parameter. Not sure
what the 22 does; I think it's just an alias name. I
don't know what the score coming back means.
Sometimes I get 16, sometimes 12, sometimes 7.
I could use some help on this myself.
Yes, 22 is just an alias. You can use any number here, since it is just a label used to correlate the CONTAINS function with its corresponding SCORE function.
The details of how the score is computed are available in the Oracle Text Reference, Appendix F, "The Oracle Text Scoring Algorithm".
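To illustrate the label, here is a sketch with two CONTAINS calls in one query, each paired with its own SCORE through the label number. It assumes a table like the test_tab (test_col) used in other threads on this page, with a CONTEXT index on test_col.

```sql
-- Labels 1 and 2 are arbitrary numbers; each SCORE(n) reports the
-- relevance of the CONTAINS call that carries the same label n.
SELECT test_col,
       SCORE (1) AS woman_score,
       SCORE (2) AS man_score
FROM   test_tab
WHERE  CONTAINS (test_col, 'woman', 1) > 0
   OR  CONTAINS (test_col, 'man', 2) > 0
ORDER  BY SCORE (1) DESC, SCORE (2) DESC;
```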
Faisal
-
Oracle Text with Hibernate & Spring.
Hi,
I am looking for some code samples of Oracle Text based search using Hibernate & Spring. Can these three technologies be used together in a J2EE application?
--Irshad.
TimesTen doesn't support the CONTEXT indextype or CONTAINS clause (or other domain indexes/operators), so you can't create Oracle Text indexes in it.
-
Beginning Oracle Text...
Could someone perhaps point to a good online source of basic information about how to USE oracle text in searches?
I'm specifically looking for information about how to do searches like {woman NOT man}, or whether "woman" will select "women" or whether "$woman" will select "man" and so on. What switches are there to control what is searched for? What booleans are allowed and how must they be presented, and so on.
I'm doing OK with the official oracle documentation, but something snappier and abstracted would be good to find!
Any good book recommendations would be appreciated, too. (Especially since doing a search at Amazon for "oracle text" brings up a lot of textbooks about Oracle, but not many obviously about the specific database feature!)
In the meantime, could someone answer a simple question I've not been able to find a simple answer to so far? Can Oracle Text do 'NOT' searches (as in 'man not boy')?
Most of what you are asking about is covered in the section of the Text Reference on CONTAINS Query Operators:
http://download.oracle.com/docs/cd/B28359_01/text.111/b28304/cqoper.htm#CCREF0300
Here are some examples regarding the specific questions you asked:
SCOTT@orcl_11g> CREATE TABLE test_tab (test_col VARCHAR2 (60))
2 /
Table created.
SCOTT@orcl_11g> INSERT ALL
2 INTO test_tab (test_col) VALUES ('woman')
3 INTO test_tab (test_col) VALUES ('man woman')
4 INTO test_tab (test_col) VALUES ('women')
5 INTO test_tab (test_col) VALUES ('men women')
6 INTO test_tab (test_col) VALUES ('man boy')
7 INTO test_tab (test_col) VALUES ('man')
8 SELECT * FROM DUAL
9 /
6 rows created.
SCOTT@orcl_11g> CREATE INDEX test_idx ON test_tab (test_col) INDEXTYPE IS CTXSYS.CONTEXT
2 /
Index created.
SCOTT@orcl_11g> SELECT * FROM test_tab WHERE CONTAINS (test_col, 'woman NOT man') > 0
2 /
TEST_COL
woman
SCOTT@orcl_11g> SELECT * FROM test_tab WHERE CONTAINS (test_col, 'woman') > 0
2 /
TEST_COL
woman
man woman
SCOTT@orcl_11g> SELECT * FROM test_tab WHERE CONTAINS (test_col, '$woman') > 0
2 /
TEST_COL
woman
man woman
women
men women
SCOTT@orcl_11g> SELECT * FROM test_tab WHERE CONTAINS (test_col, 'man NOT boy') > 0
2 /
TEST_COL
man woman
man
SCOTT@orcl_11g>
-
Oracle Text: Problems in starting
Hi all,
I am working on Oracle 10g on Windows and I want to do text mining, but I am having some problems. When I use JDeveloper and start the text wizard, it creates a JSP file, but the file does not load properly. Is there any document from which I can learn how to do this? I think I missed some configuration for the HTTP server. I really need it very soon.
Thanks in advance
Hello,
I followed the tutorial "Building JSP Applications that Use Oracle Text to Search Content in the Database using JDeveloper", and with that I am now able to load the JSP file, but I can't search because it shows the following error:
The requested method POST is not allowed for the URL /textsearch/mysearch.jsp.
I don't know what the problem is. I am using Oracle 10g, and I have added an Alias in httpd.conf instead of the ojsp.conf file mentioned in the tutorial, as there is no such file in 10g. Can you help me?
-
Differences between oracle text in enterprise and express editions
Are there any differences between the Oracle Text features found in Express Edition and Enterprise Edition? If so, what are they?
There is a list of features available in the online documentation. The only things it mentions as missing from Oracle Text are the English and French knowledge bases. There isn't any interMedia, Ultra Search or Data Mining. Here is a link to the 10g Express Edition features guide:
http://download-west.oracle.com/docs/cd/B25329_01/doc/license.102/b25456/toc.htm#BABDDIAE
There is a separate discussion group for 10g Express Edition, that requires a separate free registration:
Oracle Database Express Edition (XE)
There is a thread on this subject in the 10g Express discussion group:
Oracle Text
The above thread includes the following quick comparison:
If it's not in Standard Edition One, it is not in Express Edition. (Therefore everything that is in Enterprise but not in Standard Edition, is also not in Express Edition. That includes all EE-only options.)
If it requires Java in the database, it is not in Express Edition.
Other than that, it's mainly the size limits: 1 CPU (no parallel processing), 1 GB RAM, and 4 GB user-related tablespaces.
-
Difference between Oracle Text and XML DB?
We are currently storing XML documents in Oracle Text. I understand that XML DB is faster at retrieving XML based on conditional searches within the XML.
Is there a place where I could find the differences between these two?
Text offers XPath-like searching, assuming that you don't need to worry about little things like namespaces :). Oracle XML DB offers native XML storage, indexing and searching, fully compliant with the relevant XML standards.
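As a sketch of the XPath-like searching mentioned above: Oracle Text supports the INPATH operator when the index is built with a path section group. The table, column and XML element names here are hypothetical, and namespaces are ignored, in line with the caveat in the reply.

```sql
-- Hypothetical table holding XML documents as text.
CREATE TABLE xml_docs (id NUMBER PRIMARY KEY, doc CLOB);
-- A path section group lets CONTAINS understand element paths.
CREATE INDEX xml_docs_ctx ON xml_docs (doc)
  INDEXTYPE IS ctxsys.context
  PARAMETERS ('section group ctxsys.path_section_group');
-- Find documents where the word "widget" occurs inside /order/item/name.
SELECT id FROM xml_docs
WHERE  CONTAINS (doc, 'widget INPATH (/order/item/name)') > 0;
```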
-
Oracle Secure enterprise Search versus Oracle Text
I'm involved in a project where we're using Oracle Text for its text search capability. Yesterday, during a meeting, Oracle's Secure Enterprise Search engine came up. I see similar functionality offered in both products. Oracle Text comes with 10g; I'm not sure whether SES comes at additional cost. Has anyone done an analysis of why one would implement one over the other? I understand that SES gives the customer a federated option and some internet search capabilities, but since I'm not concerned with that for this project, does it make a difference?
SES is a complete search application with connectors to many different data sources, such as email systems and document management systems.
Oracle Text, on the other hand, is a toolkit for building applications (and is used as such by SES).
Oracle Text comes free with the database. SES is chargeable, but comes with a free database (though it's restricted to use by SES only!).
Generally speaking, if your data is in the database and you want fine control over how to search it, Oracle Text is a better option.
If your data is scattered around diverse enterprise sources, and you want a ready-built application to collect, index and search that data, SES is the proper choice.
Here's a slide from my OpenWorld presentation, which I guess says much the same thing:
Oracle Text is the toolkit and platform for building sophisticated Information Retrieval applications and services
- Fine control over indexes, partitioning, etc
Oracle Secure Enterprise Search is a stand-alone application built on the foundation of Oracle Text
- Includes its own database
- No programming needed
- Includes crawlers and an end-user UI