Oracle Text Example
Can someone post a quick example of an Oracle Text query?
Ben,
Thanks for the quick answer! I was teaching an APEX class and encouraging them to use the forum. I said "I bet someone answers this in an hour or less". You did it in 13 minutes! I tried to ask a question that didn't require any research, so I hope you didn't invest much time in it.
Thanks again,
Tyler
Tyler Muth
http://tylermuth.wordpress.com
"Applied Oracle Security: Developing Secure Database and Middleware Environments": http://sn.im/aos.book
Similar Messages
-
I have been following the Oracle Text -- OBE example and am running into a problem when creating a Database Access Descriptor (DAD) in the HTTP Server.
Environment: W2k on a standalone machine with no network connection, running 9i v2.
When trying to access <hostname>:80 I get a page not found error.
The HTTP server is running and I can access OEM through the web browser (port 3339).
Any suggestions would be appreciated.
Thanks
Try using port number 7778 (http://yourmachinename:7778). You should see the Oracle HTTP Server index page.
-
Pre-loading Oracle text in memory with Oracle 12c
There is a white paper from Roger Ford that explains how to load the Oracle Text index into memory: http://www.oracle.com/technetwork/database/enterprise-edition/mem-load-082296.html
In our application (Oracle 12c), we are indexing a big XML column (stored as XMLType with SecureFile storage) using PATH_SECTION_GROUP. If I don't load the $I table (DR$..$I) into memory using the technique explained in the white paper, I cannot get decent performance, and especially not predictable performance: it looks like if the blocks of the TOKEN_INFO column are not in memory, performance can drop sharply.
But after migrating to Oracle 12c, I hit a different problem, which I can reproduce: when I create the index it is relatively small (as seen with ctx_report.index_size), and by applying the technique from the white paper I can pin the DR$ I table in memory. But as soon as I run ctx_ddl.optimize_index('Index','REBUILD'), the index becomes much bigger and I can no longer pin it in memory. I am not sure whether this is a bug.
The workaround I found is to build the index with the following storage options:
exec ctx_ddl.create_preference('TEST_STO','BASIC_STORAGE');
exec ctx_ddl.set_attribute('TEST_STO', 'BIG_IO', 'YES');
exec ctx_ddl.set_attribute('TEST_STO', 'SEPARATE_OFFSETS', 'NO');
so that the TOKEN_INFO column is stored as a SecureFile LOB. Then I can change the storage of that column to put it in the KEEP buffer cache, and write a procedure that reads the LOB so that it gets loaded into the KEEP cache. The size of the LOB column is roughly the same as when the index is created without the BIG_IO option, but it stays constant even after ctx_ddl.optimize_index. The procedure that reads the LOB and loads it into the cache is very similar to the loaddollarR procedure from the white paper.
Because of the SDATA section, there is a new DR$ table (the S table) with an IOT on top of it. This is not documented in the white paper, which was written for Oracle 10g. In my case this DR$ S table and its IOT are heavily used, but putting them in the KEEP cache is not as important as caching the TOKEN_INFO column of the DR$ I table. A final note: setting SEPARATE_OFFSETS = 'YES' was very bad in my case: the combined size of the two columns is much bigger than the TOKEN_INFO column alone, and both columns are read.
Here is an example of how to reproduce the size increase after ctx_ddl.optimize_index:
1. create the table
drop table test;
CREATE TABLE test
(ID NUMBER(9,0) NOT NULL ENABLE,
XML_DATA XMLTYPE)
XMLTYPE COLUMN XML_DATA STORE AS SECUREFILE BINARY XML (tablespace users disable storage in row);
2. insert a few records
insert into test values(1,'<Book><TITLE>Tale of Two Cities</TITLE>It was the best of times.<Author NAME="Charles Dickens"> Born in England in the town, Stratford_Upon_Avon </Author></Book>');
insert into test values(2,'<BOOK><TITLE>The House of Mirth</TITLE>Written in 1905<Author NAME="Edith Wharton"> Wharton was born to George Frederic Jones and Lucretia Stevens Rhinelander in New York City.</Author></BOOK>');
insert into test values(3,'<BOOK><TITLE>Age of innocence</TITLE>She got a prize for it.<Author NAME="Edith Wharton"> Wharton was born to George Frederic Jones and Lucretia Stevens Rhinelander in New York City.</Author></BOOK>');
3. create the text index
drop index i_test;
exec ctx_ddl.create_section_group('TEST_SGP','PATH_SECTION_GROUP');
begin
CTX_DDL.ADD_SDATA_SECTION(group_name => 'TEST_SGP',
section_name => 'SData_02',
tag => 'SData_02',
datatype => 'varchar2');
end;
/
exec ctx_ddl.create_preference('TEST_STO','BASIC_STORAGE');
exec ctx_ddl.set_attribute('TEST_STO','I_TABLE_CLAUSE','tablespace USERS storage (initial 64K)');
exec ctx_ddl.set_attribute('TEST_STO','I_INDEX_CLAUSE','tablespace USERS storage (initial 64K) compress 2');
exec ctx_ddl.set_attribute ('TEST_STO', 'BIG_IO', 'NO' );
exec ctx_ddl.set_attribute ('TEST_STO', 'SEPARATE_OFFSETS', 'NO' );
create index I_TEST
on TEST (XML_DATA)
indextype is ctxsys.context
parameters('
section group "TEST_SGP"
storage "TEST_STO"
') parallel 2;
4. check the index size
select ctx_report.index_size('I_TEST') from dual;
It reports:
TOTALS FOR INDEX TEST.I_TEST
TOTAL BLOCKS ALLOCATED: 104
TOTAL BLOCKS USED: 72
TOTAL BYTES ALLOCATED: 851,968 (832.00 KB)
TOTAL BYTES USED: 589,824 (576.00 KB)
5. optimize the index
exec ctx_ddl.optimize_index('I_TEST','REBUILD');
Now recompute the size; it reports:
TOTALS FOR INDEX TEST.I_TEST
TOTAL BLOCKS ALLOCATED: 1112
TOTAL BLOCKS USED: 1080
TOTAL BYTES ALLOCATED: 9,109,504 (8.69 MB)
TOTAL BYTES USED: 8,847,360 (8.44 MB)
The index went from 576 KB to 8.44 MB. With a big index the relative difference is smaller, but it still grew from 14 GB to 19 GB.
6. Workaround: use the BIG_IO option, so that the TOKEN_INFO column of the DR$ I table is stored as a SecureFile LOB and its size stays relatively small. Then you can put the table and this column in the KEEP buffer pool and load the column into the cache:
alter table DR$I_TEST$I storage (buffer_pool keep);
alter table dr$i_test$i modify lob(token_info) (cache storage (buffer_pool keep));
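rem Now read the LOB so that it is loaded into the KEEP buffer pool, using the procedure below
create or replace procedure loadTokenInfo is
  type c_type is ref cursor;
  c2   c_type;
  s    varchar2(2000);
  b    blob;
  buff raw(100);   -- DBMS_LOB.READ on a BLOB returns RAW data
  siz  number;     -- number of bytes to read (IN OUT parameter)
  off  number;     -- 1-based byte offset into the LOB
  cntr number;     -- number of 4K positions read for the current LOB
begin
  s := 'select token_info from DR$i_test$I';
  open c2 for s;
  loop
    fetch c2 into b;
    exit when c2%notfound;
    siz  := 10;
    off  := 1;
    cntr := 0;
    if dbms_lob.getlength(b) > 0 then
      begin
        -- Read a few bytes every 4096 bytes; the aim is to touch each
        -- chunk of the LOB so that it is brought into the buffer cache.
        loop
          dbms_lob.read(b, siz, off, buff);
          cntr := cntr + 1;
          off  := off + 4096;
        end loop;
      exception
        -- DBMS_LOB.READ raises NO_DATA_FOUND once the offset passes the end of the LOB
        when no_data_found then
          if cntr > 0 then
            dbms_output.put_line('4K chunks fetched: '||cntr);
          end if;
      end;
    end if;
  end loop;
  close c2;
end;
/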
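The loop above reads only 10 bytes at each 4096-byte step and relies on DBMS_LOB.READ raising NO_DATA_FOUND to end the scan. The short Python sketch below is just a model of that arithmetic (no database involved; the function names are hypothetical): it lists the offsets the procedure would read for a LOB of a given length and shows that the printed counter equals the length divided by 4096, rounded up.

```python
def touched_offsets(lob_length: int, stride: int = 4096) -> list:
    """1-based offsets at which loadTokenInfo would call DBMS_LOB.READ.

    The real procedure stops when a read starts past the end of the LOB
    (NO_DATA_FOUND); in this model that is simply the loop condition.
    """
    offsets = []
    off = 1
    while off <= lob_length:
        offsets.append(off)
        off += stride
    return offsets


def chunks_fetched(lob_length: int, stride: int = 4096) -> int:
    """Closed form of the counter the procedure prints: ceil(length / stride)."""
    return -(-lob_length // stride) if lob_length > 0 else 0


print(touched_offsets(10000))  # [1, 4097, 8193]
print(chunks_fetched(10000))   # 3
```

Assuming the LOB is stored in blocks of at least 4 KB, a 4 KB stride touches every block at least once, which is the point of the pre-load.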
Rgds, Pierre
I have been working a lot on that issue recently, so I can give some more information.
First, I totally agree with you: I don't like using the KEEP pool and would love to avoid it. On the other hand, we have a specific use case: 90% of the activity in the database is done by queuing and dbms_scheduler jobs where response time does not matter, and all those processes are probably filling the buffer cache. We also have a customer-facing application that uses the text index to search the database, and for its users performance is critical.
What kind of performance do you get with your application?
In my case, I learned the hard way that having the index (in fact the DR$I table) in memory is the key: if it is not, performance is poor. I find it reasonable to pin the DR$I table in memory, and if you look at competitors, this is what they do. MongoDB explicitly says that the index must be in memory, and Elasticsearch runs on JVMs that also keep it in memory. And indeed, if you look at the AWR report, you will see that Oracle is continuously accessing the DR$I table; there is a SQL statement similar to
SELECT /*+ DYNAMIC_SAMPLING(0) INDEX(i) */
TOKEN_FIRST, TOKEN_LAST, TOKEN_COUNT, ROWID
FROM DR$idxname$I
WHERE TOKEN_TEXT = :word AND TOKEN_TYPE = :wtype
ORDER BY TOKEN_TEXT, TOKEN_TYPE, TOKEN_FIRST
which is executed continuously.
I think the algorithm Oracle uses to keep blocks in the cache is too complex. I just realized that 12.1.0.2 (released last week) finally has a "killer" feature, the in-memory parameters, with which you can pin tables or columns in memory with compression, etc. This looks ideal for the text index; I hope that R. Ford will finally update his white paper :-)
But my other problem was that optimize_index in REBUILD mode caused the DR$I table to double in size. It was closed as "not a bug", which seems crazy to me, and I can't do anything about it. In my opinion it is a bug, because the create index command and the "alter index rebuild" command both produce a much smaller index, so why would the developers of the optimize function (is it another team, using another algorithm?) make the index two times bigger?
For that, the track I have been following is to put the index in a 16K tablespace: in this case the space used by the index stays more or less flat (it increases, but much more reasonably). The difficulty here is pinning the index in memory, because R. Ford's trick no longer works.
What worked:
First, set the KEEP pool (db_keep_cache_size) to zero and give that memory to db_16k_cache_size instead. Then change the storage preference so that everything you want to cache (mostly the DR$I table) goes into the tablespace with the non-standard 16K block size.
Then comes the tricky part: pre-loading the data into the buffer cache. The problem is that Oracle 12c will use direct path reads for full table scans, which basically means that it bypasses the buffer cache and reads directly from the data files into the PGA! There is an event to avoid that; I was lucky to find it on a blog (I can't remember which, so sorry for the missing credit).
I ended up doing the following; event 10949 is what avoids the direct path read issue.
alter session set events '10949 trace name context forever, level 1';
alter table DR#idxname0001$I cache;
alter table DR#idxname0002$I cache;
alter table DR#idxname0003$I cache;
SELECT /*+ FULL(ITAB) CACHE(ITAB) */ SUM(TOKEN_COUNT), SUM(LENGTH(TOKEN_INFO)) FROM DR#idxname0001$I ITAB;
SELECT /*+ FULL(ITAB) CACHE(ITAB) */ SUM(TOKEN_COUNT), SUM(LENGTH(TOKEN_INFO)) FROM DR#idxname0002$I ITAB;
SELECT /*+ FULL(ITAB) CACHE(ITAB) */ SUM(TOKEN_COUNT), SUM(LENGTH(TOKEN_INFO)) FROM DR#idxname0003$I ITAB;
SELECT /*+ INDEX(ITAB) CACHE(ITAB) */ SUM(LENGTH(TOKEN_TEXT)) FROM DR#idxname0001$I ITAB;
SELECT /*+ INDEX(ITAB) CACHE(ITAB) */ SUM(LENGTH(TOKEN_TEXT)) FROM DR#idxname0002$I ITAB;
SELECT /*+ INDEX(ITAB) CACHE(ITAB) */ SUM(LENGTH(TOKEN_TEXT)) FROM DR#idxname0003$I ITAB;
It worked. With great relief I expected to take some time off, but there was one last surprise. The command
exec ctx_ddl.optimize_index(idx_name=>'idxname',part_name=>'partname',optlevel=>'REBUILD');
gave the following:
ERROR at line 1:
ORA-20000: Oracle Text error:
DRG-50857: oracle error in drftoptrebxch
ORA-14097: column type or size mismatch in ALTER TABLE EXCHANGE PARTITION
ORA-06512: at "CTXSYS.DRUE", line 160
ORA-06512: at "CTXSYS.CTX_DDL", line 1141
ORA-06512: at line 1
This is described almost exactly in Metalink note 1645634.1, but for a non-partitioned index. The workaround given there seemed very logical, but it did not work for a partitioned index. After experimenting, I found out that the bug occurs when the partitioned index is created with the dbms_pclxutil.build_part_index procedure (which enables intra-partition parallelism during index creation). This is a very annoying and stupid bug; maybe there is a workaround, but I did not find one on Metalink.
Other points of attention with text index creation (things that surprised me at first!):
- If you use the dbms_pclxutil package, ctx_output logging does not work, because the index is created immediately and then populated in the background via dbms_jobs.
- Combined with RAC, this can be very frightening: you may not see any activity on your node, because Oracle can choose to start the workers on the other node.
I now understand much better how text indexing works. I think it is a great technology that can scale via partitioning. But as always, the design of the application is crucial: most of our problems come from the fact that we did not choose the right sectioning (we chose PATH_SECTION_GROUP, while XML_SECTION_GROUP is so much better IMO). Maybe later I can convince the developers to change the sectioning, especially because SDATA and MDATA sections are not supported with PATH_SECTION_GROUP (although it seems to work, we had one occurrence of a bad result linked to the existence of SDATA in the index definition). Also, the whole problem of mixed structured/unstructured searches is completely solved if one uses XML_SECTION_GROUP with MDATA/SDATA (but of course the app was written for Oracle 10...)
Regards, Pierre
-
Issues using Oracle Text conditions
Hi all,
I'm facing an issue executing a query on a VIEW using Oracle Text Indexes.
The DB version I'm using is "Enterprise 9.2.0.5".
TEST_VIEW is a SQL view that queries several tables; one of them has two Oracle Text indexes, one on FIELD2 and another on FIELD3.
Executing this query, I get 10 rows:
select *
from TEST_VIEW
where FIELD1 = 1001 -- regular condition
and (contains(FIELD2, 'Blitz') > 0 ) -- Oracle text condition
But if I add another condition on an existing Oracle Text index, I get only 1 row:
select *
from TEST_VIEW
where FIELD1 = 1001
and (contains(FIELD2, 'Blitz') > 0 OR contains(FIELD3, 'Blitz') > 0)
As you can see, the third condition was added with a logical OR, so I should get at least 10 rows...
Can anyone help me?
Thanks in advance.
Eduardo.
Eduardo,
Without a full test case, it is hard to see whether something is wrong. I did the following, and everything worked fine on my 10g instance. I had to assume some things, but I think I have at least the basic gist of your question in this example. Run it or change it to match your situation, and post back when you can.
Thanks,
Ron
CREATE TABLE Z_TEST1 (
FIELD1 VARCHAR2(30));
INSERT INTO Z_TEST1
VALUES ('QUICK');
INSERT INTO Z_TEST1
VALUES ('BROWN');
INSERT INTO Z_TEST1
VALUES ('FOX');
INSERT INTO Z_TEST1
VALUES ('QUICK');
INSERT INTO Z_TEST1
VALUES ('BROWN');
INSERT INTO Z_TEST1
VALUES ('FOX');
CREATE TABLE Z_TEST2 (
FIELD2 VARCHAR2(30));
INSERT INTO Z_TEST2
VALUES ('JUMPED');
INSERT INTO Z_TEST2
VALUES ('OVER');
INSERT INTO Z_TEST2
VALUES ('LAZY');
INSERT INTO Z_TEST2
VALUES ('DOG');
INSERT INTO Z_TEST2
VALUES ('QUICK');
INSERT INTO Z_TEST2
VALUES ('BROWN');
INSERT INTO Z_TEST2
VALUES ('FOX');
commit;
CREATE VIEW TEST_VIEW
AS
SELECT Z_TEST1.FIELD1 AS "FIELD1", Z_TEST2.FIELD2 AS "FIELD2"
FROM Z_TEST1, Z_TEST2;
CREATE INDEX Z_TEST1_IDX ON Z_TEST1(FIELD1)
INDEXTYPE IS CTXSYS.CONTEXT;
CREATE INDEX Z_TEST2_IDX ON Z_TEST2(FIELD2)
INDEXTYPE IS CTXSYS.CONTEXT;
select *
from TEST_VIEW
where CONTAINS(FIELD1, 'FOX') > 0;
14 rows
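As a quick cross-check of the counts in this test, the Python sketch below models TEST_VIEW as the Cartesian product it is (the view has no join condition) and approximates CONTAINS by case-insensitive word equality, which is only adequate here because each column holds a single word. It is not Oracle code. It gives 6 x 7 = 42 view rows, 14 rows with FOX in FIELD1 (2 x 7), and 18 rows for the OR query that follows (14 + 6 - 2): adding an OR branch can only keep or add rows, never remove them.

```python
from itertools import product

# In-memory copies of the two single-column test tables
z_test1 = ["QUICK", "BROWN", "FOX", "QUICK", "BROWN", "FOX"]
z_test2 = ["JUMPED", "OVER", "LAZY", "DOG", "QUICK", "BROWN", "FOX"]

# TEST_VIEW has no WHERE clause, so it is the Cartesian product of both tables
test_view = list(product(z_test1, z_test2))


def contains(value: str, word: str) -> bool:
    """Stand-in for CONTAINS(...) > 0: each column holds exactly one word here."""
    return value.upper() == word.upper()


field1_fox = [row for row in test_view if contains(row[0], "FOX")]
either_fox = [row for row in test_view
              if contains(row[0], "FOX") or contains(row[1], "FOX")]

print(len(test_view), len(field1_fox), len(either_fox))  # 42 14 18
```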
select *
from TEST_VIEW
where (CONTAINS(FIELD1, 'FOX') > 0 OR CONTAINS(FIELD2, 'FOX') > 0);
18 rows
-
Index rules in oracle text and query using matches
Dear All,
I would like to ask about rules and the MATCHES function in Oracle Text.
I followed an example in the Oracle Text Application Developer's Guide.
I have a rule table like this:
1 oracle
2 larry or ellison
3 oracle and text
4 market share
Then I created an index on that table, which is needed to call the MATCHES function. Here is the syntax:
create index queryx on queries(query_string)
indextype is ctxsys.ctxrule;
Then I noticed the following contents in the DR$QUERYX$I table:
LARRY 0 2 2 1 (BLOB)
MARKET 0 4 4 1 (BLOB) {MARKET} {SHARE}
ORACLE 0 1 1 1 (BLOB)
ORACLE 0 3 3 1 (BLOB) {TEXT}
ELLISON 0 2 2 1 (BLOB)
What I want to ask is: why don't the words 'share' and 'text' appear as tokens in the DR$QUERYX$I table?
When we use the MATCHES function, it searches this index, and consequently it won't find the word 'share'. So, for example, when I run a query like this:
select query_id from queries where matches(query_string,' It only share ten percent of all products sold')>0
it returns 0 rows, since no word in ' It only share ten percent of all products sold' is in the index table. But the text could arguably be classified into category 4, whose rule is 'market share'.
I tried this on a larger data set and got the same result.
Here are the rules generated from my document collection:
1 {REQUIREMENTS} & {ELICITATION}
1 {REQUIREMENTS} ~ {ELICITATION} & {ACTOR}
1 {REQUIREMENTS} ~ {ELICITATION} ~ {ACTOR} & {FURPS}
1 {REQUIREMENTS} ~ {ELICITATION} ~ {ACTOR} ~ {FURPS} ~ {OUTLINE}
1 {REQUIREMENTS} ~ {ELICITATION} ~ {ACTOR} ~ {FURPS} ~ {OUTLINE} & {PROC}
1 {REQUIREMENTS} ~ {ELICITATION} ~ {ACTOR} ~ {FURPS} ~ {OUTLINE} ~ {PROC} & {SPEED}
1 {REQUIREMENTS} ~ {ELICITATION} ~ {ACTOR} ~ {FURPS} ~ {OUTLINE} ~ {PROC} ~ {SPEED} & {DOCUME}
1 {REQUIREMENTS} ~ {ELICITATION} ~ {ACTOR} ~ {FURPS} ~ {OUTLINE} ~ {PROC} ~ {SPEED} ~ {DOCUME} & {PLACED}
1 {REQUIREMENTS} ~ {ELICITATION} ~ {ACTOR} ~ {FURPS} ~ {OUTLINE} ~ {PROC} ~ {SPEED} ~ {DOCUME} ~ {PLACED} & {UNNECESSARY}
1 {REQUIREMENTS} ~ {ELICITATION} ~ {ACTOR} ~ {FURPS} ~ {OUTLINE} ~ {PROC} ~ {SPEED} ~ {DOCUME} ~ {PLACED} ~ {UNNECESSARY} & {MISUSE}
1 {INTERPRETATION} ~ {REQUIREMENTS}
2 {DESIGN} & {REPRESENTATION}
2 {DESIGN} ~ {REPRESENTATION} & {MAY} & {FOUNDATI} & {OCTOBER}
2 {DESIGN} ~ {REPRESENTATION} & {MAY} & {FOUNDATI} ~ {OCTOBER} & {PROCEDURAL}
2 {DESIGN} ~ {REPRESENTATION} & {MAY} & {FOUNDATI} ~ {OCTOBER} ~ {PROCEDURAL} & {STRICT}
2 {DESIGN} ~ {REPRESENTATION} & {MAY} & {FOUNDATI} ~ {OCTOBER} ~ {PROCEDURAL} ~ {STRICT} & {GRASP}
2 {DESIGN} ~ {REPRESENTATION} & {MAY} & {FOUNDATI} ~ {OCTOBER} ~ {PROCEDURAL} ~ {STRICT} ~ {GRASP} & {MANY} & {LAYER}
2 {DESIGN} ~ {REPRESENTATION} ~ {MAY}
3 {PM} & {TESTING} & {ATTRIBUTI}
And this is the resulting index table with CTXRULE:
(only the token_text column shown)
PM
DESIGN
DESIGN
DESIGN
DESIGN
DESIGN
DESIGN
DESIGN
REQUIREMENTS
REQUIREMENTS
REQUIREMENTS
REQUIREMENTS
REQUIREMENTS
REQUIREMENTS
REQUIREMENTS
REQUIREMENTS
REQUIREMENTS
REQUIREMENTS
INTERPRETATION
So when I try to classify a document containing the word 'outline', it should produce category 1 (based on the rules), but since there is no word 'outline' in the index table, MATCHES returns 0, meaning that the document is not classified into any category. I don't understand why this happens. Does anybody know about this? I would really appreciate any help.
Thank you very much.
Hm, I see. It does make sense; nice to know.
But what about the second example I gave, where I used a larger table:
Here are the rules generated from my document collection:
1 {REQUIREMENTS} & {ELICITATION}
1 {REQUIREMENTS} ~ {ELICITATION} & {ACTOR}
1 {REQUIREMENTS} ~ {ELICITATION} ~ {ACTOR} & {FURPS}
1 {REQUIREMENTS} ~ {ELICITATION} ~ {ACTOR} ~ {FURPS} ~ {OUTLINE}
1 {REQUIREMENTS} ~ {ELICITATION} ~ {ACTOR} ~ {FURPS} ~ {OUTLINE} & {PROC}
1 {REQUIREMENTS} ~ {ELICITATION} ~ {ACTOR} ~ {FURPS} ~ {OUTLINE} ~ {PROC} & {SPEED}
1 {REQUIREMENTS} ~ {ELICITATION} ~ {ACTOR} ~ {FURPS} ~ {OUTLINE} ~ {PROC} ~ {SPEED} & {DOCUME}
1 {INTERPRETATION} ~ {REQUIREMENTS}
2 {DESIGN} & {REPRESENTATION}
2 {DESIGN} ~ {REPRESENTATION} & {MAY} & {FOUNDATI} & {OCTOBER}
2 {DESIGN} ~ {REPRESENTATION} & {MAY} & {FOUNDATI} ~ {OCTOBER} & {PROCEDURAL}
2 {DESIGN} ~ {REPRESENTATION} & {MAY} & {FOUNDATI} ~ {OCTOBER} ~ {PROCEDURAL} & {STRICT}
2 {DESIGN} ~ {REPRESENTATION} ~ {MAY}
3 {PM} & {TESTING} & {ATTRIBUTI}
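In the Oracle Text query language, & is AND, | is OR and ~ is NOT, so {A} ~ {B} reads as "A and not B" rather than "A or B". The Python sketch below is a simplified model of that reading (it is not Oracle's CTXRULE implementation, and the helper names are hypothetical); it assumes & and ~ are applied strictly left to right. Under it, the 4th rule of category 1 can fire only for a document that contains REQUIREMENTS and none of ELICITATION, ACTOR, FURPS or OUTLINE, so a NOT term can never be what makes a rule match. That is consistent with only the first term of each rule showing up in the TOKEN_TEXT listings.

```python
import re


def doc_tokens(text: str) -> set:
    """Upper-cased alphanumeric words of a document (rough stand-in for the lexer)."""
    return set(re.findall(r"[A-Z0-9]+", text.upper()))


def evaluate_rule(rule: str, tokens: set) -> bool:
    """Evaluate a rule built from {TERM}, & (AND) and ~ (NOT), strictly left to right."""
    result = None
    op = None
    for term, operator in re.findall(r"\{([^}]*)\}|([&~])", rule):
        if operator:
            op = operator  # remember the operator for the next term
            continue
        present = term.upper() in tokens
        if result is None:
            result = present  # first term of the rule
        elif op == "&":
            result = result and present
        elif op == "~":
            result = result and not present
    return bool(result)


rule4 = "{REQUIREMENTS} ~ {ELICITATION} ~ {ACTOR} ~ {FURPS} ~ {OUTLINE}"
print(evaluate_rule(rule4, doc_tokens("The outline of the document")))   # False
print(evaluate_rule(rule4, doc_tokens("Requirements are listed here")))  # True
print(evaluate_rule(rule4, doc_tokens("Requirements outline")))          # False
```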
As far as I know, the sign '~' means OR and '&' means AND. So based on the 4th line in my table:
1 {REQUIREMENTS} ~ {ELICITATION} ~ {ACTOR} ~ {FURPS} ~ {OUTLINE}
it can be concluded that if any of the words listed there is queried, category '1' will appear as a result. But before we can use MATCHES to query it, we need to create an index on the rules table. I did, and the result was:
(only the token_text column shown)
PM
DESIGN
DESIGN
DESIGN
DESIGN
DESIGN
DESIGN
DESIGN
REQUIREMENTS
REQUIREMENTS
REQUIREMENTS
REQUIREMENTS
REQUIREMENTS
REQUIREMENTS
REQUIREMENTS
REQUIREMENTS
REQUIREMENTS
REQUIREMENTS
INTERPRETATION
There were no words other than PM, DESIGN, REQUIREMENTS and INTERPRETATION. Why don't the words ELICITATION, ACTOR, FURPS and OUTLINE appear in the index result?
-
Hi
I am trying to make a query run against a table as follows
SQL> desc FILES_INCLUDED;
Name Null? Type
PID NOT NULL VARCHAR2(16)
FILENAME NOT NULL VARCHAR2(240)
SQL> select count(*) from FILES_INCLUDED;
COUNT(*)
5719417
SQL>
where PID and FILENAME essentially list all the files delivered by a particular patch, i.e. 123456-01 would be the PID and FILENAME might be /usr/bin/ls, so 123456-01 might have multiple entries, one row for each deliverable, i.e.
SQL> select count(*) from FILES_INCLUDED where PID='xxxxxx-xx';
COUNT(*)
969
SQL>
I know this is a poor design, but that is what it looks like, and short of rewriting the apps, that is the way it is.
Anyway, users want to run
select distinct PID from FILES_INCLUDED where FILENAME like '%bin/ls%';
This takes up to one minute to run (not really unexpected).
So I recommended Oracle Text as perhaps a better way, and set it up via
begin
ctx_ddl.create_preference('mywordlist', 'BASIC_WORDLIST');
ctx_ddl.set_attribute('mywordlist','PREFIX_INDEX','true');
ctx_ddl.set_attribute('mywordlist','SUBSTRING_INDEX', 'YES');
ctx_ddl.set_attribute('mywordlist','WILDCARD_MAXTERMS', '15000');
end;
/
This works to a degree, but is there a way to know upfront how to avoid getting
DRG-51030: wildcard query expansion resulted in too many terms
In other words, what is the minimum number of characters one can use, and how can this be determined? For example,
select distinct pid from files_included where (contains (filename,'/bin/%',1)>0)
gets a DRG-51030
I can query
select count(*) from dr$FILE_NAME_INX$i;
select count(*) from dr$FILE_NAME_INX$p;
to get an idea of the number of tokens, but I'm not sure how to write a query for this kind of data that is foolproof.
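DRG-51030 is raised when a pattern expands to more distinct indexed tokens than WILDCARD_MAXTERMS allows, so what matters is how many tokens match, not how many characters are typed. One way to check upfront is to count the matching TOKEN_TEXT values in the $I table before running the real query. The Python sketch below models that count on a hypothetical in-memory token sample (helper names are made up; it ignores token types, stemming and the prefix/substring index internals), with the 15000 default mirroring the WILDCARD_MAXTERMS value set above.

```python
import re


def wildcard_regex(pattern: str) -> "re.Pattern":
    """Translate a CONTAINS wildcard (% = any run of chars, _ = one char) to a regex."""
    parts = []
    for ch in pattern.upper():
        if ch == "%":
            parts.append(".*")
        elif ch == "_":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts))


def expansion_size(pattern: str, tokens) -> int:
    """Number of distinct tokens the pattern would expand to."""
    rx = wildcard_regex(pattern)
    return len({t for t in tokens if rx.fullmatch(t)})


def would_exceed(pattern: str, tokens, max_terms: int = 15000) -> bool:
    """True if the expansion is larger than the WILDCARD_MAXTERMS limit."""
    return expansion_size(pattern, tokens) > max_terms


# Hypothetical sample of upper-cased TOKEN_TEXT values
sample = ["BIN", "BINARY", "SBIN", "LS", "LSOF", "USR"]
print(expansion_size("bin%", sample))          # 2
print(expansion_size("%ls%", sample))          # 2
print(would_exceed("%", sample, max_terms=5))  # True
```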
The use case is for users to enter %ls% and get back the PIDs that deliver %ls%.
Enda
It looks like you want to search by sub-directory names or a combination of sub-directory names. By default, Oracle Text treats the directory delimiter / as white space, so the individual sub-directories are tokenized. Therefore, you don't need the wildcards or / to do your searches. Please see the example below.
SCOTT@orcl_11g> create table files_included
2 (pid varchar2 (16) not null,
3 filename varchar2 (40) not null)
4 /
Table created.
SCOTT@orcl_11g> insert all
2 into files_included values
3 ('123456-01', '/usr/bin/ls/a')
4 into files_included values
5 ('123456-02', '/usr/bin/ls/b')
6 into files_included values
7 ('123456-03', '/usr/x/ls/a')
8 into files_included values
9 ('123456-02', '/usr/bin/x/b')
10 into files_included values
11 ('654321', '/usr/bin/other')
12 select * from dual
13 /
5 rows created.
SCOTT@orcl_11g> create index myindex
2 on files_included (filename)
3 indextype is ctxsys.context
4 /
Index created.
SCOTT@orcl_11g> select token_text
2 from dr$myindex$i
3 /
TOKEN_TEXT
B
BIN
LS
OTHER
USR
X
6 rows selected.
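The token list above can be reproduced with a rough Python model of the default lexer behaviour described in the answer: characters such as / act as separators, tokens are upper-cased, and A is missing because "a" is in the default English stoplist. A multi-word query without operators, such as 'bin ls', is treated as a phrase, so the words must be adjacent. This is only a sketch: the real BASIC_LEXER also has printjoins, skipjoins and many other rules, and the one-word stoplist here is a stand-in.

```python
import re

# Stand-in for the default English stoplist; only "a" matters for this data
STOPWORDS = {"A"}


def words(text: str) -> list:
    """Upper-cased alphanumeric words; '/' and other symbols act as separators."""
    return re.findall(r"[A-Z0-9]+", text.upper())


def index_tokens(texts) -> list:
    """Distinct indexed tokens without stopwords, sorted like the TOKEN_TEXT listing."""
    return sorted({w for t in texts for w in words(t) if w not in STOPWORDS})


def phrase_match(text: str, query: str) -> bool:
    """A multi-word query without operators matches only adjacent words (a phrase)."""
    doc, q = words(text), words(query)
    return any(doc[i:i + len(q)] == q for i in range(len(doc) - len(q) + 1))


rows = [
    ("123456-01", "/usr/bin/ls/a"),
    ("123456-02", "/usr/bin/ls/b"),
    ("123456-03", "/usr/x/ls/a"),
    ("123456-02", "/usr/bin/x/b"),
    ("654321", "/usr/bin/other"),
]

print(index_tokens(f for _, f in rows))                   # ['B', 'BIN', 'LS', 'OTHER', 'USR', 'X']
print([p for p, f in rows if phrase_match(f, "bin ls")])  # ['123456-01', '123456-02']
```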
SCOTT@orcl_11g> select * from files_included
2 where contains (filename, 'bin ls') > 0
3 /
PID FILENAME
123456-01 /usr/bin/ls/a
123456-02 /usr/bin/ls/b
SCOTT@orcl_11g> select * from files_included
2 where contains (filename, 'bin') > 0
3 /
PID FILENAME
123456-01 /usr/bin/ls/a
123456-02 /usr/bin/ls/b
123456-02 /usr/bin/x/b
654321 /usr/bin/other
SCOTT@orcl_11g> select * from files_included
2 where contains (filename, 'ls') > 0
3 /
PID FILENAME
123456-01 /usr/bin/ls/a
123456-02 /usr/bin/ls/b
123456-03 /usr/x/ls/a
SCOTT@orcl_11g>
-
Using Oracle Text to search through WORD, EXCEL and PDF documents
Hello again,
What I would like to know is: if I have a Word or PDF document stored in a table, is it possible to use Oracle Text to search through the actual Word or PDF document?
Thanks
Doug
Yes, you can run Oracle Text searches on both PDF and Word documents. With PDFs, you need to make sure they contain text and not just images; some scanners create PDFs that are nothing more than images of the document.
Below is a code sample that I made some time ago to demonstrate the searching capabilities of Oracle Text. Note that the example uses the inso_filter, which is no longer shipped with Oracle beginning with patch set 10.1.0.4; see Metalink note 298017.1 for the changes. See the following link for more information on developing with Oracle Text.
http://download-west.oracle.com/docs/cd/B14117_01/text.101/b10729/toc.htm
begin example.
-- The following needs to be executed
-- as sys.
DROP DIRECTORY docs_dir;
CREATE OR REPLACE DIRECTORY docs_dir
AS 'C:\sql\oracle_text\documents';
GRANT READ ON DIRECTORY docs_dir TO text;
-- End sys ran SQL
DROP TABLE db_docs CASCADE CONSTRAINTS PURGE;
CREATE TABLE db_docs (
id NUMBER,
format VARCHAR2(10),
location VARCHAR2(50),
document BLOB,
CONSTRAINT i_db_docs_p PRIMARY KEY(id)
);
-- Several notes need to be made about this anonymous block.
-- First the 'DOCS_DIR' parameter is a directory object name.
-- This directory object name must be in upper case.
DECLARE
f_lob BFILE;
b_lob BLOB;
document_name VARCHAR2(50);
BEGIN
document_name := 'externaltables.doc';
INSERT INTO db_docs
VALUES (1, 'binary', 'C:\sql\oracle_text\documents\externaltables.doc', empty_blob())
RETURN document INTO b_lob;
f_lob := BFILENAME('DOCS_DIR', document_name);
DBMS_LOB.FILEOPEN(f_lob, DBMS_LOB.FILE_READONLY);
DBMS_LOB.LOADFROMFILE(b_lob, f_lob, DBMS_LOB.GETLENGTH(f_lob));
DBMS_LOB.FILECLOSE(f_lob);
COMMIT;
END;
/
-- build the index
-- Note that this index differs from the one for files stored on the file system
-- in that the datastore parameter is ctxsys.default_datastore and not
-- ctxsys.file_datastore. FILE_DATASTORE is for documents that
-- exist on the file system. DEFAULT_DATASTORE is for documents
-- that are stored in the column.
create index db_docs_ctx on db_docs(document)
indextype is ctxsys.context
parameters (
'datastore ctxsys.default_datastore
filter ctxsys.inso_filter
format column format');
--search for something that is known to not be in the document.
SELECT SCORE(1), id, location
FROM db_docs
WHERE CONTAINS(document, 'Jenkinson', 1) > 0;
--search for something that is known to be in the document.
SELECT SCORE(1), id, location
FROM db_docs
WHERE CONTAINS(document, 'Albright', 1) > 0;
-
Oracle Text in installing Oracle 10g without licence!!
Hi everyone.
I've read some threads, but I am still confused about Oracle Text.
I am currently testing an Oracle 10g database.
I downloaded the 10g software from www.oracle.com and installed it successfully
on Windows XP.
When I tried to import a dump file from Oracle 9i into
the unlicensed Oracle 10g database, I got the error IMP-00017, which
is related to Oracle Text.
I checked the dba_users dictionary view, and the ctxsys user is locked and expired.
I read some threads on this site and, following their advice, tried to
enable Oracle Text using DBCA.
However, every database option in DBCA is disabled, so I was not able to
select Oracle Text.
So, how can I enable Oracle Text in an unlicensed Oracle 10g?
Is this possible without a license?
I am very confused about this.
I am looking forward to hearing about your experience and advice.
Have a nice day.
Best Regards.
Ho.
Well, instead of being confused, you could go to http://www.oracle.com/pls/db102/portal.portal_db?selected=1 and look at
1) the licensing document, which would tell you whether you need a separate license, and
2) under the 'Books' tab, look at the Text Application Developer's Guide or the Text Reference manuals for details.
You could also look for the Oracle Text forum (from the http://forums.oracle.com page, under Database - More, or Text) and ask the people who concentrate on that set of features.
In general, Oracle Text is a set of extensions, the definitions for which are stored under user ctxsys. You would use these extensions by creating your own objects that are based on the extensions.
For example, suppose your tables contain varchar2 columns. Create indexes that are based on ctxsys's 'context index type', and your application can then use the 'CONTAINS' keyword search capability (which is effectively a ctxsys-owned extension to the select).
However, you would never log on as ctxsys and do anything with that account, as you risk changing the template code that Oracle has supplied.
Message was edited by:
Hans Forbrich
PS: Yes, Oracle Text is included as part of the base database. Most of it is even included in the free Oracle XE database.
-
I have successfully created (at least I think) oracle text indexes on my XMLType table:
EXEC ctx_ddl.create_section_group('contract_xmlgroup', 'XML_SECTION_GROUP');
EXEC CTX_DDL.Add_Zone_Section (group_name => 'contract_xmlgroup', section_name => 'complete_entry', tag => 'complete_entry')
CREATE INDEX complete_entry ON boss_contracts INDEXTYPE IS ctxsys.context
parameters('section group contract_xmlgroup');
However, I am unsure how to search using CONTAINS with this index. I tried this at first:
SELECT count(*) FROM boss_contracts b
WHERE CONTAINS(value(b), 'string WITHIN complete_entry') > 0;
This just gave me the error:
ERROR at line 1:
ORA-20000: Oracle Text error:
DRG-10599: column is not indexed
Any help would be appreciated.
Paul
-
Using Oracle Text to Data Mine
Can someone provide me with an idea of how to Data Mine with just using Oracle Text and not the data mining option. I need to search a column of customer complaints and then put it in a category based on that. It would be best if the categories were auto generated. It has to be done in PL/SQL.
Thanks,
You cannot have the categories created automatically without data mining. However, if you are willing to create the categories and the queries that determine them, you can do it with just Oracle Text. I posted an example on the 2nd page of the following thread:
Re: New to Oracle Text search
-
Highlight Oracle Text search terms
I have a report that I set up using the instructions for Oracle Text Application in APEX. It works very well however I have the actual document as a link and I would like the search terms highlighted in the actual document. Is there a way to do that in APEX?
I use this Region Source:
select score(1) relevance, filename, dbms_lob.getlength("DOCUMENT") Document, code_id
from documents
where contains (document, :P10_SEARCH, 1) > 0
order by 1 desc
I read something about using ctx_doc.snippet to highlight, but can't get that to work.
Any suggestions, or can APEX highlight terms when the actual document is used?
8265490,
Take a look at the ctx_doc.markup procedure. I think it will do what you want.
http://download.oracle.com/docs/cd/B19306_01/text.102/b14217/view.htm#sthref599
My home server is on a moving truck, so I can only point you to some old forum posts for examples:
Re: Using Oracle Text with Apex
Re: Use apex to display email
Doug -
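A minimal sketch of both options, assuming the CONTEXT index on DOCUMENTS.DOCUMENT is named DOCUMENTS_IDX (a hypothetical name; the post doesn't give it) and that CTX_DOC is using its default ROWID text key. CTX_DOC.SNIPPET needs 10g Release 2 or later; the procedure name `show_highlighted_doc` is also hypothetical.

```sql
-- Option 1 (10gR2+): a highlighted snippet per row in the report region.
select score(1) relevance,
       filename,
       ctx_doc.snippet('DOCUMENTS_IDX', rowidtochar(rowid), :P10_SEARCH,
                       '<b>', '</b>') snippet,
       code_id
from   documents
where  contains(document, :P10_SEARCH, 1) > 0
order  by 1 desc;

-- Option 2: the whole document, filtered to HTML with every hit highlighted.
create or replace procedure show_highlighted_doc (
  p_rowid in varchar2,   -- ROWID of the DOCUMENTS row, passed from the report link
  p_query in varchar2    -- the same search string used in CONTAINS
) as
  l_markup clob;         -- NULL on entry, so MARKUP allocates a temporary CLOB
  l_offset pls_integer := 1;
  c_chunk  constant pls_integer := 8000;
begin
  ctx_doc.markup(
    index_name => 'DOCUMENTS_IDX',
    textkey    => p_rowid,
    text_query => p_query,
    restab     => l_markup,
    tagset     => 'HTML_DEFAULT');

  -- htp.prn takes VARCHAR2, so write the CLOB out in chunks.
  while l_offset <= dbms_lob.getlength(l_markup) loop
    htp.prn(dbms_lob.substr(l_markup, c_chunk, l_offset));
    l_offset := l_offset + c_chunk;
  end loop;

  if dbms_lob.istemporary(l_markup) = 1 then
    dbms_lob.freetemporary(l_markup);
  end if;
end;
/
```

The report link would then pass the row's ROWID and :P10_SEARCH to whatever page or process calls the procedure; how you wire that up in APEX is left open here.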
I'm storing files in a BLOB column in a 9i database. Sometimes I need to query using the details stored in the database about the file, and sometimes I need to search the files for matching text (like a search engine). I was told that Oracle Text can help me accomplish this; however, I don't know whether it supports Arabic text, and I don't know how to use it from my application developed in 9i.
Regards.
Friend, by following these steps you can easily use Oracle interMedia Text:
j a h a n z e b
Oracle Developer
6th Floor, State Bank of Pakistan
I.I.Chundrigar Road, Karachi.
Please note that in SQL*Plus you can use '?' instead of $ORACLE_HOME. This works on both Unix and Windows, so if you want to execute $ORACLE_HOME/rdbms/admin/catalog.sql you can simply use:
on Unix sql> @?/rdbms/admin/catalog.sql
on Windows sql> @?\rdbms\admin\catalog.sql
5.2.1 Explanation of installation steps
1. Connect to the database as SYSDBA and create the CTXSYS user:
The CTXSYS user is created by calling the following script:
@?/ctx/admin/dr0csys.sql change_on_install DRSYS TEMP
Where:
change_on_install - is the ctxsys user password
DRSYS - is the default tablespace for ctxsys
TEMP - is the temporary tablespace for ctxsys
This creates the CTXSYS user and grants CTXSYS the privileges it needs to create and insert into result tables, execute callbacks, rewrite queries, and perform system cleanup. At this point CTXSYS does not own any objects.
2. Connect to the database as CTXSYS and create all necessary objects:
All necessary objects are created by calling the following script:
connect CTXSYS/change_on_install
@?/ctx/admin/dr0inst.sql <full path of ORACLE_HOME>/ctx/lib/libctxx9.so
Please note that you have to pass the full path of your ORACLE_HOME as the parameter, for example:
On Solaris/Aix/Linux with $ORACLE_HOME of /u01/app/oracle/product/8.1.7
@?/ctx/admin/dr0inst.sql /u01/app/oracle/product/8.1.7/ctx/lib/libctxx8.so
On HP-UX with $ORACLE_HOME of /u01/app/oracle/product/8.1.7
@?/ctx/admin/dr0inst.sql /u01/app/oracle/product/8.1.7/ctx/lib/libctxx8.sl
On Windows NT/2000 with ORACLE_HOME of D:\oracle\product\8.1.7
@?/ctx/admin/dr0inst.sql D:\oracle\product\8.1.7\bin\oractxx8.dll
This installs all database objects required by Oracle Text, including:
a) Data dictionary tables, views, sequences, and packages
b) Server management tables, views and packages
c) Dispatcher packages
d) Service queue objects
3. Install the appropriate language-specific default preferences.
The next step is to install the appropriate language-specific default preferences. When you use CREATE INDEX to create an index or ALTER INDEX to manage an index, you can optionally specify indexing preferences in the parameter string. There are seven preference classes:
- Lexer, defines the language being indexed. ( language specific )
- Wordlist, defines the expansion of stem and fuzzy queries. ( language specific )
- Stoplist, defines words and themes that are not to be indexed. ( language specific )
- Datastore, defines document storage.
- Filter, defines standards for conversion of documents to plain text.
- Storage, defines the storage of the index tables.
- Section group, enables you to define document sections.
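As an illustration of the parameter string, here is a sketch that creates a custom lexer, wordlist and stoplist and names them in PARAMETERS. The table `my_docs` and all preference names are hypothetical; any preference class you leave out falls back to its default preference.

```sql
-- Hypothetical table to index.
create table my_docs (
  doc_id number primary key,
  body   varchar2(4000)
);

begin
  -- Lexer: index tokens case-insensitively.
  ctx_ddl.create_preference('my_lexer', 'BASIC_LEXER');
  ctx_ddl.set_attribute('my_lexer', 'MIXED_CASE', 'NO');

  -- Wordlist: English stemming for stem ($) queries.
  ctx_ddl.create_preference('my_wordlist', 'BASIC_WORDLIST');
  ctx_ddl.set_attribute('my_wordlist', 'STEMMER', 'ENGLISH');

  -- Stoplist: words that are never indexed.
  ctx_ddl.create_stoplist('my_stoplist', 'BASIC_STOPLIST');
  ctx_ddl.add_stopword('my_stoplist', 'the');
  ctx_ddl.add_stopword('my_stoplist', 'a');
end;
/

-- Each class keyword in PARAMETERS is followed by the preference name.
create index my_docs_ctx on my_docs (body)
  indextype is ctxsys.context
  parameters ('lexer my_lexer wordlist my_wordlist stoplist my_stoplist');
```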
There is a script that creates language-specific default preferences for every language Oracle Text supports in the <ORACLE_HOME>/ctx/admin/defaults directory, such as English (US), Danish (DK), Dutch (NL), Finnish (SF), French (FR), German (DE), Italian (IT), Portuguese (PR), Spanish (ES), and Swedish (S). The scripts are named drdefXX.sql, where XX is the language code. To manually install the US default preferences, for example, log into SQL*Plus as CTXSYS and run drdefus.sql as shown below:
@?/ctx/admin/defaults/drdefus.sql
create user textuser identified by textuser
default tablespace users
temporary tablespace temp;
-- You must grant 'ctxapp' role to textuser
grant connect, resource, ctxapp to textuser;
connect textuser/textuser
drop table quick;
create table quick (
quick_id number
constraint quick_pk primary key,
text varchar2(80) );
insert into quick ( quick_id, text ) values (1,'The cat sat on the mat');
insert into quick ( quick_id, text ) values (2,'The quick brown fox jumps over the lazy dog' );
insert into quick ( quick_id, text ) values (3,'The dog barked like a dog');
commit;
create index quick_text on quick ( text )
indextype is ctxsys.context;
col text format a45
col s format 999
select text, score(42) s from quick
where contains ( text, 'dog', 42 ) > 0
order by s desc; -
Hello,
I need to search in a number column for particular "subnumbers". For example, I have a column with 3453454 in it, and I'd like to search, e.g., for the number "53" in it. I know I could use
select * from table where number_column like '%53%'
but since the table is rather big, I'd like to use Oracle Text to avoid a full table scan, with a query like
select * from table where contains(number_column, '53') > 0
but the above query returns nothing, even after converting the number column to a VARCHAR2 column! Only full numbers are indexed, so only a search on the full number 3453454 would yield a result. What are my options to make the above query with the "contains" clause work?
Thanks in advance
You can configure Text to do substring searches.
Do this:
exec ctx_ddl.create_preference( 'SUBSTR_SUPPORT_PREF', 'basic_wordlist' );
exec ctx_ddl.set_attribute( 'SUBSTR_SUPPORT_PREF', 'SUBSTRING_INDEX', 'YES' );
-- then (re)create the index with: parameters ('wordlist SUBSTR_SUPPORT_PREF')
Then you can do something like:
where contains(col,'%53%',1) > 0
Tom Best -
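The preference only takes effect once the index is created with it. A sketch of the whole sequence, assuming a hypothetical table `numbers_tab` that holds the numbers as VARCHAR2 (a CONTEXT index cannot be built on a NUMBER column, which is why the conversion is needed in the first place):

```sql
-- Hypothetical table: the number is stored as text so it can be indexed.
create table numbers_tab (
  id      number primary key,
  num_txt varchar2(20)
);
insert into numbers_tab values (1, '3453454');
insert into numbers_tab values (2, '1234567');
commit;

-- Skip this block if SUBSTR_SUPPORT_PREF already exists.
begin
  ctx_ddl.create_preference('SUBSTR_SUPPORT_PREF', 'BASIC_WORDLIST');
  ctx_ddl.set_attribute('SUBSTR_SUPPORT_PREF', 'SUBSTRING_INDEX', 'YES');
end;
/

-- The wordlist preference is attached at index creation time.
create index numbers_tab_ctx on numbers_tab (num_txt)
  indextype is ctxsys.context
  parameters ('wordlist SUBSTR_SUPPORT_PREF');

-- The % wildcards do the substring match on each indexed token.
select id, num_txt
from   numbers_tab
where  contains(num_txt, '%53%', 1) > 0;
```

With this data the query should return only row 1. The wildcards are what make the substring match work; SUBSTRING_INDEX adds an extra index table so that leading-% queries like this one don't have to scan every token.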
How to compute a global SCORE over a few oracle text indexed tables?
Dear experts!
I want to search a website with Oracle Text. The website consists of four tables:
- site
- chapter
- text
- binaries
Each table has two or three columns that should be indexed with Oracle Text, so I have created an Oracle Text index using a MULTI_COLUMN_DATASTORE on each table. That gives me four indexes for my website.
When I want to search across the website, I have to join my 4 tables (4 CONTAINS clauses). So how do I get a global SCORE over these 4 CONTAINS clauses?
My next question: can I change the weight of my text indexes (useful for ranking the search hit list)? For example, the site index should have the highest weight, the chapter index the second highest, and so on.
Thanks
Markus
If it's a simple JOIN, then you could just add the scores for each CONTAINS clause:
select score(1) + score(2) + score(3) + score(4) as total_score
from   table1 t1, table2 t2, table3 t3, table4 t4
where  [join conditions]
and   (contains(t1.col, 'xxx', 1) > 0
    or contains(t2.col, 'xxx', 2) > 0
    or contains(t3.col, 'xxx', 3) > 0
    or contains(t4.col, 'xxx', 4) > 0)
order  by total_score desc
Then, to change the weight, you just add a multiplying factor.
Can't help thinking it's probably more complex than this, though. -
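For the weighting part, a sketch of the multiplying-factor idea. The weights (site 4, chapter 3, text 2, binaries 1), the join columns `site_id`/`chapter_id` and the indexed column names are all hypothetical placeholders, and the inner joins assume every chapter has text and binaries rows (use outer joins if not):

```sql
-- Each SCORE is 0..100; dividing by the sum of the weights (10)
-- keeps the combined score on the same 0..100 scale.
select s.site_id,
       c.chapter_id,
       (4 * score(1) + 3 * score(2) + 2 * score(3) + 1 * score(4)) / 10
         as weighted_score
from   site s
join   chapter  c on c.site_id    = s.site_id
join   text     t on t.chapter_id = c.chapter_id
join   binaries b on b.chapter_id = c.chapter_id
where  contains(s.site_idx_col,     :q, 1) > 0
    or contains(c.chapter_idx_col,  :q, 2) > 0
    or contains(t.text_idx_col,     :q, 3) > 0
    or contains(b.binaries_idx_col, :q, 4) > 0
order  by weighted_score desc;
```

A row that does not satisfy one of the CONTAINS clauses contributes 0 for that label, so the weights only reorder rows that matched somewhere.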
Hi,
I was going through this document. I am going to implement something like full-text search functionality in our system.
We receive the information as .doc files.
Previously, we parsed each file, stored its contents in the database, and then searched them using PL/SQL.
From this article, I understand that this can also be done using Oracle Text.
One concern is whether Oracle Text is able to parse .doc files that have tables embedded in them. Please let me know about this.
I am attaching an example file for this.
Please let me know about this as early as possible.
Yes, Oracle Text has this capability. Use AUTO_FILTER or USER_FILTER to create the index.
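A minimal sketch of indexing Word documents stored in a BLOB with the system-defined AUTO_FILTER preference (10g Release 2 and later; earlier releases used CTXSYS.INSO_FILTER instead). The table `doc_store`, its columns and the search term are hypothetical. Per the reply above, the filter converts the whole .doc, table contents included, to text before indexing.

```sql
-- Hypothetical table holding the uploaded .doc files.
create table doc_store (
  doc_id   number primary key,
  filename varchar2(255),
  doc      blob
);

-- AUTO_FILTER converts binary formats such as .doc to text before indexing.
create index doc_store_ctx on doc_store (doc)
  indextype is ctxsys.context
  parameters ('filter ctxsys.auto_filter');

-- CONTEXT indexes are not updated on DML; sync after loading new documents.
exec ctx_ddl.sync_index('DOC_STORE_CTX');

select doc_id, filename, score(1) relevance
from   doc_store
where  contains(doc, 'invoice', 1) > 0
order  by relevance desc;
```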