Context indexing and PDFs
I recently used Application Express to create an upload system that stores Word documents, Excel files, and PDFs in a BLOB column. I created a context index on that column and use a CONTAINS query to find which documents contain certain words.
My problem is that some PDFs are not being indexed correctly. What limitations apply here, and can they be worked around? Has anyone else encountered this?
I am logged in as the user who owns the index.
When I view the data in the DR$TEMP_INDEX$I table with the filter "token_text = 'oracle'", nothing is returned (the word is obviously in the 10g product guide, though not in any of the other documents). The same kind of filter does return other keywords from other documents.
I suspect it may be an embedded fonts issue; that's the only difference between the files that I can see.
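One way to narrow this down is to check the token table in the case the lexer actually stores, and then to look at what the filter extracts from one of the problem PDFs. The sketch below assumes the index is called TEMP_INDEX (inferred from DR$TEMP_INDEX$I), that the default lexer is in use (it stores tokens in uppercase), and that '42' is the primary key value of a problem document; that key value is purely hypothetical.

```sql
-- With the default lexer, tokens are stored in uppercase,
-- so filter on 'ORACLE' rather than 'oracle'.
SELECT token_text, token_count
  FROM dr$temp_index$i
 WHERE token_text = 'ORACLE';

SET SERVEROUTPUT ON
DECLARE
  v_text CLOB;   -- left NULL so that CTX_DOC.FILTER allocates a temporary CLOB
BEGIN
  -- Run the index's filter on a single document and return plain text.
  -- 'TEMP_INDEX' is inferred from the post; '42' is a hypothetical key value.
  CTX_DOC.FILTER (
    index_name => 'TEMP_INDEX',
    textkey    => '42',
    restab     => v_text,
    plaintext  => TRUE);
  -- Show the first 2000 characters of what the filter extracted.
  DBMS_OUTPUT.PUT_LINE (DBMS_LOB.SUBSTR (v_text, 2000, 1));
  DBMS_LOB.FREETEMPORARY (v_text);
END;
/
```

If the filtered text comes back empty or unreadable for the problem PDFs only, that would point at the filter rather than at the index or the query.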
Similar Messages
-
Trying to understand context indexes and contains-help
Hi
i am using
Achieving functionality of many preferences using one context index
to understand context indexes and contains
and i get the following
Error starting at line 1 in command:
begin
ctx_ddl.create_preference ('nd_mcds', 'multi_column_datastore');
ctx_ddl.set_attribute ('nd_mcds', 'columns', 'text nd, text text');
ctx_ddl.create_section_group ('nd_sg', 'basic_section_group');
ctx_ddl.add_ndata_section ('nd_sg', 'nd', 'nd');
ctx_ddl.create_preference ('test_lex', 'basic_lexer');
ctx_ddl.set_attribute ('test_lex', 'whitespace', '/\|-_+');
end;
Error report:
ORA-06550: line 5, column 15:
PLS-00302: component 'ADD_NDATA_SECTION' must be declared
ORA-06550: line 5, column 7:
PL/SQL: Statement ignored
06550. 00000 - "line %s, column %s:\n%s"
*Cause: Usually a PL/SQL compilation error.
*Action:
so i am using the following to check for the error
http://docs.oracle.com/cd/E18283_01/text.112/e16593/cddlpkg.htm#BABCBFCB
plus
oracle text application developer's guide
plus
oracle text reference
but none of these list that error (I have even googled it, in vain)
Background: we were actually using CATSEARCH, but because of its downsides I want to implement this instead.
Is "Achieving functionality of many preferences using one context index" a good place to start for someone who does not yet know about context indexes and CONTAINS?
Please post any other useful links on CONTAINS and context indexes, ideally ones that also explain:
1) fuzzy
2) stem
3) synonym
4) near
5) soundex
6)ndata
7)lexer
thanks in advance
Ndata is new to Oracle 11g. Your other posts indicate that you are using Oracle 10g, which does not have ndata, so you get an error when you try to use it. If you want to use the 11g features that enable context indexes with CONTAINS to do all of the things that ctxcat indexes with CATSEARCH do, then you need to upgrade to 11g.
The online documentation is searchable. Most things regarding Oracle Text are contained in either the Oracle Text Reference or the Oracle Text Application Developer's guide.
I suggest that you start with something very simple, then build from there.
The following is similar to your other post that used catsearch:
SCOTT@orcl_11gR2> CREATE TABLE mv_cat_seg_reg_prod
2 (cat_ids VARCHAR2 ( 7),
3 act_status VARCHAR2 (10),
4 name VARCHAR2 ( 1),
5 email VARCHAR2 ( 1),
6 address1 VARCHAR2 ( 1),
7 address2 VARCHAR2 ( 1),
8 contact_name VARCHAR2 ( 1),
9 mobile VARCHAR2 ( 1),
10 telephone VARCHAR2 ( 1))
11 /
Table created.
SCOTT@orcl_11gR2> INSERT ALL
2 INTO mv_cat_seg_reg_prod VALUES
3 ('1', 'Y', 'A', 'B', 'C', 'D', 'E', 'F', 'G')
4 INTO mv_cat_seg_reg_prod VALUES
5 ('2', 'N', 'H', 'I', 'J', 'K', 'L', 'M', 'N')
6 SELECT * FROM DUAL
7 /
2 rows created.
SCOTT@orcl_11gR2> CREATE INDEX mv_cat_seg_reg_prod_idx
2 ON mv_cat_seg_reg_prod (cat_ids)
3 INDEXTYPE IS CTXSYS.CONTEXT
4 /
Index created.
SCOTT@orcl_11gR2> SELECT token_text FROM dr$mv_cat_seg_reg_prod_idx$i
2 /
TOKEN_TEXT
1
2
2 rows selected.
SCOTT@orcl_11gR2> SELECT *
2 FROM (SELECT SCORE (1), name, email, address1, address2, contact_name, mobile, telephone
3 FROM mv_cat_seg_reg_prod
4 WHERE CONTAINS (cat_ids, '1', 1) > 0
5 AND act_status = 'Y'
6 ORDER BY DBMS_RANDOM.VALUE)
7 WHERE ROWNUM < 8
8 /
SCORE(1) N E A A C M T
4 A B C D E F G
1 row selected. -
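For the operators asked about in this thread, a few CONTAINS query forms can be tried once a simple context index exists. This is a minimal sketch on a hypothetical table (demo_docs and demo_docs_idx are made-up names, as is the sample text); results depend on the lexer, wordlist, and language settings, so no output is shown. The synonym operator is left out because it needs a loaded thesaurus, and ndata is left out because it needs 11g.

```sql
-- Hypothetical table and index, used only to illustrate the query operators.
CREATE TABLE demo_docs (id NUMBER PRIMARY KEY, text VARCHAR2(200));
INSERT INTO demo_docs VALUES (1, 'The runner was running near the government building');
INSERT INTO demo_docs VALUES (2, 'Smith and Smythe governed the city');
COMMIT;
CREATE INDEX demo_docs_idx ON demo_docs (text) INDEXTYPE IS CTXSYS.CONTEXT;

-- stem ($): words that share a stem with "run"
SELECT id FROM demo_docs WHERE CONTAINS (text, '$run') > 0;

-- fuzzy: words spelled similarly to the (misspelled) search term
SELECT id FROM demo_docs WHERE CONTAINS (text, 'fuzzy(goverment)') > 0;

-- near: both words within 10 words of each other
SELECT id FROM demo_docs WHERE CONTAINS (text, 'NEAR((runner, building), 10)') > 0;

-- soundex (!): words that sound like "smith"
SELECT id FROM demo_docs WHERE CONTAINS (text, '!smith') > 0;
```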
Context index and contains operator syntax how it works ?
Hi
I created a context index on four columns (text_prof, text_gest, text_citizen, text)
of the same table, content.
When more than one column is queried using the CONTAINS syntax, Oracle raises the ORA-29907 error, saying it found duplicate labels in primary invocations.
This query works:
SELECT * FROM content WHERE cod_type = '1'
AND (UPPER(title) LIKE UPPER('%tabagismo%')
OR contains (text, 'tabagismo',1)>0)
This one does not work:
SELECT * FROM content
WHERE cod_type = '1' AND (
UPPER(title) LIKE UPPER('%tabagismo%')
OR contains (text, 'tabagismo',1)>0
OR contains (text_citizen,'tabagismo',1)>0
OR contains (text_gest,'tabagismo',1)>0
OR contains (text_prof,'tabagismo',1)>0)
How can I fix it?
I need to query all of these columns!
Can the CONTAINS operator be used on only one column?
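On the ORA-29907 error: the third argument of CONTAINS is a label that ties the call to a matching SCORE(label), and it has to be unique within the query. The failing query uses label 1 four times. A minimal sketch of the same query with distinct labels, assuming each of the four columns has its own context index:

```sql
SELECT *
  FROM content
 WHERE cod_type = '1'
   AND (   UPPER (title) LIKE UPPER ('%tabagismo%')
        -- one distinct label per CONTAINS call
        OR CONTAINS (text,         'tabagismo', 1) > 0
        OR CONTAINS (text_citizen, 'tabagismo', 2) > 0
        OR CONTAINS (text_gest,    'tabagismo', 3) > 0
        OR CONTAINS (text_prof,    'tabagismo', 4) > 0);
```

Since this query never calls SCORE, the labels could also be dropped altogether, as in CONTAINS (text, 'tabagismo') > 0.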
Thanks in advance. -
Hi,
I want to create a context index on one column that contains large text. The table contains millions of records, and rows are inserted into it daily. My questions are:
1. Do we need to run any procedures after inserting the records daily?
2. Is there any problem, from a performance point of view, with creating a context index on the table?
Thanks,
Sri
sri333 wrote:
Hi,
I want to create a context index on one column which contains large text. And the table contains millions of records and daily inserts happen into the same table. My question is
1. Do we need to run any procedures after inserting the records daily?
Not for what you describe. But you didn't describe much. I guess you will do something with this table data later, and it depends on that. But since you only mentioned that you insert, then no, there is nothing to do after that.
2. Is there any problem from a performance point of view creating a context index on the table?
Sure. Creating the index takes time. Once the index is there, new inserts will take more time.
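One caveat for Oracle Text specifically: by default a CONTEXT index is not updated as part of the insert, so newly inserted rows only become searchable once the index has been synchronized. A minimal sketch, assuming a hypothetical table my_docs with a text column doc_text and an index named my_docs_idx:

```sql
-- Hypothetical names: my_docs (table), doc_text (column), my_docs_idx (index).
CREATE INDEX my_docs_idx ON my_docs (doc_text)
  INDEXTYPE IS CTXSYS.CONTEXT;

-- After the daily load, make the newly inserted rows searchable.
BEGIN
  CTX_DDL.SYNC_INDEX ('my_docs_idx');
END;
/

-- Alternative (10g onward): create the index with
-- PARAMETERS ('SYNC (ON COMMIT)') so that it is synchronized at commit time.
```

Frequent small syncs tend to fragment the index over time, so a periodic CTX_DDL.OPTIMIZE_INDEX run is usually paired with either approach.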
Edited by: Sven W. on Oct 10, 2012 12:02 PM -
Context Indexes and ignoring characters
So, we're trying to get a text index to ignore apostrophes.
insert into table values ('Arby''s');
We want the above entry to be located with either of the following queries:
select *
from table
where contains(field, 'Arbys')>0
and also
select *
from table
where contains(field, 'Arby''s')>0
The second SQL works already; it's the former search that finds no records. I tried adding an apostrophe to the STOPLIST, but it didn't seem to make a difference. Is there another tweak I can make so that this works? Or am I going to need to create two columns, one without special characters that actually has the context index on it?
Thanks,
--=Chuck
So this is what I've set up:
SQL> begin
2
3
4
5 CTX_DDL.CREATE_STOPLIST(stoplist_name => 'TEST_APOSTROPHE',
6 stoplist_type => 'BASIC_STOPLIST');
7
8 CTX_DDL.ADD_STOPWORD(stoplist_name => 'TEST_APOSTROPHE',
9 stopword => '''');
10
11
12 end;
13 /
PL/SQL procedure successfully completed.
SQL> create table test_apos (name varchar2(100));
Table created.
SQL> CREATE INDEX TEST_APOS_NAME_CTX ON TEST_APOS
2 (NAME)
3 INDEXTYPE IS CTXSYS.CONTEXT
4 PARAMETERS('STOPLIST TEST_APOSTROPHE SYNC(ON COMMIT)');
Index created.
SQL> insert into test_apos values ('Arby''s');
1 row created.
SQL> commit;
Commit complete.
The following usages of CONTAINS( ) will find the record, but they either explicitly mention the apostrophe or look for the string prior to the apostrophe. I expect all of these to work regardless of the STOPLIST entry:
SQL> select * from test_apos where contains (name, 'Arby') > 0;
NAME
Arby's
SQL> select * from test_apos where contains (name, 'Arby%') > 0;
NAME
Arby's
SQL> select * from test_apos where contains (name, 'Arby''s') > 0;
NAME
Arby's
SQL> select * from test_apos where contains (name, '$(Arby)') > 0;
NAME
Arby's
None of the following work (incl. your suggestion, which I greatly appreciate, btw):
SQL> select * from test_apos where contains (name, 'Arbys') > 0;
no rows selected
SQL> select * from test_apos where contains (name, 'Arby%s') > 0;
no rows selected
SQL> select * from test_apos where contains (name, 'Arbys%') > 0;
no rows selected
SQL> select * from test_apos where contains (name, '$(Arbys)') > 0;
no rows selected
--=cf -
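For the apostrophe thread above: a stoplist removes whole tokens, and with the default lexer the apostrophe acts as a token break, so a stopword entry for it has no effect on how 'Arby''s' is split. One option worth trying is a BASIC_LEXER preference with the SKIPJOINS attribute, which drops the listed characters from inside a word at both index and query time. A minimal sketch against the test_apos table from the thread; the preference name test_apos_lexer is hypothetical:

```sql
BEGIN
  CTX_DDL.CREATE_PREFERENCE ('test_apos_lexer', 'BASIC_LEXER');
  -- Characters listed in SKIPJOINS are removed from within a token,
  -- so "Arby's" should be indexed as the single token ARBYS.
  CTX_DDL.SET_ATTRIBUTE ('test_apos_lexer', 'SKIPJOINS', '''');
END;
/

-- Rebuild the index so that it uses the new lexer.
DROP INDEX test_apos_name_ctx;
CREATE INDEX test_apos_name_ctx ON test_apos (name)
  INDEXTYPE IS CTXSYS.CONTEXT
  PARAMETERS ('LEXER test_apos_lexer SYNC (ON COMMIT)');

-- Both forms should now resolve to the same token.
SELECT * FROM test_apos WHERE CONTAINS (name, 'Arbys') > 0;
SELECT * FROM test_apos WHERE CONTAINS (name, 'Arby''s') > 0;
```

The trade-off is that a search for 'Arby' alone would then need a wildcard or stem operator, because ARBY is no longer stored as a separate token.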
Substring search with Oracle context indexes
Hi,
I would like to know if it is possible to do a substring search with one of the options offered by the Oracle Text index types
(ctxcat, ctxrule, context).
Example:
I would like to search for the word 'berub' in column A of table_example.
The values in column A are:
The betther
berube
A.berube
berub
Berub
BERUB
R berube
S tartif
Y Thibeault
The rows returned should be:
berube
A.berube
berub
Berub
BERUB
R berube
A simple SQL statement could be
select * from table_example where upper(a) like upper('%berub%' );
How can I do the same thing with the context indexes and a select (catsearch, contains, matches), if it is possible?
An example would be welcome.
Thanks
I know how to do an explain plan.
My point is not the query I posted; it's just an example.
We have many queries in production that we have optimized many times (they went from 3 min to 15 sec with optimization, but we want better results). At this point we are looking at implementing context indexes to make them more efficient.
To make this SQL more efficient we have to deal with like '%xxxxxx%', and context indexes look like an option, but we have to be able to do substring searches with them.
Is it possible to do it, and how?
This is my question and why I posted it here. The query is just a simple example to illustrate what I want.
Thanks to anyone who can answer my question. -
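For the substring question above, a CONTEXT index does accept a double-ended wildcard in CONTAINS, and a BASIC_WORDLIST preference with SUBSTRING_INDEX enabled is the documented way to make such left-truncated searches faster. A minimal sketch using table_example and column a from the post; the preference and index names are hypothetical:

```sql
BEGIN
  CTX_DDL.CREATE_PREFERENCE ('substr_wordlist', 'BASIC_WORDLIST');
  -- Build an extra substring index to speed up '%term' and '%term%' searches.
  CTX_DDL.SET_ATTRIBUTE ('substr_wordlist', 'SUBSTRING_INDEX', 'TRUE');
END;
/

CREATE INDEX table_example_a_idx ON table_example (a)
  INDEXTYPE IS CTXSYS.CONTEXT
  PARAMETERS ('WORDLIST substr_wordlist');

-- Matches any indexed word containing "berub"; case-insensitive with the default lexer.
SELECT * FROM table_example WHERE CONTAINS (a, '%berub%') > 0;
```

Note that the wildcard is expanded against individual tokens, so it matches within words, not across word boundaries as LIKE does, and the substring index makes the index larger and slower to build.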
I have created a context index and want to optimize it regularly. I issue the following statement:
ctx_ddl.optimize_index('MY_INDEX','FAST');
In the Enterprise Manager I can see, that there are a lot of Sort/Merge-operations, but the optimize process doesn't finish.
(During optimize there are inserts into the table and the index is updated by the ctx_ddl.sync_index procedure).
Can anyone tell me why the optimize doesn't finish? How can I avoid this situation?
How long did you leave it running?
Optimizing ran for more than 10 hours, so I cancelled it, because this is not acceptable.
Optimization is an intensive process, and can sometimes take longer than the original index creation.
If you want it to complete in less time, consider using FULL optimize, but setting a time limit on it. That way it will do as much as it can within the time limit, and then start again next time from where it finished last time.
During the night, I start a FULL optimize with MAXTIME = 60 minutes, but this does not seem to be enough. If I don't start FAST optimizing during the day, the index gets too fragmented.
Alternatively, if optimization is taking a lot longer than creating the index, consider dropping the index and recreating it from scratch. This will require downtime on the system, unless you get clever and use the USER_DATASTORE to create two indexes on different, dummy, columns - and switch searching between them when you want to rebuild the index. Make sure you use a generous setting for INDEX_MEMORY to avoid fragmentation.
- Roger -
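The time-limited FULL optimize discussed in this thread, together with a sync that uses more memory, could look like the sketch below. MY_INDEX is the name used in the post; the 60-minute limit and the '50M' memory value are only illustrative, and the memory value has to stay within the system's MAX_INDEX_MEMORY setting.

```sql
BEGIN
  -- FULL optimize that stops after at most 60 minutes; the next FULL run
  -- picks up where this one left off.
  CTX_DDL.OPTIMIZE_INDEX ('MY_INDEX', 'FULL', maxtime => 60);
END;
/

BEGIN
  -- Sync with a larger memory setting, so that each sync writes fewer,
  -- larger index rows and the index fragments more slowly.
  CTX_DDL.SYNC_INDEX ('MY_INDEX', '50M');
END;
/
```

Syncing less often in larger batches works in the same direction, at the cost of new rows taking longer to become searchable.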
Indexing and Searching PDF Files
Hi All,
I am trying to store and search PDF files in the oracle database.
I can insert and index the PDF files just fine but cannot get any result. I always get No Rows.
Here's what I am doing and the issues I am facing.
I created a Table with fields
ID (VARCHAR)
NAME (VARCHAR)
DOC (BLOB)
I inserted the PDF file into the BLOB field through a Java program, and the insert worked fine; I verified this by retrieving the PDF and writing it to a file.
I created index using following SQL:
create index my_index on PDF_TABLE(PDF_FLD) indextype is ctxsys.context
parameters ('datastore ctxsys.default_datastore
filter ctxsys.inso_filter');
The index was created successfully without any problems.
I ran query as follows and got no rows although the searched text is in PDF
SELECT SCORE(1), PDF_FLD from PDF_TABLE WHERE CONTAINS (PDF_FLD, 'Table of Cotents',
1) > 0;
I tried alternate queries as well with no luck.
Any ideas ??
Thanks
After creating the index, you need to perform the following checks.
First, check that your index table contains indexed terms. Execute
select token_text from dr$YOUR_INDEX$i;
Second, you will need to check the index errors view CTX_INDEX_ERRORS. This is owned by the user CTXSYS, and most users do NOT have SELECT privilege on it by default.
If that is OK, then check that your PDF documents are supported by the INSO filter.
Citation:
"PDF - Portable Document Format
Acrobat Versions 2.1, 3.0, 4.0, and 5.0 including Japanese PDF"
(Appendix B. Supported Document Formats in Oracle Text Reference 9.2)
For Oracle 9i you could install 9.2.0.4 patchset (it included INSO FILTER 7.5)
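As a complement to the second check above, the index owner can normally read the errors for their own indexes through the CTX_USER_INDEX_ERRORS view, without needing access to the CTXSYS-owned one. A minimal sketch:

```sql
-- Indexing errors for indexes owned by the current user, newest first.
-- ERR_TEXTKEY holds the rowid of the document that failed.
SELECT err_index_name, err_timestamp, err_textkey, err_text
  FROM ctx_user_index_errors
 ORDER BY err_timestamp DESC;
```

Rows that appear here were skipped during indexing, which would explain documents that never match a CONTAINS query.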
P.S.
for the beginning, you could find answers for your question about Oracle Text here
http://otn.oracle.com/products/text
Sorry for my English.
Best regards, Victor Zogin. -
Creating a single context index on a one-to-many and lookup table
Hello,
I've been successfully setting up text indexes on multiple columns on the same table (using MULTI_COLUMN_DATASTORE preferences), but now I have a situation with a one-to-many data collection table (with a FK to a lookup table), and I need to search columns across both of these tables. Sample code below, more of my chattering after the code block:
CREATE TABLE SUBMISSION
( SUBMISSION_ID NUMBER(10) NOT NULL,
SUBMISSION_NAME VARCHAR2(100) NOT NULL
);
CREATE TABLE ADVISOR_TYPE
( ADVISOR_TYPE_ID NUMBER(10) NOT NULL,
ADVISOR_TYPE_NAME VARCHAR2(50) NOT NULL
);
CREATE TABLE SUBMISSION_ADVISORS
( SUBMISSION_ADVISORS_ID NUMBER(10) NOT NULL,
SUBMISSION_ID NUMBER(10) NOT NULL,
ADVISOR_TYPE_ID NUMBER(10) NOT NULL,
FIRST_NAME VARCHAR(50) NULL,
LAST_NAME VARCHAR(50) NULL,
SUFFIX VARCHAR(20) NULL
);
INSERT INTO SUBMISSION (SUBMISSION_ID, SUBMISSION_NAME) VALUES (1, 'Some Research Paper');
INSERT INTO SUBMISSION (SUBMISSION_ID, SUBMISSION_NAME) VALUES (2, 'Thesis on 17th Century Weather Patterns');
INSERT INTO SUBMISSION (SUBMISSION_ID, SUBMISSION_NAME) VALUES (3, 'Statistical Analysis on Sunny Days in March');
INSERT INTO ADVISOR_TYPE (ADVISOR_TYPE_ID, ADVISOR_TYPE_NAME) VALUES (1, 'Department Chair');
INSERT INTO ADVISOR_TYPE (ADVISOR_TYPE_ID, ADVISOR_TYPE_NAME) VALUES (2, 'Department Co-Chair');
INSERT INTO ADVISOR_TYPE (ADVISOR_TYPE_ID, ADVISOR_TYPE_NAME) VALUES (3, 'Professor');
INSERT INTO ADVISOR_TYPE (ADVISOR_TYPE_ID, ADVISOR_TYPE_NAME) VALUES (4, 'Associate Professor');
INSERT INTO ADVISOR_TYPE (ADVISOR_TYPE_ID, ADVISOR_TYPE_NAME) VALUES (5, 'Scientist');
INSERT INTO SUBMISSION_ADVISORS (SUBMISSION_ADVISORS_ID, SUBMISSION_ID, ADVISOR_TYPE_ID, FIRST_NAME, LAST_NAME, SUFFIX) VALUES (1,1,2,'John', 'Doe', 'PhD');
INSERT INTO SUBMISSION_ADVISORS (SUBMISSION_ADVISORS_ID, SUBMISSION_ID, ADVISOR_TYPE_ID, FIRST_NAME, LAST_NAME, SUFFIX) VALUES (2,1,2,'Jane', 'Doe', 'PhD');
INSERT INTO SUBMISSION_ADVISORS (SUBMISSION_ADVISORS_ID, SUBMISSION_ID, ADVISOR_TYPE_ID, FIRST_NAME, LAST_NAME, SUFFIX) VALUES (3,2,3,'Johan', 'Smith', NULL);
INSERT INTO SUBMISSION_ADVISORS (SUBMISSION_ADVISORS_ID, SUBMISSION_ID, ADVISOR_TYPE_ID, FIRST_NAME, LAST_NAME, SUFFIX) VALUES (4,2,4,'Magnus', 'Jackson', 'MS');
INSERT INTO SUBMISSION_ADVISORS (SUBMISSION_ADVISORS_ID, SUBMISSION_ID, ADVISOR_TYPE_ID, FIRST_NAME, LAST_NAME, SUFFIX) VALUES (5,3,5,'Williard', 'Forsberg', 'AMS');
COMMIT;
I want to be able to create a text index to lump these fields together:
SUBMISSION_ADVISORS.FIRST_NAME
SUBMISSION_ADVISORS.LAST_NAME
SUBMISSION_ADVISORS.SUFFIX
ADVISOR_TYPE.ADVISOR_TYPE_NAME
I've looked at DETAIL_DATASTORE and USER_DATASTORE, but the examples in Oracle Docs for DETAIL_DATASTORE leave me a little bit perplexed. It seems like this should be pretty straightforward.
Ideally, I'm trying to avoid creating new columns, and keeping the trigger adjustments to a minimum. But I'm open to any and all suggestions. Thanks for your time and thoughts.
-Jamie
I would create a procedure that creates a virtual document with tags, which is what the multi_column_datastore does behind the scenes. Then I would use that procedure in a user_datastore, so the result is the same for multiple tables as what a multi_column_datastore does for one table. I would also use either auto_section_group or some other type of section group, so that you can search using WITHIN as with the multi_column_datastore. Please see the demonstration below.
SCOTT@orcl_11gR2> -- tables and data that you provided:
SCOTT@orcl_11gR2> CREATE TABLE SUBMISSION
2 ( SUBMISSION_ID NUMBER(10) NOT NULL,
3 SUBMISSION_NAME VARCHAR2(100) NOT NULL
4 )
5 /
Table created.
SCOTT@orcl_11gR2> CREATE TABLE ADVISOR_TYPE
2 ( ADVISOR_TYPE_ID NUMBER(10) NOT NULL,
3 ADVISOR_TYPE_NAME VARCHAR2(50) NOT NULL
4 )
5 /
Table created.
SCOTT@orcl_11gR2> CREATE TABLE SUBMISSION_ADVISORS
2 ( SUBMISSION_ADVISORS_ID NUMBER(10) NOT NULL,
3 SUBMISSION_ID NUMBER(10) NOT NULL,
4 ADVISOR_TYPE_ID NUMBER(10) NOT NULL,
5 FIRST_NAME VARCHAR(50) NULL,
6 LAST_NAME VARCHAR(50) NULL,
7 SUFFIX VARCHAR(20) NULL
8 )
9 /
Table created.
SCOTT@orcl_11gR2> INSERT ALL
2 INTO SUBMISSION (SUBMISSION_ID, SUBMISSION_NAME)
3 VALUES (1, 'Some Research Paper')
4 INTO SUBMISSION (SUBMISSION_ID, SUBMISSION_NAME)
5 VALUES (2, 'Thesis on 17th Century Weather Patterns')
6 INTO SUBMISSION (SUBMISSION_ID, SUBMISSION_NAME)
7 VALUES (3, 'Statistical Analysis on Sunny Days in March')
8 SELECT * FROM DUAL
9 /
3 rows created.
SCOTT@orcl_11gR2> INSERT ALL
2 INTO ADVISOR_TYPE (ADVISOR_TYPE_ID, ADVISOR_TYPE_NAME)
3 VALUES (1, 'Department Chair')
4 INTO ADVISOR_TYPE (ADVISOR_TYPE_ID, ADVISOR_TYPE_NAME)
5 VALUES (2, 'Department Co-Chair')
6 INTO ADVISOR_TYPE (ADVISOR_TYPE_ID, ADVISOR_TYPE_NAME)
7 VALUES (3, 'Professor')
8 INTO ADVISOR_TYPE (ADVISOR_TYPE_ID, ADVISOR_TYPE_NAME)
9 VALUES (4, 'Associate Professor')
10 INTO ADVISOR_TYPE (ADVISOR_TYPE_ID, ADVISOR_TYPE_NAME)
11 VALUES (5, 'Scientist')
12 SELECT * FROM DUAL
13 /
5 rows created.
SCOTT@orcl_11gR2> INSERT ALL
2 INTO SUBMISSION_ADVISORS (SUBMISSION_ADVISORS_ID, SUBMISSION_ID, ADVISOR_TYPE_ID, FIRST_NAME, LAST_NAME, SUFFIX)
3 VALUES (1,1,2,'John', 'Doe', 'PhD')
4 INTO SUBMISSION_ADVISORS (SUBMISSION_ADVISORS_ID, SUBMISSION_ID, ADVISOR_TYPE_ID, FIRST_NAME, LAST_NAME, SUFFIX)
5 VALUES (2,1,2,'Jane', 'Doe', 'PhD')
6 INTO SUBMISSION_ADVISORS (SUBMISSION_ADVISORS_ID, SUBMISSION_ID, ADVISOR_TYPE_ID, FIRST_NAME, LAST_NAME, SUFFIX)
7 VALUES (3,2,3,'Johan', 'Smith', NULL)
8 INTO SUBMISSION_ADVISORS (SUBMISSION_ADVISORS_ID, SUBMISSION_ID, ADVISOR_TYPE_ID, FIRST_NAME, LAST_NAME, SUFFIX)
9 VALUES (4,2,4,'Magnus', 'Jackson', 'MS')
10 INTO SUBMISSION_ADVISORS (SUBMISSION_ADVISORS_ID, SUBMISSION_ID, ADVISOR_TYPE_ID, FIRST_NAME, LAST_NAME, SUFFIX)
11 VALUES (5,3,5,'Williard', 'Forsberg', 'AMS')
12 SELECT * FROM DUAL
13 /
5 rows created.
SCOTT@orcl_11gR2> -- constraints presumed based on your description:
SCOTT@orcl_11gR2> ALTER TABLE submission ADD CONSTRAINT submission_id_pk
2 PRIMARY KEY (submission_id)
3 /
Table altered.
SCOTT@orcl_11gR2> ALTER TABLE advisor_type ADD CONSTRAINT advisor_type_id_pk
2 PRIMARY KEY (advisor_type_id)
3 /
Table altered.
SCOTT@orcl_11gR2> ALTER TABLE submission_advisors ADD CONSTRAINT submission_advisors_id_pk
2 PRIMARY KEY (submission_advisors_id)
3 /
Table altered.
SCOTT@orcl_11gR2> ALTER TABLE submission_advisors ADD CONSTRAINT submission_id_fk
2 FOREIGN KEY (submission_id) REFERENCES submission (submission_id)
3 /
Table altered.
SCOTT@orcl_11gR2> ALTER TABLE submission_advisors ADD CONSTRAINT advisor_type_id_fk
2 FOREIGN KEY (advisor_type_id) REFERENCES advisor_type (advisor_type_id)
3 /
Table altered.
SCOTT@orcl_11gR2> -- resulting data:
SCOTT@orcl_11gR2> COLUMN submission_name FORMAT A45
SCOTT@orcl_11gR2> COLUMN advisor FORMAT A40
SCOTT@orcl_11gR2> SELECT s.submission_name,
2 a.advisor_type_name || ' ' ||
3 sa.first_name || ' ' ||
4 sa.last_name || ' ' ||
5 sa.suffix AS advisor
6 FROM submission_advisors sa,
7 submission s,
8 advisor_type a
9 WHERE sa.advisor_type_id = a.advisor_type_id
10 AND sa.submission_id = s.submission_id
11 /
SUBMISSION_NAME ADVISOR
Some Research Paper Department Co-Chair John Doe PhD
Some Research Paper Department Co-Chair Jane Doe PhD
Thesis on 17th Century Weather Patterns Professor Johan Smith
Thesis on 17th Century Weather Patterns Associate Professor Magnus Jackson MS
Statistical Analysis on Sunny Days in March Scientist Williard Forsberg AMS
5 rows selected.
SCOTT@orcl_11gR2> -- procedure to create virtual documents:
SCOTT@orcl_11gR2> CREATE OR REPLACE PROCEDURE submission_advisors_proc
2 (p_rowid IN ROWID,
3 p_clob IN OUT NOCOPY CLOB)
4 AS
5 BEGIN
6 FOR r1 IN
7 (SELECT *
8 FROM submission_advisors
9 WHERE ROWID = p_rowid)
10 LOOP
11 IF r1.first_name IS NOT NULL THEN
12 DBMS_LOB.WRITEAPPEND (p_clob, 12, '<first_name>');
13 DBMS_LOB.WRITEAPPEND (p_clob, LENGTH (r1.first_name), r1.first_name);
14 DBMS_LOB.WRITEAPPEND (p_clob, 13, '</first_name>');
15 END IF;
16 IF r1.last_name IS NOT NULL THEN
17 DBMS_LOB.WRITEAPPEND (p_clob, 11, '<last_name>');
18 DBMS_LOB.WRITEAPPEND (p_clob, LENGTH (r1.last_name), r1.last_name);
19 DBMS_LOB.WRITEAPPEND (p_clob, 12, '</last_name>');
20 END IF;
21 IF r1.suffix IS NOT NULL THEN
22 DBMS_LOB.WRITEAPPEND (p_clob, 8, '<suffix>');
23 DBMS_LOB.WRITEAPPEND (p_clob, LENGTH (r1.suffix), r1.suffix);
24 DBMS_LOB.WRITEAPPEND (p_clob, 9, '</suffix>');
25 END IF;
26 FOR r2 IN
27 (SELECT *
28 FROM submission
29 WHERE submission_id = r1.submission_id)
30 LOOP
31 DBMS_LOB.WRITEAPPEND (p_clob, 17, '<submission_name>');
32 DBMS_LOB.WRITEAPPEND (p_clob, LENGTH (r2.submission_name), r2.submission_name);
33 DBMS_LOB.WRITEAPPEND (p_clob, 18, '</submission_name>');
34 END LOOP;
35 FOR r3 IN
36 (SELECT *
37 FROM advisor_type
38 WHERE advisor_type_id = r1.advisor_type_id)
39 LOOP
40 DBMS_LOB.WRITEAPPEND (p_clob, 19, '<advisor_type_name>');
41 DBMS_LOB.WRITEAPPEND (p_clob, LENGTH (r3.advisor_type_name), r3.advisor_type_name);
42 DBMS_LOB.WRITEAPPEND (p_clob, 20, '</advisor_type_name>');
43 END LOOP;
44 END LOOP;
45 END submission_advisors_proc;
46 /
Procedure created.
SCOTT@orcl_11gR2> SHOW ERRORS
No errors.
SCOTT@orcl_11gR2> -- examples of virtual documents that procedure creates:
SCOTT@orcl_11gR2> DECLARE
2 v_clob CLOB := EMPTY_CLOB();
3 BEGIN
4 FOR r IN
5 (SELECT ROWID rid FROM submission_advisors)
6 LOOP
7 DBMS_LOB.CREATETEMPORARY (v_clob, TRUE);
8 submission_advisors_proc (r.rid, v_clob);
9 DBMS_OUTPUT.PUT_LINE (v_clob);
10 DBMS_LOB.FREETEMPORARY (v_clob);
11 END LOOP;
12 END;
13 /
<first_name>John</first_name><last_name>Doe</last_name><suffix>PhD</suffix><submission_name>Some Research Paper</submission_name><advisor_type_name>Department Co-Chair</advisor_type_name>
<first_name>Jane</first_name><last_name>Doe</last_name><suffix>PhD</suffix><submission_name>Some Research Paper</submission_name><advisor_type_name>Department Co-Chair</advisor_type_name>
<first_name>Johan</first_name><last_name>Smith</last_name><submission_name>Thesis on 17th Century Weather Patterns</submission_name><advisor_type_name>Professor</advisor_type_name>
<first_name>Magnus</first_name><last_name>Jackson</last_name><suffix>MS</suffix><submission_name>Thesis on 17th Century Weather Patterns</submission_name><advisor_type_name>Associate Professor</advisor_type_name>
<first_name>Williard</first_name><last_name>Forsberg</last_name><suffix>AMS</suffix><submission_name>Statistical Analysis on Sunny Days in March</submission_name><advisor_type_name>Scientist</advisor_type_name>
PL/SQL procedure successfully completed.
SCOTT@orcl_11gR2> -- user_datastore that uses procedure:
SCOTT@orcl_11gR2> BEGIN
2 CTX_DDL.CREATE_PREFERENCE ('sa_datastore', 'USER_DATASTORE');
3 CTX_DDL.SET_ATTRIBUTE ('sa_datastore', 'PROCEDURE', 'submission_advisors_proc');
4 END;
5 /
PL/SQL procedure successfully completed.
SCOTT@orcl_11gR2> -- index (on optional extra column) that uses user_datastore and section group:
SCOTT@orcl_11gR2> ALTER TABLE submission_advisors ADD (any_column VARCHAR2(1))
2 /
Table altered.
SCOTT@orcl_11gR2> CREATE INDEX submission_advisors_idx
2 ON submission_advisors (any_column)
3 INDEXTYPE IS CTXSYS.CONTEXT
4 PARAMETERS
5 ('DATASTORE sa_datastore
6 SECTION GROUP CTXSYS.AUTO_SECTION_GROUP')
7 /
Index created.
SCOTT@orcl_11gR2> -- what is tokenized, indexed, and searchable:
SCOTT@orcl_11gR2> SELECT token_text FROM dr$submission_advisors_idx$i
2 /
TOKEN_TEXT
17TH
ADVISOR_TYPE_NAME
AMS
ANALYSIS
ASSOCIATE
CENTURY
CHAIR
CO
DAYS
DEPARTMENT
DOE
FIRST_NAME
FORSBERG
JACKSON
JANE
JOHAN
JOHN
LAST_NAME
MAGNUS
MARCH
PAPER
PATTERNS
PHD
PROFESSOR
RESEARCH
SCIENTIST
SMITH
STATISTICAL
SUBMISSION_NAME
SUFFIX
SUNNY
THESIS
WEATHER
WILLIARD
34 rows selected.
SCOTT@orcl_11gR2> -- sample searches across all data:
SCOTT@orcl_11gR2> VARIABLE search_string VARCHAR2(100)
SCOTT@orcl_11gR2> EXEC :search_string := 'professor'
PL/SQL procedure successfully completed.
SCOTT@orcl_11gR2> SELECT s.submission_name,
2 a.advisor_type_name || ' ' ||
3 sa.first_name || ' ' ||
4 sa.last_name || ' ' ||
5 sa.suffix AS advisor
6 FROM submission_advisors sa,
7 submission s,
8 advisor_type a
9 WHERE CONTAINS (sa.any_column, :search_string) > 0
10 AND sa.advisor_type_id = a.advisor_type_id
11 AND sa.submission_id = s.submission_id
12 /
SUBMISSION_NAME ADVISOR
Thesis on 17th Century Weather Patterns Professor Johan Smith
Thesis on 17th Century Weather Patterns Associate Professor Magnus Jackson MS
2 rows selected.
SCOTT@orcl_11gR2> EXEC :search_string := 'doe'
PL/SQL procedure successfully completed.
SCOTT@orcl_11gR2> /
SUBMISSION_NAME ADVISOR
Some Research Paper Department Co-Chair John Doe PhD
Some Research Paper Department Co-Chair Jane Doe PhD
2 rows selected.
SCOTT@orcl_11gR2> EXEC :search_string := 'paper'
PL/SQL procedure successfully completed.
SCOTT@orcl_11gR2> /
SUBMISSION_NAME ADVISOR
Some Research Paper Department Co-Chair John Doe PhD
Some Research Paper Department Co-Chair Jane Doe PhD
2 rows selected.
SCOTT@orcl_11gR2> -- sample searches within specific columns:
SCOTT@orcl_11gR2> EXEC :search_string := 'chair'
PL/SQL procedure successfully completed.
SCOTT@orcl_11gR2> SELECT s.submission_name,
2 a.advisor_type_name || ' ' ||
3 sa.first_name || ' ' ||
4 sa.last_name || ' ' ||
5 sa.suffix AS advisor
6 FROM submission_advisors sa,
7 submission s,
8 advisor_type a
9 WHERE CONTAINS (sa.any_column, :search_string || ' WITHIN advisor_type_name') > 0
10 AND sa.advisor_type_id = a.advisor_type_id
11 AND sa.submission_id = s.submission_id
12 /
SUBMISSION_NAME ADVISOR
Some Research Paper Department Co-Chair John Doe PhD
Some Research Paper Department Co-Chair Jane Doe PhD
2 rows selected.
SCOTT@orcl_11gR2> EXEC :search_string := 'phd'
PL/SQL procedure successfully completed.
SCOTT@orcl_11gR2> SELECT s.submission_name,
2 a.advisor_type_name || ' ' ||
3 sa.first_name || ' ' ||
4 sa.last_name || ' ' ||
5 sa.suffix AS advisor
6 FROM submission_advisors sa,
7 submission s,
8 advisor_type a
9 WHERE CONTAINS (sa.any_column, :search_string || ' WITHIN suffix') > 0
10 AND sa.advisor_type_id = a.advisor_type_id
11 AND sa.submission_id = s.submission_id
12 /
SUBMISSION_NAME ADVISOR
Some Research Paper Department Co-Chair John Doe PhD
Some Research Paper Department Co-Chair Jane Doe PhD
2 rows selected.
SCOTT@orcl_11gR2> EXEC :search_string := 'weather'
PL/SQL procedure successfully completed.
SCOTT@orcl_11gR2> SELECT s.submission_name,
2 a.advisor_type_name || ' ' ||
3 sa.first_name || ' ' ||
4 sa.last_name || ' ' ||
5 sa.suffix AS advisor
6 FROM submission_advisors sa,
7 submission s,
8 advisor_type a
9 WHERE CONTAINS (sa.any_column, :search_string || ' WITHIN submission_name') > 0
10 AND sa.advisor_type_id = a.advisor_type_id
11 AND sa.submission_id = s.submission_id
12 /
SUBMISSION_NAME ADVISOR
Thesis on 17th Century Weather Patterns Professor Johan Smith
Thesis on 17th Century Weather Patterns Associate Professor Magnus Jackson MS
2 rows selected. -
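One thing to keep in mind with the user_datastore approach demonstrated above: Oracle Text only re-indexes a row when the indexed column itself (any_column here) is updated, so a change in SUBMISSION or ADVISOR_TYPE is not picked up automatically. A minimal sketch of propagating such a change by hand, using the tables and index from the demonstration; the new name 'Senior Scientist' is just an illustrative value:

```sql
-- Change a value in a lookup table that feeds the virtual document.
UPDATE advisor_type
   SET advisor_type_name = 'Senior Scientist'
 WHERE advisor_type_id = 5;

-- Dummy update of the indexed column, so that the dependent rows
-- are queued for re-indexing.
UPDATE submission_advisors
   SET any_column = any_column
 WHERE advisor_type_id = 5;
COMMIT;

-- Re-run the datastore procedure for the queued rows.
BEGIN
  CTX_DDL.SYNC_INDEX ('submission_advisors_idx');
END;
/
```

The dummy update could equally be placed in triggers on the parent tables, which is where the trigger adjustments mentioned in the question would come in.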
PDF indexing and multiple searches.
Dear members:
Please forgive me if my question is rather basic but I haven't been able to find the exact answers I am looking for in order to address my project needs.
I have a folder where I keep all of my PDF files. These are all articles from medical journals that I keep organized using a browser application specific for these types of articles. The application allows me to search these articles but it only looks for specific keywords (title, author name, date, journal name and keyword just to name a few). However, it doesn't look at the content of the PDF file to find words that are contained in the body of the article itself.
I would like to be able to use Acrobat to search these articles and try to find words I am looking for in the entire article instead of being restricted only to keywords. These are the questions I have:
1. What is the best way to index these PDF files so that they can become searchable ?
2. Is there a way to find out if they have already been indexed by the publishing company so that I avoid wasting time by doing it again ?
3. My library now contains approximately 15,000 articles and I expect it to grow to at least 30,000. How can I handle these searches so that performance doesn't become an issue ? Is there a way to ensure that Acrobat can search these number of files without taking a long time ?
4. I understand from the help files that Acrobat can search an entire folder so I don't have to run my search one article or file at a time. Is this correct ? What is the best way to run my search so that Acrobat looks at all files in one folder ? In this folder I have subfolders (subdirectories) ? Will Acrobat look at all files when searching including those in subdirectories within the specified directory ?
Thank you in advance for your help and replies.
Best regards,
Joseph Chamberlaini -
Indexing and Searching pdf files which are used as attachment in an Announcement list item
Hi all,
I am using a SharePoint 2013 Online environment and trying to search for PDF files that are attached to an announcement list item. However, the search does not find anything when I search for the name of the PDF file or for its content.
When I attach a Word document to the list item, it gets indexed and the search finds the file.
Thanks; I appreciate any kind of advice.
Are you able to search for PDFs in other locations? SharePoint 2013 comes with an iFilter out of the box, unlike 2010, which needed configuration.
-
Context Index World Lexer ORA-03113: end-of-file on communication channel
I have 10g Release 1 (10.1.0.2.0) for Windows and am trying to take advantage of the World Lexer.
My table is:
create table worldtest(
filename char(32),
content blob
);
I've created a preference for the WORLD_LEXER:
begin
ctx_ddl.create_preference('wlex', 'WORLD_LEXER');
end;
Right now I'm working with 10,000 records of PDF, MS-Word, Text, and HTM documents. When I try and create a context index using this lexer:
create index i_ctx_wc on worldtest(content)
indextype is ctxsys.context
parameters ('lexer wlex');
The following error is returned, and I have to use drop index force to remove the index. Without 'force', a message is returned saying the index is in a loading state, but nothing is occurring.
create index i_ctx_wc on worldtest(content)
indextype is ctxsys.context
ERROR at line 1:
ORA-03113: end-of-file on communication channel
The indexing works fine when I leave the world lexer preference out.
Any suggestions would be wonderful. I've been banging my head on this one for a while.
Thanks.
Hi,
I couldn't reproduce this (I have a different version, which is perhaps the reason). Check out bug 4056162: ORA-3113 related to the use of the world lexer. The resolution is not published externally, so no help there, but the bug is closed, so perhaps support can provide some insight.
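Until support responds, the cleanup and fallback described in the question can be sketched like this. It only restates what the poster reports as working (dropping with FORCE, then indexing without the world lexer) and adds a status check; it is not a fix for the bug itself.

```sql
-- Remove the index left in a loading state after the ORA-03113.
DROP INDEX i_ctx_wc FORCE;

-- Recreate it without the 'lexer wlex' parameter, which the poster
-- reports works; the default lexer is used instead of WORLD_LEXER.
CREATE INDEX i_ctx_wc ON worldtest(content)
  INDEXTYPE IS ctxsys.context;

-- Confirm the index status and look for per-document errors.
SELECT idx_name, idx_status
FROM   ctx_user_indexes
WHERE  idx_name = 'I_CTX_WC';

SELECT err_textkey, err_text
FROM   ctx_user_index_errors
WHERE  err_index_name = 'I_CTX_WC';
```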
Thanks.
-
Dear All,
I am using Oracle Text and planning to build a search engine for text search over a collection of documents.
I have indexed the document set, which consists of PDF and PowerPoint documents, and this resulted in 4 extra tables. Could anybody please explain the meaning of each table and of each of its columns? And how can I find out which text comes from which document?
I have searched for their description but found nothing.
Then, if I want to use 'contains', how can I do it? The examples in the Oracle Text Reference and the Developer's Guide only show the case where the text of each document is stored in a row along with the document title. Here I use a large set of documents and have already made an index for it.
I would really appreciate any help. I really need to finish this as soon as I can. Thank you.
Edited by: theOrange on Apr 29, 2010 10:05 PM
When you create a context index, the four tables are created automatically. Oracle uses them internally and you would not ordinarily access them directly or need to know about them to use Oracle Text. However, if you wish to understand what is happening in the background, the following link contains such information:
Oracle Text information
http://www.oracle.com/technology/products/text/index.html
Within the above is a section on the four tables:
dr$...$... tables:
http://www.oracle.com/technology/products/text/htdocs/text_dml_processing.html#P3
Most of what you need to know can be found in either of the following:
Text Application Developer's Guide
http://download.oracle.com/docs/cd/B28359_01/text.111/b28303/toc.htm
Text Reference
http://download.oracle.com/docs/cd/B28359_01/text.111/b28304/toc.htm
For example, the following section has information about the syntax for contains:
contains syntax:
http://download.oracle.com/docs/cd/B28359_01/text.111/b28304/csql.htm#CCREF0104
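As a rough illustration of how CONTAINS ties a search term back to a document, here is a minimal sketch. The table docs, its columns id, title and content, and the index name docs_idx are hypothetical, chosen only for this example; the search word is arbitrary.

```sql
-- Hypothetical table: one row per document, with the file stored in a BLOB.
CREATE TABLE docs (
  id      NUMBER PRIMARY KEY,
  title   VARCHAR2(200),
  content BLOB
);

-- The context index is built on the BLOB column; the dr$...$... tables
-- are created automatically at this point.
CREATE INDEX docs_idx ON docs(content)
  INDEXTYPE IS ctxsys.context;

-- Which documents contain the word, best match first.
-- The label 1 links SCORE(1) to the CONTAINS(..., 1) call.
SELECT id, title, SCORE(1) AS relevance
FROM   docs
WHERE  CONTAINS(content, 'oracle', 1) > 0
ORDER  BY SCORE(1) DESC;
```

The query returns rows of the base table, so the answer to "which text is from which document" comes from the base table's own columns and not from the internal dr$ tables.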
If you need further help, then you need to post a copy and paste of what you have tried, the results you got, and how the results you want differ. Please include the create table statement, the create index statement, sample data, the search you tried, and the results you want from that data based on the search criteria.
-
Oracle 10g – Performance with BIG CONTEXT indexes
I would like to use Oracle XE 10.2.0.1.0 only for the full-text searching of the files residing outside the database on the FTP server.
Recently I found out that the total size of the files to be indexed is 5GB.
As I have read somewhere on this forum, the size of the index should be 30-40% of the size of the indexed text files (so with formatted documents like PDF or DOC, even less).
Let's say that the CONTEXT index over these files will be 1.5-2GB.
The number of concurrent users will be at most 5.
I cannot easily test it myself yet.
Does anybody have any experience with the performance of Oracle XE, or of another Oracle Database edition, with a CONTEXT index this big?
Will the Oracle XE hardware resource limits be sufficient to handle one CONTEXT index this big?
(Oracle XE license limitations: 1 GB RAM and 1 CPU)
Regards.
That depends on at least three things:
(1) what is the range of words that will appear in the document set (wide range of documents = smaller resultsets = better performance)
(2) how precise are the user's queries likely to be (more precise = smaller resultsets = better performance)
(3) how many milliseconds are your users willing to wait for results
So, unfortunately, you'll probably have to experiment a bit before you'll know...
-
Creation of context index on index-organized table
I encountered a problem when creating a domain index (an interMedia Text context index) on an index-organized table in Oracle 8i.
The error is stated below:
"ORA-29866: cannot create domain index on a column of index-organized table "
I have configured interMedia Text properly, and it works for tables that are not index-organized (ordinary tables).
The problem occurs only when I make the tables index-organized.
Please provide a solution to this problem as early as possible.
If you require any more details, I shall provide them.
Please ask questions about Oracle Text (formerly interMedia Text) in the Oracle Text forum. You will get a quicker, more expert answer there.