Highlight in Oracle Text

Hi!!
I followed the steps described in the Oracle Text reference guide to highlight the results of a search in a text, but I can't display the text with those words highlighted.
Here is the code I ran:
create table hightab(query_id number,
offset number,
length number);
Table created
begin
ctx_doc.highlight('sentencias', '4', 'redactor', 'hightab', 0, TRUE);
end;
PL/SQL procedure successfully completed
select * from hightab
QUERY_ID OFFSET LENGTH
0 89 8
Thanks a lot!
Regards,
Fabiana.
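ctx_doc.highlight only records where the hits are (offset 89, length 8 in the output above); it does not return the document text. One way to get the text back with the hits marked is ctx_doc.markup. The following is a minimal sketch, assuming the same index ('sentencias'), key ('4') and query ('redactor') as in the call above. The <b> and </b> tags are an arbitrary choice, and the in-memory CLOB form of markup is assumed to be available in the release in use.

```sql
set serveroutput on

declare
  l_markup clob;  -- markup allocates a temporary CLOB for the result
begin
  ctx_doc.markup(
    index_name => 'sentencias',
    textkey    => '4',
    text_query => 'redactor',
    restab     => l_markup,
    plaintext  => TRUE,
    starttag   => '<b>',    -- assumed tags; any pair of strings can be used
    endtag     => '</b>');

  -- print only the first 255 characters to stay within older
  -- dbms_output line limits
  dbms_output.put_line(dbms_lob.substr(l_markup, 255, 1));

  -- the caller is responsible for freeing the temporary CLOB
  dbms_lob.freetemporary(l_markup);
end;
/
```

If the offsets in hightab are all that is needed, they can also be applied by the client to the plain-text version of the document.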

It works for me, so it must be due to some differences in settings.
scott@ORA92> CREATE TABLE sentencias_ws
  (id NUMBER PRIMARY KEY,
   doc XMLTYPE)
/
Table created.
scott@ORA92> INSERT INTO sentencias_ws VALUES
(189, XMLTYPE ('<EST>
<JUR>
<SENT>
<NUMERO>2/04</NUMERO>
</SENT>
</JUR>
<NUMERO>1/2005</NUMERO>
</EST>
'))
/
1 row created.
scott@ORA92> begin
   ctx_ddl.create_section_group('XMLGroup', 'XML_SECTION_GROUP');
   ctx_ddl.ADD_ZONE_SECTION('XMLGroup', 'EST', 'EST');
   ctx_ddl.ADD_ZONE_SECTION('XMLGroup', 'JUR', 'JUR');
   ctx_ddl.ADD_ZONE_SECTION('XMLGroup', 'SENT', 'SENT');
   ctx_ddl.ADD_ZONE_SECTION('XMLGroup', 'NUMERO', 'NUMERO');
end;
/
PL/SQL procedure successfully completed.
scott@ORA92> begin
   ctx_ddl.create_preference('MYLEX','BASIC_LEXER');
   ctx_ddl.set_attribute('MYLEX','SKIPJOINS','.');
   ctx_ddl.set_attribute('MYLEX','NUMJOIN',',');
end;
/
PL/SQL procedure successfully completed.
scott@ORA92> CREATE INDEX busquedaXML ON sentencias_ws(doc)
INDEXTYPE IS ctxsys.context
PARAMETERS
   ('datastore ctxsys.default_datastore
     filter ctxsys.null_filter
     section group XMLGroup
     lexer mylex')
/
Index created.
scott@ORA92> Select * from sentencias_ws
where contains(doc, '(((2/04 within NUMERO) WITHIN SENT) WITHIN JUR) WITHIN EST')>0
/
        ID
DOC
       189
<EST>
<JUR>
    <SENT>
      <NUMERO>2/04</NUMERO>
    </SENT>
</JUR>
<NUMERO>1/2005</NUMERO>
</EST>
scott@ORA92>

Similar Messages

Highlighting results using oracle text

Hi
I've hooked up to Oracle text using odp.net and can query that fine. Now I would like to use the highlighting feature of OT but I'm having problems finding suitable examples of how to do it.
I've read through a number of Oracle Text documents; they say to use the CTX_DOC package to highlight the result, but that's PL/SQL. How can I do it with odp.net?
thanks

OK, thanks for that. I've done that and it cleared the error, but I'm still running into more problems.
I've tweaked my code some more and currently have this:
          setType();
          OracleConnection oConn = dbConnect();
          //setType();
          oConn.Open();
          OracleCommand oCmd = new OracleCommand();
          oCmd.CommandText = "ctx_doc.Highlight";
          oCmd.Connection = oConn;
          oCmd.CommandType = CommandType.StoredProcedure;
          //params
          OracleParameter oparam2 = oCmd.Parameters.Add("index_name", OracleDbType.Varchar2);
          oparam2.Direction = ParameterDirection.Input;
          oparam2.Value = "description_idx";
          OracleParameter oparam3 = oCmd.Parameters.Add("textkey", OracleDbType.Varchar2);
          oparam3.Direction = ParameterDirection.Input;
          oparam3.Value = "ID";
          OracleParameter oparam4 = oCmd.Parameters.Add("text_query", OracleDbType.Varchar2);
          oparam4.Direction = ParameterDirection.Input;
          oparam4.Value = "test || hello";
          OracleParameter oparam = oCmd.Parameters.Add("restab", OracleDbType.Varchar2);
          oparam.Value = "ctx_hitab";
          oparam.Direction = ParameterDirection.Input;
          OracleParameter oparam5 = oCmd.Parameters.Add("plaintext", OracleDbType.Int32);
          oparam5.Direction = ParameterDirection.Input;
          oparam5.Value = false;
          OracleDataAdapter da = new OracleDataAdapter(oCmd);
          DataSet ds = new DataSet();
          da.Fill(ds, "TEST");
          //oCmd.ExecuteNonQuery();
          oConn.Close();
and it's producing this error, which evidently relates to parameter 3, but I'm not sure how to correct it:
ORA-20000: Oracle Text error:
DRG-11445: rowid value is invalid: ID
ORA-06512: at "CTXSYS.DRUE", line 157
ORA-06512: at "CTXSYS.CTX_DOC", line 876
ORA-06512: at line 1
many thanks
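The DRG-11445 message shows that the literal string "ID" was passed as textkey and treated as a rowid. textkey is expected to hold the key of one specific document (its rowid or its primary key value), not the name of the key column. In addition, plaintext is a PL/SQL BOOLEAN, which may not be bindable from ODP.NET depending on the client and database versions. One possible way around both points is to send an anonymous PL/SQL block as the command text (with CommandType.Text) instead of calling the procedure directly. The following is only a sketch: the table my_docs and the id value 42 are hypothetical, while description_idx and ctx_hitab are taken from the code above.

```sql
declare
  l_rowid rowid;
begin
  -- my_docs and the id value 42 are hypothetical; look up the row
  -- whose text should be highlighted
  select rowid into l_rowid from my_docs where id = 42;

  -- tell CTX_DOC that textkey values are rowids
  ctx_doc.set_key_type('ROWID');

  ctx_doc.highlight(
    index_name => 'description_idx',
    textkey    => rowidtochar(l_rowid),
    text_query => 'test or hello',
    restab     => 'ctx_hitab',
    query_id   => 1,
    plaintext  => FALSE);
end;
/

-- highlight returns nothing itself; the hits are rows in the result table
select offset, length from ctx_hitab where query_id = 1 order by offset;
```

The second statement is the one that would feed a DataAdapter; the PL/SQL block itself returns no result set.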

Highlighting a keyword in Oracle Text

Hi
I am trying to highlight the query terms returned as part of the result set from Oracle text. Although I know this is possible, I am unsure as to exactly how to achieve it. Please guide me.
Thanks
AJ

You can use ctx_doc.markup or ctx_doc.snippet. You can find the syntax and examples in the online documentation. There are also some examples on this forum.
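As a rough illustration of the snippet approach, the query below returns a short keyword-in-context fragment for each hit. It is a sketch under stated assumptions: the table docs (columns id and text) and the index docs_idx are hypothetical names, the search term is fixed for brevity, and ctx_doc.snippet is assumed to be available in the release in use (it was added in 10g).

```sql
-- make CTX_DOC interpret the textkey argument as a rowid
exec ctx_doc.set_key_type('ROWID')

-- docs, id, text and docs_idx are hypothetical names
select id,
       ctx_doc.snippet('docs_idx', rowid, 'oracle') as snippet
from   docs
where  contains(text, 'oracle') > 0;
```

By default the matched terms in the returned fragment are wrapped in <b> and </b>; the starttag and endtag arguments can be supplied to change that.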

Using Oracle Text with Apex

Can someone point me to some resources on how to integrate Oracle Text and APEX to do searches, highlight results, etc (all the features of Oracle Text)?
The data to be indexed is in files on the filesystem, so I would like to keep it that way and use the FILE_DATASTORE option for Text.
Thanks for any pointers.
Update: Yes, I did see http://www.oracle.com/technology/products/database/application_express/pdf/apex_text_application_v1.6.pdf
but the search results there just returns the URL/file containing the "hit". It doesn't show the actual text fragment that caused the match, doesn't highlight it, etc. I am looking for a real Google-like search. Hm, having said that, I might as well use Google Desktop! Nah, where's the fun in that?

This is a very simple application for my own use. It started life in 8i when there were fewer Text options.
As such, it uses the query string as entered. This returns all of the matches:
select msgid, msgdate, Box, fromaddr, subject
from eudora.inbox
where contains(body, :P703_MailSearch) > 0
order by msgdate desc
I display the selected result like this:
select subject,
Replace(eudora.mmarkup(:P704_MSGID, :P702_SEARCH), Chr(13), '<BR>') Body
from eudora.inbox
where msgid = :P704_MSGID
In a newer application, I experimented with the CTXCAT grammar.
That query looks like this:
select m.ID, m.pdpno, m.shortdesc
from pdp_mast m
where contains(m.dphistory, '<query><textquery lang="ENGLISH" grammar="CTXCAT">
                                         ' || :P1_Text || '
                                     </textquery>
                                  <score datatype="INTEGER"/>
                              </query>') > 0
    or contains(m.shortdesc, '<query><textquery lang="ENGLISH" grammar="CTXCAT">
                                         ' || :P1_Text || '
                                     </textquery>
                                  <score datatype="INTEGER"/>
                              </query>') > 0
As always, once you figure out the syntax, it's easy to make it work in APEX.
Text indexes are very fast. On my old 600MHz PC, searches in 250MB of text take less than a second.

Highlite oracle text search terms

I have a report that I set up using the instructions for Oracle Text Application in APEX. It works very well however I have the actual document as a link and I would like the search terms highlighted in the actual document. Is there a way to do that in APEX?
I use this Region Source:
select score(1) relevance, filename, dbms_lob.getlength("DOCUMENT") Document, code_id
from documents
where contains (document, :P10_SEARCH, 1) > 0
order by 1 desc
I read something about using ctx_doc.snippet to highlight, but I can't get that to work.
Any suggestions or can APEX highlight terms when the actual document is used?

Take a look at the ctx_doc.markup procedure. I think it will do what you want.
http://download.oracle.com/docs/cd/B19306_01/text.102/b14217/view.htm#sthref599
My home server is on a moving truck, so I can only point you to some old forum posts for examples:
Re: Using Oracle Text with Apex
Re: Use apex to display email
Doug
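A wrapper function around ctx_doc.markup is one way to make the marked-up document available to an APEX region. The sketch below assumes the documents table from the question, that code_id is numeric and identifies exactly one row, and that the Text index is called documents_idx (a hypothetical name, since the post does not give it). The function name marked_up_doc is likewise made up for illustration.

```sql
create or replace function marked_up_doc
  (p_code_id in number,
   p_search  in varchar2)
  return clob
as
  l_rowid  rowid;
  l_result clob;
begin
  -- assumes code_id identifies exactly one row in documents
  select rowid into l_rowid from documents where code_id = p_code_id;

  ctx_doc.set_key_type('ROWID');
  ctx_doc.markup(
    index_name => 'documents_idx',       -- hypothetical index name
    textkey    => rowidtochar(l_rowid),
    text_query => p_search,
    restab     => l_result,
    plaintext  => FALSE,                 -- return the filtered HTML version
    tagset     => 'HTML_DEFAULT');       -- predefined tagset that bolds hits
  return l_result;
end marked_up_doc;
/
```

The report link could then pass code_id and the search string to a page whose region calls marked_up_doc. Note that this displays the filtered HTML rendition of the document, not the original binary file.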

Document management system using oracle text

I plan to create a document management system using Oracle Text with the following features:
1) document comparison
2) document search
and more...
Can Oracle Text be used to display documents of various formats by converting them to HTML? And can search keywords be highlighted in the document?
Please help!

Have you ever considered doing this in Oracle Application Express (free on top of the Oracle database)? How about something like:
http://download-west.oracle.com/docs/cd/B31036_01/doc/appdev.22/b28839/up_dn_files.htm
Index the files using the CONTEXT index, and perhaps the docs' meta with it using the Oracle Text MULTI_COLUMN_DATASTORE, and then when you write your query for a report on the documents include a search string.
I've created a number of APEX-based document management systems and it is quite easy once you get the hang of using this environment. I suggest looking at some of the tutorials/how-to documents and you'll be on your way quickly.
Start with the upload application. Once you can get your documents in, create a report that shows everything except the document. Verify all of this works correctly.
Add some "items" to the page for the report, and include them as bind variables in the where clause.
After that, add your Oracle Text index to the database, and toss in a "text-field" item to the APEX page. Modify your report query, adding the CONTAINS clause, and use the newly created item as a bind variable. There's your keyword search.
Linking to Oracle Apps is done through API's and may be over database links.
Hope it helps. Though not a step-by-step how to document, this should point you in the right direction. Get familiar with APEX as that covers most of what you described.
-Ron

Who's using Oracle Text?

Hi all,
I've done some experimenting with Oracle Text, and it seems to
offer some handy stuff. I am a bit disappointed with the quality
of the INSO filtering technology, especially for PDFs. There are
also some items that seem, at best, extremely difficult, and at
worst, impossible. (Highlighting a matched search term found in
a PDF file comes to mind. It's not going to do my users a whole
lot of good to get a match to a search in a 200 page PDF file,
just to then have to go digging all the way through Acrobat
Reader to get to the actual content that matched...) My initial
impression is that this may be great if you just have a pile of
ASCII text files, but that using it for much more than that is
pushing it...
Anyway, I've done lots of digging through here and Metalink, and
have seen tons of weird kinds of problems. What I'm looking for
are some stories: Tell me how you used it to do something good.
Let me know what the problems were, and how you surmounted them
(or didn't). Basically, I'd like to get some personal
experiences from folks that I can use to make a determination
whether I should continue to investigate using this tool, or if I
should pursue a third-party solution (like Verity, etc.).
Advice from folks with specific experience doing
document/content/knowledge management with Oracle Text would be
appreciated as well.
Thanks!
Sean

Sean,
Steve, first: Thanks for the response! I'd about given up on
this thread ;-)
You certainly bring up some interesting points.
Couple of points from my side ...
1. I am affiliated with Oracle - I work in the Product team,
doing Product management
Thanks for being up front about that. I only questioned Omar
about it simply because his "Author" identifier didn't show an
e-mail address -- yours does, so I could've figured it out.
2. You are quite correct that the point of these forums is to get
real customers to discuss real applications/problems/solutions.
This Text forum seems to be getting some input from customers,
though it's patchy.
I assumed that was the case, but I've been a bit disappointed in
the results (not specifically about this thread, just in general
from reviewing various other forums too -- not that I've been
anywhere near exhaustive in my reading of all the forums.) In
retrospect, I should've just left the PDF filtering question out.
It changed the focus of this thread from "what cool stuff are
you doing with Oracle Text and how hard was it to do?" to "the
filtering sucks and can we fix it." I didn't mean for that to
happen, and got a bit too short with Omar as a result.
As usual, hindsight's 20/20.
I tend to come more from an "open source" (don't judge me for
that, I don't particularly care for that term -- I just use it
for lack of a current better term) perspective -- basically
fairly open discussions about projects with lots of sharing of
ideas and such. It seems from an outsider's perspective (I'm
very new to the Oracle world) that the current culture may not be
quite ready for that (see Oliver's post before yours above). We
seem to end up here less "talking about stuff" and more "help,
it's broken!" That's perfectly OK, but it seems like we already
have that in the Metalink forums (although I guess this area is
more public). I'd venture that this tendency might disappear
over time, especially if discussions are fostered by Oracle (like
you're doing here -- thanks again!).
Will you be at Oracle OpenWorld in December ?
We are trying to set up a SIG within the Oracle User Group, as a
further incentive for customers to have these kinds of
discussions.
Unfortunately, no. I'll be traveling most of the month of
December for the holidays. I really wish I was going now though,
it would be great to get in on that. I'll definitely try to make
it next time.
3. Yes it is difficult to highlight within a PDF document. You
don't say which version you are using, but we recently (8.1.7)
switched to INSO for our PDF filter (before that we used INSO for
everything except PDF). The 9i PDF filter is much better : and
the 9.2 (not yet released, but a backport is available) is better
still.
I'm using 8.1.7.2 -- I'll be moving to 9i soon (probably late
this year or first thing after the new year). I look forward to
seeing how the filters do in the new version.
When I've recently talked to other vendors that are managing to
do the PDF highlighting (Verity and Convera, for example) they've
indicated that there is a standard mechanism to specify word
highlighting when you launch or trigger the reader. I haven't
checked into it beyond that, but at least it's a thread of hope.
PDF is a difficult format to work with. It's intended to be a
"print format" or "final-form format", i.e. it was never intended
to be filtered/indexed/highlighted.
Yeah, there are several problems with it. Unfortunately (or
fortunately, depending on your viewpoint) it's fairly ubiquitous,
so I'm stuck with it. I'd still rather have the documents in PDF
than something even more proprietary though. At least I can get
a reliable viewer for PDFs for my UNIX systems.
I've seen a number of
companies that claim to do PDF filtering and highlighting, but
I haven't yet seen anyone that makes a great job of it. We have
discussed this with Adobe : watch this space, hopefully we'll be
able to provide a better solution soon.
That would be excellent. I personally have seen it work
adequately. It may not be perfect, but it's still handy. When
speaking with a Verity rep, they indicated something about the
reader accepting an XML stream of positioning data and then he
clammed up (this guy may just have been blowing smoke so take
that with as many grains of salt as you like).
Meanwhile I, like you, encourage other customers to speak up on
this forum - I'd certainly like to know more about people's
experiences and requirements in this area.
Thank you for your encouragement. My responses here in no way
reflect any dissatisfaction with Oracle products. I'm having a
good time using a solid database for a change. I just want to
ensure I use it to its fullest, and that's why I'm here.
Oh, and if anybody wants the advice of an old-school UNIX geek,
I'll be around ;-)
Sean

Can oracle text be used to compare documents?

Let's say that I have some documents stored in binary (LOB) form. Can Oracle Text be used to compare documents and show their similarity on the basis of their content? How would I compare documents using Oracle Text? Does it require a mining algorithm such as a neural network? Please help.
thanks for reading.

Thank you for your interest in my question. Let me see whether I can further clarify it. In an ordinary PDF document, assume that I have a picture of a user interface for microsoft Word. The common method for identifying items in the picture, such as a toolbar, would be to either:
--use a callout labeled "toolbar" that points to the toolbar
or
--use a callout labelled "A" and have a caption underneath the picture that says: A) toolbar.
What I would like to do is have text underneath the picture such as:
"The major features of the interface shown above are:
toolbar
main menu
status bar
formatting menu"
such that, when the user clicks one of the bullet items, the object becomes highlighted in the picture. The bullet list also needs to be translatable into Japanese. So, as far as I know, it can't be part of the swf file. Or can it?

Oracle Text - CTX Context Index Soundex Problem

Hi,
I'm running into a problem with Oracle Text when searching using the ! (soundex) option. I've created a simple test example to highlight the issue.
Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - 64bit
Windows 2008 Server 64-bit
create table test_tab (test_col varchar2(200));
insert all
into test_tab (test_col) values ('ab-tönes')
into test_tab (test_col) values ('ab-tones')
into test_tab (test_col) values ('abtones')
into test_tab (test_col) values ('ab tones')
into test_tab (test_col) values ('ab-tanes')
select * from dual;
select * from test_tab;
begin
      ctx_ddl.create_preference ('test_lex1', 'basic_lexer');
      ctx_ddl.set_attribute ('test_lex1', 'whitespace', '/\|-_+&''');
      ctx_ddl.set_attribute('test_lex1','base_letter','YES');
      -- ctx_ddl.set_attribute('test_lex1','skipjoins','-');
end;
/
create index test_idx on test_tab (test_col)
indextype is ctxsys.context
    parameters
      ('lexer        test_lex1');
select token_text from dr$test_idx$i;
TOKEN_TEXT
AB
ABTONES
TANES
TONES
select * from test_tab where contains (test_col, '!ab tones') > 0;
TEST_COL
ab-tönes
ab-tones
ab tones
select * from test_tab where soundex(test_col) = soundex('ab tones');
TEST_COL
ab-tönes
ab-tones
abtones
ab tones
ab-tanes
So my question is, can anyone suggest an approach whereby I can get the Oracle Text Context index (or CTXCAT index if it's more appropriate) to return all 5 rows like the simple Soundex is doing?
I can't really use soundex as this search query will form part of a search screen for a multi-language application. Soundex is limited to English sounding words, so I need the solution to be able to compare strings that may not "sound" English.
It must be an attribute of the BASIC_LEXER, and I've tried skipjoins, start/end-joins, stop lists, but I just cannot get the Soundex feature of Oracle Text to function like the SOUNDEX() function!
Looking at how the tokens are stored dr$test_idx$i I need Oracle Text to almost concat 'AB' and 'TONES' to search as a single string.
Any help greatly appreciated.
Thanks,

I am not getting the same problem that you are getting with the umlaut, but I don't see what is different. Please post the result of:
select ctx_report.create_index_script ('test_idx') from dual;
Here are the results on my system. Perhaps you can spot the difference. I added an empty_stoplist, so that it won't print out a long list of stopwords.
SCOTT@orcl12c> create table test_tab (test_col    varchar2(200))
/
Table created.
SCOTT@orcl12c> insert all
   into test_tab (test_col) values ('ab-tönes')
   into test_tab (test_col) values ('ab-tones')
   into test_tab (test_col) values ('abtones')
   into test_tab (test_col) values ('ab tones')
   into test_tab (test_col) values ('ab-tanes')
select * from dual
/
5 rows created.
SCOTT@orcl12c> select * from test_tab
/
TEST_COL
ab-tönes
ab-tones
abtones
ab tones
ab-tanes
5 rows selected.
SCOTT@orcl12c> begin
   ctx_ddl.create_preference ('test_lex1', 'basic_lexer');
   ctx_ddl.set_attribute('test_lex1','base_letter','YES');
end;
/
PL/SQL procedure successfully completed.
SCOTT@orcl12c> create or replace procedure test_proc
   (p_rowid in          rowid,
    p_clob  in out nocopy clob)
as
begin
   select replace (translate (test_col, '/\|-_+&''', '      '), ' ', '')
   into   p_clob
   from   test_tab
   where  rowid = p_rowid;
end test_proc;
/
Procedure created.
SCOTT@orcl12c> show errors
No errors.
SCOTT@orcl12c> begin
   ctx_ddl.create_preference ('test_ds', 'user_datastore');
   ctx_ddl.set_attribute ('test_ds', 'procedure', 'test_proc');
end;
/
PL/SQL procedure successfully completed.
SCOTT@orcl12c> create index test_idx on test_tab (test_col)
   indextype is ctxsys.context
   parameters
      ('lexer       test_lex1
        datastore   test_ds
        stoplist    ctxsys.empty_stoplist')
/
Index created.
SCOTT@orcl12c> select token_text from dr$test_idx$i
/
TOKEN_TEXT
ABTANES
ABTONES
2 rows selected.
SCOTT@orcl12c> variable search_string varchar2(100)
SCOTT@orcl12c> exec :search_string := 'ab tones'
PL/SQL procedure successfully completed.
SCOTT@orcl12c> select * from test_tab
where contains
           (test_col,
            '!' || replace (:search_string, ' ', ' !') ||
            ' or !' || replace (:search_string, ' ', '')) > 0
/
TEST_COL
ab-tönes
ab-tones
abtones
ab tones
ab-tanes
5 rows selected.
SCOTT@orcl12c> exec :search_string := 'abtones'
PL/SQL procedure successfully completed.
SCOTT@orcl12c> /
TEST_COL
ab-tönes
ab-tones
abtones
ab tones
ab-tanes
5 rows selected.
SCOTT@orcl12c> exec :search_string := 'ab tönes'
PL/SQL procedure successfully completed.
SCOTT@orcl12c> /
TEST_COL
ab-tönes
ab-tones
abtones
ab tones
ab-tanes
5 rows selected.
SCOTT@orcl12c> select ctx_report.create_index_script ('test_idx') from dual
/
CTX_REPORT.CREATE_INDEX_SCRIPT('TEST_IDX')
begin
ctx_ddl.create_preference('"TEST_IDX_DST"','USER_DATASTORE');
ctx_ddl.set_attribute('"TEST_IDX_DST"','PROCEDURE','"SCOTT"."TEST_PROC"');
end;
begin
ctx_ddl.create_preference('"TEST_IDX_FIL"','NULL_FILTER');
end;
begin
ctx_ddl.create_section_group('"TEST_IDX_SGP"','NULL_SECTION_GROUP');
end;
begin
ctx_ddl.create_preference('"TEST_IDX_LEX"','BASIC_LEXER');
ctx_ddl.set_attribute('"TEST_IDX_LEX"','BASE_LETTER','YES');
end;
begin
ctx_ddl.create_preference('"TEST_IDX_WDL"','BASIC_WORDLIST');
ctx_ddl.set_attribute('"TEST_IDX_WDL"','STEMMER','ENGLISH');
ctx_ddl.set_attribute('"TEST_IDX_WDL"','FUZZY_MATCH','GENERIC');
end;
begin
ctx_ddl.create_stoplist('"TEST_IDX_SPL"','BASIC_STOPLIST');
end;
begin
ctx_ddl.create_preference('"TEST_IDX_STO"','BASIC_STORAGE');
ctx_ddl.set_attribute('"TEST_IDX_STO"','R_TABLE_CLAUSE','lob (data) store as (
cache)');
ctx_ddl.set_attribute('"TEST_IDX_STO"','I_INDEX_CLAUSE','compress 2');
end;
begin
ctx_output.start_log('TEST_IDX_LOG');
end;
create index "SCOTT"."TEST_IDX"
on "SCOTT"."TEST_TAB"
      ("TEST_COL")
indextype is ctxsys.context
parameters('
    datastore       "TEST_IDX_DST"
    filter          "TEST_IDX_FIL"
    section group   "TEST_IDX_SGP"
    lexer           "TEST_IDX_LEX"
    wordlist        "TEST_IDX_WDL"
    stoplist        "TEST_IDX_SPL"
    storage         "TEST_IDX_STO"
');
begin
ctx_output.end_log;
end;
1 row selected.

Oracle Text Storage Issue

Hi Everyone,
My name is John and I have three small questions for which your expertise and assistance would be greatly appreciated.
I'm currently using Oracle Text on an Oracle 10g Enterprise Edition Release 10.2.0.2.0 database and am running into a storage space problem. I have a table with 2 BLOB columns. One column stores the TIF image file and the other stores the TIF's OCR version in PDF format. We index the PDF column for rapid text retrieval. As we load documents into the table, the index and table tablespaces are used up very rapidly. I created my context index storage preference using the statements below:
ctx_ddl.create_preference('OCR_DOC_OCR_CONTENT_I_STORAGE','BASIC_STORAGE');
ctx_ddl.set_attribute('OCR_DOC_OCR_CONTENT_I_STORAGE','I_INDEX_CLAUSE',
'tablespace TS_OCR_IDX_LGE compress 2');
ctx_ddl.set_attribute('OCR_DOC_OCR_CONTENT_I_STORAGE','I_TABLE_CLAUSE',
'tablespace TS_OCR_IDX_LGE');
ctx_ddl.set_attribute('OCR_DOC_OCR_CONTENT_I_STORAGE','K_TABLE_CLAUSE',
'tablespace TS_OCR_IDX_LGE');
ctx_ddl.set_attribute('OCR_DOC_OCR_CONTENT_I_STORAGE','N_TABLE_CLAUSE',
'tablespace TS_OCR_IDX_LGE');
ctx_ddl.set_attribute('OCR_DOC_OCR_CONTENT_I_STORAGE','P_TABLE_CLAUSE',
'tablespace TS_OCR_IDX_LGE');
ctx_ddl.set_attribute('OCR_DOC_OCR_CONTENT_I_STORAGE','R_TABLE_CLAUSE',
'tablespace TS_OCR_IDX_LGE lob (data) store as (cache)');
I've created my table using the following commands below:
create table OCR_DOCUMENT (
DOC_ID number
,DOC_NAME varchar2(255)
,DOC_DIRECTORY varchar2(255)
,DOC_EXTENSION varchar2(10)
,DOC_CONTENT blob
,OCR_EXTENSION varchar2(10)
,OCR_CONTENT blob
,HAS_BLOB varchar2(1)
,CREATED_DATETIME date
,FILE_NAME VARCHAR2(2000)
,DW_DOC_ID NUMBER
,PAGE_NO NUMBER
,DOC_TYPE VARCHAR2(100)
,DOC_CLASS VARCHAR2(100)
,DOC_DESCRIPTION VARCHAR2(2000)
,PAGES NUMBER(10)
,CLT_NUMBER NUMBER(10)
,TAXENT_NUMBER NUMBER(10)
,REG_DATE DATE
,TAX_YEAR VARCHAR2(20)
,ORIG_FILE_NAME VARCHAR2(2000)
)
tablespace TS_OCR_TBL_LGE
pctfree 5 initrans 2 maxtrans 255
nologging noparallel;
My first question is, is there anything wrong with my storage clauses so I can improve and save some additional space?
My second question is: is there a way to compress the table BLOB columns, i.e. DOC_CONTENT and OCR_CONTENT, and save some space without affecting document service retrieval?
At the beginning of the project, I used utl_compress.lz_compress to compress the BLOB content before storing it in the table, but I soon dropped that idea. When I retrieved the compressed BLOB content using ctx_doc.markup for the highlight document service (to highlight the text I had searched for), it displayed garbage text, and I could not find any workaround.
Also, if we are prepared NOT to use the THEME and GIST features of Oracle Text, can I remove them to save some additional space? Any feedback on how I can save space would be welcome and appreciated. Have a nice day.
Thanks and Regards,
John

The BEST solution to your problem is to move to 11g Release 1.
I am not sure how feasible that will be on your part, but 11gR1 has exactly the capabilities you are looking for.
You can compress and deduplicate all the LOB fields (with the SECUREFILE clause) in all the tables, including the internal index tables ($R etc.) and the base table (OCR_DOCUMENT).
This is just for your information.
I don't really have any other information to share with you to resolve your problem :(
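For illustration, an 11g table with compressed, deduplicated SecureFiles LOBs might be declared as in the sketch below. The table name ocr_document_sf and the reduced column list are hypothetical, and the MEDIUM compression level is an arbitrary choice. SecureFiles require a tablespace with automatic segment space management, and the compression and deduplication features are part of the separately licensed Advanced Compression option.

```sql
-- hypothetical cut-down version of OCR_DOCUMENT using SecureFiles LOBs
create table ocr_document_sf (
  doc_id       number,
  doc_content  blob,
  ocr_content  blob
)
lob (doc_content) store as securefile (compress medium deduplicate)
lob (ocr_content) store as securefile (compress medium deduplicate);
```

Because this compression happens inside the LOB storage layer, the stored content still reads back as the original bytes, unlike content compressed by the application before insert.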

How do I know what Oracle products I need installed for Oracle Text to work

I get the following errors from a VB application when trying to access the Oracle Text procedures:
Oracle Error: -20000
ORA-20000: Oracle Text error:
ORA-02074: cannot ROLLBACK in a distributed transaction
ORA-06512: at "CTXSYS.CTX_DOC", line 483
ORA-02074: cannot SET SAVEPOINT in a distributed transaction
DRG-10816: display/highlight call failed
ORA-02074: cannot SET SAVEPOINT in a distributed transaction
I need to resolve this ASAP

Oracle Text is part of the database. Try the quick start sample (http://otn.oracle.com/products/text/x/Samples/Quick_Start/index.html) and check if a sql-contains statement works.

How do I get Oracle Text to index files on a file server?

I am new to Oracle (I'm a MS-SQL DBA looking for a Full-Text Search solution that is better than linking to a MS index server.)
So - Here's the objective:
I have Oracle Server(Express) installed on a Windows server.
I would like for Oracle to build a Full-Text Catalog of the files on a separate file server based on file paths in a table in the database.
(No desire to store terabytes of images and documents inside the database)
I can get Oracle text up and running, using the URL_Datastore:
CREATE TABLE files (id NUMBER PRIMARY KEY, issue_id NUMBER, path VARCHAR(255) UNIQUE, ot_format VARCHAR(6), ot_version VARCHAR(10));
The Compaq server is a remote windows server on my local workgroup, so the fully qualified path is just "compaq" and the URL is valid:
INSERT INTO files VALUES (9,9,'file://Compaq/FTQ/00000003.pdf',NULL,NULL);
INSERT INTO files VALUES (13,13,'file://Compaq/FTQ/01.txt',NULL,NULL);
CREATE INDEX file_index ON files(path) INDEXTYPE IS ctxsys.context
PARAMETERS ('datastore ctxsys.URL_DATASTORE format column ot_format');
but when I enter:
Select * from CTX_User_Index_errors, I see the following errors:
DRG-11609: URL store: unable to open local file specified by file://Compaq/FTQ/00000003.pdf
DRG-11609: URL store: unable to open local file specified by file://Compaq/FTQ/01.txt
Did I miss something?
Do I need to install anything on the file server?
I would like to convince my company that Oracle can be much quicker than Microsoft's Indexing Service because it can avoid joining two large result sets (one result set from Full_text (indexing service) and one for specific data contained in fields in the MS-SQL database.) Full Text Searches commonly take 40 - 60 seconds where there are 1.5 million multi-page PDF files for a particular set that I sample search on. Without this massive join, I believe I can get the search to run in under 10 seconds.

Thank you!
File_Datastore worked fine.
I was staying away from File_Datastore because the information I gathered from googling suggested that file_datastore would only work locally.
Now I just have to get Oracle to pull data out of tables in a MS-SQL database on the local network (don't have a clue yet), and then have it index compiled file paths.
Then MS-SQL can query Oracle with index and full-text criteria and Oracle can send back a result set
It may sound like a bad way of performing full-text queries, but anything will be better than the way things are currently running. We are currently performing full-text searches on a table that is rebuilt nightly, so the table containing millions of file paths is not live.
It would be so much better if we just migrated to Oracle, but we currently do not have the resources.
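For reference, the FILE_DATASTORE variant of the index from the original question might look roughly like the sketch below. It assumes an empty files table with no existing index, that the path column holds a file system path the database server process can read (the UNC form shown is an assumption about this setup), and it keeps the format column from the original statement.

```sql
-- path now holds a file system path rather than a file:// URL
-- (the UNC form is an assumption about this particular network)
insert into files values (13, 13, '\\Compaq\FTQ\01.txt', null, null);

create index file_index on files(path) indextype is ctxsys.context
parameters ('datastore ctxsys.file_datastore format column ot_format');

-- check for files that could not be read during indexing
select * from ctx_user_index_errors;
```

The account that runs the Oracle service needs read access to the remote share; nothing has to be installed on the file server itself.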

Error while running the Oracle Text optimize index procedure (even as a dba user too)

Hi Experts,
I am on Oracle 11.2.0.2 on Linux. I have implemented Oracle Text. My Oracle Text indexes are fragmented, but I am getting an error while running the optimize_index procedure. The error is as follows:
begin
ctx_ddl.optimize_index(idx_name=>'ACCESS_T1',optlevel=>'FULL');
end;
ERROR at line 1:
ORA-20000: Oracle Text error:
ORA-06512: at "CTXSYS.DRUE", line 160
ORA-06512: at "CTXSYS.CTX_DDL", line 941
ORA-06512: at line 1
Now I tried then to run this as DBA user too and it failed the same way!
begin
ctx_ddl.optimize_index(idx_name=>'BVSCH1.ACCESS_T1',optlevel=>'FULL');
end;
ERROR at line 1:
ORA-20000: Oracle Text error:
ORA-06512: at "CTXSYS.DRUE", line 160
ORA-06512: at "CTXSYS.CTX_DDL", line 941
ORA-06512: at line 1
Now CTXAPP role is granted to my schema and still I am getting this error. I will be thankful for the suggestions.
Also one other important observation: We have this issue ONLY in one database and in the other two databases, I don't see any problem at all.
I am unable to figure out what the issue is with this one database!
Thanks,
OrauserN

How about checking the following?
Bug 10626728 - CTX_DDL.optimize_index "full" fails with an empty ORA-20000 since 11.2.0.2 upgrade (DOCID 10626728.8)

Getting error while importing schema with ORACLE TEXT

IMP-00003: ORACLE error 20000 encountered
ORA-20000: Oracle Text error:
DRG-52204: error while registering index
DRG-10507: duplicate index name: WORKORDER_Q, owner: SYS
ORA-06512: at "CTXSYS.DRUE", line 160
ORA-06512: at "CTXSYS.DRIIMP", line 115
ORA-06512: at line 2
IMP-00088: Problem importing metadata for index WORKORDER_Q. Index creation will be skipped
Database version - Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
Os version - Linux nlxs1012.slb.atosorigin-asp.com 2.6.18-308.el5 #1 SMP Fri Jan 27 17:17:51 EST 2012 x86_64 x86_64 x86_64 GNU/Linux
We took an export of the schema from the production database and are now importing it into the QA environment.
The import fails with the error above.

I am importing objects from P20_MAXIMO into Q25_MAXIMO in another database.
The import parfile is below:
USERID='/ as sysdba'
FILE=exp_P20_MAXIMO_C2364781.dmp
LOG=imp_P20_MAXIMO__Q25_MAXIMO_C2364781_1.log
FROMUSER=P20_MAXIMO
TOUSER=Q25_MAXIMO
buffer=1000000
feedback=100000
Export parfile:
userid='/ as sysdba'
owner=P20_MAXIMO
FILE=exp_P20_MAXIMO_C2364781.dmp
LOG=exp_P20_MAXIMO_C2364781.log
buffer=10000000
feedback=100000
statistics=none
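DRG-10507 reports that an index named WORKORDER_Q is already registered with Oracle Text under owner SYS. As a diagnostic sketch only (it does not fix anything, and it assumes a user with access to the CTXSYS views), the following query run in the target QA database shows which owner, if any, currently has an index of that name registered:

```sql
-- Run in the target (QA) database as a privileged user
select idx_owner, idx_name, idx_table_owner, idx_table, idx_status
from   ctxsys.ctx_indexes
where  idx_name = 'WORKORDER_Q';
```

If a row comes back with IDX_OWNER = SYS, an earlier import attempt may have left it behind; that is an assumption to verify, not a conclusion from the log above.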

Pre-loading Oracle text in memory with Oracle 12c

There is a white paper from Roger Ford that explains how to load the Oracle Text index into memory: http://www.oracle.com/technetwork/database/enterprise-edition/mem-load-082296.html
In our application, on Oracle 12c, we index a large XML field (stored as XMLType with SecureFile storage) using PATH_SECTION_GROUP. If I don't load the $I table (DR$..$I) into memory using the technique explained in the white paper, I cannot get decent performance, and especially not predictable performance: it looks as if performance falls sharply whenever the blocks of the TOKEN_INFO column are not in memory.
After migrating to Oracle 12c, however, I hit a different problem, which I can reproduce: when I create the index it is relatively small (as seen with ctx_report.index_size), and by applying the technique from the white paper I can pin the DR$ I table in memory. But as soon as I run ctx_ddl.optimize_index('Index','REBUILD'), the index becomes much bigger and I can no longer pin it in memory. I am not sure whether it is a bug or not.
What I found as work-around is to build the index with the following storage options:
ctx_ddl.create_preference('TEST_STO','BASIC_STORAGE');
ctx_ddl.set_attribute ('TEST_STO', 'BIG_IO', 'YES' );
ctx_ddl.set_attribute ('TEST_STO', 'SEPARATE_OFFSETS', 'NO' );
so that the token_info column is stored in a secure file. Then I can change the storage of that column to put it in the keep buffer cache, and write a procedure that reads the LOB so that it is loaded into the keep cache. The size of the LOB column is more or less the same as when creating the index without the BIG_IO option, but it remains constant even after a ctx_ddl.optimize_index. The procedure that reads the LOB and loads it into the cache is very similar to the loaddollarR procedure from the white paper.
Because of the SDATA section, there is a new DR table (the S table) and an IOT on top of it. This is not documented in the white paper (which was written for Oracle 10g). In my case this DR$ S table is heavily used, and so is the IOT, but putting them in the keep cache is not as important as caching the token_info column of the DR$ I table. A final note: setting SEPARATE_OFFSETS = 'YES' was very bad in my case; the combined size of the two columns is much bigger than having only the TOKEN_INFO column, and both columns are read.
Here is an example of how to reproduce the size increase caused by ctx_ddl.optimize_index:
1. create the table
drop table test;
CREATE TABLE test
(ID NUMBER(9,0) NOT NULL ENABLE,
XML_DATA XMLTYPE
)
XMLTYPE COLUMN XML_DATA STORE AS SECUREFILE BINARY XML (tablespace users disable storage in row);
2. insert a few records
insert into test values(1,'<Book><TITLE>Tale of Two Cities</TITLE>It was the best of times.<Author NAME="Charles Dickens"> Born in England in the town, Stratford_Upon_Avon </Author></Book>');
insert into test values(2,'<BOOK><TITLE>The House of Mirth</TITLE>Written in 1905<Author NAME="Edith Wharton"> Wharton was born to George Frederic Jones and Lucretia Stevens Rhinelander in New York City.</Author></BOOK>');
insert into test values(3,'<BOOK><TITLE>Age of innocence</TITLE>She got a prize for it.<Author NAME="Edith Wharton"> Wharton was born to George Frederic Jones and Lucretia Stevens Rhinelander in New York City.</Author></BOOK>');
3. create the text index
drop index i_test;
exec ctx_ddl.create_section_group('TEST_SGP','PATH_SECTION_GROUP');
begin
CTX_DDL.ADD_SDATA_SECTION(group_name => 'TEST_SGP',
                            section_name => 'SData_02',
                            tag => 'SData_02',
                            datatype => 'varchar2');
end;
/
exec ctx_ddl.create_preference('TEST_STO','BASIC_STORAGE');
exec ctx_ddl.set_attribute('TEST_STO','I_TABLE_CLAUSE','tablespace USERS storage (initial 64K)');
exec ctx_ddl.set_attribute('TEST_STO','I_INDEX_CLAUSE','tablespace USERS storage (initial 64K) compress 2');
exec ctx_ddl.set_attribute ('TEST_STO', 'BIG_IO', 'NO' );
exec ctx_ddl.set_attribute ('TEST_STO', 'SEPARATE_OFFSETS', 'NO' );
create index I_TEST
on TEST (XML_DATA)
indextype is ctxsys.context
parameters('
    section group   "TEST_SGP"
    storage         "TEST_STO"
') parallel 2;
4. check the index size
select ctx_report.index_size('I_TEST') from dual;
it says :
TOTALS FOR INDEX TEST.I_TEST
TOTAL BLOCKS ALLOCATED:                                                104
TOTAL BLOCKS USED:                                                      72
TOTAL BYTES ALLOCATED:                                 851,968 (832.00 KB)
TOTAL BYTES USED:                                      589,824 (576.00 KB)
5. optimize the index
exec ctx_ddl.optimize_index('I_TEST','REBUILD');
and now recompute the size, it says
TOTALS FOR INDEX TEST.I_TEST
TOTAL BLOCKS ALLOCATED:                                               1112
TOTAL BLOCKS USED:                                                    1080
TOTAL BYTES ALLOCATED:                                 9,109,504 (8.69 MB)
TOTAL BYTES USED:                                      8,847,360 (8.44 MB)
which shows that the used space went from 576 KB to 8.44 MB. With a big index the relative difference is smaller, but it still went from 14G to 19G.
6. Workaround: use the BIG_IO option, so that the token_info column of the DR$ I table is stored in a secure file and the size stays relatively small. Then you can load this column into the cache using a procedure similar to the one below:
alter table DR$I_TEST$I storage (buffer_pool keep);
alter table dr$i_test$i modify lob(token_info) (cache storage (buffer_pool keep));
rem: now we must read the lob so that it is loaded into the keep buffer pool; use the procedure below
create or replace procedure loadTokenInfo is
  type c_type is ref cursor;
  c2   c_type;
  s    varchar2(2000);
  b    blob;
  buff raw(100);      -- dbms_lob.read returns RAW for a BLOB
  siz  number;
  off  number;
  cntr number;
begin
    s := 'select token_info from DR$i_test$I';
    open c2 for s;
    loop
       fetch c2 into b;
       exit when c2%notfound;
       off := 1;
       cntr := 0;
       if dbms_lob.getlength(b) > 0 then
         begin
           loop
             siz := 10;   -- reset each time: read sets siz to the bytes actually read
             dbms_lob.read(b, siz, off, buff);   -- touch 10 bytes of each 4K chunk
             cntr := cntr + 1;
             off := off + 4096;
           end loop;
         exception when no_data_found then
           -- raised once off passes the end of the LOB
           if cntr > 0 then
             dbms_output.put_line('4K chunks fetched: '||cntr);
           end if;
         end;
       end if;
    end loop;
    close c2;
end;
/
Rgds, Pierre
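As a follow-up sketch for the loadTokenInfo procedure above, the statements below run it and then count how many blocks of the index tables are in the buffer cache. This is an illustration under assumptions: it must be run as the index owner, that user needs SELECT on V$BH, and the object names follow the I_TEST example.

```sql
set serveroutput on
exec loadTokenInfo

-- Cached blocks per segment of the text index, including the
-- LOB segment that holds token_info
select o.object_name, o.object_type, count(*) as cached_blocks
from   v$bh b
join   user_objects o on o.data_object_id = b.objd
where  b.status <> 'free'
and   (o.object_name like 'DR$I_TEST$%'
       or o.object_name in (select segment_name
                            from   user_lobs
                            where  table_name = 'DR$I_TEST$I'))
group  by o.object_name, o.object_type
order  by cached_blocks desc;
```

Comparing the counts before and after calling the procedure gives a rough indication of whether the read actually populated the cache.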

I have been working a lot on that issue recently, so I can give some more info.
First, I totally agree with you: I don't like using the keep pool and would love to avoid it. On the other hand, we have a specific use case: 90% of the activity in the DB comes from queuing and dbms_scheduler jobs, where response time does not matter. All those processes are probably filling the buffer cache. We also have a customer-facing application that uses the text index to search the database, and performance is critical for those users.
What kind of performance do you get with your application?
In my case, I have learned the hard way that having the index in memory (in fact, the DR$I table) is the key: if it is not, performance is poor. I find it reasonable to pin the DR$I table in memory, and if you look at competitors, this is what they do. MongoDB explicitly says that the index must be in memory. Elasticsearch uses JVMs, which are also in memory. And indeed, if you look at the AWR report, you will see that Oracle accesses the DR$I table continuously; there is a SQL statement similar to
SELECT /*+ DYNAMIC_SAMPLING(0) INDEX(i) */
TOKEN_FIRST, TOKEN_LAST, TOKEN_COUNT, ROWID
FROM DR$idxname$I
WHERE TOKEN_TEXT = :word AND TOKEN_TYPE = :wtype
ORDER BY TOKEN_TEXT, TOKEN_TYPE, TOKEN_FIRST
which is executed continuously.
I think that the algorithm Oracle uses to keep blocks in cache is too complex. I just realized that 12.1.0.2 (released last week) finally has a "killer" feature, the in-memory parameters, with which you can pin tables or columns in memory with compression, etc. This looks ideal for the text index; I hope that R. Ford will finally update his white paper :-)
My other problem was that optimize_index in REBUILD mode caused the DR$I table to double in size. It seems crazy that this was closed as not a bug, but it was, and I can't do anything about it. It is a bug in my opinion, because the create index command and the "alter index rebuild" command both result in a much smaller index, so why would the people who developed the optimize function (is it another team, using another algorithm?) make the index two times bigger?
For that, the track I have been following is to put the index in a 16K tablespace: in this case the space used by the index remains more or less flat (it increases, but much more reasonably). The difficulty here is pinning the index in memory, because R. Ford's trick no longer worked.
What worked:
First set the keep pool to zero and size db_16k_cache_size instead. Then change the storage preference to make sure that everything you want to cache (mostly the DR$I table) goes into the tablespace with the non-standard block size of 16K.
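A minimal sketch of that setup follows. All sizes, the tablespace name TEXT_16K, the datafile path, and the preference name TEXT_STO_16K are hypothetical placeholders, and SCOPE = BOTH assumes the instance uses an spfile.

```sql
-- Give the memory to the 16K buffer cache instead of the keep pool
-- (sizes are placeholders)
alter system set db_keep_cache_size = 0 scope = both;
alter system set db_16k_cache_size = 2G scope = both;

-- Tablespace with a non-standard 16K block size (name and path are assumptions)
create tablespace text_16k
  datafile '/u01/oradata/text_16k_01.dbf' size 4G
  blocksize 16K;

-- Storage preference that places the $I table and its index in that tablespace
begin
  ctx_ddl.create_preference('TEXT_STO_16K', 'BASIC_STORAGE');
  ctx_ddl.set_attribute('TEXT_STO_16K', 'I_TABLE_CLAUSE', 'tablespace TEXT_16K');
  ctx_ddl.set_attribute('TEXT_STO_16K', 'I_INDEX_CLAUSE', 'tablespace TEXT_16K compress 2');
end;
/
```

The preference is then named in the storage clause of the create index parameters, in the same way TEST_STO is used in the earlier example.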
Then comes the tricky part: pre-loading the data into the buffer cache. The problem is that with Oracle 12c, Oracle will use direct path reads for full table scans, which basically means that it bypasses the cache and reads directly from file into the PGA! There is an event to avoid that; I was lucky to find it on a blog (I can't remember which one, sorry for the missing credit).
I ended up doing the following. Event 10949 is set to avoid the direct path read issue.
alter session set events '10949 trace name context forever, level 1';
alter table DR#idxname0001$I cache;
alter table DR#idxname0002$I cache;
alter table DR#idxname0003$I cache;
SELECT /*+ FULL(ITAB) CACHE(ITAB) */ SUM(TOKEN_COUNT), SUM(LENGTH(TOKEN_INFO)) FROM DR#idxname0001$I ITAB;
SELECT /*+ FULL(ITAB) CACHE(ITAB) */ SUM(TOKEN_COUNT), SUM(LENGTH(TOKEN_INFO)) FROM DR#idxname0002$I ITAB;
SELECT /*+ FULL(ITAB) CACHE(ITAB) */ SUM(TOKEN_COUNT), SUM(LENGTH(TOKEN_INFO)) FROM DR#idxname0003$I ITAB;
SELECT /*+ INDEX(ITAB) CACHE(ITAB) */ SUM(LENGTH(TOKEN_TEXT)) FROM DR#idxname0001$I ITAB;
SELECT /*+ INDEX(ITAB) CACHE(ITAB) */ SUM(LENGTH(TOKEN_TEXT)) FROM DR#idxname0002$I ITAB;
SELECT /*+ INDEX(ITAB) CACHE(ITAB) */ SUM(LENGTH(TOKEN_TEXT)) FROM DR#idxname0003$I ITAB;
It worked. With great relief I expected to take some time out, but there was one last surprise. The command
exec ctx_ddl.optimize_index(idx_name=>'idxname',part_name=>'partname',optlevel=>'REBUILD');
gave the following
ERROR at line 1:
ORA-20000: Oracle Text error:
DRG-50857: oracle error in drftoptrebxch
ORA-14097: column type or size mismatch in ALTER TABLE EXCHANGE PARTITION
ORA-06512: at "CTXSYS.DRUE", line 160
ORA-06512: at "CTXSYS.CTX_DDL", line 1141
ORA-06512: at line 1
This is described almost exactly in Metalink note 1645634.1, but for the case of a non-partitioned index. The work-around given there seemed very logical, but it did not work for a partitioned index. After experimenting, I found out that the bug occurs when the partitioned index is created with the dbms_pclxutil.build_part_index procedure (which enables intra-partition parallelism in the index creation process). This is a very annoying and stupid bug; maybe there is a work-around, but I did not find one on Metalink.
Other points of attention with text index creation (stuff that surprised me at first!):
- if you use the dbms_pclxutil package, the ctx_output logging does not work, because the index is created immediately and then populated in the background via dbms_jobs.
- combined with the fact that on a RAC you may not see any activity on the box, this can be very frightening: Oracle can choose to start the workers on the other node.
I now understand much better how text indexing works, and I think it is a great technology that can scale via partitioning. But as always, the design of the application is crucial: most of our problems come from the fact that we did not choose the right sectioning (we chose PATH_SECTION_GROUP, while XML_SECTION_GROUP is so much better IMO). Maybe later I can convince the developers to change the sectioning, especially because SDATA and MDATA sections are not supported with PATH_SECTION_GROUP (it seems to work, although we had one occurrence of a bad result linked to the existence of SDATA in the index definition). Also, the whole problem of mixed structured/unstructured searches is completely handled if one uses XML_SECTION_GROUP with MDATA/SDATA (but of course the app was written for Oracle 10...)
Regards, Pierre
