About multi-language features of Oracle Text

I have a customer who needs to store documents in different languages in one table and
use a CONTEXT index to perform text searches.
He would like to use the multi-language features of Oracle Text.
The database we are using is Oracle 10gR2.
We created a table with a doc column and a language column, and now we have to create the CONTEXT index.
In the documentation I found some information about how to set different lexers (MULTI_LEXER) for languages that need different lexers,
and different stoplists (MULTI_STOPLIST) for each language's stop words,
but I don't understand whether it
is possible to use the stemmer for different languages, and whether there are other features I can set to use multi-language properties.
Thank you in advance,
Paola

According to the online documentation: "The Oracle Text stemmer, licensed from Xerox Corporation's XSoft Division, supports the following languages with the BASIC_LEXER: English, French, Spanish, Italian, German, and Dutch. Japanese stemming is supported with the JAPANESE_LEXER."
Please see the demonstration below. Also, if you are using 10g, you can specify the language in the query instead of changing the language for the session. 10g also has a WORLD_LEXER.
scott@ORA92> CREATE TABLE your_table
  2    (id         NUMBER,
  3       doc         CLOB,
  4       lang         VARCHAR2 (3),
  5       CONSTRAINT  your_table_id_pk PRIMARY KEY (id))
  6  /
Table created.
scott@ORA92> INSERT ALL
  2  INTO your_table VALUES (1, 'They say only the good die young.', 'eng')
  3  INTO your_table VALUES (2, 'The dogs like the cats.',          'eng')
  4  INTO your_table VALUES (3, 'cats and dogs',               'eng')
  5  INTO your_table VALUES (4, 'cat and dog',                    'eng')
  6  INTO your_table VALUES (5, 'chats et chiens',               'fre')
  7  INTO your_table VALUES (6, 'chat et chien',               'fre')
  8  INTO your_table VALUES (7, 'Die Hunde mögen die Katzen',          'ger')
  9  INTO your_table VALUES (8, 'Katzen und Hunde',               'ger')
10  INTO your_table VALUES (9, 'Katze und Hund',               'ger')
11  SELECT * FROM DUAL
12  /
9 rows created.
scott@ORA92> BEGIN
  2    ctx_ddl.create_preference ('english_lexer','basic_lexer');
  3    ctx_ddl.set_attribute      ('english_lexer','index_themes','yes');
  4    ctx_ddl.set_attribute      ('english_lexer','theme_language','english');
  5 
  6    ctx_ddl.create_preference ('french_lexer','basic_lexer');
  7    ctx_ddl.set_attribute      ('french_lexer','index_themes','yes');
  8    ctx_ddl.set_attribute      ('french_lexer','theme_language','french');
  9 
10    ctx_ddl.create_preference ('german_lexer','basic_lexer');
11    ctx_ddl.set_attribute      ('german_lexer','composite','german');
12    ctx_ddl.set_attribute      ('german_lexer','alternate_spelling','german');
13 
14    CTX_DDL.CREATE_PREFERENCE ('global_lexer', 'MULTI_LEXER');
15    ctx_ddl.add_sub_lexer      ('global_lexer','english','english_lexer', 'eng');
16    ctx_ddl.add_sub_lexer      ('global_lexer','french','french_lexer', 'fre');
17    ctx_ddl.add_sub_lexer      ('global_lexer','german','german_lexer','ger');
18    ctx_ddl.add_sub_lexer      ('global_lexer','default','english_lexer');
19 
20    CTX_DDL.CREATE_STOPLIST ('global_stoplist', 'MULTI_STOPLIST');
21    CTX_DDL.ADD_STOPWORD    ('global_stoplist', 'and', 'english');
22    CTX_DDL.ADD_STOPWORD    ('global_stoplist', 'und', 'german');
23    CTX_DDL.ADD_STOPWORD    ('global_stoplist', 'et', 'french');
24    CTX_DDL.ADD_STOPWORD    ('global_stoplist', 'the', 'ALL');
25    CTX_DDL.ADD_STOPWORD    ('global_stoplist', 'die', 'german');
26  END;
27  /
PL/SQL procedure successfully completed.
scott@ORA92> CREATE INDEX your_table_doc_idx
  2  ON your_table (doc)
  3  INDEXTYPE IS CTXSYS.CONTEXT
  4  PARAMETERS
  5    ('LEXER           global_lexer
  6        LANGUAGE COLUMN lang
  7        STOPLIST      global_stoplist')
  8  /
Index created.
scott@ORA92> ALTER SESSION SET NLS_LANGUAGE = 'AMERICAN'
  2  /
Session altered.
scott@ORA92> SELECT * FROM your_table WHERE CONTAINS (doc, 'die') > 0
  2  /
        ID DOC                                                                              LAN
         1 They say only the good die young.                                                eng
scott@ORA92> SELECT * FROM your_table WHERE CONTAINS (doc, 'cat AND dog') > 0
  2  /
        ID DOC                                                                              LAN
         4 cat and dog                                                                      eng
scott@ORA92> SELECT * FROM your_table WHERE  CONTAINS (doc, '$cat AND $dog') > 0
  2  /
        ID DOC                                                                              LAN
         4 cat and dog                                                                      eng
         3 cats and dogs                                                                    eng
         2 The dogs like the cats.                                                          eng
scott@ORA92> ALTER SESSION SET NLS_LANGUAGE = 'FRENCH'
  2  /
Session altered.
scott@ORA92> SELECT * FROM your_table WHERE CONTAINS (doc, 'chat AND chien') > 0
  2  /
        ID DOC                                                                              LAN
         6 chat et chien                                                                    fre
scott@ORA92> SELECT * FROM your_table WHERE  CONTAINS (doc, '$chat AND $chien') > 0
  2  /
        ID DOC                                                                              LAN
         6 chat et chien                                                                    fre
         5 chats et chiens                                                                  fre
scott@ORA92> ALTER SESSION SET NLS_LANGUAGE = 'GERMAN'
  2  /
Session altered.
scott@ORA92> SELECT * FROM your_table WHERE CONTAINS (doc, 'Die') > 0
  2  /
no rows selected
scott@ORA92> SELECT * FROM your_table WHERE CONTAINS (doc, 'Katze AND Hund') > 0
  2  /
        ID DOC                                                                              LAN
         9 Katze und Hund                                                                   ger
Message was edited by: Barbara Boehmer
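One way to read why the same query 'die' matches row 1 in an AMERICAN session but returns no rows in a GERMAN session is that the query term is checked against the stoplist of the session language. Below is a toy Python model of a MULTI_STOPLIST using the stopwords from the demo. It is an illustration under simple assumptions (whitespace tokenization, the 'ALL' entry applying to every language), not Oracle's implementation.

```python
# Toy model of a MULTI_STOPLIST: per-language stopwords plus an 'ALL' list.
# The stopwords are the ones added with CTX_DDL.ADD_STOPWORD in the demo above.
STOPWORDS = {
    "ALL": {"the"},
    "english": {"and"},
    "french": {"et"},
    "german": {"und", "die"},
}


def is_stopword(word: str, language: str) -> bool:
    """True if the word is a stopword for this language or for ALL languages."""
    w = word.lower()
    return w in STOPWORDS["ALL"] or w in STOPWORDS.get(language, set())


def query_terms(query: str, language: str) -> list[str]:
    """Terms left in a query once the session language's stopwords are removed."""
    return [t for t in query.lower().split() if not is_stopword(t, language)]


# English session: 'die' survives, so the English row containing it can match.
print(query_terms("die", "english"))  # ['die']
# German session: 'die' is a stopword, so nothing is left to search for.
print(query_terms("Die", "german"))   # []
```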

Similar Messages

  • Beginning Oracle Text...

    Could someone perhaps point to a good online source of basic information about how to USE oracle text in searches?
    I'm specifically looking for information about how to do searches like {woman NOT man}, or whether "woman" will select "women" or whether "$woman" will select "man" and so on. What switches are there to control what is searched for? What booleans are allowed and how must they be presented, and so on.
    I'm doing OK with the official Oracle documentation, but something snappier and more abstract would be good to find!
    Any good book recommendations would be appreciated, too. (Especially since doing a search at Amazon for "oracle text" brings up a lot of textbooks about Oracle, but not many obviously about the specific database feature!)
    In the meantime, could someone answer a simple question I've not been able to find a simple answer to so far? Can Oracle text do 'NOT' searches? (As in 'man not boy')?

    Most of what you are asking about is covered in the section of the Text Reference on Contains Query Operators:
    http://download.oracle.com/docs/cd/B28359_01/text.111/b28304/cqoper.htm#CCREF0300
    Here are some examples regarding the specific questions you asked:
    SCOTT@orcl_11g> CREATE TABLE test_tab (test_col VARCHAR2 (60))
      2  /
    Table created.
    SCOTT@orcl_11g> INSERT ALL
      2  INTO test_tab (test_col) VALUES ('woman')
      3  INTO test_tab (test_col) VALUES ('man woman')
      4  INTO test_tab (test_col) VALUES ('women')
      5  INTO test_tab (test_col) VALUES ('men women')
      6  INTO test_tab (test_col) VALUES ('man boy')
      7  INTO test_tab (test_col) VALUES ('man')
      8  SELECT * FROM DUAL
      9  /
    6 rows created.
    SCOTT@orcl_11g> CREATE INDEX test_idx ON test_tab (test_col) INDEXTYPE IS CTXSYS.CONTEXT
      2  /
    Index created.
    SCOTT@orcl_11g> SELECT * FROM test_tab WHERE CONTAINS (test_col, 'woman NOT man') > 0
      2  /
    TEST_COL
    woman
    SCOTT@orcl_11g> SELECT * FROM test_tab WHERE CONTAINS (test_col, 'woman') > 0
      2  /
    TEST_COL
    woman
    man woman
    SCOTT@orcl_11g> SELECT * FROM test_tab WHERE CONTAINS (test_col, '$woman') > 0
      2  /
    TEST_COL
    woman
    man woman
    women
    men women
    SCOTT@orcl_11g> SELECT * FROM test_tab WHERE CONTAINS (test_col, 'man NOT boy') > 0
      2  /
    TEST_COL
    man woman
    man
    SCOTT@orcl_11g>
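The NOT results above follow from whole-word matching rather than substring matching: the row 'woman' does not contain the word 'man', so it survives 'woman NOT man'. A toy Python model of that behaviour, assuming plain whitespace tokenization and ignoring stemming, scoring and stopwords (so it is not how Oracle Text actually evaluates queries):

```python
# Rows from the test_tab example above.
ROWS = ["woman", "man woman", "women", "men women", "man boy", "man"]


def has_word(row: str, word: str) -> bool:
    """Whole-word, case-insensitive match on whitespace-separated tokens."""
    return word.lower() in row.lower().split()


def contains_not(rows: list[str], wanted: str, excluded: str) -> list[str]:
    """Toy version of CONTAINS(col, 'wanted NOT excluded') > 0."""
    return [r for r in rows if has_word(r, wanted) and not has_word(r, excluded)]


print(contains_not(ROWS, "woman", "man"))  # ['woman']
print(contains_not(ROWS, "man", "boy"))    # ['man woman', 'man']
```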

  • Using Oracle Text with Apex

    Can someone point me to some resources on how to integrate Oracle Text and APEX to do searches, highlight results, etc (all the features of Oracle Text)?
    The data to be indexed is in files on the filesystem, so I would like to keep it that way and use the FILE_DATASTORE option for Text.
    Thanks for any pointers.
    Update: Yes, I did see http://www.oracle.com/technology/products/database/application_express/pdf/apex_text_application_v1.6.pdf
    but the search results there just return the URL/file containing the "hit". They don't show the actual text fragment that caused the match, don't highlight it, etc. I am looking for a real Google-like search. Hm, having said that, I might as well use Google Desktop! Nah, where's the fun in that?

    This is a very simple application for my own use. It started life in 8i when there were fewer Text options.
    As such, it uses the query string as entered. This returns all of the matches:
    select msgid, msgdate, Box, fromaddr, subject
      from eudora.inbox
    where contains(body, :P703_MailSearch) > 0
    order by msgdate desc
    I display the selected result like this:
    select subject,
      Replace(eudora.mmarkup(:P704_MSGID, :P702_SEARCH), Chr(13), '<BR>') Body
      from eudora.inbox
    where msgid = :P704_MSGID
    In a newer application, I experimented with the CTXCAT grammar.
    That query looks like this:
    select m.ID, m.pdpno, m.shortdesc
      from pdp_mast m
    where contains(m.dphistory, '<query><textquery lang="ENGLISH" grammar="CTXCAT">
                                             ' || :P1_Text || '
                                         </textquery>
                                      <score datatype="INTEGER"/>
                                  </query>') > 0     
        or contains(m.shortdesc, '<query><textquery lang="ENGLISH" grammar="CTXCAT">
                                             ' || :P1_Text || '
                                         </textquery>
                                      <score datatype="INTEGER"/>
                                  </query>') > 0
    As always, once you figure out the syntax, it's easy to make it work in Apex.
    Text indexes are very fast. On my old 600MHz PC, searches in 250MB of text take less than a second.
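Because the query template above splices :P1_Text straight into XML, characters such as & or < in the user's input could make the template malformed. A minimal Python sketch of building the same template with the user text XML-escaped first; the function name and the set of allowed languages are hypothetical, and in APEX you would do the equivalent escaping in PL/SQL and still pass the result to CONTAINS as a bind variable.

```python
from xml.sax.saxutils import escape

# Hypothetical whitelist: only these language names are accepted, so the
# lang attribute never contains user-controlled text.
ALLOWED_LANGS = {"ENGLISH", "FRENCH", "GERMAN"}


def ctxcat_query_template(user_text: str, lang: str = "ENGLISH") -> str:
    """Build a CTXCAT-grammar query template with the user text XML-escaped."""
    if lang not in ALLOWED_LANGS:
        raise ValueError(f"unsupported language: {lang}")
    return (
        f'<query><textquery lang="{lang}" grammar="CTXCAT">'
        f"{escape(user_text)}"  # escapes &, < and >
        '</textquery><score datatype="INTEGER"/></query>'
    )


print(ctxcat_query_template("salt & pepper"))
```

Escaping only protects the structure of the template: after the XML is parsed, Oracle Text still sees "salt & pepper", and in the CTXCAT grammar the & is the AND operator, so the user's intent is preserved.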

  • SQL Injection with Oracle Text

    I searched here for posts about SQL injection with Oracle Text indexes, but the search returned no hits.
    Can anyone give an opinion on whether SQL injection is a concern when using Oracle Text, or what steps can be taken ahead of time to prevent it (or at least reduce the attack surface of Oracle Text queries)?
    We're running a web app that will use Oracle Text, and our users can enter any search string as well as select predefined items from a drop-down box.
    Thanks in advance for any opinions
    LJ

    Quote (originally posted by Dan Bracuk):
    What others can do is more relevant than what we think. When in doubt, test.
    Very true, although my final solution went more like, "When in doubt, manually add about 600 cfqueryparams in 406 cfquery tags".
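Passing the search string as a bind variable rules out classic SQL injection, but the string is still parsed by Oracle Text's own query language, so a user can type operators (NOT, ACCUM, ...) or unbalanced parentheses. One common mitigation is to wrap each user term in braces, which Oracle Text treats as an escape sequence. A minimal Python sketch under these assumptions: the function name and the docs table are hypothetical, and braces typed by the user are simply dropped rather than escaped.

```python
# Hypothetical query: the CONTAINS string is always supplied as a bind variable.
SQL = "SELECT id FROM docs WHERE CONTAINS(doc, :q) > 0"


def safe_contains_query(user_text: str) -> str:
    """Turn free text into '{term1} AND {term2} ...' for Oracle Text.

    Braces make each term literal, so words like NOT or characters like ( )
    lose their operator meaning. Braces typed by the user are removed so they
    cannot close the escape sequence early.
    """
    cleaned = user_text.replace("{", " ").replace("}", " ")
    # Keep only terms containing at least one letter or digit.
    terms = [t for t in cleaned.split() if any(c.isalnum() for c in t)]
    if not terms:
        raise ValueError("search string has no searchable terms")
    return " AND ".join("{" + t + "}" for t in terms)


binds = {"q": safe_contains_query("error 404 (timeout")}
print(binds["q"])  # {error} AND {404} AND {(timeout}
```

Joining every term with AND is a design choice; ACCUM or OR would be equally valid depending on what your users expect.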

  • Parsing a Word file that contains tables using Oracle Text

    Hi,
    I was going through this document. I am going to implement full-text search functionality in our system.
    We receive the information as .doc files.
    Previously, we parsed each file, stored it in the database and then searched it using PL/SQL.
    But from this article I understand that this can also be done using Oracle Text.
    One concern is whether Oracle Text can parse .doc files that have tables embedded in them.
    Please let me know about this (whether Oracle Text will be able to parse files with embedded tables).
    I am attaching an example file.
    Please let me know as early as possible.

    Yes, Oracle Text has this capability. Use AUTO_FILTER or USER_FILTER when you create the index.

  • Scoring in Oracle Text

    Hi,
    I am using Oracle Text features in my current application. I have to calculate accurate scores for the search results with respect to a search word in the search engine.
    So far I have used:
    select score(6) as scr from xyztable where contains(columnname, searchword, 6) > 0 order by score(6) desc;
    What is the significance of this number? I am extracting six nodes from an XML column on which I have created an Oracle Text index.
    Please shed some light on how to get the exact score of my search with respect to my search word.
    My present version of Oracle is 10g.
    Thanks

    http://download.oracle.com/docs/cd/B28359_01/text.111/b28304/ascore.htm#CCREF2307
    "To calculate a relevance score for a returned document in a word query, Oracle Text uses an inverse frequency algorithm based on Salton's formula.
    Inverse frequency scoring assumes that frequently occurring terms in a document set are noise terms, and so these terms are scored lower. For a document to score high, the query term must occur frequently in the document but infrequently in the document set as a whole.
    The following table illustrates Oracle Text's inverse frequency scoring. The first column shows the number of documents in the document set, and the second column shows the number of terms in the document necessary to score 100:
    Number of Documents in Document Set   Occurrences of Term in Document Needed to Score 100
    1                                     34
    5                                     20
    10                                    17
    50                                    13
    100                                   12
    500                                   10
    1,000                                 9
    10,000                                7
    100,000                               5
    1,000,000                             4 "
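To read that table for a document set whose size falls between two rows, a small Python lookup can help. This is only a conservative reading of the published table (it uses the nearest smaller set size and assumes the required count falls steadily between rows), not Oracle's actual scoring formula.

```python
import bisect

# (documents in the set, occurrences needed to score 100), from the table above.
SCORE_100_TABLE = [
    (1, 34), (5, 20), (10, 17), (50, 13), (100, 12),
    (500, 10), (1_000, 9), (10_000, 7), (100_000, 5), (1_000_000, 4),
]
SET_SIZES = [n for n, _ in SCORE_100_TABLE]


def occurrences_for_score_100(num_docs: int) -> int:
    """Occurrences needed to score 100, read from the nearest smaller row.

    Fewer occurrences are needed as the set grows, so using the row at or
    below num_docs gives an upper bound rather than an exact figure.
    """
    if num_docs < 1:
        raise ValueError("num_docs must be at least 1")
    row = bisect.bisect_right(SET_SIZES, num_docs) - 1
    return SCORE_100_TABLE[row][1]


print(occurrences_for_score_100(750))  # 10 (uses the 500-document row)
```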

  • Oracle Text and Order By

    In the Portal Search Properties you can turn on Oracle Text searching. The help page for those properties has a link at the bottom to a help page called "Performing a custom search". In the middle of that page there is a section called "Order By List", whose third paragraph contains this sentence: "If Oracle Text is enabled, this option does not appear in the search submission portlet."
    What it seems to mean is that if you turn on Oracle Text, the developer or user can no longer control the order of found items.
    Is there really no way (even undocumented) of ordering found items when Oracle Text is used?
    As I have custom attributes on my custom items, I must use Oracle Text if I want a search to work on those attributes, right?
    I have added a hidden field called p_order_by_attribute to my search form with the value "3,0", which should mean Display Name, but it has no effect.
    Kind regards
    Tomas Albinsson
    Stockholm, Sweden

    When Oracle Text is enabled there is no way to order search results as they will always be ordered by Oracle text score.
    Enabling Order By feature with Oracle Text on is a planned feature for a future release.

  • Does Oracle Text need to be "enabled"?

    We want to start using Oracle Text. Does it need to be "enabled"? Any scripts that need to be run first?
    Oracle version 10.1.0.4

    Hi,
    There is nothing to "enable" in order to use Oracle Text. But you need to create domain indexes (CONTEXT, CTXCAT, CTXRULE, etc.) depending on which features of Oracle Text you want to use. Also, in order to use some of the Oracle Text procedures and packages, you will need the CTXAPP role granted to you.
    Enjoy searching!
    Regards,
    VenkatR

  • Oracle Text - CTX Context Index Soundex Problem

    Hi,
    I'm running into a problem with Oracle Text when searching using the ! (soundex) option. I've created a simple test example to highlight the issue.
    Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - 64bit
    Windows 2008 Server 64-bit
    create table test_tab (test_col  varchar2(200));
    insert all
      into test_tab (test_col) values ('ab-tönes')
      into test_tab (test_col) values ('ab-tones')
      into test_tab (test_col) values ('abtones')
      into test_tab (test_col) values ('ab tones')
      into test_tab (test_col) values ('ab-tanes')
      select * from dual;
    select * from test_tab;
    begin
          ctx_ddl.create_preference ('test_lex1', 'basic_lexer');
          ctx_ddl.set_attribute ('test_lex1', 'whitespace', '/\|-_+&''');
          ctx_ddl.set_attribute('test_lex1','base_letter','YES');
          -- ctx_ddl.set_attribute('test_lex1','skipjoins','-');
    end;
    /
    create index test_idx on test_tab (test_col)
      indextype is ctxsys.context
        parameters
          ('lexer        test_lex1');
    select token_text from dr$test_idx$i;
    TOKEN_TEXT
    AB
    ABTONES
    TANES
    TONES
    select * from test_tab where contains (test_col, '!ab tones') > 0;
    TEST_COL
    ab-tönes
    ab-tones
    ab tones
    select * from test_tab where soundex(test_col) = soundex('ab tones');
    TEST_COL
    ab-tönes
    ab-tones
    abtones
    ab tones
    ab-tanes
    So my question is, can anyone suggest an approach whereby I can get the Oracle Text Context index (or CTXCAT index if it's more appropriate) to return all 5 rows like the simple Soundex is doing?
    I can't really use SOUNDEX, as this search query will form part of a search screen for a multi-language application. SOUNDEX is limited to English-sounding words, so I need the solution to be able to compare strings that may not "sound" English.
    It must be an attribute of the BASIC_LEXER, and I've tried skipjoins, start/end-joins, stop lists, but I just cannot get the Soundex feature of Oracle Text to function like the SOUNDEX() function!
    Looking at how the tokens are stored dr$test_idx$i I need Oracle Text to almost concat 'AB' and 'TONES' to search as a single string.
    Any help greatly appreciated.
    Thanks,

    I am not getting the same problem that you are getting with the umlaut, but I don't see what is different. Please post the result of:
    select ctx_report.create_index_script ('test_idx') from dual;
    Here are the results on my system.  Perhaps you can spot the difference.  I added an empty_stoplist, so that it won't print out a long list of stopwords.
    SCOTT@orcl12c> create table test_tab (test_col    varchar2(200))
      2  /
    Table created.
    SCOTT@orcl12c> insert all
      2    into test_tab (test_col) values ('ab-tönes')
      3    into test_tab (test_col) values ('ab-tones')
      4    into test_tab (test_col) values ('abtones')
      5    into test_tab (test_col) values ('ab tones')
      6    into test_tab (test_col) values ('ab-tanes')
      7  select * from dual
      8  /
    5 rows created.
    SCOTT@orcl12c> select * from test_tab
      2  /
    TEST_COL
    ab-tönes
    ab-tones
    abtones
    ab tones
    ab-tanes
    5 rows selected.
    SCOTT@orcl12c> begin
      2    ctx_ddl.create_preference ('test_lex1', 'basic_lexer');
      3    ctx_ddl.set_attribute('test_lex1','base_letter','YES');
      4  end;
      5  /
    PL/SQL procedure successfully completed.
    SCOTT@orcl12c> create or replace procedure test_proc
      2    (p_rowid in          rowid,
      3      p_clob    in out nocopy clob)
      4  as
      5  begin
      6    select replace (translate (test_col, '/\|-_+&''', '      '), ' ', '')
      7    into   p_clob
      8    from   test_tab
      9    where  rowid = p_rowid;
    10  end test_proc;
    11  /
    Procedure created.
    SCOTT@orcl12c> show errors
    No errors.
    SCOTT@orcl12c> begin
      2    ctx_ddl.create_preference ('test_ds', 'user_datastore');
      3    ctx_ddl.set_attribute ('test_ds', 'procedure', 'test_proc');
      4  end;
      5  /
    PL/SQL procedure successfully completed.
    SCOTT@orcl12c> create index test_idx on test_tab (test_col)
      2    indextype is ctxsys.context
      3    parameters
      4       ('lexer    test_lex1
      5         datastore    test_ds
      6         stoplist    ctxsys.empty_stoplist')
      7  /
    Index created.
    SCOTT@orcl12c> select token_text from dr$test_idx$i
      2  /
    TOKEN_TEXT
    ABTANES
    ABTONES
    2 rows selected.
    SCOTT@orcl12c> variable search_string varchar2(100)
    SCOTT@orcl12c> exec :search_string := 'ab tones'
    PL/SQL procedure successfully completed.
    SCOTT@orcl12c> select * from test_tab
      2  where  contains
      3            (test_col,
      4             '!' || replace (:search_string, ' ', ' !') ||
      5             ' or !' || replace (:search_string, ' ', '')) > 0
      6  /
    TEST_COL
    ab-tönes
    ab-tones
    abtones
    ab tones
    ab-tanes
    5 rows selected.
    SCOTT@orcl12c> exec :search_string := 'abtones'
    PL/SQL procedure successfully completed.
    SCOTT@orcl12c> /
    TEST_COL
    ab-tönes
    ab-tones
    abtones
    ab tones
    ab-tanes
    5 rows selected.
    SCOTT@orcl12c> exec :search_string := 'ab tönes'
    PL/SQL procedure successfully completed.
    SCOTT@orcl12c> /
    TEST_COL
    ab-tönes
    ab-tones
    abtones
    ab tones
    ab-tanes
    5 rows selected.
    SCOTT@orcl12c> select ctx_report.create_index_script ('test_idx') from dual
      2  /
    CTX_REPORT.CREATE_INDEX_SCRIPT('TEST_IDX')
    begin
      ctx_ddl.create_preference('"TEST_IDX_DST"','USER_DATASTORE');
      ctx_ddl.set_attribute('"TEST_IDX_DST"','PROCEDURE','"SCOTT"."TEST_PROC"');
    end;
    begin
      ctx_ddl.create_preference('"TEST_IDX_FIL"','NULL_FILTER');
    end;
    begin
      ctx_ddl.create_section_group('"TEST_IDX_SGP"','NULL_SECTION_GROUP');
    end;
    begin
      ctx_ddl.create_preference('"TEST_IDX_LEX"','BASIC_LEXER');
      ctx_ddl.set_attribute('"TEST_IDX_LEX"','BASE_LETTER','YES');
    end;
    begin
      ctx_ddl.create_preference('"TEST_IDX_WDL"','BASIC_WORDLIST');
      ctx_ddl.set_attribute('"TEST_IDX_WDL"','STEMMER','ENGLISH');
      ctx_ddl.set_attribute('"TEST_IDX_WDL"','FUZZY_MATCH','GENERIC');
    end;
    begin
      ctx_ddl.create_stoplist('"TEST_IDX_SPL"','BASIC_STOPLIST');
    end;
    begin
      ctx_ddl.create_preference('"TEST_IDX_STO"','BASIC_STORAGE');
      ctx_ddl.set_attribute('"TEST_IDX_STO"','R_TABLE_CLAUSE','lob (data) store as (
    cache)');
      ctx_ddl.set_attribute('"TEST_IDX_STO"','I_INDEX_CLAUSE','compress 2');
    end;
    begin
      ctx_output.start_log('TEST_IDX_LOG');
    end;
    create index "SCOTT"."TEST_IDX"
      on "SCOTT"."TEST_TAB"
          ("TEST_COL")
      indextype is ctxsys.context
      parameters('
        datastore       "TEST_IDX_DST"
        filter          "TEST_IDX_FIL"
        section group   "TEST_IDX_SGP"
        lexer           "TEST_IDX_LEX"
        wordlist        "TEST_IDX_WDL"
        stoplist        "TEST_IDX_SPL"
        storage         "TEST_IDX_STO"
      ')
    /
    begin
      ctx_output.end_log;
    end;
    1 row selected.
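The two halves of that solution can be sketched in Python to show what they do to the text: the datastore procedure deletes the punctuation and spaces before indexing, and the query ORs a soundex (!) search on the joined-up string with the per-word version. This is an illustration only; BASE_LETTER handling is approximated here with Unicode decomposition plus uppercasing, which is an assumption, not Oracle's lexer.

```python
import unicodedata

# The characters passed to TRANSLATE in test_proc.
PUNCTUATION = "/\\|-_+&'"


def normalize_like_test_proc(text: str) -> str:
    """Net effect of test_proc's REPLACE(TRANSLATE(...), ' ', '').

    TRANSLATE turns these characters into spaces (or deletes any that have no
    counterpart in the 'to' string) and REPLACE then removes all spaces, so
    in the end every one of them, and every space, disappears.
    """
    return text.translate(str.maketrans("", "", PUNCTUATION + " "))


def approx_index_token(text: str) -> str:
    """Rough stand-in for BASE_LETTER=YES plus the uppercase tokens in $I."""
    decomposed = unicodedata.normalize("NFD", normalize_like_test_proc(text))
    return "".join(c for c in decomposed if not unicodedata.combining(c)).upper()


def soundex_query(search_string: str) -> str:
    """Same string as '!' || replace(s, ' ', ' !') || ' or !' || replace(s, ' ', '')."""
    return "!" + search_string.replace(" ", " !") + " or !" + search_string.replace(" ", "")


rows = ["ab-tönes", "ab-tones", "abtones", "ab tones", "ab-tanes"]
print(sorted({approx_index_token(r) for r in rows}))  # ['ABTANES', 'ABTONES']
print(soundex_query("ab tones"))                      # !ab !tones or !abtones
```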

  • Oracle Text Storage Issue

    Hi Everyone,
    My name is John and I have three small questions on which your expertise and assistance would be greatly appreciated.
    I'm currently using Oracle Text on an Oracle 10g Enterprise Edition Release 10.2.0.2.0 database and am experiencing a storage space problem. I have a table with two BLOB columns: one stores a TIF image file and the other stores the TIF's OCR version in PDF format. We index the PDF column for rapid text retrieval. As we load documents into the table, the index and table tablespaces fill up very rapidly. I created the storage preference for my CONTEXT index using the statements below:
    begin
    ctx_ddl.create_preference('OCR_DOC_OCR_CONTENT_I_STORAGE','BASIC_STORAGE');
    ctx_ddl.set_attribute('OCR_DOC_OCR_CONTENT_I_STORAGE','I_INDEX_CLAUSE',
    'tablespace TS_OCR_IDX_LGE compress 2');
    ctx_ddl.set_attribute('OCR_DOC_OCR_CONTENT_I_STORAGE','I_TABLE_CLAUSE',
    'tablespace TS_OCR_IDX_LGE');
    ctx_ddl.set_attribute('OCR_DOC_OCR_CONTENT_I_STORAGE','K_TABLE_CLAUSE',
    'tablespace TS_OCR_IDX_LGE');
    ctx_ddl.set_attribute('OCR_DOC_OCR_CONTENT_I_STORAGE','N_TABLE_CLAUSE',
    'tablespace TS_OCR_IDX_LGE');
    ctx_ddl.set_attribute('OCR_DOC_OCR_CONTENT_I_STORAGE','P_TABLE_CLAUSE',
    'tablespace TS_OCR_IDX_LGE');
    ctx_ddl.set_attribute('OCR_DOC_OCR_CONTENT_I_STORAGE','R_TABLE_CLAUSE',
    'tablespace TS_OCR_IDX_LGE lob (data) store as (cache)');
    end;
    /
    I've created my table using the following commands below:
    create table OCR_DOCUMENT (
    DOC_ID number
    ,DOC_NAME varchar2(255)
    ,DOC_DIRECTORY varchar2(255)
    ,DOC_EXTENSION varchar2(10)
    ,DOC_CONTENT blob
    ,OCR_EXTENSION varchar2(10)
    ,OCR_CONTENT blob
    ,HAS_BLOB varchar2(1)
    ,CREATED_DATETIME date
    ,FILE_NAME VARCHAR2(2000)
    ,DW_DOC_ID NUMBER
    ,PAGE_NO NUMBER
    ,DOC_TYPE VARCHAR2(100)
    ,DOC_CLASS VARCHAR2(100)
    ,DOC_DESCRIPTION VARCHAR2(2000)
    ,PAGES NUMBER(10)
    ,CLT_NUMBER NUMBER(10)
    ,TAXENT_NUMBER NUMBER(10)
    ,REG_DATE DATE
    ,TAX_YEAR VARCHAR2(20)
    ,ORIG_FILE_NAME VARCHAR2(2000))
    tablespace TS_OCR_TBL_LGE
    pctfree 5 initrans 2 maxtrans 255
    nologging noparallel;
    My first question: is there anything wrong with my storage clauses, so that I can improve them and save some additional space?
    Second question: is there a way to compress the table's BLOB columns, i.e. DOC_CONTENT and OCR_CONTENT, to save space without affecting the document services?
    At the beginning of the project I used utl_compress.lz_compress to compress the BLOB content before storing it in the table, but I soon ditched that idea: when I tried to retrieve the compressed BLOB content with ctx_doc.markup for the highlight document service (to highlight the terms I searched for), it displayed garbage text and I could not find any workaround.
    Also, if we are prepared NOT to use the THEME and GIST features of Oracle Text, can I remove them to save additional space? Any feedback on saving space would be welcome and appreciated. Have a nice day.
    Thanks and Regards,
    John

    The BEST solution to your problem is to move to 11g Release 1.
    I am not sure how feasible that will be for you, but 11gR1 has exactly the capabilities you are looking for.
    You can compress and deduplicate all the LOB columns (with the SECUREFILE clause) in all the tables, including the internal index tables ($R etc.) and the base table (OCR_DOCUMENT).
    This is just for your information.
    I don't really have any other information to share with you to resolve your problem :(

  • About Oracle Text

    Hi,
    I would like to know a few things about Oracle Text.
    I have used Oracle Text to extract just the text content from BLOBs (PDF, XLS, ...) stored in the database.
    1) Can the same Oracle Text functionality be used to extract the contents from a file (so that I can store the text in the database instead of BLOBs)?
    2) What is Oracle Text mainly used for?
    Thanks

    Oracle Text is mainly used to search large text columns such as CLOB or VARCHAR2(4000).
    A LIKE search on such columns performs poorly, whereas Oracle Text gives good search performance.
    It does this by creating an index on such columns.
    You may find more details at http://www.stanford.edu/dept/itss/docs/oracle/10g/text.101/b10730.pdf
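A toy Python illustration of the difference described above: a LIKE '%word%' search must scan and substring-match every row, while a text index looks the word up in a precomputed token-to-row map (conceptually similar to the $I token table, although Oracle's real structures are far more elaborate). It assumes simple lowercase whitespace tokenization.

```python
from collections import defaultdict

DOCS = {1: "cats and dogs", 2: "cat and dog", 3: "The dogs like the cats."}


def build_index(docs: dict[int, str]) -> dict[str, set[int]]:
    """Map each lowercase token to the ids of the rows that contain it."""
    index: dict[str, set[int]] = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.lower().replace(".", " ").split():
            index[token].add(doc_id)
    return dict(index)


def like_scan(docs: dict[int, str], word: str) -> set[int]:
    """What LIKE '%word%' does: test every row, matching substrings."""
    return {doc_id for doc_id, text in docs.items() if word in text.lower()}


index = build_index(DOCS)
print(index.get("cat", set()))  # {2}: one lookup, whole words only
print(like_scan(DOCS, "cat"))   # {1, 2, 3}: every row scanned, 'cats' matches too
```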

  • How to use all Oracle Text features under iFS?

    I'd like to know whether it is possible to use Oracle Text features, such as the theme capabilities, to classify documents as they come in.

    You will probably need to use dynamic SQL.
    See the discussion at:
    http://asktom.oracle.com/pls/ask/f?p=4950:8:::::F4950_P8_DISPLAYID:53140567326263
    (while the question deals with SQLX, the answer is the same - Pro*C can't parse all SQL, but dynamic SQL is not parsed by Pro*C)

  • About index memory parameter for Oracle text indexes

    Hi Experts,
    I am on Oracle 11.2.0.3 on Linux and have implemented Oracle Text. I am not an expert in this subject and need help with one issue. I created the Oracle Text indexes with default settings. However, in an Oracle white paper I read that the default settings may not be right. Here is the excerpt from the white paper by Roger Ford:
    URL:http://www.oracle.com/technetwork/database/enterprise-edition/index-maintenance-089308.html
    "(Part of the white paper follows.)
    Index Memory
    As mentioned above, cached $I entries are flushed to disk each time the indexing memory is exhausted. The default index memory at installation is a mere 12MB, which is very low. Users can specify up to 50MB at index creation time, but this is still pretty low.
    This would be done by a CREATE INDEX statement something like:
    CREATE INDEX myindex ON mytable(mycol) INDEXTYPE IS ctxsys.context PARAMETERS ('index memory 50M');
    To allow index memory settings above 50MB, the CTXSYS user must first increase the value of the MAX_INDEX_MEMORY parameter, like this:
    begin ctx_adm.set_parameter('max_index_memory', '500M'); end;
    The setting for index memory should never be so high as to cause paging, as this will have a serious effect on indexing speed. On smaller dedicated systems, it is sometimes advantageous to temporarily decrease the amount of memory consumed by the Oracle SGA (for example by decreasing DB_CACHE_SIZE and/or SHARED_POOL_SIZE) during the index creation process. Once the index has been created, the SGA size can be increased again to improve query performance."
    (End of white paper excerpt)
    My questions are:
    1) To run this procedure (ctx_adm.set_parameter), do I need to log in as the CTXSYS user, or can it be done from the application schema? The CTXSYS user is locked by default and I had to unlock it. Is that OK to do in production?
    2) What value should I use for max_index_memory? Should it be 500 MB? My SGA is 2 GB in Dev/QA and 3 GB in production. Also, what value should I set for the index memory parameter at index creation? I had left it at the default, so how should I change it now? Should it be 50MB as in the example above?
    3) The white paper also recommends rebuilding an index at some interval, such as once a month:   ALTER INDEX DR$index_name$X REBUILD ONLINE;
    Is this correct advice? I would like to ask the experts before doing that. We are on Oracle 11g and the white paper was written in 2003.
    Basically, although I have read the paper, I am still not clear on several aspects and need help understanding them.
    Thanks,
    OrauserN

    Perhaps it's time I updated that paper
    1.  To change max_index_memory you must be a DBA user OR ctxsys. As you say, the ctxsys account is locked by default. It's usually easiest to log in as a DBA and run something like
    exec ctxsys.ctx_adm.set_parameter('MAX_INDEX_MEMORY', '10G')
    2.  Index memory is allocated from PGA memory, not SGA memory, so the size of the SGA is not relevant. If you use too high a setting, your index build may fail with an error saying you have exceeded PGA_AGGREGATE_LIMIT. Of course, you can increase that parameter if necessary. Also be aware that when indexing in parallel, each parallel process will allocate up to the index memory setting.
    What should it be set to? It's really a "safety" setting to prevent users from grabbing too much machine memory when creating indexes. If you don't have ad hoc users, just set it as high as you need. In 10.1 it was limited to just under 500M; in 10.2 you can set it to any value.
    The actual amount of memory used is not governed by this parameter, but by the MEMORY setting in the parameters clause of the CREATE INDEX statement. eg:
    create index fooindex on foo(bar) indextype is ctxsys.context parameters ('memory 1G')
    What's a good number to use for memory?  Somewhere in the region of 100M to 200M is usually good.
    3.  No - that's out of date.  To optimize your index use CTX_DDL.OPTIMIZE_INDEX.  You can do that in FULL mode daily or weekly, and REBUILD mode perhaps once a month.
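A rough Python sketch of the arithmetic behind point 2: since each parallel process can allocate up to the MEMORY setting, the worst case for an index build is roughly that setting times the degree of parallelism, which you can compare against your PGA limit. The helper names are hypothetical, and the size parser assumes K/M/G mean powers of 1024.

```python
UNITS = {"K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}


def parse_size(size: str) -> int:
    """Convert strings like '200M' or '1G' to bytes (binary units assumed)."""
    s = size.strip().upper()
    if s and s[-1] in UNITS:
        return int(s[:-1]) * UNITS[s[-1]]
    return int(s)


def worst_case_index_memory(memory: str, parallel_degree: int = 1) -> int:
    """Upper bound in bytes: each parallel process may use up to `memory`."""
    if parallel_degree < 1:
        raise ValueError("parallel_degree must be at least 1")
    return parse_size(memory) * parallel_degree


def fits(memory: str, parallel_degree: int, pga_limit: str) -> bool:
    """True if the worst case stays within the given PGA limit."""
    return worst_case_index_memory(memory, parallel_degree) <= parse_size(pga_limit)


print(worst_case_index_memory("200M", 4) // 1024 ** 2)  # 800 (MB)
print(fits("1G", 4, "2G"))                              # False
```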

  • About detection of entities using Oracle Text

    Hi,
    Can anyone explain how Oracle Text technologies can help me detect
    named entities, such as person names, country names, email addresses and others, in text?
    Is that possible?
    Thanks
    Edited by: user13420813 on 05-dic-2010 20:09
    Edited by: user13420813 on 05-dic-2010 20:12

    Hi,
    it is possible, but only starting with version 11.2.0.2, which is the latest version currently available. You can start by reading the manual section "Extracting entities in Oracle Text":
    http://download.oracle.com/docs/cd/E11882_01/text.112/e16594/entity.htm#sthref873
    Before that version, you have to do it yourself with the help of tokens from an Oracle Text index.
    Herald ten Dam
    http://htendam.wordpress.com

  • Oracle text (basic question about availability)

    Dear sirs,
    I want to confirm what I think to be true:
    Oracle Text is a standard, built-in component of Oracle 11g (Enterprise Edition). Anybody who is licensed
    to use Oracle 11g (Enterprise Edition) should be licensed to use Oracle Text and have access to this product.

    Hi,
    yes, that is true. In fact, all editions of Oracle Database include Oracle Text, and you may use it if you have licensed the database.
    Overview editions: http://www.oracle.com/us/products/database/enterprise-edition/comparisons/index.html
    Herald ten Dam
    http://htendam.wordpress.com
