Indexing accented words in Oracle Text

Hello.
I have some problems understanding how Oracle Text works with accented words.
I want to store French words encoded in UTF-8, for example the French word libération, which is stored as 'libÂ©ration' (UTF-8 conversion)
in the database. (Note that the database is UTF-8 encoded.)
begin
ctx_ddl.create_preference('doc_lexer_perigee', 'BASIC_LEXER');
ctx_ddl.set_attribute('doc_lexer_perigee', 'printjoins', '_-');
ctx_ddl.set_attribute('doc_lexer_perigee', 'BASE_LETTER', 'YES');
ctx_ddl.set_attribute('doc_lexer_perigee','index_themes','yes');
end;
Above is the definition of the lexer used when indexing French documents.
Below are some lines from the Oracle documentation:
base_letter
Specify whether characters that have diacritical marks (umlauts, cedillas, acute
accents, and so on) are converted to their base form before being stored in the Text
index. The default is NO (base-letter conversion disabled). For more information on
base-letter conversions and base_letter_type, see Base-Letter Conversion on
page 15-2.
According to what I understand from the above, the word 'libération', stored as 'libÂ©ration', should also be indexed as 'liberation'.
But when I search for documents containing the word 'liberation', Oracle finds no documents matching my query.
Is there anything I have misunderstood about base_letter conversion?

Update: I think I have found a solution to my problem (I changed the value of the NLS_LANG parameter). Things seem to work as I want now.
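A minimal sketch of how one might check for this kind of client/database character set mismatch. The table docs and its columns id and body (a VARCHAR2 column) are hypothetical names, not taken from the post. The byte values in the comments follow from UTF-8 itself: "é" is the two bytes c3 a9.

```sql
-- Database character set (the poster states it is UTF-8)
SELECT value
FROM   nls_database_parameters
WHERE  parameter = 'NLS_CHARACTERSET';

-- Inspect the stored bytes. Format 1016 prints the bytes in hex together
-- with the character set name.
-- A correctly stored "é" shows up as c3,a9.
-- The sequence c3,83,c2,a9 would suggest the text was converted twice,
-- which can happen when the client NLS_LANG does not match the encoding
-- the client actually sends.
SELECT id, DUMP(body, 1016) AS stored_bytes
FROM   docs                 -- hypothetical table
WHERE  id = 1;

-- Once the data is stored correctly and the index is rebuilt or synced,
-- a lexer with BASE_LETTER = YES should let the unaccented form match.
SELECT id
FROM   docs
WHERE  CONTAINS(body, 'liberation') > 0;
```

If the DUMP output already shows damaged bytes, changing the lexer alone is unlikely to help; the data would need to be reloaded with a matching NLS_LANG first.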

Similar Messages

"MS" reserved word in oracle text query?

Wondering if anyone has run into the string "MS" behaving as a reserved word in oracle text queries. For example, this specification returns all records from Texas:
'<query>
<textquery>
<progression>
<seq> TX WITHIN CUSTOMER_STATE </seq>
</progression>
</textquery>
</query>'
But this one does NOT find any results for Mississippi:
'<query>
<textquery>
<progression>
<seq> MS WITHIN CUSTOMER_STATE </seq>
</progression>
</textquery>
</query>'
I've confirmed we have data that should match, and I've tried escaping it with the sequences described in the SQL docs (I've tried single quotes, pairs of single quotes, braces, and combinations of those). Trying to find info on the web is tough, since all web queries that contain 'MS' bring back tons of Microsoft-related information.
Can anyone nudge me in the right direction for a better google-search, or some materials in these forums (my initial searches here didn't turn anything up either).
Thanks for any feedback!
jh

Wondering if anyone has run into the string "MS" behaving as a reserved word in oracle text queries.
Maybe because "MS" is in the default English stoplist? See:
English Default Stoplist.
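A hedged sketch of how the stoplist theory could be checked and worked around. The table customers, the column customer_doc, the index name and the section group customer_sg are hypothetical; CTX_STOPWORDS and CTXSYS.EMPTY_STOPLIST are standard Oracle Text objects.

```sql
-- List the stopwords of the default stoplist and look for MS
SELECT spw_word
FROM   ctx_stopwords
WHERE  spw_owner    = 'CTXSYS'
AND    spw_stoplist = 'DEFAULT_STOPLIST'
ORDER  BY spw_word;

-- One possible workaround: build the index with no stopwords at all,
-- keeping whatever section group the existing index already uses.
CREATE INDEX customers_txt_idx ON customers (customer_doc)   -- hypothetical names
  INDEXTYPE IS ctxsys.context
  PARAMETERS ('stoplist ctxsys.empty_stoplist section group customer_sg');
```

An empty stoplist makes the index larger, so a custom stoplist that simply omits the state abbreviations may be a better fit for big tables.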

In several lines of the index I need some words in bold text and others in italic text. How can I do this? It seems that I can only have the whole style either in bold or in italic :(

In several lines of the index I need some words in bold text and others in italic text. How can I do this? It seems that I can only edit a style as all bold or all italic.
Here is an example to make clear what I really need:
Index
Introduction
I. Leonardo's Monnalisa
II. Leonardo's Battaglia
Bibliography
Please HELP HELP HELP

What version of Pages are you referring to?
Basically if you are talking about the Table of Contents in Pages and want to have different character styles within paragraphs in the T.O.C. you will have to export the T.O.C. and bring it back in as text and change that.
Peter

About index memory parameter for Oracle text indexes

Hi Experts,
I am on Oracle 11.2.0.3 on Linux and have implemented Oracle Text. I am not an expert in this subject and need help with one issue. I created the Oracle Text indexes with the default settings. However, in an Oracle white paper I read that the default settings may not be right. Here is the excerpt from the white paper by Roger Ford:
URL:http://www.oracle.com/technetwork/database/enterprise-edition/index-maintenance-089308.html
"(Part of this white paper below....)
Index Memory
As mentioned above, cached $I entries are flushed to disk each time the indexing memory is exhausted. The default index memory at installation is a mere 12MB, which is very low. Users can specify up to 50MB at index creation time, but this is still pretty low.
This would be done by a CREATE INDEX statement something like:
CREATE INDEX myindex ON mytable(mycol) INDEXTYPE IS ctxsys.context PARAMETERS ('index memory 50M');
To allow index memory settings above 50MB, the CTXSYS user must first increase the value of the MAX_INDEX_MEMORY parameter, like this:
begin ctx_adm.set_parameter('max_index_memory', '500M'); end;
The setting for index memory should never be so high as to cause paging, as this will have a serious effect on indexing speed. On smaller dedicated systems, it is sometimes advantageous to temporarily decrease the amount of memory consumed by the Oracle SGA (for example by decreasing DB_CACHE_SIZE and/or SHARED_POOL_SIZE) during the index creation process. Once the index has been created, the SGA size can be increased again to improve query performance."
(End here from the white paper excerpt)
My question is:
1) Applying this procedure (ctx_adm.set_parameter) required me to log in as the CTXSYS user. Is that right, or can it be avoided and done from the application schema? The CTXSYS user is locked by default and I had to unlock it. Is that OK to do in production?
2) What value should I use for max_index_memory? Should it be 500 MB? My SGA is 2 GB in Dev/QA and 3 GB in production. Also, what value should I set for the index memory parameter at index creation? I had left it at the default, but how should I change it now? Should it be 50 MB as shown in the example above?
3) The white paper also refers to rebuilding an index at some interval, such as once a month: ALTER INDEX DR$index_name$X REBUILD ONLINE;
-- Is this correct advice? I would like to ask the experts before doing that. We are on Oracle 11g and the white paper was written in 2003.
Although I have read the paper, I am still not clear on several aspects and need help understanding them.
Thanks,
OrauserN

Perhaps it's time I updated that paper
1. To change max_index_memory you must be a DBA user OR ctxsys. As you say, the ctxsys account is locked by default. It's usually easiest to log in as a DBA and run something like
exec ctxsys.ctx_adm.set_parameter('MAX_INDEX_MEMORY', '10G')
2. Index memory is allocated from PGA memory, not SGA memory, so the size of the SGA is not relevant. If you use too high a setting, your index build may fail with an error saying you have exceeded PGA_AGGREGATE_LIMIT. Of course, you can increase that parameter if necessary. Also be aware that when indexing in parallel, each parallel process will allocate up to the index memory setting.
What should it be set to? It's really a "safety" setting to prevent users grabbing too much machine memory when creating indexes. If you don't have ad-hoc users, then just set it as high as you need. In 10.1 it was limited to just under 500M, in 10.2 you can set it to any value.
The actual amount of memory used is not governed by this parameter, but by the MEMORY setting in the parameters clause of the CREATE INDEX statement. eg:
create index fooindex on foo(bar) indextype is ctxsys.context parameters ('memory 1G')
What's a good number to use for memory? Somewhere in the region of 100M to 200M is usually good.
3. No - that's out of date. To optimize your index use CTX_DDL.OPTIMIZE_INDEX. You can do that in FULL mode daily or weekly, and REBUILD mode perhaps once a month.
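The maintenance calls mentioned in the reply could look roughly like this. This is a sketch only: the index name fooindex is reused from the example above, and the memory figure and schedule are placeholders rather than recommendations from the author.

```sql
-- Check the current limits
SELECT par_name, par_value
FROM   ctx_parameters
WHERE  par_name IN ('DEFAULT_INDEX_MEMORY', 'MAX_INDEX_MEMORY');

-- Sync pending changes with an explicit memory budget
EXEC ctx_ddl.sync_index('fooindex', '100M')

-- Routine optimization, for example daily or weekly
EXEC ctx_ddl.optimize_index('fooindex', 'FULL')

-- Heavier optimization, for example once a month
EXEC ctx_ddl.optimize_index('fooindex', 'REBUILD')
```

These calls run from the schema that owns the index (with execute privilege on CTX_DDL), so they do not require logging in as CTXSYS.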

How to find distinct words in Oracle Text index

We have a requirement to fetch all distinct words in the CLOB field for all records
and find the no. of records in which each word appears.
DR$<Index Name>$I table stores exactly such information. Is it ok to use this table in queries?
Are there any disadvantages in using it?
Help is very much appreciated.

The disadvantages:
With every index sync the contents will change.
With every release the structure may change without prior notice, which could break your application.
It is not advisable to use these tables in your applications. For your own investigation, you can always use their contents.
Thanks
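For the ad hoc investigation use mentioned above, a query along these lines might serve as a starting point. The index name MY_INDEX is hypothetical, and the counts should be treated as approximate: a token can be spread over several rows, and rows for deleted documents may linger until the index is optimized.

```sql
SELECT token_text,
       SUM(token_count) AS doc_count   -- a token may occupy several rows
FROM   dr$my_index$i                   -- $I table of hypothetical index MY_INDEX
WHERE  token_type = 0                  -- 0 = ordinary text tokens
GROUP  BY token_text
ORDER  BY doc_count DESC;
```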

Is the word "text" an operator or reserved word in Oracle Text?

DB: Oracle 10.2.0.3.0
I have created a CONTEXT index on a table which uses a Multi-Column Datastore on two columns (text and keywords)
When I perform this query, I'm getting back every row in the table even though none contain a value of "text" within the TEXT or KEYWORDS columns.
SELECT *
FROM tb_rh_links
WHERE CONTAINS(search_column, 'text') > 0
** UPDATE ** it does the same things when searching for word "keywords." I'm beginning to think it has something to do with the section groups...
SELECT *
FROM tb_rh_links
WHERE CONTAINS(search_column, 'keywords') > 0
Here is how the index and multi-column datastore is setup:
BEGIN
ctx_ddl.create_preference ('rh_lex', 'basic_lexer');
ctx_ddl.set_attribute ('rh_lex', 'index_stems', 'english');
ctx_ddl.create_preference('rh_links_mcds', 'MULTI_COLUMN_DATASTORE');
ctx_ddl.set_attribute('rh_links_mcds', 'columns', 'text, keywords');
END;
BEGIN
ctx_ddl.create_section_group('rh_links_sg','basic_section_group');
ctx_ddl.add_field_section(group_name => 'rh_links_sg',
section_name => 'text',
tag => 'text');
ctx_ddl.add_field_section(group_name => 'rh_links_sg',
section_name => 'keywords',
tag => 'keywords');
END;
CREATE INDEX TB_RH_LINKS_IDX5 ON TB_RH_LINKS (KEYWORDS)
INDEXTYPE IS ctxsys.CONTEXT
PARAMETERS('datastore rh_links_mcds section group rh_links_sg');
Edited by: Dishoom on Apr 20, 2009 11:07 AM

When you create a multi_column_datastore without specifying a delimiter, by default the column names are tokenized and indexed. If you specify the newline delimiter, then the column names are not tokenized. Or, once you have created sections and used the section group in your parameters, then the column names are not tokenized. You may have created your multi_column_datastore, then created your index, tokenized the column names, then added the sections, but the tokenized column names weren't dropped until you recreated the index. Please see the demonstration below.
SCOTT@orcl_11g> CREATE TABLE tb_rh_links
2    (text            VARCHAR2(10),
3      keywords       VARCHAR2(10),
4      search_column VARCHAR2( 1))
5 /
Table created.
SCOTT@orcl_11g> INSERT ALL
2 INTO tb_rh_links VALUES ('word1', 'word2', null)
3 INTO tb_rh_links VALUES ('word3', 'word4', null)
4 SELECT * FROM DUAL
5 /
2 rows created.
SCOTT@orcl_11g> BEGIN
2    ctx_ddl.create_preference ('rh_lex', 'basic_lexer');
3    ctx_ddl.set_attribute ('rh_lex', 'index_stems', 'english');
4    ctx_ddl.create_preference('rh_links_mcds', 'MULTI_COLUMN_DATASTORE');
5    ctx_ddl.set_attribute('rh_links_mcds', 'columns', 'text, keywords');
6 END;
7 /
PL/SQL procedure successfully completed.
SCOTT@orcl_11g> CREATE INDEX TB_RH_LINKS_IDX5 ON TB_RH_LINKS (search_column)
2 INDEXTYPE IS ctxsys.CONTEXT
3 PARAMETERS('datastore rh_links_mcds')
4 /
Index created.
SCOTT@orcl_11g> -- the column names are tokenized:
SCOTT@orcl_11g> COLUMN token_text FORMAT A10
SCOTT@orcl_11g> SELECT token_text, token_first, token_last, token_count
2 FROM   dr$tb_rh_links_idx5$i
3 /
TOKEN_TEXT TOKEN_FIRST TOKEN_LAST TOKEN_COUNT
KEYWORDS             1          2           2
TEXT                 1          2           2
WORD1                1          1           1
WORD2                1          1           1
WORD3                2          2           1
WORD4                2          2           1
6 rows selected.
SCOTT@orcl_11g> SELECT *
2 FROM tb_rh_links
3 WHERE CONTAINS(search_column, 'text') > 0
4 /
TEXT       KEYWORDS   S
word1      word2
word3      word4
SCOTT@orcl_11g> SELECT *
2 FROM tb_rh_links
3 WHERE CONTAINS(search_column, 'keywords') > 0
4 /
TEXT       KEYWORDS   S
word1      word2
word3      word4
SCOTT@orcl_11g> -- if you use the newline delimiter, the column names are not tokenized:
SCOTT@orcl_11g> BEGIN
2    CTX_DDL.SET_ATTRIBUTE ('rh_links_mcds', 'DELIMITER', 'NEWLINE');
3 END;
4 /
PL/SQL procedure successfully completed.
SCOTT@orcl_11g> DROP INDEX tb_rh_links_idx5
2 /
Index dropped.
SCOTT@orcl_11g> CREATE INDEX TB_RH_LINKS_IDX5 ON TB_RH_LINKS (search_column)
2 INDEXTYPE IS ctxsys.CONTEXT
3 PARAMETERS('datastore rh_links_mcds')
4 /
Index created.
SCOTT@orcl_11g> SELECT token_text, token_first, token_last, token_count
2 FROM   dr$tb_rh_links_idx5$i
3 /
TOKEN_TEXT TOKEN_FIRST TOKEN_LAST TOKEN_COUNT
WORD1                1          1           1
WORD2                1          1           1
WORD3                2          2           1
WORD4                2          2           1
SCOTT@orcl_11g> SELECT *
2 FROM tb_rh_links
3 WHERE CONTAINS(search_column, 'text') > 0
4 /
no rows selected
SCOTT@orcl_11g> SELECT *
2 FROM tb_rh_links
3 WHERE CONTAINS(search_column, 'keywords') > 0
4 /
no rows selected
SCOTT@orcl_11g> -- or, with the section groups, the column names are not tokenized:
SCOTT@orcl_11g> BEGIN
2    -- return to default:
3    CTX_DDL.SET_ATTRIBUTE ('rh_links_mcds', 'DELIMITER', 'COLUMN_NAME_TAG');
4 END;
5 /
PL/SQL procedure successfully completed.
SCOTT@orcl_11g> BEGIN
2    ctx_ddl.create_section_group('rh_links_sg','basic_section_group');
3    ctx_ddl.add_field_section(group_name => 'rh_links_sg',
4       section_name => 'text',
5       tag => 'text');
6    ctx_ddl.add_field_section(group_name => 'rh_links_sg',
7       section_name => 'keywords',
8       tag => 'keywords');
9 END;
10 /
PL/SQL procedure successfully completed.
SCOTT@orcl_11g> DROP INDEX tb_rh_links_idx5
2 /
Index dropped.
SCOTT@orcl_11g> CREATE INDEX TB_RH_LINKS_IDX5 ON TB_RH_LINKS (search_column)
2 INDEXTYPE IS ctxsys.CONTEXT
3 PARAMETERS('datastore rh_links_mcds section group rh_links_sg')
4 /
Index created.
SCOTT@orcl_11g> SELECT token_text, token_first, token_last, token_count
2 FROM   dr$tb_rh_links_idx5$i
3 /
TOKEN_TEXT TOKEN_FIRST TOKEN_LAST TOKEN_COUNT
WORD1                1          1           1
WORD2                1          1           1
WORD3                2          2           1
WORD4                2          2           1
SCOTT@orcl_11g> SELECT *
2 FROM tb_rh_links
3 WHERE CONTAINS(search_column, 'text') > 0
4 /
no rows selected
SCOTT@orcl_11g> SELECT *
2 FROM tb_rh_links
3 WHERE CONTAINS(search_column, 'keywords') > 0
4 /
no rows selected
SCOTT@orcl_11g>
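As a follow-up sketch to the demonstration: once the field sections exist, searches can be restricted to one column with the WITHIN operator. The queries below reuse the table, data and section names from the transcript; no output is shown because they were not run as part of the original demonstration.

```sql
-- Search only the TEXT column
SELECT *
FROM   tb_rh_links
WHERE  CONTAINS(search_column, 'word1 WITHIN text') > 0;

-- Search only the KEYWORDS column
SELECT *
FROM   tb_rh_links
WHERE  CONTAINS(search_column, 'word4 WITHIN keywords') > 0;
```

One thing to keep in mind: as far as I know, field sections are created with visible => FALSE by default, so an unqualified search such as CONTAINS(search_column, 'word1') may return nothing unless add_field_section is called with visible => TRUE.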

How to index ORDSYS.orddoc type using Oracle Text?

Dear All,
I am very new to Oracle Text and the Oracle interMedia ORDSYS.ORDDOC type.
As far as I know, it is impossible to index ORDSYS.ORDDOC using Oracle Text, so
is there any alternative way to index the ORDSYS.ORDDOC type using Oracle Text?
I am using the ORDDOC type because my application needs to let users upload various types of media files, such as audio, video, Word documents, etc.
Please help, as I need to do full-text search on those uploaded documents. Thanks in advance.
Best Regards,
Chin

Scoring in Oracle Text

Hi,
I am using Oracle Text features in my application. I have to calculate an accurate score for the search results relative to a search word in the search engine.
At present I use
select score(5) from xyztable where contains(columnname,searchword,6)scr(6) order by desc;
What is the significance of this number? I am extracting some 6 nodes from the XML column on which I have created the Oracle Text index.
Please shed some light on how to get the exact score of my search relative to my search word.
My present version of Oracle is 10g.
Thanks

http://download.oracle.com/docs/cd/B28359_01/text.111/b28304/ascore.htm#CCREF2307
"To calculate a relevance score for a returned document in a word query, Oracle Text uses an inverse frequency algorithm based on Salton's formula.
Inverse frequency scoring assumes that frequently occurring terms in a document set are noise terms, and so these terms are scored lower. For a document to score high, the query term must occur frequently in the document but infrequently in the document set as a whole.
The following table illustrates Oracle Text's inverse frequency scoring. The first column shows the number of documents in the document set, and the second column shows the number of terms in the document necessary to score 100:
Number of Documents in Document Set    Occurrences of Term in Document Needed To Score 100
1                                      34
5                                      20
10                                     17
50                                     13
100                                    12
500                                    10
1000                                   9
10,000                                 7
100,000                                5
1,000,000                              4 "
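On the number itself: the third argument of CONTAINS is only a label that ties a SCORE call to a particular CONTAINS call, so the two numbers must be the same. It has nothing to do with how many XML nodes are indexed. A minimal sketch, reusing the table and column names from the question and assuming the search word is passed as a literal:

```sql
SELECT SCORE(1) AS relevance, t.*
FROM   xyztable t
WHERE  CONTAINS(t.columnname, 'searchword', 1) > 0   -- label 1
ORDER  BY SCORE(1) DESC;                             -- same label 1
```

The label only matters when a statement has more than one CONTAINS clause; any number works as long as SCORE and CONTAINS agree.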

Oracle Text query: Escaping characters and specifying progression sequences

How can I combine the escaping of a search string and the specification of progression sequences within an oracle text query
so that in all cases the correct results are delivered (see example below)?
The scenario in which to use this is the following:
+ Database: Oracle Database 10g Enterprise Edition Release 10.2.0.2.0
+ Requirement: Hitlist of results ordered by score whereby the different part within
the result list are specified using progression sequences within oracle text query
Example:
create table service_provider (
id number,
name_c varchar(100),
uri_c varchar(255)
);
insert into service_provider values (1,'ABB Company Mgmt','http://www.abb-company-mgmt.de');
insert into service_provider values (2,'Dr. Abbas Ming','http://www.dr-abbas-ming.de');
insert into service_provider values (3,'SABBATA United','http://www.sabbata-united.de');
insert into service_provider values (4,'ABB','http://www.abb.de');
insert into service_provider values (5,'AND Company Mgmt','http://www.and-company-mgmt.de');
insert into service_provider values (6,'Dr. Andas Ming','http://www.dr-andas-ming.de');
insert into service_provider values (7,'SANDATA United','http://www.sandata-united.de');
insert into service_provider values (8,'AND','http://www.and.de');
Query 1: works correctly in this case
select * from (
select /*+ FIRST_ROWS */ score(1), this_.*
from service_provider this_
where
CONTAINS ( this_.NAME_C , '<QUERY><textquery grammar="CONTEXT">' ||
'<progression>' ||
'<seq>abb</seq>' ||
'<seq>abb%</seq>' ||
'<seq>%abb%</seq>' ||
'<seq>fuzzy(abb,1,100,WEIGHT)</seq>' ||
'</progression></textquery></QUERY>', 1 ) > 0
order by score(1) desc, this_.NAME_C
) where rownum < 21
delivers
76     4     ABB     http://www.abb.de
76     1     ABB Company Mgmt     http://www.abb-company-mgmt.de
51     2     Dr. Abbas Ming     http://www.dr-abbas-ming.de
26     3     SABBATA United     http://www.sabbata-united.de
Query 2: produces error
select * from (
select /*+ FIRST_ROWS */ score(1), this_.*
from service_provider this_
where
CONTAINS ( this_.NAME_C , '<QUERY><textquery grammar="CONTEXT">' ||
'<progression>' ||
'<seq>and</seq>' ||
'<seq>and%</seq>' ||
'<seq>%and%</seq>' ||
'<seq>fuzzy(and,1,100,WEIGHT)</seq>' ||
'</progression></textquery></QUERY>', 1 ) > 0
order by score(1) desc, this_.NAME_C
) where rownum < 21
produces ORA-29902, ORA-20000, DRG-50901 because AND is a reserved word in oracle text
So we need escaping ...
Query 3: does not work correctly
select * from (
select /*+ FIRST_ROWS */ score(1), this_.*
from service_provider this_
where
CONTAINS ( this_.NAME_C , '<QUERY><textquery grammar="CONTEXT">' ||
'<progression>' ||
'<seq>{abb}</seq>' ||
'<seq>{abb%}</seq>' ||
'<seq>{%abb%}</seq>' ||
'<seq>fuzzy({abb},1,100,WEIGHT)</seq>' ||
'</progression></textquery></QUERY>', 1 ) > 0
order by score(1) desc, this_.NAME_C
) where rownum < 21
delivers
76     4     ABB     http://www.abb.de
76     1     ABB Company Mgmt     http://www.abb-company-mgmt.de
Query 4: does not produce an error, but also does not work correctly
select * from (
select /*+ FIRST_ROWS */ score(1), this_.*
from service_provider this_
where
CONTAINS ( this_.NAME_C , '<QUERY><textquery grammar="CONTEXT">' ||
'<progression>' ||
'<seq>{and}</seq>' ||
'<seq>{and%}</seq>' ||
'<seq>{%and%}</seq>' ||
'<seq>fuzzy({and},1,100,WEIGHT)</seq>' ||
'</progression></textquery></QUERY>', 1 ) > 0
order by score(1) desc, this_.NAME_C
) where rownum < 21
delivers
76     8     AND     http://www.and.de
76     5     AND Company Mgmt     http://www.and-company-mgmt.de

Anywhere that you just use the word by itself, enclose it in {}, but anywhere that you add % on either side or both don't enclose it in {}. Please see the demonstration below.
SCOTT@10gXE> SELECT * FROM v$version
2 /
BANNER
Oracle Database 10g Express Edition Release 10.2.0.1.0 - Product
PL/SQL Release 10.2.0.1.0 - Production
CORE     10.2.0.1.0     Production
TNS for 32-bit Windows: Version 10.2.0.1.0 - Production
NLSRTL Version 10.2.0.1.0 - Production
SCOTT@10gXE> create table service_provider
2    (id     number,
3      name_c     varchar(100),
4      uri_c     varchar(255))
5 /
Table created.
SCOTT@10gXE> insert all
2 into service_provider values (1,'ABB Company Mgmt','http://www.abb-company-mgmt.de')
3 into service_provider values (2,'Dr. Abbas Ming','http://www.dr-abbas-ming.de')
4 into service_provider values (3,'SABBATA United','http://www.sabbata-united.de')
5 into service_provider values (4,'ABB','http://www.abb.de')
6 into service_provider values (5,'AND Company Mgmt','http://www.and-company-mgmt.de')
7 into service_provider values (6,'Dr. Andas Ming','http://www.dr-andas-ming.de')
8 into service_provider values (7,'SANDATA United','http://www.sandata-united.de')
9 into service_provider values (8,'AND','http://www.and.de')
10 into service_provider values (9,'EBB','fuzzy test')
11 into service_provider values (10,'OND','fuzzy test')
12 select * from dual
13 /
10 rows created.
SCOTT@10gXE> CREATE INDEX your_index
2 ON service_provider (name_c)
3 INDEXTYPE IS CTXSYS.CONTEXT
4 PARAMETERS ('STOPLIST CTXSYS.EMPTY_STOPLIST')
5 /
Index created.
SCOTT@10gXE> VARIABLE search_string VARCHAR2 (100)
SCOTT@10gXE> EXEC :search_string := 'abb'
PL/SQL procedure successfully completed.
SCOTT@10gXE> COLUMN name_c FORMAT A20 WORD_WRAPPED
SCOTT@10gXE> COLUMN uri_c FORMAT A40
SCOTT@10gXE> select *
2 from   (select /*+ FIRST_ROWS */ score(1), this_.*
3           from   service_provider this_
4           where CONTAINS
5                 (this_.NAME_C ,
6                  '<QUERY>
7                  <textquery grammar="CONTEXT">
8                    <progression>
9                      <seq>{'         || :search_string || '}</seq>
10                      <seq>'         || :search_string || '%</seq>
11                      <seq>%'         || :search_string || '%</seq>
12                      <seq>fuzzy({' || :search_string || '},1,100,WEIGHT)</seq>
13                    </progression>
14                 </textquery>
15                  </QUERY>', 1 ) > 0
16           order by score(1) desc, this_.NAME_C)
17 where rownum < 21
18 /
SCORE(1)         ID NAME_C               URI_C
        76          4 ABB                  http://www.abb.de
        76          1 ABB Company Mgmt     http://www.abb-company-mgmt.de
        51          2 Dr. Abbas Ming       http://www.dr-abbas-ming.de
        26          3 SABBATA United       http://www.sabbata-united.de
         4          9 EBB                  fuzzy test
SCOTT@10gXE> EXEC :search_string := 'and'
PL/SQL procedure successfully completed.
SCOTT@10gXE> /
SCORE(1)         ID NAME_C               URI_C
        76          8 AND                  http://www.and.de
        76          5 AND Company Mgmt     http://www.and-company-mgmt.de
        51          6 Dr. Andas Ming       http://www.dr-andas-ming.de
        26          7 SANDATA United       http://www.sandata-united.de
         5         10 OND                  fuzzy test
SCOTT@10gXE>
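The pattern from the demonstration can be wrapped in a small helper so the application only passes the raw search word. This is a sketch: the function name progression_query and the bind variable q are hypothetical, and the function does not guard against input that itself contains braces or other operator characters.

```sql
CREATE OR REPLACE FUNCTION progression_query (p_term IN VARCHAR2)
  RETURN VARCHAR2
IS
BEGIN
  -- Braces only around the bare word; no braces where % wildcards are used
  RETURN '<query><textquery grammar="CONTEXT"><progression>'
      || '<seq>{' || p_term || '}</seq>'
      || '<seq>'  || p_term || '%</seq>'
      || '<seq>%' || p_term || '%</seq>'
      || '<seq>fuzzy({' || p_term || '},1,100,WEIGHT)</seq>'
      || '</progression></textquery></query>';
END progression_query;
/

VARIABLE q VARCHAR2(4000)
EXEC :q := progression_query('and')

SELECT score(1), sp.*
FROM   service_provider sp
WHERE  CONTAINS(sp.name_c, :q, 1) > 0
ORDER  BY score(1) DESC, sp.name_c;
```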

Process for Oracle Text

I am working as a DBA and we plan to introduce Oracle Text for text search. Since I am new to this concept, I would like to know the step-by-step implementation of Oracle Text. I've searched some websites but am still not clear on the implementation part.
Please help me out.

Hi,
Oracle Text is included in both standard and enterprise editions of the data server. When you are creating your database, select Oracle Text during configuration (one of the options). You will then have Oracle Text available on your database. The schema name is CTXSYS. You need to unlock this account just like any other.
To use Text, either grant permissions on the specific objects you need for the user, or use the CTXAPP role. It is up to you to know the permissions required for the objects (in other words, I can't tell you your requirements), so research this in the reference manual.
At this point, it is ready to use. Just create your indexes according to the Oracle Text Developer's Guide (you can find this with all of the documentation - look at the Application Developer's tab in Doc Library). Your search syntax depends totally on your requirements, and the type of index you choose to create. For example, the CONTEXT index uses the CONTAINS operator, and the CTXCAT index uses CATSEARCH (unless of course you want to use templates, but let's not go there just yet...).
There are two references you will want to review: The Oracle Text Developer's Guide and the Oracle Text Reference.
Hope this helps,
Ron
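The steps described above might look like this in practice. It is a minimal sketch: the user app_user, the table articles and the index name are hypothetical, and a real setup would add lexer, stoplist and storage preferences as needed.

```sql
-- 1. As a DBA: allow the application schema to use Oracle Text
GRANT CTXAPP TO app_user;                       -- app_user is hypothetical
GRANT EXECUTE ON ctxsys.ctx_ddl TO app_user;

-- 2. As app_user: a table with a text column
CREATE TABLE articles (
  id    NUMBER PRIMARY KEY,
  body  CLOB
);
INSERT INTO articles VALUES (1, 'Oracle Text adds full-text search to the database');
COMMIT;

-- 3. Create a CONTEXT index with default preferences
CREATE INDEX articles_body_idx ON articles (body)
  INDEXTYPE IS ctxsys.context;

-- 4. Query with CONTAINS
SELECT id, SCORE(1) AS relevance
FROM   articles
WHERE  CONTAINS(body, 'search', 1) > 0;

-- 5. A CONTEXT index is not updated automatically by default; sync after DML
EXEC ctx_ddl.sync_index('articles_body_idx')
```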

[Oracle Text]How to register additional datas when indexing documents ?

Hello,
For the moment we index documents (Word, Excel, PDF, PPT, HTML, XML...) from the filesystem and it works well.
Now we need to attach some information to each document, and we must be able to search on these attributes. For instance:
We index a Word document and would like some additional index information such as:
YEAR
SIZE
NUMBER
This information is stored in a table, which also contains the path to each document on the filesystem.
We are able to run a text query on the index combined with a filter on the columns above.
We thought of storing this information directly in the index, but we don't know if that is a good solution (in terms of speed, structure...).
So, is there any way to index the documents on the filesystem with extra information at index time?
Is it possible? How can we do that?
What do you think about that?
Thanks in advance
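For reference, the mixed query the poster describes (text search plus relational filters) could be sketched as follows. All names are hypothetical; in particular the columns are called doc_year, doc_size and doc_number because YEAR, SIZE and NUMBER are awkward or reserved as column names. The text index is assumed to be on doc_path with a file-based datastore.

```sql
SELECT d.doc_path,
       d.doc_year,
       SCORE(1) AS relevance
FROM   documents d                           -- hypothetical table
WHERE  CONTAINS(d.doc_path, 'invoice', 1) > 0
AND    d.doc_year = 2008
AND    d.doc_size < 1048576
ORDER  BY SCORE(1) DESC;
```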

1. If you're using 12c, you can use ctx_doc.policy_languages. (https://docs.oracle.com/database/121/CCREF/cdocpkg011.htm#CCREF24102)
2. If you want multiple stoplists based on each document's language, you have to use the multi-lexer. For world_lexer, there is one stoplist; since the stoplists are somewhat dynamic (you can add but not remove them), the most accurate way to fetch the list is using ctx_report.describe_index or ctx_report.create_index_script and parse the report.

Index rules in oracle text and query using matches

Dear All,
I would like to ask about rules and matches function in oracle text.
I followed an example in oracle text application developer's guide.
I have a rule table like this :
1 oracle
2 larry or ellison
3 oracle and text
4 market share
then, I create an index to that table. This is needed for calling matches function. Here is the syntax :
create index queryx on queries(query_string)
indextype is ctxsys.ctxrule;
then, I noticed that the result on DR$QUERYX$I table as follows :
LARRY 0 2 2 1 (BLOB)
MARKET 0 4 4 1 (BLOB) {MARKET} {SHARE}
ORACLE 0 1 1 1 (BLOB)
ORACLE 0 3 3 1 (BLOB) {TEXT}
ELLISON 0 2 2 1 (BLOB)
What I want to ask is: why don't the words 'share' and 'text' appear in the DR$QUERYX$I table?
When we use the MATCHES function, it searches the index, and consequently it won't find the word 'share'. So when, for example, I run a query like this:
select query_id from queries where matches(query_string,' It only share ten percent of all products sold')>0
it returns no rows, since no word in ' It only share ten percent of all products sold' is in the index table. But it could arguably be assigned to category 4, whose rule is 'market share'.
I tried this on a larger set of data and got the same result.
Here is my generated rules from my document collection :
1 {REQUIREMENTS} & {ELICITATION}
1 {REQUIREMENTS} ~ {ELICITATION} & {ACTOR}
1 {REQUIREMENTS} ~ {ELICITATION} ~ {ACTOR} & {FURPS}
1 {REQUIREMENTS} ~ {ELICITATION} ~ {ACTOR} ~ {FURPS} ~ {OUTLINE}
1 {REQUIREMENTS} ~ {ELICITATION} ~ {ACTOR} ~ {FURPS} ~ {OUTLINE} & {PROC}
1 {REQUIREMENTS} ~ {ELICITATION} ~ {ACTOR} ~ {FURPS} ~ {OUTLINE} ~ {PROC} & {SPEED}
1 {REQUIREMENTS} ~ {ELICITATION} ~ {ACTOR} ~ {FURPS} ~ {OUTLINE} ~ {PROC} ~ {SPEED} & {DOCUME}
1 {REQUIREMENTS} ~ {ELICITATION} ~ {ACTOR} ~ {FURPS} ~ {OUTLINE} ~ {PROC} ~ {SPEED} ~ {DOCUME} & {PLACED}
1 {REQUIREMENTS} ~ {ELICITATION} ~ {ACTOR} ~ {FURPS} ~ {OUTLINE} ~ {PROC} ~ {SPEED} ~ {DOCUME} ~ {PLACED} & {UNNECESSARY}
1 {REQUIREMENTS} ~ {ELICITATION} ~ {ACTOR} ~ {FURPS} ~ {OUTLINE} ~ {PROC} ~ {SPEED} ~ {DOCUME} ~ {PLACED} ~ {UNNECESSARY} & {MISUSE}
1 {INTERPRETATION} ~ {REQUIREMENTS}
2 {DESIGN} & {REPRESENTATION}
2 {DESIGN} ~ {REPRESENTATION} & {MAY} & {FOUNDATI} & {OCTOBER}
2 {DESIGN} ~ {REPRESENTATION} & {MAY} & {FOUNDATI} ~ {OCTOBER} & {PROCEDURAL}
2 {DESIGN} ~ {REPRESENTATION} & {MAY} & {FOUNDATI} ~ {OCTOBER} ~ {PROCEDURAL} & {STRICT}
2 {DESIGN} ~ {REPRESENTATION} & {MAY} & {FOUNDATI} ~ {OCTOBER} ~ {PROCEDURAL} ~ {STRICT} & {GRASP}
2 {DESIGN} ~ {REPRESENTATION} & {MAY} & {FOUNDATI} ~ {OCTOBER} ~ {PROCEDURAL} ~ {STRICT} ~ {GRASP} & {MANY} & {LAYER}
2 {DESIGN} ~ {REPRESENTATION} ~ {MAY}
3 {PM} & {TESTING} & {ATTRIBUTI}
And this is the index table result with ctxrule :
(only the token_text column shown)
PM
DESIGN
DESIGN
DESIGN
DESIGN
DESIGN
DESIGN
DESIGN
REQUIREMENTS
REQUIREMENTS
REQUIREMENTS
REQUIREMENTS
REQUIREMENTS
REQUIREMENTS
REQUIREMENTS
REQUIREMENTS
REQUIREMENTS
REQUIREMENTS
INTERPRETATION
So when I try to classify a document containing the word 'outline', it should produce category 1 (based on the rules), but since there is no word 'outline' in the index table, MATCHES returns 0, meaning that the document is not classified into any category. I don't understand why this happens. Does anybody know about this? I would really appreciate any help.
Thank you very much.

Hm, I see. It does make sense. Nice to know.
But then in the second example I gave, where I used a larger table, as shown below:
Here is my generated rules from my document collection :
1 {REQUIREMENTS} & {ELICITATION}
1 {REQUIREMENTS} ~ {ELICITATION} & {ACTOR}
1 {REQUIREMENTS} ~ {ELICITATION} ~ {ACTOR} & {FURPS}
1 {REQUIREMENTS} ~ {ELICITATION} ~ {ACTOR} ~ {FURPS} ~ {OUTLINE}
1 {REQUIREMENTS} ~ {ELICITATION} ~ {ACTOR} ~ {FURPS} ~ {OUTLINE} & {PROC}
1 {REQUIREMENTS} ~ {ELICITATION} ~ {ACTOR} ~ {FURPS} ~ {OUTLINE} ~ {PROC} & {SPEED}
1 {REQUIREMENTS} ~ {ELICITATION} ~ {ACTOR} ~ {FURPS} ~ {OUTLINE} ~ {PROC} ~ {SPEED} & {DOCUME}
1 {INTERPRETATION} ~ {REQUIREMENTS}
2 {DESIGN} & {REPRESENTATION}
2 {DESIGN} ~ {REPRESENTATION} & {MAY} & {FOUNDATI} & {OCTOBER}
2 {DESIGN} ~ {REPRESENTATION} & {MAY} & {FOUNDATI} ~ {OCTOBER} & {PROCEDURAL}
2 {DESIGN} ~ {REPRESENTATION} & {MAY} & {FOUNDATI} ~ {OCTOBER} ~ {PROCEDURAL} & {STRICT}
2 {DESIGN} ~ {REPRESENTATION} ~ {MAY}
3 {PM} & {TESTING} & {ATTRIBUTI}
As far as I know, the sign ' ~ ' means 'OR' and '&' means 'AND'. So based on the 4th line in my table:
1 {REQUIREMENTS} ~ {ELICITATION} ~ {ACTOR} ~ {FURPS} ~ {OUTLINE}
it can be concluded that if any of the words stated there is queried, category '1' will appear as a result. But before we can use MATCHES to query it, we need to create an index on the rules table. I did so and the result was:
(only the token_text column shown)
PM
DESIGN
DESIGN
DESIGN
DESIGN
DESIGN
DESIGN
DESIGN
REQUIREMENTS
REQUIREMENTS
REQUIREMENTS
REQUIREMENTS
REQUIREMENTS
REQUIREMENTS
REQUIREMENTS
REQUIREMENTS
REQUIREMENTS
REQUIREMENTS
INTERPRETATION
There were no words other than PM, DESIGN, REQUIREMENTS and INTERPRETATION. Why don't the other words in the rules, such as ELICITATION, ACTOR, FURPS and OUTLINE, appear in the index result?
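
For reference, the Oracle Text query operator documentation lists '&' as AND, '|' as OR and '~' as NOT, so the rule on the 4th line may not behave as an OR list. A minimal sketch for testing one document against the rules, assuming a hypothetical rules table doc_rules(cat_id, rule_query); the table, column and index names are illustrative and not taken from the post:

```sql
-- Hypothetical names: doc_rules, cat_id, rule_query, doc_rules_idx.
-- A CTXRULE index stores the rule queries so that MATCHES can use them.
CREATE INDEX doc_rules_idx ON doc_rules(rule_query)
  INDEXTYPE IS ctxsys.ctxrule;

-- Return every category whose rule matches the given document text.
SELECT cat_id
FROM doc_rules
WHERE MATCHES(rule_query, 'The outline of the requirements document') > 0;
```

Running the MATCHES query with a short test string for each word is a quick way to see which rules fire, independently of what the token table shows.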

Using Oracle Text with MS WORD

Hi,
We have just installed Oracle Text and we want to use it to index hundreds of MS Word documents that are in the same directory. But I could not find a document or example about indexing/filtering Word documents. I would be grateful if you could help me find one.
Thanks..

You could specify INSO_FILTER for the FILTER preference in the command that creates the index (see
http://otn.oracle.com/products/text/x/samples/indexing/filters/inso_filter/inso_filter_idx.sql)
or use a USER filter.
For an example, see also
http://otn.oracle.com/products/text/x/samples/indexing/filters/INSO_Filter/index.html
(for loading the data you could use any tool other than SQL*Loader).
Regards, Victor.
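
As a rough illustration of the suggestion above, the following sketch indexes Word files that stay on the file system of the database server. The table, preference and index names, the file name and the directory path are all assumptions made for the example:

```sql
-- Hypothetical names: word_docs, word_docs_ds, word_docs_idx, /data/word_docs.
CREATE TABLE word_docs (
  id       NUMBER PRIMARY KEY,
  filename VARCHAR2(200)   -- file name, resolved against the datastore PATH
);

INSERT INTO word_docs VALUES (1, 'report.doc');
COMMIT;

BEGIN
  -- FILE_DATASTORE reads each document from the file named in the indexed column.
  ctx_ddl.create_preference('word_docs_ds', 'FILE_DATASTORE');
  ctx_ddl.set_attribute('word_docs_ds', 'PATH', '/data/word_docs');
END;
/

-- The filter converts the binary Word format to text before indexing.
CREATE INDEX word_docs_idx ON word_docs(filename)
  INDEXTYPE IS ctxsys.context
  PARAMETERS ('datastore word_docs_ds filter ctxsys.inso_filter');

-- Search the contents of the files through the indexed column.
SELECT id, filename
FROM word_docs
WHERE CONTAINS(filename, 'budget') > 0;
```

The files must be readable by the database server process, since the indexing runs on the server and not on the client.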

Using Oracle Text to search through WORD, EXCEL and PDF documents

Hello again,
What I would like to know is this: if I have a Word or PDF document stored in a table, is it possible to use Oracle Text to search through the actual Word or PDF document?
Thanks
Doug

Yes, you can do context-sensitive searches on both PDF and Word docs. With PDFs you need to make sure they contain text and not images. Some scanners will create PDFs that are nothing more than images of the document.
Below is a code sample that I made some time back to demonstrate the searching capabilities of Oracle Text. Note that the example makes use of the inso_filter, which is no longer shipped with Oracle beginning with patch set 10.1.0.4. See MetaLink note 298017.1 for the changes. See the following link for more information on developing with Oracle Text.
http://download-west.oracle.com/docs/cd/B14117_01/text.101/b10729/toc.htm
begin example.
-- The following needs to be executed
-- as sys.
DROP DIRECTORY docs_dir;
CREATE OR REPLACE DIRECTORY docs_dir
AS 'C:\sql\oracle_text\documents';
GRANT READ ON DIRECTORY docs_dir TO text;
-- End sys ran SQL
DROP TABLE db_docs CASCADE CONSTRAINTS PURGE;
CREATE TABLE db_docs (
id NUMBER,
format VARCHAR2(10),
location VARCHAR2(50),
document BLOB,
CONSTRAINT i_db_docs_p PRIMARY KEY(id)
);
-- Several notes need to be made about this anonymous block.
-- First the 'DOCS_DIR' parameter is a directory object name.
-- This directory object name must be in upper case.
DECLARE
f_lob BFILE;
b_lob BLOB;
document_name VARCHAR2(50);
BEGIN
document_name := 'externaltables.doc';
INSERT INTO db_docs
VALUES (1, 'binary', 'C:\sql\oracle_text\documents\externaltables.doc', empty_blob())
RETURN document INTO b_lob;
f_lob := BFILENAME('DOCS_DIR', document_name);
DBMS_LOB.FILEOPEN(f_lob, DBMS_LOB.FILE_READONLY);
DBMS_LOB.LOADFROMFILE(b_lob, f_lob, DBMS_LOB.GETLENGTH(f_lob));
DBMS_LOB.FILECLOSE(f_lob);
COMMIT;
END;
/
-- Build the index.
-- Note that this index differs from the one for files stored on the
-- file system in that the datastore parameter is ctxsys.default_datastore
-- and not ctxsys.file_datastore. FILE_DATASTORE is for documents that
-- exist on the file system. DEFAULT_DATASTORE is for documents
-- that are stored in the column.
create index db_docs_ctx on db_docs(document)
indextype is ctxsys.context
parameters (
'datastore ctxsys.default_datastore
filter ctxsys.inso_filter
format column format');
--search for something that is known to not be in the document.
SELECT SCORE(1), id, location
FROM db_docs
WHERE CONTAINS(document, 'Jenkinson', 1) > 0;
--search for something that is known to be in the document.
SELECT SCORE(1), id, location
FROM db_docs
WHERE CONTAINS(document, 'Albright', 1) > 0;
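
Since the example relies on inso_filter, here is a hedged variant of the index creation for releases that provide ctxsys.auto_filter instead (that preference name appears in another post on this page). Whether it is available depends on the database version, so treat this as a sketch and not a drop-in replacement:

```sql
-- Assumption: the release provides the ctxsys.auto_filter preference.
-- The table and index names are the ones from the example above.
DROP INDEX db_docs_ctx;

CREATE INDEX db_docs_ctx ON db_docs(document)
  INDEXTYPE IS ctxsys.context
  PARAMETERS ('datastore ctxsys.default_datastore
               filter ctxsys.auto_filter
               format column format');
```

The two CONTAINS queries from the example can then be run unchanged against the rebuilt index.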

Problem full-text in blob column index created using Oracle Text

Hi,
I'm running Oracle Database 10g 10.2 on Solaris.
I configured Oracle Text. Searching a VARCHAR2 column works fine, but searching a BLOB column does not.
I have a table with a BLOB column that contains documents. I load the documents with Oracle UCM (Stellent).
My index script is:
CREATE INDEX ORAUCM.FT_IDCCOLL1 ON ORAUCM.IDCCOLL1
(DDOCFULLTEXT)
INDEXTYPE IS CTXSYS.CONTEXT
PARAMETERS('DATASTORE CTXSYS.DEFAULT_DATASTORE FILTER CTXSYS.AUTO_FILTER FORMAT COLUMN DFULLTEXTFORMAT CHARSET
COLUMN DFULLTEXTCHARSET LEXER OCS_IDCCOLL1_LEXER SYNC (ON COMMIT)')
NOPARALLEL;
And my select returns 0 rows, although there should be many matching documents:
SELECT IdcColl1.dID, dDocName, dDocTitle, dDocType, dRevisionID, dSecurityGroup, dDocAuthor,
dDocAccount, dRevLabel, dFormat, dOriginalName, dExtension, dWebExtension, dInDate, dOutDate,
dPublishType, dRendition1, dRendition2, VaultFileSize, WebFileSize, URL, dFullTextFormat,
dFullTextCharset, DocMeta.*
FROM IdcColl1, DocMeta
WHERE IdcColl1.dID=DocMeta.dID AND (CONTAINS(dDocFullText,'SUBIR') > 0 )
ORDER BY dInDate Desc
Thanks in advance.

Thank you for your answer.
To answer your questions:
- Yes, DDOCFULLTEXT is a BLOB column.
- The documents are Word, Excel, whatever. We load the documents with UCM (Universal Content Management)
because I need full-text search from the UCM tool.
- Yes, the Word document contains 'subir'.
- select * from CTX_USER_INDEX_ERRORS ;
No rows returned.
- SELECT TOKEN_TEXT FROM DR$FT_IDCCOLL1$I
No rows returned.
- I tried creating a simplified index and it doesn't work.
I tried creating a table and a context index on Oracle 10.2.0.3 (test database) and it works OK.
I compared both contexts (test database and UCM database) and saw one difference:
In the UCM database there are these preferences ("analyze text"):
BEGIN ctx_ddl.create_preference('ORAUCM.', 'WORLD_LEXER'); end;
BEGIN ctx_ddl.create_preference('ORAUCM.', 'DETAIL_DATASTORE'); end;
I don't know whether this is an important difference or not.
Please tell me if you need more information.
Thanks for your time.
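
An empty token table with no index errors can also mean that the rows were never processed or were skipped. A few checks that might narrow this down, run as the index owner; the index and column names are taken from the script above, and the interpretation in the comments is an assumption about this setup, not a diagnosis:

```sql
-- Rows still waiting to be indexed for the current user's indexes.
SELECT pnd_index_name, COUNT(*) AS pending_rows
FROM ctx_user_pending
GROUP BY pnd_index_name;

-- Process any pending rows manually instead of waiting for SYNC (ON COMMIT).
BEGIN
  ctx_ddl.sync_index('FT_IDCCOLL1');
END;
/

-- With FORMAT COLUMN, rows whose format value is IGNORE are skipped by the
-- filter, so check which values the column actually holds.
SELECT dFullTextFormat, COUNT(*) AS docs
FROM IdcColl1
GROUP BY dFullTextFormat;

-- Re-check for filtering errors and for tokens after the sync.
SELECT err_index_name, err_textkey, err_text
FROM ctx_user_index_errors
ORDER BY err_timestamp DESC;

SELECT COUNT(*) AS token_rows FROM dr$ft_idccoll1$i;
```

If the format column only contains values other than TEXT or BINARY, that would be one possible explanation for an empty index without any logged errors.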
