Oracle Text word count in 10g?

Given a CLOB column full of text in 10g and a particular word, is there a way to return the frequency (word count) of this word in the documents searched? Not a count of the records returned, but an actual count of the number of times the word occurs in the documents searched. I couldn't find how to do this in the Oracle Text documentation. It seems like it would be a simple operation, so I may be looking in the wrong place. Any tips?

In 10g, you can specify algorithm="COUNT" within a query template. I have demonstrated this in 11g below, but I have used it in 10g previously and it is in the 10g documentation.
SCOTT@orcl_11g> drop table t;
Table dropped.
SCOTT@orcl_11g>
SCOTT@orcl_11g> create table t (id varchar2(20) primary key, text varchar2(2000));
Table created.
SCOTT@orcl_11g>
SCOTT@orcl_11g> insert into t values ('1', 'the cat cat cat dog dog sat on the big brown mat');
1 row created.
SCOTT@orcl_11g> insert into t values ('2', 'the big brown mat sat on the big brown mat');
1 row created.
SCOTT@orcl_11g>
SCOTT@orcl_11g> create index ti on t(text) indextype is ctxsys.context;
Index created.
SCOTT@orcl_11g>
SCOTT@orcl_11g> variable search_string varchar2(10)
SCOTT@orcl_11g> exec :search_string := 'cat'
PL/SQL procedure successfully completed.
SCOTT@orcl_11g> select :search_string, id, score (0) as frequency_count
2 from   t
3 where contains
4            (text,
5             '<query>
6             <textquery lang="ENGLISH" grammar="CONTEXT">'
7             || :search_string ||
8            '</textquery>
9             <score datatype="INTEGER" algorithm="COUNT"/>
10           </query>',
11             0) > 0
12 /
:SEARCH_STRING                   ID                   FREQUENCY_COUNT
cat                              1                                  3
SCOTT@orcl_11g> exec :search_string := 'cat dog'
PL/SQL procedure successfully completed.
SCOTT@orcl_11g> /
:SEARCH_STRING                   ID                   FREQUENCY_COUNT
cat dog                          1                                  1
SCOTT@orcl_11g> exec :search_string := 'brown mat'
PL/SQL procedure successfully completed.
SCOTT@orcl_11g> /
:SEARCH_STRING                   ID                   FREQUENCY_COUNT
brown mat                        1                                  1
brown mat                        2                                  2
SCOTT@orcl_11g>
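
To make the counts in the transcript easier to interpret, here is a minimal Python sketch of what they appear to represent: the number of times the search word, or the whole phrase, occurs in each document. It is an illustration only and not how Oracle Text computes scores. The function name phrase_count and the simple whitespace tokenizer are assumptions made for this example, and stopword handling is ignored.

```python
def phrase_count(text, phrase):
    """Count occurrences of a word or multi-word phrase in text (case-insensitive)."""
    tokens = text.lower().split()
    words = phrase.lower().split()
    if not words:
        return 0
    n = len(words)
    # Slide a window of n tokens over the text and compare it with the phrase.
    return sum(1 for i in range(len(tokens) - n + 1) if tokens[i:i + n] == words)


# The two sample rows from the transcript above.
docs = {
    "1": "the cat cat cat dog dog sat on the big brown mat",
    "2": "the big brown mat sat on the big brown mat",
}

for term in ("cat", "cat dog", "brown mat"):
    for doc_id, text in docs.items():
        count = phrase_count(text, term)
        if count > 0:
            print(term, doc_id, count)
```

With the sample rows this prints the same (search string, id, count) combinations as the SQL*Plus session, which suggests that a multi-word search string is counted as a phrase rather than word by word.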

Similar Messages

Creating a word-frequency histogram from an Oracle Text index

I need to build a word-frequency histogram from an Oracle Text index: a list of the tokens in the index, where each token carries the list of documents that contain it and the frequency of the token in each document. Does anybody know how to get this data out of an Oracle Text index so that the result can be saved to a table or to a text file?

You can use ctx_report.token_info to decipher the token_info column, but I don't think the report format that it produces is what you want. You can use a query template and specify algorithm=count to obtain the number of times a token appears in the indexed column. You can do that for every token by using the dr$...$i table, as shown below. Formatting is preserved by prefacing the code with pre enclosed in square brackets on the line above all of the code and /pre in square brackets on the line below all of the code.
SCOTT@10gXE> create table otntest
2    (doc_id       number primary key,
3      document varchar2(100))
4 /
Table created.
SCOTT@10gXE> insert all
2 into otntest values (1, 'This is a test for generating a histogram')
3 into otntest values (2, 'Histogram shows the list of documents that contain that token and frequency')
4 into otntest values (3, 'frequency histogram frequency histogram frequency')
5 select * from dual
6 /
3 rows created.
SCOTT@10gXE> create index otntest_ctx_idx
2 on otntest(document)
3 indextype is ctxsys.context
4 /
Index created.
SCOTT@10gXE> column token_text format a30
SCOTT@10gXE> select t.doc_id, i.token_text, score (1) as token_count
2 from   otntest t,
3          (select distinct token_text
4           from   dr$otntest_ctx_idx$i) i
5           where contains
6                 (document,
7                  '<query>
8                  <textquery grammar="CONTEXT">'
9                  || i.token_text ||
10                  '</textquery>
11                  <score datatype="INTEGER" algorithm="COUNT"/>
12                  </query>',
13                  1) > 0
14 order by doc_id, token_text
15 /
    DOC_ID TOKEN_TEXT                     TOKEN_COUNT
         1 GENERATING                               1
         1 HISTOGRAM                                1
         1 TEST                                     1
         2 CONTAIN                                  1
         2 DOCUMENTS                                1
         2 FREQUENCY                                1
         2 HISTOGRAM                                1
         2 LIST                                     1
         2 SHOWS                                    1
         2 TOKEN                                    1
         3 FREQUENCY                                3
         3 HISTOGRAM                                2
12 rows selected.
SCOTT@10gXE>
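
For comparison, the same kind of histogram can be sketched outside the database. The following Python example is only an approximation under stated assumptions: STOPWORDS is a hypothetical stoplist containing just the stopwords that occur in the three sample rows, the tokenizer is a plain whitespace split, and token_histogram is a name invented for this example. It does not read the dr$...$i table.

```python
from collections import Counter

# Hypothetical stoplist: only the stopwords that occur in the sample rows.
STOPWORDS = {"this", "is", "a", "for", "the", "of", "that", "and"}


def token_histogram(docs):
    """Return {doc_id: Counter({TOKEN: count})}, skipping stopwords."""
    histogram = {}
    for doc_id, text in docs.items():
        tokens = [w.upper() for w in text.split() if w.lower() not in STOPWORDS]
        histogram[doc_id] = Counter(tokens)
    return histogram


# The three sample rows from the otntest table above.
docs = {
    1: "This is a test for generating a histogram",
    2: "Histogram shows the list of documents that contain that token and frequency",
    3: "frequency histogram frequency histogram frequency",
}

hist = token_histogram(docs)
for doc_id in sorted(hist):
    for token in sorted(hist[doc_id]):
        print(doc_id, token, hist[doc_id][token])
```

Each printed line corresponds to one (doc_id, token_text, token_count) row, so the result could be written to a text file or inserted into a table in the same shape as the query output.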

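Manual installation and deinstallation of Oracle Text 10g

Product: ORACLE SERVER
Date written: 2006-03-17
Manual installation and deinstallation of Oracle Text 10g
===================================
PURPOSE
This document describes how to install Oracle Text 10gR1 manually, verify the installation, and deinstall it.
The information is intended for database administrators and technical support staff who configure Text
in Oracle 10g Release 1 (10.1.0.2).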
Explanation
* Notes
If the Oracle database was created with the Database Configuration Assistant (DBCA), Text is
installed by default and the procedure described below is not needed.
Oracle Text is available in all database editions (Oracle Database Standard Edition One,
Oracle Database Standard Edition (SE), Oracle Database Enterprise Edition (EE),
Oracle Database Personal Edition) at no additional license cost.
With Oracle Database Enterprise Edition (EE), it is advisable to enable the Oracle Data Mining (ODM)
option before installing Oracle Text. Doing so makes the SVM classifier and KMEANS clustering features
available. Other features, such as the RULE classifier and TEXTK clustering, work even when ODM is
not installed.
* Installation procedure
If the database was created manually, or Text is to be installed separately after the database was created,
follow the steps below.
Note: in SQL*Plus, '?' is used in place of $ORACLE_HOME.
1. Connect to SQL*Plus as SYSDBA and call the script as follows. This creates the Text dictionary in the
CTXSYS schema.
SQL> connect SYS/password as SYSDBA
SQL> spool text_install.txt
SQL>@?/ctx/admin/catctx.sql CTXSYS SYSAUX TEMP LOCK
In the command above:
CTXSYS - password of the ctxsys user
SYSAUX - default tablespace of the ctxsys user
TEMP - temporary tablespace of the ctxsys user
LOCK|NOLOCK - whether the ctxsys user account is to be locked or unlocked
2. When the step above has finished, create the default preferences appropriate for your language.
The language-specific default preferences supported by Oracle Text are in the /ctx/admin/defaults directory.
Examples include English (US), Danish (DK), Dutch (NL), Finnish (SF), French (F), German (D), Italian (IT),
Portuguese (PT), Spanish (E), Swedish (S) and others.
These scripts have file names of the form drdefXX.sql, where XX is the code of the language you want
to use. For example, to install the US default preferences manually, log in to sqlplus as the CTXSYS
account and run 'drdefus.sql'.
For Korean, to install the KO default preferences manually, log in to sqlplus as the CTXSYS
account and run 'drdefko.sql' as follows.
SQL> connect CTXSYS/password
SQL>@?/ctx/admin/defaults/drdefko.sql
SQL> spool off
*** Caution ***
If Oracle Data Mining (ODM) was installed before Text, the text_install.txt file will contain
ORA-955 errors related to public synonyms, for example an error related to dm_svm_build.
These errors can be ignored. The installation creates dummy packages in the CTXSYS schema that mimic
the API and then attempts to create public synonyms for them.
When ODM is already installed, creating the public synonyms fails and the existing public synonyms
keep pointing to the ODM objects, which is the correct behavior.
3. Verifying that Text 10gR1 (10.1.0.x) was installed correctly
a. Check that all Text objects were installed correctly in the CTXSYS schema.
b. Check whether any objects owned by the CTXSYS schema are in an invalid state.
After a correct installation the query should return "no rows selected".
If there are invalid objects, they must be compiled manually.
------------------- cut here ------------------------------
connect SYS/password as SYSDBA
set pages 1000
col object_name format a40
col object_type format a20
col comp_name format a30
column library_name format a8
column file_spec format a60 wrap
spool text_install_verification.log
-- check on setup
select comp_name, status, substr(version,1,10) as version
from dba_registry
where comp_id = 'CONTEXT';
select * from ctxsys.ctx_version;
select substr(ctxsys.dri_version,1,10) VER_CODE from dual;
select count(*)
from dba_objects where owner='CTXSYS';
-- Get a summary count
select object_type, count(*)
from dba_objects where owner='CTXSYS'
group by object_type;
-- Any invalid objects
select object_name, object_type, status
from dba_objects
where owner='CTXSYS'
and status != 'VALID'
order by object_name;
spool off
------------------- cut here ------------------------------
If the installation succeeded, the output should look like this:
SQL> select comp_name, status, substr(version,1,10) as version
from dba_registry
where comp_id = 'CONTEXT';
COMP_NAME STATUS VERSION
Oracle Text VALID 10.1.0.2.0
SQL> select * from ctxsys.ctx_version;
VER_DICT VER_CODE
10.1.0.2.0 10.1.0.2.0
SQL> select substr(ctxsys.dri_version,1,10) VER_CODE from dual;
VER_CODE
10.1.0.2.0
SQL> select count(*)
from dba_objects where owner='CTXSYS';
COUNT(*)
338
SQL> select object_type, count(*)
from dba_objects where owner='CTXSYS'
group by object_type;
OBJECT_TYPE COUNT(*)
FUNCTION 5
INDEX 46
INDEXTYPE 4
LIBRARY 1
LOB 1
OPERATOR 6
PACKAGE 71
PACKAGE BODY 58
PROCEDURE 3
SEQUENCE 3
TABLE 37
TYPE 42
TYPE BODY 7
VIEW 54
14 rows selected.
SQL> select object_name, object_type, status
from dba_objects
where owner='CTXSYS'
and status != 'VALID'
order by object_name;
no rows selected
SQL>
4. Manual deinstallation procedure
*** Caution ***
Before deinstalling Oracle Text, it is advisable to drop all Text indexes created by accounts other than
CTXSYS.
The Text dictionary in the CTXSYS schema is removed by connecting to SQL*Plus as SYSDBA and running the
following script.
SQL> connect SYS/password as SYSDBA
SQL> spool textdeinstall.log
SQL>@?/ctx/admin/catnoctx.sql
SQL> spool off
Review the output file textdeinstall.log for errors.
Deinstallation of Oracle Text 10.1.0.x is complete.
Example
Reference Documents
Oracle Text Reference 10g Release 1 (10.1) Part Number B10730-02
Note 280713.1 Manual installation, deinstallation of Oracle Text 10gR1

Is there a way to exclude title, heading and bibliography text from the word count in pages?

I've just got the newest version of pages.
The word count includes everything, from titles to endnotes - including the numbers in my sub-headings.
I used to use OpenOffice, where you could select a style, e.g. 'body text', and get a word count that didn't include every single word in a document.
Is there a way of doing something similar, or of resetting the word count to only 'body text', so I know exactly how long my essays are?
Thanks.

Unfortunately, no version of Pages has this fine-grained control over document components and their word count, so the answer is no, regarding user changeable settings.
Programmatically, I just told AppleScript to count the words of body text in a currently opened Pages v5.2 document. The count matched the Pages word count for the document. So, no solution there either.

Oracle text in oracle 10g

Hi friends,
thinking about migration from oracle 9i to oracle 10g. We use Oracle Text indexes on text columns of some tables. Is there any advantage with oracle 10g intermedia? Which are the most significant changes?
Thanks for answers.

The following links describe the new Text features in Oracle 10g and 11g.
http://download.oracle.com/docs/cd/B19306_01/text.102/b14218/whatsnew.htm#i969790
http://download.oracle.com/docs/cd/B28359_01/text.111/b28304/whatsnew.htm#sthref6

Using Oracle Text to search through WORD, EXCEL and PDF documents

Hello again,
What I would like to know is if I have a WORD or PDF document stored in a table. Is it possible to use Oracle Text to search through the actual WORD or PDF document?
Thanks
Doug

Yes, you can do context searches on both PDF and Word docs. With PDFs you need to make sure they contain text and not images. Some scanners will create PDFs that are nothing more than images of the document.
Below is a code sample that I made some time back to demonstrate the searching capabilities of Oracle Text. Note that the example makes use of the inso_filter, which is no longer shipped with Oracle beginning with patch set 10.1.0.4. See MetaLink note 298017.1 for the changes. See the following link for more information on developing with Oracle Text.
http://download-west.oracle.com/docs/cd/B14117_01/text.101/b10729/toc.htm
begin example.
-- The following needs to be executed
-- as sys.
DROP DIRECTORY docs_dir;
CREATE OR REPLACE DIRECTORY docs_dir
AS 'C:\sql\oracle_text\documents';
GRANT READ ON DIRECTORY docs_dir TO text;
-- End sys ran SQL
DROP TABLE db_docs CASCADE CONSTRAINTS PURGE;
CREATE TABLE db_docs (
id NUMBER,
format VARCHAR2(10),
location VARCHAR2(50),
document BLOB,
CONSTRAINT i_db_docs_p PRIMARY KEY(id)
);
-- Several notes need to be made about this anonymous block.
-- First the 'DOCS_DIR' parameter is a directory object name.
-- This directory object name must be in upper case.
DECLARE
f_lob BFILE;
b_lob BLOB;
document_name VARCHAR2(50);
BEGIN
document_name := 'externaltables.doc';
INSERT INTO db_docs
VALUES (1, 'binary', 'C:\sql\oracle_text\documents\externaltables.doc', empty_blob())
RETURN document INTO b_lob;
f_lob := BFILENAME('DOCS_DIR', document_name);
DBMS_LOB.FILEOPEN(f_lob, DBMS_LOB.FILE_READONLY);
DBMS_LOB.LOADFROMFILE(b_lob, f_lob, DBMS_LOB.GETLENGTH(f_lob));
DBMS_LOB.FILECLOSE(f_lob);
COMMIT;
END;
/
-- Build the index.
-- Note that this index differs from one built on files stored on the file system
-- in that the datastore parameter is ctxsys.default_datastore and not
-- ctxsys.file_datastore. FILE_DATASTORE is for documents that
-- exist on the file system. DEFAULT_DATASTORE is for documents
-- that are stored in the column.
create index db_docs_ctx on db_docs(document)
indextype is ctxsys.context
parameters (
'datastore ctxsys.default_datastore
filter ctxsys.inso_filter
format column format');
--search for something that is known to not be in the document.
SELECT SCORE(1), id, location
FROM db_docs
WHERE CONTAINS(document, 'Jenkinson', 1) > 0;
--search for something that is known to be in the document.
SELECT SCORE(1), id, location
FROM db_docs
WHERE CONTAINS(document, 'Albright', 1) > 0;

Oracle Text in installing Oracle 10g without licence!!

Hi. Everyone.
I've read some threads, but I am still confused about Oracle Text.
I am currently testing an Oracle 10g database.
I downloaded the 10g software from www.oracle.com and installed it successfully
on Windows XP.
When I tried to import a dump file from Oracle 9i into
the unlicenced Oracle 10g database, I got the error IMP-00017, which
is related to Oracle Text.
I checked the "dba_users" dictionary view, and the ctxsys user is locked and expired.
Following the advice in some threads on this site, I tried to
enable Oracle Text using DBCA.
However, every database option in DBCA is disabled, so I was not able to
select Oracle Text.
So, how can I enable Oracle Text on an unlicenced Oracle 10g installation?
Is this possible without a licence?
I am very confused about this.
I look forward to hearing your experience and advice.
Have a nice day.
Best Regards.
Ho.

Well, instead of being confused, you could go to http://www.oracle.com/pls/db102/portal.portal_db?selected=1 and look at
1) the licensing document, which would tell you whether you need a separate license, and
2) under the 'Books' tab, look at the Text Application Developer's Guide or the Text Reference manuals for details.
You could also look for the Oracle Text forum (from the http://forums.oracle.com page, under Database - More, or Text) and ask the people who concentrate on that set of features.
In general, Oracle Text is a set of extensions, the definitions for which are stored under user ctxsys. You would use these extensions by creating your own objects that are based on the extensions.
For example, suppose your tables contain varchar2 columns. Create indexes that are based on ctxsys's 'context index type' and your application can then use the 'CONTAINS' keyword search capability (which is effectively a ctxsys-owned extension to the select).
However, you would never log on to ctxsys and do anything with that, as you risk changing the template code that Oracle has supplied.
Message was edited by:
Hans Forbrich
PS: Yes, Oracle Text is included as part of the base database. Most of it is even included in the free Oracle XE database.

Problem with getting word count in TLF text

Hi,
I want to get the word count from my TLF text, but I am not able to handle the case of spaces.
I am using the findNextWordBoundary method of ParagraphElement as shown below:
private function countWords( para : ParagraphElement ) : void
{
    var wordBoundary:int = 0;
    var prevBoundary:int = 0;
    while ( wordBoundary != para.findNextWordBoundary( wordBoundary ) )
    {
        // If the value is greater than 1, then it's a word, otherwise it's a space.
        if ( para.findNextWordBoundary( wordBoundary ) - wordBoundary > 1 )
        {
            wordCount += 1;
        }
        prevBoundary = wordBoundary;
        wordBoundary = para.findNextWordBoundary( wordBoundary );
        // If the value is greater than 1, then it's a word, otherwise it's a space.
        if ( wordBoundary - prevBoundary > 1 )
        {
            var s:String = para.getText().substring( prevBoundary, wordBoundary );
            lenTotal += s.length;
        }
    }
}
Now I have 2 issues here:
If my string is for eg: Hi, I am writing in "TLF". And I want to get its word count then
1) Take the string Hi, as an example. Then para.getText().substring( prevBoundary, wordBoundary ) gives the text as Hi, i.e. without the comma. The same happens with the string "TLF forums": it treats each " as a single word instead of treating the whole "TLF" as a single word. Why doesn't it compute up to the next space? That would be the ideal case: until a space is reached, the whole thing should count as one word.
2) I have applied the condition if ( wordBoundary - prevBoundary > 1 ) to check for a space, i.e. if the difference is <= 1 it is a space. But with this I miss single-character words. For example, in "Hi, This is a string" the 'a' is ignored too.
I could have added a check, along with the space check, that the string between prevBoundary and wordBoundary is " " (i.e. a space), but that is also a problem, because single-character words like a, &, I would then be ignored.
So, now I am stuck with this issue and need some help from you guys.
Thanks

findNextWordBoundary is not going to serve your purpose. I'd propose doing something like this:
// didn't test this but something like this - whitespace matches any run of 1 or more white space characters
static const whiteSpaceRegExp:RegExp = /[\u0020\u000A\u000D]+/;
public static function countWords( para : ParagraphElement ) : int
{
     return para.getText().split(whiteSpaceRegExp).length;
}
A good list of everything considered whitespace extracted from the unicode space can be found here:
http://sourceforge.net/adobe/tlf/svn/449/tree/trunk/textLayout/src/flashx/textLayout/utils/CharacterUtil.as
In function createWhiteSpaceObject
Hope that helps,
Richard
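
The whitespace-splitting idea in this reply can be tried out independently of TLF. The Python sketch below is an illustration only: count_words is a name made up for this example, and \s+ is used as a stand-in for the whitespace character list mentioned above. Punctuation stays attached to its word, and single-character words such as "a" and "I" are counted.

```python
import re

# One or more whitespace characters in a row act as a single separator.
WHITESPACE = re.compile(r"\s+")


def count_words(text):
    """Count whitespace-separated words; punctuation stays attached to its word."""
    stripped = text.strip()
    if not stripped:
        return 0
    return len(WHITESPACE.split(stripped))


print(count_words('Hi, I am writing in "TLF".'))
print(count_words("Hi, This is a string"))
```

Stripping the text first avoids counting an empty string when the paragraph starts or ends with whitespace.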

On Pages 09. Error Message "Missing Font" - text on all my files/Documents has disappeared. I know it is still there from the word count - but it is invisible. Any clues Gratefully received.

On Pages 09. Error Message "Missing Font" - text on all my files/Documents has disappeared. I know it is still there from the word count - but it is invisible. Any clues Gratefully received.

What version of Pages '09?
Have you updated it to the latest iWork '09 v4.3?
Peter

Oracle Text omitting the words as, is, do, or etc in searches

I am trying to do a search using Oracle Text for the country code for Dominican Republic and I get no hits back. This is also the case for some other countries: IS (Iceland), NO (Norway), IT (Italy). Here is my where clause:
where contains(tablename, 'DO within country')
Is there any workaround for this?

Those words are stopwords: words that Oracle does not tokenize or index, and that are therefore not searchable. If you do not specify a stoplist, Oracle uses the default stoplist, which contains these words. If you use an empty stoplist, or a stoplist that does not contain those words, then they are tokenized, indexed, and searchable. Please see the demonstration below, which first recreates your situation (or something similar, since I don't know what sort of section group you are using), then drops the index and recreates it using an empty stoplist, and then shows that DO is tokenized, indexed, and searchable.
SCOTT@10gXE> CREATE TABLE your_table
2    (address   VARCHAR2 (30),
3      country   VARCHAR2 (2),
4      tablename VARCHAR2 (30))
5 /
Table created.
SCOTT@10gXE> INSERT INTO your_table VALUES ('somewhere', 'DO', NULL)
2 /
1 row created.
SCOTT@10gXE> EXEC CTX_DDL.CREATE_PREFERENCE ('my_multi', 'MULTI_COLUMN_DATASTORE')
PL/SQL procedure successfully completed.
SCOTT@10gXE> EXEC CTX_DDL.SET_ATTRIBUTE ('my_multi', 'COLUMNS', 'address, country')
PL/SQL procedure successfully completed.
SCOTT@10gXE> EXEC CTX_DDL.CREATE_SECTION_GROUP ('my_section_group', 'BASIC_SECTION_GROUP')
PL/SQL procedure successfully completed.
SCOTT@10gXE> EXEC CTX_DDL.ADD_FIELD_SECTION ('my_section_group', 'country', 'country')
PL/SQL procedure successfully completed.
SCOTT@10gXE> CREATE INDEX your_index ON your_table (tablename)
2 INDEXTYPE IS CTXSYS.CONTEXT
3 PARAMETERS
4    ('DATASTORE     my_multi
5       SECTION GROUP my_section_group')
6 /
Index created.
SCOTT@10gXE> SELECT token_text FROM dr$your_index$i
2 /
TOKEN_TEXT
SOMEWHERE
SCOTT@10gXE> SELECT * FROM your_table
2 WHERE CONTAINS (tablename, 'DO WITHIN country') > 0
3 /
no rows selected
SCOTT@10gXE> DROP INDEX your_index
2 /
Index dropped.
SCOTT@10gXE> CREATE INDEX your_index ON your_table (tablename)
2 INDEXTYPE IS CTXSYS.CONTEXT
3 PARAMETERS
4    ('DATASTORE     my_multi
5       SECTION GROUP my_section_group
6       STOPLIST CTXSYS.EMPTY_STOPLIST')
7 /
Index created.
SCOTT@10gXE> SELECT token_text FROM dr$your_index$i
2 /
TOKEN_TEXT
DO
SOMEWHERE
SCOTT@10gXE> SELECT * FROM your_table
2 WHERE CONTAINS (tablename, 'DO WITHIN country') > 0
3 /
ADDRESS                        CO TABLENAME
somewhere                      DO
SCOTT@10gXE>
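
The effect of a stoplist at index time can be pictured with a toy inverted index. This Python sketch is a simplified model and not Oracle Text's implementation: build_index is a hypothetical helper, and the first stoplist is a stand-in that contains only the country codes mentioned in the question.

```python
def build_index(rows, stoplist):
    """Map each token to the set of row ids containing it, skipping stopwords."""
    index = {}
    for row_id, text in rows.items():
        for token in text.upper().split():
            if token in stoplist:
                continue  # stopwords never reach the index, so they cannot be found
            index.setdefault(token, set()).add(row_id)
    return index


# One row combining the address and country values from the demonstration.
rows = {1: "somewhere DO"}

# Stand-in for a stoplist that contains the problem words.
default_like = build_index(rows, stoplist={"DO", "IS", "NO", "IT"})
# Stand-in for CTXSYS.EMPTY_STOPLIST.
empty = build_index(rows, stoplist=set())

print(sorted(default_like))
print(sorted(empty))
print(empty.get("DO", set()))
```

With the first stoplist only SOMEWHERE is indexed, mirroring the first dr$your_index$i listing. With the empty stoplist DO is indexed too, so a lookup for it finds the row.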

Is it possible to limit the size of a text box by word count?

I know that it is possible to use a character limit but most users prefer a word limit as they feel it is a more meaningful restriction rather than punishing those that use long words!
Is there some way to apply a word count limit on a text field in Designer?

Thanks Elaine, I did find a few scripts like this online, but none of them were geared toward people using LCD to create forms. The one you've posted is the simplest one I've seen, and I've adapted the script from the example you provided, but I have drawn a blank on how to call the script object from a click event on a button.
So far I have
function countWords()
{
    form1.test.countText.rawValue = form1.test.enterText.rawValue.split(' ').length;
}

Indexing accented words in Oracle Text

Hello.
I have some problems understanding how Oracle Text works with accented words.
I want to store French words encoded in UTF-8, for example the French word libération, which is stored as 'libÂ©ration' (UTF-8 conversion)
in the database (note that the database is UTF-8 encoded).
begin
ctx_ddl.create_preference('doc_lexer_perigee', 'BASIC_LEXER');
ctx_ddl.set_attribute('doc_lexer_perigee', 'printjoins', '_-');
ctx_ddl.set_attribute('doc_lexer_perigee', 'BASE_LETTER', 'YES');
ctx_ddl.set_attribute('doc_lexer_perigee','index_themes','yes');
end;
Above is the definition of the lexer used when indexing french documents.
Below is some lines found in oracle documentation :
base_letter
Specify whether characters that have diacritical marks (umlauts, cedillas, acute
accents, and so on) are converted to their base form before being stored in the Text
index. The default is NO (base-letter conversion disabled). For more information on
base-letter conversions and base_letter_type, see Base-Letter Conversion on
page 15-2.
According to what I understand from the above, the word 'libération', stored as 'libÂ©ration', should also be indexed as 'liberation'.
But when I search documents containing the word 'liberation', oracle found no documents matching my query.
Is there anything I have misunderstood about base_letter conversion ?

Indeed, I think I have found a solution to my problem (I changed the value of the NLS_LANG parameter): things seem to work as I want now.
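
One plausible reading of why the NLS_LANG change helped is that base-letter conversion can only work if the accented character reaches the lexer intact. The Python sketch below illustrates that idea and is not Oracle's algorithm: base_letter is a hypothetical helper that uses Unicode decomposition, and the Latin-1 decode is just one assumed way in which a character set mismatch can garble UTF-8 text.

```python
import unicodedata


def base_letter(word):
    """Strip diacritical marks by decomposing and dropping combining characters."""
    decomposed = unicodedata.normalize("NFD", word)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


original = "libération"
print(base_letter(original))

# Assumed mismatch: the UTF-8 bytes are interpreted as Latin-1 on the way in.
garbled = original.encode("utf-8").decode("latin-1")
print(garbled == original)
print(base_letter(garbled) == "liberation")
```

When the text arrives intact, the base form is 'liberation' and a search for it can match. When the bytes are misread, the accented letter is replaced by two unrelated characters and no base-letter conversion can recover the word.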

"MS" reserved word in oracle text query?

Wondering if anyone has run into the string "MS" behaving as a reserved word in oracle text queries. For example, this specification returns all records from Texas:
'<query>
<textquery>
<progression>
<seq> TX WITHIN CUSTOMER_STATE </seq>
</progression>
</textquery>
</query>'
But this one does NOT find any results for Mississippi:
'<query>
<textquery>
<progression>
<seq> MS WITHIN CUSTOMER_STATE </seq>
</progression>
</textquery>
</query>'
I've confirmed we have data that should match, and I've tried escaping it with the sequences as described in the SQL docs (I've tried single quotes, pairs of single quotes, braces, and combinations of those) . And trying to find info on the web is tough since all web queries that contain 'MS' bring back tons of Microsoft-relevant information.
Can anyone nudge me in the right direction for a better google-search, or some materials in these forums (my initial searches here didn't turn anything up either).
Thanks for any feedback!
jh

"Wondering if anyone has run into the string "MS" behaving as a reserved word in oracle text queries."
Maybe because "MS" is in the default English stoplist? See: English Default Stoplist.

ODF support in Oracle Text 10g R2 version ??

Currently, we are using Oracle Text 10g Release 2 version for HTML section searching in our application. we don't have any issues in Microsoft office 2003 documents.
But, when we use Open office documents(ODF), it is not working. It is throwing the following exception:
java.sql.SQLException: ORA-20000: Oracle Text error:
DRG-11207: user filter command exited with status 1
DRG-11222: Third-party filter does not support this known document format.
ORA-06512: at "CTXSYS.DRUE", line 160
ORA-06512: at "CTXSYS.CTX_DOC", line 825
ORA-06512: at line 1
We are using the "AUTO_FILTER" filter technology.
Any ideas for solving this issue?

You start to have to think outside the box at this point -- AUTO_FILTER isn't going to be able to support you natively.
You could file an SR, and let Oracle tell you whether they'd be willing to integrate changes (like new Verity libraries as they are developed) to 10.2.
That assumes that Autonomy (owner of Verity) has improved their support for ODF.
The OpenOffice formats are all xml-based; you could write something custom to extract the text from your openoffice files and submit them to Oracle as straight XML. I've done something similar to support Office 2007 formats.
You could write a custom USER_LEXER (which is essentially the same as custom extraction, but may be an easier place to hook in your custom code).
That's the main reason I suggested moving up to 11g -- none of the other choices have any easy, short-term fix or workaround.

How to install Oracle text in 10g database.

Hi ,
I need to install Oracle Text in a 10g database; it is a prerequisite for Application Express 3.1.1. I checked for the CTXSYS user in the database and did not find it.
Thanks in advance

If you don't see the CTXSYS user, you have to use the OUI to install Oracle Text. You have to connect as sys to do it. See the MetaLink note below:
https://metalink.oracle.com/metalink/plsql/f?p=130:14:1539928194117127807::::p14_database_id,p14_docid,p14_show_header,p14_show_help,p14_black_frame,p14_font:NOT,275689.1,1,1,1,helvetica
