Oracle text clustering - use of Stemming

Hi,
I am using K-means to cluster a set of documents. But I am unable to set the clustering algorithm parameters to use stemming for the tokens. The clustering algorithm uses 'move', 'moving', 'moves', 'moved' as separate words and clusters the documents into different clusters. I would like to group all the documents that contain 'move', 'moving', 'moves', 'moved' into a single group by using the stem 'move'. I am unable to do this so far. In case any of you have some ideas, please suggest.
I use the following preferences and attributes to create a text index:
BEGIN
CTX_DDL.DROP_PREFERENCE ('test_lex');
CTX_DDL.CREATE_PREFERENCE ('test_lex', 'BASIC_LEXER');
CTX_DDL.SET_ATTRIBUTE ('test_lex', 'INDEX_STEMS', 'ENGLISH');
END;
drop index temp_idx;
CREATE index temp_idx ON temp(text1) indextype is CTXSYS.CONTEXT parameters ('WORDLIST CTXSYS.BASIC_WORDLIST LEXER test_lex SYNC (ON COMMIT)');
And below is the code I use to cluster the documents:
create table temp0 (docid NUMBER, clusterid NUMBER, score NUMBER);
create table temp1 (clusterid NUMBER, descript varchar2(4000), label varchar2(200), sze number, quality_score number, parent number);
begin
ctx_ddl.drop_preference('my_cluster');
ctx_ddl.create_preference('my_cluster','KMEAN_CLUSTERING');
ctx_ddl.set_attribute('my_cluster','CLUSTER_NUM','10');
ctx_ddl.set_attribute('my_cluster','STEM_ON','FALSE');
ctx_output.start_log('my_log');
ctx_cls.clustering('temp_idx','seq','temp0','temp1','my_cluster');
ctx_output.end_log;
end;
Thanks!

Change the following attribute from FALSE to TRUE:
ctx_ddl.set_attribute('my_cluster','STEM_ON','FALSE');
i.e.
ctx_ddl.set_attribute('my_cluster','STEM_ON','TRUE');
and then create the clusters again. Also, as you have already done, the lexer should have INDEX_STEMS set.
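A minimal sketch of the corrected clustering step, reusing the object names from the question (temp_idx, temp0, temp1, my_cluster, seq). It assumes the index was built with the test_lex lexer shown in the question and that the two result tables already exist. Only the STEM_ON value differs from the original block.

```sql
BEGIN
  -- DROP_PREFERENCE raises an error if the preference does not exist yet,
  -- so leave this call out on a first run.
  ctx_ddl.drop_preference('my_cluster');
  ctx_ddl.create_preference('my_cluster', 'KMEAN_CLUSTERING');
  ctx_ddl.set_attribute('my_cluster', 'CLUSTER_NUM', '10');
  -- Cluster on stems, so that 'move', 'moves', 'moving' and 'moved'
  -- should be counted as the same feature.
  ctx_ddl.set_attribute('my_cluster', 'STEM_ON', 'TRUE');

  -- 'seq' is the document id column of table temp, as in the question.
  ctx_cls.clustering('temp_idx', 'seq', 'temp0', 'temp1', 'my_cluster');
END;
/

-- Check how many document assignments each cluster received.
SELECT clusterid, COUNT(*) AS docs
FROM   temp0
GROUP  BY clusterid
ORDER  BY clusterid;
```

If the word variants still end up in separate clusters, it may be worth confirming that the index was rebuilt after INDEX_STEMS was set on the lexer.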

Similar Messages

Error When Creating Oracle Text index using Lexer Keyword

Hi All,
I am getting the following error when creating an Oracle Text index using the LEXER and STOPLIST keywords.
Please help if anybody knows the cause.
Thanks in advance.
Error starting at line 1 in command:
CREATE INDEX TXT_INX_TEXT_SEARCH ON TEXT_SEARCH (BFILE_DOC)
INDEXTYPE IS "CTXSYS"."CONTEXT" LOCAL (
PARTITION "BEFORE_2007" PARAMETERS ('LEXER dd_lexer STOPLIST dd_stoplist SYNC (ON COMMIT)') ,
PARTITION "Q1_2007" PARAMETERS ('LEXER dd_lexer STOPLIST dd_stoplist SYNC (ON COMMIT)'),
PARTITION "Q2_2007" PARAMETERS ('LEXER dd_lexer STOPLIST dd_stoplist SYNC (ON COMMIT)'),
PARTITION "Q3_2007" PARAMETERS ('LEXER dd_lexer STOPLIST dd_stoplist SYNC (ON COMMIT)'),
PARTITION "Q4_2007" PARAMETERS ('LEXER dd_lexer STOPLIST dd_stoplist SYNC (ON COMMIT)'),
PARTITION "Q1_2008" PARAMETERS ('LEXER dd_lexer STOPLIST dd_stoplist SYNC (ON COMMIT)'),
PARTITION "Q2_2008" PARAMETERS ('LEXER dd_lexer STOPLIST dd_stoplist SYNC (ON COMMIT)'),
PARTITION "Q3_2008" PARAMETERS ('LEXER dd_lexer STOPLIST dd_stoplist SYNC (ON COMMIT)'),
PARTITION "Q4_2008" PARAMETERS ('LEXER dd_lexer STOPLIST dd_stoplist SYNC (ON COMMIT)'),
PARTITION "Q1_2009" PARAMETERS ('LEXER dd_lexer STOPLIST dd_stoplist SYNC (ON COMMIT)'),
PARTITION "Q2_2009" PARAMETERS ('LEXER dd_lexer STOPLIST dd_stoplist SYNC (ON COMMIT)'),
PARTITION "Q3_2009" PARAMETERS ('LEXER dd_lexer STOPLIST dd_stoplist SYNC (ON COMMIT)'),
PARTITION "Q4_2009" PARAMETERS ('LEXER dd_lexer STOPLIST dd_stoplist SYNC (ON COMMIT)'),
PARTITION "THE_REST" PARAMETERS ('LEXER dd_lexer STOPLIST dd_stoplist SYNC (ON COMMIT)')
Error at Command Line:1 Column:13
Error report:
SQL Error: ORA-29855: error occurred in the execution of ODCIINDEXCREATE routine
ORA-20000: Oracle Text error:
DRG-11000: invalid keyword LEXER
ORA-06512: at "CTXSYS.DRUE", line 160
ORA-06512: at "CTXSYS.TEXTINDEXMETHODS", line 365
29855. 00000 - "error occurred in the execution of ODCIINDEXCREATE routine"
*Cause: Failed to successfully execute the ODCIIndexCreate routine.
*Action: Check to see if the routine has been coded correctly.
Regards,
Jack R.

Hi,
it works if you put an extra PARAMETERS clause at the end so the creation looks like:
CREATE INDEX TXT_INX_TEXT_SEARCH ON TEXT_SEARCH (BFILE_DOC)
INDEXTYPE IS "CTXSYS"."CONTEXT" LOCAL (
PARTITION "BEFORE_2007" PARAMETERS ('LEXER dd_lexer STOPLIST dd_stoplist SYNC (ON COMMIT)') ,
PARTITION "Q1_2007" PARAMETERS ('LEXER dd_lexer STOPLIST dd_stoplist SYNC (ON COMMIT)'),
PARTITION "Q2_2007" PARAMETERS ('LEXER dd_lexer STOPLIST dd_stoplist SYNC (ON COMMIT)'),
PARTITION "Q3_2007" PARAMETERS ('LEXER dd_lexer STOPLIST dd_stoplist SYNC (ON COMMIT)'),
PARTITION "Q4_2007" PARAMETERS ('LEXER dd_lexer STOPLIST dd_stoplist SYNC (ON COMMIT)'),
PARTITION "Q1_2008" PARAMETERS ('LEXER dd_lexer STOPLIST dd_stoplist SYNC (ON COMMIT)'),
PARTITION "Q2_2008" PARAMETERS ('LEXER dd_lexer STOPLIST dd_stoplist SYNC (ON COMMIT)'),
PARTITION "Q3_2008" PARAMETERS ('LEXER dd_lexer STOPLIST dd_stoplist SYNC (ON COMMIT)'),
PARTITION "Q4_2008" PARAMETERS ('LEXER dd_lexer STOPLIST dd_stoplist SYNC (ON COMMIT)'),
PARTITION "Q1_2009" PARAMETERS ('LEXER dd_lexer STOPLIST dd_stoplist SYNC (ON COMMIT)'),
PARTITION "Q2_2009" PARAMETERS ('LEXER dd_lexer STOPLIST dd_stoplist SYNC (ON COMMIT)'),
PARTITION "Q3_2009" PARAMETERS ('LEXER dd_lexer STOPLIST dd_stoplist SYNC (ON COMMIT)'),
PARTITION "Q4_2009" PARAMETERS ('LEXER dd_lexer STOPLIST dd_stoplist SYNC (ON COMMIT)'),
PARTITION "THE_REST" PARAMETERS ('LEXER dd_lexer STOPLIST dd_stoplist SYNC (ON COMMIT)')
)
PARAMETERS ('LEXER dd_lexer STOPLIST dd_stoplist SYNC (ON COMMIT)'); -- <== Added
Hope this helps
Herald ten Dam

Can oracle text be used to compare documents?

Let's say that I have some documents stored as binary data (LOBs). Can Oracle Text be used to compare documents and show their similarity on the basis of their content? How would I compare documents using Oracle Text? Does it require a mining algorithm such as a neural network? Please help.
Thanks for reading.

Thank you for your interest in my question. Let me see whether I can further clarify it. In an ordinary PDF document, assume that I have a picture of a user interface for Microsoft Word. The common method for identifying items in the picture, such as a toolbar, would be to either:
--use a callout labeled "toolbar" that points to the toolbar
or
--use a callout labelled "A" and have a caption underneath the picture that says: A) toolbar.
What I would like to do is have text underneath the picture such as:
"The major features of the interface shown above are:
toolbar
main menu
status bar
formatting menu"
such that, when the user clicks one of the bullet items, the object becomes highlighted in the picture. The bullet list also needs to be translatable into Japanese. So, as far as I know, it can't be part of the swf file. Or can it?

Should Oracle Text be used here?

Hi,
We are developing a search feature for a bank that has thousands of documents. Each document has a set of free-form comments written by multiple bank officials. The comments are in a table in an Oracle 9i database. The comments can be 10 to 10000 characters long. The actual documents are not available in the database. Only a document identifier is kept in the table containing the comments.
The search engine will have single-word as well as phrase searches. Do you think using Oracle Text is the best approach for such a search facility?
Thanks
Yash

Sameer,
I guess I was right about it being personal. I do feel like we're getting somewhere though! DETAILS!!!
This isn't about me, you, or Text. The posts are to help those that need assistance, and for some of us (like me) to gain some exposure to problems I have not run into. You'll see a fair amount of research goes into many people's posts here.
Your earlier write-up telling the person to stay away provided no specifics about your situation. It was alarmist. They might have totally different requirements than you, so it pays to ask follow-up questions to find out if they are going to hit what you did.
I don't mind 'talking crap' about something so long as it is specific and can be addressed, or people can find situations where it should/should not be used by comparing their system to yours. If I tell you that cars suck because they break down, it isn't terribly useful to anyone. If I tell you that the 1982 Pontiac Bonneville's transmission needed to be replaced 13 times since I bought it...that kind of detail might be of some use to someone with that car. This is what I was soliciting from you.
What do we have from your last post:
* 9.2.0.5
* Millions of users per day
* Peak time the box was only 5-10% idle
* The problem was with queries on a CONTEXT index
(rather than the indexing process itself)
* You couldn't use CTXCAT because of application requirements
Two things you said that I will agree with.
1) CONTAINS queries can be costly. There are some ways to improve the performance if you post more detail about your requirements. This is the first time you mentioned that you were using the CONTEXT index instead of CTXCAT...good information. It would indicate that your 'this totally sucks' gut response from earlier is from the perspective of CONTEXT and doesn't extend to the other index types (or am I mistaken?).
2) 10g has improved upon some things. Later patchsets on Release 1, and the newly available Release 2 have a different filter as well. 9i made some major changes from prior releases though, so if 9i is the only option, I'd still take it.
If you are interested in continuing, I'd like to find out how many indexes, the number of documents indexed, the size of the index tables and data tables, and some more about the application. Give us your best shot.
As for hurting the ego...I'll still sleep well tonight. I would just like to see this remain a constructive place for people to post questions and try out solutions. That can't happen if the replies are a blanket 'stay away' without justification or a matching of requirements to problems.
Finally, feel free to post to anything I participate in...just remember that I'm not shy (and neither are you it seems), so if a debate happens it will likely be lively.
-Ron
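For the original question in this thread (single-word and phrase searches over free-form comments), a CONTEXT index setup could look roughly like the sketch below. The table, column and index names are hypothetical, and a CLOB column is assumed because the comments can exceed 4000 characters.

```sql
-- Hypothetical table: one row per comment, keyed by the document identifier.
CREATE TABLE doc_comments (
  comment_id   NUMBER PRIMARY KEY,
  doc_id       NUMBER NOT NULL,
  comment_text CLOB
);

CREATE INDEX doc_comments_ctx ON doc_comments (comment_text)
  INDEXTYPE IS CTXSYS.CONTEXT;

-- A CONTEXT index is not maintained automatically after DML unless a SYNC
-- option is set, so synchronize it once new comments have been committed.
BEGIN
  ctx_ddl.sync_index('doc_comments_ctx');
END;
/

-- Single-word search, best matches first.
SELECT doc_id, SCORE(1) AS relevance
FROM   doc_comments
WHERE  CONTAINS(comment_text, 'overdraft', 1) > 0
ORDER  BY SCORE(1) DESC;

-- Phrase search: consecutive words in the query string are matched as a phrase.
SELECT doc_id
FROM   doc_comments
WHERE  CONTAINS(comment_text, 'credit risk') > 0;
```

Whether this performs well enough under the load described above depends on the data volume and query mix, which is exactly the detail Ron asks for.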

Text clustering using kmeans & other classified algorithm

Hi,
A few questions:
1. How can we tweak the KMEAN_CLUSTERING algorithm to improve the quality of the clustering results?
The current arguments the algorithm accepts are:
MAX_DOCTERMS, MAX_FEATURES, THEME_ON, TOKEN_ON, STEM_ON, MEMORY_SIZE, SECTION_WEIGHT, CLUSTER_NUM
These arguments (apart from CLUSTER_NUM) do not affect the algorithm's quality. I remember reading in the documentation that "k-Mean is a trade-off between quality and scalability", so how can a user choose between quality and scalability? What arguments can we play with? (Are arguments like the number of iterations or the distance metric available? If yes, where and how can we specify them?)
2. I tried searching for documentation on Textk (hierarchical clustering), but found that it is undocumented. The arguments currently available are hierarchy depth and similarity score, but changing these arguments doesn't affect the clustering results. How can we improve Textk clustering results?
3. I tried applying the THEME_ON and CLUSTER_NUM arguments to Textk (just as we can for k-means). The algorithm didn't complain, but it didn't act on the changes either. The results didn't change. Any suggestions?
4. Where can I find more documentation on clustering apart from the developer guide and general docs?
5. The concepts of 'stopword' and 'stoptheme' are quite confusing. Please tell me if my understanding of these is correct:
Stopwords are usually NOT considered by the algorithm and are removed before any text analysis.
Stopthemes are words that could have weight together with other words and can influence the clustering results, but will not affect the results by themselves (individual words as themes). For example, when we use 'THEME_ON' the algorithm will not consider adding the stopthemes to the feature set but might consider the 'phrases' which include these stoptheme words.
Hence, in clustering with the THEME_ON preference, it's best to add stopthemes rather than stopwords, so that the algorithm might not create clusters based on words such as 'good' but might use phrases such as 'good example'.
Is that right? Please guide...
Thanks!
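To make the stopword/stoptheme distinction and the attribute list above concrete, here is a hedged sketch of where each setting lives. All object names (demo_stoplist, demo_lexer, demo_cluster, demo_docs, demo_docs_idx and the two result tables) are hypothetical, the attribute values are arbitrary examples rather than recommendations, and theme indexing assumes a language with a knowledge base such as English. Whether stopthemes change the resulting clusters in the way described in question 5 is not confirmed here.

```sql
BEGIN
  -- Stopwords and stopthemes are both defined on the stoplist used by the index.
  ctx_ddl.create_stoplist('demo_stoplist', 'BASIC_STOPLIST');
  ctx_ddl.add_stopword('demo_stoplist', 'the');    -- not indexed as a token
  ctx_ddl.add_stoptheme('demo_stoplist', 'good');  -- not indexed as a theme

  -- Lexer that indexes themes in addition to tokens.
  ctx_ddl.create_preference('demo_lexer', 'BASIC_LEXER');
  ctx_ddl.set_attribute('demo_lexer', 'INDEX_THEMES', 'YES');

  -- k-means preference using attributes from the list in the question.
  ctx_ddl.create_preference('demo_cluster', 'KMEAN_CLUSTERING');
  ctx_ddl.set_attribute('demo_cluster', 'CLUSTER_NUM', '10');
  ctx_ddl.set_attribute('demo_cluster', 'THEME_ON', 'TRUE');
  ctx_ddl.set_attribute('demo_cluster', 'TOKEN_ON', 'TRUE');
  ctx_ddl.set_attribute('demo_cluster', 'MAX_FEATURES', '1000');
  ctx_ddl.set_attribute('demo_cluster', 'MAX_DOCTERMS', '50');
END;
/

-- Hypothetical document table demo_docs(id, text); the stoplist and lexer
-- take effect at index creation time.
CREATE INDEX demo_docs_idx ON demo_docs (text)
  INDEXTYPE IS CTXSYS.CONTEXT
  PARAMETERS ('LEXER demo_lexer STOPLIST demo_stoplist');

BEGIN
  -- Result tables are assumed to exist with the same structure as
  -- temp0 and temp1 in the stemming thread.
  ctx_cls.clustering('demo_docs_idx', 'id',
                     'demo_doc_clusters', 'demo_cluster_desc', 'demo_cluster');
END;
/
```

Because the stoplist belongs to the index rather than to the clustering preference, changing stopwords or stopthemes would presumably require rebuilding the index before clustering again.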

Oracle TEXT and using 'ABOUT' as a string literal, not a command

We have a context index on a NAME column, and we're getting an error when searching on the string
CONTAINS(name, '%ALL ABOUT TRAVEL%') > 0
A query on ALL ABOUT, and ABOUT TRAVEL works, but the entire phrase causes a problem. I've since found that ABOUT is a keyword when using CONTAINS( ).
So, is there a way to escape ABOUT, include it in double quotes, etc, so that it gets treated as a string literal?
Thanks,
--=Chuck

Should've searched google more before posting ... :(
Maybe this'll help someone else out, though:
CONTAINS(name, '%ALL {ABOUT} TRAVEL%') > 0
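The escaping options can be sketched as follows. The table name companies is hypothetical; the column name comes from the question, and a CONTEXT index on it is assumed.

```sql
-- Braces escape everything between them, so ABOUT is read as a plain word
-- instead of the ABOUT operator.
SELECT name
FROM   companies
WHERE  CONTAINS(name, 'ALL {ABOUT} TRAVEL') > 0;

-- Braces can also wrap the whole phrase.
SELECT name
FROM   companies
WHERE  CONTAINS(name, '{ALL ABOUT TRAVEL}') > 0;

-- A backslash escapes only the single character that follows it,
-- here the hyphen that would otherwise be read as the MINUS operator.
SELECT name
FROM   companies
WHERE  CONTAINS(name, 'door\-to\-door') > 0;
```

The same brace technique applies to other reserved words in the CONTAINS query language, such as AND, OR, NOT, NEAR and WITHIN.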

Oracle text clustering

Hi,
I'm trying the ctx_cluster package.
My question is:
Is it possible to create for a whole collection
the themes table before and then apply cluster analysis on the whole collection or a subset?
With the ctx_cluster package it seems that the themes
must be extracted every time you start the clustering
process. For large collections this could be a long task.
Giorgio

Using oracle text in apex report search

I am trying to use Oracle Text in APEX, integrating it into an existing application. The idea is that it will allow searching in larger text fields. That's how I want to get it to work. One of the Oracle packaged applications uses Oracle Text as well, so I will have a look at that too. I've adapted this search. I've added
AND t. contains(oplossing, :P15_OPLOSSING)
AND t.contains(sleutelwoorden, :P15_SLEUTELWOORDEN)
That didn't work, so I changed those two to:
AND t.oplossing = (t.contains(oplossing, :P15_OPLOSSING)>0)
AND t.sleutelwoorden = (t.contains(sleutelwoorden, :P15_SLEUTELWOORDEN)>0)
which didn't work either, as I expected. Clearly I'm not doing it correctly. I intend to look it up tonight in the packaged applications, as I do want to find it myself too.
But can anyone give a hint on what I am doing wrong?
SELECT t.ticketid ticketnr, t.ticketid,
g.voornaam||' '||g.naam aangemaaktdoor,
t.credt, t.applicatiecd, t.titel,
s.statusdefoms,
si.statusdefoms instat,
NVL2(t.toegekend,'Y','N') toegekend,
sleutelwoorden, klantprioriteitid, oplossing, s.htmlkleur, si.htmlkleur inthtmlkleur
FROM ticket t,
gebruiker g,
status s,
status si
WHERE t.gebruikerid = g.gebruikerid
AND t.statusid = s.statusid
AND t.statusinternid = si.statusid (+)
AND t.applicatiecd = NVL(:P0_APPLICATIECD, :F101_APPLICATIECD)
AND (t.categorieid = :P15_CATEGORIEID OR NVL(:P15_CATEGORIEID, 0) = 0)
AND (t.moduleid = :P15_MODULEID OR NVL(:P15_MODULEID, 0) = 0)
AND (t.statusid = :P15_STATUSID OR NVL(:P15_STATUSID, 0) = 0)
AND (t.statusinternid = :P15_INTSTATUSID OR NVL(:P15_INTSTATUSID, 0) = 0)
AND (t.versieid = :P15_VERSIEID OR NVL(:P15_VERSIEID, 0) = 0)
AND t.ticketid LIKE '%'||:P15_TICKETID||'%'
AND t.gebruikerid = DECODE(NVL(:P15_GEBRUIKERID,0), 0, t.gebruikerid, :P15_GEBRUIKERID)
AND t.credt BETWEEN NVL(:P15_DATUMVAN, To_Date('01-01-1900', 'DD-MM-YYYY')) AND NVL(To_Date(:P15_DATUMTOT, 'DD-MM-YYYY'), sysdate) +1
AND t.titel LIKE '%'||:P15_TITEL||'%'
AND t. contains(oplossing, :P15_OPLOSSING)
AND t.contains(sleutelwoorden, :P15_SLEUTELWOORDEN)
AND PCK$Ticket_Admin.getklantid(t.gebruikerid) = DECODE(Pck$Ticket_Admin.isklantadminroleN(:APP_USER,NVL(:P0_APPLICATIECD, :F101_APPLICATIECD)), 1, PCK$Ticket_Admin.getklantid(:APP103_GEBRUIKERID), PCK$Ticket_Admin.getklantid(t.gebruikerid))
AND (:APP103_GEBRUIKERID IN (t.voor_gebruikerid, t.gebruikerid)
OR Pck$Ticket_Admin.isintern(:APP_USER,:P0_APPLICATIECD) = 1)
changed to:
AND t.oplossing = (t.contains(oplossing, :P15_OPLOSSING)>0)
AND t.sleutelwoorden = (t.contains(sleutelwoorden, :P15_SLEUTELWOORDEN)>0)

I have worked it out further now, and looked at the search of the packaged application. It turned out to be a PL/SQL block. I used what I found in there to adapt the previous search. I added the following:
OR (CONTAINS(t.oplossing, :P15_OPLOSSING)>0)
OR (CONTAINS(t.sleutelwoorden, :P15_SLEUTELWOORDEN)>0)
     OR (CONTAINS(t.titel,:P15_SEARCH_T_O_S)>0 OR
     CONTAINS (t.oplossing, :P15_SEARCH_T_O_S)>0 OR
     CONTAINS(t.sleutelwoorden, :P15_SEARCH_T_O_S)>0 )
OR (CONTAINS(t.titel,:P15_SEARCH_T_O_S)>0 AND
     CONTAINS (t.oplossing, :P15_SEARCH_T_O_S)>0 AND
     CONTAINS(t.sleutelwoorden, :P15_SEARCH_T_O_S)>0 )
oplossing means solution
sleutelwoorden means keywords
titel means title
This doesn't work yet, though. It gives an error message:
failed to parse SQL query:
ORA-01719: outer join operator (+) not allowed in operand of OR or IN
I've tried adding the addition in a different place, yet that gives the same error message. I'm not sure now.
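A likely cause of ORA-01719 is that appending OR conditions to a chain of AND conditions makes the whole chain, including the (+) join on status si, an operand of OR. One possible way around it, sketched here on a shortened version of the query, is to move the joins into ANSI join syntax and keep the CONTAINS tests together in one parenthesized group. The table, column and item names are taken from the post; CONTEXT indexes on titel, oplossing and sleutelwoorden are assumed, and the remaining filters from the original WHERE clause would still have to be added.

```sql
SELECT t.ticketid,
       g.voornaam || ' ' || g.naam AS aangemaaktdoor,
       t.titel,
       s.statusdefoms,
       si.statusdefoms             AS instat
FROM   ticket t
       JOIN      gebruiker g ON g.gebruikerid = t.gebruikerid
       JOIN      status s    ON s.statusid    = t.statusid
       -- Replaces "t.statusinternid = si.statusid (+)".
       LEFT JOIN status si   ON si.statusid   = t.statusinternid
WHERE  t.applicatiecd = NVL(:P0_APPLICATIECD, :F101_APPLICATIECD)
AND   (   CONTAINS(t.titel,          :P15_SEARCH_T_O_S, 1) > 0
       OR CONTAINS(t.oplossing,      :P15_SEARCH_T_O_S, 2) > 0
       OR CONTAINS(t.sleutelwoorden, :P15_SEARCH_T_O_S, 3) > 0);
```

CONTAINS is a standalone operator compared with 0, not a method of the table alias, which is why the earlier t.contains(...) attempts failed. An empty search item may also need separate handling, since Oracle Text can reject a null query string.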

Document management system using oracle text

I plan to create a document management system using Oracle Text with the following features:
1) document comparison
2) document search
and more...
Can Oracle Text be used to display documents of various formats by converting them to HTML? And can search keywords be highlighted in the document?
Please help!

Have you ever considered doing this in Oracle Application Express (free on top of the Oracle database)? How about something like:
http://download-west.oracle.com/docs/cd/B31036_01/doc/appdev.22/b28839/up_dn_files.htm
Index the files using the CONTEXT index, and perhaps the docs' meta with it using the Oracle Text MULTI_COLUMN_DATASTORE, and then when you write your query for a report on the documents include a search string.
I've created a number of APEX-based document management systems and it is quite easy once you get the hang of using this environment. I suggest looking at some of the tutorials/how-to documents and you'll be on your way quickly.
Start with the upload application. Once you can get your documents in, create a report that shows everything except the document. Verify all of this works correctly.
Add some "items" to the page for the report, and include them as bind variables in the where clause.
After that, add your Oracle Text index to the database, and toss in a "text-field" item to the APEX page. Modify your report query, adding the CONTAINS clause, and use the newly created item as a bind variable. There's your keyword search.
Linking to Oracle Apps is done through APIs and may be over database links.
Hope it helps. Though not a step-by-step how to document, this should point you in the right direction. Get familiar with APEX as that covers most of what you described.
-Ron
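For the highlighting part of the question, the CTX_DOC package can return a filtered copy of a document with the query hits marked up. The sketch below assumes a hypothetical CONTEXT index docs_ctx_idx on a table whose primary key value is 42; the search term is an arbitrary example.

```sql
SET SERVEROUTPUT ON

DECLARE
  l_html CLOB;
BEGIN
  -- Filter the stored document to HTML and wrap each hit for the query
  -- in the highlight tags of the chosen tagset.
  ctx_doc.markup(
    index_name => 'docs_ctx_idx',  -- hypothetical index name
    textkey    => '42',            -- primary key of the document row, as a string
    text_query => 'contract',      -- same syntax as a CONTAINS query
    restab     => l_html,          -- in-memory result
    plaintext  => FALSE,           -- FALSE requests HTML output
    tagset     => 'HTML_DEFAULT'
  );

  -- Show the beginning of the marked-up document.
  dbms_output.put_line(dbms_lob.substr(l_html, 2000, 1));
  dbms_lob.freetemporary(l_html);
END;
/
```

In an APEX page the literal search term would typically be replaced by the page item used in the report's CONTAINS clause. CTX_DOC.FILTER works the same way when only the HTML conversion is needed, without highlighting.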

Upgrading Oracle Text - Post upgrade step 10.2 to 11.2

I have already upgraded my 10.2.0.4 database to 11.2.0.1 and now have to do the post-upgrade steps. Step 39 of the manual upgrade guideline (837570.1) is not clear to me. If someone could explain it further, it would be appreciated. When I check my source ORACLE_HOME/ctx/admin/ctxf102.txt or ctxf102.sql
Step 39
Upgrading Oracle Text
Copy the following files from the previous Oracle home to the new Oracle home:
* Stemming user-dictionary files
* User-modified KOREAN_MORPH_LEXER dictionary files
* USER_FILTER executables
To obtain a list of the above files, use:
$ORACLE_HOME/ctx/admin/ctxf<version>.txt
$ORACLE_HOME/ctx/admin/ctxf<version>.sql
where version is 920, 101, or 102
For instance, if upgrading from 10.2.0
1. For dictionary files check
$ORACLE_HOME/ctx/admin/ctxf102.txt
2. Execute the script as database user SYS, SYSTEM, or CTXSYS
$ORACLE_HOME/ctx/admin/ctxf102.sql
If your Oracle Text index uses KOREAN_LEXER which was deprecated in Oracle 9i and desupported in Oracle 10g Release 2, see below Note for further information on manual migration from KOREAN_LEXER to KOREAN_MORPH_LEXER.
Note 300172.1 Obsolescence of KOREAN_LEXER Lexer Type

Hi Srini
Thank you very much. Now I've got it.
Oracle asked me to identify the CTXCAT indexes with KOREAN_LEXER by executing the following query as user CTXSYS; if nothing is returned, I can skip this step.
SELECT idx_name
FROM ctxsys.ctx_indexes
WHERE idx_type = 'CTXCAT'
AND idx_name IN
(SELECT ixo_index_name
FROM ctxsys.ctx_index_objects
WHERE ixo_class = 'LEXER'
AND ixo_object = 'KOREAN_MORPH_LEXER');
SELECT isl_index_owner,isl_index_name,isl_language
FROM CTXSYS.ctx_index_sub_lexers
WHERE isl_object = 'KOREAN_MORPH_LEXER';

Oracle Text and Order By

In the Portal Search Properties you can turn on Oracle
Text Searching. When reading the help page for that
page you can follow a link at the bottom to a help page
called "Performing a custom search". In the middle
of that page there is a section called "Order By List".
The third paragraph contains this sentence: "If Oracle
Text is enabled, this option does not appear in the
search submission portlet.".
What it seems to mean is that if you turn on Oracle Text
the developer or user can no longer have control of the
order of found items.
Is there really no way (even undocumented) of ordering
found items when Oracle Text is used?
As I have custom attributes on my custom items I must
use Oracle Text if I want a search to work on those
attributes, right?
I have added a hidden field called p_order_by_attribute
in my search form with the value "3,0" that should mean
Display Name but without effect.
Kind regards
Tomas Albinsson
Stockholm, Sweden

When Oracle Text is enabled, there is no way to order search results, as they will always be ordered by Oracle Text score.
Enabling Order By feature with Oracle Text on is a planned feature for a future release.

Saving queries executed in Oracle Text

I am gathering information on Oracle Text for use as a text query capability. My users would like to be able to execute the same queries so would like the ability to save their query parameters. Is this available in Oracle Text?

Yes - you can save queries as Stored Query Expressions.
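A minimal sketch of a stored query expression follows. The expression name budget_issues and the table doc_comments with a CONTEXT index on comment_text are hypothetical.

```sql
BEGIN
  -- Save the query text under a name.
  ctx_query.store_sqe('budget_issues', 'budget AND (overrun OR deficit)');
END;
/

-- Re-run the saved query by name with the SQE operator.
SELECT doc_id
FROM   doc_comments
WHERE  CONTAINS(comment_text, 'sqe(budget_issues)') > 0;

BEGIN
  -- Remove the stored expression when it is no longer needed.
  ctx_query.remove_sqe('budget_issues');
END;
/
```

A stored expression only keeps the Oracle Text query string, so any other search parameters the users want to save (date ranges, categories and so on) would need to be stored by the application itself.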

10gR2, Installation of Oracle Text: Data Mining license required?

Hello all,
1. I wonder if I have to obtain a Data Mining license if I want to use Oracle Text. Using DBCA I have to check "Data Mining" to be able to check "Oracle Text".
2. Does Oracle Text work, when I previously removed data mining using "make -f ins_rdbms.mk dm_off; make -f ins_rdbms.mk ioracle"?
Regards, Heiko

Hello,
I have an additional question: how can I find out which licenses I need?
The installer forces me to install "Data Mining" to use "Oracle Text", but I can't find any information why. Does Text use "Data Mining" functionality? Is it just a bug in the GUI? How can I verify that I don't have to get licenses for "Data Mining"?
Thank you
Klaus

Oracle Text with eBusiness Suite

I'm trying to understand if Oracle eBusiness Suite uses Oracle Text for indexing certain queries such as item description, etc.. I was told that iStore uses it for searching on items. Is that true? Does the Oracle Apps standard Item form use it? Can it be set up for use with the apps? I know the query syntax is different (CONTAINS), and I don't see any non standard indexes on MTL_SYSTEM_ITEMS_B, so I'm not sure how I could implement the functionality in the apps. Please help!!!!

Yes, Oracle Text is used in the iStore search feature. Refer to the code under $JAVA_TOP/oracle/apps/ibe/catalog/Search.java for an example, and also look at the ibe_ct_imedia_search table and its indexes, which give an idea of how Oracle iStore leverages Oracle Text for searching.
Get the data into a staging table and then create whichever index you want; don't modify the indexes on key tables like MTL_SYSTEM_ITEMS_B.
