Using Oracle Text with CLOB field containing multiple languages

I'm using Oracle 10g (NLS_CHARACTERSET is set to AL32UTF8) and have a table with a CLOB field that stores text written in English, Simplified Chinese, or both.
The following index has been created on this field:
CREATE INDEX text_index
ON text_table(text_field)
INDEXTYPE IS CTXSYS.CONTEXT
PARAMETERS('FILTER CTXSYS.INSO_FILTER');
I'm having trouble returning rows that match Chinese text using the CONTAINS operator. For some reason the following query returns rows that do not contain any Chinese text:
SELECT *
FROM text_table
WHERE contains(text_field,'炫%') > 1;
A newsgroup user advised me to produce an explain plan using ctx_query.explain.
I created 2 explain plans, one which was searching the index for 'A%' and the other searching for the Simplified Chinese character '炫%'. The results for the first test were as expected whereby the values contained within the OBJECT_NAME field all began with the letter 'A'.
The second test however produced somewhat unexpected results. The OBJECT_NAME field this time contained various words, both English and Simplified Chinese. I could be wrong but it appeared to store every individual word in the CLOB field. Both tests produced different EQUIVALENCE rows, the first test was:
OPTIONS = Null
OBJECT_NAME = A%
Whereas the second test produced:
OPTIONS = (?)
OBJECT_NAME = %
Am I right in thinking the Simplified Chinese character is for some reason being converted to a '?' character?
Any help on this will be much appreciated.

As you're not specifying a lexer to use, it will use the BASIC_LEXER, designed for space-separated European-type languages. This won't work effectively with Chinese.
If you know which documents are Chinese and which are English, you can write this into a LANGUAGE column and use the MULTI_LEXER - this will allow you to specify BASIC_LEXER for the English texts, and CHINESE_LEXER or CHINESE_VGRAM_LEXER for the Chinese texts.
If you don't know the language, you must use either WORLD_LEXER (10g) or AUTO_LEXER (11g). These lexers will automatically determine the language of the documents and index them appropriately. In general, MULTI_LEXER will be faster and more accurate than either of the automatic alternatives.
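As a rough sketch of the MULTI_LEXER setup described above: it assumes a hypothetical text_lang column has been added to text_table and populated per row with a language name or abbreviation (for example 'zhs' for the Chinese rows). The preference names are made up for illustration.

```sql
-- Assumption: text_table has an extra column text_lang, e.g. 'zhs' for
-- Simplified Chinese rows; any other value falls through to the default.
BEGIN
  -- Sub-lexer for English (and anything not otherwise matched).
  CTX_DDL.CREATE_PREFERENCE('english_lexer', 'BASIC_LEXER');
  -- Sub-lexer for Simplified Chinese text.
  CTX_DDL.CREATE_PREFERENCE('chinese_lexer', 'CHINESE_VGRAM_LEXER');
  -- The multi-lexer chooses a sub-lexer per row from the language column.
  CTX_DDL.CREATE_PREFERENCE('global_lexer', 'MULTI_LEXER');
  CTX_DDL.ADD_SUB_LEXER('global_lexer', 'default', 'english_lexer');
  CTX_DDL.ADD_SUB_LEXER('global_lexer', 'simplified chinese', 'chinese_lexer');
END;
/

-- Rebuild the index so the new lexer and language column take effect.
DROP INDEX text_index;

CREATE INDEX text_index
ON text_table(text_field)
INDEXTYPE IS CTXSYS.CONTEXT
PARAMETERS('FILTER CTXSYS.INSO_FILTER LEXER global_lexer LANGUAGE COLUMN text_lang');
```

The index has to be dropped and recreated because the lexer is applied at indexing time, so existing tokens produced by BASIC_LEXER would otherwise remain.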
When querying for Chinese characters you need to be very careful with your NLS_LANG settings. You need to make sure that the character set defined in NLS_LANG is the same as the character set from which you've pasted (or typed) the Chinese characters.
The "?" in output usually just means "I don't know how to translate this character into your output character set". Sometimes it may appear as a reversed question mark.
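One way to check whether the character survives the trip from client to database is to ask Oracle which bytes it actually received. This is a sketch rather than a definitive diagnosis, and it assumes the statement is run from the same client and session settings that issue the failing query.

```sql
-- Format 1016 returns the bytes in hexadecimal plus the character set name.
-- In an AL32UTF8 database the character U+70AB should arrive as the three
-- UTF-8 bytes e7,82,ab. A single 3f byte ('?') suggests the character was
-- lost in client-side conversion, which points at the NLS_LANG setting.
SELECT DUMP('炫', 1016) AS bytes_received
FROM   dual;
```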

Similar Messages

Using Oracle Text with Apex

Can someone point me to some resources on how to integrate Oracle Text and APEX to do searches, highlight results, etc (all the features of Oracle Text)?
The data to be indexed is in files on the filesystem, so I would like to keep it that way and use the FILE_DATASTORE option for Text.
Thanks for any pointers.
Update: Yes, I did see http://www.oracle.com/technology/products/database/application_express/pdf/apex_text_application_v1.6.pdf
but the search results there just returns the URL/file containing the "hit". It doesn't show the actual text fragment that caused the match, doesn't highlight it, etc. I am looking for a real Google-like search. Hm, having said that, I might as well use Google Desktop! Nah, where's the fun in that?

This is a very simple application for my own use. It started life in 8i when there were fewer Text options.
As such, it uses the query string as entered. This returns all of the matches:
select msgid, msgdate, Box, fromaddr, subject
from eudora.inbox
where contains(body, :P703_MailSearch) > 0
order by msgdate desc
I display the selected result like this:
select subject,
Replace(eudora.mmarkup(:P704_MSGID, :P702_SEARCH), Chr(13), '<BR>') Body
from eudora.inbox
where msgid = :P704_MSGID
In a newer application, I experimented with the CTXCAT grammar.
That query looks like this:
select m.ID, m.pdpno, m.shortdesc
from pdp_mast m
where contains(m.dphistory, '<query><textquery lang="ENGLISH" grammar="CTXCAT">
                                         ' || :P1_Text || '
                                     </textquery>
                                  <score datatype="INTEGER"/>
                              </query>') > 0
    or contains(m.shortdesc, '<query><textquery lang="ENGLISH" grammar="CTXCAT">
                                         ' || :P1_Text || '
                                     </textquery>
                                  <score datatype="INTEGER"/>
                              </query>') > 0
As always, once you figure out the syntax, it's easy to make it work in Apex.
Text indexes are very fast. On my old 600MHz PC, searches in 250MB of text take less than a second.
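For the Google-style fragment with the hit highlighted, Oracle Text (10g and later) also provides CTX_DOC.SNIPPET. The following is a minimal sketch, not the method used in the application above (which relies on its own eudora.mmarkup function). It assumes a CONTEXT index named inbox_body_idx on the body column and that msgid is the table's primary key; the index name, the key value and the search term are all hypothetical.

```sql
SET SERVEROUTPUT ON

DECLARE
  l_fragment VARCHAR2(4000);
BEGIN
  -- Returns one or more short fragments around the query terms,
  -- with each hit wrapped in the start and end tags.
  l_fragment := CTX_DOC.SNIPPET(
                  index_name => 'inbox_body_idx',  -- hypothetical index name
                  textkey    => '42',              -- hypothetical msgid value
                  text_query => 'oracle',          -- hypothetical search term
                  starttag   => '<b>',
                  endtag     => '</b>');
  DBMS_OUTPUT.PUT_LINE(l_fragment);
END;
/
```

Whether textkey is read as a primary key value or a ROWID depends on the key type in effect for CTX_DOC (see CTX_DOC.SET_KEY_TYPE), so this may need adjusting for a given installation.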

Data moving between Oracle 10g with CLOB fields

Hi all,
I am having trouble migrating data between Oracle 10g databases on different platforms. The worst part is that I don't have a DBA account on the database I am importing into, so I don't think I can use IMP or IMPDP. I turned to SQL Developer for help, but it seems to have another problem with moving CLOB data.
Re: EA2 : SQL Developer 1.5 : export data CLOB columns
in the thread, I found someone wrote this:
- SQL Developer v1.5 EA2 - exports first 4000 chars (which is anyway too small for me, because my CLOBs are larger - if they were smaller, I would have made them VARCHAR2s instead!).
I would like to ask:
1. What is SQL Developer v1.5 EA2? Is it the Data Modeling one?
2. How do I export table data with CLOBs using SQL Developer v1.5 EA2? 4000 chars is enough for me.
Or is there any other method to export CLOBs besides IMP / IMPDP?
At the moment I am using SQL*Plus spool to export the CLOBs.
Many thanks,

1. The EAs are Early Adopter releases (betas), so you should expect the same behaviour from the latest 1.5.4 production release.
2. Exports can be done with the Database Export tool, through the table's context menu in the navigator tree or the result grid's context menu.
Have fun,
K.

Using Oracle Text with MS WORD

Hi,
We have just installed Text and we want to use it to index hundreds of MS Word documents that are in the same directory. But I could not find a document or example about indexing/filtering Word documents. I would be grateful if you could help me find these.
Thanks..

You could specify INSO_FILTER for FILTER preference
in command for creating index
(see
http://otn.oracle.com/products/text/x/samples/indexing/filters/inso_filter/inso_filter_idx.sql) or
use USER filter.
For example see also
http://otn.oracle.com/products/text/x/samples/indexing/filters/INSO_Filter/index.html
(for loading data you could use any other tools than SQL*Loader)
Regards, Victor.
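A rough sketch of what that could look like for Word files kept in a directory, combining FILE_DATASTORE with INSO_FILTER. The table, preference and index names, the directory path and the search term are all assumptions made for illustration; the directory must be readable by the database server.

```sql
-- Hypothetical table: one row per Word file, holding only the file name.
CREATE TABLE word_docs (
  id        NUMBER PRIMARY KEY,
  file_name VARCHAR2(255)
);

INSERT INTO word_docs VALUES (1, 'report.doc');
COMMIT;

BEGIN
  -- FILE_DATASTORE reads each document from the server's file system.
  CTX_DDL.CREATE_PREFERENCE('word_store', 'FILE_DATASTORE');
  -- Assumed directory containing the Word documents.
  CTX_DDL.SET_ATTRIBUTE('word_store', 'PATH', '/data/word_docs');
END;
/

-- The filter converts the binary Word format to indexable text.
CREATE INDEX word_docs_idx ON word_docs(file_name)
INDEXTYPE IS CTXSYS.CONTEXT
PARAMETERS('DATASTORE word_store FILTER CTXSYS.INSO_FILTER');

-- The indexed column holds file names, but the search runs on file contents.
SELECT id, file_name
FROM   word_docs
WHERE  CONTAINS(file_name, 'budget') > 0;
```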

Using Oracle Text for searching with UCM 10g

I am using Oracle Text with UCM 10gR3 and Site Studio 10gR4 and I am trying to sort the search results by relevancy and also include a snippet of the retrieved document. I have the fields that the SS_GET_SEARCH_RESULTS service returns, but the relevancy score always equals 5 and the snippet contains characters such as < idcnull, /p, etc., which you can see are XML/HTML/UCM tags but which result in even more strangeness in the snippet if I try to remove them programmatically.
I have read the Oracle Text documentation and there appear to be ways you can configure Oracle Text but I am not clear at all on what I can do from UCM. It looks like the configuration is either done in database tables or in the query itself, neither of which are readily configurable to me.
Is anyone experienced in this or know of any documentation this might help?
Bill

Hi
If I remember correctly, this issue was seen with an older version of the OTS component and the Core Update patch / bundle. Upgrade the UCM instance with the latest CS10gr35 update bundle patchset 6907073 and also upgrade the OTS component from the same patchset.
Let me know how it goes after this .
Thanks
Srinath

Document management system using oracle text

I plan to create a document management system using Oracle Text with the following features:
1) document comparison
2) document search
and more...
Can Oracle Text be used to display documents of various formats by converting them to HTML? And can search keywords be highlighted in the document?
Please help!

Have you ever considered doing this in Oracle Application Express (free on top of the Oracle database)? How about something like:
http://download-west.oracle.com/docs/cd/B31036_01/doc/appdev.22/b28839/up_dn_files.htm
Index the files using the CONTEXT index, and perhaps the docs' meta with it using the Oracle Text MULTI_COLUMN_DATASTORE, and then when you write your query for a report on the documents include a search string.
I've created a number of APEX-based document management systems and it is quite easy once you get the hang of using this environment. I suggest looking at some of the tutorials/how-to documents and you'll be on your way quickly.
Start with the upload application. Once you can get your documents in, create a report that shows everything except the document. Verify all of this works correctly.
Add some "items" to the page for the report, and include them as bind variables in the where clause.
After that, add your Oracle Text index to the database, and toss in a "text-field" item to the APEX page. Modify your report query, adding the CONTAINS clause, and use the newly created item as a bind variable. There's your keyword search.
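The report query from that last step might look roughly like the sketch below. The table, column and page item names (documents, blob_content, P1_SEARCH) are hypothetical and would need to match your own application.

```sql
-- Report source for the APEX page: P1_SEARCH is the text-field item,
-- referenced as a bind variable. The label 1 ties SCORE to this CONTAINS.
SELECT d.id,
       d.filename,
       SCORE(1) AS relevance
FROM   documents d
WHERE  CONTAINS(d.blob_content, :P1_SEARCH, 1) > 0
ORDER  BY SCORE(1) DESC;
```

Ordering by SCORE puts the most relevant documents first, which is usually what a keyword search page wants.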
Linking to Oracle Apps is done through API's and may be over database links.
Hope it helps. Though not a step-by-step how to document, this should point you in the right direction. Get familiar with APEX as that covers most of what you described.
-Ron

Using oracle text on a non-materialized view

I'm having trouble tracking down an error when using oracle text on a non-materialized view (indexes are on the referenced columns). My database has a users table and a user history table which saves the old values when a user profile changes. My view performs a "union all" so I can select from both at once.
I would like to use oracle text to perform a "contains" on the view whenever someone signs up to see if any current users or historical entries contain the desired username.
The following works fine:
contains(user_history_view, 'bill')
but when I reference anything in the contains clause, I get a "column is not indexed" error:
contains(user_history_view, signup.user_name) --signup.username is 'bill'
Here is a stripped-down demonstration (I am using version 10.2.0.4.0):
create table signup (
signup_id   number(19,0) not null,
signup_name varchar2(255),
primary key (signup_id)
);
create table users (
user_id   number(19,0) not null,
user_name varchar2(255),
primary key (user_id)
);
create table user_history (
history_id number(19,0) not null,
user_id    number(19,0) not null,
user_name varchar2(255),
primary key (history_id),
foreign key (user_id) references users on delete set null
);
create index user_name_index on users(user_name)
indextype is ctxsys.context parameters ('sync (on commit)');
create index user_hist_name_index on user_history(user_name)
indextype is ctxsys.context parameters ('sync (on commit)');
create index signup_name_index on signup(signup_name)
indextype is ctxsys.context parameters ('sync (on commit)');
create or replace force view user_history_view
(user_id, user_name, flag_history) as
select user_id, user_name, 'N' from users
union all
select user_id, user_name, 'Y' from user_history;
--user bill changed his name to bob, and there is a pending signup for another bill
insert into users(user_id, user_name) values (1, 'bob');
insert into user_history(history_id, user_id, user_name) values (1, 1, 'bill');
insert into signup(signup_id, signup_name) values(1, 'bill');
commit;
--works
select * from user_history_view users, signup new_user
where new_user.signup_id = 1
and contains(users.user_name, 'bill')>0;
--fails
select * from user_history_view users, signup new_user
where new_user.signup_id = 1
and contains(users.user_name, new_user.signup_name)>0;
I could move everything into a materialized view, but querying against real-time data like this would be ideal. Any help would be greatly appreciated.

Hi,
To my knowledge this is not possible. It is hard for Oracle to do: think of a table with many rows, where that column would have to be checked for every row. So I think only a single varchar2 value is possible. A function might work for you, since it is possible to pass a function as the second parameter.
create or replace function return_signup
return varchar2
is
l_signup_name signup.signup_name%type;
begin
select signup_name
into l_signup_name
from signup
where signup_id = 1
and rownum = 1;
return l_signup_name;
exception
when no_data_found
then
    l_signup_name := 'abracadabra'; -- placeholder that hopefully matches nothing
    return l_signup_name;
end;
/
Now you can use the above function in the contains clause:
select * from user_history_view users --, signup new_user
--where new_user.signup_id = 1
where contains(users.user_name, return_signup)>0;
I didn't test the code! You may have to adjust the function for your needs, but it gives an idea of how this can be done.
Otherwise you have to do the check with an ordinary join on the columns:
select * from user_history_view users, signup new_user
where new_user.signup_id = 1
and users.user_name = new_user.signup_name;
Herald ten Dam
htendam.wordpress.com

Oracle Text with Oracle TimesTen

Hi!
I'm trying to use Oracle Text with Oracle TimesTen In-Memory. At this customer, we are using Oracle Text to index the names of the company's clients. There are about 13 million names to index. We're trying to speed up the search even more using Oracle TimesTen.
Does anybody have any experience using these two technologies together?
Thanks in advance
Tiago Soares

TimesTen doesn't support the CONTEXT indextype or CONTAINS clause (or other domain indexes/operators), so you can't create Oracle Text indexes in it.

Searching Web Apps with Data Source fields containing multiple values

I have a Web App with a field allowing multiple values to be entered similar to the checkbox list. I need to restrict allowed values to a large, finite list of values currently stored in another Web App as the data source. I can't apply the Data Source field type as that only allows single value selection. I also need to be able to use the Web App Search form to search for items containing 1 OR more values in this field (the search functionality of a checklist field type). Here's what I've tried for field types:
Text (string) or Text (multiline) field type - By saving a list of comma separated values (the same way that checkbox list outputs) to a text input or textarea, the search logic only searches for exact string (including commas) and doesn't parse the individual values.
List (checkbox list) field type - This allows me to search multiple values using OR logic, but the web app will only store values that have been entered as options in the actual web app field setup. I tried using a checkbox list with minimal or empty options hoping that whatever values I sent over in a comma separated string value would still get stored, but because the values came from my Web App data source and not the list of options stored with the field, they were not saved.
Has anyone found a way to do this?
My other question is about how I might use a similar multi-value field as described above but return search results containing items with ALL selected values for that field (AND logic).
Can anyone enlighten me to the inner workings of BC web app search logic?

Thanks Robert.
You'll need to create your own interface to the webapp database for those kind of data operations
by this, are you speaking of the internal BC database which stores web app schema data? That would be great if it were possible to update that programmatically because I need to use the List (Checkbox List) field type (for the search functionality), but I need to supply the checkbox options from a web app rather than by manually updating the list entered in the Fields view of the web app settings (shown below).
I'm curious if anyone else has tried this?
Again, my reason for needing to use the List (Checkbox List) field type is that the page which processes searches knows to expect a comma separated list for this field type and then appears to be parsing out the individual values for searching out web app items with 1 or more matching values. You're right that text fields (string and multiline) just check for 'string contains' matches, and this would be ok if I was only ever needing to search just one value at a time. Here's an example of what I might do:
Web App item field value (as recorded against the List (Checkbox List) field type:
8294877,8294878
Web App Search value (for this same field):
8294879,8294877,8294885
The search would return this web app item because the field contains 2 (1 or more) individual values even though they were entered into the search field in a different order. If this web app item were just a Text (string or multiline) field, the searched value is not a substring of the web app item's stored value, so it would not find a match. Hence the need to use Checkbox List field type.
The web app will have thousands if not 10s of thousands of records, so dumping them all into one big array or object and searching on the front-end won't be practical (though it works great on smaller datasets).

Getting table script using dbms_metadata.get_ddl, but with clob field

So, Oracle 11g R2..
I'm using dbms_metadata.get_ddl to get table scripts and it's working fine..
now, I have a table with a clob field, and it's not working... I got a 'missing right parenthesis' (ORA-00907) error...
I could paste a script that I got, but I don't think it makes any sense..
does anybody have some experience on using this package on clob tables?
tnx

this is script that I got... it's long, and it looks like it's not good
CREATE TABLE "COMMON"."TEST_AAA2"
   (    "ID" NUMBER(10,0),
    "TEKST" VARCHAR2(200 CHAR),
    "UPDATESTAMP" DATE,
    "OBJEKAT" CLOB,
     CONSTRAINT "TEST_PART_PK2" PRIMARY KEY ("ID")
USING INDEX PCTFREE 10 INITRANS 2 MAXTRANS 255 COMPUTE STATISTICS
STORAGE(INITIAL 65536 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645
PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1 BUFFER_POOL DEFAULT FLASH_CACHE DEFAULT CELL_FLASH_CACHE DEFAULT)
TABLESPACE "USERS"
ALTER INDEX "COMMON"."TEST_PART_PK2" UNUSABLE ENABLE
   ) PCTFREE 10 PCTUSED 40 INITRANS 1 MAXTRANS 255 NOCOMPRESS LOGGING
STORAGE(
BUFFER_POOL DEFAULT FLASH_CACHE DEFAULT CELL_FLASH_CACHE DEFAULT)
TABLESPACE "COMMON_DATA"
LOB ("OBJEKAT") STORE AS BASICFILE (
ENABLE STORAGE IN ROW CHUNK 8192 RETENTION
NOCACHE LOGGING
STORAGE(
BUFFER_POOL DEFAULT FLASH_CACHE DEFAULT CELL_FLASH_CACHE DEFAULT))
PARTITION BY RANGE ("UPDATESTAMP")
(PARTITION "P_201012" VALUES LESS THAN (TO_DATE(' 2011-01-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIAN'))
PCTFREE 10 PCTUSED 40 INITRANS 1 MAXTRANS 255
STORAGE(INITIAL 65536 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645
PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1 BUFFER_POOL DEFAULT FLASH_CACHE DEFAULT CELL_FLASH_CACHE DEFAULT)
TABLESPACE "COMMON_DATA"
LOB ("OBJEKAT") STORE AS BASICFILE (
TABLESPACE "USERS" ENABLE STORAGE IN ROW CHUNK 8192 PCTVERSION 10
NOCACHE LOGGING
STORAGE(INITIAL 65536 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645
PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1 BUFFER_POOL DEFAULT FLASH_CACHE DEFAULT CELL_FLASH_CACHE DEFAULT)) NOCOMPRESS ,
PARTITION "P_201101" VALUES LESS THAN (TO_DATE(' 2011-02-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIAN'))
PCTFREE 10 PCTUSED 40 INITRANS 1 MAXTRANS 255
STORAGE(INITIAL 65536 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645
PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1 BUFFER_POOL DEFAULT FLASH_CACHE DEFAULT CELL_FLASH_CACHE DEFAULT)
TABLESPACE "COMMON_DATA"
LOB ("OBJEKAT") STORE AS BASICFILE (
TABLESPACE "USERS" ENABLE STORAGE IN ROW CHUNK 8192 PCTVERSION 10
NOCACHE LOGGING
STORAGE(INITIAL 65536 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645
PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1 BUFFER_POOL DEFAULT FLASH_CACHE DEFAULT CELL_FLASH_CACHE DEFAULT)) NOCOMPRESS ,
PARTITION "P_201102" VALUES LESS THAN (TO_DATE(' 2011-03-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIAN'))
PCTFREE 10 PCTUSED 40 INITRANS 1 MAXTRANS 255
STORAGE(INITIAL 65536 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645
PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1 BUFFER_POOL DEFAULT FLASH_CACHE DEFAULT CELL_FLASH_CACHE DEFAULT)
TABLESPACE "COMMON_ARCHIVE"
LOB ("OBJEKAT") STORE AS BASICFILE (
TABLESPACE "COMMON_ARCHIVE" ENABLE STORAGE IN ROW CHUNK 8192 RETENTION
NOCACHE LOGGING
STORAGE(INITIAL 65536 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645
PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1 BUFFER_POOL DEFAULT FLASH_CACHE DEFAULT CELL_FLASH_CACHE DEFAULT)) NOCOMPRESS ,
PARTITION "P_201103" VALUES LESS THAN (TO_DATE(' 2011-04-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIAN'))
PCTFREE 10 PCTUSED 40 INITRANS 1 MAXTRANS 255
STORAGE(INITIAL 65536 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645
PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1 BUFFER_POOL DEFAULT FLASH_CACHE DEFAULT CELL_FLASH_CACHE DEFAULT)
TABLESPACE "COMMON_DATA"
LOB ("OBJEKAT") STORE AS BASICFILE (
TABLESPACE "USERS" ENABLE STORAGE IN ROW CHUNK 8192 PCTVERSION 10
NOCACHE LOGGING
STORAGE(INITIAL 65536 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645
PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1 BUFFER_POOL DEFAULT FLASH_CACHE DEFAULT CELL_FLASH_CACHE DEFAULT)) NOCOMPRESS ,
PARTITION "P_201104" VALUES LESS THAN (TO_DATE(' 2011-05-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIAN'))
PCTFREE 10 PCTUSED 40 INITRANS 1 MAXTRANS 255
STORAGE(INITIAL 65536 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645
PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1 BUFFER_POOL DEFAULT FLASH_CACHE DEFAULT CELL_FLASH_CACHE DEFAULT)
TABLESPACE "COMMON_ARCHIVE"
LOB ("OBJEKAT") STORE AS BASICFILE (
TABLESPACE "COMMON_ARCHIVE" ENABLE STORAGE IN ROW CHUNK 8192 RETENTION
NOCACHE LOGGING
STORAGE(INITIAL 65536 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645
PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1 BUFFER_POOL DEFAULT FLASH_CACHE DEFAULT CELL_FLASH_CACHE DEFAULT)) NOCOMPRESS ,
PARTITION "P_201105" VALUES LESS THAN (TO_DATE(' 2011-06-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIAN'))
PCTFREE 10 PCTUSED 40 INITRANS 1 MAXTRANS 255
STORAGE(INITIAL 65536 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645
PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1 BUFFER_POOL DEFAULT FLASH_CACHE DEFAULT CELL_FLASH_CACHE DEFAULT)
TABLESPACE "COMMON_DATA"
LOB ("OBJEKAT") STORE AS BASICFILE (
TABLESPACE "USERS" ENABLE STORAGE IN ROW CHUNK 8192 PCTVERSION 10
NOCACHE LOGGING
STORAGE(INITIAL 65536 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645
PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1 BUFFER_POOL DEFAULT FLASH_CACHE DEFAULT CELL_FLASH_CACHE DEFAULT)) NOCOMPRESS ,
PARTITION "P_201106" VALUES LESS THAN (TO_DATE(' 2011-07-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIAN'))
PCTFREE 10 PCTUSED 40 INITRANS 1 MAXTRANS 255
STORAGE(INITIAL 65536 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645
PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1 BUFFER_POOL DEFAULT FLASH_CACHE DEFAULT CELL_FLASH_CACHE DEFAULT)
TABLESPACE "COMMON_ARCHIVE"
LOB ("OBJEKAT") STORE AS BASICFILE (
TABLESPACE "COMMON_ARCHIVE" ENABLE STORAGE IN ROW CHUNK 8192 RETENTION
NOCACHE LOGGING
STORAGE(INITIAL 65536 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645
PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1 BUFFER_POOL DEFAULT FLASH_CACHE DEFAULT CELL_FLASH_CACHE DEFAULT)) NOCOMPRESS ,
PARTITION "P_201107" VALUES LESS THAN (TO_DATE(' 2011-08-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIAN'))
PCTFREE 10 PCTUSED 40 INITRANS 1 MAXTRANS 255
STORAGE(INITIAL 65536 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645
PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1 BUFFER_POOL DEFAULT FLASH_CACHE DEFAULT CELL_FLASH_CACHE DEFAULT)
TABLESPACE "COMMON_DATA"
LOB ("OBJEKAT") STORE AS BASICFILE (
TABLESPACE "COMMON_ARCHIVE" ENABLE STORAGE IN ROW CHUNK 8192 PCTVERSION 10
NOCACHE LOGGING
STORAGE(INITIAL 65536 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645
PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1 BUFFER_POOL DEFAULT FLASH_CACHE DEFAULT CELL_FLASH_CACHE DEFAULT)) NOCOMPRESS ,
PARTITION "P_MAXVALUE" VALUES LESS THAN (MAXVALUE)
PCTFREE 10 PCTUSED 40 INITRANS 1 MAXTRANS 255
STORAGE(INITIAL 65536 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645
PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1 BUFFER_POOL DEFAULT FLASH_CACHE DEFAULT CELL_FLASH_CACHE DEFAULT)
TABLESPACE "COMMON_DATA"
LOB ("OBJEKAT") STORE AS BASICFILE (
TABLESPACE "COMMON_ARCHIVE" ENABLE STORAGE IN ROW CHUNK 8192 PCTVERSION 10
NOCACHE LOGGING
STORAGE(INITIAL 65536 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645
PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1 BUFFER_POOL DEFAULT FLASH_CACHE DEFAULT CELL_FLASH_CACHE DEFAULT)) NOCOMPRESS )
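One way to make a generated script like this easier to inspect is to switch off the storage and segment clauses before calling GET_DDL. This is only a sketch of the DBMS_METADATA session transform settings, run from SQL*Plus; it is not confirmed to remove the ORA-00907 for this particular table.

```sql
-- SQL*Plus settings so the returned CLOB is not truncated or paginated.
SET LONG 1000000
SET PAGESIZE 0

BEGIN
  -- Apply to every GET_DDL call in this session.
  -- Drop STORAGE(...) clauses from the output.
  DBMS_METADATA.SET_TRANSFORM_PARAM(DBMS_METADATA.SESSION_TRANSFORM, 'STORAGE', FALSE);
  -- Drop physical attributes, tablespace and logging clauses as well.
  DBMS_METADATA.SET_TRANSFORM_PARAM(DBMS_METADATA.SESSION_TRANSFORM, 'SEGMENT_ATTRIBUTES', FALSE);
  -- Terminate each statement so the script can be run as-is.
  DBMS_METADATA.SET_TRANSFORM_PARAM(DBMS_METADATA.SESSION_TRANSFORM, 'SQLTERMINATOR', TRUE);
END;
/

SELECT DBMS_METADATA.GET_DDL('TABLE', 'TEST_AAA2', 'COMMON')
FROM   dual;
```

With the physical clauses gone, the remaining DDL is short enough to see where statements such as the ALTER INDEX line end up relative to the column list.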

Problem with blob column index created using Oracle Text.

Hi,
I'm running Oracle Database 10g 10.2.0.1.0 standard edition one, on windows server 2003 R2 x64.
I have a table with a blob column which contains pdf document.
Then, I create an index using the following script so that I can do fulltext search using Oracle Text.
CREATE INDEX DMCS.T_DMCS_FILE_DF_FILE_IDX ON DMCS.T_DMCS_FILE
(DF_FILE)
INDEXTYPE IS CTXSYS.CONTEXT
PARAMETERS('DATASTORE CTXSYS.DEFAULT_DATASTORE');
However, the index is not searchable, and when I checked the following tables that the database created for my index I found them to be empty as well!
DR$T_DMCS_FILE_DF_FILE_IDX$I
DR$T_DMCS_FILE_DF_FILE_IDX$K
DR$T_DMCS_FILE_DF_FILE_IDX$N
DR$T_DMCS_FILE_DF_FILE_IDX$R
I wonder what's wrong with it.
My user has been granted the ctx_app role, and other tables that store plain text and use Oracle Text work fine. I even exported the blob column contents and saved them as pdf files, and they are fine.
However, the database does not seem to be indexing my blob column, although the index is created without error.
Please advise.
Really appreciate anyone who can help.
Thank you.

The situation is that I have already loaded a few pdf documents into the table's blob column.
After I create the Oracle Text index on this blob column, I find that the system-generated index tables listed in my earlier posting are empty, except for the 4th table.
Normally we would see in those tables the words that Oracle Text indexed from my documents.
As a result, no matter how I query the index using a select statement with the contains operator, it does not return any results.
I find it odd that the blob is not indexed. The content of the blob is valid: I tested this by exporting the content back to pdf, and I can still view and search within the pdf.
Regards,
Jap.
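When a CONTEXT index is created without error but stays empty, a common first step is to look at the per-document errors Oracle Text records during indexing, for example filter failures on binary documents. A minimal sketch, assuming it is run as the user that owns the index:

```sql
-- Each row is one document that failed during indexing, with the reason.
SELECT err_timestamp,
       err_textkey,   -- identifies the row that failed
       err_text       -- the DRG-nnnnn error message
FROM   ctx_user_index_errors
WHERE  err_index_name = 'T_DMCS_FILE_DF_FILE_IDX'
ORDER  BY err_timestamp DESC;
```

If this returns one row per loaded pdf, the messages in err_text should indicate whether the filtering step is what fails.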

Address Matching Using Oracle Text

Hi,
I am a newbie to Oracle Text. Hence please pardon my ignorance if this is a "RTFM" Query.
We would like to match our customer addresses against a table (GEO) that has addresses and geographic coordinates (latitude, longitude etc). The customer address data are in four VARCHAR2 columns (address1, address2, address3 & address4) and need not be in the same order as the address data in the GEO table.
Has anybody used Oracle Text to do similar work to score addresses?
Any links & pointers will be highly appreciated.
Thanks in advance.
Best Regards
Ramdas

When you use a multi_column_datastore, even though the index is only created using one column name and only that column name is used in the contains clause, it searches all columns named in the multi_column_datastore. It may be clearer if you use a dummy column with a name that has more meaning; I have used addresses in the revised example below.
Queries run faster if you put everything in one contains clause instead of using multiple contains clauses.
If you add a section group and field sections, then you can search within those sections and apply weights. In the following example I have doubled the score for results in address1 and address2 and halved the score for results in address3 and address4.
SCOTT@orcl_11gR2> CREATE TABLE customers
  (id        NUMBER,
   address1  VARCHAR2(30),
   address2  VARCHAR2(30),
   address3  VARCHAR2(30),
   address4  VARCHAR2(30),
   addresses VARCHAR2(1))
/
Table created.
SCOTT@orcl_11gR2> INSERT INTO customers VALUES
  (1, '123 Somewhere, Someplace', 'nowhere', 'nowhere', 'nowhere', null)
/
1 row created.
SCOTT@orcl_11gR2> INSERT INTO customers VALUES
  (2, 'nowhere', 'nowhere', 'nowhere', '123 Somewhere, Someplace', null)
/
1 row created.
SCOTT@orcl_11gR2> INSERT INTO customers VALUES
  (3, 'nowhere', '500 Oracle Pkwy', 'nowhere', 'nowhere', null)
/
1 row created.
SCOTT@orcl_11gR2> INSERT INTO customers VALUES
  (4, 'nowhere', 'nowhere', '500 Oracle Pkwy', 'nowhere', null)
/
1 row created.
SCOTT@orcl_11gR2> BEGIN
  CTX_DDL.CREATE_PREFERENCE
    ('cust_datastore',
     'MULTI_COLUMN_DATASTORE');
  CTX_DDL.SET_ATTRIBUTE
    ('cust_datastore',
     'COLUMNS',
     'address1, address2, address3, address4');
  CTX_DDL.CREATE_SECTION_GROUP
    ('cust_sg', 'BASIC_SECTION_GROUP');
  CTX_DDL.ADD_FIELD_SECTION ('cust_sg', 'address1', 'address1', true);
  CTX_DDL.ADD_FIELD_SECTION ('cust_sg', 'address2', 'address2', true);
  CTX_DDL.ADD_FIELD_SECTION ('cust_sg', 'address3', 'address3', true);
  CTX_DDL.ADD_FIELD_SECTION ('cust_sg', 'address4', 'address4', true);
END;
/
PL/SQL procedure successfully completed.
SCOTT@orcl_11gR2> CREATE INDEX customers_idx
ON customers (addresses)
INDEXTYPE IS CTXSYS.CONTEXT
PARAMETERS
  ('DATASTORE     cust_datastore
    SECTION GROUP cust_sg')
/
Index created.
SCOTT@orcl_11gR2> CREATE TABLE geo
  (address     VARCHAR2(40),
   coordinates VARCHAR2(30))
/
Table created.
SCOTT@orcl_11gR2> INSERT INTO geo VALUES
  ('500 Oracle Pkwy, Redwood City, CA 94065',
   '37° 31'' N / 122° 15'' W')
/
1 row created.
SCOTT@orcl_11gR2> INSERT INTO geo VALUES
  ('123 Somewhere Street, Someplace City, CA',
   NULL)
/
1 row created.
SCOTT@orcl_11gR2> SELECT SCORE(1), c.*, g.address, g.coordinates
FROM   customers c,
       (SELECT address,
               '(' || REPLACE (REPLACE (address, ' ', ','), ',,', ',') || ')' addr,
               coordinates
        FROM   geo) g
WHERE  CONTAINS
         (c.addresses,
          '((' || g.addr || ' WITHIN address1 OR ' ||
                  g.addr || ' WITHIN address2) * 2) OR ' ||
          '((' || g.addr || ' WITHIN address3 OR ' ||
                  g.addr || ' WITHIN address4) * 0.5)',
          1) > 0
ORDER  BY SCORE(1) DESC
/
SCORE(1)         ID ADDRESS1
ADDRESS2                       ADDRESS3
ADDRESS4                       A ADDRESS
COORDINATES
        68          1 123 Somewhere, Someplace
nowhere                        nowhere
nowhere                          123 Somewhere Street, Someplace City, CA
        59          3 nowhere
500 Oracle Pkwy                nowhere
nowhere                          500 Oracle Pkwy, Redwood City, CA 94065
37° 31' N / 122° 15' W
        17          2 nowhere
nowhere                        nowhere
123 Somewhere, Someplace         123 Somewhere Street, Someplace City, CA
        15          4 nowhere
nowhere                        500 Oracle Pkwy
nowhere                          500 Oracle Pkwy, Redwood City, CA 94065
37° 31' N / 122° 15' W
4 rows selected.

Need help for SQL SELECT query to fetch XML records from Oracle tables having CLOB field

Hello,
I have a scenario in which I need to fetch records from several Oracle tables with CLOB fields (which hold XML) and then merge them logically to form a hierarchical XML document. All these tables are related through PK-FK relationships. The XML hierarchy has 'OP' as its top-most root node and 'DE' as its bottom-most node, with one-to-many relationships between levels. Hence, each OP can have multiple GM, each GM can have multiple DM, and so on.
Table structures are mentioned below:
OP:
Name                             Null                    Type
OP_NBR                    NOT NULL      NUMBER(4)    (Primary Key)
OP_DESC                                        VARCHAR2(50)
OP_PAYLOD_XML                           CLOB
GM:
Name                          Null                   Type
GM_NBR                  NOT NULL       NUMBER(4)    (Primary Key)
GM_DESC                                       VARCHAR2(40)
OP_NBR               NOT NULL          NUMBER(4)    (Foreign Key)
GM_PAYLOD_XML                          CLOB
DM:
Name                          Null                    Type
DM_NBR                  NOT NULL         NUMBER(4)    (Primary Key)
DM_DESC                                         VARCHAR2(40)
GM_NBR                  NOT NULL         NUMBER(4)    (Foreign Key)
DM_PAYLOD_XML                            CLOB
DE:
Name                          Null                    Type
DE_NBR                     NOT NULL           NUMBER(4)    (Primary Key)
DE_DESC                   NOT NULL           VARCHAR2(40)
DM_NBR                    NOT NULL           NUMBER(4)    (Foreign Key)
DE_PAYLOD_XML                                CLOB
+++++++++++++++++++++++++++++++++++++++++++++++++++++
SELECT
j.op_nbr||'||'||j.op_desc||'||'||j.op_paylod_xml AS op_paylod_xml,
i.gm_nbr||'||'||i.gm_desc||'||'||i.gm_paylod_xml AS gm_paylod_xml,
h.dm_nbr||'||'||h.dm_desc||'||'||h.dm_paylod_xml AS dm_paylod_xml,
g.de_nbr||'||'||g.de_desc||'||'||g.de_paylod_xml AS de_paylod_xml
FROM
DE g, DM h, GM i, OP j
WHERE
h.dm_nbr = g.dm_nbr(+) and
i.gm_nbr = h.gm_nbr(+) and
j.op_nbr = i.op_nbr(+)
+++++++++++++++++++++++++++++++++++++++++++++++++++++
I am using the above SQL SELECT statement to fetch the XML records, and it gives me all related XMLs for each entity (OP, GM, DM, DE) in a single record. The output of this SQL query is as below:
Current O/P:
<resultSet>
     <Record1>
          <OP_PAYLOD_XML1>
          <GM_PAYLOD_XML1>
          <DM_PAYLOD_XML1>
          <DE_PAYLOD_XML1>
     </Record1>
     <Record2>
          <OP_PAYLOD_XML2>
          <GM_PAYLOD_XML2>
          <DM_PAYLOD_XML2>
          <DE_PAYLOD_XML2>
     </Record2>
     <RecordN>
          <OP_PAYLOD_XMLN>
          <GM_PAYLOD_XMLN>
          <DM_PAYLOD_XMLN>
          <DE_PAYLOD_XMLN>
     </RecordN>
</resultSet>
Now I want to change my SQL query so that I get the following output structure:
<resultSet>
     <Record>
          <OP_PAYLOD_XML1>
          <GM_PAYLOD_XML1>
          <GM_PAYLOD_XML2> .......
          <GM_PAYLOD_XMLN>
          <DM_PAYLOD_XML1>
          <DM_PAYLOD_XML2> .......
          <DM_PAYLOD_XMLN>
          <DE_PAYLOD_XML1>
          <DE_PAYLOD_XML2> .......
          <DE_PAYLOD_XMLN>
     </Record>
     <Record>
          <OP_PAYLOD_XML2>
          <GM_PAYLOD_XML1'>
          <GM_PAYLOD_XML2'> .......
          <GM_PAYLOD_XMLN'>
          <DM_PAYLOD_XML1'>
          <DM_PAYLOD_XML2'> .......
          <DM_PAYLOD_XMLN'>
          <DE_PAYLOD_XML1'>
          <DE_PAYLOD_XML2'> .......
          <DE_PAYLOD_XMLN'>
     </Record>
</resultSet>
Appreciate your help in this regard!

Hi,
A few questions:
How is your first query supposed to give you XML output like the one you show?
Is there something you're not telling us?
What's the content of, for example, <OP_PAYLOD_XML1>?
I don't think it's a good idea to embed the node level in the tag name; it would make much more sense to expose that as an attribute.
What's the db version BTW?
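For what it's worth, one possible way to get one record per OP with its related payloads grouped underneath is SQL/XML (XMLELEMENT, XMLATTRIBUTES, XMLAGG). This is only a sketch under assumptions not confirmed in the thread: every *_PAYLOD_XML value is non-null, well-formed XML (so that XMLTYPE() can parse it), and the element and attribute names ("Record", "OP", "nbr", and so on) are placeholders. The level is exposed as the element name and the key as an attribute, rather than being embedded in the tag name.

```sql
-- Sketch only: assumes each payload CLOB is non-null, well-formed XML.
-- Element and attribute names are illustrative, not from the original post.
SELECT XMLELEMENT("Record",
         XMLELEMENT("OP",
           XMLATTRIBUTES(o.op_nbr AS "nbr", o.op_desc AS "desc"),
           XMLTYPE(o.op_paylod_xml)),
         -- all GM rows that belong to this OP
         (SELECT XMLAGG(
                   XMLELEMENT("GM",
                     XMLATTRIBUTES(g.gm_nbr AS "nbr"),
                     XMLTYPE(g.gm_paylod_xml))
                   ORDER BY g.gm_nbr)
          FROM   gm g
          WHERE  g.op_nbr = o.op_nbr),
         -- all DM rows reachable through those GM rows
         (SELECT XMLAGG(
                   XMLELEMENT("DM",
                     XMLATTRIBUTES(d.dm_nbr AS "nbr", d.gm_nbr AS "gm_nbr"),
                     XMLTYPE(d.dm_paylod_xml))
                   ORDER BY d.dm_nbr)
          FROM   dm d
                 JOIN gm g ON g.gm_nbr = d.gm_nbr
          WHERE  g.op_nbr = o.op_nbr),
         -- all DE rows reachable through those DM rows
         (SELECT XMLAGG(
                   XMLELEMENT("DE",
                     XMLATTRIBUTES(e.de_nbr AS "nbr", e.dm_nbr AS "dm_nbr"),
                     XMLTYPE(e.de_paylod_xml))
                   ORDER BY e.de_nbr)
          FROM   de e
                 JOIN dm d ON d.dm_nbr = e.dm_nbr
                 JOIN gm g ON g.gm_nbr = d.gm_nbr
          WHERE  g.op_nbr = o.op_nbr)
       ) AS record_xml
FROM   op o
ORDER  BY o.op_nbr;
```

This returns one XMLType row per OP. If a single document is needed, the rows could be wrapped in an outer XMLAGG under a "resultSet" element. Whether these functions are available depends on the database version, which is why that question matters.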

Searching using Oracle Text instead of LIKE '%'

Hello all,
I hope you can help me with this.
I have a table that looks like this:
create table subscribers (
id number(10),
first_name varchar2(30),
father_name varchar2(30),
grandfather_name varchar2(30),
last_name varchar2(30))
The application is built using Oracle Forms. End users are often unsure of the spelling of a name, so they use the "%" wildcard in the name fields. This is reflected in the queries the application sends to the Oracle server.
We have the following queries:
1) select *
from subscribers
where last_name like '%family_name%';
2) select *
from subscribers
where last_name like 'family_name%';
3) select *
from subscribers
where last_name like '%family_name%' and first_name like '%first_name%';
4) select *
from subscribers
where last_name like 'family_name%' and first_name like 'first_name%';
The same kinds of searches are run on the father_name and grandfather_name fields, but most of the searches are on first_name and last_name.
These queries are killing the server since we have millions of records. B-tree indexes do not help with the LIKE patterns that start with "%".
I am thinking of using Oracle Text here, but I am not sure whether I have to create a CONTEXT index on each individual column or whether I can use MULTI_COLUMN_DATASTORE indexing.
Any ideas will be appreciated.

The ctxcat index and catsearch operator are generally intended for usage with one text column and one or more columns of structured data. You would have to pick just one of your columns as the text column and the others as structured columns. I would be more inclined to use the multi_column_datastore with a context index and contains operator, so that you can search all of your columns as text columns.
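A minimal sketch of that approach follows. The preference, section group, and index names (subscribers_ds, subscribers_sg, subscribers_wl, subscribers_ctx_idx) and the search terms are made up for illustration, and the wordlist settings are an optional assumption aimed at the wildcard searches described in the question.

```sql
-- Sketch only: all preference, section group and index names are hypothetical.
BEGIN
  -- Combine the four name columns into one virtual document per row.
  ctx_ddl.create_preference('subscribers_ds', 'MULTI_COLUMN_DATASTORE');
  ctx_ddl.set_attribute('subscribers_ds', 'COLUMNS',
    'first_name, father_name, grandfather_name, last_name');

  -- The datastore wraps each column value in tags named after the column;
  -- field sections make those tags searchable with WITHIN.
  ctx_ddl.create_section_group('subscribers_sg', 'BASIC_SECTION_GROUP');
  ctx_ddl.add_field_section('subscribers_sg', 'first_name', 'first_name', TRUE);
  ctx_ddl.add_field_section('subscribers_sg', 'father_name', 'father_name', TRUE);
  ctx_ddl.add_field_section('subscribers_sg', 'grandfather_name', 'grandfather_name', TRUE);
  ctx_ddl.add_field_section('subscribers_sg', 'last_name', 'last_name', TRUE);

  -- Optional: prefix and substring indexing to help wildcard terms.
  ctx_ddl.create_preference('subscribers_wl', 'BASIC_WORDLIST');
  ctx_ddl.set_attribute('subscribers_wl', 'PREFIX_INDEX', 'TRUE');
  ctx_ddl.set_attribute('subscribers_wl', 'SUBSTRING_INDEX', 'TRUE');
END;
/

-- The index is created on one column, but it covers every column
-- listed in the datastore preference.
CREATE INDEX subscribers_ctx_idx ON subscribers (last_name)
  INDEXTYPE IS CTXSYS.CONTEXT
  PARAMETERS ('DATASTORE subscribers_ds
               SECTION GROUP subscribers_sg
               WORDLIST subscribers_wl');

-- Example search: last name starting with "smi", first name starting with "jo".
SELECT *
FROM   subscribers
WHERE  CONTAINS(last_name,
         '(smi% WITHIN last_name) AND (jo% WITHIN first_name)') > 0;
```

Two things to keep in mind with this design: a CONTEXT index is not maintained transactionally, so it has to be synchronized after DML, and with a multi-column datastore the row is only queued for re-indexing when the indexed column itself (last_name here) is updated.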

Understanding logminer results -- inserting row into table with CLOB field

While using LogMiner I have noticed that inserts of rows that contain a CLOB field (I assume this applies to other LOB types as well, but I have only tested with CLOB so far) are actually recorded as two DML entries:
--the first entry is the insert operation that inserts all values with an EMPTY_CLOB() for the CLOB field
--the second entry is the update that sets the actual CLOB value (this is true even if the value of the CLOB field is not being set explicitly)
This separation makes sense, as the values may be stored in separate locations.
However, what I am tripping over is the fact that the first entry, the insert, has a RowId value of 'AAAAAAAAAAAAAAAAAA', which is invalid if I attempt to use it in a flashback query such as:
SELECT * FROM PERSON AS OF SCN ##### WHERE RowId = 'AAAAAAAAAAAAAAAAAA'
The second operation, the update of the CLOB field, has the valid RowId.
Now, again, this makes sense if the insert of the new row is not really considered "done" until the two steps are done. However, is there some way to group these operations together when analyzing the log contents to know that these two operations are a "matched set"?
Not a total deal breaker, but it would be nice to know what is happening under the hood here so I don't act on any false assumptions.
Thanks for any input.
To replicate:
Create a table with a CLOB field:
CREATE TABLE DEVUSER.TESTTABLE
       (
        ID NUMBER
       , FULLNAME VARCHAR2(50)
       , AGE NUMBER
       , DESCRIPTION CLOB
       );
Capture the before SCN:
SELECT DBMS_FLASHBACK.GET_SYSTEM_CHANGE_NUMBER FROM DUAL;
Insert a new row in the test table:
INSERT INTO TESTTABLE(ID,FULLNAME,AGE) VALUES(1,'Robert BUILDER',35);
COMMIT;
Capture the after SCN:
SELECT DBMS_FLASHBACK.GET_SYSTEM_CHANGE_NUMBER FROM DUAL;
Start a LogMiner session with the bracketing SCN values, options, etc.:
EXECUTE DBMS_LOGMNR.START_LOGMNR(STARTSCN=>2619174, ENDSCN=>2619191, -
           OPTIONS => DBMS_LOGMNR.DICT_FROM_ONLINE_CATALOG + DBMS_LOGMNR.CONTINUOUS_MINE + -
           DBMS_LOGMNR.COMMITTED_DATA_ONLY + DBMS_LOGMNR.NO_ROWID_IN_STMT + DBMS_LOGMNR.NO_SQL_DELIMITER)
Query the logs for the changes in that range:
SELECT
       commit_scn, xid,operation,table_name,row_id
       ,sql_redo,sql_undo, rs_id,ssn
       FROM V$LOGMNR_CONTENTS
    ORDER BY xid asc,sequence# asc;
Results:
2619178     0C00070028000000     START                  AAAAAAAAAAAAAAAAAA     set transaction read write
2619178     0C00070028000000     INSERT     TESTTABLE     AAAAAAAAAAAAAAAAAA     insert into "DEVUSER"."TESTTABLE" ...
2619178     0C00070028000000     UPDATE     TESTTABLE     AAAFEXAABAAALEJAAB     update "DEVUSER"."TESTTABLE" set "DESCRIPTION" = NULL ...
2619178     0C00070028000000     COMMIT                  AAAAAAAAAAAAAAAAAA     commit

Scott,
Thanks for the reply.
I am inserting into the table over a database link.
I am using the new version of HTML Db (2.0)
HTML Db is connected to an Oracle 10 database I think, however the table I am trying to insert data into (via the database link) is in an Oracle 8 database - this is why we created a link to it as we couldn't have the HTML Db interacting with the Oracle 8 database directly due to compatibility problems (or so I've been told)
Simon
