FILE_DATASTORE??

3.3.2.1.4 Specifying File Data Storage
The following example creates a data storage preference using the FILE_DATASTORE. This tells the system that the files to be indexed are stored in the operating system. The example uses CTX_DDL.SET_ATTRIBUTE to set the PATH attribute to the directory /docs.
begin
ctx_ddl.create_preference('mypref', 'FILE_DATASTORE');
ctx_ddl.set_attribute('mypref', 'PATH', '/docs');
end;
Where is '/docs' located?
If I want to search the file file.html, which is located at "E:\testdata\file.html",
how can I set the 'PATH' attribute?
Thanks!!

"/docs" is the docs subdirectory of your root directory. The directory must be on the server, not on the client machine; Oracle must have read and write privileges on it; and the path notation depends on your operating system. The demonstration below uses Oracle 10g on Windows XP.
SCOTT@10gXE> begin
2    ctx_ddl.create_preference ('mypref', 'FILE_DATASTORE');
3    ctx_ddl.set_attribute ('mypref', 'PATH', 'D:\testdata');
4 end;
5 /
PL/SQL procedure successfully completed.
SCOTT@10gXE> CREATE TABLE mytab
2    (id    NUMBER,
3      docs VARCHAR2 (30))
4 /
Table created.
SCOTT@10gXE> INSERT INTO mytab     VALUES (1, 'file.html')
2 /
1 row created.
SCOTT@10gXE> CREATE INDEX myind ON mytab (docs)
2 INDEXTYPE IS CTXSYS.CONTEXT
3 PARAMETERS ('DATASTORE mypref')
4 /
Index created.
SCOTT@10gXE> SELECT token_text FROM dr$myind$i
2 /
TOKEN_TEXT
FILE
HTML
TESTDATA
TESTING
SCOTT@10gXE> SELECT id FROM mytab
2 WHERE CONTAINS (docs, 'testing') > 0
3 /
        ID
         1
SCOTT@10gXE>

Similar Messages

File_datastore not indexing correctly (oracle version 10.2.0.1.0)

Problem:
(1) Created an index on a VARCHAR2 column that contained text from a given file (single row).
Used direct_datastore to create index.
Selected count(*) from the index and received a count of 109.
(2) Created an index on VARCHAR2 column that contained a file location for a file on the file system which contains the same data as in (1).
Used file_datastore to create index.
Selected count(*) from the index and received a count of 17.
The data that is getting indexed is Arabic.
Index for direct_datastore works as expected when performing a query but the file_datastore does not.
Any ideas why the files are not getting parsed correctly and what I could do to resolve?
Database properties
NLS_LANGUAGE = AMERICAN
NLS_CHARACTERSET = AL32UTF8
NLS_NCHAR_CHARACTERSET = AL16UTF16
Commands used to create indexes:
(1) create index globaldoc_index on globaldoc(text_data)
indextype is ctxsys.context
parameters ('datastore CTXSYS.DIRECT_DATASTORE,
filter ctxsys.null_filter, lexer world_lexer, language column language,
charset column characterset')
(2) create index globaldoc_index on globaldoc(filepath)
indextype is ctxsys.context
parameters ('datastore CTXSYS.FILE_DATASTORE,
filter ctxsys.null_filter, lexer world_lexer, language column language,
charset column characterset')
Data in table globaldoc:
language = ar
characterset = AL32UTF8

Can you try the second case with the auto lexer?
create index globaldoc_index on globaldoc(filepath)
indextype is ctxsys.context
parameters ('datastore CTXSYS.FILE_DATASTORE,
filter ctxsys.null_filter, lexer auto_lexer, language column language,
charset column characterset')
Auto lexer should determine the language of the document automatically.
Also, you can create your own lexer specifying the language of the lexer and use the same while creating the index.

FILE_DATASTORE and non-ASCII chars

I have created an interMedia Text index
with the FILE_DATASTORE option, so that
interMedia treats table entries as
filenames and indexes the corresponding
file on the server's filesystem.
But whenever the filename contains characters
which are not part of the US7ASCII charset (like äöü ß), the file is not found. But both Oracle and the operating system support these characters.
The Oracle instance uses UTF8 as internal
characterset. The client which stores
the filenames in the table uses the
WE8ISO8859P1 charset. The values in the
database table are stored and shown correctly
when viewed with Oracle or Java client
programs.
So where does the charset conversion fail ?
The names are stored correctly, they can be
read correctly by clients, but the indexer
seems to use a wrong charset to translate
the filenames stored in the database into
filenames of the operating system.
Do I have to apply some additional configuration to my indexer?
Greetings,
Michael Skusa

I am bumping Dr. Chuck's thread for a similar problem with non-ASCII chars.
The chars show up but the sorting is a bit off.
Example: A, Å, B, ... Z
Should be: A, B, ... Z, Å, Ä, Ö
In Swedish, Å (A with a ring) is one of the last letters of the alphabet and should not be placed right after A despite looking similar.
Any ideas?

Do I have to name each file using FILE_DATASTORE

Hello,
I want to crawl and search our filesystems. Do I have to name each file and insert it into the database when using
FILE_DATASTORE, or can I use an asterisk?
Or is there another solution?
thx

Are you viewing in a browser? Which one? What steps are you taking to print and exactly what happens?

Questions on File_DataStore

Hi, I have some questions about file_datastore.
Suppose My data is stored under this directory:
/export/home/users/research/lai/data
And I run sqlplus on this directory:
/export/home/users/research/lai
I have created a file datastore preference by
begin
ctx_ddl.create_preference('COMMON_DIR', 'FILE_DATASTORE');
ctx_ddl.set_attribute('COMMON_DIR', 'PATH', '/export/home/users/research/lai/data');
end;
The table is created by:
create table filetable(docid number primary key, docs varchar2(2000));
insert into filetable values(200010010020042,'200010010020042.txt');
insert into filetable values(200010200040027,'200010200040027.txt');
insert into filetable values(200010240060092,'200010240060092.txt');
insert into filetable values(200010200020012,'200010200020012.txt');
insert into filetable values(200010200040184,'200010200040184.txt');
insert into filetable values(200010300020234,'200010300020234.txt');
commit;
The index is created by:
create index fileindex on filetable(docs)
indextype is ctxsys.context
parameters ('datastore COMMON_DIR LEXER mylex Wordlist myfuzzy section group nullgroup');
However, when I use this query:
select docid from filetable where contains(docs,')z3x')>0;
where the term ')z3x' is a keyword that must appear in some of the documents,
the result is 'no rows selected'. Is anything wrong?
Thanks

Hi muralis;
Let me start with your last question first, on partitioning your drive. Today that is generally considered a bad solution. If you are interested in all of the details I can give them to you later. For now, since you admit that you think you are going to come up short on disk space, and I agree, I would strongly suggest you do not compound this with partitioning.
Now on to making OS X smaller, there is an application called Monolingual that will remove other languages from your system this can save about a GB of space. If you have installed GarageBand it has a bunch of sample tracks that burn up bookoo disk space if you aren't going to use that delete them. Also in /library/printers there are folders for a bunch of printers. If there are any printer you will never own, delete the files for them. That can save several 100 MB of space.
Finally, do you really need so much space for Windows? With such a small drive I would consider giving Windows less space, like 10 or 15 GB instead of 15 to 20 GB. To me it sounds like you are being overly generous with Windows.
Allan

Indexing Problem with FILE_DATASTORE and .pdf files

Hello all,
Do any of you have an example showing how to index .pdf files through FILE_DATASTORE? I am able to successfully index text and .doc files but not a .pdf file. Below is the script that I use to index my files:
create index myindex on mytable(docs)
indextype is ctxsys.context
parameters ('datastore COMMON_DIR filter ctxsys.null_filter');
I am using Oracle 8.1.6
Thanks you!!!
-garrett

I don't think that you are able to index anything other than plain ASCII text, because you are not using the INSO filter.
Use preferences like this:
exec ctx_ddl.drop_preference('NO_PATH');
exec ctx_ddl.create_preference('NO_PATH','FILE_DATASTORE');
exec ctx_ddl.drop_preference('MY_LEXER');
exec ctx_ddl.create_preference('MY_LEXER','BASIC_LEXER');
exec ctx_ddl.set_attribute('MY_LEXER','MIXED_CASE', 'NO');
exec ctx_ddl.set_attribute('MY_LEXER','INDEX_THEMES','NO');
exec ctx_ddl.set_attribute('MY_LEXER','INDEX_TEXT', 'YES');
exec ctx_ddl.drop_Preference ('MY_FILTER');
exec ctx_ddl.create_Preference ('MY_FILTER','INSO_FILTER');
exec ctx_ddl.drop_section_group ('MY_SECTION');
exec ctx_ddl.create_section_group ('MY_SECTION','NULL_SECTION_GROUP');
drop index i_filenames;
create index i_filenames on filenames (filename)
indextype is ctxsys.context
parameters ('datastore NO_PATH
section group MY_SECTION
lexer MY_LEXER
filter MY_FILTER
memory 10M');
The important part is the INSO_FILTER preference.
Thomas

THIS SUCKS!!! Re:Multiple paths in FILE_DATASTORE path attribute

After 6 hours of struggle I managed to solve my problem. Everything was due to a stupid mistake (or incomplete documentation of interMedia Text) on Oracle's part:
1. Pasted from Oracle8i interMedia Text Reference - Datastore Objects:
You can specify multiple paths for path, with each path separated by a colon (:). File names are stored in the text column in the text table. If path is not used to specify a path for external files, Oracle requires the path to be included in the file names stored in the text column.
2. Pasted from Oracle intermedia Text Manager online help:
Multiple paths can be specified for
path, with each path separated by a colon (:).
This is wrong, at least on Windows-based operating systems. Fortunately, when everything else failed, I tried a SEMICOLON (;) and it worked. Go figure...
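As a minimal sketch of the workaround reported above, assuming a Windows server: the preference name my_file_ds is hypothetical, and C:\dir1 and C:\dir2 are placeholder directories. Only the separator differs from what the quoted documentation describes.

```sql
begin
  -- 'my_file_ds' is a hypothetical preference name; C:\dir1 and C:\dir2 are placeholders.
  ctx_ddl.create_preference('my_file_ds', 'FILE_DATASTORE');
  -- Semicolon separator, as reported in the post above; presumably needed on Windows
  -- because a colon already follows each drive letter.
  ctx_ddl.set_attribute('my_file_ds', 'PATH', 'C:\dir1;C:\dir2');
end;
/
```

Per the documentation quoted above, the alternative is to leave PATH unset and store the full path of each file in the text column. Whether the colon separator works as documented on non-Windows systems is not verified here.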

Please ask questions (or post statements) about Oracle Text (formerly interMedia text) in the Oracle text forum. Oracle text experts monitor the Oracle Text forum, you will get a quicker, more expert answer there.

Multiple paths in FILE_DATASTORE path attribute

Hi,
I want to index a column that holds file names spread across multiple directories on the hard disk. The documentation says this can be done by separating paths with a ':'. I tried the following but it didn't work (DRG-11513):
BEGIN
ctx_ddl.set_attribute('IE2.IE_LNKS_FILE_DATASTORE', 'PATH', 'C:\dir1:C:\dir2');
END;
Any ideas how to use this feature on Windows?
Any help much appreciated. (CFG: Oracle 8.1.7 on WinSBS)
Andrei.

Please ask questions about Oracle Text (formerly interMedia text) in the Oracle text forum. Oracle text experts monitor the Oracle Text forum, you will get a quicker, more expert answer there.

Does a text index get affected by "line too long" -- File_datastore

I am trying to create text index using following index script:
create index T_SRCH_IND_DF_IDX
on t_search_index(data_filesystem)
indextype is ctxsys.context
parameters ('DATASTORE myDS
lexer lxrprtjoins
filter MY_FILTER
format column fmt
memory 10M');
The index gets created for some files. For other files (xml, pdf, txt) I get:
DRG-11513 unable to open or write to file %s
One thing I noted is that all of these files show
"Line too long" when opened in the "vi" editor.
Does a line being too long matter for text index creation?
Also, I get multiple "DRG-11513 unable to open or write to file %s" entries
in ctx_user_index_errors for the same file that Oracle tries to index. Why
isn't one message enough? Please advise...
Thanks a lot.
Tahir

Garrett, thanks for your response.
I am doing exactly what you advised. I loaded one of the files for which I was getting the error message into
a table as a BFILE and tried to read it using DBMS_LOB.READ. Below is the procedure that I am using to do all this.
create or replace PROCEDURE ReadBLOB IS
src_lob bfile; ---- modify datatype bfile to BLOB if trying to tread blob
buffer RAW(32767);
retval integer;
amt BINARY_INTEGER := 32767;
pos INTEGER := 2147483647;
BEGIN SELECT docs INTO src_lob FROM del_bfile_table WHERE key = '1';
DBMS_LOB.OPEN(src_lob, DBMS_LOB.LOB_READONLY);
Retval := DBMS_LOB.ISOPEN(src_lob);
DBMS_OUTPUT.PUT_LINE('IS OPEN? = '||retval);
LOOP
dbms_lob.read (src_lob, amt, pos, buffer);
DBMS_OUTPUT.put_line('Cut = '||UTL_RAW.cast_to_varchar2(buffer)||' ...');
DBMS_OUTPUT.put_line('Length = '||to_char(DBMS_LOB.GetLength(src_lob)));
pos := 1;
pos := pos + amt;
DBMS_OUTPUT.put_line('pos = '||pos);
END LOOP;
EXCEPTION WHEN NO_DATA_FOUND THEN dbms_output.put_line('End of data');
END;
I get back 1 for retval. The funny thing is that I am not able to print anything
using DBMS_OUTPUT.put_line after the dbms_lob.read line in the code. I can't even print
'Hello World'.
Any idea why?
Thanks a lot.
Tahir Dildar.
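One likely explanation, offered as a sketch rather than a confirmed diagnosis: the procedure starts reading at position 2147483647, which is past the end of the file, so DBMS_LOB.READ raises NO_DATA_FOUND on the first call and control jumps straight to the exception handler. The variant below starts at position 1 and advances by the number of bytes actually read. The procedure name ReadBfile is hypothetical; the table and column names are taken from the post above.

```sql
CREATE OR REPLACE PROCEDURE ReadBfile IS
  src_lob BFILE;
  buffer  RAW(32767);
  amt     BINARY_INTEGER;
  pos     INTEGER := 1;   -- start at the first byte instead of 2147483647
  len     INTEGER;
BEGIN
  SELECT docs INTO src_lob FROM del_bfile_table WHERE key = '1';
  DBMS_LOB.OPEN(src_lob, DBMS_LOB.LOB_READONLY);
  len := DBMS_LOB.GETLENGTH(src_lob);
  WHILE pos <= len LOOP
    amt := 32767;         -- READ overwrites amt with the number of bytes actually read
    DBMS_LOB.READ(src_lob, amt, pos, buffer);
    DBMS_OUTPUT.PUT_LINE('Read ' || amt || ' bytes at position ' || pos);
    pos := pos + amt;     -- advance past the chunk just read
  END LOOP;
  DBMS_LOB.CLOSE(src_lob);
END;
/
```

The sketch prints only byte counts, not the buffer contents, to stay clear of DBMS_OUTPUT line-length limits. In SQL*Plus, SET SERVEROUTPUT ON is needed to see any of the output.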

DroldF.dat not found

Hi,
I try to install the following code
CREATE INDEX IDF_FULL on ECOLORF.INDEXATION(FICHIER)
INDEXTYPE IS ctxsys.context
PARAMETERS
('datastore CTXSYS.FILE_DATASTORE
LANGUAGE COLUMN LANGUE
LEXER mylex
FORMAT COLUMN FORMAT
WORDLIST DEFAULT_WORDLIST
STOPLIST multistop
MEMORY 2M SYNC (EVERY "SYSDATE + 1/24/6")');
and I have the error :
DRG-11446: supplied knowledge base file C:\oracle\product\10.2.0\db_1\ctx\data\frlx\droldF.dat not installed
(file droldF.dat not found)
(Same thing with an Oracle 11g on another server)
When removing 'LEXER mylex' in the create index, it's OK, but I have some difficulties with the french accents.
The only file in the directory xxxxx\db_1\ctx\data\frlx\ is drfr.is
On this forum (http://kb.dbatoolz.com/tp/2796.drg-11446_droldusdat_not_installed.html), it is said that the missing files are on the companion CD in stage/Components/oracle.cartridges.context.knowbase/10.1.0.2.0/1/DataFiles/data.jar, but only in the R1 release (not R2).
But I can't find this companion CD. Where is it on OTN? How can I download old versions?
Or more simply: who has all these files (droldF.dat, droldUS.dat) and can provide them?
Best regards

Your lexer definition presumably includes INDEX_THEMES = YES.
If you change that it will work. Otherwise you need to download the database examples for 11g, which include the knowledge base.
Sometimes can be tricky to find - if you download the full set of files it should include the examples.
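A minimal sketch of the first suggestion, assuming mylex is a BASIC_LEXER preference owned by the current user (the original post does not show how it was created):

```sql
begin
  -- Assumption: 'mylex' already exists and is a BASIC_LEXER preference.
  -- According to the reply above, disabling theme indexing avoids the need
  -- for the knowledge base files such as droldF.dat.
  ctx_ddl.set_attribute('mylex', 'INDEX_THEMES', 'NO');
end;
/
-- Afterwards, re-run the CREATE INDEX statement so that the index
-- is built with the changed lexer preference.
```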

How do I get Oracle Text to index files on a file server?

I am new to Oracle (I'm a MS-SQL DBA looking for a Full-Text Search solution that is better than linking to a MS index server.)
So - Here's the objective:
I have Oracle Server(Express) installed on a Windows server.
I would like for Oracle to build a Full-Text Catalog of the files on a separate file server based on file paths in a table in the database.
(No desire to store terabytes of images and documents inside the database)
I can get Oracle text up and running, using the URL_Datastore:
CREATE TABLE files (id NUMBER PRIMARY KEY, issue_id NUMBER, path VARCHAR(255) UNIQUE, ot_format VARCHAR(6), ot_version VARCHAR(10));
The Compaq server is a remote windows server on my local workgroup, so the fully qualified path is just "compaq" and the URL is valid:
INSERT INTO files VALUES (9,9,'file://Compaq/FTQ/00000003.pdf',NULL,NULL);
INSERT INTO files VALUES (13,13,'file://Compaq/FTQ/01.txt',NULL,NULL);
CREATE INDEX file_index ON files(path) INDEXTYPE IS ctxsys.context
PARAMETERS ('datastore ctxsys.URL_DATASTORE format column ot_format');
but when I enter:
Select * from CTX_User_Index_errors, I see the following errors:
DRG-11609: URL store: unable to open local file specified by file://Compaq/FTQ/00000003.pdf
DRG-11609: URL store: unable to open local file specified by file://Compaq/FTQ/01.txt
Did I miss something?
Do I need to install anything on the file server?
I would like to convince my company that Oracle can be much quicker than Microsoft's Indexing Service because it can avoid joining two large result sets (one result set from Full_text (indexing service) and one for specific data contained in fields in the MS-SQL database.) Full Text Searches commonly take 40 - 60 seconds where there are 1.5 million multi-page PDF files for a particular set that I sample search on. Without this massive join, I believe I can get the search to run in under 10 seconds.

Thank you!
File_Datastore worked fine.
I was staying away from File_Datastore because the information I gathered from googling suggested that file_datastore would only work locally.
Now I just have to get Oracle to pull data out of tables in a MS-SQL database on the local network (don't have a clue yet), and then have it index compiled file paths.
Then MS-SQL can query Oracle with index and full-text criteria and Oracle can send back a result set
It may sound like a bad way of performing Full-Text Queries, but anything will be better than the way things are currently running. We are currently performing Full Text Searches on a table that is rebuilt nightly, so the table containing millions of file paths is not live..
It would be so much better if we just migrated to Oracle, but we currently do not have the resources.

How can i see my formatted document??

Hi,
I have this table
CREATE TABLE ARCHAEOLDB(
ID_DOC NUMBER(5) PRIMARY KEY,
NOME_DOC VARCHAR2(200),
FMT_DOC VARCHAR2(10)
);
This is an example of a formatted document in my table. It is stored in my file system:
INSERT INTO archaeoldb VALUES(1,'RapportiSiciliaMagnaGreciaEtruriaTestimonianzaAraTarquinia.doc','binary');
My datastore is file_datastore, and my filter is inso_filter.
How can i see my formatted documents from php?
I tried to create a table containing the text of every document, using ctx_doc.Filter, in this way:
create table doc_filter (
query_id number constraint doc_filter_key primary key,
document clob,
format_doc VARCHAR2(10) default 'binary'
);
begin
Ctx_Doc.Set_Key_Type ( Ctx_Doc.Type_Primary_Key );
Ctx_Doc.Filter (
index_name => 'archindex' ,
textkey => '2',
restab => 'doc_filter',
query_id => 2,
plaintext => true /* try the effect of FALSE */
);
end;
But when I ran this from PHP:
select format from doc_filter where query_id=2 ;
the result was an unformatted document.
how can i solve my problem??
if there are other solution, it's ok!!
I tried in many ways, but i can't solve this problem!!
thankyou

SORRY!

How can i search and retrieve the first word ...

How can i retrieve the FIRST or the SECOND word in a document stored in a file_datastore ?
Thanks

You noticed the first one yourself: the Found item list seems to randomly jump around the document -- I believe you are correct in your observation that it may be due to the object construction order. That tells me, by the way, something about your lot numbers that you didn't mention: your text is not one continuous long threaded story, but is all or partially in disconnected text frames. The Found list does return everything inside a single story in the correct order, but it goes over each separate story in the order they were constructed.
The only solution is to gather all of your lot numbers, *re-sort* them according to the page number they appear on (and some sort of Worst Case Scenario is when you might have more than one lot number frame per page; in that case you also need to sort by textframe, top to bottom -- yet even worse is if you also may have these textframes side by side!).
Only then you have a reliable counting order.
This isn't too bad. We can just extend the method I offered for sorting top-to-bottom/left-to-right in Re: Working around the frame selection order issue in CS 4 and make it also include page numbers:
function byPageYX(a,b) {
    var
        aP = a.parentTextFrames[0].parentPage.index,
        bP = b.parentTextFrames[0].parentPage.index,
        aY = a.baseline[0],
        bY = b.baseline[0],
        aX = a.horizontalOffset[1],
        bX = b.horizontalOffset[1],
        dP = aP-bP,
        dy = aY-bY,
        dx = aX-bX;
    // compare by page first, then vertical position, then horizontal position
    return dP?dP:(dy?dy:dx);
}
myResults.sort(byPageYX);
Or something like that.
As for actually implementing the above I cannot be of any help with Applescript.
Once we're dealing with sorting I think you're much better off in Javascript anyhow.

How to index a whole directory?

Hi,
what is the best way to automatically index all files of a directory when using 'FILE_DATASTORE' in Oracle
interMedia Text (Oracle Text)? According to the Oracle Text documentation, I have to insert each file name
separately into a table (no wildcard):
begin
ctx_ddl.create_preference('COMMON_DIR', 'FILE_DATASTORE');
ctx_ddl.set_attribute('COMMON_DIR', 'PATH', '/mypath'); -- no wildcard possible here for all files!
end;
create table mytable (id number primary key, docs varchar2(2000));
insert into <mytable> values (..., '<filename_1>');
insert into <mytable> values (..., '<filename_n>');
Is there a method in PL/SQL that reads out the filenames of a directory's files and inserts them into the
table mentioned above? Any other idea to mimic the wildcard (i.e. all files)?
Many thanks for your replies in advance.
Best Regards,
Dan

Download FileZilla(free) and use FTP to delete the folder and
its contents. Its not possible doing this using Contribute, even
with 'Delete' permissions, you can only delete individual files
that you're allowed to edit. I haven't found any admin-settings
that allow you to delete folders with content (alas).

Bad Query Performance in Oracle Text

Hello everyone, I have the following problem:
I have a table, TABLE_A from now on, of roughly 1,000,000 rows, with a CONTEXT index using FILE_DATASTORE, CTXSYS.DEFAULT_STORAGE, CTXSYS.NULL_FILTER and CTXSYS.BASIC_LEXER. I query the index in the following way:
SELECT /*+FIRST_ROWS*/ A.ID, B.ID2, SCORE(1) FROM TABLE_A A, TABLE_B B WHERE A.ID = B.ID AND CONTAINS(A.PATH, '<SOME KW>', 1) > 0 ORDER BY SCORE(1) DESC
where TABLE_B has another 1,000,000 rows.
The problem is that the query response time is much higher after some time of inactivity regarding those tables. How can I avoid this behavior?. The fact is that those inactivity times (not more than 20min) are core to my application, so I always get long long response times for my queries.
Is there any cache or cache time parameter that affects this behavior? I have checked the Oracle Text documentation without finding anything about that...
More data: I am using Oracle 9.2.0.1, but I have tested with the latest patches and the behavior is the same...
Thank you very much in advance.

Pablo,
This appears to be a generic database or OS issue, not a Text specific issue. It really depends on what your application is doing.
If your application is doing some other database activity, such as queries or DML on other non-text tables, chances are the Oracle Text related data blocks are being aged out of the cache. You can either increase the db_cache_size init parameter or try to keep the blocks of the text tables and index tables in the cache using ALTER TABLE commands.
If your app is doing NON-database activity, then chances are your application is taking up much of the machine's physical memory such that OS is swapping ORACLE out of the memory. In which case, you may want to consider to add more memory to the machine or have ORACLE run on a separate machine by itself.
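The ALTER TABLE suggestion could look like the sketch below. The index name table_a_idx is hypothetical, since the original post does not name the index. Oracle Text keeps the tokens of a CONTEXT index in a table named DR$<index name>$I, as with dr$myind$i earlier on this page. The sketch also assumes that a KEEP buffer pool has been sized (db_keep_cache_size); without one the statements are unlikely to help.

```sql
-- Assumption: the CONTEXT index on TABLE_A is called TABLE_A_IDX,
-- so its token table is DR$TABLE_A_IDX$I.
ALTER TABLE dr$table_a_idx$i STORAGE (BUFFER_POOL KEEP);

-- The base tables joined in the query can be assigned to the KEEP pool the same way.
ALTER TABLE table_a STORAGE (BUFFER_POOL KEEP);
ALTER TABLE table_b STORAGE (BUFFER_POOL KEEP);
```

Blocks are still loaded on first access after a restart, so the first query may remain slow. The intent is only to make it less likely that these blocks are aged out during the idle periods described in the question.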
