Using Oracle Text with MS WORD

Hi,
We have just installed Text and we want to use it for indexing hundreds of MS WORD documents that are in the same directory. But I could not find a document / example about indexing/filtering Word Documents. I will be grateful iy you can help me for finding these..
Thanks..

You could specify INSO_FILTER for FILTER preference
in command for creating index
(see
http://otn.oracle.com/products/text/x/samples/indexing/filters/inso_filter/inso_filter_idx.sql) or
use USER filter.
For example see also
http://otn.oracle.com/products/text/x/samples/indexing/filters/INSO_Filter/index.html
(for loading data you could use any other tools than SQL*Loader)
Regards, Victor.

Similar Messages

  • Using Oracle Text with CLOB field containing multiple languages

    I'm using Oracle 10g (NLS_CHARACTERSET is set to. AL32UTF8) and have a table with a CLOB field which is storing text written in either English and/or Simplified Chinese.
    The following index has been created on this field:
    CREATE INDEX text_index
    ON text_table(text_field)
    INDEXTYPE IS CTXSYS.CONTEXT
    PARAMETERS('FILTER CTXSYS.INSO_FILTER');
    I'm having issues in returning text which matches the Chinese text using the CONTAINS operator. For some reason the following query is returning rows which do not contain any Chinese text:
    SELECT *
    FROM text_table
    WHERE contains(text_field,'炫%') > 1;
    A newsgroup user advised me to produce an explain plan using ctx_query.explain.
    I created 2 explain plans, one which was searching the index for 'A%' and the other searching for the Simplified Chinese character '炫%'. The results for the first test were as expected whereby the values contained within the OBJECT_NAME field all began with the letter 'A'.
    The second test however produced somewhat unexpected results. The OBJECT_NAME field this time contained various words, both English and Simplified Chinese. I could be wrong but it appeared to store every individual word in the CLOB field. Both tests produced different EQUIVALENCE rows, the first test was:
    OPTIONS = Null
    OBJECT_NAME = A%
    Whereas the second test produced:
    OPTIONS = (?)
    OBJECT_NAME = %
    Am I right in thinking the Simplified Chinese character is for some reason being converted to a '?' character?
    Any help on this will be much appreciated.

    As you're not specifying a lexer to use, it will use the BASIC_LEXER, designed for space-separated European-type languages. This won't work effectively with Chinese.
    If you know which documents are Chinese and which are English, you can write this into a LANGUAGE column and use the MULTI_LEXER - this will allow you to specify BASIC_LEXER for the English texts, and CHINESE_LEXER or CHINESE_VGRAM_LEXER for the Chinese texts.
    If you don't know the language, you must use either WORLD_LEXER (10g) or AUTO_LEXER (11g). These lexers will automatically determine the language of the documents and index them appropriately. In general. MULTI_LEXER will be faster and more accurate than either of the automatic alternatives.
    When querying for Chinese characters you need to be very careful with your NLS_LANG settings. You need to make sure that the character set defined in NLS_LANG is the same as the character set from which you've pasted (or typed) the chinese characters.
    The "?" in output usually just means "I don't know how to translate this character into your output character set". Sometimes it may appear as a reversed question mark.

  • Using Oracle Text with Apex

    Can someone point me to some resources on how to integrate Oracle Text and APEX to do searches, highlight results, etc (all the features of Oracle Text)?
    The data to be indexed is in files on the filesystem, so I would like to keep it that way and use the FILE_DATASTORE option for Text.
    Thanks for any pointers.
    Update: Yes, I did see http://www.oracle.com/technology/products/database/application_express/pdf/apex_text_application_v1.6.pdf
    but the search results there just returns the URL/file containing the "hit". It doesn't show the actual text fragment that caused the match, doesn't highlight it, etc. I am looking for a real Google-like search. Hm, having said that, I might as well use Google Desktop! Nah, where's the fun in that?

    This is a very simple application for my own use. It started life in 8i when there were fewer Text options.
    As such, it uses the query string as entered. This returns all of the matches:
    select msgid, msgdate, Box, fromaddr, subject
      from eudora.inbox
    where contains(body, :P703_MailSearch) > 0
    order by msgdate descI display the selected result like this:
    select subject,
      Replace(eudora.mmarkup(:P704_MSGID, :P702_SEARCH), Chr(13), '<BR>') Body
      from eudora.inbox
    where msgid = :P704_MSGIDIn a newer application, I experimented with the CTXCAT grammer.
    That query looks like this:
    select m.ID, m.pdpno, m.shortdesc
      from pdp_mast m
    where contains(m.dphistory, '<query><textquery lang="ENGLISH" grammar="CTXCAT">
                                             ' || :P1_Text || '
                                         </textquery>
                                      <score datatype="INTEGER"/>
                                  </query>') > 0     
        or contains(m.shortdesc, '<query><textquery lang="ENGLISH" grammar="CTXCAT">
                                             ' || :P1_Text || '
                                         </textquery>
                                      <score datatype="INTEGER"/>
                                  </query>') > 0As always, once you figure out the syntax, its easy to make it work in Apex.
    Text indexes are very fast. On my old 600MHz PC, searches in 250MB of text take less than a second.

  • Using Oracle Text for searching with UCM 10g

    I am using Oracle text with UCM 10gR3 and Site Studio 10gR4 and I am trying to sort the search results by relevancy and to also include a snippet of the retrieved document. I have the fields that the SS_GET_SEARCH_RESULTS service returns but the relevancy score is always equals 5 and the snippet contains characters such as &lt; idcnull, /p, etc., which you can see are XML/HTML/UCM tags but which result sin even more strangeness in the snippet if I try to remove them programmatically.
    I have read the Oracle Text documentation and there appear to be ways you can configure Oracle Text but I am not clear at all on what I can do from UCM. It looks like the configuration is either done in database tables or in the query itself, neither of which are readily configurable to me.
    Is anyone experienced in this or know of any documentation this might help?
    Bill

    Hi
    If I remember correctly then this issue was seen with an older version of OTS component and Core Update patch / bundle . Upgrade the UCM instance with the latest CS10gr35 update bundle patchset 6907073 and also upgrade OTS component from the same patchset .
    Let me know how it goes after this .
    Thanks
    Srinath

  • Document management system using oracle text

    i plan to create document management system using oracle text with following features
    1) document comparision
    2) document search
    and more...
    can oracle text be used to display documents of various formats by converting them to HTML. and can search keywords be highlighted in the document.
    please help!

    Have you ever considered doing this in Oracle Application Express (free on top of the Oracle database)? How about something like:
    http://download-west.oracle.com/docs/cd/B31036_01/doc/appdev.22/b28839/up_dn_files.htm
    Index the files using the CONTEXT index, and perhaps the docs' meta with it using the Oracle Text MULTI_COLUMN_DATASTORE, and then when you write your query for a report on the documents include a search string.
    I've created a number of APEX-based document management systems and it is quite easy once you get the hang of using this environment. I suggest looking at some of the tutorials/how-to documents and you'll be on your way quickly.
    Start with the upload application. Once you can get your documents in, create a report that shows everything except the document. Verify all of this works correctly.
    Add some "items" to the page for the report, and include them as bind variables in the where clause.
    After that, add your Oracle Text index to the database, and toss in a "text-field" item to the APEX page. Modify your report query, adding the CONTAINS clause, and use the newly created item as a bind variable. There's your keyword search.
    Linking to Oracle Apps is done through API's and may be over database links.
    Hope it helps. Though not a step-by-step how to document, this should point you in the right direction. Get familiar with APEX as that covers most of what you described.
    -Ron

  • Oracle Text with Oracle TimesTen

    Hi!
    I'm trying to use Oracle Text with Oracle TimesTen In-Memory. In this customer, we are using Oracle Text to index the names of the company clients. There are about 13 million names to index. We're trying to speed up even more the search using Oracle TimesTen.
    Does anybody as any experience using simultanely this two technologies?
    Thanks in advance
    Tiago Soares

    TimesTen doesn't support the CONTEXT indextype or CONTAINS clause (or other domain indexes/operators), so you can't create Oracle Text indexes in it.

  • Using Oracle Text to search through WORD, EXCEL and PDF documents

    Hello again,
    What I would like to know is if I have a WORD or PDF document stored in a table. Is it possible to use Oracle Text to search through the actual WORD or PDF document?
    Thanks
    Doug

    Yes you can do context sensitive searches on both PDF and Word docs. With the PDF you need to make sure they are text and not images. Some scanners will create PDFs that are nothing more than images of document.
    Below is code sample that I made some time back to demonstrate the searching capabilities of Oracle Text. Note that the example makes use of the inso_filter that is no longer shipped with Oracle begging with Patch set 10.1.0.4. See metalink note 298017.1 for the changes. See the following link for more information on developing with Oracle Text.
    http://download-west.oracle.com/docs/cd/B14117_01/text.101/b10729/toc.htm
    begin example.
    -- The following needs to be executed
    -- as sys.
    DROP DIRECTORY docs_dir;
    CREATE OR REPLACE DIRECTORY docs_dir
    AS 'C:\sql\oracle_text\documents';
    GRANT READ ON DIRECTORY docs_dir TO text;
    -- End sys ran SQL
    DROP TABLE db_docs CASCADE CONSTRAINTS PURGE;
    CREATE TABLE db_docs (
    id NUMBER,
    format VARCHAR2(10),
    location VARCHAR2(50),
    document BLOB,
    CONSTRAINT i_db_docs_p PRIMARY KEY(id)
    -- Several notes need to be made about this anonymous block.
    -- First the 'DOCS_DIR' parameter is a directory object name.
    -- This directory object name must be in upper case.
    DECLARE
    f_lob BFILE;
    b_lob BLOB;
    document_name VARCHAR2(50);
    BEGIN
    document_name := 'externaltables.doc';
    INSERT INTO db_docs
    VALUES (1, 'binary', 'C:\sql\oracle_text\documents\externaltables.doc', empty_blob())
    RETURN document INTO b_lob;
    f_lob := BFILENAME('DOCS_DIR', document_name);
    DBMS_LOB.FILEOPEN(f_lob, DBMS_LOB.FILE_READONLY);
    DBMS_LOB.LOADFROMFILE(b_lob, f_lob, DBMS_LOB.GETLENGTH(f_lob));
    DBMS_LOB.FILECLOSE(f_lob);
    COMMIT;
    END;
    -- build the index
    -- Note that this index differs than the file system stored file
    -- in that paramter datastore is ctxsys.defautl_datastore and not
    -- ctxsys.file_datastore. FILE_DATASTORE is for documents that
    -- exist on the file system. DEFAULT_DATASTORE is for documents
    -- that are stored in the column.
    create index db_docs_ctx on db_docs(document)
    indextype is ctxsys.context
    parameters (
    'datastore ctxsys.default_datastore
    filter ctxsys.inso_filter
    format column format');
    --search for something that is known to not be in the document.
    SELECT SCORE(1), id, location
    FROM db_docs
    WHERE CONTAINS(document, 'Jenkinson', 1) > 0;
    --search for something that is known to be in the document.  
    SELECT SCORE(1), id, location
    FROM db_docs
    WHERE CONTAINS(document, 'Albright', 1) > 0;

  • Problem with blob column index created using Oracle Text.

    Hi,
    I'm running Oracle Database 10g 10.2.0.1.0 standard edition one, on windows server 2003 R2 x64.
    I have a table with a blob column which contains pdf document.
    Then, I create an index using the following script so that I can do fulltext search using Oracle Text.
    CREATE INDEX DMCS.T_DMCS_FILE_DF_FILE_IDX ON DMCS.T_DMCS_FILE
    (DF_FILE)
    INDEXTYPE IS CTXSYS.CONTEXT
    PARAMETERS('DATASTORE CTXSYS.DEFAULT_DATASTORE');
    However, the index is not searchable and I check the following tables created by database for my index and found them to be empty as well !!
    DR$T_DMCS_FILE_DF_FILE_IDX$I
    DR$T_DMCS_FILE_DF_FILE_IDX$K
    DR$T_DMCS_FILE_DF_FILE_IDX$N
    DR$T_DMCS_FILE_DF_FILE_IDX$R
    I wonder what's wrong with it.
    My user has been granted the ctx_app role and I have other tables that store plain text which I use Oracle Text are fine. I even output the blob column and save as pdf file and they are fine.
    However the database seems like not indexing my blob column although the index can be created without error.
    Please advise.
    Really appreciate anyone who can help.
    Thank you.

    The situation is I have already loaded a few pdf document into the table's blob column.
    After I create the Oracle text index on this blob column, I find the system generated index tables listed in my earlier posting are empty, except for the 4th table.
    Normally we'll see words inside the table where those are the words indexed by oracle text on my document.
    As a result, no matter how i search for the index using select statement with contains operator, it will not give me any result.
    I feel weird why the blob is not indexed. The content of the blob are actually valid because I tested this by export the content back to pdf and I can still view and search within the pdf.
    Regards,
    Jap.

  • Parsing the word file using oracle text having tables within it............

    Hi,
    I was going through this document.Actually I am going to implement something like full text search functionality in our system.
    We get the info as .doc file.
    Earlier what we used to do is, we used to parse the file and store it into the database and then searched using PL/SQL.
    But what I understand from this article that this can be done using oracle text also.
    One concern is that whether the oracle text is able to parse the .doc file having tables embedded within it.
    Please let me know about this.(Whether oracle text will be able to parse the files having tables embedded within it).
    I am attaching an example file for this.
    Please let me know about this as early as possible.

    Yes Oracle Text have this capability. Use AUTO_FILTER or USER_FILTER to create index

  • NEAR operator alternative when not using. oracle Text ?

    hi,
    I'm working on a project where i would need a Oracle Text 'NEAR like' operator ...
    here is my scenario ...
    in db we have Customers ... and every customer has some criterias like different search words( names, towns,cars,etc...) so for every customer i can create an SQL query out of criterias . ....
    now .... we can have a criteria like. ...... WHERE fulltext like 'john%'. or even distance search line NEAR inside CONTAINS. ... but then the Oracle text index is needed .....
    the only tAble on which Text index is created is our storage table that holds more then 4mil records and growing...
    my question is ... is there any way to have a query that would do the same thing as NEAR but without Text index ?
    here is how I start ....
    I get full newspaper article text from our OCR library ......
    then i need to check customer's criterias against this text to see which article is for which customer and then bind the article to the customer
    I could do it without Oracle using RegEx , but criterias can get really complicated ... like customer wants only specific MEDIA, or specific category , type , only articles that are from medias that are from specific country etc ... and many more different criterias ... and all this can be wrapped inside brackets with ANDs, ORs, NOT. ....
    So the only way to do it is to put it in Oracle and execute the correct query and let Oracle decide if the result is true or false .... but due to NEAR operator I need Oracle text ...
    So if I decide to first insert article into our storage table which has Oracle text index to be able to do the correct search .... how fast will this be ????
    will the the search become slower when there are 6mil records ? I know I can use FILTER BY to help Text index to do a better and quicker seach ... and how to optimize index ....but still
    I'm always asking my self..... why insert the article in a table where there are already 6mil articles and execute query when I only need to check data on one single article and. i already know this article ...
    I see two solutions :
    - if there is alternative for NEAR without using Oracle text index then i would insert data into temporary table and execute query on this table..... table would always contain only this one article. maybe one option would be to have one 'temp' table with Oracle text index in which i insert this one article and with help of Oracle text based on this one article do the search , and then maybe on a daily basis clear index ..... or when the article is removed from the table ... but this would mean having two Orcle text indexes, cause we already have Oracle text index on our storage table anyway....
    - another is to use Oracle text index and insert it into our storage table and hope for the best quick results ....
    Maybe I'm exaggerating and query like WHERE id=1234 and CONTAINS(...). will execute faster then I think
    If anyone would have any other suggestion I will be happy to try it ..
    thanks,
    Kris

    Hi,
    this is to my knowledge not possible. It is hard for Oracle to do, think about a table with many rows, every row with that column must be checked. So I think only a single varchar2 is possible. Maybe for you will a function work. It is possible to give a function as second parameter.
    function return_signup
    return varchar2
    is
      l_signup_name signup.signup_name%type;
    begin
      select signup_name
      into l_signup_name
      from signup
      where signup_id = 1
      and rownum = 1
      return l_signup_name;
    exception
      when no_data_found
      then
        l_signup_name := 'abracadabra'; -- hope does not exist
        return l_signup_name;
    end;Now you can use above function in the contains.
    select * from user_history_view users --, signup new_user
    --where new_user.signup_id = 1
    where contains(users.user_name, return_signup)>0;I didn't test the code! Maybe you have to adjust the function for your needs. But it is a idea how this can be done.
    Otherwise you must make the check by normaly check the columns by simple using a join:
    select * from user_history_view users, signup new_user
    where new_user.signup_id = 1
    and users.user_name = new_user.signup_name;Herald ten Dam
    htendam.wordpress.com

  • Change tracking using Oracle text?

    I'm working on a database that does a lot of text handling. My customer is currently using MS Word and they use the track changes feature extensively and would like to keep that function. Has anyone ever used Oracle text to show changes from one paragraph/sentence to another. Can this be done with Oracle Text?
    Any help is appreciated.
    Tks, Rob

    No. Oracle Text does not provide this type of capability.
    - Roger Ford

  • Using oracle text

    I have some problem when trying a query text application using Oracle Text, as fallow.
    My database is 8.1.7, I have user 'DEMO' having DBA privilege and granted roles: RESOURCE, CONNECT, CTXAPP already.
    I connect with DEMO and create a table 'QUICK' as instructed:
    create table quick
    quick_id number
    constraint quick_pk primary key,
    text varchar(80)
    Now, I create the index:
    create index quick_text on quick ( text )
    indextype is ctxsys.context;
    But I receive the following errors and messages:
    ERROR at line 1:
    ORA-29855: error occurred in the execution of ODCIINDEXCREATE routine
    ORA-20000: interMedia Text error:
    DRG-50100: CORE LSF error: 4294967280
    ORA-06512: at "CTXSYS.DRUE", line 126
    ORA-06512: at "CTXSYS.TEXTINDEXMETHODS", line 78
    ORA-06512: at line 1
    Please tell me why and what I have to do now.
    Thanks.
    VuToi

    Looks like a bad install to me. If this is purely a test database,
    I'd suggest reinstalling the whole thing from scratch. If it's not,
    then you need to contact Oracle Support to get this resolved.

  • Using oracle text on a non-materialized view

    I'm having trouble tracking down an error when using oracle text on a non-materialized view (indexes are on the referenced columns). My database has a users table and a user history table which saves the old values when a user profile changes. My view performs a "union all" so I can select from both at once.
    I would like to use oracle text to perform a "contains" on the view whenever someone signs up to see if any current users or historical entries contain the desired username.
    The following works fine:
    contains(user_history_view, 'bill')but when I reference anything in the contains clause, i get a "column is not indexed" error:
    contains(user_history_view, signup.user_name) --signup.username is 'bill'Here is a stripped-down demonstration (I am using version 10.2.0.4.0)
    create table signup (
      signup_id   number(19,0) not null,
      signup_name varchar2(255),
      primary key (signup_id)
    create table users (
      user_id   number(19,0) not null,
      user_name varchar2(255),
      primary key (user_id)
    create table user_history (
      history_id number(19,0) not null,
      user_id    number(19,0) not null,
      user_name  varchar2(255),
      primary key (history_id),
      foreign key (user_id) references users on delete set null
    create index user_name_index on users(user_name)
    indextype is ctxsys.context parameters ('sync (on commit)');
    create index user_hist_name_index on user_history(user_name)
    indextype is ctxsys.context parameters ('sync (on commit)');
    create index signup_name_index on signup(signup_name)
    indextype is ctxsys.context parameters ('sync (on commit)');
    create or replace force view user_history_view
    (user_id, user_name, flag_history) as
    select user_id, user_name, 'N' from users
    union all
    select user_id, user_name, 'Y' from user_history;
    --user bill changed his name to bob, and there is a pending signup for another bill
    insert into users(user_id, user_name) values (1, 'bob');
    insert into user_history(history_id, user_id, user_name) values (1, 1, 'bill');
    insert into signup(signup_id, signup_name) values(1, 'bill');
    commit;
    --works
    select * from user_history_view users, signup new_user
    where new_user.signup_id = 1
    and contains(users.user_name, 'bill')>0;
    --fails
    select * from user_history_view users, signup new_user
    where new_user.signup_id = 1
    and contains(users.user_name, new_user.signup_name)>0;I could move everything into a materialized view, but querying against real-time data like this would be ideal. Any help would be greatly appreciated.

    Hi,
    this is to my knowledge not possible. It is hard for Oracle to do, think about a table with many rows, every row with that column must be checked. So I think only a single varchar2 is possible. Maybe for you will a function work. It is possible to give a function as second parameter.
    function return_signup
    return varchar2
    is
      l_signup_name signup.signup_name%type;
    begin
      select signup_name
      into l_signup_name
      from signup
      where signup_id = 1
      and rownum = 1
      return l_signup_name;
    exception
      when no_data_found
      then
        l_signup_name := 'abracadabra'; -- hope does not exist
        return l_signup_name;
    end;Now you can use above function in the contains.
    select * from user_history_view users --, signup new_user
    --where new_user.signup_id = 1
    where contains(users.user_name, return_signup)>0;I didn't test the code! Maybe you have to adjust the function for your needs. But it is a idea how this can be done.
    Otherwise you must make the check by normaly check the columns by simple using a join:
    select * from user_history_view users, signup new_user
    where new_user.signup_id = 1
    and users.user_name = new_user.signup_name;Herald ten Dam
    htendam.wordpress.com

  • Using Oracle Text to Data Mine

    Can someone provide me with an idea of how to Data Mine with just using Oracle Text and not the data mining option. I need to search a column of customer complaints and then put it in a category based on that. It would be best if the categories were auto generated. It has to be done in PL/SQL.
    Thanks,

    You cannot have the categories created automatically without data mining. However, if you are willing to create the categories and queries that determine them, then you can do it with just Oracle Text. I posted an example on the 2nd page of the following thread:
    Re: New to Oracle Text search

  • Oracle Text with Numbers

    Hello,
    I need to search in a number column for particular "subnumbers". For
    example I have a column with 3453454 in it an I like to search e.g for the
    number "53" in it. I know I could use
    select * from table where number_column like '%53%'
    but since the table is rather big I'd like to use Oracle Text for it to avoid a full table scan and query like
    select * from table where contains(number_column, '53') > 0
    but above query would return NULL after converting the number column
    to a varchar2 column! Only full numbers are indexed and therefore only
    a search on the full number 3453454 would yield a result. What are my
    options to make above query with "contains" clause work?
    Thanks in advance

    You can configure Text to do substring searches.
    Do this:
    ctx_ddl.create_preference( 'SUBSTR_SUPPORT_PREF', 'basic_wordlist' );
    ctx_ddl.set_attribute( 'SUBSTR_SUPPORT_PREF', 'SUBSTRING_INDEX', 'YES' );
    Then you can do something like:
    where contains(col,'%53%',1) > 0
    Tom Best

Maybe you are looking for

  • How to know if a file browser item is not null using javascript fucntion

    Hi <br><br> I tried to use javascript for validation. I have a file browser item named P55_FILE_NAME and I would like to know if this item is null or not before submit. <br> I wrote this function :<br><br> function validate_import(f_n){<br>      if (

  • CSS Layout Issue

    Using CSS I am attempting my first tableless page layout and I am bumping up against a problem I cannot figure out. I am posting this hoping that kind soul out there will take a drive by and tell me what I'm doing wrong. Testing with Firefox and IE6&

  • How can I obtain a reference to a global variable?

    I wish to change the contents of the columns in a global multicolumn listbox. As far as I can see, the only way to access the columns in a multicolumn listbox is to create a reference to it and access the ItemNames property. But how can I create a re

  • M2 815

    Hi Expert, I have a problem with my new company. I install in SAP new company for China with CNY as Currency. When i create a order (VA01), after save, the system send me a particular error message "M2 815". This is the message: Update was terminated

  • Error in Oracle 10 Procedure - converted from SQL Server

    I have converted the folowing script from SQL Server proc to Oracle Proc; it compiles with a warning "30 Hint: Comparision with NULL in 'sp_MyEmployee'" When this proc is executed; it generates an error "ORA-00900: invalid SQL statement" Please help!