Justification for Using Oracle Text

Hello,
Can someone give me good cause (justification) for utilizing Oracle Text over other tools out there that are not tied directly to Oracle?
Apparently it is possible to identify metadata within text and do keyfield and keyword searches this way with other tools, but I question the accuracy, speed, or value in terms of data relationships with this approach. I feel the relationships belong in the database along with the indexes but can't convince anyone of this.
Has anyone experience working with Oracle Text where relationships help to drive the search and can give me good cause to this approach?
thanks

Hi,
Justification depends on your use. For starters:
1) It is included in both standard and enterprise editions of the db at no added charge
2) Uses SQL to query and maintain
3) Includes a number of built-ins for maintenance and optimization
4) It has 4 different index types for various uses
5) It can index any data type
6) UltraSearch is included in both standard and enterprise editions of the db at no additional charge (this is a crawler built on Oracle Text).
As for the integration - it is optimized for Oracle. If you were to build a standalone indexing solution you would probably design it a bit different, but Oracle Text takes into account the optimizer and database structure.
It has other features (same as some of the other tools) like a knowledge base, classification, clustering, theme extraction, language-specific features, ability to index documents in and out of the database, stopwords, stemming, wildcard, progressive relaxation, and the list goes on.
I guess my question would be, what is the reason for NOT using it? That might give me a better line on the reasoning so that I can respond with something a bit more specific.
Thanks,
Ron

Similar Messages

  • Justification for using Oracle Text vs Other Search Tools

    Hello,
    Can someone give me good cause (justification) for utilizing Oracle Text over other tools out there that are not tied directly to Oracle?
    Apparently it is possible to identify metadata within text and do keyfield and keyword searches this way with other tools, but I question the accuracy, speed, or value in terms of data relationships with this approach. I feel the relationships belong in the database along with the indexes but can't convince anyone of this.
    Has anyone experience working with Oracle Text where relationships help to drive the search and can give me good cause to this approach?
    thanks

    Tools like Autonomy have fantastic searching functions, far in excess of what we get with Text. What they aren't very good at is the database side of things. So if your prime concern is to be able to search free text documents on your file system using a baroque range of filters then Text is not the right choice. But if you want to join the outcomes of free text searches with structured data in database tables or if the text you want to search is stored in database tables then using Oracle's built-in functionality is the better approach.
    From your post I'm not sure which scenario fits your situation more closely.
    Cheers, APC

  • Using Oracle Text for searching with UCM 10g

    I am using Oracle text with UCM 10gR3 and Site Studio 10gR4 and I am trying to sort the search results by relevancy and to also include a snippet of the retrieved document. I have the fields that the SS_GET_SEARCH_RESULTS service returns but the relevancy score is always equals 5 and the snippet contains characters such as < idcnull, /p, etc., which you can see are XML/HTML/UCM tags but which result sin even more strangeness in the snippet if I try to remove them programmatically.
    I have read the Oracle Text documentation and there appear to be ways you can configure Oracle Text but I am not clear at all on what I can do from UCM. It looks like the configuration is either done in database tables or in the query itself, neither of which are readily configurable to me.
    Is anyone experienced in this or know of any documentation this might help?
    Bill

    Hi
    If I remember correctly then this issue was seen with an older version of OTS component and Core Update patch / bundle . Upgrade the UCM instance with the latest CS10gr35 update bundle patchset 6907073 and also upgrade OTS component from the same patchset .
    Let me know how it goes after this .
    Thanks
    Srinath

  • Using Oracle Text in Oracle XML DB .

    Hi all ,
    The idea is simple ,i need to index all stored files in Oracle XML DB and the index should stay in Oracle DB . Using some 3 party index software is also possible but you need to write a mapping to move the index file in Oracle DB .
    So i thought of using Oracle Text but i am not sure about how to retrieve such a document from Oracle XML DB , let me say over ftp or http ? . And if these documents are password protected -> how can Oracle Text allow this ?

    [11gR2 XMLDB Developers Guide -- Full-Text Search over XML Data|http://download.oracle.com/docs/cd/E11882_01/appdev.112/e10492/xdb09sea.htm#i1006756] would be the first place to start.
    For document display, there a bunch of potential solutions, you can look at [XML DB Repository|http://download.oracle.com/docs/cd/E11882_01/appdev.112/e10492/xdb03usg.htm#insertedID18], or the Text Application Developers Guide [Presenting Documents in Oracle Text|http://download.oracle.com/docs/cd/B28359_01/text.111/b28303/view.htm#i1006687] .
    Password protected documents can't be indexed using the auto_filter.

  • Oracle Text: How to add/get stopwords list when using Oracle Text world lexer?

    I have a use case that we currently use Oracle Text World Lexer to index and search multilingual documents. As we know that World Lexer does the language auto detection. I would like to know the following questions:
    Is there anyway I can get the current document's language that Oracle Text detected?
    Is there anyway to get the language's stopwords list?
    Any thoughts and points will be appreciated.
    - Charles

    1. If you're using 12c, you can use ctx_doc.policy_languages. (https://docs.oracle.com/database/121/CCREF/cdocpkg011.htm#CCREF24102)
    2. If you want multiple stoplists based on each document's language, you have to use the multi-lexer. For world_lexer, there is one stoplist; since the stoplists are somewhat dynamic (you can add but not remove them), the most accurate way to fetch the list is using ctx_report.describe_index or ctx_report.create_index_script and parse the report.

  • Oracle iRecruitment: Keyword Search within Resumes using Oracle Text

    Dear All,
    As per my understanding (and Note: 247064.1) simple Keyword searches can be performed in iRecruitment if oracle Text is installed. However searching for Keywords within resumes is not possible using Oracle Text and is possible ONLY if Resume Parsing is enabled via a third party (non-oracle) service provider.
    Can you please let me know if my understanding is correct and if not provide further inputs on this.
    Thanks,
    Subrat

    Got this confirmation from Oracle via SR:
    Resume searching is independent of resume parsing and not required to search resumes.
    Oracle Text is the text engine that allows you to search documents using content-based queries. Oracle Text allows you to upload documents, search documents, parse resumes, etc.
    Hence to conclude - Installation of Oracle Text will allow Keyword Searches on resumes.
    Thanks,
    Subrat

  • Using Oracle Text to search through WORD, EXCEL and PDF documents

    Hello again,
    What I would like to know is if I have a WORD or PDF document stored in a table. Is it possible to use Oracle Text to search through the actual WORD or PDF document?
    Thanks
    Doug

    Yes you can do context sensitive searches on both PDF and Word docs. With the PDF you need to make sure they are text and not images. Some scanners will create PDFs that are nothing more than images of document.
    Below is code sample that I made some time back to demonstrate the searching capabilities of Oracle Text. Note that the example makes use of the inso_filter that is no longer shipped with Oracle begging with Patch set 10.1.0.4. See metalink note 298017.1 for the changes. See the following link for more information on developing with Oracle Text.
    http://download-west.oracle.com/docs/cd/B14117_01/text.101/b10729/toc.htm
    begin example.
    -- The following needs to be executed
    -- as sys.
    DROP DIRECTORY docs_dir;
    CREATE OR REPLACE DIRECTORY docs_dir
    AS 'C:\sql\oracle_text\documents';
    GRANT READ ON DIRECTORY docs_dir TO text;
    -- End sys ran SQL
    DROP TABLE db_docs CASCADE CONSTRAINTS PURGE;
    CREATE TABLE db_docs (
    id NUMBER,
    format VARCHAR2(10),
    location VARCHAR2(50),
    document BLOB,
    CONSTRAINT i_db_docs_p PRIMARY KEY(id)
    -- Several notes need to be made about this anonymous block.
    -- First the 'DOCS_DIR' parameter is a directory object name.
    -- This directory object name must be in upper case.
    DECLARE
    f_lob BFILE;
    b_lob BLOB;
    document_name VARCHAR2(50);
    BEGIN
    document_name := 'externaltables.doc';
    INSERT INTO db_docs
    VALUES (1, 'binary', 'C:\sql\oracle_text\documents\externaltables.doc', empty_blob())
    RETURN document INTO b_lob;
    f_lob := BFILENAME('DOCS_DIR', document_name);
    DBMS_LOB.FILEOPEN(f_lob, DBMS_LOB.FILE_READONLY);
    DBMS_LOB.LOADFROMFILE(b_lob, f_lob, DBMS_LOB.GETLENGTH(f_lob));
    DBMS_LOB.FILECLOSE(f_lob);
    COMMIT;
    END;
    -- build the index
    -- Note that this index differs than the file system stored file
    -- in that paramter datastore is ctxsys.defautl_datastore and not
    -- ctxsys.file_datastore. FILE_DATASTORE is for documents that
    -- exist on the file system. DEFAULT_DATASTORE is for documents
    -- that are stored in the column.
    create index db_docs_ctx on db_docs(document)
    indextype is ctxsys.context
    parameters (
    'datastore ctxsys.default_datastore
    filter ctxsys.inso_filter
    format column format');
    --search for something that is known to not be in the document.
    SELECT SCORE(1), id, location
    FROM db_docs
    WHERE CONTAINS(document, 'Jenkinson', 1) > 0;
    --search for something that is known to be in the document.  
    SELECT SCORE(1), id, location
    FROM db_docs
    WHERE CONTAINS(document, 'Albright', 1) > 0;

  • Using oracle text on a non-materialized view

    I'm having trouble tracking down an error when using oracle text on a non-materialized view (indexes are on the referenced columns). My database has a users table and a user history table which saves the old values when a user profile changes. My view performs a "union all" so I can select from both at once.
    I would like to use oracle text to perform a "contains" on the view whenever someone signs up to see if any current users or historical entries contain the desired username.
    The following works fine:
    contains(user_history_view, 'bill')but when I reference anything in the contains clause, i get a "column is not indexed" error:
    contains(user_history_view, signup.user_name) --signup.username is 'bill'Here is a stripped-down demonstration (I am using version 10.2.0.4.0)
    create table signup (
      signup_id   number(19,0) not null,
      signup_name varchar2(255),
      primary key (signup_id)
    create table users (
      user_id   number(19,0) not null,
      user_name varchar2(255),
      primary key (user_id)
    create table user_history (
      history_id number(19,0) not null,
      user_id    number(19,0) not null,
      user_name  varchar2(255),
      primary key (history_id),
      foreign key (user_id) references users on delete set null
    create index user_name_index on users(user_name)
    indextype is ctxsys.context parameters ('sync (on commit)');
    create index user_hist_name_index on user_history(user_name)
    indextype is ctxsys.context parameters ('sync (on commit)');
    create index signup_name_index on signup(signup_name)
    indextype is ctxsys.context parameters ('sync (on commit)');
    create or replace force view user_history_view
    (user_id, user_name, flag_history) as
    select user_id, user_name, 'N' from users
    union all
    select user_id, user_name, 'Y' from user_history;
    --user bill changed his name to bob, and there is a pending signup for another bill
    insert into users(user_id, user_name) values (1, 'bob');
    insert into user_history(history_id, user_id, user_name) values (1, 1, 'bill');
    insert into signup(signup_id, signup_name) values(1, 'bill');
    commit;
    --works
    select * from user_history_view users, signup new_user
    where new_user.signup_id = 1
    and contains(users.user_name, 'bill')>0;
    --fails
    select * from user_history_view users, signup new_user
    where new_user.signup_id = 1
    and contains(users.user_name, new_user.signup_name)>0;I could move everything into a materialized view, but querying against real-time data like this would be ideal. Any help would be greatly appreciated.

    Hi,
    this is to my knowledge not possible. It is hard for Oracle to do, think about a table with many rows, every row with that column must be checked. So I think only a single varchar2 is possible. Maybe for you will a function work. It is possible to give a function as second parameter.
    function return_signup
    return varchar2
    is
      l_signup_name signup.signup_name%type;
    begin
      select signup_name
      into l_signup_name
      from signup
      where signup_id = 1
      and rownum = 1
      return l_signup_name;
    exception
      when no_data_found
      then
        l_signup_name := 'abracadabra'; -- hope does not exist
        return l_signup_name;
    end;Now you can use above function in the contains.
    select * from user_history_view users --, signup new_user
    --where new_user.signup_id = 1
    where contains(users.user_name, return_signup)>0;I didn't test the code! Maybe you have to adjust the function for your needs. But it is a idea how this can be done.
    Otherwise you must make the check by normaly check the columns by simple using a join:
    select * from user_history_view users, signup new_user
    where new_user.signup_id = 1
    and users.user_name = new_user.signup_name;Herald ten Dam
    htendam.wordpress.com

  • APEX app using Oracle Text  to index pages that require authorzation

    Hi Gurus and APEX Dev team
    My team need to develop an APEX App that will index all our documents spread across various servers. Some of the documents require Single sign on access (e.g. KIX.oraclecorp.com) and some require other authorization methods (e.g. Metalink) . The Question is , Is it possible to index the pages that require authorization using Oracle text. If yes How? I have implemented the demo app which can index pages that do not require authorization.
    Thanks a million
    regards
    Bala

    Hello,
    Unless I misunderstand you, the fact that the pages require authentication doesn't really matter, it is the underlying data you want to index correct? If so then you would index them in exactly the same way that you would index any table data using Oracle Text/interMedia.
    John.
    Blog: http://jes.blogs.shellprompt.net
    Work: http://www.apex-evangelists.com
    Author of Pro Application Express: http://tinyurl.com/3gu7cd
    REWARDS: Please remember to mark helpful or correct posts on the forum, not just for my answers but for everyone!

  • Document management system using oracle text

    i plan to create document management system using oracle text with following features
    1) document comparision
    2) document search
    and more...
    can oracle text be used to display documents of various formats by converting them to HTML. and can search keywords be highlighted in the document.
    please help!

    Have you ever considered doing this in Oracle Application Express (free on top of the Oracle database)? How about something like:
    http://download-west.oracle.com/docs/cd/B31036_01/doc/appdev.22/b28839/up_dn_files.htm
    Index the files using the CONTEXT index, and perhaps the docs' meta with it using the Oracle Text MULTI_COLUMN_DATASTORE, and then when you write your query for a report on the documents include a search string.
    I've created a number of APEX-based document management systems and it is quite easy once you get the hang of using this environment. I suggest looking at some of the tutorials/how-to documents and you'll be on your way quickly.
    Start with the upload application. Once you can get your documents in, create a report that shows everything except the document. Verify all of this works correctly.
    Add some "items" to the page for the report, and include them as bind variables in the where clause.
    After that, add your Oracle Text index to the database, and toss in a "text-field" item to the APEX page. Modify your report query, adding the CONTAINS clause, and use the newly created item as a bind variable. There's your keyword search.
    Linking to Oracle Apps is done through API's and may be over database links.
    Hope it helps. Though not a step-by-step how to document, this should point you in the right direction. Get familiar with APEX as that covers most of what you described.
    -Ron

  • Problem for using oracle xml parser v2 for 8.1.7

    My first posting was messed up. This is re-posting the same question.
    Problem for using oracle xml parser v2 for 8.1.7
    I have a sylesheet with
    <xsl:stylesheet xmlns:xsl="http://www.w3.org/TR/WD-xsl">.
    It works fine if I refer this xsl file in xml file as follows:
    <?xml-stylesheet type="text/xsl" href="http://...../GN.xsl"?>.
    When I use this xsl in pl/sql package, I got
    ORA-20100: Error occurred while processing: XSL-1009: Attribute 'xsl:version' not found in 'xsl:stylesheet'.
    After I changed name space definition to
    <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0"> in xsl file, I got
    ORA-20100: Error occurred while processing: XSL-1019: Expected ']' instead of '$'.
    I am using xml parser v2 for 8.1.7
    Can anyone explain why it happens? What is the solution?
    Yi

    <BLOCKQUOTE><font size="1" face="Verdana, Arial">quote:</font><HR>Originally posted by Steven Muench ([email protected]):
    Element's dont have text content, they [b]contain text node children.
    So instead of trying to setNodeValue() on the element, construct a Text node and use the appendChild method on the element to append the text node as a child of the element.<HR></BLOCKQUOTE>
    Steve,
    We are also creating an XML DOM from java and are having trouble getting the tags created as we want. When we use XMLText it creates the tag as <tagName/>value rather than <tagName>value</tagName>. We want separate open and close tags. Any ideas?
    Lori

  • Searching using Oracle Text instead of LIKE '%'

    Hello all,
    I hope you help me in this:
    I have a table looks like this
    create table subscribers (
    id numer(10),
    first_name varchar2(30),
    father_name varchar2(30),
    grandfather_name varchar2(30),
    last_name varchar2(30))
    The application is built using Oracle Forms. Many times, the end users are not so sure of the spelling of the name, therefore they use the "%" wildcard with name fields. This will be reflected to the queries the application will send them to the Oracle Server.
    We have the following queries
    1) select *
    from subscribers
    where last_name like '%family_name%';
    2) select *
    from subscribers
    where last_name like 'family_name%';
    3) select *
    from subscribers
    where last_name like '%family_name%' and first_name like '%first_name%';
    4) select *
    from subscribers
    where last_name like 'family_name%' and first_name like 'first_name%';
    As well as searching on the father_name and grandfather_name fields. But most of the search are on the first_name and the last_name.
    These queries are killing the server since we have millions of records. BTree indexes will not help here because of the LIKE and the "%"
    I am thinking to use Oracle Text here, but I am not sure whether I have to go for a CONTEXT index on each individual column, or I can use the MULTI_COLUMN_DATASTORE indexing.
    Any idea will be appreciated

    The ctxcat index and catsearch operator are generally intended for usage with one text column and one or more columns of structured data. You would have to pick just one of your columns as the text column and the others as structured columns. I would be more inclined to use the multi_column_datastore with a context index and contains operator, so that you can search all of your columns as text columns.

  • Problem with blob column index created using Oracle Text.

    Hi,
    I'm running Oracle Database 10g 10.2.0.1.0 standard edition one, on windows server 2003 R2 x64.
    I have a table with a blob column which contains pdf document.
    Then, I create an index using the following script so that I can do fulltext search using Oracle Text.
    CREATE INDEX DMCS.T_DMCS_FILE_DF_FILE_IDX ON DMCS.T_DMCS_FILE
    (DF_FILE)
    INDEXTYPE IS CTXSYS.CONTEXT
    PARAMETERS('DATASTORE CTXSYS.DEFAULT_DATASTORE');
    However, the index is not searchable and I check the following tables created by database for my index and found them to be empty as well !!
    DR$T_DMCS_FILE_DF_FILE_IDX$I
    DR$T_DMCS_FILE_DF_FILE_IDX$K
    DR$T_DMCS_FILE_DF_FILE_IDX$N
    DR$T_DMCS_FILE_DF_FILE_IDX$R
    I wonder what's wrong with it.
    My user has been granted the ctx_app role and I have other tables that store plain text which I use Oracle Text are fine. I even output the blob column and save as pdf file and they are fine.
    However the database seems like not indexing my blob column although the index can be created without error.
    Please advise.
    Really appreciate anyone who can help.
    Thank you.

    The situation is I have already loaded a few pdf document into the table's blob column.
    After I create the Oracle text index on this blob column, I find the system generated index tables listed in my earlier posting are empty, except for the 4th table.
    Normally we'll see words inside the table where those are the words indexed by oracle text on my document.
    As a result, no matter how i search for the index using select statement with contains operator, it will not give me any result.
    I feel weird why the blob is not indexed. The content of the blob are actually valid because I tested this by export the content back to pdf and I can still view and search within the pdf.
    Regards,
    Jap.

  • How to index ORDSYS.orddoc type using Oracle Text?

    Dear All,
    I am very new to Oracle Text and Oracle intermedia ORDSYS.orddoc type.
    As what I know it is impossible to index ORDSYS.orddoc using Oracle Text, so
    may I know is there anyway alternative to index ORDSYS.orddoc type using Oracle Text?
    I am using ORDDOC type due to my application need to allow user to upload various type of media file such as audio, video, word document etc...
    Please help as I need it to do full text search for those uploaded document, thanks in advanced.
    Best Regards,
    Chin

    Dear All,
    I am very new to Oracle Text and Oracle intermedia ORDSYS.orddoc type.
    As what I know it is impossible to index ORDSYS.orddoc using Oracle Text, so
    may I know is there anyway alternative to index ORDSYS.orddoc type using Oracle Text?
    I am using ORDDOC type due to my application need to allow user to upload various type of media file such as audio, video, word document etc...
    Please help as I need it to do full text search for those uploaded document, thanks in advanced.
    Best Regards,
    Chin

  • NEAR operator alternative when not using. oracle Text ?

    hi,
    I'm working on a project where i would need a Oracle Text 'NEAR like' operator ...
    here is my scenario ...
    in db we have Customers ... and every customer has some criterias like different search words( names, towns,cars,etc...) so for every customer i can create an SQL query out of criterias . ....
    now .... we can have a criteria like. ...... WHERE fulltext like 'john%'. or even distance search line NEAR inside CONTAINS. ... but then the Oracle text index is needed .....
    the only tAble on which Text index is created is our storage table that holds more then 4mil records and growing...
    my question is ... is there any way to have a query that would do the same thing as NEAR but without Text index ?
    here is how I start ....
    I get full newspaper article text from our OCR library ......
    then i need to check customer's criterias against this text to see which article is for which customer and then bind the article to the customer
    I could do it without Oracle using RegEx , but criterias can get really complicated ... like customer wants only specific MEDIA, or specific category , type , only articles that are from medias that are from specific country etc ... and many more different criterias ... and all this can be wrapped inside brackets with ANDs, ORs, NOT. ....
    So the only way to do it is to put it in Oracle and execute the correct query and let Oracle decide if the result is true or false .... but due to NEAR operator I need Oracle text ...
    So if I decide to first insert article into our storage table which has Oracle text index to be able to do the correct search .... how fast will this be ????
    will the the search become slower when there are 6mil records ? I know I can use FILTER BY to help Text index to do a better and quicker seach ... and how to optimize index ....but still
    I'm always asking my self..... why insert the article in a table where there are already 6mil articles and execute query when I only need to check data on one single article and. i already know this article ...
    I see two solutions :
    - if there is alternative for NEAR without using Oracle text index then i would insert data into temporary table and execute query on this table..... table would always contain only this one article. maybe one option would be to have one 'temp' table with Oracle text index in which i insert this one article and with help of Oracle text based on this one article do the search , and then maybe on a daily basis clear index ..... or when the article is removed from the table ... but this would mean having two Orcle text indexes, cause we already have Oracle text index on our storage table anyway....
    - another is to use Oracle text index and insert it into our storage table and hope for the best quick results ....
    Maybe I'm exaggerating and query like WHERE id=1234 and CONTAINS(...). will execute faster then I think
    If anyone would have any other suggestion I will be happy to try it ..
    thanks,
    Kris

    Hi,
    this is to my knowledge not possible. It is hard for Oracle to do, think about a table with many rows, every row with that column must be checked. So I think only a single varchar2 is possible. Maybe for you will a function work. It is possible to give a function as second parameter.
    function return_signup
    return varchar2
    is
      l_signup_name signup.signup_name%type;
    begin
      select signup_name
      into l_signup_name
      from signup
      where signup_id = 1
      and rownum = 1
      return l_signup_name;
    exception
      when no_data_found
      then
        l_signup_name := 'abracadabra'; -- hope does not exist
        return l_signup_name;
    end;Now you can use above function in the contains.
    select * from user_history_view users --, signup new_user
    --where new_user.signup_id = 1
    where contains(users.user_name, return_signup)>0;I didn't test the code! Maybe you have to adjust the function for your needs. But it is a idea how this can be done.
    Otherwise you must make the check by normaly check the columns by simple using a join:
    select * from user_history_view users, signup new_user
    where new_user.signup_id = 1
    and users.user_name = new_user.signup_name;Herald ten Dam
    htendam.wordpress.com

Maybe you are looking for