Using Oracle Text to search through WORD, EXCEL and PDF documents

Hello again,
If I have a Word or PDF document stored in a table, is it possible to use Oracle Text to search through the actual contents of that document?
Thanks
Doug

Yes, you can do context-sensitive searches on both PDF and Word documents. With PDFs, make sure they contain real text and not just images: some scanners create PDFs that are nothing more than images of the document.
Below is a code sample I wrote some time ago to demonstrate the search capabilities of Oracle Text. Note that the example uses INSO_FILTER, which is no longer shipped with Oracle beginning with patch set 10.1.0.4; see MetaLink note 298017.1 for the changes. See the following link for more information on developing with Oracle Text:
http://download-west.oracle.com/docs/cd/B14117_01/text.101/b10729/toc.htm
Begin example:
-- The following needs to be executed
-- as sys.
DROP DIRECTORY docs_dir;
CREATE OR REPLACE DIRECTORY docs_dir
AS 'C:\sql\oracle_text\documents';
GRANT READ ON DIRECTORY docs_dir TO text;
-- End sys ran SQL
DROP TABLE db_docs CASCADE CONSTRAINTS PURGE;
CREATE TABLE db_docs (
id NUMBER,
format VARCHAR2(10),
location VARCHAR2(50),
document BLOB,
CONSTRAINT i_db_docs_p PRIMARY KEY(id)
);
-- A note about this anonymous block: the 'DOCS_DIR' argument to
-- BFILENAME is a directory object name, and it must be in upper case.
DECLARE
f_lob BFILE;
b_lob BLOB;
document_name VARCHAR2(50);
BEGIN
document_name := 'externaltables.doc';
INSERT INTO db_docs
VALUES (1, 'binary', 'C:\sql\oracle_text\documents\externaltables.doc', empty_blob())
RETURN document INTO b_lob;
f_lob := BFILENAME('DOCS_DIR', document_name);
DBMS_LOB.FILEOPEN(f_lob, DBMS_LOB.FILE_READONLY);
DBMS_LOB.LOADFROMFILE(b_lob, f_lob, DBMS_LOB.GETLENGTH(f_lob));
DBMS_LOB.FILECLOSE(f_lob);
COMMIT;
END;
/
-- Build the index.
-- Note that this index differs from the one for files stored on the
-- file system: its datastore parameter is ctxsys.default_datastore,
-- not ctxsys.file_datastore. FILE_DATASTORE is for documents that
-- exist on the file system; DEFAULT_DATASTORE is for documents
-- stored in the column itself.
-- On 10g Release 2 and later, ctxsys.auto_filter replaces ctxsys.inso_filter.
create index db_docs_ctx on db_docs(document)
indextype is ctxsys.context
parameters (
'datastore ctxsys.default_datastore
filter ctxsys.inso_filter
format column format');
--search for something that is known to not be in the document.
SELECT SCORE(1), id, location
FROM db_docs
WHERE CONTAINS(document, 'Jenkinson', 1) > 0;
--search for something that is known to be in the document.  
SELECT SCORE(1), id, location
FROM db_docs
WHERE CONTAINS(document, 'Albright', 1) > 0;
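On the caveat that PDFs must contain real text rather than scanned images: it can help to screen files before loading them into db_docs. The Python sketch below is a rough heuristic, not an Oracle Text feature. It looks at each stream in the raw PDF bytes and inflates FlateDecode (zlib) data where possible. It then checks for a text object (BT ... ET) that contains a text-showing operator (Tj or TJ). The function name pdf_has_text_operators is hypothetical. The check can miss text stored with filters it does not decode, such as other stream filters or encrypted files. It can also give false positives on binary data. Treat a False result as "probably image-only, consider OCR" rather than as proof.

```python
import re
import zlib

# Body of each "stream ... endstream" object in the raw PDF bytes.
_STREAM_RE = re.compile(rb"stream\r?\n(.*?)endstream", re.DOTALL)
# A text object (BT ... ET) containing a text-showing operator (Tj or TJ).
_TEXT_RE = re.compile(rb"\bBT\b.*?\bT[jJ]\b.*?\bET\b", re.DOTALL)


def _inflate(body: bytes) -> bytes:
    """Inflate FlateDecode (zlib) data; return the body unchanged otherwise."""
    try:
        # decompressobj() tolerates the trailing end-of-line before "endstream".
        return zlib.decompressobj().decompress(body)
    except zlib.error:
        return body


def pdf_has_text_operators(data: bytes) -> bool:
    """Heuristic: True if any stream contains PDF text-showing operators."""
    return any(
        _TEXT_RE.search(_inflate(m.group(1))) for m in _STREAM_RE.finditer(data)
    )
```

For example, calling pdf_has_text_operators(open("scan.pdf", "rb").read()) on a hypothetical file returns False when no stream contains text operators. That usually means the pages are images only.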

Similar Messages

  • Can I use my iCloud 5GB to store Word, Excel and PDF files?

    I have an iCloud account that shows 5GB of space available, and I'd like to upload a bunch of documents for storage and access them from any of several PCs, Macs and iOS devices. I do this now with Dropbox, Readdle Docs and Google Docs, but I can't figure out how to do it with iCloud. I don't want to buy the iWork apps; I just want to store files and be able to access them from other systems. Is this possible?

    Only if you have iOS 5 and also own Keynote, Pages and Numbers for iOS can you opt in to use iCloud for document storage.
    In iOS 5, go to Settings -> Pages (or Numbers, Keynote) -> Use iCloud and set the slider to On (blue).
    All files created in your mobile iWork apps then go into the cloud, using space from the 5GB.
    You can then also manually upload files from a PC or Mac to the iCloud space, so the uploaded documents also appear inside your iOS apps. Without purchasing the apps, you cannot use office document storage or upload and download through the browser in iCloud at all.
    It accepts all Microsoft Office 97-2011 formats as well as the iWork '09 for OS X file formats.
    PDFs can't be uploaded, but your documents in iCloud can be downloaded as PDF copies.
    The URL for uploading and downloading is www.icloud.com/iwork, and the process of uploading is done like this:
    In short: no iOS apps, no iWork in the cloud.

  • What do I need to do to display a MS Word, Excel or PDF document in browser

    Hi, Right now I have photos loaded and displayed in my HTML document in the browser next to a report...
    What do I need to do to display a MS Word, Excel or PDF document in a browser?
    I use the following query to load the content into a region of my HTML page.
    This gives an EDIT link to the photo...
    select
    '<img src="#OWNER#.display_thumb?p_photo_id=' || nvl(file_catalog_id,0) || '" />' "File"
    from "FILE_CATALOG"
    where "FILE_CATALOG_ID" = :P9_FILE_CATALOG_ID
    This is the procedure that loads the content into that region:
    create or replace PROCEDURE "DISPLAY_THUMB" (p_photo_id in number)
    as
    l_mime varchar2(255);
    l_length number;
    l_file_name varchar2(2000);
    lob_loc BLOB;
    begin
    select mime_type, thumbnail, photo_name, dbms_lob.getlength(thumbnail)
    into l_mime, lob_loc, l_file_name, l_length
    from photo_catalog where photo_catalog_id = p_photo_id;
    -- Set up HTTP header
    -- Use an NVL around the mime type and if it is null, set it to
    -- application/octet-stream, which may launch a download window in Windows
    owa_util.mime_header(nvl(l_mime,'application/octet-stream'), FALSE );
    -- Set the size so the browser knows how much to download
    htp.p('Content-length: ' || l_length);
    -- "inline" asks the browser to display the file; the filename is used if the user does a "Save as"
    htp.p('Content-Disposition: inline; filename="' || l_file_name || '"');
    -- Close the headers
    owa_util.http_header_close;
    -- Download the BLOB
    wpg_docload.download_file( Lob_loc );                               
    end;

    These links were supplied by Justin on Experts Exchange:
    For PDF, see here:
    http://www.adobe.com/support/techdocs/328233.html
    http://www.adobe.com/support/techdocs/331025.html
    For Word docs, see here:
    http://www.shaunakelly.com/word/sharing/OpenDocInIE.html
    Any other input... any AJAX?
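    In the procedure above, two headers largely decide whether the browser shows a Word, Excel or PDF file in the page or offers to save it: Content-Type and Content-Disposition. It also depends on the browser having a viewer for that type, such as a PDF plug-in. The minimal Python sketch below illustrates only that header logic. It follows the procedure's NVL fallback: use the stored MIME type first, otherwise fall back to application/octet-stream. It adds a lookup by file extension in between and uses the "inline" disposition type. The function name build_display_headers and the extension map are hypothetical. In the PL/SQL procedure, the same values are written with owa_util.mime_header and htp.p.

```python
import os

# Hypothetical extension-to-MIME-type map; extend it for other formats.
_MIME_BY_EXT = {
    ".pdf": "application/pdf",
    ".doc": "application/msword",
    ".xls": "application/vnd.ms-excel",
    ".docx": "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
    ".xlsx": "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet",
}


def build_display_headers(file_name, length, stored_mime=None):
    """Return (name, value) pairs for sending a stored file to be shown inline."""
    ext = os.path.splitext(file_name)[1].lower()
    # Like nvl(l_mime, ...): prefer the stored MIME type, then the extension,
    # and fall back to the generic binary type.
    mime = stored_mime or _MIME_BY_EXT.get(ext, "application/octet-stream")
    # Quoted-string file name: escape backslashes and double quotes.
    quoted = file_name.replace("\\", "\\\\").replace('"', '\\"')
    return [
        ("Content-Type", mime),
        ("Content-Length", str(length)),
        # "inline" asks the browser to display the file rather than save it.
        ("Content-Disposition", f'inline; filename="{quoted}"'),
    ]
```

    For example, build_display_headers("report.pdf", 1024) yields Content-Type application/pdf, Content-Length 1024 and Content-Disposition inline; filename="report.pdf".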

  • How to edit Word, Excel and PowerPoint documents

    Hello
    I have installed BlackBerry Device Software 4.5 on my BlackBerry 8310 and would like to know how to edit Word, Excel and PowerPoint documents.
    Can I edit files that I send from the computer to the device memory card and then send them by email?
    Thank you
    sluizoliv 

    Hi there!
    If your BlackBerry came with the DataViz To Go apps, then you can indeed do this. Check (usually) under Applications for Word To Go, Sheet To Go and the other To Go apps. You can edit a file you save to your device or media card memory, send it as an attachment, and so on.
    Hope that helps!

  • How can I embed Word, Excel and PDF files in a PDF document?

    I have a Word document for the product, which I am updating for the next product release. It contains some embedded Excel, Word and PDF files. When I double-click these embedded files in Word, they open in a new window. However, we deliver documentation to customers in PDF format, and when I convert the Word document to PDF, only an icon of each embedded file is displayed and the files do not open in a new window. Can someone tell me how I can embed these files (in Word, Excel and PDF format) in a PDF document?

    You must attach them to the PDF file after it is created. You cannot embed file attachments onto a page as you can with Word.

  • Sharing and displaying Excel and pdf documents in SAP BO Mobile

    Hello all,
    I have one quick question: is it possible to share Excel and PDF documents through SAP BO Mobile?
    I tried uploading the documents to BO Launch Pad and assigning them to the Mobile category, but I can't see them on my iPad. Is there another way to share these kinds of files through SAP BO Mobile on mobile devices, and if so, how?
    Thanks.

    No, that won't do it, because my users have their own Excel files that are not related to the BO or BW platforms. They enter data manually into Excel spreadsheets and report from them. We just want to upload those files to the BO platform and let the users share them with other users.
    I know I could create a Webi report on top of the Excel files, but we don't want to spend time redoing reports that already exist in Excel.
    Thanks though.

  • Using Oracle Text for searching with UCM 10g

    I am using Oracle Text with UCM 10gR3 and Site Studio 10gR4, and I am trying to sort the search results by relevancy and also include a snippet of each retrieved document. I have the fields that the SS_GET_SEARCH_RESULTS service returns, but the relevancy score is always 5, and the snippet contains strings such as < idcnull, /p, etc. These are evidently XML/HTML/UCM tags, but they result in even more strangeness in the snippet if I try to remove them programmatically.
    I have read the Oracle Text documentation, and there appear to be ways to configure Oracle Text, but I am not at all clear on what I can do from UCM. It looks like the configuration is done either in database tables or in the query itself, and I have no ready access to either.
    Does anyone have experience with this, or know of any documentation that might help?
    Bill

    Hi
    If I remember correctly, this issue was seen with an older version of the OTS component and the Core Update patch/bundle. Upgrade the UCM instance with the latest CS10gr35 update bundle (patch set 6907073), and also upgrade the OTS component from the same patch set.
    Let me know how it goes after this.
    Thanks
    Srinath

  • How to use MS Word, Excel and PDF documents on iPhone 4S

    Before I bought the iPhone I was using a Sony Ericsson Xperia X1. On that phone I could easily read and write MS Word/Excel documents and transfer them to and from my PC running Windows 7.
    So far I cannot find an easy way to transfer these documents to my iPhone to edit, create or read them. I have already installed iTunes, iCloud and one app, Numbers, but have had no success getting the documents onto my iPhone.
    Is there somebody who could help me out?
    Thanks, Cor

    http://itunes.apple.com/us/app/quickoffice-pro/id310723177?mt=8

  • Unable to search for words in a pdf document

    I am using Adobe Reader Ver 10.1.1 (All updates)
    I have a PDF document in which I am unable to search for words. No words in the document are found even though they do exist. If I copy a word from the document and paste it into the search box, I get substitute characters instead of the word I copied. The search does find those characters.
    The document is a catalog and can be downloaded from the following URL.
    http://www.carbatec.com.au/getcatalogue?zenid=d1cuvbat0ois0g37r0r33vnah1
    I would appreciate any help as to why I cannot search for words in the document.

    Thank you Dave Merchant and try67 for your responses. As per my previous post, I contacted the company re the catalogue and they have responded favourably. I'll include their response because it gives the reason for the search failure as document compression which you might find interesting. I'll await their new catalogue and see if they have fixed the problem.
    Company response:
    Thank you very much for your input.  And yes, you are correct, the compression we used for the current catalogue's PDF format does strip out text included in the catalogue.  We used the compression settings we did with the intention of minimising download time, however I take your point about including text for search purposes (which I also utilise when I'm scanning through PDFs).
    We will actually be posting out our new catalogue next week and we'll release the new PDF version on the website at the same time.  I've asked our graphic designer to ensure that the PDF we use for the new catalogue includes searchable text.

  • Using Oracle Text to find attribute values in an XML document

    Can anybody help me? I created an index on a URIType column:
    create index my_index on uri_tab(docurl) indextype is ctxsys.context parameters ('SECTION GROUP my_sections').
    Before creating the index, I executed these two procedures, which prepare Oracle Text for searching in attribute sections: ctx_ddl.create_section_group('my_sections','XML_SECTION_GROUP') and exec ctx_ddl.add_attr_section('my_sections','machinetype','MachineType@text')
    After building the index, I search for an attribute value:
    SELECT e.docName FROM uri_tab e WHERE CONTAINS(e.docurl,'SM_52 WITHIN machinetype') > 0;

    Advice to read the Oracle documentation is welcome, but I have already done that and didn't find a way to check the syntax of an attribute retrieved from an LDAP server.
    I haven't found anything new in $ORACLE_HOME/RDBMS/admin/dbmsldap.sql either.
    For example, I have retrieved an attribute from the LDAP server with dbms_ldap.first_attribute and would like to know whether the values of this attribute are strings or binary data.
    How can I do it?

  • Why can't Adobe Reader find or search for words on a PDF document?

    I have seen posts from people with the same problem dating back to June 2011. WHEN WILL THIS PROBLEM BE FIXED? I am sent multi-page PDF files that I have to go through, and I need to find documents by employee name. Please help.

    Are you able to select the text? If not, then it's probably just a scanned image of text, not real text. In that case there's nothing you can do about it unless you run OCR on the document in Acrobat or some other similar application.

  • Differences between Oracle Text Soundex search and standard Soundex

    Hi all,
    I want to ask some questions:
    1. Does Oracle Text soundex searching use the Soundex matching algorithm described by Donald Knuth?
    2. Why does Oracle Text soundex search return different results from a standard Soundex?
    3. Can anybody describe how the Oracle Text soundex search process works?
    Thanks,
    Robby

    Hi Ron,
    Thanks for your reply.
    I've already read the thread and the Soundex matching algorithm described by Donald Knuth,
    but I'm sorry, I still don't understand how Oracle Text soundex searching works.
    According to Knuth's algorithm, the first letter is the important key to searching.
    For example, with standard Soundex the word "PEEL" will find "PILE" or "P???" and so on,
    but with Oracle Text soundex search the word "PEEL" will find "PILE", "BEEL", "BELL", "FEEL", "VERE", etc.
    Does Oracle Text soundex search not use Knuth's algorithm? If it does, how does the process work?
    Thanks,
    Robby
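    To make the difference concrete, here is a minimal Python sketch of the standard (American) Soundex algorithm, the version Knuth describes. It is not Oracle Text code, and the function name soundex is just for illustration. Under this algorithm the first letter is kept as a letter, so PEEL and PILE share a code while BEEL and FEEL do not. Whatever Oracle Text's soundex (!) expansion does differently is not reproduced here.

```python
# Standard (American) Soundex digit groups; vowels, Y, H and W get no digit.
_GROUPS = {
    "BFPV": "1",
    "CGJKQSXZ": "2",
    "DT": "3",
    "L": "4",
    "MN": "5",
    "R": "6",
}
_CODE = {letter: digit for letters, digit in _GROUPS.items() for letter in letters}


def soundex(word: str) -> str:
    """Return the four-character standard Soundex code, e.g. 'Robert' -> 'R163'."""
    letters = [c for c in word.upper() if "A" <= c <= "Z"]
    if not letters:
        return ""
    first = letters[0]
    result = [first]               # the first letter is kept as a letter
    prev = _CODE.get(first, "")    # ...but its digit still suppresses a repeat
    for ch in letters[1:]:
        code = _CODE.get(ch, "")
        if code and code != prev:
            result.append(code)
            if len(result) == 4:
                break
        if ch not in "HW":         # H and W do not separate equal digits
            prev = code            # vowels reset prev, so equal digits repeat
    return "".join(result).ljust(4, "0")
```

    With this sketch, PEEL, PILE, BEEL, BELL, FEEL and VERE code to P400, P400, B400, B400, F400 and V600 respectively. A strictly standard Soundex match on PEEL would therefore return PILE but not BEEL, BELL, FEEL or VERE.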

  • Parsing Word files that contain tables using Oracle Text

    Hi,
    I was going through this document. I am going to implement full-text search functionality in our system.
    We receive the information as .doc files.
    Previously, we parsed each file, stored the result in the database and then searched it using PL/SQL.
    From this article, I understand that this can also be done with Oracle Text.
    My concern is whether Oracle Text can parse .doc files that have tables embedded in them.
    Please let me know whether Oracle Text will be able to parse such files.
    I am attaching an example file for this.
    Please reply as soon as possible.

    Yes, Oracle Text has this capability. Use AUTO_FILTER or USER_FILTER when creating the index.

  • Using Oracle Text in an APEX report search

    I am trying to use Oracle Text in APEX by integrating it into an existing application. The idea is that it will allow searching in larger text fields. One of the Oracle packaged applications also uses Oracle Text, so I will have a look at that too. I've adapted this search by adding:
    AND t. contains(oplossing, :P15_OPLOSSING)
    AND t.contains(sleutelwoorden, :P15_SLEUTELWOORDEN)
    That didn't work, so I changed those two to:
    AND t.oplossing = (t.contains(oplossing, :P15_OPLOSSING)>0)
    AND t.sleutelwoorden = (t.contains(sleutelwoorden, :P15_SLEUTELWOORDEN)>0)
    which didn't work either, as I expected. Clearly I'm not doing it correctly. I intend to look it up tonight in the packaged applications, as I do want to find it myself too.
    But can anyone give me a hint about what I am doing wrong?
    SELECT t.ticketid ticketnr, t.ticketid,
    g.voornaam||' '||g.naam aangemaaktdoor,
    t.credt, t.applicatiecd, t.titel,
    s.statusdefoms,
    si.statusdefoms instat,
    NVL2(t.toegekend,'Y','N') toegekend,
    sleutelwoorden, klantprioriteitid, oplossing, s.htmlkleur, si.htmlkleur inthtmlkleur
    FROM ticket t,
    gebruiker g,
    status s,
    status si
    WHERE t.gebruikerid = g.gebruikerid
    AND t.statusid = s.statusid
    AND t.statusinternid = si.statusid (+)
    AND t.applicatiecd = NVL(:P0_APPLICATIECD, :F101_APPLICATIECD)
    AND (t.categorieid = :P15_CATEGORIEID OR NVL(:P15_CATEGORIEID, 0) = 0)
    AND (t.moduleid = :P15_MODULEID OR NVL(:P15_MODULEID, 0) = 0)
    AND (t.statusid = :P15_STATUSID OR NVL(:P15_STATUSID, 0) = 0)
    AND (t.statusinternid = :P15_INTSTATUSID OR NVL(:P15_INTSTATUSID, 0) = 0)
    AND (t.versieid = :P15_VERSIEID OR NVL(:P15_VERSIEID, 0) = 0)
    AND t.ticketid LIKE '%'||:P15_TICKETID||'%'
    AND t.gebruikerid = DECODE(NVL(:P15_GEBRUIKERID,0), 0, t.gebruikerid, :P15_GEBRUIKERID)
    AND t.credt BETWEEN NVL(:P15_DATUMVAN, To_Date('01-01-1900', 'DD-MM-YYYY')) AND NVL(To_Date(:P15_DATUMTOT, 'DD-MM-YYYY'), sysdate) +1
    AND t.titel LIKE '%'||:P15_TITEL||'%'
    AND t. contains(oplossing, :P15_OPLOSSING)
    AND t.contains(sleutelwoorden, :P15_SLEUTELWOORDEN)
    AND PCK$Ticket_Admin.getklantid(t.gebruikerid) = DECODE(Pck$Ticket_Admin.isklantadminroleN(:APP_USER,NVL(:P0_APPLICATIECD, :F101_APPLICATIECD)), 1, PCK$Ticket_Admin.getklantid(:APP103_GEBRUIKERID), PCK$Ticket_Admin.getklantid(t.gebruikerid))
    AND (:APP103_GEBRUIKERID IN (t.voor_gebruikerid, t.gebruikerid)
    OR Pck$Ticket_Admin.isintern(:APP_USER,:P0_APPLICATIECD) = 1)
    changed to:
    AND t.oplossing = (t.contains(oplossing, :P15_OPLOSSING)>0)
    AND t.sleutelwoorden = (t.contains(sleutelwoorden, :P15_SLEUTELWOORDEN)>0)

    I have worked on it further and looked at the search in the packaged application. It turned out to be a PL/SQL block. I used what I found there to adapt the previous search, adding the following:
    OR (CONTAINS(t.oplossing, :P15_OPLOSSING)>0)
    OR (CONTAINS(t.sleutelwoorden, :P15_SLEUTELWOORDEN)>0)
         OR (CONTAINS(t.titel,:P15_SEARCH_T_O_S)>0 OR
         CONTAINS (t.oplossing, :P15_SEARCH_T_O_S)>0 OR
         CONTAINS(t.sleutelwoorden, :P15_SEARCH_T_O_S)>0 )
    OR (CONTAINS(t.titel,:P15_SEARCH_T_O_S)>0 AND
         CONTAINS (t.oplossing, :P15_SEARCH_T_O_S)>0 AND
         CONTAINS(t.sleutelwoorden, :P15_SEARCH_T_O_S)>0 )
    oplossing means solution
    sleutelwoorden means keywords
    titel means title
    This still doesn't work. It gives an error message:
    failed to parse SQL query:
    ORA-01719: outer join operator (+) not allowed in operand of OR or IN
    I've tried adding these conditions in a different place, but that gives the same error message. I'm not sure what to try now.
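    On the ORA-01719 error: AND binds more tightly than OR. OR conditions appended at the end of the WHERE clause therefore turn everything before them into an operand of an OR, including the t.statusinternid = si.statusid (+) join. The old (+) syntax does not allow that. Grouping all the text conditions in one parenthesized block that is AND-ed with the rest avoids this, as would rewriting that join as an ANSI LEFT OUTER JOIN. Note also that CONTAINS is called as CONTAINS(t.oplossing, :P15_OPLOSSING) > 0, not as t.contains(...). Each searched column needs its own CONTEXT index. If the query text is generated dynamically (the packaged application used a PL/SQL block), the Python sketch below only illustrates the predicate string to produce. The helper name contains_any is hypothetical.

```python
import re

# Unquoted Oracle identifiers (letters, digits, _, $, #) and bind variables.
_IDENT = re.compile(r"[A-Za-z][A-Za-z0-9_$#]*")
_BIND = re.compile(r":[A-Za-z][A-Za-z0-9_]*")


def contains_any(alias, columns, bind):
    """Return '(CONTAINS(a.c1, :b) > 0 OR ...)' for use after 'AND '.

    The surrounding parentheses keep the ORs inside one group, so an old-style
    (+) outer join elsewhere in the WHERE clause is not an operand of OR.
    """
    if not columns:
        raise ValueError("at least one column is required")
    for name in (alias, *columns):
        if not _IDENT.fullmatch(name):
            raise ValueError(f"unexpected identifier: {name!r}")
    if not _BIND.fullmatch(bind):
        raise ValueError(f"unexpected bind variable: {bind!r}")
    terms = [f"CONTAINS({alias}.{col}, {bind}) > 0" for col in columns]
    return "(" + " OR ".join(terms) + ")"


# Example: the "match in any of the three fields" condition from the question.
where_extra = "AND " + contains_any(
    "t", ["titel", "oplossing", "sleutelwoorden"], ":P15_SEARCH_T_O_S"
)
```

    where_extra then reads AND (CONTAINS(t.titel, :P15_SEARCH_T_O_S) > 0 OR CONTAINS(t.oplossing, :P15_SEARCH_T_O_S) > 0 OR CONTAINS(t.sleutelwoorden, :P15_SEARCH_T_O_S) > 0).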

  • Oracle iRecruitment: Keyword Search within Resumes using Oracle Text

    Dear All,
    As I understand it (and per Note 247064.1), simple keyword searches can be performed in iRecruitment if Oracle Text is installed. However, searching for keywords within resumes is not possible using Oracle Text and is possible ONLY if resume parsing is enabled via a third-party (non-Oracle) service provider.
    Can you please let me know if my understanding is correct and, if not, provide further input on this?
    Thanks,
    Subrat

    Got this confirmation from Oracle via SR:
    Resume searching is independent of resume parsing; parsing is not required to search resumes.
    Oracle Text is the text engine that allows you to search documents using content-based queries. Oracle Text allows you to upload documents, search documents, parse resumes, etc.
    To conclude: installing Oracle Text allows keyword searches on resumes.
    Thanks,
    Subrat
