PDF indexing and multiple searches.

Dear members:
Please forgive me if my question is rather basic but I haven't been able to find the exact answers I am looking for in order to address my project needs.
I have a folder where I keep all of my PDF files. These are all articles from medical journals that I keep organized using a browser application specific for these types of articles. The application allows me to search these articles but it only looks for specific keywords (title, author name, date, journal name and keyword just to name a few). However, it doesn't look at the content of the PDF file to find words that are contained in the body of the article itself.
I would like to be able to use Acrobat to search these articles and try to find words I am looking for in the entire article instead of being restricted only to keywords. These are the questions I have:
1. What is the best way to index these PDF files so that they become searchable?
2. Is there a way to find out whether they have already been indexed by the publishing company, so that I avoid wasting time doing it again?
3. My library now contains approximately 15,000 articles and I expect it to grow to at least 30,000. How can I handle these searches so that performance doesn't become an issue? Is there a way to ensure that Acrobat can search this number of files without taking a long time?
4. I understand from the help files that Acrobat can search an entire folder, so I don't have to run my search one article or file at a time. Is this correct? What is the best way to run my search so that Acrobat looks at all files in one folder? This folder contains subfolders (subdirectories). Will Acrobat look at all files when searching, including those in subdirectories within the specified directory?
Thank you in advance for your help and replies.
Best regards,
Joseph Chamberlaini

After creating the index, you need to perform the following checks.
First, check that your index table contains indexed terms. Execute
select token_text from dr$YOUR_INDEX$i;
Second, check the index errors table CTX_INDEX_ERRORS. It is owned by the user CTXSYS, and most users do NOT have the SELECT privilege on it by default.
If that is OK, then check that your PDF documents are supported by the INSO filter.
Citation:
"PDF - Portable Document Format
Acrobat Versions 2.1, 3.0, 4.0, and 5.0 including Japanese PDF"
(Appendix B. Supported Document Formats in Oracle Text Reference 9.2)
For Oracle 9i you could install the 9.2.0.4 patchset (it includes INSO filter 7.5).
P.S.
To start with, you can find answers to your questions about Oracle Text here:
http://otn.oracle.com/products/text
Sorry for my English.
Best regards, Victor Zogin.

Similar Messages

  • Full text PDF indexing for website search?

    Hi.  We run a couple of websites on CQ5.5 and are trying to get the PDF files we refer to in the DAM to show in search results that users conduct on our sites.  I've seen a number of references that imply that full text searching of PDFs is possible.  For example:
    http://dev.day.com/docs/en/crx/current/developing/searching_in_crx.html#Full-Text%20Extraction
    But thus far I've not been able to figure out what I must do to get it working.  I had expected that if this were possible to do, then it would have worked with the Geometrixx demo site.  It did not.
    Am I chasing my tail here, or is there actually a way to get this done?  If it's possible, links to documentation on how to configure indexing_config.xml and any other required files would be greatly appreciated.
    Thanks.

    Laurent,
    We never got a definitive answer, but we have suspicions that it was due to having upgraded from CQ 5.4 to 5.5.  It seems that the libraries used for the indexing changed during that version upgrade.  When I took our application and installed it on a pristine 5.5 installation, the PDF indexing worked.  It was only our existing installations (two staging, two production) that did not work.  So at least we know it's not our application or CQ in general.
    Sadly, we don't have the resources to rebuild our servers, and we also ran into a separate problem that would prevent us from using the indexing anyway.  It seems that there is no way to prevent cross-site results if you have multiple sites on the same CQ install and they each have their own sections in the DAM where the PDF files are stored.  Would take some custom code to get around the issue, it seems.
    For example, you have site A and site B.
    /content/a  <- Main site A content for pages
    /content/b
    /content/dam/a <- Site A's files in the DAM
    /content/dam/b
    There is no stock way, that I am aware of, to keep searches on site A from turning up PDF results from /content/dam/b (for site B), and vice versa.  That's enough to keep us from using it - a total deal breaker.
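    The "custom code" mentioned above could take many forms. As a rough illustration only, the sketch below post-filters a list of result paths so that site A's search shows only hits under /content/a and /content/dam/a. The function name and the sample paths are hypothetical, and this is plain Python over path strings, not CQ or JCR API code; a real fix would more likely restrict the query itself.

```python
from posixpath import normpath


def filter_results_for_site(result_paths, site):
    """Keep only result paths that live under the given site's roots.

    Assumes the layout described in the post: pages under /content/<site>
    and DAM assets under /content/dam/<site>.
    """
    allowed_roots = (f"/content/{site}", f"/content/dam/{site}")
    kept = []
    for path in result_paths:
        clean = normpath(path)
        # Match the root itself or anything below it; the trailing "/" check
        # keeps "/content/dam/ab" from matching site "a".
        if any(clean == root or clean.startswith(root + "/") for root in allowed_roots):
            kept.append(path)
    return kept


# Hypothetical search hits spanning two sites on the same install.
results = [
    "/content/a/en/products.html",
    "/content/dam/a/brochure.pdf",
    "/content/dam/b/manual.pdf",
    "/content/dam/ab/other.pdf",
]
site_a_results = filter_results_for_site(results, "a")
```

    Filtering after the fact is simple, but it wastes work on hits that are thrown away and can distort paging, so constraining the search to the site's paths up front would usually be preferable.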

  • TREX index and document search in 7.0

    Hello everyone,
    Could someone clarify the following question?
    Is it possible to use TREX to index the attachments of CRM 7.0 objects so that the attachments can be searched?
    If yes, could you provide me with a roadmap and documentation?
    Thank you in advance!
    Michael Wolff

    Michael,
    Were you able to find the solution? If so, please share some information.
    Thanks
    Chalapathi

  • Ssd indexing and mail search

    I have Mavericks on an SSD and I disabled indexing on the SSD (as the internet suggests)... but now I cannot use the search function in Mail! How can I restore it? Do I need to re-enable indexing on the drive?

    SSD is just a tool for your machine to work faster, smoother, cozier.
    Use it. Even if there is a longevity issue, you do keep backups and will get a new SSD when it's needed.
    Enjoy the speed while you have it.

  • Acrobat - Convert Office documents to PDF so that it is crawled/indexed by SharePoint search

    Hi there,
    This is a hybrid question between Acrobat and SharePoint and I'll post on both forums....
    Background:
    In a fairly complex application we have a publishing server that utilizes Acrobat to convert Office documents to PDF using the Convert to PDF functionality.
    We then publish that PDF to a library in SharePoint.  We would like to have those published PDFs searchable by SharePoint search.  Unfortunately there is something about these PDFs where SharePoint cannot crawl the content.
    Note:  I do realize that PDFs are not indexable by SharePoint out of the box and I have installed and configured the iFilter utility.  I have been able to index and search for other PDFs, so I know the mechanism works.  It just seems to be these particular PDFs.
    I have also manually "Saved as PDF" directly from Word/Excel and those PDFs are crawled by SharePoint....it just seems to be when Acrobat does its conversion.  I'm sure it's just a simple configuration somewhere... I just don't know what I'm looking for.
    Another note:  When I open the published PDFs, I am able to use Acrobat's search to find the text.... and the text is selectable; so it's not as if the conversion changed it to an image.
    So....would anyone happen to have encountered this issue?  Or does anyone know what makes a PDF indexable by SharePoint search?
    Thanks in advance

    Hi  ,
    According to your description, my understanding is that the PDFs converted from Office documents by Acrobat cannot be crawled in your SharePoint 2010.
    For your issue, please make sure the PDF version of these files is 1.5 (Acrobat 6.x) or above.
    You can verify this with the following steps:
    Open your PDF using Adobe Reader.
    Go to File -> Properties.
    Check the PDF Version under Advanced section.
    Best Regards,
    Eric
    Eric Tao
    TechNet Community Support
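    For a publishing server that produces many files, the manual check above could be approximated in a script. The following is a minimal sketch, assuming it is enough to read the `%PDF-x.y` marker at the start of the file; the function names are made up for illustration. Note that a PDF may also declare a newer version in its document catalog, which this sketch does not inspect, so treat the result as a first pass rather than a definitive answer.

```python
import re


def pdf_header_version(data: bytes):
    """Return (major, minor) parsed from the %PDF-x.y header, or None."""
    # The header is expected at the very start of the file; scanning the
    # first 1024 bytes tolerates a little leading junk.
    match = re.search(rb"%PDF-(\d)\.(\d)", data[:1024])
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def meets_minimum(data: bytes, minimum=(1, 5)) -> bool:
    """True if the header version is at least `minimum` (default 1.5)."""
    version = pdf_header_version(data)
    return version is not None and version >= minimum


def file_meets_minimum(path, minimum=(1, 5)) -> bool:
    """Same check, reading only the first bytes of a file on disk."""
    with open(path, "rb") as handle:
        return meets_minimum(handle.read(1024), minimum)
```

    Calling `file_meets_minimum("published.pdf")` (a placeholder file name) on each published file would flag candidates to open in Reader and confirm under File -> Properties.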

  • Convert to PDF from Excel so that it is indexable by SharePoint search

    Hi there,
    This is a hybrid question between Acrobat and SharePoint and I'll post on both forums....
    Background:
    In a fairly complex application we have a publishing server that utilizes Acrobat to convert Office documents to PDF using the Convert to PDF functionality.
    We then publish that PDF to a library in SharePoint.  We would like to have those published PDFs searchable by SharePoint search.  Unfortunately there is something about these PDFs where SharePoint cannot crawl the content.
    Note:  I do realize that PDFs are not indexable by SharePoint out of the box and I have installed and configured the iFilter utility.  I have been able to index and search for other PDFs, so I know the mechanism works.  It just seems to be these particular PDFs.
    I have also manually "Saved as PDF" directly from Word/Excel and those PDFs are crawled by SharePoint....it just seems to be when Acrobat does its conversion.  I'm sure it's just a simple configuration somewhere... I just don't know what I'm looking for.
    Another note:  When I open the published PDFs, I am able to use Acrobat's search to find the text.... and the text is selectable; so it's not as if the conversion changed it to an image.
    So....would anyone happen to have encountered this issue?  Or does anyone know what makes a PDF indexable by SharePoint search?
    Thanks in advance

    This cannot be done on a Mac. If you need to continue this discussion, please post in the Acrobat Macintosh forum.

  • Indexing and Searching pdf files which are used as attachment in an Announcement list item

    Hi all,
    I am using a SharePoint 2013 online environment and trying to search for PDF files that are attached to an announcement list item. However, search does not find anything when I search for the name of the PDF file or for its content.
    When I attach a Word document to the list item, it gets indexed and search finds the file.
    Thanks, I appreciate any advice.

    Are you able to search for pdfs in other locations? SharePoint 2013 comes with an iFilter out of the box unlike 2010 which needed configuration.

  • Embedded Search Index AND Document Security?

    I'm using Adobe Acrobat Standard 8.1.7.
    It appears that I cannot have both an embedded search index and restricted security (e.g., password required to change document) on the same document.
    Why is that?
    If I start with security ON and then attempt to embed a search index, I get the error message below:
    A search index can not be embedded in this document because this document has restricted security permissions.
    If I start with security OFF, successfully embed a search index, and then secure the document, Acrobat "strips off" the previously embedded search index.  No warning message; no feedback to end-user; just kills it!
    Why are those two functions mutually exclusive?  Anyone know of a work-around?
    Thank you in advance!

    Hi,
    As to "why", that might be floating out there in Adobe's devnet space or in one of the blogs maintained by Adobe's devnet crew.
    Also good to know about use of embedded index - if used, cannot apply fast web view to the PDF. It is one or the other, but not both.
    Work around? I've not come across one; but, that does not mean something isn't "out there" <g>.
    Be well...

  • How to convert Xstring to PDF format and send pdf to multiple user

    Hi to all
    Can anyone provide me with sample code to convert an Xstring to PDF format and send the PDF to multiple users?
    I have searched SDN but can't find a proper solution.
    I shall be thankful to you for this.
    Regards
    Pavneet Rana

    Use function module 'SCMS_XSTRING_TO_BINARY' to convert from XString to a Binary table. Just like this:
      CALL FUNCTION 'SCMS_XSTRING_TO_BINARY'
        EXPORTING
          buffer          = lv_xstring_pdf
          append_to_table = ' '
        TABLES
          binary_tab      = lt_doc_content.
    To send the email in an OO way you should use class CA_SEND_REQUEST_BCS. Take a look at program BCS_EXAMPLE_6 or any of the test programs in package SBCOMS.

  • Launch PDF from HTML and execute search

    Is it possible using JavaScript in an HTML doc to launch a PDF and execute a search.query with a passed-in search parameter? TIA

    Hello, George, thanks for following up.  I went into Advanced->Document Processing->Document JavaScripts and put the code at the top of the file outside of the dummy function declaration I had to insert to activate the edit button.  It looks like this:
    // Split the file path into an array
    var aPath = this.path.split("/");
    // Remove the last element, which is the file name
    aPath.pop();
    // Join the path elements back together and add the index file
    var cIdxPath = aPath.join("/") + "/Support Documents/SupDocs.pdx";
    search.query ("Enhanced Tactical Automated Security System (eTASS)", "Index", cIdxPath);
    function Search() {}
    It runs fine when I launch the PDF directly into the reader by double-clicking in file explorer in the folder above "Support Documents".  When I load the PDF from an HTML file in the same folder, however, the script does not run.  I set the reader preferences to force loading in the reader even when launched from a browser, and the script still will not execute.  I tried adding a button to the PDF and attaching the search script to it.  Again, it runs when launching directly but not from the HTML document.  Otherwise, the PDF is empty, and I'm using a relative URL and JS to load it from the HTML doc, such as:
    document.location = "mySearch.PDF"
    or
    window.open("mySearch.PDF", "searchwin");
    Neither one works.  This product has to run from a CD-ROM, so I haven't bothered to test it via http service.  Any suggestions?  Thanks again!
    Jon Camp
    Senior Computer Scientist
    Applied Research Associates, Inc.
    North Florida Division
    Training Solutions Group
    430 W 5th Street Suite 700
    Panama City, FL  32401
    comm: 850-914-3188 x203
    fax: 850-914-3189

  • InterMedia indexing and searching of zipped files

    Hello, I have interMedia successfully configured to index and query a repository of files (MS Word, Excel, PPT, PDFs, txt files)which are located on a file system. My issue is with zip files. I cannot successfully index and search zip files. I've tried zips that contain both ascii(text) and formatted files (doc, ppt), but interMedia seems not to recognize this particular MIME type. Is there a way to have interMedia index and search zip files? Thanks in advance for any assistance.

    You will have more luck with this question if you post it in the Oracle Text forum. This forum is for interMedia (image, audio, video).

  • Building index from multiple master and child relationship tables

    Hello,
    My question is:
    Is it possible to create the index for master and child tables?
    If yes, can you please point me out to any links or give me an example.
    I followed the link below to create the index using multiple tables:
    Building index from multiple tables for text search
    I am able to create the index using the above link, but a problem occurred: when I search for a master data column value, it returns many rows with the same master data, one for each child row.
    For example:
    SELECT
    a.conc_program_name,
    a.conc_program_desc,
    b.param_name
    FROM a_master a, b_child b
    WHERE b.report_dtls_id = a.report_id
    AND CONTAINS (a.dummy, 'PAY') > 0
    which returns
    PAY Master A
    PAY Master B
    PAY Master C
    Please let me know whether there is any way to restrict this to a single row with the child data concatenated, like
    PAY Master A B C
    Another question: I have a column value like p_consolidation_set_id; when I use it in CONTAINS (a.dummy, 'p_consolidation_set_id') > 0, I do not get any results.
    Please let me know what I should do about this issue.
    Thanks
    Message was edited by:
    user496798

    There are various ways to concatenate the values. One nice generic solution is to use Tom Kyte's stragg function:
    http://asktom.oracle.com/pls/ask/f?p=4950:8:::::F4950_P8_DISPLAYID:2196162600402
    If p_consolidation_set_id is a variable name, not a value, then do not put quotes around it.
    Message was edited by:
    Barbara Boehmer
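    To make the requested result shape concrete (one row per master record, with the child values joined into one string), here is a small sketch that does the grouping on rows already fetched by the client. It is written in Python purely for illustration; it is not the stragg function and not Oracle code, and the function name and sample rows are assumptions modelled on the example in the question.

```python
def concatenate_children(rows, separator=" "):
    """Collapse (name, description, child_value) rows to one row per master.

    Rows sharing the same (name, description) pair are merged, and their
    child values are joined in the order they were received.
    """
    grouped = {}
    for name, description, child_value in rows:
        grouped.setdefault((name, description), []).append(child_value)
    return [
        (name, description, separator.join(children))
        for (name, description), children in grouped.items()
    ]


# Rows shaped like the query output in the question: one row per child.
rows = [
    ("PAY", "Master", "A"),
    ("PAY", "Master", "B"),
    ("PAY", "Master", "C"),
]
combined = concatenate_children(rows)
```

    Doing the aggregation in the database, as suggested above, avoids transferring the duplicated master columns at all; the client-side version is only a fallback when the query cannot be changed.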

  • With the new update: I click on .pdf to download and multiple new tabs open w/o downloading .pdf file

    Every time I click to download a .pdf file for a study on the following site
    http://www.regionschristiancenter.org/the-rabbis-son/rabbis-son-study-archives/archives-dvarim-deuteronomy/
    Firefox opens multiple new tabs and does not stop doing so until I close the browser and even then it opens up on its own and continues to open multiple new tabs. If I do not get to it quickly enough, I may have to do this more than twice.
    I have tried downloading a study on Internet Explorer and do not have this issue. There is also a Microsoft Word option that works on Firefox but it's the .pdf that is the problem.
    I have not attempted to download any other .pdf files from other sites so I do not know if it is a problem with all .pdf files. But as stated above, it is not an issue with .pdf on IE.

    Could you check your PDF viewer preference and try changing it to a different viewer to see whether that helps? This article describes how to access that setting: [[How to disable the built-in PDF viewer and use another viewer]].
    Unfortunately, the settings file that stores those application handling preferences sometimes contains crossed up settings or settings which Firefox cannot actually implement. In that case, you generally need to rename or remove that file and let Firefox rebuild it. The steps for that are in this article: [[Firefox repeatedly opens empty tabs or windows after you click on a link]] -- skip to the section "Reset actions for all content types".

  • Spotlight searching no longer working - indexing and search disabled.

    I've been searching the web and tried everything:
    Server 10.5.8
    In Server Admin - the attached drive is a SharePoint with Spotlight search on.
    I've used mdutil to enable Spotlight.
    I've checked permissions.
    I can search the Boot Drive. I can't search the attached drive.
    mdutil returns indexing and search disabled when used to turn it on.
    very frustrating.
    Anyone out there have a clue?
    Thanks,
    Mark

    HI James,
    Open System Preferences/Spotlight and click the Privacy tab. Delete any locations listed there, quit System Preferences, restart your Mac, and see if you can use Spotlight.
    Spotlight Tips
    Spotlight: How to re-index folders or volumes
    Carolyn

  • Hello.  May I ask:  I am using Acrobat 6.0.2, and having trouble formatting a PDF created from multiple (jpeg) files.  Each page is formatted to A4 size, portrait.  But when I create the PDF, each sheet appears as tiny, in the middle of a huge white page.

    Hello.  May I ask:  I am using Acrobat 6.0.2, and having trouble formatting a PDF created from multiple (jpeg) files.  Each page is formatted to A4 size, portrait.  But when I create the PDF, each sheet appears as tiny, in the middle of a huge white page.  I cannot seem to find any controls to adjust this.  Any advice appreciated.

    Thanks CtDave, for the further info.  Unfortunately, those suggestions are not working, which is strange.
    ....Until last week, I used to make multi-page PDFs straight from Photoshop: File > Automate > Make PDF.  One simply chooses the files, orders them, and creates a PDF.  (Resolution doesn't matter; 300 DPI is no problem.)  However, my new Photoshop CS5 does not have the option in Automate.
    What I've done, seeing as Acrobat is letting me down, is use Bridge (new to me) to create the PDF.  It worked without a problem, but seems like more work than the old PS method.  I will download a newer version of Acrobat to see if that makes any difference.
    .....Also, thanks Test Screen Name:  I agree with you that jpeg is irrelevant, and that one can make a PDF from Photoshop (Print > Save As - pdf).  But that only works for SINGLE PAGE pdf, not multiple pages as far as I can tell.
    Kind regards,
    Prince Nuada
