Search pdf document

Hi!
I'm new to this, and wanted to ask:
Is there a way to use VBA or JS to search for text in a PDF?
Preferably, I would get a result without opening the document in the reader unless there is a valid match.
Thanks!

What you are essentially looking for is an indexer for a search engine. You might look around for small server search engines and configure one so that it indexes the directories you need.
If you happen to be on OS X, you might look at controlling Spotlight (via AppleScript), but that would not really work from within a PDF.
If it has to be from a PDF, you would first have to run Acrobat's Catalog feature, which creates a corresponding index, and then you can search through that index.
If you have a specific set of search terms, you could also set the metadata of the PDF accordingly and then parse just the metadata, which means that you would not have to open the PDF in a PDF viewer.
Hope this can help.
Max Wyss.
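
The metadata idea above can be sketched roughly as follows. This is Python rather than VBA or JS, purely for illustration. It assumes the PDF carries an uncompressed XMP packet and that the search terms were stored as `dc:subject` entries or in `pdf:Keywords`; that is common but not guaranteed, and the helper names `find_xmp_keywords` and `pdf_has_keyword` are made up for this sketch.

```python
import re
import xml.etree.ElementTree as ET

# Namespace URIs used inside XMP packets (Clark notation for ElementTree).
DC = "{http://purl.org/dc/elements/1.1/}"
RDF = "{http://www.w3.org/1999/02/22-rdf-syntax-ns#}"
PDF = "{http://ns.adobe.com/pdf/1.3/}"

# An XMP packet is XML wrapped in <x:xmpmeta> ... </x:xmpmeta>.
XMP_RE = re.compile(rb"<x:xmpmeta\b.*?</x:xmpmeta>", re.DOTALL)


def find_xmp_keywords(pdf_bytes):
    """Return the lower-cased keywords from the first XMP packet, if any."""
    match = XMP_RE.search(pdf_bytes)
    if match is None:
        return set()
    try:
        root = ET.fromstring(match.group(0))
    except ET.ParseError:
        return set()  # packet is damaged or not plain XML

    raw = []
    # dc:subject holds one keyword per rdf:li element.
    for subject in root.iter(DC + "subject"):
        raw.extend(li.text or "" for li in subject.iter(RDF + "li"))
    # pdf:Keywords is a single delimited string, stored either as an
    # element or as an attribute of rdf:Description.
    for element in root.iter(PDF + "Keywords"):
        raw.extend(re.split(r"[;,]", element.text or ""))
    for description in root.iter(RDF + "Description"):
        raw.extend(re.split(r"[;,]", description.get(PDF + "Keywords", "")))

    return {word.strip().lower() for word in raw if word.strip()}


def pdf_has_keyword(path, term):
    """Check one file on disk without launching a PDF viewer."""
    with open(path, "rb") as handle:
        return term.lower() in find_xmp_keywords(handle.read())
```

Only files for which `pdf_has_keyword` returns `True` would then need to be opened in the reader. If the metadata is compressed or missing, the function simply reports no keywords, so a real text index is still the more reliable route.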

Similar Messages

  • Problems searching PDF documents

    I am using Adobe Reader XI on Windows 7. Whenever I open a PDF and press Ctrl+F to search its contents, I get two or three results before a blue box comes up. When this blue box comes up, I can't run any more searches. I've tried a repair and I have also uninstalled and reinstalled Adobe Reader, but this hasn't resolved the issue. On some PDF documents, I can still search the document contents even though the blue box has come up, but on others I can't run any additional searches once it appears.

    I've uploaded to Google Drive two PDF files in which I have seen the box appear. I have Adobe Reader XI installed on my home computer, but have seen this same blue box appear when using older versions of Adobe Reader.
    https://docs.google.com/file/d/0B0NUQF7Y-KVjWjgyeElHYlRrXzA/edit?usp=sharing
    https://docs.google.com/file/d/0B0NUQF7Y-KVjeUpTc1BiOTk4XzA/edit?usp=sharing

  • Search engines, .pdf documents

    Can (or do) search engines search .pdf documents that are internally linked to and part of a specific website?

    Hi
    While most leading search engines can now read and index the content of a PDF, they have certain restrictions and may only index the first N hundred or thousand characters. Further, the file size of a PDF document frequently exceeds 100K and may take a long time to download.
    So if you use PDFs, keep the documents short.
    PZ

  • Can I select a rectangle in pdf document, identify the bounds, and then retrieve the text with vba?

    I am using MS Access 2010 VBA, Acrobat X, and AcroExch. I can manipulate and search PDF documents without difficulty. I would like to programmatically capture the text in a rectangle that I have drawn on the document with the mouse. It appears that if I can obtain the bounding rectangle, I could use AcroExch.PDTextSelect to retrieve the text. Is there a way to retrieve the coordinates of a mouse-drawn rectangle in a PDF document? In other words, the equivalent of select, copy, and later paste the text into another document.
    I have reviewed posts from the Adobe forum that indicate that obtaining the coordinates of a user-drawn selection is not available through automation with AcroExch, but that the function would be available through a plug-in. Unless there is information to the contrary, I will look into using menu commands in Acrobat.

    hi Bruce,
    1. It sounds like you need to set the starting version number. Since your revision number increments in whole numbers, it would match up with SharePoint's once the starting version number is set. You can potentially create a new custom field in the library to manually track the version of the uploaded PDF document, but this might not match up with SharePoint's own version number and could get confusing. Another possibility is to upload dummy versions of the PDF document until the SharePoint version matches the revision version, and then delete these dummy versions.
    2. When you upload the PDF document again into the library, it should prompt you to see if you want to replace the existing. If you proceed with the upload, it should replace and increment the SharePoint version number.
    Please Mark Answered if my reply solves your problem. Thanks!
    Jeff Thai
    Technical Solutions Architect, AvePoint
    http://www.AvePoint.com

  • A hyperlink click to a pdf document in a website does not open a pdf document instead presents a blank page "searching for bookmarks"

    a hyperlink click to a pdf document in a website does not open a pdf document instead presents a blank page "searching for bookmarks"

    Check the settings as shown in the [[opening PDF files within Firefox]] article.

  • Unable to search for words in a pdf document

    I am using Adobe Reader Ver 10.1.1 (All updates)
    I have a pdf document in which I am unable to search for words. No words in the document are found even though they do exist. If I copy a word from within the document and paste it into the search criteria then instead of getting the word that I copied, I get substitute characters and it can in fact find these characters as the word being searched.
    The document is a catalog and can be downloaded from the following URL.
    http://www.carbatec.com.au/getcatalogue?zenid=d1cuvbat0ois0g37r0r33vnah1
    I will appreciate any help as to why I cannot search for words in the document.

    Thank you Dave Merchant and try67 for your responses. As per my previous post, I contacted the company re the catalogue and they have responded favourably. I'll include their response because it gives the reason for the search failure as document compression which you might find interesting. I'll await their new catalogue and see if they have fixed the problem.
    Company response:
    Thank you very much for your input.  And yes, you are correct, the compression we used for the current catalogue's PDF format does strip out text included in the catalogue.  We used the compression settings we did with the intention of minimising download time, however I take your point about including text for search purposes (which I also utilise when I'm scanning through PDFs).
    We will actually be posting out our new catalogue next week and we'll release the new PDF version on the website at the same time.  I've asked our graphic designer to ensure that the PDF we use for the new catalogue includes searchable text.

  • Using Oracle Text to search through WORD, EXCEL and PDF documents

    Hello again,
    What I would like to know is if I have a WORD or PDF document stored in a table. Is it possible to use Oracle Text to search through the actual WORD or PDF document?
    Thanks
    Doug

    Yes, you can do context-sensitive searches on both PDF and Word docs. With PDFs you need to make sure they contain text and not just images. Some scanners will create PDFs that are nothing more than images of the document.
    Below is a code sample that I made some time back to demonstrate the searching capabilities of Oracle Text. Note that the example makes use of the inso_filter, which is no longer shipped with Oracle beginning with patch set 10.1.0.4. See Metalink note 298017.1 for the changes. See the following link for more information on developing with Oracle Text.
    http://download-west.oracle.com/docs/cd/B14117_01/text.101/b10729/toc.htm
    begin example.
    -- The following needs to be executed
    -- as sys.
    DROP DIRECTORY docs_dir;
    CREATE OR REPLACE DIRECTORY docs_dir
    AS 'C:\sql\oracle_text\documents';
    GRANT READ ON DIRECTORY docs_dir TO text;
    -- End sys ran SQL
    DROP TABLE db_docs CASCADE CONSTRAINTS PURGE;
    CREATE TABLE db_docs (
    id NUMBER,
    format VARCHAR2(10),
    location VARCHAR2(50),
    document BLOB,
    CONSTRAINT i_db_docs_p PRIMARY KEY(id)
    );
    -- Several notes need to be made about this anonymous block.
    -- First the 'DOCS_DIR' parameter is a directory object name.
    -- This directory object name must be in upper case.
    DECLARE
    f_lob BFILE;
    b_lob BLOB;
    document_name VARCHAR2(50);
    BEGIN
    document_name := 'externaltables.doc';
    -- Insert an empty BLOB and get its locator back so it can be filled.
    INSERT INTO db_docs
    VALUES (1, 'binary', 'C:\sql\oracle_text\documents\externaltables.doc', empty_blob())
    RETURNING document INTO b_lob;
    -- Copy the file from the directory object into the BLOB.
    f_lob := BFILENAME('DOCS_DIR', document_name);
    DBMS_LOB.FILEOPEN(f_lob, DBMS_LOB.FILE_READONLY);
    DBMS_LOB.LOADFROMFILE(b_lob, f_lob, DBMS_LOB.GETLENGTH(f_lob));
    DBMS_LOB.FILECLOSE(f_lob);
    COMMIT;
    END;
    /
    -- build the index
    -- Note that this index differs from one on files stored in the
    -- file system in that the datastore parameter is
    -- ctxsys.default_datastore and not ctxsys.file_datastore.
    -- FILE_DATASTORE is for documents that exist on the file system.
    -- DEFAULT_DATASTORE is for documents that are stored in the column.
    create index db_docs_ctx on db_docs(document)
    indextype is ctxsys.context
    parameters (
    'datastore ctxsys.default_datastore
    filter ctxsys.inso_filter
    format column format');
    --search for something that is known to not be in the document.
    SELECT SCORE(1), id, location
    FROM db_docs
    WHERE CONTAINS(document, 'Jenkinson', 1) > 0;
    --search for something that is known to be in the document.  
    SELECT SCORE(1), id, location
    FROM db_docs
    WHERE CONTAINS(document, 'Albright', 1) > 0;

  • How 2 search inside a PDF document using the firefox internal PDF reader

    Cannot find any way to do a text search inside a PDF document when it is displayed by the builtin PDF player in Firefox.
    If there is a way, how is it done?
    If there isn't a way, it seems an obvious enhancement.

    son of a gun. I've always searched in firefox by hitting slash and typing the search term, but that doesn't seem to work in the pdf reader. I guess, if I had searched for key bindings I coulda figured that out for myself.
    thanks!

  • Searching a pdf document in order to find text parts formatted with a specific font

    I have a document that I output as a pdf.
    Between the first and last version of this document, I've made some changes, one of those being replacing a font by another one.
    Alas, in the final pdf document, under "document properties", "fonts tab", I can still see the first draft font (when it should not be present anymore in my document). It may mean that there's some bit of text that I have not correctly updated.
    So I want to identify those chunks of text that are still formatted using the first draft font. I know that it should be done in my main app but, for some (good) reason, I prefer to do this search in the pdf output itself.
    So, how could I do this search within Acrobat (I'm using Acrobat XI Pro) ?
    Thanks

    Possible yes.. a bit klunky.. also yes.
    Open the document properties > fonts dialog and write down the PostScript name of the font you're looking for
    Open Preflight from the Print Production panel
    Select Single Checks (eyeglass symbol)
    Select Options > Create New Preflight Check
    Call it 'find my font', and in the 'find' box at the top right type "base font". The task will appear in the window below, click Add.
    Under 'begins with' enter your PostScript name, then save the Check
    Back on the Preflight dialog select the Check and press Analyze.
    Double-click the line starting with a red X and in the document itself you will see each instance of that font indicated by red crop marks.

  • Annotating pdf documents in Preview - the text search feature doesn't work?

    Hi everyone,
    I'd like to annotate my pdf documents using preview. However, I realized the other day that as soon as I start annotating a document (with notes, shapes, highlighting, etc), the preview text search doesn't work anymore.
    Thanks for any help or advice!

    This is a known problem. I have seen one suggestion to print the annotated PDF as a new PDF in order to get search capability back. You can try the Apple Discussions "search" feature at the upper right corner to look for more suggestions.

  • Pdf documents search

    I have documents that are scanned and stored in PDF format.
    Most of them are forms.
    I am using the Verity tool to index and search the documents.
    My search was working fine for other regular PDF documents.
    But while testing I noticed that the search fails for contents within scanned PDF documents.
    Please help!
    Thanks

    "But while testing I noticed that the search fails for contents within scanned pdf documents."
    Are the documents just scanned as images, or are they run through an OCR process to turn the images into text?
    If the former, there is nothing for Verity to search; it cannot search an image. If the latter, it should be working.
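
One rough way to tell the two cases apart before indexing is to look for font resources, since a page that holds only a scanned image normally references no fonts. Below is a minimal Python sketch of that check. It assumes the file is not encrypted and that its streams are either uncompressed or Flate-compressed. `has_font_resources` is a hypothetical helper name, and the result is a heuristic, not proof that the text is actually searchable.

```python
import re
import zlib

# Raw stream bodies sit between the "stream" and "endstream" keywords.
STREAM_RE = re.compile(rb"stream\r?\n(.*?)endstream", re.DOTALL)


def has_font_resources(pdf_bytes):
    """Return True if a /Font entry appears anywhere in the file."""
    # Uncompressed objects can be checked directly.
    if b"/Font" in pdf_bytes:
        return True
    # Newer PDFs may pack their dictionaries into compressed object
    # streams, so try to inflate every stream and look again.
    for match in STREAM_RE.finditer(pdf_bytes):
        try:
            # decompressobj tolerates the end-of-line bytes that
            # follow the compressed data inside the stream.
            data = zlib.decompressobj().decompress(match.group(1))
        except zlib.error:
            continue  # not Flate data (for example a JPEG image)
        if b"/Font" in data:
            return True
    return False
```

A file for which this returns `False` is most likely image-only and would need an OCR pass before any indexer can find text in it.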

  • How to search special character in searchable PDF Document???

    Hi All,
    Could someone help me out with how to search for a special character in a PDF document? I have attached a screenshot for your reference.
    -Rgds,
    Gnanasekaran

    OK, I found it myself:
    1. Tools - Pages - Edit Page Design - Header & Footer - Add Header & Footer.
    2. Select the font and size, etc, place the cursor on the appropriate site to insert the page number, click the "Insert Page Number" button, and click OK.
    That is!

  • How can I allow visitors to my website to search within pdf documents created with Acrobat?

    I honestly had no idea where to place this question. If I should move it to another forum area please just let me know. I have about 200-300 pdf documents on my website and people are able to search by the title of the file but not for items in the body of the document. Is there a way that is possible?

    Are the documents encrypted? If they are not, the common search engines (Google, etc) will normally index all the text in a PDF, the same as if it was a regular Web page. They don't index non-text media within a PDF (images, attachments, etc.) nor will they handle PDF Portfolios.

  • HIGHLIGHTING WORDS AND DICTIONARIES SUPPORT - Search and highlight words in the PDF documents

    One of the things that Adobe people don't understand very well is that they focus constantly on adding new "cool" features to the product, such as more Flash support, etc.
    But Adobe has yet to make life easier for people who work with tons, literally tons, of PDF documents.
    Students, researchers, law professors, academics... they all need a solution to search and highlight in different colors, for certain words in a lot of documents.
    So here we have a challenge for the Adobe folks.
    Let's imagine you are working in a Law Office, ok? (you're lawyer, not programmer)
    You have a trial tomorrow...
    The trial is about robbery (for example)
    Now you are looking for Jurisprudence (other similar cases that were judged before)...
    And let's imagine you have a folder with 200 cases in 200 PDF files, talking about robbery, between the years 2005-2009
    Now, let's imagine you are interested to search the words:
    CONVICTION
    KIDNAPPING
    ASSAULT
    FAILURE
    CRIME
    SUBPOENA
    How do you do that Adobe folks?
    Reading the 200 documents? one by one, having to drink all the ink of the documents, line by line...
    Or could you, please, let Adobe Acrobat 10 handle dictionaries of words, and allow users to search for and highlight those words across a folder of PDF documents?
    Of course, you'll tell me: oh, you can run a search for the desired words and Acrobat will search for them in all the documents in a folder...
    And I'll reply: oh, that is not enough!
    And do you know why?
    Simply because I need to see all the information highlighted in context...
    If I see the word CRIME in red close to FAILURE... also in red I can see that something wrong is happening with that trial, for example...
    Do you understand now that searching single words in a lot of documents is not enough, and you have to improve URGENTLY this feature?
    Highlighting the words the users need to search in a folder of documents, allow INTENSIVE RESEARCH AND SAVE HOURS AND DAYS AND EVEN MONTHS OF HARD WORK TO THE USERS.
    In that way, you don't have to read the whole document, you just go directly to the highlighted parts.
    I've been submitting this feature since Adobe Acrobat 6!!!! No one in Adobe listened to me!!!
    I sincerely don't know why these forums are open; it must be an idea of someone from marketing, because in the end Adobe doesn't implement any request from the users.
    I am absolutely sure Adobe folks will present version 10 as something pretty cool, with more Flash support and more graphic stuff.
    But, as always, without helping the real people who work extensively with large amounts of documents.
    So I hope, Sirs and Madams of Adobe, that this time you will support this feature once and for all!
    Thanks

    Adobe does listen to users, but it listens to 10 users more frequently than 1 user. It listens to 1000 users more frequently than 10. Get everyone you know that works for lawyers to post the request in the Adobe wish form (where Adobe tabulates user requests).
    https://www.adobe.com/cfusion/mmform/index.cfm?name=wishform
    Then maybe it will appear in Acrobat 11. If it is not in Acrobat 10 by now it is probably too late in the upgrade cycle for it to get in.
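
The request above (a word list searched across many documents, with each hit shown in context) can be approximated outside Acrobat. Here is a minimal Python sketch, assuming the text of each PDF has already been extracted to a plain string by some other tool. `mark_keywords` is a hypothetical helper, and the `[[ ]]` markers stand in for colour highlighting.

```python
import re


def mark_keywords(text, words, width=30):
    """Return (KEYWORD, snippet) pairs, one per hit, in document order.

    Each snippet holds up to `width` characters on either side of the
    hit, with the hit itself wrapped in [[ ]].
    """
    if not words:
        return []
    # One case-insensitive pattern that matches any listed whole word.
    alternatives = "|".join(re.escape(word) for word in words)
    pattern = re.compile(r"\b(" + alternatives + r")\b", re.IGNORECASE)

    hits = []
    for match in pattern.finditer(text):
        start = max(0, match.start() - width)
        end = min(len(text), match.end() + width)
        snippet = (
            text[start:match.start()]
            + "[[" + match.group(0) + "]]"
            + text[match.end():end]
        )
        hits.append((match.group(0).upper(), snippet))
    return hits
```

Running this over every extracted document with a list such as CONVICTION, FAILURE and CRIME gives a per-file list of hits, so hits that fall close together (for example CRIME near FAILURE) can be spotted without reading the whole file.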

  • How can I search a PDF document for annotations/comments

    I have a long (50+ page) PDF document with many annotations and comments. How can I search this document to find each comment?

    Locating and Searching Comment annotations.
    From the open Comments List you can sort and filter comments.
    This is discussed in Acrobat X Pro's online Help:
    http://help.adobe.com/en_US/acrobat/pro/using/WS58a04a822e3e50102bd615109794195ff-7e42.w.html
    A sorted / filtered list can be used to generate a PDF comment summary report.
    http://help.adobe.com/en_US/acrobat/pro/using/WS75E00763-F15A-43a1-85B7-51B920B1181A.w.html
    You could then use Find on the output summary report.
    Using Acrobat X Pro you can embed an index in the PDF.
    http://help.adobe.com/en_US/acrobat/pro/using/WSC28D4DBB-6A78-4027-9E04-F50FE411CFB9.w.html
    Then use the Search tool. With the Search pane open tick "Include Comments".
    Search returns have an icon at the left of each instance.
    This portion of Acrobat X Pro's online Help shows and describes these.
    The third up from the list's bottom is the Comments icon.
    http://help.adobe.com/en_US/acrobat/pro/using/WSC28D4DBB-6A78-4027-9E04-F50FE411CFB9.w.html
    One of these, or a combination, may meet your needs.
    Be well...
