Search engines, .pdf documents

Can (or do) search engines search .pdf documents that are internally linked to and part of a specific website?

Hi
While most leading search engines can now read and index the content of a PDF, they have certain restrictions and may only index the first N hundred or thousand characters. Further, the file size of a PDF document frequently exceeds 100K and may take a long time to download.
So if you use PDFs, keep the documents short.
PZ

Similar Messages

  • How can I allow visitors to my website to search within pdf documents created with Acrobat?

    I honestly had no idea where to post this question. If I should move it to another forum area, please just let me know. I have about 200-300 PDF documents on my website, and people are able to search by the file title but not for items in the body of the document. Is there a way to make that possible?

    Are the documents encrypted? If they are not, the common search engines (Google, etc.) will normally index all the text in a PDF, the same as if it were a regular Web page. They don't index non-text media within a PDF (images, attachments, etc.), nor will they handle PDF Portfolios.
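
A rough way to check the first condition yourself: an encrypted PDF records an /Encrypt entry in its trailer (or cross-reference stream) dictionary, and that dictionary is not itself encrypted, so the entry is visible in the raw bytes. The Python sketch below is only a byte-level heuristic built on that assumption, not a PDF parser; the function names are hypothetical, and whether a particular search engine indexes a permissions-only file (no open password) is up to that engine.

```python
import re

# Heuristic, not a PDF parser: matches "/Encrypt 12 0 R" (indirect reference)
# or "/Encrypt <<" (direct dictionary), as found in a trailer or xref-stream dict.
# Requiring a digit or "<<" afterwards avoids matching keys like /EncryptMetadata.
ENCRYPT_ENTRY = re.compile(rb"/Encrypt\s*(?:\d+\s+\d+\s+R|<<)")


def looks_encrypted(pdf_bytes: bytes) -> bool:
    """Return True if the raw PDF bytes contain an /Encrypt entry."""
    return ENCRYPT_ENTRY.search(pdf_bytes) is not None


def file_looks_encrypted(path: str) -> bool:
    """Hypothetical helper: read a PDF from disk and apply the same heuristic."""
    with open(path, "rb") as fh:
        return looks_encrypted(fh.read())


if __name__ == "__main__":
    plain = b"trailer\n<< /Size 5 /Root 1 0 R >>"
    locked = b"trailer\n<< /Size 5 /Root 1 0 R /Encrypt 4 0 R >>"
    print(looks_encrypted(plain), looks_encrypted(locked))  # False True
```

Running file_looks_encrypted over the 200-300 files on the site would give a quick list of candidates to re-save without encryption.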

  • Searching a pdf document in order to find text parts formatted with a specific font

    I have a document that I output as a pdf.
    Between the first and last version of this document, I made some changes, one of which was replacing one font with another.
    Alas, in the final PDF, under Document Properties > Fonts tab, I can still see the first-draft font (when it should no longer be present in my document). This may mean that there is some bit of text that I have not correctly updated.
    So I want to identify the chunks of text that are still formatted with the first-draft font. I know this should be done in my main app, but for some (good) reason I prefer to do this search in the PDF output itself.
    So, how could I do this search within Acrobat (I'm using Acrobat XI Pro)?
    Thanks

    Possible? Yes. A bit clunky? Also yes.
    Open the Document Properties > Fonts dialog and write down the PostScript name of the font you're looking for.
    Open Preflight from the Print Production panel.
    Select Single Checks (eyeglass symbol).
    Select Options > Create New Preflight Check.
    Call it 'find my font', and in the 'find' box at the top right type "base font". The task will appear in the window below; click Add.
    Under 'begins with' enter your PostScript name, then save the Check
    Back on the Preflight dialog select the Check and press Analyze.
    Double-click the line starting with a red X and in the document itself you will see each instance of that font indicated by red crop marks.
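
If you first want a quick check, outside Acrobat, of whether the old font is still referenced at all: embedded subset fonts carry a tag of six uppercase letters plus "+" in front of the PostScript name (for example ABCDEF+Helvetica), so the comparison should be made on the bare name. The Python sketch below assumes the font dictionaries are not inside compressed object streams (files from many recent tools use them, and then it finds nothing). It only lists font names; it does not locate the text that uses them, which is what the Preflight check above does. All names are hypothetical.

```python
import re

# Subset fonts are named TAG+PostScriptName, where TAG is six uppercase letters.
SUBSET_TAG = re.compile(r"^[A-Z]{6}\+")
# A /BaseFont entry followed by a PDF name (ends at whitespace or a delimiter).
BASE_FONT = re.compile(rb"/BaseFont\s*/([^\s/\[\]<>(){}%]+)")


def base_font_names(pdf_bytes: bytes) -> list:
    """List /BaseFont names found in the uncompressed parts of the file."""
    return [m.group(1).decode("latin-1") for m in BASE_FONT.finditer(pdf_bytes)]


def matches_font(name: str, postscript_name: str) -> bool:
    """'Begins with' comparison after dropping any subset tag."""
    return SUBSET_TAG.sub("", name).startswith(postscript_name)


def old_font_present(pdf_bytes: bytes, postscript_name: str) -> bool:
    """True if any /BaseFont entry matches the PostScript name you wrote down."""
    return any(matches_font(n, postscript_name) for n in base_font_names(pdf_bytes))


if __name__ == "__main__":
    sample = b"<< /Type /Font /BaseFont /ABCDEF+Helvetica-Bold >> << /BaseFont /Times-Roman >>"
    print(base_font_names(sample))  # ['ABCDEF+Helvetica-Bold', 'Times-Roman']
```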

  • How can I search a PDF document for annotations/comments

    I have a long 50+ page PDF document with many annotations and comments. How can I search this document to find each comment?

    Locating and Searching Comment annotations.
    From the open Comments List you can sort and filter comments.
    This is discussed in Acrobat X Pro's online Help:
    http://help.adobe.com/en_US/acrobat/pro/using/WS58a04a822e3e50102bd615109794195ff-7e42.w.html
    A sorted / filtered list can be used to generate a PDF comment summary report.
    http://help.adobe.com/en_US/acrobat/pro/using/WS75E00763-F15A-43a1-85B7-51B920B1181A.w.html
    You could then use Find on the output summary report.
    Using Acrobat X Pro you can embed an index in the PDF.
    http://help.adobe.com/en_US/acrobat/pro/using/WSC28D4DBB-6A78-4027-9E04-F50FE411CFB9.w.html
    Then use the Search tool. With the Search pane open tick "Include Comments".
    Search returns have an icon at the left of each instance.
    This portion of Acrobat X Pro's online Help shows and describes these.
    The third up from the list's bottom is the Comments icon.
    http://help.adobe.com/en_US/acrobat/pro/using/WSC28D4DBB-6A78-4027-9E04-F50FE411CFB9.w.html
    One of these, or a combination, may meet your needs.
    Be well...

  • Searching single .pdf document

    I have a single .pdf document comprised of some 2500 files
    created with Acrobat Pro 8. Almost all the files contain a
    "received date" in the form of rmm/dd/yy. If I use the search
    window to search for "r04/22/02" I get an accurate list of all
    files containing a received date of April 22, '02. If I search for
    /02, I get an accurate list of all files containing /02, but not
    necessarily as a date prefixed with "r."
    What I'd like to do is a search such as "r??/??/02" to
    identify all files containing a received date in '02. But searching
    with Acrobat Pro 8 yields "no instances" even if those instances
    exist.
    After looking at this for a while, it seemed apparent
    (strange as it sounds) that Acrobat Pro 8 doesn't have advanced
    search options. Adobe Reader 8.1.3, however, seems to have such
    options (although wildcard searches are not explicitly listed).
    But if I open the .pdf document with Reader 8.1.3, the
    wildcard search shown above also yields "no instances." I'm left
    wondering if wildcard searches are even possible. Any advice or
    additional information would be appreciated.

    Hello osimp,
    Thank you for your post about searching in Acrobat 8.
    You should know that these forums are specific to the
    Acrobat.com website and its set of hosted services, and do
    not cover troubleshooting the Acrobat family of desktop products.
    Any questions related to the Acrobat family of desktop
    products would be better suited to the Acrobat Forums:
    http://www.adobeforums.com/cgi-bin/webx/.3bbeda8b/
    Have a great day!
    Pete
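
Whatever Acrobat's own search supports, the "r??/??/02" idea maps directly onto a regular expression if the text can be exported from the PDF to plain text (an assumption; extraction quality varies from file to file). In the Python sketch below each "?" becomes \d, meaning one digit; the function name is hypothetical.

```python
import re

# "r??/??/02" with each "?" read as "one digit": r, two digits, /, two digits, /02.
# \b on both ends keeps it from matching inside longer tokens such as "r04/22/021".
RECEIVED_IN_02 = re.compile(r"\br\d{2}/\d{2}/02\b")


def received_dates_2002(text: str) -> list:
    """Return every 'received' date stamp from 2002 found in the text."""
    return RECEIVED_IN_02.findall(text)


if __name__ == "__main__":
    sample = "Doc A r04/22/02 filed; Doc B r11/03/01; Doc C r12/31/02."
    print(received_dates_2002(sample))  # ['r04/22/02', 'r12/31/02']
```

Running this per exported file and keeping the files with a non-empty result gives the list the poster was after.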

  • Can I search in pdf document on adobe reader under android ?

    The title is self-explanatory

    Hi,
    I couldn't find the device you mentioned among my test devices, but here are the steps I followed on another device:
    1. Downloaded the file from the link you mentioned to my PC, attached it to an email and sent it to myself.
    2. Opened the email on my Android device.
    3. When asked to open the file in the viewer, chose 'Adobe Reader' (11.5.0.1 build 98311).
    4. Searched for the word 'the' in the PDF (by selecting the search icon, which looks like a magnifying glass, in the top toolbar).
    5. I get a screen like the one attached.
    6. If I tap on a particular snippet (the small portions of the text on different pages), I am taken to that particular page.
    7. To see all the results again, I can tap on the search icon again and the same results will show up.
    Please let me know if you don't observe the same behavior.
    Thanks.

  • CF perform word search on PDF files?

    Can CF MX (6.1 or 7) perform a word search of PDF documents?
    What I would like to do, at the minimum, is have CF search
    PDF files located in a directory for a specific word, and return a
    list of files that have that word (or phrase) in them.
    Am I asking too much?
    Thanks for any and all help.
    Russ

    Yes. Use the Verity search engine that comes with
    ColdFusion.
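
In ColdFusion, Verity is used through tags such as cfcollection, cfindex and cfsearch. To show what such an engine does for the "return a list of files that contain this word or phrase" requirement, here is a minimal inverted-index sketch in Python. It assumes the PDF text has already been extracted to plain strings (Verity handles that step itself; this sketch does not), and the function names are hypothetical.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list:
    """Lower-case words made of ASCII letters and digits."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs: dict) -> dict:
    """Map each word to the set of file names whose text contains it.

    docs: file name -> already-extracted plain text (extraction not shown).
    """
    index = defaultdict(set)
    for name, text in docs.items():
        for word in tokenize(text):
            index[word].add(name)
    return index


def files_containing(index: dict, docs: dict, query: str) -> list:
    """Files containing every query word, then filtered to the exact phrase."""
    words = tokenize(query)
    if not words:
        return []
    # Intersect the posting sets: a file must contain every word.
    candidates = set.intersection(*(index.get(w, set()) for w in words))
    # Confirm the words appear consecutively (phrase match on word boundaries).
    phrase = " " + " ".join(words) + " "
    return sorted(n for n in candidates
                  if phrase in " " + " ".join(tokenize(docs[n])) + " ")
```

The index is built once (for example on a schedule) and each query only touches the posting sets for its own words, which is why this scales better than re-reading every file per search.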

  • Searching in DB files - Any search engine available ??

    HELP:
    I have several TIFF images that I need to make available on the web as
    searchable content.
    Option 1:
    I am planning on OCRing the images to text files. The file name
    will be a unique ID. A DB table will contain metadata information
    about the file. A search engine (??) can index these files, and the
    results contain a summary from the OCR text. The resulting headline
    will contain a link to a JSP, passing the unique ID as an argument,
    and the JSP will read the TIFF image for display. That way I can track
    who is viewing the image and how many times.
    Does anyone know if there is a free search engine that can do this?
    I am looking into Lucene, but I think it does not return summaries, and
    I am not sure if I can manipulate headlines.
    Option 2: I can store the images and text in a DB (Oracle). Is there
    a search engine that can index these OCRed text files in the DB, where
    the result would be a heading (from the DB metadata) linked to the JSP
    with the doc ID?
    Example:
    This is test <- from meta data
    <p>This document contain result of your search... <- from ocr text

    Lucene will allow you to do what you need -- it is content-agnostic. It has a concept of Document objects, which can contain one or more Field objects. You could have several Fields, like this:
    Body -> contains the text of the file
    Title -> contains the title
    URL -> a web-friendly URL to retrieve the image?
    PreviewURL -> a web-friendly URL to retrieve a preview?
    Summary -> Short document summary
    ... and so on...
    You can structure the Document object any way you want... Fields can be marked as stored, unstored (but still indexed...), etc.
    You will need to write a couple of pieces of code to implement a lucene-based search engine:
    1) Document indexer -- this is just a class that spiders through all your documents and adds them to the index. For your case, it would loop through the database and generate an index entry for each part. This should probably be on a timer to keep the index up-to-date.
    2) Search UI -- this just presents the user with some way to search the documents.
    Anyway, the devil's in the details, and this is no small project. Hope this gives you a good start.
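
As a toy illustration of the stored versus indexed distinction (plain Python, not the Lucene API), the sketch below models Documents made of Fields along the lines listed above: the OCR text is indexed but not stored, while title, URL and summary are stored so they can be shown in the result headline. The /view.jsp URL pattern, the 60-character summary and all names are assumptions for illustration.

```python
import re
from dataclasses import dataclass, field as dc_field


@dataclass
class Field:
    name: str
    value: str
    stored: bool   # returned with each search hit
    indexed: bool  # its words can be searched


@dataclass
class Document:
    fields: list = dc_field(default_factory=list)


def make_doc(doc_id: int, title: str, ocr_text: str) -> Document:
    """Hypothetical layout: title from DB metadata, body from OCR text."""
    return Document([
        Field("Title", title, stored=True, indexed=True),
        Field("URL", f"/view.jsp?id={doc_id}", stored=True, indexed=False),
        Field("Summary", ocr_text[:60], stored=True, indexed=False),
        Field("Body", ocr_text, stored=False, indexed=True),
    ])


def words(text: str) -> set:
    """Lower-cased word tokens of a field value."""
    return set(re.findall(r"\w+", text.lower()))


def search(docs: list, term: str) -> list:
    """Match against indexed fields; return only the stored fields."""
    term = term.lower()
    return [
        {f.name: f.value for f in doc.fields if f.stored}
        for doc in docs
        if any(f.indexed and term in words(f.value) for f in doc.fields)
    ]
```

For example, searching for "result" in a document created with make_doc(7, "This is test", "This document contains the result of your search") returns its Title, URL and Summary but never the Body. This sketch scans every document; a real engine such as Lucene looks terms up in an inverted index instead.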

  • Online .pdf documents, meta tags; title, keywords, descriptions. search engines

    Hi. In DW 8.0.
    I searched the forums for this but was unable to find the answer.
    I would like to have meta tags in my .pdf files that are on the web.
    When someone searches for one of them in Google or Yahoo!, the title appears on the search results page in a bizarre upper- and lowercase letter pattern:
    "ExaMPLe SAMple tiTLE"
    Correct me if I am wrong, but Google creates source code for .pdfs so others can search for them; still, I do not understand why the title looks the way it does.
    I was hoping that changing the title meta tag in the actual .pdf using Acrobat would solve this, and I am waiting to see if it works.
    Does anyone know of a better way of changing/adding the meta tags to an online .pdf document?
    Thanks.

    PDF is a print document.
    HTML is a web document.
    AFAIK, meta tags are found only in web documents.
    The solution would be to convert your PDFs to HTML pages and use your own meta tags and titles.
    Nancy O.
    Alt-Web Design & Publishing
    Web | Graphics |  Print | Media Specialists
    www.alt-web.com/
    www.twitter.com/altweb

  • Search pdf document

    Hi!
    I'm new to this and wanted to ask:
    Is there a way to use VBA or JS to search for text in a PDF?
    Ideally, I would get a result without opening the document in the reader, unless there is a valid result.
    thanks!

    What you are looking for is, more or less, an indexer for a search engine. You might look around for small server search engines and configure one to index the directories you need.
    If you happen to be on OS X, you might look at ways to control Spotlight (via AppleScript), but that would not really work from within a PDF.
    If it has to be from a PDF, you would first have to run Catalog, which creates a corresponding index, and then you can search through it.
    If you have a specific set of search terms, you could also set the Metadata of the PDF accordingly, and then parse just the metadata, which means that you would not have to open the PDF in a PDF viewer.
    Hope this can help.
    Max Wyss.
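
For the metadata route, a sketch under several assumptions: many PDFs carry an XMP metadata packet that is stored uncompressed so that tools can find it without a PDF viewer, and keywords frequently appear there as dc:subject entries. The Python sketch below pulls those entries out of the raw bytes. If a file compresses its metadata stream, or keeps keywords only in the older document-information dictionary, it returns nothing. Function names are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dc": "http://purl.org/dc/elements/1.1/",
}
# The XMP packet's root element; DOTALL lets the match span lines.
XMP_PACKET = re.compile(rb"<x:xmpmeta\b.*?</x:xmpmeta>", re.DOTALL)


def xmp_subjects(pdf_bytes: bytes) -> list:
    """Return dc:subject entries from the first uncompressed XMP packet, if any."""
    match = XMP_PACKET.search(pdf_bytes)
    if match is None:
        return []
    root = ET.fromstring(match.group(0))
    return [li.text for li in root.iterfind(".//dc:subject/rdf:Bag/rdf:li", NS) if li.text]


def pdf_has_keyword(path: str, keyword: str) -> bool:
    """Check a file's XMP keywords without opening it in a PDF viewer."""
    with open(path, "rb") as fh:
        return keyword.lower() in (s.lower() for s in xmp_subjects(fh.read()))
```

Looping pdf_has_keyword over a directory gives the "only open it if there is a match" behaviour, at the cost of only seeing keywords you set in advance.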

  • Headings follow relevant chapter names? Index link to pdf search engine?

    I'm new to Pages 09 and haven't found out how to make the page header contain only the relevant chapter name. I've got 8 chapters and want the relevant chapter name at the top of each page, as one usually finds. How is it done?
    Can one automatically link the alphabetical word index to the PDF's internal search engine when exporting to PDF, or does one have to do this by hand?
    Thanks for any help in these matters
    Neil

    Split your document into sections.
    Insert your chapters in different sections.
    Doing that, you will be able to use the chapter name in the corresponding header.
    On page 58 of the User Guide (English version) we may read:
    *Changing Headers and Footers in a Section*
    You can change headers and footers to be unique to a section. You can also change headers and footers within a section.
    To change headers and footers:
    1 Place the insertion point in the section.
    2 Click Inspector in the toolbar, click the Layout button, and then click Section.
    3 Deselect “Use previous headers and footers.”
    4 Type the new header or footer in the header or footer area of your document.
    Yvan KOENIG (from FRANCE, Sunday 8 March 2009 18:20:09)

  • A hyperlink click to a pdf document in a website does not open a pdf document instead presents a blank page "searching for bookmarks"

    a hyperlink click to a pdf document in a website does not open a pdf document instead presents a blank page "searching for bookmarks"

    Check the settings as shown in the "Opening PDF files within Firefox" article.

  • Unable to search for words in a pdf document

    I am using Adobe Reader Ver 10.1.1 (All updates)
    I have a pdf document in which I am unable to search for words. No words in the document are found even though they do exist. If I copy a word from within the document and paste it into the search criteria then instead of getting the word that I copied, I get substitute characters and it can in fact find these characters as the word being searched.
    The document is a catalog and can be downloaded from the following URL.
    http://www.carbatec.com.au/getcatalogue?zenid=d1cuvbat0ois0g37r0r33vnah1
    I will appreciate any help as to why I cannot search for words in the document.

    Thank you Dave Merchant and try67 for your responses. As per my previous post, I contacted the company re the catalogue and they have responded favourably. I'll include their response because it gives the reason for the search failure as document compression which you might find interesting. I'll await their new catalogue and see if they have fixed the problem.
    Company response:
    Thank you very much for your input.  And yes, you are correct, the compression we used for the current catalogue's PDF format does strip out text included in the catalogue.  We used the compression settings we did with the intention of minimising download time, however I take your point about including text for search purposes (which I also utilise when I'm scanning through PDFs).
    We will actually be posting out our new catalogue next week and we'll release the new PDF version on the website at the same time.  I've asked our graphic designer to ensure that the PDF we use for the new catalogue includes searchable text.

  • Using Oracle Text to search through WORD, EXCEL and PDF documents

    Hello again,
    If I have a Word or PDF document stored in a table, is it possible to use Oracle Text to search through the actual Word or PDF document?
    Thanks
    Doug

    Yes, you can do context-sensitive searches on both PDF and Word docs. With PDFs you need to make sure they contain text and not images; some scanners create PDFs that are nothing more than images of the document.
    Below is a code sample that I wrote some time back to demonstrate the searching capabilities of Oracle Text. Note that the example uses the inso_filter, which is no longer shipped with Oracle beginning with patch set 10.1.0.4. See MetaLink note 298017.1 for the changes. See the following link for more information on developing with Oracle Text.
    http://download-west.oracle.com/docs/cd/B14117_01/text.101/b10729/toc.htm
    begin example.
    -- The following needs to be executed
    -- as SYS.
    DROP DIRECTORY docs_dir;
    CREATE OR REPLACE DIRECTORY docs_dir
    AS 'C:\sql\oracle_text\documents';
    GRANT READ ON DIRECTORY docs_dir TO text;
    -- End of SQL run as SYS
    DROP TABLE db_docs CASCADE CONSTRAINTS PURGE;
    CREATE TABLE db_docs (
    id NUMBER,
    format VARCHAR2(10),
    location VARCHAR2(50),
    document BLOB,
    CONSTRAINT i_db_docs_p PRIMARY KEY(id)
    );
    -- Several notes need to be made about this anonymous block.
    -- First, the 'DOCS_DIR' parameter is a directory object name.
    -- This directory object name must be in upper case.
    DECLARE
    f_lob BFILE;
    b_lob BLOB;
    document_name VARCHAR2(50);
    BEGIN
    document_name := 'externaltables.doc';
    -- Insert a row with an empty BLOB and get a locator to it.
    INSERT INTO db_docs
    VALUES (1, 'binary', 'C:\sql\oracle_text\documents\externaltables.doc', empty_blob())
    RETURN document INTO b_lob;
    -- Copy the file from the directory object into the BLOB.
    f_lob := BFILENAME('DOCS_DIR', document_name);
    DBMS_LOB.FILEOPEN(f_lob, DBMS_LOB.FILE_READONLY);
    DBMS_LOB.LOADFROMFILE(b_lob, f_lob, DBMS_LOB.GETLENGTH(f_lob));
    DBMS_LOB.FILECLOSE(f_lob);
    COMMIT;
    END;
    /
    -- Build the index.
    -- Note that this index differs from the one for file-system stored files
    -- in that the datastore parameter is ctxsys.default_datastore and not
    -- ctxsys.file_datastore. FILE_DATASTORE is for documents that
    -- exist on the file system. DEFAULT_DATASTORE is for documents
    -- that are stored in the column.
    -- On releases where INSO_FILTER is no longer shipped (see the note above),
    -- ctxsys.auto_filter is the usual replacement filter.
    create index db_docs_ctx on db_docs(document)
    indextype is ctxsys.context
    parameters (
    'datastore ctxsys.default_datastore
    filter ctxsys.inso_filter
    format column format');
    -- Search for something that is known not to be in the document.
    SELECT SCORE(1), id, location
    FROM db_docs
    WHERE CONTAINS(document, 'Jenkinson', 1) > 0;
    -- Search for something that is known to be in the document.
    SELECT SCORE(1), id, location
    FROM db_docs
    WHERE CONTAINS(document, 'Albright', 1) > 0;

  • How to search inside a PDF document using the Firefox internal PDF reader

    Cannot find any way to do a text search inside a PDF document when it is displayed by the built-in PDF viewer in Firefox.
    If there is a way, how is it done?
    If there isn't a way, it seems an obvious enhancement.

    Son of a gun. I've always searched in Firefox by hitting slash and typing the search term, but that doesn't seem to work in the PDF reader. I guess if I had searched for key bindings I could have figured that out for myself.
    Thanks!
