Search for PDF file content

I currently receive hundreds of PDF attachments daily and store them in a file system. I am looking for a solution that will let me run full-text search on these files. Can someone help me out?
Thanks
Sam

I am talking about server-level full-text search, not a search within an individual file. For example, if you have 1000 PDF files, you want to find out which file or files contain the word "shopping". Is there an Adobe plug-in that I need to buy? Do I need to store these files in a database rather than in the file system?
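
One way to picture the server-level search described above: extract the text of each PDF once, build an inverted index from words to file names, and answer queries from that index instead of opening every file. The Java sketch below only illustrates the idea. It assumes the text has already been extracted by a separate PDF text-extraction step (not shown), and the class name `PdfTextIndex`, its methods, and the sample file names are all hypothetical.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical in-memory inverted index: word -> names of files containing it. */
final class PdfTextIndex {
    private final Map<String, Set<String>> postings = new HashMap<>();

    /** Index the already-extracted text of one file (the extraction step is assumed). */
    void add(String fileName, String extractedText) {
        // Lower-case for case-insensitive search, then split on anything
        // that is not a letter or a digit.
        String[] words = extractedText.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+");
        for (String word : words) {
            if (!word.isEmpty()) {
                postings.computeIfAbsent(word, k -> new TreeSet<>()).add(fileName);
            }
        }
    }

    /** Return the sorted names of the files that contain the given word. */
    Set<String> search(String word) {
        Set<String> hits = postings.get(word.toLowerCase(Locale.ROOT));
        if (hits == null) {
            return Collections.emptySet();
        }
        return Collections.unmodifiableSet(hits);
    }

    public static void main(String[] args) {
        PdfTextIndex index = new PdfTextIndex();
        // Sample file names and text are made up for illustration.
        index.add("invoice-001.pdf", "Online shopping order confirmation");
        index.add("memo-17.pdf", "Quarterly staffing memo");
        index.add("receipt-42.pdf", "Thank you for shopping with us");
        System.out.println(index.search("shopping"));
    }
}
```

With the sample data, the query for "shopping" returns the invoice and the receipt but not the memo. The files themselves can stay in the file system in this design; only the index has to be kept somewhere the search can reach.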

Similar Messages

  • How does full-text search for pdf files work?

    Hi there,
    Basically, I can see my PDF file in the content server. Inside the PDF there's a piece of text that says "Test's Sample", but when I search for that string the file is filtered out of the results.
    I think it has to do with the ' (single quote), because other text in the PDF works fine. So I was wondering: how does VDK store this full text, and where? I'd like to see how it gets translated, if that's how it works with PDF files.
    Following advice from Re: Parse error with search query I tried doing the search by:
    Test\'s Sample
    Test`s Sample
    "Test's Sample"
    The database is db2 if that helps.. how can I fix this problem?

    Nevermind, I fixed it by changing the VDK filters (in case someone is looking for a solution too).
    Cheers,

  • Looking for a free iOS 4 app that can search through .pdf files or spreadsheets

    Looking for a free iOS 4 app that can search through .pdf files or spreadsheets
    Thanks

    Hey there
    "pdf creator" for iPad works flawlessly for me working with pdf files
    It takes care of all my needs
    I'm not sure about sending via Wi-Fi or Bluetooth, but I send them via e-mail all the time
    Possibly it could handle your needs as well
    Just type it into the App Store search field and the first one that comes up is the one I use
    Jump on over there and read up on it before buying and see if it will help you 
    Hope this helps
    Regards

  • Always scrolling back to the search bar for pdf files in ibooks. Is there a way to fix this?

    ALWAYS scrolling back up to the search bar for pdf files in ibooks. Is there a way to fix this?

    Care to share your fix with the rest of the community in case anyone else has the same problem or since you found the solution are you off to never be heard from again?

  • How to SEARCH for specific file TYPES, e.g. PDF?

    If I know the title has in it "resume" and I know it's a PDF, how do you search for all PDF files that have the word "resume" in the title?
    AND..
    How would you search all PDF files that had the word "supervisor" inside the document?
    Thanks

    Command F should do it. Once the search window appears, you should see a drop down box headed kind and next to it one headed any. Select PDF from the any one. Click the plus button, and another field will appear and you can see again more drop down boxes. You just want to set these up so you end up with name and contains, you can then add your text to the remaining field.
    Also see this: http://apps.tempel.org/FindAnyFile/index.html
    Message was edited by: gumsie

  • Searching on PDF files

    Hi,
    I've got almost everything working now, except that searches on PDF files don't produce the desired results.
    The filter seems to search only the information one would see in the document info through Acrobat Reader!
    It doesn't seem to index the contents of the PDF document as it does with other formats like Excel and Word :(
    Do I need to do any additional setup to create a more comprehensive index on these PDF files?
    cheers,
    Vijay

    Hi,
    We got interMedia working successfully after some fixes to tnsnames.ora and listener.ora.
    This is for your reference.
    1. You may need to change listener.ora and tnsnames.ora so that external procedure processes can be created
    2. Change listener.ora to include the LD_LIBRARY_PATH parameter
    3. Restart the listener process
    Below are sample files
    Regards,
    Yogesh
    Database support
    Citibank,
    NewYork, NY 10048
    # LISTENER.ORA Configuration File:/export/opt/UNPACKAGED/oracle/8.1.6.0/sparc-solaris2/product/network/admin/listener.ora
    # Generated by Oracle configuration tools.
    # Modified Yogi 05/18/00
    LISTENER =
      (DESCRIPTION_LIST =
        (DESCRIPTION =
          (ADDRESS_LIST =
            (ADDRESS = (PROTOCOL = IPC)(KEY = EXTPROC))
          )
          (ADDRESS_LIST =
            (ADDRESS = (PROTOCOL = TCP)(HOST = ertdev9-1)(PORT = 1521))
          )
        )
      )
    SID_LIST_LISTENER =
      (SID_LIST =
        (SID_DESC =
          (SID_NAME = PLSExtProc)
          (ORACLE_HOME = /export/opt/UNPACKAGED/oracle/8.1.6.0/sparc-solaris2/product)
          (PROGRAM = extproc)
          (envs=LD_LIBRARY_PATH=/export/opt/UNPACKAGED/oracle/8.1.6.0/sparc-solaris2/product/lib:/export/opt/UNPACKAGED/oracle/8.1.6.0/sparc-solaris2/product/ctx/lib)
        )
        (SID_DESC =
          (GLOBAL_DBNAME = emdev1)
          (ORACLE_HOME = /export/opt/UNPACKAGED/oracle/8.1.6.0/sparc-solaris2/product)
          (SID_NAME = emdev1)
          (envs=LD_LIBRARY_PATH=/export/opt/UNPACKAGED/oracle/8.1.6.0/sparc-solaris2/product/lib:/export/opt/UNPACKAGED/oracle/8.1.6.0/sparc-solaris2/product/ctx/lib:/export/opt/UNPACKAGED/oracle/8.1.6.0/sparc-solaris2/product/ctx/bin)
        )
      )
    # TNSNAMES.ORA Configuration File:/export/opt/UNPACKAGED/oracle/8.1.6.0/sparc-solaris2/product/network/admin/tnsnames.ora
    # Generated by Oracle configuration tools.
    # Modified Yogi 05/18/00
    EMDEV1 =
      (DESCRIPTION =
        (ADDRESS_LIST =
          (ADDRESS = (PROTOCOL = TCP)(HOST = ertnj.ssmc.com)(PORT = 1521))
        )
        (CONNECT_DATA =
          (SERVICE_NAME = emdev1)
        )
      )
    EXTPROC_CONNECTION_DATA =
      (DESCRIPTION =
        (ADDRESS_LIST =
          (ADDRESS = (PROTOCOL = IPC)(KEY = EXTPROC))
        )
        (CONNECT_DATA =
          (SID = PLSExtProc)
          (PRESENTATION = RO)
        )
      )

  • How to print PDF file content from ABAP in background?

    Hi,
    Is it possible to print PDF file content from ABAP in background?
    I have some PDF content which I need to print it, these PDF files are generated outside the SAP.
    Please have you any suggestions?
    Thank you
    Tomas

    Solution:

    The target output device must support PDF printing; this is the only limitation.

    REPORT  z_print_pdf.
    TYPE-POOLS: abap, srmgs.
    PARAMETERS: p_prnds LIKE tsp01-rqdest OBLIGATORY DEFAULT 'LOCL',
                p_fname TYPE file_table-filename OBLIGATORY LOWER CASE,
                p_ncopi TYPE rspocopies OBLIGATORY DEFAULT '1',
                p_immed AS CHECKBOX.
    AT SELECTION-SCREEN ON VALUE-REQUEST FOR p_fname.
      DATA: lv_rc     TYPE i,
            lv_filter TYPE string.
      DATA: lt_files TYPE filetable.
      FIELD-SYMBOLS: <fs_file> LIKE LINE OF lt_files.
      CONCATENATE 'PDF (*.pdf)|*.pdf|' cl_gui_frontend_services=>filetype_all INTO lv_filter.
      CALL METHOD cl_gui_frontend_services=>file_open_dialog
        EXPORTING
          file_filter             = lv_filter
        CHANGING
          file_table              = lt_files
          rc                      = lv_rc
        EXCEPTIONS
          OTHERS                  = 1.
      IF sy-subrc NE 0 AND lv_rc EQ 0.
        MESSAGE 'Error' TYPE 'E' DISPLAY LIKE 'S'.
      ENDIF.
      READ TABLE lt_files ASSIGNING <fs_file> INDEX 1.
      IF sy-subrc EQ 0.
        p_fname = <fs_file>-filename.
      ENDIF.
    AT SELECTION-SCREEN.
      DATA: lv_name   TYPE string,
            lv_result TYPE boolean.
      lv_name = p_fname.
      CALL METHOD cl_gui_frontend_services=>file_exist
        EXPORTING
          file                 = lv_name
        RECEIVING
          result               = lv_result
        EXCEPTIONS
          OTHERS               = 1.
      IF sy-subrc NE 0.
        MESSAGE 'Bad file!' TYPE 'E' DISPLAY LIKE 'S'.
      ENDIF.
      IF lv_result NE abap_true.
        MESSAGE 'Bad file!' TYPE 'E' DISPLAY LIKE 'S'.
      ENDIF.
    START-OF-SELECTION.
    END-OF-SELECTION.
      PERFORM process.
    FORM process.
      DATA: lv_name     TYPE string,
            lv_size     TYPE i,
            lv_data     TYPE xstring,
            lv_retcode  TYPE i.
      DATA: lt_file TYPE srmgs_bin_content.
      lv_name = p_fname.
      CALL METHOD cl_gui_frontend_services=>gui_upload
        EXPORTING
          filename                = lv_name
          filetype                = 'BIN'
        IMPORTING
          filelength              = lv_size
        CHANGING
          data_tab                = lt_file
        EXCEPTIONS
          OTHERS                  = 1.
      IF sy-subrc NE 0.
        MESSAGE 'Read file error!' TYPE 'E' DISPLAY LIKE 'S'.
      ENDIF.
      CALL FUNCTION 'SCMS_BINARY_TO_XSTRING'
        EXPORTING
          input_length = lv_size
        IMPORTING
          buffer       = lv_data
        TABLES
          binary_tab   = lt_file
        EXCEPTIONS
          failed       = 1
          OTHERS       = 2.
      IF sy-subrc NE 0.
        MESSAGE 'Binary conversion error!' TYPE 'E' DISPLAY LIKE 'S'.
      ENDIF.
      PERFORM print USING p_prnds lv_data CHANGING lv_retcode.
      IF lv_retcode EQ 0.
        WRITE: / 'Print OK' COLOR COL_POSITIVE.
      ELSE.
        WRITE: / 'Print ERROR' COLOR COL_NEGATIVE.
      ENDIF.
    ENDFORM.                    " PROCESS
    FORM print USING    iv_prndst  TYPE rspopname
                        iv_content TYPE xstring
               CHANGING ev_retcode TYPE i.
      DATA: lv_handle    TYPE sy-tabix,
            lv_spoolid   TYPE rspoid,
            lv_partname  TYPE adspart,
            lv_globaldir TYPE text1024,
            lv_dstfile   TYPE text1024,
            lv_filesize  TYPE i,
            lv_pages     TYPE i.
      CLEAR: ev_retcode.
      CALL FUNCTION 'ADS_SR_OPEN'
        EXPORTING
          dest            = iv_prndst
          doctype         = 'ADSP'
          copies          = p_ncopi
          immediate_print = p_immed
          auto_delete     = 'X'
        IMPORTING
          handle          = lv_handle
          spoolid         = lv_spoolid
          partname        = lv_partname
        EXCEPTIONS
          OTHERS          = 1.
      IF sy-subrc NE 0.
        ev_retcode = 4.
        RETURN.
      ENDIF.
      CALL FUNCTION 'ADS_GET_PATH'
        IMPORTING
          ads_path = lv_globaldir.
      CONCATENATE lv_globaldir '/' lv_partname '.pdf' INTO lv_dstfile.
      OPEN DATASET lv_dstfile FOR OUTPUT IN BINARY MODE.
      IF sy-subrc NE 0.
        ev_retcode = 4.
        RETURN.
      ENDIF.
      TRANSFER iv_content TO lv_dstfile.
      IF sy-subrc NE 0.
        ev_retcode = 4.
        RETURN.
      ENDIF.
      CLOSE DATASET lv_dstfile.
      IF sy-subrc NE 0.
        ev_retcode = 4.
        RETURN.
      ENDIF.
      CALL FUNCTION 'ZBAP_RM_PDF_GET_PAGES'
        EXPORTING
          iv_content = iv_content
        IMPORTING
          ev_pages   = lv_pages.
      lv_filesize = XSTRLEN( iv_content ).
      CALL FUNCTION 'ADS_SR_CONFIRM'
        EXPORTING
          handle   = lv_handle
          partname = lv_partname
          size     = lv_filesize
          pages    = lv_pages
          no_pdf   = ' '
        EXCEPTIONS
          OTHERS   = 1.
      IF sy-subrc NE 0.
        ev_retcode = 4.
        RETURN.
      ENDIF.
      CALL FUNCTION 'ADS_SR_CLOSE'
        EXPORTING
          handle = lv_handle
        EXCEPTIONS
          OTHERS = 1.
      IF sy-subrc NE 0.
        ev_retcode = 4.
        RETURN.
      ENDIF.
    ENDFORM.                    " PRINT

  • PDF file contents of word document using XSLT.

    Hi Public,
    I am creating a PDF file using XML, XSL and FOP. I want the PDF to display the contents of an external file, such as a Word document.
    I know that to display an image in a PDF we use <fo:external-graphic>, but which tag should we use to display the contents of a file that is not an image?
    Thanks in advance.
    Rahul

    Your instructions for doing this are very clear. 

  • Problem searching some PDF files in Acrobat Reader – Non-ASCII characters

    Acrobat Reader cannot search some .pdf files.  I have put an example document up on Scribd here.
    Any attempt to search for any word that can be clearly seen to be in the document fails with “No matches were found.”
    This example document is NOT a scanned document – words and characters can be selected.
    A hex display tool shows that the characters in a PDF document that can be successfully searched are in the ASCII/1252 range (A=0x41, etc).
    Copying and pasting characters in the example document to a hex display tool shows that the characters in the document are not in the ASCII range.
    For example the letters A to Z in the example document are in the range ‘A’ = 0xDF (decimal 223), ‘B’ = 0xDE (decimal 222), through to ‘Z’ = 0xC6 (decimal 198).
    However, characters in these non-ASCII ranges are displayed perfectly by Acrobat Reader, as can be seen if the example document is opened.
    Therefore, as Acrobat Reader knows what these characters are, it doesn’t seem unreasonable to say that it should be able to search for and find them.
    Tests were performed using Acrobat Reader X v10.1.4.
    Can anyone say what this problem is?

    Hi Pat, thanks for your reply. 
    Your reference to the title of that page being 'HARNESSES' indicates that, when you view that document in Adobe Reader, you are seeing 'HARNESSES', not
    "ØßÎÒÛÍÍÛÍ".  And that the remainder of the document is similarly being displayed in readable English language.
    Yes as you say, you can search for 'ß' and get hits on 'A' (to use that as an example) in the example document.
    But having to convert a search word into whatever code mapping this is using (for example, having to enter "ØßÎÒÛÍÍÛÍ" for HARNESSES, and I'm not even sure how that would be entered from a keyboard) doesn't seem very convenient.
    It's clear the example document is using some code mapping other than ASCII / Windows-1252 (which has 'A' as 0x41). But it is also clear that Adobe Reader knows what that mapping is, and knows to use it, as it's displaying (for example) 'A' for the code 0xDF.
    So I guess the question is - why isn't Adobe Reader's knowledge of this mapping being extended to its search input? 
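
The byte values quoted in this thread fit a simple pattern: 'A' (0x41) shows up as 0xDF and 'Z' (0x5A) as 0xC6, so a capital letter with code c appears as 0x120 - c. Assuming that pattern holds for the whole A to Z range (the thread confirms it only for 'A', 'B', 'Z' and the letters of HARNESSES), a search term could be translated into the document's codes with a few lines of Java. This is a sketch of the arithmetic only: the class name `GlyphCodeMapper` is hypothetical, and it says nothing about how Adobe Reader handles the font's mapping internally.

```java
/** Hypothetical helper for the letter mapping reported in this thread. */
final class GlyphCodeMapper {
    // 'A' (0x41) is stored as 0xDF, so stored = PIVOT - plain with PIVOT = 0x120.
    private static final int PIVOT = 0x41 + 0xDF;

    /** Translate plain text into the codes used in the example document. */
    static String encode(String plain) {
        StringBuilder out = new StringBuilder(plain.length());
        for (char c : plain.toCharArray()) {
            // Only A-Z are covered by the values reported in the thread;
            // every other character is passed through unchanged (an assumption).
            out.append(c >= 'A' && c <= 'Z' ? (char) (PIVOT - c) : c);
        }
        return out.toString();
    }

    /** Reverse direction: codes 0xC6..0xDF map back to Z..A. */
    static String decode(String stored) {
        StringBuilder out = new StringBuilder(stored.length());
        for (char c : stored.toCharArray()) {
            out.append(c >= 0xC6 && c <= 0xDF ? (char) (PIVOT - c) : c);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String encoded = encode("HARNESSES");
        // Print the codes in hex to avoid console encoding issues.
        for (char c : encoded.toCharArray()) {
            System.out.printf("%02X ", (int) c);
        }
        System.out.println();
        System.out.println(decode(encoded));
    }
}
```

For HARNESSES this yields the codes D8 DF CE D2 DB CD CD DB CD, which are the characters "ØßÎÒÛÍÍÛÍ" quoted above.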

  • Searching for a file in java

    What I am trying to do is not very uncommon. I want to open a file in its default application (or at least in a common one).
    Mainly I am thinking of either HTML files or PDFs.
    What I was wondering is how to search for a file and get its directory path. For example, I want to search for the file "Acrobat". In Windows its path is "C:\Program Files\Adobe\Acrobat 5.0\Acrobat\Acrobat.exe"
    But I know on Mac or Unix the path could be very different. Not only that, but what if they did not use the default installation directory? I just want to find the path of a file by invoking a search. Is there currently any way to do this?
    I know I can open a JFileChooser and make the user search and find the program they wish to run it in, but that certainly is not very user friendly. I'd rather the program just searched for the proper application(s) and then called the exec() method to run the file in the program.
    Any help is appreciated.

    java.io.File provides a listRoots() method that returns all file system roots. So for Windoze you'd get 'A:/', 'B:/' 'C:/' ... etc.
    Unix you'd get '/' and so on.
    Now you can use other File methods to recursively scan the roots.
    Here's some (untried) code to get you going that searches for all java.exe found on all roots.
    import java.io.File;
    import java.io.FilenameFilter;
    import java.util.ArrayList;
    import java.util.List;

    public class FindIt {
      public FindIt() {
        File[] roots = File.listRoots();
        List<File> list = new ArrayList<File>();
        // Accept directories (so the scan can recurse) and files named java.exe
        FilenameFilter filter = new FilenameFilter() {
          public boolean accept(File dir, String name) {
            return new File(dir, name).isDirectory() || name.equals("java.exe");
          }
        };
        for (int i = 0; i < roots.length; i++) {
          scan(roots[i], list, filter);
        }
        // Show what was found
        System.out.println(list);
      }

      public static void scan(File path, List<File> list, FilenameFilter filter) {
        // Get filtered entries in the current path
        File[] files = path.listFiles(filter);
        if (files == null) {
          return; // not a directory, or it could not be read
        }
        // Process each filtered entry
        for (int i = 0; i < files.length; i++) {
          if (files[i].isDirectory()) {
            // recurse if the entry is a directory
            scan(files[i], list, filter);
          } else {
            // add the filtered file to the list
            list.add(files[i]);
          }
        } // for
      } // scan

      public static void main(String[] args) {
        new FindIt();
      }
    }
    You will run into problems with some roots, as they will be removable media (CDs, diskettes) or network drives (a potentially huge number of directories to scan). So I do not recommend that you search all roots, only a subset.
    Dave
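
For the original goal of opening an HTML or PDF file in its default application, it may be possible to skip the search for the application altogether. On Java 6 and later, `java.awt.Desktop` asks the operating system to open a file with whatever application is registered for its type. The sketch below assumes a desktop (non-headless) environment; the class name `DefaultAppOpener`, the boolean return convention, and the sample file name are hypothetical, and whether the OPEN action is supported depends on the platform.

```java
import java.awt.Desktop;
import java.awt.GraphicsEnvironment;
import java.io.File;
import java.io.IOException;

/** Hypothetical helper: open a file with the application the OS associates with it. */
final class DefaultAppOpener {

    /** Returns true if the open request was handed to the OS, false otherwise. */
    static boolean open(File file) {
        if (!file.exists()) {
            return false; // nothing to open
        }
        // Desktop integration is unavailable on headless systems and some platforms.
        if (GraphicsEnvironment.isHeadless() || !Desktop.isDesktopSupported()) {
            return false;
        }
        Desktop desktop = Desktop.getDesktop();
        if (!desktop.isSupported(Desktop.Action.OPEN)) {
            return false;
        }
        try {
            desktop.open(file); // launches the registered application
            return true;
        } catch (IOException e) {
            // No associated application, or the application failed to launch.
            return false;
        }
    }

    public static void main(String[] args) {
        // "manual.pdf" is a placeholder file name.
        File target = new File(args.length > 0 ? args[0] : "manual.pdf");
        System.out.println(open(target) ? "Opened " + target : "Could not open " + target);
    }
}
```

When this returns false, falling back to a scan like the one above or to a JFileChooser remains an option.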

  • How to read a PDF file content???

    Hi Experts,
    I need to read the content of a PDF file.
    The PDF file is in a repository.
    I am unable to read the PDF data with the getContent() function.
    Please suggest a way to read the PDF file.
    Help will be appreciated and rewarded.

    Hi Pankaj,
    Are you able to achieve the above said functionality? Even I too have the similar requirement.
    Can you pls let me know the solution or alternatives for your requirement you have followed...
    Thanks in advance.
    Nandu.

  • Bug - Safari 5.1.5 breaks the keyword search of PDF files displayed on Safari

    Bug - Safari 5.1.5 breaks the keyword search of PDF files displayed on Safari.
    After updating to Safari 5.1.5 with Adobe Acrobat Pro 10.1.3 on Mac OS X 10.6.8, it is not possible to search for keywords in PDF documents displayed on Safari.
    I understand that it is a bug. Is there any way to fix it?
    Thanks.

    Hi...
    Try deleting a plugin...
    Open the Finder. From the menu bar click Go > Go to Folder
    Type this:    /Library/Internet Plug-Ins
    Move the Adobe PDF Browser plugin  (or PDF Browser plugin) to the Trash.
    Quit then relaunch Safari to test.
    If that doesn't help, back to the Finder menu.
    Go > Go to Folder
    Type this:  ~/Library/Caches/com.apple.Safari/Cache.db
    Move the Cache.db file to the Trash.
    Quit then relaunch Safari to test.

  • I back up to an external HDD with Time Machine. When it ran out of space it did not delete old backups, and now my internal HDD says it's full when before it had heaps of space. I have searched for extra files but can't find any. Can anyone help, please?

    I back up to an external HDD with Time Machine. When it ran out of space it did not delete old backups, and now my internal HDD says it's full when before it had heaps of space. I have searched for extra files but can't find any. Can anyone help, please?

    First, empty the Trash if you haven't already done so. Then reboot. That will temporarily free up some space.
    To locate large files, you can use Spotlight as described here. That method may not find large folders that contain a lot of small files.
    You can also use a tool such as OmniDiskSweeper (ODS) to explore your volume and find out what's taking up the space. You can delete files with it, but don't do that unless you're sure that you know what you're deleting and that all data is safely backed up. That means you have multiple backups, not just one.
    Proceed further only if the problem hasn't been solved.
    ODS can't see the whole filesystem when you run it just by double-clicking; it only sees files that you have permission to read. To see everything, you have to run it as root.
    Back up all data now.
    Install ODS in the Applications folder as usual.
    Triple-click the line of text below to select it, then copy the selected text to the Clipboard (command-C):
    sudo /Applications/OmniDiskSweeper.app/Contents/MacOS/OmniDiskSweeper
    Launch the Terminal application in any of the following ways:
    ☞ Enter the first few letters of its name into a Spotlight search. Select it in the results (it should be at the top.)
    ☞ In the Finder, select Go ▹ Utilities from the menu bar, or press the key combination shift-command-U. The application is in the folder that opens.
    ☞ Open LaunchPad. Click Utilities, then Terminal in the icon grid.
    Paste into the Terminal window (command-V). You'll be prompted for your login password, which won't be displayed when you type it. You may get a one-time warning not to screw up. If you see a message that your username "is not in the sudoers file," then you're not logged in as an administrator.
    I don't recommend that you make a habit of doing this. Don't delete anything while running ODS as root. If something needs to be deleted, make sure you know what it is and how it got there, and then delete it by other, safer, means.
    When you're done with ODS, quit it and also quit Terminal.

  • Firefox preview for pdf files doesn't show certain Cyrillic scripts.

    Since pdf preview support was added in Firefox I can't find a solution for the missing Cyrillic letters in the preview of the pdfs which the software in my company generates.
    Other pdf software/browsers with preview don't have issues with the Cyrillic script regardless of localization and OS as the fonts are embedded in the pdf.
    Can you please help with finding some documentation to use as a reference point for finding the solution for this issue?

    Hello,
    In order to change the default reader for PDF files (to not open PDF files with Firefox's internal PDF reader), follow these steps:
    1. Go to Tools > Options (or Firefox > Options).
    2. In the Options window, select the Applications tab.
    3. In the Search field, type PDF. You should find Portable Document Format (PDF).
    4. On the right-hand side you should find an Action column. Use it to select your preferred PDF reader. To view PDF files in Firefox, choose Preview in Firefox.
    Did this fix your problems? Please report back to us!
    Thank you.

  • When i search for mp3 files spotlight shows id3 tags instead of filenames as it used to do !!! I DONT want that .. how can i disable it !!!!!???????

    when i search for mp3 files spotlight shows id3 tags instead of filenames as it used to do !!! I DONT want that .. how can i disable it !!!!!???????

    Sorry, my handful of mp3's had the file name as the id3 tag name.
    So, I am seeing the same thing as you.
    Oddly, a while back I was trying to help someone out in Mountain Lion that wanted to see the id3 tags in the Finder.
    It's all a bit silly. They should have added columns instead of changing it to the tag's song name.
    They are transitioning to having everything reflect the content and not the file system, so this may not be reversible.
    The best I think you can do is provide feedback: http://www.apple.com/feedback/macosx.html
