Search in PDF

Hi all,
My client's requirement is to support PDF search (non-English) in the Search module of his e-learning website. When I try to extract the contents of a PDF for indexing, some of the characters are dropped during extraction (I see empty spaces in those places when I view the indexed contents in Luke). I am getting this problem for languages like Tamil and Hindi.
The client is adamant that he wants PDF search.
What is the solution for this? Please give me a ray of light or some guidelines.
Thanks and regards,
aras

http://pdfbox.org/
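
One way to narrow down where the characters go missing is to measure, before indexing, how much of the extracted text actually falls in the expected Unicode block. The sketch below is not part of PDFBox or Lucene: `script_coverage` and `SCRIPT_RANGES` are hypothetical names, and it assumes the extractor's output is already available as a Python string. A low ratio for a page that visibly contains Tamil or Hindi would suggest that the loss happens at extraction time (a common cause is a font whose glyphs cannot be mapped back to Unicode) rather than in the index.

```python
# Unicode block ranges for the two scripts mentioned in the post.
SCRIPT_RANGES = {
    "tamil": (0x0B80, 0x0BFF),
    "devanagari": (0x0900, 0x097F),  # the script used for Hindi
}


def script_coverage(text: str, script: str) -> float:
    """Return the fraction of non-whitespace characters inside the script's block."""
    low, high = SCRIPT_RANGES[script]
    chars = [c for c in text if not c.isspace()]
    if not chars:
        return 0.0
    in_block = sum(1 for c in chars if low <= ord(c) <= high)
    return in_block / len(chars)
```

For example, `script_coverage(extracted_page_text, "tamil")` could be logged per page, and pages that fall below a threshold of your choosing could be inspected by hand before they reach the index.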

Similar Messages

  • Looking for a free iOS 4 app that can search through .pdf files or spreadsheets

    Looking for a free iOS 4 app that can search through .pdf files or spreadsheets.
    Thanks

    Hey there,
    "pdf creator" for iPad works flawlessly for me when working with PDF files.
    It takes care of all my needs.
    I'm not sure about sending via Wi-Fi or Bluetooth, but I send them via email all the time.
    Possibly it could handle your needs as well.
    Just type it into the App Store search field; the first one that comes up is the one I use.
    Jump on over there and read up on it before buying to see if it will help you.
    Hope this helps
    Regards

  • How do I search a pdf in Adobe touch?

    How do I search a pdf in Adobe touch?

    Bring up the toolbars by swiping from the top/bottom edge or right-clicking with a mouse.
    Tap/click the Find button in the bottom toolbar to open the find box.
    Type in your search criteria and press Enter or tap the magnifying glass.

  • Problem searching some PDF files in Acrobat Reader – Non-ASCII characters

    Acrobat Reader cannot search some .pdf files.  I have put an example document up on Scribd here.
    Any attempt to search for any word that can be clearly seen to be in the document fails with “No matches were found.”
    This example document is NOT a scanned document – words and characters can be selected.
    A hex display tool shows that the characters in a PDF document that can be successfully searched are in the ASCII/1252 range (A=0x41, etc).
    Copying and pasting characters in the example document to a hex display tool shows that the characters in the document are not in the ASCII range.
    For example the letters A to Z in the example document are in the range ‘A’ = 0xDF (decimal 223), ‘B’ = 0xDE (decimal 222), through to ‘Z’ = 0xC6 (decimal 198).
    However, characters in these non-ASCII ranges are displayed perfectly by Acrobat Reader, as can be seen if the example document is opened.
    Therefore, as Acrobat Reader knows what these characters are, it doesn’t seem unreasonable to say that it should be able to search for and find them.
    Tests were performed using Acrobat Reader X v10.1.4.
    Can anyone say what this problem is?

    Hi Pat, thanks for your reply. 
    Your reference to the title of that page being 'HARNESSES' indicates that, when you view that document in Adobe Reader, you are seeing 'HARNESSES', not "ØßÎÒÛÍÍÛÍ", and that the remainder of the document is similarly being displayed in readable English.
    Yes, as you say, you can search for 'ß' and get hits on 'A' (to use that as an example) in the example document.
    But having to convert a search word into whatever code mapping this is using (for example, having to enter "ØßÎÒÛÍÍÛÍ" for HARNESSES; I'm not even sure how that would be entered from a keyboard) doesn't seem very convenient.
    It's clear the example document is using some code mapping other than ASCII / Windows-1252 (which has 'A' as 0x41). But it is also clear that Adobe Reader knows what that mapping is, and knows to use it, as it's displaying (for example) 'A' for the code 0xDF.
    So I guess the question is - why isn't Adobe Reader's knowledge of this mapping being extended to its search input? 
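
The codes quoted in this thread ('A' = 0xDF, 'B' = 0xDE, through to 'Z' = 0xC6) run in reverse order from 0xDF, so a search term can be converted mechanically. The sketch below only illustrates that arithmetic. The function names are hypothetical, it assumes the mapping is linear across all 26 uppercase letters (which matches the "HARNESSES" example above), and it leaves every other character untouched because the thread does not say how those are encoded.

```python
FIRST_CODE = 0xDF  # code observed for 'A' in the example document
LAST_CODE = 0xC6   # code observed for 'Z' in the example document


def to_document_codes(term: str) -> str:
    """Translate uppercase A-Z into the reversed codes used by the example PDF."""
    out = []
    for ch in term:
        if "A" <= ch <= "Z":
            out.append(chr(FIRST_CODE - (ord(ch) - ord("A"))))
        else:
            out.append(ch)  # mapping for other characters is unknown
    return "".join(out)


def from_document_codes(text: str) -> str:
    """Inverse mapping: turn codes 0xC6..0xDF back into A-Z."""
    out = []
    for ch in text:
        code = ord(ch)
        if LAST_CODE <= code <= FIRST_CODE:
            out.append(chr(ord("A") + (FIRST_CODE - code)))
        else:
            out.append(ch)
    return "".join(out)
```

With this, `to_document_codes("HARNESSES")` yields the string that would have to be typed into the search box, and `from_document_codes` could be applied to copied text to make it readable again.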

  • Searching on PDF files

    Hi,
    I've got almost everything working now, except that searches on PDF files don't produce the desired results.
    The filter seems to search the PDF file only for the information that one would see in the document info through Acrobat Reader!
    It doesn't seem to index the contents of the PDF document as it does with other formats like Excel and Word :(
    Do I need to do any additional setup to create a more comprehensive index on these PDF files?
    Cheers,
    Vijay

    Hi,
    We have interMedia working successfully after some fixes to tnsnames.ora and listener.ora.
    This is for your reference.
    1. You may need to change listener.ora and tnsnames.ora so that external procedure processes can be created.
    2. Change listener.ora to include the LD_LIBRARY_PATH parameter.
    3. Restart the listener process.
    Sample files are below.
    Regards,
    Yogesh
    Database support
    Citibank,
    NewYork, NY 10048
    # LISTENER.ORA Configuration File:/export/opt/UNPACKAGED/oracle/8.1.6.0/sparc-solaris2/product/network/admin/listener.ora
    # Generated by Oracle configuration tools.
    # Modified Yogi 05/18/00
    LISTENER =
      (DESCRIPTION_LIST =
        (DESCRIPTION =
          (ADDRESS_LIST =
            (ADDRESS = (PROTOCOL = IPC)(KEY = EXTPROC))
          )
          (ADDRESS_LIST =
            (ADDRESS = (PROTOCOL = TCP)(HOST = ertdev9-1)(PORT = 1521))
          )
        )
      )
    SID_LIST_LISTENER =
      (SID_LIST =
        (SID_DESC =
          (SID_NAME = PLSExtProc)
          (ORACLE_HOME = /export/opt/UNPACKAGED/oracle/8.1.6.0/sparc-solaris2/product)
          (PROGRAM = extproc)
          (envs=LD_LIBRARY_PATH=/export/opt/UNPACKAGED/oracle/8.1.6.0/sparc-solaris2/product/lib:/export/opt/UNPACKAGED/oracle/8.1.6.0/sparc-solaris2/product/ctx/lib)
        )
        (SID_DESC =
          (GLOBAL_DBNAME = emdev1)
          (ORACLE_HOME = /export/opt/UNPACKAGED/oracle/8.1.6.0/sparc-solaris2/product)
          (SID_NAME = emdev1)
          (envs=LD_LIBRARY_PATH=/export/opt/UNPACKAGED/oracle/8.1.6.0/sparc-solaris2/product/lib:/export/opt/UNPACKAGED/oracle/8.1.6.0/sparc-solaris2/product/ctx/lib:/export/opt/UNPACKAGED/oracle/8.1.6.0/sparc-solaris2/product/ctx/bin)
        )
      )
    # TNSNAMES.ORA Configuration File:/export/opt/UNPACKAGED/oracle/8.1.6.0/sparc-solaris2/product/network/admin/tnsnames.ora
    # Generated by Oracle configuration tools.
    # Modified Yogi 05/18/00
    EMDEV1 =
      (DESCRIPTION =
        (ADDRESS_LIST =
          (ADDRESS = (PROTOCOL = TCP)(HOST = ertnj.ssmc.com)(PORT = 1521))
        )
        (CONNECT_DATA =
          (SERVICE_NAME = emdev1)
        )
      )
    EXTPROC_CONNECTION_DATA =
      (DESCRIPTION =
        (ADDRESS_LIST =
          (ADDRESS = (PROTOCOL = IPC)(KEY = EXTPROC))
        )
        (CONNECT_DATA =
          (SID = PLSExtProc)
          (PRESENTATION = RO)
        )
      )

  • CF perform word search on PDF files?

    Can CF MX (6.1 or 7) perform a word search of PDF documents?
    What I would like to do, at the minimum, is have CF search
    PDF files located in a directory for a specific word, and return a
    list of files that have that word (or phrase) in them.
    Am I asking too much?
    Thanks for any and all help.
    Russ

    Yes. Use the Verity search engine that comes with
    ColdFusion.
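
To make the requested behaviour concrete (search a directory for a word or phrase and return the matching file names), here is a minimal sketch. It is plain Python, not ColdFusion or Verity, and it assumes the text of each PDF has already been extracted into a `.txt` file by some other tool. The function name `files_containing` and the one-text-file-per-PDF layout are both hypothetical.

```python
from pathlib import Path
from typing import List


def files_containing(directory: str, phrase: str) -> List[str]:
    """List the .txt files in `directory` whose text contains `phrase` (case-insensitive)."""
    needle = phrase.casefold()
    matches = []
    for path in sorted(Path(directory).glob("*.txt")):
        # errors="replace" keeps the scan going if a file has bad bytes.
        text = path.read_text(encoding="utf-8", errors="replace")
        if needle in text.casefold():
            matches.append(path.name)
    return matches
```

A scan like this rereads every file on every query, which is why a prebuilt index such as the one Verity maintains is the usual choice once the collection grows.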

  • Bug - Safari 5.1.5 breaks the keyword search of PDF files displayed on Safari

    Bug - Safari 5.1.5 breaks the keyword search of PDF files displayed on Safari.
    After updating to Safari 5.1.5 with Adobe Acrobat Pro 10.1.3 on Mac OS X 10.6.8, it is not possible to search for keywords in PDF documents displayed on Safari.
    I understand that it is a bug. Is there any way to fix it?
    Thanks.

    Hi...
    Try deleting a plugin...
    Open the Finder. From the menu bar click Go > Go to Folder
    Type this:    /Library/Internet Plug-Ins
    Move the Adobe PDF Browser plugin  (or PDF Browser plugin) to the Trash.
    Quit then relaunch Safari to test.
    If that doesn't help, back to the Finder menu.
    Go > Go to Folder
    Type this:  ~/Library/Caches/com.apple.Safari/Cache.db
    Move the Cache.db file to the Trash.
    Quit then relaunch Safari to test.

  • Adobe Reader X and searching within PDFs in Outlook 2007

    Hello,
    Is it possible to search within pdfs attached to emails in Outlook 2007 via the search box in Outlook?  I currently have Adobe Acrobat Reader X installed and Outlook 2007.  Is there an iFilter available for this functionality within Reader X or would I need to either go back to Reader 9 or upgrade to a newer version to have this capability?
    Thanks.

    This may help a few of you here.
    I'm running IE 8 and Adobe Reader X on a 64-bit Windows system.
    Reader X wouldn't open an embedded PDF with 64-bit IE 8. Reader X seems to prefer the 32-bit version of IE.
    Go to your START menu and you should see "Internet Explorer" and "Internet Explorer (64-bit)".
    Open a window with the "Internet Explorer", not the (64-bit).
    In the IE8 window you'll see a "?" on the tool bar. Open it and click on "About Internet Explorer".
    You should NOT SEE "64-bit Edition".
    If you don't see "64-Bit Edition", go and try to view the PDF again.
    It works for me and it seems I'll have to live with it until there's a patch for 64-Bit.

  • Can I read and search a PDF file on my IPAD

    Can I read and search a PDF file on my IPAD??

    Several apps will allow you to work with PDFs. iBooks will handle them, but there have been a few issues reported about using it for that purpose. My guess is it depends on the size and complexity of the file. Adobe has a reader in the App Store (free). GoodReader is an option, and provides functionality for searches.

  • How do i search a pdf file

    how do i search a pdf file on acrobat.com?

    Currently, you cannot search the PDF content while it is stored in the cloud.
    You need to use Adobe Reader to search the PDF.
    Reader can be integrated with Acrobat.com; you can then open your files in the Reader application and search for any word with Ctrl-F.
    You can also search across multiple files using Advanced Search.
    In the Reader application, choose Edit > Advanced Search.
    Link to install Reader:
    Adobe - Adobe Reader download - All versions
    Regards,
    Anoop

  • No phrase or multi-term search feature? I am using iBooks 4.1 on an iPad Air I thought it would support complex key word searching (even Adobe reader supports phrase searches in PDF files) is it really just limited to one key word per search?

    No phrase or multi-term search feature? I am using iBooks 4.1 on an iPad Air. I thought an eBook reader would support complex keyword searching (even Adobe Reader supports phrase searches in PDF files). Is iBooks really limited to searching for one keyword at a time? Am I missing something basic in the search interface?

    Greetings NoNameGiven,
    If I understand the problem correctly (I’m not sure I do) you would prefer ‘iii’ to be read as “eye eye eye” rather than “three”? The alt text property is the only way that I know of to make this happen. Hope this helps.
    a ‘C’ student

  • Can I search for pdf and word documents at the same time in finder?

    I often want to search for more than one file type at a time - for instance pdfs and word docs in a directory, or Jpgs, GIFs, PNGs etc.
    Can I do this in the finder in one go (so I can save it as a folder I can then select when I want to)?
    I tried typing OR between the 'tokens' it creates, but then it just searches for OR - so not as intelligent as one would think?!
    Surely there must be a way to do something as simple as this?
    regards
    Rob

    Forget the whole "tokens" business (I think that is a pretty useless "improvement" to constructing Spotlight searches). Hit command-F to bring up the search window, and set your first criteria, in the example I changed it from the default Kind to Created Date, to keep the number of results manageable. Now hold down the Option key and click on the "+" at the end of the criteria line, it will change to "..." and you get a new criteria line. From the dropdown menu choose "Any" if necessary (this will give you the Boolean OR), then enter what you want in the first sub-head. To get a second sub-head OR criteria click the "+" at the end of the Any line.
    I don't generate many MS Word docs, so I just stopped in the example above after typing Microsoft, since that brought up all the MS files I have from this year (a couple of PowerPoint thingies sent to me by friends).
    Francine

  • Looking for an App that supports Text Searching in PDF (more complex text finding)

    I'll try to explain this clearly, since my English is not good.
    When I search on the iPad with an app like PDF Expert, I can find words only if I type a single word, or if I type several words that appear consecutively in one sentence. For example:
    In this case, I type "and for the" and I get a result, because these three words are consecutive in one sentence.
    But if I type words that are not consecutive and are separated in the file, like this one, I don't get any results.
    So I'm looking for an app that reads PDFs and can search for multiple words at the same time, like Preview does on the Mac.
    In Preview on the Mac, we can search for different words at the same time and still get all the results.
    So, is there a similar PDF app that offers the same function on the iPad?
    Thanks so much.

    Yes, PDFPen for the Mac does do multiple word searching.  I just tried it using this example:
    I searched the iPad User Guide for "switch control".  It only found those two words together.
    Next, I searched for "switch".  It found additional occurrences.
    I did find, however, that I was required to place the two words "switch control" inside double quotes to find the two words together (and only that string).
    UPDATE:  Oops, sorry, the above statement about the double quotes being required is false.  Sorry, I used Preview by accident when I wrote that.
    Indeed, PDFPen works just as you seem to want ... searching for "switch control" (without quotes in the search string) finds only those two words together.
    UPDATE 2:  Oops ... it sounds like this is just exactly the opposite of what you want.  You are looking for something that, when searching for "switch control", finds all occurrences of "switch" and all occurrences of "control".  If that is right, PDFPen won't help you.
    Message was edited by: sberman
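
The two behaviours being compared in this thread, matching the words only when they appear together versus matching each word wherever it occurs, can be sketched as follows. This is an illustration only and does not describe how PDFPen, Preview, or PDF Expert are implemented. The function names are hypothetical, and the page text is assumed to be available already as a dict of page number to string.

```python
import re
from typing import Dict, List


def _words(text: str) -> List[str]:
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.casefold())


def phrase_hits(pages: Dict[int, str], query: str) -> List[int]:
    """Pages where the query words appear consecutively and in order."""
    q = _words(query)
    if not q:
        return []
    hits = []
    for number, text in sorted(pages.items()):
        words = _words(text)
        if any(words[i:i + len(q)] == q for i in range(len(words) - len(q) + 1)):
            hits.append(number)
    return hits


def any_term_hits(pages: Dict[int, str], query: str) -> List[int]:
    """Pages where at least one of the query words appears anywhere."""
    q = set(_words(query))
    return [n for n, text in sorted(pages.items()) if q & set(_words(text))]
```

Searching for "switch control" with `phrase_hits` returns only pages where the two words are adjacent, while `any_term_hits` returns every page that mentions either word, which is the behaviour the original poster is asking for.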

  • Full Text Search in PDF file Not Working in SQL Server 2012

    OS: Windows Server 2012 @ Azure
    DB: SQL Server 2012 SP 1 with Cum Update 6
    Filter: OfficeFilter installed, PDFFilter64 11 installed (actually I tried 9 too)
    I have done the following steps:
    1. Configure the SQL Server instance to enable FILESTREAM for Transact-SQL access (with I/O access and remote client access to FILESTREAM data allowed) and restart the instance service.
    2. Set the FILESTREAM Access Level to Full Access.
    3. Create a database with a FILESTREAM folder and set the database properties under Options: FILESTREAM Directory Name = fileContainer and FILESTREAM Non-Transacted Access = Full.
    4. Create a FileTable with a file directory.
    5. Execute the following scripts to make sure all installed components are working. PDF is listed as one of the supported filters.
    EXEC sp_fulltext_service @action='load_os_resources', @value=1;
    EXEC sp_fulltext_service 'verify_signature', 0; -- don't verify signatures
    EXEC sp_fulltext_service 'update_languages'; -- update language list
    EXEC sp_fulltext_service 'restart_all_fdhosts';
    EXEC sp_help_fulltext_system_components 'filter';
    RECONFIGURE WITH OVERRIDE;
    6. Copy a few PPTX, DOCX, and PDF files into the file directory.
    7. Search the data with the following command. The PPTX and DOCX files return the right results, but the PDF file is not returned although it contains the search term.
    SELECT *
    FROM dbo.Course
    WHERE CONTAINS(file_stream, 'Counsellor');
    Any expert advice?
    Ant in SG

    Are you seeing any errors in the SQL Server Error Log, the Windows Application or System logs?  How about in the Full-text crawl logging?
    Troubleshooting Errors in a Full-Text Population (Crawl)
    If your server has a mix of multi-threaded iFilters and single-threaded iFilters, this can cause serious problems with building the full text index.  (How do I know this?  Well, let's just say that I have suffered as well. And I was shocked!) 
    The efficiency was greatly increased by this article: 
    Troubleshooting: Slow Full-Text Indexing Performance Due to Filtering Process
    This means changing the threading model for the multi-threaded (e.g. Microsoft Office) filters to be Apartment Threaded.  Or perhaps, if you are full-text indexing PDF files, abandoning the free single-threaded Adobe iFilter and purchasing the Foxit (or some other) multi-threaded PDF iFilter would benefit you.
    RLF

  • Why is Preview unable to search some pdfs that Adobe Reader has no problem searching?

    In the last couple months, I've noticed that Preview is unable to search some (but not all) pdfs. These are clearly digitized journal articles, recently published. Since this didn't make sense to me, I opened the documents with Adobe Reader and found they were searchable using Adobe.
    What is going on? I'd prefer to use Preview, since I'd always preferred its search function and like its annotation tools (specifically, the box outline tool I use to identify chunks of text). Should I just give up on Preview and switch to Adobe?

    I can search that document in both Safari and Preview.
    Back up all data.
    In the Finder, select Go ▹ Go to Folder from the menu bar, or press the key combination shift-command-G. Copy the line of text below into the box that opens, and press return:
    /Library/Internet Plug-ins
    From the folder that opens, remove any items that have the letters “PDF” in the name. You may be prompted for your login password. Then quit and relaunch Safari, and test.
    If you still have the issue, repeat with this line:
    ~/Library/Internet Plug-ins
    If you don’t like the results of this procedure, restore the items from the backup you made before you started. Relaunch Safari again.

  • How do you perform partial word search using PDF Open Parameters?

    Hello,
    We are using the 'search=' open parameter in the URL string, which opens a PDF and automatically searches for a word within it. It works great for whole-word searches. Unfortunately, it does not work for partial words or phrases. In other words, if I search for '123456' and the document contains the word '1234567', it will not match the first 6 characters of the 7-character word. You can perform a partial-word or phrase search using the Advanced Search feature of Adobe Reader. So, currently, after the PDF opens and shows no hits for the automatic search on '123456', we can manually run a partial-word search and then see matches in the document. Is there any way to specify a whole-word or partial-word search when using the 'search=' open parameter, so that we automatically match on both? Something like 'search=123456*'?

    It never worked that way. Command-F shows the page search bar.
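
For reference, the whole-word form of the 'search=' open parameter from the question can be assembled like this. It is a minimal sketch: `pdf_search_url` is a hypothetical helper, example.com is a placeholder host, and percent-encoding the space between words is an assumption that may need adjusting for a particular viewer. Nothing in this thread confirms that a wildcard such as `123456*` is honoured, so the sketch does not attempt one.

```python
from typing import List
from urllib.parse import quote


def pdf_search_url(pdf_url: str, words: List[str]) -> str:
    """Append a search= open parameter with a quoted, space-separated word list."""
    word_list = quote(" ".join(words), safe="")  # encodes the space as %20
    return f'{pdf_url}#search="{word_list}"'
```

Calling `pdf_search_url("http://example.com/manual.pdf", ["123456"])` produces a link that opens the document and runs the automatic search for that one word.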
