Searching pdf files
It is apparent that the search engine (find) does not "find" keywords in some files. Why is this occurring?
You would have to share one of the affected PDFs with others for a good look-see to get a more meaningful answer.
Some possibilities:
Characters not mapping to unicode.
PDF is a scanned image with no OCR output.
PDF is a scanned image processed with ClearScan - some characters not recognized and left as a bitmap image.
Be well...
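As a rough way to tell the first two possibilities apart, one can look at what a text extractor returns for the file. The sketch below is only a heuristic under assumptions that are not in the original post: the text has already been pulled out of the PDF by some other tool, and the function name `diagnose_extracted_text` and the 20% threshold are hypothetical. It cannot detect the ClearScan case, where only a few glyphs are left as bitmaps.

```python
import unicodedata


def diagnose_extracted_text(text: str, bad_ratio_threshold: float = 0.2) -> str:
    """Guess why a PDF is unsearchable from the text an extractor returned.

    `text` is assumed to come from an external extraction tool;
    `bad_ratio_threshold` is an arbitrary illustrative cutoff.
    """
    visible = [ch for ch in text if not ch.isspace()]
    if not visible:
        # Nothing but whitespace: probably a scanned image without OCR output.
        return "no text layer (likely a scanned image without OCR)"

    def is_unmapped(ch: str) -> bool:
        # U+FFFD, private-use (Co) and unassigned (Cn) code points usually
        # mean the font's glyphs were not mapped to real Unicode characters.
        return ch == "\ufffd" or unicodedata.category(ch) in ("Co", "Cn")

    bad = sum(1 for ch in visible if is_unmapped(ch))
    if bad / len(visible) >= bad_ratio_threshold:
        return "characters not mapped to Unicode"
    return "text layer looks searchable"
```

If the function reports a searchable text layer and keywords are still missed, sharing the file for a closer look remains the better route.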
Similar Messages
-
Indexing and Searching PDF Files
Hi All,
I am trying to store and search PDF files in an Oracle database.
I can insert and index the PDF files just fine, but my searches never return any results. I always get No Rows.
Here's what I am doing and the issues I am facing.
I created a Table with fields
ID (VARCHAR)
NAME (VARCHAR)
DOC (BLOB)
I inserted the PDF file into the BLOB field through a Java program, and the insert worked fine: I verified it by retrieving the PDF and writing it to a file.
I created the index using the following SQL:
create index my_index on PDF_TABLE(PDF_FLD) indextype is ctxsys.context
parameters ('datastore ctxsys.default_datastore
filter ctxsys.inso_filter');
The index was created successfully without any problems.
I ran the query as follows and got no rows, although the searched text is in the PDF:
SELECT SCORE(1), PDF_FLD from PDF_TABLE WHERE CONTAINS (PDF_FLD, 'Table of Cotents',
1) > 0;
I tried alternate queries as well with no luck.
Any ideas ??
Thanks
After creating the index, you need to run the following checks.
First, check that your index table contains indexed terms. Execute
select token_text from dr$YOUR_INDEX$i;
Second, check the index errors table CTX_INDEX_ERRORS. It is owned by the user CTXSYS, and most users do NOT have SELECT privilege on it by default.
If both look OK, then check that your PDF documents are supported by the INSO filter.
Citation:
"PDF - Portable Document Format
Acrobat Versions 2.1, 3.0, 4.0, and 5.0 including Japanese PDF"
(Appendix B. Supported Document Formats in Oracle Text Reference 9.2)
For Oracle 9i you could install the 9.2.0.4 patchset (it included INSO FILTER 7.5).
P.S.
To begin with, you can find answers to your questions about Oracle Text here:
http://otn.oracle.com/products/text
Sorry for my English.
Best regards, Victor Zogin. -
How to search pdf files in another language?
I have been trying to search large Arabic documents, but no results come up.
Yes, it is happening in all PDF files that display Arabic writing; see
the attached file as an example. When I search for any Arabic word I don't get any results
in either the basic or the advanced search box.
Also, I am using Windows 7 Home Premium as the operating system on my computer.
Note: Select any Arabic word from the enclosed text and search for it, and see
if you can get a result. -
Cannot search PDF file contents - Windows 7 32 bit - Adobe Acrobat X
Hello,
If this is in the wrong forum please move it.
I work in an enterprise environment, and our systems are having trouble searching file contents in Windows Explorer using Acrobat X on Windows 7 32 bit. The files are on a mapped network location.
After removing all Adobe products from a test machine and reinstalling the Acrobat 10.0.0 software, the Windows Explorer search function seems to work locally, but once I install any Acrobat 10.1.x update, it fails. It never worked on a networked location.
I have also tried installing Adobe Acrobat 11.0.00 after removing 10.0. Then I made sure my indexing settings were set up to index files and contents, and made sure the .pdf extension was selected under file types.
I then created a mapped drive to the network location and set up my indexing to cover the folder on that network drive. I was able to do this by installing this Microsoft add-in that allows the use of UNC paths in the indexing.
http://www.microsoft.com/en-us/download/confirmation.aspx?id=3383
Once I set this up, I rebuilt my index and restarted the computer. This is where it gets weird. I can now search the contents of PDF files in this indexed network location, but only by one letter. Searching "c type:pdf" will produce results, but "co type:pdf" will not. I know for sure some of the documents have the word Comment in them, so this should show up.
Does anyone have experience getting this to work correctly with the latest versions of Adobe Acrobat X or XI and Windows 7 32 bit? It would be greatly appreciated.
Thank you.
I will never understand why, but in the end I rebuilt my 32 bit Dell laptop from scratch and the PDF files can now be searched.
I cannot search them on a mapped drive as I could with Windows XP, because now they must be indexed to be searchable, and Windows 7 seems not to allow a mapped location to be indexed. So I have had to move the files to the local drive.
My Windows 7 64 bit systems can search the mapped drives just fine without needing them to be indexed. Again, I will never understand why this works and the 32 bit machine does not. -
Searching PDF File hindered by inserted spaces
In the PDF files I have, there are numbers like "H123456789" that I would like to search for. When I search for the number H123456789 in Adobe 8, it is not found. If I cut and paste the number from the document, the number that looks like "H123456789" in the document looks like "H 1 2 3 45 6 7 8 9" in the search field. Is there a way to stop the inserted spaces, or to force the search to find that number?
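One possible workaround, if the text can be extracted from the PDF by some other tool, is to search with a pattern that tolerates whitespace between every character. The sketch below assumes that extraction step has already happened; `spaced_pattern` and `find_spaced` are hypothetical helper names, and this is not a feature of the Adobe search box itself.

```python
import re


def spaced_pattern(term: str) -> re.Pattern:
    """Compile a regex that matches `term` with optional whitespace between characters."""
    compact = "".join(term.split())  # drop any spaces already in the search term
    if not compact:
        raise ValueError("search term is empty")
    # Escape each character and allow any run of whitespace between neighbours.
    return re.compile(r"\s*".join(re.escape(ch) for ch in compact))


def find_spaced(term: str, extracted_text: str) -> list[str]:
    """Return every occurrence of `term`, spaced out or not, in the extracted text."""
    return [m.group(0) for m in spaced_pattern(term).finditer(extracted_text)]
```

For example, `find_spaced("H123456789", "Part no. H 1 2 3 45 6 7 8 9 rev B")` returns the spaced-out form exactly as it is stored in the file.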
Sorry for being unable to reply in the forum. I cannot find the reply button.
Yes, I can select words, but the selected contents are all strange
characters.
I also tried converting to a Word file, and the resulting file was in
strange characters.
I attach a snapshot of the PDF document properties. -
Search pdf files with Windows Desktop Search
I recently installed Windows 7 and have attempted to use Windows Desktop Search to search for text contained in PDF files, without success. Today I found a reference to Adobe PDF IFilter v6.0 at http://www.adobe.com/support/downloads/detail.jsp?ftpID=2611. I have Acrobat 8.2.1 installed on my computer. Since I couldn't figure out whether the IFilter had already been installed, I downloaded the ifilter60.exe file. However, when I ran this exe file I didn't see any evidence that the filter was installed. For example, the above web page mentions a ReadMe file which is "included with the download", but I don't know where it or the plugin can be found. I'm still unable to search within PDF files. Is there anything special I need to do to install this IFilter?
Peter
Can't help directly on the question. However, the generic problem in Windows is that the search will only delve into a list of file types designated by MS. I spent a year looking for a file I had misplaced and whose name I did not remember. The following link resolved my issue, and now I can search for anything in any file. I followed the registry edit in the response. Editing the registry can be dangerous, and you should back up the registry at a minimum unless you know what you are doing. I also do not know if the fix is available in Windows 7. As I recall, I did find it in Vista. Good luck if you use this, but also be careful. Bill
http://www.winvistatips.com/search-inside-bas-and-frm-files-t570174.html -
Searching PDF files viewed thru Safari
I know I can view PDF files in Safari on my 3GS but is there a way to search them without having to download them onto the phone?
Thanks,
Scott
No. All you have is a viewer for PDFs, not something that can interpret content and create a searchable index.
And you cannot save to your iPhone in any case: it cannot store files other than those attached to emails or those in the Photo library. -
How Best to Search pdf Files Over the Web
I have a large number of pdf files. (about 73,000). These are scans of old newspapers that have been OCR'd and saved as pdf. I work in a library and need to find the best way to make these text-searchable through my library's website.
Do I need to create an index?
What I would like to do is have a search box on my web page where a user can enter a keyword, and pull up the pages that contain that word.
Any suggestions on how best to do this are greatly appreciated!!
Thanks!
Google can do it for you if you use their search engine as the basis of your library search capabilities.
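On the "do I need to create an index?" part of the question: a keyword search box over that many pages generally does rest on some kind of index, whether a hosted search engine builds it or you do. A minimal sketch of the idea follows, assuming the OCR text of each page is already available as a string; the page IDs and function names are made up for illustration, and a real deployment would more likely use an existing search engine than this toy.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    """Lower-case the text and split it into simple alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages: dict[str, str]) -> dict[str, set[str]]:
    """Map each word to the set of page IDs whose OCR text contains it."""
    index: dict[str, set[str]] = defaultdict(set)
    for page_id, text in pages.items():
        for word in tokenize(text):
            index[word].add(page_id)
    return index


def search(index: dict[str, set[str]], query: str) -> list[str]:
    """Return the sorted page IDs that contain every word of the query."""
    words = tokenize(query)
    if not words:
        return []
    hits = set.intersection(*(index.get(w, set()) for w in words))
    return sorted(hits)
```

The index is built once, and each query then becomes a few set lookups instead of a scan through all 73,000 files.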
-
Searching PDF files in a folder
I am trying to search a folder containing several thousand pdf files using the search facility with Adobe Reader but keep getting nil results. I get the same result if I open a pdf and search within that document. Can anyone help!
Hi,
Yes the pdf's are scanned using a Canon ir1020 scanner. I cannot find text on any of the pdf's.
Cheers -
TREX does not search PDF files
Hi,
we have another problem with TREX 6.0.
Our file repository is working fine and search also works for .txt files, but it doesn't work for PDF files. Our PDF files are indexed correctly, but there are no results for this kind of file when we search.
What can we do?
Kind regards
Thomas
Your situation may already be solved. However, some things I did not see in the details: How many PDFs were being indexed? What was the size of the files? Did you check the TREX Monitor to ensure all the PDFs had been sent through the entire system? In the crawler monitor, did it state that it found the number of files you believe to be in the index? By default, TREX holds documents in a queue for 30 minutes between processes unless you either reset this property or flush the queue.
There is a document, TREXRecomenations, which gives some very good tips with regard to file size and other common settings. For PDF it states:
You want to index very large documents in PDF format from Adobe. These documents are not being indexed because they fail to pass the preprocessing stage.
Limitation: PDF is a complicated file format to preprocess. Typically PDF files larger than 15 MB cause problems. The time taken for preprocessing and filtering rises to over an hour and the process delivers bad results.
Recommendation: You should avoid the indexing and processing of PDF files that are larger than 15 MB.
If you cannot find this document, let me know and I can forward it to you -
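To see which files in a repository folder would fall under the 15 MB recommendation quoted above, a small script along these lines might help. It is only a sketch: `split_by_size` is a hypothetical helper, treating 15 MB as 15 * 1024 * 1024 bytes is an assumption, and it has nothing to do with TREX's own tooling.

```python
from pathlib import Path

# Assumption: "15 MB" is interpreted as 15 * 1024 * 1024 bytes.
MAX_BYTES = 15 * 1024 * 1024


def split_by_size(folder, max_bytes: int = MAX_BYTES):
    """Split the PDFs under `folder` into (within limit, too large) lists."""
    ok, too_large = [], []
    # rglob walks subfolders too; the "*.pdf" pattern may be case-sensitive
    # depending on the platform.
    for path in sorted(Path(folder).rglob("*.pdf")):
        if path.stat().st_size <= max_bytes:
            ok.append(path)
        else:
            too_large.append(path)
    return ok, too_large
```

Files in the second list are the ones to keep out of the crawl or to split before indexing.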
Hi,
I am using CF10 with SQL Server 2012. I am building a site for my client that includes catalogues of PDF files. What is the best way to achieve this with ColdFusion? In the past, the links to the PDF files were bookmarked and remained accessible to users after their subscriptions had expired.
I used Verity 7 years ago (CF MX, I think) to do the search. If the user is subscribed to a certain catalogue, a link is activated within the search results and the user is able to view the entire file. If not, only a highlight will come up, asking the user to subscribe.
What is the best way to achieve this:
1. Using SQL Server BLOB fields? Will Verity be able to search PDFs inside BLOB fields?
2. SQL Server Full Text Search? How will we return the results to ColdFusion?
3. Keep the PDF files outside the database? What is the best technique to secure and search the files?
Any other suggestion is appreciated.
Thanks So Much -
Windows 7 x64, can't search PDF files
I am running Windows 7 x64. I have installed Reader version 9.3.1. It seems to work OK, but Windows search does not index PDF files (this works fine on my Vista x86 box).
I checked the Indexing Options, and I notice that under Advanced, File Types, "pdf" it says: "Registered IFilter is not found".
How can I fix this?
There may be an answer here:
http://blogs.adobe.com/acrobat/2008/12/adobe_pdf_ifilter_9_for_64bit.html -
Indexing and Searching pdf files which are used as attachment in an Announcement list item
Hi all,
I am using a SharePoint 2013 online environment and trying to search for PDF files which are attached to an announcement list item. However, it does not find anything when I search for the name of the PDF file or the content of the PDF file.
When I attach a Word document to the list item, it gets indexed and the search finds the file.
Thanks, and I appreciate every kind of advice.
Are you able to search for PDFs in other locations? SharePoint 2013 comes with an iFilter out of the box, unlike 2010, which needed configuration.
-
Search pdf files from excel or text edit list
I have a folder called "Photos".
In that folder I have many (say 100+) files like
1234567_001
1234567-001.01
1234567_001.02
1234567_123456
11111111_001
11111111-001.01
11111111_001.02
11111111_123456
2222222_001
2222222-001.01
2222222_001.02
2222222_123456
6666666_001
6666666-001.01
6666666_001.02
6666666_123456
and so on
I have a list in Excel (or in a text editor) that contains a selected few IDs (say 20), such as 11111111 and 2222222, in separate cells.
My requirement is to find all the files for those IDs, irrespective of the 001, 002, etc. suffix, and copy and paste them into a new folder,
so that I get the 20 main files and their associated files like 001, 002, etc.
Kindly help!!! Thanks in advance!!!
Is your Excel file in a location outside your machine's disk? If so, try copying it to the local disk.
Does the Excel file have any protection applied to it? If so, try removing the protection.
If these don't work, can you post a sample file that demonstrates the problem? -
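The copy-by-ID task described in the question could be sketched roughly as follows. The assumptions here are not from the post: the IDs from the Excel list have been exported to plain values (one per cell or line), an ID is whatever precedes the first "_" or "-" in a file name, and `copy_matching` is a hypothetical helper name.

```python
import re
import shutil
from pathlib import Path


def copy_matching(source_dir, dest_dir, wanted_ids):
    """Copy every file whose leading ID is in `wanted_ids` into `dest_dir`."""
    wanted = {str(i).strip() for i in wanted_ids if str(i).strip()}
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    copied = []
    for path in sorted(Path(source_dir).iterdir()):
        if not path.is_file():
            continue
        # Assumption: the ID is the part of the name before the first "_" or "-".
        prefix = re.split(r"[_-]", path.name, maxsplit=1)[0]
        if prefix in wanted:
            shutil.copy2(path, dest / path.name)
            copied.append(path.name)
    return copied
```

With the IDs saved to a text file, a call such as `copy_matching("Photos", "Selected", Path("ids.txt").read_text().split())` would collect the main files and their 001, 002, etc. companions; the folder and file names in that call are placeholders.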
How to highlight searched words in pdf file opens from web??
Hello fellows
I need to create a web page that holds a couple of controls, including a "search" button. When the client clicks the button, the program will search a PDF file and open it in the client's reader with the searched words highlighted.
Is there any possibility of doing this using JavaScript on the client, or C# or VB.NET on the server?
I'm a complete novice in PDF development, so any ideas or conclusions would be very useful!
Thank you for the answer.
But maybe you have an idea of how I can do it?
This is a very important feature for our project.
Some additional information: our project will be hosted in SharePoint 2007.
When I use the Adobe IFilter for MOSS and set #search="searchword" in the query string, I get highlights, but only in a browser.
If the reader is configured to open in the client application, this feature doesn't work.
Any suggestions?
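For the in-browser case that does work, the link with the `#search="..."` part can be generated on the server. The sketch below only builds the URL string; `pdf_search_url` is a hypothetical helper, percent-encoding the words is an assumption about what the viewer accepts, and it does nothing for a reader that opens the file as a separate client application.

```python
from urllib.parse import quote


def pdf_search_url(pdf_url: str, words: list[str]) -> str:
    """Append a #search="..." fragment listing the words to highlight."""
    phrase = " ".join(w.strip() for w in words if w.strip())
    if not phrase:
        return pdf_url  # nothing to highlight, leave the link unchanged
    # Assumption: spaces and other unsafe characters are percent-encoded.
    return f'{pdf_url}#search="{quote(phrase)}"'
```

The example.com address used when trying this out is a placeholder, not a URL from the project.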