Preview, OCR and PDF

Hello,
I'm having some trouble with PDF files.
I have some PDF files that were scanned in "bitmap mode".
I used a Windows program (PDF-XChange Viewer) to OCR the files (if you know of any free software that can do this on OS X, let me know). As I understand it, the tool adds an invisible text layer.
After that, I noticed that Spotlight can index these files and I can search for words in them. So far, so good.
But if I make any change to one of these PDFs in Preview (rotating a page, for example), the text layer becomes corrupted throughout the whole file.
I can still select some text and copy it, but when I paste it I get strange characters (and Spotlight can no longer index it).
Thank you.
PS: Excuse my English...
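One way to check whether the hidden OCR text survives a save in Preview is to extract it from the original and from the Preview-saved copy and compare the two. This is a minimal sketch that assumes the free poppler command-line tools (which provide `pdftotext`) are installed, for example through Homebrew; they are not part of OS X. The helper name `pdf_text_preview` is hypothetical.

```shell
# pdf_text_preview FILE [LINES]
# Hypothetical helper: print the first LINES lines (default 5) of the text
# layer on page 1 of FILE, using poppler's pdftotext.
pdf_text_preview() {
  local pdf=$1 lines=${2:-5}
  command -v pdftotext >/dev/null 2>&1 || { echo "pdftotext not found" >&2; return 2; }
  [ -r "$pdf" ] || { echo "cannot read $pdf" >&2; return 1; }
  # -f 1 -l 1 limits extraction to the first page; "-" sends the text to stdout
  pdftotext -f 1 -l 1 "$pdf" - | head -n "$lines"
}
```

Running `pdf_text_preview original.pdf` and then `pdf_text_preview rotated.pdf` (placeholder file names) shows whether readable words turn into garbage after the save; if they do, the problem lies in the text layer itself rather than in Spotlight.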

In my tests, I noticed that Preview changes my font:
Before the modification: ArialMT (Embedded Subset)
After the modification: font0000000016c03819 (Embedded Subset)
Why is Preview doing this? And what is this font?
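Before guessing at why Preview renames the font, it can help to compare the font tables of the two files. poppler's `pdffonts` (assumed installed; it is not part of OS X) prints one row per font with `emb`, `sub` and `uni` columns, where `uni` shows whether the font carries a ToUnicode map. Copy/paste and text indexing depend on mapping glyphs back to characters, so a `uni` flag that differs between the two files would be one plausible lead, not a confirmed cause. The hypothetical `font_report` helper below condenses that table; because the `type` column can contain spaces (for example `Type 1C`), it counts the flag columns from the end of each row.

```shell
# font_report: hypothetical helper that reads `pdffonts` output on stdin and
# prints each font's name (6-letter subset prefix such as "ABCDEF+" removed)
# with its embedded / subset / ToUnicode flags.
font_report() {
  # Skip the two header lines. Each row ends with:
  #   ... emb sub uni <object number> <generation>
  awk 'NR > 2 {
    name = $1
    sub(/^[A-Z][A-Z][A-Z][A-Z][A-Z][A-Z][+]/, "", name)
    printf "%s emb=%s sub=%s uni=%s\n", name, $(NF-4), $(NF-3), $(NF-2)
  }'
}
```

Compare `pdffonts original.pdf | font_report` with `pdffonts rotated.pdf | font_report` (placeholder file names) to see which flags change along with the font name.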

Similar Messages

  • Preview (app) and PDFs generated by Photoshop CS3

    (Apologies if this is the wrong forum for this question; I didn't see another more appropriate category to post this)
    Has anybody else here had problems viewing PDFs created with PS CS3 in Preview? Any type that was not rasterized in PS disappears! The PDFs display correctly in Acrobat or Reader, so I know the type is there and readable. It seems like a glitch in Preview.
    Maybe this will be fixed in Leopard?

    I made a small PS CS3 file using a couple of fonts, Apple Casual and Stencil, and used the PDF option in the print dialog box. The text was there when the document was opened in both Preview and Acrobat Reader (8.1). Both of these are TrueType fonts, so I also included one PS font, Belo, and it showed up as well.
    What font type are you using?
    Yes, I know that to modify text in PS it has to be rasterized.
    It could be a problem with the PS document setup.
    Make a new file, put the text in question on it by itself, and see if it shows up in Preview.

  • OCR and PDFs

    I have a large bundle of PDFs whose text is stored as images, and it would be nice to OCR them so they become plain text that I can put into OpenOffice. I can save the images and put them through gocr the old-fashioned way, but that is very long-winded and time-consuming. Does anyone know of a way to just shove a PDF through gocr?

    It was one of the first things I tried, but the resulting Word document is not presentable.
    In the PDF, the text looks like this:
    this is one line of the document in the pdf and
    here we go with the second
    In the DOC, it looks like this:
    this is one line
    of the document
    in the pdf and
    here we go
    with the second
    with a LOT of space, and I can't convert it again to PDF or EPUB without carrying over the line breaks and ending up with half of each page blank and unused.

  • Preview.app increases PDF file size after deleting pages

    Hello, I'm experiencing odd behavior with Preview.app and PDFs.
    If I open a PDF in Preview, delete a page, and then save the file, the file size increases anywhere from 2x to 20x. This happens both with PDFs that contain only text and with PDFs that contain text and graphics. It is very frustrating: I start with a 150KB file, remove some pages, and end up with a 10MB monster that I can't email to people.
    Any help is appreciated. I can post a link to a test PDF for people to try to replicate with if it would be useful.

    I generated the file with pdftex. I'm guessing there are different ways to encode a PDF, and when Preview gets a file encoded differently from what PDFKit produces, it rewrites the file the way it likes. In my case this increases the file size. I tried to find documentation about PDFKit on the Apple developer site, but couldn't find any details about how it encodes a PDF.

  • How to examine pdf for image file formats, OCR, and layers

    Hi,
    I have a question about how to find out specific features of a PDF. I am using Adobe Acrobat 9 Pro for Windows.
    For any given PDF, I am looking to find out:
    a]  The specs of any image files used to create the PDF (i.e., if the PDF is made up of text pages with image objects on top of them, are those image objects JPGs? TIFFs? What resolution are they? Are they compressed?)
    b]  Does this PDF already have OCR text embedded in it?
    c]  Does this PDF have multiple layers?
    For a], file format is probably the most important, with resolution and compression second.
    For c], is there an easy way to see the layers visually on the page images?
    What is a simple way to find out this information?
    I've poked around a bit in the "Examine Document" function, various checks in "Preflight," and the help manual, and have found bits and pieces that look like some of what I'm looking for, but nothing simple or conclusive yet.
    Any help or advice would be wonderful! I just want a simple way to see what my PDF is made of, in terms of image files, OCR, and layers.
    Thanks,
    Andrew

    Preflight is indeed the best way to do this. You may have to read a bit to figure out what you are looking at, but that is the right route to take.

  • QuickLook / Preview and PDFs with hidden layers

    Hidden layers in PDF files show up in Preview, Quick Look, and the generated thumbnails. Is there any known fix so that the content renders with only the visible layers?

    It would help to look at a sample. Can you post one?
    Also, check the "Show Large Images" Page Display preference.

  • Attaching Word and PDF files in Mail without preview in the middle of a message

    I use Mail and often attach Word docs and PDFs. Word attachments appear as small icons, but PDFs appear as previews, usually in the middle of the message. Can I attach PDFs as small icons instead of having them in the middle of a message? Having just moved to Mac from PC, I miss the tidiness of Outlook, but if you can help me I'll persist.

    I found this solution in a previous thread:
    1. Open Terminal (Applications > Utilities > Terminal)
    2. Copy and paste this line after the $ prompt:
    defaults write com.apple.mail DisableInlineAttachmentViewing -bool yes
    3. Hit Return.
    4. You may need to restart Mail for the change to take effect; it worked for me without restarting Mail.
    5. To change it back, run:
    defaults write com.apple.mail DisableInlineAttachmentViewing -bool false

  • Scanned and OCR'd PDF--OCR content is not indexed

    I am setting up a new SharePoint 2013 install and have put a handful of files in a document library to test search. The content has been indexed, and I can find content inside many files and file types without issue, including "native" PDF files.
    However, it doesn't seem to index the content of a scanned and OCR'd (text with image overlay) PDF. I have verified that the OCR text is indeed there by copying and pasting phrases, and I have also confirmed that the crawl log shows the file as successfully crawled. The filename is also indexed.
    So it would seem that the SharePoint 2013 indexer does not index the text in scanned and OCR'd PDF files. Am I missing something? Can anyone else confirm this behavior?
    Thanks!
    Ryan

    To clarify:
    - From what I've read, iFilters can still be installed, but as Mikael said, they can't override the built-in file format handlers in 2013. 2013 has a built-in handler for PDFs, whereas previous versions required a PDF iFilter to index PDFs that have text content. If one could install the Adobe PDF iFilter in 2013 successfully, it would resolve the issue in this thread, but PDF iFilters don't work in 2013.
    - Aquaforest makes a product that OCRs PDF files. It takes an image-only PDF and makes the file searchable, but it is not an indexer. Rather, it enables an index engine to make a large collection of OCR'd PDF files searchable.
    - The built-in PDF handler in 2013 does index native PDFs. It does not index OCR'd PDF files.
    So that's the issue for which I submitted the ticket to Microsoft. In our case, we don't need to OCR our PDF files; they are already OCR'd. But they don't show up in searches.
    (Regarding Aquaforest: I've talked with someone there previously, for a non-SharePoint DMS, and they seem to make a cool product, but I don't have any personal experience using it.)

  • Since I connected my iPad to my MacBook through iCloud, I have problems opening PDF and JPG files in Preview. Does anyone have a solution for this?

    I'm not sure I understand the connection between iCloud and this problem. Is it simply a matter of coincidental timing (i.e., the problem started after you set up iCloud)? If so, there's almost certainly no connection between these two events.
    Where are you getting the files you're trying to open, and what specifically happens when you try to open them?

  • Mismatch in print layout between direct print, print preview and PDF!

    Hi,
    I have created a smartform. When I run the standard transaction VF03, select any output type, click the print preview button, and then enter PDF! in the address bar, the print preview is converted to PDF format, but the output shows no bold, only normal fonts.
    Also, when I run transaction VF03 and click the Print button directly, the printout is as required, but it overwrites part of the end of the page where the company address is preprinted on the stationery. How can I make the print format the same in both cases without overwriting?
    Thanks.

    Hi friend,
    The one that comes out in the print (hard copy) is the correct one.
    The print preview may be different.
    So if the print comes out overlapping, check whether the windows are aligned properly, because if you have put data in the main window, it can grow and overlap another window.
    If that is correct, check the printer settings. Only these would cause the print to be distorted.
    Check this and get back to us if you have any issues; we will help you.
    Thanks,
    Sri Hari

  • How to set Preview as the default PDF viewer from iTunes?

    I recently installed Acrobat CS3 on my Mac, and it has taken over as the default reader whenever I open a PDF. However, I much prefer Preview, simply because it is a million times faster when all you want to do is view the file.
    I have managed to stop Acrobat from taking over PDFs in Safari, but I can't find a way to make my music booklets open in Preview again instead of Acrobat.
    Please help, as I have a big presentation tomorrow about podcasting and I wanted to demonstrate the seamless nature of Macs!
    cheers

    See if this thread helps:
    http://discussions.apple.com/message.jspa?messageID=6830346#6830346
    Joe

  • Preview not opening PDF files

    Hi. For a while now (on Lion as well as now on Mavericks), my MacBook Air has been unable to open PDFs in Preview. When I right-click a PDF and choose Open With, Preview isn't even listed. When I go to Other, Preview is greyed out in the Recommended Applications list. I can select All Applications and then select Preview, and Preview's icon appears in my Dock, but nothing is displayed. If I open Preview and try to open a PDF from within it, the PDF file is greyed out. I was told a while ago to install Adobe Reader so I could at least read PDFs in another app until this is fixed. I was hoping the Mavericks install would fix this, but it has made no difference. I have also tried repairing disk permissions, to no avail. A friend also put their copy of Preview on a USB stick for me to try, but it does exactly the same thing on my MacBook, yet works perfectly on theirs.
    Is there anything I can do to fix this?

    Never mind. I uninstalled Adobe Reader, then found that another app I'd downloaded a while ago, which basically just scanned the file, was set as the default reader. Uninstalling that resolved the issue. Preview was then set as the default, and PDFs became viewable in Quick Look and Preview.

  • Adobe OCR and Field Output

    Hello,
    At work we need to scan hundreds of old forms and transfer the data in their fields onto new forms. Doing this by hand would be very time-consuming.
    How could I use Acrobat's OCR to pull the data from the respective fields and dump it into a new document with similar field names?
    Thanks,

    "Old forms" typically means the quality of the source paper is less than optimal. The scanned image will reflect this.
    With that said, it will be well worth the time to pull a useful, representative sample from the OCR'd PDF files and export the OCR to a plain text file.
    Character is the operative word in optical character recognition. Characters include things other than language characters (the a - z stuff).
    An image of only textual content will typically yield pretty good results. However, older paper, or text mixed with other shapes (lines, curves, whatnot), adversely affects the signal-to-noise ratio.
    An example:
    Don't assume OCR gives a 1:1 congruence with the image of each language character or, for that matter, a useful image.
    The density of each character is a variable (OCR might "see" a W as V~).
    Contrast between character and background is a variable.
    Scan resolution is a variable.
    What George describes is not, in all cases, an inherently "simple" piece of work when the focus is on the OCR output.
    Even if it is done, you may find you have GIGO (garbage in, garbage out) in full bloom. So pull a good sample, export it to plain text, and evaluate.
    You may find that defaulting to a group with nimble fingers on the keyboard is more timely and less expensive.
    Be well...

  • OCR and oleautomation

    Folks, I've been unable to find a clear answer to a problem I'm having. I've been tasked with converting our current application (a Win32 executable) from an Acrobat 6 OLE automation object to an Acrobat 8 one. The converted application must also support OCR of existing .PDF files.
    I've downloaded a trial version of Acrobat 8 and have been able to use it to produce an OCR'd version of a file. I've also downloaded the SDK along with its documentation. Can anyone confirm that the Acrobat 8 SDK does not provide OCR support through OLE automation? Can anyone confirm that, if I were to stick with Adobe, I would also need a separate package (Acrobat Capture v3.0) to get this functionality?
    Frankly, it looks as if I will need some other product to convert our .PDF files into OCR'd versions (and just stick with Adobe as the reader/raw writer). Thanks much for any light you can shed on this issue.
    T. Smith
    05/05/2008

    I apologize for not being clear in my original posting. The existing application has no support for OCR generation. It does allow for reading/writing/viewing of a .PDF file. The requested change is for the application to allow the user to OCR the .PDF.
    So, it sounds like there is definitely no OLE support from Adobe/Acrobat to OCR a .PDF (I would need to switch architecture to plugins somehow). Appreciate the info. Thanks much.

  • OCR and Extract

    Hi,
    I have two questions:
    1) When I try to "Recognize Text Using OCR" on a document (a collection of scanned images of invoices), the result is extremely poor, with most characters not recognized at all. The image quality is not great, but I have been able to get better results by a) printing an invoice and scanning it with OCR, and b) using Able2Extract from www.ocr.com. However, a) would take too long, and b) doesn't seem to have an option for saving OCR'd pages as PDF.
    My question is: is it possible to OCR a lower-quality PDF more reliably in Acrobat? If not, is there other software that can do this?
    2) Is it possible to extract pages that satisfy certain conditions? For example, can you extract all pages that contain any kind of notes or markings (e.g. Acrobat highlights) or pages that contain certain words?
    Thanks in advance for your assistance.

    An answer to question 2 is a conditional yes. The condition is that you have a version which is not Reader (which I assume you do), and also you will need a custom-made script that will determine which pages to extract. Contact me by email if you're interested in such a tool.
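For the "OCR and PDFs" thread above, which asks how to push a PDF straight through gocr: one approach is to rasterize each page to a greyscale PGM image with poppler's `pdftoppm` and run `gocr` on each image. This is a sketch rather than a tested pipeline; it assumes both tools are installed, and the function name `ocr_pdf_with_gocr` and the 300 dpi resolution are illustrative choices.

```shell
# ocr_pdf_with_gocr IN.pdf OUTDIR
# Hypothetical helper: rasterize every page with pdftoppm, OCR each page
# image with gocr, and concatenate the results into OUTDIR/all.txt.
ocr_pdf_with_gocr() {
  local pdf=$1 outdir=$2 img tool
  for tool in pdftoppm gocr; do
    command -v "$tool" >/dev/null 2>&1 || { echo "$tool not found" >&2; return 2; }
  done
  [ -r "$pdf" ] || { echo "cannot read $pdf" >&2; return 1; }
  mkdir -p "$outdir" || return 1
  # -gray writes PGM files named OUTDIR/page-N.pgm; pdftoppm zero-pads N
  # for longer documents, so the glob below should follow page order
  pdftoppm -r 300 -gray "$pdf" "$outdir/page" || return 1
  for img in "$outdir"/page-*.pgm; do
    gocr -i "$img" -o "${img%.pgm}.txt" || return 1
  done
  cat "$outdir"/page-*.txt > "$outdir/all.txt"
}
```

With this, `ocr_pdf_with_gocr scans.pdf out` (placeholder names) would leave one text file per page plus the combined `out/all.txt`.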
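The line-break complaint in the "OCR and PDFs" thread comes from converted text that keeps the PDF's hard line ends. A rough cleanup before re-exporting to PDF or EPUB is to join consecutive lines and keep blank lines as paragraph breaks. This sketch assumes paragraphs in the extracted text are separated by blank lines, which will not hold for every file, and it does not rejoin words hyphenated across lines; `unwrap_lines` is a hypothetical name.

```shell
# unwrap_lines: hypothetical helper that joins hard-wrapped lines from stdin
# into one line per paragraph; blank lines are kept as paragraph breaks.
unwrap_lines() {
  awk '
    NF == 0 {                                  # a blank line ends a paragraph
      if (para != "") { print para; print ""; para = "" }
      next
    }
    {
      sub(/^[ \t]+/, ""); sub(/[ \t]+$/, "")   # trim each wrapped line
      para = (para == "") ? $0 : para " " $0
    }
    END { if (para != "") print para }
  '
}
```

For example, `unwrap_lines < converted.txt > unwrapped.txt` (placeholder names) turns the five short "DOC" lines quoted in that thread into a single line.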
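For question a] in the "How to examine pdf for image file formats, OCR, and layers" thread, Preflight is the in-Acrobat answer. Outside Acrobat, poppler's `pdfimages -list` prints one row per image, including its encoding (`enc`, e.g. `jpeg`, `jp2`, `jbig2`, `ccitt`, or `image` for other data) and resolution (`x-ppi`, `y-ppi`). The hypothetical `image_summary` helper below counts images per encoding and resolution; the column positions it uses assume the current `pdfimages -list` layout.

```shell
# image_summary: hypothetical helper that reads `pdfimages -list` output on
# stdin and counts images per (encoding, x-ppi x y-ppi) combination.
# Assumed column layout:
#   page num type width height color comp bpc enc interp object ID x-ppi y-ppi size ratio
image_summary() {
  awk 'NR > 2 { count[$9 " " $13 "x" $14 " ppi"]++ }   # skip the 2 header lines
       END { for (k in count) print count[k], k }' | sort -k2
}
```

Usage would be `pdfimages -list scan.pdf | image_summary` (placeholder file name). For question b], a scanned PDF for which `pdffonts` lists no fonts most likely has no OCR text layer.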
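The advice in the "Adobe OCR and Field Output" thread to export a sample to plain text and evaluate it can be made a little more concrete by counting how many recognized words appear in a word list. A high share does not prove the fields were read correctly, but a low one flags samples where the OCR produced noise such as "V~" for "W". The word-list path (for example `/usr/share/dict/words` on OS X) and the function name `ocr_dictionary_ratio` are assumptions for illustration.

```shell
# ocr_dictionary_ratio TEXT_FILE WORD_LIST
# Hypothetical helper: report how many words in TEXT_FILE (letters only,
# case-insensitive) appear in WORD_LIST (one word per line).
ocr_dictionary_ratio() {
  awk '
    NR == FNR { dict[tolower($0)] = 1; next }   # first file: load word list
    {
      line = tolower($0)
      gsub(/[^a-z]+/, " ", line)                # keep letters only
      n = split(line, w, " ")
      for (i = 1; i <= n; i++) { total++; if (w[i] in dict) known++ }
    }
    END {
      if (total == 0) { print "0 of 0 words"; exit }
      printf "%d of %d words (%.0f%%)\n", known, total, 100 * known / total
    }' "$2" "$1"
}
```

For instance, `ocr_dictionary_ratio sample.txt /usr/share/dict/words` reports the share for one exported sample; comparing that figure across scan settings gives a rough signal of which settings help.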
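Question 2 in the "OCR and Extract" thread (which pages contain certain words) can also be approached outside Acrobat once the pages have a text layer. poppler's `pdftotext` ends each page's text with a form feed, so splitting its output on `\f` gives one record per page. This sketch prints the numbers of the pages whose text contains a word; it does not detect annotations such as highlights, and `pages_with_word` is a hypothetical name.

```shell
# pages_with_word WORD
# Hypothetical helper: read pdftotext output on stdin (pages separated by
# form feeds) and print the number of every page containing WORD,
# case-insensitively.
pages_with_word() {
  awk -v RS='\f' -v word="$1" 'index(tolower($0), tolower(word)) { print NR }'
}
```

A possible use is `pdftotext invoices.pdf - | pages_with_word total` (placeholder names); the pages it lists could then be pulled out with poppler's `pdfseparate` and recombined with `pdfunite`.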

Maybe you are looking for

  • iMac (early 2008) getting slow. EtreCheck summary in French

    Hello all, I have had this problem with my iMac for a while (I guess since I switched to OS X 10.9). Some applications are getting slow to open and close (Mail, Calendar, AddressBook, iPhoto, ...). For example, active windows of Mail won't close immedia

  • Really unhappy with Bold issues with freezing and Verizon not backing products

    I started a discussion on the issues I'm having a few weeks ago, and things are getting worse.  I bought the Bold online just over 2 months ago, and really liked it at first.  I started noticing that it would freeze up and the only button that would

  • Urgent!!! : To delete table entries in COKA,COSP,COSB  from FY2010

    Problem: Business needs to limit the validity of certain cost elements to 31st March 2007. But not able to do the same due to updation of Tables COKA,COSP and their dependent Tables. Reason: The business is currently in fiscal year 2007.Some t

  • A simple Report Server question!!

    Hi there, I have recently installed 9iDS; as to start publishing my reports to the web, none of the following works: http://myhost:port/reports/ http://myhost:port/reports/rwservlet/showenv http://myhost:port/reports/rwservlet/showjobs they all give

  • I completed an album but it only said that i had purchased one of the songs

    Hello, I had used an iTunes gift card to purchase credit and when I went to complete an album (As the World Bleeds-Theocracy) it did not recognize that I had already about five songs, instead only saying i had purchased one and charged me accordingly