ClearScan OCR in Acrobat 9 deletes portions of text

I am experimenting with book scanning using a digital camera and various software including Acrobat 9.  I discovered that in some cases, when I perform OCR with ClearScan, apparently random portions of the text in the scanned PDF image are deleted.  Sample1.jpg shows a page before ClearScan OCR, and Sample2.jpg shows it after.  As you can see some of the text has been deleted.  How could this happen?  And, how can it be prevented?

I'm having the same issue with a scanned document that I'm trying to convert to a ClearScan PDF. I have found a workaround: run "Optimize Scanned PDF" with the default settings and the dial set to "high quality" before performing the OCR. This seems to stop chunks of words from disappearing (at least, I haven't found any cases), but it has the horrible side effect of dramatically increasing the file size when ClearScan OCR is performed. In a small file this may not matter much, but in my case it turns a 10 MB PDF into a 70 MB one.
Can anyone confirm if this also happens on Acrobat X?
From what I can tell, there are no preferences to control the OCR besides the downsampling. Specifically, I would really appreciate it if anyone knew a way to reduce the size of the fonts that are generated by the ClearScan OCR as they account for over 95% of the size of the file.

Similar Messages

  • Acrobat Pro 9 - Clearscan OCR

    Hi folks,
    A colleague has Acrobat Pro 9.3.3 (I am on Pro 8, so no issue), and when he runs the ClearScan OCR engine it creates the hidden text but for some reason deletes the scanned text from the image... certainly a clear scan! Is this a known bug with Pro 9, or is there a fix to get the text back in the image?
    I look forward to some thoughts on this issue.
    John

    Bill,
    I am attaching a page from the document. I have marked where the text
    has been deleted. If I examine the document, the hidden text can be
    seen. If I copy all and paste into TextPad, the text can also be seen!
    Lastly, I used TouchUp Text to edit the PDF on the blank bits, and when
    I changed the font to Times Roman, the text appeared.
    Any thoughts? There is hidden text, I think.
    John

  • BUG in OCR in Acrobat 9.3 on MRC (Mixed Raster Content) PDFs

    I have lots of (non-searchable) PDFs that were generated from scanned images (one image per page at 400 dpi) using LuraDocument from www.luratech.de.
    The images are stored internally in the PDFs as MRC (Mixed Raster Content), which means the PDF contains a foreground and a background layer for every image/page. These layers have a low resolution, are highly compressed, and are merged together by Acrobat (while displaying) using a high-resolution mask layer. This results in very small files, about 50 kB per page.
    I'd like to make these PDFs searchable, but WITHOUT manipulating or changing the original image layers in any way. OCR software like FineReader or OmniPage always seems to re-encode the images with its own algorithms, so the image quality would suffer from the conversion and the size of the PDFs would rise significantly. Acrobat, by contrast, offers to keep the original image layers through the output style "Searchable Image (Exact)" in the OCR window. Now the problem:
    After starting OCR, Acrobat applies OCR only to the first page (with good results) and deletes (!) all content (the image layers) on all other pages. To me, this looks like a bug.
    I tried a workaround: in Acrobat's Layers panel I chose the menu option "Flatten Layers". When I start OCR now, Acrobat processes all pages of the PDF, but the OCR result is a disaster, less than 10% correct. Presumably Acrobat uses neither the resulting (actually displayed) page content nor the high-resolution mask layer as input for its OCR, but instead one of the (low-resolution, highly compressed) image layers described above.
    Has anyone had similar experiences with MRC-compressed PDFs, e.g. PDFs generated by other MRC generators like JRAPublish? Is there any workaround or bug fix?
    Thank you in advance!
    L.Benic
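    The guess in the post above, that Acrobat OCRs one of the low-resolution layers rather than the composited page, is easier to picture with a toy model of MRC compositing. This is only an illustrative sketch resting on assumptions the post does not state: grayscale pixel arrays, a full-resolution binary mask (1 selects the foreground), nearest-neighbour upsampling of the low-resolution layers, and hypothetical function names. Real PDF viewers do this through image masks, not code like this.

```javascript
// Toy model of MRC (Mixed Raster Content) compositing.
// Assumptions: grayscale layers stored row-major as arrays of numbers;
// the mask is full resolution; each low-resolution layer is `scale` times
// smaller in both dimensions; mask value 1 selects foreground, 0 background.

// Nearest-neighbour lookup: find the low-resolution pixel that covers the
// full-resolution coordinate (x, y).
function sampleLowRes(layer, lowWidth, x, y, scale) {
  const lx = Math.floor(x / scale);
  const ly = Math.floor(y / scale);
  return layer[ly * lowWidth + lx];
}

// Build the displayed page pixel by pixel: the mask decides which layer
// supplies each pixel.
function compositeMRC(mask, width, height, foreground, background, scale) {
  const lowWidth = Math.ceil(width / scale);
  const page = new Array(width * height);
  for (let y = 0; y < height; y++) {
    for (let x = 0; x < width; x++) {
      const layer = mask[y * width + x] === 1 ? foreground : background;
      page[y * width + x] = sampleLowRes(layer, lowWidth, x, y, scale);
    }
  }
  return page;
}

// 4x2 page built from 2x1 layers: dark "ink" in front, light paper behind.
const page = compositeMRC([1, 0, 0, 1, 0, 1, 1, 0], 4, 2, [0, 10], [255, 200], 2);
console.log(page.join(" ")); // 0 255 200 10 255 0 10 200
```

    Feeding only `foreground` or `background` to a recognizer would mean working at a fraction of the mask's resolution, which would be consistent with (though not proof of) the poor results described above.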

    I'm having the same issues, using the latest version, 9.3.3. Is this a bug? I tried calling Adobe, but their customer representatives were no help. Can anyone shed some light on this issue?

  • ClearScan OCR ERROR: Paper Capture recognition service experienced an error (6)

    I get the following error when I try to run a Clearscan OCR over my PDF:
    "Acrobat could not perform recognition (OCR) because:
    Unable to process the page because the Paper Capture recognition service experienced an error. (6)"
    The problem seems like it might be resolved with a PDF made from 300dpi scans, but I do not like the quality loss so I would rather run this on the 600dpi scans. I wish I could get more information on this "error" that the Paper Capture recognition service experienced. Any ideas?
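    For a sense of scale: doubling the scan resolution quadruples the number of pixels per page. The snippet assumes a US Letter page (8.5 x 11 in) purely for illustration, and `pagePixels` is a hypothetical helper; nothing in the thread establishes that image size is what triggers error (6).

```javascript
// Pixel dimensions of a scanned page at a given resolution (dots per inch).
function pagePixels(widthInches, heightInches, dpi) {
  const w = Math.round(widthInches * dpi);
  const h = Math.round(heightInches * dpi);
  return { w, h, total: w * h };
}

const at300 = pagePixels(8.5, 11, 300); // 2550 x 3300 pixels
const at600 = pagePixels(8.5, 11, 600); // 5100 x 6600 pixels
console.log(at600.total / at300.total); // 4
```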

    OK, I think I might have a workaround *IF* this error is only thrown on a few offending pages. If you have only one page, OR if it is essential that every page be recognized, then this will probably not be a workaround for you.
    *FYI* this workaround also applies to the stupid "Unknown Error" that occurs in non-ClearScan OCR. ("Unknown Error": ha, this is the type of error handling I use when I don't care about anyone using my programs; sounds like Adobe feels the same way.)
    Acrobat spits out such a stupid error; it doesn't work in any other scan mode (non-ClearScan at any DPI) for these pages; there is no log file for the 'paper capture recognition service' to even diagnose the problem; Adobe seems completely OK with this absolutely horrible error handling, seems incapable of fixing the error over the last few versions, and doesn't care enough to offer any help in its KB for an issue affecting hundreds (maybe a few thousand) of users. So here is a POSSIBLE workaround to ClearScan your document if you get this error:
    (The following instructions are for Acrobat X; you might have to click 'show and hide panels' at the top right of the TOOLS BOX if some of these options are not available. Acrobat 9 has the same bug and workaround, but of course the menu locations are different.)
    1) Scan your pages OR save your PDF as INDIVIDUAL PAGES (there must be a way to export your PDF into individual pages; you might need to use the batch manager, Tools->Action Wizard in Acrobat X).
    2) Tools->Action Wizard->Save individual files as PDF: if you don't have individual PDF pages (you have JPEGs or TIFFs), convert them to PDF FIRST. (Yes, that's right: despite the seemingly obvious thought that a batch ClearScan would actually *gasp* batch ClearScan your files, it won't; it will only convert them to PDFs.)
    3)  Tools->Recognize Text->In Multiple Files:  Now select to clearscan all your individual files
    4) You will get an error on offending pages where Clearscan would normally give you an 'error 6' BUT this method will get around the moronic error-THEN-exit design of Clearscan and allow you to actually Clearscan most of your document
    5) Create->Combine Files into Single PDF:  Once all pages have been Clearscanned you can reassemble the individual pages to the full document
    If you needed that particular page to be clearscanned, this method will not work for you.  If you just want 99% of your document to be clearscanned, then hopefully this will help.
    We have to admit that Clearscan is a one-of-a-kind technology.  The OCR scientists should be given a raise for this.  The software engineers and support staff, on the other hand, should go back to the community college they flunked out of and learn a bit more about programming.
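    The essence of the workaround above is to process pages independently so that one bad page no longer aborts the whole job. A minimal sketch of that pattern follows; `recognizePage` is a hypothetical stand-in for whatever performs OCR on one page (it is not an Acrobat API), and keeping failed pages unrecognized, so the recombined document stays complete, is a design choice of this sketch.

```javascript
// Process every page on its own; a failure on one page is recorded and the
// loop moves on instead of aborting the whole job.
function recognizeAll(pages, recognizePage) {
  const done = [];
  const failed = [];
  pages.forEach((page, index) => {
    try {
      done.push(recognizePage(page));
    } catch (err) {
      // Keep the original, unrecognized page and remember why it failed.
      done.push(page);
      failed.push({ index, message: err.message });
    }
  });
  return { pages: done, failed };
}

// Usage with a fake recognizer that fails on one page.
const fakeOcr = (p) => {
  if (p === "page-3") throw new Error("recognition service error (6)");
  return p + " [OCR]";
};
const result = recognizeAll(["page-1", "page-2", "page-3", "page-4"], fakeOcr);
// result.pages  is ["page-1 [OCR]", "page-2 [OCR]", "page-3", "page-4 [OCR]"]
// result.failed is [{ index: 2, message: "recognition service error (6)" }]
```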

  • Is there a way to allow the user to highlight portions of text like in acrobat?

    I am new to Captivate. I was wondering if it is possible to allow the end user to select portions of text on a slide for highlighting purposes, like you can do in Acrobat or Word?

    No, sorry.

  • How do I delete/forward portions of text messages in ios7?

    How do I delete/forward portions of text messages in iOS 7?

    I'm guessing you mean portions of the thread? Tap on the message you want to delete or forward. Tap "More" on the pop-up menu that appears. Select any messages you want to forward or delete. Tap either the trash can in the lower left or the forward arrow in the lower right.

  • I created a PDF from hard copy with my scanner and ran OCR on it in Acrobat Pro 9, but when I turn on Read Out Loud, an error message says the page is empty

    I created a PDF from hard copy using my scanner, then ran the OCR option on it in Acrobat Pro 9. But when I turn on the Read Out Loud option in Acrobat, an error message comes up saying the page is empty. Could you please help me sort out this problem?

    So I tried generating the same PDFs on two other computers that have Acrobat 9 Pro.  Results were reproduced.  The verdict is:
    - complex PDF files (that is, containing cross-references, tables of contents, and bookmarks) generated by Acrobat 9.x Pro are roughly 2-5x larger than the identical file generated with Acrobat 8.x Pro.
    - different PDF conversion settings make a negligible difference (less than 10%, rather than 70-80%).
    - using the "Reduce File Size" or "Optimize PDF" option cuts the file size roughly in half, almost always resulting in a "image downsampling mask" warning message, which requires acknowledgement (that is a problem for batch processing or automation).
    - adding an Acrobat watermark to the file cuts the file size roughly in half.
    - just using Save As to another filename has no effect on file size.
    - generating the PDF in Acrobat 9 with links but no PDF bookmarks still results in the inflated file size.
    - generating the PDF in Acrobat 9 without any links or bookmarks results in approximately the same file size as the Acrobat 8 PDF with full links and bookmarks.
    It appears that Acrobat 9's manner of adding links is what's bloating the files, and in my case it's probably not related to images or image resolution/print quality. It's a shame, because Acrobat 9 seems to have made some improvements to the Review Tracker interface, and a few other bells and whistles which I haven't really gotten around to exploring yet. But unless I find a way to keep my links and keep the PDF file sizes comparable to what I was getting with Acrobat 8 Pro, it looks like I'm going to stay with Acrobat 8.

  • Acrobat 9.5: edits made with the TouchUp Text tool do not display, but if I "Save As" a Word doc, the edits appear in Word. This feature worked in earlier 9.x patches.

    I have a suspicion you're working with a scanned document that has had OCR run on it to recognize the text. The recognized text may be stored on an invisible layer above the image of the text, and that is what you're touching up. It's invisible, so you don't see it, but it retains the changes, so exporting produces the new edits.
    When you run OCR to recognize scanned text, try using the ClearScan option instead of Searchable Text. See this help page on Acrobat (it is for version X, but still applies):
    Adobe Acrobat X Pro: Recognize Text - General Settings dialog box
    mh++
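    One common way a "searchable image" PDF hides recognized text, consistent with the explanation above, is to draw the scanned image and then place the OCR text over it in PDF text rendering mode 3 (`3 Tr`), which neither fills nor strokes the glyphs. Such text can be searched, copied and exported but is never painted, so TouchUp edits to it stay invisible. The sketch below builds a page content stream of that shape as a string; the resource names `/Im0` and `/F1`, the word positions, and the helper names are hypothetical, and Acrobat's own output may be structured differently.

```javascript
// Escape a string for use as a PDF literal string: backslash and both
// parentheses must be preceded by a backslash.
function pdfLiteral(text) {
  return "(" + text.replace(/[\\()]/g, (c) => "\\" + c) + ")";
}

// Content stream: paint the full-page scan, then lay invisible text on top.
function searchableImageStream(pageWidth, pageHeight, words) {
  const lines = [
    "q",
    `${pageWidth} 0 0 ${pageHeight} 0 0 cm`, // scale the unit image to the page
    "/Im0 Do", // paint the scanned image (hypothetical resource name)
    "Q",
    "BT",
    "3 Tr", // rendering mode 3: neither fill nor stroke, i.e. invisible
    "/F1 12 Tf", // hypothetical font resource, 12 pt
  ];
  for (const w of words) {
    // Tm sets an absolute text position; Tj shows the (invisible) string.
    lines.push(`1 0 0 1 ${w.x} ${w.y} Tm`, `${pdfLiteral(w.text)} Tj`);
  }
  lines.push("ET");
  return lines.join("\n");
}

console.log(searchableImageStream(612, 792, [{ x: 72, y: 700, text: "Fix the walls." }]));
```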

  • Distiller failure on a portion of text in the document

    I have a question that I'm hoping someone can help me with. We have a Mac running Acrobat 8.1.6 (Distiller version 8.1.3) that consistently fails on a portion of text in the document when creating a PDF from an EPS file. I have verified that this machine is set up identically to the other Mac, which does the same type of tasks but watches different folders. PDF creation succeeds on the machine we'll call EPS2, yet fails on EPS1 with the failure log below.
    Can anyone point me in the correct direction as to where to look to try and rectify this issue?
    The Stack portion of the error shows the text I'm talking about (* Tax credit to 1st time homebuyers who have). This text is the beginning of a sentence about a promo for first-time homebuyers. As I stated above, PDF creation succeeds on one machine and fails on the other. I am at a loss as to where to look to correct this issue; any assistance would be greatly appreciated.
    Error log:
    Distilling: 07-29-2009 Page-01.eps
    Start Time: Wednesday, July 29, 2009 at 9:29 AM
    Source: /Volumes/Vol1/4 Shaun/EPS Files/07-29-2009 Page-01.eps
    Destination: /Volumes/Vol1/4 Shaun/EPS Files/07-29-2009 Page-01.pdf
    Adobe PDF Settings: /Users/EPS2/Library/Application Support/Adobe/Adobe PDF/Settings/folder.joboptions
    %%[ Error: ioerror; OffendingCommand: show ]%%
    Stack:
    (* Tax credit to 1st time homebuyers who have)
    %%[ Flushing: rest of job (to end-of-file) will be ignored ]%%
    %%[ Warning: PostScript error. No PDF file produced. ] %%
    Distill Time: 21 seconds (00:00:21)
    **** End of Job ****
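    When comparing failures across machines, it can help to pull the key fields out of each Distiller log automatically. A minimal sketch, with patterns derived only from the single log above (other Distiller versions may format their logs differently); `parseDistillerLog` is a hypothetical helper name:

```javascript
// Extract the PostScript error name, the offending operator, and the stack
// entries from a Distiller log. Returns null if no error line is found.
function parseDistillerLog(log) {
  const err = log.match(/%%\[ Error: (\w+); OffendingCommand: (\S+) \]%%/);
  if (!err) return null;
  // Stack entries sit between "Stack:" and the next "%%[" message.
  const stackMatch = log.match(/Stack:([\s\S]*?)%%\[/);
  const stack = stackMatch
    ? stackMatch[1].split("\n").map((s) => s.trim()).filter(Boolean)
    : [];
  return { error: err[1], offendingCommand: err[2], stack };
}
```

    For the log above this would report the `ioerror` raised by `show` with the "Tax credit" string on the stack, which makes it easy to check whether every failing machine fails on the same string.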

    hockeyshaun wrote:
    Philip,
    Yes, I had looked into clearing out the font caches; that also made no difference. We had one extra license sitting around for Acrobat Pro 9.0, so I installed that version and, what do you know, the problem went away. It must be something about the Intel Macs and how 8.1.3 handles the conversion. The file would fail on ALL the Intel machines but not on the G5s, and as soon as I upgraded an Intel machine to version 9.0, the problem vanished. Odd, but understandable to a point, I guess.
    That's good to know. Now we know that if you have a problem with Distiller 8 on an Intel machine, Distiller 8 may simply not be completely compatible with Intel Macs.

  • Anyone know if it is possible to code so that a portion of text is bolded, but not the whole line?

    Here is what I have:
    I have a drop-down list that, when a selection is made, outputs two separate lines of text to two different cells. (By the way, I'm using LiveCycle Designer ES). I want one word in the string of outputted text for one field to be bold, but the rest regular text.
    Here is a fake example to give you the idea:
    case "1": // Walls in bad repair
        row2.box2.rawValue = "Correction: Fix the walls.";
        row1.box3.rawValue = "C";
        break;
    Basically, I'd like the "Correction: " part to be bolded, but the "Fix the walls." part to be regular text.
    It won't be tragic if it can't be done, but I'm trying to make my report form look as much like the old paper version as possible in terms of text formatting.
    Any takers?
    DE

    Hi Dave,
    I don't think you can change the style of part of a text field's value. If you set the text field to "Rich Text", then at run time the user can select a portion of the text and change it to bold (Control + B). But this is probably not what you want.
    Someone may have a solution for you; in the meantime, you could try the following.
    For the box2 text field, turn the caption on and set it to left-aligned. Delete the default caption in the Object/Field tab and leave it blank.
    Then your script would look something like this:
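         case "1": // Walls in bad repair
             // Caption text (the caption font is set to Bold in Designer)
             row2.box2.caption.value.resolveNode("#text").value = "Correction:";
             // Field value (regular weight)
             row2.box2.rawValue = "Fix the walls.";
             row1.box3.rawValue = "C";
             break;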
    This means you would have to set the caption reserve of the text field wide enough for the longest caption string, e.g. "Correction:". However, you can set the caption of the text field to Bold while keeping the field value in regular weight: in the Font tab, select the "Edit Caption Only" option.
    Good luck,
    N.
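    If the drop-down has many cases, the caption/value pairs could live in one table instead of a long `switch`. A sketch in plain JavaScript, assuming the selection code arrives as a string; `CORRECTIONS`, `applySelection` and the three setter callbacks are hypothetical names, and inside Designer each callback would wrap the corresponding caption or `rawValue` assignment from the script above.

```javascript
// Hypothetical lookup table: selection code -> bold caption, body text, code letter.
const CORRECTIONS = new Map([
  ["1", { caption: "Correction:", text: "Fix the walls.", code: "C" }],
  // further entries follow the same shape
]);

// Apply one selection through caller-supplied setters. Returns false and
// leaves the fields untouched when the selection is unknown.
function applySelection(selection, setCaption, setValue, setCode) {
  const entry = CORRECTIONS.get(selection);
  if (!entry) return false;
  setCaption(entry.caption);
  setValue(entry.text);
  setCode(entry.code);
  return true;
}
```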

  • I have deleted hundreds of pictures, I've deleted apps, deleted emails and texts, and I've turned off and deleted backups of apps on icloud and it is still telling me that there is not enough icloud storage available for my phone to be back

    I have deleted hundreds of pictures, I've deleted apps, deleted emails and texts, and I've turned off and deleted backups of apps on icloud and it is still telling me that there is not enough icloud storage available for my phone to be backed up. I am supposed to do this before I go to the Apple store today to get my phone fixed. Can someone please help me? I have done almost everything.

    This article explains ways to reduce your iCloud storage: Managing your iCloud storage.  In addition to what's mentioned in the article, if you have lots of photos and videos attached to your text messages, these are all included in your iCloud backup and deleting them can sometimes significantly reduce the size of your backup.
    If worst comes to worst, back up your phone to your computer rather than to iCloud. To do this, connect it to your computer, open iTunes and go to File>Devices>Back Up and to File>Devices>Transfer Purchases. This backup can be used to restore your data when you get your phone back. (In fact, it's faster, because you don't have to download the data from iCloud.)

  • Delete only one text message at a time w/o deleting entire conversation?

    Does anyone know how to delete only one text message at a time without having to delete the entire conversation? The only two options I get are "clear conversation" or "cancel."
    HP   Windows XP  

    That's the same option I get too. It seems to me I recall seeing somewhere that you could delete individual messages in a thread but I don't remember where.
    It seems silly to limit deletions to only the entire conversation since with most phones you can delete individual messages.

  • How do I delete all my text messages to speed up my phone

    How do I delete all my text messages at one time on my S4?

    That's a great way to clear up memory and speed things up, 357tony. What message app do you use with your S4? Is it Messaging, Message+, or one downloaded from the Play Store?
    JenniferH_VZW
    Follow us on Twitter @VZWSupport
    If my response answered your question, please click the "Correct Answer" button under my response. This ensures others can benefit from our conversation. Thanks in advance for your help with this!!

  • How do I delete all the text in my InDesign document in one go?

    Hi
    I managed to get the solution to my formatting problems for my book in InDesign CS6, but I'm scared that with all the messing about, something may have gone wrong. Is there a quick way to just delete the text, i.e. leave the headers, footers and page numbering in place, as well as the settings for automatically adding page numbers? I tried to sort this out last night, but the only solution I found seemed to want to take the whole page out, and I don't want to go through all the setting up of headers, footers, page numbering and, of course, all the associated placeholders. Anyone got a solution?
    John

    Hi Peter
    I'm not sure I completely understand. The file is simply a 200+ page Word document that was autoplaced in InDesign. You've been really helpful to me with this InDesign document, thank you. What I'm getting at is this: because I increased the gutter size to comply with printing specifications after I first placed the Word document, some text that was on page 100 at the start of a chapter (and therefore had no header) now appears on page 110 in the new file, a page that had a header and footer when the text was originally placed. All I want to do is delete the story text, i.e. the Word document that was autoplaced.
    Cheers Peter

  • Acrobat 7.1.4 - Typewriter Text Less than 100% Opacity

    Acrobat 7.1.4: Typewriter text color suddenly changed to 52% opacity. How do I set the Typewriter text opacity back to 100%? I cannot find any assistance in Help or in previous forum posts. - Morris

    This is normal. The correct measure of load is inspection-load. The CPUs are shown at 100% because the threads are continuously polling for new data packets.
    Regards,
    Sawan Gupta

Maybe you are looking for

  • Airport no longer working only on a single mbp on a single home-network

    Hello fellows, I've recently been fighting with this strange, unreported issue. The MacBook Pro is a 13" mid-2012 model. The network router is a D-Link XXXXXXXXXXXX with a DSL flat line. All the other machines there are just working FINE (1 15" 2009 MBPro,

  • HT1178 External HDD no files when attached to TIME CAPSULE

    Hi all, I have a question, and I don't know if this is an issue or if I am missing something: when I try to attach an external HDD with some files on it, it does not show the files, but I can access the share from the shared drives on my Mac normally and

  • Connect MBP to external Samsung monitor

    Hi all, I see similar Q&As but not exactly this: I have a late '09 MBP (pre-Thunderbolt) that I want to connect to a Samsung SyncMaster 940MW. I currently run 2 of these SyncMasters as a dual display for my Mac Pro with no problems using two DVI cable

  • Files Disappear after Setting Inherit Permission Model

    Our workgroup was continually having read only problems working on files that we each had created locally. After reading this: When to use Inherit Permissions Generally, a share point should use the standard permissions model, since it will make new

  • How to view purchased track history?

    Hard drive went down and lost some tracks that were not backed up. In an attempt to determine what I lost, would like to see entire purchase history. When I select 18 months in history no purchases are displayed. This is not correct. Many were execut