Proofread and correct OCR'd text in Acrobat 10 Pro

How do you proofread and correct text produced by OCR from a scanned document, in Acrobat 10 Pro?
I scan (many, large) paper documents, then use Recognise Text. After the OCR phase, if I save PDFs as text, I can see many scan errors.
I would like to be able to correct those errors in the scanned text, so that names etc can be successfully searched. However I cannot find any way to view and correct the scanned text.
I experimented with Tools / Content / Edit Document Text, but I cannot see how to display the scanned text to allow correction. It appears to operate on the PDF image. But if I try to change the document image to correct known errors (e.g. in spacing), and then save the PDF as text again, the string where I changed the image becomes gibberish.
How is Edit Document Text supposed to work? Is there any way to achieve what I am looking for (fixing many errors in large OCR'd documents)?
Regards,
Sue.

"This is a 76-page document, and the users will expect to see the image looking like the scanned original."
That locks it down. The only way to satisfy this is Searchable Image (Exact).
The scanned image serves as an objective replacement for the source hardcopy.
The OCR output exists to facilitate search/find.
At the end of the day, there is no practical means of editing the OCR's hidden text layer while it remains in the PDF.
That's not to say you cannot work at it and get results. But, the operative word is practical.
In that context you may want to look over a reply I made here:
http://forums.adobe.com/thread/950209?tstart=0  
To increase accuracy of OCR recognition:
Yes, there are dedicated OCR applications (desktop or server). Having used several of each as well as Acrobat's OCR, I've learned that the scanner and the quality of the hardcopy source are also significant.
Regarding the remainder of your post above.
Ok, I cannot replicate what you describe with a PDF I've been using.
It is a scanned image of a single page of textual content.
After ClearScan I can export to Word (and, of course, some cleanup is required).
I can use the TouchUp Text / Edit Document Text tool to select all the PDF page's content (the ClearScan output).
I changed the font to Times New Roman, saved, and exported to Word.
The content in Word needed cleanup.
Next, I selected various words and typed in a replacement word.
After a Save I Exported to Word. The changed words carried through.
re: Q1 - What you describe is symptomatic of the Hidden text output of Searchable Image / Searchable Image (Exact) and not ClearScan. So, I'm perplexed.
re: Q2 - An advantage of ClearScan is being able to edit a text string to correct it. So, sure, why not correct? With that said, it can be a tedious and labor intensive activity. As well, typos are possible during correction which begs the question "Who bells the cat?"  8^) 
re: Q3 - If corrections to the ClearScan output meet your needs, an export to Word may not be needed.
However, sometimes ClearScan cannot recognize the image of a character and leaves it as a bitmapped image.
So, to correct you'd have to get into a word processor.
re: Q4 - Goes back to Q3.
Here are some useful video tutorials:
http://acrobatusers.com/tutorials/clearscan-vs-imagetext-ocr 
A listing of others: http://acrobatusers.com/tutorials/filter/search&keywords=scanning%20ocr&tut_type=Video&channel=tutorials/
At Adobe TV:
http://acrobatusers.com/tutorials/filter/search&keywords=scanning%20ocr&tut_type=Video&channel=tutorials/
Be well...

Similar Messages

  • When I OCR two versions of the same document and then compare the documents in Acrobat Pro XI, I usually get the message that there are no changes to mark.  However, I know there are quite a few changes.  I raised this question more than a year ago

    When I OCR two versions of the same document and then compare the documents in Acrobat Pro XI, I usually get the message that there are no changes to mark.  However, I know there are quite a few changes.  I raised this question more than a year ago, and the response I received had to do with the quality of the OCR and the scans of the documents.  However, if I use Acrobat Pro XI to save the same documents in Word and then run a comparison in Word, all of the changes are marked.  When a PDF is saved as a Word document in Acrobat Pro XI, is a different OCR module being used than the one used in Acrobat Pro XI for text recognition?

    OCR is only for recognition of the image / picture of text provided by a scanner.
    Content typed into a Word file which is converted to a PDF is (in Word and in PDF) *not* an image  or picture of text - it is the digital text. So, no OCR involved.
    When the "digital" (renderable) text of a PDF's page content is exported to Word no OCR is involved.
    When a PDF's content is from the image output of a scanner and this is a picture of text then OCR comes into play.
    If this content is exported to Word before doing OCR then it is the image that is exported to the Word file.
    Once OCR is performed it is the OCR output that is exported.
    OCR output is (always will be) impacted by "the quality of the OCR and the scans of the documents". 
    Regardless, "Compare" is based on a Word file output to PDF1, then edits to the Word file followed by an output to PDF2. You then use Acrobat Pro to compare PDF1 and PDF2.
    Paper 1 scanned to image 1 to image 1 in PDF1 that gets OCR 1 and
    Paper 2 scanned to image 2 to image 2 in PDF2 that gets OCR 2
    being processed with Acrobat Pro's Compare can certainly be done.
    But - well you've described what can be observed.
    Be well...
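
    A rough way to check what differs between the two OCR outputs, outside Acrobat, is to compare the recognized text word by word. The sketch below is only an illustration under assumptions: it presumes the text of each PDF has already been saved as plain text, and the function names `normalize` and `word_changes` are made up for this example. It uses Python's standard `difflib` module and ignores layout, case, and punctuation.

```python
import difflib
import re


def normalize(text: str) -> list[str]:
    """Split text into lowercase words so layout and case differences are ignored."""
    return re.findall(r"[a-z0-9']+", text.lower())


def word_changes(old_text: str, new_text: str) -> list[tuple[str, str, str]]:
    """Return (operation, old words, new words) for every run of words that differs."""
    old_words = normalize(old_text)
    new_words = normalize(new_text)
    matcher = difflib.SequenceMatcher(a=old_words, b=new_words, autojunk=False)
    changes = []
    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        if op != "equal":
            changes.append((op, " ".join(old_words[i1:i2]), " ".join(new_words[j1:j2])))
    return changes


# Hypothetical sample text standing in for the text saved from PDF1 and PDF2.
version_1 = "The tenant shall pay rent on the first day of each month."
version_2 = "The tenant shall pay rent on the fifth day of each month."
for change in word_changes(version_1, version_2):
    print(change)  # prints ('replace', 'first', 'fifth')
```

    OCR misreads will show up as changes too, so the result is only as good as the recognition in both files.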

  • Editing Text in Acrobat Pro 9.

    I have been trying to edit text in Acrobat Pro 9 using the touchup tool. Nothing happens when I try to type text or replace existing text. I have read that the system font must match the text in the PDF in order to change the text. According to the properties, the text is Courier, a standard Windows font. Any suggestion?

    Bernd Alheit wrote:
    > Any suggestion?
    >
    > Edit the original document and re-create the PDF document.
    The PDF file may be password protected. Click Secure and Remove Security.

  • I lost my CD, and have never registered my adobe acrobat pro.

    Hello.
    I lost my CD, and have never registered my Adobe Acrobat Pro. I am trying to register it via the software so that I can re-install it on my laptop and uninstall it from my desktop, but the process always fails.
    Any advice, please?

    Unfortunately, I do not. This software is installed on my desktop in my office. Now, I have been given a laptop to replace the desktop. I communicated the message to the IT department in my company and they do not have the CD or the serial number. However, they said that we can overcome this problem by the registration via the software itself. So I created the account and tried via the software, but it never succeeded. The problem now will get bigger and we will lose the software because the desktop is considered expired and it will be liquidated. At the end, I was advised to contact adobe to help me register it.

  • Changed platforms to mac and wish to use my registered acrobat pro

    Have recently changed platforms to mac and wish to use my registered acrobat pro... please help

    Do you have Acrobat 10 or an older version?  If you have a copy of Acrobat 10 then you can complete the process detailed in Order an Adobe product platform swap or language swap to obtain a Mac OS version.
    If you have an older version then you will want to purchase the upgrade to Acrobat 10.  We no longer offer older versions of Acrobat.

  • I upgraded my computer and need a link to download Acrobat Pro 9 MAC. I have my serial number but do not have my disc. Please help and send link. Thank you.

    I upgraded my computer and need a link to download Acrobat Pro 9 MAC. I have my serial number but do not have my disc. Please help and send link. Thank you. My MacPro computer is slowly dying and I need to get going with Acrobat Pro 9.

    Unfortunately, Adobe has ended support for earlier versions. They have a version of AA9 Pro for Windows posted, but not for the Mac as I understand it. You can try Adobe Acrobat X (10) Pro, Reader, and Suite Direct Download Links | ProDesignTools, but again I can only find PC versions. Someone may drop by with a link or you might see if anyone locally has the CD (that you could get the install file from). You will still need to use your original S/N.

  • I am traveling with a new computer and do not have my software (Acrobat Pro) to load but hoped I could do so with my serial number

    I am traveling with a new computer and do not have my software (Acrobat Pro) to load but hoped I could do so with my serial number

    Hey,
    Do you have access to the internet?
    You might visit the Adobe website and download Acrobat Pro on your system.
    Then, you can use your serial number for activating the software.
    Regards,
    Anubha

  • Correcting OCR'd text misreads

    Hi everyone,
    I'm using Acrobat Pro 8.
    I've OCR'd a bunch of scanned documents and found that the OCR utility does tend to skip over some lines and other times it misreads text (and doesn't mark it as suspect).
    Is there a way to correct misread text? Is there a find/replace feature in Acrobat Pro 8?
    What I've come up with is using the text touchup tool and importing the text into Word to do a global find/replace. But I'd rather do it in Acrobat if it's available.
    Thanks!
    Kevin

    You can do a find, but not a find and replace. Probably just as easy as the Word route. You are finding some of the limitations of the OCR utility. If you do a lot of OCR, you really need to look into a full-fledged OCR package, not this plugin for Acrobat.
    The quality of the original images is very important in the ability of Acrobat to do the job. Also, you need to be sure that you have proper resolution and such (or Acrobat won't even run the OCR).
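
    If the same misreads recur across many documents, the global find/replace done in Word could also be scripted against the exported text. This is a minimal sketch, assuming the OCR text has already been exported from the PDF; the function `apply_corrections` and the sample misreads are hypothetical, and it changes only the exported text, not the PDF itself.

```python
import re


def apply_corrections(text: str, corrections: dict[str, str]) -> tuple[str, dict[str, int]]:
    """Replace each known misread (whole words only) and count the replacements made."""
    counts = {}
    for wrong, right in corrections.items():
        # \b keeps the match to whole words, so "tbe" does not touch "tbeatre".
        pattern = re.compile(r"\b" + re.escape(wrong) + r"\b")
        # A function replacement avoids backslash interpretation in `right`.
        text, n = pattern.subn(lambda match, right=right: right, text)
        counts[wrong] = n
    return text, counts


# Hypothetical misreads of the kind OCR tends to produce ("rn" for "m", "b" for "h").
known_misreads = {"Srnith": "Smith", "tbe": "the"}
fixed, counts = apply_corrections("Mr Srnith signed tbe lease.", known_misreads)
print(fixed)   # prints: Mr Smith signed the lease.
print(counts)  # prints: {'Srnith': 1, 'tbe': 1}
```

    The counts give a quick check that a correction fired as often as expected before the text is reused.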

  • Editing text in Acrobat Pro doc using Reader

    I've created documents in Acrobat 8 Pro for a client who needs to be able to add a few lines of text in Reader, then re-save and email the PDFs. Each time they try, though, they get the message telling them that they can't save the PDF and should print it out and fill it in.
    Is there any way round this, other than my client buying Acrobat Pro?

    ~graffiti wrote:
    PjonesCET wrote:
    I'm not talking about pages of edits. For major Updates you simply create a new one.
    I'm talking about when, after you read it, you discover the word "read" is spelled "reed" in one instance.
    I'm not talking about "major" edits either. If you have the original document, you always want to edit it first. It's just silliness to do anything else. Editing the PDF is asking for problems in many different forms.
    That's the ideal but what if you do not have an original. Suppose the pdf was sent to you to correct.
    I've been there.  Also if a PDF has not been set up for Reader User Rights by an individual, and you have a PDF that has a small defect, there is a tendency to use touch up text.  I've done that myself.
    But if I find more than one or two items, I do as you. I just go back to the original if I have it, and create a new one.
    The above was to caution that if you have minor defects, you've either got to create a new version or create a copy with usage rights disabled, then work on that. Your method is the best.

  • Fuzzy text in Acrobat Pro X

    Hi,
    I don't know if others noticed, but text seems to display fuzzier in Acrobat Pro X than in Adobe Reader 9 or Apple Preview on Mountain Lion. These screenshots should show the issue:
    http://img811.imageshack.us/img811/8615/acrobatproxvsadobereade.png
    http://img850.imageshack.us/img850/9200/acrobatproxvsappleprevi.png
    Acrobat Pro X and Adobe Reader 9 are set to render on an LCD screen. All other display settings in the preferences seem to be the same.
    Also, while colors seem similar in Acrobat Pro X and Apple Preview, they are a bit different in Adobe Reader 9. I don't know which one is the more faithful to the original, since the source is FrameMaker 9 on XP, a combination resulting in a very 'free' idea of color matching.
    My monitor is a wide gamut, recently calibrated HP LP2475w.
    The new Acrobat has an ugly Windowsy GUI, but it seems this is not the only problem. I wonder if this is a case of 'operator's error', or a bug in the Acrobat line.
    Paolo

    Hi Test,
    Keep in mind all snapshots were taken on the same system, each window side-by-side. So, it is not a matter of turning smoothing on or off, since this would affect all documents at the same time. In fact, I tried different smoothing settings (including off), and this was not a viable solution.
    I can clearly see a difference between the three examples. Text in Apple Preview is extremely clear. On Adobe Reader 9 it is quite clear, even if a bit heavier. In Acrobat Pro X it's even heavier, and very hard to read. More than text, I start to see confused ink stains (and have problems forming the sonic image of the words in my mind).
    As for color, I guess you are right in suspecting there is a different interpretation of the embedded color profile. Acrobat Pro X (and Apple Preview/ColorSync) might correctly translate it (it being a prepress setting), while Reader could be confused. I don't know where to choose a color profile in Reader 9, so I cannot test it.
    Paolo

  • Why are random spaces being inserted in text on Acrobat Pro 9?

    My apologies if this has been asked & answered before. I did try and search and found some information but I still do not understand the complete picture.
    I am a relatively new user to Acrobat and used 7 prior to upgrading to Pro 9 recently.  I am using it for basic functions only at this point, combining multiple Word (Microsoft Word 2003) documents from various authors.  Since upgrading to 9 I have come across a few instances where there were random spaces inserted in the PDF document. These spaces are usually between words or in the middle of words and can be just one or two spaces or 10+ spaces.  After doing some research I think I figured out that these spaces occurred after I had used the Touch Up Text tool to fix a small mistake (likely spelling).  After more research I went into the Change Conversion settings option in Word and removed all of the fonts listed in the 'Never Embed' section of the Fonts tab under advanced settings.  This seemed to solve the problem for a while but now it is occurring again.
    So, my questions are:
    1. Can someone explain to me if my research is correct and using the Touch Up Text tool is what caused the spaces in the first place?
    2. What can I do to ensure that this does not continue to happen while still being able to use the Touch Up Text tool occasionally?
    3. Is there something I am missing? (Highly likely given my lack of experience with Acrobat!)
    I truly appreciate any insight that can be offered. Unfortunately I cannot offer any examples to show you what is happening as I work in a legal environment and I have yet to be able to re-create the issue with something not sensitive.
    Kim

    There is really nothing like a space in a PDF file. Each letter is positioned on the page according to the PDF language specification (similar to the PostScript language). When you edit a PDF file anything can happen. I've seen changing one letter completely screw up several lines of text (except for the fact that a line of text is not a PDF concept either). The best recommendation is to edit the original documents, not the PDF file. Sometimes it cannot be helped. Sometimes it works, sometimes it doesn't. There are no workarounds for changes that occur when you edit the text in a PDF file.
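
    To picture why an edit can make spaces appear, here is a deliberately simplified model of positioned letters. It is an assumption-laden sketch, not how Acrobat actually works: the `Glyph` class, the `extract_line` function, and the `gap_ratio` threshold are all invented for illustration. Letters carry only a position and a width, and a reader of the page guesses that a space exists wherever the gap between two letters is wide enough.

```python
from dataclasses import dataclass


@dataclass
class Glyph:
    char: str
    x: float      # left edge on the page, in points
    width: float  # advance width, in points


def extract_line(glyphs: list[Glyph], gap_ratio: float = 0.3) -> str:
    """Rebuild a line of text, guessing a space wherever the gap between letters is wide."""
    out = []
    previous = None
    for glyph in sorted(glyphs, key=lambda g: g.x):
        if previous is not None:
            gap = glyph.x - (previous.x + previous.width)
            if gap > gap_ratio * previous.width:
                out.append(" ")
        out.append(glyph.char)
        previous = glyph
    return "".join(out)


# Four letters placed edge to edge read as one word.
line = [Glyph("r", 0, 4), Glyph("e", 4, 5), Glyph("a", 9, 5), Glyph("d", 14, 5)]
print(extract_line(line))  # prints: read

# If an edit leaves the last letter 4 points further right, a "space" appears.
line[3].x = 18
print(extract_line(line))  # prints: rea d
```

    In this toy model nothing ever stores a space character; the apparent space is purely a side effect of where a letter ended up after the edit.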

  • Recognize / OCR Thai pdf in Acrobat Pro 10.1.9

    I have wasted an hour trying to install a language pack and change my installation but have failed.
    I am not even sure it is possible on this version of Acrobat but I want Thai as a Primary OCR Language setting in order to recognize the text in a scanned PDF.  Please help?
    Kind regards
    John

    Thanks for your response.
    So Acrobat Pro can recognize certain languages (Hebrew, Chinese etc) and these are configured during the installation process but there are no 'language packs' I can install to OCR Thai?
    Any suggestions how one might recognise the text in order for it to be copied and pasted etc?

  • OCR not working with Acrobat Pro X

    I have Acrobat Pro X ver 10.1.9 using Windows 7 Pro (64 bit) OS running in a virtual environment (Parallels ver 9.0) on a MacBook Pro. I have 16 GB of System Memory and 6 CPUs dedicated to the virtual environment. For some reason: 1. I cannot select renderable text with my cursor in a PDF containing such (I just see the cursor turn to a hand symbol, as if to use the cursor to move the image around); and 2. I can run Acrobat's OCR on a recently scanned PDF but once again I cannot select the text (same hand symbol appears). I went to Control Panel - Programs and selected "Repair" the Acrobat program but this was no help.  What I don't understand is prior to installing my Acrobat Pro X program on this computer I had it installed on a PC running Windows 7 Pro (32 bit) and everything worked fine. Any ideas?  WR
    02/10/14 Update: Spoke with Adobe Tech Support and they informed me the problem I am having with the windows version of Acrobat Pro X above is caused by running the program in the Parallels virtual environment on a Mac. This is odd as the purpose of running Parallels on the Mac is so that one can operate the Windows OS and programs on a Mac. They suggested I purchase the Pro XI version for Mac.

    "Copy C:\ProgramFiles (x86)\adobe\acrobat 9.0\acrobat\plug-ins\PaperCapture\* to the parent directory C:\ProgramFiles (x86)\adobe\acrobat 9.0\acrobat\plug-ins
    For Acrobat X, the path would be acrobat 10.0."
    IT WORKS!!! Hey thanks dodland, now all I have to do is get my CS6 MC installer to take less than X hours to install just a version of Acrobat X that will recognize my MC serial number! I installed a version of Acrobat 10 from the MC extracted files and ran it as a trial, no go on the OCR. Then I did your copy suggestion and all is well. I've copied this fix as a text file to live with my MC install files. Now I just have to get an activated version running.
    Thanks again, as I was considering wiping the entire drive just to fix this single issue and that would not be fun...
    TLL
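
    For anyone who wants to keep the quoted copy step as a repeatable script rather than a text note, it could be sketched roughly as below. This is an illustration only: the function `copy_up_one_level` is hypothetical, it has not been run against a real Acrobat installation, and skipping items that already exist in the parent folder is a cautious choice of this sketch, not part of the original advice.

```python
import shutil
from pathlib import Path


def copy_up_one_level(subfolder: Path) -> list[Path]:
    """Copy each file or folder inside subfolder into its parent; return what was copied."""
    copied = []
    for item in sorted(subfolder.iterdir()):
        target = subfolder.parent / item.name
        if target.exists():
            continue  # leave anything already present in the parent untouched
        if item.is_dir():
            shutil.copytree(item, target)
        else:
            shutil.copy2(item, target)
        copied.append(target)
    return copied


# Example call, using the folder exactly as quoted in the post (left commented out):
# copy_up_one_level(Path(r"C:\ProgramFiles (x86)\adobe\acrobat 9.0\acrobat\plug-ins\PaperCapture"))
```

    The function returns the list of new paths, which makes it easy to record what was added in case the change needs to be undone.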

  • Shortcut key for deleting text in Acrobat Pro

    In previous versions of Acrobat Pro, I could highlight text and the delete key would strikethrough. Now in XI that doesn't seem to work.
    Is there a way to program that -- or any other shortcut key?

    Ha - found it! It's the little 'T' with a star at the end of the Annotations - once you select that & tick all the boxes, you can use delete to strikethrough text.

  • Italicizing text in Acrobat Pro

    Hi,
    I have Acrobat Pro 7.0. Is there a way to italicize text in a PDF within Acrobat?
    Robin

    If you insist on editing in Acrobat, then as I said you would select the text touchup tool. Then select the text and right click. In the font, select a bold version of the font. These steps are the process to use and it can be done, but it is not really what you would want to do on a regular basis. There may be some shift in the text that is not desired and that may require doing some size adjustment of the text, not a simple step. In most cases it is easiest to go back to the original document, do the bold there, then create a new PDF as already mentioned.
