How can I correct "hidden" text in a searchable PDF file?

This seems like a simple question. However, the answers I find are invariably complex, do not yield the desired result, and often answer a different question entirely. I say this up front as a warning: the "problem" is simpler than many people and PDF application developers, including Adobe, typically take it to be, while the proposed "solutions" are invariably a total...well, botch is a reasonable word, if a bit understated.
Here is the actual problem:
I have "searchable" PDF files created by scanning documents and running them through an OCR process. I create them in order to archive, index, and eventually search the scanned documents. A "searchable" PDF satisfies those criteria better than any other commonly used, "portable" archive format -- though I would be happy if someone could point out an obvious alternative I may have overlooked.
I do not need perfect OCR results. If I need a document to edit, or to feed into a spreadsheet or database, I expect to be able to reprocess the page images in a given "searchable" PDF file with OCR and convert the contents to Word, RTF, Excel, or another file format as necessary, with more care for the results than for the archived document itself. Therefore, the "searchable" PDF document is the scanned page images that compose it, while the OCR-generated "searchable" text is secondary, but still important.
Each file must therefore contain scanned page images detailed enough to be converted efficiently by OCR where possible, and legible enough for whoever views them to work out what an OCR process failed to understand. Once scanned, those pages are the "document" and therefore "immutable."
However, OCR is imperfect. For a searchable document archive it does not have to be perfect, but some errors are significant in that they may prevent the document from being found by a search. Therefore, there must be a way to view and, if necessary, edit the "hidden" text in a "searchable" PDF without altering how the document is displayed or printed. No strike-throughs. No visible "corrections." None of the stuff PDF editors want to insert into a PDF file when editing it. If I wanted to edit the document itself, I would first export it to a format appropriate for an editable document. I just want adequately "correct" hidden text in a "searchable" PDF file.
I apologize for the length and redundancy in my description of the problem. However, past attempts to explain my problem and objectives, as well as what I have seen in reply to similar queries across the Internet, indicate that most people trying to answer this question come at it from the point of view shared by most, if not all, PDF tool or application vendors. They seem to think that any desire to edit a PDF file is a desire to have a PDF word processor of some sort. Or they assume that the way the OCR process is applied needs tweaking, after which a process like "find suspects" is adequate to deal with any errors. But no, those are not what I am trying to accomplish, and answers which address those topics do not answer this question.
In short, which tool or application from any vendor will reveal the "searchable" hidden text in a PDF produced by any OCR or other process and then enable corrections to the hidden text without changing any document display parameters at all? Note, hidden text typically includes bounding box information denoting the portion of the image from which the text was recognized. That information must not be lost or changed when editing the "searchable" text.
So, are there any tools or applications capable of doing this? If Adobe Acrobat XI Pro can, fine. (A trial copy showed that the hidden text content can be reviewed, but editing did not work by any straightforward means I could work out.) However, $500.00 list, or even a possible $200.00 upgrade from the copy of Adobe Acrobat X Standard that came with my scanner, is a lot of money for personal use when reviewing and editing the OCR-generated hidden text in a "searchable" PDF file is the only function I require. Suggestions of other tools or applications that do what I need for less would therefore be greatly appreciated.

My "claim"? Actually I've made no "claim" such as you've mentioned.
Simply stated your OP has foundational premises that presume as factual what is not.
Here, we're in Adobe's hosted user forum for Acrobat.
Any other application use is not material. 
Acrobat XI provides 3 OCR methods.
Searchable Image, Searchable Image (Exact) & ClearScan.
Only the first two provide the "hidden" text output.
(Glyphs have no stroke, no fill)
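As a rough illustration of what "no stroke, no fill" means at the file level: the PDF specification defines text rendering mode 3 (written `3 Tr` in a page content stream) as text that is neither filled nor stroked, i.e. invisible. The Python sketch below is not anything Acrobat exposes. It assumes a content stream that has already been decompressed to plain text; the sample stream, the `invisible_strings` name, and the simplified tokenizer are all hypothetical. It also ignores q/Q graphics-state save and restore, hex strings, and the TJ array operator, any of which real OCR output may use.

```python
import re

# Hypothetical, already-decompressed page content stream (not from a real file).
SAMPLE_STREAM = """
q 612 0 0 792 0 0 cm /Im0 Do Q
BT
3 Tr
/F1 10 Tf
1 0 0 1 72 700 Tm
(Invoce 2013) Tj
ET
"""

# A token is either a literal string "( ... )" or a run of non-space characters.
TOKEN = re.compile(r"\((?:\\.|[^\\()])*\)|[^\s()]+")


def invisible_strings(stream: str) -> list[str]:
    """Return literal strings shown with Tj while text rendering mode is 3."""
    mode = 0      # the default rendering mode is 0 (fill)
    prev = ""     # the token immediately before the current one
    found = []
    for tok in TOKEN.findall(stream):
        if tok == "Tr":
            mode = int(prev)              # operand precedes the operator
        elif tok == "Tj" and mode == 3 and prev.startswith("("):
            found.append(prev[1:-1])      # strip the enclosing parentheses
        prev = tok
    return found


print(invisible_strings(SAMPLE_STREAM))
```

A string listed by this sketch is only what the content stream stores; depending on the font encoding the OCR tool used, it may not be readable characters at all.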
As far back as the Acrobat 3 product family, the design purpose of Searchable Image and Searchable Image (Exact) has been to facilitate the use of Find / Search.
The "hidden" text can be touched up. Acrobat Pro provides the facility to view the hidden text.
So you can see the OCR output that correlates to the bit-map images of the characters that are present.
With Acrobat XI Pro, use Tools - Protection - Remove Hidden Information.
In the Remove Hidden Information pane select "Hidden text" then "Show preview".
The default for the preview is "Show Only Hidden Text".
Back in the PDF --
You'd select some of the hidden text and retype what you suspect is the correct string of characters.
Save and return to the preview of the hidden text.
If you got it right, good. Continue.
If not, darn - try again.
Plug 'n chug -- somewhere over the rainbow it'll be done eh.
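For anyone curious what such a correction amounts to underneath, here is a minimal Python sketch, under heavy assumptions, of changing only the string operand of a text-showing operator (Tj) while leaving the positioning operators (such as Tm) untouched. It is not how Acrobat does it. The `correct_hidden_text` name and the sample stream are hypothetical, the stream is assumed to be decompressed plain text with literal (parenthesized) strings, and the replacement is assumed to contain no parentheses or backslashes. A correction that changes the length of the string may also change how far the invisible text extends across the page image.

```python
import re


def correct_hidden_text(stream: str, wrong: str, right: str) -> str:
    """Swap a literal string operand of Tj; everything else is left unchanged."""
    pattern = re.compile(r"\(" + re.escape(wrong) + r"\)(\s*Tj)")
    # A function is used as the replacement so that `right` is inserted verbatim.
    return pattern.sub(lambda m: "(" + right + ")" + m.group(1), stream)


# Hypothetical content stream fragment with one OCR error in the hidden text.
before = "BT 3 Tr /F1 10 Tf 1 0 0 1 72 700 Tm (Invoce) Tj ET"
after = correct_hidden_text(before, "Invoce", "Invoice")
print(after)
```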
Full disclosure -- this is something I've done (enquiring minds don't you know).
I've found it to be a rather Sisyphean undertaking.
So, "doable" but not practicable.
This is to be expected because such touchups are not the concern / focus of the output from Searchable Image or Searchable Image (Exact) - (the names tell it all).
To have touchup "editability" of an OCR output using Acrobat, make use of ClearScan.
ClearScan replaces recognized character bit-maps with a character from an Acrobat internal font.
The character strings can be selected to change to a generic, system available font.
Something that is good to know when embarking on the "tweak the PDF" journey is that PDF (the file format / technology as defined by its ISO Standard, ISO 32000-1) does not tolerate "editing". PDF is decidedly not a word processor file format, and "editing" can quickly render a PDF unusable.
Minor touchups can be made and your best "tool" for this is still Acrobat Pro. (Save As often and periodically "bank" the PDF via some file rename scheme.) 
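One possible way to "bank" copies, sketched in Python: before each round of touchups, copy the PDF to the next unused numbered file name beside it. The `bank_copy` name and the `.vNNN` naming scheme are only assumptions for illustration; any rename scheme that never overwrites an earlier copy would serve.

```python
import shutil
from pathlib import Path


def bank_copy(pdf_path: str) -> Path:
    """Copy pdf_path to the next free 'name.vNNN.pdf' beside it; return the copy's path."""
    src = Path(pdf_path)
    n = 1
    while True:
        dest = src.with_name(f"{src.stem}.v{n:03d}{src.suffix}")
        if not dest.exists():
            break          # first unused version number found
        n += 1
    shutil.copy2(src, dest)  # copy2 also preserves file timestamps
    return dest
```

Calling `bank_copy("scan.pdf")` twice would leave `scan.v001.pdf` and `scan.v002.pdf` next to the working file.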
Be well...

Similar Messages

  • How can I enable Safari's ability to display pdf files?

    Hi there,
    In Safari 4.0.3 (MacBook Pro) pdf links open as a dark grey block. That is, instead of the intended text, I see just dark gray.
    Control-click allows me to save the file to the desktop in which case Preview can open them with no problem.
    How can I enable Safari's ability to display pdf files?
    Thanks for any help,
    Jacobo

    HI,
    If you have Adobe Acrobat Reader, launch Adobe Reader. From the Menu Bar, click Adobe Reader/Preferences. Select the Internet category...the top option is the one you want. Uncheck the box to make sure Adobe doesn't keep making itself the default reader.
    You might also need to open /Library/Internet Plug-Ins and delete Adobe's PDF viewer plug-in.
    Carolyn

  • How can I get a monthly subscription for combining PDF files

    How can I get a monthly subscription for combining PDF files. I have been able to subscribe in the past, but now all I see is annual subscription. I do not need an annual subscription because I do not use it often enough.

    Hi paige1186,
    The Adobe PDF Pack subscription, which allows you to combine PDF, is only available at an annual rate of $89.99. We do offer month-to-month subscriptions of Acrobat Pro and Standard, but I think you'd actually come out ahead with the annual PDF Pack subscription, if you planned on combining files more than 3-4 times throughout the year.
    Best,
    Sara

  • HT2693 how can i save my pages document as a PDF file or jpeg now?

    Hi...
    How can i save my pages document as a PDF file or Jpeg file now that the share is NA.

    In Pages, go to File > Export > Word, so the document will be saved in .doc to be able to use it in Word

  • How can I buy the software to change the pdf file to Excel file in Costa Rica

    How can I buy the software to change the pdf file to Excel file???

    Not sure what you are expecting. Scale is just an abstract concept. The overall content and appearance of the drawing doesn't change. If you place the same drawing on the same page and just change its supposed scale, you have effectively changed nothing. As far as AI is concerned, all that matters is the size of the content in relation to the page it's on. If you want things to appear at specific sizes, you must scale them accordingly on the page and also keep in mind that AI uses inches as its base unit, so essentially you need to define the DPI of the drawing.
    Mylenium

  • How can I remove the blue background from my PDFs files

    How can I remove the blue background from my PDFs files

    Which tutorial?
    Answers are in your HTML and CSS code.  What is the link to your online test page?
    Nancy O.

  • How can I add an expiration date to a pdf file?

    How can I apply an expiration date to a pdf document in Acrobat Pro X? Can this be done in Acrobat XI?

    Read this:
    http://forums.adobe.com/thread/1085319?tstart=0

  • How do I add a text description to a PDF file

    I want to add some text descriptions to my PDF files that later I can use as "search criteria". Is there something I can use other than the "file name" ? I would like to use short file names and append a longer text description to the file so I can search for files using the "text description" as search criteria.


  • How can I format my Numbers spreadsheet into a PDF file

    How can I change my Numbers spreadsheet into  a PDF format so that people to whom I send it can open it on their PC?

    You can also "share" via iCloud.
    The recipient can view it (and edit it, unless you specify read only) using a modern browser on a Mac or PC. The recipient does not need to have an iCloud account..
    SG

  • How can I do an email merge with a PDF file?

    How do you do an email merge with a PDF file?

    Read this:
    http://forums.adobe.com/thread/1085319?tstart=0

  • How can I copy the text on a published captivate file?

    I would like my students to use the content on the module by copying the text, how can i achieve it?
    Thank you.
    Weiwei

    My understanding is text in a SWF isn't really text any more, from the standpoint of it being mouse-selectable.
    If it is very limited instances of needing to copy text, you can use a text entry box and enter the text you want them to copy as the 'default text,' but it is far from being an ideal solution.
    There is a Text Area Widget included in Cp5.5 but it seems to be broken.  When I try to type random text in it to test, the text gets garbled with caps/lowers, and some letters don't show up at all when entered.  I typed "The quick brown fox jumped over the lazy dog" and it displayed as:  The uiC bron fox umeD oVer the lazy Dog
    There is a webpage widget that can embed a PDF or html page in which you can select the text. It might work for what you need but may take more back end work:  http://captivatedev.com/2010/11/05/adobe-captivate-5-web-page-widget/
    There may be other included or available widgets; I am by far not an expert with them.
    hope that helps.

  • How can I replace a color in a finished PDF file?

    I'm on Mac 10.9.2 Mavericks and I want to print a PDF file made by someone else, composed of Powerpoint slides. The problem is, all pages have a standard layout where there are giant cyan lines on the bottom, without any text on them other than page numbers, like so:
    http://i.gyazo.com/d82c816638e27f82e3978fc8a3c6e67e.png
    How can I turn these cyan parts white, WITHOUT cropping?

    How can I turn these cyan parts white, WITHOUT cropping?
    Other than editing the PPT (or PPS), you can't.

  • How can I convert e-mail messages to either PDF files or Microsoft Word?

    I want to be able to convert my Thunderbird e-mails to either PDF files or Microsoft Word. It doesn't matter which. I want to be able to do this so that they can be saved on a flash drive in readable form for anyone who is interested in them.

    How about a text file?
    Right Click-Save As-Text File

  • How can i make the report previewer show a PDF file instead of HTML?

    Hi there,
    i made a report which can be called within a form by a button. This report is a PDF-report.
    I also have an option in the application to start the report from the menu. Then i get the Launch Report Form by Headstart. Here i can fill in for Desformat (in the options) PDF and Output to PREVIEW. Still it generates me a HTML report in my browser. How can i make it work so that the PREVIEWER will show me a PDF-report instead of the HTML?
    Kind regards,
    Dave

    You can not use PREVIEW as output type in webbased forms/reports. Try using as output type CACHE or FILE.

  • How can I display an animated graphic in a PDF file or slide show?

    I have a simple animated graphic comprising eight layers in a PSD file created in Photoshop. Can I create a version of this which can be included in a PDF file or a slideshow. I do not want to display it on the web. Any suggestions would be welcome.

    I'm confused now. Just how would an animated GIF be different from an animated PSD (if there IS such a thing)? Sounds like your best bet would be to use the PSD to make an SWF and import THAT into the PDF, because you can use an SWF in a PDF and it will play.
