Editing Hidden Text with PDF

We are currently scanning our historic documents. We had them scanned with OCR, and we are now using an indexer that reads the hidden text for indexing. We will be using this to search for documents. However, some of these documents are older, which means the OCR did not work as well as expected. I want to see if there is a way to edit the hidden text from the OCR to change minor things, so that the documents can still be found. Any hints would be greatly appreciated.
Thank you,
Jeff

Jeff,
Historical documents implies you'd not want to adversely affect the scanned image. If so, then OCR with Searchable Image (Exact) (SIE) is what you want.
As to editing the OCR output: neither Searchable Image (Exact) nor Searchable Image lends itself to this.
Yes, there are workarounds, but they are labor intensive and awkward from within Acrobat.
If you use SIE, consider exporting out the OCR of each PDF to a text file. Referencing the PDF or the source paper you can edit this text (migrated to a word processor perhaps). Output a second PDF. Use these for the catalog index. Link each second PDF to the first PDF.
Search gets you to PDF 2, the link gets you to the scanned PDF.
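The two-PDF arrangement described above amounts to a lookup from corrected text back to the scan. Here is a minimal sketch in Python, assuming the corrected text of each document is already available as a string paired with the file name of its scanned PDF. The file names and the `build_index` / `find_scans` helpers are hypothetical; this is not how Acrobat's Catalog index works internally, only an illustration of the idea.

```python
import re
from collections import defaultdict

def build_index(corrected_texts):
    """Map each lowercase word to the set of scanned PDFs whose corrected text contains it.

    corrected_texts: dict of scanned-PDF file name -> corrected OCR text.
    """
    index = defaultdict(set)
    for scan_name, text in corrected_texts.items():
        for word in re.findall(r"[a-z0-9']+", text.lower()):
            index[word].add(scan_name)
    return index

def find_scans(index, query):
    """Return the scanned PDFs that contain every word of the query, sorted by name."""
    words = re.findall(r"[a-z0-9']+", query.lower())
    if not words:
        return []
    hits = set.intersection(*(index.get(w, set()) for w in words))
    return sorted(hits)

# Hypothetical sample data: the corrected text stands in for "PDF 2",
# and the dictionary key stands in for the link back to the scanned PDF.
corrected = {
    "deed_1887_scan.pdf": "Deed of sale for the Miller farm, 1887",
    "letter_1902_scan.pdf": "Letter regarding the Miller estate, 1902",
}
index = build_index(corrected)
print(find_scans(index, "miller farm"))
```

The search runs only against the corrected text, so OCR errors in the scanned PDF's hidden layer no longer matter for retrieval.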
Alternatively, hold off for Acrobat X.
Today's Adobe Acrobat X: First Look eSeminar demonstrates how Acrobat X can export an OCR'd scanned image directly to Word with impressive retention of layout and format (without coming from a Tagged PDF). This process would permit getting cleaned up text back into PDF(s) to serve as the source of a Cataloged index.
The eSeminar will be presented again Thurs., Oct. 21.
See: http://acrobatusers.com/events/49361/adobe-acrobat-x-first-look
Be well...

Similar Messages

  • Editing hidden text in pdf?

    Scanning 19th-century and early 20th-century documents as TIFFs, creating PDF using the original images as pages. OCR can't recognize the text well, so the hidden text needs extensive editing. Using Adobe Acrobat 8.0 Professional on Windows 2000 Professional.
    Can see the hidden text using Examine Document; can only edit using the Text TouchUp Tool on the page, where the hidden text is not visible.
    * Is there any way to see both the hidden text and the page at the same time?
    * Is there a better way to edit the text?
    * Is there any way to import text to use in the hidden text?
    * Is there any way to apply hidden text to an image where none was created in the OCR conversion?

    The answer to most (maybe all) of your questions is probably to use 'proper' OCR software such as ABBYY FineReader or Readiris. Trial downloads are available from their web pages. Output can be in the same PDF format that you need.

  • OCR and hidden text in PDF scans of historic documents

    I need to edit the hidden text behind a scanned PDF image of a document.  The image must remain as an “exact” copy of the original scanned document.
    I used Acrobat Pro (versions 7 and 9) to make PDF images of old typed documents from the 1940’s.  When I open those images and run OCR in version 9, then examine the hidden (invisible) text layer behind the image, there are errors.  For example, the word “book” has been picked-up by the OCR as the word “look.”  I need to change the “l” to a “b” in order to make the PDF accurate when it is searched at a later date. 
    I have checked many user forums.  Most people imply that hidden text can be viewed, but NOT edited in Acrobat Pro 7 and 9.  (Hidden text can be viewed in Version 9 by selecting “Document” “Examine Document” and then clicking on the “+” symbol next to “Hidden Text,” then clicking “Show preview.”)  Some say to use Adobe Capture 3.0 to edit hidden text.  Others say to use Photoshop or Illustrator to edit hidden text (I think these folks may have been confused, because Photoshop and Illustrator would be used, logically, to edit the image ON TOP OF the hidden text).  Yet another person seemed to say that a hidden text editor was added to Acrobat 8, but was taken away in Acrobat 9.  (I can’t verify that because I don’t have version 8.)
    The closest answer I was able to find involved using the Text Touch Up Tool on top of the image to edit hidden text behind it, but when you do that you are typing “blind.”  In other words, you highlight a spot on the image (top layer) where you THINK the error MIGHT be, and you type the correction without being able to see what you are typing over.  Then, you go back to the “Examine Document” procedure (described above) to see if you “hit” your mark, and if not, you redo it until you do “hit” your mark.  With the number of documents and corrections that we have, that procedure would be too labor intensive and thus a budget breaker.
    If we have to buy more software, my preference would be to buy a genuine Adobe product because I have experienced problems in the past switching back and forth between Adobe products and other PDF manipulation software.
    Can anyone answer any of these questions: 
    (1) Is there a way in Acrobat versions 7, 8 or 9 to edit hidden text, and if so, how? 
    (2) What Adobe software (other than Acrobat) will edit hidden text behind a PDF image? 
    (3) Assuming no Adobe product will edit hidden text behind a PDF image, are there any non-Adobe products that will do that?
    Thank you!

    Hi,
    Unless you use Acrobat 8 Pro's "Formatted Text & Graphics" or Acrobat 9 Pro's ClearScan, you will find that there is no practicable means of editing the OCR "hidden text" in a PDF.
    The TouchUp Text tool (Advanced Editing toolbar) relies on the selected text having an available system font to use during touchup. However, the output of both Searchable Image and Searchable Image (Exact) OCR uses text rendering mode 3 (invisible text), set in a font that Acrobat supplies internally rather than one installed on the system or by another application.
    With Searchable Image (Exact) you have the untouched image augmented by the invisible text which is provided as a user aid for search or find with Adobe Reader or Acrobat. The invisible text is not intended to support word processor like editing.
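    To make "text rendering mode 3" concrete: in a PDF content stream, the `Tr` operator sets the rendering mode and `Tj` shows a string. The following is a rough sketch in Python that picks out strings shown while the mode is 3. It assumes an already decompressed content stream held in a plain string, and it ignores much of what real files contain (`TJ` arrays, hex strings, nested parentheses, font-specific encodings, `q`/`Q` state saving). The `invisible_strings` helper and the sample stream are made up for illustration.

```python
import re

# Either a literal string "(...)" without nested parentheses, or any other token.
TOKEN = re.compile(r"\((?:\\.|[^\\()])*\)|[^\s()]+")

def invisible_strings(content):
    """Return literal strings shown with Tj while the text rendering mode is 3."""
    mode = 0          # rendering mode 0 (fill) is the default
    previous = None   # the token just before the current one
    found = []
    for token in TOKEN.findall(content):
        if token == "Tr" and previous is not None and previous.isdigit():
            mode = int(previous)
        elif token == "Tj" and mode == 3 and previous and previous.startswith("("):
            found.append(previous[1:-1])
        previous = token
    return found

# Made-up content stream: one invisible string, then one visible string.
sample = "BT /F1 12 Tf 3 Tr 72 700 Td (look) Tj ET BT 0 Tr 72 680 Td (Page 4) Tj ET"
print(invisible_strings(sample))
```

    This only shows where the hidden text lives. It does not make the text any easier to edit in Acrobat.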
    To your questions:
    #1. There is no practicable way to edit invisible text (text rendering mode 3) with Acrobat (any past or current release).
    #2. None.
    #3. A good question. Perhaps a specialty program. Keep in mind that many products make a promise, but those that actually deliver tend to be expensive.
    Something to play with. Using Acrobat 9 Pro or Pro Extended, try the Preflight Fixup to embed hidden text.
    Then try using the TouchUp Text tool. You may also want to see if you can change the font type of this newly embedded font.
    (use copies of the "real" files - just in case <g>).
    Be well...

  • How to edit a text with same font in a image in photoshop 7

    how to edit a text with same font in a image in photoshop 7

    Good day!
    The question seems to provide insufficient information for a relevant answer.
    Do you have the font?
    Is the text a (Type) Layer of its own or part of the image?
    Could you please post a screenshot with the Layers Panel visible?
    Regards,
    Pfaffenbichler

  • How do you edit document text in pdf in Acrobat Pro 11 Mac?

    How do you edit document text in pdf in Acrobat Pro 11 Mac? I know I can do it in the Windows version, but can't find same tool in Mac.

    Should be the same but it isn't. I have included a screenshot of my tools choices in both my Mac and PC versions - they are totally different.
    [Screenshots not preserved: Mac version, PC version]

  • How to remove a hidden text in pdf file with Acrobat Pro 9. How to save pdf file and remove hidden text?

    I made this file in InDesign. The highlighted empty spaces indicate that there is hidden text, which pops up when searching for some words in the PDF file. How can I save the PDF file so that it keeps only the visible text?

    Dear lrosenth,
    I went through some codes/suggestions in internet and I found that I need to have cmap file and cid font file for the respective font since pdf doesn't support unicode fonts directly.
    Can you help me to know where can I get cmap file and cid font file for tamil language font Latha(TrueType) microsoft font.
    Regards,
    Safiq

  • Hidden Text in PDF file generated from Ai

    One of my clients (an Ad Agency) has a problem with a PDF file.
    They make the layout in Adobe Illustrator, then (to send the file to the newspaper) use the "Save as" menu with the prepress setting.
    The designer uses "Helvetica Neue", the TrueType font that came with Mac OS X.
    But for some weird reason one letter in the headline disappears: this one >>>> "É"
    When I check the file in Acrobat 9 and X, it reports "Hidden Text".
    Any idea what happened there?
    Thanks a lot

    "Save as PDF" occasionally writes internal links as external links (pointing to a file with the current PDF file name). Such links won't work after the PDF is renamed, even if the PDF is a stand-alone PDF.
    Try printing the book to a .ps file and distilling, instead of "Save as PDF".
    Also see: http://www.microtype.com/Hmmms.html#0702
    Shlomo Perets
    MicroType, FrameMaker/Acrobat training & consulting
    "24 easy ways to improve your PDFs with FrameMaker-to-Acrobat TimeSavers/Assistants",
    http://www.microtype.com/ImprovePDF.html

  • Editing Actual Text in PDF File

    Is there any way or a third party software that will allow me to actually edit text in a PDF???
    Right now, I have a PDF file that was emailed to me for printing and some text in the file needs to be modified. I would just like to edit the actual text instead of waiting for the sender to edit the text in whatever program they used to create the PDF from.
    I presently have Adobe Acrobat 7.0 Standard.
    Any help would be greatly appreciated!!!
    Thanks!
    Mike

    The Adobe Reader for SymbianOS isn't a PDF editor. You'd need Acrobat
    for that, using the Text Touch-up Tool.
    Aandi Inston

  • How do I edit hidden regions  with  ICE?

    I have a  page set up on a client's site that displays in 2 parts using a tab to switch between them.  The regions that have been hidden by CSS  do not show up in  the InContext editor. How can I  set  it up so everything on the page  is editable?

    You'll likely need to detect if ICE is active, then make both display at once.
    Here's a little snippet I use to detect if ICE is active on the page:
    var iceActive;
    if (typeof window.frameElement !== "undefined") {
        // Treat ICE as active when the page sits in an iframe with class 'beditor-iframe'
        iceActive = (window.frameElement !== null && window.frameElement.className === 'beditor-iframe');
    } else {
        console.warn('window.frameElement not defined; ICE mode detection not available');
        iceActive = false;
    }

  • Tools-protection-remove hidden information = how to create report where is hidden text???

    Acrobat can show a preview of the hidden text in a PDF. Is it possible to export this preview window to an HTML-like report or something similar?

    I found a workaround. I take screenshots of the preview using Mouse Recorder Pro together with Screenpresso. Mouse Recorder clicks the arrow (next page number) in Acrobat's hidden text window and then presses Print Screen, which triggers Screenpresso to save a picture into a folder. Not comfortable, but it works.

  • PDFs with hidden text readable in reader, but not acrobat?

    For some reason, when I scan documents into Adobe Acrobat the entire document is unreadable -- it's all hidden text. However, when I open the same scanned document in Adobe Reader the text is perfectly readable. I have been receiving numerous PDFs from colleagues which have the same problem: I open them in Acrobat and the text is hidden and unreadable, but in Reader it's fine. Is anyone else experiencing this problem? I've been able to play with the scanner settings and have managed to create PDFs without hidden text from the scanner. The real problem is that I have to print many of the PDFs I receive, and many of them are quite large. When printing from Reader, it takes at least 5 minutes per page. Any help/suggestions are much appreciated.
    Thanks
    -John
    (I'm using the latest version of Acrobat, 9)

    It would help to look at a sample. Can you post one?
    Also, check the "Show Large Images" Page Display preference.

  • Can't make Photoshop PDF with editable / vector text.

    Hi,
    I'm trying to File > Save As an Adobe Photoshop CS6 PDF and then be able to open it and edit the text in Adobe Acrobat X.
    Whenever I attempt to edit the text, it turns out to be a raster image, and it doesn't matter what I do in the Photoshop PDF settings.
    I want to be able to do this so that the text is able to be searched by google / search engines when I make the PDF available online.
    -Steve.

    This is what I did using CS6, and it worked for me. I hope it helps you too.
    Step 1) Move all graphics (images/backgrounds) into one folder (Folder-Layers).
    Step 2) Move all text (title, headings, main text, etc.) into another folder (Folder-Text).
    Step 3) Merge the first folder (Folder-Layers) into a single layer by right-clicking and choosing Merge Group,
            OR select the folder > Layer > Rasterize > Layer.
            You now have only one background layer (graphics) and a text folder.
    Step 4) Go to File > Save As > choose Photoshop PDF.
            Check "Use Proof Setup: Working CMYK" (if you want to print), then Save. You will get the message "The settings you choose in the Save Adobe PDF dialog can override your current settings in the Save As dialog box." Click OK.
    Step 5) In the Save Adobe PDF dialog box, choose these settings:
            Adobe PDF Preset: Adobe PDF Preset 1
            Standard: PDF/X-4:2010
            Compatibility: Acrobat 7 (PDF 1.6)
            General:
              Check "Optimize for Fast Web Preview"
              Check "View PDF After Saving"
            Compression:
              Set the Compression box to None (no ZIP, no JPEG, no JPEG2000)
            Don't touch any other settings, then save the PDF.
    Then open it in Acrobat Reader and make the text changes.

  • How can I correct "hidden" text in a searchable PDF file?

    This seems like a simple question. However, the answers are invariably complex, do not yield the desired result, and often answer a different question entirely. I say all that just to warn people up front that the "problem" is easier than how many people and PDF application developers, including Adobe, typically understand it while the proposed "solutions" are invariably a total...well, botch is a reasonable word if a bit understated.
    Here is the actual problem:
    I have "searchable" PDF files created by scanning documents and running them through an OCR process. I create "searchable" PDF files in order to archive, index, and eventually enable searching for the documents scanned. A "searchable" PDF satisfies those criteria better than any other commonly used, "portable" archive format -- though I would be happy if someone could point out an obvious alternative I may have overlooked. I do not need perfect OCR results. If I need a document to edit or perhaps feed into a spreadsheet or database, I expect to be able to reprocess the page images in a given "searchable" PDF file to OCR and convert the contents to Word, RTF, Excel, or another file format as necessary with more care for the results than for the archived document itself. Therefore, the "searchable" PDF document is the scanned page images which compose it while the OCR generated "searchable" text is secondary, but still important. Therefore, each file must contain scanned page images of sufficient detail to be efficiently converted by OCR if possible and legible enough for whoever views the images to be able to work out what an OCR process may fail to understand. Once scanned, those pages are the "document" and therefore "immutable." However, OCR is imperfect. For a searchable document archive, it does not have to be, but some errors are significant in that they may prevent the document from being found by a search. Therefore, there must be a way to view and, if necessary, edit the "hidden" text in a "searchable" PDF without altering the visual display of a document or how it is printed. No strike-throughs. No visible "corrections." None of the stuff PDF editors want to insert into a PDF file when editing it. I do not want to edit the document without exporting it to a format appropriate for an editable document. I just want adequately "correct" hidden text in a "searchable" PDF file.
    I apologize for the length and redundancy in my description of the problem. However, past attempts to explain my problem and objectives as well as what I have seen in reply to similar queries across the Internet indicate that most people trying to answer this question come at it from the same point of view shared by most, if not all, PDF tool or application vendors. They seem to think that any desire to edit a PDF file is a desire to have a PDF word processor of some sort. Or, they assume that the OCR process employed may need tweaking of the means by which people apply it and then a process like "find suspects" is adequate to deal with any errors. But no, those are not what I am trying to accomplish and answers which address those topics do not answer this question.
    In short, which tool or application from any vendor will reveal the "searchable" hidden text in a PDF produced by any OCR or other process and then enable corrections to the hidden text without changing any document display parameters at all? Note, hidden text typically includes bounding box information denoting the portion of the image from which the text was recognized. That information must not be lost or changed when editing the "searchable" text.
    So, any tools or applications capable of doing this? If Adobe Acrobat XI Pro can (use of a trial copy demonstrated that the hidden text content can be reviewed, but editing did not work by any straight-forward means I could work out while trying out the application), fine. However, $500.00 list or even a $200.00 possible upgrade from a copy of Adobe Acrobat X Standard which came with my scanner is a lot of money for personal use when review and edit of the OCR generated hidden text in a "searchable" PDF file is the only function I require. Therefore, other suggested tools or applications which do what I need for less would be greatly appreciated.
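    The requirement that corrections must leave the bounding boxes untouched can be stated as a small data model. The sketch below is purely illustrative Python: `HiddenWord` and `correct_word` are hypothetical names, the coordinates are invented, and it says nothing about how to read the words out of a PDF or write them back. It only shows a correction that changes the text and nothing else.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class HiddenWord:
    text: str     # what the OCR engine recognized
    bbox: tuple   # (x0, y0, x1, y1) region of the page image, in page units

def correct_word(words, wrong, right):
    """Return a new list where every word equal to `wrong` reads `right`.

    Bounding boxes are carried over unchanged, so search highlighting
    would still land on the same region of the scanned image.
    """
    return [replace(w, text=right) if w.text == wrong else w for w in words]

# Invented sample page: OCR read "book" as "look".
page = [
    HiddenWord("the", (72, 700, 90, 712)),
    HiddenWord("look", (94, 700, 120, 712)),
]
fixed = correct_word(page, "look", "book")
print(fixed[1])
```

    The original list is left untouched because the dataclass is frozen, which makes it easy to compare the page before and after a correction.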

    My "claim"? Actually I've made no "claim" such as you've mentioned.
    Simply stated your OP has foundational premises that presume as factual what is not.
    Here, we're in Adobe's hosted user forum for Acrobat.
    Any other application use is not material. 
    Acrobat XI provides 3 OCR methods.
    Searchable Image, Searchable Image (Exact) & ClearScan.
    Only the first two provide the "hidden" text output.
    (Glyphs have no stroke, no fill)
    From back to the Acrobat 3 product family the design functionality of Searchable Image and Searchable Image (Exact) has been to facilitate the use of Find / Search.
    The "hidden" text can be touched up. Acrobat Pro provides the facility to view the hidden text.
    So you can see the OCR output that correlates to the bitmap images of the characters that are present.
    With Acrobat XI Pro use Tools - Protection - Remove Hidden Information.
    In the Remove Hidden Information pane select "Hidden text" then "Show preview".
    The default for the preview is "Show Only Hidden Text".
    Back in the PDF --
    You'd select some of the hidden text and retype what you suspect is the correct string of characters.
    Save and return to the preview of the hidden text.
    If you got it right, good. Continue.
    If not, darn - try again.
    Plug 'n chug -- somewhere over the rainbow it'll be done eh.
    Full disclosure -- this is something I've done (enquiring minds don't you know).
    I've found it to be a rather Sisyphean undertaking.
    So, "doable" but not practicable.
    This is to be expected because such touchups are not the concern / focus of the output from Searchable Image or Searchable Image (Exact) - (the names tell it all).
    To have touchup "editability" of an OCR output using Acrobat, make use of ClearScan.
    ClearScan replaces recognized character bit-maps with a character from an Acrobat internal font.
    The character strings can be selected to change to a generic, system available font.
    Something that is good to know when embarking on the "tweak the PDF" journey is that PDF (the file format / technology as defined by its ISO Standard, ISO 32000-1) does not tolerate "editing". PDF is decidedly not a word processor file format and "editing" can quickly render a PDF unusable.
    Minor touchups can be made and your best "tool" for this is still Acrobat Pro. (Save As often and periodically "bank" the PDF via some file rename scheme.) 
    Be well...

  • Adding text as hidden layer in PDF's

    Hi, I have some handwritten documents (old genealogy letters) which I would like to make searchable. Can I scan the documents as PDFs, then manually word process the documents and add this text as a hidden text layer? Thanks, Doctor Keo

    About Acrobat OCR.
    Three methods.
    #1 Searchable Image
    #2 Searchable Image (Exact)
    #3 -
    (a) Formatted Text and Graphics (prior to Acrobat 9)
    (b) ClearScan (Acrobat 9)
    #1 - Provides OCR output as a hidden text layer. Will perform some "adjustment" to the image.
    #2 - Provides OCR output as a hidden text layer. Will not perform any "adjustment" to the image.
    #3. a & b
    If process thinks it "knows" what the character is then it replaces the image of the character.
    If process is not sure what the character is then it flags the character(s) as "suspects".
    End-user can edit "suspects".
    If process does not know what the character is the character's image is left alone as a bit-mapped image.
    Note that "ICR", rather than "OCR", is what is meant for handwritten material that has been scanned.
    Acrobat does not provide "ICR".
    However, text from a typewriter typically yields accurate OCR, provided the scan is at a high enough resolution (typically 300 ppi).
    If #1 or #2 is used you can always Save As to a *.txt file.
    This can be brought into a text editor, word processor, page layout application, etc.
    There, you can create a "clean" copy from which a PDF can be made.
    Provide the Scan of the original and use a PDF Bookmark or a Button Field having a link action to go to the copy having the corrected content with renderable text. Make a Catalog index of the cleaned up text PDFs to support advanced search.
    For all practicable purposes, there is no manipulation/edits/etc. to the hidden layer of OCR output.
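    Part of the clean-up on the exported *.txt file can be automated when the same misreadings recur across many pages. This is a minimal Python sketch that applies a hand-built table of whole-word corrections. The `apply_corrections` helper and the sample table are assumptions for illustration, not something Acrobat provides.

```python
import re

def apply_corrections(text, corrections):
    """Replace whole-word OCR misreadings using a {wrong: right} table."""
    if not corrections:
        return text
    # Longest entries first, so a longer misreading wins over a shorter one.
    alternatives = sorted(corrections, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(re.escape(w) for w in alternatives) + r")\b")
    return pattern.sub(lambda m: corrections[m.group(1)], text)

# Hypothetical table of recurring misreadings found while proofreading.
corrections = {"tbe": "the", "rnay": "may", "Iand": "land"}
print(apply_corrections("tbe deed rnay transfer Iand", corrections))
```

    A table like this only helps with misreadings that are not real words. An error that produces another valid word still needs a human comparing the text against the scan.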
    Be well...

  • Can you edit the text of a PDF that is placed in inDesign

    I have placed a PDF in inDesign and am curious if I can edit the text of that PDF while in inDesign.

    Edit it with Adobe Acrobat Pro; InDesign can't. The only option within InDesign is, of course, to overlay the old text with new text.
    There is specialized software for editing PDF files, though the choices are much more limited and often more expensive than creating and editing standard editable document formats. Version 0.46 and later of Inkscape allows PDF editing through an intermediate translation step involving Poppler.
    Serif PagePlus can open, edit and save existing PDF documents, as well as publishing of documents created in the package.
    Enfocus PitStop Pro, a plugin for Acrobat, allows manual and automatic editing of PDF files, while the free Enfocus Browser makes it possible to edit the low-level structure of a PDF.
    In Acrobat you should use the TouchUp Text Tool.
    But in the end, PDF-files are not made for editing.
