Combined text/graphics OCR problem

Hi.
I need to convert a dozen manuals from TIFF to editable Word files. Each TIFF file is one manual of about 50 pages.
Some pages contain graphics with words and/or numbers in them (technical schematics).
When I open the files in Adobe Acrobat Pro and convert/save them to Word using OCR, the schematics get messed up by the OCR.
How can I make it apply OCR to the text pages only and leave the graphics as they are?

Import the TIFF into a PDF; you can then use the Text Recognition tool page by page. Personally, for a job like yours I prefer to use a dedicated OCR program that lets me designate which parts of the page should be treated as graphics and which are text that needs to be OCR'd.
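
If going page by page through a 50-page manual is too tedious, it may help to work out the page ranges first. The following is only a rough sketch: it assumes you have already noted by hand which pages are schematics, and the function name `ocr_page_ranges` is made up for illustration. It does not call Acrobat; it just lists the contiguous ranges of text pages that you could then run text recognition on one range at a time.

```python
def ocr_page_ranges(total_pages, graphic_pages):
    """Return contiguous (first, last) page ranges that exclude graphic pages.

    total_pages   -- number of pages in the manual (pages are numbered from 1)
    graphic_pages -- pages noted by hand as schematics, to be left untouched
    """
    skip = set(graphic_pages)
    ranges = []
    start = None  # first page of the range currently being built, if any
    for page in range(1, total_pages + 1):
        if page in skip:
            # A graphic page closes the current text range.
            if start is not None:
                ranges.append((start, page - 1))
                start = None
        elif start is None:
            start = page
    if start is not None:
        ranges.append((start, total_pages))
    return ranges


if __name__ == "__main__":
    # Hypothetical example: a 50-page manual with schematics on pages 4, 5, 17 and 50.
    for first, last in ocr_page_ranges(50, [4, 5, 17, 50]):
        print(f"OCR pages {first}-{last}")
```

The graphic pages stay as plain images, so they come through the Word conversion as pictures rather than as recognized text.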

Similar Messages

  • COMBINED TEXT & GRAPHICS QUALITY

    I'm looking at buying a high-end colour laser like the Xerox 7750 or Ricoh 7200. I've got a few questions about colour lasers in general (see my other recent posts). Feel free to offer any comments about:
    COMBINED TEXT & GRAPHICS QUALITY
    My HP scanner has options that optimise the scanning quality for text, graphics or both; i.e., it appears there is no one setting that scans a mixed page at the best quality. I've noticed on reading printer manuals that some also offer different print settings for graphics and text. Do colour lasers print text in an inferior way if there are graphics on the same page, and vice versa? Or should text quality (and graphics quality) stay the same no matter what else is on the page? Assume the text is sent to the printer as actual text (not rasterised in the application).
    In the Ricoh 7200 manual it states: "If text & graphics are blurred, select Text Priority under Graphic Mode from the Maintenance menu and then print."
    There is no more explanation, but when I saw that I immediately thought: if I select Text Priority is text quality somehow improved at the expense of graphics quality? Do lasers have this limitation?
    Thanks in advance for any comments.

  • A way to undo Formatted Text & Graphics OCR from Acrobat 7?

    Over the course of a few months, my company received a large number of PDF files for a project for which the internal policy was that every file should be text searchable.  Unfortunately, we did not save the native files in any sort of convenient way, having at that time not realized that failing to do so was a very bad idea.  We ran OCR on every one of the files that we received, which total approximately 4,000.  At the time that we received the majority of these files, my company was still using Acrobat 7; we've since upgraded to version 8.
    Recently we discovered that there were discrepancies between our electronic copies and the hard copy printouts from which our electronic copies had been generated: in the electronic copies, uppercase F had changed to P, S had changed to 8, etc.  We eventually worked out that at some point a computer must have been mistakenly set to run OCR using the Formatted Text & Graphics setting, as opposed to either Searchable Image or Searchable Image (Exact).  This was absolutely not what we wanted, as for our purposes a type of OCR that causes the original images to change essentially renders the files useless.  My questions, then, are the following:
    1)  As I asked in the title, is there any way of undoing Formatted Text & Graphics OCR that was performed in Acrobat 7?
    2)  Is there a way of identifying files that have had Formatted Text & Graphics OCR performed on them (something stored in the metadata)?
    Rebuilding these files from scratch is going to require a gargantuan effort, so any help would be much appreciated.

    Hi,
    Bernd's been across the mountain and seen the bear; so, you can bank on what he posted.
    But, just because, I'll second his "no".
    Formatted Text and Graphics (Acrobat 7, 8) and ClearScan (Acrobat 9, X) effectively replace the image of textual characters.
    If a character is not recognized as 'something', a bitmap of the thing is left behind.
    Now, while Acrobat and other OCR engines (ABBYY FineReader, AdLib, Adobe Capture, etc.) are really rather impressive, no OCR engine has 100% accuracy 100% of the time. Other variables come into play: scan lamp age/brightness, platen cleanliness, scanner mechanicals cleanliness, calibration of scanner, and hard copy 'quality' (characters' darkness density, contrast between characters and background, presence or lack thereof of boxed-in text, text in or adjacent to line arcs/circles, etc.).
    All of that is for semantic content that is "textual". Semantic content that is not textual (but, coincidentally, may contain text) provides little to no useful OCR output (e.g., graphs, drawings, etc.). Validate this by performing OCR on such a PDF, then Export to a plain text file. Print this file out and compare that to the source paper or the scanned image.
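
    For anyone who would rather not compare printouts by eye, the same check can be roughed out in code. This is only a sketch under assumptions: it presumes you have a trusted transcription of the page plus the plain text exported after OCR, and the helper name `char_confusions` is invented here. It uses Python's standard `difflib` to tally which characters were swapped for which (the F to P and S to 8 kind of change described in the question).

```python
import difflib
from collections import Counter


def char_confusions(reference, ocr_text):
    """Count single-character substitutions between trusted text and OCR output."""
    matcher = difflib.SequenceMatcher(None, reference, ocr_text, autojunk=False)
    confusions = Counter()
    for tag, a0, a1, b0, b1 in matcher.get_opcodes():
        # Only equal-length replacements can be paired up character by character;
        # insertions, deletions and uneven replacements are ignored in this sketch.
        if tag == "replace" and (a1 - a0) == (b1 - b0):
            for expected, got in zip(reference[a0:a1], ocr_text[b0:b1]):
                confusions[(expected, got)] += 1
    return confusions


if __name__ == "__main__":
    # Made-up sample strings, not taken from any real document.
    for (expected, got), count in char_confusions("FIRST SALE", "PIR8T 8ALE").most_common():
        print(f"{expected!r} read as {got!r}: {count} time(s)")
```

    A tally like this shows which substitutions dominate, but it cannot restore the original page image; it only helps judge how badly a file was affected.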
    There is no metadata info that identifies the OCR mode used.
    Perhaps something is buried in the bowels of the PDF page description content; if so, it is not intrinsically easy to obtain.
    My suggestion (fwiw) - move forward with re-scan.
    A server product would help to move it along, but a high speed scanner hooked to a local machine (with ample resources) and Acrobat Pro 8 or 9 will get it done. With Acrobat 8 or 9 use Searchable Image (Exact).  In Preferences check the category Create PDF or TIFF to assure it is what you desire. Check Acrobat's scan presets to assure you have what you want vis-a-vis Compression and Filtering. Do avoid "Automatic".
    Be well...

  • HP PhotoSmart C309g Printer no longer prints Black Text, Graphics, etc

    Hi:
    We have an HP laptop.
    We have an HP PhotoSmart All-In-One C309g Wireless printer/scanner/copier.
    It has been working fine for a couple of years, until 2 days ago.
    It will no longer print anything in black (colors and Photo Black work fine).
    We have taken it apart...removed all ink cartridges, removed the cartridge caddy,
    cleaned all electrodes, replaced the caddy, installed 5 (all) new cartridges, etc.
    We went through the process of removing the cartridges and caddy, shutting down,
    removing the AC cord, waiting for several minutes, then doing the reverse process.
    When we turn the printer on, it does its auto-check/install, and after about 4-5 minutes
    tells us that the installation failed, and to press the OK button.
    We went through this process several times, and the installation failed every time.
    In the past, installing printer cartridges, cleaning, calibrating...executed quickly.
    The printer will not print anything that is Black, whether it be a print from Word, Notepad,
    a direct copy from the glass, etc., with the laptop On or Off.
    The colors and Photo Black print fine, but the regular Black does not.
    The output from the other 4 cartridges is sharp, crisp, etc.
    Anything (text, graphics, etc.) in standard Black does not print.
    Any ideas?
    Thank you
    -DaleBr

    The troubleshooting steps in this document may help resolve the issue of black not printing.  If not see the post here.
    Bob Headrick,  HP Expert
    I am not an employee of HP, I am a volunteer posting here on my own time.

  • "Recognize Text Using OCR" Option Grayed Out in Acrobat 9 Pro (9.5.1)

    Running Adobe Acrobat 9 Pro.  I'm working with electronically filed court documents.  I regularly use the OCR tool (Document -> OCR Text Recognition -> Recognize Text Using OCR...) on these court documents.
    Problem is, every once in a while, I'll run into a document where the "Recognize Text Using OCR" option is inexplicably grayed out.  I have no idea what is causing this.  I have checked the Document Properties and confirmed there are no security restrictions for the document.  It happens inconsistently, in that OCR will work with a document filed by an attorney in one case, but it won't work on the same kind of document filed by the same attorney in a different case.
    Any help getting OCR to work on these few rogue documents is appreciated!

    Forms created with LiveCycle Designer are XML forms in a PDF wrapper, and many of the usual PDF properties are not available. This is like embedded rich media in a PDF. If you want to research this, Adobe and ISO have the PDF Reference manual available as a free download.

  • Recognize Text Using OCR from DLL

    Hi:
    We are a service company working on a project in which we need to run OCR on PDF files: convert a PDF to a searchable PDF.
    The customer has licenses for Adobe Acrobat Pro Extended.
    The problem we have to solve: from a JSP page, run an applet that accesses Adobe Acrobat Pro Extended in order to use the "Recognize Text Using OCR" functionality on a PDF file.
    Ideally, we would be able to access a DLL and invoke this functionality. Is that possible?
    If not, what would be the way to access this functionality: IAC? A plug-in?
    Would greatly appreciate any help.
    Thank you very much.
    Raimundo Carlos
    www.base100.com

    [lrosenth:]
    > LiveCycle ES includes lots of PDF functionality that you can use from various APIs.
    I tended to associate the term "LiveCycle" with the newfangled (XML-based) way to handle forms, but it has become clear that LiveCycle is much more than a new Forms paradigm.
    It sounds like the LiveCycle SDK/Library can be used as a (full?) replacement for the original APDFL.
    Is there a table somewhere with the differences between those two SDKs?
    TIA,
    -Ramon

  • How disable PDFEdit combines text from multiple Tj operators

    HI
    The documentation at http://help.adobe.com/livedocs/pdfl_sdk/9/PDFL_SDK9_HTMLHelp/API_References/Acrobat_API_Reference/PDFEdit_Layer/PDEText.html mentions that "PDFEdit combines text from multiple Tj operators into a single text run, when possible".
    How can I disable this feature?
    I want to disable it because I am writing a program that processes (encrypts) and copies every element recursively. I use a chained encryption algorithm, which means that if one bit changes or shifts, the whole result will be wrong.
    When I encrypt, I count, for example, 323 runs in one text element, but after encryption, when I want to decrypt the same text element, I get only 322 runs. I think this problem is caused by PDFEdit combining text.
    Thanks for your help.

    PDFEdit can and often will write a completely different set of operators. It converts the operators to an object representation, then writes operators for the objects. You cannot stop this; it is what it is supposed to do.
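
    To make the consequence concrete, here is a small sketch of why a chain that walks the runs one by one breaks when two runs get merged. It is an illustration only: the poster's actual algorithm is not described, so SHA-256 chaining stands in for it, and the function names and sample runs are invented. Nothing here uses the PDF Library API.

```python
import hashlib


def chained_digest(runs):
    """Chain a hash across text runs: each step depends on the previous digest."""
    state = b""
    for run in runs:
        state = hashlib.sha256(state + run.encode("utf-8")).digest()
    return state.hex()


def content_digest(runs):
    """Hash the concatenated text, ignoring where the run boundaries fall."""
    return hashlib.sha256("".join(runs).encode("utf-8")).hexdigest()


# Hypothetical runs: the same text, before and after two runs were merged.
original = ["Hel", "lo ", "world"]
rewritten = ["Hello ", "world"]

print(chained_digest(original) == chained_digest(rewritten))  # False
print(content_digest(original) == content_digest(rewritten))  # True
```

    If the run boundaries cannot be relied on, one option is to key the processing to the text content of an element as a whole rather than to the number or position of its runs.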

  • Copying a Text Graphic

    Hello,
    I just picked up on a Flash project at work from our previous web designer who has since left the company and I know nothing about Flash so I am going to try my best to explain my problem.
    The client has requested that some information be added to one of the sections of the web page, which would involve a vertical scroll bar because it cannot all fit in the pre-determined area for text. It will be added under jobs. Under the legal section the text runs outside of the area and has a scroll bar effect applied to it. Can I copy the legal text graphic, delete the old jobs text graphic, place the job text in the copied legal text graphic, then name it like the original text graphic that I deleted, and have it all work without any problems?
    This might be a little bit hard to understand so I included a picture to show you maybe what I am talking about?
    thanks
    garrett

    I am not clear on what you have in front of you, but if the text is a graphic of text, then you should be able to directly replace that graphic with a new graphic in whatever object is holding it.
    If that section only holds text, you may find it is a textarea that has a scrollbar enabled for it, wherein the text is likely defined in the actionscript code (which means you could edit the text... a graphic isn't involved).
    What I recommend is that you make a copy of the file so that an original is retained.  Use the copy to experiment with making changes.

  • Combine Files and OCR

    My team here has various files that we combine to make a report which we will then process and disclose. We need the files to be either OCR'ed or optimized to do our job effectively. Some of those files have renderable text and others do not (they are scans). Is there an option where we can combine while also OCRing the scanned docs?

    We combine our reports and have the user OCR it before they begin their process. We combine both types of files, renderable and non-renderable, and then the user OCRs the mix. The process will skip the pages that don't need OCRing so I'm not sure where you are getting the idea that you can't.

  • Graphics display problems (dialog boxes) Lenovo G50 AMD A6 Win 8.1

    Greetings to all. After having spent a number of days trying to find a solution to a display problem, I turn to this forum for help. I have owned a Lenovo G50 AMD A6 Win 8.1 since November 2014, and have had display problems ever since the first day, be it while using the internet (Firefox or Explorer) or various software (text editors, etc.). Many dialog boxes, including colour settings and controls (open, close, etc.), either have an inappropriate appearance or do not appear at all! I manage to "click" on the right places from memory: most areas are INVISIBLE! Another example is that I do not see any background colour or image on most webpages (such as this one). Some suggestions included checking for graphics updates (but the drivers used were already up to date). I'd be most obliged for any new suggestions. Thank you very much. George

    Greetings to all, I finally decided to express my utmost discontent with:
    - the Lenovo G50-45 I bought before Christmas 2014,
    - the summary response with inappropriate links that I received after exposing my problem,
    - the fact that I tried to download the following graphics driver from here (http://support.lenovo.com/fr/fr/products/laptops-and-netbooks/lenovo-g-series-laptops/g50-45-notebook-lenovo/downloads/DS100174): (Beema) AMD Driver (VGA, HDAudio, SATA) for Windows 8.1 (64-bit), exe, 526 MB, version VGA V14.502.1002.1002-Logo'd_HDAudio v9.0.0.9905_SATA v1.3.1.220;VGA v13.302.1601.1001_HDAudio v9.0.0.9905_SATA v1.3.1.220, dated 3/19/2015, and, after REFUSING to accept cookies and some "Lenovo Service Bridge", finally managed to obtain something HERE (http://www.notebookcheck.net/Lenovo-IdeaPad-G50-45-Notebook-Review-Update.125641.0.html).
    I find it quite disrespectful to NOT provide customers with EASY support, and to avoid answering their requests for further help when some simplistic "assistance" that one can find almost anywhere on the web leads to no further solution. I shall not repeat the problems I have faced till now: vide supra. The problem, however, is NOT just with browsers. I simply cannot see any dialog box content and colour in MANY software programs, such as OPEN OFFICE; for instance, PRESENTATION colour dialog boxes show BLANK squares instead of squares filled with different colours. This has occurred EVER SINCE DAY ONE!
    Finally, a new bug has appeared: Windows 8.1 keeps popping up some email program with which I am invited to register. I never created a Windows email account, and certainly don't intend to do so. NOR shall I accept some "Lenovo Service Bridge", however "discreet and inoffensive" it might be.
    I don't know if there are any legal grounds for customers to complain about all these problems. I DO know, however, that I shall NEVER buy a LENOVO device ever again. The pricing and processor might be competitive, yet the support that ensues is, as far as I'm concerned, LAMENTABLE. To make a presentation, I have to switch to a Samsung R20 using XP..... What is the meaning of all this? George

  • Garbled text after OCR

    I'm scanning old law books on a Minolta PS-7000 using IrfanView software, then bringing them up in Adobe Acrobat 8 Standard as PDF files. I then ran text recognition (OCR); some of the pages are OK, but other pages have text that is all garbled, with symbols and numbers instead of letters. I have an HP Pavilion 732 computer with Windows XP. Can someone please give some direction on this problem? I have a lot of old statute books to scan.
    Thanks,

    It's part of the process. I recommend you check out the help pages
    about OCR in Acrobat, there's more to it than you might imagine (or
    than most people want).
    Aandi Inston

  • How do I combine text and photos on the same page in iPhoto using photobook

    How do I combine text and photos on the same page in iPhoto using photobook?

    You mean while creating a book in iPhoto?  Click on the layout button while viewing a page and select the layout that includes both text and photos.  Most themes will have those options.
    OT

  • How do I scan to text using OCR on the Envy 5660?

    Hello,
    Prior to ordering an HP Envy 5660 printer, I confirmed that OCR text recognition is expressly included in the Printer Specifications for the HP ENVY 5640, 5660, 7640, and Officejet 5740 and 8040 e-All-in-One Printer Series document here.
    As you can see, under Scanning Specifications, which apply to all models listed in the above document’s title, it says: "Scan to text: Integrated OCR software automatically converts scanned text to editable text."
    I have now received and set up the HP Envy 5660 printer that I ordered. It is connected via USB to a MacBook Pro running Mavericks (OS X 10.9.5). After clicking the Download HP Software link on the accompanying CD, I was automatically connected to HP's Product Setup area, from where I obtained the latest driver package for my operating system, "HP-ENVY-5660-series_v12.39.0.dmg." Using the "Custom Install" option, I installed “Essential Software,” “HP Scan,” and “Product Help."
    The print and scan functions on my HP Envy 5660 are working, but regardless of whether I scan a page of text via the printer’s control panel, or the installed “HP Scan” application, or the installed “Image Capture” application, I can find no evidence of integrated OCR software, and no option to convert scanned text to editable text.
    Please tell me where to locate the specified OCR software, and how to enable its operation on the Envy 5660.
    Thank you.

    Greetings, @TeaMasterLing , welcome to the community!
    I read through your post about how you are attempting to use OCR software that was to be included with your printer software installation. I was unable to recreate this situation here on my lab computer to see what you are seeing on your end.
    For that reason, I cannot provide you with a possible solution and would suggest calling in to phone support, as they can log on to your computer if need be to see how the issue could be resolved to have the OCR software working for you.
    Here is HP's contact info:
    If you are calling within North America, the number is 1-800-474-6836 and if you are calling outside of the US/Canada: click here.
    I hope you soon have a solution!
    Have a great day
    R a i n b o w 7000
    I work on behalf of HP

  • Graphic card problem maybe?

    Graphic card problem maybe? Since I changed the hard drive in my MacBook Pro and installed all the new software, my Mac has been very slow and the screen shows crazy pictures. Could it be the graphics card? Has anybody experienced this?

    After a restart it works for a time, but always slowly. I went to the Apple Store this afternoon; they ran a check and said it would be the logic board, which they would have to replace for CHF 600, and it would be ready in 10 - 15 days.

  • How can users who have Acrobat Reader only save scanned pdf files so that the text on them is searchable using ctrl-F?  I just use the recognize text with ocr feature in the full version of Acrobat and this seem to do the trick. Reader doesn't work!

    Our users have scanned PDF files that they want to be able to search using Ctrl-F.  I made them searchable by running Recognize Text Using OCR in Acrobat Professional version 8.  They want to know whether they can make the files searchable with Acrobat Reader only, or whether they need the full Acrobat Professional software to do so.
    Thanks for the help!!
    Ken K. - 2191

    To clarify a bit they need to have Adobe Acrobat, not Adobe Reader. Reader has not been associated with the Acrobat name for 3 or more versions. The process you are asking about is a creation process - the purpose of Acrobat - and NOT a reading feature.
