Acrobat X and OCR Accuracy Issues

Does anyone know if you can assign a custom dictionary to the OCR process?
I have thousands of scanned pages in PDF format that I need to OCR.
When I OCR the same PDFs using another product, I get much better results. Acrobat does not read the same word the same way twice (for example, swapping in "er" where a word actually ends in "or"). It also misses words that other engines pick up.
Does it spell check?
I know it is possible to go back and edit, but I have thousands of pages to do.
You might also ask -- so why not just use the other product? It does not compress the PDFs the way Acrobat can -- and a two step solution is not desirable (OCR for one part, then move to Acrobat to optimize).
thank you for your help.
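One way to put numbers on the difference between two engines is to diff their text output word by word. The following is a minimal sketch, assuming both engines' output has already been exported to plain text; the function names `word_differences` and `is_er_or_swap` are hypothetical helpers, not part of Acrobat or any OCR product.

```python
import difflib


def word_differences(reference: str, candidate: str) -> list[tuple[str, str]]:
    """Return (reference_words, candidate_words) pairs where the two texts disagree."""
    ref_words = reference.split()
    cand_words = candidate.split()
    matcher = difflib.SequenceMatcher(a=ref_words, b=cand_words, autojunk=False)
    pairs = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue
        # Join multi-word spans so insertions and deletions are reported too.
        pairs.append((" ".join(ref_words[i1:i2]), " ".join(cand_words[j1:j2])))
    return pairs


def is_er_or_swap(ref_word: str, cand_word: str) -> bool:
    """True when the words differ only by an "or" ending read as "er" (or vice versa)."""
    endings = {("or", "er"), ("er", "or")}
    return (
        len(ref_word) > 2
        and ref_word[:-2] == cand_word[:-2]
        and (ref_word[-2:], cand_word[-2:]) in endings
    )


if __name__ == "__main__":
    # Hypothetical sample strings standing in for two engines' output.
    other_engine = "the editor saw the motor"
    acrobat = "the editer saw the motor"
    for ref, cand in word_differences(other_engine, acrobat):
        print(ref, "->", cand, "(er/or swap)" if is_er_or_swap(ref, cand) else "")
```

Treating one engine's output as the reference does not make it correct; the diff only shows where the two disagree, which narrows down the pages worth checking by hand.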

When I attempt to OCR a document with the main language as Japanese, the resulting exported text file does not contain the appropriate unicode characters. There are just periods for the Japanese letter forms.  Is there another option that needs to be selected?

Similar Messages

  • Acrobat X: Improved OCR accuracy / capabilities?

    Anyone know if there is improved OCR accuracy / capabilities in Acrobat X?

    I just tried the Acrobat X OCR capabilities and was very disappointed.  Acrobat X OCR is not very accurate.
    Also, Acrobat X doesn't format or select paragraphs properly, because each line of text is given a paragraph break, which interrupts text flow.  That means any paragraph text copied from Acrobat X will require a lot of manual reformatting.
    Any chance Acrobat can use the ABBYY FineReader OCR engine in the future, or as a plugin?  FineReader is virtually perfect in OCR accuracy.  FineReader also handles paragraph text flow the way it should.
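The manual reformatting described above can be partly automated once the text is copied out. This is a rough sketch, assuming a simple heuristic that is not anything Acrobat does itself: a line that ends a sentence and is clearly shorter than the longest line is treated as the end of a paragraph. The function name `rejoin_lines` and the `short_ratio` threshold are illustrative choices.

```python
def rejoin_lines(text: str, short_ratio: float = 0.8) -> str:
    """Merge hard-wrapped lines into paragraphs using a simple length heuristic."""
    lines = [line.strip() for line in text.splitlines()]
    longest = max((len(line) for line in lines), default=0)
    paragraphs = []
    current = []
    for line in lines:
        if not line:
            # A blank line always closes the current paragraph.
            if current:
                paragraphs.append(" ".join(current))
                current = []
            continue
        current.append(line)
        ends_sentence = line.endswith((".", "!", "?"))
        is_short = len(line) < short_ratio * longest
        if ends_sentence and is_short:
            # A short line that ends a sentence is probably the last line of a paragraph.
            paragraphs.append(" ".join(current))
            current = []
    if current:
        paragraphs.append(" ".join(current))
    return "\n\n".join(paragraphs)
```

The heuristic will misjudge headings, lists, and paragraphs whose last line happens to be full width, so the result still needs a quick read-through.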

  • Acrobat X and OCR/CJK Support

    Does Acrobat X support CJK languages when running OCR?  If not, how much is the upgrade to support these languages?  Thanks in advance.

    When I attempt to OCR a document with the main language as Japanese, the resulting exported text file does not contain the appropriate unicode characters. There are just periods for the Japanese letter forms.  Is there another option that needs to be selected?
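To confirm that the exported file really lacks Japanese text, rather than the viewer just failing to display it, the characters can be checked directly. Below is a minimal sketch, assuming the export has been read into a Python string; the helper names and the 50% period threshold are hypothetical, and only the three most common Japanese Unicode blocks are covered.

```python
JAPANESE_RANGES = (
    (0x3040, 0x309F),  # Hiragana
    (0x30A0, 0x30FF),  # Katakana
    (0x4E00, 0x9FFF),  # CJK Unified Ideographs (kanji)
)


def count_japanese(text: str) -> int:
    """Count characters that fall in the common Japanese Unicode blocks."""
    return sum(
        1
        for ch in text
        if any(low <= ord(ch) <= high for low, high in JAPANESE_RANGES)
    )


def looks_like_failed_export(text: str) -> bool:
    """True when the text has no Japanese characters and is mostly periods."""
    visible = [ch for ch in text if not ch.isspace()]
    if not visible:
        return False
    period_share = visible.count(".") / len(visible)
    return count_japanese(text) == 0 and period_share > 0.5
```

If `count_japanese` returns zero for a page that visibly contains Japanese, the characters were lost during recognition or export, not by the text editor used to open the file.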

  • Can I improve PDF OCR accuracy and compression ratio by running it through Acrobat X Std or Pro?

    I scanned a book and stored the PDF file with minimal OCR and low compression using mp ex navigator v1. Can I use Acrobat X Standard or Acrobat X Pro to take that PDF file and improve both the OCR accuracy percentage AND the compression ratio, and then save it in a new PDF file?
    I am running Windows 8 64-bit.  It may or may not matter, but I have Office 2010.
    Message was edited by: computer-girl

    I purchased Acrobat Pro XI last month to try to digitize all my text books (Scanned as TIFF and imported to PDF).  I've been pretty unimpressed in general.  Specifically to your question, it seems that if a page has a photo on it or some complex image, Acrobat won't deskew it.  My version updated just today to 11.0.06 and it still has this problem.  I think it does deskew when "Apply Adaptive Compression" is selected because that option causes each page to be broken into parts (text/image).
    Just in case you haven't tried it: in the Optimize Scanned PDF dialog box, make sure the slider is set to "High Quality" instead of "Small Size".  That may help with the image quality.  In my tests, I ended up leaving it turned up all the way.
    I have finally put together a workflow to deal with the problem.  I use Acrobat to extract each page as a TIFF to a separate directory.  I use a third party program to deskew and crop the images.  I use Acrobat to reassemble them into a PDF.  I use Text Recognition/In this File (Searchable Image Exact) to do OCR.
    I am pleased with the results though Acrobat tries to OCR a lot of images and I just get a bunch of jumbled invisible text on my images.  I don't mind it though cuz it's invisible, but pretty pathetic given the cost of this product.
    I'll mention the third party program I use in a post below because I don't know if I'm allowed to post it or not.  I just don't want this post to be deleted for mentioning another program.
    Hope this helps others.

  • Scanner and OCR don't Work in Acrobat 10.1.7

    Up until yesterday, I was able to scan and OCR just fine.  Today I realized that e-mailed PDFs did not OCR; instead, I got an "Unknown Error" on every non-searchable PDF I was sent.  I uninstalled and reinstalled, restarted, etc., and now not only does the OCR not work, but whenever I scan from my Brother MFC scanner, which worked fine yesterday, the scanning process goes on forever with only black pages showing.  For instance, a one-page BW scan showed over 10 pages before I exited the process.
    Also, e-mailing is much slower.
    Interestingly, my Scansnap works fine for OCR (though I'm sure that isn't using Acrobat) and search of pre-OCR'd docs work fine.  It's just frustrating that suddenly no docs e-mailed or faxed to me electronically can now be OCR'd.
    I've tried printing to acrobat to no avail.  I'd hate to have to print and re-scan just for OCR.
    Here's the system info
    Available Physical Memory: 947356 KB
    Available Virtual Memory: 3689228 KB
    BIOS Version: TOSINV - 1
    Default Browser: C:\Program Files\Internet Explorer\iexplore.exe
        Version: 10.00.9200.16521 (win8_gdr_soc_ie.130216-2100)
        Creation Date: 2013/07/10
        Creation Time: 9:00:54 AM
    Default Mail: Microsoft Office Outlook
        mapi32.dll
        Version: 1.0.2536.0 (win7_rtm.090713-1255)
        Creation Date: 2011/04/05
        Creation Time: 3:34:16 PM
    Graphics Card: NVIDIA GeForce 310M
        Version: 8.7.2.47873
        Check: Not Supported
    Installed Acrobat: C:\Program Files (x86)\Adobe\Acrobat 10.0\Acrobat\Acrobat.exe
        Version: 10.1.7.27
        Creation Date: 2013/05/10
        Creation Time: 3:57:36 AM
    Locale: English (United States)
    Monitor:
        Name: NVIDIA GeForce 310M
        Resolution: 1600 x 900 x 60
        Bits per pixel: 32
    OS Manufacturer: Microsoft Corporation
    OS Name: Microsoft Windows Vista
    OS Version: 6.1.7601  Service Pack 1
    Page File Space: 4194303 KB
    Processor: Intel64 Family 6 Model 37 Stepping 2  GenuineIntel  ~2128  Mhz
    System Name: OWNER-PC
    Temporary Directory: C:\Users\Owner\AppData\Local\Temp\
    Time Zone: Eastern Standard Time
    Total Physical Memory: 4053856 KB
    Total Virtual Memory: 4194176 KB
    User Name: Owner
    Windows Directory: C:\windows

    I'm scanning for malware now.  The "slow email" was not described well; I meant that the "send" command in Acrobat is particularly slow.  My main issues are the sudden lack of OCR capability within Acrobat and the scanner issues.
    I'll see what happens with the malware scan, and in the meantime any help would be much appreciated.

  • Acrobat X (performance) and (probably) security issues

    Hello,
    I'm new with Acrobat X and there are two main problems:
    1) I have lots of large OCR-scanned documents (PDF/A). I(!!) am the owner of the documents! There is no security built in. When I view the security settings (document properties), all actions are allowed, no restrictions. However, when I want to "compress" (optimize) the PDF document, Acrobat X says: not allowed, I should change the security settings. But there is nothing to change ... when I created the documents I set all security features to off, all things allowed (but in Acrobat X the document settings, which show that all actions are allowed, are, on the other hand, not changeable; the fields are locked).  What can I do ??
    2) Performance ! When I save large PDF documents (all such OCR scans of PDF/A type) to another location (hoping that this will "optimize" the file), the saving process (to a local HD on an 8 GB dual-core machine) takes 1/2 hour or more (not the 1/2 minute I expected ...). As I have hundreds of those files, it can take weeks or even months to re-save all these documents with Acrobat X. This must be a malfunction !? Or what else could I do ??
    Thanks.
    kpl1949
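The "weeks or even months" figure can be sanity-checked with simple arithmetic. This is only a back-of-the-envelope sketch: the post says "hundreds" of files at half an hour or more each, so the file count of 300 and the 8-hour working day below are assumptions picked for illustration.

```python
def batch_hours(file_count: int, minutes_per_file: float) -> float:
    """Total processing time in hours for a batch saved one file at a time."""
    return file_count * minutes_per_file / 60


if __name__ == "__main__":
    hours = batch_hours(file_count=300, minutes_per_file=30)  # assumed 300 files
    print(f"{hours:.0f} hours of saving")
    print(f"{hours / 24:.2f} days if run around the clock")
    print(f"{hours / 8:.2f} working days at 8 hours per day")
```

Under those assumptions the batch needs 150 hours, which is several weeks of attended working days, so the estimate in the post is plausible if the per-file time cannot be reduced.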

    Hi, thanks. Very helpful !
    Dave Merchant wrote:
    If you're viewing a PDF/A document in PDF/A View Mode, all editing is disabled. You can turn it off but are advised to do so only when necessary
    OK, but these are MY documents: PDF/A was only an option when scanning, and size is a much more important criterion.
    I hope the "performance" topic of my posting is as easy to solve as the PDF/A issue.
    Thanks again.
    Klaus

  • Acrobat X: Is there an ABBYY FineReader plugin for better OCR accuracy?

    Acrobat X's OCR capability is not very good.  Based on my testing and comparisons, ABBYY FineReader's OCR capability is much better!  Is there a FineReader plugin that can be added to Acrobat X for better OCR accuracy?  If not, can this be a possibility in the future?

    ABBYY FineReader is often better, but sometimes Acrobat OCR is better. On some texts Acrobat recognizes word boundaries where FineReader returns multiple consecutive words as a single very long word.

  • I have paid for Adobe Acrobat XI Pro OCR but it will not recognise a letter in the serial number for me to update my adobe account and any of the programs I have tried using to convert a PDF file on a Mac to a word doc it is converting file with funny sym

    I have been supplied info in an email to copy and paste for a new product purchase "Adobe Acrobat XI Pro OCR" but am unsure how and where to copy and paste to. There is also a link below the serial number. I have tried entering serial number into my Adobe ID but it is not recognising one of the letters in the 24 Digit serial number??? Also I have tried other products previously downloaded to convert a 7 page PDF file on my Mac and convert it to a Word doc but everything I have tried is converting the file to display some text correctly but also displays random symbols and fonts in place of the handwritten info filled in on the form... also getting blank pages included instead of the info??? Would appreciate some help... I am older generation and not always tech savvy, and it is doing my head in haha.

    Hi Jock,
    I've checked your account, and all is well there. Please make sure that you're logging in with the same Adobe ID/password that you used when you signed up.
    Then, clear the browser cache, and try logging in directly to https://cloud.acrobat.com.
    Please let us know how it goes.
    Best,
    Sara

  • Acrobat 9 OCR and "OCR Suspects"

    I downloaded the trial for version 9.
    Took a poorly scanned page and OCR'd it.
    It (expectedly) had a few errors.
    Then I selected "OCR Suspects" from menus.
    What it should have done is found the "low confidence" results, but
    instead, it said no OCR suspects were found.
    This used to work in version 8, but I can't get 'OCR suspects' working in V9 trial.
    Can anyone confirm if this works in the full version of Acrobat 9 Pro or Standard?

    It's strange that while I posted to this Adobe forum, there is a response over at objectmix.com. As contributing to this topic from 2 locations seems confusing, I'll carry on here.
    Amannagpal76 responded, saying in part that ClearScan in 9 Pro replaces Formatted Text & Graphics. Good to know this. ClearScan does, however, continue the mix. If ocr doesn't work on a character graphic, that graphic will continue to be displayed as such, amidst ClearScan's synthesized type 3 font imitation of the original font. This is most obvious when using the marquee zoom tool.
    Aman suggests using the Touchup Text Tool and changing the font to any font installed on one's system. This doesn't work for ClearScan. Selecting a different font in Touchup for a PDF that came via a wordprocessor works fine, but not for a PDF that came via a scan. That, unfortunately, is the only time that ClearScan is used. The error message when I try this states that there's no system font to match the one in ClearScan, and text can't be added or deleted.
    ClearScan is remarkable for the small size file it produces. That size can be reduced considerably even further by converting it to the Adobe 7 file format. ClearScan's synthesized font is also remarkable when enlarging the page on screen. Then you can see its true outlines -- rather chewed up in high magnification, but that's OK. It would be nice to extract the font in question and use it on one's system. One downside to ClearScan is that its ocr fails to retain italics when output to RTF and Word.
    I have never found a suspect in 9 Pro.
    The conclusion from the above is that the hidden text produced by any ocr'ing in 9 Pro can't be corrected.

  • Touch-up tool causes font issue in Acrobat 9 and X

    While I edit text using the Edit Document Text tool (formerly TouchUp Text), the font style gets changed. For example, if I select uppercase text using the Edit Document Text tool, then exit the selection and try to save the document, some characters within the text change to lower case. In Acrobat 7 and 8 the TouchUp Text tool works fine. Please suggest a solution to get rid of this issue in 9 Pro and X Pro.

    Hi LoriAUC,
    I have attached the screenshots for your reference, showing how I selected the text and what problem occurred in the PDF files. This is a PDF file exported from InDesign using the export settings provided by the customer. If I change the subset fonts setting to 100% in those export settings (see the screenshots below), I can get rid of the font issues in Acrobat 9 and X Pro, but I am not supposed to make any changes to those settings.
    If you want to check or analyze the problematic PDF file, I can send it as an attachment to your mail ID. If so, please let me know your mail ID.
    Please let me know if you need any other details or have questions to be clarified regarding this.
    Thanks for your support.

  • Acrobat and Reader Version Issues

    Hi,
    I've created a PDF portfolio made up of several PDFs. When I view the PDF portfolio at the "top level" (where you see names of the individual PDFs on top of a window that looks like it should show the first page of the PDF), all it says in the window is "For the best experience, open this PDF portfolio in Acrobat X or Adobe Reader X, or later." However, I have Acrobat XI and am viewing it in Acrobat XI, so why am I having this issue?
    I can at least open the PDF portfolio and all the PDFs inside. My girlfriend gets the same window when I try sending it to her and can't open the PDF portfolio at all, and she has Acrobat 11.0.3. Why is this happening to her?
    Attached is a screenshot of my issue.
    Thanks.

    Yes, I guess I did. So I guess that's the issue. Thanks!
    Since you seem to know Acrobat and Reader pretty well, I have two other issues I'm dealing with:
    1.) I'd like to create a PDF portfolio of my graphic design work. In the past I've created a PDF portfolio and added other PDFs or JPEGs into it with different pieces of work. However, is it possible to create a portfolio where a multi-page PDF (within the larger portfolio) can be viewed in its entirety (at the top level of the portfolio) without having to open the file? The idea would be that you would see one page of the portfolio with a piece of work, scroll to the right and see another, then when you get to the multi-page PDF, if you kept scrolling right it would just take you through that entire PDF instead of having to open the file. Basically, what I'm trying to avoid is having to create JPEGs of individual spreads of an InDesign file so they can be viewed seamlessly in a PDF portfolio, instead of being able to drop in a multi-page PDF. Let me know if you need clarification on this.
    2.) Finally, when I've tried to re-order files within PDFs, I've occasionally had success renaming the files (i.e., 1.jpg, 2.jpg, 3.jpg, 4.pdf, etc.) in the "display name" column under the Files tab. I once had success doing an "Initial Sort" for display name, but other times it hasn't worked. Is there any way to make re-ordering work consistently?
    Thanks!

  • Acrobat 9 and Canon iR-ADV C2030 issues

    I have a user running 10.7.5 on a 2ghz MacBook Pro who constantly has issues printing PDFs to various printers.  When she prints tabloid PDFs to our copier (Canon iR-ADV C2030), the prints come out fine.  When she prints tabloid PDFs to any of our other printers (Xerox Phaser 7750 & 7800, HP LaserJet 5200) the prints come out the wrong size.  It looks fine in the preview before she prints.  I had reset her printing system, gone into the localhost settings and wiped her preferences.  Only happens with PDFs, but it took a while to properly diagnose the issue (very busy user).
    After much testing, this is what I've come up with:  It ONLY happens with Adobe Acrobat 9, and only after printing to the Canon.  The user can first print a tabloid PDF to our HP and Xerox machines with no issue.  She can then print to the Canon with no ill effects.  But once she prints there, it messes up the Acrobat preferences and she can no longer print tabloid correctly to any other printer but the Canon.  The PDFs come out on letter at tabloid size (i.e., top corner oriented with the rest of the image off the paper).  Odd one.  The problem does not occur with Acrobat X.

    Based on what you have written the ADV C5035 does not have the optional Postscript (PS) kit installed. It has nothing to do with any Mavericks update.
    So you will have to use the UFR2 driver you mention, which is a Canon proprietary printer language and the default printer kit in the Canon iR devices.
    Note that when creating the network print queue you must use the HP Jetdirect-Socket protocol. IPP is not supported by the driver and some users have had issues with the LPD protocol.

  • Acrobat X and Yosemite Issues

    I have been using Acrobat 10 and OS Mavericks to manage my office documents using a Fujitsu scanner. After I upgraded to OS Yosemite, I have been unable to print any of the documents that I have scanned. The image of the doc will show up in the print window, but all I get is a blank page. My print drivers are up to date and I have even tried a different brand of printer with the same results. I had a coworker scan a doc and email it to me (Windows machine) and I can print it with no problems. I can also scan an image as a JPEG and print. Troubleshooting with Apple did not solve the issue. Does anyone have any ideas?
    Thank you in advance

    1) Zooming
    The zooming with the swipe gesture was always problematic, but it is now only happening in the Audio Files Editor and to some extent in the Score Editor. All the other windows work fine on my machine.
    2) Resize Tool
    Works fine without a problem in the Main Window and the Editor windows. Of course, you can resize in the Audio Files Editor.
    Hope that helps
    Edgar Rothermich
    http://DingDingMusic.com/Manuals/
    'I may receive some form of compensation, financial or otherwise, from my recommendation or link.'

  • Issues betw. PDFs created in Acrobat 7 and reviewed in Reader 10

    Due to constraints with our authoring system, we are limited to using Acrobat 7 to create PDFs and prepare them for review. Recently, our reviewers have not been able to add their comments into these PDFs. We don't know if this is a disconnect between the versions of Acrobat Pro and Reader or if there are other issues that we're not aware of.
    Background:
    Working OS = Win XP SP3
    Reviewer OS = ? (various)
    To prepare the PDF, we select Comments > Enable for Commenting and Analysis in Adobe Reader... and then save this version of the file over the original PDF.
    Any thoughts out there?
    Message was edited by: lindrog007

    Can you elaborate on the exact problem you are facing here? What error do the reviewers get on adding comments? If possible, can you attach one such file created from A7?

  • Hello, I have the usual issue to use AcroExch.PDDoc/AcroExch.App/AcroExch.AVDoc objects in Visual Basic (MS EXCEL). On my company's machine I have Acrobat X and everything works fine, but I have another machine with just "Reader" installed - is there als

    Hello, I have the usual issue with using AcroExch.PDDoc/AcroExch.App/AcroExch.AVDoc objects in Visual Basic (MS Excel). On my company's machine I have Acrobat X and everything works fine. I am looking for the cheapest solution to get this running on another machine... is there any way to do that with the Acrobat SDK, or do I need to purchase a full version of Acrobat XI ($$) ?

    The Acrobat SDK is nothing by itself. It is just information on how to automate Acrobat - just as the Office SDK doesn't include Office, but is for people who already have purchased Office but want to automate it.
    These automation things are MARKETING TOOLS FOR ACROBAT. Consider this and the technical limitations make a lot more sense.
    So, yes, you need to buy Acrobat. Standard is cheaper than Pro.
