Converting PDFs to Text

I have a huge collection of documents I want to digitize. I just bought an Epson scanner, so I can scan the documents in a variety of formats, including .jpg, .tiff and .pdf. Unfortunately, I can't get the OCR software that came with the scanner (ABBYY FineReader Sprint) to work.
Then I remembered seeing PDF converters online, so I figured I could just scan everything as PDFs and then convert them to text. But I'm confused. I tried Adobe Acrobat's export function, but that didn't do anything. I read that you can open a PDF in Preview and copy the text, but that doesn't work either.
It sounds like there are two kinds of PDF. OCR software can create a PDF that contains real text, whereas what I apparently have is a scanned-image PDF, if I understand correctly.
Anyway, can anyone recommend a software program or online service that will convert PDFs to text on a Mac? I'm also interested in learning how to batch-process PDFs, since I'm going to have hundreds of documents, maybe a few thousand.
Thanks for any tips.

David Blomstrom wrote:
 ...Can anyone recommend a software program or online service that will convert PDF's to text on a Mac? I'm also interested in learning how to batch process PDF's. I'm going to have hundreds of documents, maybe a few thousand. 
Thanks for any tips.
Since scanning combined with OCR will be the easiest approach, getting the OCR software to work seems like the best solution. Which version of FineReader Sprint do you have? There's a version in the App Store (https://itunes.apple.com/app/abbyy-finereader-express/id412310371?mt=12) that is supposed to be compatible with Lion and Mountain Lion; if that's not the version you have, perhaps you can upgrade to it. Check out http://www.abbyy.com/checkforupdates/?PartNumber=71817&product=FineReader%20Express%20Edition%20for%20Mac, which might do the trick.
Unless the scanning process has the OCR step built in, what you'll get is an image, usually a JPG, which can be turned into a PDF file, but it's still just a picture. Once you have a PDF that contains actual text, you can extract that text. A number of programs are supposed to be able to do that; the only one I've tried that works pretty well is MS Word in the Office 2013 suite for Windows. I have it running in a Windows 8 virtual machine on the Mac, but that's a long and expensive way around to do what you need.
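For the batch-processing part of the question, one possible route on a Mac is to chain two free command-line tools: OCRmyPDF, which adds a Tesseract OCR text layer to scanned PDFs, and `pdftotext` from Poppler, which extracts that text. Neither tool is mentioned in this thread, so treat them as assumptions; both are commonly installed with Homebrew (`brew install ocrmypdf poppler`). The minimal Python sketch below plans one OCR command and one extraction command per PDF in a folder, and `run_jobs` executes them. The folder layout and the `_ocr.pdf` naming scheme are hypothetical.

```python
import subprocess
from pathlib import Path


def plan_jobs(src_dir: Path, out_dir: Path) -> list[list[list[str]]]:
    """For each PDF directly inside src_dir, return two commands:
    1) ocrmypdf adds an invisible OCR text layer (--skip-text leaves
       pages that already contain text untouched);
    2) pdftotext extracts the text layer into a .txt file."""
    jobs = []
    for pdf in sorted(src_dir.glob("*.pdf")):
        ocr_pdf = out_dir / f"{pdf.stem}_ocr.pdf"  # hypothetical naming scheme
        txt = out_dir / f"{pdf.stem}.txt"
        jobs.append([
            ["ocrmypdf", "--skip-text", str(pdf), str(ocr_pdf)],
            ["pdftotext", "-layout", str(ocr_pdf), str(txt)],
        ])
    return jobs


def run_jobs(src_dir: Path, out_dir: Path) -> None:
    """Run the planned commands; check=True raises on the first failure."""
    out_dir.mkdir(parents=True, exist_ok=True)
    for ocr_cmd, extract_cmd in plan_jobs(src_dir, out_dir):
        subprocess.run(ocr_cmd, check=True)
        subprocess.run(extract_cmd, check=True)
```

For example, `run_jobs(Path("~/Scans").expanduser(), Path("~/Scans/text").expanduser())` would write the OCRed PDFs and .txt files into a `text` subfolder. Because `glob("*.pdf")` is not recursive, those outputs are not picked up again as inputs on the next run.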

Similar Messages

  • Trouble converting PDF to Rich Text

    I must get this fixed ASAP.
    I have updated and even purchased Adobe's service for converting PDF to Rich Text Format,
    but it won't convert; I keep getting an error message. It doesn't matter whether I use the cloud or convert directly on my computer.
    This has to be fixed for online class work!

    Hi mozzott,
    We'll get this sorted out. Can you please let me know the exact error message that you're receiving?
    It could be that you're running into trouble because your order is still processing. (I do see that the order is pending in the order management system.)
    Since this sounds time-critical, I would be happy to convert the file for you. I'll send you a private message with my email address. If you'd like, send the file to that email address and I'll take care of it for you.
    Best,
    Sara

  • Converting PDF CLOBS to text or HTML

    I would like to run through all the PDFs (stored as CLOBs) in a database table and copy them to a text or HTML CLOB. Doing this beforehand should allow me to rapidly index and snippet-ify these fields during queries.
    How exactly can I use the built-in facilities in Oracle Text to do this?
    Roger Ford has had some great input on my snippet performance problems and had this to say:
    "The key is to pre-convert before indexing. You can do that with a PL/SQL procedure that uses ctx_doc.policy_filter or ctx_doc.ifilter."
    The Reference Manual, page B-2, has this to say:
    "This technology [AUTO_FILTER] also enables you to convert documents to HTML for document presentation with the CTX_DOC package."

    I apologize for posting prematurely....
    I should be able to use CTX_DOC.FILTER as Roger suggested.
    I think I can just loop through every PDF in the table and dump each converted PDF to the result table. I will set the query id to the key from the PDF table thus allowing me to get at the metadata.
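    The loop described above (read each PDF row, convert it, and store the result keyed by the source row's primary key) can be sketched in Python. To stay self-contained, this sketch uses the standard library's `sqlite3` as a stand-in database and a hypothetical `convert` callable in place of Oracle Text's filtering; the table and column names (`pdf_docs`, `pdf_text`, `id`, `body`, `text`) are assumptions. In Oracle itself, the same loop would be a PL/SQL procedure calling CTX_DOC.POLICY_FILTER or CTX_DOC.IFILTER, as Roger suggested.

```python
import sqlite3
from typing import Callable


def preconvert(conn: sqlite3.Connection, convert: Callable[[bytes], str]) -> int:
    """Convert every document in pdf_docs and upsert the text into pdf_text,
    keyed by the source row's id so the metadata can be joined back later.
    Returns the number of rows converted."""
    # fetchall() reads all rows first, so writes below don't disturb the cursor.
    rows = conn.execute("SELECT id, body FROM pdf_docs").fetchall()
    for doc_id, body in rows:
        text = convert(body)  # hypothetical stand-in for the CTX_DOC filter call
        conn.execute(
            "INSERT OR REPLACE INTO pdf_text (id, text) VALUES (?, ?)",
            (doc_id, text),
        )
    conn.commit()
    return len(rows)
```

Using `INSERT OR REPLACE` against a primary key on `pdf_text.id` makes the job safe to re-run: a second pass overwrites rows instead of duplicating them.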

  • Convert PDF to a text-searchable Word document

    I need to convert a PDF file into a Word document that is text-searchable. How do I do that, and how long does it take?

    Hi Lynne,
    You can use your ExportPDF subscription to convert your document. Try using this 'getting started' guide to assist you!
    Let me know if that helps.
    Looking forward to hearing back.
    Kind regards, Stacy

  • Problem converting PDF to text

    I have the latest version of Adobe on Windows XP, and I am converting PDF files to text files. The first couple of files complete fine and download really fast, like 300 pages in less than 30 seconds. Then it starts to lock up, and it takes forever to download a document that is 30 pages. Does anyone know what is going on?
    I have tried this on both of my computers and the same thing happens. These are ebooks in PDF that I am converting to text files because the font is too small to read on the Sony 505 as a PDF, even enlarged.
    Any help would be appreciated.
    Thanks!

    >I am converting to text files because the font is too small to be viewed on the sony 505 using pdf even enlarged
    That doesn't make sense, especially since you can easily enlarge up to 6,400%. A single character could fill the screen.
    Also, what do you mean by downloading? If you are converting a PDF to text, where does downloading anything come into play?
    And, Adobe what? You are talking about Reader, correct? Adobe is a company with many applications.

  • I need to convert a PDF file to a Word document so it can be edited, but the Recognize Text options do not include the language I need. How can I convert the file in the language I need?

    I need to convert a PDF file to a Word document so it can be edited, but the Recognize Text options do not include the language I need. How can I convert the file in the language I need?

    Acrobat provides no language translation capability.
    If you localize the OS, MS Office applications, Acrobat, etc., to the desired language, try again.
    Alternative: transfer a copy of the content into a web-based translation service (Bing and Google provide free ones).
    Transfer the output into a word processing program that is localized to the appropriate language.
    Do cleanup.
    Be well...
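    The original question is about the OCR recognition language rather than translation. If Acrobat's Recognize Text list lacks the language, one alternative (not mentioned in this thread, so treat it as an assumption) is OCRmyPDF, whose `-l` option takes Tesseract language codes such as `rus`, `ara` or `deu` and combines several with `+`. The matching Tesseract language data must be installed. This minimal sketch only builds the command line; the file names in the usage note are hypothetical.

```python
def ocr_command(src: str, dst: str, languages: list[str]) -> list[str]:
    """Build an ocrmypdf command line that recognizes text in the given
    Tesseract languages; ["eng", "rus"] becomes "-l eng+rus"."""
    if not languages:
        raise ValueError("at least one OCR language code is required")
    return ["ocrmypdf", "-l", "+".join(languages), src, dst]
```

Running it with `subprocess.run(ocr_command("scan.pdf", "scan_ocr.pdf", ["eng", "rus"]), check=True)` should produce a searchable PDF whose text layer can then be copied or exported to a word processor.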

  • I am converting a .docx file to a pdf and the text is coming out blurry

    I am trying to convert a .docx file to a PDF, and the text keeps coming out blurry. Some sentences seem to be bolded in the PDF as well. All the colored text seems to have a shadow behind it, and all the bold text seems extra blurry. I am saving the file as a PDF. I tried to print the file to PDF, but it kept crashing. I have Adobe Acrobat Pro and Distiller, but I'm not savvy about which program does what.
    Thanks for any help!

    If you are trying to convert a Word file, ensure the text is in 100% black only.
    There are hundreds of different combinations of settings for creating PDFs, but only a few will give you a good 'print ready' file, which is what I think you may be after.
    You may have font issues that are stopping the text from looking sharp.
    If you have Distiller, you can try printing to a .ps file (save as PostScript), then open Distiller and choose one of the high-quality settings in the pop-down menu.
    If you are working on a Mac, you can drag and drop the .ps file directly onto the Distiller dock icon; if you are running Windows, you will need to navigate to the file via the menu bar (you will need to know where you saved your .ps file).
    There may be other issues with the original file format that are causing problems with your PDF creation.
    Let me know how you get on.
    Cheers

  • Converting PDF to Word changes the text to symbols. How to correct?

    Hi. I'm using Acrobat XI Pro, and converting PDFs to Word docs changes the main text to symbols. I've tried this several times and always get the same result. Any idea how to correct this? Thanks.

    Does the same thing happen if you select some text in the document, copy it and then paste it into Word?
    If so, then it's most likely a faulty font encoding.
    On Tue, Jan 27, 2015 at 12:12 PM, noodles83

  • How to avoid converting pdf documents to searchable text repeatedly

    Can someone help me resolve this Acrobat 8 or 9 problem?
    I select a folder and start converting all the PDF documents in it to searchable text (OCR).
    It works great. But the next time I run the process, some documents are converted again. Is there a way to tell Acrobat to skip PDF documents that have already been OCRed or converted to searchable text?

    Hi,
    Simplest -> OCR the PDF(s) prior to deployment to the production zone.
    "Good" OCR can come from Acrobat or other applications (done ok with AdLib Server & Adobe's Capture Cluster - I'm sure there are others).
    Typically, an OCR engine only knows how to do one thing - OCR what it is pointed to.
    Perhaps a dedicated server application would provide more refined control - but that's not Acrobat's OCR engine.
    Maybe possible -
    Acrobat 8 & 9 Pro each have a Preflight custom check that can identify whether text rendering mode 3 (invisible text), which is Acrobat's OCR output, is present.
    Create an appropriate Preflight Profile.
    Create a Batch Sequence to run the Preflight Profile.
    Configure the Preflight: Batch Sequence Setup to create a report (for "On success" or "On error" or both).
    Designate where the report PDF is to go.
    Extract/filter the report's content to obtain a data set of what files have no text rendering mode 3 present.
    Feed this data to a routine that directs/controls OCR.
    "Routine" may be possible via something with Acrobat JavaScript. Don't know.
    Maybe via an Acrobat plug-in.
    Maybe to control some other OCR application(?).
    (Something for a real 'gritty' codehead <g>).
    Myself, I try assiduously to avoid post-processing my "deployed" PDFs, particularly when OCR is involved.
    At the workplace, for ongoing PDF "streams" to be OCRd, my co-workers and I use AdLib Server.
    So, the PDFs are OCRd prior to deployment.
    Be well...
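    Outside Acrobat, the same "is text rendering mode 3 present?" test can be approximated at the byte level. This is a heuristic sketch under stated assumptions, not a PDF parser: it inflates Flate-compressed streams with `zlib`, looks for the `3 Tr` operator that switches text to invisible rendering, and misses streams using other filters (e.g. LZW or ASCII85). It can also be fooled by `3 Tr` inside a string literal, and other OCR tools that use mode 3 will count as "already done", which is usually what you want. `needs_ocr` is a hypothetical helper that returns the files to feed to the OCR routine.

```python
import re
import zlib
from pathlib import Path

STREAM_RE = re.compile(rb"stream\r?\n(.*?)endstream", re.DOTALL)
# "3 Tr" sets text rendering mode 3 (invisible), the mode OCR text layers use.
# The lookbehind rejects operands such as "13 Tr" or "0.3 Tr".
INVISIBLE_TEXT_RE = re.compile(rb"(?<![\d.])3\s+Tr\b")


def stream_contents(pdf_bytes: bytes):
    """Yield each stream body, inflated when it is zlib/Flate data."""
    for match in STREAM_RE.finditer(pdf_bytes):
        raw = match.group(1)
        try:
            # decompressobj tolerates the trailing newline before "endstream"
            yield zlib.decompressobj().decompress(raw)
        except zlib.error:
            yield raw  # uncompressed, or a filter this sketch doesn't handle


def has_invisible_text(pdf_bytes: bytes) -> bool:
    return any(INVISIBLE_TEXT_RE.search(s) for s in stream_contents(pdf_bytes))


def needs_ocr(folder: Path) -> list[Path]:
    """PDFs in folder with no invisible-text marker, i.e. not yet OCRed."""
    return [p for p in sorted(folder.glob("*.pdf"))
            if not has_invisible_text(p.read_bytes())]
```

Running `needs_ocr` before each batch and passing only its result to the OCR step avoids re-converting files that were handled on an earlier run.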

  • Convert PDF to text in Adobe Acrobat Pro X

    I would like to convert a PDF document into text.  Can someone advise?

    If the PDF was originally scanned as a picture (JPG), there will be no "renderable" text in it to save. You'll need to run OCR on it in order to save it as text. Reader CANNOT do OCR; Acrobat Pro can, under Tools.
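    To check whether a PDF has renderable text before deciding to OCR it, one option outside Acrobat (an assumption, not something from this thread) is Poppler's `pdftotext`, which writes to standard output when the output name is `-` and ends each page with a form feed. The sketch below flags pages that yield almost no text; the 20-character threshold is an arbitrary, hypothetical cut-off you would tune for your documents.

```python
import subprocess


def extract_text(pdf_path: str) -> str:
    """Run pdftotext and capture the extracted text from stdout ("-")."""
    result = subprocess.run(["pdftotext", pdf_path, "-"],
                            capture_output=True, text=True, check=True)
    return result.stdout


def pages_needing_ocr(text: str, min_chars: int = 20) -> list[int]:
    """Return 0-based indices of pages with fewer than min_chars
    non-whitespace characters, i.e. pages that are probably just images."""
    pages = text.split("\f")
    if text.endswith("\f"):
        pages = pages[:-1]  # drop the empty piece after the final page break
    return [i for i, page in enumerate(pages)
            if len("".join(page.split())) < min_chars]
```

For example, `pages_needing_ocr(extract_text("scan.pdf"))` returning every page index suggests a picture-only PDF, while an empty list means the text can be saved directly.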

  • Convert PDF to Word - Tables are in Text Boxes using Acrobat 11. How can this be avoided?

    Hello All,
    I am using Acrobat 11 to convert PDFs to MS Word .doc or .docx. The tables are converted correctly, but they are inside text boxes. My documents have hundreds of tables and I can't remove the text boxes manually.
    Is there a way to remove the Text Box that contains each table?
    Regards Paul

    Hello Anubha,
    My apologies. Hopefully you can see that the table is inside a text box. Acrobat 7 doesn't behave like this, nor do some of the other PDF conversion products.
    My output has 700 pages with an average of 3 tables per page, so a manual workaround to get the tables out of the text boxes wouldn't be practicable.
    Can you confirm that this is not the expected behaviour? I thought I recently saw a video showing this behaviour as standard for Acrobat 11.
    Regards Paul

  • I purchased a very expensive package to convert files to PDF form. After conversion, some of the text needs to be adjusted; how do I fix this? When I e-mailed the file to myself, I was unable to read it?

    Question: my name is Cecelia. I purchased your product, and so far I am very unsatisfied with it. I need to fix or add to a converted PDF, and I am unable to do so. I e-mailed this form to myself and it says no file is available. Is this what others will see too?

    First, when you e-mail a PDF, you have to be sure the e-mail package you use encodes the PDF as a binary file; not all do so automatically. If the PDF is sent as an ASCII (text) file, it will be corrupted. That is likely what happened. The alternative is to zip the PDF and send the zip file.
    Fix-ups are generally done in the application you used. The PDF should be a duplicate of what you see in your app. If you are using Word, be sure the selected printer in Word is the Adobe PDF printer while you edit. Word and many other word processors reflow documents based on the attached printer. It is not clear what your problem is, but that might be the issue.
    As for forms, are you actually creating an electronic form, or is this something that looks like a form for someone to print and fill in? If you want an electronic form, then you need to use the Form Tools in the Tools menu. You can let Acrobat try to guess the form fields that are needed, then go back and edit the form fields. Of course, you may not have meant a form as such, but rather that the layout of your document is messed up. If it is the layout, then attaching the Adobe PDF printer while editing your document should solve that issue. You should also consider using the Press or Print job settings to embed all fonts.
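    To illustrate what "encoding the PDF as a binary file" means in practice, here is a minimal sketch using Python's standard `email` package, which base64-encodes binary attachments so they survive text-only mail paths; the zip helper shows the reply's alternative. The addresses and file name in the usage note are hypothetical, and this is not how any particular mail client is implemented.

```python
import io
import zipfile
from email.message import EmailMessage


def pdf_email(pdf_bytes: bytes, filename: str, sender: str, to: str) -> EmailMessage:
    """Attach the PDF as application/pdf; for binary data the email package
    uses base64 transfer encoding, so no bytes are lost in transit."""
    msg = EmailMessage()
    msg["Subject"] = f"PDF: {filename}"
    msg["From"] = sender
    msg["To"] = to
    msg.set_content("PDF attached.")
    msg.add_attachment(pdf_bytes, maintype="application", subtype="pdf",
                       filename=filename)
    return msg


def zip_pdf(pdf_bytes: bytes, filename: str) -> bytes:
    """The reply's alternative: wrap the PDF in a zip archive before sending."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        zf.writestr(filename, pdf_bytes)
    return buf.getvalue()
```

For example, `pdf_email(open("form.pdf", "rb").read(), "form.pdf", "me@example.com", "me@example.com")` builds a message whose attachment decodes back to exactly the original bytes.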

  • I purchased PDF Converter Plus, v1.0 (4 ). I am unable to convert PDF into text. I get nothing but gibberish when I try to do the conversion.

    I purchased PDF Converter Plus, v1.0 (4 ). I am unable to convert PDF into text. I get nothing but gibberish when I try to do the conversion.

    I went to the site and did exactly as 'support' said. I tried converting three PDF documents to Word. On clicking 'convert', the last window offers the file with a .doc suffix. After I save and open it, the window says, "The XML file cannot be opened because there are problems with the contents." Under "Details", it says, "Incorrect document syntax".
    Please guide me further.
    Thanks
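    Word's "XML file cannot be opened" message suggests the saved file is an XML-based .docx package (a zip archive) despite its .doc suffix, and that something inside it is damaged. That reading is an assumption, and the sketch below is a hypothetical diagnostic, not a repair tool: it checks whether the output is a legacy binary .doc, a zip package with `word/document.xml`, and whether that XML is well-formed.

```python
import zipfile
import xml.etree.ElementTree as ET

OLE_MAGIC = b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1"  # legacy binary .doc signature


def diagnose_word_file(path: str) -> str:
    """Classify a converter's Word output:
    'legacy-doc'           binary Word 97-2003 file
    'not-a-package'        neither a binary .doc nor a zip (.docx) package
    'missing-document-xml' a zip archive without word/document.xml
    'malformed-xml'        word/document.xml does not parse
    'ok'                   structurally sound .docx, whatever its suffix"""
    with open(path, "rb") as f:
        if f.read(len(OLE_MAGIC)) == OLE_MAGIC:
            return "legacy-doc"
    if not zipfile.is_zipfile(path):
        return "not-a-package"
    with zipfile.ZipFile(path) as zf:
        if "word/document.xml" not in zf.namelist():
            return "missing-document-xml"
        try:
            ET.fromstring(zf.read("word/document.xml"))
        except ET.ParseError:
            return "malformed-xml"
    return "ok"
```

A result of `malformed-xml` or `not-a-package` points at the converter's output rather than at Word; `ok` suggests renaming the file to .docx before opening it.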

  • I had paid for a service to sign PDFs, write text on PDFs, convert to Word and back, etc. It's now asking me to pay for a new service to be able to convert PDF to Word. What is going on here?

    I had paid for a service to sign PDFs, write text on PDFs, convert to Word and back, etc. It's now asking me to pay for a new service to be able to convert PDF to Word. What is going on here?

    Hi,
    I checked your account; your Export PDF service expired on Sep 24, 2014.
    Kindly contact our chat support: http://helpx.adobe.com/x-productkb/global/service-b.html
    Regards,
    Florence

  • PDF with Arabic text cannot be converted to Word format; text errors appear.

    Dear Sir,
    A PDF file with Arabic text cannot be converted to Word format; errors appear in the text. Once the Arabic text is converted to Word, language errors appear.

    Hi,
    I am moving your post from the PDF Pack (CreatePDF) forum to the Acrobat forum.
    Hisami

Maybe you are looking for

  • Windows XP Professional on K7T266 Pro 2

    I just installed Windows XP Professional on my computer. Previously I was dual-booting Windows 98 and Windows 2000 Professional, but I decided to use just one system. The problem is, I can't use Windows Update. I get to the point where it says

  • Read an Excel cell when Excel is already open

    Hello, I want to read a specific cell of a sheet with Excel already open. LabVIEW only has to read the cell, without having to open Excel. I have several examples that show: LabVIEW opens Excel, selects the Excel file, selects the workbook, selects the sheet, selects the cell

  • Download a file using FTP in a bash script

    Hi! I'm running a bash script on Solaris, and within the script I want to download a file using FTP. How can I do it? Thanks a lot

  • How can I sync the music yet keep the music already on my iPhone 4s?

    I have purchased music on iTunes on my iPhone 4s; however, I want to transfer some music from my library onto my iPhone. However, I'm warned that if I 'Sync the music' all the content on my iPhone will be erased, and the music I have purchased off iTune

  • Semicolon and / in SQL Plus scripts?

    Anyone? I seem to have some confusion over the use of / and ; inside PL/SQL scripts run in SQL Plus. I seem to get two commits, and thereby two rows, on an INSERT clause that has both a ; and a /. i.e. /* Insert record into table for recording statistics on the runti