OCR and Reducing file size

I have a large document (a book) that I am trying to scan chapter by chapter. The book was printed in grayscale, so it is not a pure black-and-white document. I would like to optimize the file size, but I have a few questions about that.
Currently running:
Windows 7
Acrobat Pro X
Epson GT-S80 High-speed scanner
1. What is a good typical workflow? I have tried scanning to PDF with the scanner's software and then opening the files in Acrobat to OCR them. I have also tried Acrobat's Scan feature with OCR as one of the steps in the scanning process. I have let both programs do their own color mode detection, where they mix black and white and grayscale to reduce the file size, but I have usually told them to stick with grayscale because that gives me the cleanest, clearest document. Does anyone have recommendations for getting good image quality from a mix of black and white and grayscale, or should I keep using grayscale only?
2. I think I am having some trouble with the file size. I have a 12-page document that I believe was either scanned at 300 dpi, or scanned at full resolution with ClearScan and then downsampled to 300 dpi. I don't remember exactly, but the file is about 2.20 MB, which works out to roughly 185 KB per page. I would think there is a way to get a smaller file.
3. For text recognition, this document is not ideal: it is a collection of PowerPoint slide sheets (2-3 slides per page), and in some cases the text sits on top of an image, which makes it very hard to discern.
4. I was under the impression that once a document has been scanned and OCR has been run on it (with Searchable Image chosen), the OCR text sits in a separate layer, so you basically have a scanned image with a layer of searchable text. Because the OCR'd text is "there somewhere", is it possible to remove the scanned image and keep just the raw recognized text, as if I had created the document in Word and made a PDF from it?
5. Back to number 1: suppose I am stuck keeping the scanned image and just running OCR. What is the best way to reduce the PDF's file size? I read that scanning at 600 dpi may help text recognition. The same article suggested scanning at the higher resolution and using ClearScan because it would a) recognize the text better and b) convert the text images to actual text, reducing the file size. After that, should I just run PDF Optimizer to downsample the images to a certain dpi to shrink the file further?
Hopefully you all can understand what I am saying and help fill in some gaps.
Thanks,
Ian
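Ian's questions 2 and 5 can be checked with a little arithmetic. The sketch below assumes US Letter pages, 8-bit grayscale, and 1 MB = 1024 x 1024 bytes (none of which the post states); it shows the average size per page of the 12-page sample and how much raw image data a 600 dpi scan carries compared with 300 dpi.

```python
# Rough size arithmetic for Ian's questions 2 and 5.
# Assumptions (not stated in the post): US Letter pages, 8-bit grayscale,
# and 1 MB = 1024 * 1024 bytes.

MB = 1024 * 1024


def per_page_kb(total_mb: float, pages: int) -> float:
    """Average compressed size per page, in KB (1 KB = 1024 bytes)."""
    return total_mb * MB / pages / 1024


def raw_page_bytes(width_in: float, height_in: float, dpi: int, bits_per_pixel: int) -> int:
    """Uncompressed size of one scanned page, before any PDF compression."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    return width_px * height_px * bits_per_pixel // 8


if __name__ == "__main__":
    # The 12-page sample file from question 2
    print(f"12-page sample: {per_page_kb(2.20, 12):.0f} KB per page")

    # Question 5: what a 600 dpi scan adds over 300 dpi
    for dpi in (300, 600):
        raw = raw_page_bytes(8.5, 11, dpi, 8)
        print(f"{dpi} dpi grayscale: {raw / MB:.1f} MB per page uncompressed")
```

Under these assumptions the sample averages about 188 KB per page, already heavily compressed from roughly 8 MB of raw grayscale pixels per page. Doubling the resolution quadruples the pixels (about 32 MB raw), so a 600 dpi scan only makes sense for size if that extra data is discarded afterwards, for example by downsampling in PDF Optimizer.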

Let us know if this tutorial helps with your workflow: "Acrobat X: Taking the guesswork out of scanning to PDF."

Similar Messages

  • PDF Reduce File Size makes certain PDFs HUGE

    Generally Reduce File Size works great for me, but certain files, particularly those from someone using Microsoft Publisher, result in larger PDFs. She sends me newsletters for a website, and sometimes the 3-page PDFs are 2-4 MB. I tried opening them in Preview and re-saving them with the Reduce File Size filter, and they balloon to 50-100 MB! The file info shows that the PDF producer is GPL Ghostscript 8.15 and the content creator is PScript5.dll Version 5.2.2. I can open them and reduce the file size in Acrobat, so I'm not sure why Preview is choking on them.

    geekinthegarden wrote:
    Generally reduce file size works great...
    Microsoft Publisher...Ghostscript...PScript5.dll
    I seem to be able to open and reduce file size using Acrobat so I'm not quite sure why Preview is choking on them.
    Hi geekinthegarden, this does not surprise me. There are many variations and revisions of the PDF specification.
    Going from the Ghostscript interpreter to Apple's renderer can introduce complications that we will not be able to sort out in this forum. A PDF encapsulates a complete description of a fixed-layout 2D document, including the text, fonts, images, and 2D vector graphics that compose it.
    Suffice it to say it is not a perfect science.
    The Print dialog's PDF menu has a Compress PDF option, and ColorSync Utility in your /Utilities folder has a Reduce File Size filter for PDFs. These may or may not be of use.
    You found a workaround using Acrobat; this is part of the value of having more than one PDF viewer. Good computing!

  • Reduce file size makes documents much larger.

    In my new Acrobat 10.1.6, "Reduce File Size" and "Optimize" keep making the document much, much larger. Reduce File Size even took a document that was 3.3 MB and turned it into 24 MB. So I took the same document back to my previous Acrobat 5, and it reduced the file size as expected. What's going on?

  • How to reduce file size when using batch processing?

    I use File > Process Multiple Files to batch process photos to a smaller file size and add my watermark. I've played with many different settings, and no matter what I choose, I can't get my average file size below about 200 KB. However, when I export the same photos from iPhoto, I can get the file size to about half of that with no difference in quality (to my naked eye, at least).
    I definitely want to keep the height at 768 pixels so that needs to stay constant.
    My current batch-processing settings (average file size = 200 KB):
    Resize images with a height constraint of 768 pixels at 150 dpi
    What I've tried:
    Resize at 72 dpi (reduced file size by about 5 KB)
    Convert file to JPEG, low quality (reduced file size by about 10 KB)
    Convert file to JPEG, medium quality (not much difference in file size)
    I'm using PSE 10 on a Mac running Lion.
    Thank you in advance for your help!
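The settings listed above behave as they do because a JPEG's size is driven by its pixel count and quality setting, while the dpi value is only a tag saying how large to print. A minimal sketch, assuming the 768-pixel height constraint is in pixels as the post says, and a hypothetical 3:2 source photo of 4608 x 3072 pixels:

```python
# Why "Resize at 72dpi" saved only ~5 KB: with the height fixed in pixels,
# the dpi value does not change how many pixels the JPEG has to encode.
# Assumption: a 3:2 source photo of 4608 x 3072 px (not stated in the post).


def fit_to_height(orig_w: int, orig_h: int, target_h: int) -> tuple[int, int]:
    """Scale to a fixed pixel height while keeping the aspect ratio."""
    return round(orig_w * target_h / orig_h), target_h


def print_size_inches(width_px: int, height_px: int, dpi: int) -> tuple[float, float]:
    """Physical size implied by the dpi tag; the pixel data itself is unchanged."""
    return width_px / dpi, height_px / dpi


if __name__ == "__main__":
    w, h = fit_to_height(4608, 3072, 768)
    for dpi in (150, 72):
        inch_w, inch_h = print_size_inches(w, h, dpi)
        print(f"{dpi} dpi: {w} x {h} px ({w * h:,} pixels), "
              f"prints at {inch_w:.2f} x {inch_h:.2f} in")
```

Both dpi settings produce the same 1152 x 768 pixel grid, so with the height fixed the remaining levers are the JPEG quality setting and any metadata saved alongside the pixels; the post doesn't say which of these iPhoto handles differently.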

    You should go with the default Optimizer settings.
    One difference between the default Optimizer settings and Reduce File Size is that Optimizer does not guarantee a reduction in file size (if your Optimizer settings lead to an increase in file size, that's what you will get).
    In Acrobat 9, the default Optimizer setting includes an additional option that skips any image optimization that would increase the file size. In that sense it will, in most cases, give a smaller file.
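The "skip optimizations that would grow the file" behavior described above amounts to a size check per stream. The sketch below uses Python's zlib as a stand-in for image re-encoding; the function names are hypothetical, and this illustrates the idea rather than how Acrobat implements it.

```python
import os
import zlib
from typing import Callable


def recompress_if_smaller(original: bytes, recompress: Callable[[bytes], bytes]) -> bytes:
    """Keep a re-encoded stream only when it is actually smaller than the original.

    `recompress` is a hypothetical stand-in for an image re-encode step.
    This mirrors the idea of skipping optimizations that grow the file;
    it is not Acrobat's implementation.
    """
    candidate = recompress(original)
    return candidate if len(candidate) < len(original) else original


def deflate(data: bytes) -> bytes:
    """Stand-in 're-encoder': maximum-level zlib compression."""
    return zlib.compress(data, 9)


if __name__ == "__main__":
    redundant = b"scanned page " * 1000   # repetitive data: shrinks a lot
    random_like = os.urandom(13000)       # incompressible: zlib output grows slightly
    for name, data in (("redundant", redundant), ("random", random_like)):
        kept = recompress_if_smaller(data, deflate)
        print(f"{name}: {len(data)} -> {len(kept)} bytes")
```

Data that is already compressed (or random-looking) gets slightly larger when recompressed, which is one way an optimizer that blindly applies its settings can end up producing a bigger file.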

  • Reducing File Size Acrobat XI

    Hi, I just combined several PDFs into one file, and it is too big to email. I tried using Save As to save an optimized PDF, but my only options are to save as a Word or text file. Do I need Pro for this feature? Any other suggestions?
    Thanks in advance!
    Lisa

    In the Pro version, Save As Other offers Optimized PDF and Reduced Size PDF. It sounds like your Standard version does not include these conversions. You might check the Acrobat XI page at Adobe (the tech specs, as I recall) that compares Reader, Standard, and Pro.

  • Adobe Acrobat Pro XI 11.0.06: when I reduce file size or try to optimize, I get this error: The document could not be saved. A number is out of range. I do the exact same thing every month and it works. I did it a few days ago and it worked. I recreated the

    In Adobe Acrobat Pro XI 11.0.06, when I reduce file size or try to optimize, I get this error: "The document could not be saved. A number is out of range." I do the exact same thing every month and it works. I did it a few days ago and it worked. I recreated the PDF and renamed it. I tried doing it before I imported more pages. No go. The 16 MB PDF normally reduces to 5 or 6 MB.

    Hi,
    Are you facing the issue with any pdf file?
    Please try updating Acrobat to 11.0.07 and check again.
    You might also want to repair the Acrobat installation and see.
    Regards,
    Rave

  • PDF reduce file size filters and CMYK to RGB conversion

    This doesn't seem to be on-topic for this forum, but I'm hoping someone here has the expertise to answer my question. We have some scripts that take a series of press-quality PDFs and use the "Reduce File Size" filter to prepare them for viewing on the web. We run these scripts on a 10.4 machine, and the filter works very well, reliably reducing the size of all sorts of pages.
    When we tried to upgrade the machine, we discovered that Quartz filtering changed in 10.5 and 10.6. While it's usually an improvement, giving maybe 5-10% better compression ratios, it has become unreliable: about 5% of my files fail spectacularly and blow up to 3, 4, 5, or 6 times their original size.
    The other problem is that the 10.5/10.6 filters mangle the colors. I found a solution for this: in ColorSync Utility, duplicate the Reduce File Size filter and add a Color Management component called Convert to Profile. This lets me set up a filter that converts the CMYK content to RGB. The problem is that there are about 40 profiles to choose from, and it's not at all clear which one I should use. Many have printer manufacturers' names in them, some say "Adobe", and others have cryptic codes (probably referring to various RFCs and schemes). I've tried a couple of the ones that don't look like they are for printers, chosen basically at random. They all produce files of slightly different sizes for the reductions that go well, but on the files that blow up, some profiles are better than others. (For example, I have a 5 MB page that reduces to 1.4 MB with the 10.4 filter, but blows up to 27 MB with the "sRGB IEC61966-2.1" profile and to 12 MB with the "Adobe RGB" profile.)
    So I have 2 questions:
    1) Is there any way to configure a 10.5/10.6 custom profile so that it behaves as reliably as the 10.4 "stock" PDF Reduce File Size works? It doesn't have to be the most wonderful compression algorithm out there, just so that it never or rarely has a file blow up in size.
    2) For converting press documents to pdfs that are going to go on the web, what is a good "Convert to profile" to use of the 40-some choices on the pull-down menu?
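Until a more reliable profile turns up, a script can at least guarantee that no published file ends up larger than its source. The sketch below does not fix the Quartz filter itself; `reduce_fn` is a hypothetical wrapper around whatever the existing script already calls, and the fallback simply publishes the original when the filter blows a file up.

```python
import os
import shutil
import tempfile
from typing import Callable


def reduce_with_fallback(src: str, dst: str,
                         reduce_fn: Callable[[str, str], None]) -> bool:
    """Run a size-reduction step, but never publish a file larger than the source.

    reduce_fn(in_path, out_path) is a hypothetical wrapper around whatever the
    existing script calls (for example a Quartz filter). Returns True when the
    reduced file was kept, False when the original was used instead.
    """
    out_dir = os.path.dirname(os.path.abspath(dst))
    fd, tmp = tempfile.mkstemp(suffix=".pdf", dir=out_dir)
    os.close(fd)
    try:
        reduce_fn(src, tmp)
        if os.path.getsize(tmp) < os.path.getsize(src):
            os.replace(tmp, dst)   # reduction worked: move it into place
            return True
        # The filter "blew up" the file: publish the original unchanged.
        if os.path.abspath(src) != os.path.abspath(dst):
            shutil.copyfile(src, dst)
        return False
    finally:
        if os.path.exists(tmp):
            os.remove(tmp)


def fake_filter(out_size: int) -> Callable[[str, str], None]:
    """Demo only: pretend filter that writes a file of a fixed size."""
    def run(in_path: str, out_path: str) -> None:
        with open(out_path, "wb") as f:
            f.write(b"x" * out_size)
    return run


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        src = os.path.join(d, "page.pdf")
        with open(src, "wb") as f:
            f.write(b"x" * 5000)
        for size in (1400, 27000):  # one good reduction, one blow-up
            kept = reduce_with_fallback(src, os.path.join(d, "web.pdf"), fake_filter(size))
            print(f"filter output {size} bytes -> reduced file kept: {kept}")
```

Writing to a temporary file in the destination folder and then using `os.replace` means a half-finished or oversized result never sits at the published path.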

    Cathy,
    You have posted your question in a forum dedicated to the Final Cut Studio application Color. It is a very specialized program to grade (adjust) the color in video/film images. We know nothing regarding PDFs.
    Have you tried posting this on an Adobe support site?
    Good luck,
    x

  • When I move a RAW file from iPhoto to my desktop or Photoshop, it changes to a JPEG and reduces in size. How can I get the RAW file across?

    When I move a RAW file from iPhoto on my MacBook Pro to the desktop or Photoshop, it changes to a JPEG and reduces in size. How can I get the RAW file to move across?

    I create separate folders based on the year and then the actual date of when I take images. You can make those folders anywhere on any hard drive that is connected to your Mac whether internal or external. I also use the Photoshop Photo Downloader that is included with Photoshop/Bridge and it will create the date folder so all I do is create a Year folder.
    Open Bridge (or click the Bridge icon in Photoshop), and from the File menu in Bridge select "Get Photos from Camera". It can be a camera connected to your Mac or a memory card from a camera. A window will open where you select the camera or memory card. Set the download location (just the folder; you can browse to a folder you created), then in the "Create Subfolders" drop-down select the date stamp you want to use, a custom name, or no subfolders at all.
    I've never cared for iPhoto one bit. I tried it but found it way too restrictive. It likes to have full control over how you interact with your images.

  • I have to zip pdf files to email them to someone with a PC. The problem is they aren't smaller after zipping and I can't email them. I tried adjusting the Quartz filter to the Reduce file size. Now they're smaller, but the recipient can't read them. Help!

    I need to zip pdf files to email them to a PC user. The problem is file size is not reduced.
    I tried zipping them in Win 7 (which I use via Parallels 7) and sending them to the Finder to email.
    But lately nothing is reduced. I tried choosing the Reduce File Size Quartz filter when I saved the document. The file size was reduced, but the recipient could not read the file. Everything was blurry.
    Is there a solution?

    I am having some PDF sizing issues too. I am a beta tester for TurboCAD Mac, and with (public) version 5 I have had some scaling issues. I have had 15-plus crash reports auto-sent (available as text), and I posted my scenario in the "Lion - problems so far" thread so as not to use up extra bandwidth; there are pictures there. I was not able to upload a PDF on the site, and the staff at TurboCAD is working on it as well. This is specific to the new Lion update, as I have posted many large-format PDFs before.
    Here (edit to add url)
    Thanks for the help!
    Johnny

  • Reduce File Size and Optimise PDF both increase file size

    Hi All
    I exported a PDF (our college magazine) with the Smallest File Size setting from InDesign CS4 for online use. The problem is that the resulting file is a little under 19 MB. Using Reduce File Size or Optimise, alone or together, simply increases it to 22 MB. The document is packed with images, but I would have thought one or both of these processes would reduce the file to a more manageable download size.
    A PDF is also produced for print, so compressing all images before including them in InDesign is not a preferable option: it would mean creating two separate InDesign documents, which I would rather avoid if possible.
    Any help would be greatly appreciated as always.
    The file resides here if anyone cares to take a look: http://www.ayrcoll.ac.uk/index.php?name=UpDownload&req=viewdownload&cid=11&orderby=dateD It's the latest release (September 09).
    Thanks
    Colin

    You might want to do an audit of the PDF (there is a button in PDF Optimizer) and try to figure out what the various parts are. It shows 58% is content streams (sorry, but I am not sure what that is), 26% is overhead, and only 12.7% is graphics. Since you are not storing bookmarks and such, you might try going back to InDesign and printing to the Adobe PDF printer for comparison. It may be that one technique is more efficient than the other.
    When I used the Optimizer, the file did get larger. When I used Reduce File Size, it got slightly smaller. The key may be figuring out what the various parts are from the audit. You might try copying one page of the InDesign file to a new document and experimenting with variations: different fonts (all were embedded, which is probably best), different graphics, and so on. That is all I can suggest, since I would simply be trying different ways to produce the document and looking at what causes the bloat. When the images are only 12.7% and the fonts less than 1%, something else is producing the size, since those two are typically the main culprits. I would mention tags, which tend to bloat files, but you do not have any (they are normally used for accessibility).
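Converting the audit percentages quoted above into megabytes makes the point concrete. A minimal sketch, assuming 19 MB for "a little under 19Mb" and using 1% as an upper bound for fonts:

```python
# Convert the audit percentages quoted above into megabytes for a ~19 MB file.
# Assumptions: 19 MB stands in for "a little under 19Mb", and 1% is used as an
# upper bound for fonts ("less than 1%").


def audit_breakdown(total_mb: float, shares_pct: dict[str, float]) -> dict[str, float]:
    """Turn percentage-of-file figures into approximate sizes in MB."""
    return {part: total_mb * pct / 100 for part, pct in shares_pct.items()}


if __name__ == "__main__":
    shares = {"content streams": 58.0, "overhead": 26.0, "images": 12.7, "fonts (max)": 1.0}
    for part, mb in audit_breakdown(19.0, shares).items():
        print(f"{part:>16}: {mb:5.1f} MB")
```

Even if image recompression removed every image byte, it could save at most about 2.4 MB, while the content streams account for roughly 11 MB. That is why image settings alone barely move the file size here.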
    You might want to try some of the preflight checks to find issues. I did the transparency check and got 170 instances in 22 pages. Flattening the page may help. There were some pages that seemed to be very bad. However, I did not get a big file size reduction. I did not get a lot of improvement by printing to a new PDF either. Certain pages seem to be part of the bloat problem. The print took a long time around pages 11 and 22.
    One troubleshooting technique would be to extract pages to separate files and look at the details of each page to see what is causing the issue.
    Need to go. Good luck.

  • I use Adobe Acrobat Pro XI (11.0.08) When performing "Save as" "Reduced file size pdf" Adobe processes for a while then completely stops, and has to close down. Just started doing that today.

    I use Adobe Acrobat Pro XI (11.0.08) When performing "Save as" "Reduced file size pdf" Adobe processes for a while then completely stops, and has to close down. Just started doing that today.

    I have Windows 8.1, and when I right-click the Windows flag (bottom left), the menu gives me a "Search" option. When I enter "%temp%", it goes to "my username/AppData/Temp", which contains 1,718 folders. Many are empty, but I only did a small sample. Is that the right place to delete? I tried Windows Disk Cleanup, but that did not help.

  • ColorSync Utility: PDF Compression vs. Reduce File Size filters

    What's the difference between these two? I used Automator to make two applications, which I think I screwed up, because they're not reducing file sizes now. One would turn 100+ PNGs into a PDF and, I think, apply "Reduce File Size"; 300 MB would become 80 MB, for example. I would then use a not-so-great PDF application called PDFStudio9 (because I lost my Acrobat DVD and couldn't get a replacement) to OCR that reduced-size PDF. At that point I would run the other application, which I think compressed the OCR'd PDF, leaving about 40 MB at the end. But I'm not sure that's the order of compression I used.
    So what's the difference between the two types of filters, and when would I want to use each for ultimately creating a non-titanic-sized pdf?

  • Reduce file size not working

    I have Adobe Acrobat 7 Professional. I scanned an 868-page, text-only document using two different professional scanners at my work, a Xerox DocuColor and a Canon DR-6030.
    The scan settings were 300 dpi, black and white, duplex. No photos, not even line art.
    1. The result was a 32 MB file. I don't know why it's so big. It should be no more than 4-6 MB.
    2.  Furthermore, the file size remained unchanged after I used the reduce file size command.
    I am baffled, any help would be appreciated.
    Thank you.

    Try the PDF Optimizer. You could also try ClearScan OCR. If you ran OCR but not ClearScan, you are simply looking at pictures of your scan; ClearScan replaces images of text with actual text. You have left out a lot of what you did during the scan. Also, scanners or the conversion step often give you a 24-bit image even if the page is really only black and white. In my experience, the best OCR results often come from grayscale. So the structure of your PDF may be the issue. The PDF Optimizer also has an audit feature that shows where most of the space is being used.
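The numbers in this thread can be put side by side. A minimal sketch, assuming US Letter pages at the stated 300 dpi and 1 MB = 1024 x 1024 bytes; it compares the actual and hoped-for size per page with the raw size of one page stored as a 1-bit, 8-bit, or 24-bit image.

```python
# Size arithmetic for the 868-page scan. Assumptions: US Letter pages at the
# stated 300 dpi, and 1 MB = 1024 * 1024 bytes.

MB = 1024 * 1024


def kb_per_page(total_mb: float, pages: int) -> float:
    """Average file size per page, in KB."""
    return total_mb * MB / pages / 1024


def raw_page_bytes(width_in: float, height_in: float, dpi: int, bits_per_pixel: int) -> int:
    """Uncompressed size of one page image at a given bit depth."""
    return round(width_in * dpi) * round(height_in * dpi) * bits_per_pixel // 8


if __name__ == "__main__":
    pages = 868
    print(f"actual:   {kb_per_page(32, pages):.1f} KB per page")
    print(f"expected: {kb_per_page(4, pages):.1f}-{kb_per_page(6, pages):.1f} KB per page")
    for label, bpp in (("1-bit B&W", 1), ("8-bit gray", 8), ("24-bit color", 24)):
        raw = raw_page_bytes(8.5, 11, 300, bpp)
        print(f"{label:>12}: {raw / MB:.1f} MB per page uncompressed")
```

The file averages about 38 KB per page against the roughly 5-7 KB the poster expected, and a 24-bit image of a black-and-white page starts from 24 times the raw data of a 1-bit image. Checking which image type the scanner actually produced is therefore a sensible first step.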

  • Reducing file size #2...Preview, Quartz Filter vs Adobe Pro Optimize

    Questions on reducing a Pages-to-PDF file… I will post each question separately.
    2) I read that you can reduce the file size of a PDF in Preview with a Quartz filter. I created my own filter and it worked as expected. But I also have Adobe Acrobat Pro, with Save As > Optimized PDF. Pro seems to have much more capability than the Quartz filters.
    Is one better than the other? The Adobe optimize (standard settings) took it from 20 MB to 6 MB.
    Thanks, Bob

    The Adobe Acrobat settings you chose are probably using JPEG compression to reduce the file size.
    JPEG is lossy, i.e. you lose detail and sharpness the more you compress the image.
    The Quartz filters are usually of very high quality, but they are a black box and you need to understand what the settings are in each one. Quartz filters are extremely powerful, fast and as I said usually high quality but I suggest you experiment and see if they meet your needs.
    Peter

  • Reduce File Size in Acrobat Pro (9.5.5) Corrupts Graphics in PDF Documents - shows up as black image

    Whenever I use Reduce File Size in Acrobat Pro (9.5.5), sometimes some of the images (not all) get corrupted and show up as a black image in the new document.
    Actually, the new reduced document looks okay when viewed in Acrobat, but the problem shows up when viewed in the Preview application on the Mac.
    I'm using Acrobat 9.5.5 with Mac OS 10.8.6
    I've tried re-importing the graphic into a new graphic box, which didn't work. Thinking the graphic file itself might be corrupted, I then viewed the graphic, took a screenshot of it to create a completely new file, and re-imported that file into the original document (created in Adobe InDesign 5.0.4). I then exported the new document as a PDF and brought it into Acrobat Pro to do the Reduce File Size. The same thing happens: a black box appears where the graphic was.
    I then tried using the Reduce File Size within the Save As function of the Preview application on the Mac - while the graphic remains intact, many of the other graphics in the document are "reduced" too much, to the point where the image quality is seriously degraded, and therefore not usable.
    Any other ideas?

    Hi Anoop,
    I can share the graphic file, but not the pdf which contains it (as it contains confidential information) - thanks!
