Highlight scanned PDF parts and OCR

The problem is:
I have about 500 scanned books and need only the important parts.
Many of the books are 400-500 pages long, and I need only about 10 pages of each altogether, but split into various files.
I have to highlight the required text, otherwise I will never find it.
I don't see any point in running OCR on all of the pages, since I need only 1% of the material. OCR takes ages on an older PC, and the resulting file needs a really powerful machine if it has 300+ pages.
Then I would need to make a new PDF that contains only the required parts, so I can archive the original file and keep only the shrunken version.
Could I highlight some parts in some way (with a tablet, for example), afterwards run OCR on only the selected parts, and do the error correction on another, more powerful machine?
It would be much better to be able to highlight and OCR in "real time", then export the result and place it in the specific folder/file, so that it is not only organized but searchable as well.
This would save the user a lot of time (especially on an old PC) and also save resources (power etc.).
Thank you in advance.

You can OCR single pages.
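
As a rough sketch of that idea, the Python below turns a list of wanted pages into a page-range string and builds a command line that copies just those pages into a small PDF, which could then be moved to a faster machine for OCR and correction. It assumes the qpdf command-line tool is installed (qpdf is not mentioned in this thread); the file names and helper names are hypothetical.

```python
def to_page_ranges(pages):
    """Collapse 1-based page numbers into a compact range string like '3-5,9'."""
    ordered = sorted(set(pages))
    if not ordered:
        raise ValueError("no pages selected")
    parts = []
    start = prev = ordered[0]
    for page in ordered[1:]:
        if page == prev + 1:
            # Still inside a run of consecutive pages.
            prev = page
            continue
        parts.append(str(start) if start == prev else f"{start}-{prev}")
        start = prev = page
    parts.append(str(start) if start == prev else f"{start}-{prev}")
    return ",".join(parts)


def extract_command(source_pdf, pages, target_pdf):
    """Build a qpdf call that copies only the selected pages into target_pdf."""
    return ["qpdf", "--empty", "--pages", source_pdf, to_page_ranges(pages),
            "--", target_pdf]


if __name__ == "__main__":
    # Hypothetical file names; pages 12-14 and 200 stand for the "important parts".
    cmd = extract_command("book.pdf", [12, 13, 14, 200], "book_excerpt.pdf")
    print(" ".join(cmd))
    # To actually run it (requires qpdf on the PATH):
    # import subprocess; subprocess.run(cmd, check=True)
```

The excerpt stays image-only until OCR is run on it, so the original 400-500 page scan never has to be processed in full.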

Similar Messages

  • Why can I highlight some pdf files and not others

    Highlighting only works on text documents; a scanned document (image) cannot be highlighted.

  • Printing off password protected pdfs to then scan them in and OCR them.. a paperless method?

    To circumvent password protection, I often print off, scan in, then OCR these PDFs to
    give me a version I can highlight etc.
    My question is this: is there a paperless method to achieve the same ends?
    A virtual hardcopy?

    Any PDF that can be printed out can be scanned back in...
    I am not looking to MODIFY content and then redistribute it masquerading as the original. My aim is only to find keywords and highlight them visually, generating a dataset of pages extracted from multiple PDFs where the word (or term) appears. This "condensate" is then reviewed by the reader (me), hence the desire to see where the words are!
    Plagiarism is not my goal, distilling 12,000 pages of documents published online by UK government bodies is..... I am trying to understand where an assumed minimum length of a road surface treatment up to stoplines etc. comes from.
    Please don't hesitate not to help me, I can see where you are coming from.

  • Identifying PDF portfolios and OCRed PDFs

    Hi
    We have an application with about 10'000 PDF file attachments.
    Many of those were run through OCR. Stupidly, the users weren't instructed well enough prior to doing so. It now turns out that most of those OCR texts are of bad quality.
    Another issue: our application can do full-text search on the PDFs, but many of the files are PDF portfolios, which the full-text engine cannot "read" (technically, I have been told by the full-text search engine programmer, PDF portfolios are NOT PDFs ;-) stupid, but I can't change that).
    What I now require is help on how to:
    - identify PDFs with images which have been run through OCR, so that we can rerun OCR on those PDFs
    - identify PDFs which are actually PDF portfolios, so that we can (maybe automatically, maybe manually) convert them to normal PDFs
    I don't expect any prebuilt solution...
    we would even pay someone to help us out here. The data within those PDFs is crucial for our whole enterprise.
    I already tried some of the JavaScript APIs... but no luck... maybe there are other tools which can help us here?
    I am thankful for any pointers and help in this topic
    Michael

    Testing for whether OCR has been performed may be tough. Preflight can report on hidden text objects, but this probably wouldn't be useful to you.
    You can test to see if a document is a portfolio using the collection property of a document (and/or test to see if there are any file attachments). This can be done using JavaScript in a batch sequence.
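
    Outside Acrobat, a rough first pass over a large set of files could look for the /Collection entry that a PDF portfolio carries in its document catalog. The Python sketch below is a heuristic only: it scans raw bytes, so it will miss files whose catalog sits in a compressed object stream, and it can be fooled by the same bytes appearing elsewhere in a file. The function names are hypothetical, and `*.pdf` matching may be case-sensitive depending on the platform.

```python
from pathlib import Path


def looks_like_portfolio(data: bytes) -> bool:
    """Heuristic: a PDF header near the start plus a /Collection key somewhere."""
    return b"%PDF-" in data[:1024] and b"/Collection" in data


def find_portfolios(folder):
    """Yield PDF paths under folder whose raw bytes suggest a portfolio."""
    for path in sorted(Path(folder).rglob("*.pdf")):
        if looks_like_portfolio(path.read_bytes()):
            yield path


if __name__ == "__main__":
    # Tiny in-memory stand-in for a portfolio's catalog object.
    sample = b"%PDF-1.7\n1 0 obj\n<< /Type /Catalog /Collection 2 0 R >>\nendobj\n"
    print(looks_like_portfolio(sample))
```

    Files flagged this way would still need to be confirmed, for example with the collection property check in Acrobat described above.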

  • Calibre type function in organizing my scanned pdf books and make it functional for index

    After scanning books and articles into PDF format, what's next? I need to make them functional: index the books and be able to use them like a library.
    Medical info: type in words and search the library of digital files? How is it done?

    Hi Titonull,
    You may check this video to create bookmarks: http://tv.adobe.com/watch/acrobat-x-tips-tricks/how-to-add-pdf-bookmarks-to-a-document/
    You may also check the other video tutorials for Acrobat: Acrobat X Tips & Tricks | Adobe TV
    Regards,
    Ajlan Huda.

  • Hp officejet pro 8610 prints solid black on scanned pdf and word doc pictures

    -HP Officejet Pro 8610
    -problem printing scanned pdf document and word doc. photos
    -My new HP Officejet Pro 8610 printed a newly scanned PDF document as an all-black picture. I then printed a saved Word document from my iMac which included a small picture. The document text printed acceptably, but the picture was solid black. There were no error messages. These were my initial tests to check this new printer.

    Hey there @TM-10 
    Welcome to the Community
    I read through your post about your new Officejet 8610 scanning a document to your computer and printing out all black. I have a couple ideas for you to try.
    Try the steps in this guide: Unexpected Scan Output Using HP Scan Application for HP Multifunction Printers When Using OS X v10.9...
    Press the Power button to turn on the product.
    With the product turned on, disconnect the power cord from the rear of the product.
    Unplug the power cord from the wall outlet.
    Wait at least 15 seconds.
    Plug the power cord back into the wall outlet.
    Reconnect the power cord to the rear of the product.
    If the product does not turn on by itself, press the Power button to turn it on.
    Ensure the printer is plugged in directly to the wall outlet, avoiding power bars and surge protectors. This ensures the printer is receiving full power and may help.
    Good luck
    R a i n b o w 7000
    I work on behalf of HP
    Click the “Kudos Thumbs Up” at the bottom of this post to say
    “Thanks” for helping!
    Click “Accept as Solution” if you feel my post solved your issue, it will help others find the solution!

  • HP scanned pdf's are incompatible to other users

    I purchased this printer last week.  I used the scanner for the first time yesterday, and all the docs scanned and saved as PDF on my computer.  I am able to open them, but when I emailed them to other coworkers, they could not open those scanned pdfs.  It said the pdf was not a supported file type or was not correctly decoded.  I can send them other non-scanned PDF's and they are able to open them.
    I rescanned the docs as JPEG, and the pictures appear on my desktop, but my coworkers still cannot open them; the message says there is no content.

    Hi there @dibarra 
    Welcome to the community
    I will certainly do my best to help you with the scanning issue you are facing! However, I am going to ask that you please let me know some more helpful information to help me research the problem for you.
    Please respond to me with the following. If there are any steps here that you have not tried, please try them before responding and include the result:
    What printer(s) are installed on the computer?
    How is the printer connected (USB/ wireless/ wired/ Bluetooth)?
    What is the Operating System of the computer?
    Have you tested hardware functionality (made copies)?
    Can you print?
    Have you tried uninstalling and reinstalling the software?
    If you're running Windows, run the Print and Scan Doctor and include the results.
    Is the printer plugged directly into the wall outlet (avoiding power bars and surge protectors)?
    Have you completed all Windows/Mac OS Updates?
    Have you tried using a different USB/Ethernet cable?
    Thank you!
    R a i n b o w 7000
    I work on behalf of HP

  • How can users who have Acrobat Reader only save scanned pdf files so that the text on them is searchable using ctrl-F?  I just use the recognize text with ocr feature in the full version of Acrobat and this seems to do the trick. Reader doesn't work!

    Our users have scanned PDF files that they want to be able to search using Ctrl-F.  I made them searchable by running Recognize Text Using OCR in Acrobat Professional version 8.  They want to know whether they can make the files searchable with Acrobat Reader only, or whether they need the full Acrobat Professional software to do so.
    Thanks for the help!!
    Ken K. - 2191

    To clarify a bit they need to have Adobe Acrobat, not Adobe Reader. Reader has not been associated with the Acrobat name for 3 or more versions. The process you are asking about is a creation process - the purpose of Acrobat - and NOT a reading feature.

  • Scanned and OCR'd PDF--OCR content is not indexed

    I am setting up a new SharePoint 2013 install, and have put a handful of files in a doc library to test search. The content has been indexed, and I can find the content inside many files and file types without issue--including "native" PDF files.
    However, it doesn't seem to index the content of a scanned and OCR'd (text with image overlay) PDF. I have verified that the text is indeed in the OCR text by copying and pasting phrases, and I also confirmed that the crawl log shows the file as successfully crawled. The filename is also indexed.
    So... it would seem that the SharePoint 2013 indexer does not index the text in scanned and OCR'd PDF files. Am I missing something? Can anyone else confirm this behavior?
    Thanks!
    Ryan

    To clarify:
    - From what I've read, iFilters can still be installed, but as Mikael said, they can't override the built-in file format handlers in 2013. 2013 has a built-in handler for PDFs, whereas previous versions required a PDF iFilter for indexing PDFs that have text content. If one could install the Adobe PDF iFilter in 2013 successfully, it would resolve the issue in this thread, but PDF iFilters don't work in 2013.
    - Aquaforest makes a product that OCRs PDF files. That takes an image-only PDF and makes the file searchable, but it is not an indexer. Rather, it enables an index engine to make a big collection of OCR'd PDF files searchable via a search engine.
    - The built-in PDF handler in 2013 does index native PDFs. It does not index OCR'd PDF files.
    So, that's the issue for which I submitted the ticket to Microsoft. In our case, we don't need to OCR our PDF files--they are already OCR'd. But they don't show up in searches.
    (Regarding Aquaforest... I've talked with someone there previously--for a non-SharePoint DMS--and they seem to make a cool product, but I don't have any personal experience using it.)

  • Once i scan a document and create a pdf.  how do i then change the fields on the document

    Once I scan a document and create a PDF, how do I then change the info on the document? I have to update a couple of the columns and cannot figure out how. Please help.

    You are probably only looking at a graphic. You have to run OCR before you can do the corrections. For your use, you would need the ClearScan option. Once you have a text based document, you can then use the Text Touchup Tool. Expect to have problems. Editing a PDF is not easy to do well and is not really recommended, except for minor edits. Wordwrap and such features are not part of the editing options. It would be best if you could get an electronic copy of the original, edit that, then create a new PDF.

  • Scanning and OCR

    After scanning and OCR, when an attempt is made to search the document, instead of locating the desired text, only a square box appears in the upper left corner of the document.  Only after running OCR again from within the Adobe interface does the document become searchable. Any ideas are welcome.
    Thank you.

    Don't have the scan profile do OCR.
    OCR after you have the scanner's output image in the PDF file.
    Then OCR Searchable Image (Exact) is available as a choice when you initiate OCR.
    Using Acrobat XI Pro you can build an Action that calls out the use of Searchable Image (Exact).
    OCR a directory of PDFs that hold the scanner output images.
    Close out the Action with a Save As.
    Be well...
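
    For readers without Acrobat XI Pro, the same "OCR a directory, then Save As" pattern can be sketched in Python. This swaps in a different tool than the Action described above: it assumes the open-source ocrmypdf command-line program is installed, which this thread does not mention. The `_ocr` suffix, folder layout and function names are hypothetical.

```python
from pathlib import Path


def plan_batch(scan_dir, out_dir):
    """Pair each scanned PDF with a 'Save As' target, skipping finished ones."""
    out_dir = Path(out_dir)
    jobs = []
    for src in sorted(Path(scan_dir).glob("*.pdf")):
        dst = out_dir / f"{src.stem}_ocr.pdf"
        if not dst.exists():
            # Only scans without an existing OCR'd copy are queued.
            jobs.append((src, dst))
    return jobs


def ocr_command(src, dst):
    """Command line for one job: ocrmypdf <input> <output> (tool assumed installed)."""
    return ["ocrmypdf", str(src), str(dst)]


if __name__ == "__main__":
    # Hypothetical folders; prints the commands instead of running them.
    for src, dst in plan_batch("scans", "searchable"):
        print(" ".join(ocr_command(src, dst)))
```

    Writing to a separate output file keeps the scanner's original image PDF untouched, which mirrors closing the Action with a Save As.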

  • PDF files are OCR'd and searchable in Preview, but Spotlight is unable to find

    I use Microsoft OneDrive to sync all of my scanned invoices, bills, and receipts for my business to my home and work computers.  At the office I have a newer HP desktop with Windows 8.1 with OneDrive built in natively.  At home I use a 4+ year old iMac running Mavericks and the OneDrive app from the Appstore.  Everything syncs just fine.
    The problem is that when I want to search for invoices coded to one of our job codes, "5801" for example, Spotlight is unable to locate anything.  However, when I open a scanned pdf containing "5801", it is able to easily find it and highlight it within Preview. 
    My Windows 8.1 desktop has the ability to search "5801" and find every pdf containing "5801", so I know it's possible.
    I have installed the latest OneDrive updates, removed the OneDrive folder from Spotlight indexing twice (to then reindex), and nothing has worked.
    Any ideas?

    Can you search on other terms in OneDrive and find results in Spotlight?
    Reindex Spotlight files
    Open Spotlight in System Preferences.
    Verify that PDF Documents is checked.
    Under Privacy add OneDrive folder, wait a bit then remove to force Spotlight to reindex your OneDrive files.
    You might want to consider running the combo updater over your install to refresh your files. This has been known to fix odd issues.
    Combo updater: Mac OS X 10.6.8 Update v.1.1 - 1.09 GB
    http://support.apple.com/kb/DL1399
    MORE INFO ON WHY RUNNING COMBO FIXES ISSUES
    Apple updates available from the Software Update application are incremental updates. Delta updates are also incremental updates and are available from Apple Downloads (software updates are generally smaller than delta updates). The Combo updates contain all incremental updates and will update files that could have become corrupted.
    Combo updaters will install on the same version as they're applying--no need to roll back or do a clean install. So if you think you've got a borked 10.6.8 install from a regular update, just run the 10.6.8 Combo Updater on that system.
    "Delta" updaters can only take you from one version to the next. For example: 10.6.7 to 10.6.8. If somehow the 10.6.8 is missing something it should have, and that something isn't changed between 10.6.7 and 10.6.8 it will still be stale after the delta update.

  • Are scanned PDF files searchable without using OCR?

    I want to scan documents into PDF format and would like to be able to search and index the content. Is there a way to do this without running OCR against it first?  If not, is there a product or iFilter that can do OCR and index the content without actually changing the format of the document itself?

    It is not possible to search a raster graphic for text, and that is what a scan is before it is OCR'd. It is possible to OCR without changing the format of the document: Acrobat can insert the recognized characters behind the graphic rather than replacing the graphic.

  • 2 questions - I have downloaded the PDF Reader - Annotate, Scan , Fill Forms and Take Notes app to m

    When I fill out a form within this app and email it out, the recipient receives a blank form. I looked in the manual and it says to make sure you complete the saving process, but I cannot figure that part out.
    Also, these are forms I will be using on a daily basis and I do not want to make permanent changes to the form. Is this possible, or will I need to create several copies of the form?
    Please help, as this is a time-sensitive matter.

    Can you please tell us what the recipient is using to view the PDF? Many viewers don't support viewing filled PDF forms and the recipient will need to use a PDF viewer like the Adobe Reader to view the form.
    You can make a copy of your form like this:
    In the file browser under Documents view, you should see an Edit button. Tap the edit button.
    Tap the file you want to copy
    Tap the duplicate/move button (two small documents at the bottom of the screen)
    Tap Duplicate
    That creates a duplicate copy of the document. You can then use the edit button to rename if you would like.

  • I have scanned a document and saved it to desktop - when I have then attached it to an email it says no plug-ins and attachments don't send. How do I add a plug-in so PDF documents can be attached and opened?

    I have scanned a document and then saved it to the desktop. It was saved as a PDF. When I attached it to an email and sent it, a message came up saying no plug-ins, and the recipient couldn't open the files. How can I rectify this so that I can attach PDFs, please?

    search the app store for PDF Writer. I've found a lot that will convert documents to PDF, but none yet that will write within the PDF. One thing you want to avoid are cloud based apps. Any of them that talk about editing on the cloud, etc, aren't going to be as standalone as you want.
    It's possible, if you can take your template PDF, turn it into a word document that you can edit, you can then convert that to PDF...kinda a workaround way to do what you want. And apps that convert to PDF are much easier to find
