Identifying PDF portfolios and OCRed PDFs

Hi
We have an application with about 10'000 PDF file attachments.
Many of those were run through OCR. Unfortunately, the users weren't instructed well enough before doing so, and it now turns out that most of the OCR text is of poor quality.
Another issue: our application can do full-text search on the PDFs, but many of the files are PDF portfolios, which the full-text engine cannot "read" (technically, I have been told by the full-text search engine programmer, PDF portfolios are NOT PDFs ;-) stupid, but I can't change that).
What I now require is help on how to:
identify PDFs with images which have been run through OCR, so that we can rerun OCR on those PDFs
identify PDFs which are actually PDF portfolios, so that we can (maybe automatically, maybe manually) convert them to normal PDFs
I don't expect any prebuilt solution...
we would even pay someone to help us out here. The data within those PDFs is crucial for our whole enterprise.
I have already tried some of the JavaScript APIs, but no luck. Maybe there are other tools which can help us here?
I am thankful for any pointers and help on this topic.
Michael

Testing for whether OCR has been performed may be tough. Preflight can report on hidden text objects, but this probably wouldn't be useful to you.
You can test to see if a document is a portfolio using the collection property of a document (and/or test to see if there are any file attachments). This can be done using JavaScript in a batch sequence.
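
A minimal sketch of the portfolio test described above, assuming it runs as Acrobat JavaScript inside a batch sequence where `this` is the currently open document. The function name `classifyDoc` and the three labels it returns are made up for illustration; `collection` and `dataObjects` are the document properties the reply refers to. It does not attempt to detect OCR.

```javascript
// Classify a document using the two properties mentioned above.
// `doc` is expected to behave like an Acrobat Doc object:
//   doc.collection  - non-null when the file is a PDF portfolio
//   doc.dataObjects - array of file attachments, or null when there are none
// classifyDoc and its return labels are hypothetical names for this sketch.
function classifyDoc(doc) {
  if (doc.collection) {
    return "portfolio";
  }
  var attachments = doc.dataObjects;
  if (attachments && attachments.length > 0) {
    return "has-attachments";
  }
  return "plain";
}

// Inside an Acrobat batch sequence, `this` is the current document, so a
// per-file report line could be written like this (left commented out so the
// function can also be tried outside Acrobat):
// console.println(this.path + "\t" + classifyDoc(this));
```

Run over the whole attachment folder, the console output gives a list of files labelled "portfolio" that can then be converted automatically or by hand.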

Similar Messages

  • An existing PDF portfolio and want to convert it to Acrobat's Portfolio but when I pull it in

    I have an existing PDF portfolio and want to convert it to Acrobat's Portfolio, but when I pull it into Portfolio it shows one document and does not show the 55 pages that make up the existing portfolio of work.  Help, all I can seem to find are directions to pull in every photo individually.

    Dave, he is asking the same thing I want to ask.
    He has a pdf portfolio and what he actually wants is a basic pdf.  (his pdf portfolio has already been made, ie existing, not hypothetical!)
    I cannot find the answer either... extremely frustrating since I was not ever trying to create some flash based slide show, but a proper PDF, and inadvertently created the portfolio when choosing to Create PDF From File... You would THINK the default behaviour would then be to create a PDF, and NOT a portfolio pdf!!!!!
    Really annoying and frustrating... bad job Adobe!
    Is there a way to convert back to basic pdf??
    Mark

  • Troubleshooting Adobe Acrobat PDF Portfolios and Flash Player

    Urgent Issue. We have a large number of corporate reference files in Adobe PDF Portfolios, created in Adobe Acrobat Pro 9.
    (1) Since Adobe moved the Flash Player to a separate stand-alone app, the contents of the existing PDF Portfolios are not visible or accessible.
    (2) I created a new PDF Portfolio to troubleshoot the problem. After adding files to the Portfolio the files are not visible either. Adding the same file a second time triggers a message that the file already exists, but the file is not visible in any of the layouts.
    I get a message that I need to have Adobe Flash Player as a separate app when I open existing PDF Portfolios and it provides a link to the download page. But my computer already has Flash Player 11 installed as a separate app.
    How do I get Flash Player and Adobe Acrobat 9 to play together to restore the original functionality?

    I found the following information on Adobe's website, which indicates that rendering Flash content was turned off in v9.5.1: http://blogs.adobe.com/asset/2012/04/background-on-security-bulletin-apsb12-08.html
    When you get the error, it refers you to a webpage with a link for a Flash Player download:
    http://helpx.adobe.com/acrobat/kb/reader-acrobat-flash-player-download.html#main_download
    Please note that the Adobe Flash Player you have installed on your machine is different from the NPAPI version of Flash Player, so you need to have both versions of Flash Player installed on your machine.

  • Scanned and OCR'd PDF--OCR content is not indexed

    I am setting up a new SharePoint 2013 install, and have put a handful files in a doc library to test search. The content has been indexed, and I can find the content inside many files and file types without issue--including "native" PDF files.
    However, it doesn't seem to index the content of a scanned and OCR'd (text with image overlay) PDF. I have verified that the text is indeed in the OCR text by copying and pasting phrases, and I also confirmed that the crawl log shows the file as successfully crawled. The filename is also indexed.
    So... it would seem that the SharePoint 2013 indexer does not index the text in scanned and OCR'd PDF files. Am I missing something? Can anyone else confirm this behavior?
    Thanks!
    Ryan

    To clarify:
    - From what I've read, iFilters can still be installed, but as Mikael said, they can't override the built-in file format handlers in 2013. 2013 has a built-in handler for PDFs, whereas previous versions required a PDF iFilter for indexing PDFs that have text content. If one could install the Adobe PDF iFilter in 2013 successfully, it would resolve the issue in this thread, but PDF iFilters don't work in 2013.
    - Aquaforest makes a product that OCRs PDF files. That takes an image-only PDF and makes the file searchable, but it is not an indexer. Rather, it enables an index engine to make a big collection of OCR'd PDF files searchable via a search engine.
    - The built-in PDF handler in 2013 does index native PDFs. It does not index OCR'd PDF files.
    So, that's the issue for which I submitted the ticket to Microsoft. In our case, we don't need to OCR our PDF files--they are already OCR'd. But they don't show up in searches.
    (Regarding Aquaforest... I've talked with someone there previously--for a non-SharePoint DMS--and they seem to make a cool product, but I don't have any personal experience using it.)

  • Highlight scanned PDF parts and OCR

    The problem is:
    I have about 500 scanned books and I need only the important parts.
    Many of the books are 400-500 pages long and I need only about 10 pages from each, spread across various files.
    I have to highlight the required text, otherwise I will never find it again.
    I don't see any point in running OCR on all of the pages, since I need only 1% of the material. OCR takes ages on an older PC, and the result needs a really powerful machine (if it is 300+ pages).
    Then I would need to make a new PDF that contains only the required parts, so I can archive the original file and keep only the shrunken version.
    Could I highlight some parts in some way (with a tablet, for example) and afterwards run OCR on the selected parts only, doing the error correction on another, more powerful machine?
    It would be even better to be able to highlight and OCR in "real time", then export the result and place it in a specific folder/file, so that it is not only organized but searchable as well.
    This would save the user a lot of time (especially on an old PC) and also save resources (power etc.).
    Thank you in advance.

    You can OCR single pages.

  • Scanner and OCR don't Work in Acrobat 10.1.7

    Up until yesterday, I was able to scan and OCR just fine.  Today I realized that e-mailed pdf's did not OCR, instead, I got an "Unknown Error" on every non searchable pdf I was sent.  I uninstalled and reinstalled, restarted, etc. and now, not only does the OCR not work, but now whenever I scan from my brother MFC scanner that worked fine yesterday, the scanning processing goes on forever with only black pages showing.  For instance, a one page BW scan showed over 10 pages before I exited the process.
    Also, e-mailing is much slower.
    Interestingly, my Scansnap works fine for OCR (though I'm sure that isn't using Acrobat) and search of pre-OCR'd docs work fine.  It's just frustrating that suddenly no docs e-mailed or faxed to me electronically can now be OCR'd.
    I've tried printing to acrobat to no avail.  I'd hate to have to print and re-scan just for OCR.
    Here's the system info
    Available Physical Memory: 947356 KB
    Available Virtual Memory: 3689228 KB
    BIOS Version: TOSINV - 1
    Default Browser: C:\Program Files\Internet Explorer\iexplore.exe
        Version: 10.00.9200.16521 (win8_gdr_soc_ie.130216-2100)
        Creation Date: 2013/07/10
        Creation Time: 9:00:54 AM
    Default Mail: Microsoft Office Outlook
        mapi32.dll
        Version: 1.0.2536.0 (win7_rtm.090713-1255)
        Creation Date: 2011/04/05
        Creation Time: 3:34:16 PM
    Graphics Card: NVIDIA GeForce 310M
        Version: 8.7.2.47873
        Check: Not Supported
    Installed Acrobat: C:\Program Files (x86)\Adobe\Acrobat 10.0\Acrobat\Acrobat.exe
        Version: 10.1.7.27
        Creation Date: 2013/05/10
        Creation Time: 3:57:36 AM
    Locale: English (United States)
    Monitor:
        Name: NVIDIA GeForce 310M
        Resolution: 1600 x 900 x 60
        Bits per pixel: 32
    OS Manufacturer: Microsoft Corporation
    OS Name: Microsoft Windows Vista
    OS Version: 6.1.7601  Service Pack 1
    Page File Space: 4194303 KB
    Processor: Intel64 Family 6 Model 37 Stepping 2  GenuineIntel  ~2128  Mhz
    System Name: OWNER-PC
    Temporary Directory: C:\Users\Owner\AppData\Local\Temp\
    Time Zone: Eastern Standard Time
    Total Physical Memory: 4053856 KB
    Total Virtual Memory: 4194176 KB
    User Name: Owner
    Windows Directory: C:\windows

    I'm scanning for malware now.  The "slow email" was not described well, I meant that the "send" command in acrobat is particularly slow.  My main issues are the sudden lack of OCR capability within Acrobat and the scanner issues.
    I'll see what happens with the malware scan, and in the meantime any help would be much appreciated.

  • Acrobat 9 OCR and "OCR Suspects"

    I downloaded the trial for version 9.
    Took a poorly scanned page and OCR'd it.
    It (expectedly) had a few errors.
    Then I selected "OCR Suspects" from menus.
    What it should have done is found the "low confidence" results, but
    instead, it said no OCR suspects were found.
    This used to work in version 8, but I can't get 'OCR suspects' working in V9 trial.
    Can anyone confirm if this works in the full version of Acrobat 9 Pro or Standard?

    It's strange that while I posted to this Adobe forum, there is a response over at objectmix.com. As contributing to this topic from 2 locations seems confusing, I'll carry on here.
    Amannagpal76 responded, saying in part that ClearScan in 9 Pro replaces Formatted Text & Graphics. Good to know this. ClearScan does, however, continue the mix. If ocr doesn't work on a character graphic, that graphic will continue to be displayed as such, amidst ClearScan's synthesized type 3 font imitation of the original font. This is most obvious when using the marquee zoom tool.
    Aman suggests using the Touchup Text Tool and changing the font to any font installed on one's system. This doesn't work for ClearScan. Selecting a different font in Touchup for a PDF that came via a wordprocessor works fine, but not for a PDF that came via a scan. That, unfortunately, is the only time that ClearScan is used. The error message when I try this states that there's no system font to match the one in ClearScan, and text can't be added or deleted.
    ClearScan is remarkable for the small size file it produces. That size can be reduced considerably even further by converting it to the Adobe 7 file format. ClearScan's synthesized font is also remarkable when enlarging the page on screen. Then you can see its true outlines -- rather chewed up in high magnification, but that's OK. It would be nice to extract the font in question and use it on one's system. One downside to ClearScan is that its ocr fails to retain italics when output to RTF and Word.
    I have never found a suspect in 9 Pro.
    The conclusion from the above is that the hidden text produced by any ocr'ing in 9 Pro can't be corrected.

  • Scanning and OCR

    After scanning and OCR, when an attempt is made to search the document, instead of locating the desired text, only a square box appears in the upper left corner of the document.  Only after running OCR again from within the Adobe interface does the document become searchable. Any ideas are welcome.
    Thank you.

    Don't have the scan profile do OCR.
    OCR after you have the scanner's output image in the PDF file.
    Then OCR Searchable Image (Exact) is available as a choice when you initiate OCR.
    Using Acrobat XI Pro you can build an Action that calls out the use of Searchable Image (Exact).
    OCR a directory of PDFs that hold the scanner output images.
    Close out the Action with a Save As.
    Be well...

  • SAP Portfolio and Project Management: “Customer Connection”

    SAP Portfolio and Project Management: “Customer Connection”
    SAP Portfolio and Project Management is one of the first focus topics of the Customer Connection initiative (https://service.sap.com/influence).
    The related developments are delivered by SAP Notes and/or by Support Package (SP) only (in case delivery by SAP Note is not possible). Target releases are 5.0 (for all points) and 4.5 (for those points where it is technically possible and reasonable to downport).
    Composite SAP Note 1631964 (https://service.sap.com/sap/support/notes/1631964) collects and links all SAP Notes released for Customer Connection for SAP Portfolio and Project Management. It also includes a PDF file giving an overview of all released Customer Connection PPM developments as well as a PDF file with details about each released Customer Connection PPM development.
    An overview of all influence channels for customers can be found at https://service.sap.com/influence.
    Kind regards,
       Florian

    Hi
    Thanks for your Valuable information -Customer connection PPM.
    Regards
    PP

  • How can I remove asm and ocr installation in AIX?

    Hi,
    I tried to install a single instance using ASM on AIX,
    but the installation did not succeed.
    Now I want to remove the ASM and OCR installation so that
    I can do a new, clean installation.
    How can I remove ASM and OCR?
    And how can I check that the removal is complete?

    1) ASM Instance Clean-Up Procedures
    Stop all of the databases that use the ASM instance that is running from the Oracle home that is on the node that you are deleting.
    On the node that you are deleting, if this is the Oracle home from which the ASM instance runs, then remove the ASM configuration by completing the following steps. Run the command srvctl stop asm -n node_name for all of the nodes on which this Oracle home exists. Run the command srvctl remove asm -n node for all nodes on which this Oracle home exists. If there are databases on this node that use ASM, then use DBCA Disk Group Management to create an ASM instance on one of the existing Oracle homes on the node, and restart the databases if you stopped them.
    If you are using a cluster file system for your ASM Oracle home, then ensure that your local node has the $ORACLE_BASE and $ORACLE_HOME environment variables set correctly. Run the following commands from a node other than the node that you are deleting, where node_number is the node number of the node that you are deleting:
    rm -r $ORACLE_BASE/admin/+ASMnode_number
    rm -f $ORACLE_HOME/dbs/*ASMnode_number
    If you are not using a cluster file system for your ASM Oracle home, then run the rm or delete commands mentioned in the previous step on each node on which the Oracle home exists.
    2) Deleting an Oracle Clusterware Home Using OUI in Silent Mode
    !!! Oracle recommends that you back up your voting disk and OCR files after you complete the node deletion process.
    If you ran the Oracle Interface Configuration Tool (OIFCFG) with the -global flag during the installation, then skip this step. Otherwise, from a node that is going to remain in your cluster, from the CRS_home/bin directory, run the following command where node2 is the name of the node that you are deleting:
    ./oifcfg delif -node node2
    Obtain the remote port number, which you will use in the next step, using the following command from the CRS_home/opmn/conf directory:
    cat ons.config
    From CRS_home/bin on a node that is going to remain in the cluster, run the Oracle Notification Service Utility (RACGONS) as in the following example where remote_port is the ONS remote port number that you obtained in the previous step and node2 is the name of the node that you are deleting:
    ./racgons remove_config node2:remote_port
    On the node to be deleted, run rootdelete.sh as the root user from the CRS_home/install directory. If you are deleting more than one node, then perform this step on all of the other nodes that you are deleting.
    From any node that you are not deleting, run the following command from the CRS_home/install directory as the root user where node2,node2-number represents the node and the node number that you want to delete:
    ./rootdeletenode.sh node2,node2-number
    If necessary, identify the node number using the following command on the node that you are deleting:
    CRS_home/bin/olsnodes -n
    Perform this step only if you are using a non-shared Oracle home. On the node or nodes to be deleted, run the following command from the CRS_home/oui/bin directory where node_to_be_deleted is the name of the node that you are deleting:
    ./runInstaller -updateNodeList ORACLE_HOME=CRS_home
    "CLUSTER_NODES={node_to_be_deleted}"
    CRS=TRUE -local
    Deinstall the Oracle Clusterware home from the node that you are deleting using OUI as follows by running the following command from the Oracle_home/oui/bin directory, where CRS_home is the name defined for the Oracle Clusterware home:
    ./runInstaller -deinstall -silent "REMOVE_HOMES={CRS_home}"
    Perform step 9 from the previous section about using OUI interactively under the heading "Deleting an Oracle Clusterware Home Using OUI in Interactive Mode".

  • Start Of Ramp-Up for SAP Portfolio and Project Management 5.0

    Start Of Ramp-Up for SAP Portfolio and Project Management 5.0
    Starting with this new release, the application SAP Portfolio and Project Management 5.0
    replaces both the SAP Resource and Portfolio Management (SAP RPM) application and the
    Collaboration Projects (cProjects) application.
    Start of ramp-up for SAP Portfolio and Project Management is 19th of April, 2010. The end
    of ramp-up is currently scheduled for 19th of October, 2010.
    Functional Innovations And New Features
    A detailed description of all new and/or enhanced functional innovations and features can
    be found in the Release Notes:
    http://service.sap.com/releasenotes
      -> SAP Solutions
        -> Release Notes SAP Portfolio and Project Management
    There are some SAP Notes which are in general very important for SAP Portfolio and Project
    Management 5.0 and which also serve as central points of entry to find important information:
    SAP Note 1377104 (https://service.sap.com/sap/support/notes/1377104): FAQs - SAP Portfolio and Project Management 5.0
    SAP Note 1402912 (https://service.sap.com/sap/support/notes/1402912): PPM 5.0: Supported Browsers, Java versions, etc.
    SAP Note 1411953 (https://service.sap.com/sap/support/notes/1411953): PPM 5.0: Configuration Content
    SAP Note 1436778 (https://service.sap.com/sap/support/notes/1436778): SAP Portfolio and Project Management 5.0: Restrictions
    Kind regards,
       Florian

    Thanks very much for taking the time to post the info Florian. I will update this thread as well if I run into any new information.
    Do we have any idea on SAP's direction for Product Definition? PD is still version 2.0 and I heard that PD functionality will be incorporated into PPM 5.0 which does make a lot of sense. I very briefly went through the notes and config doc in this post and did not get the impression that PPM has any idea and concept management capabilities.

  • General Availability of SAP Portfolio and Project Management 5.0

    Hi,
    SAP Portfolio and Project Management 5.0 is now in unrestricted shipment (GA). The unrestricted shipment is based on SP04 of SAP Portfolio and Project Management 5.0. See also SAP Note 1377104 (https://service.sap.com/sap/support/notes/1377104), FAQs - SAP Portfolio and Project Management 5.0.
    Kind regards,
       Florian

    Thanks for update...
    So its now available for download...
    PlayStation PPM available, better and improved..
    Niranjan

  • Difficulty navigating to and identifying my RAW and JPEG Images

    Hi,
    I have several related issues that I would appreciate help with.
    I am finding difficulty navigating to and identifying my RAW and JPEG images in Aperture. I do actually principally work with JPEG and only use RAW when I perceive there to be a benefit by improving a poorly captured image.
    To give you some background.  I am using an iMac OSX 10.8.3 and Aperture 3.4.3 Camera Raw 4.04. When I Import images I import both RAW and JPEG using RAW as the original.
    At the Import stage both RAW and JPEG thumbnails are displayed. However, once imported, only one thumbnail is displayed in the Library. On some occasions this will be the JPEG and on other occasions the RAW (as identified from the Info tab). How can I select which version to work with?
    I would appreciate assistance with this.
    Regards,
    John

    It is possible to see the Raw master alongside the Jpg master if you wish.
    When a version is created (make new version from original) it is created off the original that is currently selected. So if the jpg image is the current original all versions created will be made off the jpg image. Likewise if the raw is the current original all versions created will be made off the raw image.
    Also keep in mind that a version that has no adjustments applied to it is identical to the original it was made from.
    So to get both alongside each other, do the following:
    Set raw as original, create new version from original. Label this version Raw original version. Now switch to jpg as original. Make a version from this and label it Jpg original version.
    Just ensure you never apply adjustments to these two images and you will always have both the jpg and raw images available to compare.
    In addition if say the jpg is the original and you want to make a version from the raw, instead of switching the jpg and raw you can just go to the Raw original version and duplicate it.

  • Ipod not identified in vista and not syncd with xp

    Hi,
    I have an iPod nano 3rd Gen, and for the past 10 days I haven't been able to get it identified by iTunes on my Vista-based notebook (I get a message with a yellow warning saying 'an iPod was recognized but couldn't be properly identified - disconnect, reconnect and try again'). When I use my wife's notebook, which is Windows XP based, my iPod shows up in iTunes but I get an error message saying 'unexpected error - iPod could not be sync'd (error -69)'. On the XP-based notebook all files show up as if I were browsing my iPod, but I cannot add any files, and when I try to erase them, iTunes looks like it erases them but they are still on the iPod afterwards. On this XP-based notebook iTunes sometimes prompts me to try to restore factory settings by downloading them from iTunes, but it does not complete the operation either. On the Vista notebook I have tried several steps, such as reinstalling iTunes, deactivating some possibly faulty services, renaming the driver, resetting the iPod, and so on. Can someone help me?

    Hi there rebtop2,
    I would recommend taking a look at the troubleshooting steps found in the article below.
    iPod nano: Error message saying that iPod 'could not be identified properly'
    http://support.apple.com/kb/TS3218
    -Griff W.

  • Why logical partition is a must for voting disk and OCR

    Hi Guys,
    I just started handling jobs for RAC installation, I have a simple question regarding the setup.
    Why does logical partition have to be used for voting disk and OCR?
    I tried partitioning the disks that were provisioned for the voting disk and OCR as primary partitions, but when OUI tries to recognize the disks, it cannot find the ones that were partitioned as primary partitions.
    Thank you,
    Adhika

    Hello Adhika,
    I found it on this doc http://download.oracle.com/docs/cd/B28359_01/install.111/b28250/storage.htm
    Be aware of the following restrictions for partitions:
    * You cannot use primary partitions for storing Oracle Clusterware files while running the OUI to install Oracle Clusterware as described in Chapter 5, "Installing Oracle Clusterware". You must create logical drives inside extended partitions for the disks to be used by Oracle Clusterware files and Oracle ASM.
    * With 32-bit Windows, you cannot create more than four primary disk partitions for each disk. One of the primary partitions can be an extended partition, which can then be subdivided into multiple logical partitions.
    * You can assign mount points only to primary partitions and logical drives.
    * You must create logical drives inside extended partitions for the disks to be used by Oracle Clusterware files and Oracle ASM.
    * Oracle recommends that you limit the number of partitions you create on a single disk to prevent disk contention. Therefore, you may prefer to use extended partitions rather than primary partitions.
    For these reasons, you might prefer to use extended partitions for storing Oracle software files and not primary partitions.
    All the best,
    Rodrigo Mufalani
    http://www.mrdba.com.br/mufalani
