How to find PDFs with scanned content - not OCR'd

I have several thousand PDFs, most of which are normal PDFs.
But some of them contain only scanned images of a document; they have not been OCR'd, and some are not even of a quality where OCR is possible.
I want to find these documents and separate them from the real PDFs, so to speak. Is this possible? Is there any tag or information in the PDF that says "Image file and nothing else!" :)
Thank you.
Regards Alexander

I'll take the font check as an example.
> In Acrobat 8 Pro, go to Advanced > Preflight
> Click on Edit
> Click on the plus icon below the list on the left-hand side to add a new custom profile
> Give it a name, fill out the details as you like, and save it for now
> The new profile is now available in the list on the left-hand side of the Preflight dialog box
> There, click on custom check, then click on the plus icon (the middle one of the three available; the tooltip should say: 'create new check and include it in the current profile')
> From the group box, select font
> In the property box, select font type
> The info box will show the 7 types of font available
> Where it says 'begins with', select 'is contained in list'
> Click on Add to list and add all available font types to the list
> Click OK and then save
This profile checks whether a listed font type is part of the document. If it is, it reports an error (indicated by a red cross in front of the custom check). Since the list contains all available font types, any document containing fonts will get an error when preflight is run on it.
Test the profile on a scanned document (containing no fonts, thus no error) and a normal PDF (containing fonts, thus getting an error).
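
For several thousand files, the same "no fonts means image-only" idea can be scripted outside Acrobat. The following is only a rough sketch under stated assumptions, not a replacement for the preflight check: the function names `mentions_font` and `find_fontless_pdfs` are made up for this example, it looks for the `/Font` name in the raw file and in Flate-compressed streams only, and it will misjudge encrypted files or files using other stream filters.

```python
import re
import zlib
from pathlib import Path

# Bytes between a "stream" keyword and the next "endstream" keyword.
STREAM_RE = re.compile(rb"stream\r?\n(.*?)endstream", re.DOTALL)


def mentions_font(pdf_bytes: bytes) -> bool:
    """Heuristic: True if the PDF data references any /Font... name.

    Assumption: a page-image-only scan has no font resources at all,
    while a normal (or OCR'd) PDF has at least one.
    """
    if b"/Font" in pdf_bytes:
        return True
    # PDF 1.5+ may pack dictionaries into compressed object streams,
    # so also look inside every stream that inflates cleanly.
    for match in STREAM_RE.finditer(pdf_bytes):
        try:
            inflated = zlib.decompressobj().decompress(match.group(1))
        except zlib.error:
            continue  # not Flate data (e.g. a JPEG image stream)
        if b"/Font" in inflated:
            return True
    return False


def find_fontless_pdfs(folder: str) -> list[Path]:
    """Return the PDFs under `folder` that appear to contain no fonts."""
    return [
        path
        for path in sorted(Path(folder).rglob("*.pdf"))
        if not mentions_font(path.read_bytes())
    ]
```

Calling `find_fontless_pdfs("some/folder")` (the path is a placeholder) gives a list of candidates; it is worth spot-checking a few of them in Acrobat with the profile above before moving anything.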

Similar Messages

  • Convert PDF with  Arabic  content to word /Excel .

    I have a problem. I plan to buy Adobe Acrobat Pro for my organization, and the main reason for buying it is to convert PDFs with Arabic content to Word or Excel. I downloaded the trial version of Adobe Acrobat Pro. When I convert a PDF with English content, the result is fine, but when I convert a PDF with Arabic content, the result is bad: the words break up into separate letters and the document becomes unreadable. Please help me, because the main goal is to convert Arabic PDFs to Word.
    Thanks, I'll wait for your reply.

    HI,
    Adobe Acrobat is a PDF viewer developed for worldwide use, so that a user can read any PDF file. Its basic languages are:
    English (US)
    English (UK)
    Arabic
    Italian
    Japanese
    Spanish
    German
    But when you export the PDF file, the Arabic language is not supported.
    Regards,
    Florence

  • Producing PDFs with 3D content on Mac OS X

    Hello everybody,
    Point 1: I am looking for the best workflow of producing PDFs with 3D content.
    Point 2: We are a design studio that works entirely on the Mac platform, so there is no chance of having Acrobat Pro Extended in the workflow. (Yes, of course, buying a cheap Windows box or running XP on an Intel-based Mac is an option, but I don't favour such a solution and will not go for it unless there is no other way.)
    The ultimate goal is as follows:
    Insert textured 3D object with transparent background into pre-made and designed PDF background file, place it so it seamlessly integrates with the background. Then optionally, add some interactivity like buttons for switching between design options etc.
    What have I achieved so far:
    1. I create textured 3D object with baked shading etc. in Blender.
    2. I export the data into Collada .dae file
    3. I import the .dae file into MeshLab
    4. From MeshLab I export the data into U3D - apart from the 3D file itself, it also creates a .tex file which ...
    5. ... I open in TeXShop (LaTeX GUI frontend), where the 3D PDF is finally compiled and created.
    So, I can make a 3D PDF file, but the object has a white background and I cannot integrate it into the design slide. That's about it.
    Now the questions:
    Is there someone else trying to do the same stuff so we can share the ideas and knowledge?
    Where is the place to mess with the transparency? Is it to be set in .tex file? Can MacOSX version of Acrobat Pro help?
    Can we disable all lighting when displaying the 3D file in the PDF and use only the shading from the 3D software that is baked into the textures?
    Ideas?
    Thank you,
    Petr Ludvik

    Dear Petr,
    Transparency - upload a minimal example of what you want (do it in Acrobat first).
    Consult PDF spec, MeshLab and Asymptote forums (Asymptote is a 3D drawing tool for LaTeX,
    people there know how to integrate 3D pictures into PDF LaTeX way).
    Light and textures - make the material for the surface to emit white color and set diffusive and specular colors to black.
    Like at http://www.iaas.msu.ru/tmp/u3d/cloudq.pdf.
    Ideas:
    Writing a U3D exporter for Blender is not harder than writing a VRML one. If you have Blender programming skills, it may be a way to go.
    I can advise you; I have just made a U3D exporter for VTK.
    If Blender/MeshLab/LaTeX are not part of your regular toolchain
    but just a ragtag converter for data that comes from your own program, consider exporting directly to U3D
    (there is a text intermediate format, IDTF, that is relatively simple).
    Tell us more about the toolchain in use: what is at the beginning (before Blender) and at the end (how you use the PDF with just the 3D model in it).
                Sincerely, Michail
    PS. Other wording:
    Transparent background - is it doable in Acrobat?
    If yes - look into PDF and see how (what elements of what dictionaries are set),
    having PDF spec at your side.
    Upload a minimal example of a PDF file with desired features somewhere to allow me
    and others to have a look at it.
    Consult LaTeX sources of information to see if LaTeX with the movie15 package is capable of doing what you want
    (the Asymptote forum may be the best place).
    If it is not, it may be fixed (movie15 is in active development) or a separate utility may be used.
    But first - show and, if possible, describe in technical terms what you want to achieve.
    As to lighting - make the material of the mesh you apply the texture to emit white light and
    set the diffuse and specular colors to black.
    If you have programming skills and know Blender well, writing a U3D exporter for Blender
    should not be that hard; there is a simple intermediate text format.
    I can collaborate (I have just made a VTK exporter).
    And that may also be an option - use VTK instead of Blender
    and you are almost there (well, my exporter does not have texture support now,
    but I can add it if there is someone to test).
    Did you ask these questions on a MeshLab forum?
    If your data comes from your own program - it may be simpler to write
    into U3D immediately (via that text intermediate format, IDTF).

  • Trying to create a pdf with jasper - Content is not allowed in prolog

    hi all,
    I'm trying to create a PDF file with JasperReports, but I get this error message:
    29/11/2007 16:52:03 org.apache.commons.digester.Digester fatalError
    SEVERE: Parse Fatal Error at line 1 column 1: Content is not allowed in prolog.
    org.xml.sax.SAXParseException: Content is not allowed in prolog.
         at com.sun.org.apache.xerces.internal.util.ErrorHandlerWrapper.createSAXParseException(ErrorHandlerWrapper.java:236)
         at com.sun.org.apache.xerces.internal.util.ErrorHandlerWrapper.fatalError(ErrorHandlerWrapper.java:215)
         at com.sun.org.apache.xerces.internal.impl.XMLErrorReporter.reportError(XMLErrorReporter.java:386)
         at com.sun.org.apache.xerces.internal.impl.XMLErrorReporter.reportError(XMLErrorReporter.java:316)
         at com.sun.org.apache.xerces.internal.impl.XMLScanner.reportFatalError(XMLScanner.java:1438)
         at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl$PrologDispatcher.dispatch(XMLDocumentScannerImpl.java:899)
         at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(XMLDocumentFragmentScannerImpl.java:368)
         at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:834)
         at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:764)
         at com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(XMLParser.java:148)
         at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(AbstractSAXParser.java:1242)
         at org.apache.commons.digester.Digester.parse(Digester.java:1666)
         at net.sf.jasperreports.engine.xml.JRPrintXmlLoader.loadXML(JRPrintXmlLoader.java:151)
         at net.sf.jasperreports.engine.xml.JRPrintXmlLoader.load(JRPrintXmlLoader.java:103)
         at net.sf.jasperreports.view.JRViewer.loadReport(JRViewer.java:1376)
         at net.sf.jasperreports.view.JRViewer.<init>(JRViewer.java:243)
         at net.sf.jasperreports.view.JRViewer.<init>(JRViewer.java:214)
         at net.sf.jasperreports.view.JasperViewer.<init>(JasperViewer.java:140)
         at net.sf.jasperreports.view.JasperViewer.viewReport(JasperViewer.java:397)
         at net.sf.jasperreports.view.JasperViewer.viewReport(JasperViewer.java:328)
         at br.com.abril.contratos.Gerar.geraRelatorio(Gerar.java:38)
         at br.com.abril.contratos.Gerar.main(Gerar.java:47)

    Cause of the error:
    After some extensive research on the web, I found that you are probably using a UTF-8 encoded file with a byte-order mark (BOM). Java doesn't handle BOMs on UTF-8 files properly, making the three header bytes appear to be part of the document. UTF-8 files with BOMs are commonly generated by tools such as Windows Notepad. This is a known bug in Java, but it still needs fixing after almost 8 years...
    There are some extra bytes at the beginning of the file, which cause the error "Content is not allowed in prolog". No official solution has been provided for this so far.
    Example: suppose your file starts with <?xml version="1.0" encoding="utf-8"?>. That is what you will see in a text editor. But if you open the file with any hexadecimal editor, you will find some junk characters at the start of the file. Those are causing the problem.
    Solution:
    1) Read the file into a string and then keep only the part from the first real character onward (in my case the first character is "<"), so the junk characters are no longer in the string. Then write it back to a file.
    2) Some people are able to solve this problem by changing "utf-8" to "utf-16". Remember that this works only in some cases; sometimes the problem also exists in a "utf-16" file.
    3) Some are able to solve this problem by changing the LANG to US.en.
    4) If the first three bytes of the file have the hexadecimal values EF BB BF, then the file contains a BOM, so you can also handle it yourself.
    5) Download a hexadecimal editor and remove the BOM.
    6) If none of these help, do some more research on the internet; you may find another solution to this problem. But note that these solutions are hacks of a sort, not exactly a real fix.
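
Points 1), 4) and 5) above all amount to dropping the three bytes EF BB BF from the start of the file. As a minimal sketch of that idea (written in Python purely for illustration; the function name `strip_utf8_bom` is made up here, and the original poster's code is Java):

```python
import codecs


def strip_utf8_bom(data: bytes) -> bytes:
    """Return `data` without a leading UTF-8 byte-order mark, if any."""
    # codecs.BOM_UTF8 is the three-byte sequence EF BB BF.
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data
```

In Python specifically, decoding with the "utf-8-sig" codec has the same effect, since it skips a leading BOM when one is present.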

  • How can I create a PDF with embedded fonts (not a subset of the font) from Excel?

    I need to create PDFs from Excel spreadsheets. The PDF needs to have embedded fonts, but every time I create one it only has a subset of the embedded font. I have tried setting preferences in the Acrobat add-in (I unchecked the "Subset Embedded Fonts" option, and also tried with Subset Embedded Fonts checked but the percentage set to only 1% in an attempt to force the full font to be embedded).
    I also tried opening the resulting PDF with Acrobat Pro XI but could not figure out how to add the font in.
    A method to accomplish the results from either tool would be great.

    Anna;
    Unless you can add those fonts to your system via Font Book and then substitute them in iPhoto, you'll have to either use the available fonts or create your own pages, 8.5 x 11, in an image editor like Photoshop Elements and use them on pages that are one photo per page. Others have done that with some success.
    If you have PS or PSE, create an 8.5 x 11" canvas at 300 dpi. Then you can add your photos, add text (if PS can use the text), and create your own layout.
    Do you Twango?
    TIP: For insurance against the iPhoto database corruption that many users have experienced I recommend making a backup copy of the Library6.iPhoto database file and keep it current. If problems crop up where iPhoto suddenly can't see any photos or thinks there are no photos in the library, replacing the working Library6.iPhoto file with the backup will often get the library back. By keeping it current I mean backup after each import and/or any serious editing or work on books, slideshows, calendars, cards, etc. That insures that if a problem pops up and you do need to replace the database file, you'll retain all those efforts. It doesn't take long to make the backup and it's good insurance.
    I've written an Automator workflow application (requires Tiger), iPhoto dB File Backup, that will copy the selected Library6.iPhoto file from your iPhoto Library folder to the Pictures folder, replacing any previous version of it. You can download it at Toad's Cellar. Be sure to read the Read Me pdf file.

  • How to print a PDF with multimedia content?

    I have a 127-page PDF with Flash files and other multimedia embedded. How do I go about printing this to a laser printer?

    Thank you Michael
    My problem is, I need to print the document with the mm items. I work in a print & copy shop, we must deal with whatever the customers send us.
    I actually stumbled around and found I could do it with the MultiMedia Flash Tool, editing the poster options. It's a tedious process though, and I'd like to find a way to set this for the whole PDF file at once.
    If you know how to do this I'd appreciate the info.
    Thanks again,
    Mark

  • Problem with Exception - "Content not allowed in Prolog"

    I was using org.w3c.dom package to handle all my XML data in JDK 1.4.1 environment. Recently, I have upgraded my Java version to JDK 1.5. When I run my code, while parsing the XML data, it is throwing an Exception - "Content not allowed in Prolog". I know that this Exception is because the XML data does not contain xml declaration in prolog.
    But, I have the code that queries the database and builds the records in XML format. The sample XML is as follows.
    <list>
    <row>
    <fld1>Val1</fld1>
    <fld2>Val2</fld2>
    </row>
    <row>
    <fld1>Val3</fld1>
    <fld2>Val4</fld2>
    </row>
    </list>
    I want to know why the JVM is validating for XML declaration? Why can't it continue parsing the XML in case it is well formed? Is there any way so that I can skip this validation?
    Please reply. I am in big trouble because, if changes are required, I will need to implement them in many places in my API.

    The isELIgnored flag just fixes this page.
    If you want to use EL on your pages a better solution is to update your web.xml file:
    EL Expressions in JSP2.0 containers:
    In order to evaluate EL expressions, your web.xml file must be up to date.
    If web.xml states that it is version 2.3 or less, then EL evaluation is disabled by default for backwards compatibility.
    So if your web.xml starts with this:
    <?xml version="1.0" encoding="UTF-8"?>
    <!DOCTYPE web-app PUBLIC "-//Sun Microsystems, Inc.//DTD Web Application 2.3//EN" "http://java.sun.com/dtd/web-app_2_3.dtd">
    <web-app>
    Replace it with this:
    <?xml version="1.0" encoding="ISO-8859-1"?>
    <web-app xmlns="http://java.sun.com/xml/ns/j2ee"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://java.sun.com/xml/ns/j2ee http://java.sun.com/xml/ns/j2ee/web-app_2_4.xsd"
    version="2.4">
    Cheers,
    evnafets

  • Cannot create new folders on desktop. Cannot drag new items onto desktop. Desktop in Finder opens with Terminal. Not sure if this has anything to do with upgrading to Mavericks but I need help if anyone has ideas. Thank you.

    I can no longer create new folders on my desktop. The option to do that is now light gray on the drop down menu and can't be selected.
    I cannot drag new items onto desktop any longer either.
    The Desktop in Finder opens with Terminal.
    Folders and items on Desktop open and work normally.
    Not sure if this has anything to do with upgrading to Mavericks but I need help if anyone has ideas.
    Thank you.

    Take these steps if the cursor changes from an arrow to a white "prohibited" symbol when you try to move an icon on the Desktop.
    Sometimes the problem may be solved just by logging out or rebooting. Try that first, if you haven't already done it. Otherwise, continue.
    Select the icon of your home folder (a house) in the sidebar of a Finder window and open it. The Desktop folder is one of the subfolders. Select it and open the Info window. In the General section of the window, the Kind will be either Folder  or something else, such as Anything.
    If the Kind is Folder, uncheck the box marked Locked. Close the Info window and test.
    If the Kind is not Folder, make sure the Locked box is not checked and that you have Read & Write privileges in the  Sharing & Permissions section. Then close the Info window and do as follows.
    Back up all data.
    Triple-click anywhere in the line below on this page to select it:
    xattr -d com.apple.FinderInfo Desktop
    Copy the selected text to the Clipboard by pressing the key combination command-C.
    Launch the Terminal application in any of the following ways:
    ☞ Enter the first few letters of its name into a Spotlight search. Select it in the results (it should be at the top.)
    ☞ In the Finder, select Go ▹ Utilities from the menu bar, or press the key combination shift-command-U. The application is in the folder that opens.
    ☞ Open LaunchPad. Click Utilities, then Terminal in the icon grid.
    Paste into the Terminal window (command-V). You should get a new line ending in a dollar sign (“$”). Quit Terminal.
    Relaunch the Finder.
    If the problem is now resolved, and if you use iPhoto, continue.
    Quit iPhoto if it's running. Launch it while holding down the option key. It will prompt you to select a library. Choose the one you want to use (not the Desktop folder.)

  • Problems with Writable PDFs and MACS - content not viewable unless clicked on

    I am having problems viewing documents in Adobe Reader 9. Students are downloading a writable PDF from our system, saving it with content, then uploading it back to the system. All Mac users' documents are coming back blank unless clicked on. For some reason, all Mac PDF's are appearing blank until each individual field is clicked on. Once you click a field, the information shows up until you click another field then the information disappears in the old field and shows up in the new field.
    We really need to be able to view and print these documents so if anyone knows of a quick fix please let me know! We are trying to print all of the applications tomorrow.

    This is due to Apple's improper handling of pdf files with its Preview Application. This will not happen if your users use Adobe Reader. You can find a fix for individual files here: http://kb2.adobe.com/community/publishing/885/cpsid_88564.html

  • Find feature with scanned documents

    Hope someone can help. I have some documents that I am trying to scan so I can find specific numbers on the documents. I am scanning using a CanoScan LiDE 100, which is capable of scanning to OCR and PDF files. Once the documents are scanned and I open them in Adobe Reader, the Find feature cannot find the numerical text I am trying to locate. I have read that the Find feature does not work on scanned images, so how can I get this to work? Funny thing: when I scanned my test document, saved it and used the Find feature the first time, it worked - now I am having no luck.
    Also, I came across a document referring to Adobe Acrobat 5 and picture capture. Does Adobe Reader have this feature? I can't seem to find it.
    Thanks for any help you give me.

    Hi
    Finally figured it out...I am doing my happy dance as I type. The problem appears to be that the document must be scanned as a TEXT OCR file FIRST then converted to a PDF file. The scanner cannot do both actions at the same time. So I scanned the document as a TEXT OCR file type and then went to the scanner menu and selected save as PDF. Once that is done, I switched to Adobe Reader and opened my file. Word of caution, if the text is too crooked, the FIND feature will not work accurately.
    Hope this helps someone else who is trying to use scanned items and the FIND feature.
    Thanks for trying to help.

  • URL iView with KM Content not shown in Page - just a "thin line"

    Hi,
    I just created two URL-iViews which point to documents in the Knowledge Management Repository.
    Content Admin --> Preview --> Works out --> Hooray!
    I created two pages for the corresponding iView and deltalinked the iViews in. At this point I want to add I'm not doing this the first time, I have a few dozen KM-Pages here and they all work and are all set up with the exact same properties.
    Content Admin --> Preview --> and THIS is what happens:
    http://www10.pic-upload.de/20.02.12/xiswewmyirk.jpg
    There's just some kind of thin line showing. I cleared the portal cache and looked for anything unusual in the NWA monitoring traces, but didn't find anything. I didn't find anything related on the forums either, probably because I don't really know what search terms to use for this strange occurrence...
    Does anyone have a hunch what could be the cause of this paranormal activity?...
    Cheers, Lukas

    Yes. You are correct. Shame on me..
    The pages were configured correctly at FULL, but one particular iView was set to AUTOMATIC. So although the page was set to full display, the iView behind it was defaulting to automatic, hence this strange behaviour.
    Thanks Meera!
    Cheers, Lukas

  • How to find tables with columns that are not supported by Data Pump network_link

    Hi Experts,
    We are trying to import a database into a new DB using Data Pump with network_link.
    According to the Oracle documentation, tables with columns that are object types are not supported in a network export. An ORA-22804 error will be generated and the export will move on to the next table. To work around this restriction, you can manually create the dependent object types within the database from which the export is being run.
    My question: how do we find the tables with columns that are object types and are therefore not supported in a network export?
    We have LOB objects and the Oracle Spatial SDO_GEOMETRY object type. Our database size is about 300 GB; normally exp takes 30 hours.
    We are trying to use Data Pump with network_link to speed up the export process.
    How do we fix the Oracle Spatial SDO_GEOMETRY user-type issue during Data Pump?
    Our system is 32-bit Windows 2003 with a 10gR2 database.
    Thanks
    Jim
    Edited by: user589812 on Nov 3, 2009 12:59 PM

    Hi,
    I remember there being issues with sdo_geometry and DataPump. You may want to contact oracle support with this issue.
    Dean

  • Interactive PDF (with embedded SWF) not displaying consistently

    Hi folks,
    We are seeing varying behaviour when we try to open a PDF document which we have created as part of publishing to PDF in Captivate 6. The original project is a recorded PPT file which we publish to HTML5, SWF, ZIP and PDF. In some instances the interactive PDF opens fine, but in some instances we get something like this:
    We have always made sure we have the latest version of Reader, up to date etc., but can't work out why some machines are giving the above and others don't. Any help would be greatly appreciated.

    Hey JT,
    Came across this: http://helpx.adobe.com/acrobat/kb/known-issues-acrobat-xi-reader.html
    I know you said you have Reader 10, but I had 11, so I went and removed it and installed a version of 10 again, and what do you know, they work OK. However, this is not a workable solution for us, as many students will automatically update to the latest PDF reader, which is currently 11. Adobe changing the rules around this has stuffed us up, and we have no option but to take offline all our interactive PDF versions produced with Captivate 6. Very disappointing, but it will save our helpdesk a lifetime of support calls :-(
    Grant

  • PDF with Flex/Flash not running on iPhone

    Dear all
    I have a pdf document including Flex content running perfectly in the Acrobat or Reader on Windows.
    On the iPhone the Flex is not shown.
    Is there any solution to this?
    Many thanks

    Flash has never been and never will be supported on the iPad. Furthermore, Adobe has withdrawn Flash support from other mobile devices so you're out of luck.

  • Make a PDF with a Mac, NOT using Mac PDF maker

    I am suffering from blurry text when making a PDF in Mac OS 10.6 (using "Save as PDF" in the Mac printing dialogue).
    The software company has confirmed the problem. They suggested using CutePDF to make a PDF because it does a better job. CutePDF is Windows only.

    Also, could it be a problem with the software you are using (you didn't mention what it is?)
    It is ViaCAD 2D. My text is razor-sharp when viewed from within ViaCAD. However, when I make a PDF of my drawing using, File > Print > Print PDF my text comes out blurry.
    http://www.punchcad.com/p-6-viacad-2d3d.aspx
    From the ViaCAD Forum:
    "This is an old and contentious issue that appears to affect some Mac users and not others.
    The only work around I have found is to print to EPS (Postscript) in the print dialog, and then convert the EPS to PDF in Preview. Sadly the EPS output from the print dialog is also broken in that it will not support anything larger than 8.5 X 11, so that will not work for you.
    The other work around might be to try and export the whole drawing to Illustrator using the File / Export dialog, but again that is also sometimes buggy for some people. The only good news is that the output that appears blurry on your Mac will probably look fine on a PC, or even another Mac as this does appear to be a very system dependent bug (which is why it never gets fixed)."
