Extracting metadata from PDFs

I have a client with hundreds of store signs as PDFs. We need to verify the actual printed size of each sign. We have noticed that hovering the cursor over the lower-left corner of a PDF in Acrobat reveals its size, which is great, but we don't want to open each of hundreds of PDFs to check it. Is there a way to export each file name and its size to Excel so we can quickly audit the pieces?
This may be more of a Bridge question, but when we view the files in Bridge, we also see the metadata, which shows the dimensions of the piece. Those dimensions seem to change depending on the PPI. How does Acrobat come up with its dimensions if they differ from the metadata we see in Bridge?
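PDF page sizes are stored in points, where 1 point is 1/72 inch, so the printed size is fixed by the file itself. Pixel dimensions only exist once a resolution is chosen: pixels = inches × PPI. If Bridge is reporting pixel dimensions at a given PPI (an assumption; we have not checked exactly how Bridge derives the figures it shows for PDFs), those numbers will change with the PPI while the size in inches stays the same. A minimal Python sketch of the arithmetic, with hypothetical helper names; it assumes the default user unit (a PDF 1.6 or later page can set /UserUnit to scale it):

```python
POINTS_PER_INCH = 72  # PDF default user-space unit: 1 point = 1/72 inch


def points_to_inches(points: float) -> float:
    """Convert a PDF page dimension in points to inches."""
    return points / POINTS_PER_INCH


def pixels_at_ppi(points: float, ppi: float) -> int:
    """Pixel count a page dimension would have if rasterized at the given PPI."""
    return round(points_to_inches(points) * ppi)


# A US Letter page is 612 x 792 points (8.5 x 11 in).
for ppi in (72, 150, 300):
    w, h = pixels_at_ppi(612, ppi), pixels_at_ppi(792, ppi)
    # The pixel counts grow with PPI, but dividing by the PPI gives the same inches.
    print(f"{ppi} PPI: {w} x {h} px -> {w / ppi:.2f} x {h / ppi:.2f} in")
```

The loop prints three different pixel sizes that all come back to 8.50 x 11.00 inches, which is the kind of discrepancy described above.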

Hi Alan,
Typing the following two lines in Script Editor should list the metadata attributes for the chosen file:
set F to choose file with prompt "Choose a PDF file:" of type "PDF "
do shell script "mdls" & space & quoted form of POSIX path of F
For more information on mdls, open Terminal and type "man mdls" (then press Return).
You might also use the following AppleScript statement in case there are keywords in the file's Spotlight Comments field:
tell application "Finder" to return comment of F
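To audit hundreds of files at once, the same mdls call can be run in a loop and the results written to a CSV file, which Excel opens directly. This Python sketch assumes macOS with Spotlight indexing enabled for the folder, and relies on the Spotlight attributes kMDItemPageWidth and kMDItemPageHeight, which report the page size in points (we understand these describe the first page only, so multi-page files need another check). The helper names and the CSV layout are hypothetical:

```python
import csv
import re
import subprocess
import sys
from pathlib import Path

POINTS_PER_INCH = 72
ATTRS = ("kMDItemPageWidth", "kMDItemPageHeight")
LINE_RE = re.compile(r"^(\w+)\s*=\s*(.*)$")  # mdls prints "name = value" lines


def parse_mdls(output: str) -> dict:
    """Parse mdls output into {attribute: float}, using None when there is no value."""
    values = {}
    for line in output.splitlines():
        m = LINE_RE.match(line.strip())
        if not m:
            continue
        name, raw = m.groups()
        try:
            values[name] = float(raw)
        except ValueError:  # e.g. "(null)" when Spotlight has no value
            values[name] = None
    return values


def page_size(pdf: Path) -> dict:
    """Ask Spotlight (macOS only) for the page width and height in points."""
    cmd = ["mdls"]
    for attr in ATTRS:
        cmd += ["-name", attr]
    result = subprocess.run(cmd + [str(pdf)], capture_output=True, text=True, check=True)
    return parse_mdls(result.stdout)


def to_inches(points):
    return "" if points is None else round(points / POINTS_PER_INCH, 2)


def write_report(folder: str, csv_path: str) -> None:
    """Write file name, width and height in inches for every PDF in a folder."""
    pdfs = sorted(p for p in Path(folder).iterdir() if p.suffix.lower() == ".pdf")
    with open(csv_path, "w", newline="") as fh:
        writer = csv.writer(fh)
        writer.writerow(["file", "width_in", "height_in"])
        for pdf in pdfs:
            size = page_size(pdf)
            writer.writerow([pdf.name,
                             to_inches(size.get("kMDItemPageWidth")),
                             to_inches(size.get("kMDItemPageHeight"))])


if __name__ == "__main__" and len(sys.argv) == 3:
    write_report(sys.argv[1], sys.argv[2])  # usage: script.py <folder> <report.csv>
```

Any file that comes back with empty width and height cells has not been indexed by Spotlight and needs to be checked by hand.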

Similar Messages

  • Extract metadata from PDF

    Hi all
    This may be the wrong category for this question, but I am wondering whether it is possible to extract metadata from PDF documents as we do from images.
    Since both use Adobe's XMP, it should theoretically be possible. Does anybody know how?
    Thank you.
    Nitai

    XMP extraction from PDF format files is not implemented by interMedia.
    You can learn more about XMP at the Adobe website. It is possible to create a simple XMP extractor by implementing a byte scanner that looks for the XMP indicator string. However, I believe the PDF format allows some object blocks to be marked as old or superseded by newer blocks, so a PDF file may contain more than one XMP block. You would need to know more about the PDF format to determine which block is current.
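The byte-scanner idea above can be sketched in Python. XMP packets are wrapped in `<?xpacket begin=...?>` and `<?xpacket end=...?>` processing instructions precisely so that tools can find them without parsing the host file format. This sketch assumes the metadata stream is stored uncompressed and unencrypted (a compressed or encrypted stream will not be found), and it may also pick up XMP belonging to images embedded in the PDF. Taking the last packet is only a heuristic for "the current block"; `find_xmp_packets` and `extract_xmp` are hypothetical names:

```python
import re
from typing import List

# Match from a packet header to its trailer; the non-greedy .*? stops at the first end marker.
PACKET_RE = re.compile(rb"<\?xpacket begin=.*?<\?xpacket end=[^>]*>", re.DOTALL)


def find_xmp_packets(data: bytes) -> List[bytes]:
    """Return every XMP packet found in the raw bytes, in file order."""
    return PACKET_RE.findall(data)


def extract_xmp(pdf_path: str, out_path: str) -> bool:
    """Save the last XMP packet found in a PDF to out_path (e.g. a .xmp file)."""
    with open(pdf_path, "rb") as fh:
        packets = find_xmp_packets(fh.read())
    if not packets:
        return False
    # Heuristic: incremental updates are appended to the end of a PDF, so the last
    # packet is often the newest. Only the document catalog's /Metadata entry can confirm it.
    with open(out_path, "wb") as fh:
        fh.write(packets[-1])
    return True
```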

  • Extract Metadata from PDF as xmp file

    Hello!
    I have the following problem: I always do File -> Properties -> Additional Metadata -> Advanced -> Save as .xmp. How could I write these actions in JavaScript?

    Thank you for your reply. Unfortunately, I don't know JavaScript well. I have found code that writes the metadata to a report and opens it as a PDF (but I need XMP):
    var r = new Report();
    r.writeText(this.metadata);
    r.open("myMetadataReportFile");
    I also found code that should create a new text file, but it doesn't work:
    var filePath = 'C:/Temp/filename.txt';
    var fileSysObj = new ActiveXObject('Scripting.FileSystemObject');
    fileSysObj.CreateTextFile(filePath);
    Do you have any ideas how to combine these snippets to solve my problem? (I don't know JavaScript well and can't do it myself.)
    Thank you for your help!

  • Process to extract comments from PDF

    Greetings,
    I need to extract comments from a PDF during a process workflow. Will exporting the metadata alone work? If not, could someone please point me in the right direction?
    I'm not entirely sure where the comments (written notes, sticky notes, stamps, etc.) reside.
    Thanks in advance,
    Alex

    I don't think the metadata will give you the annotations layer of the PDF. You'll probably need to use Assembler's invokeDDX service to export the comments into an XFDF file (an XML representation of the comments).
    The instructions should be in the DDX Reference:
    http://help.adobe.com/en_US/livecycle/9.0/ddxRef.pdf
    something like:
    <Comments result="doc1comments.xfdf" format="XFDF">
    <PDF source="doc1.pdf"/>
    </Comments>
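Once the comments are exported, the XFDF file can be read with any XML parser. A minimal Python sketch using only the standard library; it assumes the XFDF namespace `http://ns.adobe.com/xfdf/` and the layout described in Adobe's XFDF specification, where each annotation is a child of `<annots>` with a zero-based `page` attribute, a `title` attribute usually holding the author, and an optional `<contents>` element. `Comment` and `read_comments` are hypothetical names; check the field choices against your own exported files:

```python
import xml.etree.ElementTree as ET
from typing import List, NamedTuple

XFDF_NS = "{http://ns.adobe.com/xfdf/}"


class Comment(NamedTuple):
    kind: str      # annotation element name, e.g. "text", "highlight", "stamp"
    page: int      # XFDF page numbers are zero-based
    author: str    # taken from the "title" attribute
    contents: str  # plain-text body, empty if the annotation has none


def read_comments(xfdf_text: str) -> List[Comment]:
    """List the annotations stored under <annots> in an XFDF document."""
    root = ET.fromstring(xfdf_text)
    annots = root.find(f"{XFDF_NS}annots")
    if annots is None:
        return []
    comments = []
    for el in annots:
        comments.append(Comment(
            kind=el.tag.replace(XFDF_NS, ""),
            page=int(el.get("page", "0")),
            author=el.get("title", ""),
            contents=el.findtext(f"{XFDF_NS}contents", default="").strip(),
        ))
    return comments
```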

  • How to read/extract text from pdf

    Respected All,
    I want to read/extract text from a PDF. I tried using Etymon but did not succeed.
    Could anyone guide me on this?
    Thanks and regards,
    Ajay.

    Thank you very much, Abhilshit. PDFBox works for reading PDFs.
    Regards,
    Ajay.

  • Extract Text from pdf using C#

    Hi,
    We are solution developers using Acrobat. We have a requirement to extract text from PDFs using C#, so we downloaded and installed the Adobe SDK. We found only four C# examples, and those only cover viewing PDFs in a Windows application. Can you please guide us on how to extract text from a PDF using the SDK in C#?
    Thank you for your help.
    Regards
    kiranmai

    Okay, so I went ahead and added text extraction to my own C# application, since the client had requested the feature anyway. We had originally been told to skip it if it wasn't cut and dried, but it turned out to be straightforward, so I gave the client the text extraction they wanted. Here is the source code. It returns the text of the entire document as a string.
    using System;
    using System.Reflection;
    using System.Text;
    using Acrobat; // COM reference to the Acrobat type library

    internal static class PdfText
    {
        // Returns the text of every page in the document as one string.
        public static string GetText(AcroPDDoc pdDoc)
        {
            var text = new StringBuilder();
            // The JavaScript bridge exposes Acrobat JS methods such as getPageNumWords.
            object jso = pdDoc.GetJSObject();
            if (jso == null)
                return string.Empty;

            int pages = pdDoc.GetNumPages();
            for (int i = 0; i < pages; i++)
            {
                try
                {
                    object jsNumWords = jso.GetType().InvokeMember("getPageNumWords",
                        BindingFlags.InvokeMethod, null, jso, new object[] { i });
                    int numWords = Convert.ToInt32(jsNumWords);
                    // Word indices run from 0 to numWords - 1.
                    for (int j = 0; j < numWords; j++)
                    {
                        // bStrip = false keeps the spaces and punctuation around each word.
                        object jsWord = jso.GetType().InvokeMember("getPageNthWord",
                            BindingFlags.InvokeMethod, null, jso, new object[] { i, j, false });
                        text.Append((string)jsWord);
                    }
                }
                catch (Exception)
                {
                    // Skip a page whose words cannot be read rather than abort the document.
                }
            }
            return text.ToString();
        }
    }

  • Applescript or workflow to extract text from PDF and rename PDF with the results

    Hi Everyone,
    I get supplied hundreds of PDFs, each containing a stock code, but the PDFs themselves are not named consistently, or they are supplied as multi-page PDFs.
    What I need to do is name each PDF with the code which is in the text on the PDF.
    It would work like this in an ideal world:
    1. Split PDF into single pages
    2. Extract text from PDF
    3. Rename PDF using the extracted text
    I'm struggling with part 3!
    I can get a text file with just the code (I'm using a call to BBEdit to extract it).
    I did think about using a variable for the name, but the rename action doesn't let me use variables.

    Hello
    You may also try the following AppleScript, which wraps a RubyCocoa script. It asks you to choose the source PDF files and a destination folder. It then scans the text of each page of the PDFs for the predefined pattern and saves each matching page as a new PDF in the destination folder, named after the text matched by the pattern. Pages that contain no matching string are ignored. (Ignored pages, if any, are reported in the script's result.)
    Currently the regex pattern is set to:
    /HB-.._[0-9]{6}/
    which means "HB-" followed by any two characters, an underscore, and six digits.
    Minimally tested under 10.6.8.
    Hope this may help,
    H
    _main()
    on _main()
        script o
            property aa : choose file with prompt ("Choose pdf files.") of type {"com.adobe.pdf"} ¬
                default location (path to desktop) with multiple selections allowed
            set my aa's beginning to choose folder with prompt ("Choose destination folder.") ¬
                default location (path to desktop)
            set args to ""
            repeat with a in my aa
                set args to args & a's POSIX path's quoted form & space
            end repeat
            considering numeric strings
                if (system info)'s system version < "10.9" then
                    set ruby to "/usr/bin/ruby"
                else
                    set ruby to "/System/Library/Frameworks/Ruby.framework/Versions/1.8/usr/bin/ruby"
                end if
            end considering
            do shell script ruby & " <<'EOF' - " & args & "
    require 'osx/cocoa'
    include OSX
    require_framework 'PDFKit'
    outdir = ARGV.shift.chomp('/')
    ARGV.select {|f| f =~ /\\.pdf$/i }.each do |f|
        url = NSURL.fileURLWithPath(f)
        doc = PDFDocument.alloc.initWithURL(url)
        path = doc.documentURL.path
        pcnt = doc.pageCount
        (0 .. (pcnt - 1)).each do |i|
            page = doc.pageAtIndex(i)
            page.string.to_s =~ /HB-.._[0-9]{6}/
            name = $&
            unless name
                puts \"no matching string in page #{i + 1} of #{path}\"
                next # ignore this page
            end
            doc1 = PDFDocument.alloc.initWithData(page.dataRepresentation) # doc for this page
            unless doc1.writeToFile(\"#{outdir}/#{name}.pdf\")
                puts \"failed to save page #{i + 1} of #{path}\"
            end
        end
    end
    EOF"
        end script
        tell o to run
    end _main
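The matching and naming step of the script above, restated in Python for readers who don't use RubyCocoa. It works on plain page-text strings (assumed to come from whatever text extractor you use, such as PDFKit in the script) and, like the Ruby code, uses only the first match on each page. `names_for_pages` is a hypothetical helper; note that two pages carrying the same code would map to the same file name:

```python
import re
from typing import Dict, List, Tuple

# Same pattern as the Ruby script: "HB-", any two characters, "_", six digits.
CODE_RE = re.compile(r"HB-.._[0-9]{6}")


def names_for_pages(page_texts: List[str]) -> Tuple[Dict[int, str], List[int]]:
    """Map 1-based page numbers to output file names; also list pages with no code."""
    names: Dict[int, str] = {}
    unmatched: List[int] = []
    for number, text in enumerate(page_texts, start=1):
        m = CODE_RE.search(text)  # first match only, as with Ruby's =~
        if m:
            names[number] = f"{m.group(0)}.pdf"
        else:
            unmatched.append(number)
    return names, unmatched
```

The unmatched list plays the same role as the "no matching string" messages the Ruby script prints.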

  • Extracting images from pdf

    I am trying to extract images from PDFs using pdfimages, but I am unable to retrieve all of them. When I open the PDFs in Acrobat Reader 9.0, I can use the Select tool to select the images that pdfimages retrieved, but for other figures/images we have to resort to other options, such as taking a screenshot and cutting out the relevant image. I was wondering why, or when, Acrobat treats these figures/images differently.

    Hi Dave,
    Thanks for the reply. My question was not about a non-Adobe product like pdfimages. It was about how Acrobat handles images in general when creating PDFs.
    I wanted to know why we can select some of the images in a PDF with the Select tool but not others, for which we need to take a screenshot and cut them out. Is there anything in the EPS files of the included images that causes this?
    Thanks.

  • Can I extract metadata from planning or only from essbase?

    Hi!
    Can I extract metadata from Planning, or only from Essbase? If I can from Planning, how do I do it? And why, when I extract metadata from Essbase, does it extract members that are never shared as Store? It always puts S (Stored).
    Thanks in advance.
    Bye

    Hi,
    You can only extract metadata from Essbase. If you want to look at some examples of how to do it, have a read here.
    Ok?
    Cheers
    John
    http://john-goodwin.blogspot.com/

  • How can I extract metadata from file names?

    How can I extract metadata from file names? I want to read through the file names and, when I get to a certain character ("-"), take the string just before that character and store it in a column in SharePoint. Is this doable through scripting?

    You should be able to leverage the split method.
    In PowerShell it would look like:
    # Gather the file name
    $file = "myawesome_filename-Month-Day-Year-Ect.doc"
    # Split the file name on the "-" character
    $parts = $file.Split("-")
    # Use a foreach loop to output the individual items
    foreach ($item in $parts) {
        Write-Host $item
    }
    # Output:
    # myawesome_filename
    # Month
    # Day
    # Year
    # Ect.doc
    # To grab only the first item, use $parts[0] (PowerShell arrays are zero-based)
    $parts[0]
    # Output:
    # myawesome_filename
    Entrepreneur, Strategic Technical Advisor, and Sr. Consulting Engineer - Strategic Services and Solutions Check out my book - Powershell 3.0 - WMI: http://amzn.to/1BnjOmo | Mastering PowerShell Coming in April 2015!
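For comparison, the same idea in Python. `str.partition` splits only at the first separator and returns the whole string when the separator is missing, so names without a hyphen need no special case; taking `PurePath(...).stem` first keeps the extension out of the result. `prefix_before` is a hypothetical helper, and writing the value into a SharePoint column is not covered:

```python
from pathlib import PurePath


def prefix_before(name: str, sep: str = "-") -> str:
    """Return the part of a file name before the first separator, without the extension."""
    stem = PurePath(name).stem  # drop ".doc", ".pdf", etc.
    return stem.partition(sep)[0]
```

For example, `prefix_before("myawesome_filename-Month-Day-Year-Ect.doc")` returns `"myawesome_filename"`.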

  • Can OWB extract metadata from foxpro 2.5 or foxbase?

    We know that OWB can extract metadata from some relational databases other than Oracle through Heterogeneous Services. Sybase, DB2, and SQL Server provide specific agents for Oracle. What about FoxPro 2.5 or FoxBase?
    If they don't have an agent for Oracle, is it possible to extract metadata from a FoxPro 2.5 or FoxBase database using the ODBC agent?
    Thanks for any help!

    I don't think there is a gateway, so I would use ODBC and the heterogeneous services in the database to do this.
    Jean-Pierre

  • Extracting Images from PDF file

    Hello All,
    I am reading a PDF file and need to extract its images programmatically. The problem is that some images are stored inside the PDF using the FlateDecode filter, so I need to decode that data first before I can extract the image. I don't know how to decode the image data. Is there any way or API to do that in C++?
    Thanks
    Aarti Nagpal

    I think you can do it through the Cos object layer in a VC++ plug-in. Look at PDEFilterSpec in the Acrobat Core API Reference.
    Be well..
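FlateDecode is the zlib/deflate format, so any zlib implementation can decode it; in C or C++ that means zlib's `uncompress()` or `inflate()`, and the Python sketch below uses the standard `zlib` module to show the same steps. It assumes the bytes passed in are exactly the stream data between `stream` and `endstream` and that FlateDecode is the only filter. If the stream's /DecodeParms has /Predictor 10 or higher, each row carries a leading PNG filter byte that must be undone after inflating; the sketch handles that case but not the TIFF predictor (2). The result is raw pixel samples, not an image file: you still need /Width, /Height, /BitsPerComponent and /ColorSpace from the image dictionary to build a picture. The function names are hypothetical:

```python
import zlib


def flate_decode(raw: bytes) -> bytes:
    """Inflate /FlateDecode stream data (zlib format)."""
    return zlib.decompress(raw)


def undo_png_predictor(data: bytes, columns: int, colors: int = 1, bpc: int = 8) -> bytes:
    """Reverse PNG row filters (/Predictor 10-15); defaults follow the PDF spec."""
    bpp = max(1, colors * bpc // 8)              # bytes per complete pixel
    row_len = (columns * colors * bpc + 7) // 8  # bytes per row, excluding the filter byte
    out = bytearray()
    prev = bytearray(row_len)
    for start in range(0, len(data), row_len + 1):
        ftype = data[start]
        row = bytearray(data[start + 1:start + 1 + row_len])
        for i in range(len(row)):
            left = row[i - bpp] if i >= bpp else 0  # already reconstructed
            up = prev[i]
            up_left = prev[i - bpp] if i >= bpp else 0
            if ftype == 1:    # Sub
                row[i] = (row[i] + left) & 0xFF
            elif ftype == 2:  # Up
                row[i] = (row[i] + up) & 0xFF
            elif ftype == 3:  # Average
                row[i] = (row[i] + (left + up) // 2) & 0xFF
            elif ftype == 4:  # Paeth
                p = left + up - up_left
                pa, pb, pc = abs(p - left), abs(p - up), abs(p - up_left)
                pred = left if pa <= pb and pa <= pc else (up if pb <= pc else up_left)
                row[i] = (row[i] + pred) & 0xFF
            elif ftype != 0:
                raise ValueError(f"unknown PNG filter type {ftype}")
        out += row
        prev = row
    return bytes(out)


def decode_image_stream(raw: bytes, predictor: int = 1, columns: int = 1,
                        colors: int = 1, bpc: int = 8) -> bytes:
    """Inflate an image stream and undo any PNG predictor named in /DecodeParms."""
    data = flate_decode(raw)
    if predictor == 2:
        raise NotImplementedError("TIFF predictor 2 is not handled in this sketch")
    if predictor >= 10:
        data = undo_png_predictor(data, columns, colors, bpc)
    return data
```

For an image, /Columns in /DecodeParms normally equals the image width, so the same numbers feed both the predictor step and the later image reconstruction.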

  • Extract Metadata from Final Cut Pro Project files...

    We're currently scanning our project directory (running great, by the way), but I'm curious: is there a way for Final Cut Server to extract metadata from the XML (timecode info, titling, descriptions, etc.) and tag that info to the Final Cut project within FC Server?
    That would be great!
    Ryan

    Hello!
    I was just told by Apple (several hours ago) that it's a bug that happens on some Final Cut Servers. They said the engineering team is looking into it.
    The tech guy had me open the project in Final Cut (directly, not through FC Server), then choose File > Export > Export as XML. Then he had me look in Final Cut Server, and it let me check in/out, etc. He said it's a bug they're working on.
    I hope it gets fixed soon! I need it fixed in order to do what we plan on doing.

  • How do I bulk upload documents using PowerShell and extract metadata from file name?

    I have a requirement to upload a bunch of documents into a document library. The rules for updating the metadata differ by content type; the one giving me trouble requires extracting the metadata from the file name. If I have a file name like "part1_part2_part3.pdf", how do I extract part1, part2, and part3 and tag each document being uploaded into SharePoint using PowerShell? I have searched and have not been able to find anything to get me started.
    Has anyone done this before? Or is there a blog I can take a look at? Thanks

    You will have to write a PowerShell script encompassing this logic:
    1. Read the files from the folder using the Get-Item cmdlet.
    2. Determine the content type based on the path or file name.
    3. Split the file name to extract the tag names.
    4. If a metadata field in the content type is a managed metadata field, check whether the term exists and set it.
    See "Updating SharePoint Managed Metadata Columns with PowerShell".
    This post is my own opinion and does not necessarily reflect the opinion or view of Slalom.
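The "split the file name" step above can be sketched in Python. The field names are hypothetical placeholders for your library's column names, and the upload and managed-metadata steps are left to PowerShell. Raising on an unexpected number of parts lets misnamed files be caught before upload instead of being tagged incorrectly:

```python
from pathlib import PurePath
from typing import Dict, Sequence

# Hypothetical column names; replace them with your library's internal field names.
FIELDS = ("Part1", "Part2", "Part3")


def tags_from_name(file_name: str, fields: Sequence[str] = FIELDS, sep: str = "_") -> Dict[str, str]:
    """Split 'part1_part2_part3.pdf' into {field: value}, ignoring the extension."""
    parts = PurePath(file_name).stem.split(sep)
    if len(parts) != len(fields):
        raise ValueError(f"{file_name!r}: expected {len(fields)} parts, got {len(parts)}")
    return dict(zip(fields, parts))
```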

  • Extracting Data from PDF forms in Reader created in Livecycle

    Hello
    We would like users who complete a PDF document (created in LiveCycle) in Adobe Reader to be able to export the completed fields (and the accompanying questions) to an MS Word document, in a format similar to the PDF, so the content can be pasted into future documents.
    Is there a simple step-by-step procedure that users can follow?
    Any assistance would be much appreciated

    Hi,
    I think you selected "3.x Datasource" as the type when you replicated the metadata from the second client.
    If so, delete the DataSource (in BIW) from the second client, and then replicate the DataSource again, but this time select only the "As Datasource" option.
    with rgds,
    Anil Kumar Sharma .P

Maybe you are looking for

  • Disable Pixel Aspect Ratio in CS2

    On this site and others I've found info on this, but nothing that's worked. I simply want to DISABLE the automatic Pixel Aspect Ratio preview adjustment so that when a file opens it works just like it did in Photoshop 7: it just opens. When I open the

  • Photoshop Photography Program issue - current serial number not recognised

    I own a commercial version of Photoshop CS5 (Extended) and have registered the product. I would like to purchase the 'Photoshop Photography Program for CS3+ customers' however when I type in my product serial number the system says "We don't have any

  • Stopping a program

    I want to use a stop button to end a loop, and preferably have it sit dormant until switched. Is there any way I can do this, so that as soon as it is pressed, it will be recognized and exit the loop? Alternatively, how do I make it the la

  • Can't find Brush Spacing

    Hi, I can't find the brush spacing option in the Brushes panel. All I get is this: Can anyone tell me how to get the spacing option to appear? I appreciate any help.

  • The battery doesn't charge. What is this?

    My 32 GB iPod is old. I try to charge the battery, but it always shows red. Please tell me what to do.