Process to extract comments from PDF

Greetings,
I need to extract comments from PDF during a process workflow.  Will exporting metadata alone work?  If not, could someone please point me in the right direction?
I'm not entirely sure where the comments reside (written, sticky notes, stamps, etc.).
Thanks in advance,
Alex

I don't think the metadata will give you the annotations layer of the PDF.  You'll probably need to use Assembler's invokeDDX service to export the comments into an XFDF file (an XML representation of the comments).
The instructions should be in the DDX Reference:
http://help.adobe.com/en_US/livecycle/9.0/ddxRef.pdf
something like:
<Comments result="doc1comments.xfdf" format="XFDF">
<PDF source="doc1.pdf"/>
</Comments>
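
Once the XFDF file exists, getting the comment text out of it is ordinary XML parsing. Here is a minimal sketch in Python, assuming the usual XFDF layout: an `annots` element in the `http://ns.adobe.com/xfdf/` namespace whose children are the annotations (`text` for sticky notes, `highlight`, `stamp`, and so on), each with an optional `contents` child and the author in the `title` attribute. The sample data and the helper name `extract_comments` are made up for illustration; rich-text comments (`contents-richtext`) are not handled.

```python
import xml.etree.ElementTree as ET

XFDF_NS = "{http://ns.adobe.com/xfdf/}"

# Hypothetical sample; a real file would come from the Comments DDX above.
SAMPLE_XFDF = """<xfdf xmlns="http://ns.adobe.com/xfdf/">
  <annots>
    <text page="0" title="Alex" name="n1"><contents>Check this figure</contents></text>
    <highlight page="2" title="Sam" name="n2"><contents>Reword</contents></highlight>
    <stamp page="3" title="Alex" name="n3"/>
  </annots>
</xfdf>"""


def extract_comments(xfdf_text):
    """Return one dict per annotation: type, zero-based page, author, text."""
    root = ET.fromstring(xfdf_text)
    annots = root.find(XFDF_NS + "annots")
    if annots is None:
        return []
    comments = []
    for annot in annots:
        contents = annot.find(XFDF_NS + "contents")
        comments.append({
            # Strip the namespace so "text", "highlight", "stamp" remain.
            "type": annot.tag.replace(XFDF_NS, ""),
            "page": int(annot.get("page", "0")),
            "author": annot.get("title", ""),
            "text": (contents.text or "") if contents is not None else "",
        })
    return comments


for comment in extract_comments(SAMPLE_XFDF):
    print(comment)
```

Annotations without a `contents` child (the stamp in the sample) come back with an empty `text`, which also answers the "where do the comments reside" question: every annotation type shows up as its own element under `annots`.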

Similar Messages

  • How to read/extract text from pdf

    Respected All,
    I want to read/extract text from a PDF. I tried using Etymon but did not succeed.
    Could anyone guide me on this?
    Thanks and regards,
    Ajay.

    Thank you very much Abhilshit, PDFBox works for reading pdf.
    Regards,
    Ajay.

  • Extract Text from pdf using C#

    Hi,
    We are solution developers using Acrobat. As we have a requirement to extract text from PDFs using C#, we have downloaded and installed the Adobe SDK. We have found only four examples in C#, and those are used only for viewing a PDF in a Windows application. Can you please guide us on how to extract text from a PDF using the SDK in C#?
    Thank you for your help.
    Regards
    kiranmai

    Okay so I went ahead and actually added the text extraction functionality to my own C# application, since this was a requested feature by the client anyhow, which originally we were told to bypass if it wasn't "cut and dry", but it wasn't bad so I went ahead and gave the client the text extraction that they wanted. Decided I'd post the source code here for you. This returns the text from the entire document as a string.
    using System;
    using System.Collections.Generic;
    using System.Reflection;
    using Acrobat; // Acrobat interop types (AcroPDDoc, AcroPDPage)
    private static string GetText(AcroPDDoc pdDoc)
    {
        AcroPDPage page;
        int pages = pdDoc.GetNumPages();
        string pageText = "";
        for (int i = 0; i < pages; i++)
        {
            page = (AcroPDPage)pdDoc.AcquirePage(i);
            object jso, jsNumWords, jsWord;
            List<string> words = new List<string>();
            try
            {
                jso = pdDoc.GetJSObject();
                if (jso != null)
                {
                    object[] args = new object[] { i };
                    jsNumWords = jso.GetType().InvokeMember("getPageNumWords", BindingFlags.InvokeMethod, null, jso, args, null);
                    int numWords = Int32.Parse(jsNumWords.ToString());
                    // Word indices are zero-based, so stop before numWords.
                    for (int j = 0; j < numWords; j++)
                    {
                        object[] argsj = new object[] { i, j, false };
                        jsWord = jso.GetType().InvokeMember("getPageNthWord", BindingFlags.InvokeMethod, null, jso, argsj, null);
                        words.Add((string)jsWord);
                    }
                }
                foreach (string word in words)
                    pageText += word;
            }
            catch
            {
                // Skip pages whose words cannot be read.
            }
        }
        return pageText;
    }

  • Applescript or workflow to extract text from PDF and rename PDF with the results

    Hi Everyone,
    I get supplied hundreds of PDFs which each contain a stock code, but the PDFs themselves are not named consistently, or they are supplied as multi-page PDFs.
    What I need to do is name each PDF with the code which is in the text on the PDF.
    It would work like this in an ideal world:
    1. Split PDF into single pages
    2. Extract text from PDF
    3. Rename PDF using the extracted text
    I'm struggling with part 3!
    I can get a textfile with just the code (using a call to BBEDIT I'm extracting the code)
    I did think about using a variable for the name, but the rename function doesn't let me use variables.

    Hello
    You may also try the following AppleScript, which is a wrapper around a RubyCocoa script. It will ask you to choose the source PDF files and a destination directory. It will then scan the text of each page of the PDF files for the predefined pattern and save the page as a new PDF file in the destination directory, named after the string the pattern extracted. Pages which do not contain a string matching the pattern are ignored. (Ignored pages, if any, are reported in the result of the script.)
    Currently the regex pattern is set to:
    /HB-.._[0-9]{6}/
    which means HB- followed by two characters and _ and 6 digits.
    Minimally tested under 10.6.8.
    Hope this may help,
    H
    _main()
    on _main()
        script o
            property aa : choose file with prompt ("Choose pdf files.") of type {"com.adobe.pdf"} ¬
                default location (path to desktop) with multiple selections allowed
            set my aa's beginning to choose folder with prompt ("Choose destination folder.") ¬
                default location (path to desktop)
            set args to ""
            repeat with a in my aa
                set args to args & a's POSIX path's quoted form & space
            end repeat
            considering numeric strings
                if (system info)'s system version < "10.9" then
                    set ruby to "/usr/bin/ruby"
                else
                    set ruby to "/System/Library/Frameworks/Ruby.framework/Versions/1.8/usr/bin/ruby"
                end if
            end considering
            do shell script ruby & " <<'EOF' - " & args & "
    require 'osx/cocoa'
    include OSX
    require_framework 'PDFKit'
    outdir = ARGV.shift.chomp('/')
    ARGV.select {|f| f =~ /\\.pdf$/i }.each do |f|
        url = NSURL.fileURLWithPath(f)
        doc = PDFDocument.alloc.initWithURL(url)
        path = doc.documentURL.path
        pcnt = doc.pageCount
        (0 .. (pcnt - 1)).each do |i|
            page = doc.pageAtIndex(i)
            page.string.to_s =~ /HB-.._[0-9]{6}/
            name = $&
            unless name
                puts \"no matching string in page #{i + 1} of #{path}\"
                next # ignore this page
            end
            doc1 = PDFDocument.alloc.initWithData(page.dataRepresentation) # doc for this page
            unless doc1.writeToFile(\"#{outdir}/#{name}.pdf\")
                puts \"failed to save page #{i + 1} of #{path}\"
            end
        end
    end
    EOF"
        end script
        tell o to run
    end _main
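
For anyone who wants to check the pattern against their own stock codes before running the script, the matching step on its own can be sketched in Python. This is only an illustration of the regex above: the sample strings and the helper name `target_name` are made up, and no PDF is read here.

```python
import re

# Same pattern as in the script: "HB-", any two characters, "_", six digits.
CODE_PATTERN = re.compile(r"HB-.._[0-9]{6}")


def target_name(page_text):
    """Return '<code>.pdf' for the first stock code in page_text, or None."""
    match = CODE_PATTERN.search(page_text)
    return f"{match.group(0)}.pdf" if match else None


# Hypothetical page texts standing in for what PDFKit would return.
print(target_name("Item HB-AB_123456 in stock"))
print(target_name("This page has no code"))
```

As in the Ruby version, only the first match on a page is used, and a page without a match yields nothing to rename.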

  • Extracting images from pdf

    I am trying to extract images from PDFs using pdfimages, but I am unable to retrieve all the images. When I open the PDFs in Acrobat Reader 9.0, I can select the images that pdfimages retrieved using the select tool, but for the other figures/images we have to fall back on options like print screen and then cropping out the relevant image. I was wondering why or when Acrobat treats the figures/images differently.

    Hi Dave,
    Thanks for the reply. My question was not about any non-Adobe product like pdfimages. It was about the general way Acrobat handles images while creating PDFs.
    I wanted to know why we can select some of the images in the PDF using the select tool and cannot select others, for which we need to print screen and crop. Is there anything in the EPS files of the included images that causes this effect?
    Thanks.

  • Extracting Images from PDF file

    Hello All,
    I am reading a PDF file. I need to extract images from the PDF file programmatically. The problem is that some images are stored inside the PDF file using the FlateDecode filter, so I first need to decode that data before I can extract the image. I don't know how to decode that image data. Is there any way or API to do that in C++?
    Thanks
    Aarti Nagpal

    I think you can do it through the Cos object in a VC++ plug-in. Go through PDEFilterSpec in the Acrobat core API reference.
    Be well..
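
Outside the Acrobat SDK, FlateDecode is the zlib/deflate compression method, so any zlib binding can inflate the stream bytes; in C or C++ that would be zlib's own inflate routines. Below is a minimal sketch in Python, assuming you have already located the raw bytes between `stream` and `endstream` for the image object. The helper name `decode_flate` and the sample data are made up, and a stream that declares a `/Predictor` in its `/DecodeParms` needs an extra un-filtering step that is not shown.

```python
import zlib


def decode_flate(stream_bytes):
    """Inflate the raw bytes of a stream whose /Filter is /FlateDecode."""
    return zlib.decompress(stream_bytes)


# Round trip on made-up data standing in for an image stream.
raw_samples = bytes(range(256)) * 4
encoded = zlib.compress(raw_samples)
decoded = decode_flate(encoded)
print(len(encoded), "->", len(decoded), "bytes")
```

The decoded bytes are raw pixel samples, not a ready-made image file: the width, height, bits per component and color space from the image dictionary are still needed to write them out as a PNG or similar.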

  • Import/Export comments from PDF

    Hi
    I am using Adobe Acrobat 6.0 Professional. I am an application developer [.NET 4.0, C#]. I need to export/import comments from a PDF. I will explain my scenario:
    1. I have three copies of the same PDF file. Each copy contains different comments. I need to combine the comments from these three PDF files into a single PDF file. Can I do this programmatically?
    2. Where can I get the API for this? [.NET 4.0]
    Looking forward to your responses.
       Regards
       Dominic

    Open a PDF having sticky note and underline annotations applied with Acrobat XI Pro (the trial release will do  just fine).
    Open the Comments List.
    From the Options menu select "Create Comment Summary".
    Because you've an interest in getting comment annotation content into a Word file select the "Comments only" layout.
    Complete configuration as desired.
    Click the "Create Comment Summary" button.
    Save the resultant PDF.
    With this PDF open in Acrobat XI Pro use the click-path:
    File :: Save As Other :: Microsoft Word :: Word Document or Word 97-2003 Document
    Be well...

  • Read Comments from PDF...

    Hello Everyone,
    I need to know how we can read comments from a PDF file programmatically using C++.
    My first thought was to read them from the file itself. I thought the PDF file must contain some text like "StartComments" and "EndComments" (if we open it in Notepad), but I didn't find anything like that.
    Any suggestions ?
    Or anyone can refer any free Library which does this thing.
    Thanks
    ...Pankaj

    For API, Application Program Interface, there is the Acrobat SDK forum:
    http://forums.adobe.com/community/acrobat/acrobat_sdk
    For coding in Acrobat JavaScript there is the Acrobat Scripting forum:
    http://forums.adobe.com/community/acrobat/acrobat_scripting

  • Extracting data From PDF to Excel

    I have inherited a large library of PDF invoices which I need to extract data from into Excel - or some other spreadsheet. The other option is to open up thousands of PDF documents and run the numbers by hand, which is just dumb. I am new to Acrobat, and an entire afternoon of trial by fire / Google hasn't gotten me very far - so even pointers in the right direction are appreciated.
    Ideally I would like to tell Acrobat what data is important on each document (can I use the form tool to do this?), extract the data from the relevant files (batch processing tool I presume?), compile the data and extract it to a CSV.
    It looks like the functionality is here I am just unsure how it all needs to fit together. Any Suggestions?

    Hi,
    There is software out there that will convert PDFs to Excel... look for ABBYY or Able2Extract... If you have a lot of files that are the same, merge them together before using the software. Remember that if the data comes from a scanned image, the results will only be as good as the OCR engine contained in the software. You can play with the software to create tables, etc...

  • Extract text from pdf

    Hi, is it possible to extract text from a PDF file using the command line to get an output like you would get by using the File menu and then "Save as text..."?
    I also noticed that in the installation folder there is a small executable called AcroTextExtractor which sounds interesting, but I was unable to figure out how to use it.

    What's wrong with using Automator for this? It certainly seems the easiest. I'm not aware of any built-in AppleScript commands that will do this, but you should also ask on the AppleScript forum under Mac OS Technologies.
    Message was edited by: V.K.

  • Extracting XML from Pdf form

    There is an industry standard pdf form with an underlying XML schema which can be opened in Adobe reader.
    The form has a custom button on Page 2  called "export" which can be manually clicked to export the XML file.
    We will have hundreds of these forms. How would I automate the extraction of this XML document?
    I would prefer to just write a simple script and extract out the xml to a file folder
    Thanks for your help.

    Thanks Patrick.
    We are thinking about using a third party native Java library to do this (http://www.qoppa.com/pdffields/jpfindex.html). I was hoping we could use acrobat reader, since everyone has it!
    Here are a few more things.
    1. We are a software vendor that sells our solutions - our software solutions need to extract the XML from the PDF. We have a Java-based program that parses this XML and does stuff with it.
    2. Obviously, we would need to be able to redistribute whatever solution we use to extract the XML from the PDF.
    3. Can Acrobat Professional batch mode be executed from Java?
    4. If so, instead of distributing a full-blown Acrobat Professional or requiring customers to buy it, is there a library that Adobe provides that we could repackage and redistribute? If so, can you send me some pointers on where I could find those libraries and how much they would cost for each distribution we do?
    5. If not, are you familiar with Qoppa, or do you have recommendations for any other third-party library for Java?
    Thanks a bunch!

  • Extract images from PDF out of Illustrator with script

    Looking for a script to extract images from a pdf opened in Illustrator.
    I need the images to extract separately to a folder. Jpeg perhaps.

    hi
    I have to do the same... I have to convert a PDF to an image format... Did you solve the problem? Can you help me?
    Thanks in advance...

  • Extract images from PDF

    Hi there!
    I need to extract some images from PDF files! I've tried using JPedal, but it seems that they don't offer support for the free version any more, so I couldn't use it!
    Can you tell me about other tools that I can use for that purpose? Or could you give me some sample code/tutorials for the free version of JPedal?
    Thank you!

    hi
    I have to do the same... I have to convert a PDF to an image format... Did you solve the problem? Can you help me?
    Thanks in advance...

  • Programmatically extract information from PDF

    I am very green to Adobe/Java programming, so this is just a plausibility question, not really a how-to question. Is it possible to take text from a PDF document that isn't a form? I have heard about database integration with forms, but what if the document doesn't have recognized fields?
    The department of labor has an online form that prints to PDF.  Much of the information that is typed there must be re-typed over and over again in communications with employers.  I'm wondering if we could take the information from the PDF and put it in a database to be merged in our office-created forms.
    Sorry if my question is totally out there and thanks for any help.

    I am scadoosh, but not iluvtofly.  The information is in the same place in the forms. I could send the form if that is helpful.  It is a form that we have to fill and submit online.  We were hoping we could implement a solution where we could either extract information from the form that has been "printed to pdf" or the opposite, where we would fill in a database and programmatically fill the form.
    When you say an Adobe LiveCycle product, what is that?  Is it software or hardware?  Would we have to purchase something in addition to Adobe Acrobat?  What do we need to implement such a solution?
    Are there Adobe people who design custom products? Or could we get training somewhere on how to implement an Adobe LiveCycle solution? If there are custom designers, could they implement a solution so that if the government moved fields a little bit, we could adjust the LiveCycle solution to fit the new form?
    Thanks!

  • Extracting metadata from PDFs

    I have a client that has hundreds of store signs as PDFs. We need to verify what the actual printed size of each sign will be when it's printed out. We have noticed that hovering our cursor over the lower left-hand corner of a PDF reveals its size - which is great, but we don't want to have to open up each PDF (there are hundreds) and go through that process. Is there a way to export the file name and its size into Excel so we can quickly audit the pieces?
    And perhaps this is a Bridge question, but when we view the files in Bridge, we also see the metadata, which shows the dimensions of the piece, but those dimensions seem to change depending on the PPI. How does Acrobat come up with its dimensions if they are different from the metadata we see in Bridge?

    Hi Alan,
    Typing the following two lines in the Script Editor should list the metadata attributes for the chosen file:
    set F to choose file with prompt "Choose a PDF file:" of type "PDF "
    do shell script "mdls" & space & quoted form of POSIX path of F
    For more information on "mdls", open the Terminal window and just type “man mdls” (and return).
    Maybe you might also use the following AppleScript statement in case there were some keywords in the Spotlight comments field of the file:
    tell application "Finder" to return comment of F
    Message was edited by: Pierre L.
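
On the question of why the dimensions seem to change with PPI: a PDF page size is stored in points, and the default PDF unit is 1/72 inch, so the printed size does not depend on any resolution. A pixel dimension only turns into inches once it is divided by a PPI value, which would explain different readings at different PPI settings. A small sketch of the arithmetic in Python; the function names and the sample numbers are made up for illustration, and this does not read anything from a PDF.

```python
POINTS_PER_INCH = 72.0  # default PDF user-space unit is 1/72 inch


def points_to_inches(width_pt, height_pt):
    """Convert a page size in PDF points to inches."""
    return (width_pt / POINTS_PER_INCH, height_pt / POINTS_PER_INCH)


def pixels_to_inches(width_px, height_px, ppi):
    """Convert pixel dimensions to inches at a given resolution."""
    return (width_px / ppi, height_px / ppi)


# A 612 x 792 point page is US Letter regardless of resolution.
print(points_to_inches(612, 792))
# The same page rendered at 300 PPI is 2550 x 3300 pixels.
print(pixels_to_inches(2550, 3300, 300))
```

Both calls describe the same physical size; reading the pixel figures at a different PPI gives a different, misleading answer.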
