Process to extract comments from PDF

Greetings,
I need to extract comments from a PDF during a process workflow. Will exporting the metadata alone work? If not, could someone please point me in the right direction?
I'm not entirely sure where the comments reside (written notes, sticky notes, stamps, etc.).
Thanks in advance,
Alex

I don't think the metadata will give you the annotations layer of the PDF. You'll probably need to use Assembler's invokeDDX service to export the comments into an XFDF file (an XML representation of the comments).
The instructions should be in the DDX Reference:
http://help.adobe.com/en_US/livecycle/9.0/ddxRef.pdf
Something like:
<Comments result="doc1comments.xfdf" format="XFDF">
<PDF source="doc1.pdf"/>
</Comments>
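
Once the comments are in an XFDF file, they can be read with any XML parser. The following is only a minimal Python sketch, not part of the LiveCycle workflow itself: the sample XFDF content, the `list_comments` helper, and the choice of fields are assumptions for illustration. It relies on XFDF placing each annotation (for example `text` for sticky notes, `stamp` for stamps) under an `annots` element, with the author in the `title` attribute.

```python
import xml.etree.ElementTree as ET

XFDF_NS = "{http://ns.adobe.com/xfdf/}"

# Hypothetical sample of what an exported XFDF file might contain.
SAMPLE_XFDF = """<xfdf xmlns="http://ns.adobe.com/xfdf/">
  <annots>
    <text page="0" title="Alex" name="n1" rect="10,10,30,30">
      <contents>Check this figure.</contents>
    </text>
    <stamp page="1" title="Reviewer" name="n2" rect="50,50,150,90"/>
  </annots>
</xfdf>"""


def list_comments(xfdf_text):
    """Return one dict per annotation found under <annots>."""
    root = ET.fromstring(xfdf_text)
    annots = root.find(XFDF_NS + "annots")
    comments = []
    if annots is None:
        return comments
    for annot in annots:
        contents = annot.find(XFDF_NS + "contents")
        has_text = contents is not None and contents.text
        comments.append({
            "type": annot.tag.replace(XFDF_NS, ""),  # e.g. "text", "stamp"
            "page": int(annot.get("page", "0")),      # zero-based page index
            "author": annot.get("title", ""),
            "text": contents.text if has_text else "",
        })
    return comments


if __name__ == "__main__":
    for comment in list_comments(SAMPLE_XFDF):
        print(comment)
```

Annotations that carry only rich text or no text at all (such as a plain stamp) come back with an empty `text` value in this sketch.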

Similar Messages

How to read/extract text from pdf

Respected All,
I want to read/extract text from a PDF. I tried using Etymon but did not succeed.
Could anyone guide me on this?
Thanks and regards,
Ajay.

Thank you very much Abhilshit, PDFBox works for reading pdf.
Regards,
Ajay.

Extract Text from pdf using C#

Hi,
We are solution developers using Acrobat. As we have a requirement to extract text from PDFs using C#, we downloaded and installed the Adobe SDK. We found only four examples in C#, and those are used only for viewing a PDF in a Windows application. Can you please guide us on how to extract text from a PDF using the SDK in C#?
Thank you for your help.
Regards
kiranmai

Okay, so I went ahead and added the text extraction functionality to my own C# application, since the client had requested this feature anyhow. Originally we were told to bypass it if it wasn't "cut and dry", but it wasn't bad, so I gave the client the text extraction they wanted. I decided I'd post the source code here for you. This returns the text from the entire document as a string.
        // Requires: using System; using System.Collections.Generic; using System.Reflection; using Acrobat;
        private static string GetText(AcroPDDoc pdDoc)
        {
            AcroPDPage page;
            int pages = pdDoc.GetNumPages();
            string pageText = "";
            for (int i = 0; i < pages; i++)
            {
                page = (AcroPDPage)pdDoc.AcquirePage(i);
                object jso, jsNumWords, jsWord;
                List<string> words = new List<string>();
                try
                {
                    jso = pdDoc.GetJSObject();
                    if (jso != null)
                    {
                        // Ask the JavaScript bridge how many words page i contains.
                        object[] args = new object[] { i };
                        jsNumWords = jso.GetType().InvokeMember("getPageNumWords", BindingFlags.InvokeMethod, null, jso, args, null);
                        int numWords = Int32.Parse(jsNumWords.ToString());
                        // Word indices are zero-based, so stop before numWords.
                        for (int j = 0; j < numWords; j++)
                        {
                            // false = keep trailing punctuation and whitespace on each word.
                            object[] argsj = new object[] { i, j, false };
                            jsWord = jso.GetType().InvokeMember("getPageNthWord", BindingFlags.InvokeMethod, null, jso, argsj, null);
                            words.Add((string)jsWord);
                        }
                    }
                    foreach (string word in words)
                        pageText += word;
                }
                catch
                {
                    // Ignore pages whose text cannot be read and continue with the next page.
                }
            }
            return pageText;
        }

Applescript or workflow to extract text from PDF and rename PDF with the results

Hi Everyone,
I get supplied hundreds of PDFs which each contain a stock code, but the PDFs themselves are not named consistently, or they are supplied as multi-page PDFs.
What I need to do is name each PDF with the code which is in the text on the PDF.
It would work like this in an ideal world:
1. Split PDF into single pages
2. Extract text from PDF
3. Rename PDF using the extracted text
I'm struggling with part 3!
I can get a text file with just the code (I'm extracting the code using a call to BBEdit).
I did think about using a variable for the name, but the rename function doesn't let me use variables.

Hello
You may also try the following AppleScript, which is a wrapper around a RubyCocoa script. It will ask you to choose the source PDF files and a destination directory. It then scans the text of each page of the PDF files for the predefined pattern and saves the page as a new PDF file in the destination directory, named with the string the pattern matched. Pages that do not contain a string matching the pattern are ignored. (Ignored pages, if any, are reported in the result of the script.)
Currently the regex pattern is set to:
/HB-.._[0-9]{6}/
which means HB- followed by two characters and _ and 6 digits.
Minimally tested under 10.6.8.
Hope this may help,
H
_main()
on _main()
    script o
        property aa : choose file with prompt ("Choose pdf files.") of type {"com.adobe.pdf"} ¬
            default location (path to desktop) with multiple selections allowed
        set my aa's beginning to choose folder with prompt ("Choose destination folder.") ¬
            default location (path to desktop)
        set args to ""
        repeat with a in my aa
            set args to args & a's POSIX path's quoted form & space
        end repeat
        considering numeric strings
            if (system info)'s system version < "10.9" then
                set ruby to "/usr/bin/ruby"
            else
                set ruby to "/System/Library/Frameworks/Ruby.framework/Versions/1.8/usr/bin/ruby"
            end if
        end considering
        do shell script ruby & " <<'EOF' - " & args & "
require 'osx/cocoa'
include OSX
require_framework 'PDFKit'
outdir = ARGV.shift.chomp('/')
ARGV.select {|f| f =~ /\\.pdf$/i }.each do |f|
    url = NSURL.fileURLWithPath(f)
    doc = PDFDocument.alloc.initWithURL(url)
    path = doc.documentURL.path
    pcnt = doc.pageCount
    (0 .. (pcnt - 1)).each do |i|
        page = doc.pageAtIndex(i)
        page.string.to_s =~ /HB-.._[0-9]{6}/
        name = $&
        unless name
            puts \"no matching string in page #{i + 1} of #{path}\"
            next # ignore this page
        end
        doc1 = PDFDocument.alloc.initWithData(page.dataRepresentation) # doc for this page
        unless doc1.writeToFile(\"#{outdir}/#{name}.pdf\")
            puts \"failed to save page #{i + 1} of #{path}\"
        end
    end
end
EOF"
    end script
    tell o to run
end _main
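
The matching and naming step of the script above can be tried in isolation. The following is a minimal Python sketch of that step only, assuming the page text has already been extracted by some other means; the `plan_output_names` helper and the sample page texts are hypothetical and not part of the original script.

```python
import re

# Same pattern as in the script: "HB-", any two characters, "_", six digits.
STOCK_CODE = re.compile(r"HB-.._[0-9]{6}")


def plan_output_names(page_texts):
    """Map zero-based page index to an output file name.

    Pages without a matching code are skipped and reported by
    their one-based page number, as the script above does.
    """
    names = {}
    skipped = []
    for index, text in enumerate(page_texts):
        match = STOCK_CODE.search(text)
        if match is None:
            skipped.append(index + 1)
            continue
        names[index] = match.group(0) + ".pdf"
    return names, skipped


if __name__ == "__main__":
    # Hypothetical page texts for illustration.
    pages = ["Item HB-AB_123456 shelf 4", "no code on this page", "HB-X1_000042"]
    print(plan_output_names(pages))
```

If two pages carry the same code, the later page would overwrite the earlier file under this naming scheme, so a real workflow may want to detect duplicates first.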

Extracting images from pdf

I am trying to extract images from PDFs using pdfimages, but I am unable to retrieve all the images. When I open the PDFs in Acrobat Reader 9.0, I can use the select tool to select the images that pdfimages retrieved, but for other figures/images we have to try other options, such as taking a screenshot and then cropping the relevant image. I was wondering why or when Acrobat treats figures/images differently.

Hi Dave,
Thanks for the reply. My question was not about any non-Adobe product like pdfimages. It was about the way Acrobat handles images in general while creating PDFs.
I wanted to know why we can select some of the images in the PDF using the select tool but cannot select others, for which we need to take a screenshot and crop. Is there anything in the EPS file of an included image that causes this effect?
Thanks.

Extracting Images from PDF file

Hello All,
I am reading a PDF file. I need to extract images from the PDF file programmatically. The problem is that some images are stored inside the PDF file using the FlateDecode filter, so I first need to decode that data before I can extract the image. I don't know how to decode that image data. Is there any way or API to do that in C++?
Thanks
Aarti Nagpal

I think you can do it through the Cos object layer in a VC++ plug-in. Go through PDEFilterSpec in the Acrobat core API reference.
Be well..
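
Outside the Acrobat API, it may help to know that FlateDecode is zlib/deflate compression, so any zlib implementation (including the C zlib library, which is usable from C++) can inflate the stream bytes. The following is a minimal Python sketch of that single step; the `decode_flate` helper and the sample pixel data are assumptions for illustration. It does not locate the stream inside the PDF, and it does not undo any predictor listed in the stream's /DecodeParms.

```python
import zlib


def decode_flate(stream_bytes):
    """Inflate the raw bytes of a FlateDecode stream."""
    return zlib.decompress(stream_bytes)


if __name__ == "__main__":
    # Hypothetical image samples: four red RGB pixels.
    raw_pixels = bytes([255, 0, 0] * 4)
    # Stand-in for the bytes found between "stream" and "endstream".
    encoded = zlib.compress(raw_pixels)
    print(decode_flate(encoded) == raw_pixels)
```

The decoded result is raw sample data, not a ready-made image file. Width, height, bits per component, and color space still have to be read from the image dictionary to write it out in a format such as PNG.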

Import/Export comments from PDF

Hi
     I am using Adobe Acrobat 6.0 Professional. I am an application developer [.NET 4.0, C#]. I need to export/import comments from a PDF. I will explain my scenario:
   1. I have three copies of the same PDF file. Each copy contains different comments. I need to combine the comments from these three PDF files into a single PDF file. Can I do this programmatically?
   2. Where can I get the API for this? [.NET 4.0]
Looking for your responses.
   Regards
   Dominic

Open a PDF having sticky note and underline annotations applied with Acrobat XI Pro (the trial release will do just fine).
Open the Comments List.
From the Options menu select "Create Comment Summary".
Because you've an interest in getting comment annotation content into a Word file select the "Comments only" layout.
Complete configuration as desired.
Click the "Create Comment Summary" button.
Save the resultant PDF.
With this PDF open in Acrobat XI Pro use the click-path:
File :: Save As Other :: Microsoft Word :: Word Document or Word 97-2003 Document
Be well...
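
For the programmatic route asked about in this thread, one possible approach is to export the comments of each copy to XFDF, merge the XFDF files, and import the merged file into a single PDF. The following is only a minimal Python sketch of the merge step, under the assumption that the export and import are done by Acrobat or another tool; the `merge_xfdf` helper is hypothetical, and skipping annotations by their `name` attribute is an assumption about how duplicates should be handled.

```python
import xml.etree.ElementTree as ET

XFDF_URI = "http://ns.adobe.com/xfdf/"
# Serialize XFDF elements without a namespace prefix.
ET.register_namespace("", XFDF_URI)


def merge_xfdf(xfdf_texts):
    """Combine the <annots> children of several XFDF documents into one."""
    merged_root = ET.Element("{%s}xfdf" % XFDF_URI)
    merged_annots = ET.SubElement(merged_root, "{%s}annots" % XFDF_URI)
    seen_names = set()
    for text in xfdf_texts:
        annots = ET.fromstring(text).find("{%s}annots" % XFDF_URI)
        if annots is None:
            continue
        for annot in annots:
            name = annot.get("name")
            if name is not None:
                if name in seen_names:
                    continue  # same annotation already merged from another copy
                seen_names.add(name)
            merged_annots.append(annot)
    return ET.tostring(merged_root, encoding="unicode")
```

The merged output contains only the annotations. Other parts of an XFDF file, such as the reference to the source PDF, are left out of this sketch.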

Read Comments from PDF...

Hello Everyone,
I need to know how we can read comments from a PDF file programmatically using C++.
My idea was to read them from the file itself. I thought the PDF file would contain some text like "StartComments" and "EndComments" (if we open it in Notepad), but I didn't find anything like that.
Any suggestions?
Or can anyone recommend a free library that does this?
Thanks
...Pankaj

For the API (Application Program Interface), there is the Acrobat SDK forum:
http://forums.adobe.com/community/acrobat/acrobat_sdk
For coding in Acrobat JavaScript, there is the Acrobat Scripting forum:
http://forums.adobe.com/community/acrobat/acrobat_scripting

Extracting data From PDF to Excel

I have inherited a large library of PDF invoices from which I need to extract data into Excel, or some other spreadsheet. The other option is to open up thousands of PDF documents and run the numbers by hand, which is just dumb. I am new to Acrobat, and an entire afternoon of trial by fire / Google hasn't gotten me very far, so even pointers in the right direction are appreciated.
Ideally I would like to tell Acrobat what data is important on each document (can I use the form tool to do this?), extract the data from the relevant files (batch processing tool I presume?), compile the data and extract it to a CSV.
It looks like the functionality is here I am just unsure how it all needs to fit together. Any Suggestions?

Hi,
There is software out there that will convert PDFs to Excel. Look for ABBYY or Able2Extract. If you have a lot of files with the same layout, merge them together before using the software. Remember that if the data comes from a scanned image, the results will only be as good as the OCR engine in the software. You can play with the software to create tables, etc.

Extract text from pdf

Hi, is it possible to extract text from a PDF file using the command line to get output like you would get by using the File menu and then "Save as text..."?
I also noticed that in the installation folder there is a small executable called AcroTextExtractor which sounds interesting, but I was unable to figure out how to use it.

What's wrong with using Automator for this? It certainly seems the easiest. I'm not aware of any built-in AppleScript commands that will do this, but you should also ask on the AppleScript forum under Mac OS Technologies.
Message was edited by: V.K.

Extracting XML from Pdf form

There is an industry standard pdf form with an underlying XML schema which can be opened in Adobe reader.
The form has a custom button on Page 2 called "export" which can be manually clicked to export the XML file.
We will have hundreds of these forms. How would I automate the extraction of this XML document?
I would prefer to just write a simple script and extract out the xml to a file folder
Thanks for your help.

Thanks Patrick.
We are thinking about using a third party native Java library to do this (http://www.qoppa.com/pdffields/jpfindex.html). I was hoping we could use acrobat reader, since everyone has it!
Here are a few more things.
1. We are a software vendor that sells our solutions, and our software needs to extract the XML from the PDF. We have a Java-based program that parses this XML and does stuff with it.
2. Obviously, we would need to be able to redistribute whatever solution we use to extract the XML from the PDF.
3. Can Acrobat Professional batch mode be executed from Java?
4. If so, instead of distributing a full-blown Acrobat Professional or requiring customers to buy it, is there a library that Adobe provides that we could repackage and redistribute? If so, can you send me some pointers on where I could find those libraries and how much they would cost for each distribution we do?
5. If not, are you familiar with Qoppa, or do you have recommendations for any other third-party library for Java?
Thanks a bunch!

Extract images from PDF out of Illustrator with script

Looking for a script to extract images from a pdf opened in Illustrator.
I need the images to extract separately to a folder. Jpeg perhaps.

hi
I have to do the same... I have to convert a PDF to an image format. Did you solve the problem? Can you help me?
Thanks in advance...

Extract images from PDF

Hi there!
I need to extract some images from PDF files! I've tried using JPedal, but it seems that they don't offer support for the free version any more, so I couldn't use it.
Can you tell me other tools that I can use for that scope? Or could you give me some sample code/tutorials of the free version of JPedal?
Thank you!

hi
I have to do the same... I have to convert a PDF to an image format. Did you solve the problem? Can you help me?
Thanks in advance...

Programatically extract information from PDF

I am very green to Adobe/Java programming, so this is just a feasibility question, not really a how-to question. Is it possible to take text from a PDF document that isn't a form? I have heard about database integration with forms, but what if the document doesn't have recognized fields?
The department of labor has an online form that prints to PDF. Much of the information that is typed there must be re-typed over and over again in communications with employers. I'm wondering if we could take the information from the PDF and put it in a database to be merged in our office-created forms.
Sorry if my question is totally out there and thanks for any help.

I am scadoosh, but not iluvtofly. The information is in the same place in the forms. I could send the form if that is helpful. It is a form that we have to fill and submit online. We were hoping we could implement a solution where we could either extract information from the form that has been "printed to pdf" or the opposite, where we would fill in a database and programmatically fill the form.
When you say an Adobe LiveCycle product, what is that? Is it software or hardware? Would we have to purchase something in addition to Adobe Acrobat? What do we need to implement such a solution?
Are there Adobe people who design custom products? Or could we get training somewhere on how to implement an Adobe LiveCycle solution. If there are custom designers, could they implement a solution so that if the government moved fields a little bit, we could adjust the LiveCycle solution to fit the new form.
Thanks!

Extracting metadata from PDFs

I have a client that has hundreds of store signs as PDFs. We need to verify what the actual printed size of each sign will be when it's printed out. We have noticed that hovering our cursor over the lower left-hand corner of a PDF reveals its size, which is great, but we don't want to have to open each PDF (there are hundreds) and go through that process. Is there a way to export the file name and its size into Excel so we can quickly audit the pieces?
And perhaps this is a Bridge question, but when we view the files in Bridge, we also see the metadata, which shows the dimensions of the piece, but those dimensions seem to change depending on the PPI. How does Acrobat come up with its dimensions if they are different from the metadata we see in Bridge?

Hi Alan,
Typing the following two lines in the Script Editor should list the metadata attributes for the chosen file:
set F to choose file with prompt "Choose a PDF file:" of type "PDF "
do shell script "mdls" & space & quoted form of POSIX path of F
For more information on "mdls", open the Terminal window and just type “man mdls” (and return).
Maybe you might also use the following AppleScript statement in case there were some keywords in the Spotlight comments field of the file:
tell application "Finder" to return comment of F
Message was edited by: Pierre L.
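
On the dimensions question: PDF page sizes are expressed in points, normally 1/72 inch each, so the printed size does not depend on any PPI setting, whereas pixel dimensions do. The following is a minimal Python sketch that turns `mdls` output into inches, assuming the output includes the Spotlight attributes `kMDItemPageWidth` and `kMDItemPageHeight` (page size in points); the sample output and the helper names are hypothetical.

```python
import re

POINTS_PER_INCH = 72.0

# Hypothetical excerpt of `mdls` output for a letter-size PDF.
SAMPLE_MDLS_OUTPUT = """kMDItemFSName       = "sign-001.pdf"
kMDItemPageHeight   = 792
kMDItemPageWidth    = 612
"""


def parse_mdls(output):
    """Parse 'name = value' lines from mdls output into a dict of strings."""
    attributes = {}
    for line in output.splitlines():
        match = re.match(r"^(\w+)\s*=\s*(.*)$", line)
        if match:
            attributes[match.group(1)] = match.group(2).strip().strip('"')
    return attributes


def page_size_inches(attributes):
    """Convert the page width and height from points to inches."""
    width = float(attributes["kMDItemPageWidth"]) / POINTS_PER_INCH
    height = float(attributes["kMDItemPageHeight"]) / POINTS_PER_INCH
    return width, height


if __name__ == "__main__":
    attrs = parse_mdls(SAMPLE_MDLS_OUTPUT)
    print(attrs["kMDItemFSName"], page_size_inches(attrs))
```

Running this over every file and writing the name and size to a CSV would give a list that opens directly in Excel. Spotlight typically reports one page size per document, so files with mixed page sizes would need a per-page check in another tool.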
