Parsing PDF, iText

Hi all, I just realized that you can't parse a PDF with iText. I'd like to insert a string of text at a 'marker'. Does anyone have a better tool or idea?
Thanks, Peter

Even just identifying the x,y coordinates of a 'marker' would help, because then I could drop the text at a hardcoded location.
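One way to get the coordinates is to let a text-extraction library report each run of text together with its position (iText 5's com.itextpdf.text.pdf.parser package, for example, hands TextRenderInfo objects to a RenderListener) and then scan those runs for the marker. The sketch below shows only the scanning step, in plain Java. TextChunk is a hypothetical holder for whatever such a listener collects, and this simple version will not find a marker that the PDF happens to split across two runs.

```java
import java.util.List;
import java.util.Optional;

/** Sketch: locate a marker string among positioned text runs. */
final class MarkerLocator {

    /** Hypothetical holder: one text run and its baseline start in PDF user space. */
    static final class TextChunk {
        final String text;
        final float x;
        final float y;
        final int page;

        TextChunk(String text, float x, float y, int page) {
            this.text = text;
            this.x = x;
            this.y = y;
            this.page = page;
        }
    }

    /** Returns the first run whose text contains the marker, or empty if none does. */
    static Optional<TextChunk> find(List<TextChunk> chunks, String marker) {
        for (TextChunk c : chunks) {
            if (c.text.contains(marker)) {
                return Optional.of(c);
            }
        }
        return Optional.empty();
    }
}
```

Once the marker's page and x,y are known, the new text can be drawn at that spot on the page's over-content layer (in iText 5, via PdfStamper.getOverContent(page)).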

Similar Messages

Parsing pdf form using PHP or JavaScript

Hello! How can I parse a PDF file containing a form to get each field's position and page number?
For example, some PDFs have a structure like this:
<</AcroForm 23 0 R/Metadata 2 0 R/Outlines 6 0 R/Pages 9 0 R/Type/Catalog>>
endobj
19 0 obj
<</DA(/ZaDb 0 Tf 0 g)/FT/Btn/Ff 49152/Kids[18 0 R 20 0 R]/T(Language)>>
endobj
23 0 obj
<</DA(/Helv 0 Tf 0 g )/DR<</Encoding<</PDFDocEncoding 26 0 R>>/Font<</Helv 22 0 R/ZaDb 35 0 R>>/XObject<</DSz 51 0 R>>>>/Fields[19 0 R 21 0 R 39 0 R 16 0 R 17 0 R 46 0 R 47 0 R 48 0 R]/SigFlags 1>>
endobj
25 0 obj
<</BBox[0.0 0.0 72.0 20.0]/FormType 1/Length 102/Matrix[1.0 0.0 0.0 1.0 0.0 0.0]/Resources<</Font<</Helv 22 0 R>>/ProcSet[/PDF/Text]>>/Subtype/Form/Type/XObject>>stream
1 g
0 0 72 20 re
f
/Tx BMC
q
2 1 68 18 re
How can I get a field's position from this data using PHP or JavaScript?
Thanks.

I solved my problem using the iText PDF library for Java. See: http://stackoverflow.com/questions/19066141/itext-get-field-coordinates-from-existing-pdf
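In the excerpt, object 19 is a radio-button group (/Ff 49152 sets the Radio and NoToggleToOff flags) with /Kids and no /Rect, so its on-page positions live in the kid widget annotations (18 0 R and 20 0 R). Each widget typically carries a /Rect [llx lly urx ury] array in PDF user space, and its page is given by the optional /P entry or by which page's /Annots array lists it. The sketch below, in Java like the iText solution above, only pulls /Rect out of the text of one uncompressed dictionary with a regular expression; the class name is hypothetical, and it is not a PDF parser (objects inside compressed object streams will not be visible this way).

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Sketch: read /Rect [llx lly urx ury] from the text of a single
 * uncompressed PDF dictionary. Ignores object streams and indirect values.
 */
final class WidgetRect {
    // A PDF real number: optional sign, digits with optional fraction, or ".5" style.
    private static final String NUM = "([-+]?(?:\\d+\\.?\\d*|\\.\\d+))";
    private static final Pattern RECT = Pattern.compile(
            "/Rect\\s*\\[\\s*" + NUM + "\\s+" + NUM + "\\s+" + NUM + "\\s+" + NUM + "\\s*\\]");

    /** Returns {llx, lly, urx, ury} in user-space units, or null if there is no /Rect. */
    static double[] rect(String dict) {
        Matcher m = RECT.matcher(dict);
        if (!m.find()) {
            return null;
        }
        double[] r = new double[4];
        for (int i = 0; i < 4; i++) {
            r[i] = Double.parseDouble(m.group(i + 1));
        }
        return r;
    }
}
```

Called on the parent dictionary of object 19 it returns null, which is the cue to follow /Kids to the widgets instead.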

How to return a pdf (iText) file using Portlet?

Dear all,
I want to know how to return a PDF file generated with the iText API from a portlet.
I can return a PDF file from a standalone servlet, but I have no idea how to do the same from a portlet.
Can anyone help me?
Thanks
George (HK)
Welcome to my blog at www.xanga.com/georgelkh

Hello,
that is easy. In your driver program, check the inputs of the job/generator function modules.
Set the GETPDF = 'X' parameter: it makes the function module return the binary stream of the PDF data (type FPCONTENT, a raw binary type) instead of printing the form or displaying the preview. Then read the PDF output parameter to get the returned binary data and send it wherever you need through JCo.
Regards Otto
P.S.: Note that there is an Adobe Forms forum under NetWeaver; you can find me and others who work with Adobe forms there every day.

PDF + IText Question

I have the contents of a PDF document as a byte[]. I send the byte[] to another web server, and I want to re-create the PDF from this byte[] on that server. How can I do this using iText?

// write the ByteArrayOutputStream to the ServletOutputStream
response.setContentType("application/pdf");
response.setContentLength(baos.size());
// ask browsers and proxies not to cache the generated PDF
response.setHeader("Cache-Control", "no-cache");
response.setHeader("Pragma", "no-cache");
response.setDateHeader("Expires", 0);
ServletOutputStream out = response.getOutputStream();
baos.writeTo(out);
out.flush();
// here baos is a ByteArrayOutputStream that contains the PDF output.
Regards
-John-
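On the receiving server no PDF library is needed to re-create the document: the byte[] already is the complete file, so it only has to be written out unchanged, to a FileOutputStream or to the servlet response as in John's code. The sketch below (hypothetical class name) adds a cheap check for the %PDF- header that a PDF file normally starts with and then writes the bytes verbatim. The usual pitfall is round-tripping the bytes through a String along the way, which corrupts binary data.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

/** Sketch: re-create a received PDF by writing its bytes unchanged. */
final class PdfBytes {

    /** True if the data starts with the "%PDF-" header. */
    static boolean looksLikePdf(byte[] data) {
        byte[] magic = "%PDF-".getBytes(StandardCharsets.US_ASCII);
        if (data == null || data.length < magic.length) {
            return false;
        }
        for (int i = 0; i < magic.length; i++) {
            if (data[i] != magic[i]) {
                return false;
            }
        }
        return true;
    }

    /** Writes the PDF bytes verbatim; never convert them to a String first. */
    static void write(byte[] pdf, OutputStream out) throws IOException {
        if (!looksLikePdf(pdf)) {
            throw new IOException("data does not start with %PDF-");
        }
        out.write(pdf);
        out.flush();
    }
}
```

In a servlet, PdfBytes.write(bytes, response.getOutputStream()) would follow the setContentType("application/pdf") and setContentLength(bytes.length) calls; for a file, pass a FileOutputStream instead.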

Parsing a PDF file programmatically in C#

Hi,
I want to parse a PDF file programmatically from C# code using Acrobat SDK 9.
I am using the following code to read the page information, but I can't read the text content.
            Acrobat.CAcroApp App = new Acrobat.AcroAppClass();
            App.Show();
            Acrobat.CAcroAVDoc avDoc = new Acrobat.AcroAVDoc();
            if (avDoc.Open(fnCtrl.Text, ""))
            {
                Acrobat.AcroAVPageView avPageView = (Acrobat.AcroAVPageView)avDoc.GetAVPageView();
                Acrobat.CAcroPDDoc pdDoc = (Acrobat.CAcroPDDoc)avDoc.GetPDDoc();
                nopCtrl.Text = pdDoc.GetNumPages().ToString();
                for (int iPage = 0; iPage < pdDoc.GetNumPages(); iPage++)
                {
                    Acrobat.AcroPDPage pdPage = (Acrobat.AcroPDPage)pdDoc.AcquirePage(iPage);
                    // reading this page's text content is the part that doesn't work yet
                }
            }
            App.CloseAllDocs();
            App.Exit();
Please help.

Hi,
I have already looked at that example. The problem is that the document contains a few special characters such as + and -, and these are dropped when I use the getPageNthWord API.

Parse PDF with ASP

Hello All,
I am currently trying to parse my PDF form with ASP. I am running IIS 5.0 and at the moment have a simple ASP page that uses Request.Form("NAME") to pull the contents from the PDF and then output them to the screen.
However, no data seems to be getting through.
My own thought is that the name is somehow incorrect, but I am not sure where to look for an Adobe-specific name. At the moment I am referencing the one in the "Name" box. If anyone knows how to find the alternative name, or what else could be my problem, I would be most appreciative.
The end goal is to have the user fill in a PDF form and then have that information sent to a database. If I can get it into an ASP page, I can work with it from there, but at the moment the PDF is quite resistant to parting with its data.
Thank you very much,
Casey

Request.Form("FieldName") works if you give the full field name, i.e. Form1.fieldname.
You can also loop over all submitted fields and match on part of the name:
For Each key In Request.Form
    If InStr(key, "FieldName") > 0 Then
        FieldData = Request.Form(key)
    End If
Next
/Ulf
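For readers on a Java stack, the same partial-name lookup can be sketched against the map that ServletRequest.getParameterMap() returns (parameter names mapped to String[] values). The helper name is hypothetical; it keeps every submitted name that contains the short field name, because PDF forms may submit fully qualified names such as form1.Name.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Sketch: find submitted form fields by partial name, like Ulf's ASP loop. */
final class FormFields {

    /** Returns name -> first value for every parameter whose name contains fieldName. */
    static Map<String, String> matching(Map<String, String[]> params, String fieldName) {
        Map<String, String> hits = new LinkedHashMap<>();
        for (Map.Entry<String, String[]> e : params.entrySet()) {
            String[] values = e.getValue();
            // Keep the first submitted value; skip names that arrived without a value.
            if (e.getKey().contains(fieldName) && values != null && values.length > 0) {
                hits.put(e.getKey(), values[0]);
            }
        }
        return hits;
    }
}
```

Printing the keys of the full map once is also a quick way to discover the exact names the PDF is submitting.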

PDF itext

Hi All
I am using SWT. For printing, I export all the values to a PDF and then print it.
So far I have saved the PDF to a location using FileOutputStream and then opened it using
Executable ex = new Executable();
ex.openDocument(strFilePath);
OR
Runtime.getRuntime().exec("rundll32 url.dll,FileProtocolHandler " + strFilePath);
This code requires a file path for the PDF. What I need is to open the PDF when the Print button is clicked, without saving it to a file first. How can I achieve this?
-vignesh

You can replace the FileOutputStream with a ByteArrayOutputStream. Call toByteArray(), which returns a byte[]. You can then wrap that in a ByteArrayInputStream. Note: this means you are keeping the entire PDF in memory, which can cause issues if you have either a large document or multiple users generating medium-sized documents.
- Saish
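Saish's suggestion can be sketched with the standard java.io classes alone. PdfWriterJob is a hypothetical interface standing in for whatever code currently writes the PDF to the FileOutputStream (for example an iText document opened on that stream); the sketch runs it against a memory buffer and hands back an InputStream over the finished bytes.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

/** Sketch: generate a PDF into memory instead of a file. */
final class InMemoryPdf {

    /** Hypothetical stand-in for the code that writes the PDF to a stream. */
    interface PdfWriterJob {
        void writeTo(OutputStream out) throws IOException;
    }

    /** Runs the job against a memory buffer instead of a FileOutputStream. */
    static InputStream render(PdfWriterJob job) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        job.writeTo(buffer);
        byte[] bytes = buffer.toByteArray();    // the complete PDF, held in memory
        return new ByteArrayInputStream(bytes); // for anything that accepts an InputStream
    }
}
```

An external viewer launched through rundll32 still needs a path, though; if that route is kept, writing the bytes to a file from Files.createTempFile(...) and calling toFile().deleteOnExit() on it is a common compromise.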

Parsing PDF for transfer to Access db?

We have a project to do the following:
1. Manually read a PDF document;
2. For each group of 1:m paragraphs, determine the type of content (e.g. Operational, Support, Organizational) and apply a custom stamp at the head of the group;
3. When the document has been completely stamped, manually transfer each of the stamped sections (i.e. 1:m paragraphs) to its own record in an Access database.
We are trying to automate step 3, but it does not appear that we can export the document and the stamps together. We can export the Comments List to an XFDF file, but the stamps are not included. There may not be a simple way of transferring the data to the Access database, but can any part of the transfer be easily automated? Manually copying and pasting text from the PDF into an Access form is extremely tedious. Any ideas? Thanks.

The error message tells you what the method returns -- it returns a ResultSet. For that matter, the API documentation tells you that too. Can you imagine how you might get a number out of a ResultSet?

Parse PDF object for input values

Hi,
can somebody tell me how I can access the values entered into an offline PDF document? Or can anybody tell me where I could find appropriate documentation?
Many thanks!
Birgit

Hello.
You can use the method transferPDFDataIntoContext from WDInteractiveFormHelper to populate your context with the data entered in the offline PDF.
http://help.sap.com/javadocs/NW04S/current/wd/com/sap/tc/webdynpro/clientserver/adobe/api/WDInteractiveFormHelper.html
In the application that reads/uploads the PDF, you have to define the same context structure that was used to generate it.
Something like:
WDInteractiveFormHelper.transferPDFDataIntoContext(wdContext.currentContextElement().getPdfSource(), wdContext.nodePDFData());
Regards,
Paulo.

Exception while PDF Parsing through PDFBOX jar

While parsing a PDF file, I got the following exception. I used the PDFBox-0.7.3 jar for PDF parsing.
java.io.IOException: expected='endobj' firstReadAttempt='' secondReadAttempt='' org.apache.pdfbox.io.PushBackInputStream@1027b4d
     at org.apache.pdfbox.pdfparser.PDFParser.parseObject(PDFParser.java:485)
     at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:165)
     at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:847)
     at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:814)
     at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:785)
     at TestPDFReader.extractPDFContents(TestPDFReader.java:50)
     at TestPDFReader.getFileContents(TestPDFReader.java:29)
     at TestPDFReader.main(TestPDFReader.java:70)
Can anyone suggest a solution?
Thanks in Advance
Sandeep Verma

Hi,
Thanks for your suggestion. I believe the environment path was not set correctly; it's working now.
I ran the "C:\Adobe\Adobe LiveCycle ES4\pdfg_config\Acrobat_for_PDFG_Configuration.bat" batch file and restarted the server, and it worked.
Thanks,
Tejraj
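For the PDFBox exception above, one inexpensive check is whether the input file is truncated or damaged, which is one possible reason for a parser to fail with expected='endobj'. A complete PDF normally ends with a %%EOF marker, so the sketch below (hypothetical class name) looks for it in the last 1024 bytes. Passing the check does not prove the file is well-formed, and trying a current PDFBox release, which may be more tolerant of damaged files, is also worthwhile.

```java
import java.nio.charset.StandardCharsets;

/** Heuristic sketch: does the file end the way a complete PDF usually does? */
final class PdfSanity {

    /** True if "%%EOF" appears within the last 1024 bytes of the data. */
    static boolean hasEofMarker(byte[] data) {
        byte[] marker = "%%EOF".getBytes(StandardCharsets.US_ASCII);
        int start = Math.max(0, data.length - 1024);
        // Scan backwards from the last position where the marker could fit.
        for (int i = data.length - marker.length; i >= start; i--) {
            boolean match = true;
            for (int j = 0; j < marker.length; j++) {
                if (data[i + j] != marker[j]) {
                    match = false;
                    break;
                }
            }
            if (match) {
                return true;
            }
        }
        return false;
    }
}
```

Running it on the bytes as they arrive (before PDDocument.load) helps tell a broken transfer apart from a parser limitation.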

Search for text in PDF binary

Hello experts,
using an SAP BI tool, we generate reports as files. The SAP standard does the generation and returns an internal table with the filename and the file content in binary form.
Now we would like to search this PDF binary for a specific text or string and use it to change the filename.
Is there any way to do that? Every idea and hint is welcome.
Best regards,
Peter

Now we would like to search in this PDF binary for an special text or string to use them for changing filename. Is there any way to do that?
Based on your posting it sounds a bit like you're doing ABAP processing. However, I'll ignore that for now and just say that in the Java environment I have had good experience with the Java Library [iText PDF|http://itextpdf.com/]. I'm not sure what SAP offers in that area, but they must have something, because [TREX|http://help.sap.com/saphelp_nw70/helpdata/EN/a4/929d4206b70931e10000000a1550b0/frameset.htm] "understands" PDF (though that doesn't mean that you have a nice API for parsing PDFs).
You probably investigated this already, but I'd take a look at possibilities to hook in before (or at the time) the PDF gets generated (might be easier to craft and export a filename there). Thanks to the [enhancement framework|http://help.sap.com/saphelp_nw70ehp2/helpdata/en/94/9cdc40132a8531e10000000a1550b0/frameset.htm] you usually have quite a few ways to get things done...
Note that even if you're able to read a PDF, it doesn't necessarily mean that you can parse it the way you want. A simple example would be scanned pages, where each page is stored as an image and at best the scanner software runs some OCR (with possibly buggy results) to make the PDF searchable. In your case that's probably not an issue, but the question remains whether the information you're looking for is structured enough to get it back...
Cheers, harald
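Harald's caution about raw binaries can be made concrete: in most generated PDFs the page text sits inside content streams compressed with /FlateDecode, so searching the raw bytes for a string usually fails. The Java sketch below (class name hypothetical) searches the raw bytes and then every stream between the stream and endstream keywords that java.util.zip.Inflater can decompress. It is only a heuristic: fonts with custom or CID encodings store glyph codes rather than readable characters, a string may be split across several text operators, and other filters and compressed object streams are not handled.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.DataFormatException;
import java.util.zip.Inflater;

/** Heuristic text search in a PDF binary, including Flate-compressed streams. */
final class PdfTextSearch {

    /** True if needle occurs in the raw bytes or in any stream that inflates. */
    static boolean contains(byte[] pdf, String needle) {
        // ISO-8859-1 maps each byte to exactly one char, so string offsets equal byte offsets.
        String raw = new String(pdf, StandardCharsets.ISO_8859_1);
        if (raw.contains(needle)) {
            return true;
        }
        int from = 0;
        while (true) {
            int kw = raw.indexOf("stream", from);
            if (kw < 0) {
                return false;
            }
            int dataStart = kw + "stream".length();
            // The keyword is followed by CRLF or LF before the stream data starts.
            if (raw.startsWith("\r\n", dataStart)) {
                dataStart += 2;
            } else if (raw.startsWith("\n", dataStart)) {
                dataStart += 1;
            }
            int end = raw.indexOf("endstream", dataStart);
            if (end < 0) {
                return false;
            }
            String decoded = inflate(pdf, dataStart, end - dataStart);
            if (decoded != null && decoded.contains(needle)) {
                return true;
            }
            from = end + "endstream".length();
        }
    }

    /** Inflates zlib data; returns null if the bytes are not zlib/Flate data. */
    static String inflate(byte[] buf, int off, int len) {
        Inflater inflater = new Inflater();
        inflater.setInput(buf, off, len);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] chunk = new byte[4096];
        try {
            while (!inflater.finished()) {
                int n = inflater.inflate(chunk);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    break; // truncated or dictionary-based data: keep what was decoded
                }
                out.write(chunk, 0, n);
            }
        } catch (DataFormatException e) {
            return null; // e.g. an uncompressed stream or a different filter
        } finally {
            inflater.end();
        }
        return new String(out.toByteArray(), StandardCharsets.ISO_8859_1);
    }
}
```

The same approach applies to the binary content in an ABAP internal table once it is handed to a Java side, but a proper PDF text-extraction library is the more reliable route when fonts use non-standard encodings.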

Search for text in PDF by VBA with only Adobe Reader installed

My problem is widely known and frequently posted, for instance:
"Can anyone help me to open and search for a specific text string in a PDF document, return a true or false indicator (and nothing else)?"
The answers mostly refer to and include
Set gApp = CreateObject("AcroExch.App")
which, as I understand it, works only if a certain edition of Adobe Acrobat is installed.
My question now:
I want to give this type of functionality (via an MS Access form, i.e. populating a ComboBox with the names of PDF files that contain certain text) to about 20 users in my company who have only Adobe Reader 9.1 installed.
Buying that many Adobe Acrobat licenses just for this purpose would be massive overkill, which I can't afford.
Any suggestions? Many thanks in advance.


How can I get a match percentage when comparing PDF files (comparing the strings)?

I want a match percentage when comparing PDF files. So far, I have read the content of each PDF file into a string.
My code is as follows:
using System;
using System.Collections.Generic;
using System.ComponentModel;
using System.Data;
using System.Drawing;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
using System.Windows.Forms;
using System.IO;
using iTextSharp.text;
using iTextSharp.text.pdf;
using iTextSharp.text.pdf.parser;

namespace WindowsFormsApplication1
{
    public partial class Form1 : Form
    {
        string str1;
        string filename;
        string path;
        string str2;

        public Form1()
        {
            InitializeComponent();
        }

        private void button1_Click(object sender, EventArgs e)
        {
            OpenFileDialog openFileDialog = new OpenFileDialog();
            openFileDialog.CheckFileExists = true;
            openFileDialog.AddExtension = true;
            openFileDialog.Filter = "PDF files (*.pdf)|*.pdf";
            DialogResult result = openFileDialog.ShowDialog();
            if (result == DialogResult.OK)
            {
                // System.IO.Path is fully qualified: newer iTextSharp versions also define a Path class in the parser namespace
                filename = System.IO.Path.GetFileName(openFileDialog.FileName);
                path = System.IO.Path.GetDirectoryName(openFileDialog.FileName);
                textBox1.Text = path + "\\" + filename;
            }
        }

        private void button2_Click(object sender, EventArgs e)
        {
            OpenFileDialog openFileDialog = new OpenFileDialog();
            openFileDialog.CheckFileExists = true;
            openFileDialog.AddExtension = true;
            openFileDialog.Filter = "PDF files (*.pdf)|*.pdf";
            DialogResult result = openFileDialog.ShowDialog();
            if (result == DialogResult.OK)
            {
                filename = System.IO.Path.GetFileName(openFileDialog.FileName);
                path = System.IO.Path.GetDirectoryName(openFileDialog.FileName);
                textBox2.Text = path + "\\" + filename;
            }
        }

        // Concatenates the extracted text of every page (iTextSharp pages are 1-based).
        public static string ExtractTextFromPdf(string filename)
        {
            using (PdfReader r = new PdfReader(filename))
            {
                StringBuilder text = new StringBuilder();
                for (int i = 1; i <= r.NumberOfPages; i++)
                {
                    text.Append(PdfTextExtractor.GetTextFromPage(r, i));
                }
                return text.ToString();
            }
        }

        private void button3_Click(object sender, EventArgs e)
        {
            // The same extraction method serves both files.
            str1 = ExtractTextFromPdf(textBox1.Text);
            str2 = ExtractTextFromPdf(textBox2.Text);
        }
    }
}
Finally, how can I get a match percentage when comparing the PDF files (i.e. comparing the strings)? Please help me. Thank you.
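One common way to turn two extracted strings into a percentage is an edit-distance ratio: (1 - distance / longerLength) * 100, where the distance is the Levenshtein distance (the minimum number of single-character insertions, deletions, and substitutions). The sketch below is in Java, like the other examples on this page, and ports line for line to C#; the class name and the whitespace normalization step are illustrative choices, not part of iTextSharp. The algorithm is O(n*m) in time, so for long documents comparing page by page, or comparing words instead of characters, keeps it fast.

```java
/** Sketch: similarity percentage of two texts based on Levenshtein distance. */
final class TextSimilarity {

    /** Collapses runs of whitespace, since extraction often differs only in spacing. */
    static String normalize(String s) {
        return s.replaceAll("\\s+", " ").trim();
    }

    /** Minimum number of single-character edits turning a into b (two-row DP). */
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // turning "" into b's prefix takes j insertions
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // turning a's prefix into "" takes i deletions
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    /** 100.0 for identical texts, lower as more edits are needed. */
    static double percent(String a, String b) {
        int max = Math.max(a.length(), b.length());
        if (max == 0) {
            return 100.0;
        }
        return (1.0 - (double) levenshtein(a, b) / max) * 100.0;
    }
}
```

Usage: percent(normalize(str1), normalize(str2)) on the two extracted texts; for example, "abcd" versus "abcf" differ by one substitution out of four characters and give 75.0.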

Hi,
Based on your code, I see that it uses iTextSharp.
iText is a third-party library for creating PDFs, originally written for Java; iTextSharp is the C# port of that library.
Questions about iText are better asked on the iText forum than on the Microsoft forum:
http://itextpdf.com/support
Thanks for your understanding.
Best regards,
Kristin

Problem opening some PDF files in Adobe Acrobat

I have several PDF files that cannot be opened in Adobe Acrobat. I get the message below:
There was an error opening this document. The file is damaged and could not be repaired.
I first discovered the problem in Adobe Reader: the files cannot be opened in Adobe Reader 9, X, or XI, on Windows 7 64-bit or Windows Server 2008 R2. The original post in the Adobe Reader forum is at this link: http://forums.adobe.com/message/6163936
BUT the files are NOT damaged, because I can open them with another program called XChange PDF Viewer.
Where can I file a bug report with Adobe so they can fix the bug?
Thanks in advance.

The Acrobat Family complies with the ISO standard for parsing PDF files, and it will reject files that contain unrecoverable errors. Third-party applications may decide to ignore the errors or display the corrupted content anyway, but that's their decision.

Generate reports in PDF or EXCEL from Web App...

Hello, I have to generate some reports from a database in a Java web app. These reports, which have a few generation criteria, must be available in both PDF and Excel format. Does anyone know a tool I can use to generate them?
Best regards.

PDF: iText
Excel: Apache POI
