Splitting a PDF with iText
Hi,
I'm trying to split a PDF into multiple PDF files, one PDF per bookmark.
Can anyone give any pointers?
I've got the bookmarks (via the outline). They are bookmarks within a bookmark, which is why I go two levels in. But I got lost trying to find the pages and write them out.
Thanks
Mike
import com.lowagie.text.pdf.*;
import java.util.*;
public class PDFSplit {
    public static void main( String argv[] ) throws Throwable {
        PdfReader reader = new PdfReader("test.pdf");
        PdfDictionary dic = reader.getCatalog();
        // The outline (bookmark tree) hangs off the document catalog.
        PdfObject o = dic.get( PdfDictionary.OUTLINES );
        PdfDictionary outline = (PdfDictionary)reader.getPdfObject( o );
        // Go two levels in: first child of the first top-level bookmark.
        PdfDictionary first = (PdfDictionary)reader.getPdfObject( outline.get( PdfName.FIRST ) );
        PdfDictionary ff = (PdfDictionary)reader.getPdfObject( first.get( PdfName.FIRST ) );
        // Walk the chain of sibling bookmarks and print each title.
        while( ff != null ) {
            String title = ff.get( PdfName.TITLE ).toString();
            System.out.println( title );
            PdfObject next = ff.get( PdfName.NEXT );
            if( next == null )
                ff = null;
            else
                ff = (PdfDictionary)reader.getPdfObject( next );
        }
    }
}
It's OK, I worked it out, but I do feel as if a little too much magic is keeping it together...
(very ugly code follows)
import com.lowagie.text.*;
import com.lowagie.text.pdf.*;
import java.io.*;
import java.util.*;
public class PDFSplit {
    public static void main( String argv[] ) throws Throwable {
        PdfReader reader = new PdfReader("test.pdf");
        PdfDictionary dic = reader.getCatalog();
        reader.consolidateNamedDestinations();
        PdfObject o = dic.get( PdfDictionary.OUTLINES );
        PdfDictionary outline = (PdfDictionary)reader.getPdfObject( o );
        PdfDictionary first = (PdfDictionary)reader.getPdfObject( outline.get( PdfName.FIRST ) );
        PdfDictionary ff = (PdfDictionary)reader.getPdfObject( first.get( PdfName.FIRST ) );
        int size = reader.getNumberOfPages();
        int start = 1;
        // Remember the first bookmark's title, then step to its next sibling.
        String title = ff.get( PdfName.TITLE ).toString();
        PdfObject next = ff.get( PdfName.NEXT );
        if( next == null ) ff = null;
        else ff = (PdfDictionary)reader.getPdfObject( next );
        while( ff != null ) {
            // The page this bookmark points at is where the previous section ends.
            PdfArray a = (PdfArray)reader.getPdfObject( ff.get( PdfName.DEST ) );
            PdfDictionary page = (PdfDictionary)reader.getPdfObject( (PdfObject)a.getArrayList().get(0) );
            int end = size;
            for( int i = 1; i<size;i++ ) {
                if( page.equals( reader.getPageN(i) ) ) {
                    end = i;
                    break;
                }
            }
            System.out.println( end );
            // Copy pages start..end-1 into a file named after the previous bookmark.
            Document document = new Document( reader.getPageSizeWithRotation( 1 ) );
            PdfCopy writer = new PdfCopy( document, new FileOutputStream( title + ".pdf" ) );
            document.open();
            for( int i = start; i<end; i++ ) {
                System.out.println( i );
                PdfImportedPage cpage = writer.getImportedPage( reader, i );
                writer.addPage( cpage );
            }
            PRAcroForm form = reader.getAcroForm();
            if (form != null)
                writer.copyAcroForm(reader);
            document.close();
            title = ff.get( PdfName.TITLE ).toString();
            start = end;
            next = ff.get( PdfName.NEXT );
            if( next == null )
                ff = null;
            else
                ff = (PdfDictionary)reader.getPdfObject( next );
        }
        // The last section runs from the last bookmark to the end of the document.
        Document document = new Document( reader.getPageSizeWithRotation( 1 ) );
        PdfCopy writer = new PdfCopy( document, new FileOutputStream( title + ".pdf" ) );
        document.open();
        for( int i = start; i<=size; i++ ) {
            System.out.println( i );
            PdfImportedPage cpage = writer.getImportedPage( reader, i );
            writer.addPage( cpage );
        }
        PRAcroForm form = reader.getAcroForm();
        if (form != null)
            writer.copyAcroForm(reader);
        document.close();
    }
}
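The page arithmetic buried in the loop above can be looked at on its own. What follows is only a sketch, not the poster's method: it assumes the 1-based start page of every bookmark is already known and sorted in ascending order, and the class name BookmarkRanges and its method are made up for illustration. It uses no iText calls at all.

```java
import java.util.ArrayList;
import java.util.List;

final class BookmarkRanges {
    /**
     * Returns one {firstPage, lastPage} pair (1-based, inclusive) per bookmark.
     * Assumes startPages is sorted ascending with no duplicates.
     */
    static List<int[]> ranges(int[] startPages, int totalPages) {
        List<int[]> out = new ArrayList<>();
        for (int k = 0; k < startPages.length; k++) {
            int first = startPages[k];
            // A section ends one page before the next bookmark, or at the last page.
            int last = (k + 1 < startPages.length) ? startPages[k + 1] - 1 : totalPages;
            out.add(new int[] {first, last});
        }
        return out;
    }

    public static void main(String[] args) {
        for (int[] r : ranges(new int[] {1, 4, 9}, 12)) {
            System.out.println(r[0] + "-" + r[1]);
        }
    }
}
```

Each pair could then drive a copy loop like the one in the post. Note that the posted code always starts the first file at page 1, whereas this sketch starts it at the first bookmark's page.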
Similar Messages
-
I am using the iText API to create a PDF file. However, when this PDF file is shown in a pop-up window, the file is empty.
One solution may be that I need to set the content length of the response sent to the browser; see
Some browsers also need to know the content-length of the PDF in advance (otherwise they just give you a blank page).
The only way we can work around this, is by buffering the complete file in a ByteArrayOutputStream.
That's a pity, because you risk a timeout in the browser-server communication if you need to send really big or time-consuming PDFs.
Document document = new Document();
ByteArrayOutputStream baos = new ByteArrayOutputStream();
PdfWriter.getInstance(document, baos);
document.open();
document.add(new Paragraph(msg));
document.close();
response.setContentType("application/pdf");
response.setContentLength(baos.size());
ServletOutputStream out = response.getOutputStream();
baos.writeTo(out);
out.flush();
I am using this code:
Document document = new Document(PageSize.A4);
ByteArrayOutputStream baos = new ByteArrayOutputStream();
PdfWriter.getInstance(document, baos);
document.open();
PdfPTable table = new PdfPTable(1);
PdfPCell cell;
cell = new PdfPCell(new Paragraph("ONE"));
table.addCell(cell);
cell = new PdfPCell(new Paragraph("TWO"));
table.addCell(cell);
document.add(table);
document.close();
byte[] pdfContent = baos.toString().getBytes("UTF-8");
IWDCachedWebResource pdfRes = WDWebResource.getPublicCachedWebResource(pdfContent,
WDWebResourceType.PDF, WDScopeType.CLIENTSESSION_SCOPE,
wdThis.wdGetAPI().getComponent().getDeployableObjectPart(),"FileNameHelloText");
IWDWindow window = wdComponentAPI.getWindowManager().createNonModalExternalWindow(pdfRes.getUrl(
WDFileDownloadBehaviour.OPEN_INPLACE.ordinal()), pdfRes.getResourceName());
window.setWindowPosition(WDWindowPos.CENTER);
window.show();
Please advise me on this...
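One thing that may be worth checking in the code above is the baos.toString().getBytes("UTF-8") step, since a PDF is binary data and a round trip through a String is generally not lossless. The sketch below uses plain Java only and a made-up byte sequence standing in for PDF output (the class name BinaryBufferDemo is hypothetical); it compares that round trip with toByteArray().

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class BinaryBufferDemo {
    // Hypothetical stand-in for PDF output: an ASCII header plus four bytes above 0x7F.
    static byte[] sampleBinary() {
        return new byte[] {'%', 'P', 'D', 'F', '\n',
                (byte) 0xE2, (byte) 0xE3, (byte) 0xCF, (byte) 0xD3};
    }

    // Takes the buffered bytes out unchanged.
    static byte[] viaToByteArray(byte[] data) {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        baos.write(data, 0, data.length);
        return baos.toByteArray();
    }

    // Decodes the bytes with the platform charset, then re-encodes as UTF-8.
    static byte[] viaToString(byte[] data) {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        baos.write(data, 0, data.length);
        return baos.toString().getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] data = sampleBinary();
        System.out.println("toByteArray preserved bytes: "
                + Arrays.equals(data, viaToByteArray(data)));
        System.out.println("toString preserved bytes: "
                + Arrays.equals(data, viaToString(data)));
    }
}
```

If the second comparison comes out false on your system, the same thing is probably happening to the PDF bytes before they reach the web resource.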
Edited by: M. Koevoets on Aug 14, 2009 9:58 AM -
Merge 2 PDFs dynamically with iText
I would like to merge two PDFs with iText. There is something special about the merge: the first PDF is created dynamically and does not exist on disk. I would like to merge it with an existing PDF (on my hard disk), but the result of the merge must be dynamic too.
I know there is a possibility to do this with iText, but I have no idea how I have to do it. The first (dynamic) pdf is created with iText too and it works fine.
Thanks,
Jiebke
iText has a sample application that merges two PDFs into another. If you examine that code and fiddle around a little bit, you should be able to figure out how to do what you want.
What you are trying to do is not that difficult with iText, and I'm sure you are up to the challenge!
- K -
How to write special characters in PDF using iText
How to write special characters encoded with UTF-8 in PDF using iText.
Regards,
Pandharinath.
I don't know what your problem is but that's almost certainly the wrong question to ask about it. Java (including iText) uses only Unicode characters. (You may consider some of them to be "special" if you like but Unicode doesn't.) And when it does that, they aren't encoded in UTF-8 or any other encoding.
So can you describe your problem? That question doesn't make sense. -
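To make the point about encodings concrete, here is a small plain-Java sketch (the class name EncodingDemo is made up for illustration). It shows that a String is a sequence of Unicode characters and that a byte encoding such as UTF-8 only comes into play when the String is converted to bytes.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class EncodingDemo {
    // Number of bytes the text occupies once encoded with the given charset.
    static int encodedLength(String text, Charset charset) {
        return text.getBytes(charset).length;
    }

    public static void main(String[] args) {
        String text = "caf\u00e9"; // four characters, the last one is e-acute
        System.out.println("chars: " + text.length());
        System.out.println("UTF-8 bytes: " + encodedLength(text, StandardCharsets.UTF_8));
        System.out.println("ISO-8859-1 bytes: " + encodedLength(text, StandardCharsets.ISO_8859_1));
    }
}
```

The same four-character String yields a different number of bytes depending on the charset chosen at conversion time, which is why the real question is usually where the text comes from and which font is used to render it.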
Creating PDF using ITEXT API's - error
Hi,
In my WebDynpro application I want to generate a PDF (using the iText APIs) from the data retrieved from the back-end system.
I used this source code.
Document document = new Document(PageSize.A4);
document.open();
PdfPTable table = new PdfPTable(1);
PdfPCell cell;
cell = new PdfPCell(new Paragraph("ONE"));
table.addCell(cell);
cell = new PdfPCell(new Paragraph("TWO"));
table.addCell(cell);
document.add(table);
document.close();
byte[] b = new byte[100 * 1024];
b = document.toString().getBytes("UTF-8");
IWDCachedWebResource pdfRes = WDWebResource.getPublicCachedWebResource(b, WDWebResourceType.PDF, WDScopeType.CLIENTSESSION_SCOPE, wdThis.wdGetAPI().getComponent().getDeployableObjectPart(),"FileNameHelloText"));
I have used Window Manager to create a external window with the URL from pdfRes.getUrl() method.
After execution I get a pop-up window without the PDF document.
Please let me know your thoughts & solutions to the above mentioned problem.
Thanks
Senthil
Hello Folks,
Use the following snippet of the code to generate PDF using ITEXT API.
Document document = new Document(PageSize.A4);
ByteArrayOutputStream bos = new ByteArrayOutputStream();
PdfWriter.getInstance(document, bos);
document.open();
PdfPTable table = new PdfPTable(1);
PdfPCell cell;
cell = new PdfPCell(new Paragraph("ONE"));
table.addCell(cell);
cell = new PdfPCell(new Paragraph("TWO"));
table.addCell(cell);
document.add(table);
document.close();
byte [] byteContent = bos.toByteArray();
IWDCachedWebResource cachedResource =
WDWebResource.getPublicCachedWebResource(
byteContent,
WDWebResourceType.PDF,
WDScopeType.CLIENTSESSION_SCOPE,
wdThis
.wdGetAPI()
.getComponent()
.getDeployableObjectPart(),
"TestPDF");
IWDWindow externalWindow =
wdComponentAPI
.getWindowManager()
.createExternalWindow(cachedResource.getURL(), "PDF Window",true);
externalWindow.open();
Thanks and Regards,
Gopi -
How do I split a pdf file when the file size is too large?
How do I split a pdf file when the file size is too large? Thanks!
With Adobe Acrobat. It can also optimize your document to make the size smaller.
-
Applescript or workflow to extract text from PDF and rename PDF with the results
Hi Everyone,
I get supplied hundreds of PDFs which each contain a stock code, but the PDFs themselves are not named consistently, or they are supplied as multi-page PDFs.
What I need to do is name each PDF with the code which is in the text on the PDF.
It would work like this in an ideal world:
1. Split PDF into single pages
2. Extract text from PDF
3. Rename PDF using the extracted text
I'm struggling with part 3!
I can get a textfile with just the code (using a call to BBEDIT I'm extracting the code)
I did think about using a variable for the name, but the rename function doesn't let me use variables.
Hello
You may also try the following AppleScript script, which is a wrapper around a RubyCocoa script. It will ask you to choose the source PDF files and a destination directory. Then it will scan the text of each page of the PDF files for the predefined pattern and save the page as a new PDF file in the destination directory, named with the string the pattern extracted. Pages that do not contain a string matching the pattern are ignored. (Ignored pages, if any, are reported in the result of the script.)
Currently the regex pattern is set to:
/HB-.._[0-9]{6}/
which means HB- followed by two characters and _ and 6 digits.
Minimally tested under 10.6.8.
Hope this may help,
H
_main()
on _main()
script o
property aa : choose file with prompt ("Choose pdf files.") of type {"com.adobe.pdf"} ¬
default location (path to desktop) with multiple selections allowed
set my aa's beginning to choose folder with prompt ("Choose destination folder.") ¬
default location (path to desktop)
set args to ""
repeat with a in my aa
set args to args & a's POSIX path's quoted form & space
end repeat
considering numeric strings
if (system info)'s system version < "10.9" then
set ruby to "/usr/bin/ruby"
else
set ruby to "/System/Library/Frameworks/Ruby.framework/Versions/1.8/usr/bin/ruby"
end if
end considering
do shell script ruby & " <<'EOF' - " & args & "
require 'osx/cocoa'
include OSX
require_framework 'PDFKit'
outdir = ARGV.shift.chomp('/')
ARGV.select {|f| f =~ /\\.pdf$/i }.each do |f|
url = NSURL.fileURLWithPath(f)
doc = PDFDocument.alloc.initWithURL(url)
path = doc.documentURL.path
pcnt = doc.pageCount
(0 .. (pcnt - 1)).each do |i|
page = doc.pageAtIndex(i)
page.string.to_s =~ /HB-.._[0-9]{6}/
name = $&
unless name
puts \"no matching string in page #{i + 1} of #{path}\"
next # ignore this page
end
doc1 = PDFDocument.alloc.initWithData(page.dataRepresentation) # doc for this page
unless doc1.writeToFile(\"#{outdir}/#{name}.pdf\")
puts \"failed to save page #{i + 1} of #{path}\"
end
end
end
EOF"
end script
tell o to run
end _main -
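The pattern used in the script above can also be tried in isolation. The sketch below applies the same regular expression with java.util.regex; the class name StockCodeFinder and the sample code string in main are made up for illustration and are not taken from any real PDF.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class StockCodeFinder {
    // Same pattern as in the script: "HB-", any two characters, "_", six digits.
    private static final Pattern CODE = Pattern.compile("HB-.._[0-9]{6}");

    // Returns the first matching code in the page text, if there is one.
    static Optional<String> firstCode(String pageText) {
        Matcher m = CODE.matcher(pageText);
        return m.find() ? Optional.of(m.group()) : Optional.empty();
    }

    public static void main(String[] args) {
        // Hypothetical page text.
        System.out.println(firstCode("Item HB-AB_123456 in stock").orElse("no match"));
        System.out.println(firstCode("nothing to see here").orElse("no match"));
    }
}
```

As in the script, only the first match on a page is used, so a page carrying two codes would be named after the first one found.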
Barcode failed to display in PDF with XFDF
Hi,
I am a newbie in this forum, so if my topic should not be posted here,
please kindly let me know.
I have a question about filling data into a PDF online and hope someone can help me.
I have an XFDF and a PDF, and the data will be filled into the PDF online. One of the fields
on the PDF is a code 3 of 9 (code39) barcode. The PDF has the code39 font embedded
in it, but the PDF does not format the text values as a barcode.
I found an article on the web showing how to display a barcode on a PDF with XFDF:
<field name="Barcode" type="barcode" symbology="pdf417" >
<value>
PDF-417 is a "stacked linear" symbology invented by
Symbol. The iText library makes it very easy to embed this
symbology in a PDF file.
</value>
</field>
Must I specify this format in the XFDF file such that the PDF knows the text field is
a barcode format?
Should I change the 'symbology="pdf417"' into 'symbology="code39"' ?
If you have any experience on this issue, please kindly share it with me.
Thanks a lot,
Raymond
That sounds like an old nVidia display driver bug.
Is there any possibility you took a Windows Update or something that ''updated'' your display driver to an older WHQL version?
Perhaps completely removing the driver and getting the very latest one from nVidia.com might be in order.
-Noel -
Can I split a PDF file by sheet when it contains customers with one or two sheets each?
For example, one PDF file containing only the customers with one sheet, and another PDF file containing only the customers with two sheets.
Did you try to extract the pages?
[signature deleted] -
Hi,
Is there a library that I can use for PDF (other than pdflib, which I use, but it can't merge PDFs with transparency)?
Thanks
alexkritchek
Try iText:
http://www.lowagie.com/iText/ -
Adding text to PDF using iText instead of CFPDF
Hi,
I know this may seem a bit off topic being posted here but i'm asking this board since i'm a complete JAVA noob and i figure some of you CF folk might have had to do this before.
Anyway, about my question: I'm already adding a watermark image to a PDF using iText (CF8), thanks to the help of fellow poster (=cfSearching=). What I'm looking for is the best way to go about adding some text to this same PDF. I need to add 4 lines of text (with a specific font and size) and center them underneath the added image. Does anyone have a site they could point me to that shows how to add formatted text and how to get the width of that text so as to align it correctly? I've searched Google and looked at a lot of Java code, but being a Java noob it's tough to figure out exactly which libs and methods can be used to do this.
Any help would be greatly appreciated!
-Michael
Hi again!
Well, the merged image is an idea but i'd rather have it be actual text so that it is at least copy/paste-able if viewed on a computer.
The four lines of text are dynamic (company name, broker name, phone number, email address) and limited to 40 characters. Right now they are being added via CFPDF and DDX and use the following code in the DDX file to add it to the PDF.
<PDF result="DestinationFile">
<PDF source="SourceFile">
<Watermark
rotation="0"
opacity="100%"
horizontalAnchor="#horzAnchor#"
horizontalOffset="#horzOffset#"
verticalAnchor="#vertAnchor#"
verticalOffset="#vertOffset#"
alternation="OddPages"
>
<StyledText text-align="center">
<p font="#font#" color="#color#" >#left(dCompany,maxlinechars)#</p>
<p font="#font#" color="#color#" >#left(dName,maxlinechars)#</p>
<p font="#font#" color="#color#" >#left(dPhone,maxlinechars)#</p>
<p font="#font#" color="#color#" >#left(dEmail,maxlinechars)#</p>
</StyledText>
</Watermark>
</PDF>
</PDF>
Then, using the PDF created above, I use a slightly modified version of the cfscript code (which uses iText) you provided me previously to add a logo image just above this text. The only changes I made to it were resizing the image and specifying where to place it. Here is that code:
<cfscript>
    fullPathToInputFile = "#tempdestfilepath#";
    writeoutput("<br>fullPathToInputFile=#fullPathToInputFile#");
    fullPathToWatermark = osFile("#request.logofilepath##qord.userlogo_file#",request.os);
    writeoutput("<br>fullPathToWatermark=#fullPathToWatermark#");
    fullPathToOutputFile = "#destfilepath#";
    writeoutput("<br>fullPathToOutputFile=#fullPathToOutputFile#");
    ppi = 72; // points per inch
    watermark_x = ceiling(#qord.pdftemplate_logo_x# * ppi); // from bottom left corner of pdf
    watermark_y = ceiling(#qord.pdftemplate_logo_y# * ppi); // from bottom left corner of pdf
    fh = ceiling(0.75 * ppi);
    fw = ceiling(1.75 * ppi);
    if( not fileexists(fullPathToInputFile) ) {
        savedErrorMessage = savedErrorMessage & "<li>Input file pdf for logo add does not exist<br>#fullPathToInputFile#</li>";
    }
    else {
        try {
            // create PdfReader instance to read in source pdf
            pdfReader = createObject("java", "com.lowagie.text.pdf.PdfReader").init(fullPathToInputFile);
            totalPages = pdfReader.getNumberOfPages();
            // create PdfStamper instance to create new watermarked file
            outStream = createObject("java", "java.io.FileOutputStream").init(fullPathToOutputFile);
            pdfStamper = createObject("java", "com.lowagie.text.pdf.PdfStamper").init(pdfReader, outStream);
            // Read in the watermark image
            img = createObject("java", "com.lowagie.text.Image").getInstance(fullPathToWatermark);
            w = img.scaledWidth();
            h = img.scaledHeight();
            //$is[0] = w
            //$is[1] = h
            if( w >= h ) {
                orientation = 0;
            }
            else {
                orientation = 1;
                // max_h and max_w are not defined in this excerpt
                fw = max_h;
                fh = max_w;
            }
            // shrink the image only if it is larger than the fw x fh box
            if ( w > fw || h > fh ) {
                if( ( w - fw ) >= ( h - fh ) ) {
                    iw = fw;
                    ih = ( fw / w ) * h;
                }
                else {
                    ih = fh;
                    iw = ( ih / h ) * w;
                }
                t = 1;
            }
            else {
                iw = w;
                ih = h;
                t = 2;
            }
            // adding content to each page
            i = 0;
            //while (i LT totalPages) {
            i = i + 1;
            content = pdfStamper.getOverContent( javacast("int", i) );
            img.setAbsolutePosition(javacast("float", watermark_x), javacast("float", watermark_y));
            if(t==1) {
                img.scaleAbsoluteWidth( javacast("float", iw) );
                img.scaleAbsoluteHeight( javacast("float", ih) );
            }
            content.addImage(img);
            WriteOutput("Watermarked page "& i &"<br>");
            //WriteOutput("Finished!");
        }
        catch (java.lang.Exception e) {
            savedErrorMessage = savedErrorMessage & "<li>#e#</li>";
        }
        // closing PdfStamper will generate the new PDF file
        if (IsDefined("pdfStamper")) {
            pdfStamper.close();
        }
        if (IsDefined("outStream")) {
            outStream.close();
        }
    }
</cfscript>
The above code resized the image to a certain width/height if needed and adds it to the pdf.
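The resize step can be checked separately from iText. The sketch below is not the logic of the script above: instead of comparing how far the width and height overshoot, it uses the smaller of the two scale ratios, which is one common way to make sure both dimensions end up inside the box. The class name ScaleToFit and the numbers in main are made up for illustration.

```java
final class ScaleToFit {
    /**
     * Returns {width, height} scaled down to fit inside maxW x maxH while
     * keeping the aspect ratio. Images that already fit are left unchanged.
     */
    static double[] fit(double w, double h, double maxW, double maxH) {
        // The smaller ratio is the limiting one; never scale up (cap at 1.0).
        double ratio = Math.min(1.0, Math.min(maxW / w, maxH / h));
        return new double[] {w * ratio, h * ratio};
    }

    public static void main(String[] args) {
        // Box of 1.75in x 0.75in at 72 points per inch, as in the script.
        double[] size = fit(200, 100, 1.75 * 72, 0.75 * 72);
        System.out.println(size[0] + " x " + size[1]);
    }
}
```

With a 200 x 100 image and the 126 x 54 point box, scaling by width alone would give a height of 63, which is taller than the box, so it may be worth comparing the two approaches on real logos.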
I just figured there might be a way to tap into one of the Java objects that would allow adding the text. Ideally, the text and image would be added to some sort of 'bounding box' that would allow centering both in relation to that box. Or, if there is no way to add to a bounding box, a way to get the horizontal length of the longest line of text so I could calculate a common centerline for the image and text.
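The centerline arithmetic itself is simple once the widths are known. The sketch below assumes the width of each text line has already been measured in points (in iText, BaseFont.getWidthPoint(text, fontSize) is probably the call to look at, but it is not exercised here); the class name CenterLine and the numbers in main are made up for illustration.

```java
final class CenterLine {
    // Left x coordinate that centers content of the given width inside a box.
    static double centeredLeft(double boxLeft, double boxWidth, double contentWidth) {
        return boxLeft + (boxWidth - contentWidth) / 2.0;
    }

    // Width of the widest line; this can serve as the bounding-box width.
    static double widest(double[] lineWidths) {
        double max = 0.0;
        for (double width : lineWidths) {
            max = Math.max(max, width);
        }
        return max;
    }

    public static void main(String[] args) {
        double[] lineWidths = {88.0, 120.5, 64.0, 101.25}; // hypothetical measurements
        double boxWidth = widest(lineWidths);
        double boxLeft = 36.0; // hypothetical left edge, in points
        for (double width : lineWidths) {
            System.out.println(centeredLeft(boxLeft, boxWidth, width));
        }
    }
}
```

The image can be placed with the same centeredLeft call using its scaled width, so that image and text share one centerline.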
I've attached the following pdf to show how the image and text would look together. This example is not to scale but a similar image and text would be added to a separate pdf.
Thanks for your help. -
Hi,
How can I split a PDF file by chapter, or split it into pages?
Are there any Java packages for this?
Can you please tell me?
Thanks,
nithi
http://www.lowagie.com/iText/
-
Split a pdf based on text?
Hello,
We'd like to split a large pdf (1200+ pages) into multiple files. We have Acrobat X Pro and in this instance, have thousands of records in the initial pdf with each record ending in 'End of Record'.
1.) Can we split the initial pdf into multiple files by simply telling Acrobat to create a new file each time it sees 'End of Record'?
2.) Can we (batch) insert top-level bookmarks after each 'End of Record' then use the Split Document command to create multiple files?
Any help is appreciated!
Possible with JavaScript, but it will be a fair amount of coding.
First you need to read each word on each page and search for the words "End of Record". This assumes that the string "End of Record" will not appear elsewhere in the text of the pages being extracted. It also assumes that the word string "End of Record" appears intact when the words are retrieved with getPageNthWord. The words are not returned in reading order but in the order in which the text was placed or plotted on the electronic page.
You could also just extract the pages based on finding the words "End of Record".
getPageNthWord
extractPages -
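The bookkeeping behind that approach can be sketched independently of Acrobat. An Acrobat script would be written in JavaScript; the version below is in plain Java purely to show the logic, and it assumes the text of each page has already been collected into a list and that a record never ends and restarts on the same page. The class name RecordSplitter is made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class RecordSplitter {
    /**
     * Returns one {firstPage, lastPage} pair (1-based, inclusive) per record,
     * closing a record on every page whose text contains the marker.
     */
    static List<int[]> ranges(List<String> pageTexts, String marker) {
        List<int[]> out = new ArrayList<>();
        int start = 1;
        for (int i = 0; i < pageTexts.size(); i++) {
            if (pageTexts.get(i).contains(marker)) {
                out.add(new int[] {start, i + 1});
                start = i + 2; // the next record begins on the following page
            }
        }
        // Trailing pages without a marker are kept as a final range.
        if (start <= pageTexts.size()) {
            out.add(new int[] {start, pageTexts.size()});
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> pages = Arrays.asList(
                "page one", "page two End of Record", "End of Record", "leftover");
        for (int[] r : ranges(pages, "End of Record")) {
            System.out.println(r[0] + "-" + r[1]);
        }
    }
}
```

Each resulting range would then correspond to one page-extraction call in whichever tool does the actual splitting.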
Populating Multiple Page PDF with FDF
Hi -
We have a need to populate a multiple page PDF (i.e., 40 fields across two distinct pages) with a FDF that we are creating in a separate scripting language (basically we are taking a user populated spreadsheet and want to populate the thousands of iterations of forms from it).
We are using Adobe Acrobat X and it works if we split the PDF into separate PDF files by page and do the same with the FDF...but can't seem to get it to work with using a single multiple page PDF and a single FDF.
We validated that each field has its own distinct name.
This is our first foray into working with PDFs...
Any thoughts?
Jason
It's hard to say what's wrong without looking at the PDF and the FDF. How many pages are in the PDF and how many total fields are in it?
-
How can I split a PDF doc in two?
Is it possible to split a PDF document in two, and if so, how can I do it?
You can't do it with Reader. You need Acrobat for that.