Stapler - A python utility for manipulating PDF docs based on pypdf

Link to github:
http://github.com/hellerbarde/stapler/tree/master

* Dependencies *
Stapler depends only on the packages python and python-pypdf, both of
which can be found in the Arch Linux repositories.

* History *
Stapler is a pure Python replacement for PDFtk, a tool for manipulating PDF
documents from the command line. PDFtk was written in Java and natively
compiled with gcj. It was discontinued a few years ago and bitrot is
setting in (i.e. it does not compile anymore on Arch Linux).
Since I used it quite a lot, I decided to look for an alternative and found
pypdf, a PDF library written in pure Python. I couldn't find a tool which
actually uses the library, so I started writing my own.
At some point I plan on providing a GUI, but the command line version will
always exist.

* License *
A simplified BSD-style license describes the terms under which Stapler is
distributed. A copy of the license can be found in the file "LICENSE".

* Usage *
I am too lazy at the moment to learn how to create a proper man page, so this
has to suffice.
There are 4 modes in Stapler:

- cat:
    Works like the normal unix utility "cat", meaning it con_cat_enates files.
    The syntax is delightfully simple:
    Syntax:
        stapler cat input1 [input2 input3 ...] output
    Example:
        stapler cat a.pdf b.pdf c.pdf output.pdf
        # this would append "b.pdf" and "c.pdf" to "a.pdf" and write the whole
        # thing to "output.pdf"
    You can specify as many input files as you want; it always concatenates
    all but the last file specified and writes the result into the last file
    specified.
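
The "all but the last file" rule for cat can be sketched in a few lines of Python. This is only an illustration of the argument convention described above, not Stapler's actual code; the function name split_cat_args is made up for this example.

```python
def split_cat_args(args):
    """Split the arguments of `stapler cat` into (inputs, output).

    Hypothetical helper: everything except the last argument is treated
    as an input file, and the last argument is the output file.
    """
    if len(args) < 2:
        raise ValueError("cat needs at least one input file and one output file")
    return list(args[:-1]), args[-1]


if __name__ == "__main__":
    inputs, output = split_cat_args(["a.pdf", "b.pdf", "c.pdf", "output.pdf"])
    print(inputs)  # ['a.pdf', 'b.pdf', 'c.pdf']
    print(output)  # output.pdf
```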

- split:
    Splits the specified PDF files into their single pages and writes each page
    into its own PDF file with this naming scheme:
        ${origname}p${zeropaddedpagenr}.pdf
    Syntax:
        stapler split input1 [input2 input3 ...]
    Example for a file foobar.pdf with 20 pages:
        $ stapler split foobar.pdf
        $ ls
        foobarp01.pdf foobarp02.pdf ... foobarp19.pdf foobarp20.pdf
    Multiple files can be specified; they will be processed as if you had
    called a separate instance of stapler for each one.
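
The naming scheme for split can be sketched as follows. This is an illustration only: the helper name split_output_names is hypothetical, and the padding width (as many digits as the page count has) is an assumption inferred from the 20-page example above, not something confirmed by Stapler itself.

```python
import os


def split_output_names(path, page_count):
    """Return the per-page file names for one input PDF.

    Assumption: page numbers are zero-padded to the number of digits
    in page_count, which matches the foobar.pdf example (20 pages, 2 digits).
    """
    base = os.path.splitext(os.path.basename(path))[0]
    width = len(str(page_count))
    return [f"{base}p{n:0{width}d}.pdf" for n in range(1, page_count + 1)]


if __name__ == "__main__":
    names = split_output_names("foobar.pdf", 20)
    print(names[0], names[-1])  # foobarp01.pdf foobarp20.pdf
```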

- select/delete (called with sel and del respectively):
    These are the most sophisticated modes. With select you can cherry-pick
    pages out of PDFs and concatenate them into a new PDF file.
    Syntax:
        stapler sel input1 page_or_range [page_or_range ...] [input2 p_o_r ...] output
    Example:
        stapler sel a.pdf 1 4-8 20-40 b.pdf 1-5 output.pdf
        # this generates a pdf called output.pdf with the following pages:
        # 1, 4-8, 20-40 from a.pdf, 1-5 from b.pdf in this order
    What you _cannot_ do yet is omit the ranges. I will probably merge
    select and cat at some point in the future so that you can specify pages and
    ranges, and if you don't, it just uses the whole file.
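
One way to read a sel command line is sketched below. It is not Stapler's real parser: parse_sel_args and expand_page_token are hypothetical names, and the sketch assumes that any argument that looks like "N" or "N-M" is a page or range and everything else is a file name (so a file literally named "7" would be misread).

```python
import re

# A page ("4") or an inclusive range ("4-8").
PAGE_OR_RANGE = re.compile(r"^(\d+)(?:-(\d+))?$")


def expand_page_token(token):
    """Turn "4" into [4] and "4-8" into [4, 5, 6, 7, 8]."""
    match = PAGE_OR_RANGE.match(token)
    if match is None:
        raise ValueError(f"not a page or range: {token!r}")
    first = int(match.group(1))
    last = int(match.group(2) or first)
    if first < 1 or last < first:
        raise ValueError(f"invalid page or range: {token!r}")
    return list(range(first, last + 1))


def parse_sel_args(args):
    """Group sel arguments into ([(input_file, pages), ...], output_file)."""
    if len(args) < 3:
        raise ValueError("sel needs an input file, a page or range, and an output file")
    *tokens, output = args
    selections = []
    for token in tokens:
        if PAGE_OR_RANGE.match(token):
            if not selections:
                raise ValueError("page or range given before any input file")
            selections[-1][1].extend(expand_page_token(token))
        else:
            selections.append((token, []))
    for name, pages in selections:
        if not pages:
            # Mirrors the restriction above: ranges cannot be omitted yet.
            raise ValueError(f"no pages or ranges given for {name!r}")
    return selections, output


if __name__ == "__main__":
    sel, out = parse_sel_args("a.pdf 1 4-8 20-40 b.pdf 1-5 output.pdf".split())
    print(sel[1])  # ('b.pdf', [1, 2, 3, 4, 5])
    print(out)     # output.pdf
```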
    The delete command works almost exactly like select, but inverted:
    it keeps the pages and ranges which you _didn't_ specify and drops the
    ones you did.
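
The inversion that del performs can be sketched as a simple complement over the page numbers of one document. The helper name pages_to_keep is made up for this example, and it assumes 1-based page numbers as in the sel example.

```python
def pages_to_keep(page_count, deleted):
    """Return the 1-based page numbers that remain after deleting `deleted`.

    Hypothetical helper: pages are kept in their original order, and page
    numbers outside 1..page_count are simply ignored.
    """
    drop = set(deleted)
    return [page for page in range(1, page_count + 1) if page not in drop]


if __name__ == "__main__":
    # Deleting "1 4-8" from a 10-page document leaves pages 2, 3, 9 and 10.
    print(pages_to_keep(10, [1, 4, 5, 6, 7, 8]))  # [2, 3, 9, 10]
```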

Contact me if you have questions about usage or anything, really.
2009, Philip Stark (heller <dot> barde <at> gmail <dot> com)
Last edited by Heller_Barde (2009-08-05 21:02:54)

firecat53 wrote:
Are you planning on eventually adding the full functionality of pdftk (rotate, watermark, encrypt, etc.)? I occasionally use pdftk, and I'd like to see a replacement since it's apparently having trouble keeping up to date. Great work!
Scott
If you help me with ideas for the command syntax, sure, why not. pypdf supports these things. Can you make a complete list of things pdftk does that would be important to port over, and how the command line syntax for these features works? I'll then get working.
EDIT: I just had a look at the pdftk man page (didn't think of that when I wrote the above...). There are some things that will not be possible with the current version of pypdf (and frankly I doubt there is going to be a new release):
update_info (because there is no way to write document properties with pypdf; you can read them just fine though)
fill_form (similar reason: it's just not supported)
The rest will be fine. I'll rename split to burst and then I'll mimic the cat function. The others should be no problem either.
Although... I didn't find anything on rotate in the pdftk manual. How should I implement a rotate function? Rotate complete documents or single pages?
cheers
Phil
Last edited by Heller_Barde (2009-08-06 07:08:42)

Similar Messages

Help for printing PDF docs.

Hi, I am trying to print a PDF doc, and a window appears saying "Prop Res DLL not loaded"!
I can print everything except PDF files.
Thanks for your help.
PS: What is the meaning of the error "Prop ..."?
This message was edited by: chomedey657

Best Practice for storing PDF docs

My client has a number of PDF documents for handouts that go
with his consulting business. He wants logged in users to be able
to download the PDF docs for handouts at training. The question is,
what is the 'Best Practice' for storing/accessing these PDF files?
I'm using CF/MySQL to put everything else together and my
thought was to store the PDF files in the db. Except! there seems
to be a great deal of talk about BLOBs and storing files this way
being inefficient.
How do I make it so my client can use the admin tool to
upload the information about the files and the files themselves,
not store them in the db but still be able to find them when the
user wants to download them?

Storing documents outside the web root and using
<cfcontent> to push their contents to the users is the most
secure method.
Putting the documents in a subdirectory of the web root and
securing that directory with an Application.cfm will only protect
.cfm and .cfc files (as that's the only time that CF is involved in
the request). That is, unless you configure CF to handle every
request.
The virtual directory is no safer than putting the documents
in a subdirectory. The links to your documents are still going to
look like:
http://www.mysite.com/virtualdirectory/myfile.pdf
Users won't need to log in to access these documents.
<cfcontent> or configuring CF to handle every request
is the only way to ensure users have to log in before accessing
non-CF files. Unless you want to use web-server
authentication.

Used to be able to print multiple-page PDF files on my HP 7310 all-in-one, and then it stopped and would only print the current page. This is tedious for long PDF docs. I am on 10.6.8.

I used to be able to print multiple-page PDF files on my HP 7310 all-in-one printer, and now I can only print the current page, so it takes forever printing them one by one. I am on version 10.6.8. I tried printing as an image using the Advanced button and that did not work either. I have Adobe Reader 9.0 installed.

I tried this earlier, and I tried it again today. Both times it said "Software Missing!" and "HP Software required to connect to your printer over the network could not be found on this computer." But when I tried to install the software (AIO_CDB_Net_Full_Win_WW_130_141.exe, which I downloaded from HP's web site), it wouldn't install, as described above. In the diagnostic utility, I clicked "skip", and it said "Connection Verified!", and "The printer is connected to the network and the services related to the network connections have been verified and reset to a normal operating state. Everything appears to work fine at this point. Please do a test print to verify that the issue is resolved or click Skip to move to the next step." I clicked the "Test Print" button, and immediately it popped up a box that said "Test Print Failed."
I tried again to install the HP software, and it installed, detected the printer, and asked me to select it. I selected the printer, clicked "next", and it did its network diagnostics. Then it said "Problem(s) found with your network" and "Problem(s) may exist with the network functions of your printer . . ." I continued the installation without connecting to the printer. Then I ran your network printer diagnostic tool again, and got the same result: "HP Software required to connect to your printer over the network could not be found on this computer."

Exception in PreFlight for a PDF doc

Now I want to call it from a VB console app to get the errors. I have the following code:
PDFApp = CreateObject("AcroExch.App")
PDDoc = CreateObject("AcroExch.PDDoc")
PDDoc.Open(fullPathPDF)
pdfPage = PDDoc.AcquirePage(0)
pdfPoint = pdfPage.GetSize
' Create AV doc from PDDoc object
AVDoc = PDDoc.OpenAVDoc("TempPDF")
' Hide Acrobat application so everything is done in silent mode()
PDFApp.Hide()
app = PDDoc.GetJSObject()
app.PreFilght() //This is throwing exception Public member 'PreFilght' on type '_ComObject' not found.
How do I get the errors from Preflight in VB code?

'PreFilght' doesn't exist. You may use Preflight.

Report region with column link that opens a pdf doc based on report query

Hello
I'm building a report table that displays info about a customer - simple select - and, for each record, has associated column links based on report queries that receive ID as parameter. When clicked, it opens the report in pdf extension. My problem here is how to pass the ID as a parameter to that report query considering i'm using a report table and that there are no items in page 71...
This is the report query i'm using:
select initcap(a.customer) customer
, initcap(a.address) address
, initcap(a.rep) rep
, (select initcap(b.city)
from portal_records b
where b.contrib=a.contrib
and b.year=to_char(sysdate,'yyyy')) city
, (to_char(a.datereg,'dd')||' de '||to_char(a.datereg,'Month')||' de '||to_char(a.datereg,'yyyy')) datereg
from portal_authorizations_cve a
where a.id=:P71_ID ???????????????
I thank in advance all your replies!!

Hello
First of all, let me compliment you on your demo application... It's awesome!
I've looked into your sample (page 15) and, as far as i see, it opens a document saved in a table's column. I don't want the file to be saved there but generated when the user clicks on that particular link... So i still have the problem of how to pass the right ID as a parameter considering there is no page item on that page...
My javascript knowledge is little so i ask you: when clicking the link, is there any way of opening a window with the url f?p=&APP_ID.:0:&SESSION.:PRINT_REPORT=Authorization_CVE and the ID as a parameter?
I thank in advance!

PDF Docs in browser cannot be found (15;524)

I have received the following message: The Adobe/Reader selected for viewing PDF docs in browsers cannot be found at its installed location; it may have been moved or deleted.
Please reinstall or repair the application (15;524)
Also, some games on Facebook have frozen while loading, and I have been advised that my Flash Player needs to be updated. Can you advise how I can do this?

Moving this discussion to the Adobe Reader forum.

TREX search failure for some pdf documents

Hi,
TREX search is not returning correct results for some PDF documents. It is not able to read the content of some PDF documents. When we search by file name, the search result is correct, but we get a "No document excerpt available" message in the search result for that file.
Let me know where the problem might be for those PDF docs.
Thanks

Hi Tatayya Marni,
1. Can you tell me whether the PDF documents open when you select them?
2. This error mostly occurs when you haven't included the file extension (.pdf in your case) in the portal.
3. Check whether this extension is there in your portal; if not, include it, then try creating a new index and a new data source where your PDF resides. Try it with one test.
You have to include the file extns in System Confi....UI Interface...
Hope this will resolve your issue.
Regards
Piyush Bhurangi

DW 8 and PDF Docs

I'm having trouble getting PDF docs to load in Dreamweaver 8.
All of the links look fine and it loads on another machine just
fine. Does anyone have any suggestions? Any help will be greatly
appreciated.

What happens when the second machine clicks on a PDF link?
If everything else is accurate, as you say, then the prime culprit is
Acrobat. Depending on the machine, the browsers, the plug-in, and the phase
of the moon (which is nearly full), different machines can handle PDFs
differently.
Does the PDF not open in the browser?
Does it open in Acrobat Reader?
Does it open in full version Acrobat?
Does nothing happen?
MD
murphy#1 wrote:
> It works on my primary machine where I actually put the website
> together. Copied files and folders (so everything would match) and
> recreated links on secondary machine. All links work except for the
> pdf docs and the links are identical. Checked and double-checked
> those. I'm thinking about reinstalling Adobe on the secondary
> machine and printing those files again through the Distiller and
> saving them in the correct file. Also, will "dump" the cookies on the
> secondary machine as you suggested. Site has not been published to
> the web yet and won't until secondary machine works too. (I have
> been known to have an ID10T error frequently myself.)

Asha 302 - PDF/Doc/Xls reader issue

I have downloaded ezreader from the store for reading pdf/doc/xls files. After installing the application I cannot open the file tree. When I select open from the ezreader window it shows the c: and e: drives, but selecting either of the drives returns the screen to the ezreader home window. Can somebody help with this? Without a file opening option, the mail functionality of the Asha 302 is of no use.

I also have this problem, and it's a pain. I'm pretty well computer illiterate and have used Nokia for online use due to their ease in the past, but they've really dropped the ball with the Asha 302. I had an E63 last, and had no problem with it. I had a lot of downloaded files saved to the memory card, and when the E63 recently stopped working I got the Asha as I was told it was an upgrade (yeah right). I transferred the SIM and memory card to the Asha, and now not only can't I open PDF files etc. online, but I can't access the saved files, including files I received attached to emails that are saved to the memory card. I have downloaded ezreader with the same results others have described: it doesn't work. Thinking I may have been doing something wrong due to lack of knowledge, I went to the Optus shop where I bought the POS Asha 302 and asked them to assist. Well, after wasting over an hour and a half while they unsuccessfully tried, I was told the best thing to do would be to buy a card reader so I could at least access the files that are on the memory card via computer. Still means I've been ripped off on the phone, doesn't it?

I have had a trial version of Acrobat X1 Pro - I have decided not to buy at this stage - for some time it has been conflicting with opening PDF docs after saving as from word 2007 - I uninstalled Pro X1 and now when I save as from word 2007 to PDF it will

Can anyone help with this - do I have to uninstall Reader and then reinstall?

I have had a trial version of Acrobat X1 Pro and have decided not to buy at this stage. For some time it has been conflicting with opening PDF docs after "saving as" from Word 2007. I uninstalled Pro X1, and now when I "save as" PDF from Word 2007 it will save the document as a PDF but will not open the document to display after publishing. I have to go to where the file has been saved to view the new PDF document, which is really annoying. Do I have to delete Adobe Reader and reinstall it? Adobe needs to look at this conflict with Acrobat Pro, as I have even gone into properties and tried to set Adobe Reader as the default PDF program. The main issue is that I cannot view the PDF after publishing it from Word 2007.

I have Adobe Photoshop Elements 10 plus I create PDF files for work some are scan pdf docs. When I install Photoshop Elements 10 it DOES convert all the PDF files to Photoshop Elements-10 Docs. it even changes and shows the PSE-10 Icon. So I am alway inst

I have Adobe Photoshop Elements 10, and I also create PDF files for work; some are scanned PDF docs. When I install Photoshop Elements 10 it DOES convert all the PDF files to Photoshop Elements 10 docs. It even changes the icon to the PSE-10 icon. So I am always installing PSE-10 or uninstalling it. If I send a PDF file that has been automatically converted to PSE-10, the person I send the file to cannot open it because they do not have PSE-10. What can I do to stop PSE-10 from converting my PDF files? Don't tell me to upgrade PSE-10; I tried their online program and it is too advanced for a hobby photographer like myself, and their Help Desk is impossible to reach.

Hi,
Can you please share the logs?
You can use the Adobe Log Collector tool (Log Collector Tool) and share the corresponding zip file @ [email protected]
Thanks,
Shikha

Does Adobe Reader for iOS have the ability to open embedded links to additional PDF docs? If not, then what would be the best way to use these already created PDFs?

Does Adobe Reader for iOS have the ability to open embedded links created with Acrobat Standard to additional PDF docs? If not, then what would be the best way to use these already created PDFs on an iPad?

driddy61,
As of June 2014, none of the Adobe Reader mobile products support the hyperlink action for opening a separate PDF document.
Adobe Reader for iOS
Adobe Reader for Android
Adobe Reader Touch for Windows 8
In addition, the Reader mobile products do not open multiple windows/documents simultaneously, which would make the navigation between PDF documents nearly impossible. (Once a hyperlink takes you to a different PDF document, you have no way to go back to the original PDF document.)
The only Adobe Reader product that fulfills your department's requirements is Adobe Reader XI (mostly for Windows/Mac desktop/laptop computers). Acrobat Pro and Standard are paid products.
Because you are in search of a less expensive device for your department, you could get a Windows tablet instead of a Windows desktop/laptop computer. Microsoft Surface Pro (that you've mentioned in your previous reply) is just one example. You can also find other less expensive Windows tablets.
Tablets
However, please keep in mind that there are two different types of Windows tablets running two different operating systems.
(a) A Windows tablet with an Intel-based processor running Windows 8.1 Pro
Example: Surface Pro 3
You can install and run traditional desktop apps (e.g. Adobe Reader XI) and new Windows Store apps ("Modern" or "Metro-style" apps).
(b) A Windows tablet with an ARM-based processor running Windows RT 8.1
Example: Surface 2
You can only install and run Windows Store apps (e.g. Adobe Reader Touch) but not traditional desktop apps like Adobe Reader XI.
In general, type (b) tablets are more affordable than type (a) tablets. However, if you want to run Adobe Reader XI, you do need to check the technical specification of each tablet and make sure the following conditions are met.
Processor: Intel
Operating system: Windows 8/8.1 or Windows 8/8.1 Pro, not RT
Hope this helps you choose the right device for your department. Please let us know if you have any questions about system requirements or supported features in the Adobe Reader products.

In MAC, I want to change document size from 8.5X11 to 18X24 to create a poster to print through Staples. I created the doc originally in WORD, changed the size in WORD, converted to PDF doc. But PDF doc is still in 8.5X11. Read ADOBE support help info. Te

On a Mac, I want to change the document size from 8.5X11 to 18X24 to create a poster to print through Staples. I created the doc originally in Word, changed the size in Word, and converted it to a PDF doc. But the PDF doc is still 8.5X11. I read the Adobe support help info. It tells me to change the size in the application rather than the printer, BUT Acrobat Pro does not give me a page setup option under File. I can only find one in the printer dialog box. Help!

from the FAQs on Staples website:
I have a file that I know is a PDF, but the website claims it is not in a PDF format. What should I do?
Check to see that the file has the .PDF extension. Also, check that the filename does not have any special characters such as an ampersand (“&”).
Regarding your measurements set to centimeters rather than inches; is it just in MS Word?
Or does it occur in all other applications.
Check your Word preferences first:
If it is happening in all your applications, check your Mac OS System Preferences.

Pdf doc is required for B2C CRM 2007 WEB SHOP

Hello Experts,
We have a scenario where a PDF doc is required at B2C header level and item level in the CRM 2007 web channel.
We have set up a product and product catalog and maintained a PDF document in the HEADER AREA--
Documents Tab--BDS_PDF folder.
After initial replication, the document was published in the following path:
*catalog/DE DE161EE3D35DAAF19F1E000C29D94525.pdf *
We are able to find the images and PDF in the index server but are unable to see them in the web shop.
Did I miss something? Do I need to make any other settings in XCM or in JSP?
Thanks in advance
Namitha Verma

Hello namitha Verma
I am facing the same problem. Were you able to solve it? Kindly let me know.
Regards
Kiran Posanapalli
