Stapler - A python utility for manipulating PDF docs based on pypdf

Link to github:
http://github.com/hellerbarde/stapler/tree/master
* Dependencies *
Stapler depends only on the packages python and python-pypdf, both of
which can be found in the archlinux repositories.
* History *
Stapler is a pure Python replacement for PDFtk, a tool for manipulating PDF
documents from the command line. PDFtk was written in Java and natively
compiled with gcj. It was discontinued a few years ago, and bitrot is
setting in (e.g. it no longer compiles on Arch Linux).
Since I used it quite a lot, I decided to look for an alternative and found
pypdf, a PDF library written in pure Python. I couldn't find a tool which
actually uses the library, so I started writing my own.
At some point I plan on providing a GUI, but the command line version will
always exist.
* License *
A simplified BSD-style license describes the terms under which Stapler is
distributed. A copy of the license is in the file "LICENSE".
* Usage *
I am too lazy at the moment to learn how to create a proper man page, so this
has to suffice.
There are 4 modes in Stapler:
- cat:
  Works like the normal Unix utility "cat", meaning it con_cat_enates files.
  The syntax is delightfully simple:
  Syntax:
    stapler cat input1 [input2 input3 ...] output
  Example:
    stapler cat a.pdf b.pdf c.pdf output.pdf
    # this would append "b.pdf" and "c.pdf" to "a.pdf" and write the whole
    # thing to "output.pdf"
  You can specify as many input files as you want. Stapler always concatenates
  all but the last file specified and writes the result to the last file
  specified.
- split:
  Splits the specified PDF files into their single pages and writes each page
  into its own PDF file with this naming scheme:
    ${origname}p${zeropaddedpagenr}.pdf
  Syntax:
    stapler split input1 [input2 input3 ...]
  Example for a file foobar.pdf with 20 pages:
    $ stapler split foobar.pdf
    $ ls
      foobarp01.pdf foobarp02.pdf ... foobarp19.pdf foobarp20.pdf
  Multiple files can be specified; they will be processed as if you had
  called stapler once for each file.
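The naming scheme can be illustrated with a small sketch. The helper name `split_names` is hypothetical, and the padding width is an assumption: the only example given (20 pages, two digits) is consistent with padding to the number of digits in the page count, but Stapler may do it differently.

```python
import os


def split_names(input_path, page_count):
    """Return the per-page output names for one input file.

    Follows the ${origname}p${zeropaddedpagenr}.pdf scheme.
    Assumption: the page number is zero-padded to the number of
    digits in the total page count.
    """
    base, _ext = os.path.splitext(os.path.basename(input_path))
    width = len(str(page_count))
    return [f"{base}p{n:0{width}d}.pdf" for n in range(1, page_count + 1)]


names = split_names("foobar.pdf", 20)
print(names[0], "...", names[-1])
```

For the 20-page foobar.pdf from the example, this yields foobarp01.pdf through foobarp20.pdf.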
- select/delete (called with sel and del respectively)
  These are the most sophisticated modes. With select you can cherrypick pages
  out of pdfs and concatenate them into a new pdf file.
  Syntax:
    stapler sel input1 page_or_range [page_or_range ...] [input2 p_o_r ...] output
  Example:
    stapler sel a.pdf 1 4-8 20-40 b.pdf 1-5 output.pdf
    # this generates a pdf called output.pdf with the following pages:
    # 1, 4-8, 20-40 from a.pdf, 1-5 from b.pdf in this order
  What you _cannot_ do yet is leave out the ranges. I will probably merge
  select and cat at some point in the future so that you can specify pages and
  ranges, and if you don't, it just uses the whole file.
  The delete command works almost exactly like select, but inverted:
  it picks the pages and ranges which you _didn't_ specify out of the
  PDFs.
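To make the page and range arguments concrete, here is a rough sketch of how a sel/del command line could be interpreted. It is not taken from Stapler's source. The names `parse_range`, `parse_sel_args` and `invert` are hypothetical, and telling input files apart from ranges by a ".pdf" suffix is an assumption made for the example.

```python
def parse_range(token):
    """'7' -> [7]; '4-8' -> [4, 5, 6, 7, 8] (1-based, inclusive)."""
    if "-" in token:
        first, last = token.split("-", 1)
        return list(range(int(first), int(last) + 1))
    return [int(token)]


def parse_sel_args(args):
    """Group pages by input file; the last argument is the output file."""
    *rest, output = args
    jobs = []
    for token in rest:
        if token.lower().endswith(".pdf"):  # assumption: inputs end in .pdf
            jobs.append((token, []))
        elif not jobs:
            raise ValueError("page range given before any input file")
        else:
            jobs[-1][1].extend(parse_range(token))
    return jobs, output


def invert(pages, page_count):
    """Pages del would keep: everything that was NOT specified."""
    chosen = set(pages)
    return [n for n in range(1, page_count + 1) if n not in chosen]


jobs, output = parse_sel_args("a.pdf 1 4-8 20-40 b.pdf 1-5 output.pdf".split())
for name, pages in jobs:
    print(name, pages)
print("->", output)
```

With this split, del is just sel applied to `invert(pages, page_count)` for each input file, which matches the "same as select, but inverted" description.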
Contact me if you have questions about usage or anything, really.
2009, Philip Stark (heller <dot> barde <at> gmail <dot> com)
Last edited by Heller_Barde (2009-08-05 21:02:54)

firecat53 wrote:
Are you planning to eventually add the full functionality of pdftk -- rotate, watermark, encrypt, etc.?  I occasionally use pdftk, and I'd like to see a replacement since it's apparently having trouble keeping up to date. Great work!
Scott
If you help me with ideas for the command syntax, sure, why not. pypdf supports these things. Can you make a complete list of the things pdftk does that would be important to port over, and how the command line syntax for these features works? I'll then get working.
EDIT: I just had a look at the pdftk man page (didn't think of that when I wrote the above...). There are some things that will not be possible with the current version of pypdf (and frankly I doubt there is going to be a new release):
- update_info (because there is no way to write document properties with pypdf; you can read them just fine, though)
- fill_form (similar reason: it's just not supported)
The rest will be fine. I'll rename split to burst and then I'll mimic the cat function. The others should be no problem either.
Although... I didn't find anything on rotate in the pdftk manual. How should I implement a rotate function? Rotate complete documents or single pages?
cheers
Phil
Last edited by Heller_Barde (2009-08-06 07:08:42)
