Automatically merging PDF files based on conditions

Hello
I need to be able to set up a job with some basic parameters to automatically merge multiple PDFs.
For the following group of files:
1_a.pdf
1_b.pdf
2_a.pdf
2_b.pdf
3_a.pdf
3_b.pdf
I need to be able to append the pages in the "_b" files to the start of the "_a" files using the number at the start as the match identifier.
I think the safest method would be to output a new "_c" file and maintain the original files.
I can see functionality to do this manually, or to merge a group of files into one, but I need to automate this to deal with larger volumes.
Do I need additional software?  Any suggestions on how to do this?
Appreciate any help.
Cheers

You're looking for a PDF parser or PDF miner tool (PDFMiner) as a starting framework, and you'll almost certainly be writing custom code around it. Parsing text that's effectively free-form and that originates from multiple different sources almost always (always?) involves customized processing code and an ongoing series of tweaks as the suppliers of the PDFs change their ticket formats.  (Even apparently simple details such as time and date formats can vary by geography, language and supplier, and can derail common processing.)
In some cases, it's entirely possible that the data you're after is located in an embedded image and not in text that can be parsed.
The best approach is to get folks to send you JSON or XML or some other format intended for interchange, and avoid the whole mess that is parsing or mining a printer-oriented format.
The other obvious option is to use something like Amazon's Mechanical Turk or some other explicitly outsourced help.  Depending on how often the formats change and how many of these PDF files you're dealing with and how varied the formats are, sometimes throwing staff at the problem can be the most cost-effective approach.
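
For the pairing described in the question, a minimal Python sketch might look like the following. It assumes the third-party pypdf library is installed and that file names follow the "<number>_a.pdf" / "<number>_b.pdf" pattern exactly; the function names find_pairs and merge_folder are made up for illustration. The "_b" pages go first, and the result is written to a new "_c" file so the originals stay untouched.

```python
import re
from pathlib import Path

# Matches names such as "12_a.pdf" or "12_B.PDF"; group 1 is the match identifier.
PAIR_RE = re.compile(r"^(\d+)_([ab])\.pdf$", re.IGNORECASE)


def find_pairs(filenames):
    """Return {number: (b_name, a_name)} for every number that has both files."""
    groups = {}
    for name in filenames:
        m = PAIR_RE.match(name)
        if m:
            groups.setdefault(m.group(1), {})[m.group(2).lower()] = name
    return {k: (v["b"], v["a"]) for k, v in groups.items() if "a" in v and "b" in v}


def merge_folder(folder):
    """Write <number>_c.pdf (pages of _b, then pages of _a) for each complete pair."""
    from pypdf import PdfWriter  # third-party dependency, assumed to be installed

    folder = Path(folder)
    pairs = find_pairs(p.name for p in folder.iterdir())
    for key, (b_name, a_name) in sorted(pairs.items(), key=lambda kv: int(kv[0])):
        writer = PdfWriter()
        writer.append(str(folder / b_name))  # "_b" pages first
        writer.append(str(folder / a_name))  # then the "_a" pages
        with open(folder / f"{key}_c.pdf", "wb") as fh:
            writer.write(fh)
        writer.close()
```

Calling merge_folder on the directory that holds the files would process every complete pair and skip numbers where one half is missing. Running it from a scheduled task is one way to deal with larger volumes.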

Similar Messages

  • Automate combining PDF files based on a text list of URL's

    I'm wondering how I could automate the combination of PDF files into one, using a list of URLs in a text file (one URL per line, each pointing to a PDF file).
    I need to periodically rebuild a combination of PDF files, with the URLs staying the same but the files behind them getting updated. I'm not a JavaScript developer, so I'm doing that "manually" by copying and pasting each URL into the "Combine Files" window, but it would be great if I could get Acrobat to read them from a text file.
    Thanks!

    I have developed a script that can do just that, but with locally saved PDF
    files:
    http://try67.blogspot.com/2009/10/combine-pdf-files-from-text-list.html
    It might be possible to adjust it to do what you describe. If you're
    interested, contact me personally by PM or at try6767 at gmail dot com.
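
    Outside Acrobat, the same idea can be sketched in Python. This is not the script linked above; it assumes the third-party pypdf library is installed and that every URL returns a PDF directly. The names read_url_list and combine_from_urls are illustrative only.

```python
import io
import urllib.request


def read_url_list(text):
    """Return the non-empty, non-comment lines of a URL list, in file order."""
    urls = []
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            urls.append(line)
    return urls


def combine_from_urls(list_path, out_path):
    """Download each listed PDF and append it to one combined output file."""
    from pypdf import PdfWriter  # third-party dependency, assumed to be installed

    with open(list_path, encoding="utf-8") as fh:
        urls = read_url_list(fh.read())
    writer = PdfWriter()
    for url in urls:
        with urllib.request.urlopen(url) as resp:
            writer.append(io.BytesIO(resp.read()))  # merge from memory, no temp files
    with open(out_path, "wb") as out:
        writer.write(out)
    writer.close()
```

    Because the list file is re-read on every run, rebuilding the combined PDF after the source files change is a matter of running the function again.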

  • Automatically renaming pdf files based on excel data

    I am creating PDF certificates using variable data from Excel files with InDesign.  This creates a multipage PDF file with a different person's name on each page.  The end result needs to be individual PDF files, each named for the person.  I can extract the single pages out of the main PDF, ending up with however many files, all named the same thing apart from a number at the end of the common file name.  Question:  Is there an automated process for renaming the individual files using the data from the Excel file?
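
    One way to approach this is sketched below in Python, using only the standard library. It assumes the Excel sheet has been exported to CSV with a column called "Name" (a made-up column name) and that the rows are in the same order as the pages, so the n-th extracted file belongs to the n-th row. The function names are illustrative.

```python
import csv
import io
import re
from pathlib import Path


def natural_key(name):
    """Sort key that orders 'cert_2.pdf' before 'cert_10.pdf'."""
    return [int(t) if t.isdigit() else t.lower() for t in re.split(r"(\d+)", name)]


def safe_filename(person):
    """Drop characters that are awkward in file names and collapse whitespace."""
    cleaned = re.sub(r'[\\/:*?"<>|]', "", person)
    return " ".join(cleaned.split())


def rename_plan(csv_text, pdf_names, column="Name"):
    """Pair the numbered PDFs with the CSV rows and return (old, new) name tuples."""
    names = [row[column] for row in csv.DictReader(io.StringIO(csv_text))]
    files = sorted(pdf_names, key=natural_key)
    if len(names) != len(files):
        raise ValueError("number of names and number of PDF files differ")
    return [(old, safe_filename(person) + ".pdf") for old, person in zip(files, names)]


def apply_plan(folder, plan):
    """Carry out the renames inside the given folder."""
    folder = Path(folder)
    for old, new in plan:
        (folder / old).rename(folder / new)
```

    Building the plan first and printing it before calling apply_plan gives a chance to spot a mismatch between rows and pages. Two people with the same name would collide, which this sketch does not handle.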


  • Exporting to Separate PDF files based on Group

    Post Author: Tanya Sherin
    CA Forum: Exporting
    Hello,
    I have a report that needs to be exported into separate PDF files based on one of the groups already established in the report. I would like to automate this process as much as possible because of the report size. Has anyone encountered this need?
    Thanks for your assistance.
    Regards,
    Tanya

    Post Author: synapsevampire
    CA Forum: Exporting
    You'll need CR XI or a third party solution.
    Here's one I suggest:
    http://www.milletsoftware.com/Visual_CUT.htm
    Contact Ido (owner) for a free trial and confirmation that it meets your needs.
    -k

  • How to merge pdf files into one pdf file?

    In E-Rec we need the capability to mass print correspondence. For that, I want to merge the PDF files for the candidates into one PDF file. I am doing the following:
    DATA: lv_document1 TYPE rcf_s_cs_document_content.
    LOOP AT activity_object_tab INTO ls_activity_object.
      lo_act_corr ?= ls_activity_object-activity.
      CALL METHOD lo_act_corr->process_document
        EXPORTING
          channel  = 'FRONTEND'
        IMPORTING
          document = lv_document1.
    ENDLOOP.
    Now lv_document1 contains the PDF file in lv_document1-DOCUMENT_X, which is of type RAWSTRING.
    This works for one candidate; with several candidates, only the PDF for the last one is displayed.
    But how would I append in the loop, or what would I need to do to display all candidates in one PDF?
    Any help will be appreciated.
    Thanks

    File>Create PDF>From File.
    Please post followups in the Acrobat forum.
    Bob

  • Problem in Merging PDF files

    Need to print batches of PDF.
    The command copy *.PDF > LPT1: (redirected by NET USE) doesn't work.
    On the printer I get an error message. All files are sent in one file.
    I guess this is related to the PDF header which is not set correctly for the second PDF copied.
    I mean, according to the PDF Reference guide, the proper syntax for a PDF file header is for the "%PDF-<version>" comment to be the first line of the PDF file. The recommended second line is a comment line with four binary characters to indicate the file contains binary information.
    Acrobat requires only that the header appears somewhere within the first 1024 bytes of the file.
    I guess the Rip is not able to convert PDF into PS before its interpretation as the second header exceeds 1024. Even the combined file has %%EOF before each PDF Header, this is probably not enough to merge them without problem.
    Did you experience the same ?
    Thanks for your comments.
    Regards.
    Franck

    You can build your own PDF merge utility using the PDF Merge and split libraries for .NET from http://www.dotnet-reporting.com or http://www.winnovative-software.com .
    You can use it to merge PDF files, html files, text files and images,
    set the page orientation, compression level and page size.
    All this can be accomplished with only a few lines of code:
    PdfDocumentOptions pdfDocumentOptions = new PdfDocumentOptions();
    pdfDocumentOptions.PdfCompressionLevel = PDFCompressionLevel.Normal;
    pdfDocumentOptions.PdfPageSize = PdfPageSize.A4;
    pdfDocumentOptions.PdfPageOrientation = PDFPageOrientation.Portrait;
    PDFMerge pdfMerge = new PDFMerge(pdfDocumentOptions);
    pdfMerge.AppendPDFFile(pdfFilePath);
    pdfMerge.AppendImageFile(imageFilePath);
    pdfMerge.AppendTextFile(textFilePath);
    pdfMerge.AppendEmptyPage();
    pdfMerge.AppendHTMLFile(htmlFilePath);
    pdfMerge.SaveMergedPDFToFile(outFile);
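
    As a quick check of the diagnosis in the question, the Python sketch below looks for more than one "%PDF-x.y" header in a byte stream. Each PDF carries its own header, cross-reference table and trailer, with byte offsets counted from its own start, which is why byte-for-byte concatenation generally does not produce one valid document. The function names are made up, and the header pattern could in principle also match inside stream data, so treat a positive result as a hint only.

```python
import re

# A PDF header looks like "%PDF-1.4"; a concatenated stream contains several of them.
HEADER_RE = re.compile(rb"%PDF-\d\.\d")


def pdf_header_offsets(data):
    """Return the byte offset of every PDF header found in the data."""
    return [m.start() for m in HEADER_RE.finditer(data)]


def looks_concatenated(data):
    """True when the data appears to hold more than one PDF glued together."""
    return len(pdf_header_offsets(data)) > 1
```

    For batch printing, sending each file to the printer as a separate job, or merging with a PDF library first, avoids the problem.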

  • Merge PDF files into (preserving layer info)

    Is it possible to combine several PDF files into one and preserve the layers of each source PDF into the merged one ?

    ... not sure SSIS is the correct tool for you unless you're already using it or have other tasks to perform besides merging PDF files.
    There is no functionality like that available out of the box, but if you have the right (third-party) .NET assemblies then you could use SSIS (Script Task/Component). Another option is to call a tool or batch file via the Execute Process Task.

  • Merging PDF files through a script

    Hi everyone,
    I want to merge a couple of PDF files through a script, because I have to do it for a good number of them every time I get them sent.
    I was wondering, does Reader have a command that can be called from the command line in order to merge these PDF files through a script?
    Ted.

    Reader can do very few things besides being a reader. If you wanna merge PDF files, use a PDF merger.

  • How to Automate Compressing PDF files in a Watched Directory

    I want to automatically compress PDF files as they get placed into a watched directory. I would like the compressed files placed in another directory and the original file either deleted or placed in another directory. If Acrobat Pro X can do this, how do I set it up?

    I noticed that Acrobat Distiller will watch an input directory and move files to an output directory. I tried using Distiller to compress a PDF file, but all it did was move the PDF file to the out directory without doing any compression. Is there any way that Distiller could be configured to do what I want?
    I also called the number listed on the Adobe web site for LiveCycle products. They told me they don't sell their product to non government companies and they referred me to 4 Point to get further info. I called them and only got a voice mail. I'm waiting for a call back.
    I found a product that does exactly what I need. It is called PdfCompressor Professional Edition. But, it costs more than what I want to spend. I'm guessing that LiveCycle will cost more than I want to spend also. If there are any other products that cost less than $150, I would like to know what they are.
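
    A low-cost alternative could be a small polling script around Ghostscript, which is freely available. The Python sketch below is only an outline: it assumes Ghostscript is installed and reachable as "gs" (on Windows the console executable is usually named gswin64c), and the function names and the polling interval are made up. How much a file shrinks depends on its content and on the chosen -dPDFSETTINGS preset.

```python
import shutil
import subprocess
import time
from pathlib import Path


def gs_command(src, dst, preset="/ebook"):
    """Ghostscript command line that rewrites src as a (usually smaller) PDF at dst."""
    return [
        "gs", "-sDEVICE=pdfwrite", f"-dPDFSETTINGS={preset}",
        "-dNOPAUSE", "-dBATCH", "-dQUIET",
        f"-sOutputFile={dst}", str(src),
    ]


def process_once(watch_dir, out_dir, done_dir):
    """Compress every PDF currently in watch_dir, then move the original away."""
    handled = []
    for src in sorted(Path(watch_dir).glob("*.pdf")):
        dst = Path(out_dir) / src.name
        subprocess.run(gs_command(src, dst), check=True)
        shutil.move(str(src), str(Path(done_dir) / src.name))
        handled.append(src.name)
    return handled


def watch(watch_dir, out_dir, done_dir, interval=10):
    """Poll the watched directory forever, every `interval` seconds."""
    while True:
        process_once(watch_dir, out_dir, done_dir)
        time.sleep(interval)
```

    One caveat: a file that is still being copied into the watched directory could be picked up half-written, so a real setup would want to wait until the file size stops changing.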

  • Getting NULLPointerException while merging pdf files using PDFDocMerger

    Hi,
    I am using the BI Publisher 11.1.1.7 JAR files to use the PDFDocMerger class in JDeveloper. My requirement is to merge PDF files taken from a third-party server, so I am passing URLs as input to PDFDocMerger (as per the BIP documentation, we can pass a URL as input to PDFDocMerger using the Object class), but I am getting a NullPointerException.
    I am using the code below:
    import java.io.File;
    import java.io.FileOutputStream;
    import oracle.xdo.common.pdf.util.PDFDocMerger;

    public class PDFMerger {
        public PDFMerger() {
        }

        public void pdfDocumentMerger() {
            try {
                System.out.println("Testing1");
                // Both inputs are passed as URL strings.
                Object[] f = new Object[2];
                f[0] = "http://docs.mytxi.com/downloads/Invoices/OE/06082013/02599553.PDF";
                f[1] = "http://docs.mytxi.com/downloads/Invoices/OE/06082013/02599553.PDF";
                FileOutputStream output = new FileOutputStream("c:\\docs\\OutputPDF.pdf");
                System.out.println("Testing3" + f[0]);
                PDFDocMerger pdfMerger = new PDFDocMerger(f, output);
                System.out.println("Testing4");
                pdfMerger.process();
                System.out.println("Testing5");
                pdfMerger = null;
                output.close();
            } catch (Exception e) {
                System.out.println(" Exception " + e.getMessage());
                e.printStackTrace();
            }
            System.out.println("End");
        }

        public static void main(String[] argv) {
            PDFMerger xmlPublisher = new PDFMerger();
            xmlPublisher.pdfDocumentMerger();
        }
    }
    Error:
    Testing1
    Testing3http://docs.mytxi.com/downloads/Invoices/OE/06082013/02599553.PDF
    Testing4
    Jul 22, 2013 10:22:22 PM oracle.xdo.common.pdf.util.PDFDocMerger$PDFUtility <init>
    SEVERE: The first input to be merged has problem and caused stopping merging
    Jul 22, 2013 10:22:22 PM oracle.xdo.common.log.Logger log
    WARNING: java.lang.NullPointerException
    at oracle.xdo.template.pdf.util.PDFObjectDictionary.<init>(PDFObjectDictionary.java:36)
    at oracle.xdo.common.pdf.util.PDFDocMerger$PDFUtility.checkIfDupFields(PDFDocMerger.java:2438)
    at oracle.xdo.common.pdf.util.PDFDocMerger$PDFUtility.processWithNoOut(PDFDocMerger.java:2932)
    at oracle.xdo.common.pdf.util.PDFDocMerger.generateMergedPDF(PDFDocMerger.java:623)
    at oracle.xdo.common.pdf.util.PDFDocMerger.mergeDocs(PDFDocMerger.java:551)
    at oracle.xdo.common.pdf.util.PDFDocMerger.process(PDFDocMerger.java:506)
    at xxtxi.oracle.apps.ar.iex.server.PDFMerger.pdfDocumentMerger(PDFMerger.java:24)
    at xxtxi.oracle.apps.ar.iex.server.PDFMerger.main(PDFMerger.java:40)
    Exception java.lang.Exception: Document #2 looks corrupted.
    oracle.xdo.XDOException: java.lang.Exception: Document #2 looks corrupted.
    at oracle.xdo.common.pdf.util.PDFDocMerger.process(PDFDocMerger.java:510)
    at xxtxi.oracle.apps.ar.iex.server.PDFMerger.pdfDocumentMerger(PDFMerger.java:24)
    at xxtxi.oracle.apps.ar.iex.server.PDFMerger.main(PDFMerger.java:40)
    End
    Process exited with exit code 0.
    It would be great if anyone could provide a solution to this issue.
    Thanks,
    Irfan.

    One of the merged PDFs was edited using the "Adobe Lifecycle Assembler" tool, and I was able to open that file successfully on my machine.
    Is the BI Publisher API compatible with such files?
    It would be great if anyone could throw some light on this issue.
    Thanks,
    Indira

  • Can I create a custom table of contents and link to other .pdf files based on responses to a form?

    Hey Everyone! First post ever, so bear with me:
    I'm trying to create a streamlined method to use a form  to let myself and others add information and select certain options to put together a custom table of contents. Basically, I would like to have a form with a series of text fill and single/multiple choice options that will automatically populate a table of contents based on the selections and will link to other .pdf files that are associated with the selections. I was hoping this would be possible with a form, but I'm relatively new to the function of the software as a whole and my research came up short. Any suggestions on how to start are more than welcome, and if I wasn't quite clear enough I would be happy to elaborate.
    Thanks for your time!

    You would need to search for other PDF creation software that can accomplish what you desire.
    There are many cheaper  PDF creation alternatives other than Adobe's Acrobat Pro software.
    Also, try doing a web search under these terms to see if you can find an app/software/solution that may work for you.
    How to create table of contents in PDF files

  • Help renaming pdf files based on internal content

    I work for a company that has thousands of E-tickets coming in daily, weekly, monthly, etc..
    These tickets come in bafhakfbaifh.pdf and we have to manually rename them or print them all out and then sort through them and put them in order.
    What I would like to do is:
    1. Split any pdf's that have more than one page or "ticket" in my case. I know how to do this with automator easily, but I'd love to keep it all in one program.
    2. Search the file for Event Name (i.e. Madonna)
    3. Search the file for Date of event (August 12, 2012)
    4. Search the file for Section, Row, Seat (124 3 12)
    5. Rename the file based on content found (Madonna August 12 2012 124 3 12.pdf)
    6. Move from original download folder to organized folders based on artist/team.
    7. Automatically print in alphabetical or some sort of designated order.
    Any help is muchly appreciated. So far, I found a PC program called A-PDF rename, but it is not automated enough to be practical. Hazel is awesome at OCRing the pdf and moving from folder to folder, but does not do enough.
    Any help is muchly appreciated.
    Thank you.

    You're looking for a PDF parser or PDF miner tool (PDFMiner) as a starting framework, and you'll almost certainly be writing custom code around it. Parsing text that's effectively free-form and that originates from multiple different sources almost always (always?) involves customized processing code and an ongoing series of tweaks as the suppliers of the PDFs change their ticket formats.  (Even apparently simple details such as time and date formats can vary by geography, language and supplier, and can derail common processing.)
    In some cases, it's entirely possible that the data you're after is located in an embedded image and not in text that can be parsed.
    The best approach is to get folks to send you JSON or XML or some other format intended for interchange, and avoid the whole mess that is parsing or mining a printer-oriented format.
    The other obvious option is to use something like Amazon's Mechanical Turk or some other explicitly outsourced help.  Depending on how often the formats change and how many of these PDF files you're dealing with and how varied the formats are, sometimes throwing staff at the problem can be the most cost-effective approach.
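
    To give a feel for the custom code involved, here is a minimal Python sketch of the "search the text, build the file name" steps from the question. The label patterns ("Event:", "Date:", "Section ... Row ... Seat ...") are pure assumptions about one possible ticket layout and would need a variant for each supplier. Text extraction is assumed to use the third-party pypdf library in place of PDFMiner, and the function names are illustrative.

```python
import re

# Hypothetical patterns for ONE ticket layout; real tickets will differ per supplier.
PATTERNS = {
    "event": re.compile(r"Event:\s*(.+)"),
    "date": re.compile(r"Date:\s*([A-Za-z]+ \d{1,2}, \d{4})"),
    "seat": re.compile(r"Section\s+(\w+)\s+Row\s+(\w+)\s+Seat\s+(\w+)"),
}


def ticket_filename(text):
    """Build 'Event Month D YYYY Section Row Seat.pdf', or None if a field is missing."""
    event = PATTERNS["event"].search(text)
    date = PATTERNS["date"].search(text)
    seat = PATTERNS["seat"].search(text)
    if not (event and date and seat):
        return None  # unknown layout, or the data sits in an image and needs OCR
    parts = [event.group(1).strip(), date.group(1).replace(",", ""), *seat.groups()]
    return " ".join(parts) + ".pdf"


def first_page_text(path):
    """Extract the text of the first page (empty string if there is none)."""
    from pypdf import PdfReader  # third-party dependency, assumed to be installed

    return PdfReader(path).pages[0].extract_text() or ""
```

    Files for which ticket_filename returns None are the ones to route to manual handling, which is where the staffing option above comes in.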

  • Merge PDF files in java

    Hi,
    I have several PDF files in a directory and want to merge them all into one PDF. The catch is that I want a Java program to do it for me automatically.
    If anyone has ever done something like this, could you give me sample Java code?
    Many thanks in advance!

    Try iText PDF library
    http://www.lowagie.com/iText/tutorial/ch13.html#tools

  • How to create additional Line in file based on condition available as part of ZINVOIC02 Idoc segment

    Scenario Details:
    Receiving a ZINVOIC02 IDoc in PI. The IDoc-to-file translation creates a comma-separated file with a .csv extension.
    The logic was set up so that one record is created in the CSV file for each E1EDP01 (item) in the IDoc.
    The file logic for some of the fields is as below:
    No of records: for 1st E1EDP01
        InvNumber:    E1EDK01-BELNR
        InvDate:      E1EDK03-DATUM
        CusNumber:    E1EDK01-PARTN
        LineitemDesc: Populate when E1EDP04/MSKWZ=O2 or O4 with E1EDP19/KTEXT
        Tax1Type:     Hardcode when E1EDP04/MSKWZ=O2 or O4
        Tax1%:        Sum all E1EDP04/MSATZ when E1EDP04/MSKWZ=O2 or O4
    No of records: for 2nd E1EDP01
        InvNumber:    E1EDK01-BELNR
        InvDate:      E1EDK01-DATUM
        CusNumber:    E1EDK01-PARTN
        LineitemDesc: same as above
        Tax1Type:     same as above
        Tax1%:        same as above
    No of records: for 3rd E1EDP01
        InvNumber:    E1EDK01-BELNR
        InvDate:      E1EDK03-DATUM
        CusNumber:    E1EDK01-PARTN
        LineitemDesc: same as above
        Tax1Type:     same as above
        Tax1%:        same as above
    No of records: Additional Line to be created when one or more of E1EDP01 is having E1EDP04/MSKWZ = O3
        InvNumber:    same as above
        InvDate:      same as above
        CusNumber:    same as above
        LineitemDesc: Hardcode "REIM for USE TAX"
        Tax1Type:     Hardcode ""
        Tax1%:        Hardcode ""
    Now we have an additional requirement to add a new line item when the tax code is equal to O3 for any of the E1EDP01 segments.
    Is it possible to create an additional line item based on a condition? If yes, please share what the approach should be.
    How can we create the additional line item?
    Currently we are using E1EDP01 to do context handling.
    The target structure is :
    MT_FILE
         INVOICE     0..unbounded
              InvNumber     0..1
              InvDate          0..1
              CusNumber     0..1
              LineitemDesc     0..1
              Tax1Type          0..1
              Tax1%               0..1

    Hello,
    Please add one extra field in the data structure of the target mapping and let its occurrence be 0..unbounded under the root node 'MT_ADP_Invoice'.
    Apply the condition: if the tax code MSKWZ (with its context changed to E1EDP01) equals '03', map it to the newly created target field whose occurrence is 0..unbounded.
    This will then create an additional field, which is your requirement.
    The above is one way.
    But if you want the same target field name, ADP_File, to be appended when the tax field is '03', you can use two message mappings for one common operation mapping / interface.
    In the first message mapping you need one target data structure, with the source data structure remaining the same as the one in your screenshot. This target data structure will be similar to the source, except that you add one more field at the end of the target (with a name different from the other fields and an occurrence of 0..unbounded). It needs to be mapped to E1EDP01, provided the tax code field MSKWZ (its context changed to E1EDP01) equals the constant '03'.
    In the second message mapping you need to map the target structure of the previous message mapping to the actual required structure. The newly added field should be mapped to ADP_File of your final target structure.
    This will then create the same structure as required.
    Note: Please change the occurrence of ADP_File to 0..unbounded.
    Regards,
    Souvik
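
    Independent of the PI mapping tool, the intended record logic can be written out as a short Python sketch. It is only a model of the requirement as stated in the question: the dictionary layout for the header and items is invented for illustration, the tax code is taken as written in the question (O3), and Tax1Type is left out of the regular lines because the post does not say which value is hardcoded there.

```python
def build_lines(header, items, extra_code="O3"):
    """One output line per item, plus one extra line if any item carries extra_code."""
    base = {
        "InvNumber": header["BELNR"],
        "InvDate": header["DATUM"],
        "CusNumber": header["PARTN"],
    }
    lines = []
    for item in items:
        # Only taxes with code O2 or O4 feed the description and the percentage.
        rates = [rate for code, rate in item["taxes"] if code in ("O2", "O4")]
        lines.append({
            **base,
            "LineitemDesc": item["KTEXT"] if rates else "",
            "Tax1%": sum(rates),
        })
    # The additional line appears once, however many items carry the extra code.
    if any(code == extra_code for item in items for code, _ in item["taxes"]):
        lines.append({**base, "LineitemDesc": "REIM for USE TAX", "Tax1Type": "", "Tax1%": ""})
    return lines
```

    The point the sketch makes is that the extra line depends on a check across all items, not on any single item, which is why the context of MSKWZ has to be lifted above E1EDP01 in the mapping.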

  • While trying to merge pdf files which are having the version 1.4 throws err

    As per my business need, I have to merge some PDF files. Some of those files have a PDF version above 1.4. I am using the API in "oracle.apps.xdo.jar".
    While trying to merge the PDF files, it throws an exception like
    " oracle.apps.xdo.XDOException: oracle.apps.xdo.template.pdf.exception.FatalException: The template seems to be in either corrupted one or newer version than PDF1.4 ".
    If all of the files have a version below 1.4, it works fine. But if any one of the files has a version above 1.4, it throws the above error.
    Is there a solution for this problem?

    Thanks for the response.
    Sorry, there is no option to regenerate the documents in a version below 1.4, because they already exist in the database and I have to use those same documents for the merge.
    So is there any alternative solution for this?
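
    One possible workaround is to detect the affected documents before merging and convert only those. The Python sketch below reads the version from the "%PDF-x.y" header, using only the standard library; the function names are made up. Note that a PDF can also declare its version through a /Version entry in the document catalog, which this header check does not see.

```python
import re

HEADER_RE = re.compile(rb"%PDF-(\d)\.(\d)")


def header_version(data):
    """Return (major, minor) from the header in the first 1024 bytes, or None."""
    m = HEADER_RE.search(data[:1024])
    return (int(m.group(1)), int(m.group(2))) if m else None


def newer_than_1_4(data):
    """True when the header declares a PDF version above 1.4."""
    version = header_version(data)
    return version is not None and version > (1, 4)
```

    Documents flagged this way could be re-saved at a lower compatibility level as they are read from the database, for example with Ghostscript's pdfwrite device and -dCompatibilityLevel=1.4, before being handed to the merger. Whether the converted output is acceptable for your documents would need testing.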
