AppleScript: Extract data from PDF

I am trying to extract a string from a PDF and then rename the PDF with that string. The string varies in length, but it always comes between "Name:" and "ID:".
Ideally I could drop a multi-page PDF onto the script, and it would split out the individual pages and save each one as a new document named with this string.
From another thread, I've tried using this shell script (there, "Citing Patent" and "CLASSIFICATIONS" were the delimiters):
for f in "$@"
do
echo "$f" >> ~/Desktop/Patent01.txt
cat "$f" | sed -n '/Citing Patent/,/CLASSIFICATIONS/p' | sed 's/CLASSIFICATIONS//p' >> ~/Desktop/Patent01.txt
done
Thanks!

Replying to myself with progress; maybe someone can help.
I drop the single-page PDF onto the script; it calls an Automator workflow which converts the PDF to plain text; AppleScript then reads the text file for the data I want; the last step is to rename the original file with the string that I want (a name).
I'm getting error -10006, can't rename. Below is the script and a screenshot of the Automator workflow.
on open fileList
tell application "Finder"
--set thePDFfile to (choose file)
repeat with thePDFfile in fileList
set theInfo to info for thePDFfile
set theFile to name of theInfo
set qtdstartpath to quoted form of (POSIX path of thePDFfile)
set workflowpath to "/Users/Galen/PDFextract/NoInput.workflow"
set qtdworkflowpath to quoted form of (POSIX path of workflowpath)
set command to "/usr/bin/automator -i " & qtdstartpath & " " & qtdworkflowpath
set output to do shell script command
--do shell script "automator /Users/Galen/PDFextract/NoInput.workflow"
set AppleScript's text item delimiters to "
set thetext to text items of (read "/Users/Galen/Desktop/ExtractOutput.txt")
set studentName to item 2 of thetext
set AppleScript's text item delimiters to " "
set thetext to text items of studentName
set lastName to item 2 of thetext
set firstName to item 3 of thetext
set lastFirst to (lastName & " " & firstName)
--return lastFirst
set AppleScript's text item delimiters to ""
set the name of theFile to ((lastFirst as text) & ".pdf")
end repeat
end tell
end open
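
A minimal sketch of the extract-and-rename step, written in Python rather than AppleScript and assuming the PDF has already been converted to plain text (as the Automator workflow does). The function names, the pattern, and the character replacements are illustrative assumptions, not part of the original script. As for the -10006 error, one possible cause, offered as a guess: `theFile` holds the file's name as text rather than a Finder item, so Finder has nothing it can rename; setting the name on the dropped item itself (`thePDFfile`) may behave differently.

```python
import re
from pathlib import Path
from typing import Optional

# Assumed pattern: capture whatever sits between "Name:" and "ID:".
NAME_PATTERN = re.compile(r"Name:\s*(.*?)\s*ID:", re.DOTALL)


def extract_name(text: str) -> Optional[str]:
    """Return the text between 'Name:' and 'ID:', or None if it is absent."""
    match = NAME_PATTERN.search(text)
    if match is None:
        return None
    # Collapse runs of whitespace (e.g. line breaks left by the converter).
    name = " ".join(match.group(1).split())
    return name or None


def rename_pdf(pdf_path: Path, text: str) -> Path:
    """Rename pdf_path after the extracted name and return the new path."""
    name = extract_name(text)
    if name is None:
        raise ValueError(f"no Name:/ID: pair found for {pdf_path}")
    # Replace characters that cannot appear in a single file name.
    safe = name.replace("/", "-").replace(":", "-")
    target = pdf_path.with_name(safe + ".pdf")
    pdf_path.rename(target)
    return target
```

Usage would be along the lines of `rename_pdf(Path("scan.pdf"), Path("ExtractOutput.txt").read_text())`, where both file names are placeholders.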

Similar Messages

Extracting data From PDF to Excel

I have inherited a large library of PDF invoices from which I need to extract data into Excel, or some other spreadsheet. The other option is to open thousands of PDF documents and run the numbers by hand, which is just dumb. I am new to Acrobat, and an entire afternoon of trial by fire and Google hasn't gotten me very far, so even pointers in the right direction are appreciated.
Ideally I would like to tell Acrobat what data is important on each document (can I use the form tool to do this?), extract the data from the relevant files (with the batch processing tool, I presume?), compile the data, and export it to a CSV.
It looks like the functionality is here; I am just unsure how it all needs to fit together. Any suggestions?

Hi,
There is software that will convert PDFs to Excel; look for ABBYY or Able2Extract. If you have a lot of files with the same layout, merge them together before using the software. Remember that if the data comes from a scanned image, the results will only be as good as the OCR engine contained in the software. You can experiment with the software to create tables, etc.

Extract data from PDF

Hello,
I am using Adobe Acrobat Professional 6.0 to create a bunch of survey questionnaires for respondents to fill out using an off-line computer. I used check boxes and radio buttons to set up the forms and assign output values. However, I couldn't figure out how to export the response values into one single file (preferably .csv). Does anyone know how to make that happen? Thanks in advance.

Thanks for the reply!
So, how do I get data from fdf to csv?
I have participants coming in to fill out the questionnaire in my lab, and we save their files. For example, the PDF file for participant #001 was saved as Questionnaire_001, the one for participant #002 as Questionnaire_002, etc. If, say, I have 50 participants, I will have 50 PDF files stored on the computer. This is the method used by the guy who worked here before me, and he was somehow able to extract data from those saved files.
I know that in Adobe Acrobat Professional 6.0 I can get an FDF file by going to Advanced --> Forms --> Export Forms Data. But how do I get a CSV file that has all 50 people's responses, with each column a response field (Q1, Q2, Q3, etc.) and each row a participant?
Thanks a lot.
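
A rough sketch of the "one row per participant" step, assuming each participant's form data has been exported as XFDF (the XML flavour of FDF) rather than binary FDF, and that field names are flat (Q1, Q2, ...). The function names and the `participant` column are hypothetical.

```python
import csv
import io
import xml.etree.ElementTree as ET

XFDF_NS = "{http://ns.adobe.com/xfdf/}"  # namespace used by XFDF files


def read_xfdf_fields(xfdf_text: str) -> dict:
    """Map each field name to its first value in one XFDF export."""
    root = ET.fromstring(xfdf_text)
    fields = {}
    for field in root.iter(XFDF_NS + "field"):
        value = field.find(XFDF_NS + "value")
        has_text = value is not None and value.text
        fields[field.get("name")] = value.text if has_text else ""
    return fields


def responses_to_csv(exports: dict) -> str:
    """exports maps a participant label to that participant's XFDF text."""
    rows = {label: read_xfdf_fields(text) for label, text in exports.items()}
    # One column per field seen in any export; missing answers stay empty.
    columns = sorted({name for row in rows.values() for name in row})
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=["participant"] + columns,
                            restval="", lineterminator="\n")
    writer.writeheader()
    for label in sorted(rows):
        writer.writerow({"participant": label, **rows[label]})
    return out.getvalue()
```

Reading the 50 exported files from disk and passing their contents in as the `exports` dictionary would then give a single CSV with one row per participant.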

Extract data from PDF to SAP

Hi all,
I have created an offline form in transaction SFP and emailed it successfully.
The receiver has now sent the filled PDF form back to my Outlook ID (because my mail ID is configured in SMTP).
Now I want to update a table with the values filled in the received PDF.
1) What are all the steps I should follow now?
2) What are guided procedures or workflow for?
3) Do I have the option to receive the mail in my Business Workplace inbox instead of my personal mail ID?
I went through all the related threads on this topic but could not get the idea.
If someone knows, please advise.
Thank you.
Rgrds.
Edited by: Deepa K on Feb 25, 2008 1:30 PM

Hi,
When you create an ABAP class based on the standard interface IF_INBOUND_EXIT_BCS, you get two methods.
First, here are the attributes I defined in my object; all are private instance attributes.
XML_DOCUMENT type ref to IF_IXML_DOCUMENT,
CONVERTER type ref to CL_ABAP_CONV_IN_CE,
ATTACHEMENT_ATTRIBUTES type BCSS_DBPA,
ATTACHEMENT_FILE type BCSS_DBPC,
BINARY_FILE Type XSTRING,
FORMXML      Type STRING,
PDF_FORM_DATA Type XSTRING,
XML_NODE Type Ref To IF_IXML_NODE,
XML_NODE_VALUE Type STRING.
Put this code in method CREATE_INSTANCE:
* Check if the singleton instance has already
* been created.
IF instance is INITIAL.
CREATE OBJECT instance.
ENDIF.
* Return the instance.
ro_ref = instance.
The other method is where the mail is processed.
Here is sample code for method PROCESS_INBOUND:
* Data definition :
DATA : pdf_line    TYPE solix .
DATA : nb_att(10) TYPE n.
DATA w_part TYPE int4 .
FIELD-SYMBOLS : <pdf_line> TYPE solix.
** Set return code so no other Inbound Exit will be done.
e_retcode = if_inbound_exit_bcs=>gc_terminate.
TRY .
* Get the email document that was sent.
      mail = io_sreq->get_document( ).
* Get number of attachement in the mail
* If number is lower than 2 that means no attachement to the mail
      nb_att = mail->get_body_part_count( ) - 1.
      CHECK nb_att GT 0.
      CLEAR w_part.
* Process each document
      DO nb_att TIMES.
        w_part = sy-index + 1 .
        CLEAR xml_document .
* Get attachement attributes
        attachement_attributes =
           mail->get_body_part_attributes( im_part = w_part ).
        IF attachement_attributes-doc_type IS INITIAL.
          DATA w_pos TYPE i .
          FIND '.' IN attachement_attributes-filename
            IN CHARACTER MODE MATCH OFFSET w_pos.
          ADD 1 TO w_pos.
          attachement_attributes-doc_type =
             attachement_attributes-filename+w_pos.
        ENDIF.
* Get the attachement
        attachement_file = mail->get_body_part_content( w_part ).
* If attachement is not a binary one ,
* transform it to binary.
        IF attachement_attributes-binary IS INITIAL.
          CALL FUNCTION 'SO_SOLITAB_TO_SOLIXTAB'
            EXPORTING
              ip_solitab = attachement_file-cont_text
            IMPORTING
              ep_solixtab = attachement_file-cont_hex.
        ENDIF.
* Convert the attachement file into an xstring.
        CLEAR binary_file.
        LOOP AT attachement_file-cont_hex ASSIGNING <pdf_line>.
          CONCATENATE binary_file <pdf_line>-line
             INTO binary_file IN BYTE MODE.
        ENDLOOP.
        TRANSLATE attachement_attributes-doc_type TO UPPER CASE.
* Process the file depending on file extension
* Only XML and PDF file is allow
        CASE attachement_attributes-doc_type .
          WHEN 'PDF'.
* Process an interactive form
            me->process_pdf_file( ).
          WHEN 'XML'.
* Process XML data
            me->process_xml_file( input_xstring = binary_file ).
          WHEN OTHERS.
* Nothing to do , process next attachement
        ENDCASE.
      ENDDO.
    CATCH zcx_pucl003 .
ENDTRY.
As you can see, I added several specific methods to my object to make the code clearer.
Here is the code for each of those methods.
PROCESS_PDF_FILE
TRY.
* Extract the Data of the PDF as a XSTRING stream
      me->process_form( pdf = binary_file ).
      me->process_xml_file( input_xstring = pdf_form_data ).
    CATCH zcx_pucl003 INTO v_exception.
      RAISE EXCEPTION v_exception.
ENDTRY.
PROCESS_FORM with inbound parameter PDF type XSTRING
DATA :
     l_fp          TYPE REF TO if_fp ,
     l_pdfobj      TYPE REF TO if_fp_pdf_object .
TRY.
* Get a reference to the form processing class.
      l_fp = cl_fp=>get_reference( ).
* Get a reference to the PDF Object class.
      l_pdfobj = l_fp->create_pdf_object( ).
* Set the pdf in the PDF Object.
      l_pdfobj->set_document( pdfdata = pdf ).
* Set the PDF Object to extract data the Form data.
      l_pdfobj->set_extractdata( ).
* Execute call to ADS
      l_pdfobj->execute( ).
* Get the PDF Form data.
      l_pdfobj->get_data( IMPORTING formdata = pdf_form_data ).
    CATCH cx_fp_runtime_internal
          cx_fp_runtime_system
          cx_fp_runtime_usage.
ENDTRY.
PROCESS_XML_FILE with inbound parameter INPUT_XSTRING type XSTRING.
TRY.
      me->create_xml_document( input_xstring = input_xstring ).
      me->process_xml( ).
    CATCH ZCX_PUCL003 INTO v_exception.
      RAISE EXCEPTION v_exception.
ENDTRY.
CREATE_XML_DOCUMENT with inbound parameter INPUT_XSTRING type XSTRING.
DATA :
     l_ixml        TYPE REF TO if_ixml,
     streamfactory TYPE REF TO if_ixml_stream_factory ,
     istream       TYPE REF TO if_ixml_istream,
     parser        TYPE REF TO if_ixml_parser.
DATA: parseerror TYPE REF TO if_ixml_parse_error,
        str        TYPE string,
        i          TYPE i,
        count      TYPE i,
        index      TYPE i.
* Convert the xstring form data to string so it can be
* processed using the iXML classes.
TRY.
      converter = cl_abap_conv_in_ce=>create( input = input_xstring ).
      converter->read( IMPORTING data = formxml ).
* Get a reference to iXML object.
      l_ixml = cl_ixml=>create( ).
* Get iStream object from StreamFactory
      streamfactory = l_ixml->create_stream_factory( ).
      istream = streamfactory->create_istream_string( formxml ).
* Create an XML Document class that will be used to process the XML
      xml_document = l_ixml->create_document( ).
* Create the Parser class
      parser = l_ixml->create_parser( stream_factory = streamfactory
                                      istream        = istream
                                      document       = xml_document ).
* Parse the XML
      parser->parse( ).
      IF sy-subrc NE 0
        AND parser->num_errors( ) NE 0.
        count = parser->num_errors( ).
        index = 0.
        WHILE index < count.
          parseerror = parser->get_error( index = index ).
          str = parseerror->get_reason( ).
          index = index + 1.
        ENDWHILE.
        EXIT.
      ENDIF.
    CATCH cx_parameter_invalid_range
          cx_sy_codepage_converter_init
          cx_sy_conversion_codepage
          cx_parameter_invalid_type.
ENDTRY.
Method PROCESS_XML
DATA v_formname TYPE fpname.
* For each node of the XML file you want to retrieve the value
* Then use the specific method PROCESS_NODE .
* Find Node where System Id is store
CLEAR : xml_node ,
          xml_node_value.
TRY.
      me->process_node( node_name     = 'SYSID' ).
      CHECK NOT xml_node_value IS INITIAL.
      CASE xml_node_value.
        WHEN sy-sysid.
* Search for Form name.
          me->process_node( node_name = 'FORM_NAME').
          CHECK NOT xml_node_value IS INITIAL.
          v_formname = xml_node_value.
        WHEN OTHERS.
      ENDCASE.
      CATCH cx_root.
ENDTRY.
Method PROCESS_NODE with inbound parameter NODE_NAME type STRING
CLEAR : xml_node , xml_node_value .
xml_node = xml_document->find_from_name( name = node_name ).
IF xml_node IS INITIAL.
* Missing one node in the form, nothing will be done
      RAISE EXCEPTION TYPE ....
ELSE.
    xml_node_value = xml_node->get_value( ).
ENDIF.
Hope this helps.
Best regards
Bertrand
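
For readers who want to try the node lookup outside SAP, the idea behind PROCESS_NODE (find an element by name in the extracted form XML and read its value) can be sketched in Python. This only illustrates the lookup and is not the ABAP API; the function name is hypothetical, and SYSID / FORM_NAME are the node names used in the code above.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def find_node_value(form_xml: str, node_name: str) -> Optional[str]:
    """Return the text of the first element named node_name, or None."""
    root = ET.fromstring(form_xml)
    for element in root.iter():
        # Strip any '{namespace}' prefix before comparing names.
        local_name = element.tag.rsplit("}", 1)[-1]
        if local_name == node_name:
            return (element.text or "").strip()
    return None
```

A missing node returns `None` here, which corresponds to the branch above where the ABAP method raises an exception.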

Reg Extracting data from PDF using file adapter

Hi Experts,
In my business process I will receive different files in PDF form. I have to extract the fields from each file and send them to the ECC system. Can anyone suggest how to do this without using CA?
Regards
Suresh

You might have to use a custom solution.
You will find tips in this thread: Trouble writing out a PDF in XI/PI?

Extracting Data from PDF forms in Reader created in Livecycle

Hello
We would like users who complete a PDF document (created in LiveCycle) in Adobe Reader to be able to export the completed fields, with their accompanying questions, to an MS Word document in a layout similar to the PDF, so the content can be pasted into future documents.
Is there a simple step-by-step procedure that the users can follow?
Any assistance would be much appreciated.

Hi,
I think you selected "3.x Datasource" as the type when you replicated the metadata from the second client.
If so, delete the datasource (in BIW) from the second client, and then replicate the datasource once more. This time, select only the "As Datasource" option.
with rgds,
Anil Kumar Sharma .P

How to Extract Data from the PDF file to an internal table.

HI friends,
How can I extract data from a PDF file into an internal table?
Thanks in Advance
Shankar

Shankar,
Have a look at these threads:-
extracting the data from pdf file to internal table in abap
Adobe Form (data extraction error)
Chintan

Need to pre-populate and Extract data from static PDF form

Hi Jasmin or Jayan or anyone else that can answer.
I have a requirement to use digital signatures. Because of that, the forms must be static PDFs and the form variables will be “document form”. I want to pre-populate the form via an SQL query and a custom render process, and render it as PDF so that the submitter can apply a digital signature when he/she is done and ready to submit for approval. Subsequent approvers will also digitally sign the form. I know that I will specify the custom render to render only once and thereby preserve the signature(s) on the form. I do, however, need to extract data from the form to control the business process. I cannot access the data in the form the same way I do with an xdp, and I also cannot pre-populate the same way I do with an xdp.
Any suggestions on how to attack this?

Parth, one problem with your approach is that he will submit a PDF, and therefore you won't be able to put the PDF in a variable that's supposed to contain just xml.
The prepopulation should be the same. If you start off with an xdp, then you will call a render service that merges data with your xdp to create a PDF.
Now when you submit, you will submit the entire PDF back in the Document Form variable. In Workbench, you can use the FormDataIntegration service to extract data from the PDF that's stored in the Document Form variable and put it in an xml variable. Then you can just use XPath for your condition.
I'm assuming you'll just pass that same Document Form variable to the next step, because if you make any change to the PDF it'll break the signature.
Let me know if I missed anything.
Jasmin

Extract data from Dynamic Table in Pdf

Hi all,
How can I extract data from a dynamically created table (rows are added/removed by the user in an offline scenario) in a PDF form?
Regards,
Michael

Hi Michael,
I have a scenario similar to yours. I want to extract table data from an offline form. When I extract the data, I get values only for the first row of the table. Can you please guide me on how to fetch the data for a table (this table also has rows added dynamically offline)? I need the solution urgently. Please help me with this.
Will reward points for sure.
Thanks and Regards,
Srividya.

Applescript or workflow to extract text from PDF and rename PDF with the results

Hi Everyone,
I get supplied hundreds of PDFs which each contain a stock code, but the PDFs themselves are not named consistently, or they are supplied as multi-page PDFs.
What I need to do is name each PDF with the code which is in the text on the PDF.
It would work like this in an ideal world:
1. Split PDF into single pages
2. Extract text from PDF
3. Rename PDF using the extracted text
I'm struggling with part 3!
I can get a text file containing just the code (I'm extracting the code with a call to BBEdit).
I did think about using a variable for the name, but the rename function doesn't let me use variables.

Hello
You may also try the following AppleScript, which is a wrapper around a RubyCocoa script. It will ask you to choose the source PDF files and a destination directory. Then it will scan the text of each page of the PDF files for the predefined pattern and save the page as a new PDF file in the destination directory, named with the string the pattern extracted. Pages which do not contain a string matching the pattern are ignored. (Ignored pages, if any, are reported in the result of the script.)
Currently the regex pattern is set to:
/HB-.._[0-9]{6}/
which means HB- followed by two characters and _ and 6 digits.
Minimally tested under 10.6.8.
Hope this may help,
H
_main()
on _main()
    script o
        property aa : choose file with prompt ("Choose pdf files.") of type {"com.adobe.pdf"} ¬
            default location (path to desktop) with multiple selections allowed
        set my aa's beginning to choose folder with prompt ("Choose destination folder.") ¬
            default location (path to desktop)
        set args to ""
        repeat with a in my aa
            set args to args & a's POSIX path's quoted form & space
        end repeat
        considering numeric strings
            if (system info)'s system version < "10.9" then
                set ruby to "/usr/bin/ruby"
            else
                set ruby to "/System/Library/Frameworks/Ruby.framework/Versions/1.8/usr/bin/ruby"
            end if
        end considering
        do shell script ruby & " <<'EOF' - " & args & "
require 'osx/cocoa'
include OSX
require_framework 'PDFKit'
outdir = ARGV.shift.chomp('/')
ARGV.select {|f| f =~ /\\.pdf$/i }.each do |f|
    url = NSURL.fileURLWithPath(f)
    doc = PDFDocument.alloc.initWithURL(url)
    path = doc.documentURL.path
    pcnt = doc.pageCount
    (0 .. (pcnt - 1)).each do |i|
        page = doc.pageAtIndex(i)
        page.string.to_s =~ /HB-.._[0-9]{6}/
        name = $&
        unless name
            puts \"no matching string in page #{i + 1} of #{path}\"
            next # ignore this page
        end
        doc1 = PDFDocument.alloc.initWithData(page.dataRepresentation) # doc for this page
        unless doc1.writeToFile(\"#{outdir}/#{name}.pdf\")
            puts \"failed to save page #{i + 1} of #{path}\"
        end
    end
end
EOF"
    end script
    tell o to run
end _main
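
To check the pattern against sample page text before running the script above, here is a small Python sketch of the same match. The regex is the one from the post; the function names and the sample strings in the usage note are made up for illustration.

```python
import re
from typing import List, Optional, Tuple

# Same pattern as in the script: "HB-", any two characters, "_", six digits.
STOCK_CODE = re.compile(r"HB-.._[0-9]{6}")


def stock_code(page_text: str) -> Optional[str]:
    """Return the first stock code found in page_text, or None."""
    match = STOCK_CODE.search(page_text)
    return match.group(0) if match else None


def plan_names(pages: List[str]) -> Tuple[List[Tuple[int, str]], List[int]]:
    """Return (page number, file name) pairs plus the page numbers skipped."""
    named, skipped = [], []
    for number, text in enumerate(pages, start=1):
        code = stock_code(text)
        if code is None:
            skipped.append(number)  # mirrors the "ignore this page" branch
        else:
            named.append((number, code + ".pdf"))
    return named, skipped
```

For example, `stock_code("Item HB-AB_123456 in stock")` gives `"HB-AB_123456"`, while text with only five digits after the underscore gives `None`.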

Extract data from database tables and download in pdf and csv

Hi, how can I rewrite my old Forms procedure in ADF Java? The procedure extracts data from different tables and downloads the data as PDF and CSV. I am not downloading an image; I want to extract data from different tables in my database and download that data as PDF and CSV. I would like to write this in Java ADF. I just want direction; I am not asking anyone to do my work, this is my learning curve.
The Forms code is:
function merge_header3 return varchar2 is
begin
     return '~FACILITY DESCRIPTION~ACCOUNT NO~BRANCH CODE~BANK REF NO.~P/P/ AMOUNT~Postal Address 1~Postal Address 2~Box Postal Code~Dep. Date~Month~BANK NAME~BRANCH NAME~ACCOUNT TYPE~DESCRIPTION~OBJECTIVE DESCRIPTION';
end;
procedure download_file (i_pbat integer) is
dir varchar2(80);
file_name1 varchar2(80);
file_name2 varchar2(80);
appl_code varchar2(80);
fil1 client_text_io.file_type;
fil2 client_text_io.file_type;
dat varchar2(1000);
DATA VARCHAR2(1000);
bvspro varchar2(100);
ssch   varchar2(100);
bvspro_total number(20,2);
ssch_total   number(20,2);
grand_total number(20,2);
cnt    integer;
cursor pbat is
     select *
     from sms_payment_batches
     where id = i_pbat;
cursor pay (pb_id integer) is
     select *
     from sms_payment_vw
     where pbat_id = pb_id
     order by subsidy ASC,programme,beneficiary_name;
cursor cgref (low varchar2) is
     select *
     from cg_ref_codes
     where rv_domain ='SMS'
     and rv_low_value = low;
success boolean;
begin
     set_application_property(cursor_style,'busy');
     appl_code := sms_global.ref_code('SMS','APP_CODE','SMS',0);
    dir       := sms_global.ref_code('SMS','PAY_DIR','c:\sms\batch_payments',0);
         success := webutil_file.create_directory(dir);
     if webutil_file.file_is_directory(dir) then
         null;
--         message ('directory exists');
    else
--                  message ('create directory ');
         success := webutil_file.create_directory(dir);
--         if success then        message ('directory exists');    end if;
    end if;
    for c_pbat in pbat loop
         file_name1 := dir ||'\' || appl_code||c_pbat.batch_number||'-'||to_char(c_pbat.batch_dt,'yyyymmdd')||'pay.txt';
         file_name2 := dir ||'\' || appl_code||c_pbat.batch_number||'-'||to_char(c_pbat.batch_dt,'yyyymmdd')||'merge.txt';
--message('create files ');
--         fil1 := client_text_io.fopen (file_name1,'W');
--         fil2 := client_text_io.fopen (file_name2,'W');
    fil1 := client_text_io.fopen (file_name1,'W','');
    fil2 := client_text_io.fopen (file_name2,'W','');
               dat :=                       'FROM ACCOUNT NUMBER'
                                                            ||'~'||'FROM ACCOUNT DESCRIPTION'
                                                            ||'~'||'MY STATEMENT DESCRIPTION'
                                                            ||'~'||'BENEFICIARY ACCOUNT NUMBER'
                                                            ||'~'||'BENEFICIARY SUB ACCOUNT NUMBER'
                                                            ||'~'||'BENEFICIARY BRANCH CODE'
                                                            ||'~'||'BENEFICIARY NAME'
                                                            ||'~'||'BENEFICIARY STATEMENT DESCRIPTION'
                                                            ||'~'||'AMOUNT';
         --     client_text_io.put_line(fil1,dat);
         bvspro:= null;
         ssch := null;
         cnt := 0;
         dat := '~'||lpad('~',16,'~');
         for c_pay in pay(c_pbat.id) loop
--message('cpay loop ' || cnt);
           if bvspro is null then
                 dat := lpad('~',16,'~');
                 dat := utility.put_field(1,c_pay.programme,dat,'~');
           client_text_io.put_line(fil2,dat);
           dat := utility.put_field(1,c_pay.subsidy,dat,'~');
           client_text_io.put_line(fil2,dat);
           dat := merge_header3;
                 client_text_io.put_line(fil2,dat);
                 bvspro := c_pay.programme;
                 ssch := c_pay.subsidy;
                 grand_total := 0;
                 bvspro_total := 0;
                 ssch_total := 0;
           end if;
           if bvspro <> c_pay.programme then
                 dat := lpad('~',16,'~');
                 dat := utility.put_field(5,ssch_total,dat,'~');
                 dat := lpad('~',16,'~');
                 dat := utility.put_field(5,bvspro_total,dat,'~');
           dat := utility.put_field(1,'Total:' || bvspro,dat,'~');
                 client_text_io.put_line(fil2,dat);
                 dat := lpad('~',16,'~');
           client_text_io.put_line(fil2,dat);
                 dat := utility.put_field(1,c_pay.programme,dat,'~');
           client_text_io.put_line(fil2,dat);
                 bvspro := c_pay.programme;
           dat := utility.put_field(1,c_pay.subsidy,dat,'~');
           client_text_io.put_line(fil2,dat);
           dat := merge_header3;
                 client_text_io.put_line(fil2,dat);
                 bvspro := c_pay.programme;
                 ssch := c_pay.subsidy;
                 bvspro_total := 0;
                 ssch_total := 0;
                 cnt :=0;
         end if;
           if ssch <> c_pay.subsidy then
                 dat := lpad('~',16,'~');
                 dat := utility.put_field(5,ssch_total,dat,'~');
                 dat := lpad('~',16,'~');
           client_text_io.put_line(fil2,dat);
           dat := utility.put_field(1,c_pay.subsidy,dat,'~');
           client_text_io.put_line(fil2,dat);
           dat := merge_header3;
                 client_text_io.put_line(fil2,dat);
                 ssch := c_pay.subsidy;
                 ssch_total := 0;
                 cnt :=0;
         end if;
        bvspro_total := bvspro_total + c_pay.amount;
        ssch_total   := ssch_total   + c_pay.amount;
              grand_total := grand_total + c_pay.amount;
        cnt := cnt +1;
--message('bfore write file 2 ' );
        client_text_io.put_line(fil2
                               ,cnt
                        ||'~'|| c_pay.beneficiary_name
                                                            ||'~'||c_pay.BENEFICIARY_ACCOUNT_NUMBER ||''
                                                            ||'~'||c_pay.BRANCH_CODE             ||''
                                                            ||'~'|| c_pay.BENEFICIARY_STATEMENT_DESC
                                                            ||'~'|| c_pay.AMOUNT
                        ||'~'|| c_pay.address_line1
                        ||'~'|| c_pay.address_line2
                                                ||'~'|| c_pay.postal_code
                                                ||'~'|| TO_CHAR(c_pay.deposit_date,'DD-Mon-YYYY')
                                                ||'~'|| c_pay.month
                                                ||'~'|| c_pay.bank
                                                ||'~'|| c_pay.bank_branch
                                                ||'~'|| c_pay.account_type
                                                ||'~'|| c_pay.subsidy
                                                ||'~'|| c_pay.programme);
              DATA :=                                  c_pay.FROM_ACCOUNT_NUMBER
                                                            ||'~'||c_pay.FROM_ACCOUNT_DESCR
                                                            ||'~'||c_pay.MY_STATEMENT_DESCR
                                                            ||'~'||c_pay.BENEFICIARY_ACCOUNT_NUMBER
                                                            ||'~'
                                                            ||'~'||c_pay.BRANCH_CODE
                                                            ||'~'||c_pay.BENEFICIARY_NAME
                                                            ||'~'||c_pay.BENEFICIARY_STATEMENT_DESC
                                                            ||'~'||c_pay.AMOUNT;
        DATA := REPLACE(DATA, ',' , ' ' );
        DATA := REPLACE(DATA, '~' , ',' );
--message (cnt ||' ' || data);
--message('bfore write file 1 ' );
              client_text_io.put_line(fil1, data);
         end loop;
--message ('end of write');
             dat := lpad('~',16,'~');
             dat := utility.put_field(6,ssch_total,dat,'~');
             dat := lpad('~',16,'~');
       dat := utility.put_field(1,'Total:' || bvspro,dat,'~');
             dat := utility.put_field(5,bvspro_total,dat,'~');
          client_text_io.put_line(fil2,dat);
          dat := lpad('~',16,'~');
       client_text_io.put_line(fil2,dat);
       dat := utility.put_field(1,'Grand Total:' ,dat,'~');
             dat := utility.put_field(5,grand_total,dat,'~');
          client_text_io.put_line(fil2,dat);
         -- close file
for i in 1..50 loop
       if substr(i,-1) = 0 then
             message ('flush ' || i);
       end if;
              client_text_io.put_line(fil1, lpad(' ',2000));
              client_text_io.put_line(fil2, lpad(' ',2000));
              client_text_io.put_line(fil1, lpad(' ',2000));
              client_text_io.put_line(fil2, lpad(' ',2000));
end loop;
         client_text_io.fclose(fil1);
         client_text_io.fclose(fil2);
    end loop;
   set_application_property(cursor_style,'default');
    exception
         when others then
              message(sqlcode ||' ' ||sqlerrm);
   end download_file;
I tried this, but this code only downloads an image, not data from the database tables:
    public void downloadImage(FacesContext facesContext, OutputStream outputStream)
    {
        BindingContainer bindings = BindingContext.getCurrent().getCurrentBindingsEntry();
        // get an ADF attribute value from the ADF page definitions
        AttributeBinding attr = (AttributeBinding) bindings.getControlBinding("DocumentImage");
        if (attr == null)
            return;
        // the value is a BlobDomain data type
        BlobDomain blob = (BlobDomain) attr.getInputValue();
        try
        {   // copy the data from the BlobDomain to the output stream
            IOUtils.copy(blob.getInputStream(), outputStream);
            // close the blob to release the resources
            blob.closeInputStream();
            // flush the output stream
            outputStream.flush();
        }
        catch (IOException e)
        {
            // handle errors
            e.printStackTrace();
            FacesMessage msg = new FacesMessage(FacesMessage.SEVERITY_ERROR, e.getMessage(), "");
            FacesContext.getCurrentInstance().addMessage(null, msg);
        }
    }

You should ask your question in the ADF forum.
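
As a language-neutral illustration of the CSV step in the Forms procedure above (commas inside values are replaced by spaces, then the '~' separators are replaced by commas), here is a short Python sketch. The function names and the sample record in the usage note are hypothetical; the second function shows the alternative of letting a CSV library quote values instead of altering them, which is not what the original procedure does.

```python
import csv
import io
from typing import Iterable


def tilde_record_to_csv(fields: Iterable) -> str:
    """Mimic the procedure: join with '~', blank out commas, then swap '~' for ','."""
    joined = "~".join("" if f is None else str(f) for f in fields)
    return joined.replace(",", " ").replace("~", ",")


def record_to_quoted_csv(fields: Iterable) -> str:
    """Alternative: keep commas in values and let the csv module quote them."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["" if f is None else f for f in fields])
    return out.getvalue()
```

With the made-up record `["123", "Smith, J", None, 10.5]`, the first function returns `123,Smith  J,,10.5` and the second returns `123,"Smith, J",,10.5` followed by a newline.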

Extract data from a scanned PDF Chart

Hi,
I have a scanned PDF chart, which shows linear relationship between two variables. Is there a way to extract data from the scanned PDF using Acrobat?
I want to avoid error in my calculations by eyeballing the data. Using "Measuring tool" may be an option, but wanted to ask whether any forum members have a better and efficient way to extract data, which can later be used in a spreadsheet software.
As an example, please refer to the attached link; I would like to extract data from Figure 6 in this document: http://www.seas.columbia.edu/earth/wtert/sofos/nawtec/nawtec13/nawtec13-3164.pdf
FYI, I have Acrobat X installed on my Windows computer.
Thanks in advance for your help.

Using AA9 Pro, I was able to use the TouchUp Object Tool to right-click the graph and open it in Photoshop. From there, or from Paint or another image editor, one might be able to clean it up or re-create the paths. The gray values are too close for it to be simple.
The measure tool does not seem a good choice; although you see measurement lines, it would not actually produce the line you desire, unless maybe, just maybe, you flattened annotations, a feature within Fixups that is perhaps limited to the Professional version.
IMO, it is a fairly linear graph with only a variation at the 10 Power (MW) point. Your eyeball is as good as mine on this one; the actual values may be available from the authoring entity.

I want to extract data from a PDF using Java

I would prefer to extract data from a PDF and convert it to XML. Is there an API that will convert a PDF to some Adobe-format XML? Ideally I would like to add some JAR files to my classpath, similar to PDFBox. I don't want to install a bunch of server-side components or anything like that.
Thanks!

Thank you for the reply!
If I installed the server side components, how would a Java client invoke a service to export data from a PDF? RMI, Web Services?

Extracting data from a pdf form

Hi,
livecycle es2, workbench 9.0
I'm new to workbench and have a problem extracting data from a pdf form submitted to a short lived process.
I have set up the following very simple process :
default startpoint > ProcessForm > exportData > set value > set value > Write Document
The intention is to update the document and write it to disk. So far, each step works except for the 'export data' where I cannot get the pdf to extract to xml.
The Input to the 'export data' step is a variable (myDoc), Data Type: Document, created from the incoming PDF form.
If I write out myDoc it is an exact copy of the incoming document, so I guess the start and finish steps of the process are OK.
The incoming (PDF) form I was given had no data schema, but I thought I could access the form data by exporting to an xml variable....
Service : FormDataIntegration / exportData
input (PDF Document)    variable : myDoc
output(Data extracted)     variable : myXMLData
Then in the next step (set value) access the xml element I am after ..
Mappings
Location: /process_data/@groupId      Expression: /process_data/myXMLData/xdp/datasets/data/form1/mainPage/groupId
This did not work, so I took the incoming form, exported the form data to an XML file, and created a schema using Stylus Studio. I then imported that into the myXMLData definition. (BTW, do I need to specify the root node after importing it?)
Still not working !
Extra info : The XML view of my incoming form shows I have a minimal dataset definition- is this OK ??
<connectionSet xmlns="http://www.xfa.org/schema/xfa-connection-set/2.8/">
   <?originalXFAVersion http://www.xfa.org/schema/xfa-connection-set/2.4/?></connectionSet>
<xfa:datasets xmlns:xfa="http://www.xfa.org/schema/xfa-data/1.0/">
   <xfa:data xfa:dataNode="dataGroup"/>
</xfa:datasets>
The schema created by stylus studio has none of the xfdf, xfa settings I have seen on other schemas - is this OK ?
Any help to get this fixed greatly appreciated
thanks
steve

hey thanks for the offer, but I am now sorted after I found a simple working example on line.
This is a similar process to the one I am working on, and is clearly described and easy to follow...
http://eslifeline.wordpress.com/2009/04/25/extracting-data-from-signed-pdf-using-livecycle-server/
girish bedekar - I thank you !

Extracting data from Excel To Illustrator javascript or vbscript

Hi all-
I was wondering if there was a way to extract data from Excel to be used in Illustrator. I know there is an option of variables and xml, and I don't want that. I've seen and tried out how to read illustrator and write to excel, and I get that. What I would like to do is pretty much the opposite:
1. Pre-fill an Excel file (.xls, .csv, doesn't matter) with data such as a filename in column 1 and replacement text in column 2, then close it manually.
2. Run the script (VBScript, JavaScript, doesn't matter).
3. For each row in the Excel file where the cell in the first column is not empty, open the Illustrator template with the placeholder "DWG" text frame and replace the contents of the frame named "DWG" with the replacement text from column 2.
4. Save each to a PDF file, named with the text from column 1 (Filename).
In a nutshell, there will be a single Illustrator template with a premade textFrame named "DWG". Excel will contain two columns, one for the filename to save as and one for the text that replaces the placeholder in AI. I hope I explained this well enough without causing too much confusion. Thanks in advance.
Filename     Replacement Text
test1.pdf    DWG01
test2.pdf    DWG02
test3.pdf    DWG03
test4.pdf    DWG04

As text… \n is new line character and \r is return character. I can't remember which excel uses but they both equate to a line/paragraph… I very quickly threw together an example for you…
#target Illustrator
textToPDF();
function textToPDF() {
          // Needs an open document (the template) to work on
          if ( app.documents.length == 0 ) { return; }
          var doc, csvFile, i, fileArray, opts;
          csvFile = File( '~/Desktop/ScriptTest/Test.csv' );
          if ( !csvFile.exists ) { return; }
          fileArray = readInCSV( csvFile );
          doc = app.activeDocument;
          opts = new PDFSaveOptions();
          opts.pDFPreset = '[Press Quality]';
          // Here we loop the main array
          for ( i = 0; i < fileArray.length; i++ ) {
                    // Here we get the second item of sub array i
                    doc.textFrames.getByName( 'DWG' ).contents = fileArray[i][1];
                    // Here we get the first item of sub array i
                    doc.saveAs( File( fileArray[i][0] ), opts );
          }
}
// Read the CSV file line by line into an array of [filename, text] arrays
function readInCSV( fileObj ) {
          var fileArray, thisLine, csvArray;
          fileArray = [];
          fileObj.open( 'r' );
          while( !fileObj.eof ) {
                    thisLine = fileObj.readln();
                    csvArray = thisLine.split( ',' );
                    fileArray.push( csvArray );
          }
          fileObj.close();
          return fileArray;
}
I haven't tested it but it should be close…?
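The CSV-reading step can be tried outside Illustrator before wiring it into the script. Below is a minimal sketch in plain JavaScript, under the assumption that the file has a header row and that no cell contains a comma or quotes; parseCsvText is a hypothetical helper name, not part of the Illustrator scripting API. It splits on either \r\n, \r, or \n, so it does not matter which line ending Excel wrote, and it skips rows whose first column is empty, as the original request asked.

```javascript
// Parse CSV text into an array of [fileName, replacementText] pairs.
// Naive split on commas: quoted fields containing commas are NOT handled.
function parseCsvText(text, hasHeader) {
  var lines = text.split(/\r\n|\r|\n/); // accept any common line ending
  var rows = [];
  for (var i = 0; i < lines.length; i++) {
    if (hasHeader && i === 0) { continue; } // skip the header row
    var cells = lines[i].split(',');
    // Trim with a regex so this also runs in older script engines.
    var fileName = (cells[0] || '').replace(/^\s+|\s+$/g, '');
    var replacement = (cells[1] || '').replace(/^\s+|\s+$/g, '');
    if (fileName === '') { continue; } // skip rows with an empty first column
    rows.push([fileName, replacement]);
  }
  return rows;
}

// Sample input mirroring the table in the question.
var sample = 'Filename,Replacement Text\r\ntest1.pdf,DWG01\r\ntest2.pdf,DWG02\r\n';
var rows = parseCsvText(sample, true);
for (var r = 0; r < rows.length; r++) {
  console.log(rows[r][0] + ' -> ' + rows[r][1]);
}
```

If something like this were used inside the Illustrator script, the text would come from reading the File object instead of a string literal, and each returned pair would feed the loop that sets the "DWG" frame contents and saves the PDF.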
