Working with PDF files

Hello, we would like to write functionality that generates PDF files from our Java application and, additionally, functionality that reads them back into the app. What is the best API to use for this? Would it be iText?

Here is my code; I'll let it speak for itself.
1) Jacob, for extracting text from PDF, Word and Excel files.
Jacob is a bridge that connects Java to COM or Win32 functions. It needs a DLL, which the author of Jacob provides.
Jacob: http://www.matrix.org.cn/down_view.asp?id=13
Put the DLL on the PATH and the jar file on the classpath.
import java.io.File;
import com.jacob.com.*;
import com.jacob.activeX.*;

public class FileExtracter {
    public static void main(String[] args) {
        // Starts Word through COM, so this needs Windows with Word installed.
        ActiveXComponent app = new ActiveXComponent("Word.Application");
        String inFile = "c:\\test.doc";
        String tpFile = "c:\\temp.htm";
        String otFile = "c:\\temp.xml";
        boolean flag = false;
        try {
            app.setProperty("Visible", new Variant(false));
            Dispatch docs = app.getProperty("Documents").toDispatch();
            // Open(FileName, ConfirmConversions = false, ReadOnly = true)
            Dispatch doc = Dispatch.invoke(docs, "Open", Dispatch.Method,
                    new Object[]{inFile, new Variant(false), new Variant(true)}, new int[1]).toDispatch();
            // SaveAs with format 8 (wdFormatHTML) writes the document out as HTML
            Dispatch.invoke(doc, "SaveAs", Dispatch.Method, new Object[]{tpFile, new Variant(8)}, new int[1]);
            // Close without saving changes to the original document
            Variant f = new Variant(false);
            Dispatch.call(doc, "Close", f);
            flag = true;
        } catch (Exception e) {
            e.printStackTrace();
        } finally {
            // Always shut Word down, even if the conversion failed
            app.invoke("Quit", new Variant[] {});
        }
    }
}
2)
Apache POI, for extracting text from Word and Excel files.
POI package: http://www.matrix.org.cn/down_view.asp?id=14
Put it on the classpath.
import java.io.*;
import org.textmining.text.extraction.WordExtractor;

/**
 * <p>Title: pdf extraction</p>
 * <p>Description: email:[email protected]</p>
 * <p>Copyright: Matrix Copyright (c) 2003</p>
 * <p>Company: Matrix.org.cn</p>
 * @author chris
 * @version 1.0; if you use this example, please keep this notice
 */
public class PdfExtractor {
    public PdfExtractor() {
    }

    public static void main(String args[]) throws Exception {
        // Despite the class name, this example reads a Word document
        FileInputStream in = new FileInputStream("c:\\a.doc");
        try {
            WordExtractor extractor = new WordExtractor();
            String str = extractor.extractText(in);
            System.out.println("the result length is " + str.length());
            System.out.println("the result is " + str);
        } finally {
            in.close();
        }
    }
}
3) PDFBox, for PDF files.
http://www.matrix.org.cn/down_view.asp?id=12
import org.pdfbox.pdmodel.PDDocument;
import org.pdfbox.pdfparser.PDFParser;
import java.io.*;
import org.pdfbox.util.PDFTextStripper;

/**
 * <p>Title: pdf extraction</p>
 * <p>Description: email:[email protected]</p>
 * <p>Copyright: Matrix Copyright (c) 2003</p>
 * <p>Company: Matrix.org.cn</p>
 * @author chris
 * @version 1.0; if you use this example, please keep this notice
 */
public class PdfExtracter {
    public PdfExtracter() {
    }

    // Written against the old org.pdfbox package names used by this download
    public String GetTextFromPdf(String filename) throws Exception {
        PDDocument pdfdocument = null;
        FileInputStream is = new FileInputStream(filename);
        try {
            PDFParser parser = new PDFParser(is);
            parser.parse();
            pdfdocument = parser.getPDDocument();
            // Collect the extracted text in memory
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            OutputStreamWriter writer = new OutputStreamWriter(out);
            PDFTextStripper stripper = new PDFTextStripper();
            stripper.writeText(pdfdocument.getDocument(), writer);
            writer.close();
            byte[] contents = out.toByteArray();
            String ts = new String(contents);
            System.out.println("the string length is " + contents.length + "\n");
            return ts;
        } finally {
            // Release the parsed document and the input stream
            if (pdfdocument != null) {
                pdfdocument.close();
            }
            is.close();
        }
    }

    public static void main(String args[]) {
        PdfExtracter pf = new PdfExtracter();
        try {
            String ts = pf.GetTextFromPdf("c:\\a.pdf");
            System.out.println(ts);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
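
The examples above only read documents. For the generating half of the original question, a library such as iText is the usual route; purely to illustrate what such a library writes, here is a minimal sketch in plain Java that assembles a one-page PDF by hand. The class name MinimalPdf, the page size and the font choice are assumptions made for this illustration, and the sketch handles ASCII text only.

```java
import java.nio.charset.StandardCharsets;
import java.util.Locale;

// Hypothetical helper for illustration only; not part of iText, POI or PDFBox.
final class MinimalPdf {

    /** Builds a one-page PDF that shows a single line of ASCII text. */
    public static byte[] build(String text) {
        // Escape the characters that are special inside a PDF literal string.
        String escaped = text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)");
        String content = "BT /F1 24 Tf 72 720 Td (" + escaped + ") Tj ET\n";

        // Object bodies, numbered 1..5 in this order.
        String[] objects = {
            "<< /Type /Catalog /Pages 2 0 R >>",
            "<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
            "<< /Type /Page /Parent 2 0 R /MediaBox [0 0 612 792] /Contents 4 0 R"
                + " /Resources << /Font << /F1 5 0 R >> >> >>",
            "<< /Length " + content.length() + " >>\nstream\n" + content + "endstream",
            "<< /Type /Font /Subtype /Type1 /BaseFont /Helvetica >>"
        };

        StringBuilder pdf = new StringBuilder("%PDF-1.4\n");
        int[] offsets = new int[objects.length];
        for (int i = 0; i < objects.length; i++) {
            // With ASCII-only content, character offsets equal byte offsets.
            offsets[i] = pdf.length();
            pdf.append(i + 1).append(" 0 obj\n").append(objects[i]).append("\nendobj\n");
        }

        // Cross-reference table: one 20-byte entry per object, plus the free entry 0.
        int xrefOffset = pdf.length();
        pdf.append("xref\n0 ").append(objects.length + 1).append("\n");
        pdf.append("0000000000 65535 f \n");
        for (int offset : offsets) {
            pdf.append(String.format(Locale.ROOT, "%010d 00000 n \n", offset));
        }

        pdf.append("trailer\n<< /Size ").append(objects.length + 1).append(" /Root 1 0 R >>\n");
        pdf.append("startxref\n").append(xrefOffset).append("\n%%EOF\n");
        return pdf.toString().getBytes(StandardCharsets.US_ASCII);
    }
}
```

Writing the result with Files.write(Paths.get("hello.pdf"), MinimalPdf.build("Hello")) should give a file that a PDF viewer opens. For anything beyond a toy document (fonts, images, non-ASCII text, compression), a real library is the better choice.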

Similar Messages

  • Working with .pdf files and JAVA

    Hi,
    does anyone have an answer to how I can find more information on .pdf files?
    I would like to convert .pdf files to textfiles and/or xml files. I can not find it in the j2se Edition, and someone told me it can be found in the j2ee edition, but I can not find anything there either. Please help..
    thanks,
    R.

    Thanks for your reply. What tools do you mean? I know lots of tools for converting text to a .pdf file, but no tools for the other direction. There is a commercial API available that lets you work with PDF in Java, but I am interested in the other possibilities.
    Regards

  • Full-Text search is not working with PDF files - SQL Server 2012 64 bit

    Hi,
    We are in the process of storing PDF files in SQL Server 2012 with Full-Text search capability.
    I followed the steps below, and it works fine with Word documents but not with PDF files. I tried PDF iFilter 11 and 9; both were unsuccessful.
    Server/DB Level Settings:
    1) Enable FileStream
    2) Install Full-Text, then restart
    3) Use [specific db]
       alter database [db name] add filegroup Files contains filestream;
       alter database [db name] add file (name = N'Files', filename = N'D:\SQL\DATA') to filegroup [Files];
    3) Database level Settings:
       FileStream:
       FileStream Directory name: [Set the name]
       FileStream non-transacted Access: [set Appropriate]
    3a) Add a datafile to DB with filestreamdata filetype.
    4) Share D:\SQL\DATA directory and add specific accounts with read/write access
    5) Give bulkadmin access to those specific accounts at server level
    6) From the page (link) download and install the *.pdf IFilter for FTS. Link: http://www.adobe.com/support/downloads/detail.jsp?ftpID=5542
    7) To the PATH global system variable add path to the catalog, where you installed the plugin. Default for this version is: C:\Program Files\Adobe\Adobe PDF iFilter 9 for 64-bit platforms\bin
    8) From the page (link) download a FilterPackx64.exe and install it. Link: http://www.microsoft.com/en-us/download/confirmation.aspx?id=20109
    9) Now from SSMS execute the following procedures:
       -sp_fulltext_service 'load_os_resources',1
       -sp_fulltext_service 'verify_signature', 0
       EXEC sp_fulltext_service 'update_languages'; -- update language list
       EXEC sp_fulltext_service 'restart_all_fdhosts'; -- restart daemon
       reconfigure with override;
    10) Restart the server
    11) select document_type, path from sys.fulltext_document_types where document_type = '.pdf'
       -select document_type, path from sys.fulltext_document_types where document_type = '.docx'
    12) Results are OK.
    Following is my table/index/catalog script:
    CREATE TABLE dbo.DocumentFilesTest
    (
        DocumentId INT IDENTITY(1,1) NOT NULL PRIMARY KEY,
        AddDate datetime NOT NULL,
        Name nvarchar(50) NOT NULL,
        Extension nvarchar(10) NOT NULL,
        Description nvarchar(1000) NULL,
        FileStream_Id UNIQUEIDENTIFIER ROWGUIDCOL NOT NULL UNIQUE DEFAULT NEWSEQUENTIALID(),
        FileSource varbinary(MAX) FILESTREAM DEFAULT(0x)
    )
    go
    --Add default add date for document
    ALTER TABLE dbo.DocumentFilesTest ADD CONSTRAINT DF_DocumentFilesTest_AddDate DEFAULT sysdatetime() FOR AddDate
    EXEC sp_fulltext_database 'enable'
    GO
    IF NOT EXISTS (SELECT TOP 1 1 FROM sys.fulltext_catalogs WHERE name = 'Ducuments_Catalog_test')
    BEGIN
        EXEC sp_fulltext_catalog 'Ducuments_Catalog_test', 'create', 'D:\SQL\PDFBlob';
    END
    --EXEC sp_fulltext_catalog 'Ducuments_Catalog_test', 'drop'
    DECLARE @indexName nvarchar(255) = (SELECT Top 1 i.Name from sys.indexes i Join sys.tables t on i.object_id = t.object_id WHERE t.Name = 'DocumentFilesTest' AND i.type_desc = 'CLUSTERED')
    PRINT @indexName
    EXEC sp_fulltext_table 'DocumentFilesTest', 'create', 'Ducuments_Catalog_test', @indexName
    EXEC sp_fulltext_column 'DocumentFilesTest', 'FileSource', 'add', 0, 'Extension'
    EXEC sp_fulltext_table 'DocumentFilesTest', 'activate'
    EXEC sp_fulltext_catalog 'Ducuments_Catalog_test', 'start_full'
    ALTER FULLTEXT INDEX ON [dbo].[DocumentFilesTest] ENABLE
    ALTER FULLTEXT INDEX ON [dbo].[DocumentFilesTest] SET CHANGE_TRACKING = AUTO
    ALTER FULLTEXT CATALOG Ducuments_Catalog_test REBUILD WITH ACCENT_SENSITIVITY=OFF;
    INSERT INTO DocumentFilesTest(Extension, Name, FileSource)
    SELECT 'pdf', 'BOL12006553.pdf', * FROM OPENROWSET(BULK 'd:\SQL\PDFBlob\BOL12006553.pdf', SINGLE_BLOB) AS BLOB;
    GO
    INSERT INTO DocumentFilesTest(Extension, Name, FileSource)
    SELECT 'docx', 'test.docx', * FROM OPENROWSET(BULK 'd:\SQL\PDFBlob\test.docx', SINGLE_BLOB) AS Document;
    GO
    SELECT d.* FROM dbo.DocumentFilesTest d WHERE Contains(d.FileSource, 'BILL')
    Returns nothing, although it should return the row for the PDF file.
    SELECT d.* FROM dbo.DocumentFilesTest d WHERE Contains(d.FileSource, 'TEST')
    Returns the row for the Word document, as follows:
    2    2014-06-04 10:11:41.393    test.docx    docx    NULL    [BINARY Value]    [Binary Value]
    Any help is appreciated. It's been a long wait.
    Thanks,
    Vel
    Vel Thavasi

    Hello,
    Did you check the full-text log files for more details about the errors? If the filter isn't working, there should be errors in the error log file.
    The following thread is about similar issue, please refer to:
    http://social.msdn.microsoft.com/forums/sqlserver/en-US/69535dbc-c7ef-402d-a347-d3d3e4860d72/sql-server-2008-64bit-fulltext-indexing-pdf-not-working-cant-find-ifilter
    Regards,
    Fanny Liu

  • Change in behavior when working with PDF files in Illustrator CC and CC 2014. HELP IS NEEDED!

    Make a new file in CC and save it as a PDF. Open the same PDF file in CC 2014, make a change, and save the file. Open the same file in CC again. Now a dialog box is displayed: "This file is made in a newer version of Illustrator!" This new behavior is totally stopping our entire production! What to do? NEED HELP ASAP
    Cheers
    Jesper G

    How can I save a PDF file down to the older version in CC 2014?
    This is very unfortunate, because we use some VB scripts together with Illustrator. That process now stops because of this message!
    I don't know how I can solve this issue!

  • Working with pdf files in swing applications

    Hi,
    I have a Swing application that displays a PDF file and contains a text box. I want to display the current page number of the PDF file in the text box.
    Can anyone please guide me on how to implement this functionality?
    Regards,
    Tommy

  • File, Place only works with PDF files...why?

    I create documents in Mac Pages that I then want to turn into an interactive PDF (mainly navigation). I am using the demo copy of InDesign to see if it fits the bill.
    The Pages document is fully formatted and ready for export to a static PDF. As a test, I took a few pages of it and exported them to PDF, Word and RTF.
    The only file format that InDesign would import/place is PDF (Pages, Word and RTF were all grayed out and could not be selected via ID Place).
    I had hoped that ID would import/place Pages files directly, but I cannot seem to get it to import any format other than PDF.
    I tried some other .doc files (actually created with Word) and they were selectable, but only the table of contents was imported (no red arrow at the lower right of the text box to continue placing).
    Any suggestions?
    thanks
    bob

    ID is certainly capable of placing RTF as well as native Word files (DOC and DOCX). What you're seeing is quite unusual.
    Try trashing your preferences.
    For the other DOC files, you need to hold down the Shift key when you click the page to place them.
    Bob

  • Problem with PDF files

    Hello Experts,
    I am having trouble with PDF files corrupting the folder in
    which they are contained. If this sounds familiar, and you might be
    able to help, please see if the following steps give you an idea of
    the problem and the fix.
    I want to link to PDF files from a web page. The copy,
    consisting of several files, was provided by my client as Word
    documents, and converted to PDF by me from within Microsoft Word.
    I copied the PDF files into their own folder in my site, and
    found later, after leaving, then reopening Dreamweaver, that the
    folder was corrupted and unreadable. I got a message to run chkdsk.
    I tried working around the folder by repeating the process
    above in a new file, with the intention of ignoring the corrupted
    one. The second folder also became corrupted.
    I don't know if chkdsk is a Dreamweaver utility, or a Windows
    utility, or if this is a Windows, Acrobat or Dreamweaver problem.
    Does anyone know how to run chkdsk?
    Does this sound familiar to anyone? Any ideas?
    Thanks so much in advance for the urgently needed support.
    Richard

    chkdsk used to be a DOS utility, now Windows. Depending on the OS you have and whether it's Mac or PC, you can run it several ways.
    If XP > Start > Help and Support, type chkdsk in the search.
    Other OSs, search for chkdsk.
    Jo

  • Very slow response when working with Office files on a DFS share

    We have implemented the following configuration
    Domain level Windows 2000. Two member servers with Windows Server 2008 R2, sharing the same DFS namespace with, at the moment, one folder target called Home.
    Users are complaining that access to various MS Office files is very slow. Even creating a new MS Word document from the right-click context menu takes up to 4 minutes to open. Saving, for example, a single Excel sheet also takes a few minutes.
    Tested with both MS Office 2007 and MS Office 2010; it makes no difference. When using Office 2010 you can see a message like "contacting: \\DomainName\Root\Home\UserName". Other files such as TXT, JPG or PDF are not affected.
    What makes this really weird is that the behavior described above can change completely after the client machine is rebooted: suddenly everything becomes very fast, and the condition can revert again after the next reboot.
    Considerations until now:
    1. This has nothing to do with the file size. Even tiny files are affected.
    2. AD Sites are configured correctly and the client workstations see themselves in the correct sites.
    3. This is not an Office issue. If I map my folder target not through DFS but directly as a shared network drive, \\ServerName\Root\Home\UserName, everything functions as expected.
    What makes me suspicious: when using, e.g., TCPView to monitor connections, I can see that each time I perform any operation on an Office file, a connection is established to one of the domain controllers, sometimes to remote ones located in other countries. On the other hand, even if the connection is established to the nearest DC, operations are still very slow!
    I forgot to mention: all clients are Windows 7.
    Thanks to all who respond.

    Dear all,
    Sorry for the delayed reply. The problem has been solved, and since September 19th everything has been functioning as expected.
    What was done:
    Deleted replication targets excepting the initial ones
    Carefully recreated folder targets
    Deleted and recreated  replication groups
    Disabled SNP features on both namespace servers
    Created EnableTCPA registry entry
    Checked that the following Updates are installed
    http://support.microsoft.com/kb/2688074
    http://support.microsoft.com/kb/2647452
    Concerning Office File Validation KB2553065: this update was already declined on our WSUS server.
    Kind Regards
    Eduard

  • Send PO mail with PDF file: Chinese characters don't display

    I am using RSTXPDFT4, Unicode ECC 6.0.
    On some computers Adobe Reader can read the file, but on others it cannot and shows just a blank page.
    Thanks.

    Hi
    I worked for a Chinese client where we had to print Chinese and English (bilingual). You need a driver program that can identify both scripts. You are right (Unicode).
    Please check for the driver program TWPDF: PDF converter Chinese in the SPAD settings.
    An SAP note is available. I will check and update you.
    Edited by: sunny on Oct 28, 2009 10:29 AM

  • Has anyone not working with .dv files had synchronization problems?

    Has anyone not working with .dv files had sound synchronization problems? I'm not exactly sure what the alternatives to DV are, but I think one of them is HD.
    The reason for asking this question is to help isolate the nature and cause of a very serious flaw in iMovie '11. In the original release of iMovie '11 (version 9.0) there was a small--but serious--synchronization problem. In the 9.01 there is a large synchronization problem. We know of one person who has not experienced the problem, and he is not working with DV files (media). So we want to find out if anyone who is using something other than .dv files is experiencing a lack of synchronization between sound and picture. Knowing the answer to this will help with figuring out where the cause lies. For the initial iMovie '11 release (9.0), you probably would not notice a problem unless you had very long event-clips, e.g., two hours long. Events get this long if you are transferring from analog 8 mm tapes. Even then, it would have to be in scenes in which the connection between event and sound is obvious, e.g., close ups of people talking. It isn't until the 9.01 release that most people would notice anything. All we need to do is establish one case of a synchronization problem in which the person is using something other than DV.
    Message was edited by: Paul Bullen

    Hopefully, the 9.0.2 release will make my question moot. Zyfert must have posted the announcement of the release just as I was formulating my question. Still, if you have information on the subject, it would be interesting to hear.

  • Working with RAW files in PSE9:  Doable or better to use PSE12?

    I have PSE 9 and will soon get a camera with RAW capability.  I notice that Adobe no longer supports PSE 9.  Am I in for trouble if I don't upgrade to PSE 12 since I don't know how to work with RAW files?  It sounds as if I will need something to convert RAW files?  Where do I get it?  Is it a big deal for a beginner with no software skills?

    You can download a free DNG Converter from Adobe and then convert your RAWs to DNGs which can be used in PSE9.
    You don't need PSE12, but if you want to purchase PSE 12, you get the most current features in the Raw processor (which in my opinion is a huge improvement over what is available in PSE9), and you don't need the extra step of converting to DNG.

  • In the Adobe Acrobat Reader mobile app, it would be good to have layer support working for PDF files.

    http://winsupersite.com/article/windows8/windows-8-tip-change-file-associations-144102

  • I'm using an iPhone 4S, and I cannot open PDF files from my husband's email only, which is sent from Microsoft Outlook. It is very weird because I can receive other emails with PDF files from other people. Can someone help?

    I'm using an iPhone 4S and an iPad mini, and I cannot open PDF files from my husband's email only, which is sent from Microsoft Outlook. It is very weird because I can receive other emails with PDF files from other people. Can someone help?
    Thanks in advance

    Hi Eidda,
    This may be because the attachment is a winmail.dat file. I would recommend taking a look at the article below for more information. Note: the article is written for OS X Mail, but it also applies to this situation.
    Mac OS X Mail: What is a winmail.dat attachment?
    http://support.apple.com/kb/HT2614
    -Griff W.

  • Working with RAW files in iPhoto 5.0.4 and Elements 4.0.1

    I take photos in RAW mode and download them to iPhoto. When I try to edit the photo in iPhoto, the picture is a tiny little file that is impossible to enlarge with any sort of clarity. Also, the word "RAW" does not appear anywhere on the iPhoto window like I read it is supposed to.
    When I drag the file to Photoshop Elements, I get an editing window that has none of the tools usually associated with JPEG files. I get a separate window in which I can darken or lighten the image, that's it.
    Clearly, I'm doing something wrong. No one in their right mind would ever use RAW if this is how it works.
    Any ideas?

    Hi Jack!
    If you're new to working with RAW files, you're right, it just doesn't make sense. RAW <imho> is a bit overrated. One thing you will need to keep in mind when shooting in RAW is that you will still need to take a well-exposed image. What RAW files allow are CHANGES in all areas of the image, versus JPEG, which may allow you to ADJUST a few settings in the image. My only suggestion would be to keep playing around with PSE until you get the hang of it; it is excellent image editing software. But realize that a well-exposed JPEG and a RAW file are hard to tell apart...
    Personally, I do not download RAW files directly through iPhoto; I create a folder, download to it, and simply drag the folder to iPhoto to import (the files are then 'converted' into JPEG files). This way I have the original RAW images safely located outside of iPhoto as well as in iPhoto. You should set Elements as your application of choice for editing files inside iPhoto.
    Good luck, Rick
    Good link: http://www.elementsvillage.com/forums/ and just for fun: http://www.photoshopcosmetics.com/index.php

  • Working with Large files in Photoshop 10

    I am taking pictures with a 4X5 large format film camera and scanning them at 3,000 DPI, which is creating extremely large files. My goal is to take them into Photoshop Elements 10 to cleanup, edit, merge photos together and so on. The cleanup tools don't seem to work that well on large files. My end result is to be able to send these pictures out to be printed at large sizes up to 40X60. How can I work in this environment and get the best print results?

    You will need to work with 8bit files to get the benefit of all the editing tools in Elements.
    I would suggest resizing at resolution of 300ppi although you can use much lower resolutions for really large prints that will be viewed from a distance e.g. hung on a gallery wall.
    That should give you an image size of 12,000 x 18,000 pixels if the original aspect ratio is 2:3
    Use the top menu:
    Image >> Resize >> Image Size
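
    The pixel arithmetic behind these numbers can be checked with a short Java sketch. The class and method names are made up for illustration, and the 12,000 x 15,000 pixel scan size is an assumption derived from the question (4 x 5 inch film at 3,000 DPI), not a figure given in the thread.

    ```java
    // Hypothetical helper for illustration; names are not from any Adobe tool.
    final class PrintSize {

        /** Pixels needed along one edge: inches multiplied by pixels per inch. */
        static int pixels(double inches, int ppi) {
            return (int) Math.round(inches * ppi);
        }

        /**
         * Resolution available when an image is scaled to cover the whole print
         * and any excess on one edge is cropped: the smaller of the two ratios.
         */
        static double effectivePpi(int pxWidth, int pxHeight, double inWidth, double inHeight) {
            return Math.min(pxWidth / inWidth, pxHeight / inHeight);
        }

        public static void main(String[] args) {
            // A 40 x 60 inch print at 300 ppi
            System.out.println(pixels(40, 300) + " x " + pixels(60, 300));
            // 4 x 5 inch film scanned at 3,000 DPI (assumed scan size)
            int scanW = pixels(4, 3000);
            int scanH = pixels(5, 3000);
            System.out.println(scanW + " x " + scanH);
            // Resolution that scan can deliver on a 40 x 60 inch print
            System.out.println(effectivePpi(scanW, scanH, 40, 60) + " ppi");
        }
    }
    ```

    Under these assumptions the scan supports about 250 ppi on a 40 x 60 inch print, somewhat below the 300 ppi suggested above. Because 4 x 5 film is not 2:3, the short edge would also need cropping to fit that print shape.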
