Working with PDF files
Hello, we would like to write some functionality that generates PDF files from our Java application, and also some functionality that reads them into the app. What is the best API to use for this? Would it be iText?
Aha, I'll just show my code and say nothing more:
1) JACOB, for extracting text from PDF, Word and Excel files.
JACOB is a bridge that connects Java to COM and Win32 functions. It needs a DLL, but the author of JACOB provides it.
JACOB: http://www.matrix.org.cn/down_view.asp?id=13
Put the DLL on the PATH and the JAR file on the classpath.
import com.jacob.activeX.ActiveXComponent;
import com.jacob.com.Dispatch;
import com.jacob.com.Variant;

public class FileExtracter {
    public static void main(String[] args) {
        ActiveXComponent app = new ActiveXComponent("Word.Application");
        String inFile = "c:\\test.doc";
        String tpFile = "c:\\temp.htm";
        try {
            // Keep Word hidden while it works
            app.setProperty("Visible", new Variant(false));
            Dispatch docs = app.getProperty("Documents").toDispatch();
            // Documents.Open(FileName, ConfirmConversions = false, ReadOnly = true)
            Dispatch doc = Dispatch.invoke(docs, "Open", Dispatch.Method,
                    new Object[]{inFile, new Variant(false), new Variant(true)},
                    new int[1]).toDispatch();
            // SaveAs with format 8 (wdFormatHTML) writes the document out as HTML
            Dispatch.invoke(doc, "SaveAs", Dispatch.Method,
                    new Object[]{tpFile, new Variant(8)}, new int[1]);
            // Close the document without saving changes
            Dispatch.call(doc, "Close", new Variant(false));
        } catch (Exception e) {
            e.printStackTrace();
        } finally {
            // Always shut Word down, even if the conversion failed
            app.invoke("Quit", new Variant[]{});
        }
    }
}
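The JACOB example stops once Word has saved the document as HTML (SaveAs format 8) to c:\temp.htm. To get plain text out of that file, a naive regex-based stripper is often enough for a quick look. This is only a sketch under that assumption: regular expressions cannot parse HTML reliably, only a handful of entities are decoded, and `HtmlText` is a hypothetical name. It removes comments first (Word tends to hide its `<xml>` metadata inside conditional comments), then script and style blocks, then all remaining tags, and finally collapses whitespace.

```java
import java.util.regex.Pattern;

public class HtmlText {
    // Comments first: Word wraps its <xml> metadata in conditional comments
    private static final Pattern COMMENT = Pattern.compile("(?s)<!--.*?-->");
    private static final Pattern SCRIPT_OR_STYLE =
            Pattern.compile("(?is)<(script|style)\\b.*?</\\1\\s*>");
    private static final Pattern TAG = Pattern.compile("<[^>]*>");
    private static final Pattern WHITESPACE = Pattern.compile("\\s+");

    /** Naive HTML-to-text conversion; not a real HTML parser. */
    public static String toText(String html) {
        String s = COMMENT.matcher(html).replaceAll(" ");
        s = SCRIPT_OR_STYLE.matcher(s).replaceAll(" ");
        s = TAG.matcher(s).replaceAll(" ");
        // Decode a few common entities; &amp; last so "&amp;lt;" becomes "&lt;", not "<"
        s = s.replace("&nbsp;", " ").replace("&lt;", "<").replace("&gt;", ">")
                .replace("&quot;", "\"").replace("&amp;", "&");
        return WHITESPACE.matcher(s).replaceAll(" ").trim();
    }
}
```

For example, `HtmlText.toText(new String(Files.readAllBytes(Paths.get("c:\\temp.htm")), charset))` would give a single line of text; the charset has to match whatever encoding Word used for the HTML file.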
2) Apache POI, for extracting text from Word and Excel files.
POI package: http://www.matrix.org.cn/down_view.asp?id=14
Put it on the classpath.
import java.io.FileInputStream;
import java.io.InputStream;
import org.textmining.text.extraction.WordExtractor;

/**
 * <p>Title: Word extraction</p>
 * <p>Description: email:[email protected]</p>
 * <p>Copyright: Matrix Copyright (c) 2003</p>
 * <p>Company: Matrix.org.cn</p>
 * @author chris
 * @version 1.0 (whoever uses this example, please keep this notice)
 */
public class DocExtractor {
    public static void main(String[] args) throws Exception {
        // Open the .doc file; try-with-resources closes it afterwards
        try (InputStream in = new FileInputStream("c:\\a.doc")) {
            WordExtractor extractor = new WordExtractor();
            String str = extractor.extractText(in);
            System.out.println("the result length is " + str.length());
            System.out.println("the result is " + str);
        }
    }
}
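The extractor above handles the older binary .doc format. A Word 2007+ .docx file is a ZIP container whose body text lives in `word/document.xml`, so a rough extraction is possible with the JDK alone. A minimal sketch, assuming only the `w:t` text elements matter: tabs, line breaks, headers, footers and footnotes are ignored, and paragraphs nested inside text boxes may be repeated. `DocxText` is a hypothetical name; for real work POI's XWPF classes are the sturdier choice.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

public class DocxText {
    // WordprocessingML main namespace used by the w:p and w:t elements
    private static final String W_NS =
            "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    /** Returns the text of each w:p paragraph in word/document.xml, one paragraph per line. */
    public static String extract(InputStream docx) throws IOException {
        try (ZipInputStream zip = new ZipInputStream(docx)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if ("word/document.xml".equals(entry.getName())) {
                    return paragraphs(zip);
                }
            }
        }
        throw new IOException("word/document.xml not found; is this really a .docx file?");
    }

    private static String paragraphs(InputStream xml) throws IOException {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            // Refuse DOCTYPEs so a crafted file cannot trigger entity expansion
            factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
            Document doc = factory.newDocumentBuilder().parse(xml);
            NodeList paragraphs = doc.getElementsByTagNameNS(W_NS, "p");
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < paragraphs.getLength(); i++) {
                Element p = (Element) paragraphs.item(i);
                NodeList texts = p.getElementsByTagNameNS(W_NS, "t");
                for (int j = 0; j < texts.getLength(); j++) {
                    sb.append(texts.item(j).getTextContent());
                }
                sb.append('\n');
            }
            return sb.toString();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        } catch (SAXException e) {
            throw new IOException("malformed word/document.xml", e);
        }
    }
}
```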
3) PDFBox, for PDF files.
PDFBox package: http://www.matrix.org.cn/down_view.asp?id=12
import java.io.FileInputStream;
import java.io.InputStream;
import java.io.StringWriter;
import org.pdfbox.pdfparser.PDFParser;
import org.pdfbox.pdmodel.PDDocument;
import org.pdfbox.util.PDFTextStripper;

/**
 * <p>Title: pdf extraction</p>
 * <p>Description: email:[email protected]</p>
 * <p>Copyright: Matrix Copyright (c) 2003</p>
 * <p>Company: Matrix.org.cn</p>
 * @author chris
 * @version 1.0 (whoever uses this example, please keep this notice)
 */
public class PdfExtracter {
    public String getTextFromPdf(String filename) throws Exception {
        PDDocument pdfDocument = null;
        try (InputStream is = new FileInputStream(filename)) {
            PDFParser parser = new PDFParser(is);
            parser.parse();
            pdfDocument = parser.getPDDocument();
            // Collect the extracted text in memory
            StringWriter writer = new StringWriter();
            PDFTextStripper stripper = new PDFTextStripper();
            stripper.writeText(pdfDocument.getDocument(), writer);
            String ts = writer.toString();
            System.out.println("the string length is " + ts.length() + "\n");
            return ts;
        } finally {
            // Release the resources held by the parsed document
            if (pdfDocument != null) {
                pdfDocument.close();
            }
        }
    }

    public static void main(String[] args) {
        PdfExtracter pf = new PdfExtracter();
        try {
            String ts = pf.getTextFromPdf("c:\\a.pdf");
            System.out.println(ts);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
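The replies above only cover reading PDFs; the original question also asked about generating them. A library such as iText or PDFBox is the practical route, but to show what such a library ultimately writes, here is a minimal sketch that builds a one-page PDF by hand with only the JDK: five objects (catalog, page tree, page, the standard Helvetica font, a content stream), a cross-reference table of byte offsets, and a trailer. `MinimalPdf` is a hypothetical name, and the sketch assumes ASCII text only, with no compression, images or line wrapping.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class MinimalPdf {
    /** Builds a one-page US Letter PDF showing one line of ASCII text in Helvetica 24pt. */
    public static byte[] build(String text) {
        // Backslash and parentheses are special inside PDF literal strings
        String escaped = text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)");
        String content = "BT /F1 24 Tf 72 720 Td (" + escaped + ") Tj ET";
        int contentLength = content.getBytes(StandardCharsets.ISO_8859_1).length;
        String[] objects = {
            "<< /Type /Catalog /Pages 2 0 R >>",
            "<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
            "<< /Type /Page /Parent 2 0 R /MediaBox [0 0 612 792] "
                    + "/Resources << /Font << /F1 4 0 R >> >> /Contents 5 0 R >>",
            "<< /Type /Font /Subtype /Type1 /BaseFont /Helvetica >>",
            "<< /Length " + contentLength + " >>\nstream\n" + content + "\nendstream"
        };

        ByteArrayOutputStream out = new ByteArrayOutputStream();
        List<Integer> offsets = new ArrayList<>();
        write(out, "%PDF-1.4\n");
        for (int i = 0; i < objects.length; i++) {
            offsets.add(out.size()); // byte offset of "N 0 obj", needed by the xref table
            write(out, (i + 1) + " 0 obj\n" + objects[i] + "\nendobj\n");
        }

        int xrefOffset = out.size();
        StringBuilder tail = new StringBuilder();
        tail.append("xref\n0 ").append(objects.length + 1).append('\n');
        tail.append("0000000000 65535 f \n"); // entry 0 is always the head of the free list
        for (int offset : offsets) {
            // Each entry is exactly 20 bytes: 10-digit offset, generation, 'n', space, LF
            tail.append(String.format(Locale.ROOT, "%010d 00000 n \n", offset));
        }
        tail.append("trailer\n<< /Size ").append(objects.length + 1).append(" /Root 1 0 R >>\n");
        tail.append("startxref\n").append(xrefOffset).append("\n%%EOF\n");
        write(out, tail.toString());
        return out.toByteArray();
    }

    private static void write(ByteArrayOutputStream out, String s) {
        byte[] bytes = s.getBytes(StandardCharsets.ISO_8859_1);
        out.write(bytes, 0, bytes.length);
    }
}
```

Writing the result with `Files.write(Paths.get("hello.pdf"), MinimalPdf.build("Hello from Java"))` should give a file that ordinary viewers open; once fonts, Unicode or layout matter, a library is the sensible choice.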
Similar Messages
-
Working with .pdf files and JAVA
Hi,
Does anyone know how I can find more information on .pdf files?
I would like to convert .pdf files to text files and/or XML files. I cannot find this in the J2SE edition, and someone told me it can be found in the J2EE edition, but I cannot find anything there either. Please help.
Thanks,
R.
Thanks for your reply. What tools do you mean? I know lots of tools for converting text to a .pdf file, but none for the other direction. There is a commercial API that lets you work with PDF in Java, but I am interested in the other possibilities.
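Once the text has been extracted (for example with PDFBox), turning it into XML needs nothing beyond the JDK. A minimal sketch, assuming a simple made-up schema: a `<document>` root with one `<line>` element per non-blank line. `TextToXml` and both element names are hypothetical; a faithful PDF-to-XML conversion would need layout information that plain text extraction throws away.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class TextToXml {
    /** Wraps each non-blank line of plain text in a <line> element under a <document> root. */
    public static String convert(String text) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            Element root = doc.createElement("document");
            doc.appendChild(root);
            for (String line : text.split("\\R")) { // \R matches any line break, including \r\n
                if (line.isBlank()) {
                    continue;
                }
                Element element = doc.createElement("line");
                // setTextContent stores raw text; the serializer escapes & and < itself
                element.setTextContent(line.strip());
                root.appendChild(element);
            }
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (ParserConfigurationException | TransformerException e) {
            // Only expected if the JDK's XML stack is misconfigured
            throw new IllegalStateException(e);
        }
    }
}
```

Building a DOM and serializing it, rather than concatenating strings, is what guarantees that characters such as `&` and `<` in the extracted text come out properly escaped.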
Regards -
Full-Text search is not working with PDF files - SQL Server 2012 64 bit
Hi,
We are in the process of storing PDF files in SQL Server 2012 with Full-Text search capability.
I followed the steps below; it works fine with Word documents but not with PDF files. I tried PDF IFilter 11 and 9, and both were unsuccessful.
Server/DB level settings:
1) Enable FileStream.
2) Install Full-Text, then restart.
3) USE [specific db]
ALTER DATABASE [db name] ADD FILEGROUP Files CONTAINS FILESTREAM;
ALTER DATABASE [db name] ADD FILE (NAME = N'Files', FILENAME = N'D:\SQL\DATA') TO FILEGROUP [Files];
3) Database level settings:
FileStream:
FileStream Directory name: [Set the name]
FileStream non-transacted Access: [set appropriate]
3a) Add a data file to the DB with the FILESTREAM data file type.
4) Share the D:\SQL\DATA directory and add specific accounts with read/write access.
5) Give bulkadmin access to those specific accounts at server level.
6) From the page (link), download and install the *.pdf IFilter for FTS. Link:
http://www.adobe.com/support/downloads/detail.jsp?ftpID=5542
7) Add the path to the folder where you installed the plugin to the global PATH system variable. The default for this version is:
C:\Program Files\Adobe\Adobe PDF iFilter 9 for 64-bit platforms\bin
8) From the page (link), download FilterPackx64.exe and install it. Link:
http://www.microsoft.com/en-us/download/confirmation.aspx?id=20109
9) Now, from SSMS, execute the following procedures:
EXEC sp_fulltext_service 'load_os_resources', 1;
EXEC sp_fulltext_service 'verify_signature', 0;
EXEC sp_fulltext_service 'update_languages'; -- update language list
EXEC sp_fulltext_service 'restart_all_fdhosts'; -- restart daemon
RECONFIGURE WITH OVERRIDE;
10) Restart the server.
11) SELECT document_type, path FROM sys.fulltext_document_types WHERE document_type = '.pdf';
SELECT document_type, path FROM sys.fulltext_document_types WHERE document_type = '.docx';
12) Results are OK.
Following is my table/index/catalog script:
CREATE TABLE dbo.DocumentFilesTest (
    DocumentId INT IDENTITY(1,1) NOT NULL PRIMARY KEY,
    AddDate datetime NOT NULL,
    Name nvarchar(50) NOT NULL,
    Extension nvarchar(10) NOT NULL,
    Description nvarchar(1000) NULL,
    FileStream_Id UNIQUEIDENTIFIER ROWGUIDCOL NOT NULL UNIQUE DEFAULT NEWSEQUENTIALID(),
    FileSource varbinary(MAX) FILESTREAM DEFAULT(0x)
);
GO
--Add default add date for document
ALTER TABLE dbo.DocumentFilesTest
ADD CONSTRAINT DF_DocumentFilesTest_AddDate DEFAULT sysdatetime() FOR AddDate
EXEC sp_fulltext_database 'enable'
GO
IF NOT EXISTS (SELECT TOP 1 1 FROM sys.fulltext_catalogs WHERE name = 'Ducuments_Catalog_test')
BEGIN
    EXEC sp_fulltext_catalog 'Ducuments_Catalog_test', 'create', 'D:\SQL\PDFBlob';
END
--EXEC sp_fulltext_catalog 'Ducuments_Catalog_test', 'drop'
DECLARE @indexName nvarchar(255) = (
    SELECT TOP 1 i.Name
    FROM sys.indexes i
    JOIN sys.tables t ON i.object_id = t.object_id
    WHERE t.Name = 'DocumentFilesTest' AND i.type_desc = 'CLUSTERED')
PRINT @indexName
EXEC sp_fulltext_table 'DocumentFilesTest', 'create', 'Ducuments_Catalog_test', @indexName
EXEC sp_fulltext_column 'DocumentFilesTest', 'FileSource', 'add', 0, 'Extension'
EXEC sp_fulltext_table 'DocumentFilesTest', 'activate'
EXEC sp_fulltext_catalog 'Ducuments_Catalog_test', 'start_full'
ALTER FULLTEXT INDEX ON [dbo].[DocumentFilesTest] ENABLE
ALTER FULLTEXT INDEX ON [dbo].[DocumentFilesTest] SET CHANGE_TRACKING = AUTO
ALTER FULLTEXT CATALOG Ducuments_Catalog_test REBUILD WITH ACCENT_SENSITIVITY=OFF;
INSERT INTO DocumentFilesTest(Extension, Name, FileSource)
SELECT 'pdf', 'BOL12006553.pdf', * FROM OPENROWSET(BULK 'd:\SQL\PDFBlob\BOL12006553.pdf', SINGLE_BLOB) AS BLOB;
GO
INSERT INTO DocumentFilesTest(Extension, Name, FileSource)
SELECT 'docx', 'test.docx', * FROM OPENROWSET(BULK 'd:\SQL\PDFBlob\test.docx', SINGLE_BLOB) AS Document;
GO
SELECT d.* FROM dbo.DocumentFilesTest d WHERE CONTAINS(d.FileSource, 'BILL')
This returns nothing, but it should return the PDF file.
SELECT d.* FROM dbo.DocumentFilesTest d WHERE CONTAINS(d.FileSource, 'TEST')
This returns the Word document as follows:
2 2014-06-04 10:11:41.393 test.docx docx NULL [BINARY Value] [Binary Value]
Any help is appreciated. It's been a long wait.
Thanks,
Vel
Vel Thavasi
Hello,
Did you check the full-text log files for more details about the errors? If the filter isn't working, there should be errors in the error log file.
The following thread is about similar issue, please refer to:
http://social.msdn.microsoft.com/forums/sqlserver/en-US/69535dbc-c7ef-402d-a347-d3d3e4860d72/sql-server-2008-64bit-fulltext-indexing-pdf-not-working-cant-find-ifilter
Regards,
Fanny Liu
TechNet Community Support -
Make a new file in CC. Save it in CC as a PDF. Open the same PDF file in CC 2014, make a change to the file, and save it. Open the same file in CC again. Now a dialog box is displayed: "This file is made in a newer version of Illustrator!" This new behavior is stopping our entire production! What can we do? NEED HELP ASAP
Cheers
Jesper G
How can I down-save a PDF file in CC 2014?
This is very unfortunate, because we use some VB script together with Illustrator. That process has now stopped because of this message!!!
I don't know how I can solve this issue! -
Working with pdf files in swing applications
Hi,
I have a Swing application which displays a PDF file and contains a text box. I want to display the current page number of the PDF file in the text box.
Can anyone please guide me on how to implement this?
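The question doesn't say which PDF viewer component is in use, and each viewer library reports page changes in its own way. A minimal sketch, assuming a hypothetical hook where the viewer calls `setCurrentPage(...)` whenever it shows a new page: a small page model notifies listeners, and `bindTo` keeps a `JTextField` in sync on the Event Dispatch Thread. `PageTracker`, `label` and `bindTo` are hypothetical names, not part of any PDF library.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntConsumer;
import javax.swing.JTextField;
import javax.swing.SwingUtilities;

/** Hypothetical page model that the PDF viewer component would update on every page change. */
public class PageTracker {
    private final int pageCount;
    private int currentPage = 1;
    private final List<IntConsumer> listeners = new ArrayList<>();

    public PageTracker(int pageCount) {
        if (pageCount < 1) {
            throw new IllegalArgumentException("pageCount must be at least 1");
        }
        this.pageCount = pageCount;
    }

    public void addPageListener(IntConsumer listener) {
        listeners.add(listener);
    }

    public int getCurrentPage() {
        return currentPage;
    }

    /** Clamps the page to [1, pageCount] and notifies listeners only if it actually changed. */
    public void setCurrentPage(int page) {
        int clamped = Math.max(1, Math.min(pageCount, page));
        if (clamped == currentPage) {
            return;
        }
        currentPage = clamped;
        for (IntConsumer listener : listeners) {
            listener.accept(clamped);
        }
    }

    /** Text shown in the box, e.g. "Page 3 of 10". */
    public String label(int page) {
        return "Page " + page + " of " + pageCount;
    }

    /** Keeps a text field in sync; call this from the Event Dispatch Thread. */
    public void bindTo(JTextField field) {
        field.setText(label(currentPage));
        // Swing components must only be touched on the EDT, so marshal updates there
        addPageListener(page -> SwingUtilities.invokeLater(() -> field.setText(label(page))));
    }
}
```

Keeping the page number in a small model, rather than having the viewer write to the text box directly, also lets the text box work the other way round: an ActionListener on the field could parse the number and call `setCurrentPage`.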
Regards,
Tommy -
File, Place only works with PDF files...why?
I create documents in Mac Pages from which I then want to create an interactive PDF (mainly navigation). I am using the demo copy of InDesign to see if it fits the bill.
The Mac Pages document is fully formatted and ready for export to a static PDF. As a test, I took a few pages of it and exported them to PDF, Word and RTF.
The only file format that InDesign would import/place is PDF (Pages, Word and RTF were all grayed out and could not be selected via ID Place).
I had hoped that ID would import/place Pages files directly, but I cannot seem to get it to import any format other than PDF.
I tried some other .doc files (actually created with Word) and they were selectable, but only the table of contents was imported (no red arrow at the lower right of the text box to continue placing).
Any suggestions?
thanks
bob
ID is certainly capable of placing RTF as well as native Word files (DOC and DOCX). What you're seeing is quite unusual.
Try trashing your preferences.
For the other DOC files, you need to hold down the Shift key when you click the page to place them.
Bob -
Hello Experts,
I am having trouble with PDF files corrupting the folder in
which they are contained. If this sounds familiar, and you might be
able to help, please see if the following steps give you an idea of
the problem and the fix.
I want to link to PDF files from a web page. The copy,
consisting of several files, was provided by my client as Word
documents, and converted to PDF by me from within Microsoft Word.
I copied the PDF files into their own folder in my site, and
found later, after leaving, then reopening Dreamweaver, that the
folder was corrupted and unreadable. I got a message to run chkdsk.
I tried working around the folder by repeating the process
above in a new file, with the intention of ignoring the corrupted
one. The second folder also became corrupted.
I don't know if chkdsk is a Dreamweaver utility, or a Windows
utility, or if this is a Windows, Acrobat or Dreamweaver problem.
Does anyone know how to run chkdsk?
Does this sound familiar to anyone? Any ideas?
Thanks so much in advance for the urgently needed support.
Richard
chkdsk used to be a DOS utility; now it is a Windows utility. Depending on the OS you have and whether it's a Mac or a PC, you can run it several ways.
On XP: Start > Help and Support, then type chkdsk in the search box.
On other OSs, search for chkdsk.
Jo
-
Very slow response when working with Office files on a DFS share
We have implemented the following configuration
Domain level Windows 2000. Two member servers with Windows Server 2008 R2, sharing the same DFS namespace with, at the moment, one folder target called Home.
Users complain that access to various MS Office files is very slow. Even creating a new MS Word document using the right-click context menu takes up to 4 minutes to open. Saving, for example, a single Excel sheet also takes a few minutes.
Tested with both MS Office 2007 and MS Office 2010; it makes no difference. When using Office 2010 you can see a message like "Contacting:
\\DomainName\Root\Home\UserName". Other files like TXT, JPG or PDF are not affected.
What makes this really weird is that the behavior described above can change completely after the client machine is rebooted: suddenly everything becomes very fast, and this condition can revert just after the next reboot.
Considerations until now:
1. This has nothing to do with the file size. Even tiny files are affected.
2. AD Sites are configured correctly and the client workstations see themselves in the correct sites.
3. This is not an Office issue. If I map my folder target not as DFS, but directly as shared network drive
\\ServerName\Root\Home\UserName , everything functions as expected
What makes me suspicious: when using, e.g., TCPView to monitor connections, I can see that each time I perform any operation on an Office file, a connection is established to one of the domain controllers, sometimes to remote ones located in other countries. On the other hand, even when the connection is established to the nearest DC, operations are still very, very slow!
I forgot to say: all clients are Windows 7.
Thanks to all who respond.
Dear all,
sorry for the delayed reply. The problem has now been solved, and since September 19th everything has been functioning as expected.
What was done:
Deleted replication targets excepting the initial ones
Carefully recreated folder targets
Deleted and recreated replication groups
Disabled SNP features on both namespace servers
Created EnableTCPA registry entry
Checked that the following Updates are installed
http://support.microsoft.com/kb/2688074
http://support.microsoft.com/kb/2647452
Concerning Office File Validation (KB2553065): this update had already been declined on our WSUS server.
Kind Regards
Eduard -
Sending PO mail with a PDF file: Chinese characters don't display
I am using RSTXPDFT4 on a Unicode ECC 6.0 system.
On some computers Adobe Reader can read the file, but on others it shows just a blank page.
Thanks.
Hi,
I worked for a Chinese client where we had to print both Chinese and English (bilingual). You need a driver program that can handle both scripts. You are right (Unicode).
Please check the driver program TWPDF (PDF converter Chinese) in the SPAD settings.
An SAP note is available; I will check and update you.
Edited by: sunny on Oct 28, 2009 10:29 AM -
Has anyone not working with .dv files had synchronization problems?
Has anyone not working with .dv files had sound synchronization problems? I'm not exactly sure what the alternatives to DV are, but I think one of them is HD.
The reason for asking this question is to help isolate the nature and cause of a very serious flaw in iMovie '11. In the original release of iMovie '11 (version 9.0) there was a small--but serious--synchronization problem. In the 9.01 there is a large synchronization problem. We know of one person who has not experienced the problem, and he is not working with DV files (media). So we want to find out if anyone who is using something other than .dv files is experiencing a lack of synchronization between sound and picture. Knowing the answer to this will help with figuring out where the cause lies. For the initial iMovie '11 release (9.0), you probably would not notice a problem unless you had very long event-clips, e.g., two hours long. Events get this long if you are transferring from analog 8 mm tapes. Even then, it would have to be in scenes in which the connection between event and sound is obvious, e.g., close ups of people talking. It isn't until the 9.01 release that most people would notice anything. All we need to do is establish one case of a synchronization problem in which the person is using something other than DV.
Message was edited by: Paul Bullen
Hopefully, the 9.0.2 release will make my question moot. Zyfert must have posted the announcement of the release just as I was formulating my question. Still, if you have information on the subject, it would be interesting to hear.
-
Working with RAW files in PSE9: Doable or better to use PSE12?
I have PSE 9 and will soon get a camera with RAW capability. I notice that Adobe no longer supports PSE 9. Am I in for trouble if I don't upgrade to PSE 12 since I don't know how to work with RAW files? It sounds as if I will need something to convert RAW files? Where do I get it? Is it a big deal for a beginner with no software skills?
You can download a free DNG Converter from Adobe and then convert your RAWs to DNGs which can be used in PSE9.
You don't need PSE12, but if you want to purchase PSE 12, you get the most current features in the Raw processor (which in my opinion is a huge improvement over what is available in PSE9), and you don't need the extra step of converting to DNG. -
In the Adobe Acrobat Reader mobile app, it would be good to have layer support working for PDF files.
http://winsupersite.com/article/windows8/windows-8-tip-change-file-associations-144102
-
I'm using an iPhone 4S and an iPad mini, and I cannot open PDF files only when they come from my husband's email, which uses Microsoft Outlook. It is very weird, because I can receive other emails with PDF files from other people. Can someone help?
Thanks in advance
Hi Eidda,
This may be because the attachment is a winmail.dat file. I would recommend taking a look at the article below for more information. Note: the article is written for OS X Mail, but it also applies to this situation.
Mac OS X Mail: What is a winmail.dat attachment?
http://support.apple.com/kb/HT2614
-Griff W. -
Working with RAW files in iPhoto 5.0.4 and Elements 4.0.1
I take photos in RAW mode and download them to iPhoto. When I try to edit the photo in iPhoto, the picture is a tiny little file that is impossible to enlarge with any sort of clarity. Also, the word "RAW" does not appear anywhere on the iPhoto window like I read it is supposed to.
When I drag the file to Photoshop Elements, I get an editing window that has none of the tools usually associated with JPEG files. I get a separate window in which I can darken or lighten the image, that's it.
Clearly, I'm doing something wrong. No one in their right mind would ever use RAW if this is how it works.
Any ideas?
Hi Jack!
If you're new to working with RAW files, you're right, it just doesn't make sense. RAW is, IMHO, a bit overrated. One thing to keep in mind when shooting in RAW is that you still need to take a well-exposed image. RAW files allow CHANGES in all areas of the image, whereas a JPEG may only allow you to ADJUST a few settings. My only suggestion would be to keep playing around with PSE until you get the hang of it; it is excellent image-editing software. But realize that a well-exposed JPEG and a RAW file are hard to tell apart...
Personally, I do not download RAW files directly through iPhoto; instead I create a folder, download to it, and simply drag the folder to iPhoto to import (the files are then 'converted' into JPEG files). This way I have the original RAW images safely located outside of iPhoto as well as in iPhoto. You should set Elements as your application of choice for editing files inside iPhoto.
Good luck, Rick
Good link: http://www.elementsvillage.com/forums/ and just for fun: http://www.photoshopcosmetics.com/index.php -
Working with Large files in Photoshop 10
I am taking pictures with a 4X5 large format film camera and scanning them at 3,000 DPI, which is creating extremely large files. My goal is to take them into Photoshop Elements 10 to cleanup, edit, merge photos together and so on. The cleanup tools don't seem to work that well on large files. My end result is to be able to send these pictures out to be printed at large sizes up to 40X60. How can I work in this environment and get the best print results?
You will need to work with 8-bit files to get the benefit of all the editing tools in Elements.
I would suggest resizing to a resolution of 300 ppi, although you can use much lower resolutions for really large prints that will be viewed from a distance, e.g. hung on a gallery wall.
That should give you an image size of 12,000 x 18,000 pixels if the original aspect ratio is 2:3.
Use the top menu:
Image >> Resize >> Image Size
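The arithmetic behind the reply is: pixels needed along an edge = print length in inches × ppi. The sketch below (`PrintSize` is a hypothetical name) also checks the scan itself: a 4 x 5 in negative at 3,000 dpi gives 12,000 x 15,000 pixels, which is 4:5 rather than 2:3, so a 40X60 print needs a crop. Keeping the full 15,000-pixel long edge, a 2:3 crop is 10,000 pixels wide, which works out to 250 ppi at 40 x 60 in without any upsampling.

```java
public class PrintSize {
    /** Pixels needed along one edge: print length in inches times pixels per inch. */
    public static long pixels(double inches, double ppi) {
        return Math.round(inches * ppi);
    }

    /** Effective resolution when an edge of the given pixel count is printed at the given length. */
    public static double effectivePpi(long pixels, double inches) {
        return pixels / inches;
    }

    public static void main(String[] args) {
        // Target from the reply: 40 x 60 in at 300 ppi
        System.out.println(pixels(40, 300) + " x " + pixels(60, 300)); // 12000 x 18000
        // A 4 x 5 in negative scanned at 3,000 dpi (aspect ratio 4:5)
        long scanShort = pixels(4, 3000);
        long scanLong = pixels(5, 3000);
        System.out.println(scanShort + " x " + scanLong); // 12000 x 15000
        // Crop to 2:3 while keeping the full long edge: 15000 * 2 / 3 = 10000 px
        long cropShort = scanLong * 2 / 3;
        System.out.println(effectivePpi(cropShort, 40) + " / " + effectivePpi(scanLong, 60)); // 250.0 / 250.0
    }
}
```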
Maybe you are looking for
-
Implicit & Explicit Null Labels
Hi, I am a bit confused about implicit and explicit null labels. The RFC says the following: IMPLICIT NULL LABEL:This label value is only legal at the bottom of the label stack. It indicates that the label stack must be popped, and the forwarding of the
-
XDO Help...Error while generating PDF output
Hi All, I am currently facing an issue generating the PDF through the concurrent program. We have developed the RTF template using Oracle XML Publisher Builder Template for Word Version 10.1.3.2.0 Build 87. When we run ,the program errors out some ti
-
Hi experts, I am confused about business roles in the Interaction Center. Scenario: a call center is set up with 100 CSRs, where everyone's role is the same. My confusion is: do we have to create a different business role for each CSR or
-
EVHOT with links to source template Issue
Hello Experts, I have created 2 input templates. The first input template has an EVHOT function to the second template. The EVHOT function is working fine. I am having an issue because the second template has links to cells on the first template.
-
Hello, Given this code... quote: for (var i:uint = 0; i < 10; i++) Alert.show(i.toString()); The alert box counts down from 9 -- shouldn't it start counting up from 0?? Why does it appear that loops are going in reverse? Thanks, Tom