Convert HTML to RTF
Hi,
Is there a way to build a utility program that accepts an HTML file and converts it to RTF format? I do not wish to install or purchase any third-party software. Please suggest an approach.
Regards,
Murali
Write or find an HTML parser; define the required mapping from input to output; and implement it. There is no RTF library built into Java so you will have to find or write that too.
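To make the "define the mapping and implement it" step concrete, here is a minimal sketch in plain Java. The class name HtmlToRtfSketch and the tag table are my own assumptions, not an existing library; it handles only a handful of tags, uses a regex instead of a real HTML parser, and does not decode entities such as &amp; or skip comments.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: maps a tiny HTML subset to RTF control words. */
class HtmlToRtfSketch {
    // Assumed tag-to-control-word table; extend it for the tags you need.
    private static final Map<String, String> TAGS = new LinkedHashMap<>();
    static {
        TAGS.put("b", "\\b ");
        TAGS.put("/b", "\\b0 ");
        TAGS.put("strong", "\\b ");
        TAGS.put("/strong", "\\b0 ");
        TAGS.put("i", "\\i ");
        TAGS.put("/i", "\\i0 ");
        TAGS.put("u", "\\ul ");
        TAGS.put("/u", "\\ulnone ");
        TAGS.put("/p", "\\par\n");
        TAGS.put("br", "\\line ");
    }

    // Matches an opening or closing tag and captures its name (with a leading "/" if closing).
    private static final Pattern TAG = Pattern.compile("<\\s*(/?\\w+)[^>]*>");

    static String convert(String html) {
        StringBuilder rtf = new StringBuilder("{\\rtf1\\ansi\\deff0{\\fonttbl{\\f0 Times New Roman;}}\n");
        Matcher m = TAG.matcher(html);
        int last = 0;
        while (m.find()) {
            rtf.append(escape(html.substring(last, m.start()))); // text before the tag
            String control = TAGS.get(m.group(1).toLowerCase());
            if (control != null) {
                rtf.append(control);                             // unknown tags are dropped
            }
            last = m.end();
        }
        rtf.append(escape(html.substring(last)));
        return rtf.append("}").toString();
    }

    // Escapes RTF special characters; non-ASCII becomes \uN? (signed 16-bit value).
    static String escape(String text) {
        StringBuilder out = new StringBuilder();
        for (char c : text.toCharArray()) {
            if (c == '\\' || c == '{' || c == '}') {
                out.append('\\').append(c);
            } else if (c < 128) {
                out.append(c);
            } else {
                out.append("\\u").append((int) (short) c).append('?');
            }
        }
        return out.toString();
    }
}
```

Calling HtmlToRtfSketch.convert on a fragment such as "<p>50 <b>pounds</b></p>" yields a small RTF document whose body is the text with \b ... \b0 around the bold run. Anything beyond this (tables, fonts, colours, CSS) needs a proper parser and a much larger mapping.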
Similar Messages
-
Textutil html to rtf vs. TextEdit
I have noticed what looks like a bug in either TextEdit or textutil, but I am not sure which. When I convert HTML to RTF using the textutil command, the resulting file opens in TextEdit with black text on a black background. It looks a lot like the Safari email creation bug they just fixed with 5.0.1 (yes, I installed it).
Tex-Edit Plus, Pages, OmniOutliner, Word and OpenOffice all display the resulting file without the black background. I will be reporting it to AppleCare tomorrow.
What I'm not sure about, and thought I'd ask here, is this: do you think the RTF code output from textutil is wrong, or is TextEdit displaying the file incorrectly? Since all the other programs display it correctly, I am tempted to implicate TextEdit, but maybe they just ignore the "background" instruction that seems to be placed by textutil (see below).
If I create a document in TextEdit and save it to disk, I get the following code:
{\rtf1\ansi\ansicpg1252\cocoartf1038\cocoasubrtf320
{\fonttbl\f0\fswiss\fcharset0 Helvetica;}
{\colortbl;\red255\green255\blue255;}
\margl1440\margr1440\vieww9000\viewh8400\viewkind0
\pard\tx720\tx1440\tx2160\tx2880\tx3600\tx4320\tx5040\tx5760\tx6480\tx7200\tx7920\tx8640\ql\qnatural\pardirnatural
\f0\fs24 \cf0 Hello World}
If I create a barebones html document (see script below) and convert html to RTF using textutil, I get:
{\rtf1\ansi\ansicpg1252\cocoartf1038\cocoasubrtf320
{\fonttbl\f0\froman\fcharset0 Times-Roman;}
{\colortbl;\red255\green255\blue255;}
\deftab720
{\*\background {\shp{\*\shpinst\shpleft0\shptop0\shpright0\shpbottom0\shpfhdr0\shpbxmargin\shpbymargin\shpwr0\shpwrk0\shpfblwtxt1\shpz0\shplid1025{\sp{\sn shapeType}{\sv 1}}{\sp{\sn fFlipH}{\sv 0}}{\sp{\sn fFlipV}{\sv 0}}{\sp{\sn fillColor}{\sv 0}}{\sp{\sn fFilled}{\sv 1}}{\sp{\sn lineWidth}{\sv 0}}{\sp{\sn fLine}{\sv 0}}{\sp{\sn bWMode}{\sv 9}}{\sp{\sn fBackground}{\sv 1}}}}}
\pard\pardeftab720\ql\qnatural
\f0\fs24 \cf0 Hello World}
What seems to be the big difference is the "background" code -- If I remove it, all is well. It apparently specifies the background for the document, and in this case tells it to be a rectangular shape with various parameters, but if I try changing the shape or fill color it doesn't seem to make a difference.
So I guess the question is why textutil is putting that code in there, and why it screws up TextEdit's display...
(the spec for the parameters is here: http://www.biblioscape.com/rtf15_spec.htm )
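Since removing the "background" group by hand makes the file display normally, that step can be automated as a workaround. This is only a sketch, assuming the whole RTF file has been read into a string; RtfBackgroundStripper is a made-up name, and the brace matching does not handle \bin binary data.

```java
/** Hypothetical post-processor: drops the {\*\background ...} group from RTF text. */
class RtfBackgroundStripper {
    static String stripBackground(String rtf) {
        final String marker = "{\\*\\background";
        int start = rtf.indexOf(marker);
        if (start < 0) {
            return rtf; // nothing to remove
        }
        int depth = 0;
        for (int i = start; i < rtf.length(); i++) {
            char c = rtf.charAt(i);
            if (c == '\\' && i + 1 < rtf.length()) {
                i++;        // skip the character after a backslash (covers \{ and \})
                continue;
            }
            if (c == '{') {
                depth++;
            } else if (c == '}' && --depth == 0) {
                // i is the brace that closes the background group
                return rtf.substring(0, start) + rtf.substring(i + 1);
            }
        }
        return rtf; // unbalanced braces: leave the input untouched
    }
}
```

This only hides the symptom; it does not answer whether textutil or TextEdit is at fault.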
Here's an AppleScript that re-creates the problem:
set oFile to "/Users/username/Desktop/oFile.html"
set strHTML to "<head><meta http-equiv=\"Content-Type\" content=\"text/html; charset=UTF-8\"/></head><body>Hello World</body>"
try
    set fDesc to open for access oFile with write permission
    write strHTML to fDesc as «class utf8»
    close access fDesc
on error
    try
        close access fDesc
    end try
    return -- stop here if the HTML file could not be written
end try
set strCommand to "textutil -convert rtf " & (quoted form of (POSIX path of oFile))
set strResult to (do shell script strCommand)
If you want to report this issue to Apple's engineering, send a bug report or an enhancement request via its Bug Reporter system. To do this, join the Mac Developer Program; it's free, available to all Mac users, and gets you a look at some development software. Since you already have an Apple username/ID, use that. Once you are a member, go to Apple BugReporter and file your bug report or enhancement request. The nice thing about this procedure is that you get a response and a follow-up number, which starts a dialog with engineering.
-
Convert doc to rtf or doc to html
Is there any way to convert .doc files to HTML or RTF format? Apache POI only provides reading facilities, not conversion facilities.
Converting from RTF to HTML using an XSL transformation is not a problem.
But what about .doc to RTF? Are there perhaps existing solutions using POI or something else?
Two projects that spring to mind are Apache POI and Apache FOP.
POI:
http://jakarta.apache.org/poi/index.html
FOP:
http://xmlgraphics.apache.org/fop/
Either way, you are in for some tough development if you want to do this using Java, and you might want to consider switching to a more suitable platform such as .NET. Word documents are highly Microsoft-specific, so you will want to use a Microsoft platform to work with them for the least amount of headaches and risk.
-
Hi,
In a Java servlet, I need to convert HTML code into an RTF/Word document.
Can anyone point me to a suitable Java API?
Regards,
Priya Ranjan Sahay
Message was edited by:
Priya Ranjan Sahay
Check out iText:
http://www.lowagie.com/iText/
Example code:
http://www.java-tips.org/other-api-tips/itext/manipulating-pdf,-rtf,-or-html-documents-with-java.html
-
Converting HTML into a Word document
Hi all,
I have a JSP whose content type is set so that the HTML it produces is opened in Word. This works fine until images come into the equation, as these images must live somewhere in order to be referenced from the HTML code. As this document must be 'standalone', opening the HTML in Word and simply changing the file extension is no good, as it is still HTML under the hood.
What I would therefore like to do is generate a Word document from this HTML that is independent, in the sense that it 'holds' these images within itself and does not rely on external resources. Does anyone know how I can achieve this?
I have looked into Jakarta POI and have written this off as an option because 1) it is still in development and 2) there is no documentation or examples of how to use what is already there. I am assuming someone has come across this problem before and knows of a solution out there that I could use.
Many thanks in advance!
Hi,
Thanks all for your replies! Unfortunately it can't be PDF, as the creator will need to edit it before the document is complete. I have actually looked into generating an RTF document instead, but the example I tried seemed to lose the image data. Unfortunately I know nothing about RTF and so kind of gave up on it :(
Here is the code I used:
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.StringReader;
import javax.swing.text.BadLocationException;
import javax.swing.text.html.HTMLDocument;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.rtf.RTFEditorKit;

public class FormatConverter {
    private HTMLDocument tempHTMLDoc;
    private HTMLEditorKit htmlKit;
    private RTFEditorKit rtfKit;

    public FormatConverter() {
        tempHTMLDoc = new HTMLDocument();
        htmlKit = new HTMLEditorKit();
        rtfKit = new RTFEditorKit();
    }

    // Parses the HTML into a Swing document, then writes that document out as RTF.
    private String fudge(String strText) {
        String strResult = "";
        StringReader reader = new StringReader(strText);
        try {
            tempHTMLDoc.remove(0, tempHTMLDoc.getLength());
            htmlKit.read(reader, tempHTMLDoc, 0);
            ByteArrayOutputStream byteArrayOutputStream = new ByteArrayOutputStream();
            rtfKit.write(byteArrayOutputStream, tempHTMLDoc, 0, tempHTMLDoc.getLength());
            strResult = byteArrayOutputStream.toString();
        } catch (IOException ie) {
            ie.printStackTrace();
        } catch (BadLocationException ble) {
            ble.printStackTrace();
        }
        return strResult;
    }

    public static void main(String args[]) {
        FormatConverter conv = new FormatConverter();
        String strRTF = conv.fudge("<P><IMG src=\"http://intratestgbr/announcements/images/1093429553065.jpg\"></P><P> </P><P>50 <STRONG>pounds</STRONG>, <FONT color=#0000ff>wow</FONT></P>");
        System.out.println("RTF: '"+strRTF+"'");
        strRTF = conv.fudge("<html><head><p class=default><span style=\"color: #000000\">Description </span><span style=\"color: #000000\"><b>with</b> </span><span style=\"color: #000000\"><i>some</i> </span><span style=\"color: #000000\"><u>formatting</u> </span><span style=\"color: #000000\"></span></p></head></html>");
        System.out.println("RTF: '"+strRTF+"'");
        System.exit(0);
    }
}
The output I got from this was:
{\rtf1\ansi
{\fonttbl\f0\fnil Monospaced;}
{\colortbl\red0\green0\blue0;\red0\green0\blue255;}
\par
\~50 pounds, \cf1 wow\par
}
Like I said, when I open the RTF output in Word, everything is fine apart from the missing image. If one of you very nice people could point me toward a way to convert the HTML to RTF while still keeping the images, that would certainly be a very acceptable solution and I would be very grateful :)
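For what it is worth, RTF can carry an image inline as a \pict group whose data is the image file written out as hexadecimal text, so one possible direction is to post-process the generated RTF and add such a group yourself. The sketch below only builds the group; the class name, the method and the twip sizes are my own assumptions, and you would still have to download the image bytes and decide where in the document to place the group.

```java
/** Hypothetical helper: wraps raw PNG or JPEG bytes in an RTF \pict group. */
class RtfPictSketch {
    static String pictGroup(byte[] imageBytes, boolean png, int widthTwips, int heightTwips) {
        StringBuilder sb = new StringBuilder("{\\pict");
        sb.append(png ? "\\pngblip" : "\\jpegblip");
        // Desired display size in twips (1440 twips = 1 inch); the space ends the control word.
        sb.append("\\picwgoal").append(widthTwips)
          .append("\\pichgoal").append(heightTwips).append(' ');
        for (int i = 0; i < imageBytes.length; i++) {
            sb.append(String.format("%02x", imageBytes[i] & 0xff)); // two hex digits per byte
            if ((i + 1) % 64 == 0) {
                sb.append('\n'); // line breaks inside the hex data are ignored by readers
            }
        }
        return sb.append('}').toString();
    }
}
```

The returned string can be spliced into the RTF body wherever the image should appear, for example in place of the empty \par that the Swing writer leaves where the IMG tag was.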
Many thanks again!
-
- runs on Linux, 2.4.24 Kernel.
- We would like to be able convert the HTML report into a PDF file.
- Ideally we would like to use open source code for the PDF generation
We would like to be able to include both Text and Bitmaps in the PDF output
Thanks!
Message was edited by:
dragontail77
HTML to PDF with Java, using OpenOffice.org; example here: http://www.dancrintea.ro/html-to-pdf/
You can run OpenOffice.org as a server and command it remotely for document conversion.
Besides HTML to PDF, other conversions are also possible:
doc --> pdf, html, txt, rtf
xls --> pdf, html, csv
ppt --> pdf, swf
Code example:
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import officetools.OfficeFile; // this is my tools package

FileInputStream fis = new FileInputStream(new File("c:/test.html"));
FileOutputStream fos = new FileOutputStream(new File("c:/test.pdf"));
// suppose OpenOffice.org runs on localhost, port 8100
OfficeFile f = new OfficeFile(fis,"localhost","8100", true);
f.convert(fos,"pdf");
-
I've got a number of HTML formatted datafields that users have entered using the APEX text editors (FCK).
I would like to integrate those fields with a report, but the report is not interpreting the HTML very well. As such, I'd actually like to convert the HTML to RTF.
I see that a number of commercial DLLs exist for doing this within end applications.
I'd prefer to actually do this conversion at the database side - has anyone done anything similar? Or have any suggestions for an approach?
Thanks,
Scott
I'm working with Crystal Reports, and the report must be customer quality. The issue is that Crystal only supports a limited set of HTML tags (see http://technicalsupport.businessobjects.com/KanisaSupportSite/search.do?cmd=displayKC&docType=kc&externalId=c2014842&sliceId=&dialogID=9876280&stateId=1%200%209874388)
But its RTF support is a lot better, hence the desire to convert the HTML to RTF.
-
How to convert html to pdf using acrobat sdk 8.0?
hi
I am a beginner with the Acrobat SDK.
I want to know how to use Acrobat SDK 8.0 to convert HTML to PDF.
Here are some questions:
1: How can I support navigation inside a PDF file generated with Acrobat SDK 8.0? For example: there is a catalog at the top of the HTML file, and the customer hopes to navigate inside the PDF file just as inside the HTML file.
2: How can I support operating controls in a PDF file generated with Acrobat SDK 8.0? For example: there are some drop-down lists and text boxes in the HTML file, and the customer hopes to enter text in a text box and click a drop-down list to see its available options, just as in the HTML file.
Thanks in advance for any help and suggestions.
Hello,
I want a system to re-brand my 37-page PDF for affiliates.
I want a dynamic PHP link in the online PDF in order to personalize the PDF automatically for each affiliate. I need to change two links each time: the affiliate ID and the PayPal email (payment button) on page 36.
Can you help?
Please let me know
Thank you
Alex
PS My system is online and I can give you the URL if it helps.
-
A tool can convert HTML to Excel
Hi all, are you using Report 6i and want to output a report in Excel format? If so, free software that can convert HTML to Excel is available.
The software is designed to print very large reports. A new function has now been added through which you can convert HTML to Excel easily. The function is still basic, but it will improve in the future.
For more information, please visit
http://repbrowser.freewebpage.org/
Thank you,
Regards
Hi,
The only other ways (as far as I know), if you really want to convert, are:
a) write a parser to convert HTML into CSV (XLS)
b) use an html2csv script at the OS level
like:
http://sebsauvage.net/python/html2csv.py (or just google html2csv)
c) use Excel (data source: web; local file: "file:///C:/test.htm")
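Option a) can be sketched in a few lines of Java. This is an illustration under simplifying assumptions, not a robust converter: the class name is made up, it relies on regular expressions rather than a real HTML parser, it expects well-formed <tr>/<td>/<th> markup without nested tables, and it does not decode HTML entities.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: turns the rows of an HTML table into CSV lines. */
class HtmlTableToCsvSketch {
    private static final Pattern ROW =
            Pattern.compile("<tr\\b[^>]*>(.*?)</tr>", Pattern.CASE_INSENSITIVE | Pattern.DOTALL);
    private static final Pattern CELL =
            Pattern.compile("<t[dh]\\b[^>]*>(.*?)</t[dh]>", Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    static String toCsv(String html) {
        StringBuilder csv = new StringBuilder();
        Matcher row = ROW.matcher(html);
        while (row.find()) {
            List<String> cells = new ArrayList<>();
            Matcher cell = CELL.matcher(row.group(1));
            while (cell.find()) {
                // Drop any markup inside the cell and keep only its text.
                String text = cell.group(1).replaceAll("<[^>]+>", "").trim();
                cells.add(quote(text));
            }
            csv.append(String.join(",", cells)).append("\r\n");
        }
        return csv.toString();
    }

    // Quotes a field when it contains a comma, a quote or a line break.
    static String quote(String field) {
        if (field.contains(",") || field.contains("\"")
                || field.contains("\n") || field.contains("\r")) {
            return "\"" + field.replace("\"", "\"\"") + "\"";
        }
        return field;
    }
}
```

Excel opens the resulting CSV text directly once it is written to a .csv file.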
Kind Regards,
Dirk
-
Is it possible to convert *.doc to *.rtf in a java program?
Hi :-)
My challenge is to develop a web app in ADF Faces. I am currently evaluating technologies for storing mail-merge letters in an easy way. The user of my web app should upload an MS Word mail-merge document and a CSV data source file. My web app must then convert these two files into one PDF per CSV row and store the PDFs on an FTP server.
I have built a demo using the OpenOffice API. But now I want to try the same using Apache POI and FOP. I can merge the doc files with POI and I can create PDFs with FOP.
My problem is that POI can't convert to an RTF file, and FOP uses an RTF file to create a PDF. I don't know whether it's possible to convert a doc file to an RTF file. If it is possible, is there an API that will help me out?
Regards
Majo
btw... I am not sure if this is the right forum for my question :-/
HeHe, no, sorry. The binary file is the same: because Windows associates both doc and rtf with MS Word, Word opens the file you renamed to *.rtf, but as a doc document, not as an RTF file ;-)
And I don't want to open the RTF file in MS Word. I want to process the RTF file in Java.
Thanks
Majo
-
Problem with converting html to pdf using LiveCycle ES Java API
I am using this code to convert HTML to PDF.
/*
 * JAR files listed with this example:
 * 1. adobe-generatepdf-client.jar
 * 2. adobe-livecycle-client.jar
 * 3. adobe-usermanager-client.jar
 * 4. adobe-utilities.jar
 * 5. wlclient.jar
 */
import java.io.File;
import java.util.Properties;
import com.adobe.idp.Document;
import com.adobe.idp.dsc.clientsdk.ServiceClientFactory;
import com.adobe.idp.dsc.clientsdk.ServiceClientFactoryProperties;
import com.adobe.livecycle.generatepdf.client.GeneratePdfServiceClient;
import com.adobe.livecycle.generatepdf.client.HtmlToPdfResult;

public class ConvertHTML {
    public static void main(String[] args) {
        try {
            //Set connection properties required to invoke LiveCycle ES
            Properties connectionProps = new Properties();
            connectionProps.setProperty(ServiceClientFactoryProperties.DSC_DEFAULT_EJB_ENDPOINT, "t3://localhost:7001");
            connectionProps.setProperty(ServiceClientFactoryProperties.DSC_TRANSPORT_PROTOCOL, ServiceClientFactoryProperties.DSC_EJB_PROTOCOL);
            connectionProps.setProperty(ServiceClientFactoryProperties.DSC_SERVER_TYPE, "WebLogic");
            connectionProps.setProperty(ServiceClientFactoryProperties.DSC_CREDENTIAL_USERNAME, "administrator");
            connectionProps.setProperty(ServiceClientFactoryProperties.DSC_CREDENTIAL_PASSWORD, "password");
            //Create a ServiceClientFactory instance
            ServiceClientFactory factory = ServiceClientFactory.createInstance(connectionProps);
            //Create a GeneratePdfServiceClient object
            GeneratePdfServiceClient pdfGenClient = new GeneratePdfServiceClient(factory);
            //Get an HTML document to convert to a PDF document
            String inputFileName = "http://www.adobe.com";
            //String inputFileName = "C:\\Documents and Settings\\venkat\\Desktop\\Adobe.htm";
            String securitySettings = "No Security";
            String fileTypeSettings = "Standard";
            System.out.println("one");
            //Convert HTML content to a PDF document
            HtmlToPdfResult result = pdfGenClient.htmlToPDF2(inputFileName, fileTypeSettings, securitySettings, null, null);
            System.out.println("two");
            //Get the newly created document
            Document createdDocument = result.getCreatedDocument();
            //Save the PDF document as a PDF file
            createdDocument.copyToFile(new File("C:\\test.pdf"));
        } catch (Exception e) {
            System.out.println("Error OCCURRED: " + e.getMessage());
            e.printStackTrace();
        }
    }
}
I am able to compile this class, but when I run it I get the error below.
Error OCCURRED: Internal error.
ALC-DSC-000-000: com.adobe.idp.dsc.DSCRuntimeException: Internal error.
    at com.adobe.idp.dsc.provider.impl.ejb.EjbMessageDispatcher.doSend(EjbMessageDispatcher.java:160)
    at com.adobe.idp.dsc.provider.impl.base.AbstractMessageDispatcher.send(AbstractMessageDispatcher.java:57)
    at com.adobe.idp.dsc.clientsdk.ServiceClient.invoke(ServiceClient.java:208)
    at com.adobe.livecycle.generatepdf.client.GeneratePdfServiceClient.htmlToPDF2(GeneratePdfServiceClient.java:666)
    at ConvertHTML.main(ConvertHTML.java:84)
Caused by: java.rmi.RemoteException: Remote EJBObject lookup failed for 'ejb/Invocation'; nested exception is:
    org.omg.CORBA.COMM_FAILURE: vmcid: SUN minor code: 203 completed: No
    at com.adobe.idp.dsc.provider.impl.ejb.EjbMessageDispatcher.initialise(EjbMessageDispatcher.java:101)
    at com.adobe.idp.dsc.provider.impl.ejb.EjbMessageDispatcher.doSend(EjbMessageDispatcher.java:130)
    ... 4 more
Caused by: org.omg.CORBA.COMM_FAILURE: vmcid: SUN minor code: 203 completed: No
    at com.sun.corba.se.impl.logging.ORBUtilSystemException.writeErrorSend(Unknown Source)
    at com.sun.corba.se.impl.logging.ORBUtilSystemException.writeErrorSend(Unknown Source)
    at com.sun.corba.se.impl.transport.SocketOrChannelConnectionImpl.writeLock(Unknown Source)
    at com.sun.corba.se.impl.encoding.BufferManagerWriteStream.sendFragment(Unknown Source)
    at com.sun.corba.se.impl.encoding.BufferManagerWriteStream.sendMessage(Unknown Source)
    at com.sun.corba.se.impl.encoding.CDROutputObject.finishSendingMessage(Unknown Source)
    at com.sun.corba.se.impl.protocol.CorbaMessageMediatorImpl.finishSendingRequest(Unknown Source)
    at com.sun.corba.se.impl.protocol.CorbaClientRequestDispatcherImpl.marshalingComplete1(Unknown Source)
    at com.sun.corba.se.impl.protocol.CorbaClientRequestDispatcherImpl.marshalingComplete(Unknown Source)
    at com.sun.corba.se.impl.protocol.CorbaClientDelegateImpl.invoke(Unknown Source)
    at com.sun.corba.se.impl.protocol.CorbaClientDelegateImpl.is_a(Unknown Source)
    at org.omg.CORBA.portable.ObjectImpl._is_a(Unknown Source)
    at weblogic.corba.j2ee.naming.Utils.narrowContext(Utils.java:126)
    at weblogic.corba.j2ee.naming.InitialContextFactoryImpl.getInitialContext(InitialContextFactoryImpl.java:94)
    at weblogic.corba.j2ee.naming.InitialContextFactoryImpl.getInitialContext(InitialContextFactoryImpl.java:31)
    at weblogic.jndi.WLInitialContextFactory.getInitialContext(WLInitialContextFactory.java:41)
    at javax.naming.spi.NamingManager.getInitialContext(Unknown Source)
    at javax.naming.InitialContext.getDefaultInitCtx(Unknown Source)
    at javax.naming.InitialContext.init(Unknown Source)
    at javax.naming.InitialContext.<init>(Unknown Source)
    at com.adobe.idp.dsc.provider.impl.ejb.EjbMessageDispatcher.initJndiContext(EjbMessageDispatcher.java:213)
    at com.adobe.idp.dsc.provider.impl.ejb.EjbMessageDispatcher.getJndiContext(EjbMessageDispatcher.java:226)
    at com.adobe.idp.dsc.provider.impl.ejb.EjbMessageDispatcher.initialise(EjbMessageDispatcher.java:87)
    ... 5 more
Can you please suggest a way to do the conversion?
Yes sir, thanks for your suggestion.
But I didn't find an exact solution. I found some, but they did not work the way I need. I just need to convert HTML to PDF using the iText API for Java. I have already used some of its classes, such as HTMLParser.
Anything else? Can anyone help me with this?
-
convert html to word document ,
I tried poi-3.0.2-FINAL (Apache POI - HWPF - Java API to Handle Microsoft Word Files), but it is not working. My actual goal is to convert an HTML file into a Word document.
I posted in the forum, and some people suggested having a look at HWPF.
I tried the programs one by one without getting any result. For example, this program:
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
import org.apache.poi.hwpf.HWPFDocument;
import org.apache.poi.hwpf.usermodel.CharacterRun;
import org.apache.poi.hwpf.usermodel.Paragraph;
import org.apache.poi.hwpf.usermodel.Range;
import org.apache.poi.hwpf.usermodel.Section;

try {
    HWPFDocument doc = new HWPFDocument(new FileInputStream("c:\\temp.doc"));
    Range r = doc.getRange();
    System.out.println("Example you supplied:");
    System.out.println("---------------------");
    for (int x = 0; x < r.numSections(); x++) {
        Section s = r.getSection(x);
        for (int y = 0; y < s.numParagraphs(); y++) {
            Paragraph p = s.getParagraph(y);
            for (int z = 0; z < p.numCharacterRuns(); z++) {
                //character run
                CharacterRun run = p.getCharacterRun(z);
                //character run text
                String text = run.text();
                // show us the text
                System.out.print(text);
            }
            // use a new line at the paragraph break
            System.out.println();
        }
    }
} catch (NullPointerException exception) {
    exception.printStackTrace();
} catch (FileNotFoundException e) {
    e.printStackTrace();
} catch (IOException e) {
    e.printStackTrace();
}
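The two numbers in the IOException quoted just after this program can be decoded by hand: each is the first eight bytes of a file read as a little-endian long. The sketch below (class and method names are mine, not part of POI) turns such a number back into bytes and checks a file start against the expected value taken from the message.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

/** Hypothetical diagnostic for "Invalid header signature" messages. */
class HeaderSignatureSketch {
    // The "expected" value from the exception text: bytes D0 CF 11 E0 A1 B1 1A E1.
    static final long OLE2_SIGNATURE = -2226271756974174256L;

    // Splits a long into the eight bytes it was read from (little-endian order).
    static byte[] littleEndianBytes(long value) {
        return ByteBuffer.allocate(8).order(ByteOrder.LITTLE_ENDIAN).putLong(value).array();
    }

    // Shows those bytes as ASCII text, which is handy when the file is really text.
    static String asAscii(long value) {
        return new String(littleEndianBytes(value), StandardCharsets.US_ASCII);
    }

    // True if the first eight bytes of a file match the expected signature.
    static boolean startsWithOle2Signature(byte[] fileStart) {
        if (fileStart.length < 8) {
            return false;
        }
        long first = ByteBuffer.wrap(fileStart, 0, 8).order(ByteOrder.LITTLE_ENDIAN).getLong();
        return first == OLE2_SIGNATURE;
    }

    public static void main(String[] args) {
        // The "read" value from the exception text.
        System.out.println(asAscii(5789751444030890300L));
    }
}
```

Run as written, main prints <!DOCTYP, which suggests that c:\temp.doc is really an HTML file saved with a .doc extension. HWPF reads only the binary OLE2 Word format, so it rejects the file before any of the loops run.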
java.io.IOException: Invalid header signature; read 5789751444030890300, expected -2226271756974174256
-
How do I save a file as an unformatted txt file instead of html or rtf?
How do I save a file as an unformatted txt file instead of html or rtf?
Use the menu Image > Image Size. In the Image Size dialog, uncheck Resample, enter 300 in the Resolution field and click OK. Note that no pixels are changed; only the resolution setting changes. Then use the menu File > Save As; in the Save As dialog, use the file type pull-down to select TIFF, then click Save.
In the TIFF Options dialog, in the Image Compression section, select None, then click OK.
-
How to convert html file to master file in sharepoint branding
How can I convert an HTML file to a master page file programmatically for SharePoint branding?
Hi,
According to your post, my understanding is that you want to convert HTML file to master file.
You can use Design Manager to achieve it.
In step 4, Edit Master Pages, click the option at the top: Convert an HTML file to a SharePoint master page.
Once the conversion has completed, make sure the Status is set to Conversion Successful.
For more information, please refer to:
SharePoint 2013 – Design Manager – Convert HTML to Master Page
Best Regards,
Linda Li
TechNet Community Support
-
Problem reading html and rtf emails
When I send emails from my PC to my iPhone 5 in HTML or RTF format, they are unreadable because all of the coding instructions are included in the text that appears on screen. This was never a problem with my iPhone 4, so I am not sure what has changed. I have a business contact who has had similar problems with my emails in the recent past, so I know it is not just me.
I have tried sending HTML emails from other PCs in the office to my phone and they are all readable, so perhaps it is something in the setup of my PC that is causing this issue. As my office is changing over to iPhone 5s, does anybody have a solution to what will become a very annoying problem?
Obviously I could send all of my emails in plain text, but that doesn't really work for what I need to send: logos, graphics, etc.
Hi
The best way to organize data and images to get them next to each other is to use a table (no borders) in your RTF template. Create a two celled table and drop the image into one and the text next to it.
Regards
Tim