Convert HTML to RTF

Hi,
Is there a way to build a utility program that accepts an HTML file and converts it to RTF format? I do not wish to install or purchase any third-party software. Please suggest.
Regards,
Murali

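Write or find an HTML parser; define the required mapping from input to output; and implement it. The only RTF support built into Java is javax.swing.text.rtf.RTFEditorKit, which covers little more than text with basic character and paragraph formatting (no images or tables), so you will probably have to find or write an RTF writer too.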
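As a rough sketch of that approach using only the JDK: javax.swing.text.html.parser.ParserDelegator (the HTML 3.2-level parser behind HTMLEditorKit) reports tags and text to a callback, and the callback below maps a handful of tags (b/strong, i/em, u, p, br) to RTF control words. The class name, the tag selection and the fixed Times New Roman font are assumptions for illustration; it expects reasonably well-formed HTML and ignores tables, images, colors and CSS.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

public class HtmlToRtf {

    /** Escapes the RTF special characters backslash and braces; non-ASCII uses an RTF Unicode escape. */
    static String escape(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            if (c == '\\' || c == '{' || c == '}') {
                sb.append('\\').append(c);
            } else if (c > 127) {
                // RTF Unicode escape: signed 16-bit decimal code, then '?' as the fallback character
                sb.append("\\u").append((int) (short) c).append('?');
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    /** Maps a few HTML tags to RTF control words; all other tags are ignored, text is kept. */
    public static String convert(String html) throws IOException {
        StringBuilder rtf = new StringBuilder(
                "{\\rtf1\\ansi\\deff0{\\fonttbl{\\f0 Times New Roman;}}\\f0\\fs24 ");
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleStartTag(HTML.Tag t, MutableAttributeSet a, int pos) {
                if (t == HTML.Tag.B || t == HTML.Tag.STRONG) rtf.append("{\\b ");
                else if (t == HTML.Tag.I || t == HTML.Tag.EM) rtf.append("{\\i ");
                else if (t == HTML.Tag.U) rtf.append("{\\ul ");
            }

            @Override
            public void handleEndTag(HTML.Tag t, int pos) {
                if (t == HTML.Tag.B || t == HTML.Tag.STRONG || t == HTML.Tag.I
                        || t == HTML.Tag.EM || t == HTML.Tag.U) {
                    rtf.append('}'); // close the group opened in handleStartTag
                } else if (t == HTML.Tag.P) {
                    rtf.append("\\par ");
                }
            }

            @Override
            public void handleSimpleTag(HTML.Tag t, MutableAttributeSet a, int pos) {
                if (t == HTML.Tag.BR) rtf.append("\\line ");
            }

            @Override
            public void handleText(char[] data, int pos) {
                rtf.append(escape(new String(data)));
            }
        };
        // true = ignore any charset declared in a <meta> tag
        new ParserDelegator().parse(new StringReader(html), callback, true);
        return rtf.append('}').toString();
    }

    public static void main(String[] args) throws IOException {
        System.out.println(convert("<p>50 <b>pounds</b>, <i>wow</i></p>"));
    }
}
```

Extending it means adding cases to the same callbacks; for example, supporting font colors would also require writing a matching \colortbl into the header.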

Similar Messages

  • Textutil html to rtf vs. TextEdit

    I have noticed what looks like a bug in either TextEdit or textutil, but I'm not sure which. When I convert HTML to RTF using the textutil command, the resulting file opens in TextEdit with black text on a black background. It looks a lot like the Safari email creation bug they just fixed with 5.0.1 (yes, I installed it).
    Tex-Edit Plus, Pages, OmniOutliner, Word and OpenOffice all display the resulting file without the black background. I will be reporting it to AppleCare tomorrow.
    What I'm not sure about, and thought I'd ask here, is this: do you think the RTF code output by textutil is wrong, or is TextEdit displaying the file incorrectly? Since all the other programs display it correctly, I am tempted to implicate TextEdit, but maybe they just ignore the "background" instruction that textutil seems to insert (see below).
    If I create a document in TextEdit and save it to disk, I get the following code:
    {\rtf1\ansi\ansicpg1252\cocoartf1038\cocoasubrtf320
    {\fonttbl\f0\fswiss\fcharset0 Helvetica;}
    {\colortbl;\red255\green255\blue255;}
    \margl1440\margr1440\vieww9000\viewh8400\viewkind0
    \pard\tx720\tx1440\tx2160\tx2880\tx3600\tx4320\tx5040\tx5760\tx6480\tx7200\tx7920\tx8640\ql\qnatural\pardirnatural
    \f0\fs24 \cf0 Hello World}
    If I create a barebones html document (see script below) and convert html to RTF using textutil, I get:
    {\rtf1\ansi\ansicpg1252\cocoartf1038\cocoasubrtf320
    {\fonttbl\f0\froman\fcharset0 Times-Roman;}
    {\colortbl;\red255\green255\blue255;}
    \deftab720
    {\*\background {\shp{\*\shpinst\shpleft0\shptop0\shpright0\shpbottom0\shpfhdr0\shpbxmargin\shpbymargin\shpwr0\shpwrk0\shpfblwtxt1\shpz0\shplid1025{\sp{\sn shapeType}{\sv 1}}{\sp{\sn fFlipH}{\sv 0}}{\sp{\sn fFlipV}{\sv 0}}{\sp{\sn fillColor}{\sv 0}}{\sp{\sn fFilled}{\sv 1}}{\sp{\sn lineWidth}{\sv 0}}{\sp{\sn fLine}{\sv 0}}{\sp{\sn bWMode}{\sv 9}}{\sp{\sn fBackground}{\sv 1}}}}}
    \pard\pardeftab720\ql\qnatural
    \f0\fs24 \cf0 Hello World}
    What seems to be the big difference is the "background" code: if I remove it, all is well. It apparently specifies the background for the document, in this case as a rectangular shape with various parameters, but changing the shape or fill color doesn't seem to make a difference.
    So I guess the question is why textutil puts that code in there, and why it breaks TextEdit's display...
    (the spec for the parameters is here: http://www.biblioscape.com/rtf15_spec.htm )
    Here's an AppleScript that re-creates the problem:
    set oFile to POSIX file "/Users/username/Desktop/oFile.html"
    set strHTML to "<head><meta http-equiv=\"Content-Type\" content=\"text/html; charset=UTF-8\"/></head><body>Hello World</body>"
    try
        set fDesc to open for access oFile with write permission
        set eof fDesc to 0 -- discard any previous contents of the file
        write strHTML to fDesc as «class utf8»
        close access fDesc
    on error errMsg
        try
            close access fDesc
        end try
        return errMsg
    end try
    set strCommand to "textutil -convert rtf " & (quoted form of (POSIX path of oFile))
    set strResult to (do shell script strCommand)
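Since removing the background group is reported above to fix the display, one possible workaround, until textutil or TextEdit changes, is to strip that group after conversion. The sketch below (StripRtfBackground is a hypothetical helper, not part of textutil) removes every group starting with {\*\background by counting braces; a backslash is treated as escaping the next character, so \{, \} and \\ inside the group are not counted. It assumes the group is spelled exactly as in the textutil output above.

```java
public class StripRtfBackground {

    private static final String MARKER = "{\\*\\background";

    /**
     * Returns a copy of the RTF text with every {\*\background ...} group removed.
     * Nested groups are skipped as a unit by counting braces. An unterminated
     * group swallows the rest of the input.
     */
    public static String stripBackground(String rtf) {
        StringBuilder out = new StringBuilder(rtf.length());
        int i = 0;
        while (i < rtf.length()) {
            int start = rtf.indexOf(MARKER, i);
            if (start < 0) {
                out.append(rtf, i, rtf.length()); // no more background groups
                break;
            }
            out.append(rtf, i, start); // keep everything before the group
            int depth = 0;
            int j = start;
            while (j < rtf.length()) {
                char c = rtf.charAt(j);
                if (c == '\\') {
                    j += 2; // skip the backslash and the character it escapes
                    continue;
                }
                j++;
                if (c == '{') {
                    depth++;
                } else if (c == '}' && --depth == 0) {
                    break; // j now points just past the group's closing brace
                }
            }
            i = j;
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String rtf = "{\\rtf1\\ansi{\\*\\background {\\shp{\\sp{\\sn fillColor}{\\sv 0}}}}\n\\pard Hello World}";
        System.out.println(stripBackground(rtf));
    }
}
```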

    If you want to report this issue to Apple's engineers, send a bug report or an enhancement request via the Bug Reporter system. To do this, join the Mac Developer Program; it's free, available to all Mac users, and gives you a look at some development software. Since you already have an Apple ID, use that. Once you are a member, go to Apple Bug Reporter and file your bug report or enhancement request. The nice thing about this procedure is that you get a response and a follow-up number, thus starting a dialog with engineering.

  • Convert doc to rtf or doc to html

    Is there any way to convert .doc files to HTML or RTF format? Apache POI only provides reading facilities, not conversion facilities.
    Converting from RTF to HTML using an XSL transformation is not the problem.
    But what about DOC to RTF? Perhaps there are already solutions written using POI or something else?

    Two projects that spring to mind are Apache POI and Apache FOP.
    POI:
    http://jakarta.apache.org/poi/index.html
    FOP:
    http://xmlgraphics.apache.org/fop/
    Either way, you are in for some tough development if you want to do this in Java, and you might want to consider switching to a more suitable platform such as .NET. Word documents are highly Microsoft-specific, so you will want to use a Microsoft platform to work with them with the fewest headaches and risks.

  • Convert HTML codes to RTF

    Hi,
    In a Java servlet, I need to convert HTML into an RTF/Word document.
    Can anyone suggest a related Java API?
    Regards,
    Priya Ranjan Sahay

    Check out iText:
    http://www.lowagie.com/iText/
    Example code:
    http://www.java-tips.org/other-api-tips/itext/manipulating-pdf,-rtf,-or-html-documents-with-java.html

  • Converting HTML into a Word document

    Hi all,
    I have a JSP whose content type is set so that the HTML it produces is opened in Word. This works fine until images come into the equation, as these images must live somewhere in order to be referenced from the HTML code. As this document must be 'standalone', opening the HTML in Word and simply changing the file extension is no good, as it is still HTML under the hood.
    What I would therefore like to do is generate a Word document from this HTML that is self-contained, in that it 'holds' these images within itself and does not rely on external resources. Does anyone know how I can achieve this?
    I have looked into Jakarta POI and have written this off as an option because 1) it is still in development and 2) there is no documentation or examples of how to use what is already there. I am assuming someone has come across this problem before and knows of a solution out there that I could use.
    Many thanks in advance!

    Hi,
    Thanks all for your replies! Unfortunately it can't be PDF, as the creator will need to edit it before the document is complete. I have actually looked into generating an RTF document instead, but the example I tried seemed to lose the image data. Unfortunately I know nothing about RTF and so kind of gave up on it :(
    Here is the code I used:
    import java.io.ByteArrayOutputStream;
    import java.io.IOException;
    import java.io.StringReader;
    import javax.swing.text.BadLocationException;
    import javax.swing.text.html.HTMLDocument;
    import javax.swing.text.html.HTMLEditorKit;
    import javax.swing.text.rtf.RTFEditorKit;
    public class FormatConverter {
         private final HTMLEditorKit htmlKit = new HTMLEditorKit();
         private final RTFEditorKit rtfKit = new RTFEditorKit();
         // Reads the HTML into a Swing HTMLDocument, then writes that document back out as RTF.
         // RTFEditorKit writes only text and character/paragraph formatting, so <IMG> content is dropped.
         private String fudge(String strText) throws IOException, BadLocationException {
              HTMLDocument doc = (HTMLDocument) htmlKit.createDefaultDocument();
              htmlKit.read(new StringReader(strText), doc, 0);
              ByteArrayOutputStream byteArrayOutputStream = new ByteArrayOutputStream();
              rtfKit.write(byteArrayOutputStream, doc, 0, doc.getLength());
              return byteArrayOutputStream.toString();
         }
         public static void main(String args[]) throws Exception {
              FormatConverter conv = new FormatConverter();
              String strRTF = conv.fudge("<P><IMG src=\"http://intratestgbr/announcements/images/1093429553065.jpg\"></P><P> </P><P>50 <STRONG>pounds</STRONG>, <FONT color=#0000ff>wow</FONT></P>");
              System.out.println("RTF: '"+strRTF+"'");
              strRTF = conv.fudge("<html><head><p class=default><span style=\"color: #000000\">Description </span><span style=\"color: #000000\"><b>with</b> </span><span style=\"color: #000000\"><i>some</i> </span><span style=\"color: #000000\"><u>formatting</u> </span><span style=\"color: #000000\"></span></p></head></html>");
              System.out.println("RTF: '"+strRTF+"'");
              System.exit(0);
         }
    }
    The output I got from this was:
    {\rtf1\ansi
    {\fonttbl\f0\fnil Monospaced;}
    {\colortbl\red0\green0\blue0;\red0\green0\blue255;}
    \par
    \~50 pounds, \cf1 wow\par
    }
    Like I said, when I open the RTF output in Word, everything is fine apart from the missing image. If one of you very nice people could point me in the right direction of a way to convert it to RTF instead while still maintaining the images, this would certainly be a very acceptable solution and I would be very grateful :)
    Many thanks again!
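The missing image is consistent with RTFEditorKit writing only text and formatting. RTF itself can carry an embedded picture as a {\pict ...} group holding the image bytes hex-encoded; the RTF specification defines \pngblip for PNG data (and \jpegblip for JPEG), and Word generally reads such groups. The sketch below builds a group from PNG bytes that you have already downloaded (fetching the image from the intranet URL is left out). RtfImage and its methods are hypothetical names, sizing at 15 twips per pixel assumes 96 dpi, and the text passed to document() is assumed to be plain ASCII without RTF special characters.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

public class RtfImage {

    /** Reads width and height in pixels from the IHDR chunk that follows the 8-byte PNG signature. */
    static int[] pngSize(byte[] png) {
        if (png.length < 24 || (png[0] & 0xff) != 0x89
                || png[1] != 'P' || png[2] != 'N' || png[3] != 'G') {
            throw new IllegalArgumentException("not a PNG file");
        }
        return new int[] { readInt(png, 16), readInt(png, 20) };
    }

    /** PNG stores integers big-endian. */
    private static int readInt(byte[] b, int off) {
        return ((b[off] & 0xff) << 24) | ((b[off + 1] & 0xff) << 16)
                | ((b[off + 2] & 0xff) << 8) | (b[off + 3] & 0xff);
    }

    /** Builds a {\pict\pngblip ...} group with the image bytes hex-encoded inline. */
    static String pictGroup(byte[] png) {
        int[] size = pngSize(png);
        StringBuilder sb = new StringBuilder("{\\pict\\pngblip")
                // \picwgoal and \pichgoal are in twips: 1440 per inch / 96 px per inch = 15
                .append("\\picwgoal").append(size[0] * 15)
                .append("\\pichgoal").append(size[1] * 15)
                .append(' ');
        for (byte b : png) {
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.append('}').toString();
    }

    /** A minimal RTF document: the picture, then one paragraph of text. */
    static String document(byte[] png, String text) {
        return "{\\rtf1\\ansi\\deff0{\\fonttbl{\\f0 Times New Roman;}}\\f0\\fs24 "
                + pictGroup(png) + "\\par " + text + "\\par}";
    }

    public static void main(String[] args) throws IOException {
        // args[0]: a PNG already saved locally; args[1]: the .rtf file to write
        byte[] png = Files.readAllBytes(Paths.get(args[0]));
        String rtf = document(png, "50 pounds, wow");
        Files.write(Paths.get(args[1]), rtf.getBytes(StandardCharsets.US_ASCII));
    }
}
```

The image in the post is a JPEG; reading JPEG dimensions takes more parsing than PNG's fixed IHDR offsets, which is why the sketch sticks to PNG.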

  • How to convert HTML to PDF

    - runs on Linux, 2.4.24 Kernel.
    - We would like to be able convert the HTML report into a PDF file.
    - Ideally we would like to use open source code for the PDF generation
    - We would like to be able to include both text and bitmaps in the PDF output
    Thanks!
    dragontail77

    HTML to PDF with Java, using OpenOffice.org; example here: http://www.dancrintea.ro/html-to-pdf/
    You can run OpenOffice.org as a server and command it remotely to convert documents.
    Besides HTML to PDF, other conversions are also possible:
    doc --> pdf, html, txt, rtf
    xls --> pdf, html, csv
    ppt --> pdf, swf
    Code example:
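    import java.io.File;
    import java.io.FileInputStream;
    import java.io.FileOutputStream;
    import officetools.OfficeFile; // this is my tools package
    FileInputStream fis = new FileInputStream(new File("c:/test.html"));
    FileOutputStream fos = new FileOutputStream(new File("c:/test.pdf"));
    // suppose OpenOffice.org runs on localhost, port 8100
    OfficeFile f = new OfficeFile(fis,"localhost","8100", true);
    f.convert(fos,"pdf");
    // release the file handles once the conversion is done
    fis.close();
    fos.close();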
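The officetools.OfficeFile class above is the poster's own package, so its API is not something others can rely on. A comparable conversion can be sketched with nothing but ProcessBuilder, assuming a LibreOffice installation whose soffice binary is on the PATH; --headless, --convert-to and --outdir are LibreOffice command-line options, and Apache OpenOffice may not support --convert-to. The class name and paths are illustrative only.

```java
import java.io.File;
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

public class SofficeConvert {

    /** Builds the command line; assumes LibreOffice's "soffice" is on the PATH. */
    static List<String> command(String targetFormat, File input, File outDir) {
        // targetFormat is e.g. "pdf" or "rtf"; an explicit export filter can be given as "pdf:FilterName"
        return Arrays.asList("soffice", "--headless", "--convert-to", targetFormat,
                "--outdir", outDir.getPath(), input.getPath());
    }

    /** Runs the conversion and returns soffice's exit status. */
    static int convert(String targetFormat, File input, File outDir)
            throws IOException, InterruptedException {
        Process process = new ProcessBuilder(command(targetFormat, input, outDir))
                .inheritIO() // show soffice's own messages on this console
                .start();
        return process.waitFor();
    }

    public static void main(String[] args) throws Exception {
        File input = new File("c:/test.html");
        File outDir = new File("c:/");
        int status = convert("pdf", input, outDir);
        // The output keeps the input's base name, so c:/test.pdf is expected here
        File expected = new File(outDir, "test.pdf");
        System.out.println("exit status " + status + ", output exists: " + expected.exists());
    }
}
```

soffice can reportedly exit with status 0 without writing anything (for example when another instance is already running with the same user profile), so checking that the output file exists is more reliable than the exit status alone.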

  • HTML to RTF Convertor

    I've got a number of HTML-formatted data fields that users have entered using the APEX text editor (FCK).
    I would like to integrate those fields into a report, but the report is not interpreting the HTML very well, so I'd actually like to convert the HTML to RTF.
    I see that a number of commercial DLLs exist for doing this within end applications.
    I'd prefer to do this conversion on the database side. Has anyone done anything similar, or does anyone have suggestions for an approach?
    Thanks,
    Scott

    I'm working with Crystal Reports, and the report must be customer quality. The issue is that Crystal supports only a limited set of HTML tags (see http://technicalsupport.businessobjects.com/KanisaSupportSite/search.do?cmd=displayKC&docType=kc&externalId=c2014842&sliceId=&dialogID=9876280&stateId=1%200%209874388)
    But its RTF support is a lot better, hence the desire to convert the HTML to RTF.

  • How to convert html to pdf using acrobat sdk 8.0?

    Hi,
    I am a beginner with the Acrobat SDK.
    I want to know how to use Acrobat SDK 8.0 to convert HTML to PDF.
    Here are some questions:
    1: How can I support navigation inside a PDF file generated with Acrobat SDK 8.0? For example, there is a table of contents at the top of the HTML file, and the customer wants to navigate inside the PDF file just as in the HTML file.
    2: How can I support interactive controls in a PDF file generated with Acrobat SDK 8.0? For example, there are some drop-down lists and text boxes in the HTML file, and the customer wants to type text in the text boxes and open the drop-down lists to see the available options, just as in the HTML file.
    Thanks in advance for any help and suggestions.

    Hello,
    I want a system to re-brand my 37-page PDF for affiliates.
    I want a dynamic PHP link to the PDF online so that the PDF is personalized automatically for each affiliate. I need to change 2 links each time: the affiliate ID and the PayPal email (payment button) on page 36.
    Can you help?
    Please let me know.
    Thank you
    Alex
    PS My system is online and I can give you the URL if it helps.

  • A tool can convert HTML to Excel

    Hi all, are you using Reports 6i and want to output a report in Excel format? If so, free software that can convert HTML to Excel is available.
    The software is designed to print very large reports, and a new function has now been added through which you can convert HTML to Excel easily. The function is still basic, but it will improve in the future.
    For more information, please visit
    http://repbrowser.freewebpage.org/
    Thank you,
    Regards

    Hi,
    the only other ways I know of, if you really want to convert, are:
    a) write a parser to convert HTML into CSV (XLS)
    b) use an html2csv script at the OS level, like:
    http://sebsauvage.net/python/html2csv.py (or just google html2csv)
    c) use Excel (data source: web; local file: "file:///C:/test.htm")
    Kind Regards,
    Dirk
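Option a) can be done with the HTML parser that ships with the JDK. The sketch below (the class name is hypothetical) collects the text of each td/th cell, emits one CSV line per tr, and quotes cells that contain commas, quotes or line breaks. It assumes simple tables: nested tables and colspan/rowspan are not handled, and rows from every table on the page end up in the same CSV.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

public class HtmlTableToCsv {

    /** Quotes a CSV field if it contains a comma, a double quote or a line break. */
    static String quote(String cell) {
        if (cell.contains(",") || cell.contains("\"") || cell.contains("\n")) {
            return "\"" + cell.replace("\"", "\"\"") + "\"";
        }
        return cell;
    }

    public static String convert(String html) throws IOException {
        StringBuilder csv = new StringBuilder();
        List<String> row = new ArrayList<>();
        StringBuilder cell = new StringBuilder();
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            private boolean inCell;

            @Override
            public void handleStartTag(HTML.Tag t, MutableAttributeSet a, int pos) {
                if (t == HTML.Tag.TR) {
                    row.clear();
                } else if (t == HTML.Tag.TD || t == HTML.Tag.TH) {
                    cell.setLength(0);
                    inCell = true;
                }
            }

            @Override
            public void handleEndTag(HTML.Tag t, int pos) {
                if ((t == HTML.Tag.TD || t == HTML.Tag.TH) && inCell) {
                    row.add(quote(cell.toString().trim()));
                    inCell = false;
                } else if (t == HTML.Tag.TR && !row.isEmpty()) {
                    csv.append(String.join(",", row)).append('\n');
                }
            }

            @Override
            public void handleText(char[] data, int pos) {
                if (inCell) {
                    cell.append(data); // text outside cells is ignored
                }
            }
        };
        new ParserDelegator().parse(new StringReader(html), callback, true);
        return csv.toString();
    }

    public static void main(String[] args) throws IOException {
        System.out.print(convert("<table><tr><td>a</td><td>b,c</td></tr></table>"));
    }
}
```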

  • Is it possible to convert *.doc to *.rtf in a java program?

    Hi :-)
    My challenge is to develop a web app in ADF Faces. I am now evaluating some technologies for storing mail-merge letters in an easy way. The user of my web app should upload an MS Word mail-merge document and a CSV data source file. My web app must then convert these two files into one PDF per CSV row and store them on an FTP server.
    I have built a demo using the OpenOffice API, but now I want to try the same using Apache POI and FOP. I can merge the doc files with POI and I can create PDFs with FOP.
    My problem is that POI can't convert to an RTF file, and FOP uses an RTF file to create a PDF. I don't know if it's possible to convert a doc file to an RTF file. If it is, is there an API that will help me out?
    Regards
    Majo
    By the way, I am not sure if this is the right forum for my question :-/

    Hehe, no, sorry. Renaming doesn't change the binary file: because Windows associates both .doc and .rtf with MS Word, Word opens the file you renamed to *.rtf, but as a .doc document, not as an RTF file ;-)
    And I don't want to open the RTF file in MS Word. I want to process the RTF file in Java.
    Thanks
    Majo
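For processing an RTF file in Java without Word, the JDK's javax.swing.text.rtf.RTFEditorKit can read RTF into a Swing StyledDocument, from which the text (and character attributes) can then be read. It supports only a subset of RTF, so treat the sketch below (the class name is hypothetical) as a starting point for simple files rather than a full RTF reader; it also only helps once the file really is RTF rather than a renamed .doc.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.StringReader;
import java.nio.file.Files;
import java.nio.file.Paths;
import javax.swing.text.BadLocationException;
import javax.swing.text.DefaultStyledDocument;
import javax.swing.text.rtf.RTFEditorKit;

public class RtfText {

    /** Parses RTF held in a String and returns the document's plain text. */
    public static String plainText(String rtf) throws IOException, BadLocationException {
        DefaultStyledDocument doc = new DefaultStyledDocument();
        new RTFEditorKit().read(new StringReader(rtf), doc, 0);
        return doc.getText(0, doc.getLength());
    }

    /** Same, but reads the RTF bytes from a stream such as a file. */
    public static String plainText(InputStream in) throws IOException, BadLocationException {
        DefaultStyledDocument doc = new DefaultStyledDocument();
        new RTFEditorKit().read(in, doc, 0);
        return doc.getText(0, doc.getLength());
    }

    public static void main(String[] args) throws Exception {
        // args[0]: path of the .rtf file to read
        try (InputStream in = Files.newInputStream(Paths.get(args[0]))) {
            System.out.println(plainText(in));
        }
    }
}
```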

  • Problem with converting html to pdf using LiveCycle ES Java API

    I am using this code to convert html to pdf.
    * 1. adobe-generatepdf-client.jar
    * 2. adobe-livecycle-client.jar
    * 3. adobe-usermanager-client.jar
    * 4. adobe-utilities.jar
    * 5. wlclient.jar
    import java.io.File;
    import java.util.Properties;
    import com.adobe.idp.Document;
    import com.adobe.idp.dsc.clientsdk.ServiceClientFactory;
    import com.adobe.idp.dsc.clientsdk.ServiceClientFactoryProperties;
    import com.adobe.livecycle.generatepdf.client.GeneratePdfServiceClient;
    import com.adobe.livecycle.generatepdf.client.HtmlToPdfResult;
    public class ConvertHTML {
       public static void main(String[] args) {
            try {
                 // Set connection properties required to invoke LiveCycle ES
                 Properties connectionProps = new Properties();
                 connectionProps.setProperty(ServiceClientFactoryProperties.DSC_DEFAULT_EJB_ENDPOINT, "t3://localhost:7001");
                 connectionProps.setProperty(ServiceClientFactoryProperties.DSC_TRANSPORT_PROTOCOL, ServiceClientFactoryProperties.DSC_EJB_PROTOCOL);
                 connectionProps.setProperty(ServiceClientFactoryProperties.DSC_SERVER_TYPE, "WebLogic");
                 connectionProps.setProperty(ServiceClientFactoryProperties.DSC_CREDENTIAL_USERNAME, "administrator");
                 connectionProps.setProperty(ServiceClientFactoryProperties.DSC_CREDENTIAL_PASSWORD, "password");
                 // Create a ServiceClientFactory instance
                 ServiceClientFactory factory = ServiceClientFactory.createInstance(connectionProps);
                 // Create a GeneratePdfServiceClient object
                 GeneratePdfServiceClient pdfGenClient = new GeneratePdfServiceClient(factory);
                 // Get an HTML document to convert to a PDF document
                 String inputFileName = "http://www.adobe.com";
                 //String inputFileName = "C:\\Documents and Settings\\venkat\\Desktop\\Adobe.htm";
                 String securitySettings = "No Security";
                 String fileTypeSettings = "Standard";
                 System.out.println("one");
                 // Convert HTML content to a PDF document
                 HtmlToPdfResult result = pdfGenClient.htmlToPDF2(inputFileName, fileTypeSettings, securitySettings, null, null);
                 System.out.println("two");
                 // Get the newly created document
                 Document createdDocument = result.getCreatedDocument();
                 // Save the PDF document as a PDF file
                 createdDocument.copyToFile(new File("C:\\test.pdf"));
            } catch (Exception e) {
                 System.out.println("Error OCCURRED: " + e.getMessage());
                 e.printStackTrace();
            }
       }
    }
    I am able to compile this class, but when I run it I get the following error:
    Error OCCURRED: Internal error.
    ALC-DSC-000-000: com.adobe.idp.dsc.DSCRuntimeException: Internal error.
            at com.adobe.idp.dsc.provider.impl.ejb.EjbMessageDispatcher.doSend(EjbMessageDispatcher.java:160)
            at com.adobe.idp.dsc.provider.impl.base.AbstractMessageDispatcher.send(AbstractMessageDispatcher.java:57)
            at com.adobe.idp.dsc.clientsdk.ServiceClient.invoke(ServiceClient.java:208)
            at com.adobe.livecycle.generatepdf.client.GeneratePdfServiceClient.htmlToPDF2(GeneratePdfServiceClient.java:666)
            at ConvertHTML.main(ConvertHTML.java:84)
    Caused by: java.rmi.RemoteException: Remote EJBObject lookup failed for 'ejb/Invocation'; nested exception is:
            org.omg.CORBA.COMM_FAILURE:   vmcid: SUN  minor code: 203  completed: No
            at com.adobe.idp.dsc.provider.impl.ejb.EjbMessageDispatcher.initialise(EjbMessageDispatcher.java:101)
            at com.adobe.idp.dsc.provider.impl.ejb.EjbMessageDispatcher.doSend(EjbMessageDispatcher.java:130)
            ... 4 more
    Caused by: org.omg.CORBA.COMM_FAILURE:   vmcid: SUN  minor code: 203  completed: No
            at com.sun.corba.se.impl.logging.ORBUtilSystemException.writeErrorSend(Unknown Source)
            at com.sun.corba.se.impl.logging.ORBUtilSystemException.writeErrorSend(Unknown Source)
            at com.sun.corba.se.impl.transport.SocketOrChannelConnectionImpl.writeLock(Unknown Source)
            at com.sun.corba.se.impl.encoding.BufferManagerWriteStream.sendFragment(Unknown Source)
            at com.sun.corba.se.impl.encoding.BufferManagerWriteStream.sendMessage(Unknown Source)
            at com.sun.corba.se.impl.encoding.CDROutputObject.finishSendingMessage(Unknown Source)
            at com.sun.corba.se.impl.protocol.CorbaMessageMediatorImpl.finishSendingRequest(Unknown Source)
            at com.sun.corba.se.impl.protocol.CorbaClientRequestDispatcherImpl.marshalingComplete1(Unknown Source)
            at com.sun.corba.se.impl.protocol.CorbaClientRequestDispatcherImpl.marshalingComplete(Unknown Source)
            at com.sun.corba.se.impl.protocol.CorbaClientDelegateImpl.invoke(Unknown Source)
            at com.sun.corba.se.impl.protocol.CorbaClientDelegateImpl.is_a(Unknown Source)
            at org.omg.CORBA.portable.ObjectImpl._is_a(Unknown Source)
            at weblogic.corba.j2ee.naming.Utils.narrowContext(Utils.java:126)
            at weblogic.corba.j2ee.naming.InitialContextFactoryImpl.getInitialContext(InitialContextFactoryImpl.java:94)
            at weblogic.corba.j2ee.naming.InitialContextFactoryImpl.getInitialContext(InitialContextFactoryImpl.java:31)
            at weblogic.jndi.WLInitialContextFactory.getInitialContext(WLInitialContextFactory.java:41)
            at javax.naming.spi.NamingManager.getInitialContext(Unknown Source)
            at javax.naming.InitialContext.getDefaultInitCtx(Unknown Source)
            at javax.naming.InitialContext.init(Unknown Source)
            at javax.naming.InitialContext.<init>(Unknown Source)
            at com.adobe.idp.dsc.provider.impl.ejb.EjbMessageDispatcher.initJndiContext(EjbMessageDispatcher.java:213)
            at com.adobe.idp.dsc.provider.impl.ejb.EjbMessageDispatcher.getJndiContext(EjbMessageDispatcher.java:226)
            at com.adobe.idp.dsc.provider.impl.ejb.EjbMessageDispatcher.initialise(EjbMessageDispatcher.java:87)
            ... 5 more
    Can you please suggest a way to do the conversion?
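In the trace above, the failure happens while the JNDI InitialContext is being created, before any conversion is attempted, and the root cause is a CORBA COMM_FAILURE, i.e. the client could not communicate with the server. A quick first check is whether anything is listening at the host and port given in DSC_DEFAULT_EJB_ENDPOINT. The sketch below (the class name is hypothetical) only tests a plain TCP connection; if it succeeds, the problem more likely lies in the client setup, for example whether the WebLogic client JAR on the classpath works with the t3 protocol used in the endpoint URL.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

public class PortCheck {

    /** Returns true if a TCP connection to host:port can be opened within timeoutMillis. */
    static boolean reachable(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // connection refused, timed out, or host name not resolvable
            return false;
        }
    }

    public static void main(String[] args) {
        // Same host and port as the "t3://localhost:7001" endpoint in the code above
        System.out.println("localhost:7001 reachable: " + reachable("localhost", 7001, 3000));
    }
}
```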

    Yes, sir. Thanks for your suggestion.
    But I didn't find an exact solution. I did find some, but they were not quite what I required. I just need to convert HTML to PDF using the iText API for Java. I have already used some of its classes, such as HTMLParser.
    Anything else? Can anyone help me with this?

  • Convert  html to word document

    I need to convert HTML to a Word document.
    I tried poi-3.0.2-FINAL (Apache POI - HWPF - Java API to Handle Microsoft Word Files),
    but it is not working...

    My actual goal is to convert an HTML file into a Word document.
    I posted on the forum and some people suggested I look at HWPF.
    I tried the programs one by one but I am not getting any result; for example, this program:
    import java.io.FileInputStream;
    import java.io.IOException;
    import org.apache.poi.hwpf.HWPFDocument;
    import org.apache.poi.hwpf.usermodel.CharacterRun;
    import org.apache.poi.hwpf.usermodel.Paragraph;
    import org.apache.poi.hwpf.usermodel.Range;
    import org.apache.poi.hwpf.usermodel.Section;
    public class DocText {
         // Prints the text of a binary Word (.doc) file, one line per paragraph
         public static void main(String[] args) {
              try (FileInputStream in = new FileInputStream("c:\\temp.doc")) {
                   HWPFDocument doc = new HWPFDocument(in);
                   Range r = doc.getRange();
                   System.out.println("Example you supplied:");
                   System.out.println("---------------------");
                   for (int x = 0; x < r.numSections(); x++) {
                        Section s = r.getSection(x);
                        for (int y = 0; y < s.numParagraphs(); y++) {
                             Paragraph p = s.getParagraph(y);
                             for (int z = 0; z < p.numCharacterRuns(); z++) {
                                  // show the text of each character run
                                  CharacterRun run = p.getCharacterRun(z);
                                  System.out.print(run.text());
                             }
                             // use a new line at the paragraph break
                             System.out.println();
                        }
                   }
              } catch (IOException e) {
                   e.printStackTrace();
              }
         }
    }
    java.io.IOException: Invalid header signature; read 5789751444030890300, expected -2226271756974174256
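The exception itself explains why HWPF fails here. A binary Word .doc is an OLE2 compound file whose first 8 bytes are D0 CF 11 E0 A1 B1 1A E1; POI reads them as a little-endian long, which is the "expected -2226271756974174256". The "read 5789751444030890300" value decodes to the ASCII text <!DOCTYP, so c:\temp.doc is actually an HTML file with a .doc extension, which HWPF cannot parse. The sketch below (the class name and format labels are illustrative) checks a file's leading bytes instead of trusting its extension, and decodes such a header value.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;

public class FileSignature {

    /** OLE2 compound-file signature used by binary .doc/.xls files. */
    private static final byte[] OLE2 = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0, (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1 };

    /** Guesses the real format from the first bytes, ignoring the file extension. */
    static String detect(byte[] head) {
        if (head.length >= OLE2.length && Arrays.equals(Arrays.copyOf(head, OLE2.length), OLE2)) {
            return "OLE2 (binary .doc/.xls)";
        }
        // ISO-8859-1 maps every byte to one char, so the prefix checks below are byte-exact
        String start = new String(head, 0, Math.min(head.length, 64), StandardCharsets.ISO_8859_1);
        if (start.startsWith("{\\rtf")) return "RTF";
        if (start.startsWith("PK\u0003\u0004")) return "ZIP (e.g. .docx)";
        if (start.trim().startsWith("<")) return "HTML/XML";
        return "unknown";
    }

    /** Turns a header value from POI's "Invalid header signature" message back into the 8 bytes it read. */
    static String headerAsText(long value) {
        byte[] b = new byte[8];
        for (int i = 0; i < 8; i++) {
            b[i] = (byte) (value >>> (8 * i)); // little-endian: lowest byte comes first in the file
        }
        return new String(b, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) throws IOException {
        byte[] bytes = Files.readAllBytes(Paths.get(args.length > 0 ? args[0] : "c:\\temp.doc"));
        System.out.println("Detected format: " + detect(bytes));
        System.out.println(headerAsText(5789751444030890300L)); // prints <!DOCTYP
    }
}
```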

  • How do I save a file as an unformatted txt file instead of html or rtf?

    How do I save a file as an unformatted txt file instead of html or rtf?

    Use the menu Image > Image Size; in the Image Size dialog, uncheck Resample, enter 300 in the Resolution field and click OK. Note that no pixels are changed; only the resolution setting changes. Then use the menu File > Save As; in the Save As dialog, use the file type pull-down to select TIFF, then click Save.
    In the TIFF Options dialog, set Image Compression to None, then click OK.

  • How to convert html file to master file in sharepoint branding

    How can I convert an HTML file to a master page programmatically in SharePoint branding?

    Hi,
    According to your post, my understanding is that you want to convert an HTML file to a master page.
    You can use Design Manager to achieve this.
    In step 4, Edit Master Pages, click the option at the top to
    Convert an HTML file to a SharePoint master page.
    Once it completes, make sure the Status is set to Conversion Successful.
    For more information, please refer to:
    SharePoint 2013 – Design Manager – Convert HTML to Master Page
    Best Regards,
    Linda Li
    TechNet Community Support

  • Problem reading html and rtf emails

    When I send emails from my PC to my iPhone 5 in HTML or RTF format, they are unreadable, as all of the coding instructions are included in the text that appears on screen. This was never a problem with my iPhone 4, so I am not sure what has changed. A business contact of mine has had similar problems with my emails in the recent past, so I know it is not just me.
    I have tried sending HTML emails from other PCs in the office to my phone and they are all readable, so perhaps something in the setup of my PC is causing this issue. As my office is changing over to iPhone 5s, does anybody have a solution to what will become a very annoying problem?
    Obviously I could send all of my emails in plain text, but that doesn't really work for what I need to send (logos, graphics, etc.).

    Hi
    The best way to organize data and images to get them next to each other is to use a table (no borders) in your RTF template. Create a two celled table and drop the image into one and the text next to it.
    Regards
    Tim

Maybe you are looking for

  • WBS BOM

    Dear Gurus , In normal change management situations (while having an open production order), when we make a change to main item BOM or internal phantom, we use a change number with release key which we reflect to production order using COCM. In WBS B

  • Import process customization

    Hi Sap Experts, Right now i am working in a project where mostly material is import.Can any body tell me,How import process is differnt from local purchase.what access sequence,condition types,control data.and taxes are define in customization menu.H

  • Advice on "image counter"

    Im facing a big project where i need to prosess a big number of images, but i will get payed for only the images which actually needs prosessing(There are about 50 prosent of the total amount that needs work). Do photoshop have a counting system scri

  • IPhoto deletes pictures when shared

    Not that Apple will do anything about this problem that has been going on since iPhoto got the major update, but in shared mode, we imported loads of photos of our beautiful 20 month old son and newborn daughter. And now they are permanently gone. Va

  • Error mess when I try to export to PDF

    The file's security settings do not allow export.