Converting HTML into a Word document

Hi all,
I have a JSP whose content type is set so that the HTML it produces opens in Word. This works fine until images come into the equation, as the images must live somewhere in order to be referenced from the HTML. As this document must be 'standalone', opening the HTML in Word and simply changing the file extension is no good, as it is still HTML under the hood.
What I would therefore like to do is generate a Word document from this HTML that is self-contained, in that it 'holds' the images within itself and does not rely on external resources. Does anyone know how I can achieve this?
I have looked into Jakarta POI and have written this off as an option because 1) it is still in development and 2) there is no documentation or examples of how to use what is already there. I am assuming someone has come across this problem before and knows of a solution out there that I could use.
Many thanks in advance!

Hi,
Thanks all for your replies! Unfortunately it can't be PDF, as the creator will need to edit the document before it is complete. I have actually looked into generating an RTF document instead, but the example I tried seemed to lose the image data. Unfortunately I know nothing about RTF and so kind of gave up on it :(
Here is the code I used:
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.StringReader;
import javax.swing.text.BadLocationException;
import javax.swing.text.html.HTMLDocument;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.rtf.RTFEditorKit;

public class FormatConverter {
     private HTMLDocument tempHTMLDoc;
     private HTMLEditorKit htmlKit;
     private RTFEditorKit rtfKit;

     public FormatConverter() {
          tempHTMLDoc = new HTMLDocument();
          htmlKit = new HTMLEditorKit();
          rtfKit = new RTFEditorKit();
     }

     // Parses the HTML into a Swing document, then writes that document out as RTF.
     private String fudge(String strText) {
          String strResult = "";
          StringReader reader = new StringReader(strText);
          try {
               // Clear whatever the previous call left in the shared document.
               tempHTMLDoc.remove(0,tempHTMLDoc.getLength());
               htmlKit.read(reader,tempHTMLDoc,0);
               ByteArrayOutputStream byteArrayOutputStream = new ByteArrayOutputStream();
               rtfKit.write(byteArrayOutputStream,tempHTMLDoc,0,tempHTMLDoc.getLength());
               strResult = byteArrayOutputStream.toString();
          }
          // Report failures instead of silently returning an empty string.
          catch(IOException ie){ ie.printStackTrace(); }
          catch(BadLocationException ble){ ble.printStackTrace(); }
          return strResult;
     }

     public static void main(String args[]) {
          FormatConverter conv = new FormatConverter();
          String strRTF = conv.fudge("<P><IMG src=\"http://intratestgbr/announcements/images/1093429553065.jpg\"></P><P> </P><P>50 <STRONG>pounds</STRONG>, <FONT color=#0000ff>wow</FONT></P>");
          System.out.println("RTF: '"+strRTF+"'");
          strRTF = conv.fudge("<html><head><p class=default><span style=\"color: #000000\">Description </span><span style=\"color: #000000\"><b>with</b> </span><span style=\"color: #000000\"><i>some</i> </span><span style=\"color: #000000\"><u>formatting</u> </span><span style=\"color: #000000\"></span></p></head></html>");
          System.out.println("RTF: '"+strRTF+"'");
          System.exit(0);
     }
}
The output I got from this was:
{\rtf1\ansi
{\fonttbl\f0\fnil Monospaced;}
{\colortbl\red0\green0\blue0;\red0\green0\blue255;}
\par
\~50 pounds, \cf1 wow\par
}
Like I said, when I open the RTF output in Word, everything is fine apart from the missing image. If one of you very nice people could point me towards a way to convert it to RTF while still keeping the images, this would certainly be a very acceptable solution and I would be very grateful :)
Many thanks again!
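
One possible direction, as a rough sketch rather than a tested solution: RTF can carry a picture inline as a \pict group whose data is the hex-encoded image bytes, so the image could be spliced into the generated RTF by hand after the Swing conversion. The class and method names below (RtfImageEmbedder, jpegPict, appendPicture) are made up for illustration, the pixel-to-twip conversion assumes 96 dpi, and the picture is simply appended at the end of the document rather than placed where the IMG tag was.

```java
/** Sketch: wrap raw JPEG bytes in an RTF pict group so the image travels inside the file. */
final class RtfImageEmbedder {

    private static final char[] HEX = "0123456789abcdef".toCharArray();

    /** Hex-encodes the bytes, which is the default encoding RTF expects for picture data. */
    static String toHex(byte[] data) {
        StringBuilder sb = new StringBuilder(data.length * 2);
        for (byte b : data) {
            sb.append(HEX[(b >> 4) & 0x0F]).append(HEX[b & 0x0F]);
        }
        return sb.toString();
    }

    /**
     * Builds a pict group for a JPEG. Width and height are given in pixels;
     * the display size is converted to twips assuming 96 dpi (15 twips per pixel).
     */
    static String jpegPict(byte[] jpeg, int widthPx, int heightPx) {
        int widthTwips = widthPx * 15;
        int heightTwips = heightPx * 15;
        return "{\\pict\\jpegblip\\picw" + widthPx + "\\pich" + heightPx
                + "\\picwgoal" + widthTwips + "\\pichgoal" + heightTwips + " "
                + toHex(jpeg) + "}";
    }

    /** Splices the picture group in just before the final closing brace of an RTF document. */
    static String appendPicture(String rtf, String pictGroup) {
        int end = rtf.lastIndexOf('}');
        if (end < 0) {
            throw new IllegalArgumentException("not an RTF document: no closing brace");
        }
        return rtf.substring(0, end) + "\\par " + pictGroup + "\n" + rtf.substring(end);
    }
}
```

With this, the string returned by fudge() could be passed through appendPicture() together with the bytes of the JPEG, fetched separately from the image URL. Putting the picture at the position of the original IMG tag would need more work than this sketch does.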

Similar Messages

  • Is there a way to edit a PDF file without converting it into a word document?

    Is there a way to edit a PDF file without converting it into a word document?

    Then you posted in the wrong forum...
    At any rate, you can use the Edit Text & Images tool (under Tools - Content Editing) to make changes to the file. You'll need to be a bit more specific about what you want to change if you want more detailed instructions.

  • Have converted pdf file into a word document. How do I engage the edit option?

    Have converted pdf file into a word document. How do I engage the edit option?
    jnicholasperkins@

    Not stupid you! To, from PDF--it gets confusing!
    To edit a PDF file, you need to use Acrobat. Depending on how much work needs to be done to the PDF, it may make more sense to edit in Word, and then reconvert to PDF. Acrobat doesn't function as a word processor, so while limited changes to the text should be fine, extensive updates may not yield the results you're looking for.
    That said, if you don't have Acrobat, you are welcome to try it for free for 30 days. For more information, see www.adobe.com/products/acrobat.html.
    Best,
    Sara

  • How do I convert an Adobe file into a word document?

    How do I convert an Adobe file into a word document?

    What type of Adobe document are you trying to convert?  Which Adobe software or service are you utilizing?

  • How do I convert a pdf file into a word document? I have a form to fill out.

    I have a form to fill out and it is currently in PDF format. I would like to change it into a Word document so that I can fill it out. I need to be able to e-mail this document out today, so any help that anyone can give me would be greatly appreciated.

    Hi, lnquisitive.
    ExportPDF and CreatePDF will both convert a PDF to an editable Word document. I would recommend you subscribe, give it a try, and let us know if it works for you. You have 30 days to get a full refund if you're not satisfied.
    Dave

  • I do not have the ability to edit my .pdf once converted into a word document. Did I purchase the wrong version?

    When I convert my PDF into a Word document it does not allow me to edit the actual text. I can add new text to the document, but it does not follow the format of the document (not aligned, not the same font, etc.). In other words, I can add text, but I cannot edit what is already in a sentence or paragraph.
    Did I purchase the wrong version?
    Thanks,
    Cindy.

    Hi cindyjay,
    You have the right tool for converting PDF files to Word. For starters, please try triple-clicking in the text block that you want to edit. If that doesn't work, we'll need to take a closer look at how the PDF that you converted was created (did it start out as a scanned document, for example). If it did start as a scanned document, it's important that OCR (optical character recognition) was performed during conversion to convert scanned text to editable text. OCR is on by default when you convert via the ExportPDF website, but can be turned off if you convert via Reader.
    If triple-clicking doesn't work, let me know, and then we'll take a closer look at your file.
    Best,
    Sara

  • How can I convert a PDF file in my computer into a word document?

    How Can I convert a PDF file in my computer into a Word Document?

    You might try posting to the Adobe ExportPDF forum:
    http://forums.adobe.com/community/exportpdf
    If you would like to email me ([email protected]) the PDF, I'll see if there's anything I can do to help.
    Regards,
    Brian

  • Is there an app for converting a Pages document into a Word document

    Is there an app for converting a Pages document into a Word document?

    In Pages, use the menu Share - Export - to Word.
    Matt

  • I need to convert a PDF file to a Word document so it can be edited, but the text recognition options do not include the language that I need. How can I convert the file in the language I need?

    I need to convert a PDF file to a Word document so it can be edited, but the text recognition options do not include the language that I need. How can I convert the file in the language I need?

    The application Acrobat provides no language translation capability.
    If you localize the OS, MS Office applications, Acrobat, etc. to the desired language, try again.
    Alternative: transfer a copy of content into a web based translation service (Bing or Google provides a free service).
    Transfer the output into a word processing program that is localized to the appropriate language.
    Do cleanup.
    Be well...

  • Convert 1 single microsoft word document with section breaks to multiple pdf files

    I am a Windows 7 user. I have a single Microsoft Word document which contains 1500 pages. These 1500 pages are separated by section breaks in Microsoft Word. I am trying to convert this Word document to multiple PDF files, split at the section breaks. How can I convert the single Microsoft Word document with section breaks to multiple separate PDF files?

    Acrobat (Adobe PDF Printer and PDFMaker ) doesn't recognize the Section breaks.  It never has as far as I am aware.  The easiest thing to do is to manually break up a copy of the MS Word Document into the Sections you need and then create the PDFs from those MS Word documents.

  • Problems converting PDF to MS Word document.  I successfully converted 4 files and now subsequent files generate a "conversion failure" error when attempting to convert the file.  I have a large manuscript and I separated each chapter to assist with the co

    Problems converting PDF to MS Word document.  I successfully converted 4 files and now subsequent files generate a "conversion failure" error when attempting to convert the file.  I have a large manuscript and I separated each chapter to assist with the conversion; like I said, first 4 parts no problem, then conversion failure.  I attempted to convert the entire document and got the same result.  I specifically purchased the export to Word feature.  Please assist.  I initially had to export the Word Perfect document into PDF and am now attempting to go from PDF to MS Word.

    Hi sdr2014,
    I'm sorry to hear your conversion process has stalled. It sounds as though the problem isn't specific to one file, as you've been unable to convert anything since the first four chapters converted successfully.
    So, let's try this:
    If you're converting via the ExportPDF website, please log out, clear the browser cache, and then log back in. If you're using Reader, please choose Help > Check for Updates to make sure that you have the most current version installed.
    Please let us know how it goes.
    Best,
    Sara

  • Need help with converting PDF files to word document.

    When attempting to convert PDF files to word document, after clicking on "convert" a message "error occurred while signing in" displays. Don't know why??? I'm logged into my account and haven't been able to convert documents.

    Assuming that you try to use the ExportPDF service, I am moving your topic to the ExportPDF forum.
    Just for completeness, can you post your operating system, browser, and Adobe Reader version?

  • How to merge multiple XML or Text documents into 1 Word Document?

    Hi all,
    We're looking for a way to merge multiple XML or Text documents into 1 Word document.
    All the XML or Text documents are oriented as a 'Paragraph', meaning smaller pieces of text.
    By selecting some of these XML documents, the system should be able to create a new Word document with all the selected text paragraphs included.
    The Word document can then be edited for applying a correct lay-out and the document is ready.
    Actually, we are trying to do some kind of 'mail merge' but with multiple XML or Text documents!
    Does anybody know whether something like this already exists, or can anybody give us a direction on how to proceed?
    Thanks in advance,
    Pascal Decock

    You use Assembler for this purpose.
    1) Assembler can be accessed through LC Java API. See http://help.adobe.com/en_US/enterpriseplatform/10.0/programLC/help/index.html
    API Quick Starts (Code Examples) > Assembler Service API Quick Starts
    2) Last week I posted on generating and merging PDFs from PostScript. Take a look at the assembly service instance in the .lca. Assembler uses DDX (Document Description XML) to describe document construction. NOTE: the .lca was developed with ES 3 (aka ADEP). The .lca contains the most basic DDX.
    <?xml version="1.0" encoding="UTF-8"?>
    <DDX xmlns="http://ns.adobe.com/DDX/1.0/">
    <PDF result="out.pdf">
      <PDF source="inDoc1"/>
      <PDF source="inDoc2"/>
    </PDF>
    </DDX>
    http://forums.adobe.com/message/4019760#4019760
    DDX Reference at http://help.adobe.com/en_US/livecycle/9.0/ddxRef.pdf
    Steve
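
If the goal is an editable Word document rather than a PDF, a different route from the Assembler approach in the reply is to write the selected text fragments into a single RTF file, which Word can open and edit. The following is only a minimal sketch under that assumption: it handles plain-text fragments (not XML), the class and method names (RtfMerger, escape, merge) are invented for illustration, and each fragment becomes one paragraph with no further layout.

```java
import java.util.Arrays;
import java.util.List;

/** Sketch: concatenate plain-text fragments into one RTF document, one paragraph each. */
final class RtfMerger {

    /** Escapes RTF special characters and encodes non-ASCII characters numerically. */
    static String escape(String text) {
        StringBuilder sb = new StringBuilder();
        for (char c : text.toCharArray()) {
            if (c == '\\' || c == '{' || c == '}') {
                sb.append('\\').append(c);          // backslash and braces must be escaped
            } else if (c == '\n') {
                sb.append("\\line ");               // keep line breaks inside a fragment
            } else if (c > 127) {
                // RTF Unicode escape: signed 16-bit value, then a fallback character
                sb.append("\\u").append((int) (short) c).append('?');
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    /** Builds a complete RTF document containing every fragment as its own paragraph. */
    static String merge(List<String> fragments) {
        StringBuilder sb = new StringBuilder("{\\rtf1\\ansi\\deff0{\\fonttbl{\\f0 Arial;}}\n");
        for (String fragment : fragments) {
            sb.append(escape(fragment)).append("\\par\n");
        }
        return sb.append("}").toString();
    }

    public static void main(String[] args) {
        System.out.println(merge(Arrays.asList("First paragraph.", "Second paragraph.")));
    }
}
```

Saving the returned string to a file with an .rtf extension should give a document that can then be laid out by hand in Word, as described in the question. XML fragments would first need their text extracted, which this sketch does not attempt.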

  • Can you edit a pdf document without exporting it into a word document?

    Can you edit a pdf document without exporting it into a word document?

    Hi sylvias99766822,
    You can if you have Acrobat. If you don't, feel free to give it a try. You can download a 30-day trial from http://www.adobe.com/products/acrobat.html.
    Keep in mind, though, that Acrobat isn't intended to be a word processor. So, while you can make adjustments to text and graphics, if you need to do a major overhaul of the text in the document, it's best to go back to the source document and edit the content there.
    Best,
    Sara

  • Convert html into tidy html to convert pdf using iText

    Hello.
    I am trying to convert an HTML document into PDF.
    First I tried iText, and it works properly, but it needs all the tags to be written correctly.
    When you give it HTML that is not well formed, it throws an exception.
    So is there any way to convert such HTML to PDF?
    If not, is there a way to convert the HTML into properly tagged HTML,
    so that it is easy to convert it to PDF?
    If you have any working example of Tidy.jar, please send it to me.
    Thanks.

    Hi,
    I had a similar task to do, i.e. converting HTML to PDF.
    Please follow the link to this site and download the trial code.
    http://www.pd4ml.com
    I was able to convert my HTML to PDF.
    Have a look at it and let me know.
    Regards,
    Joe
