MS-Word Doc to Filtered HTML Conversion

Hi all,
In my application I want to convert MS Word documents to filtered HTML. Word offers two options here, "Save as Web Page" and "Save as Web Page, Filtered"; what I need is the HTML produced by the filtered option. My application is Java-based, and I want it to do the conversion automatically, so the user does not have to convert the document to filtered HTML first and then open it in my app.
So is there any way, API, or component that can automatically convert a Word doc to filtered HTML in Java? I need urgent help with this.
Thanks

Maybe this can help: http://poi.apache.org/
I haven't looked at it too much.
Edited by: Farmor on Jul 24, 2008 8:51 AM
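If Word itself is installed on the machine running the Java application (so Windows only), another option is to let Word do the conversion, which gives exactly the "Web Page, Filtered" output rather than an approximation. The sketch below assumes that setup: it writes a small VBScript, runs it with `cscript`, and asks Word to save the document with file format 10, the value of Word's `wdFormatFilteredHTML` constant. Apache POI is not involved, the class and method names are hypothetical, and Microsoft does not support unattended (server-side) automation of Office, so treat this as a starting point for a desktop app rather than a finished solution.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

/** Hypothetical helper: asks an installed copy of MS Word to save a document as filtered HTML. */
final class WordFilteredHtml {

    /** Word's WdSaveFormat value for "Web Page, Filtered" (wdFormatFilteredHTML). */
    static final int WD_FORMAT_FILTERED_HTML = 10;

    /** Builds a VBScript that opens the input in Word, saves it as filtered HTML and quits Word. */
    static String buildScript(String inputDoc, String outputHtml) {
        return String.join("\r\n",
                "Set word = CreateObject(\"Word.Application\")",
                "word.Visible = False",
                "Set doc = word.Documents.Open(" + vbString(inputDoc) + ")",
                "doc.SaveAs " + vbString(outputHtml) + ", " + WD_FORMAT_FILTERED_HTML,
                "doc.Close False",
                "word.Quit",
                "");
    }

    /** Quotes a value as a VBScript string literal; embedded quotes are doubled. */
    static String vbString(String value) {
        return "\"" + value.replace("\"", "\"\"") + "\"";
    }

    /** Writes the script to a temp file and runs it with cscript (needs Windows and Word). */
    static void convert(Path inputDoc, Path outputHtml) throws IOException, InterruptedException {
        String script = buildScript(inputDoc.toAbsolutePath().toString(),
                outputHtml.toAbsolutePath().toString());
        Path scriptFile = Files.createTempFile("doc2html", ".vbs");
        try {
            // Written as Latin-1, so this sketch assumes paths without other characters.
            Files.write(scriptFile, script.getBytes(StandardCharsets.ISO_8859_1));
            Process process = new ProcessBuilder("cscript", "//NoLogo", scriptFile.toString())
                    .inheritIO()
                    .start();
            int exitCode = process.waitFor();
            if (exitCode != 0) {
                throw new IOException("cscript exited with code " + exitCode);
            }
        } finally {
            Files.deleteIfExists(scriptFile);
        }
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 2) {
            System.err.println("Usage: java WordFilteredHtml <input.doc> <output.htm>");
            return;
        }
        convert(Paths.get(args[0]), Paths.get(args[1]));
    }
}
```

Generating the script as text keeps the Java side free of any COM bridge library; the trade-off is that errors inside Word only surface as a non-zero `cscript` exit code or console output.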

Similar Messages

  • Problems importing a Word doc into Robohelp HTML

    I have a 116-page Word document with 13 chapters. When I
    import this document into Robohelp HTML as a WinHelp output file,
    only two topics are created, rather than 13. In the Conversion
    Options window, I chose to have new topics created by the Word style
    "Heading 1", which is used for my chapter titles. Why doesn't Robohelp
    create the topics, and what should I change to facilitate that?
    Also, why don't my numbered lists from Word stay numbered in
    Robohelp? Is there a way to automate numbered lists?

    Thanks for your reply.
    When I insert .gif files, JDeveloper ends with the message "Process exited with exit code 0". I then check the database and find the image added to the table.
    With a Word doc, JDeveloper does not give that message, or any message at all about the status of the process, and the document is not added to the database. There are no error messages either. Could this be an issue with Oracle?

  • Download MS word doc from an HTML page link

    Hi, can someone please tell me how to do this?
    I have created the page and inserted <!--Download the large document into the region source, but when I run the page and click the link, nothing happens.
    I am a newbie, so please keep it simple but explain in detail what needs to be done.

    user2446549 wrote:
    Hi can someone please tell me how to do this.
    I have created the page and inserted <!--Download the large document
    You cannot access the files on your PC; they need to be available on the web server or in a database table (BLOB).
    I am a newbie please be simple but elaborate in what needs to be done.
    What are you trying to achieve?

  • Early Watch not Producing Word.doc, or HTML Reports

    Hello all,
    I've managed to maintain the SDCCN, and the jobs associated with producing an Early Watch.
    However, the process isn't producing any attachments, including a word doc, or an HTML report.
    Oddly, in the operations screen from the CEN, I'm not able to view the report session at all, despite the Early Watch report being in the monitored system as a tree structure.
    When I try to view the session through the operations screen in Solution Overview, the system reports that it cannot be opened:
    Cannot open session
        Message no. DSWP033
    Even odder is that the icon (the two bottles) suggests "Data has been transferred".
    Any suggestions would be most helpful.
    Message was edited by:
            Kenneth P

    In case anyone searches for this and needs a solution:
    Believe it or not, the error was caused by the decimal notation setting in my user ID. How very odd.
    If you continue to get odd messages, make sure your user is set to display numbers with a period between each group of thousands and a comma before the decimal fraction,
    i.e. 100.100.100,000
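    To make the notation concrete: a period separates each group of three digits and a comma starts the decimal fraction. Purely as an illustration (plain Java, not an SAP setting or API), the class below formats a number that way; the class name is hypothetical.

```java
import java.text.DecimalFormat;
import java.text.DecimalFormatSymbols;
import java.util.Locale;

/** Illustrates the "period for thousands, comma for decimals" notation described above. */
final class DecimalNotation {

    static String format(double value) {
        DecimalFormatSymbols symbols = new DecimalFormatSymbols(Locale.ROOT);
        symbols.setGroupingSeparator('.'); // period between groups of thousands
        symbols.setDecimalSeparator(',');  // comma before the fractional part
        return new DecimalFormat("#,##0.000", symbols).format(value);
    }

    public static void main(String[] args) {
        System.out.println(format(100100100)); // prints 100.100.100,000
    }
}
```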

  • Remove Header and footer from Word Docs

    Is it possible to remove the header/footer from Word docs?
    The HTML output from ctx_doc.markup gets the header/footer and (especially) the page numbers all messed up.
    Regards, and thanks

    Please define 'messed up'. I'll reproduce here and then try to find a cause/solution. I checked a couple of other forums and Metalink with no good discussion on the topic.
    -Ron

  • Preserving HTML links when converting Word docs

    I created a document in MS Word for Mac 2011 (ver 14.5.2)
    When I used Adobe Pro 11 to convert the document into pdf format, all the HTML links are lost. 
    I have tried converting from MS Word, opening the Word version with Adobe Pro, and printing from MS Word to PDF. Nothing works. I even opened the MS Word document in Apple's Pages software, re-formatted it and then converted it to PDF. Still no live links.
    The best workaround so far is to convert and then use the edit feature in Adobe Pro to re-insert the HTML links as invisible rectangles on top of the still-blue-and-underlined text. To the user it looks like the links are still live, but what a pain for the editor.
    I have seen this issue raised in other posts, but none of the answers seem to work, and the workaround described above is clearly less than ideal.
    Very curious: from reading other posts, the issue apparently does not arise when the Word doc is converted on a Wintel computer. But I can't imagine Adobe writes software one way for Intel and another for Mac.

    Imagine it. This has been a 15-year feud between Microsoft and Adobe. Adobe claims that Mac Office doesn't have the proper hooks for URLs; Microsoft says the fault is with Adobe.
    A Word-created file works when it is opened in the Windows version and saved as a PDF: the links work just fine. Just opening the file without saving it, then converting to PDF, does nothing to the actual Word file.
    They had it fixed last year in Acrobat X if you dropped the file onto Acrobat directly, but broke it again with the upgrade to XI.
    If you have iWork, open the Word file in Pages, export it as a Word .docx file, and then create the PDF; the links will become active. Also, if you open the file in OpenOffice and export it as a .docx file, then open that in Word and create the PDF, the links will become active.
    If you have neither, you will have to open the PDF and add the links. Note that the links will be hot (active), but they will not turn blue and underlined.
    Mac Office 2011 is a conversion of the Office 2010/2007 code, so there should be no problem.
    Also, don't use the Save As . . . PDF method. Instead go to the Print menu, click PDF, wait for the context menu, then choose Adobe Quality PDF or Adobe PDF (which uses Adobe's PDF engine).
    Wait for the next screen, which shows the quality; leave it as set unless you need specific job options. Click OK; the next screen asks for the file name. Rename as necessary, browse to the desired location, then click Save. Or you can drop the saved Word document (with Word quit) onto Acrobat, and after a minute or so the PDF will be created. (Using this method in Acrobat X would actually show URLs or mailto links as active; they broke this in Acrobat XI. The PDF can still be created, but hot links no longer work.)
    Well, it seems it does work on occasion; see: http://www.screencast.com/t/cib2kcYG

  • Unique issue with PDF to WORD .doc conversion with Acrobat Pro - any ideas?

    I have been unable to solve the following issue when converting (save as...) PDF documents to Microsoft Word .doc using numerous methods. This could either be an issue that would be fixed in Acrobat Pro itself, or in MS Word - posting to the Adobe forums first.
    PREFACE: I am attempting to use the converted .doc file with translation applications/software. Google Translator Toolkit is what I use the most, but ALL other translators are having this very same issue with the .doc file. --The source PDFs are product information from drug manufacturers in various countries that I need to have translated to English. I do not have access to their source documents, as they do not provide their own source docs for obvious reasons.
    ALSO: I cannot use Google Translator toolkit to translate from PDFs directly - if you do that, it will attempt to translate a PDF and then export in an .html file, but it does not get the exact spacing of the sentences correctly, which leads to errors in translating - key things such as "can take with alcohol" and "do not take with alcohol". So that's out!
    I am not having any problems with the resultant .doc file in MS Word itself. It looks right, the spacing matches the original PDF source perfectly, prints correctly, etc... Reference here on a product info sheet from Austria in German:
    The problem: This is a screenshot from Google Translator Toolkit - the right side of the image - the spacing in the lettering from the .doc file I am uploading is not being read correctly, resulting in untranslated gibberish. (Note: this isn't a problem with the translation applications or software -- all are having this issue with .doc files converted from .pdf - this issue isn't present with any old .doc file that wasn't converted from a .pdf) -- It's definitely got something to do with some kind of embedded data in the .doc file that I cannot isolate!!)
    My settings in Adobe Pro (convert from PDF to .doc):
    Page layout: Flowing Text (this prevents the resultant .doc from having all of those text boxes, which also don't then work in translators)
    Include comments: True
    Include images: True
    Run OCR if needed: True
    Notes:
    -I have run OCR text recognition on the source PDF files in their specific language.
    -I have edited the accessibility of the PDF and have run the tag recognition and quick checks (to see if they solved the issue, which they did not; tagged or untagged, same problems!)
    -I have exported the .doc BACK to PDF using MS Word's function, which results in a great looking tagged PDF. THEN I re-saved this new PDF back as a .doc - same issue.
    -I have tried saving the PDF in all of the other formats that the translators accept. All have different issues. The only one that works consistently is saving to a .txt (plain)... The best is a .doc to .doc conversion, with all the original spacing. (I am not spending hours reformatting a .txt translation in word)...
    I can't seem to find where this spacing data is in the .doc file!!!! (Changing the fonts, sizes, or margins doesn't fix this either.) I have tried so many methods...
    Any thoughts on other things to try in Adobe Pro (or Word)?
    EDIT: Here's an additional tidbit of info that may be the key to this... There's some kind of coding in the .doc that Adobe Pro converted from the source PDF that doesn't display in Word, but that is being seen by the translation programs. I have no idea what these are, but I want to remove them!
    Message was edited by: KaotikADC

    I would suggest you look at the fonts that are being used. It may be a font issue that is not properly being read by the translation program.

  • Strange bug: Acrobat 9 deletes words upon conversion from Word doc

    Situation: converting from MS Word 2007 (SP2) to Acrobat 9 Standard (v 9.2.0) in WinXP.
    I've tried this repeatedly. The first word in multiple headings disappears when converted to PDF (it remains in the Word doc, but doesn't exist in the PDF that was produced from the doc).
    Example: "Correlation Analysis" becomes " Analysis".  This is happening to the text of every "Heading 2" style heading. All the other headings are unaffected.  Does anyone know why this is happening?
    This is insidious because we assume that the finished doc matches the original one. So now I have to also proof the Adobe conversion for unexpected deleted words?
    In the meantime, I'll return to Acrobat 8 to do these conversions.

    I see a similar problem being reported in this posting:
    http://forums.adobe.com/message/2370280
    No resolution there either.
    Regarding your suspicion that it may be related to Word, what contradicts that is the fact that the full text of the header appears in the resulting PDF bookmark!!  However, the first word of the text is missing from the heading in the PDF (though it remains in the source Word doc).
    Also, I am NOT referring to the page header or footer. I'm referring to the text heading in the body of the document. In my case, only the "Heading 2" styled headings are affected.

  • Table data in html from word doc

    I have a nice-looking invoice Word doc I would like to use to make an APEX report. The Word doc has a table with all the numeric data for the invoice; pretty simple. I saved the Word doc as HTML and used a program to strip all the excess HTML. I put the HTML into an APEX report as an HTML region, and it looks perfect. What is the best way to get SQL data into the HTML table in the region?
    Do I have to create a SQL region and then put the html data into the header and footer, am I missing something simple?
    Thanks
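    The question mentions stripping the excess HTML that Word's "Save as Web Page" generates. As a rough sketch of that step (not the program the poster used, which isn't named), the Java class below removes a few common kinds of Office-specific markup with regular expressions: conditional comments, `<o:p>` elements, `class="Mso..."` attributes, and `mso-` style declarations. The class name is hypothetical, and regex cleanup like this is brittle; an HTML parser is safer for anything beyond simple documents.

```java
import java.util.regex.Pattern;

/** Rough cleanup of Word "Save as Web Page" HTML; a sketch, not a full HTML parser. */
final class WordHtmlCleaner {

    // <!--[if ...]> ... <![endif]--> blocks that only Office reads
    private static final Pattern CONDITIONAL =
            Pattern.compile("<!--\\[if[^\\]]*\\]>.*?<!\\[endif\\]-->", Pattern.DOTALL);
    // Office namespace paragraph tags: <o:p> and </o:p>
    private static final Pattern OFFICE_TAGS =
            Pattern.compile("</?o:p[^>]*>", Pattern.CASE_INSENSITIVE);
    // class="MsoNormal", class=MsoTableGrid, ...
    private static final Pattern MSO_CLASS =
            Pattern.compile("\\s+class=\"?Mso\\w*\"?", Pattern.CASE_INSENSITIVE);
    // mso-... declarations inside style attributes, e.g. "mso-bidi-font-size:12.0pt;"
    private static final Pattern MSO_STYLE =
            Pattern.compile("mso-[\\w-]+:[^;\"']*;?", Pattern.CASE_INSENSITIVE);
    // style attributes left empty by the step above
    private static final Pattern EMPTY_STYLE =
            Pattern.compile("\\s+style=(\"\\s*\"|'\\s*')", Pattern.CASE_INSENSITIVE);

    static String clean(String html) {
        String out = CONDITIONAL.matcher(html).replaceAll("");
        out = OFFICE_TAGS.matcher(out).replaceAll("");
        out = MSO_CLASS.matcher(out).replaceAll("");
        out = MSO_STYLE.matcher(out).replaceAll("");
        out = EMPTY_STYLE.matcher(out).replaceAll("");
        return out;
    }

    public static void main(String[] args) {
        String sample = "<p class=MsoNormal style='mso-bidi-font-size:12.0pt'>Total<o:p></o:p></p>";
        System.out.println(clean(sample)); // prints <p>Total</p>
    }
}
```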

    794181,
    I'd take the HTML and use it as the framework for a named column template. This will allow you to put the data from your query in to the exact spots it needs to go.
    To make a named column template, go to shared components, templates, create, report; for "Template Type:", select "Named Column (row template)".
    Named column templates are fairly easy to set up, but that also seems to mean that it's hard to find documentation for them. Basically, define your query, and then use #<COLUMN_NAME># in your HTML, replacing <COLUMN_NAME> with the actual names of the columns from your query.
    Oh, and generally, you won't want to set up a column heading template in your named column template--I almost always just fill in Row Template 1, Before Rows, and After Rows.
    -David
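    APEX performs this substitution itself. Purely to illustrate the idea (this is not APEX's implementation), the Java sketch below fills `#COLUMN_NAME#` placeholders in a row template from one query row, represented here as a map from column name to value. The class name, template and column names are made up for the example.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Illustration of named-column substitution: each #COLUMN# token is replaced by the row value. */
final class NamedColumnTemplate {

    // Upper-case column names between # signs, e.g. #INVOICE_TOTAL#
    private static final Pattern TOKEN = Pattern.compile("#([A-Z0-9_]+)#");

    /** Fills one row template; tokens with no matching column are left as they are. */
    static String render(String rowTemplate, Map<String, String> row) {
        Matcher m = TOKEN.matcher(rowTemplate);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String value = row.get(m.group(1));
            String replacement = (value != null) ? value : m.group();
            // quoteReplacement stops "$" or "\" in the data from being read as group references
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> row = new LinkedHashMap<>();
        row.put("ITEM", "Widget");
        row.put("AMOUNT", "12.50");
        String template = "<tr><td>#ITEM#</td><td>#AMOUNT#</td></tr>";
        System.out.println(render(template, row)); // prints <tr><td>Widget</td><td>12.50</td></tr>
    }
}
```

    In APEX terms, the invoice HTML saved from Word would be split so that the repeating table row becomes "Row Template 1" and the surrounding markup goes into "Before Rows" and "After Rows".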

  • Word doc file conversion error - virus or OS?

    I have word documents made on the Mac and have only existed on the Mac. When double clicked, some of them will not open anymore. Office prompts to convert the file but no matter what format chosen, the converted file is no good.
    This issue doesn't seem to affect all Word docs. It seems like a Mac OS update or something may have caused the issue. The system has Parallels running XP, and it was thought that might have allowed a virus or caused some corruption. (The virtual PC never had e-mail set up or Office installed.) Moving the files to another Mac gives the same results, so either the files themselves have changed or it's something in the OS.
    The only difference I have noticed between the two kinds of files is that the files that don't open have a file extension and the files that do open have none. I think the ones with no file extension are a legacy of an older OS version that was not as strict about extensions. Adding and then removing a file extension does nothing; the OS still sees those files as Word docs.
    Removing the file extension from a file that has one does not change the convert-file prompt. In fact, it just makes the file lose its application type, so the OS doesn't know what type of file it is.
    All of this was happening with Office 2008 (Word 12.3.6), so today I bought Office 2011 to see if it would resolve the issue, and got the same results.
    I have read many threads on Word conversion issues, but none like this one, or none that don't involve a PC. Has anybody had thoughts on this or seen it? Maybe there is a workaround I'm missing?
    Thanks for your time.

    Thank you for your XP answer. That makes sense. I would think that would be a major flaw in the Parallels product.
    That leaves OS versions as a possibility. All these Word docs were created in Office 2008 after the company got the Macs and made the switch. The only things that really changed over that time were OS upgrades and MS Office updates.
    The issue only came to light last year. Initial and current research have found nothing to act on. Other issues I've read about involve going between platforms, and that's just not the case here. We threw the PCs out after the move.
    This issue affects many files throughout our client file folders, the only differences being those I noted. Attached are two JPEGs showing the file info for a good file (no extension) and a bad one (has a file extension).
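    One check that does not depend on extensions or Finder metadata is to read each file's first bytes. The Java sketch below (a hypothetical diagnostic, not an Office or Mac OS tool) reports whether a file starts with the OLE2 compound-file signature used by binary .doc files or the ZIP signature used by .docx files. Both are normal Word formats; a file that matches neither may be in an older or different format, or damaged, and is worth a closer look.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;

/** Guesses a Word file's real format from its first bytes, ignoring the file extension. */
final class WordSignature {

    // OLE2 compound-file header, used by binary .doc files (Word 97-2004 format)
    private static final byte[] OLE2 = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0, (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };
    // ZIP local-file header, used by .docx (Office Open XML) packages
    private static final byte[] ZIP = {0x50, 0x4B, 0x03, 0x04};

    static String detect(byte[] header) {
        if (startsWith(header, OLE2)) {
            return "binary .doc (OLE2)";
        }
        if (startsWith(header, ZIP)) {
            return ".docx (ZIP package)";
        }
        return "unknown";
    }

    private static boolean startsWith(byte[] data, byte[] prefix) {
        return data.length >= prefix.length
                && Arrays.equals(Arrays.copyOf(data, prefix.length), prefix);
    }

    public static void main(String[] args) throws IOException {
        for (String name : args) {
            byte[] header = new byte[8];
            int filled = 0;
            try (InputStream in = Files.newInputStream(Paths.get(name))) {
                int n;
                // read() may return fewer bytes than requested, so keep reading until 8 or EOF
                while (filled < header.length
                        && (n = in.read(header, filled, header.length - filled)) > 0) {
                    filled += n;
                }
            }
            System.out.println(name + ": " + detect(Arrays.copyOf(header, filled)));
        }
    }
}
```

    Running it over one file that opens and one that doesn't would show whether the bytes on disk actually differ in format.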

  • Conversion of word doc to pdf with ms word how to change to pdf and then ppt?

    I have used MS Word to convert a Word doc to PDF. I purchased the conversion ability from the link in Adobe Reader; what do I do next?
    New to this.

    Hi adog,
    No worries! We'll get this sorted out for you. I'm not sure that I follow, though. You converted a Word file to PDF, and now you want to convert that file back to Word, and then to PowerPoint? That seems like a lot of steps, when you could import your Word document into a PowerPoint file.
    If I'm missing something, and misunderstanding your process, please let me know. It's best to avoid unnecessary conversions from one format to another to another, if possible.
    Best,
    Sara

  • Reading a multilevel list from MS Word Doc and converting it into an HTML nested list using C#

    I can achieve the above for a single level list as follows:
    foreach (Paragraph item in app.Selection.Range.ListParagraphs)
    {
        // Both calls must be inside the loop body, where "item" is in scope.
        item.Range.InsertBefore("<li>");
        item.Range.InsertAfter("</li>");
    }
    Using C#, how can I programmatically convert a multilevel list (like the following) in a Word doc to a nested HTML list? Note: the bullet icons are not important. Thanks, Nam
    List from Word Doc:
    A
    B
        C
        D
        E
        F
        G
    H
    I

    Hi Nam,
    >>how can we programmatically determine the start and end elements of the sub-list with elements C, D, E, F, G in the example of my original post? <<
    We can find the first and last elements of the sub-list by checking ListLevelNumber. For example, a sub-list's ListLevelNumber starts at 2 by default. Here is code that finds the first element, for your reference:
    Sub FindBeginSubElement()
    For i = 1 To Selection.Range.ListParagraphs.Count
    If Selection.Range.ListParagraphs(i).Range.ListFormat.ListLevelNumber = 2 Then
    Debug.Print "begin sub element:" & Selection.Range.ListParagraphs(i).Range.Text
    Exit Sub
    End If
    Next i
    End Sub
    We can also loop over the selection in reverse order to find the last element of the sub-list.
    Hope it is helpful.
    Regards & Fei
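    Comparing each paragraph's list level with the previous one is enough to build the whole nested list, not just to find where the sub-list starts and ends. The sketch below is written in Java rather than C# or VBA; it assumes you have already read each list paragraph's text and its ListLevelNumber (for example with a loop like the one above) into a list of hypothetical `Item` objects. It opens a `<ul>` inside the current `<li>` when the level goes up and closes `</ul></li>` when it goes down; the sample data uses C to G as the level-2 sub-list from the question.

```java
import java.util.ArrayList;
import java.util.List;

/** Builds a nested HTML list from (level, text) pairs such as Word's ListLevelNumber values. */
final class NestedListBuilder {

    /** One list paragraph: its 1-based list level and its text (hypothetical holder type). */
    static final class Item {
        final int level;
        final String text;

        Item(int level, String text) {
            this.level = level;
            this.text = text;
        }
    }

    static String toHtml(List<Item> items) {
        StringBuilder html = new StringBuilder();
        int depth = 0; // number of <ul> elements currently open
        for (Item item : items) {
            int level = Math.max(1, item.level);
            if (level > depth) {
                // Going deeper: the previous <li> stays open and contains the new <ul>.
                while (depth < level) {
                    html.append("<ul>");
                    depth++;
                    if (depth < level) {
                        html.append("<li>"); // placeholder item for a skipped level
                    }
                }
            } else {
                // Same level or shallower: close the previous item and any deeper lists.
                html.append("</li>");
                while (depth > level) {
                    html.append("</ul></li>");
                    depth--;
                }
            }
            // Word paragraph text ends with a paragraph mark ("\r"), so trim it.
            html.append("<li>").append(escape(item.text.trim()));
        }
        if (depth > 0) {
            html.append("</li>");
        }
        while (depth > 0) {
            html.append("</ul>");
            depth--;
            if (depth > 0) {
                html.append("</li>");
            }
        }
        return html.toString();
    }

    private static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    public static void main(String[] args) {
        List<Item> items = new ArrayList<>();
        String[] texts = {"A", "B", "C", "D", "E", "F", "G", "H", "I"};
        int[] levels = {1, 1, 2, 2, 2, 2, 2, 1, 1};
        for (int i = 0; i < texts.length; i++) {
            items.add(new Item(levels[i], texts[i]));
        }
        System.out.println(toHtml(items));
    }
}
```

    Keeping a single depth counter means there is no need to search separately for the start and end of each sub-list; every level change is handled as it is encountered.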

  • Ch@ndra plz help: conversion to word doc

    Some people have responded to this post, but it is not working as desired.
    The code below works, but when I try to add a table, it does not work.
    In case, Ch@ndra, you can post the code for creating tables, it would be great.
    Someone posted a PDF file as well to achieve this, but the steps and code given in it do not work correctly.
    Sorry for bothering you.
    Thanks,
    Hello,
    Any pointers on how to achieve this if we want to convert a flat file into meaningful Word output?
    Approach: I want to add some lines of introduction to the Word document,
    then add some data from the flat file,
    then some explanation of the data, and then another batch of data.
    Earlier I posted this, and Ch@ndra replied with some input.
    From what I understand, we can create a new Word file and add some text to it, but how do we combine the selected flat file data with some text?
    Posting the code that Ch@ndra posted for me:
    INCLUDE ole2incl.
    DATA: word TYPE ole2_object,
    documentos TYPE ole2_object,
    documento TYPE ole2_object,
    selection TYPE ole2_object,
    font TYPE ole2_object.
    CREATE OBJECT word 'WORD.APPLICATION'.
    CALL METHOD OF word 'Documents' = documentos.
    CALL METHOD OF documentos 'Add' = documento.
    CALL METHOD OF documento 'Activate'.
    GET PROPERTY OF word 'Selection' = selection.
    GET PROPERTY OF selection 'Font' = font.
    SET PROPERTY OF word 'Visible' = 1.
    SET PROPERTY OF font 'Name' = 'Arial'.
    SET PROPERTY OF font 'Size' = 10.
    SET PROPERTY OF font 'Bold' = 1. "or 0
    SET PROPERTY OF font 'Underline' = 1. "or 0
    CALL METHOD OF selection 'TypeText' EXPORTING #1 = 'Assesment Tool'.
    CALL METHOD OF selection 'TypeParagraph'.
    CALL METHOD OF documento 'SaveAs' EXPORTING #1 = 'c:\test.doc'.
    CALL METHOD OF word 'Quit'.
    Are any other commands available to convert the data into a table and other forms?
    Thanks

    If you have Adobe X, simply go to "File" -> "Save As", and Microsoft Word is an option. There is also a paid download called Able2Extract that will convert PDFs to Word, Excel, etc., but the formatting is often lost. If you get the premium package, you can even convert scanned documents. I've used Adobe X to do the conversion, and it retains headers, footers and all formatting.
    I'd also like to share an article about converting PDF files to Word (.doc); you just need to follow the easy guide.
    Hope it helps.

  • I don't own a Mac. I can probably use one at the library. I am trying to publish my book with Apple's iBooks, etc. Since my files are in Word .doc or HTML, how can I format my book to publish it?

    www.amessageforthehumanrace.org

    Use an aggregator and follow their instructions for formatting.

  • Java Library for MS Word (.doc) to PDF Conversion

    Hi,
    My customer would like to use BI Publisher's PDF Binding and Merging features to combine BIP PDF outputs with other documents on the Solaris platform. However, those documents are currently all in Word .doc format only, and the customer does not want to consider converting them into other formats like RTF.
    Does BI Publisher provide library for converting .doc format directly into PDF? If not, does anyone know a Java library on the market that can best do the job?
    Geoffrey

    | @graffiti, even if a printer does work, it doesn't solve my on-screen appearance. I want the document to look good both on-screen and printed. At present, the on-screen version doesn't look good.
    | @David, can you please elaborate a bit further? The signature should be a WMF file? I looked at the various file types in the Photoshop Save As drop-down menu, and WMF was not among them. One of my critical requirements is being able to save the signature with a transparent background, not a white background.
    | Can you elaborate more on the "low compression ratio"? I have no clue what that is about or where to change it.
    I don't know what to tell you about WMF, so here is a good Wiki article on it:
    http://en.wikipedia.org/wiki/Windows_Metafile
    As for the compression ratio:
    { the following is based on my installation of Acrobat 9, but most versions are relatively the same }
    In "Printers and faxes":
    Right-click "Adobe PDF"
    Choose "Print preferences"
    Under "default settings" choose "Edit"
    Now choose "Images"
    You will find settings for "compression" and "Image quality".
    The objective is "high quality" and "low" or no compression.
    --
    Dave

Maybe you are looking for

  • Can you track a form if the file name has been changed after distribution?

    I created a form in Adobe Acrobat X Pro -- I then used the distribute feature and sent the form to myself using the internal server. After testing the form, and making sure all was good, I copied the attachment from the email I received when I distri

  • No Connection between WRT54G and Zhone 6212

    A friend of mine has recently upgraded to DSL from dial-up. The company gave her a Zhone model #6212-I3-200 and she has a Linksys model #WRT54g v5 wireless hub. If she connects direct via wire from the modem to her computer, she has an internet conne

  • Urgent help on GUI

    Hello, I am a newbie to GUIs in java. I am writing an application that has 4 buttons. You can run the codes below to see how it looks. I need it to connect to an oracle server when the "Connect to Server" button is clicked If the connection is succes

  • Does Pages '09 embed font in PDF?

    Well, my subject says it all. Some PDF creators on Windows have the "embed fonts" option. Pages doesn't. However, I don't know if it does it by default when exporting to PDF. Thanks, Manu

  • How to install VISA after Labview and NI-DAQ are already installed

    I have Labview 5.1.1 and NI-DAQ 6.6 installed and running properly. I now find that I need to install VISA. Must I uninstall everything (hardware, NI-DAQ, and Labview) and then run the Labview installation again to install VISA? Or can I just go ahea