PDF to HTML conversion with embedded images in Java

Hi all,
I want to convert any PDF file to its HTML equivalent. Currently I am using the PDFBox Java API to do that. It works fine with simple PDF files that have no images, but if a PDF file contains embedded images, those images are not shown.
Does anyone have a clue how to solve this problem? I could also convert individual PDF pages to JPG pictures, as long as all the embedded images appear in those pictures.
Please help me with pointers to other APIs, code snippets, etc. that can serve my purpose.
Thanks in advance
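One possible route for the page-image approach mentioned in the question: render each page to an image (PDFBox 2.x provides PDFRenderer.renderImageWithDPI for this, and the rendered page includes its embedded images), then emit an HTML page that shows those images in order. The sketch below covers only the HTML step and uses the JDK alone; the class name and the file names page-1.jpg, page-2.jpg are hypothetical. The trade-off is that the text becomes part of the picture, so it is no longer selectable or searchable.

```java
import java.util.List;

public class PageImagesToHtml {

    // Escapes the characters that are significant in HTML text and attribute values.
    static String escapeHtml(String s) {
        return s.replace("&", "&amp;")
                .replace("\"", "&quot;")
                .replace("<", "&lt;")
                .replace(">", "&gt;");
    }

    // Builds one HTML page that shows each rendered PDF page image in order.
    // pageImageFiles are relative paths such as "page-1.jpg" (hypothetical names).
    static String buildHtml(String title, List<String> pageImageFiles) {
        StringBuilder html = new StringBuilder();
        html.append("<!DOCTYPE html>\n<html><head><meta charset=\"utf-8\"><title>")
            .append(escapeHtml(title))
            .append("</title></head><body>\n");
        int pageNo = 1;
        for (String file : pageImageFiles) {
            html.append("<div class=\"page\"><img src=\"")
                .append(escapeHtml(file))
                .append("\" alt=\"Page ")
                .append(pageNo++)
                .append("\"></div>\n");
        }
        html.append("</body></html>\n");
        return html.toString();
    }

    public static void main(String[] args) {
        System.out.print(buildHtml("report.pdf", List.of("page-1.jpg", "page-2.jpg")));
    }
}
```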

Hi,
Really sorry, I don't have a solution for you.
But I have a PDFBox problem of my own (since you seem to know PDFBox): I am reading a Japanese file using PDFBox, and it gives
caught a class java.io.IOException
with message: Unknown encoding for 'UniJIS-UCS2-H'
I have written code like this:
Reader reader;                       // java.io.Reader for the extracted text; file is a java.io.File
PDFParser parser = new PDFParser(new FileInputStream(file));
parser.parse();
PDDocument pdfDocument = parser.getPDDocument();
try {
    PDFTextStripper stripper = new PDFTextStripper();
    reader = new StringReader(stripper.getText(pdfDocument));
} finally {
    pdfDocument.close();             // release the parsed document's resources
}

Similar Messages

  • Sample Java code to send an HTML mail with embedded image

    Hello,
    Can I please get sample Java code for sending an HTML mail with an embedded image?
    The HTML message and the relevant input parameters will be supplied from PL/SQL, which will call the class; the class will embed the image and send the mail to the recipient.

    tev wrote:
    "Please can I get a sample Java code"
    No. This is a forum, not a code mill.
    Recommended reading: How to ask questions the smart way
    db

  • UTL_SMTP send HTML email with embedded image?

    Hi, I can use UTL_SMTP to send an HTML email ok, but does anyone have an example of how to include an inline embedded image in the email? Thanks!

    If you want to send the html page and have it
    reference the images and css files on your web
    site, that's pretty easy. Just create a message
    with text/html content that is your html page.
    If you want to include all the images and css files
    in your message along with the html page, you'll
    need to create a multipart/related message and
    you'll need to change all the html to reference the
    images and css files using "cid:" references.
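    To make the multipart/related idea concrete, here is a minimal sketch (plain JDK, hypothetical class name) that builds the raw MIME text: an HTML part that points at cid:logo, followed by an image part whose Content-ID header is <logo>. The angle brackets appear in the header but not in the cid: URL. The resulting text could be written as the message data with UTL_SMTP.WRITE_DATA or any other SMTP client; in JavaMail the equivalent container is new MimeMultipart("related"). The 8bit encoding of the HTML part assumes short lines; long HTML lines would need quoted-printable.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class RelatedMimeBuilder {

    // Builds headers and body of a multipart/related message: an HTML part that
    // refers to the image as "cid:<contentId>", then the image part carrying the
    // matching Content-ID header (angle brackets in the header, none in the URL).
    static String build(String boundary, String html, byte[] imageBytes,
                        String imageType, String contentId) {
        String crlf = "\r\n";
        StringBuilder sb = new StringBuilder();
        sb.append("MIME-Version: 1.0").append(crlf)
          .append("Content-Type: multipart/related; boundary=\"").append(boundary)
          .append("\"; type=\"text/html\"").append(crlf).append(crlf);

        // Part 1: the HTML that references the image.
        sb.append("--").append(boundary).append(crlf)
          .append("Content-Type: text/html; charset=UTF-8").append(crlf)
          .append("Content-Transfer-Encoding: 8bit").append(crlf).append(crlf)
          .append(html).append(crlf);

        // Part 2: the image, Base64 with 76-character CRLF-separated lines (MIME encoder).
        String b64 = Base64.getMimeEncoder().encodeToString(imageBytes);
        sb.append("--").append(boundary).append(crlf)
          .append("Content-Type: ").append(imageType).append(crlf)
          .append("Content-Transfer-Encoding: base64").append(crlf)
          .append("Content-ID: <").append(contentId).append(">").append(crlf)
          .append("Content-Disposition: inline").append(crlf).append(crlf)
          .append(b64).append(crlf);

        // Closing boundary ends the multipart body.
        sb.append("--").append(boundary).append("--").append(crlf);
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] fakeImage = "not really a JPEG".getBytes(StandardCharsets.US_ASCII);
        String html = "<html><body><h1>Test</h1><img src=\"cid:logo\"></body></html>";
        System.out.print(build("b1_example", html, fakeImage, "image/jpeg", "logo"));
    }
}
```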

  • Send HTML Email with Embedded Images and CSS

    Hi all,
    I have an HTML page. I want to send that HTML page (not as an attachment) with all its images and CSS. I have searched and tried, but I can't find a good solution. Can anyone help, please?
    Thank you.
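    The usual approach (a multipart/related message with cid: references) also requires rewriting the page's image references. A rough sketch of that step, assuming double-quoted src attributes and a simple regular expression rather than a real HTML parser: local paths become cid:imgN, remote, data: and existing cid: URLs are left alone, and the returned map tells you which file to attach under which Content-ID. The class name and the imgN naming scheme are hypothetical; stylesheets linked with href would need similar treatment, which this sketch does not attempt.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CidRewriter {

    // Matches src="..." attributes; double-quoted values only (a simplification).
    private static final Pattern SRC = Pattern.compile("src=\"([^\"]+)\"");

    // Replaces each local src="path" with src="cid:imgN" and records path -> Content-ID
    // in cidByPath, so the same file referenced twice gets a single Content-ID.
    static String rewrite(String html, Map<String, String> cidByPath) {
        Matcher m = SRC.matcher(html);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            String path = m.group(1);
            String replacement;
            if (path.startsWith("cid:") || path.startsWith("data:")
                    || path.startsWith("http://") || path.startsWith("https://")) {
                replacement = m.group(); // leave remote, inline and existing cid: references alone
            } else {
                String cid = cidByPath.get(path);
                if (cid == null) {
                    cid = "img" + (cidByPath.size() + 1);
                    cidByPath.put(path, cid);
                }
                replacement = "src=\"cid:" + cid + "\"";
            }
            // quoteReplacement stops '$' or '\' in the text from being read as group references
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> cids = new LinkedHashMap<>();
        String html = "<img src=\"images/logo.png\"><img src=\"https://example.com/a.png\">";
        System.out.println(rewrite(html, cids)); // <img src="cid:img1"><img src="https://example.com/a.png">
        System.out.println(cids);                // {images/logo.png=img1}
    }
}
```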

  • When I forward an HTML email with embedded graphics to someone, it forwards it as plain text. This is driving me batty. How do I forward such mails INTACT?

    I have the latest Thunderbird installed on a new 64-bit Winblows Eight netbook. Fantastic program, but one problem is driving me absolutely batty, and after using the latest Thunderbird for weeks, I simply can't figure out how to fix it.
    I'm on a lot of manufacturer and other mailing lists, like eBay watch-list alerts and so on. These are not spam (although I get plenty of that; who doesn't?), but lists I WANT to be on.
    Many emails from those mailing lists are in HTML format with embedded graphics. I'm not talking about graphic file attachments, but embedded graphics that come from the senders' servers and appear AS a graphic in the email. Sometimes such emails are one huge graphic with hardly any text. All well and good.
    However, here's the problem: when I want to forward such an email to a friend, Thunderbird ALWAYS formats it as plain ASCII text. I know this because I look in the "sent mail" folder and can see that it has turned an HTML email with embedded graphics into plain ASCII text.
    I absolutely can't figure out how to get it to forward an HTML email with embedded graphics INTACT, so the recipient sees it the way it looks when I receive it from a mailing list, an advertiser, eBay, or whoever.
    Is Thunderbird capable of forwarding an HTML email with embedded graphics INTACT? If so, how and where do I turn on that capability?
    If this capability isn't built into the program, is there an add-on I can install that will give it that ability?
    I am not new to computers, but this really has me stumped. I want to put Thunderbird on my 32-bit Vista laptop and stop using its horrible "Windoze Mail" program, which I've been using for years, which is slower than snot and has all kinds of other problems.
    So, assuming whoever reads this FULLY understands my question, PLEASE tell me how to get Thunderbird to forward an HTML email with embedded graphics AS-IS, so the recipients see it exactly the way I see it. If that ability is built in, please tell me how to turn it on. If it is not built in, please tell me which add-on I need to install to give Thunderbird this capability. And if Thunderbird absolutely can't forward an HTML email with embedded graphics at all, please tell me that too.
    A virtual box of candy and a dozen long-stemmed roses to anyone who can give me a solution that works.
    Thanks.

    Dear Mr. Toad (my all-time favorite ride at Disneyland) ;-)
    Thanks so much for your detailed reply. My netbook is in the bedroom, turned off; so far I only use it in the evening, in the bedroom. I've saved your response, will try your suggestions, and will let you know if they solve the problem I described. I really appreciate you taking the time to post such a detailed reply.
    I can't answer your Thunderbird "configuration" questions, because I'm in the living room using the crap Vista laptop, on which I plan to install Thunderbird and then take Windoze Mail out in the street and drive over it a few times. I'll get back to you one way or the other and let you know whether your instructions solved the problem.
    I don't understand why Thunderbird "out of the box", so to speak, doesn't simply forward HTML emails with embedded graphics (like Outlook Excess and Winblows Mail do) without having to go through those steps. I personally HATE HTML email, but over the years it's become more and more prevalent, so it's a problem I must fix.
    Thanks again,
    Harv

  • Send HTML mail with an image

    Hi there,
    I'm trying to send an HTML mail with an image. I can see the image, but in a separate section at the very bottom of my message. Is it possible to display the image in the main part of my message?
    Here is my code:
    // msg is a MimeMessage created earlier (not shown)
    MimeMultipart multi = new MimeMultipart(); // the no-arg constructor creates multipart/mixed
    BodyPart messageBodyPart = new MimeBodyPart();
    String htmlText = "<H1>Test</H1><img src=\"cid:image\">";
    messageBodyPart.setContent(htmlText, "text/html");
    multi.addBodyPart(messageBodyPart);
    // Image part; its Content-ID must match the "cid:image" reference in the HTML
    MimeBodyPart imagePart = new MimeBodyPart();
    DataSource fds = new FileDataSource("C:\\Resources\\Templates\\logo.jpg");
    imagePart.setFileName("geologo.jpg");
    imagePart.setDataHandler(new DataHandler(fds));
    imagePart.setHeader("Content-ID", "<image>");
    multi.addBodyPart(imagePart);
    msg.setContent(multi);
    Transport.send(msg);

    How many different e-mail clients have you tested that with?

  • Mail problem with embedded images and links

    Since Yosemite, Apple Mail seems to have a problem with images that are used as hyperlinks to a website.
    Links on externally loaded images are clickable and work fine, but links on embedded images just do nothing. Clicking on them selects them; a double-click opens the image in Preview.
    This works:
    <a href="https://www.google.de"><img src="https://www.google.de/images/srpr/logo11w.png" alt="Google Logo"></a>
    This does not work:
    <a href="https://www.google.de"><img src="cid:logo_google" alt="Google Logo"></a>
    Can you reproduce this problem?
    We tried it with OS X 10.10.1 and 10.10.2

    After a bit of digging around, I think I have found the reason the EO is called before the CO. In the parent page's CO there is a transaction commit (oapagecontext.getApplicationModule(oawebbean).getTransaction().commit();).
    Therefore, I assume that because my custom AM is a child of the standard AM where the transaction is being committed, the child AM shares the same transaction/session, and hence the EO is called.
    I am running into issues moving the validation to the EO, because the validation needs visibility of the VO values to calculate a total. Is there a way to ensure that my custom AM maintains a separate database session/transaction from the parent/standard AM? Is it possible to break the parent/child relationship?
    That way, I assume, the commit issued by the parent page's CO will not affect my custom AM/EO.
    Cheers.
    Jon.

  • Why can't I forward emails I receive with embedded images?

    I have an iPad 2 and it seemed like I was able to forward emails with embedded images in the past, but now it never works. In fact, I've tried emailing directly from Photos and I have the same problem. No one gets any of my attachments or images. I've updated the operating system to the most recent version and it's made no difference.
    And to make it even more frustrating, when I check my sent messages, the images are there!

    Are you sure that you've set up an account that will automatically delete the messages? Is it a POP or IMAP account? I had a .mac account years ago running Entourage, but never had a problem. For some reason it seems that the server is not deleting your messages upon quit (or however Mail handles it). Maybe we'll just bump this up and see what kind of response you get from those with .me accounts running Mail.
    What version of Office do you have? I'm pretty sure that 2004 was a PPC suite and won't run on your new MBP. If you had 2008 on your PB, then you should just have to re-enter your activation key.
    Good luck,
    Clinton

  • How can I programmatically identify PDF files with embedded images?

    Our company has 27,266,949 PDF files that we're planning to compress in order to save server space.
    We don't want to compress any of the PDF files that have embedded images, so as not to alter the images.
    How can we programmatically create a list of files to exclude from the compression process?

    Ah, see, I told you we were new to this. And no, my taxes already have enough digits in the balance.
    OK, so based on that, we should be able to use the preflight tool to identify the PDFs with images, factor them out, and then apply lossless compression to the remaining files.
    That will give us the compression we need to save space, but also allow us to stand in a court of law (if the scenario ever occurred) and proclaim that none of our medical images have ever been altered by compression.
    Sound like a reasonable plan?
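    If preflighting 27 million files is too slow, a cheap first pass is possible with the JDK alone. The sketch below relies on the fact that image XObjects are stream objects, and the PDF specification does not allow stream objects inside compressed object streams, so the /Subtype /Image entry of each image dictionary stays readable in the raw file (encryption covers strings and streams, not dictionary names). It is a heuristic, not a guarantee: it misses inline images (BI ... ID ... EI) inside compressed content streams, may flag files whose only image object is unreferenced, and reads each file fully into memory. Treat it as a pre-filter and verify a sample with the preflight tool; the class and method names are hypothetical.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.Locale;
import java.util.regex.Pattern;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class PdfImageScanner {

    // Image XObjects are streams, and stream objects may not live inside compressed
    // object streams, so their "/Subtype /Image" entry is normally plain bytes.
    private static final Pattern IMAGE_XOBJECT = Pattern.compile("/Subtype\\s*/Image\\b");

    // Heuristic check on the raw bytes of one PDF file.
    static boolean containsImageXObject(byte[] pdfBytes) {
        // ISO-8859-1 maps every byte to exactly one char, so binary data decodes safely.
        String raw = new String(pdfBytes, StandardCharsets.ISO_8859_1);
        return IMAGE_XOBJECT.matcher(raw).find();
    }

    // Returns every *.pdf under root that appears to contain images (the exclusion list).
    static List<Path> findPdfsWithImages(Path root) throws IOException {
        try (Stream<Path> files = Files.walk(root)) {
            return files
                    .filter(Files::isRegularFile)
                    .filter(p -> p.getFileName().toString().toLowerCase(Locale.ROOT).endsWith(".pdf"))
                    .filter(PdfImageScanner::looksLikeItHasImages)
                    .collect(Collectors.toList());
        }
    }

    private static boolean looksLikeItHasImages(Path pdf) {
        try {
            return containsImageXObject(Files.readAllBytes(pdf));
        } catch (IOException e) {
            return true; // unreadable: err on the side of excluding it from compression
        }
    }

    public static void main(String[] args) throws IOException {
        Path root = Path.of(args.length > 0 ? args[0] : ".");
        for (Path p : findPdfsWithImages(root)) {
            System.out.println(p);
        }
    }
}
```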

  • Font Problem with Embedded Images when making PDFs

    Hello,
    I am revising a document I didn't create which has embedded images. Although all of the fonts are turned on, when I create a PDF, the fonts in the EPS come out as Courier. When I print to my laser printer, the fonts in the EPS come out correctly. This job will eventually be produced as a PDF, so I need to figure this out. Any ideas?
    Thanks.

    We can't spend the time to retrieve the originals and distill them.
    Yes, I understand that. I like to know why things don't work, even if it takes me a few minutes to test things. You don't have to retrieve any originals. Just unembed one from the ID file. I know it doesn't make sense to Distill all your EPS files when printing to PDF is going to take much less time. But if you can ascertain why it's not working, you'll have a better chance to fix it next time.
    If the fonts weren't included in the original EPSs, why do they print to PDF in Franklin Gothic? This is true even in the one page document I sent you that doesn't have any styles.
    They don't...unless you have that exact Franklin Gothic loaded on your system. They print in Courier here.
    Your EPS file references a specific font name. When you print, your print driver matches that reference with the Franklin Gothic font you have loaded. For reasons that I don't fully understand, when you export to PDF, InDesign is not able to match the reference to the font the same way your print driver can.
    Obviously, printing to PDF is a good workaround. And make sure you send your printer a PDF, not a live InDesign file.
    Ken

  • Creating an HTML page with an embedded image in JAVA

    Is it possible to write Java code that creates an HTML page,
    reads an image file and embeds it in the HTML page?
    Can anyone give me sample code? =)
    Thank you very much!

    Just tried this out, and it doesn't do what I was hoping: embed the image into the HTML document as a byte stream, so that the HTML document displays the image without a separate file.
    When I did:
    import java.io.*;

    public class EmbeddedImageTest {
        // Writes an HTML page whose <img> only *references* C:/image.jpg;
        // the image bytes themselves are not copied into the HTML file.
        public static void writeHTML(String htmlfile) {
            File htmf = new File(htmlfile);
            // try-with-resources closes (and flushes) the stream even on error
            try (DataOutputStream dout = new DataOutputStream(new FileOutputStream(htmf))) {
                // writeBytes() writes the low byte of each char, which is fine for ASCII
                dout.writeBytes("<html><head></head><body><img src=" +
                    "file:/C:/image.jpg></body></html>");
            } catch (IOException ie) {
                ie.printStackTrace();
            }
        }

        public static void main(String[] args) {
            writeHTML("c:/embeddedImage.html");
        }
    }
    I just got an HTML file containing "<html><head></head><body><img src=file:/C:/image.jpg></body></html>".
    I've seen images embedded in MS Outlook HTML e-mails. Does anyone know if this is done in a standard way (i.e. not some MS-proprietary way), and whether it can be done with an HTML file made by Java?
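    For the e-mail case, Outlook-style embedding is typically the standard multipart/related mechanism (RFC 2387) with cid: references (RFC 2392), which only works inside a MIME message. For a standalone HTML file, the standard way to carry the bytes inside the document is a data: URI (RFC 2397) holding the Base64-encoded image. A minimal JDK-only sketch with a hypothetical class name, reusing the paths from the question; large images make the HTML file correspondingly large, and e-mail clients differ in whether they display data: URIs.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Base64;

public class DataUriHtml {

    // Returns an <img> tag whose src is a data: URI, so the image bytes are
    // stored inside the HTML text itself rather than in a separate file.
    static String imgTag(byte[] imageBytes, String mimeType) {
        // The basic encoder emits no line breaks, which is what a URI needs.
        String b64 = Base64.getEncoder().encodeToString(imageBytes);
        return "<img src=\"data:" + mimeType + ";base64," + b64 + "\">";
    }

    // Reads an image file and writes a minimal HTML page that embeds it.
    static void writeHtml(Path imageFile, String mimeType, Path htmlFile) throws IOException {
        byte[] bytes = Files.readAllBytes(imageFile);
        String html = "<!DOCTYPE html>\n<html><head><meta charset=\"utf-8\"></head><body>"
                + imgTag(bytes, mimeType)
                + "</body></html>\n";
        Files.write(htmlFile, html.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws IOException {
        // Paths from the question; adjust for your system.
        writeHtml(Path.of("C:/image.jpg"), "image/jpeg", Path.of("c:/embeddedImage.html"));
    }
}
```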

  • HTML formatted Email notification with embedded images

    I am using Oracle BPM 6.5 (Studio) and have been spinning my wheels trying to send a formatted email from a process. I am able to send emails in plain text, but I get a ParseException whenever I try to set the contentType property of the Mail object. Not sure what might be going on. Here is the sample code:
    reminderEmail as Mail
    reminderEmail = Mail()
    reminderEmail.from="[email protected]"
    reminderEmail.recipient="[email protected]"
    reminderEmail.subject="Review xx Information"
    reminderEmail.contentType = "Content-Type: text/html; charset=utf-8" (IT WORKS WITHOUT THIS LINE)
    reminderEmail.message = "<HTML><h2>Please check and resend the message</h2></HTML>"
    send reminderEmail
    It would be great if I could get a sample that sends HTML-formatted emails. I know it should work, because the automatic notification mail sent by the engine is HTML; it's the one that says "The instance can be accessed "here"".
    Thanks so much

  • DW to create Word sigs with embedded images?

    This may be a bit off topic, but I've tried in many other
    places....
    Microsoft Word can be such crap. I simply wanted to create an
    email sig with an embedded graphic, but Word screws up in wonderful
    ways.
    I need a very simple html file with some formatted text and
    an embedded graphic, so I thought I'd use DW (I use 2004). What I
    don't understand is the stuff about CIDs. When you use Word as an HTML
    editor for Outlook, if things work OK then you can embed a graphic,
    and the html will point to an image source as a CID which I guess
    means the data is embedded in the mime stream. Can I accomplish
    this in DW? Whenever I try my image source points to a local file
    which is obviously no good. I don't want it pointing to an image on
    my server either. Any clues??
    Thanks
    James

    Posted by mistake to this forum - I'll repost into General.
    Soz James.

  • RichEditableText with embedded images does not handle mouse events reliably

    I'm using Flash Builder "Burrito", downloaded a couple of weeks ago, with Flash Player 10.1.85.3 (debug version).
    I have the following MXML object:
    <s:Scroller
    width="100%" height="100%"
    xmlns:fx="http://ns.adobe.com/mxml/2009"
    xmlns:s="library://ns.adobe.com/flex/spark"
    xmlns:mx="library://ns.adobe.com/flex/mx"
    skinClass="components.skins.SxScrollerSkin"
    >
    <s:Group id="myGroup" width="100%" height="100%">
    <s:RichEditableText id="myRichText" >
    </s:RichEditableText>
    <!--- Do not set the height of the RichEditableText - since it seems to prevent the appearance of the vertical scroll bars -->
    </s:Group>
    </s:Scroller>
    With the following ActionScript to initialize, etc.:
    textContainer = new SxBorderContainer(name);
    textContainer.dx = dxAvailable;
    textContainer.dy = dyAvailable;
    scroller = new SxScroller();
    scroller.move(dxPadding, dyPadding);
    scroller.dx = dxAvailable;
    scroller.dy = dyAvailable;
    textContainer.addElement(scroller);
    var richText:RichEditableText;
    richText = scroller.richText;
    richText.toolTip = toolTip;
    richText.enabled = true; // required for mouse click capture
    richText.selectable = true; // required for mouse click capture
    richText.width = dxAvailable;
    var textFlow:TextFlow = TextFlowUtil.importFromString(textFlowString, WhiteSpaceCollapse.PRESERVE);
    richText.textFlow = textFlow;
    And the following code to catch events:
    richText.addEventListener(FlowElementMouseEvent.MOUSE_DOWN, userMouseEvent);
    richText.addEventListener(FlowElementMouseEvent.MOUSE_UP, userMouseEvent);
    richText.addEventListener(FlowElementMouseEvent.CLICK, userMouseEvent);
    Here's the problem. If I have embedded images in the TextFlow that is imported into the richText, I only catch "some" mouse clicks. It's hard to tell, but it seems that only mouse clicks into white space (between paragraphs) are caught. Mouse up and down are caught, but not "click". Very puzzling.
    TextFlow like this:
        <TextFlow>
          <div color="#442222" fontFamily="Times New Roman" fontSize="20" paragraphSpaceAfter="15" textIndent="15">
            <p>
              <span>
                Alice was beginning to get very tired of sitting by her sister on the bank, and of having nothing to do:  once or twice she had peeped into the book her sister was reading, but it had no pictures or conversations in it, `and what is the use of a book,' thought Alice `without pictures or conversation?'
              </span>
            </p>
            <p>
                  <span>So she was considering in her own mind</span>
                  <img source="assets/library/alice/images/White Rabbit.png" height="auto" width="auto" float="left" />
                   <span> (as well as she could, for the hot day made her feel very sleepy and stupid), whether the pleasure of making a daisy-chain would be worth the trouble of getting up and picking the daisies, when suddenly a White Rabbit with pink eyes ran close by her.</span>
            </p>
          </div>
        </TextFlow>
    If there are no embedded images, I catch every click exactly.
    Is it a bug? Am I missing something?
    Thanks,
    Oz

    Thanks for the answer. First I need clicks anywhere in the RichEditableText (no links or images). I use the selection manager to find the exact word that was clicked.
    The next step would be to capture clicks on links and images.
    I have tried using MouseEvent and got the same result as with FlowElementMouseEvent (or worse). I will go back and retest with your suggestion.
    Oz
    Result of retest:
    I catch clicks only over images embedded in the TextFlow. Over text, I catch only mouse up/down, with both MouseEvent and FlowElementMouseEvent.
    Puzzling.  Where are the clicks going?
