Exporting PDF text to html

How can I export PDF text and post the exported text on a web page, to which I can then apply Google Translate? Our organization posts PDF articles from our journal. (I can manually select and copy the text, so I know the text can be captured.) I want a program/app/software to run on our website that will allow a user to extract the text from the PDF and display the text as HTML. From there, the user can apply Google Translate. So does anyone know how I can do this? It doesn't seem like a difficult task -- I can do it manually -- but I want an app that will do it automatically.

Thanks for the reply.  Do you have any idea how I could do what I want to do, perhaps with some other software?
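
For what it's worth, the "display the text as HTML" half of this can be sketched in a few lines of Python. This is only a rough sketch under assumptions: it assumes the text has already been pulled out of the PDF by some server-side library (for example pypdf, whose `PdfReader(path).pages[i].extract_text()` returns plain text), and the function name `text_to_html` is made up for illustration. It escapes the text and wraps each blank-line-separated block in a paragraph, so a browser-side translator has real HTML text to work on.

```python
import html
import re


def text_to_html(text: str, title: str = "Extracted article text") -> str:
    """Wrap plain text (already extracted from a PDF) in a minimal HTML page.

    Assumption: `text` comes from a PDF library of your choice; this
    function does not read PDF files itself.
    """
    # Treat one or more blank lines as a paragraph boundary.
    blocks = re.split(r"\n\s*\n", text.replace("\r\n", "\n"))
    paragraphs = []
    for block in blocks:
        # Collapse line breaks and repeated spaces inside a paragraph.
        flat = " ".join(block.split())
        if flat:
            # Escape &, <, > so the text cannot be read as markup.
            paragraphs.append("<p>" + html.escape(flat) + "</p>")
    return (
        "<!DOCTYPE html>\n"
        '<html lang="en">\n'
        '<head><meta charset="utf-8"><title>'
        + html.escape(title)
        + "</title></head>\n"
        "<body>\n" + "\n".join(paragraphs) + "\n</body>\n</html>\n"
    )
```

The `lang="en"` attribute is a guess at the journal's language; translation tools tend to rely on it, so it should match the source text.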

Similar Messages

  • InDesign, PDF export missing text

    Hello,
    I'm currently building a layout, and I'm using a spot-color PANTONE 426 C background on which I place spot-color images in the same color.
    I've also added text on top, and on screen everything works perfectly… until I export the .PDF; the text is missing from the exported file.
    Also, being a novice at this, the color is not identical to the one displayed in InDesign, but I can accept that this is normal, or not.
    So, hoping someone can help me,
    Sincerely,
    Jules

    Thank you for your reply. I checked whether the frame was set to print, and indeed it was not; it now appears when I export the file.
    Concerning the color, I'll let you take a look at the export options below…

  • Display PDF itab in HTML viewer w/o file

    Hello!
    If I download a PDF internal table (for example, converted from spool) to a file, I can easily display it in a CL_GUI_HTML_VIEWER control. Is there a way to pass PDF data directly to the HTML viewer, without using an intermediate file?
    Thanks!
    Kind regards,
    Igor Barbaric

    You need to use the following methods of the HTML viewer class:
    call method l_html_control->load_data
      exporting
        type         = 'text'
        subtype      = 'html'
      importing
        assigned_url = l_url
      changing
        data_table   = l_new_html_page
      exceptions
        others       = 4.
    if sy-subrc <> 0.
      message id sy-msgid type sy-msgty number sy-msgno
              with sy-msgv1 sy-msgv2 sy-msgv3 sy-msgv4.
    endif.
    call method l_html_control->show_data
      exporting
        url    = l_url
      exceptions
        others = 3.
    if sy-subrc <> 0.
      message id sy-msgid type sy-msgty number sy-msgno
              with sy-msgv1 sy-msgv2 sy-msgv3 sy-msgv4.
    endif.
    l_new_html_page will hold the PDF data, and the parameters
      type    = 'text'
      subtype = 'html'
    have to be maintained for PDF.
    I haven't tried this, but it should work.
    Regards
    Raja

  • Export pdf as html by Acrobat 9.2 on windows 7

    Dears,
    I have a problem with exporting PDF as HTML with Acrobat 9.2 on Windows 7.
    After exporting, images and text may be in the wrong positions, or the images may have the wrong width and height.
    Is the problem a compatibility issue between Acrobat 9.2 and Windows 7?
    What can I do?
    Thank you in advance.
    amt

    Thank you, but I expected to get about a 90% faithful representation of the PDF,
    and that didn't happen.
    Also, I saw examples of some tools which can do that, but for PDF version 1.5, and I think that was on old Windows too.

  • Exporting pdf's to text

    I have Acrobat 8.0 Standard.
    I can search for words... so the pdf's are not an image but were created from another program. I do not have access to the original files. I only have the pdf's.
    When I export to text, html, word doc, rtf, etc, the text is "mostly" right. But there are many instances where characters are just in the wrong spot.
    For example, say the PDF has a couple of lines of text like this:
    District 24 District 205 District 216
    389 .....
    The corresponding export text looks like this:
    istrict 24 Distict 25 District 215
    Dr0389 ......
    "Dr0389" is the problem. The "D" is from "District 24" the "r" is from "District 205" and the "0" is also from "District 205".
    If I use the select icon and right arrow over the document. It moves from the first D, to the r, to the zero, and then to the number 389 on the next line.
    Any ideas? or is it just the engine that converted the original document to pdf that has messed up?
    Thanks!

    I know PDFs are not intended to be edited, but sometimes you have no choice, and it isn't all that rare. Opening in Illustrator can work after a fashion, but that's no easy trick.
    We produce all our PDF newsletters now using InDesign, but for a few years they were made using Quark, and those original Quark files are long gone - all that remains is the PDFs. We are adding these old issues to a searchable database, so I need to add XML tags to all the stories. I haven't figured out a way to do that in the PDF, so I'm trying to put it back into some sort of form where tagging is possible. The pages in these documents are fairly highly designed - text in three columns, pull quotes in boxes centered on the page overlapping all three columns, graphics, etc. - so selecting the text from a whole story is challenging, to say the least. But while labor intensive, it is possible to copy the text, paste into a Word document, then manually kill all the unwanted stuff that came along with the Copy and use a macro of several Find/Replace routines to get rid of all the spurious paragraph returns.
    I investigated a promising solution offered by Recosoft (www.recosoft.com) called, appropriately, PDF2ID. It lets you set up parameters, and auto-converts a PDF to a fully-formatted ID file. They offer a somewhat free demo that's both brain-dead (after the first page, the text in the rest of the document is replaced by x's) and time limited. It does a good job of conversion, though. Multiple-column type is rendered with each column as a separate story, so it has to be re-threaded. And it's fairly expensive at $249. But it might be worth a look if you need to do a lot of this.
    I'm proceeding with my labor intensive approach. I just have to do about 30 issues, each of which I can convert to taggable Word docs in about a half hour. Very tedious, so I'm doing one a day. Then I'm done.
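
The Find/Replace cleanup described above (getting rid of spurious paragraph returns after copying text out of a PDF) can be approximated with a small script. This is a minimal sketch, not the poster's actual macro: the function name `unwrap_paragraphs` is made up, and it assumes that real paragraph breaks show up as blank lines in the copied text while single line breaks are just wrapping. Note that the de-hyphenation step will also join genuine compounds that happen to break at a line end ("well-" / "known" becomes "wellknown"), so the output still needs a read-through.

```python
import re


def unwrap_paragraphs(text: str) -> str:
    """Join hard-wrapped lines into paragraphs.

    Assumption: blank lines mark real paragraph breaks; every other
    line break is an artifact of copying from the PDF.
    """
    text = text.replace("\r\n", "\n")
    cleaned = []
    for para in re.split(r"\n\s*\n", text):
        # Rejoin words hyphenated across a line break ("exam-\nple" -> "example").
        para = re.sub(r"(\w)-\n(\w)", r"\1\2", para)
        # Collapse remaining line breaks and runs of whitespace into single spaces.
        para = re.sub(r"\s+", " ", para).strip()
        if para:
            cleaned.append(para)
    # One blank line between paragraphs in the result.
    return "\n\n".join(cleaned)
```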

  • Keynote PDF Export - black text appears white on some computers, but not all - why?

    This problem seems to have been going on forever; why is there STILL no solution? I'm really keen to move over to Keynote, cos PowerPoint is very cumbersome and frustrating in my experience, but I'll only ever use Keynote to export PDFs for presentations, so if it can't even get this basic function right, there's really no point in abandoning PowerPoint - at least I know my docs will come out fine and I won't have to be embarrassed when clients come back to me frustrated cos they can't read the text!
    Any thoughts?

    That would be because some of the colors you are using are RGB and are being converted to CMYK.
    Make a color swatch in the Color specifier and set the ICC profile in the small rainbow square next to the pull-down menu in the color sliders tab.
    Then use that color swatch in set styles to ensure you have the same color set on all your objects and text.
    You should not be using JPEGs if you want to ensure an exact color. JPEGs are not suitable for flat areas of color.
    Peter

  • Exporting to PDF: Text as black and white only possible?

    Hi!
    I have a large document where 30% of the pages use colors and 70% are plain text. I've been asked by the printing company to export the text-only pages as plain black-and-white PDFs. They showed me the function on Windows and Word. Is it possible with Pages, too?
    If it helps, I do have the Adobe Creative Suite Design Premium (with Acrobat).
    Thanks,
    Matt

    Yes, it is, but you have to fix your text so that it is styled as black only.
    Click in the text and press Command-A (select all):
    Menu > View > Colors
    Click on the sliders icon at the top of the palette and choose Grey Scale Slider and then the 0% swatch.
    Resave any styles you have used and fix any text or objects that are not part of your main text.
    You can also open your PDF file in the ColorSync Utility, found in the Utilities folder inside your Applications folder, and apply Filters > Black & White.
    Peter

  • Keep exported image size in HTML as shown in PDF

    I have many inline formulas (imported from a Word file using MathType) in a PDF article made with ID. But when I export the articles as HTML, the images for the formulas become much larger than shown in the PDF version. How can I keep the exported images the same size as they are shown in the PDF file? I know I can edit the HTML file to specify the image size, but that is not the ideal workflow.
    Thank you.

    Hi Eric,
    If I'm understanding you correctly, you have the formulas placed as images.
    So, to keep the size fixed, set the image size to Fixed in the Image tab of the HTML Export Options.
    Snapshot for reference:
    Now you have an image of the same size as it appears in InDesign.

  • This concerns the Adobe Export PDF program. When we open this program, "recognize the text in English" appears on the right of the screen, but we would like to change it to French, because when we export the document to Word, it uses

    This concerns the Adobe Export PDF program. When we open this program, "recognize the text in English" appears on the right of the screen, but we would like to change it to French, because when we export the document to Word, it uses the English dictionary to correct the text. Thanks for your answer.

    [topic moved from Developers to Acrobat.com forum]

  • How to export PDF to HTML with JPEG image format (not PNG)?

    Hello,
    When I export a ".pdf" file to ".html", using Acrobat 11 Pro, the program creates a subdirectory with ".png" image files.
    I need these images to be in the ".jpg" format, not ".png".
    Do any of you know how to change this setting? I am assuming that it is not a permanent default...
    Thank you,
    brivera0

    Alas, I checked on my Acrobat XI before posting. That setting was removed.

  • Anti-aliased text when exporting PDF to image

    I need to be able to batch-convert multi-page PDFs to individual bitmap images (one image for each page) with anti-aliased text.
    Photoshop works this way if you open a single PDF, allowing you to select one or more pages to rasterize as separate images, but not when batch processing (specifically, if you use the Image Processor script on a folder of images and PDFs, it will rasterize the PDFs automatically, but it will only do so with the first page of each PDF.)
    Acrobat, on the other hand, automatically creates an image for each page when exporting, and can do this in a batch sequence, but the text is not anti-aliased, making the image look like a screenshot from 1997. No matter how high an image resolution you select, the text is still jagged when you zoom in.
    So, is there a setting I'm missing that will allow the text to be anti-aliased when using Acrobat to export PDF to an image? I am using Acrobat 8, not 9, so something might have changed in the newest version.

    Not sure about Acrobat Pro 8, but in Acrobat Pro 9 (not Extended) you can Export>Image>Multiple Choice: JPEG, JPEG2000, PNG, TIFF.  I used JPEG and under the options in the export dialog box, leave the filename as is to coincide with the PDF filename and then choose Maximum Resolution under File Settings: Grayscale (JPEG, Quality:Maximum); Color (JPEG, Quality:Maximum) . . . skip down to Conversion Colorspace: Determine Automatically and Resolution choose 600pixels/inch for a letter size document.  This will result in a file size of 1.3MB per JPEG image if there is not a lot of information on the page.  I chose a simple header, footer with page numbering, and 5 lines of Lorem Ipsum text.  600dpi is overkill, you can go for 300dpi and still result in a decent image that will be able to be printed on a laser photocopier that is connected to a production computer.  Obviously if you are printing to a laser printer or a high quality inkjet 300dpi will suffice as well for a letter sized document.  But I have been told that 300dpi is not a standard rule of thumb and you must obtain specs from your printer since he/she can calculate by very strict rules the dpi you need for your content.  It depends on whether you have background images such as watermarks and also if your text body contains line-art.

  • Working with Exported PDF in Word - text jumps around

    I just purchased an Adobe Export PDF subscription and am having trouble working with the document once it has been exported to Word.  The cursor jumps from place to place, as do words.

    Try selecting the text you want to change within Word, then choosing a different font for that text.  I've seen this really help with the text-jumpy issue that occurs from time to time in files converted with ExportPDF. The underlying cause is usually a poorly or incorrectly embedded font in the original PDF.

  • When trying to export pdf to .docx the text doesn't convert. how do I remedy this?

    How do I get exported pdf to copy exact document, text, graphics and all?

    Hi hamsa142,
    I'm sorry that you're having trouble converting your PDF files to Word. I see you posted a similar question earlier. Please try disabling OCR as outlined here: How to disable Optical Character Recognition (O... | Adobe Community
    Let us know how it goes.
    Best,
    Sara

  • Will Export PDF transfer graphics to Word as well as text?

    Does Export PDF transfer graphics as well as text to Word? 

    Hi Mary,
    Yes it will as long as the entire file is below 100 MB. Let me know if you have further questions!
    Regards, Stacy

  • NEED HELP! How do I export UTF-8 encoded HTML to PDF?

    Right now my HTML output is in Japanese and encoded in UTF-8.
    I tried to use html_entity_decode(); however, that does not work, and all I get is a bunch of weird characters.
    If anyone has any experience with exporting UTF-8 encoded HTML in Japanese or any other glyph-based language to PDF, please help me with this.

    Ellis home wrote:
    In File/Document Setup do you have bleed set to 0?
    Yeah.
    [Jongware] wrote:
    What are 'the obvious export settings'? You are looking at these as well?
    Right. Export > Marks and Bleeds. (I was posting from my phone and couldn't check InDesign to see what it was called.) Those are my settings, at any rate.
    With its default settings, there are no 'blue' bleed lines. You may have changed its color (this can be done in Preferences), or you may mistake the default Purple of margins for blue, or you may mistake any empty frame for 'bleed' (as the default 'new layer' color is blue). Or maybe someone did draw a blue frame to act as 'Bleed' -- but calling it that is not going to work. InDesign can only use its own kind of Bleed, the one you set up in Page Size, that displays outside of the Document Page, and that you can switch off in the PDF Export dialog
    Sorry, I was wrong about about the colour, too. I need it to be cropped to the line I have circled in red - it's the innermost border.
