Exporting PDF text to HTML

How can I export the text of a PDF and post it on a web page, so that Google Translate can then be applied to it? Our organization posts PDF articles from our journal. (I can manually select and copy the text, so I know the text can be captured.) I want a program/app/software to run on our website that lets a user extract the text from the PDF and display it as HTML. From there, the user can apply Google Translate. Does anyone know how I can do this? It doesn't seem like a difficult task -- I can do it manually -- but I want an app that will do it automatically.

Thanks for the reply. Do you have any idea how I could do what I want to do, perhaps with some other software?
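
One possible approach, as a rough sketch rather than a finished tool: extract the text on the server and wrap it in a plain HTML page that a browser translator can work on. The sketch below assumes the third-party pypdf package is installed and that the PDFs contain real text (not scanned images). The function names text_to_html and pdf_to_html are made up for illustration.

```python
import html


def text_to_html(text: str, title: str = "Extracted article text") -> str:
    """Wrap plain extracted text in a minimal HTML page, one <p> per paragraph."""
    # Blank lines are treated as paragraph breaks; <, > and & are escaped
    # so the extracted text cannot break the surrounding markup.
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    body = "\n".join(
        "<p>" + html.escape(" ".join(p.split())) + "</p>" for p in paragraphs
    )
    return (
        "<!DOCTYPE html>\n"
        '<html><head><meta charset="utf-8">'
        f"<title>{html.escape(title)}</title></head>\n"
        f"<body>\n{body}\n</body></html>\n"
    )


def pdf_to_html(pdf_path: str) -> str:
    """Extract text from every page of a PDF and return it as an HTML page."""
    # Assumption: the third-party pypdf package is installed (pip install pypdf).
    from pypdf import PdfReader

    reader = PdfReader(pdf_path)
    pages = [page.extract_text() or "" for page in reader.pages]
    return text_to_html("\n\n".join(pages), title=pdf_path)
```

The HTML wrapping is kept separate from the extraction step so that a different extractor can be swapped in if pypdf gets the reading order wrong on multi-column journal layouts.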

Similar Messages

InDesign: text missing from exported PDF

Hello,
I'm currently working on a layout, and I'm using a background in the spot color PANTONE 426 C, on which I place images in the same spot color.
I've also added text on top, and on screen everything works perfectly… until I export the PDF; the text is missing from the exported file.
Also, the color is not identical to the one displayed in InDesign. Being a novice, I can't tell whether that is normal or not.
I hope someone can help me.
Sincerely,
Jules

Thank you for your reply. I checked whether the frame was set to print, and indeed it was not; it now appears when I export the file.
As for the color, I'll let you take a look at the export options below…

Display PDF itab in HTML viewer w/o file

Hello!
If I download a PDF internal table (for example, one converted from spool) to a file, I can easily display it in a CL_GUI_HTML_VIEWER control. Is there a way to pass the PDF data directly to the HTML viewer, without using an intermediate file?
Thanks!
Kind regards,
Igor Barbaric

You need to use the following methods of the HTML viewer class:
* Load the data table into the control; the control returns a generated URL.
call method l_html_control->load_data
  exporting
    type         = 'text'
    subtype      = 'html'
  importing
    assigned_url = l_url
  changing
    data_table   = l_new_html_page
  exceptions
    others       = 4.
if sy-subrc <> 0.
  message id sy-msgid type sy-msgty number sy-msgno
          with sy-msgv1 sy-msgv2 sy-msgv3 sy-msgv4.
endif.
* Display the loaded data through the generated URL.
call method l_html_control->show_data
  exporting
    url    = l_url
  exceptions
    others = 3.
if sy-subrc <> 0.
  message id sy-msgid type sy-msgty number sy-msgno
          with sy-msgv1 sy-msgv2 sy-msgv3 sy-msgv4.
endif.
l_new_html_page will hold the PDF data, and
type    = 'text'
subtype = 'html'
has to be maintained for PDF.
I haven't tried this, but it should work.
Regards
Raja

Exporting PDF as HTML with Acrobat 9.2 on Windows 7

Dear all,
I have a problem exporting a PDF as HTML with Acrobat 9.2 on Windows 7.
After exporting, images and text may be in the wrong positions, or the images may have the wrong width and height.
Is this a compatibility problem between Acrobat 9.2 and Windows 7?
What can I do?
Thank you in advance.
amt

Thank you, but I thought I could get about a 90% faithful representation of the PDF,
and that didn't occur.
I also saw examples of some tools that can do this, but for PDF version 1.5, and I think those were on older versions of Windows too.

Exporting PDFs to text

I have Acrobat 8.0 Standard.
I can search for words, so the PDFs are not images; they were created from another program. I do not have access to the original files. I only have the PDFs.
When I export to text, HTML, Word doc, RTF, etc., the text is "mostly" right. But there are many instances where characters are just in the wrong spot.
For example, say the PDF has a couple of lines of text like this:
District 24 District 205 District 216
389 .....
The corresponding export text looks like this:
istrict 24 Distict 25 District 215
Dr0389 ......
"Dr0389" is the problem. The "D" is from "District 24", the "r" is from "District 205", and the "0" is also from "District 205".
If I use the select icon and right-arrow through the document, the cursor moves from the first D, to the r, to the zero, and then to the number 389 on the next line.
Any ideas? Or is it just that the engine that converted the original document to PDF messed up?
Thanks!
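
The cursor behavior described above suggests the characters are stored in the file in a different order from how they appear on the page. If a tool can give you each character with its page coordinates, one workaround is to rebuild the lines from the positions instead of trusting the stored order. The sketch below works on made-up (x, y, character) tuples and does not call any real PDF library; glyphs_to_lines and line_tolerance are hypothetical names.

```python
from typing import List, Tuple

Glyph = Tuple[float, float, str]  # (x, y, character); y grows down the page


def glyphs_to_lines(glyphs: List[Glyph], line_tolerance: float = 2.0) -> List[str]:
    """Group glyphs into lines by y position, then order each line left to right."""
    lines: List[List[Glyph]] = []
    for glyph in sorted(glyphs, key=lambda g: g[1]):
        # A glyph joins the current line if its y is close to that line's first glyph;
        # otherwise it starts a new line.
        if lines and abs(glyph[1] - lines[-1][0][1]) <= line_tolerance:
            lines[-1].append(glyph)
        else:
            lines.append([glyph])
    return ["".join(ch for _, _, ch in sorted(line, key=lambda g: g[0])) for line in lines]


# Hypothetical glyphs in a scrambled stored order, similar in spirit to "Dr0389".
stream_order = [(0, 10, "D"), (0, 20, "3"), (5, 10, "i"), (5, 20, "8"), (10, 20, "9")]
print(glyphs_to_lines(stream_order))  # ['Di', '389']
```

This only helps when the positions themselves are right and only the order is off. Multi-column pages would need an extra step to split columns before sorting.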

I know PDFs are not intended to be edited, but sometimes you have no choice, and it isn't all that rare. Opening in Illustrator can work after a fashion, but that's no easy trick.
We produce all our PDF newsletters now using InDesign, but for a few years they were made using Quark, and those original Quark files are long gone - all that remains is the PDFs. We are adding these old issues to a searchable database, so I need to add XML tags to all the stories. I haven't figured out a way to do that in the PDF, so I'm trying to put it back into some sort of form where tagging is possible. The pages in these documents are fairly highly designed - text in three columns, pull quotes in boxes centered on the page overlapping all three columns, graphics, etc. - so selecting the text from a whole story is challenging, to say the least. But while labor intensive, it is possible to copy the text, paste into a Word document, then manually kill all the unwanted stuff that came along with the Copy and use a macro of several Find/Replace routines to get rid of all the spurious paragraph returns.
I investigated a promising solution offered by Recosoft (www.recosoft.com) called, appropriately, PDF2ID. It lets you set up parameters, and auto-converts a PDF to a fully-formatted ID file. They offer a somewhat free demo that's both brain-dead (after the first page, the text in the rest of the document is replaced by x's) and time limited. It does a good job of conversion, though. Multiple-column type is rendered with each column as a separate story, so it has to be re-threaded. And it's fairly expensive at $249. But it might be worth a look if you need to do a lot of this.
I'm proceeding with my labor intensive approach. I just have to do about 30 issues, each of which I can convert to taggable Word docs in about a half hour. Very tedious, so I'm doing one a day. Then I'm done.
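
For anyone doing the same cleanup outside Word, the Find/Replace pass that removes the spurious paragraph returns can be approximated with a short script. This is only a sketch: it assumes the pasted text separates real paragraphs with a blank line, and unwrap_paragraphs is a made-up name.

```python
import re


def unwrap_paragraphs(text: str) -> str:
    """Join hard-wrapped lines into paragraphs; blank lines stay as paragraph breaks."""
    # Normalize Windows and old Mac line endings first.
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    # Rejoin words hyphenated across a line break: "conver-\nsion" -> "conversion".
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", text)
    # Split on blank lines, then collapse the remaining single returns to spaces.
    paragraphs = re.split(r"\n\s*\n", text)
    return "\n\n".join(" ".join(p.split()) for p in paragraphs if p.strip())
```

One caveat: the hyphen rule also merges genuine compounds that happen to break at the hyphen (for example "three-\ncolumn" becomes "threecolumn"), so the output still needs a quick read-through.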

Keynote PDF Export - black text appears white on some computers, but not all - why?

This problem seems to have been going on forever, so why is there STILL no solution? I'm really keen to move over to Keynote, cos PowerPoint is very cumbersome and frustrating in my experience, but I'll only ever use Keynote to export PDFs for presentations, so if it can't even get this basic function right, there's really no point in abandoning PowerPoint - at least I know my docs will come out fine and I won't have to be embarrassed when clients come back to me frustrated cos they can't read the text!
Any thoughts?

That would be because some of the colors you are using are RGB and are being converted to CMYK.
Make a color swatch in the Color specifier and set the ICC profile in the small rainbow square next to the pull-down menu in the color sliders tab.
Then use that color swatch in your styles to ensure you have the same color set on all your objects and text.
You should not be using JPEGs if you want to ensure an exact color. JPEGs are not suitable for flat areas of color.
Peter

Exporting to PDF: Text as black and white only possible?

Hi!
I have a large document in which 30% of the pages use color and 70% are plain text. The printing company has asked me to export the text-only pages as plain black-and-white PDFs. They showed me the function in Word on Windows. Is it possible with Pages, too?
If it helps, I do have the Adobe Creative Suite Design Premium (with Acrobat).
Thanks,
Matt

Yes, it is, but you have to fix your text so that it is styled as black only.
Click in the text and press Command-A (Select All):
Menu > View > Colors
Click on the sliders icon at the top of the palette and choose Gray Scale Slider, and then the 0% swatch.
Resave any styles you have used and fix any text or objects that are not part of your main text.
You can also open your PDF file in the ColorSync Utility, found in the Utilities folder inside your Applications folder, and apply Filters > Black & White.
Peter

Keep exported image size in HTML as shown in PDF

I have many inline formulas (imported from a Word file using MathType) in a PDF article made with InDesign. But when I export the articles as HTML, the images for the formulas become much larger than they appear in the PDF version. How can I keep the exported images the same size as they are shown in the PDF file? I know I can edit the HTML file to specify the image size, but that is not the ideal workflow.
Thank you.

Hi Eric,
If I understand you correctly, you have the formulas placed as images.
So, to keep the size fixed, go to the Image tab of the HTML Export Options and keep the image size as fixed.
Snapshot to refer:
Now you have an image of the same size as it appears in InDesign.

It concerns the Adobe Export PDF program. When we open this program, "Recognize the text in English" appears on the right of the screen, but we would like to change it to French. Because when we export the document to Word, it uses

It concerns the Adobe Export PDF program. When we open this program, "Recognize the text in English" appears on the right of the screen, but we would like to change it to French, because when we export the document to Word, it uses the English dictionary to correct the text. Thanks for your answer.

[topic moved from Developers to Acrobat.com forum]

How to export PDF to HTML with JPEG image format (not PNG)?

Hello,
When I export a ".pdf" file to ".html", using Acrobat 11 Pro, the program creates a subdirectory with ".png" image files.
I need these images to be in the ".jpg" format, not ".png".
Do any of you know how to change this setting? I am assuming that it is not a permanent default...
Thank you,
brivera0

Alas, I checked on my Acrobat XI before posting. That setting was removed.

Anti-aliased text when exporting PDF to image

I need to be able to batch-convert multi-page PDFs to individual bitmap images (one image for each page) with anti-aliased text.
Photoshop works this way if you open a single PDF, allowing you to select one or more pages to rasterize as separate images, but not when batch processing (specifically, if you use the Image Processor script on a folder of images and PDFs, it will rasterize the PDFs automatically, but it will only do so with the first page of each PDF.)
Acrobat, on the other hand, automatically creates an image for each page when exporting, and can do this in a batch sequence, but the text is not anti-aliased, making the image look like a screenshot from 1997. No matter how high an image resolution you select, the text is still jagged when you zoom in.
So, is there a setting I'm missing that will allow the text to be anti-aliased when using Acrobat to export PDF to an image? I am using Acrobat 8, not 9, so something might have changed in the newest version.

Not sure about Acrobat Pro 8, but in Acrobat Pro 9 (not Extended) you can use Export > Image and choose among JPEG, JPEG2000, PNG, and TIFF. I used JPEG. In the export dialog box, leave the filename as is so that it matches the PDF filename, then under File Settings choose the maximum settings: Grayscale (JPEG, Quality: Maximum); Color (JPEG, Quality: Maximum). Skip down to Conversion Colorspace: Determine Automatically, and for Resolution choose 600 pixels/inch for a letter-size document. This results in a file size of about 1.3 MB per JPEG image if there is not a lot of information on the page; I tested with a simple header, a footer with page numbering, and 5 lines of Lorem Ipsum text. 600 dpi is overkill; you can go for 300 dpi and still get a decent image that prints well on a laser photocopier connected to a production computer. If you are printing to a laser printer or a high-quality inkjet, 300 dpi will also suffice for a letter-size document. But I have been told that 300 dpi is not a universal rule of thumb, and that you should obtain specs from your printer, since he or she can calculate the dpi you need for your content by very strict rules. It depends on whether you have background images such as watermarks, and on whether your text body contains line art.
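
To put the 300 vs. 600 dpi choice in numbers, here is a small back-of-the-envelope calculation of the pixel dimensions of a letter-size page (8.5 x 11 inches). The uncompressed size assumes 8-bit RGB at 3 bytes per pixel, which is an assumption for illustration; a saved JPEG will be far smaller, as the 1.3 MB figure above shows.

```python
def pixel_dimensions(width_in: float, height_in: float, dpi: int) -> tuple:
    """Return the (width, height) in pixels of a page rasterized at the given dpi."""
    return round(width_in * dpi), round(height_in * dpi)


def uncompressed_megabytes(width_px: int, height_px: int, bytes_per_pixel: int = 3) -> float:
    """Approximate uncompressed size in MB (10**6 bytes); 3 bytes per pixel is 8-bit RGB."""
    return width_px * height_px * bytes_per_pixel / 1_000_000


for dpi in (300, 600):
    w, h = pixel_dimensions(8.5, 11, dpi)
    # 300 dpi -> 2550 x 3300 px; 600 dpi -> 5100 x 6600 px
    print(dpi, w, h, round(uncompressed_megabytes(w, h), 1))
```

Doubling the dpi quadruples the pixel count, which is why 600 dpi is costly for batch jobs unless the output really needs it.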

Working with Exported PDF in Word - text jumps around

I just purchased an Adobe Export PDF subscription and am having trouble working with the document once it has been exported to Word. The cursor jumps from place to place, as do words.

Try selecting the text you want to change within Word, then choosing a different font for that text. I've seen this really help with the text-jumpy issue that occurs from time to time in files converted with ExportPDF. The underlying cause is usually a poorly or incorrectly embedded font in the original PDF.

When trying to export a PDF to .docx, the text doesn't convert. How do I remedy this?

How do I get the exported PDF to reproduce the exact document, text, graphics and all?

Hi hamsa142,
I'm sorry that you're having trouble converting your PDF files to Word. I see you posted a similar question earlier. Please try disabling OCR as outlined here: How to disable Optical Character Recognition (O... | Adobe Community
Let us know how it goes.
Best,
Sara

Will Export PDF transfer graphics to Word as well as text?

Does Export PDF transfer graphics as well as text to Word?

Hi Mary,
Yes it will as long as the entire file is below 100 MB. Let me know if you have further questions!
Regards, Stacy

NEED HELP! How do I export UTF-8 encoded HTML to PDF?

Right now my HTML output is in Japanese and encoded in UTF-8.
I tried to use html_entity_decode(); however, that does not work, and all I get are a bunch of weird characters.
If anyone has any experience with exporting UTF-8 encoded HTML in Japanese, or any other glyph-based language, to PDF, please help me with this.
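
A common cause of "weird characters" is that UTF-8 bytes get read as a single-byte encoding somewhere along the way. Whether that is what happens in this particular setup is only a guess. The sketch below reproduces the effect in Python rather than PHP, with html.unescape standing in for html_entity_decode(), and shows that entity decoding is a separate matter from byte decoding. The sample string is made up.

```python
import html

japanese = "日本語のテキスト"

# UTF-8 bytes read back with a single-byte encoding turn into unreadable characters.
utf8_bytes = japanese.encode("utf-8")
garbled = utf8_bytes.decode("latin-1")

# Decoding with the encoding that was actually used restores the text.
restored = utf8_bytes.decode("utf-8")

# Entity decoding only matters if the HTML contains references such as &#26085;
# and it does not repair text that was decoded with the wrong encoding.
from_entities = html.unescape("&#26085;&#26412;&#35486;")

print(restored == japanese, garbled == japanese, from_entities)
```

If the garbling matches this pattern, the thing to check is which character set the HTML-to-PDF library expects and whether the chosen PDF font contains Japanese glyphs at all.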

Ellis home wrote:
In File/Document Setup do you have bleed set to 0?
Yeah.
[Jongware] wrote:
What are 'the obvious export settings'? You are looking at these as well?
Right. Export > Marks and Bleeds. (I was posting from my phone and couldn't check InDesign to see what it was called.) Those are my settings, at any rate.
With its default settings, there are no 'blue' bleed lines. You may have changed its color (this can be done in Preferences), or you may mistake the default Purple of margins for blue, or you may mistake any empty frame for 'bleed' (as the default 'new layer' color is blue). Or maybe someone did draw a blue frame to act as 'Bleed' -- but calling it that is not going to work. InDesign can only use its own kind of Bleed, the one you set up in Page Size, that displays outside of the Document Page, and that you can switch off in the PDF Export dialog.
Sorry, I was wrong about the colour, too. I need it to be cropped to the line I have circled in red - it's the innermost border.
