Exporting pdf containing Unicode text

I tried to convert a PDF with Unicode (Urdu) text into MS Word. The output was all garbage. Any ideas?

Check FAQ for supported languages.

Similar Messages

Working with Exported PDF in Word - text jumps around

I just purchased an Adobe Export PDF subscription and am having trouble working with the document once it has been exported to Word. The cursor jumps from place to place, as do words.

Try selecting the text you want to change within Word, then choosing a different font for that text. I've seen this really help with the text-jumpy issue that occurs from time to time in files converted with ExportPDF. The underlying cause is usually a poorly or incorrectly embedded font in the original PDF.

Problems exporting PDF in CS5, text missing, images not transparent....

I am using InDesign CS5 v7.04 on a Mac running 10.8.1.
(This exact problem has happened to me in previous versions of ID as well though?)
When I export a document to PDF and view it in Preview OR place it into another ID document, every other page has an issue where some or all of the text is missing, and images that should be transparent are either not transparent or missing as well. Actually, the first two pages in my current document are OK; after that, every other page is missing text or images, or has non-transparent images that should be transparent. The font I am using is Avenir and the images are .png files created in PS. The images are in the master.
WHY is this happening? What do I need to do to fix it?
Thanks in advance for your assistance. I am a mostly self-taught ID user since version 1, just getting back into using it after a long break.

pixelpusher_mama wrote:
OK, sorry, yes, it looks the same in Acrobat as it does in Preview as it does when placed into ID.
I was able to save as a non-current version of PDF (Acrobat 4) and it shows correctly. (???) What am I missing, here? I mean, I have a work around, but?
So the original PDF is bad, rather than there being a problem with the file after placing. That implicates the original document, and since Acrobat 4 compatibility works it's probably either a transparency or layer issue.
Try trashing your prefs and exporting again to see if it works. See Replace Your Preferences

Exporting pdf's to text

I have Acrobat 8.0 Standard.
I can search for words... so the pdf's are not an image but were created from another program. I do not have access to the original files. I only have the pdf's.
When I export to text, html, word doc, rtf, etc, the text is "mostly" right. But there are many instances where characters are just in the wrong spot.
For example, say the PDF has a couple of lines of text like this:
District 24 District 205 District 216
389 .....
The corresponding export text looks like this:
istrict 24 Distict 25 District 215
Dr0389 ......
"Dr0389" is the problem. The "D" is from "District 24" the "r" is from "District 205" and the "0" is also from "District 205".
If I use the select icon and right-arrow over the document, the cursor moves from the first D, to the r, to the zero, and then to the number 389 on the next line.
Any ideas? or is it just the engine that converted the original document to pdf that has messed up?
Thanks!

I know PDFs are not intended to be edited, but sometimes you have no choice, and it isn't all that rare. Opening in Illustrator can work after a fashion, but that's no easy trick.
We produce all our PDF newsletters now using InDesign, but for a few years they were made using Quark, and those original Quark files are long gone - all that remains is the PDFs. We are adding these old issues to a searchable database, so I need to add XML tags to all the stories. I haven't figured out a way to do that in the PDF, so I'm trying to put it back into some sort of form where tagging is possible. The pages in these documents are fairly highly designed - text in three columns, pull quotes in boxes centered on the page overlapping all three columns, graphics, etc. - so selecting the text from a whole story is challenging, to say the least. But while labor intensive, it is possible to copy the text, paste into a Word document, then manually kill all the unwanted stuff that came along with the Copy and use a macro of several Find/Replace routines to get rid of all the spurious paragraph returns.
I investigated a promising solution offered by Recosoft (www.recosoft.com) called, appropriately, PDF2ID. It lets you set up parameters, and auto-converts a PDF to a fully-formatted ID file. They offer a somewhat free demo that's both brain-dead (after the first page, the text in the rest of the document is replaced by x's) and time limited. It does a good job of conversion, though. Multiple-column type is rendered with each column as a separate story, so it has to be re-threaded. And it's fairly expensive at $249. But it might be worth a look if you need to do a lot of this.
I'm proceeding with my labor intensive approach. I just have to do about 30 issues, each of which I can convert to taggable Word docs in about a half hour. Very tedious, so I'm doing one a day. Then I'm done.
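
The Find/Replace cleanup described above (removing the spurious paragraph returns that come along when text is copied out of a PDF) can also be sketched in a few lines of Python. This is only an illustration, not the macro the poster used. It assumes that a blank line marks a real paragraph break, that every other line break is copy-and-paste debris, and that a trailing hyphen marks a word split across lines. The function name `unwrap_paragraphs` is made up for this example.

```python
import re


def unwrap_paragraphs(text: str) -> str:
    """Join hard-wrapped lines into paragraphs.

    Assumptions (not from the original post): a blank line is a real
    paragraph break; any other line break is copy-and-paste debris.
    """
    # Normalize Windows and old Mac line endings to "\n".
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    paragraphs = []
    # Split on one or more blank lines to find the real paragraphs.
    for block in re.split(r"\n\s*\n", text):
        lines = [line.strip() for line in block.split("\n") if line.strip()]
        if not lines:
            continue
        joined = lines[0]
        for line in lines[1:]:
            if joined.endswith("-"):
                # Treat a trailing hyphen as a word split across lines.
                joined = joined[:-1] + line
            else:
                joined += " " + line
        paragraphs.append(joined)
    return "\n\n".join(paragraphs)
```

The hyphen rule will wrongly merge genuinely hyphenated compounds that happen to fall at a line end, so the result still needs a read-through before tagging.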

Trouble exporting pdf with Arabic text to .doc file

I am having an issue exporting a pdf from the trial of Acrobat Pro XI on a Windows 7 computer.
The pdf has a table with English fields in the left column and Arabic equivalents in the right column.
It appears to display fine within Acrobat, so I assume it has fonts embedded and that Acrobat is somehow handling the display of the Arabic font correctly.
I need to export this to .doc or .docx so that I can enter additional data.
When I convert to .doc, the English column and fields convert pretty well, but the Arabic fields do not display well at all. I am not sure whether this is a problem with Word (or Windows) needing an Arabic font or Arabic text support installed on the system, or some other issue (such as Acrobat not handling the conversion well).
I did note that there is an Arabic font add-on for Acrobat that can be downloaded from here:
http://www.adobe.com/support/downloads/thankyou.jsp?ftpID=4885&fileID=4558
However, since Acrobat seems to display the Arabic text properly, I don't know that this is required. Additionally, when I tried to install the Arabic font support from that URL, it told me that Acrobat X was required; I am using Acrobat XI Reader (installed with the trial version of Acrobat Pro that I wish to purchase if I can get this working properly).
Thanks in advance for any assistance or advice offered on this subject.
David Dean

Unfortunately, our conversion engine cannot convert text that reads from right to left, such as Arabic. It may be something that we add in the future; however, it is not available at this time.
Please refer to the article mentioned below : http://forums.adobe.com/docs/DOC-1812
~Pranav

Exporting pdf to word - text becomes pic blocks not editable

Just got a subscription online and have tried both online and through Reader. Whenever I convert .pdf to .docx, it turns the text into picture blocks in Word, which I can move around and resize, but I can't edit the text. I followed a similar thread and turned off OCR in Reader, and it still did the same thing.

Hi worshipdude,
With OCR turned off, ExportPDF won't convert scanned text to editable/searchable text. Therefore, if the PDF you're trying to convert was created from a scanned document with image text, then the Word document will also have image text.
Please convert without OCR and then triple-click in the Word document to select the text. Did that do the trick?
Best,
Sara

How to find pdf contain hide text?

I have attached a screenshot showing how I find hidden text using Examine Document.
I have asked this question many times and still have not found a solution.
Can anyone help me?
I need to detect whether hidden text exists in a PDF using the SDK.

I used the code below to extract text from the PDF; because of the hidden text, some text is not extracted.
// Reject page ranges that fall outside the document.
if (startPg < 0 || endPg < 0 || startPg > endPg || endPg > PDDocGetNumPages(pdDoc) - 1) {
    AVAlertNote("Exceeding starting or ending page number limit of current document.");
    return false;
}
PDWordFinder pdWordFinder = NULL;
DURING
    pdWordFinder = PDDocCreateWordFinderEx(pdDoc, WF_LATEST_VERSION, toUnicode, pConfig);
    // Write the byte order mark FE FF before any Unicode output.
    if (toUnicode) fprintf(pOutput, "%c%c", 0xfe, 0xff);
    // Enumerate the words on each requested page; WordEnumProc receives pOutput.
    for (int i = startPg; i <= endPg; i++)
        PDWordFinderEnumWords(pdWordFinder, i, ASCallbackCreateProto(PDWordProc, &WordEnumProc), pOutput);
    PDWordFinderDestroy(pdWordFinder);
    E_RETURN(true);
HANDLER
    char buf[256], errmsg[256];
    sprintf(buf, "[ExtractText()]Error %d: %s", ErrGetCode(ERRORCODE), ASGetErrorString(ERRORCODE, errmsg, sizeof(errmsg)));
    AVAlertNote(buf);
    // Clean up on failure: release the word finder and close the output file.
    if (pdWordFinder) PDWordFinderDestroy(pdWordFinder);
    if (pOutput) fclose(pOutput);
    return false;
END_HANDLER
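
The snippet writes the bytes 0xFE 0xFF ahead of the extracted words when `toUnicode` is set; that pair is the UTF-16 big-endian byte order mark. As a hedged sketch for inspecting such an output file outside Acrobat, the Python below decodes BOM-prefixed bytes. It assumes that everything after the BOM really is UTF-16BE, which the snippet itself does not show, and `decode_extracted` is a hypothetical helper name.

```python
import codecs


def decode_extracted(data: bytes) -> str:
    """Decode word-finder output that may start with a UTF-16BE BOM."""
    if data.startswith(codecs.BOM_UTF16_BE):  # b"\xfe\xff"
        # Assumption: everything after the BOM is UTF-16 big-endian.
        return data[len(codecs.BOM_UTF16_BE):].decode("utf-16-be")
    # Without a BOM, fall back to latin-1. It never raises, but it may
    # not be the right encoding for a given document.
    return data.decode("latin-1")
```

Reading the file in binary mode and passing the bytes through this function makes it easier to compare what was extracted against what Examine Document reports.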

Export PDF Workflow with Applescript and CS3

Hello,
I am setting up some PDF workflow with Applescript.
At a given point, after the script has collected the user's answers in some dialogs, it tells InDesign CS3 to open the Export Adobe PDF window for the current document. Here is that small part of the script:
tell application "Adobe InDesign CS3"
    tell document 1
        export format PDF type to "Macintosh_HD:Test01.pdf" using "somePreset" with showing options
    end tell
end tell
When you run this small part of my Applescript, InDesign opens the Export Adobe PDF window (as expected) waiting for me to click on "Export". That is exactly what I want, since the user is given here a last opportunity to change some values (for example page range, or spreads). When all is set, the user can click on Export to close the dialog and finish the script.
Problem: I was hoping that the Adobe PDF Preset "somePreset" would be selected in the first pull-down menu of the Export Adobe PDF window when the script opens it. Unfortunately, the last-used preset is always selected by default. Any suggestions or help?
Kind regards,
Bertus Bolknak.

My operators enter the page range and filename into a dialog box. Then I set those in the script. I start from the Press Quality preset and then put the changes I want into an export variable. I set things like bleed, marks, page range, etc.
Here is an example:
set theProps to properties of PDF export preset "[Press Quality]"
try
    delete PDF export preset "Schmidt PDF"
end try
set theStyle to {name:"Schmidt PDF", acrobat compatibility:acrobat 7, bleed top:"0.125i", bleed bottom:"0.125i", bleed inside:"0.125i", bleed outside:"0.125i", page marks offset:"0.125i", include ICC profiles:Include None, effective PDF destination profile:use no profile, effective PDF X profile:"No Color Conversion"} & theProps
make PDF export preset with properties theStyle
set properties of PDF export preferences to theStyle
set color bitmap sampling of PDF export preferences to none
set grayscale bitmap sampling of PDF export preferences to none
set page range of PDF export preferences to (item i of myPageList) as string
export document 1 format PDF type to (PrinergyFolder & myJobNumFinal & "_" & VerCode & ".pdf") as Unicode text without showing options
I am also doing this in Quark.

Help - cannot edit a pdf containing bridge hands

I have spent over a week trying to edit the text from a pdf containing bridge hands, similar to what you see when you look at a newspaper bridge column. I downloaded the trial version of Acrobat XI last week, intending to purchase it if I can get it to work, but I cannot. I have attached a jpg of page one. Can anyone help?

Does that PDF contain actual text, or is it a scanned image?
If you are unsure, share the actual PDF itself: http://forums.adobe.com/thread/1070933
If the document is a scanned image, you will need to perform text recognition (OCR) on it before you can edit the text.

Indesign CS3 text frame parameters and export PDF

I could use some help with the following:
I need to edit and export a large number (7000) of one-page InDesign documents:
1. check for locked text frames and unlock them
2. group all text frames
3. set the grouped text frames at x = 8 millimeters, y = 10 millimeters
4. export the documents to PDF in a subfolder called "Out"
5. save and close the documents in the same subfolder as InDesign CS3 documents (the originals are CS2)
This is what I've been trying so far. Known problems:
It only checks one text frame; it should handle all text frames
"close document 1 saving yes" doesn't work because the originals are from CS2
I get a PDF called "Adobe InDesign CS3"
set processFolder to choose folder with prompt "Choose a folder that contains Innd Docs to process"
tell application "Finder"
    if not (exists folder "OUT" of processFolder) then
        make new folder at processFolder with properties {name:"OUT"}
    end if
    set the destination_folder to folder "OUT" of processFolder as alias
end tell
tell application "Finder"
    try
        set listFiles to (files of contents of processFolder) as alias list
    on error
        set listFiles to (files of contents of processFolder) as alias as list
    end try
    repeat with thisFile in listFiles
        tell application "Adobe InDesign CS3"
            with timeout of 120 seconds
                activate
                set properties of view preferences to {horizontal measurement units:millimeters, vertical measurement units:millimeters, ruler origin:page origin}
                open thisFile
                set myDoc to document 1
                set docName to name
                tell myDoc
                    set transform reference point of layout window 1 to top left anchor
                    set myBox to text frame 1 of page 1
                    set properties of myBox to {locked:false}
                    move myBox to {8, 10}
                end tell
                export document 1 format PDF type to (destination_folder as string) & docName & ".pdf" using PDF export preset "[Drukwerkkwaliteit]" without showing options
                close document 1 saving yes
                tell application "Finder" to move thisFile to destination_folder with replacing
            end timeout
        end tell
    end repeat
end tell
end
end
Any help is greatly appreciated - Doing this manually is a lot of work!!!!!
Peter

You're asking for the name of the application, not the document -- you need to do it after your "tell myDoc". You could also get the name from the alias you open, rather than the open document, something like:
repeat with thisFile in listFiles
    set oldDelims to AppleScript's text item delimiters
    set AppleScript's text item delimiters to {":"}
    set docName to text item -1 of (thisFile as Unicode text)
    set AppleScript's text item delimiters to oldDelims
You should also move your "set properties of view preferences" line to after your "tell myDoc".
-- Shane Stanley

Trying to OCR pdf, pdf says it can't perform bc it already contains renderable text-but does not.

I work for a large agency, and we receive PDF's all the time. 98% of the time I am able to OCR a document with no issues. Just recently I have come across this issue several times, and was wondering if anyone can solve this irritating problem!
Acrobat 8.1: When I go to OCR the document, I receive the following message: "Acrobat could not perform recognition (OCR) on this page because this page already contains renderable text." However, it does not. When you go to select text or search for anything, the whole page is selected (as if it's still in a "picture" format, not a document format that you can search, etc.).
I am not sure whether it is how the other party originally creates or uploads the document that causes this, but the only workaround I have found is to print out the entire document, scan it, and then OCR the scan, which works just fine. The problem is that if the document is 400 pages or so, this is a huge waste of time and money just to be able to search the PDF.
I have also checked the PDF properties to see if this is some sort of permissions issue, and there are no permissions/security settings in place.
PLEASE HELP! Any assistance in this matter would save me a lot of time (and, of course, my sanity!).
Thank you in advance!

While the alert speaks of "renderable text", that is a simplification. The issue is that the PDF page content contains at least one renderable "character".
Look at font families - you will observe that there are many characters that are not "text" characters (i.e., linguistic characters).
So, there's a "renderable character" present. It may be an alphanumeric character with a font color the same as the page background. It may be under the image and thus not visible to the eye.
You might be able to determine just what is present.
You could export the page of interest to a text file then view that file.
You could display the page of interest in Acrobat Pro, then open the Content panel to view the content tree.
Locate and click on the page number for the page of interest.
From the Content panel's Options menu select "Highlight Content".
Walk down the tree. Select the content containers in turn and observe what is highlighted on the PDF page.
Where might the renderable character come from? Typically it is introduced by something in the workflow.
Not always easy to find so don't take anything in the work flow for granted.
Be well...
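
For the "export the page of interest to a text file then view that file" step, a short script can make stray characters easier to spot than a text editor would. The sketch below is only an illustration under stated assumptions: it takes the exported text as a string and lists every non-whitespace character with its Unicode name. `list_renderable_chars` is a hypothetical name, and nothing here uses an Acrobat API.

```python
import unicodedata
from collections import Counter


def list_renderable_chars(text: str) -> list[tuple[str, str, int]]:
    """Return (codepoint, name, count) for each non-whitespace character."""
    counts = Counter(ch for ch in text if not ch.isspace())
    report = []
    for ch, n in sorted(counts.items()):
        # Characters without a Unicode name (e.g. control codes) get a placeholder.
        name = unicodedata.name(ch, "<unnamed>")
        report.append((f"U+{ord(ch):04X}", name, n))
    return report
```

If the report for a supposedly image-only page comes back with a single bullet, period, or symbol character, that is a likely candidate for the "renderable character" blocking OCR. The encoding to use when reading the exported file depends on how it was saved.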

Export PDF as a container with images

Hello, I would like to export a PDF with some JPGs inside. I want those images to be portable, as with JPGs in an HTML page, where you can save them to your desktop individually. How can I do this? Basically, the PDF would work as a container of text and JPG images.
thanks very much for your time

That's not how PDFs are designed to work. They are designed to be a final output for print or for viewing on screen. To extract the images you could use Document Processing > Extract All Images, but this would only work in a full version of Acrobat.

How to export all images that don't contain editable text

Hi guys,
Is there any hack, script or general function to export all images from a PDF, excluding those that contain editable text?
For instance, I have a PDF that contains some scanned documents, and these need exporting. I also have some tables in there that have editable text in them, and these are being exported too when I select Export All Images.
Can anyone help?
Cheers!

Search for a word or words in them as a group. Search for .pdf file type and containing....
Use a common word that they all might contain such as "a" or "the", etc.
The files that have been OCR'd will show up in the search. This won't catch 100% of them, but it will allow you to separate most of them.

Anti-aliased text when exporting PDF to image

I need to be able to batch-convert multi-page PDFs to individual bitmap images (one image for each page) with anti-aliased text.
Photoshop works this way if you open a single PDF, allowing you to select one or more pages to rasterize as separate images, but not when batch processing (specifically, if you use the Image Processor script on a folder of images and PDFs, it will rasterize the PDFs automatically, but it will only do so with the first page of each PDF.)
Acrobat, on the other hand, automatically creates an image for each page when exporting, and can do this in a batch sequence, but the text is not anti-aliased, making the image look like a screenshot from 1997. No matter how high an image resolution you select, the text is still jagged when you zoom in.
So, is there a setting I'm missing that will allow the text to be anti-aliased when using Acrobat to export PDF to an image? I am using Acrobat 8, not 9, so something might have changed in the newest version.

Not sure about Acrobat Pro 8, but in Acrobat Pro 9 (not Extended) you can Export>Image>Multiple Choice: JPEG, JPEG2000, PNG, TIFF. I used JPEG and under the options in the export dialog box, leave the filename as is to coincide with the PDF filename and then choose Maximum Resolution under File Settings: Grayscale (JPEG, Quality:Maximum); Color (JPEG, Quality:Maximum) . . . skip down to Conversion Colorspace: Determine Automatically and Resolution choose 600pixels/inch for a letter size document. This will result in a file size of 1.3MB per JPEG image if there is not a lot of information on the page. I chose a simple header, footer with page numbering, and 5 lines of Lorem Ipsum text. 600dpi is overkill, you can go for 300dpi and still result in a decent image that will be able to be printed on a laser photocopier that is connected to a production computer. Obviously if you are printing to a laser printer or a high quality inkjet 300dpi will suffice as well for a letter sized document. But I have been told that 300dpi is not a standard rule of thumb and you must obtain specs from your printer since he/she can calculate by very strict rules the dpi you need for your content. It depends on whether you have background images such as watermarks and also if your text body contains line-art.
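
To put the 600 versus 300 pixels/inch choice above in concrete terms, the pixel dimensions of the exported image follow directly from the page size. The short calculation below is a sketch that assumes a US Letter page of 8.5 x 11 inches; `pixel_dimensions` is a made-up helper name, and the calculation says nothing about the resulting file size, which depends on the page content and JPEG compression.

```python
def pixel_dimensions(width_in: float, height_in: float, ppi: int) -> tuple[int, int]:
    """Pixel size of a page rasterized at the given pixels per inch."""
    return round(width_in * ppi), round(height_in * ppi)


# US Letter is 8.5 x 11 inches.
for ppi in (300, 600):
    w, h = pixel_dimensions(8.5, 11, ppi)
    print(f"{ppi} ppi: {w} x {h} px ({w * h / 1e6:.1f} megapixels)")
```

Doubling the resolution quadruples the pixel count (2550 x 3300 at 300 ppi versus 5100 x 6600 at 600 ppi), which is why 600 is often more than a letter-size page needs.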

Acrobat Pro 8.1.2 crashes when displaying pdf made from Word doc containing url text

Hello,
I and a few others in my workplace have a similar issue.
Acrobat crashes when they scroll through and view a pdf file that was created from Word. The Word document contains url text in its header to the effect of "visit us at www.blah.com ." They're running Acrobat Pro 8.1.2 and Office 2K Pro SR-1 on XP Pro SP2.
Even though the link is not an active link (i.e., clicking on it does not open the web page), if I convert a document containing this header to PDF, Acrobat will not allow me to manipulate the file and will crash. When I take the same document, remove the web address, and then convert from Word to PDF, I experience no problems with Acrobat (note that this is the identical file, but with the web address removed). This does not explain why Acrobat will work for a while even with the web address in the header, then stop working. But it does appear to fix my problem with creating a PDF from Word, then manipulating it further.
Does anyone know how to allow for this text to exist in the header and still have a stable pdf file/acrobat behavior?
The pdf is attached.
Thanks....

To sum up, I've found that if a URL beginning with www is in the Word document when it is converted to PDF, Acrobat will crash when viewing that PDF. If the URL is deleted from the Word document before creating the PDF, Acrobat will not crash when viewing that PDF.
