Unicode text in PDF

I have gone through some threads about Japanese/Chinese text in PDF.
My application creates PDF files and stores its text in Unicode. Currently the text (for Tj) is converted to char*, which is locale-dependent. I need to store text in the PDF in a locale-independent way.
I am using an 'embedded' type 1 (TTF) font in the above example. The result is that a PDF created on an Eng locale gives '?' for (each) *** text, while a PDF created on a *** locale gives correct results (*How*).
From PDF Ref 3.8.1 Text Strings, I understand that text can be stored in UTF-16BE. I tried it, but boxes (the default char) appear in Adobe Reader 6.0, which means each byte is treated separately. But as I said before, for a PDF created on a *** locale (even though 1 character is 2 bytes) the characters are read correctly.
This is the situation. Now I am confused about some aspects of the specification:
- Do I have to use a 'type 0' font object (even if I am embedding a simple TTF) to display text beyond 256 character codes?
- Why are 2 bytes per character read for text created on a *** locale and not when I create text in UTF-16BE?
Thanks for your help,
Sameer

>From PDF Ref 3.8.1 Text Strings, I understand that text can be stored in UTF-16BE.
Many people have read this and made a big assumption that is not
valid.
Text strings are a particular type that is used in particular cases.
For example, they are used for bookmarks, which can indeed be
UTF-16BE. Nowhere does it say that this text string type works for
page contents.
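
To make the distinction concrete, here is a minimal Python sketch of how a Unicode text string (the kind used for a bookmark title) is encoded: a byte order mark, FE FF, followed by UTF-16BE. The function names are hypothetical helpers for illustration, not part of any PDF library.

```python
def pdf_text_string(value: str) -> bytes:
    """Encode a string as a Unicode PDF text string: BOM (FE FF) + UTF-16BE."""
    return b"\xfe\xff" + value.encode("utf-16-be")


def as_hex_literal(data: bytes) -> str:
    """Format bytes as a PDF hexadecimal string literal, e.g. <FEFF...>."""
    return "<" + data.hex().upper() + ">"


# Suitable for something like an outline (bookmark) /Title entry,
# not for the operand of Tj in a page content stream.
title = pdf_text_string("日本")
print(as_hex_literal(title))  # <FEFF65E5672C>
```

The same bytes placed in a content stream are not decoded as UTF-16BE; there they are interpreted according to the current font's encoding.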
>
>- Do I have to use a 'type 0' font object (even if I am embedding a simple TTF) to display text beyond 256 character codes?
Absolutely. A /Type1 or /TrueType font is by definition a single byte
font, and the rules for Encoding are followed exactly as described.
There is no two byte escape.
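
A small Python sketch of why UTF-16BE bytes come out as boxes with a simple font: the viewer takes one character code per byte, so a two-character string becomes four codes. The helper `split_codes` is a hypothetical name used only for this illustration. The two-byte case assumes a composite (/Type0) font whose CMap defines two-byte codes; those codes select CIDs through the CMap and are not Unicode values unless the CMap maps them that way.

```python
def split_codes(data: bytes, bytes_per_code: int) -> list:
    """Split string bytes into fixed-width character codes (big-endian)."""
    return [
        int.from_bytes(data[i:i + bytes_per_code], "big")
        for i in range(0, len(data), bytes_per_code)
    ]


utf16 = "日本".encode("utf-16-be")  # 4 bytes: 65 E5 67 2C

# Simple font (/Type1 or /TrueType): every byte is its own character code.
print(split_codes(utf16, 1))  # [101, 229, 103, 44]

# Composite (/Type0) font with a two-byte CMap: two bytes form one code.
print([hex(c) for c in split_codes(utf16, 2)])  # ['0x65e5', '0x672c']
```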
If you wanted to use only 256 characters FROM a large font this is
possible; you could break your Japanese font down into multiple
embedded subsets, each of less than 256 characters.
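
The subset idea can be sketched roughly as follows, in Python. This is only an illustration under assumptions: the function names are hypothetical, the limit of 255 codes per subset and the choice to start codes at 1 are arbitrary, and actually embedding each subset font with a matching encoding is not shown.

```python
def build_subsets(text, subset_size=255):
    """Give each distinct character a (subset index, one-byte code) pair."""
    mapping = {}
    for ch in text:
        if ch not in mapping:
            n = len(mapping)
            # Codes start at 1 so that code 0 stays unused (arbitrary choice).
            mapping[ch] = (n // subset_size, n % subset_size + 1)
    return mapping


def split_into_runs(text, mapping):
    """Group consecutive characters from the same subset into one byte run."""
    runs = []
    for ch in text:
        subset, code = mapping[ch]
        if runs and runs[-1][0] == subset:
            runs[-1][1].append(code)
        else:
            runs.append((subset, bytearray([code])))
    return [(subset, bytes(codes)) for subset, codes in runs]


text = "日本語の日本"
mapping = build_subsets(text, subset_size=2)  # tiny size to force two subsets
print(split_into_runs(text, mapping))
# [(0, b'\x01\x02'), (1, b'\x01\x02'), (0, b'\x01\x02')]
```

Each run would then be written by selecting the corresponding subset font and showing the run's bytes with Tj, so every individual font stays within the single-byte limit.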
>
>- Why are 2 bytes per character read for text created on a *** locale and not when I create text in UTF-16BE?
You mean this sometimes seems to work? Suggests a bug.
Aandi Inston

Similar Messages

  • How to write a unicode text in pdf file

    Dear Friends,
    I am a beginner in Acrobat PDF plug-in development. I am trying to write Unicode text (Tamil) into a PDF file.
    Using the same API I am able to write English text in fonts such as Times-Roman and Arial, but I am not able to write Tamil text.
    The code is as below:
            memset(&pdeFontAttrs, 0, sizeof(pdeFontAttrs));
            pdeFontAttrs.name = ASAtomFromString("Latha");
            pdeFontAttrs.type = ASAtomFromString("TrueType");
            /* Look up the system font and create an embedded PDEFont from it. */
            pdeFont = PDEFontCreateFromSysFont(
                            PDFindSysFont(&pdeFontAttrs, sizeof(pdeFontAttrs), 0),
                            kPDEFontCreateEmbedded);
            pdeText = PDETextCreate();
            /* Add the contents of buffer as a single text run. */
            PDETextAdd(pdeText, kPDETextRun, 0, (ASUInt8 *)buffer, _tcslen(buffer),
                       pdeFont, &gState, sizeof(gState), NULL, 0, &textMatrix, NULL);
            PDEContentAddElem(pdeContent, kPDEAfterLast, (PDEElement)pdeText);
            /* Write the modified content back to the page, then release it. */
            PDPageSetPDEContent(pdPage, gExtensionID);
            PDPageReleasePDEContent(pdPage, gExtensionID);
    Kindly assume that PDEGraphicsState and PDETextMatrix are set properly; I am not pasting the entire code, to avoid complexity.
    Thank you,
    Safiq

    Dear lrosenth,
    I went through some code and suggestions on the internet and found that I need a CMap file and a CID font file for the respective font, since PDF doesn't support Unicode fonts directly.
    Can you tell me where I can get the CMap file and CID font file for the Tamil font Latha (a Microsoft TrueType font)?
    Regards,
    Safiq

  • Automatically copy only the unicode-Text from a Word-Document into FM8

    In my daily work I often need to copy and paste text from a Word document or other documents into my FrameMaker documents.
    The common way is to copy it in Word and "Special Paste" it in FrameMaker 8 as Unicode. But this is not as comfortable as quickly hitting Ctrl-v on the keyboard.
    I would like to have a new menu item that pastes only the Unicode text from the clipboard. Is there a way to make such a macro and put it in the menu bar with a keyboard shortcut?
    I read something like that in http://www.rzg.mpg.de/from_external/Frame6_Handbuch/Setting_up_FrameMaker.pdf (German), but I don't actually know what to change for my case.

    Rather than create a new menu item (possible, I gather, but fiddly) I would recommend changing the default “paste” behaviour.
    In the MAKER.INI file there is a line beginning
    ClipboardFormatsPriorities=
    Change this to put UNICODE TEXT as the first item.
    Ctrl-v will then paste Unicode text by default. You can still access the other options, should you need to, by using Paste Special.

  • Paste Special - unicode text equivalent

    When using Excel, I could copy tables from PDFs or Web Pages, then use "Paste Special" and "unicode text" to paste into a new document. By doing this, all data would go into appropriate columns. I can't figure out how to do this type of copy/pasting with Numbers.

    (1) You are in a forum dedicated to Numbers, so Excel's behavior doesn't matter here!
    (2) *_Help & Terms of Use_*, which everyone is supposed to read before asking here, says that we must first search existing threads to see whether the question has already been answered.
    Yours was answered several times.
    In short: use my good old huge AppleScript
    set the clipboard to the clipboard as text
    Yvan KOENIG (VALLAURIS, France) Thursday, 3 February 2011 22:38:47

  • How to change text in PDF doc. which is a musical score

    Hello,
    I'm new here, so please excuse me if I do or say something I shouldn't.
    I need to change the words in a musical score because the font is too small. OCR recognition doesn't work because there are illustrations that are different from images or text... Is there a way to get in there and make the changes I need to make?
    Any help greatly appreciated.

    Thanks for the reply, but I have Adobe Reader 9 Pro. Will it still not work?
    On 29 Sept. 2011 at 16:09, Claudio González wrote:
    > Unfortunately, not with the free Reader.

  • How can I digitalize a document text in PDF and export it to WORD?

    Anthony
    How can I digitalize a document text in PDF and export it to WORD?

    If you already have a PDF document, ExportPDF can help you with this task. https://www.acrobat.com/exportpdf/en/convert-pdf-to-word.html
    On the other hand, if you have a physical document, you'll need to scan it into a PDF document first.
    Depending on what you need to do you may require different tools & services, so please help us out with more details.
    Vlad

  • Changing the seeded rdf report output from text to PDF

    Hi All,
    I am trying to change the output of the seeded report "Print Requisition Report" from text to PDF. I changed the output format in the concurrent program definition from text to PDF. The report is now displayed in PDF format, but the alignment of the fields has changed. The output does not print as it did in the text output format. It looks very sloppy, with uneven fonts and too much text wrapping.
    I tried some formatting, and the fields are now displayed but not in a proper format. I am not able to change the font style in Report Builder 6i; even if I change it, the report prints in the same font style with uneven font sizes. I think reducing the font size might help, but I am unable to change it. We are on 11.5.10.2. Please provide your valuable suggestions on how to proceed.
    Thanks
    Sarvesh

    Hi Hussein,
    Thanks for the information. After the adjustments to the layout and the font change, the report is now displayed without any clipping when I click View Output. But when the same report is printed, some characters on the left and right sides are clipped. Please help me solve this issue.
    Thanks,
    Sarvesh

  • How to read/extract text from pdf

    Respected All,
    I want to read/extract text from a PDF. I tried using etymon but did not succeed.
    Could anyone guide me on this?
    Thanks and regards,
    Ajay.

    Thank you very much Abhilshit, PDFBox works for reading pdf.
    Regards,
    Ajay.

  • How to upload a text in .pdf to convert it?

    How to upload a text in .pdf to convert it?

  • How to change background color of text in pdf based by font name

    Hi
    How can I change the background color of text in a PDF based on font name? Is there any option in JavaScript? E.g., if the PDF contains the ARIAL font, the background color of the ARIAL text needs to be changed to red on all pages, and likewise for every other font in the PDF, each with a different color.
    Thanks in Advance

    Hi
    1) Is there any way to highlight text with a different color per font using JavaScript?
    2) How can I list the fonts used in a PDF using JavaScript?
    3) How can I highlight text using JavaScript?
    Thanks in Advance

  • Why can't I "Save as Text" a pdf file received as an email attachment?

    I can "Save as text" a PDF file that I have created on my own computer (that is, it goes into MS notebook, which I can then copy and save as an MS Word file), but not when I receive a PDF as an email attachment. (The file is saved, but it is empty.) Why would I want to convert my own PDF back to text? Well, in case I no longer have the original Word document, I suppose, but the thing is that "Save as text" works with my PDF and not with those I receive from others. How come? Thanks!

    Is this a scanned PDF? If so, it must first be OCR'd.

  • Randomly Missing Text in PDF Created from FrameMaker

    This problem relates to a structured FM document, but I suspect it might be a general issue and have posted it here in the general forum for that reason.
    I am generating PDFs that are missing text somewhat randomly throughout. I tried searching the forum for solutions, but none of the suggested fixes worked and none of the posts specifically addressed the issue I am experiencing.
    I am working in structured FM. The templates we use were originally created for FM8. We use both FM8 and FM10 in our work group. We are able to duplicate the same problem in both versions and on multiple computers.
    I thought I had narrowed the problem down to certain paragraph formatting, since it only occurs in three or four paragraph formats (a bullet list, table text, etc.). Garden-variety formatting. But in most places in the document, these formats appear perfectly. The strangest occurrence is a single instance where the page number is missing from the footer.
    I thought it might be a font issue, as I've had similar issues in the past. I had a missing font warning in the console, but I am pretty sure that this has nothing to do with it, since they are fonts we are not using and all the other text from the same formats appears.
    I tried turning off "Remember Missing Font Names" in preferences. No help.
    I checked that the fonts are in the local directory and appear as embedded subsets in the PDF.
    I also tried checking and unchecking the "Rely on system fonts only; do not use document fonts" option in the PDF output settings. Also no help.
    The randomness of the missing fonts bewilders me and I've exhausted my own troubleshooting abilities. I would be happy to share a source file if anyone thinks they could help me that way.
    Thanks in advance,
    Douglas

    There is a known bug in Windows XP that causes random dropped text in
    PDF. The hotfix is here, though the link does not seem to be working at
    the moment:
    http://support.microsoft.com/?id=952909
    However, the above link directs you to a download link that is here:
    http://support.microsoft.com/Hotfix/KBHotfix.aspx?kbnum=952909&kbln=en-us

  • How to Extract the Highlight Text in PDF File

    Hi Scripters,
    I want to know how to extract the highlighted text in PDF files and save it in text-only format (*.txt).
    regards
    baby

    Hi,
    Okay, I'll try my best.
    thanks for your reply.
    Regards
    Baby

  • Paid to allow edit text in PDF, and not working

    I wanted to 'edit' text in a PDF and followed the instructions; it told me to subscribe and pay to do this. I did subscribe and pay, but the functionality is still not working, and it keeps directing me to subscribe and pay.

    Hi rrobati,
    I checked your account; your Export PDF subscription is not confirmed yet at our end.
    Once it gets confirmed you will be able to use it hassle free.
    Regards,
    Florence

  • Search text in PDF and MS Word document

    Can anybody tell me how to search text in PDF and MS Word documents through Java code? Does anybody have code or any suggestion to give?
    Thank You
    Adnan

    > Can anybody tell me how to search text in PDF
    > and MS Word documents through Java code? Does
    > anybody have code or any suggestion to give?
    Yes.
    First, you need to work out how to read each document type from Java.
    E.g, for MS Word you could use Apache Jakarta POI - HWPF: http://jakarta.apache.org/poi/hwpf/index.html
    Then, you use Apache Lucene to index and search.
    See http://lucene.apache.org/java/docs/index.html
    ~D
