UTF-8 Unicode output

Hi all,
I'm at the end of my resources (and those are very limited where character sets are concerned).
Specifically, I'm having trouble with Unicode character U+2628 (http://www.fileformat.info/info/unicode/char/2628/index.htm).
The SOAPMessage response I'm receiving is fine.
SOAPMessage response = getResponse();
response.writeTo(System.out); // Works fine. Outputs it correctly.
Where I'm having trouble is reading the data from the attachment itself. I've tried a few hundred different things to no avail.
Iterator attItr = response.getAttachments();
AttachmentPart attPart = (AttachmentPart) attItr.next();
String content = new String(attPart.getRawContentBytes(), "UTF-8"); // Doesn't work. Gibberish.
String contentTo = (String) attPart.getContent(); // Doesn't work either. Gibberish as well.
I've tried a few other things, but I'm really stuck.
Any help would be greatly appreciated.
Thanks.

You may be able to find a text editor that can do the conversion. Alternatively, I have converted from one encoding to another programmatically in Java as well.
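
If you go the programmatic route, here is a minimal diagnostic sketch, not prescribed SAAJ usage: it assumes the attachment really is UTF-8 text and simply dumps the raw bytes so you can see what you actually received. For U+2628 you should see the three-byte UTF-8 sequence E2 98 A8.

import javax.xml.soap.AttachmentPart;
import javax.xml.soap.SOAPException;
import java.nio.charset.StandardCharsets;

public class AttachmentDump {
    // Sketch only: dump the attachment's raw bytes, then decode them explicitly as UTF-8.
    static String decode(AttachmentPart attPart) throws SOAPException {
        byte[] raw = attPart.getRawContentBytes();
        for (byte b : raw) {
            System.out.printf("%02X ", b);            // inspect the bytes actually received
        }
        System.out.println();
        System.out.println(attPart.getContentType()); // does it declare charset=utf-8?
        return new String(raw, StandardCharsets.UTF_8);
    }
}

If the dump shows something other than E2 98 A8 where U+2628 should be, the attachment was never UTF-8 to begin with, and the gibberish comes from decoding with the wrong charset.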
Tim Tow
Applied OLAP, Inc

Similar Messages

  • Robocopy unicode output gibberish log file

    When I use the unicode option for a log file, or even redirect unicode output from robocopy, and then try to open the resulting file in notepad.exe or whatever, it just looks like gibberish. How can I make this work, or when will Microsoft fix it? Since Microsoft put that option in there, one supposes that it works with something. What is the expected usage for this option?
    Yes, I have file names with non-ASCII characters and I want to be able to view them correctly. Without unicode support, robocopy just converts such characters into a '?'. It does, however, actually copy the files over correctly, thankfully. I have tried running robocopy from PowerShell and from cmd /u. Neither makes any difference. Also, one odd thing: if I use the /unicode switch, the output to screen does properly show the non-ASCII characters, except that it doesn't show all of them, such as the oe ligature used in French 'œ'. That was just converted into an 'o' (not even an 'oe' as is usually the case). Again, it does properly make a copy of the file. This just makes it not quite possible to search log results.
    Let's see if this post has those non-ASCII characters transmuted when it gets posted, even though everything looks fine as I type it. âéèïöùœ☺♥♪

    Use /UNILOG:logfile instead of /LOG:logfile.
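
    Building on that: the /UNILOG output is UTF-16, which is why Notepad can show gibberish when it guesses the encoding wrong. A minimal Java sketch for re-reading it, under the assumption that the log is UTF-16 little-endian (typical on Windows) and with a placeholder file name:

    import java.io.BufferedReader;
    import java.io.FileInputStream;
    import java.io.IOException;
    import java.io.InputStreamReader;

    public class ReadUniLog {
        public static void main(String[] args) throws IOException {
            // robocopy.log is a placeholder path for the /UNILOG output file
            try (BufferedReader in = new BufferedReader(new InputStreamReader(
                    new FileInputStream("robocopy.log"), "UTF-16LE"))) {
                String line;
                while ((line = in.readLine()) != null) {
                    System.out.println(line); // non-ASCII file names survive intact
                }
            }
        }
    }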

  • Conversion utf-16 unicode - ASCII

    Hello,
    I read a UTF-16 file, and the text that comes back is displayed like this:
    yp<#?#x#m#l# #v#e#r#s#i#o#n#=#"#1#.#0#"
    How can I convert the text? I haven't found any functions to do that.

    Here's some QDAC (Quick & Dirty ABAP Code) to convert a UTF-16 file on the application server to a file in the SAP default codepage.
    Tested on a non-Unicode 6.40 system.
    data: gt_line type string occurs 0 with header line,
          gv_bin type xstring,
          conv_obj type ref to cl_abap_conv_in_ce.
    parameters: in_file type rlgrap-filename,
                out_file type rlgrap-filename.
    start-of-selection.
      check not in_file is initial.
      open dataset in_file for input in binary mode.
      do.
        read dataset in_file into gv_bin.
        if sy-subrc ne 0.
          close dataset in_file.
          exit.
        endif.
        if sy-index eq 1.
          perform create_conv_obj.
        endif.
        try.
            call method conv_obj->convert
              exporting
                input = gv_bin
                n     = -1
              importing
                data  = gt_line.
          catch cx_sy_conversion_codepage .
          catch cx_sy_codepage_converter_init .
          catch cx_parameter_invalid_type .
        endtry.
        append gt_line.
      enddo.
      check not out_file is initial.
      open dataset out_file for output in binary mode.
      loop at gt_line.
        transfer gt_line to out_file.
      endloop.
      close dataset out_file.
    *&      Form  create_conv_obj
    form create_conv_obj.
      data: lv_bom(2) type x,
            lv_encoding type abap_encod,
            lv_endian type abap_endia.
      lv_bom = gv_bin.
      if lv_bom eq 'FFFE'.
        lv_encoding = '4103'.          "code page for UTF-16LE
        lv_endian = 'L'.
        shift gv_bin left by 2 places in byte mode.
      elseif lv_bom eq 'FEFF'.
        lv_encoding = '4102'.          "code page for UTF-16BE
        lv_endian = 'B'.
        shift gv_bin left by 2 places in byte mode.
      else.
      message 'Byte order mark not found at the beginning of the file'
                 type 'E'.
      endif.
      try.
          call method cl_abap_conv_in_ce=>create
            exporting
              encoding    = lv_encoding
              endian      = lv_endian
              replacement = '#'
            receiving
              conv        = conv_obj.
        catch cx_parameter_invalid_range .
        catch cx_sy_codepage_converter_init .
      endtry.
    endform.                    "create_conv_obj
    Regards
    Sridhar

  • Internationalization/Unicode output from Servlets...


    After some help from others on this board, I was able to accomplish my goal of outputting Korean and Japanese characters to an IE5 browser from within a WLS v4.5.1-Service-Pack-1 servlet.
    I have been able to output and properly display both Korean and Japanese characters from within a servlet running on WLS 4.5.1 using the 8-bit Unicode character set (UTF-8).
    Summary: straight text output, FreeMarker, and WebMacro were all able to handle this simple example. I don't know what will happen if you try to use WebMacro's more advanced reflection capabilities, but I will try to test that also.
    First, you'll have to install the 'Japanese Language Support' and 'Korean Language Support' for IE5 (obtained from the Windows Update site). This should only take a few minutes.
    Now try the following servlet. It should output Japanese characters:
    public void service(HttpServletRequest req, HttpServletResponse res)
            throws ServletException, IOException {
        res.setContentType("text/html; charset=UTF-8");
        PrintWriter out = new PrintWriter(
                new OutputStreamWriter(res.getOutputStream(), "UTF-8"), true);
        String test = "<HTML><BODY>\u72ac\u72ac</BODY></HTML>";
        out.println(test);
        out.close();
    }

  • UTF-8 Unicode in JEditorPane doesn't work

    I do hope this is the correct forum for this question so that some forum nazi doesn't give me grief...here goes.
    I have a JEditorPane with the contentType set to "text/html; charset=UTF-8"
    I do a .setText method with this text:
    <HTML><body><font face='Arial Unicode MS' size='3'>
    Followed by some text extracted from an RSS feed (just the contents of the <description> tag)
    and then </body></html> to finish it off.
    It displays fine apart from one Unicode character; it looks like one of those 'fancy' apostrophes, so that in the word "We've" the apostrophe shows as an accented 'a' and two squares, as shown in this screenshot: Screenshot
    So does that mean that 'Arial Unicode MS' cannot display Unicode as the name would suggest, or am I doing something else wrong?

    When you specify the charset in the contentType setting, you're telling the JEditorPane how to convert raw bytes that it reads from a URL into a Java string. That assumes you use one of the setPage() methods to populate the component, but you're using setText(), which takes a String. That means the text was corrupted before you put it in the JEditorPane; you need to look at how it's being brought in from the RSS feed. It's obviously encoded as UTF-8 but being decoded as if it were a single-byte encoding like ISO-8859-1 or windows-1252 (the default for English-locale Windows systems).
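
    To make that concrete, here is a minimal sketch of the decoding step described above, with a placeholder feed URL. Ideally you would honor the charset from the HTTP Content-Type header or the XML declaration rather than hard-coding UTF-8:

    import java.io.BufferedReader;
    import java.io.IOException;
    import java.io.InputStreamReader;
    import java.net.URL;

    public class FeedFetch {
        public static void main(String[] args) throws IOException {
            // http://example.com/feed.xml is a placeholder; decode the bytes as UTF-8 up front
            URL feed = new URL("http://example.com/feed.xml");
            StringBuilder sb = new StringBuilder();
            try (BufferedReader in = new BufferedReader(
                    new InputStreamReader(feed.openStream(), "UTF-8"))) {
                int c;
                while ((c = in.read()) != -1) {
                    sb.append((char) c);
                }
            }
            String description = sb.toString(); // correctly decoded; safe to pass to setText()
        }
    }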

  • Convert UTF-8 (Unicode) Hex to Hex Byte Sequence while reading file

    Hi all,
    When Java reads a UTF-8 character, it presents it as a hex value, e.g. in \x12AB format. How can we read the UTF-8 character as its corresponding byte sequence? (e.g. \x0905 is the hex value for a Hindi character (an Indic language), and its corresponding byte sequence is \xE0\x45\x96).
    Can the method used to read a UTF-8 character's byte sequence be used to read any other character set's byte sequence (other than UTF-8, say some proprietary font)?

    First, there's no such thing as a "UTF-8 character". UTF-8 is a character encoding that can be used to encode any character in the Unicode database.
    If you want to read the raw bytes, use an InputStream. If you want to read text that's encoded as UTF-8, wrap the InputStream in an InputStreamReader and specify UTF-8 as the encoding. If the text is in some other encoding, specify that instead of UTF-8 when you construct the InputStreamReader.
    import java.io.*;
    public class Test {
      // DEVANAGARI LETTER A (U+0905) in UTF-8 encoding
      static final byte[] source = { (byte)0xE0, (byte)0xA4, (byte)0x85 };
      public static void main(String[] args) throws Exception {
        // print raw bytes
        InputStream is = new ByteArrayInputStream(source);
        int read = -1;
        while ((read = is.read()) != -1)
          System.out.printf("0x%02X ", read);
        System.out.println();
        is.reset();
        // print character as Unicode escape
        Reader r = new InputStreamReader(is, "UTF-8");
        while ((read = r.read()) != -1)
          System.out.printf("\\u%04X ", read);
        System.out.println();
        r.close();
      }
    }
    Does that answer your question?

  • XDK support for UTF-16 Unicode

    Hi,
    Does the Oracle Java XDK, specifically the XSQL servlet and API, support UTF-16 Unicode?
    Presumably, .xsql files have to be stored and read, and queries executed, in a Unicode-compliant format for this to work. Is this currently possible?
    Thanks,
    - Manish

    If you are using XDK 9.0.1 or later with JDK 1.3, this combination supports UTF-16. XSQL inherits the support for free in this case.

  • Regd: UTF-8 Unicode Version

    Hi Gurus,
    How do I find the Unicode version used by the database? I need to know whether it is using version 1.0 or 2.0, or the like.
    Regards,
    Simma...

    jeneesh wrote:
    Efficientoracle wrote:
    hi,
    Yes, I need the Unicode version. How do I find it in our database? I am using 11.2.0.3.
    Regards,
    Simma..
    It cannot change across databases. As the documentation says, for 11g it uses Unicode version 5.0: "Oracle Database 11g, Release 1: 5.0".
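
    If it helps, a minimal JDBC sketch for checking the database character set (the connection URL and credentials are placeholders); the Unicode version then follows from the database release, per the documentation quoted above:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    public class CharsetCheck {
        public static void main(String[] args) throws Exception {
            // URL, user and password below are placeholders
            try (Connection con = DriverManager.getConnection(
                     "jdbc:oracle:thin:@//dbhost:1521/orcl", "user", "password");
                 Statement st = con.createStatement();
                 ResultSet rs = st.executeQuery(
                     "SELECT value FROM nls_database_parameters "
                   + "WHERE parameter = 'NLS_CHARACTERSET'")) {
                while (rs.next()) {
                    System.out.println("Database character set: " + rs.getString(1));
                }
            }
        }
    }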
    Cheers,
    Manik.

  • Unicode output in BEx

    I am not able to see the report in multiple languages (Unicode) in the BEx Analyzer, though I am able to see the multiple languages on the Web, as well as when I download the report to Excel.
    Please let us know what setting needs to be changed in order to make sure I see the report in multiple languages in my BEx reporting.
    thanks
    Bwinfo

    Thanks...
    Has anyone tried intervening in the VB layer in Excel to display Unicode?
    As we know, it works perfectly fine during download into Excel; can we run any macros on top of Excel to ensure it displays in Unicode?
    thanks
    harish

  • UTF-8, Unicode, XML and windows problems

    Hi there,
    I'm developing an application which uses a lot of Russian text.
    This Russian text is stored in XML and can be sent to a server remotely.
    I use the standard javax.xml libraries to parse the XML files and DBUnit's XMLWriter to generate XML strings.
    The XML returned by the DBUnit stuff is UTF-8, but it's inside a UTF-16 string.
    So when I generate XML and print it out, I get something that looks like the following:
    ��������� ������������ ���������?����
    That's OK, because I can stick that straight into a file when writing files, and it works.
    But the problem comes when sending the XML to the server.
    The sending implementation I use must be able to send Java-generated UTF-16 and XML read as UTF-8.
    So I convert the XML from UTF-8 to UTF-16 using the following:
    byte[] stringBytesISO = isoString.getBytes("ISO-8859-1");
    utf8String = new String(stringBytesISO, "UTF-8");
    And that works perfectly on my Linux system.
    However, when I run it on Windows, it only seems to convert some of the characters:
    Привычным �?ном за�?нут дороги до ве�?ны,
    Does anyone know what's going wrong here?

    jammers1987 wrote:
    I use the standard javax.xml libraries to parse the XML files and DBUnit's XMLWriter to generate XML strings.
    DbUnit is a testing tool; are you saying you're using it in a production system? Ideally, you should use the same library to write the XML as you do to read it, but you definitely shouldn't be using DbUnit in this context.
    The XML returned by the DBUnit stuff is UTF-8, but it's inside a UTF-16 string.
    That should never happen. XML is just text, and text can either be in the form of a Java string, or it can be stored externally using a text encoding like UTF-8. Never mind that Java strings use the UTF-16 encoding; you don't need to know or mention that. Encodings only come into play when you're communicating with something outside your program, like a file system or a database.
    When you generate the XML, you specify that the encoding is UTF-8. When you read the XML, you specify that the encoding is UTF-8. That's all.
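
    A minimal sketch of what "specify that the encoding is UTF-8 when you generate the XML" can look like with the standard javax.xml.transform API (the document and output path are placeholders, and this assumes you build a DOM rather than using DbUnit):

    import java.io.FileOutputStream;
    import javax.xml.transform.OutputKeys;
    import javax.xml.transform.Transformer;
    import javax.xml.transform.TransformerFactory;
    import javax.xml.transform.dom.DOMSource;
    import javax.xml.transform.stream.StreamResult;
    import org.w3c.dom.Document;

    public class WriteXml {
        // Serialize a DOM document with an explicit UTF-8 declaration.
        static void write(Document doc, String path) throws Exception {
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
            try (FileOutputStream out = new FileOutputStream(path)) {
                t.transform(new DOMSource(doc), new StreamResult(out));
            }
        }
    }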

  • File/FTP adapter, outbound channel, content conversion, UTF-8 (Unicode)?

    We would like to send "delimited" files to another application (tab-delimited, CSV, etc.; the other application does not support XML-based interfaces). Obviously we will have an outbound channel that uses the file/FTP adapter, and the data will be subjected to "content conversion".
    The data contains names in many languages; not all of this can be represented in ISO Latin-1, much less in US-ASCII. I suppose UTF-8 would work. The question is: how is this handled by the FTP protocol? (Considering that the FTP client is part of the SAP PI file/FTP adapter and the FTP server is something on the "other" machine.)

    Hi Peter,
    you can maintain the file encoding in the outbound adapter. See [Configuring the Receiver File/FTP Adapter|http://help.sap.com/saphelp_nw2004s/helpdata/en/bc/bb79d6061007419a081e58cbeaaf28/content.htm]
    For your requirements "utf-8" sounds pretty fitting.
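
    For what it's worth, a minimal Java sketch of the kind of file the receiving side should end up with once the adapter's file encoding is set to UTF-8 (the file name and record are made up); FTP then only has to transfer it in binary mode so the bytes arrive unchanged:

    import java.io.FileOutputStream;
    import java.io.IOException;
    import java.io.OutputStreamWriter;
    import java.io.Writer;

    public class DelimitedUtf8 {
        public static void main(String[] args) throws IOException {
            // names.txt and the sample record are placeholders
            try (Writer out = new OutputStreamWriter(
                    new FileOutputStream("names.txt"), "UTF-8")) {
                out.write("42\tJosé\tMüller\tŁódź\n"); // tab-delimited; safe beyond Latin-1
            }
        }
    }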
    Regards,
    Udo

  • UTF-16, unicode OK?

    Can I change an XML file's encoding (Shift-JIS) to Unicode with an XSLT transform?
    Most XSLT processors don't clearly state which encodings they support, or the generated files come out garbled.

    I tried xml.exe but it said "Failed to initialize XML parser, error 201".
    However, when I tested the same XML file with Xerces & Xalan, I got the right result.
    The Oracle parser has a bug, or doesn't support all XML formats.
    Thanx all.
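
    In principle, yes: an identity transform can re-encode the file. A minimal JAXP sketch (file names are placeholders), assuming the source document correctly declares its Shift-JIS encoding:

    import javax.xml.transform.OutputKeys;
    import javax.xml.transform.Transformer;
    import javax.xml.transform.TransformerFactory;
    import javax.xml.transform.stream.StreamResult;
    import javax.xml.transform.stream.StreamSource;

    public class ReEncode {
        public static void main(String[] args) throws Exception {
            // newTransformer() with no stylesheet gives the identity transform
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.ENCODING, "UTF-16");
            t.transform(new StreamSource("in.xml"), new StreamResult("out.xml"));
        }
    }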

  • Example font subset with non-unicode output

    Hi,
    I am searching for an example of using a TTF font as a font subset in a PDF, but without UTF-16 encoding.
    Is this even possible?
    Something similar to
    7 0 obj
    <</Type /Font /Subtype /Type0 /BaseFont /AAAAAD+Arial/Encoding /Identity-H /DescendantFonts [9 0 R] /ToUnicode 8 0 R >>
    9 0 obj
    <</Type /Font /Subtype /CIDFontType2 /BaseFont /AAAAAD+Arial /CIDSystemInfo << /Registry (Adobe) /Ordering (Identity) /Supplement 0 >> /FontDescriptor 11 0 R /W [...
    But how should the rest look?
    Thank you in advance.

    No, this is proceeding from wrong assumptions.
    For all single byte fonts, EVERY PDF construction deals with 256 values, range 0 to 255.
    Let's look at what you say. You mention two Unicode values, and you continue to think that Unicode has some importance or meaning in PDF. No, it doesn't. Perhaps your characters are U+0021 and U+FEFC but the PDF does not care, and nor should you (when writing the PDF). I emphasise: if you are thinking "I will put this Unicode value somewhere in the PDF" and you are dealing with page streams/fonts, you are almost certainly not following the spec.
    For a single byte font the mapping from the arbitrary codes in the PDF to the internal font structures has one and only one variable: the Encoding value. The Encoding value, if present (and the situations in which it can be omitted are specific and exact), is a shorthand representation for something which will always be an array of 256 names.
    Since TrueType fonts don't contain names, the interpretation of this is complicated for embedded TrueType, and you must take care to follow exactly what the spec says. Other things MAY SOMETIMES WORK without being valid, but will probably fail in different software, so stick to the spec exactly.
    There is no substitute for endless, deep and repetitive study of the PDF Reference. It will help if you can phrase your future questions in terms of "what does it mean in the specification when it says...". That will help keep you focussed. I have been working with the PDF specification for many, many years and I have found that whenever I assume anything, I am almost always wrong. Read the spec first, second and last.
    Finally, don't be discouraged if you find this hard. Font embedding IS one of the very hardest things to master about PDFs, easily adding an order of magnitude of complexity to any project.

  • Japanese characters, outputstreamwriter, unicode to utf-8

    Hello,
    I have a problem with OutputStreamWriter's encoding of Japanese characters into UTF-8. If you have any ideas, please let me know! This is what is going on:
    static public String convert2UTF8(String iso2022Str) {
       String utf8Str = "";
       try {
          // convert string to byte array stream
          ByteArrayInputStream is = new ByteArrayInputStream(iso2022Str.getBytes());
          ByteArrayOutputStream os = new ByteArrayOutputStream();
          // decode iso2022Str byte stream with iso-2022-jp
          InputStreamReader in = new InputStreamReader(is, "ISO2022JP");
          // reencode to utf-8
          OutputStreamWriter out = new OutputStreamWriter(os, "UTF-8");
          // get each character c from the input stream (will be in unicode) and write to output stream
          int c;
          while ((c = in.read()) != -1) out.write(c);
          out.flush();
          // get the utf-8 encoded output byte stream as string
          utf8Str = os.toString();
          is.close();
          os.close();
          in.close();
          out.close();
       } catch (UnsupportedEncodingException e1) {
          return e1.toString();
       } catch (IOException e2) {
          return e2.toString();
       }
       return utf8Str;
    }
    I am passing a string received from a database query to this function, and the string it returns is saved in an XML file. Opening the XML file in my browser, some Japanese characters are converted, but some, particularly hiragana characters, come up as ???. For example:
    屋台骨田家は時間目離れ拠り所那覇市矢田亜希子ナタハアサカラマ楢葉さマヤア
    shows up as this:
    屋�?�骨田家�?�時間目離れ拠り所那覇市矢田亜希�?ナタ�?アサカラマ楢葉�?�マヤア
    (sorry that's absolute nonsense in Japanese but it was just an example)
    To note:
    - i am specifying the utf-8 encoding in my xml header
    - my OS, browser, etc... everything is set to support japanese characters (to the best of my knowledge)
    Also, I ran a test with a string, looking at its characters' hex values at several points and comparing them with iso-2022-jp, unicode, and utf-8 mapping tables. Basically:
    - if I don't use this function at all and write the original iso-2022-jp string to an XML file, it IS iso-2022-jp
    - I also looked at the hex values of "c" being read from the InputStreamReader here:
    while((c=in.read())!=-1) out.write(c);
    and have verified (using a character value mapping table) that in a problem string, all characters are still being properly converted from iso-2022-jp to unicode
    - I checked another table (http://www.utf8-chartable.de/) for the unicode values received and all of them have valid mappings to a utf-8 value
    So it appears that when characters are written to the OutputStreamWriter, not all characters can be mapped from Unicode to UTF-8, even though their Unicode values are correct and there should be UTF-8 equivalents. Instead they are converted to (hex value) EF BF BD 3F EF BF BD, which from my understanding is UTF-8 for "I don't know what to do with this one".
    The characters that are not working: most hiragana (though not all) and a few kanji characters. I have yet to find a pattern/relationship between the characters that cannot be converted.
    Am I missing something? Or does someone have a clue? Oh, and I am developing in Eclipse but really don't have a clue about it beyond setting up a project, editing it and hitting build/run. Is it possible that I have missed some needed configuration?
    Thank you!!

    It's worse than that, Rene; the OP is trying to create a UTF-8 encoded string from a (supposedly) iso-2022 encoded string. The whole method would be just an expensive no-op if it weren't for this line:   utf8Str = os.toString(); That converts the (apparently valid) UTF-8 encoded byte array to a string, using the system default encoding (which seems to be iso-2022-jp, BTW). Result: garbage.
    @meggomyeggo, many people make this kind of mistake when they first start dealing with encodings and charset conversions. Until you gain a good understanding of these matters, a few rules of thumb will help steer you away from frustrating dead ends.
    * Never do charset conversions within your application. Only do them when you're communicating with an external entity like a filesystem, a socket, etc. (i.e., when you create your InputStreamReaders and OutputStreamWriters).
    * Forget that the String/byte[] conversion methods (new String(byte[]), getBytes(), etc.) exist. The same advice applies to the ByteArray[Input/Output]Stream classes.
    * You don't need to know how Java strings are encoded. All you need to know is that they always use the same encoding, so phrases like "iso-2022-jp string" or "UTF-8 string" (or even "UTF-16 string") are meaningless and misleading. Streams and byte arrays have encodings, strings do not.
    You will of course run into situations where one or more of these rules don't apply. Hopefully, by then you'll understand why they don't apply.
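
    To make that concrete, a minimal sketch of what following those rules looks like here (the file path is a placeholder): skip the byte-level round trip entirely and let the Writer do the one real conversion, String to UTF-8 bytes, at the file boundary.

    import java.io.FileOutputStream;
    import java.io.IOException;
    import java.io.OutputStreamWriter;
    import java.io.Writer;

    public class WriteXmlUtf8 {
        static void save(String xml, String path) throws IOException {
            // one conversion, at the boundary; the XML header should declare encoding="UTF-8" too
            try (Writer out = new OutputStreamWriter(new FileOutputStream(path), "UTF-8")) {
                out.write(xml);
            }
        }
    }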

  • SmartForms : Printing Non-English characters with SWIN output device?

    I have some Japanese text (entered in Unicode in SO10) and I want to print it from my SmartForms application using the SWIN device type.
    I thought SWIN converted all the components into graphics before sending them to the printer. To my surprise, the Japanese text printed out as "########".
    I even tried duplicating the SWIN output device to another output device and changing the character set to 4103 (UTF-16LE Unicode / ISO/IEC 10646), but then the print process hung.
    Does anyone know how to overcome the problem so that I can print Japanese with the SWIN output device?

