Conversion UTF-16 Unicode - ASCII

Hello,
I read a UTF-16 file. The text that is read in is displayed as this string:
yp<#?#x#m#l# #v#e#r#s#i#o#n#=#"#1#.#0#"
How can I convert the text? I haven't found functions to do that.

Here's a QDAC (Quick & Dirty ABAP Code) to convert a UTF-16 file on the application server to a file in the SAP default codepage.
Tested on a non-Unicode 6.40 system.
data: gt_line  type string occurs 0 with header line, " converted text lines
      gv_bin   type xstring,                           " raw file content
      conv_obj type ref to cl_abap_conv_in_ce.         " codepage converter
parameters: in_file type rlgrap-filename,
            out_file type rlgrap-filename.
start-of-selection.
  check not in_file is initial.
  open dataset in_file for input in binary mode.
  do.
    " in binary mode, READ DATASET into an xstring reads the remaining file content
    read dataset in_file into gv_bin.
    if sy-subrc ne 0.
      close dataset in_file.
      exit.
    endif.
    if sy-index eq 1. " first pass: determine the encoding from the BOM
      perform create_conv_obj.
    endif.
    try.
        call method conv_obj->convert
          exporting
            input = gv_bin
            n     = -1
          importing
            data  = gt_line.
      catch cx_sy_conversion_codepage .
      catch cx_sy_codepage_converter_init .
      catch cx_parameter_invalid_type .
    endtry.
    append gt_line.
  enddo.
  check not out_file is initial.
  " on a non-Unicode system the string content is already in the default
  " codepage, so writing it in binary mode yields the converted file
  open dataset out_file for output in binary mode.
  loop at gt_line.
    transfer gt_line to out_file.
  endloop.
  close dataset out_file.
*&      Form  create_conv_obj
form create_conv_obj.
  data: lv_bom(2) type x,
        lv_encoding type abap_encod,
        lv_endian type abap_endia.
  lv_bom = gv_bin. " first two bytes of the file = byte order mark
  if lv_bom eq 'FFFE'.
    lv_encoding = '4103'.          "code page for UTF-16LE
    lv_endian = 'L'.
    shift gv_bin left by 2 places in byte mode.
  elseif lv_bom eq 'FEFF'.
    lv_encoding = '4102'.          "code page for UTF-16BE
    lv_endian = 'B'.
    shift gv_bin left by 2 places in byte mode.
  else.
    message 'Byte order mark not found at the beginning of the file'
             type 'E'.
  endif.
  try.
      call method cl_abap_conv_in_ce=>create
        exporting
          encoding    = lv_encoding
          endian      = lv_endian
          replacement = '#'
        receiving
          conv        = conv_obj.
    catch cx_parameter_invalid_range .
    catch cx_sy_codepage_converter_init .
  endtry.
endform.                    "create_conv_obj
Regards
Sridhar
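
For comparison outside ABAP, here is a minimal Java sketch of the same idea (illustrative only, written for this thread rather than taken from any SAP tool): sniff the byte order mark, decode the remaining bytes as UTF-16, and re-encode in the target codepage, letting unmappable characters fall back to a replacement.

import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.*;

public class Utf16ToAscii {
    public static void main(String[] args) throws IOException {
        byte[] raw = Files.readAllBytes(Paths.get(args[0]));
        // inspect the byte order mark to pick the right decoder
        String encoding;
        if (raw.length >= 2 && (raw[0] & 0xFF) == 0xFF && (raw[1] & 0xFF) == 0xFE) {
            encoding = "UTF-16LE";
        } else if (raw.length >= 2 && (raw[0] & 0xFF) == 0xFE && (raw[1] & 0xFF) == 0xFF) {
            encoding = "UTF-16BE";
        } else {
            throw new IOException("Byte order mark not found at the beginning of the file");
        }
        // skip the 2-byte BOM, decode, then re-encode in the target charset;
        // getBytes substitutes unmappable characters ('?' for US-ASCII)
        String text = new String(raw, 2, raw.length - 2, encoding);
        Files.write(Paths.get(args[1]), text.getBytes(StandardCharsets.US_ASCII));
    }
}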

Similar Messages

  • File/FTP adapter, outbound channel, content conversion, UTF-8 (Unicode)?

    We would like to send "delimited" files to another application (tab-delimited, CSV, ... - the other application does not support XML-based interfaces). Obviously we will have an outbound channel that uses the file/FTP adapter and the data will be subjected to "content conversion".
    The data contains names in many languages; not all of this can be represented in ISO Latin-1, much less in US-ASCII. I suppose UTF-8 would work. The question is: how is this handled by the FTP protocol? (considering that the FTP client is part of the SAP PI file/FTP adapter and the FTP server is something on the "other" machine)

    Hi Peter,
    you can maintain the file encoding in the outbound adapter. See [Configuring the Receiver File/FTP Adapter|http://help.sap.com/saphelp_nw2004s/helpdata/en/bc/bb79d6061007419a081e58cbeaaf28/content.htm]
    For your requirements "utf-8" sounds pretty fitting.
    Regards,
    Udo

  • File_To_File: UTF-8 to ASCII format conversion.

Hi Experts,
I got a requirement for a File_To_File scenario: the source file is in UTF-8 format and we need to convert it into ASCII format. No mapping is required in this one, so can you please help me out? We are using PI 7.0 with SP 21.
    Regards,
    Prabhakar.A

in the communication channel, define ASCII as the encoding.
    Processing Tab Page
    Processing Parameters
       File Type
    Specify the document data type.
    ○       Binary
    ○       Text
    Under File Encoding, specify a code page.
    The default setting is to use the system code page that is specific to the configuration of the installed operating system. The file content is converted to the UTF-8 code page before it is sent.
    Permitted values for the code page are the existing Charsets of the Java runtime. According to the SUN specification for the Java runtime, at least the following standard character sets must be supported:
    ■       US-ASCII
    Seven-bit ASCII, also known as ISO646-US, or Basic Latin block of the Unicode character set
    ■       ISO-8859-1
    ISO character set for Western European languages (Latin Alphabet No. 1), also known as ISO-LATIN-1
    ■       UTF-8
    8-bit Unicode character format
    ■       UTF-16BE
    16-bit Unicode character format, big-endian byte order
    ■       UTF-16LE
    16-bit Unicode character format, little-endian byte order
    ■       UTF-16
16-bit Unicode character format, byte order identified by an optional byte-order mark
    Note
    Check which other character sets are supported in the documentation for your Java runtime implementation.
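Since the permitted values are simply the charsets known to the Java runtime, a few lines of Java will list exactly what your runtime supports (a quick diagnostic sketch):

import java.nio.charset.Charset;

public class ListCharsets {
    public static void main(String[] args) {
        // availableCharsets() returns a sorted map of canonical name -> Charset
        Charset.availableCharsets().keySet().forEach(System.out::println);
    }
}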

  • Character set conversion UTF-8 -- ISO-8859-1 generates question mark (?)

    I'm trying to convert an XML-file in UTF-8 format to another file with character set ISO-8859-1.
    My problem is that the ISO-8859-1 file generates a question mark (?) and puts it as a prefix in the file.
    ?<?xml version="1.0" encoding="UTF-8"?>
    <ns0:messagetype xmlns:ns0="urn:olof">
    <underkat>testv���rde</underkat>
    </ns0:messagetype>
    Is there a way to do the conversion without getting the question mark?
    My code looks as follows:
import java.io.*;

public class ConvertEncoding {
     public static void main(String[] args) {
          String from = "UTF-8", to = "ISO-8859-1";
          String infile = "C:\\temp\\infile.xml", outfile = "C:\\temp\\outfile.xml";
          try {
               convert(infile, outfile, from, to);
          } catch (Exception e) {
               System.out.println(e.getMessage());
               System.exit(1);
          }
     }

     private static void convert(String infile, String outfile,
                                 String from, String to)
               throws IOException, UnsupportedEncodingException {
          // Set up byte streams
          InputStream in = null;
          OutputStream out = null;
          if (infile != null) {
               in = new FileInputStream(infile);
          }
          if (outfile != null) {
               out = new FileOutputStream(outfile);
          }
          // Set up character streams
          Reader r = new BufferedReader(new InputStreamReader(in, from));
          Writer w = new BufferedWriter(new OutputStreamWriter(out, to));
          /* Copy characters from input to output. The InputStreamReader
           * decodes from the input encoding to Unicode, and the
           * OutputStreamWriter encodes from Unicode to the output encoding.
           * Characters that cannot be represented in the output encoding
           * are output as '?'. */
          char[] buffer = new char[4096];
          int len;
          while ((len = r.read(buffer)) != -1) { // read a block of input
               w.write(buffer, 0, len);
          }
          r.close();
          w.flush();
          w.close();
     }
}

    Yes the next character is the '<'
    The file that I read from is generated by an integration platform. I send a plain file to it (supposedly in UTF-8 encoding) and it returns another file (in between I call my java class that converts the characterset from UTF-8 to ISO-8859-1). The file that I get back contains the '���' if the conversion doesn't work and '?' if the conversion worked.
    My solution so far is to skip the first "junk-characters" when reading from the inputstream. Something like:
private static final char UTF_BOM = '\uFEFF'; // the Unicode byte order mark

String from = "UTF-8", to = "ISO-8859-1";
if (from != null && from.toLowerCase().startsWith("utf-")) { // are we reading a UTF-encoded file?
     /* Read the first character of the UTF-encoded file.
      * It will be the BOM if the file starts with one;
      * if it is, we skip that character in the read. */
     try {
          r.mark(1); // only allow one char to be read back so the reset works
          int i = r.read();
          char c = (char) i;
          if (String.valueOf(UTF_BOM).equalsIgnoreCase(String.valueOf(c))) {
               r.reset(); // reset to the start position
               r.skip(1); // skip the first character when reading from the stream
          } else {
               r.reset();
          }
     } catch (IOException e) {
          e.getMessage();
          // return null;
     }
}
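
A simpler variant of the same fix (a sketch for this thread, not the poster's code): decode the whole file as UTF-8, strip a leading U+FEFF if present, and only then encode to ISO-8859-1, so the BOM never reaches the target encoder.

import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.*;

public class StripBom {
    public static void main(String[] args) throws IOException {
        String text = new String(Files.readAllBytes(Paths.get("C:\\temp\\infile.xml")),
                                 StandardCharsets.UTF_8);
        // Java's UTF-8 decoder does not remove the BOM, so drop it ourselves
        if (!text.isEmpty() && text.charAt(0) == '\uFEFF') {
            text = text.substring(1);
        }
        Files.write(Paths.get("C:\\temp\\outfile.xml"),
                    text.getBytes(StandardCharsets.ISO_8859_1));
    }
}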

  • UTF-8 Unicode output

    Hi all,
    I'm at the end of my resources (and those are very limited when speaking of character sets).
    Specifically, I'm having trouble with unicode character 2628 (http://www.fileformat.info/info/unicode/char/2628/index.htm)
    The SOAPMessage response I'm receiving is fine.
SOAPMessage response = getResponse();
response.writeTo(System.out); // works fine, outputs it correctly
Where I'm having trouble is reading the data from the attachment itself. I've tried a few hundred different things to no avail.
Iterator attItr = response.getAttachments();
AttachmentPart attPart = (AttachmentPart) attItr.next();
String content = new String(attPart.getRawContentBytes(), "UTF-8"); // doesn't work - gibberish
Object contentTo = attPart.getContent(); // doesn't work either - gibberish as well
I've tried a few other things ... but I'm really stuck.
    Any help would be greatly appreciated.
    Thanks.

You may be able to find a text editor that can do the conversion. Alternatively, I have converted from one encoding to another programmatically using Java as well.
    Tim Tow
    Applied OLAP, Inc
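
When attachment bytes come out as gibberish, a useful first step is to dump the leading bytes in hex and compare them against known signatures. This little helper is a sketch written for this thread (the attPart variable is the one from the question):

public class AttachmentProbe {
    // EF BB BF suggests UTF-8 with a BOM, FF FE suggests UTF-16LE,
    // FE FF suggests UTF-16BE; anything else needs closer inspection
    static void dumpHead(byte[] raw) {
        for (int i = 0; i < Math.min(raw.length, 16); i++) {
            System.out.printf("%02X ", raw[i]);
        }
        System.out.println();
    }
    // usage: dumpHead(attPart.getRawContentBytes());
}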

  • Unable to send patch file in utf-8 or ASCII format

I am trying to solve tasks provided by the eudyptula-challenge, for which I have to send attachments in Unicode format. I am using Thunderbird 31.0. I have tried everything, but I still get a reply from them stating that the attachments are in base64.
I have made the changes mentioned in http://lxr.free-electrons.com/source/Documentation/email-clients.txt
I have tried the changes in http://superuser.com/questions/288571/how-can-i-change-the-default-encoding-type-thunderbird-uses-when-composing-a-new
How do I send an attachment in UTF-8 format?

    Done some testing.
    Tools > Options > Display > Formatting tab
    click on 'Advanced ' button
    My Outgoing and Incoming Mail Character encoding is set as:
    Western (ISO-8859-1)
    test plain text message with txt attachment.
    Message source on received email:
    Content-Type: text/plain; charset=windows-1252;
    name="font.txt"
    Content-Transfer-Encoding: '''base64'''
    Content-Disposition: attachment;
    filename="font.txt"
    I believe this is what you are experiencing
    So I did this:
    Tools > Options > Advanced > general tab
click on 'Config Editor' button
    it will tell you to be careful :)
    In top search type: ascii
    look for this line:
    mail.label_ascii_only_mail_as_us_ascii; Value = false
    double click on that line to toggle the 'false' to 'true'
    or right click and select 'toggle'
    close window - top right x
    click on OK to save changes to options
    Sent plain text message with txt attachment.
    This was the result in received message source:
    Content-Type: text/plain; charset=windows-1252;
    name="font.txt"
    Content-Transfer-Encoding: '''7bit'''
    Content-Disposition: attachment;
    filename="font.txt"
    Does this help?
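
For what it's worth, the same preference can be set directly in prefs.js while Thunderbird is closed; this is the standard Mozilla preference syntax, shown here only as an illustration of what the Config Editor toggle writes:

user_pref("mail.label_ascii_only_mail_as_us_ascii", true);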

  • UTF-8 Unicode in JEditorPane doesn't work

    I do hope this is the correct forum for this question so that some forum nazi doesn't give me grief...here goes.
    I have a JEditorPane with the contentType set to "text/html; charset=UTF-8"
    I do a .setText method with this text:
    <HTML><body><font face='Arial Unicode MS' size='3'>
    Followed by some text extracted from an RSS feed (just the contents of the <description> tag)
    and then </body></html> to finish it off.
It displays fine apart from one Unicode character. It looks like one of those 'fancy' apostrophes, so that in the word "We've" the apostrophe shows as an accented a and two squares, as shown in this screenshot: Screenshot
    So does that mean that 'Arial Unicode MS' cannot display Unicode as the name would suggest, or am I doing something else wrong?

    When you specify the charset in the contentType setting, you're telling the JEditorPane how to convert raw bytes that it reads from a URL into a Java string. That's assuming you use one of the setPage() methods to populate the component--but you're using setText(), which takes a String. That means the text was corrupted before you put it in the JEditorPane; you need to look at how it's getting brought in from the RSS feed. It's obviously encoded as UTF-8, but being decoded as if it were a single-byte encoding like ISO-8859-1 or windows-1252 (the default for English-locale Windows systems).
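
A minimal sketch of that advice (the feed URL and names are made up for illustration): decode the feed's bytes as UTF-8 at the point where they enter the program, and only then hand the resulting String to setText().

import java.io.InputStream;
import java.net.URL;
import java.nio.charset.StandardCharsets;

public class FeedDecode {
    public static void main(String[] args) throws Exception {
        try (InputStream in = new URL("http://example.com/feed.rss").openStream()) {
            // explicit UTF-8 decode - this is the step that was missing
            String xml = new String(in.readAllBytes(), StandardCharsets.UTF_8);
            // xml can now safely be passed to JEditorPane.setText(...)
            System.out.println(xml.substring(0, Math.min(200, xml.length())));
        }
    }
}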

  • Convert UTF-8 (Unicode) Hex to Hex Byte Sequence while reading file

    Hi all,
When Java reads a UTF-8 character, it does so in hex, e.g. the \x12AB format. How can we read the UTF-8 character as a corresponding byte stream (e.g. \x0905 is hex for some Hindi character (an Indic language) and its corresponding byte sequence is \xE0\x45\x96)?
Can the method to read a UTF-8 character's byte sequence be used to read any other character set's byte sequence (other than UTF-8, say some proprietary font)?

    First, there's no such thing as a "UTF-8 character". UTF-8 is a character encoding that can be used to encode any character in the Unicode database.
If you want to read the raw bytes, use an InputStream. If you want to read text that's encoded as UTF-8, wrap the InputStream in an InputStreamReader and specify UTF-8 as the encoding. If the text is in some other encoding, specify that instead of UTF-8 when you construct the InputStreamReader.

import java.io.*;

public class Test {
  // DEVANAGARI LETTER A (U+0905) in UTF-8 encoding
  static final byte[] source = { (byte) 0xE0, (byte) 0xA4, (byte) 0x85 };

  public static void main(String[] args) throws Exception {
    // print the raw bytes
    InputStream is = new ByteArrayInputStream(source);
    int read = -1;
    while ((read = is.read()) != -1) {
      System.out.printf("0x%02X ", read);
    }
    System.out.println();
    is.reset();

    // print the character as a Unicode escape
    Reader r = new InputStreamReader(is, "UTF-8");
    while ((read = r.read()) != -1) {
      System.out.printf("\\u%04X ", read);
    }
    System.out.println();
    r.close();
  }
}

Does that answer your question?

  • Usage of Conversion Classes During Unicode Conversion

    Hi,
I have this code:
"TRANSLATE BBKF-SGTXT FROM CODE PAGE '0100' TO '0120'"
Now, for the Unicode conversion, I need to use the conversion classes instead.
Can anyone help me proceed further on this?
Actually, I am using CL_ABAP_CONV_OUT_CE and I am getting the data in binary format, but I want it in character format. Can anyone make a suggestion?
    Thanks & Regards,
    Prasad

    Do you have an EBCDIC system with code page 0100? (check it using SNLS transaction)
    In that case, CL_ABAP_CONV_OUT_CE is okay.
    After having converted to code page 0120 (into xstring), here is how to cast xstring variable to character.
DATA x TYPE x LENGTH 800. " for this to compile, the length of x
                          " must be a multiple of 4 (don't ask why)
    FIELD-SYMBOLS <c> TYPE c.
    x = xstring. "make sure x is long enough
    ASSIGN x TO <c> CASTING.
    BBKF-SGTXT = <c>.

  • XDK support for UTF-16 Unicode

    Hi,
Does the Oracle Java XDK, specifically the XSQL servlet and API, support UTF-16 Unicode?
Presumably, .xsql files have to be stored and read, and queries executed, in a Unicode-compliant format for this to work. Is this currently possible?
    Thanks,
    - Manish

If you are using XDK 9.0.1 or later with JDK 1.3, this combination supports UTF-16. XSQL inherits the support "for free" in this case.

  • NSString with Unicode + ASCII characters

    Hi All,
I am developing a Mac application on 10.5.
    I need to deal with normal ASCII characters and Unicode characters.
    I want to calculate number of bytes taken to store the string.
    I am using [[unicodeStr dataUsingEncoding:NSUTF8StringEncoding]bytes].
This is working fine if the "unicodeStr" string has only Unicode characters.
If "unicodeStr" is a combination of (Unicode characters + ASCII characters), "NSUTF8StringEncoding" is not working.
    Can anybody tell me how to proceed??
    Thanks in advance.

There is no such thing as a combination of Unicode and ASCII. ASCII is part of Unicode. Any ASCII string is a valid UTF-8 string.
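
To see this concretely, here is a Java sketch of the same byte-counting idea (NSString's dataUsingEncoding: behaves the same way for valid input): the UTF-8 byte count of a mixed string is perfectly well defined, with each ASCII character costing exactly one byte.

import java.nio.charset.StandardCharsets;

public class Utf8ByteCount {
    public static void main(String[] args) {
        String mixed = "abc\u00e9\u4e2d"; // ASCII "abc" + e-acute (2 bytes) + a CJK char (3 bytes)
        int byteCount = mixed.getBytes(StandardCharsets.UTF_8).length;
        System.out.println(byteCount); // prints 8 = 3 + 2 + 3
    }
}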

  • Conversion of non - unicode ECC6 to unicode ECC6

    Hi,
We are planning to upgrade/convert the non-Unicode version of our current SAP system to a Unicode version of ECC 6. Is there any standard guideline or roadmap we can follow to make this conversion successful?
    Regards,
    Vadivelan.N.

    Hi,
    check this thread :
    non-unicode to unicode conversion
    please kindly search first before asking a question...
hope it helps you.
    rgds,
    Alfonsus Guritno W.N.

  • Conversion(UTF - 8) functions......

    hi,
I have prepared a registration form in which the user can register with Korean text, but in the database the Korean text could not be read.
Is there any Unicode conversion function in Flash?
Any solution...?
    Ayathas

    I'm sorry, but you will have to explain the issue a bit more. It really isn't clear.
    You say you've "prepared a registration," but what kind of registration? You mean some kind of form in Flash?
    And then you say "in the database," but what database? Is flash sending that information to some kind of database? If so what method are you using and what kind of database is it?
And then you ask about a unicode conversion. Flash is most happy when using Unicode, so in general all the text should already be in Unicode. So there might be something wrong with how the database server is receiving the information.
    And finally the only thing I know is that if you are using useCodePage, you shouldn't.

  • Regd:UTF 8 Unicode Version

    Hi Gurus,
How do I find the UTF-8 version in the database? I need to know whether it is using version 1.0 or 2.0, something like that.
    Regards,
    Simma...

    jeneesh wrote:
    Efficientoracle wrote:
    hi,
    s i need unicode version. How to find it in our database. Am using 11.2.0.3.
    Regards,
Simma..
It cannot change across databases. As the documentation says, for 11g, it uses unicode version 3.0.
Oracle Database 11g, Release 1: 5.0
    Cheers,
    Manik.

  • UTF-8, Unicode, XML and windows problems

    Hi there,
I'm developing an application which uses a lot of Russian text.
This Russian text is stored in XML and can be sent to a server remotely.
I use the standard javax.xml libraries to parse the XML files and DBUnit's XMLWriter to generate XML strings.
The XML returned by the DBUnit stuff is UTF-8, but it's inside a UTF-16 string.
So when I generate XML and print it out, I get something that looks like the following:
��������� ������������ ���������?����
That's OK, because I can stick that straight into a file when writing files and it works.
But the problem comes when sending the XML to the server.
The sending implementation I use must be able to send Java-generated UTF-16 and XML read as UTF-8.
So I convert the XML from UTF-8 to UTF-16 using the following:
byte[] stringBytesISO = isoString.getBytes("ISO-8859-1");
utf8String = new String(stringBytesISO, "UTF-8");
And that works perfectly on my Linux system.
However, when I run it on Windows, it only seems to convert some of the characters:
&#1055;&#1088;&#1080;&#1074;&#1099;&#1095;&#1085;&#1099;&#1084; &#65533;?&#1085;&#1086;&#1084; &#1079;&#1072;&#65533;?&#1085;&#1091;&#1090; &#1076;&#1086;&#1088;&#1086;&#1075;&#1080; &#1076;&#1086; &#1074;&#1077;&#65533;?&#1085;&#1099;,
Does anyone know what's going wrong here?

jammers1987 wrote:
I use the standard javax.xml libraries to parse the XML files and DBUnit's XMLWriter to generate XML strings.
DbUnit is a testing tool; are you saying you're using it in a production system? Ideally, you should use the same library to write the XML as you do to read it, but you definitely shouldn't be using DbUnit in this context.
jammers1987 wrote:
The XML returned by the DBUnit stuff is UTF-8, but it's inside a UTF-16 string.
That should never happen. XML is just text, and text can either be in the form of a Java string, or it can be stored externally using a text encoding like UTF-8. Never mind that Java strings use the UTF-16 encoding; you don't need to know or mention that. Encodings only come into play when you're communicating with something outside your program, like a file system or a database.
When you generate the XML, you specify that the encoding is UTF-8. When you read the XML, you specify that the encoding is UTF-8. That's all.
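
A minimal sketch of that advice (file name and content invented for illustration): specify UTF-8 explicitly on the writing side and again on the reading side, and leave the Java string itself alone.

import java.io.*;
import java.nio.charset.StandardCharsets;

public class XmlUtf8RoundTrip {
    public static void main(String[] args) throws IOException {
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?><greeting>\u041f\u0440\u0438\u0432\u0435\u0442</greeting>";
        // write: encode the string as UTF-8 bytes
        try (Writer w = new OutputStreamWriter(new FileOutputStream("greeting.xml"),
                                               StandardCharsets.UTF_8)) {
            w.write(xml);
        }
        // read: decode the same bytes as UTF-8 - no UTF-16 conversion anywhere
        try (BufferedReader r = new BufferedReader(new InputStreamReader(
                new FileInputStream("greeting.xml"), StandardCharsets.UTF_8))) {
            System.out.println(r.readLine());
        }
    }
}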
