UTF-8 file encoding issues within Java?

I'm working on an application that takes data from an IBM mainframe (z/OS), converts it from IBM-1047 encoding to UTF-8 (via the iconv utility), and binary-FTPs it to a Unix box, where we process the file with our Java app and return the processed file.
Within our Java app on the Unix platform we stream the file into a byte array and then create a new String from the byte array, specifying "UTF-8" as the encoding parameter.
The problem is that Java appears to take certain 2-byte UTF-8 sequences and convert them to a single char.
E.g. I have a \uC3A6 char in the input file — that is, the byte pair 0xC3 0xA6. I can view the bytes in the byte array that's read in, and both bytes are still there, but as soon as I create the new String with UTF-8 encoding and view it, those 2 bytes now show up as a single char (0xE6). The code I have that looks for the char \uC3A6 then fails.
Can anyone explain what's happening here? Sorry for the long message.

The encodings that convert the character (char) 0xC3A6 to the 2-entry byte array {0xC3, 0xA6} (unsigned) are "UTF-16BE", "UnicodeBigUnmarked", and "UnicodeBig". These are essentially identical except for their use of the byte-order mark. As was said above, UTF-8 converts (char) 0xC3A6 to the 3-entry byte array {0xEC, 0x8E, 0xA6} (unsigned).
http://java.sun.com/j2se/1.4.1/docs/guide/intl/encoding.doc.html
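In short, the decoding is working as designed: the byte pair {0xC3, 0xA6} is the UTF-8 encoding of 'æ' (U+00E6), so the decoded String holds the single char \u00E6, and code searching for the char \uC3A6 will never find it. A minimal sketch of the behaviour (class name illustrative):

    public class Utf8DecodeDemo {
        public static void main(String[] args) throws Exception {
            byte[] fromFile = { (byte) 0xC3, (byte) 0xA6 };        // UTF-8 bytes for 'æ'
            String s = new String(fromFile, "UTF-8");
            System.out.println(s.length());                        // 1: two bytes become one char
            System.out.println(Integer.toHexString(s.charAt(0)));  // e6
            System.out.println(s.indexOf('\u00E6'));               // 0: search for the decoded char
        }
    }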

Similar Messages

  • File encoding issue on import step

    Dear all,
I am experiencing an import issue with csv files generated from MS Dynamics AX ERP (a Reporting Services query, I guess).
Having a look at the files concerned using Notepad++, I can see that the encoding is UTF-8.
As a workaround, if I open the files using Excel and re-save them as csv, they are encoded as ANSI instead of UTF-8, and in that case the FDMEE import process is effective.
    Regarding our settings (FDMEE v.11.1.2.3.530):
    - System level: file charset is set to UTF-8
    - System level: no specified file charset
    - User level: no specified file charset
My questions are:
- Is this normal behavior? Why are UTF-8 files not supported if the system-level configuration says UTF-8?
- Is it related to the system encoding (OS, DB...)?
- Is it related to the file itself, so that I have to force the encoding through a befImport script?
I had a deep look at this article http://akafdmee.blogspot.fr/2014/11/importing-files-with-different-file.html, which is great, but still no magic solution comes to mind! I'll keep on testing.

    Hello Dora,
I had the same problem, even with the most simple JavaBean class. I found out that the export-jar function of Developer Studio 2.0.13 creates slightly different jars than the jar tool from the command line does. You can check the jar structure by using the command
    jar tf file.jar
    JAR File created by Developer Studio (did not work):
D:\temp>jar tf test.jar
    META-INF/MANIFEST.MF
    de/test/bean/SimpleBean.class
JAR created by the jar tool from the SDK, from the top directory, with
jar cf test.jar de
D:\temp>jar tf test.jar
    META-INF/
    META-INF/MANIFEST.MF
    de/
    de/test/
    de/test/bean/
    de/test/bean/SimpleBean.class
The second jar file worked for me even without any special manifest file, just the one that is automatically created.
    I hope that helps!
    Jari

  • File encoding issues with Mac OSX?

I really love Dreamweaver and I think it's the best editor out there, except for one thing: the file encoding!
    I tried setting the default encoding to UTF-8 under Preferences -> New Document, but that won't cut it.
Dreamweaver always creates/saves new files in Western/ISO format, so I always have to open them with another text editor and re-save the file in UTF-8 format.
Is there a fix for this, or an extension to download?
    I'm using CS4 on Mac OSX Leopard.

    CS4 should automatically save files as UTF-8. It's the default setting. You shouldn't need to do anything special.
    One thing you could try is removing the Prefs file. On a Mac, it's located at <username>:Library:Preferences:Adobe Dreamweaver CS4 Prefs.

  • Character Encoding and File Encoding issue

    Hi,
I have a file whose data is encoded using the default locale.
I start the JVM in the same default locale and try to read the file.
I took 2 approaches:
1. Read the file using InputStreamReader() without specifying the encoding, so that the default one based on the locale is picked up.
-- This approach worked fine.
-- I also printed the system property "file.encoding", which matched the current locale's encoding (the Unix command to get this is "locale charmap").
2. In this approach, I read the file using InputStream as an array of raw bytes and passed it to the String constructor to convert the bytes to a String.
-- The String contained garbled data, meaning the encoding failed.
I tried printing the encoding used by the JVM (via an internal class) and the "file.encoding" property as well.
These 2 values do not match; there is a weird difference.
E.g. for locale ja_JP.eucjp on a Linux box:
byte-to-character conversion uses the EUC_JP_LINUX encoding
the file.encoding system property is EUC-JP-LINUX
To get the byte-to-character encoding, I used the following (sun.io.*):
    ByteToCharConverter btc = ByteToCharConverter.getDefault();
    System.out.println("BTC uses " + btc.getCharacterEncoding());
Do you have any idea why it is failing?
My understanding was that the file encoding and character encoding should always be the same by default.
But because of this behaviour, I am a little perplexed.

But there's no character encoding set for this operation: baos.write("���".getBytes());
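For approach 2 to match approach 1, the String constructor must be told the charset explicitly; new String(bytes) and getBytes() otherwise use the platform default, which, as you observed, can be reported differently from the file.encoding value. A sketch (the file name and the EUC-JP charset are assumptions for the ja_JP.eucjp case):
    import java.io.ByteArrayOutputStream;
    import java.io.FileInputStream;
    import java.io.InputStream;

    public class ReadWithCharset {
        public static void main(String[] args) throws Exception {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            InputStream in = new FileInputStream("data.txt"); // placeholder name
            byte[] chunk = new byte[4096];
            int n;
            while ((n = in.read(chunk)) != -1) {
                buf.write(chunk, 0, n);
            }
            in.close();
            // Name the charset instead of relying on the default:
            String text = new String(buf.toByteArray(), "EUC-JP");
            System.out.println(text);
        }
    }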

  • About File Encoding for multiple Languages

Hi, my scenario is Proxy to File.
I am getting multiple languages from the proxy and need to write all of them to one csv file.
I used UTF-8 file encoding in the receiver file channel, and all languages come through except Hebrew.
I want to write the Hebrew language into the same file as well.
If I use ISO-8859-8, Hebrew is written correctly but the other languages show up as special characters.
How to solve this problem? I want to write the file with all the languages.
Thanks
    Thanks

> I used UTF-8 file encoding in the receiver file channel, and all languages come through except Hebrew.
> I want to write the Hebrew language into the same file as well.
Are you sure it is not the case that the Hebrew characters are part of the file but you cannot view them?
> If I use ISO-8859-8, Hebrew is written correctly but the other languages show up as special characters.
    same here.
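One way to check (a rough sketch): encode a Hebrew sample yourself and dump its UTF-8 bytes; if that byte sequence appears in the file, the data is intact no matter what the viewer renders:
    public class HebrewBytes {
        public static void main(String[] args) throws Exception {
            byte[] b = "\u05E9\u05DC\u05D5\u05DD".getBytes("UTF-8"); // Hebrew "shalom"
            StringBuilder hex = new StringBuilder();
            for (byte x : b) {
                hex.append(String.format("%02X ", x));
            }
            System.out.println(hex); // D7 A9 D7 9C D7 95 D7 9D
        }
    }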

  • UTF encoding issues on file adapters and mappings

    Hi,
    We did some tests regarding to UTF-8 and UTF-16 encoding using file adapters. Our conclusion so far is (when using Windows OS):
    1. Inbound adapter can handle UTF-8 and UTF-16 correctly, but do not specify the encoding!
    2. XI mappings will set the XML encoding to UTF-8 correctly when sending an UTF-16 file to XI.
3. Outbound adapter can only handle UTF-8 (and US-ASCII and ISO-8859-1) correctly.
    The exact test results are:
    >>Outbound file adapter bug.
If no encoding is specified in the outbound file adapter, UTF-8 and UTF-16 are handled correctly. However, if the encoding is set to UTF-16, the XI mapping will fail with the error:
    During the application mapping com/sap/xi/tf/_CHRIS_OUTBOUND_TO_INBOUND_ a com.sap.aii.utilxi.misc.api.BaseRuntimeException was thrown: Fatal Error: com.sap.engine.lib.xml.parser.Parser~
    Part of the trace:
    com.sap.aii.ibrun.server.mapping.MappingRuntimeException: Runtime exception occurred during execution of application mapping program com/sap/xi/tf/_CHRIS_OUTBOUND_TO_INBOUND_: com.sap.aii.utilxi.misc.api.BaseRuntimeException; Fatal Error: com.sap.engine.lib.xml.parser.ParserException: XMLParser: No data allowed here: (hex) a0d, a0d, 6e3c(:main:, row:3, col:2) at com.sap.aii.ibrun.server.mapping.JavaMapping.executeStep(JavaMapping.java:72) at com.sap.aii.ibrun.server.mapping.Mapping.execute(Mapping.java:91) at com.sap.aii.ibrun.server.mapping.MappingHandler.run(MappingHandler.java:78) at com.sap.aii.ibrun.sbeans.mapping.MappingRequestHandler.handleMappingRequest
    >>Inbound file adapter bug.
If the encoding of an inbound file adapter is set to UTF-16, everything works OK (except that the XML encoding is not set correctly, but this may be a mapping issue rather than an adapter issue). However, the default UTF-16 encoding seems to be UTF-16BE, where I would expect UTF-16LE, since this is the most commonly used encoding.
If the encoding is UTF-16LE or UTF-16BE, the character set used in the message is correct, except for the BOM of the file. The BOM is empty, which indicates a UTF-8 encoded file. Since the file is UTF-16BE or UTF-16LE encoded, this is wrong, and the correct BOM should be added by the adapter.
    Encodings like US-ASCII and ISO-8859-1 are handled correctly.
    >>Mapping bug
When we send in a message encoded in UTF-8 and want to send it out as a UTF-16 encoded message, we need to set the XML encoding to UTF-16. Normally this is done by an XSLT mapping using the <xsl:output encoding="UTF-16"/> command.
The UTF-8 message gets processed by the XSLT, and any special character is converted to its UTF-16 value. However, the output message is not UTF-16 encoded (1 byte instead of 2 bytes).
When this 1-byte message is sent to the inbound adapter (encoding set to UTF-16), the message is translated from 1 byte to 2 bytes (UTF-8 to UTF-16). The characters that were already converted from UTF-8 to UTF-16 are read as single-byte characters and converted again. This results in an incorrect message with illegal characters.
So basically characters are converted to UTF-16 twice, which is incorrect.
    Maybe someone can confirm this on another XI system (maybe different OS). If you need test files or mapping, please let me know.
    Kind regards,
    Christiaan Schaake.

    Update after carefully reading all the UTF related documents on the internet.
For UTF-16, the BOM is required, and the adapter handles this correctly (encoding=UTF-16 will create the BOM).
For UTF-16LE and UTF-16BE, the BOM must not be set; the application should be able to handle the conversion. Here too the adapter works correctly.
If the adapter is set to binary mode instead of text mode, the file will always be read correctly.
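A quick way to confirm this BOM behaviour from plain Java (a sketch; the hex comments show the output for the letter 'A'):
    public class BomCheck {
        public static void main(String[] args) throws Exception {
            dump("UTF-16");   // FE FF 00 41 -> BOM is written
            dump("UTF-16BE"); // 00 41       -> no BOM
            dump("UTF-16LE"); // 41 00       -> no BOM
        }

        static void dump(String charset) throws Exception {
            StringBuilder line = new StringBuilder(charset + ": ");
            for (byte b : "A".getBytes(charset)) {
                line.append(String.format("%02X ", b));
            }
            System.out.println(line);
        }
    }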
    About the mapping issue, I'm still experimenting with this one.
    Kind regards,
    Christiaan Schaake.

  • Encoding Issue : JMS and Mapping : utf-8 iso8859-1

    Hi All,
I am facing an encoding problem.
    Scenario :  JMS -->  SAP PI --> JMS
Requirement: the input plain text file contains some special characters, "©®". Based on this condition, in the Java mapping
                   we check the payload and change the 'encoding' tag to UTF-8 or ISO8859-1:
               : <?xml version="1.0" encoding="UTF-8"?>     in the target XML output.
While testing in Operation Mapping our Java mapping works fine: the encoding tag changes from
             UTF-8 to ISO8859-1 if a special character exists. But if I test the same in the Integration Directory (Test Configuration)
             or do end-to-end testing, the encoding tag doesn't change.
For testing we had a set of plain text files in UTF-8 and ISO8859-1.
    I tried the options of using beans in Adapter modules in Sender JMS channel.
    MessageTransformBean, TextCodepageConversionBean, XmlAnonymizerBean
These docs & threads were also referred to: [How to Handle Encoding in PI|http://www.sdn.sap.com/irj/scn/index?rid=/library/uuid/502991a2-45d9-2910-d99f-8aba5d79fb42]
    Regards,
    Ashutosh R

Hi
public static boolean fixSpecialCharforWeb(String text) {
    if ((text == null) || (text.trim().length() == 0)) {
        return false;
    }
    // Code points that flag the payload for ISO8859-1 handling.
    // Most entries come in pairs: the Unicode value and its
    // Windows-1252 counterpart.
    final int[] specials = {
        39, 8217, 146, 145,   // single quotes / apostrophes
        8220, 8221, 147, 148, // double quotes
        8226, 149,            // bullet point
        732, 152,             // tilde
        173,                  // soft hyphen
        8211, 150,            // en dash
        8212, 151,            // em dash
        8364, 128,            // euro sign
        165,                  // yen sign
        163,                  // pound sign
        189, 188, 190,        // 1/2, 1/4 and 3/4 signs
        8224, 134,            // dagger
        8482, 153,            // trademark
        38,                   // ampersand
        174,                  // registered mark
        169,                  // copyright mark
        63,                   // question mark (also what unmappable chars render as)
        233, 232,             // e-acute, e-grave
        144
    };
    final String trimmed = text.trim();
    for (int i = 0; i < trimmed.length(); i++) {
        final char c = trimmed.charAt(i);
        for (int special : specials) {
            if (c == special) {
                // The first special character found decides the encoding.
                return true;
            }
        }
    }
    return false;
}

  • How to set File Encoding to UTF-8 On Save action in JDeveloper 11G R2?

    Hello,
I am facing an issue when modifying a file using JDeveloper 11g R2: JDeveloper changes the encoding of the file to the system default encoding (ANSI) instead of UTF-8. I have set the encoding to UTF-8 under "Tools | Preferences | Environment | Encoding" and restarted JDeveloper. I have also set "Project Properties | Compiler | Character Encoding" to UTF-8. Neither works.
    I am using below version of JDeveloper,
    Oracle JDeveloper 11g Release 2 11.1.2.3.0
    Studio Edition Version 11.1.2.3.0
    Product Version: 11.1.2.3.39.62.76.1
I created a file in UTF-8 encoding, opened it, made some changes and saved it.
When I open the "Properties" tab via the "Help | About" menu, I can see that the JDeveloper properties show the encoding as Cp1252. Is that related?
    Properties
    sun.jnu.encoding
    Cp1252
    file.encoding
    Cp1252
    Any idea how to make sure JDeveloper saves the File in UTF-8 always?
    - Sujay

I have already done that; it is the first thing I did, as mentioned in my thread. I have also added the 2 options below to jdev.conf and restarted JDeveloper, but that did not work either.
    AddVMOption -Dfile.encoding=UTF-8
    AddVMOption -Dsun.jnu.encoding=UTF-8
    - Sujay

  • Receiver File Adapter - Encoding issue.

    Hi Everybody,
The file format (encoding) is different from the format we generally used to get.
Currently we get the flat files in DOS format, but when we download the current file, we get it in UNIX or another format.
E.g., 20 has been changed to 0D in the file.
    Can somebody help me on the same.
    Thanks,
    Zabiulla

    Hi,
Check this, from the documentation for file adapters:
    Text
    Under File Encoding, specify a code page.
    The default setting is to use the system code page that is specific to the configuration of the installed operating system. The file content is converted to the UTF-8 code page before it is sent.
Permitted values for the code page are the existing Charsets of the Java runtime (they can be listed from Java, as sketched after this list). According to the Sun specification for the Java runtime, at least the following standard character sets must be supported:
■ US-ASCII: seven-bit ASCII, also known as ISO646-US, or the Basic Latin block of the Unicode character set
■ ISO-8859-1: ISO character set for Western European languages (Latin Alphabet No. 1), also known as ISO-LATIN-1
■ UTF-8: 8-bit Unicode character format
■ UTF-16BE: 16-bit Unicode character format, big-endian byte order
■ UTF-16LE: 16-bit Unicode character format, little-endian byte order
■ UTF-16: 16-bit Unicode character format, byte order identified by an optional byte-order mark
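A minimal sketch of listing them (Charset.availableCharsets() is standard API since Java 1.4):
    import java.nio.charset.Charset;

    public class ListCharsets {
        public static void main(String[] args) {
            // Prints every charset name this Java runtime accepts:
            for (String name : Charset.availableCharsets().keySet()) {
                System.out.println(name);
            }
        }
    }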
    Regards
    Vijaya

  • CSV file encoded as UTF - 8 loses characters when displayed with excel 2010

    Hello everybody,
I have adapted a customer report to be able to send certain data via mail as a CSV attachment.
For that purpose I am using class cl_bcs.
Everything goes fine, but since the mail attachment contains certain German characters such as Ü, those characters appear corrupted when the file is displayed in Excel.
It seems the problem is with Excel, because when opening the same file with Notepad, the Ü is there. If I import the file into Excel with the import wizard, it is correct too.
Anyway, is there any solution to this problem?
I have tried concatenating byte_order_mark_utf8 at the beginning of the file, but Excel still does not recognize it.
    Thanks in advance,
    Pablo.
    Edited by: katathema on Jan 31, 2012 2:05 PM

- Does MS Excel actually support UTF-8?
Yes. I believe we installed some international add-on that is not in the default installation. In any case, other UTF-8 or UTF-16 files can be opened and viewed in Excel without any problem.
- Have you verified that the file is viewable as a UTF-8 encoded file?
I think so. If I open it in Notepad and choose "Save as", the file type shown is UTF-8.
- Try opening the file in a program you are confident supports UTF-8, e.g. Mozilla...
I will try that.
- Check that your UTF-8 encoded file has a UTF-8 identifier (0xFEFF?) as the first character.
The Unicode-16 (LE or BE) files I got from the internet always have two bytes at the front (0xFEFF or 0xFFFE). My UTF-8 file generated by Java doesn't have that. Should a UTF-8 file also have this kind of special bytes at the front? I manually added these bytes at the front of my file using UltraEdit and opened it in Excel 2000, but it didn't help.
- Try using another spreadsheet program that supports UTF-8.
Do you know any other spreadsheet program that supports csv files and UTF-8?
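For reference, the UTF-8 "identifier" discussed above is the three bytes 0xEF 0xBB 0xBF (the BOM 0xFEFF encoded in UTF-8), not the two UTF-16 bytes. A sketch of writing it ahead of the CSV content (file name and sample row are placeholders; newer Excel versions use this to detect UTF-8, though as noted above Excel 2000 may still not):
    import java.io.FileOutputStream;
    import java.io.OutputStreamWriter;
    import java.io.Writer;

    public class CsvWithBom {
        public static void main(String[] args) throws Exception {
            FileOutputStream fos = new FileOutputStream("report.csv");
            fos.write(new byte[] { (byte) 0xEF, (byte) 0xBB, (byte) 0xBF }); // UTF-8 BOM
            Writer w = new OutputStreamWriter(fos, "UTF-8");
            w.write("Name;Stra\u00DFe\nM\u00FCller;\u00DCberseering 1\n");
            w.close();
        }
    }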

  • Java to Excel encoding issue.

I'm trying to export my data to an Excel file. When I open the Excel file, the Japanese characters look garbled.
    iResponse.setHeader("attachment ; filename = \"" + reportFileName() + "\" ", "content-disposition");
    iResponse.setHeader("application/vnd.ms-excel;","content-type");
    iResponse.setHeader("UTF-8","Content-Encoding");
By Googling I learnt that Excel doesn't like the UTF-8 format. Is there any other way I can export the data?
    Thanks.

    Specifying the content encoding for a binary format like Excel is pointless. You are producing native Excel format, aren't you? If you're producing a CSV file and claiming it's Excel, you should have mentioned that.
    Anyway: you're producing native Excel using something like Apache POI, I suppose? I would have thought it handled encoding issues, but check its documentation or its FAQ to see if that's really the case.
    Another option, which may or may not be feasible, is to use the new Office 2007 format. It's based on XML, so all of the encoding issues are automatically handled.
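If it is indeed POI, a minimal sketch (an assumption; requires the Apache POI 3.x jars, HSSF being the binary .xls API) — cell text goes in as ordinary Java Strings, so no Content-Encoding header is involved:
    import java.io.FileOutputStream;

    import org.apache.poi.hssf.usermodel.HSSFWorkbook;
    import org.apache.poi.ss.usermodel.Row;
    import org.apache.poi.ss.usermodel.Sheet;
    import org.apache.poi.ss.usermodel.Workbook;

    public class JapaneseXls {
        public static void main(String[] args) throws Exception {
            Workbook wb = new HSSFWorkbook();
            Sheet sheet = wb.createSheet("report");
            Row row = sheet.createRow(0);
            row.createCell(0).setCellValue("\u65E5\u672C\u8A9E"); // "Japanese"
            FileOutputStream out = new FileOutputStream("report.xls");
            wb.write(out);
            out.close();
        }
    }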

  • Encoding issue for file manager

I am using the ditto command to duplicate a file. The file has a Unicode filename, and as per http://developer.apple.com/qa/qa2001/qa1173.html I first normalize the name to kCFStringNormalizationFormD and then convert it to UTF-8 before calling ditto on it. This all works smoothly, but when I try to get the FSRef using the original Unicode name I get fnfErr. Doesn't the API CFURLGetFSRef convert the string to kCFStringNormalizationFormD? Or is there an alternative to ditto on Tiger?

No encoding issues if I use an XML (xlf or xliff) bundle, as XML supports UTF-8 encoding.

  • Unable to set JVM file encoding to UTF-8 on Windows

    Hi,
I am running Tomcat on the 1.5.0_05 JRE. I have tried several things to set the JVM file encoding to UTF-8 instead of the default Cp1252, but no luck yet.
    The most intuitive approach seems to be to use a JVM option like
    "-Dfile.encoding=UTF-8"
but this does not seem to have any effect. I have a WinXP Pro machine. I saw some bug reports which seemed to indicate that changing the JVM file encoding is not an available feature... is that correct? I would really appreciate any help/pointers on this. I will post the solution if I find something in the meantime.
    Thanks,
    Sriram

I failed to set it too. I think it would be better to split file.encoding in two: one setting for talking to the local OS, and another for compiling .java, .jsp and similar files. Then we could change it, and the number of bugs would decrease!

  • How to Set "file.encoding" System Property to default "UTF-8"

When I execute my code, some special characters are not displayed correctly, so I am trying to set the "file.encoding" system property to "UTF-8" programmatically, using the command System.setProperty("file.encoding", "UTF-8"), and it is not working.
If I run my jar with the command java -Dfile.encoding=UTF-8 -jar myprog.jar, it works and my special characters also look right.
Can I set this default encoding programmatically?
    Thanks
    Ashish Pancholi

    Hello,
I have the same problem. I have a Java program that is started with "-Dfile.encoding=ISO-8859-1". Now in this program I want to print some characters using the UTF-8 encoding, because I know that the terminal I will be printing on uses this encoding. I tried using InputStreamReader without success:
    InputStreamReader isr = new InputStreamReader(new ByteArrayInputStream("Müller".getBytes()), "UTF-8");
    BufferedReader br = new BufferedReader(isr);
    String line = null;
    while ((line = br.readLine()) != null) {
        System.out.println(line);
    }
EDIT: the above example is for reading something into my Java program. If I want to write something from my Java class to an output, it goes like this:
    Writer out = new BufferedWriter(new OutputStreamWriter(System.out, "UTF8"));
    out.write("Müller\n");
    out.flush();
... in that case I get the correct encoding.
    Thanks,
    T
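A note on the original question: file.encoding is read once at JVM startup, so System.setProperty("file.encoding", "UTF-8") has no effect on streams created afterwards. The usual workaround is to wrap the stream with an explicit charset, as in the writer example above; a minimal sketch for System.out (Java 1.4+):
    import java.io.PrintStream;

    public class Utf8Console {
        public static void main(String[] args) throws Exception {
            PrintStream out = new PrintStream(System.out, true, "UTF-8");
            out.println("M\u00FCller");
        }
    }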

  • File sender adapter - Encoding issue

    Hi,
On my customer's site, we have an interface that takes a file and sends an IDoc to the non-Unicode ERP system. Unfortunately, when we have Cyrillic characters in the file, the processing fails with the error:
    com.sap.aii.utilxi.misc.api.BaseRuntimeException: Fatal Error: com.sap.engine.lib.xml.parser.ParserException: Invalid char #0x6(:main:, row:17776, col:893)
This is of course the result of using an invalid encoding in the communication channel. Until now it was left blank, so UTF-8 was used. I want to improve this interface so that we never have this error again, because fixing it involves manual work and it's getting annoying to see it in production once a month.
What I want to do next is find out the encoding from the people delivering the file and then set it in the communication channel. Pretty straightforward, right? On the SAP side, I think the Cyrillic, non-ASCII characters will be replaced by #, but this is acceptable to the business. What is not acceptable is this constant error.
Because I want to be sure of my assessment before asking for approval for this modification, with the associated testing, communication and everything, my question to you is: have you experienced this before in PI? Are all my conclusions accurate? How would you solve the problem?
    Thanks in advance and best regards,
    George

Did you try giving the encoding as ISO-8859-5 in the file adapter?
    File Type
    Specify the document data type.
○ Binary
○ Text
    Under File Encoding, specify a code page.
    The default setting is to use the system code page that is specific to the configuration of the installed operating system. The file content is converted to the UTF-8 code page before it is sent.
    also ref: http://en.wikipedia.org/wiki/Cyrillic_alphabet#Computer_encoding
    http://help.sap.com/saphelp_nw04/helpdata/en/e3/94007075cae04f930cc4c034e411e1/content.htm
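For what it's worth, a small sketch of how an encoding mismatch mangles Cyrillic text (sample string illustrative; assumes the runtime ships the ISO-8859-5 charset):
    public class CyrillicCheck {
        public static void main(String[] args) throws Exception {
            byte[] raw = "\u041F\u0440\u0438\u0432\u0435\u0442".getBytes("ISO-8859-5"); // "Privet"
            System.out.println(new String(raw, "ISO-8859-5")); // round-trips fine
            System.out.println(new String(raw, "UTF-8"));      // replacement characters
        }
    }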
