Encoding conversion from UTF-16 to UTF-8

Hi all,
I have a simple Java String, which is in UTF-16 format. I have to convert it to UTF-8 because my application expects UTF-8 only.
Can anybody tell me what I should do?
Thanks

But isn't it true that Java supports only UTF-16 strings?
When I create a new String from a UTF-8 byte array, I suspect it will be converted back to UTF-16 data.
Suppose I have a String s1="dd" in UTF-16 format,
and another one, s2="ddd", in UTF-8 format.
If I perform s1+s2, how will the JVM know the encoding of each string? How will it perform the operation?
Please respond and clear up my doubt.
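
A minimal sketch of the usual answer, assuming the application consumes a byte[]: a Java String is always a sequence of UTF-16 code units internally, so there is no such thing as a "UTF-8 String". An encoding only comes into play when a String is turned into bytes or built from bytes. The class and method names below are made up for illustration.

```java
import java.nio.charset.StandardCharsets;

final class Utf8BytesSketch {
    // Encode the characters of a String as UTF-8 bytes.
    static byte[] toUtf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // Decode UTF-8 bytes into a String (held internally as UTF-16, like every String).
    static String fromUtf8(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String s1 = "dd";
        String s2 = fromUtf8(new byte[] {'d', 'd', 'd'}); // built from UTF-8 bytes
        String joined = s1 + s2;      // ordinary concatenation, no encoding is involved
        byte[] out = toUtf8(joined);  // encode once, at the point where bytes are needed
        System.out.println(joined + " -> " + out.length + " UTF-8 bytes");
    }
}
```

Because both s1 and s2 are plain Strings by the time they are concatenated, the JVM never has to know which encoding the original bytes were in.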

Similar Messages

  • CONVERSION FROM ANSI ENCODED FILE TO UTF-8 ENCODED FILE

    Hi All,
    I have some issues converting an ANSI-encoded file to a UTF-encoded file. Let me explain in detail.
    I have installed the Language Support for the Thai language on my operating system.
    Now, when I open Notepad, add Thai characters to the file and save it with ANSI encoding, it is saved perfectly and I am also able to see the characters on opening the file again.
    This file needs to be read by my application, stored in the database, and the Thai characters should be displayed on a JSP after fetching the data from the database. Currently the JSP shows junk characters, because my database (a UTF8-compliant database) contains junk data. It contains junk data because my application is not able to read the file correctly.
    If I save the file with UTF-8 encoding it works fine, but my business requirement is that the file is system generated and is encoded in ANSI format by default, so I need to convert the encoding from ANSI to UTF-8. Can any of you guide me on how to do this conversion?
    Regards
    Gaurav Nigam

    Guessing the encoding of a text file by examining its contents is tricky at best, and should only be done as a last resort. If the file is auto-generated, I would first try reading it using the system default encoding. That's what you're doing whenever you read a file with a FileReader. If that doesn't work, try using an InputStreamReader and specifying a Thai encoding like TIS-620 or cp838 (I don't really know anything about Thai encodings; I just picked those out of a quick Google search). Once you've read the file correctly, you can write the text to a new file using an OutputStreamWriter and specifying UTF-8 as the encoding. It shouldn't really be necessary to transcode files like this, but without knowing a lot more about your situation, that's all I can suggest.
    As for native2ascii, it isn't for encoding conversions. All it does is replace each non-ASCII character with its six-character Unicode escape, so "voilá" becomes "voil\u00e1". In other words, it avoids the problem of character encodings by converting the file's contents to a form that can be stored as ASCII. It's mainly used for converting property or resource files to a form that can be read by the Properties and ResourceBundle classes.
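
    The InputStreamReader/OutputStreamWriter approach described above could be sketched roughly as follows. The class name, method name and command-line arguments are hypothetical, and the source charset has to be whatever the generating system really uses; this is an illustration, not a tested solution for the Thai file in question.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class FileTranscodeSketch {
    // Read `in` using sourceCharset and write the same text to `out` as UTF-8.
    static void toUtf8(File in, Charset sourceCharset, File out) throws IOException {
        try (Reader reader = new InputStreamReader(new FileInputStream(in), sourceCharset);
             Writer writer = new OutputStreamWriter(new FileOutputStream(out), StandardCharsets.UTF_8)) {
            char[] buf = new char[8192];
            int n;
            while ((n = reader.read(buf)) != -1) {
                writer.write(buf, 0, n); // characters are re-encoded as UTF-8 on the way out
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical usage: java FileTranscodeSketch input.txt TIS-620 output.txt
        toUtf8(new File(args[0]), Charset.forName(args[1]), new File(args[2]));
    }
}
```

    Whether a given charset name such as TIS-620 is available depends on the Java runtime; Charset.isSupported(name) can be used to check before relying on it.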

  • Encoding from UTF-16 to UTF-8

    Hi,
    I need to convert from UTF-16 to UTF-8 encoding.
    I receive a CSV file in UTF-16 encoding for our backend system, but our external partner needs the encoding to be UTF-8.
    How can I change the encoding ?

    Hello Frank,
    We have used TextCodePageConversionBean to meet such a requirement in one of our scenarios using CSV files.
    http://help.sap.com/saphelp_nw04/helpdata/en/45/da2deb47812e98e10000000a155369/content.htm
    Can you please try this and let us know if this helps?
    Thanks.
    Best Regards,
    Shweta

  • Conversion  ISO-8859-7- UTF-8  and UTF-8 - ISO-8859-7

    Hi, I have written this function to do a charset conversion
    from ISO-8859-7 to UTF-8 and vice versa:
    void ChangeChersetEncoding(String EncodingType)
    {
        String GrammarText;
        try
        {
            GrammarText = Editor.getText();
            b = GrammarText.getBytes(LastEncoding);
            String strTemp = new String(b,EncodingType);
            Editor.setText(strTemp);
            LastEncoding = EncodingType;
        }
        catch (UnsupportedEncodingException e)
        {
            JOptionPane.showMessageDialog(this, "Error: " + e.getMessage(), "Error", JOptionPane.ERROR_MESSAGE);
        }
    }
    The steps followed are:
    1) I initialize Editor (a JEditorPane) with an InputStreamReader that uses the "CP1252" (Windows Latin-1) charset encoding by default.
    2) When I call the function the first time with EncodingType = "ISO-8859-7" and LastEncoding = "CP1252" (Windows Latin-1), Editor shows Greek characters, as I expected.
    3) When I call the function the second time with EncodingType = "UTF-8" and LastEncoding = "ISO-8859-7", Editor shows unknown characters ('�'), as I expected.
    4) The problem is that when I call the function the third time with EncodingType = "ISO-8859-7" and LastEncoding = "UTF-8", Editor doesn't show the original Greek text, which I did not expect.
    Thank you all.

    b = GrammarText.getBytes(LastEncoding);
    String strTemp = new String(b,EncodingType);
    Here you take a String (which is in Unicode) and convert it to bytes, using "LastEncoding". Next you take those bytes and convert them back to a String, assuming that they were encoded using "EncodingType". But they weren't, so at best this will do nothing and at worst it will produce garbage. It certainly won't do anything useful.
    As I said all Java strings are in Unicode. If you want to convert something from one encoding to another encoding, you can only convert an array of bytes to a String using the first encoding, then convert that back to bytes using the second encoding. Converting a String to a String just makes no sense.
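
    The byte-level conversion described above might look like the sketch below. The class and method names are invented for illustration; the only assumption is that the caller knows which encoding the input bytes are really in.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class TranscodeSketch {
    // Decode the bytes with the charset they were written in, then re-encode them.
    static byte[] transcode(byte[] input, Charset from, Charset to) {
        String text = new String(input, from); // bytes -> String (Unicode)
        return text.getBytes(to);              // String -> bytes in the target encoding
    }

    public static void main(String[] args) {
        Charset greek = Charset.forName("ISO-8859-7");
        byte[] greekBytes = {(byte) 0xE1};     // Greek small alpha in ISO-8859-7
        byte[] utf8 = transcode(greekBytes, greek, StandardCharsets.UTF_8);
        byte[] back = transcode(utf8, StandardCharsets.UTF_8, greek);
        System.out.println(utf8.length + " UTF-8 byte(s), " + back.length + " ISO-8859-7 byte(s)");
    }
}
```

    The round trip works here because each step decodes with the encoding the bytes are actually in. Decoding with the wrong charset replaces malformed input with U+FFFD, and that replacement cannot be undone by a later conversion.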

  • Dreamweaver cc html entity conversion problem in mac -NO utf-8 related answer please

    I probably am fighting against a bug existing in DW for a while, and i'm really on the edge of bursting out! 
    Here are the specifications:
    Dreamweaver CC from creative cloud (also tested w/ CS5.5 too) installed on mac, OS and DW user interfaces are english, and on mac turkish keyboard layout is also installed.
    I have been using DW for maybe 15 years, since it was macromedia.. But was always on windows. This is the first time I use it on mac. Here is my problem step by step:
    1- Dreamweaver > Preferences > New Document > Default Encoding: Western (ISO Latin 1) (NOT UTF-8 PLEASE, IT KEEPS THE CHARS UNCHANGED, ISO LATIN1 IS IMPORTANT)
    2- Go to Design View,
    3- There are 6 special characters in Turkish (times 2 for the caps versions of course), type:
    ĞÜŞİÖÇğüşıöç
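    4- Go back to code view, what i should have seen was:
    &#286;&Uuml;&#350;&#304;&Ouml;&Ccedil;&#287;&uuml;&#351;&#305;&ouml;&ccedil;
    But I see:
    Ğ&Uuml;Şİ&Ouml;&Ccedil;ğ&uuml;şı&ouml;&ccedil;
    There are 3 chars (and capital versions) NOT converted to html entity at all. Which were: ĞŞİğşı
    But I should have seen them as: &#286;&#350;&#304;&#287;&#351;&#305;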
    Any help would be appreciated, I do not want to leave my old friend DW just because of a weird conversion problem...

    Ok, when you look at the code view, what do you see exactly?
    do you see unconverted
    ĞÜŞİÖÇğüşıöç
    or converted
    &#286;&Uuml;&#350;&#304;&Ouml;&Ccedil;&#287;&uuml;&#351;&#305;&ouml;&ccedil;
    Here is one of my reasons:
    I sometimes create newsletters in Turkish for my customers, and the html files I prepare are sent to customers as inline attachments through various versions of Outlook or Thunderbird, or through a completely different email sender company (none is sent by me, I only create the html file). Most of the time the headers and some of the code are cut off when the file is sent as a newsletter, and I have no control over that at all. So I have to create html files that are rendered correctly everywhere, since I have no control over which sending method will be used, or which OS, browser or mail system will be used to open them...
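
    For files that must survive having their headers stripped, one workaround outside Dreamweaver is to post-process the HTML so that every non-ASCII character becomes a numeric character reference. The sketch below is only an illustration in Java with invented names; it does not reproduce Dreamweaver's behaviour, and it emits numeric references such as &#220; where Dreamweaver would write named ones such as &Uuml;.

```java
final class EntityEscapeSketch {
    // Replace every non-ASCII code point with a decimal numeric character reference.
    static String escapeNonAscii(String text) {
        StringBuilder sb = new StringBuilder();
        text.codePoints().forEach(cp -> {
            if (cp < 0x80) {
                sb.appendCodePoint(cp);                  // plain ASCII stays as it is
            } else {
                sb.append("&#").append(cp).append(';');  // e.g. U+011E -> &#286;
            }
        });
        return sb.toString();
    }

    public static void main(String[] args) {
        // Escaped literals keep this source file itself pure ASCII.
        System.out.println(escapeNonAscii("\u011e\u00dc\u015e\u0130\u00d6\u00c7"));
    }
}
```

    The result contains only ASCII, so it renders the same whichever charset a mail client assumes.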

  • Can any version of Excel save to a CSV file that is either UTF-8 or UTF-16 encoded (unicode)?

    Are there any versions of Excel (chinese, japanese, russian... 2003, 2007, 2010...) that can save CSV files in Unicode (either UTF-8 or UTF-16)?
    If not, is the only solution to go with tab-delimited files (save as Unicode-text option)?

    Hi Mark,
    I have the same problem. Trying to save my CSV file in UTF8 encoding. After several hours in searching and trying this also in my VSTO Add-In I got nothing. Saving file as Unicode option in Excel creates file as TAB separated. Because I'd like to save the file in my Add-In application, the best to do is (for my problem) saving file as unicode tab delimited and then replacing all tabs with commas in the file automatically.
    I don't think there is a direct way to save CSV as unicode in Excel. And I don't understand why.

  • Conversion from UTF-16 to UTF-8 in XI

    Hi,
      From the source system (MDM), data sometimes arrives in XI in UTF-16 format. My target system is R/3, which is UTF-8. Here's the scenario:
    MDM->MQ Queue-> Local JMS Queue-> XI->R/3
    Here I am using the sender JMS adapter to receive the data from the local JMS queue and the receiver IDOC adapter to send the IDOC into R/3. I am using ABAP mapping for this scenario.
      Since the target system is UTF-8 and the data sometimes arrives in UTF-16, how can I change the format from UTF-16 to UTF-8 in the sender JMS adapter?
    Please advise.
    A reply with details would be appreciated.
    BR
    Soumya

    Hi Soumya ,
    You can do this in Adapter module in JMS sender adapter .
    obj = inputModuleData.getPrincipalData();
    msg = (Message) obj;
    XMLPayload xmlpayload = msg.getDocument();               
    xmlpayload.getContent()
    convert from UTF 16 to UTF 8 then
    xmlpayload.setContent();
    Hope this works.
    Cheers,
    Reddy
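
    The "convert from UTF 16 to UTF 8" step in the middle could be a plain JDK helper along these lines. Since the data only sometimes arrives as UTF-16, this sketch converts only when the content starts with a UTF-16 byte order mark. The class and method names are hypothetical, it is not part of the SAP adapter module API, and it assumes getContent() hands over the payload as a byte array.

```java
import java.nio.charset.StandardCharsets;

final class PayloadUtf8Sketch {
    // Return UTF-8 bytes if the payload starts with a UTF-16 byte order mark;
    // otherwise return the payload untouched.
    static byte[] toUtf8IfUtf16(byte[] content) {
        if (content.length >= 2) {
            int b0 = content[0] & 0xFF;
            int b1 = content[1] & 0xFF;
            boolean bigEndianBom = b0 == 0xFE && b1 == 0xFF;
            boolean littleEndianBom = b0 == 0xFF && b1 == 0xFE;
            if (bigEndianBom || littleEndianBom) {
                // The "UTF-16" charset consumes the BOM and takes the byte order from it.
                String text = new String(content, StandardCharsets.UTF_16);
                return text.getBytes(StandardCharsets.UTF_8);
            }
        }
        return content;
    }
}
```

    UTF-16 data without a byte order mark would pass through unchanged, and if the payload is XML, the encoding named in its XML declaration would still need to be updated separately.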

  • How does the largest code point differ between UTF-8 and UTF-16

    How does the largest code point differ between UTF-8 and UTF-16?
    The largest code point is 10FFFF for both of them, so how do the formats differ?
    thank you,
    Regards,
    Jagrut BharatKumar Shukla

    In this specific case there is no difference in the code points used to store character data, because the character set (Unicode) is the same; only the encoding form differs.
    But what is your 4-digit Oracle version?
    Are you sure that database character set and national character set are the same ?
    In recent Oracle versions, database character set and national character set are different. For example:
    SQL> select * from nls_database_parameters where parameter like '%SET%';
    PARAMETER                      VALUE
    NLS_CHARACTERSET               AL32UTF8
    NLS_NCHAR_CHARACTERSET         AL16UTF16
    Edited by: P. Forstmann on 28 sept. 2011 18:51

  • Getting ÿþ as saved conversations from Lync in Outlook in Office 2013

    Hi,
    I've been trying to get to the bottom of this and have found similar posts, but no one seems to have an answer.
    When I IM someone using Lync 2013, they get a pop up notification but instead of the message they see ÿþ<.  Once they open the chat window, they can see my typed text.  Occasionally, certain people can't see the first line of my chat, but as long as they keep the chat window open, they can see everything new I type.
    All my conversations that are saved in outlook show ÿþ< for the text and are unreadable.  I've disabled the saving of conversations because they have become worthless.
    I believe it has to do with BOM but have not been able to find a way to fix this.
    If I copy a conversation from the chat window and paste it into Microsoft Word it shows ÿþ<, but if I paste it into notepad the conversation appears.
    (I had inserted a screenshot here, but am unable to because I am unable to figure out how to get my account "verified")
    I've tried changing the preferred encoding for outgoing messages: to Unicode (UTF-8) in Outlook, but this had no effect and I can't find a similar option in Lync 2013.
    (I had inserted a screenshot here, but am unable to because I am unable to figure out how to get my account "verified")
    I enabled logging for Lync and the event IDs that come up are 1, 11 and 12, to which I cannot find any information for at the moment.
    Any help and or suggestions would be appreciated.

    Hi,
    Did the issue happen only for you or for multiple users?
    Please try to delete Lync User Profile and information on Registry, then repair Office 2013.
    The path of Lync User Profile: %UserProfile%\AppData\Local\Microsoft\Office\15.0\Lync
    The path for information on Registry: HKCU\Software\Microsoft\Office\15.0\Lync\[email protected]
    Then test the issue again.
    Best Regards,
    Eason Huang
    TechNet Community Support
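
    On the BOM suspicion: "ÿþ" is exactly what the two bytes FF FE, the UTF-16 little-endian byte order mark, look like when they are read as Latin-1 or Windows-1252 text. The sketch below only demonstrates that effect in Java with invented names; it says nothing about where in Lync or Outlook the misreading happens.

```java
import java.nio.charset.StandardCharsets;

final class BomMojibakeSketch {
    // UTF-16LE text preceded by its byte order mark (FF FE).
    static byte[] utf16LeWithBom(String text) {
        byte[] body = text.getBytes(StandardCharsets.UTF_16LE);
        byte[] out = new byte[body.length + 2];
        out[0] = (byte) 0xFF;
        out[1] = (byte) 0xFE;
        System.arraycopy(body, 0, out, 2, body.length);
        return out;
    }

    // Decode bytes one-to-one as ISO-8859-1, as a reader that ignores the BOM might.
    static String decodeAsLatin1(byte[] bytes) {
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // The first three characters are U+00FF, U+00FE and '<'.
        System.out.println(decodeAsLatin1(utf16LeWithBom("<html>")));
    }
}
```

    That would be consistent with UTF-16 content being handed to a component that expects a single-byte encoding.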

  • How to convert xml utf 16 to utf 8

    Is it possible to convert xml file with UTF16 to UTF8 using ABAP? I am using ECC 6.0
    Appreciated your inputs.

    Hhmm, interesting. I thought it should be straightforward, but the two solutions I could think of seem a bit convoluted. The first way is probably to use the iXML library, where the starting point is the class CL_IXML. You can find the SAP documentation [here|http://help.sap.com/saphelp_nw04/helpdata/en/86/8280d212d511d5991b00508b6b8b11/frameset.htm].
    Then there's a "manual approach": Use OPEN DATASET to read the UTF-16 file, then modify the XML encoding attribute and save it as UTF-8 file. Not straightforward, because the tempting command option [OPEN DATASET .. LEGACY TEXT MODE CODE PAGE|http://help.sap.com/abapdocu_70/en/ABAPOPEN_DATASET_MODE.htm] doesn't work. Per ABAP help:
    For the specification of the code page cp, a character-type data object is expected that must contain - at the time of execution of the statement - the label of a non-Unicode page from the column CPCODEPAGE in the database table TCP00. A Unicode page must not be specified.
    Darn, looks like they expect most Unicode files to be UTF-8. But that might be the reason you want to convert it...
    So use the following steps:
    1. Open the file as a binary file (the only option for UTF-16, see [here|http://help.sap.com/abapdocu_70/en/ABAPOPEN_DATASET_ENCODING.htm]) via [OPEN DATASET file FOR INPUT IN BINARY MODE|http://help.sap.com/abapdocu_70/en/ABAPOPEN_DATASET.htm] and read the content into an XSTRING using [READ DATASET|http://help.sap.com/abapdocu_70/en/ABAPREAD_DATASET.htm].
    2. Convert it to a string using the utility class [CL_ABAP_CONV_IN_CE|http://help.sap.com/saphelp_nw04/helpdata/en/79/c554afb3dc11d5993800508b6b8b11/frameset.htm], see the example [here|http://wiki.sdn.sap.com/wiki/display/Snippets/ABAPCodePage+Conversions].
    3. Replace the encoding markup for UTF-16 in the XML with a reference to UTF-8.
    4. Write the XML content back to a file using [OPEN DATASET file|http://help.sap.com/abapdocu_70/en/ABAPOPEN_DATASET.htm] FOR [OUTPUT|http://help.sap.com/abapdocu_70/en/ABAPOPEN_DATASET_ACCESS.htm] IN [TEXT MODE|http://help.sap.com/abapdocu_70/en/ABAPOPEN_DATASET_MODE.htm] [ENCODING UTF-8|http://help.sap.com/abapdocu_70/en/ABAPOPEN_DATASET_ENCODING.htm] and [TRANSFER|http://help.sap.com/abapdocu_70/en/ABAPTRANSFER.htm].
    Maybe somebody has a shorter way...
    Cheers, harald
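
    For readers who want to see the four steps outside ABAP, here is the same sequence as a small Java sketch. It is an illustration with invented names, not ABAP and not an SAP API. It assumes the whole document fits in memory and that the XML declaration is written as encoding="UTF-16" with no spaces around the equals sign.

```java
import java.nio.charset.StandardCharsets;

final class XmlUtf16ToUtf8Sketch {
    static byte[] convert(byte[] utf16Xml) {
        // Steps 1-2: take the raw bytes and decode them; "UTF-16" honours a leading BOM.
        String xml = new String(utf16Xml, StandardCharsets.UTF_16);
        // Step 3: point the XML declaration at UTF-8 instead of UTF-16,
        // keeping whichever quote character the document used.
        xml = xml.replaceFirst("encoding=([\"'])[Uu][Tt][Ff]-16\\1", "encoding=$1UTF-8$1");
        // Step 4: encode the text as UTF-8.
        return xml.getBytes(StandardCharsets.UTF_8);
    }
}
```

    Reading and writing the file itself is left out; the byte array would come from, and go back to, ordinary binary file I/O.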

  • How to convert UTF-16 to UTF-8

    data source is 'ъѓъѓ№ѓфчр Фюыр№ 80Ъ                     ', it is Ukraine.
    I want to remove the blanks, but no matter which keyword I use in SAP, it doesn't work. I checked the hexadecimal value of the space in the text above: it is 00A0, but the system only regards 0020 as a space. From what I found on the internet, the space in the text is encoded in UTF-16 while the system is UTF-8, and 00A0 is extended ASCII, so 00A0 can't be seen in the SAP system.
    My question is: in this situation, how can I remove the space?

    Hi Eric,
    This document might help you.
    Link: [how to convert UTF-16 to UTF-8|How to convert xml utf 16 to utf 8]
    -Dileep .C
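
    A note on the 00A0 question itself: U+00A0 is NO-BREAK SPACE, a different character from the ordinary space U+0020 in every Unicode encoding, so this is not really a UTF-16 versus UTF-8 issue. The usual fix is to replace that character explicitly before trimming. The sketch below shows the idea in Java with invented names; in ABAP the equivalent would be a replace on the character with hex code 00A0, which is not shown here.

```java
final class NbspSketch {
    // Turn every NO-BREAK SPACE (U+00A0) into an ordinary space (U+0020),
    // then trim leading and trailing whitespace.
    static String normalizeSpaces(String s) {
        return s.replace('\u00A0', ' ').trim();
    }

    public static void main(String[] args) {
        String raw = "80\u00A0\u00A0\u00A0";
        System.out.println("[" + normalizeSpaces(raw) + "]"); // prints [80]
    }
}
```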

  • Character encoding conversion for marshall/unmarshall?

    Hello, Java Web Services gurus,
    I am wondering if there is an easy/plugin-able way to do character encoding conversion transparently in the process of marshall/unmarshall.
    Basically, my input/output will always be these UTF-8 XMLs. As the backend database is ISO encoded, I hope the result of unmarshall will give me ISO strings. And when it comes to marshall, the ISO strings can be transparently turned to UTF-8 XML response. Right now I'm using JAXB's annotations to parse XML into objects.
    I understand there may be characters in the input file that cannot be converted; if so, I'd expect an error/exception that flags the failure.
    Hope I sound clear. This has been a headache for a while. Really hope someone may help out a bit. Thanks a million in advance

    [Duplicate Post|http://forums.sun.com/thread.jspa?messageID=10971554&tstart=0#10971554]

  • Problem with URL encoding conversion

    Hi all,
    I am working on an I18N application in which one component sends a request to another component, which then extracts the query parameters from the (HTTP) request.
    Now the problem is that the input to first component can be given in one of the 5 character encodings:-
    UTF-8
    Shift_JIS
    EUC_JP
    Windows-31J
    ISO-2022-JP
    I have created a test program that converts an encoded URL from one character encoding to another. It works for the first 4 encodings, but fails for the last one, "ISO-2022-JP". The test program is:
    import java.io.*;
    import java.util.*;
    import java.net.URLDecoder;
    import java.net.URLEncoder;
    class JPtoUTF8{
         public static void main(String[] args){
              try{
                  String shift_jis = "%82%C8%82%A4%82%8B%82%E8";          // This is Shift_JIS encoded URL
                  String iso2022jp = "%1B%24B%24J%24%26%23k%24j%1B%28B";  // This is ISO-2022-JP encoded URL
                  String utf8 = "%E3%81%AA%E3%81%86%EF%BD%8B%E3%82%8A";   // This is the result that should be obtained
                  String decodedShift_jis = URLDecoder.decode(shift_jis,"Shift_JIS");
                  String decodedIso2022jp = URLDecoder.decode(iso2022jp,"ISO-2022-JP");
                  String encodedShift_JIS = URLEncoder.encode(decodedShift_jis,"UTF-8");
                  String encodedIso2022jp = URLEncoder.encode(decodedIso2022jp,"UTF-8");
                   System.out.println("shift_jis        = "+shift_jis);
                   System.out.println("encodedShift_JIS = "+encodedShift_JIS);
                   System.out.println("iso2022jp        = "+iso2022jp);
                   System.out.println("encodedIso2022jp = "+encodedIso2022jp);
              }catch(Exception e){
                   e.printStackTrace();
              }
         }
    }
    I am using jdk5 for this application.
    Please give your valuable suggestions.
    Thanks in advance.

    Could the cause be that ISO-2022-JP is not just ISO-2022-JP:
    http://www.w3.org/TR/japanese-xml/#AEN28427904
    Maybe what you are getting is one of the flavors, while the Java URLDecoder uses another flavor? Or maybe the string you are getting is incorrectly encoded to begin with (it might have been incorrectly converted from Shift_JIS)?
    With the shift-in shift-out design it is a difficult encoding to deal with under the best of circumstances, so you have my sympathy.
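
    Another possible cause, as far as I can tell from the JDK's URLDecoder implementation: it decodes each run of consecutive %XX escapes on its own and copies unescaped characters straight through. In the ISO-2022-JP sample the bytes for B, J, k and j appear as literal characters, so an escape sequence such as ESC $ B is split up before the charset decoder ever sees it. A hedged workaround is to turn the whole string into bytes first and decode those bytes in one go. The class and method names below are invented, and the sketch assumes the input contains only ASCII characters.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.Charset;

final class PercentDecodeSketch {
    // Turn a percent-encoded string into raw bytes: %XX becomes one byte,
    // '+' becomes a space, any other character is taken as its ASCII byte.
    static byte[] percentDecodeToBytes(String encoded) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < encoded.length(); i++) {
            char c = encoded.charAt(i);
            if (c == '%' && i + 2 < encoded.length()) {
                // parseInt throws NumberFormatException on malformed escapes
                out.write(Integer.parseInt(encoded.substring(i + 1, i + 3), 16));
                i += 2;
            } else if (c == '+') {
                out.write(' ');
            } else {
                out.write(c);
            }
        }
        return out.toByteArray();
    }

    // Decode the complete byte sequence with the given charset in a single step.
    static String decode(String encoded, Charset charset) {
        return new String(percentDecodeToBytes(encoded), charset);
    }

    public static void main(String[] args) {
        String iso2022jp = "%1B%24B%24J%24%26%23k%24j%1B%28B";
        // Needs a runtime that ships the ISO-2022-JP charset.
        String text = decode(iso2022jp, Charset.forName("ISO-2022-JP"));
        System.out.println(text);
    }
}
```

    The decoded String can then be passed to URLEncoder.encode(text, "UTF-8") exactly as in the test program above.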

  • How do we represent the largest code point in UTF-8 and UTF-16, and what's the difference?

    How do we represent the largest code point in UTF-8 and UTF-16, and what's the difference?
    Points will be awarded.

    There are standards for character encoding.
    See below for a brief description:
    UTF-16 (16-bit Unicode Transformation Format) is a variable-length character encoding for Unicode, capable of encoding the entire Unicode repertoire. The encoding form maps code points (characters) into a sequence of 16-bit words, called code units. For characters in the Basic Multilingual Plane (BMP) the resulting encoding is a single 16-bit word. For characters in the other planes, the encoding will result in a pair of 16-bit words, together called a surrogate pair. All possible code points from U+0000 through U+10FFFF, except for the surrogate code points U+D800–U+DFFF, are uniquely mapped by UTF-16 regardless of the code point's current or future character assignment or use.
    UTF-8 (8-bit UCS/Unicode Transformation Format) is a variable-length character encoding for Unicode. It is able to represent any universal character in the Unicode standard, yet the initial encoding of byte codes and character assignments for UTF-8 is consistent with ASCII (requiring little or no change for software that handles ASCII but preserves other values). For these reasons, it is steadily becoming the preferred encoding for e-mail, web pages, and other places where characters are stored or streamed.
    Check this site for details.
    http://unicode.org/.
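
    To make the difference concrete for the largest code point: U+10FFFF is the same code point in both encoding forms, but it is stored as four 8-bit code units in UTF-8 and as a surrogate pair of two 16-bit code units in UTF-16. A small Java sketch (class and method names invented for illustration) that prints both byte sequences:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class MaxCodePointSketch {
    // Format bytes as space-separated two-digit hex values.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    // Encode a single code point with the given charset and return the bytes as hex.
    static String encodeHex(int codePoint, Charset charset) {
        return hex(new String(Character.toChars(codePoint)).getBytes(charset));
    }

    public static void main(String[] args) {
        int max = Character.MAX_CODE_POINT; // 0x10FFFF
        System.out.println(encodeHex(max, StandardCharsets.UTF_8));    // F4 8F BF BF
        System.out.println(encodeHex(max, StandardCharsets.UTF_16BE)); // DB FF DF FF
    }
}
```

    DB FF DF FF is the surrogate pair U+DBFF U+DFFF written in big-endian byte order.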

  • Convert UTF-16 to UTF-8

    Hi
    My source file is UTF-16 and the target file is UTF-8. I am using XSLT mapping. When I test it in Altova XML it works fine, but when I test the same thing in my scenario it does not work.
    I have tested this using the Test option in ID. It works if I change UTF-16 to UTF-8 while testing in ID, but if I try to change it directly in the XML file it is not accepted.
    How can I change UTF-16 to UTF-8 during XSLT mapping? How can I resolve this problem?
    Regards
    Sowmya

    Which adapter are you using?
    If you are using the file adapter, you can set the file adapter property file.encoding=<codepage>.
    You can refer to the link below:
    http://help.sap.com/saphelp_nw04/helpdata/en/0d/00453c91f37151e10000000a11402f/frameset.htm
    Gaurav Jain
