Conversion ISO-8859-7- UTF-8 and UTF-8 - ISO-8859-7

Hi, I have written this function to do a charset conversion
from ISO-8859-7 to UTF-8 and vice versa:
void ChangeChersetEncoding(String EncodingType) {
    String GrammarText;
    try {
        GrammarText = Editor.getText();
        b = GrammarText.getBytes(LastEncoding);
        String strTemp = new String(b, EncodingType);
        Editor.setText(strTemp);
        LastEncoding = EncodingType;
    } catch (UnsupportedEncodingException e) {
        JOptionPane.showMessageDialog(this, "Error: " + e.getMessage(), "Error", JOptionPane.ERROR_MESSAGE);
    }
}
The steps followed are:
1) I initialize Editor (a JEditorPane) with an InputStreamReader, which by default uses the "CP1252" (Windows Latin-1) charset encoding.
2) When I call the function the first time with EncodingType = "ISO-8859-7" and LastEncoding = "CP1252" (Windows Latin-1), Editor shows Greek characters, as I expected.
3) When I call the function the second time with EncodingType = "UTF-8" and LastEncoding = "ISO-8859-7", Editor shows unknown characters ('�'), as I expected.
4) The problem is that when I call the function the third time with EncodingType = "ISO-8859-7" and LastEncoding = "UTF-8", Editor doesn't show the original Greek text, which I did not expect.
Thank you for all.

b = GrammarText.getBytes(LastEncoding);
String strTemp = new String(b,EncodingType);
Here you take a String (which is in Unicode) and convert it to bytes, using "LastEncoding". Next you take those bytes and convert them back to a String, assuming that they were encoded using "EncodingType". But they weren't, so at best this will do nothing and at worst it will produce garbage. It certainly won't do anything useful.
As I said, all Java strings are in Unicode. If you want to convert something from one encoding to another encoding, you can only convert an array of bytes to a String using the first encoding, then convert that back to bytes using the second encoding. Converting a String to a String just makes no sense.
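The byte-to-String-to-byte route described above can be sketched as follows. This is only an illustration: the class name TranscodeSketch, the method transcode and the sample bytes are made up for the example and are not part of the original poster's code.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class TranscodeSketch {
    // Decode the bytes with the charset they were really written in,
    // then encode the resulting (Unicode) String with the target charset.
    static byte[] transcode(byte[] input, Charset from, Charset to) {
        String text = new String(input, from);
        return text.getBytes(to);
    }

    public static void main(String[] args) {
        Charset greek = Charset.forName("ISO-8859-7");
        // Hypothetical sample: alpha, beta, gamma as ISO-8859-7 bytes.
        byte[] isoBytes = { (byte) 0xE1, (byte) 0xE2, (byte) 0xE3 };
        byte[] utf8Bytes = transcode(isoBytes, greek, StandardCharsets.UTF_8);
        byte[] roundTrip = transcode(utf8Bytes, StandardCharsets.UTF_8, greek);
        System.out.println("UTF-8 length: " + utf8Bytes.length);
        System.out.println("Round trip intact: " + Arrays.equals(isoBytes, roundTrip));
    }
}
```

The point of the sketch is that the encoding names are only ever applied to byte arrays; the String in the middle has no encoding to "change".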

Similar Messages

Identify UTF-8 and UTF-16 formats

Hi,
Clients submit their Unicode messages (Arabic, Telugu, and other languages) in hex format; our application then accepts each message and processes it.
But there are many tools on the market that convert Unicode to UTF-8 and UTF-16 formats,
so I need to identify whether the message is in
UTF-8 or
UTF-16 or
hex (no problem)
with something like
isUTF8(String message)
isUTF16(String message)
so that I can convert them back to hex and dump them into the database.
Regards
Heral raj

You can identify whether it is UTF-16 or UTF-8 by looking at its BOM (byte order mark), if one is present. The BOM is the first two bytes of a UTF-16 stream (FE FF or FF FE) or the first three bytes of a UTF-8 stream (EF BB BF).
Check this link: http://www.websina.com/bugzero/kb/unicode-bom.html
I do not think implementation should be a problem.
Thanks
Gaurav
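A BOM check along those lines could look like the sketch below. The class and method names are invented for the example, and it assumes the message is already available as raw bytes (a hex string would have to be decoded to bytes first). A missing BOM proves nothing, since the BOM is optional, and FF FE followed by 00 00 can also be a UTF-32 BOM, so treat the result as a hint only.

```java
class BomSniffer {
    // Returns the encoding implied by a leading BOM, or null when there is none.
    static String detectBom(byte[] data) {
        if (startsWith(data, 0xEF, 0xBB, 0xBF)) return "UTF-8";
        if (startsWith(data, 0xFE, 0xFF)) return "UTF-16BE";
        if (startsWith(data, 0xFF, 0xFE)) return "UTF-16LE";
        return null;
    }

    // Compares the first bytes of data with the given unsigned values.
    private static boolean startsWith(byte[] data, int... prefix) {
        if (data.length < prefix.length) return false;
        for (int i = 0; i < prefix.length; i++) {
            if ((data[i] & 0xFF) != prefix[i]) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        // Hypothetical sample: little-endian BOM followed by the letter A.
        byte[] sample = { (byte) 0xFF, (byte) 0xFE, 0x41, 0x00 };
        System.out.println(detectBom(sample));
    }
}
```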

How do we represent the largest code point in UTF-8 and UTF-16, and what is the difference?

How do we represent the largest code point in UTF-8 and UTF-16, and what is the difference?
points will be awarded

These are standards for character encoding.
See below for a brief description:
UTF-16 (16-bit Unicode Transformation Format) is a variable-length character encoding for Unicode, capable of encoding the entire Unicode repertoire. The encoding form maps code points (characters) into a sequence of 16-bit words, called code units. For characters in the Basic Multilingual Plane (BMP) the resulting encoding is a single 16-bit word. For characters in the other planes, the encoding will result in a pair of 16-bit words, together called a surrogate pair. All possible code points from U+0000 through U+10FFFF, except for the surrogate code points U+D800 to U+DFFF, are uniquely mapped by UTF-16 regardless of the code point's current or future character assignment or use.
UTF-8 (8-bit UCS/Unicode Transformation Format) is a variable-length character encoding for Unicode. It is able to represent any universal character in the Unicode standard, yet the initial encoding of byte codes and character assignments for UTF-8 is consistent with ASCII (requiring little or no change for software that handles ASCII but preserves other values). For these reasons, it is steadily becoming the preferred encoding for e-mail, web pages, and other places where characters are stored or streamed.
Check this site for details.
http://unicode.org/.
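To make the difference concrete for the largest code point, U+10FFFF, here is a small Java sketch (the class and method names are made up for the example). UTF-8 needs four bytes, F4 8F BF BF, while UTF-16 needs one surrogate pair, DBFF DFFF, which is also four bytes.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class MaxCodePoint {
    // Encodes a single code point and returns the bytes as space-separated hex.
    static String encodedHex(int codePoint, Charset charset) {
        byte[] bytes = new String(Character.toChars(codePoint)).getBytes(charset);
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        int max = Character.MAX_CODE_POINT; // 0x10FFFF
        System.out.println("UTF-8:    " + encodedHex(max, StandardCharsets.UTF_8));
        System.out.println("UTF-16BE: " + encodedHex(max, StandardCharsets.UTF_16BE));
    }
}
```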

British Pound Sterling with UTF-8 and ISO-8859-15

Please excuse my long-windedness ... I'm simply trying to answer all possible questions up front and give the most possible information. I've searched through tons of forums and all over various sites and references and am not able to come up with a concrete solution to this. I'd appreciate any help anyone has.
I'm having some trouble with character sets and international currencies.
Our server was recently upgraded from Red Hat 7.3 to Red Hat 8.0. I understand that the default system encoding thus changed from ISO-8859-15 to UTF-8. I have verified this by executing the following:
public class WhichEncoding {
    public static void main(String args[]) {
        String p = System.getProperty("file.encoding");
        System.out.println(p);
    }
}
I have two machines, one which represents the old system (7.3) and one representing the new (8.0), which I will call machine73 and machine80 respectively.
[machine73:~]# java WhichEncoding
ISO-8859-15
[machine80:~]# java WhichEncoding
UTF-8
I have also verified that the JVM is using the correct default character set by executing the following:
import java.io.ByteArrayOutputStream;
import java.io.OutputStreamWriter;
public class WhichCharset {
    public static void main (String[] args) {
        String foo = (String)(new OutputStreamWriter(new ByteArrayOutputStream())).getEncoding();
        System.out.println(foo);
    }
}
which yields:
[machine73:~]# java WhichCharset
ISO-8859-15
[machine80:~]# java WhichCharset
UTF8
Here comes the problem. I have the following piece of code:
import java.text.NumberFormat;
import java.util.Locale;
public class TestPoundSterling {
    public static void main (String[] args) {
        NumberFormat nf = NumberFormat.getCurrencyInstance(new Locale("en", "GB"));
        System.out.println(nf.format(1.23));
    }
}
When I compile and execute this, I see mixed results. On machine73, I see what I would expect to see, the British Pound Sterling followed by 1.23. To be sure, I outputted the results to a file which I viewed in a hex editor, and observed [A3 31 2E 32 33 0A], which seems to be correct.
However, when I execute it on machine80, I see a capital A with a circumflex (caret) preceding the British Pound Sterling and the 1.23. The hex editor shows [C2 A3 31 2E 32 33 0A].
I looked up these hexadecimal values:
Extended ASCII
0xC2 = "T symbol"
0xA3 = lowercase "u" with grave
ISO-8859-1
0xC2 = Capital "A" with circumflex (caret)
0xA3 = British Pound Sterling
Unicode Latin-1
0x00C2 = Capital "A" with circumflex (caret)
0x00A3 = British Pound Sterling
(This explains why, when I remove /bin/unicode_start and reboot, I see a "T symbol" and "u" with a grave in place of what I saw before ... probably an irrelevant sidenote).
I found a possible answer on http://www.cl.cam.ac.uk/~mgk25/unicode.html#utf-8 under the Examples section. Apparently, a conversion between Unicode and UTF-8 acts differently based on the original Unicode value. Since the Pound Sterling falls between U-00000080 and U-000007FF (using the chart on the mentioned site), the conversion would be (as far as I can tell):
U-000000A3 = 11000010 10100011 = 0xC2 0xA3
This appears to be where the extra 0xC2 pops up.
Finally, to the whole point of this: how can I fix this so that things work on machine80 as they did on machine73? All I want to see at the command line is the Pound Sterling. Getting the 0xC2 preceding the Pound Sterling causes some parts of my applications to fail.
Here's some additional information that might be of use:
[machine73:~]# cat /etc/sysconfig/i18n
LANG="en_US.iso885915"
SUPPORTED="en_US.iso885915:en_US:en"
SYSFONT="lat0-sun16"
SYSFONTACM="iso15"
[machine73:~]# echo $LANG
en_US.iso885915
[machine80:~]# cat /etc/sysconfig/i18n
LANG="en_US.UTF-8"
SUPPORTED="en_US.UTF-8:en_US:en"
SYSFONT="latarcyrheb-sun16"
[machine80:~]# echo $LANG
en_US.UTF-8
Any help is very, very much appreciated. Thanks.

You didn't look hard enough; this is a FAQ...
There are three options:
1) Change the system encoding by setting the LANG or LC_CTYPE environment variables. Assuming you use bash:
bash$ export LC_CTYPE=en_GB.iso88591
You can check the available locales with locale -a; pipe it to grep en_GB to filter out the non-British English locales.
-OR-
2) Change the Java default encoding from the command line with -Dfile.encoding. Run with:
$ java -Dfile.encoding=ISO-8859-1 yourclass
-OR-
3) Set the encoding from within the program with OutputStreamWriter, or use a PrintStream that has the encoding set:
PrintStream out = new PrintStream(new FileOutputStream(FileDescriptor.out), true, "ISO-8859-1");
System.setOut(out);
See also the internationalization tutorial and the javadoc of the related classes.
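As a rough illustration of the third option, the sketch below encodes the pound sign in both charsets and then prints through a PrintStream fixed to ISO-8859-1. The class name PoundEncoding and the helper encode are invented for the example.

```java
import java.io.FileDescriptor;
import java.io.FileOutputStream;
import java.io.PrintStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class PoundEncoding {
    // Encodes text with an explicit charset instead of the platform default.
    static byte[] encode(String text, Charset charset) {
        return text.getBytes(charset);
    }

    public static void main(String[] args) throws Exception {
        String price = "\u00A3" + "1.23"; // pound sign followed by 1.23
        // One byte (A3) for the pound sign in ISO-8859-1, two bytes (C2 A3) in UTF-8.
        System.out.println("ISO-8859-1 length: " + encode(price, StandardCharsets.ISO_8859_1).length);
        System.out.println("UTF-8 length:      " + encode(price, StandardCharsets.UTF_8).length);

        // Replace System.out with a stream that always writes ISO-8859-1.
        PrintStream out = new PrintStream(new FileOutputStream(FileDescriptor.out), true, "ISO-8859-1");
        System.setOut(out);
        System.out.println(price);
    }
}
```

Whether the terminal then shows the pound sign correctly still depends on the terminal's own encoding, so check the output with a hex dump, as the original poster did.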

[svn:fx-trunk] 7661: Change from charset=iso-8859-1" to charset=utf-8" and save file with utf-8 encoding.

Revision: 7661
Author:   [email protected]
Date:     2009-06-08 17:50:12 -0700 (Mon, 08 Jun 2009)
Log Message:
Change from charset=iso-8859-1" to charset=utf-8" and save file with utf-8 encoding.
QA Notes:
Doc Notes:
Bugs: SDK-21636
Reviewers: Corey
Ticket Links:
    http://bugs.adobe.com/jira/browse/iso-8859
    http://bugs.adobe.com/jira/browse/utf-8
    http://bugs.adobe.com/jira/browse/utf-8
    http://bugs.adobe.com/jira/browse/SDK-21636
Modified Paths:
    flex/sdk/trunk/templates/swfobject/index.template.html

Same problem here with wl8.1.
Have you solved it, and if yes, how?
Thanks

How to store UTF-8 characters in an iso-8859-1 encoded oracle database?

How can we store UTF-8 characters in an iso-8859-1 encoded oracle database? We can NOT change the database encoding but need to store e.g. Polish or Russian characters besides other European languages.
Is there any stable solution with good performance?
We use Oracle 8.1.6 with iso-8859-1 encoding, Bea WebLogic 7.0, JDK 1.3.1 and the following thin driver: "Oracle JDBC Driver version - 9.0.2.0.0".

There are a couple of unsupported options, but I wouldn't consider using them on a production database running other critical applications. I would also strongly discourage their use unless you understand in detail how Oracle National Language Support (NLS) works, otherwise you could end up with corrupt data or worse.
In a sense, you've been asked to do the impossible. The existing database character set does not support encoding the data you've been asked to store.
Can you create a new database with an appropriate database character set and deploy your application there? That's probably the easiest solution.
If that isn't an option, and you really need to store data in this database, you could use one of the binary data types (RAW and BLOB), but that would mean that it would be exceptionally difficult for applications other than yours to extract the data. You would have to ensure that the data was always encoded in the same character set, otherwise you wouldn't be able to properly decode it later. This would also add a lot of complexity to your application, since you couldn't send or receive string data from the database.
Unfortunately, I suspect you will have to choose from a list of bad options.
Justin
Distributed Database Consulting, Inc.
http://www.ddbcinc.com/askDDBC
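If the binary-column route is taken, the "always the same character set" rule can be enforced by funnelling every read and write through one pair of helpers. The sketch below is only an illustration under that assumption; the class RawTextCodec and its methods are hypothetical, and the JDBC calls are mentioned in comments only, so no database is needed to run it.

```java
import java.nio.charset.StandardCharsets;

class RawTextCodec {
    // Every write goes through here, so the stored bytes are always UTF-8.
    // The result would be bound with PreparedStatement.setBytes(...).
    static byte[] toRaw(String text) {
        return text.getBytes(StandardCharsets.UTF_8);
    }

    // Every read goes through here; the input would come from ResultSet.getBytes(...).
    static String fromRaw(byte[] raw) {
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String polish = "\u0141\u00F3d\u017A"; // a Polish city name with non-Latin-1 letters
        byte[] stored = toRaw(polish);
        System.out.println("Stored bytes: " + stored.length);
        System.out.println("Round trip intact: " + polish.equals(fromRaw(stored)));
    }
}
```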

Conversion of String to UTF-8 and vice versa

I have a problem: from an IBM host via CICS I get
the German letters, for example "üöä" = x'81' x'94' x'84'. In a C program (with JNI) I use the method NewStringUTF to convert these characters to a Java String. The result seems to be correct: I can see the exact German characters in Swing and AWT components.
Then, to convert this String back to the original host codes with the method GetStringUTFChars in the same C program, I get 2 unknown, confused bytes for the 1 correct byte I expected. This effect occurs only with the special German characters (üöä)!
Who can help?

Having been through these kinds of problems a few times, I MAY be able to point you in the right direction.
1. You need to be VERY sure what you are seeing at each stage of the conversion. DON'T TRUST ANY DISPLAYS EXCEPT HEX DISPLAYS.
2. If you are operating on a Windows machine, you might investigate OemToChar and CharToOem. I mention this because I suspect that your original encoding is not UTF-8, and so NewStringUTF is doing something strange.
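The "two bytes for one" effect is what UTF-8 does with any character above U+007F, so it is expected from GetStringUTFChars. The Java sketch below shows this, and also decodes the host bytes under the assumption that they are in an IBM PC code page such as IBM850, where x'81' x'94' x'84' are ü, ö and ä. That code page is a guess from the byte values, not something the original post confirms, and the class and method names are invented.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class HostBytes {
    // UTF-8 bytes of a Java String; roughly what GetStringUTFChars hands back
    // for characters like these.
    static byte[] toUtf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // Decodes single-byte host data with an explicitly named code page.
    static String decodeHost(byte[] host, String codePage) {
        return new String(host, Charset.forName(codePage));
    }

    public static void main(String[] args) {
        String umlauts = "\u00FC\u00F6\u00E4"; // u-umlaut, o-umlaut, a-umlaut
        System.out.println("UTF-8 bytes for 3 characters: " + toUtf8(umlauts).length);

        byte[] host = { (byte) 0x81, (byte) 0x94, (byte) 0x84 };
        // IBM850 is an assumption and is not guaranteed to be installed in every JRE.
        if (Charset.isSupported("IBM850")) {
            System.out.println("Decoded as IBM850 matches: " + umlauts.equals(decodeHost(host, "IBM850")));
        }
    }
}
```

To get the single host bytes back, the String would have to be encoded with the same code page (for example with getBytes and that charset) rather than with UTF-8.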

More questions re: UTF-8 and Latin encoding

Hi
I've really tried to do my homework before posting here, so I hope I'm not double-posting...
I've read all the discussions I can find, followed instructions on http://homepage.mac.com/thgewecke/iwebchars.html and spoken twice to my ISP who provides the hosting service.
My ISP is, of course, forcing the browser to interpret the HTML as Latin, and if I change it manually to UTF-8, my page looks fine.
My ISP is, of course, blaming Apple, saying that Macs use a slightly different UTF-8 Character set. I suspect that's bullocks.
My ISP is also blaming iWeb, saying that it is designed to only work with .mac. I've said that's not the case, that it can publish separately to a folder for FTP upload.
My ISP insists that correctly coded UTF-8 pages will work fine, and that their server is not forcing the browser to interpret it any way or the other.
Is there any way I can prove that the HTML is correct, and that the server IS forcing browsers to interpret the character set as Latin?
If I can prove that, then I believe I have a valid case for them to change it, and accept the problem is theirs.
I would greatly appreciate suggestions from those who understand these matters better than me. Including, if you think it appropriate, what my next approach to the ISP should include...?
Many thanks to anyone who takes the time to help.
Kind regards
Steve
PS I should also mention that my ISP does not allow .htaccess files to be placed, administered, or used by users.
Can't get another host. Can't afford .mac.

All the statements made by your ISP are totally wrong, of course. If you want to prove that their server forces Latin-1, you can put your url into a site like this one:
http://web-sniffer.net/
It shows that the HTTP response header sent from their server tells all browsers that "Content-Type: text/html; charset=iso-8859-1".
From your description it doesn't sound like they are capable of understanding this problem or fixing it for you. If that is true, and if you really have no other choice, I would suggest that before uploading you open your pages with TextEdit (set to Plain text, ignore rich text commands in html, and UTF-8 encoding) and just do Save As after setting the encoding to Western ISO Latin-1.
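If you would rather check the header yourself than rely on a web tool, a few lines of Java can do the same thing. This is a minimal sketch: the class ContentTypeCharset and the helper charsetOf are made up for the example, and the URL is whatever page you pass on the command line.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;
import java.util.Locale;

class ContentTypeCharset {
    // Extracts the charset parameter from a Content-Type header value, or null if absent.
    static String charsetOf(String contentType) {
        if (contentType == null) return null;
        for (String part : contentType.split(";")) {
            String p = part.trim();
            if (p.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                return p.substring("charset=".length()).replace("\"", "").trim();
            }
        }
        return null;
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.out.println("usage: java ContentTypeCharset <url>");
            return;
        }
        HttpURLConnection conn = (HttpURLConnection) new URL(args[0]).openConnection();
        conn.setRequestMethod("HEAD"); // only the headers are needed
        String contentType = conn.getContentType();
        System.out.println("Content-Type: " + contentType);
        System.out.println("charset: " + charsetOf(contentType));
        conn.disconnect();
    }
}
```

A charset in the HTTP header takes precedence over the meta tag inside the page, which is why the page renders correctly only after the browser encoding is switched by hand.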

Accented characters and UTF-8

Hi all,
I have a problem with accented characters. I read that Plumtree 5.0 is completely Unicode enabled and that all HTTP responses from remote web services are converted to Unicode (UTF-16). So the portal sends all pages back to the client browser in UTF-8.
We have a lot of portlets (ASP and JSP) that write data to an external DB, for example SQL Server. When I fill in an HTML form with accented characters that have to be saved in our DB, they are saved in UTF-8 because the gateway converts the HTTP response. We want the data to be saved as if we were not using the portal (without conversion). I tried to change the charset in the ASP code (Response.Charset), but this only fixes how the characters are displayed in the browser.
Could you explain in more detail how the portal does the conversion and how I can solve my problem?
Thank you very much,
Alberto Marchiaro

It might be helpful to clarify a few things first:
1. Both Java and VBScript store strings in UTF-16/Unicode. If you have some code in your ASP file that looks like this:
Dim strData
strData = Request.Form("SomeName")
then if you were to examine memory for the variable strData, you would see 16-bit characters. The same is true for Java.
2. String data is almost never sent over HTTP as UTF-16/Unicode.
3. Both Java and VBScript perform an implicit character set transcoding when reading string data out of a request or when writing string data out to a response.
4. ASP will perform the transcoding according to the value of Session.CodePage. If you set Session.CodePage to 65001, then ASP will expect the string data to be in UTF-8 and it will transcode UTF-8 in the request into UTF-16 in VBScript. Similarly, a Session.CodePage value of 65001 will cause "Response.Write" to convert UTF-16/Unicode into UTF-8.
5. All of the above is separate from how Java or VBScript interact with the database. Generally speaking, an ASP module will use ODBC to communicate with the database. The ODBC layer knows that VBScript keeps strings in UTF-16/Unicode. The ODBC layer will perform the proper conversion into the database character set. Plumtree always recommends using UTF-16/Unicode in the database. You can do this relatively easily by declaring your database columns using the "N" datatypes such as NCHAR and NVARCHAR. However, even if you are using some other character set, the ODBC layer should always properly transcode from VBScript.
The important thing to remember is that data that is sent over HTTP is never written directly to the database without going through some ASP or JSP code. Since the ASP and JSP code always uses UTF-16/Unicode, there should never be any issue with how the data is sent over HTTP.
Here is an explanation of how Session.CodePage, Response.CharSet and Session.LCID work in ASP:
****************************************************
1. Response.CharSet
2. Session.CodePage
3. Session.LCID
Here is an explanation of these properties and why they are important to non-English ASP gadget writers:
1. Response.CharSet
This property will cause the HTTP Content-Type header to be set with the specified character set. The HTTP header is the best way to tell the recipient what the character set is. The Plumtree HTTPGadgetProvider will read the Content-Type header and then know how to properly transcode the portlet text into UTF-16/Unicode. Here is an example of how to set this property:
Response.CharSet = "UTF-8"
2. Session.CodePage
This property tells the ASP engine which character set to send text in. Please remember that all text is encoded in Unicode on the web server. It only gets turned into the client character set when it is sent down to the client. Session.CodePage tells the engine which code page to transcode into when sending down to the client. Please note that this property is an "integer" property, not a string, so you have to know the number of the code page that you would like to transcode into. Here is an example of how to use this property:
Session.CodePage = 65001
3. Session.LCID
This property tells the ASP engine which locale is being used. The locale is used by various VBScript functions such as FormatDateTime in order to format the date correctly for the locale. If the locale is a French locale, then the date will be formatted according to French rules. The locale does not really affect the character set, but if the portlet writer is going to the trouble of setting the other properties, then they should set the LCID too. Here is an example of how to set this property:
Session.LCID = 1041
Please note that the examples that I am using are the appropriate examples for Japanese and UTF-8. The values for these properties are different for different character sets. For example, for ISO-2022-JP, the values would be:
Response.CharSet = "iso-2022-jp"
Session.CodePage = 50220
Session.LCID = 1041
A very helpful URL to figure out the values to use with Response.CharSet and Session.CodePage is the following:
http://msdn.microsoft.com/library/default.asp?url=/workshop/Author/dhtml/reference/charsets/charset4.asp

Trouble with UTF-8 and PHP-OCI

Hi there!
I'm having some serious trouble with UTF-8. I just tried to insert a lambda into an NCLOB column in one of my databases and it was converted to an inverted question mark. I have verified that the string reaches my PHP script correctly encoded. Also selecting RAWTOHEX(column) in SQL Developer shows, that the inverted question mark is already stored in the column. So the problem must be somewhere between PHP-OCI and the database. Inserting a lambda via SQL developer works. I can also correctly fetch it via PHP.
I'm using the latest PHP-OCI (2.0.8). v$version says "Oracle Database 11g Enterprise Edition Release 11.1.0.7.0 - 64bit Production".
The database is fairly old and uses WE8MSWIN1252 as the character set and AL16UTF16 as the NCHAR character set. Hence I'm using NCLOB instead of CLOB. However, I connect to the database with AL32UTF8, since my app is running in UTF-8. I was under the impression that this would result in an automatic conversion from UTF-8 to UTF-16 when inserting into NCLOB columns, but apparently this is not the case. It looks like there is some sort of double conversion, first UTF-8 to WE8MSWIN1252 and then to UTF-16, because some non-ASCII characters like ä (a umlaut) get correctly converted from UTF-8 to UTF-16.
Any ideas? I'm at a loss here. Thanks in advance.

PHP OCI8 doesn't support NCLOB or NVARCHAR2.
See NCLOB support in OCI8

What is the story with Java and UTF-8

I'm getting bits of info about Java and UTF-8, things like
"unless you wrap the FileInputStream, you'll get exceptions with UTF-8 encoding"
So can someone tell me where there is a good resource detailing problems and advice for dealing with UTF-8 encoded data?
or
Give me the lowdown on what the dodge actually is?
Sorry for my poor English - I'm a scally
Many thanks

1) You've probably chosen SAX because it fits your business requirements well (I hope ;-), so keep it!
2) I am using both James Clark's XP and Xerces as my SAX parsers (XP for big volumes because it offers better performance): I am processing only UTF-8 encoded files with no problems.
3) Sure, if you hit some trouble post again and a lot of pairs of eyes will look into it!
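On the "wrap the FileInputStream" remark from the question: a bare FileInputStream only yields bytes, so the usual approach is to wrap it in an InputStreamReader with an explicit charset instead of relying on the platform default. A minimal sketch, with invented class and method names:

```java
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

class Utf8Reader {
    // Reads the whole stream as UTF-8 text, regardless of the platform default encoding.
    static String readUtf8(InputStream in) {
        StringBuilder text = new StringBuilder();
        try (Reader reader = new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8))) {
            char[] buffer = new char[4096];
            int n;
            while ((n = reader.read(buffer)) != -1) {
                text.append(buffer, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return text.toString();
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.out.println("usage: java Utf8Reader <file>");
            return;
        }
        System.out.println(readUtf8(new FileInputStream(args[0])));
    }
}
```

When a file is handed to an XML parser as a byte stream, the parser works out the encoding from the XML declaration itself, which may be why the SAX setup above has no trouble with UTF-8.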

XML loader and UTF-16 - throws Content is not allowed in prolog

Hi,
Our ECC system was recently upgraded to a Unicode system. Now the IDoc downloaded from ECC has the tag <?xml version="1.0" encoding="utf-16"?>.
The XML Loader in the transaction throws an exception "Cannot perform action on XML document Content is not allowed in prolog. Exception: [Content is not allowed in prolog.]". If I change the encoding manually to "utf-8" and execute the transaction, it works fine.
Please let me know how to solve the issue.
Thanks,
Raman N

Where should I enhance the webservice to make it able to handle zipped XML documents? Shouldn't the AXIS library take care of this automatically?
This is the web.xml document I use.
<?xml version="1.0" encoding="UTF-8"?>
<web-app id="WebApp_ID" version="2.4" xmlns="http://java.sun.com/xml/ns/j2ee" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://java.sun.com/xml/ns/j2ee http://java.sun.com/xml/ns/j2ee/web-app_2_4.xsd">
<display-name>
NDW2</display-name>
<servlet>
<display-name>
Apache-Axis Servlet</display-name>
<servlet-name>AxisServlet</servlet-name>
<servlet-class>
org.apache.axis.transport.http.AxisServlet</servlet-class>
</servlet>
<servlet>
<display-name>
Axis Admin Servlet</display-name>
<servlet-name>AdminServlet</servlet-name>
<servlet-class>
org.apache.axis.transport.http.AdminServlet</servlet-class>
<load-on-startup>100</load-on-startup>
</servlet>
<servlet-mapping>
<servlet-name>AxisServlet</servlet-name>
<url-pattern>/servlet/AxisServlet</url-pattern>
</servlet-mapping>
<servlet-mapping>
<servlet-name>AxisServlet</servlet-name>
<url-pattern>*.jws</url-pattern>
</servlet-mapping>
<servlet-mapping>
<servlet-name>AxisServlet</servlet-name>
<url-pattern>/services/*</url-pattern>
</servlet-mapping>
<servlet-mapping>
<servlet-name>AdminServlet</servlet-name>
<url-pattern>/servlet/AdminServlet</url-pattern>
</servlet-mapping>
<welcome-file-list>
<welcome-file>index.html</welcome-file>
<welcome-file>index.htm</welcome-file>
<welcome-file>index.jsp</welcome-file>
<welcome-file>default.html</welcome-file>
<welcome-file>default.htm</welcome-file>
<welcome-file>default.jsp</welcome-file>
</welcome-file-list>
</web-app>

Unicode, UTF-8 and java servlet woes

Hi,
I'm writing a content management system for a website about Russian music.
One problem I'm having is trying to get a Java servlet to talk Unicode to the content management client.
The client makes a request for a band, the server then sends the XML to the client.
The XML reading works fine and the client displays the Unicode fine from an XML file read locally (so the XMLReader class works fine).
The servlet unmarshals the request perfectly (it's just a filename).
I then find the correct class, and pass it through the XML writer. That returns the XML as a string, which I simply put into the output stream.
out.write(XMLWrite(selectedBand));
I have set the correct header property
response.setContentType("text/xml; charset=UTF-8");
And to read it I use
         //Make our URL
         URL url = new URL(pageURL);
         HttpURLConnection conn = (HttpURLConnection)url.openConnection();
         conn.setRequestMethod("POST");
         conn.setDoOutput(true); // want to send
         conn.setRequestProperty( "Content-type", "application/x-www-form-urlencoded" );
         conn.setRequestProperty( "Content-length", Integer.toString(request.length()));
         conn.setRequestProperty("Content-Language", "en-US");
         //Add our paramaters
         OutputStream ost = conn.getOutputStream();
         PrintWriter pw = new PrintWriter(ost);
         pw.print("myRequest=" + URLEncoder.encode(request, "UTF-8")); // here we "send" our body!
         pw.flush();
         pw.close();
         //Get the input stream
         InputStream ois = conn.getInputStream();
            InputStreamReader read = new InputStreamReader(ois);
         //Read
         int i;
         String s="";
         Log.Debug("XMLServerConnection", "Responce follows:");
         while((i = read.read()) != -1 ){
          System.out.print((char)i);
          s += (char)i;
         }
         return s;
Now when I print
read.getEncoding()
it claims:
ISO8859_1
Something's wrong there, so if I force it to accept UTF-8:
InputStreamReader read = new InputStreamReader(ois,"UTF-8");
it now claims it's
UTF8
However, all of the data has lost its Unicode: any Unicode character is replaced with a question mark character! This happens even when I don't force the input stream to be UTF-8.
More so if I view the page in my browser, it does the same thing.
I've had a look around and I can't see a solution to this. Have I set something up wrong?
I've set, "-encoding utf8" as a compiler flag, but I don't think this would affect it.

I don't know what your problem is but I do have a couple of comments -
1) In conn.setRequestProperty( "Content-length", Integer.toString(request.length())); the length of your content is not request.length(). It is the length of the URL encoded data.
2) Why do you need to send URL encoded data? Why not just send the bytes?
3) If you send bytes then you can write straight to the OutputStream and you won't need to convert to characters to write to PrintWriter.
4) Since you are reading from the connection you need to setDoInput() to true.
5) You need to get the character encoding from the response so that you can specify the encoding in
InputStreamReader read = new InputStreamReader(ois, characterEncoding);
6) Reading a single char at a time from an InputStream is very inefficient.
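Points 1 and 3 can be sketched as follows: build the request body as bytes first, then use the byte count for Content-Length and write the bytes directly. The class FormBody and its method are invented for the example; the parameter name myRequest comes from the question.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

class FormBody {
    // Builds "name=value" with the value URL-encoded as UTF-8.
    // The result is pure ASCII, so its byte length is the real Content-Length.
    static byte[] formBody(String name, String value) {
        try {
            String encoded = name + "=" + URLEncoder.encode(value, "UTF-8");
            return encoded.getBytes(StandardCharsets.US_ASCII);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    public static void main(String[] args) {
        String request = "\u041A\u0438\u043D\u043E"; // a four-letter Cyrillic band name
        byte[] body = formBody("myRequest", request);
        // request.length() is 4, but the encoded body is much longer.
        System.out.println("request.length() = " + request.length());
        System.out.println("Content-Length   = " + body.length);
        // With a real connection one would then call, for example:
        //   conn.setRequestProperty("Content-Length", Integer.toString(body.length));
        //   conn.getOutputStream().write(body);
    }
}
```

On the reading side, the charset passed to InputStreamReader should be the one announced in the response's Content-Type header, as point 5 says.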

Utf-16 to utf-8 representation - "m��" out of "m�"

Hello,
I'd like to obtain a String containing "m��" (the UTF-8 "representation") out of a String containing "m�" (UTF-16 encoded). Sorry if this is obvious but I'm new to those encoding issues.
Any hint on how to process?
Thanks a lot,
Sri

Thanks for your answer and for correcting me,
This looks very very wrong!
Yes, you're right. I'm really lost with those encoding issues. Here is the code I use:
               try {
                    FileInputStream fis = new FileInputStream(file);
                    BufferedReader buffReader = new BufferedReader(new InputStreamReader(fis));
                    String line = null;
                    while ((line = buffReader.readLine()) != null) {
                         byte[] utf16Bytes = new String(line.getBytes(),"ISO-8859-1").getBytes("utf-8");
                         String line3 = new String(utf16Bytes);
                    }
                    buffReader.close();
                    fis.close();
               } catch (IOException ex) {
                    ex.printStackTrace();
               }
The bytes generated by
utf16String.getBytes() use the default encoding which
if it is ISO-8859-1 then
new String(utf16String.getBytes(),"ISO-8859-1") ;
actually does nothing!
I'm reading a file from the file system. This file has been generated on the same machine using the local encoding. As I load it using a FileInputStream without any encoding provided, I was expecting it to be 'transcoded' from the default encoding to Java String (utf-16).
As i tried to understand by reading on http://czyborra.com/utf/, � in ISO-8859-1 corresponds to �� in UTF-8.
I'm trying to get the String "m��" from the String     "m� ".
The reason behind this is to try to cope with a database containing records with fields having been automatically transcoded from UTF-8 to ISO-8859-1 at insert time.
I get the expected results but I'd like to understand. Please try one last time to correct me where I'm wrong.
Thanks,
Sri
Message was edited by:
sriRanayama
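What is being asked for here is, in effect, a deliberate re-reading of UTF-8 bytes as ISO-8859-1, which produces the same two-character garbage that the database conversion produced. A minimal sketch with explicit charsets follows; the class and method names are invented, and the sample character é (U+00E9) is an assumption, since the original characters in this thread did not survive.

```java
import java.nio.charset.StandardCharsets;

class Mojibake {
    // Encodes as UTF-8, then (mis)reads each byte as one ISO-8859-1 character.
    static String utf8BytesAsLatin1(String s) {
        return new String(s.getBytes(StandardCharsets.UTF_8), StandardCharsets.ISO_8859_1);
    }

    // The inverse: turns such a misread String back into the intended text.
    static String repair(String garbled) {
        return new String(garbled.getBytes(StandardCharsets.ISO_8859_1), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = "m\u00E9";                 // 2 characters
        String garbled = utf8BytesAsLatin1(original); // 3 characters
        System.out.println(original.length() + " -> " + garbled.length());
        System.out.println("Repairable: " + original.equals(repair(garbled)));
    }
}
```

Naming both charsets explicitly avoids depending on the platform default, which is what makes the getBytes() call without arguments in the posted code hard to reason about.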

Convert UTF-16 to UTF-8

Hi
My source file is UTF-16 and the target file is UTF-8. I am using an XSLT mapping. When I test it in Altova XML it works fine, but when I test the same thing in my scenario it does not work.
I have tested this using the Test option in ID. It works if I change UTF-16 to UTF-8 while testing in ID, but if I try to change it directly in the XML file it is not accepted.
How can I change UTF-16 to UTF-8 in the XSLT mapping? How can I resolve this problem?
Regards
Sowmya

Which adapter are you using?
If you are using the file adapter then you can set the file adapter property file.encoding=<codepage>
You can refer to the link below:
http://help.sap.com/saphelp_nw04/helpdata/en/0d/00453c91f37151e10000000a11402f/frameset.htm
Gaurav Jain
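Outside the adapter configuration, the re-encoding itself can be sketched in plain Java with the JAXP identity transform, which is roughly what an <xsl:output encoding="UTF-8"/> instruction asks the XSLT processor to do. This is a generic illustration with invented names, not the PI/XI mapping setup.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class Utf16ToUtf8 {
    // Parses the XML from its bytes (the parser detects UTF-16 itself)
    // and serializes it again as UTF-8 with a matching XML declaration.
    static byte[] reencode(byte[] xmlBytes) {
        try {
            Transformer identity = TransformerFactory.newInstance().newTransformer();
            identity.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            identity.transform(new StreamSource(new ByteArrayInputStream(xmlBytes)),
                    new StreamResult(out));
            return out.toByteArray();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical input document, encoded as UTF-16 with a BOM.
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-16\"?><a>\u00E9</a>";
        byte[] utf8 = reencode(xml.getBytes(StandardCharsets.UTF_16));
        System.out.println(new String(utf8, StandardCharsets.UTF_8));
    }
}
```

Editing only the encoding name in the XML declaration by hand does not convert the bytes; the declaration and the actual byte encoding have to change together.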
