Chinese character encoding

I would like to migrate an Access DB to Oracle 9i using Oracle Migration Workbench.
I found that the correct java.properties file (to be used in the omwb/jre/lib directory) for this should be downloaded from www.sun.com, but I am unable to get this java.properties file from anywhere on the web.
Could someone point me to the correct source for this information?
Thanks in advance,
Jayanthi.

Hi,
I am sorry for the confusion. What I was looking for is the font.properties file, which is actually available in the jre folder in OMWB.
I migrated data from Access to Oracle using OMWB. The character set in Oracle has been set to UTF8, the omwb.bat file has been modified to include the encoding property as UTF8, and the font.properties.zh file has been renamed to font.properties in the jre\lib folder in OMWB.
But after doing all these configurations and migrating the data, when I select the data from the Oracle table, I see only ? in place of the Chinese characters.
I would like to see the Chinese characters in the Oracle tables so that I can conclude that my migration process was successful.
How do I do this? Please help me.
Thanks,
Jayanthi.
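
One way to tell whether the migration itself failed or only the display is wrong is to look at the raw bytes Oracle stored. Below is a minimal JDBC sketch of that check; the table MY_TABLE, the column NAME and the connection details are placeholders, not names from the migration above.

    import java.sql.*;

    public class CheckStoredBytes {
        public static void main(String[] args) throws Exception {
            // Placeholder connection details for the migrated schema
            Connection con = DriverManager.getConnection(
                    "jdbc:oracle:thin:@localhost:1521:ORCL", "user", "password");
            Statement st = con.createStatement();
            // DUMP(..., 1016) prints the stored bytes in hex plus the column's
            // character set, independent of any client-side conversion
            ResultSet rs = st.executeQuery(
                    "SELECT name, DUMP(name, 1016) FROM my_table");
            while (rs.next()) {
                // Bytes of 3f mean a literal '?' was stored (data was lost on
                // the way in); valid multi-byte UTF8 sequences mean only the
                // client-side display (NLS_LANG, fonts) is at fault
                System.out.println(rs.getString(1) + " -> " + rs.getString(2));
            }
            con.close();
        }
    }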

Similar Messages

  • NetBeans problem: Issue with servlets and Chinese character encoding

    Java Version: JDK1.5.0_01, JRE1.5.0_01 (International version)
    Netbeans Version: Netbeans IDE 4.0
    OS: Windows XP Personal Edition
    Dear Sirs,
    First of all, thanks for reading this post. I am having the following issue. I am creating an application using HTML pages and servlets. I am using the Chinese and English languages on them (HTML encoding UTF-8).
    I created a project in Netbeans and added an index.html page that submits to a servlet. Both index.html and the servlet-generated HTML page contain the line:
    <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
    Additionally, I set up the character encoding settings in Netbeans
    (Tools - Options - Java Sources - Expert - default encoding = UTF-8).
    When I run the project, index.html displays itself perfectly, with the Chinese characters shown properly. The problem comes when the servlet-generated HTML is displayed: instead of the Chinese characters, some strange characters are displayed (�� instead of Chinese).
    I have tried different encodings from http://java.sun.com/j2se/1.4.2/docs/guide/intl/encoding.doc.html without any luck. I also set the encoding of the file itself (using right click - Properties in the project menu of Netbeans).
    Also, when I am editing the servlet, the characters are displayed properly. I type them in directly without any issue, but then the display is wrong at runtime.
    Also, just in case this has something to do with the problem: my PC was bought in the US, therefore the default character set is not Chinese. I had to install the Chinese input support later on. But like I said earlier, the html page is displayed properly, so I really think it is some problem with Netbeans.
    After a week trying to find a solution, I decided to post it here in the hopes that someone will show me the way of the light.
    Thanks in advance for any ideas or help provided
    Aral.

    Ok, I found out some problems with Netbeans as well.
        public void doGet(HttpServletRequest request,
                          HttpServletResponse response)
                throws IOException, ServletException {
            request.setCharacterEncoding("UTF-8");
            response.setCharacterEncoding("UTF-8");
            response.setContentType("text/html");
            PrintWriter out = response.getWriter();
            // UTF-8 bytes of a short Chinese phrase, hard-coded so the
            // source-file encoding cannot corrupt them
            byte[] st = {-25,-75,-124,-27,-100,-106,-17,-68,-102,-27,-80,-113,
                         -27,-72,-125,-26,-118,-75,-26,-105,-91,-27,-82,-93};
            out.println("this works: ");
            out.println(new String(st, "UTF-8"));
            out.println("<br>");
            out.println("this doesn't: ");
            out.println("some chinese copied from the Internet<br>");
        }
    Right click the .java file and choose Properties -> encoding UTF-8.
    Then I made a copy of the .java file, renamed it to .html and opened it with IE; sure enough,
    the Chinese is already unreadable (note it's still readable in the IDE).
    When I compile the file with F9 I get the following warning:
    whatever.java:101: warning: unmappable character for encoding Cp1252
    I tried to set the encoding to UNICODE but then the file doesn't compile.
    I guess you have to download the Japanese version for it to work correctly.
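
    That Cp1252 warning is the giveaway: the compiler is reading the UTF-8 source file with the Windows default codepage, so the string literals are corrupted before the servlet ever runs (compiling with javac -encoding UTF-8, or setting the project source encoding as described above, addresses this). A small defensive sketch: keeping the source pure ASCII and writing non-ASCII literals as unicode escapes survives any source-file encoding. The class name and the sample characters are illustrative only.

        public class AsciiSafe {
            public static void main(String[] args) {
                System.out.println("\u4E2D\u6587"); // prints the same two Chinese
                                                    // characters as a raw literal
                                                    // would, but this .java file
                                                    // stays 7-bit ASCII
            }
        }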

  • What every developer should know about character encoding

    This was originally posted (with better formatting) at Moderator edit: link removed/what-every-developer-should-know-about-character-encoding.html. I'm posting because lots of people trip over this.
    If you write code that touches a text file, you probably need this.
    Let's start off with two key items:
    1. Unicode does not solve this issue for us (yet).
    2. Every text file is encoded. There is no such thing as an unencoded file or a "general" encoding.
    And let's add a codicil to this – most Americans can get by without having to take this into account – most of the time. Because the characters for the first 127 bytes in the vast majority of encoding schemes map to the same set of characters (more accurately called glyphs). And because we only use A-Z without any other characters, accents, etc. – we're good to go. But the second you use those same assumptions in an HTML or XML file that has characters outside the first 127 – then the trouble starts.
    The computer industry started with disk space and memory at a premium. Anyone who suggested using 2 bytes for each character instead of one would have been laughed at. In fact we're lucky that the byte worked best as 8 bits, or we might have had fewer than 256 values for each character. There were of course numerous character sets (or codepages) developed early on. But we ended up with most everyone using a standard set of codepages where the first 127 bytes were identical on all and the second half was unique to each set. There were sets for America/Western Europe, Central Europe, Russia, etc.
    And then for Asia, because 256 characters were not enough, some of the range 128 – 255 had what was called DBCS (double byte character sets). For each value of a first byte (in these higher ranges), the second byte then identified one of 256 characters. This gave a total of 128 * 256 additional characters. It was a hack, but it kept memory use to a minimum. Chinese, Japanese, and Korean each have their own DBCS codepage.
    And for a while this worked well. Operating systems, applications, etc. were mostly set to use a specified code page. But then the internet came along. A website in America using an XML file from Greece to display data to a user browsing in Russia, where each is entering data based on their country – that broke the paradigm.
    Fast forward to today. The two file formats where we can explain this the best, and where everyone trips over it, are HTML and XML. Every HTML and XML file can optionally have the character encoding set in its header metadata. If it's not set, then most programs assume it is UTF-8, but that is not a standard and not universally followed. If the encoding is not specified and the program reading the file guesses wrong – the file will be misread.
    Point 1 – Never treat specifying the encoding as optional when writing a file. Always write it to the file. Always. Even if you are willing to swear that the file will never have characters out of the range 1 – 127.
    Now let's look at UTF-8, because as the standard, and because of the way it works, it gets people into a lot of trouble. UTF-8 was popular for two reasons. First, it matched the standard codepages for the first 127 characters, and so most existing HTML and XML would match it. Second, it was designed to use as few bytes as possible, which mattered a lot back when it was designed and many people were still using dial-up modems.
    UTF-8 borrowed from the DBCS designs of the Asian codepages. The first 128 bytes are all single-byte representations of characters. Then for the next most common set, it uses a block in the second 128 bytes to be a double-byte sequence, giving us more characters. But wait, there's more. For the less common there's a first byte which leads to a series of second bytes. Those then each lead to a third byte, and those three bytes define the character. This goes up to 6-byte sequences. Using this MBCS (multi-byte character set) you can write the equivalent of every unicode character – and, assuming what you are writing is not a list of seldom-used Chinese characters, do it in fewer bytes.
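
    A small demonstration of the variable widths just described, as a sketch in Java: the same string, encoded as UTF-8, uses 1 byte for ASCII, 2 for ß, and 3 for a typical Chinese character.

        public class Utf8Widths {
            public static void main(String[] args) throws Exception {
                System.out.println("A".getBytes("UTF-8").length);       // 1
                System.out.println("\u00DF".getBytes("UTF-8").length);  // 2 (the ß above)
                System.out.println("\u4E2D".getBytes("UTF-8").length);  // 3 (a Chinese char)
            }
        }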
    But here is what everyone trips over – they have an HTML or XML file, it works fine, and they open it up in a text editor. They then insert a character like ß, which their text editor writes using the codepage for their region, and save the file. Of course it must be correct – their text editor shows it correctly. But feed it to any program that reads according to the encoding, and that byte is now the first byte of a 2-byte sequence. You either get a different character or, if the second byte is not a legal value for that first byte, an error.
    Point 2 – Always create HTML and XML in a program that writes it out correctly using the encoding. If you must create it with a text editor, then view the final file in a browser.
    Now, what about when the code you are writing will read or write a file? We are not talking about binary/data files, where you write it out in your own format, but files that are considered text files. Java, .NET, etc. all have character encoders. The purpose of these encoders is to translate between a sequence of bytes (the file) and the characters they represent. Let's take what is actually a very difficult example – your source code, be it C#, Java, etc. These are still by and large "plain old text files" with no encoding hints. So how do programs handle them? Many assume they use the local code page. Many others assume that all characters will be in the range 0 – 127 and will choke on anything else.
    Here's a key point about these text files – every program is still using an encoding. It may not be setting it in code, but by definition an encoding is being used.
    Point 3 – Always set the encoding when you read and write text files. Not just for HTML & XML, but even for files like source code. It's fine if you set it to use the default codepage, but set the encoding.
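
    A minimal sketch of Point 3 in Java, assuming nothing beyond the standard library: FileReader and FileWriter always use the platform default, so the explicit-charset stream wrappers are the safer habit.

        import java.io.*;

        public class ExplicitEncoding {
            public static void main(String[] args) throws IOException {
                // Write with a named charset instead of FileWriter's default
                Writer w = new OutputStreamWriter(
                        new FileOutputStream("out.txt"), "UTF-8");
                w.write("\u4E2D\u6587\n");
                w.close();

                // Read it back, naming the same charset
                BufferedReader r = new BufferedReader(new InputStreamReader(
                        new FileInputStream("out.txt"), "UTF-8"));
                System.out.println(r.readLine());
                r.close();
            }
        }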
    Point 4 – Use the most complete encoder possible. You can write your own XML as a text file encoded for UTF-8. But if you write it using an XML encoder, then it will include the encoding in the metadata and you can't get it wrong. (It also adds the endian preamble to the file.)
    OK, you're reading and writing files correctly, but what about inside your code? There it's easy – unicode. That's what those encoders created in the Java and .NET runtimes are designed to do. You read in and get unicode. You write unicode and get an encoded file. That's why the char type is 16 bits and is a unique core type that is for characters. This you probably have right, because languages today don't give you much choice in the matter.
    Point 5 – (For developers on languages that have been around awhile) – Always use unicode internally. In C++ this is called wide chars (or something similar). Don't get clever to save a couple of bytes, memory is cheap and you have more important things to do.
    Wrapping it up
    I think there are two key items to keep in mind here. First, make sure you are taking the encoding into account on text files. Second, this is actually all very easy and straightforward. People rarely screw up how to use an encoding; it's when they ignore the issue that they get into trouble.
    Edited by: Darryl Burke -- link removed

    DavidThi808 wrote:
    This was originally posted (with better formatting) at Moderator edit: link removed/what-every-developer-should-know-about-character-encoding.html. I'm posting because lots of people trip over this.
    If you write code that touches a text file, you probably need this.
    Let's start off with two key items:
    1. Unicode does not solve this issue for us (yet).
    2. Every text file is encoded. There is no such thing as an unencoded file or a "general" encoding.
    And let's add a codicil to this – most Americans can get by without having to take this into account – most of the time. Because the characters for the first 127 bytes in the vast majority of encoding schemes map to the same set of characters (more accurately called glyphs). And because we only use A-Z without any other characters, accents, etc. – we're good to go. But the second you use those same assumptions in an HTML or XML file that has characters outside the first 127 – then the trouble starts.
    Pretty sure most Americans do not use character sets that only have a range of 0-127. I don't think I have ever used a desktop OS that did. I might have used some big iron boxes before that, but at that time I wasn't even aware that character sets existed.
    They might only use that range, but that is a different issue, especially since that range is exactly the same as the UTF8 character set anyway.
    The computer industry started with disk space and memory at a premium. Anyone who suggested using 2 bytes for each character instead of one would have been laughed at. In fact we're lucky that the byte worked best as 8 bits, or we might have had fewer than 256 values for each character. There were of course numerous character sets (or codepages) developed early on. But we ended up with most everyone using a standard set of codepages where the first 127 bytes were identical on all and the second half was unique to each set. There were sets for America/Western Europe, Central Europe, Russia, etc.
    And then for Asia, because 256 characters were not enough, some of the range 128 – 255 had what was called DBCS (double byte character sets). For each value of a first byte (in these higher ranges), the second byte then identified one of 256 characters. This gave a total of 128 * 256 additional characters. It was a hack, but it kept memory use to a minimum. Chinese, Japanese, and Korean each have their own DBCS codepage.
    And for a while this worked well. Operating systems, applications, etc. were mostly set to use a specified code page. But then the internet came along. A website in America using an XML file from Greece to display data to a user browsing in Russia, where each is entering data based on their country – that broke the paradigm.
    The above is only true for small volume sets. If I am targeting a processing rate of 2000 txns/sec with a requirement to hold data active for seven years then a column with a size of 8 bytes is significantly different than one with 16 bytes.
    Fast forward to today. The two file formats where we can explain this the best, and where everyone trips over it, are HTML and XML. Every HTML and XML file can optionally have the character encoding set in its header metadata. If it's not set, then most programs assume it is UTF-8, but that is not a standard and not universally followed. If the encoding is not specified and the program reading the file guesses wrong – the file will be misread.
    The above is out of place. It would be best to address this as part of Point 1.
    Point 1 – Never treat specifying the encoding as optional when writing a file. Always write it to the file. Always. Even if you are willing to swear that the file will never have characters out of the range 1 – 127.
    Now let's look at UTF-8, because as the standard, and because of the way it works, it gets people into a lot of trouble. UTF-8 was popular for two reasons. First, it matched the standard codepages for the first 127 characters, and so most existing HTML and XML would match it. Second, it was designed to use as few bytes as possible, which mattered a lot back when it was designed and many people were still using dial-up modems.
    UTF-8 borrowed from the DBCS designs of the Asian codepages. The first 128 bytes are all single-byte representations of characters. Then for the next most common set, it uses a block in the second 128 bytes to be a double-byte sequence, giving us more characters. But wait, there's more. For the less common there's a first byte which leads to a series of second bytes. Those then each lead to a third byte, and those three bytes define the character. This goes up to 6-byte sequences. Using this MBCS (multi-byte character set) you can write the equivalent of every unicode character – and, assuming what you are writing is not a list of seldom-used Chinese characters, do it in fewer bytes.
    The first part of that paragraph is odd. The first 128 characters of unicode, all unicode, is based on ASCII. The representational format of UTF8 is required to implement unicode, thus it must represent those characters. It uses the idiom supported by variable width encodings to do that.
    But here is what everyone trips over – they have an HTML or XML file, it works fine, and they open it up in a text editor. They then insert a character like ß, which their text editor writes using the codepage for their region, and save the file. Of course it must be correct – their text editor shows it correctly. But feed it to any program that reads according to the encoding, and that byte is now the first byte of a 2-byte sequence. You either get a different character or, if the second byte is not a legal value for that first byte, an error.
    Not sure what you are saying here. If a file is supposed to be in one encoding and you insert invalid characters into it, then it is invalid. End of story. It has nothing to do with HTML/XML.
    Point 2 – Always create HTML and XML in a program that writes it out correctly using the encode. If you must create with a text editor, then view the final file in a browser.
    The browser still needs to support the encoding.
    Now, what about when the code you are writing will read or write a file? We are not talking about binary/data files, where you write it out in your own format, but files that are considered text files. Java, .NET, etc. all have character encoders. The purpose of these encoders is to translate between a sequence of bytes (the file) and the characters they represent. Let's take what is actually a very difficult example – your source code, be it C#, Java, etc. These are still by and large "plain old text files" with no encoding hints. So how do programs handle them? Many assume they use the local code page. Many others assume that all characters will be in the range 0 – 127 and will choke on anything else.
    I know java files have a default encoding - the specification defines it. And I am certain C# does as well.
    Point 3 – Always set the encoding when you read and write text files. Not just for HTML & XML, but even for files like source code. It's fine if you set it to use the default codepage, but set the encoding.
    It is important to define it. Whether you set it is another matter.
    Point 4 – Use the most complete encoder possible. You can write your own XML as a text file encoded for UTF-8. But if you write it using an XML encoder, then it will include the encoding in the meta data and you can't get it wrong. (it also adds the endian preamble to the file.)
    Ok, you're reading & writing files correctly but what about inside your code. What there? This is where it's easy – unicode. That's what those encoders created in the Java & .NET runtime are designed to do. You read in and get unicode. You write unicode and get an encoded file. That's why the char type is 16 bits and is a unique core type that is for characters. This you probably have right because languages today don't give you much choice in the matter.
    Unicode character escapes are replaced prior to actual code compilation. Thus it is possible to create strings in java with escaped unicode characters which will fail to compile.
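
    A concrete instance of that compile-time pitfall, as a sketch (class and field names are arbitrary):

        public class EscapeDemo {
            // Unicode escapes are translated before the compiler parses the
            // file. The line below, if uncommented, would not compile: the
            // escape for the quote character is substituted first, so the
            // literal becomes "" followed by a stray quote.
            // String bad = "\u0022";
            String ok = "\"";   // write a literal quote with a backslash escape
        }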
    Point 5 – (For developers on languages that have been around awhile) – Always use unicode internally. In C++ this is called wide chars (or something similar). Don't get clever to save a couple of bytes, memory is cheap and you have more important things to do.
    No. A developer should understand the problem domain represented by the requirements and the business, and create solutions appropriate to that. Thus there is absolutely no point in someone who is creating an inventory system for a standalone store crafting a solution that supports multiple languages.
    Another example: in high-volume systems, moving/storing bytes is relevant. As such, one must carefully consider, for each text element, whether it is customer-consumable or internally consumable. Saving bytes in such cases will impact the total load of the system. In such systems incremental savings impact operating costs, and speed confers a marketing advantage.

  • Need help in Displaying Chinese Character in JSF

    Hi. Good Morning!
    I'm having a problem with displaying Chinese characters in JSF. I've tried placing all the Chinese words in the *.properties file, but it doesn't work.
    The result in the browser is something like this.
    ���� ��������� ���� ��������� ���� ���������

    Hi,
    When you say that you put Chinese characters in
    *.properties files, can you be more specific? Did
    you use native2ascii to convert the files to ASCII?
    What is the character encoding setting for the JVM?
    Have you sniffed the HTTP headers? If so, what's the
    character encoding listed for the response?
    Answers to these questions should help you nail down
    the problem.
    Regards,
    Peter

    Hi,
    I did not use any converter like native2ascii... An example of what I placed in the properties file is:
    title : &#20197;&#19979;5&#31181;&#26631;&#31614;&#30340;&#34920;&#38754;&#35201;&#27714;&#21015;&#26679;
    Request=&#26465;&#30721;&#31163;&#36793;&#26694;
    Tablehead=&#36879;&#26126;&#26631;&#31614;
    Note1= &#36879;&#26126;&#26631;&#31614;
    Note2= &#36879;&#26126;&#26631;&#31614;
    NoteA= &#21360;&#21830;&#26631;&#22270;
    Then when I build the project on Linux it turns out like this:
    title= ???
    Request=???
    Tablehead=???
    Note1= ???
    Note2= ???
    NoteA= ???
    The character encoding for the JVM... I haven't checked that yet, nor do I know how.
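
    For what it's worth, the usual cause of those ??? values is how the properties file is read: java.util.Properties reads the stream as ISO 8859-1, so non-Latin text has to be stored as \uXXXX escapes – exactly what the JDK's native2ascii tool produces (e.g. native2ascii -encoding UTF-8 in.properties out.properties). A minimal loading check, assuming a converted file named messages_zh.properties (the file name is hypothetical; the key Tablehead is from the post above):

        import java.io.FileInputStream;
        import java.util.Properties;

        public class LoadZh {
            public static void main(String[] args) throws Exception {
                Properties p = new Properties();
                // load() decodes ISO 8859-1 plus \uXXXX escapes -- raw UTF-8
                // bytes or HTML entities like &#36879; will not come back as
                // Chinese characters
                p.load(new FileInputStream("messages_zh.properties"));
                System.out.println(p.getProperty("Tablehead"));
            }
        }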

  • GUI Download Chinese Character to Excel gibberish character

    Hi Experts,
    I'm facing a problem where I'm using FM GUI_DOWNLOAD to save Chinese characters into an Excel file.
    Upon double-clicking to open the Excel file, funny characters show up.
    But if I open the same file from a blank MS Excel application (go to File->Open->choose file),
    Excel will prompt me to select a proper encoding (GB2312 in this case), and the Chinese characters can be seen thereafter.
    But my user doesn't want to go through this.
    I've browsed through the forum and someone has posted about this before; it's unanswered.
    How to download the chinese character using GUI_DOWNLOAD - unanswered
    And, Re: GUI_DOWNLOAD give 2 bytes for each chinese character - I need fixed len which is not related to my problem.
    Below is my code:
    DATA: lv_codepage   TYPE cpcodepage,
          lv_char_cpage TYPE abap_encod,
          lv_encoding   TYPE abap_encod.

    * Get the code page for Chinese characters (spras = '1' or 'ZH')
    CALL FUNCTION 'NLS_GET_FRONTEND_CP'
      EXPORTING
        langu                 = '1'   " Chinese Simplified, table T002
        fetype                = 'MS'  " Manufacturer is Microsoft, table TCP05
      IMPORTING
        frontend_codepage     = lv_codepage
      EXCEPTIONS
        illegal_syst_codepage = 1
        no_frontend_cp_found  = 2
        internal_or_db_error  = 3
        OTHERS                = 4.

    * Conversion c(4) = n(10)
    lv_char_cpage = lv_codepage.

    CALL FUNCTION 'GUI_DOWNLOAD'
      EXPORTING
        filename              = p_file
        filetype              = 'DAT'           " tried ASC as well, not working
        codepage              = lv_char_cpage   " 8404 in this case; tried 8400, same result
        replacement           = '#'
        write_field_separator = 'X'
      TABLES
        data_tab              = i_data_cnvr     " table content
        fieldnames            = i_data_head.    " table header
    Please help. Does this have something to do with UTF-8 encoding?
    Thank you.
    Thanks,
    ZY See

    Hi Nitesh,
    Is there a way to check the Excel codepage? Do you mean codepage = 936 for GB2312 encoding?
    Anyway, this issue is fixed. The issue was related to the Unicode system.
    The code below for GUI_DOWNLOAD solved the problem.
    CALL FUNCTION 'GUI_DOWNLOAD'
      EXPORTING
        filename              = p_file
        filetype              = 'DAT'
        codepage              = '4103'
        replacement           = '#'
        write_field_separator = 'X'
        write_bom             = 'X'
      TABLES
        data_tab              = i_data_cnvr
        fieldnames            = i_data_head.

    Codepage = 4103 for UTF-16 (Unicode system).
    write_bom = 'X' to write the Byte Order Mark.
    Thanks,
    ZY See
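
    The fix works because codepage 4103 is UTF-16 little-endian and the byte order mark is what lets Excel detect the encoding on double-click. The same idea sketched in Java, for comparison only (the file name and row content are made up):

        import java.io.*;

        public class ExcelUtf16 {
            public static void main(String[] args) throws IOException {
                OutputStream os = new FileOutputStream("report.txt");
                os.write(0xFF);   // UTF-16LE byte order mark, low byte first
                os.write(0xFE);
                Writer w = new OutputStreamWriter(os, "UTF-16LE");
                w.write("Name\t\u4E2D\u6587\n");   // tab-separated row with Chinese text
                w.close();
            }
        }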

  • How to put Chinese character in jar file by java.util.jar.Manifest?

    I want to develop a simple packaging tool which can modify some properties in the MANIFEST.MF of jar files, but the Manifest class's putValue method can only save English characters correctly. Why?
    And how can I put in Chinese characters?
    the code is:
    Attributes ab = mf.getMainAttributes();
    ab.putValue("agent-Name", agent);

    Attribute values can contain any character, and they will be UTF-8 encoded when written to the manifest, according to the Javadoc.
    What makes you think that this mechanism fails? What do you see instead of the Chinese characters? And what tool/editor/program do you use to see it? I did not try it myself, but according to the Javadoc there should be no problem.
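
    A quick round-trip sketch of that claim, using only java.util.jar; the attribute name agent-Name is from the question, the value is an arbitrary two-character Chinese string:

        import java.io.*;
        import java.util.jar.*;

        public class ManifestZh {
            public static void main(String[] args) throws Exception {
                Manifest mf = new Manifest();
                Attributes ab = mf.getMainAttributes();
                // A version attribute is required before write() will succeed
                ab.put(Attributes.Name.MANIFEST_VERSION, "1.0");
                ab.putValue("agent-Name", "\u4E2D\u6587");

                ByteArrayOutputStream bout = new ByteArrayOutputStream();
                mf.write(bout);   // values are written UTF-8 encoded

                Manifest back = new Manifest(
                        new ByteArrayInputStream(bout.toByteArray()));
                System.out.println("\u4E2D\u6587".equals(
                        back.getMainAttributes().getValue("agent-Name")));  // true
            }
        }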

  • Sender proxy to sap pi -chinese character

    Hi experts,
    I have a sender proxy to SAP PI scenario. Via the proxy, some Chinese characters are sent into SAP PI, and in SXMB_MONI the receiver (inbound message payload) shows BOXES. I debugged the proxy and it is sending Chinese characters to SAP PI; the only problem is that when they arrive in SAP PI they turn into boxes.
    I have done research, and RFC Unicode was mentioned. I have configured it, but the problem is the same: boxes.
    Please help me.

    Hi Mark,
    Thanks a lot for your reply. Basically it is an ABAP proxy. Do you think it is possible to use UTF-8 or UTF-16 in the XML encoding for an ABAP proxy? I am not so good with ABAP proxies, and it was generated by an ABAPer. Please advise me if I am wrong.

  • What's the difference of character encoding between 1.4.0and1.4.2 in Linux

    As I found, the character encoding for Chinese in JDK 1.4.2 is no longer the same as in JDK 1.4.0.
    In JDK 1.4.0, the character encoding used the "file.encoding" system property; we often set the
    property to "gb2312".
    But in JDK 1.4.2, I find that the default character encoding no longer uses the "file.encoding" system property.
    Who knows the reason?
    Test Program:
    public class B {
        public static void main(String[] args) throws Exception {
            byte[] bytes = new byte[]{(byte)0xD6,(byte)0xD0,(byte)0xCE,(byte)0xC4};
            String s1 = new String(bytes);
            String s2 = new String(bytes, System.getProperty("file.encoding"));
            System.out.println("s1=" + s1 + " , s2=" + s2);
            System.out.println("s1.length=" + s1.length() + " , s2.length=" + s2.length());
        }
    }
    Run four times, the results are:
    [root@app15 component]# /usr/local/j2sdk1.4.0/bin/java -Dfile.encoding=ISO-8859-1 -cp . B
    s1=&#20013;&#25991; , s2=&#20013;&#25991;
    s1.length=4 , s2.length=4
    [root@app15 component]# /usr/local/j2sdk1.4.0/bin/java -Dfile.encoding=gb2312 -cp . B
    s1=&#20013;&#25991; , s2=&#20013;&#25991;
    s1.length=2 , s2.length=2
    [root@app15 component]# /usr/local/j2sdk1.4.2/bin/java -Dfile.encoding=ISO-8859-1 -cp . B
    s1=&#20013;&#25991; , s2=&#20013;&#25991;
    s1.length=4 , s2.length=4
    [root@app15 component]# /usr/local/j2sdk1.4.2/bin/java -Dfile.encoding=gb2312 -cp . B
    s1=&#20013;&#25991; , s2=??
    s1.length=4 , s2.length=2
    [root@app15 component]#

    I don't know for sure, but:
    -- The API documentation for String says that "new String(byte[])" uses "the platform's default charset".
    -- The API documentation for Charset says "The default charset is determined during virtual-machine startup and typically depends upon the locale and charset being used by the underlying operating system."
    You'll notice that it doesn't say anything about using the file.encoding system value, so presumably (based on your experiments) it doesn't. I did a search for "java default charset" and didn't find anything specific, but this site says "As of Java 1.4.1, the default Charset varies from platform to platform" and suggests you explicitly hard-code your charset. I would agree with that.
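
    A small probe for the two values in question, written so it still compiles on 1.4 (Charset.defaultCharset() only appeared in Java 5, so the writer's reported encoding is used instead):

        import java.io.ByteArrayOutputStream;
        import java.io.OutputStreamWriter;

        public class DefaultCharsetProbe {
            public static void main(String[] args) {
                // getEncoding() reports the charset the runtime actually chose
                OutputStreamWriter w =
                        new OutputStreamWriter(new ByteArrayOutputStream());
                System.out.println("default encoding = " + w.getEncoding());
                System.out.println("file.encoding    = "
                        + System.getProperty("file.encoding"));
            }
        }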

  • Problem in parsing date having Chinese character when dateformat is 'MMM'

    I'm calling a JSP page using the following code:
    var ratewin = window.showModalDialog("Details.jsp?startDate="+startDate,window, dlgSettings );
    In my JavaScript, when I checked by adding alerts, I get correct values before passing to the JSP:
    alert("startDate:"+startDate);
    In the JSP page my code is like below:
    String startDate = request.getParameter("startDate");
    But here I get garbage values in the month when the date format is 'MMM', which makes date parsing fail.
    This happens only with Chinese characters.
    The following 2 encoding declarations are already in my JSP page; can anyone help find a solution?
         <%@ page pageEncoding="UTF-8" contentType="text/html;charset=UTF-8"%>
         <META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=UTF-8"/>
    I have even tried to read it as UTF-8, but it's still failing.

    This is my actual code
    import java.text.DateFormat;
    import java.text.ParseException;
    import java.text.SimpleDateFormat;
    import java.util.Date;
    import java.util.Locale;
    public class TestingDate {

        /**
         * @param args
         */
        public static void main(String[] args) {
            String dateFormat = "EEEE, MMM d h:mm a";
            Date test = new Date(2007, 0, 19, 19, 31);
            System.out.println(" original date is " + test);
            String stringResult = DateToString(test, dateFormat);
            System.out.println("Date to string is " + stringResult);
            Date dateResult = stringToDate(stringResult, dateFormat);
            System.out.println(" String to date is " + dateResult);
            String stringResult2 = DateToString(dateResult, dateFormat);
            System.out.println(" Date to string is " + stringResult2);
        }

        public static String DateToString(Date test, String dateFormat) {
            String result = null;
            try {
                DateFormat myDateFormat = new SimpleDateFormat(dateFormat);
                result = myDateFormat.format(test);
            } catch (Exception e) {
                System.out.println(" Exception is " + e);
            }
            return result;
        }

        public static Date stringToDate(String strDate, String dateFormat1) {
            Date result1 = null;
            try {
                DateFormat myDateFormat = new SimpleDateFormat(dateFormat1);
                result1 = myDateFormat.parse(strDate);
            } catch (Exception e) {
                System.out.println(" exception is " + e);
            }
            return result1;
        }
    }
    I am facing a problem in getting the actual date. Please suggest a solution.
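
    The usual culprit with "MMM" and Chinese text: SimpleDateFormat is locale-sensitive, so a date string whose month name was produced under a Chinese locale only parses back under that same locale. A minimal sketch of the round trip with the locale pinned:

        import java.text.SimpleDateFormat;
        import java.util.Date;
        import java.util.Locale;

        public class LocaleParse {
            public static void main(String[] args) throws Exception {
                String fmt = "EEEE, MMM d h:mm a";
                SimpleDateFormat zh =
                        new SimpleDateFormat(fmt, Locale.SIMPLIFIED_CHINESE);
                String s = zh.format(new Date());
                System.out.println(s);              // Chinese month/day names
                System.out.println(zh.parse(s));    // parses fine, same locale
                // new SimpleDateFormat(fmt).parse(s) would throw a
                // ParseException under a non-Chinese default locale
            }
        }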

  • Character Encoding for IDOC to JMS scenario with foreign characters

    Dear Experts,
    The scenario is described as follows:
    Issue Description:
    There is an IDOC which is created after extracting data from different countries (but only one country at a time). So, for instance, the first time the data is picked up in Greek and Latin and the corresponding IDOC is created and sent to PI, the next time plain English, the next Chinese, and so on. As of now, every time this IDOC reaches PI it comes with UTF-8 character encoding, as seen in the IDOC XML.
    I am converting this IDOC XML into a single-string flat file (currently using the default encoding UTF-8) and sending it to the receiver JMS queue (MQ Series). Now when this data is picked up by the end recipient from the corresponding queue in MQ Series, they see ? wherever there are Greek/Latin characters (maybe because those should have a different encoding, like ISO-8859-7). This is causing issues at their end.
    My Understanding
    The SAP system should trigger the IDOC with the right code page, i.e. if the IDOC is sent with Greek/Latin the code page should be ISO-8859-7; if the same IDOC is sent with Chinese characters, the corresponding code page; else UTF-8 or the default code page.
    Once this is sent correctly from SAP, the Java mapping would have to use the correct code page when writing the bytes to the output stream, and then we would also need to set the right code page as a JMS header before putting the message in the JMS queue, so that the receiver can interpret it.
    Queries:
    1. Is my approach for the scenario correct? If not, please guide me to the right approach.
    2. Does SAP support a different code page being picked for the same IDOC based on different data sets? If so, how is it achieved?
    3. What is the JMS header property to set the right code page? I think there should be some JMS header defined by MQ Series for character encoding which I should be setting correctly. I find that there is a property to set the CCSID in the JMS receiver adapter, but that only refers to non-ASCII names and doesn't refer to the payload content.
    I would appreciate if anybody can give me pointers on how to resolve this issue.
    Thanks,
    Pratik

    Hi Pratik,
         http://www.sdn.sap.com/irj/scn/go/portal/prtroot/docs/library/uuid/502991a2-45d9-2910-d99f-8aba5d79fb42?quicklink=index&overridelayout=true
    This link might help.
    regards
    Anupam
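
    A rough sketch of the Java-mapping side discussed above, using only the standard javax.jms API: encode the flat-file string with the code page chosen for this payload and send it as bytes, recording the charset in a message property. The property name "payloadCharset" is an assumption for illustration, not an MQ-defined header; MQ's own CCSID handling is vendor configuration.

        import javax.jms.BytesMessage;
        import javax.jms.Session;

        public class EncodedSender {
            static BytesMessage build(Session session, String flatFile,
                                      String charset) throws Exception {
                BytesMessage msg = session.createBytesMessage();
                // e.g. charset = "ISO-8859-7" for the Greek/Latin case
                msg.writeBytes(flatFile.getBytes(charset));
                // Assumed application property, so the consumer knows how
                // to decode the bytes
                msg.setStringProperty("payloadCharset", charset);
                return msg;
            }
        }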

  • Output chinese character to CSV file in UNIX

    Hi
    I encountered an ABAP dump whenever outputting Chinese characters to a CSV file in UNIX in ECC6. The error shows as:
    "At the conversion of a text from codepage '4102' to codepage '1100':
    - a character was found that cannot be displayed in one of the two
    codepages;
    - or it was detected that this conversion is not supported"
    The program uses the statement OPEN DATASET xxxxx FOR OUTPUT IN TEXT MODE ENCODING NON-UNICODE. The reason for the non-unicode OPEN statement is that users would like to open the CSV file directly in Excel; they do not wish to import the text file into Excel. Can experts please share with me how to overcome the problem?
    Thanks
    Kang Ring

    Maybe you could give the following code a try and check:
    OPEN DATASET xxxxx FOR OUTPUT IN TEXT MODE ENCODING NON-UNICODE CODEPAGE '4103'.
    Vikranth

  • Chinese Character cannot be decoded

    hi,
    I would like to implement two JSP pages. The first JSP is just an HTML form, which is used to submit Unicode Chinese data to a target JSP file.
    The target JSP file receives that data and displays it.
    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8"> is added in the first JSP file. As a result, data will be submitted in UTF-8 format.
    In the target JSP, I used the following code to receive and decode the data:
    <%@ page contentType="text/html; charset=UTF-8" %>
    <%
    String para = request.getParameter("para"); // where para is name of received parameter
    byte[] bytes = para.getBytes();
    para = new String(bytes, "UTF-8");
    out.println("Recieved character: " + para);
    %>
    My Problem:
    After I submitted Chinese characters from the first JSP file, only some of them can be displayed on the target JSP. Some of the characters are missing.
    For example, when I input "�@", the target JSP can display the character. On the other hand, when I input "�p", nothing is displayed. But I know that the variable "bytes" stores 3 bytes for each Chinese character. I would like to ask why
    para = new String(bytes, "UTF-8");
    cannot decode properly. Is anything wrong with my code?
    Thx

    More information can be provided.
    OS: Windows 2000 server
    web server: iPlanet
    P.S. : I have set the Character set to UTF-8 in iPlanet.
    thx.
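
    For reference, the usual explanation of this symptom: the container decodes request parameters as ISO-8859-1 unless told otherwise, and the para.getBytes() round-trip then re-encodes with the platform default, mangling multi-byte characters. A minimal sketch of the standard fix (class name illustrative):

        import java.io.IOException;
        import java.io.PrintWriter;
        import javax.servlet.ServletException;
        import javax.servlet.http.*;

        public class EchoServlet extends HttpServlet {
            protected void doGet(HttpServletRequest request,
                                 HttpServletResponse response)
                    throws ServletException, IOException {
                // Tell the container how the form bytes are encoded BEFORE
                // the first getParameter() call; otherwise ISO-8859-1 is used
                request.setCharacterEncoding("UTF-8");
                response.setContentType("text/html; charset=UTF-8");
                PrintWriter out = response.getWriter();
                out.println("Received: " + request.getParameter("para"));
                // If the container cannot be configured, the mis-decoded
                // string can instead be repaired explicitly:
                //     new String(para.getBytes("ISO-8859-1"), "UTF-8")
            }
        }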

  • Display chinese character

    I have a Java application which reads a string from the IE browser through a JNI interface.
    The issue is that the String I got, which is "\u963f\u5f1f\u4ed4", can't be displayed in Chinese directly.
    I did the following conversion, and it doesn't work:
    byte[] sb=link.getBytes();
    String Zhongwen=new String(sb, "GB2312");
    But if I define this String in my application, it works:
    String link = "\u963f\u5f1f\u4ed4";
    I bet Java treats the first case as a bunch of separate characters, since if I build
    a String as "\\u" + "963f", I can't get a Chinese character.
    So it looks like I can only use the parseInt(..) method of the Integer class and cast the result to a char:
    try {
        char p = (char) Integer.parseInt(x, 16);
        System.out.println(p);
    } catch (Exception e) {
        System.out.println(e);
    }
    Does anybody have another solution?

    The most probable source of error is character encoding conversion...most likely from the IE browser to the Java environment. Do you have a way to find out what encoding is used in the browser? IE will transfer form data back to the server in either UTF-8 or the encoding of the original form.
    Transfer the data from IE correctly, and you will be able to display the characters without further problem.
    John O'Conner
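
    If the string really does arrive as the literal characters backslash, 'u', and four hex digits (rather than as already-decoded chars), the parseInt idea in the question generalizes to a small helper. A sketch; the class and method names are made up:

        public class Unescape {
            static String unescape(String s) {
                StringBuffer sb = new StringBuffer();
                for (int i = 0; i < s.length(); ) {
                    // Recognize a literal \ u X X X X sequence
                    if (s.charAt(i) == '\\' && i + 5 < s.length()
                            && s.charAt(i + 1) == 'u') {
                        sb.append((char) Integer.parseInt(
                                s.substring(i + 2, i + 6), 16));
                        i += 6;
                    } else {
                        sb.append(s.charAt(i++));
                    }
                }
                return sb.toString();
            }

            public static void main(String[] args) {
                // Prints the three Chinese characters from the question
                System.out.println(unescape("\\u963f\\u5f1f\\u4ed4"));
            }
        }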

  • Display Chinese Character jsp page  by using UTF-8

    hi all,
    I have one JSP page with Chinese characters, and I need to allow the user to input Chinese characters into the db. Right now my situation is: if I set
    <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
    <html xmlns="http://www.w3.org/1999/xhtml">
    <head>
    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
    then I can't display the Chinese characters in my JSP page correctly, but the insert is OK. If I set the encoding to Big5 instead, then I can't insert Chinese characters into my db correctly.
    Can I use UTF-8 to display my JSP page, which has Chinese characters in it? Thank you!

    Hi Niklas
    Thank you for the suggestion. Somebody told me UTF-8 is supposed to work only if you type your Chinese characters in UTF-8 format; I don't know whether that is true or not, but it is really a challenge for me. I was using JDBC some time ago and then stopped, and in that period, when I had free time, I tried to learn from some tutorials on Struts and Hibernate, like http://javaboutique.internet.com/tutorials/Struts/ and http://courses.coreservlets.com/Course-Materials/struts.html
    Recently I needed to deal with some online forms. First I wanted to use Struts, but I kept getting a forward problem (I forget the detailed error message) and found no place to ask questions, so I picked JDBC again. So if you have some useful links on Struts, or a nice community for people learning Struts, I would look forward to knowing that.
    By the way, I set <meta http-equiv="Content-Type" content="text/html; charset=big5" /> in my JSP,
    String dbURL="jdbc:mysql://localhost/survey?useBig5=true&characterEncoding=Big5"; in JDBC,
    and kept the UTF-8 setting in MySQL. That seems to have solved the problem I had. Hope that can enhance our knowledge of this.
    Thank you for the help, and I wish you good luck for the future.

  • Chinese character in RecordStore

    Hi all,
    I am new to J2ME. Is it possible to key in Chinese characters and store them in a RecordStore, and can I display the characters back when I view the record?
    I have gone through many examples of J2ME encoding, but I still don't have any idea how to do this.
    Thanks a lot...

    HongHong -
    There may be some tips you can use on www.77new.cn/program/i/1173293079453/001/029/14438.html -- I don't know Chinese and the translation wasn't good enough for me to be sure about it.
    Have you tried? If your emulator or handset allows you to key in Chinese characters in a TextField, they should be available via getString(), and the resulting string should get stored OK; of course, being a DBCS, there will be 2 bytes per letter.
    Recover the string from the RecordStore using
    new String(recordStore.getRecord(n));
    I have a Moto ROKR E6 with an alternative Chinese keyboard; I shall experiment and get back to you in a day or two.
    Meanwhile, try for yourself and post the results.
    Regards, Darryl
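
    One way to take the platform default charset out of the picture entirely: length-prefixed UTF via DataOutputStream, which CLDC provides. A minimal sketch (class and method names are made up):

        import java.io.*;
        import javax.microedition.rms.RecordStore;

        public class ZhStore {
            static int save(RecordStore rs, String text) throws Exception {
                ByteArrayOutputStream bout = new ByteArrayOutputStream();
                // writeUTF stores a length prefix plus UTF-8 encoded bytes,
                // independent of the handset's default encoding
                new DataOutputStream(bout).writeUTF(text);
                byte[] data = bout.toByteArray();
                return rs.addRecord(data, 0, data.length);
            }

            static String load(RecordStore rs, int id) throws Exception {
                return new DataInputStream(
                        new ByteArrayInputStream(rs.getRecord(id))).readUTF();
            }
        }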
