Unicode Character Sets in Java

I am trying to port code from a PowerBuilder 10.5.1 Build 6021 environment to Java, and I am having trouble getting the same value for the Euro character in Java and PowerBuilder.
In Java I get a value of 20AC (8364 in decimal), which is the Unicode code point of the Euro sign, but in PowerBuilder I get a value of 128.
This is only a small example and perhaps not strictly a Java question, but if anyone has any suggestions, I would appreciate it.

In the ISO-8859-x encodings, each character is represented by one byte, but the values 128-159 are not assigned to printable characters (they are reserved for C1 control codes). The Windows extensions of those encodings, like windows-1252, map those values to useful characters like curly quotes and the Euro symbol (and create compatibility problems like this one in the process). PowerBuilder is obviously using one of these Windows encodings. However, there are several points in the development process where character encodings come into play, so we'll need more info. Does the problem occur when you compile the code, or when you run it? If it's at run time, does it happen when you're reading text from a file, a database, or some other source? And how exactly does the problem manifest?
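To make the difference concrete, here is a small Java sketch (not PowerBuilder code) that prints the Euro sign's code point and the bytes it encodes to in windows-1252, ISO-8859-15 and UTF-8, then decodes the single byte 128 as windows-1252. It assumes the 128 that PowerBuilder reports is a windows-1252 byte value. Note that the 20AC seen in Java is the Unicode code point, not an ISO-8859-15 byte: in ISO-8859-15 the Euro sign is the byte 0xA4.

```java
import java.nio.charset.Charset;

public class EuroEncodings {
    // Hex dump (e.g. "E2 82 AC") of the bytes a string encodes to in the named charset.
    static String hex(String s, String charsetName) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(Charset.forName(charsetName))) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String euro = "\u20AC"; // the Euro sign as a Java (UTF-16) string
        System.out.printf("Code point:   U+%04X (%d)%n", euro.codePointAt(0), euro.codePointAt(0));
        System.out.println("windows-1252: " + hex(euro, "windows-1252"));
        System.out.println("ISO-8859-15:  " + hex(euro, "ISO-8859-15"));
        System.out.println("UTF-8:        " + hex(euro, "UTF-8"));

        // Assumption: PowerBuilder's value 128 is a windows-1252 byte.
        byte[] fromPowerBuilder = { (byte) 0x80 };
        String decoded = new String(fromPowerBuilder, Charset.forName("windows-1252"));
        System.out.println("0x80 as windows-1252 is the Euro sign: " + decoded.equals(euro));
    }
}
```

On a standard JDK the windows-1252 line shows 80, i.e. 128, matching PowerBuilder, while ISO-8859-15 gives A4.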

Similar Messages

Need suggestion on Multi currency and Unicode character set use in ABAP

Hi All,
I need a suggestion. In one job requirement I saw 'multi-currency and Unicode character set experience in FICO'.
Can you please explain how ABAP developers are involved in multi-currency, as I think this is a FICO functional area?
And what is meant by Unicode character set experience? Please send me any documents if you have them.
Thanks
Sreedevi
Moderator message - This isn't the place to prepare for interviews - thread locked
Edited by: Rob Burbank on Sep 17, 2009 4:45 PM

Use the default parser.
By default, WebLogic Server is configured to use the default parser and transformer to parse and transform XML documents. The default parser and transformer are those included in JDK 5.0.
The built-in WebLogic Server DOM factory implementation class is com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderFactoryImpl.
The DocumentBuilderFactory.newInstance method returns the built-in parser.
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
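A minimal Java sketch of using whatever parser DocumentBuilderFactory.newInstance() selects. On a plain JDK outside WebLogic, the factory normally comes from the JDK itself unless the javax.xml.parsers.DocumentBuilderFactory system property or a service provider overrides it. The sketch passes raw bytes to the parser so that it decodes them using the encoding named in the XML declaration, rather than pre-decoding them into a String with some other charset. The ISO-8859-15 sample document and the rootText helper are illustrative assumptions.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.Charset;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.SAXException;

public class DefaultParserDemo {
    // Parses raw XML bytes with the default parser and returns the root element's text.
    // The parser itself decodes the bytes according to the XML declaration.
    static String rootText(byte[] xml) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            DocumentBuilder db = dbf.newDocumentBuilder();
            Document doc = db.parse(new ByteArrayInputStream(xml));
            return doc.getDocumentElement().getTextContent();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        // Shows which factory implementation was picked.
        System.out.println(DocumentBuilderFactory.newInstance().getClass().getName());

        // Declared as ISO-8859-15, where the Euro sign is the single byte 0xA4.
        String xml = "<?xml version=\"1.0\" encoding=\"ISO-8859-15\"?><price>\u20AC5</price>";
        byte[] bytes = xml.getBytes(Charset.forName("ISO-8859-15"));
        System.out.println(rootText(bytes).equals("\u20AC5"));
    }
}
```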

Unicode Character sets (e.g UTF-8)

Hi,
We are using some third party software which will connect to the oracle database.
One of the requirements it states is that both the database client and server must use a Unicode character set, e.g. UTF-8.
How do we ensure this when installing the Oracle client software?
Also, why, when I install the Oracle client software and select English as the language, does it set NLS_LANG to AMERICAN by default?
Is there an English (U.K.) language option? I couldn't see one.
Many Thanks

user5716448 wrote:
> Hi,
> We are using some third party software which will connect to the Oracle database.
> One of the requirements it states is that both the database client and server must use a Unicode character set, e.g. UTF-8.

Please post details of the OS, database and client versions being installed.

> How do we ensure this when installing the Oracle client software?

For the client, set NLS_LANG appropriately when using the client software; no setup is required during the install: http://www.oracle.com/technetwork/database/globalization/nls-lang-099431.html

> Also, why, when I install the Oracle client software and select English as the language, does it set NLS_LANG to AMERICAN by default?
> Is there an English (U.K.) language option? I couldn't see one.

Try "ENGLISH":
http://docs.oracle.com/cd/E11882_01/server.112/e10729/ch3globenv.htm

> Many Thanks

HTH
Srini
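As a rough illustration of the NLS_LANG format (LANGUAGE_TERRITORY.CHARACTERSET, for example AMERICAN_AMERICA.AL32UTF8 or ENGLISH_UNITED KINGDOM.AL32UTF8), the following hypothetical Java helper reads NLS_LANG from the process environment and reports whether its character-set part is one of Oracle's Unicode character sets (AL32UTF8 or UTF8). It only inspects the environment variable: on Windows the Oracle client may also take NLS_LANG from the registry, which this sketch does not check.

```java
public class NlsLangCheck {
    // Hypothetical helper: returns the part after the last '.', i.e. the client
    // character set, or null if the value is missing or has no '.' part.
    static String charsetOf(String nlsLang) {
        if (nlsLang == null) return null;
        int dot = nlsLang.lastIndexOf('.');
        return dot < 0 ? null : nlsLang.substring(dot + 1).trim();
    }

    // True if the character-set part names one of Oracle's Unicode character sets.
    static boolean isUnicodeClient(String nlsLang) {
        String cs = charsetOf(nlsLang);
        return cs != null && (cs.equalsIgnoreCase("AL32UTF8") || cs.equalsIgnoreCase("UTF8"));
    }

    public static void main(String[] args) {
        String value = System.getenv("NLS_LANG"); // may be null if unset
        System.out.println("NLS_LANG = " + value);
        System.out.println("Unicode client character set: " + isUnicodeClient(value));
    }
}
```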

How to input Unicode characters in Oracle Forms 9i

Hi,
Can anyone show me how to input Unicode characters in Forms 9i? I have designed a form and run it, but when I input Unicode characters in a TEXT ITEM on the form (the FONT_NAME of this TEXT ITEM is Times New Roman, Arial, ...), they are displayed incorrectly and are not stored correctly in the database.
Thank you !

Thanks, Duncan R Mills!
My database NLS settings are as follows:
SQL> SELECT * FROM NLS_DATABASE_PARAMETERS;

PARAMETER                  VALUE
NLS_LANGUAGE               AMERICAN
NLS_TERRITORY              AMERICA
NLS_CURRENCY               $
NLS_ISO_CURRENCY           AMERICA
NLS_NUMERIC_CHARACTERS     .,
NLS_CHARACTERSET           UTF8
NLS_CALENDAR               GREGORIAN
NLS_DATE_FORMAT            DD-MON-RR
NLS_DATE_LANGUAGE          AMERICAN
NLS_SORT                   BINARY
NLS_TIME_FORMAT            HH.MI.SSXFF AM
NLS_TIMESTAMP_FORMAT       DD-MON-RR HH.MI.SSXFF AM
NLS_TIME_TZ_FORMAT         HH.MI.SSXFF AM TZH:TZM
NLS_TIMESTAMP_TZ_FORMAT    DD-MON-RR HH.MI.SSXFF AM TZH:TZM
NLS_DUAL_CURRENCY          $
NLS_COMP                   BINARY
NLS_NCHAR_CHARACTERSET     UTF8
NLS_RDBMS_VERSION          8.1.7.0.0

18 rows selected.
I still can't input Unicode characters in Oracle Forms: they are displayed incorrectly even though I set the font name exactly.

Unrecognised character in GB2312 character set using Java InputStreamReader

I am reading the following Chinese GB2312 HTML file from
http://news.xinhuanet.com/local/2007-02/13/content_5732705.htm
using an InputStreamReader with GB2312 encoding, as shown below:
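import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;

public class ReadGB2312Html {
    public static void main(String[] args) {
        // The original TmpText declaration was elided; a StringBuilder is assumed here.
        StringBuilder TmpText = new StringBuilder();
        // try-with-resources closes the reader (and the file) even on error
        try (BufferedReader br = new BufferedReader(
                new InputStreamReader(new FileInputStream(args[0]), "GB2312"))) {
            String strLine;
            while ((strLine = br.readLine()) != null) {
                TmpText.append(strLine);
                TmpText.append("\r\n");
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
        System.out.println(TmpText);
    }
}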
The TmpText variable does not display the last character in the article （记者夏珺） properly; instead it gives （记者夏?B）.
Inside the HTML file, the unrecognised character also shows up as "?B". Why is this so?
In the web browser it is displayed and recognised as a Chinese GB2312 character, so why is it not recognised by Java's InputStreamReader?
Any help or explanation would be much appreciated.

Yes, it is not a GB2312 character
The 珺 character is AC40 in hex, which is outside the GB2312 character range; it is in GBK.
Copied from Wikipedia:
GBK is an extension of the GB2312 character set for simplified Chinese characters, used in the People's Republic of China.
GB stands for National Standard, while K stands for Extension. GBK not only extended the old standard GB2312 with Traditional Chinese characters, but also with Chinese characters that were simplified after the establishment of GB2312 in 1981. With the arrival of GBK, certain names with characters formerly unrepresentable, like the "rong" (镕) character in former Chinese Premier Zhu Rongji's name, are now representable.
Thanks a lot. I will use the GBK charset to read all GB2312 files, since GB2312 is a subset of GBK.
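The GBK/GB2312 difference can be checked directly in Java. This sketch asks each charset's encoder whether it can represent a character, and uses a strict decoder (one that reports errors instead of silently substituting replacement characters, which is what InputStreamReader does by default) to test whether bytes are valid in a charset. Because GBK keeps every GB2312 code unchanged, decoding with "GBK" is a safe superset, consistent with the conclusion above. It likely also explains the browser behaviour: browsers generally decode pages labelled GB2312 as GBK, as the WHATWG Encoding Standard specifies. The helper names are illustrative.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;

public class GbkVsGb2312 {
    // True if every character of s has a mapping in the named charset.
    static boolean canEncode(String s, String charsetName) {
        return Charset.forName(charsetName).newEncoder().canEncode(s);
    }

    // True if the bytes decode without any malformed or unmappable sequence.
    static boolean decodesCleanly(byte[] bytes, String charsetName) {
        try {
            Charset.forName(charsetName).newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String rong = "\u9555"; // 镕, the "rong" in Zhu Rongji's name
        byte[] gbkBytes = rong.getBytes(Charset.forName("GBK"));
        System.out.println("GB2312 can encode it:      " + canEncode(rong, "GB2312"));
        System.out.println("GBK can encode it:         " + canEncode(rong, "GBK"));
        System.out.println("GBK bytes valid as GB2312: " + decodesCleanly(gbkBytes, "GB2312"));
        System.out.println("GBK bytes valid as GBK:    " + decodesCleanly(gbkBytes, "GBK"));
    }
}
```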

Unicode Character - getString - ResultSet - Java.

Hello experts,
I am trying to query some records with fields containing Unicode characters. When I execute the query in Oracle 9i, I am able to see the Unicode characters in the query result.
When I send the same SQL from Java, I get the same number of records, but the special characters in the field are not visible in the output of rs.getString(1).
Could you please help me, on getting the special character in the output.
Your direction in this regard is highly appreciated...
Regards,
girig.

> Basically, I am using the Eclipse IDE and seeing the output in the console window.

Try printing the Unicode hex values of the characters rather than the characters themselves. That way you don't have to worry about whether the font used by the display actually has glyphs for those characters.
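A small Java sketch of that suggestion: format each code point of a string as U+XXXX, so that what reaches the console is plain ASCII. In the JDBC case the input would be rs.getString(1); a string literal stands in for it here. If the dump shows U+FFFD or U+003F ('?') where other characters were expected, the data was lost somewhere before display (for example in a character-set conversion), not merely rendered with a missing glyph.

```java
public class CodePointDump {
    // Formats each Unicode code point of s as U+XXXX, separated by spaces.
    // codePoints() keeps surrogate pairs together, so U+1F600 prints as one value.
    static String dump(String s) {
        StringBuilder sb = new StringBuilder();
        s.codePoints().forEach(cp -> {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("U+%04X", cp));
        });
        return sb.toString();
    }

    public static void main(String[] args) {
        String value = "caf\u00E9 \u65E5\u672C"; // stand-in for rs.getString(1)
        System.out.println(dump(value));
    }
}
```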

Can I use the AL32UTF8 Unicode character set to store Japanese characters?

Hi All,
My database is running with the AL32UTF8 character set,
but we are unable to insert Japanese characters.
Please let me know how to handle this.
Thanks
Mallikarjun

You need to set the correct NLS_LANG.
This article can help you:
http://www.oracle.com/technology/tech/globalization/htdocs/nls_lang%20faq.htm

Converting Unicode to UTF-8 character set through Oracle forms(10g)

Hi,
I am working on Oracle Forms (10g), where I need to load files containing Unicode (multilingual) characters into the database,
but while loading a file, junk characters are inserted into the database tables.
To read the file through Forms, I am using the utl_file.fopen_nchar and utl_file.get_line_nchar functions to read the Unicode characters.
The application server and database server character sets are set to American UTF8.
In fact, when I change the text file's character set to UTF-8 with an editor (Notepad++, etc.), the data is inserted into the database properly (at least for English characters), but not with Unicode.
Any guidance in this regard is highly appreciated.
Thank you in advance
Sanu

Hi,
Please check out the following link:
http://www.oracle.com/technology/tech/globalization/htdocs/nls_lang%20faq.htm
sarah
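Since the load works once the file has been saved as UTF-8, one client-side workaround is to convert the file to UTF-8 before loading it. A minimal Java sketch, assuming the source file is what Windows Notepad calls "Unicode" (UTF-16, typically little-endian with a byte-order mark); Java's UTF-16 decoder uses the BOM to choose the byte order and drops it. The decoder and encoder obtained from newDecoder()/newEncoder() report errors instead of silently substituting characters. The file names passed to main are placeholders.

```java
import java.io.*;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

public class Transcode {
    // Decodes `in` with `from` and re-encodes it to `out` with `to`.
    // Bad input raises an exception instead of quietly becoming junk characters.
    static void transcode(InputStream in, Charset from, OutputStream out, Charset to) throws IOException {
        try (Reader r = new BufferedReader(new InputStreamReader(in, from.newDecoder()));
             Writer w = new BufferedWriter(new OutputStreamWriter(out, to.newEncoder()))) {
            char[] buf = new char[8192];
            int n;
            while ((n = r.read(buf)) != -1) {
                w.write(buf, 0, n);
            }
        }
    }

    // In-memory wrapper; a malformed input surfaces as an UncheckedIOException.
    static byte[] transcode(byte[] data, Charset from, Charset to) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try {
            transcode(new ByteArrayInputStream(data), from, out, to);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        // Placeholder usage: java Transcode unicode-input.txt utf8-output.txt
        try (InputStream in = Files.newInputStream(Paths.get(args[0]));
             OutputStream out = Files.newOutputStream(Paths.get(args[1]))) {
            transcode(in, StandardCharsets.UTF_16, out, StandardCharsets.UTF_8);
        }
    }
}
```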

Language Conversion from UTF-8 to Another Character Set

Hi,
I am creating a file programmatically containing Vendor Master data (FTP interface).
The vendor name and vendor address are maintained in the local language (Taiwanese) in the SAP system; these characters are in the UTF-8 character set.
The Unicode text should be converted to BIG5 for Taiwanese, and this information should then be sent in the file.
How can I perform this conversion and change the character set of the values I'm retrieving from table LFA1 to BIG5?
Is it possible to do this conversion in SAP? Does SAP allow this?
/Mike

Hi Manik,
I have a similar requirement: I need to convert Unicode Chinese characters to GB2312-encoded Chinese characters. I already posted in the forums but didn't get the solution I needed.
Can you please provide the solution you implemented, and also confirm whether it can be used to solve the above problem?
Hoping for your good reply.
Regards,
Prakash
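Whatever tool ends up doing the conversion, the core operation is encoding Unicode text into the target charset and detecting characters that charset cannot hold. A Java sketch of that (not ABAP, and not an SAP facility): a strict CharsetEncoder for "Big5" or "GB2312" that reports unmappable characters instead of writing replacement bytes. The sample name and the encode helper are illustrative.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.util.Optional;

public class ToLegacyCharset {
    // Encodes text into the named charset, or returns Optional.empty() if any
    // character has no mapping there. String.getBytes(Charset) would instead
    // substitute the charset's default replacement bytes (usually '?').
    static Optional<byte[]> encode(String text, String charsetName) {
        try {
            ByteBuffer bb = Charset.forName(charsetName).newEncoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .encode(CharBuffer.wrap(text));
            byte[] bytes = new byte[bb.remaining()];
            bb.get(bytes);
            return Optional.of(bytes);
        } catch (CharacterCodingException e) {
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        String name = "\u53F0\u7063"; // 台灣, standing in for a value read from LFA1
        for (String cs : new String[] { "Big5", "GB2312" }) {
            Optional<byte[]> bytes = encode(name, cs);
            System.out.println(cs + ": " + (bytes.isPresent()
                    ? bytes.get().length + " bytes"
                    : "contains characters with no mapping"));
        }
    }
}
```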

Character sets - UTF8 or Chinese

Hi,
I am looking into enhancing the application I have built in Oracle to save and display data in Chinese and English. I have been looking into how to change the character set of a database so that it accepts different languages, i.e. different characters.
From what I understand, I can either create a database that uses a Chinese character set (apparently English ASCII characters are also part of any Chinese character set) or set the database to use a Unicode multi-byte character set (UTF8), which seems to be fine for all languages.
Has anyone had any experience of a) changing an existing standard 7-bit ASCII database into a database that can handle Chinese, and/or b) the differences and implications of using a Chinese versus a Unicode character set?
I am using Oracle RDBMS 8.1.7 on SuSE Linux 7.2
Thanks in advance.
Dan

If the data is segmented so that character set 1 data is in one table and character set 2 data is in another, you may have a chance to salvage the data with help from Oracle Support. The idea would be:
1. Export and import only your CL8MSWIN1251 data to UTF8. Be careful that NLS_LANG is set to CL8MSWIN1251 for the export so that no conversion takes place.
2. Confirm the import is successful and remove the CL8MSWIN1251 data from the database.
3. Oracle Support can then help you override the character set via ALTER DATABASE to, say, MSWIN1252.
4. Selectively export/import this data, again making sure NLS_LANG is set to MSWIN1252 for the export so that no conversion takes place.
5. Confirm the import is successful and remove the MSWIN1252 data from the database.
6. Then repeat the same steps for the 1250 data.

MP3 ID3 tags in non standard character set

Hello,
I'm curious as to how I can rip a CD as MP3 and read tags that have been encoded in a non-Western or Unicode character set. Right now I set the LANG environment variable to the locale that matches the character set, rip the songs, and then write the tags as Unicode using EasyTAG before resetting the LANG variable. This works, but it gets annoying: once I change the language, a lot of my programs also start using that language, and I would like them to remain in English.
Is there a simpler, more straightforward way of doing this?
--Nan

Just guessing, but what if you only set LC_CTYPE to whatever language you need?

How to convert a particular string to the Multi-Byte Character Set (MBCS) in MFC VC++

I use the Unicode character set in my MFC application (VC++).
Now I get output like ठ桔湡⁫潹⁵潦⁲獵, and I want to convert these characters to English (i.e. MBCS).
But I need Unicode in my application. When I switch to the Multi-Byte Character Set, I get the correct output in English, but other objects (TreeCtrl selection) behave wrongly, so I need to convert just this particular string to MBCS.
How can I do that in MFC?

I assume the string read from your hardware device is a plain "C" string (ANSI string). This type of string has one byte per character, whereas a Unicode (UTF-16) string on Windows uses two bytes per character.
From the situation you describe, I'd convert the string returned by the hardware to a Unicode string using e.g. MultiByteToWideChar with CP_ACP. You may also use mbstowcs or a similar function to convert your string to a Unicode string.
Best regards
Bordon
Note: Posted code pieces may not have a good programming style and may not be perfect. It is also possible that they do not work in all situations. Code pieces are only intended to explain something in particular.
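The garbled output in the question is what you get when this conversion is skipped: single-byte text whose bytes are treated directly as UTF-16LE, so every pair of ASCII bytes becomes one wide character. For example, the bytes of "Th" (0x54 0x68) read as one little-endian 16-bit unit give U+6854 (桔), and the last eight characters of ठ桔湡⁫潹⁵潦⁲獵 decode back to "Thank you for us". The Java sketch below is an analogy, not MFC code: it reproduces the effect and then shows the fix, which is decoding with the real single-byte code page (the same idea as MultiByteToWideChar with CP_ACP). The choice of windows-1252 as the code page and the sample bytes are assumptions.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class AnsiAsUtf16 {
    // Reproduces the bug: single-byte ("ANSI") text whose bytes are read as UTF-16LE.
    static String misread(byte[] ansiBytes) {
        return new String(ansiBytes, StandardCharsets.UTF_16LE);
    }

    // The fix: decode the bytes with the code page they were actually written in.
    static String decodeAnsi(byte[] ansiBytes, String codePage) {
        return new String(ansiBytes, Charset.forName(codePage));
    }

    public static void main(String[] args) {
        byte[] fromDevice = "Thank you for us".getBytes(StandardCharsets.US_ASCII);
        System.out.println(misread(fromDevice));                    // CJK-looking garbage
        System.out.println(decodeAnsi(fromDevice, "windows-1252")); // prints: Thank you for us
    }
}
```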

Foreign character set issue

Hi
This might sound a bit silly, but stay with me.
There's a database that definitely supports UTF8. I checked using this query:
select * from nls_database_parameters where parameter like '%CHARACTERSET'
And got this result:
NLS_CHARACTERSET UTF8
NLS_NCHAR_CHARACTERSET AL16UTF16
It's Oracle 10g.
In a particular table, some text is stored in multiple languages. There are seven languages (English, Mandarin, Japanese, German...). Every language has 2-3 rows to itself. There's a column where I have to get rid of the trailing few characters, the number of which depends on the content of the string.
But I cannot see any of the Eastern languages in TOAD. The column is in VARCHAR2.
My problem is twofold.
1. What functions do I use to ensure that only the last few bytes are truncated, and there's no data loss (which many websites gravely warn of when dealing with foreign language data) ?
2. How can I see this foreign language text in TOAD/SQLPlus?
(Yes, I'm kind of new to the whole multiple-language-game. Please let me know if I've left out any important detail!)

Do you have metalink access?
If so, please see the notes below, there's a lot of good information in them:
158577.1 - NLS_LANG explained
260893.1 - Unicode Character Sets in the Database
788156.1 - UTF8 implications
With any character set situation there are at least two and a bit sides to the equation.
First is whether you are storing the correct data.
You're best using the DUMP function to inspect the stored data, e.g.
SELECT DUMP(<column_name>) FROM <table_name> WHERE ....
This function may help you with truncating the last few bytes, though I'm not sure why you need to do this.
The "second and a bit" bit is having the correct client settings - NLS_LANG - and using a client which supports the characters required.
SQL*Plus has its limitations here. I don't know Toad well enough, but it should support full UTF8 characters.
SQL*Developer and iSQL*Plus both should support the full UTF8 - I tend to use the former, particularly for UTF8.
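On the truncation question: the data loss those websites warn about happens when a byte limit cuts a multi-byte character in half. A Java sketch of byte-safe truncation for standard UTF-8, which backs up past continuation bytes (bit pattern 10xxxxxx) so the result always contains whole characters. This is client-side logic, not an Oracle function, and the helper name is illustrative. Note also that Oracle's UTF8 character set (unlike AL32UTF8) stores characters outside the Basic Multilingual Plane as 6-byte sequences, which this sketch does not model.

```java
import java.nio.charset.StandardCharsets;

public class Utf8Truncate {
    // Returns the longest prefix of s whose UTF-8 encoding fits in maxBytes,
    // never cutting a multi-byte character in half.
    static String truncateUtf8(String s, int maxBytes) {
        if (maxBytes < 0) throw new IllegalArgumentException("maxBytes < 0");
        byte[] bytes = s.getBytes(StandardCharsets.UTF_8);
        if (bytes.length <= maxBytes) {
            return s;
        }
        int end = maxBytes;
        // bytes[end] is the first byte to be dropped; while it is a continuation
        // byte, the cut falls inside a character, so move the cut back.
        while (end > 0 && (bytes[end] & 0xC0) == 0x80) {
            end--;
        }
        return new String(bytes, 0, end, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String mixed = "a\u20ACb"; // "a€b" is 5 bytes in UTF-8: 61 E2 82 AC 62
        for (int limit = 0; limit <= 5; limit++) {
            System.out.println(limit + " bytes -> \"" + truncateUtf8(mixed, limit) + "\"");
        }
    }
}
```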

Function to change character set for specific text

Hi all,
Is there any function I can use to convert specific characters from the AL16UTF16 character set to the UTF8 character set?
Is there such a function in Oracle?
Thanks in advance
Ahmed,

Hi Elic,
The thing is, my character set is AR8MSWIN1256,
and I want to convert it to a Unicode character set that supports Arabic.
Can you please tell me how?
ahmed
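Oracle's AR8MSWIN1256 corresponds to the Windows Arabic code page, known to Java as windows-1256. On the Java side, moving such data to Unicode is a decode from windows-1256 followed, if needed, by an encode to UTF-8. A minimal sketch with an illustrative Arabic word; the helper names are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class ArabicToUnicode {
    private static final Charset WINDOWS_1256 = Charset.forName("windows-1256");

    // Decodes bytes stored in the Arabic Windows code page into a Java (UTF-16) String.
    static String fromWindows1256(byte[] legacy) {
        return new String(legacy, WINDOWS_1256);
    }

    // Re-encodes the text as UTF-8, the form a UTF-8 database would store.
    static byte[] toUtf8(String text) {
        return text.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String arabic = "\u0645\u0631\u062D\u0628\u0627"; // مرحبا ("hello")
        byte[] legacy = arabic.getBytes(WINDOWS_1256);
        String back = fromWindows1256(legacy);
        System.out.println(legacy.length + " bytes in windows-1256, "
                + toUtf8(back).length + " bytes in UTF-8");
    }
}
```

Each of these five letters takes one byte in windows-1256 and two in UTF-8, which matters when sizing byte-limited columns after the move.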

Change default character set of JVM

Is there a way to change the default character set of JVM to say, UTF-8?
System.out.println("Default Character Set: " + new java.io.OutputStreamWriter(new java.io.ByteArrayOutputStream()).getEncoding());
System.out.println("File Encoding: " + System.getProperty("file.encoding"));

On Windows
==========
Default Character Set: Cp1252
File Encoding: Cp1252

On Linux
========
Default Character Set: ASCII
File Encoding: ANSI_X3.4-1968

I would like to save the effort of changing the many lines of code that look like
new BufferedWriter(new OutputStreamWriter(out));
to
new BufferedWriter(new OutputStreamWriter(out, "UTF-8"));
Thanks

Try this:
-Dfile.encoding=utf-8
as a VM argument.
/Kaj
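Two points on that suggestion: file.encoding generally has to be given on the java command line (for example java -Dfile.encoding=UTF-8 ...), because calling System.setProperty at run time does not change the default charset the JVM has already initialised; and from JDK 18 onward (JEP 400) the default charset is UTF-8 on all platforms. Passing the charset explicitly avoids depending on either. A sketch, with an illustrative utf8Writer helper; the Charset-typed OutputStreamWriter constructor also avoids the checked UnsupportedEncodingException that the "UTF-8" string form declares.

```java
import java.io.*;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class DefaultCharsetDemo {
    // Drop-in replacement for new BufferedWriter(new OutputStreamWriter(out))
    // that always writes UTF-8, whatever the JVM default is.
    static Writer utf8Writer(OutputStream out) {
        return new BufferedWriter(new OutputStreamWriter(out, StandardCharsets.UTF_8));
    }

    // Writes text through utf8Writer into memory and returns the bytes produced.
    static byte[] utf8Bytes(String text) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (Writer w = utf8Writer(buf)) {
            w.write(text);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buf.toByteArray();
    }

    public static void main(String[] args) {
        System.out.println("Default charset: " + Charset.defaultCharset());
        System.out.println("file.encoding:   " + System.getProperty("file.encoding"));
        System.out.println("Euro sign via utf8Writer: " + utf8Bytes("\u20AC").length + " bytes");
    }
}
```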
