Character sets - UTF8 or Chinese
Hi,
I am looking into enhancing the application I have built in Oracle to save/display data in Chinese & English. I have been looking into how to change the character set of a database to accept different languages (i.e. different characters).
From what I understand, I can create a database that uses a Chinese character set (apparently English ASCII characters are also part of any Chinese character set), or I can set the database to use a Unicode multi-byte character set (UTF8), which seems to be okay for all languages.
Has anyone had any experience with a) changing an existing standard 7-bit ASCII database into a database that can handle Chinese, and/or b) the differences and implications of using a Chinese versus a Unicode character set?
I am using Oracle RDBMS 8.1.7 on SuSE Linux 7.2
Thanks in advance.
Dan
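A minimal Java sketch (not Oracle code) of the storage trade-off in b). It assumes Java's GBK charset is a reasonable stand-in for a Chinese database character set such as Oracle's ZHS16GBK, and Java's UTF-8 for Oracle's UTF8: ASCII letters cost one byte in both, while common Chinese characters take 2 bytes in GBK and 3 in UTF-8.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodingSizes {
    // Number of bytes needed to store s in the given character set.
    static int byteLength(String s, Charset cs) {
        return s.getBytes(cs).length;
    }

    public static void main(String[] args) {
        // Java's GBK stands in for a Chinese database character set (assumption).
        Charset gbk = Charset.forName("GBK");
        String english = "Hello";
        String chinese = "\u4E2D\u6587"; // "中文" (Chinese)

        // ASCII letters take 1 byte each in both encodings.
        System.out.println(byteLength(english, gbk));                    // 5
        System.out.println(byteLength(english, StandardCharsets.UTF_8)); // 5
        // These two Chinese characters take 2 bytes each in GBK, 3 each in UTF-8.
        System.out.println(byteLength(chinese, gbk));                    // 4
        System.out.println(byteLength(chinese, StandardCharsets.UTF_8)); // 6
    }
}
```

So English data is unaffected either way; a Chinese character set is more compact for Chinese text, whereas UTF8 trades some space for the ability to store other languages as well.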
-
Hi,
On 10g, on Windows 2003, how can I verify whether the database's NATIONAL CHARACTER SET is UTF8?
Thank you.
SELECT *
FROM v$nls_parameters
will show you all the NLS parameters. You're presumably looking for the row where parameter = 'NLS_NCHAR_CHARACTERSET', though there may be a few other parameters you're interested in.
Justin -
I am looking to execute a statement like "SET CHARACTER SET UTF8".
However, if I put the above statement in a SQLDBC_Statement_execute call, I get a failure message.
I should also mention that the SQLDBC_Connection_connect call lets you pass the character set as one of its parameters. Is that the only way to set the character set from an application that uses SQLDBC?
Regards
Raj
Hi Lars, Elke and Thomas,
Thanks to all of you for your valuable input. Honestly, I'm a little lost on how to approach this Unicode-support requirement for my application. Please allow me some more time to investigate, and then I'll get back to you.
In my application, for all other databases, simply executing "SET CHARACTER SET UTF8" is all that's needed to enable UTF-8 support. So I really need to figure out what changes need to be made in the app if this is not going to work.
In the meantime, something else caught my attention while I was using this command:
sqlcli MAXDB1=> \dc domain.columns
Table "DOMAIN.COLUMNS"
Column Name | Type         | Length | Nullable | KEYPOS
------------+--------------+--------+----------+-------
SCHEMANAME  | CHAR UNICODE | 32     | YES      |
OWNER       | CHAR UNICODE | 32     | YES      |
What does the type 'CHAR UNICODE' mean here?
Regards
Raj -
UTF8 character set conversion for chinese Language
Hi friends,
I would like a basic explanation of the UTF8 feature: how does it help when converting Chinese-language data?
I would also like to know which characters UTF8 will not support when converting from Chinese.
Thanks & Regards
Ramya Nomula
Not exactly sure what you are looking for, but on MetaLink there are numerous detailed papers on NLS character sets, conversions, etc.
Bottom line: most Chinese characters, traditional or simplified, take 3 bytes in UTF8 and AL32UTF8; only supplementary characters beyond U+FFFF (for example, the rarer ideographs in CJK Extension B) need 4 bytes in AL32UTF8 (6 in Oracle's older UTF8). Other non-Latin scripts also need more than one byte per character; Arabic and Hebrew, for example, take 2 bytes.
Do a google search on "utf8 al32utf8 difference", and you will get some good explanations.
e.g., http://decipherinfosys.wordpress.com/2007/01/28/difference-between-utf8-and-al32utf8-character-sets-in-oracle/
Recently, one of our clients had a question on the differences between these two character sets since they were in the process of making their application global. In an upcoming whitepaper, we will discuss in detail what it takes (from a RDBMS perspective) to address localization and globalization issues. As far as these two character sets go in Oracle, the only difference between AL32UTF8 and UTF8 character sets is that AL32UTF8 stores characters beyond U+FFFF as four bytes (exactly as Unicode defines UTF-8). Oracle’s “UTF8” stores these characters as a sequence of two UTF-16 surrogate characters encoded using UTF-8 (or six bytes per character). Besides this storage difference, another difference is better support for supplementary characters in AL32UTF8 character set.
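To illustrate the storage difference described above, here is a small Java sketch. Java has no Oracle UTF8 charset, so cesu8Length is our own approximation of it (each UTF-16 code unit, including each half of a surrogate pair, encoded separately, as the quote describes Oracle's UTF8); utf8Length uses standard UTF-8, as AL32UTF8 does. A common character such as 漢 costs 3 bytes either way, while a supplementary CJK Extension B character costs 4 versus 6.

```java
import java.nio.charset.StandardCharsets;

public class Utf8VsCesu8 {
    // Standard UTF-8 length (what AL32UTF8 stores).
    static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // Approximation of Oracle's older UTF8: every UTF-16 code unit, including
    // each half of a surrogate pair, is encoded on its own (CESU-8 style).
    static int cesu8Length(String s) {
        int total = 0;
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c < 0x80) total += 1;
            else if (c < 0x800) total += 2;
            else total += 3; // includes surrogate halves 0xD800-0xDFFF
        }
        return total;
    }

    public static void main(String[] args) {
        String han = "\u6F22";                                // 漢, U+6F22 (BMP)
        String extB = new String(Character.toChars(0x20000)); // U+20000 (CJK Extension B)
        System.out.println(utf8Length(han) + " " + cesu8Length(han));   // 3 3
        System.out.println(utf8Length(extB) + " " + cesu8Length(extB)); // 4 6
    }
}
```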
You may also consider posting your question on the Globalization Support forum, which is more focused on these types of questions.
Globalization Support -
Can a db with character set UTF8 be restored to AL32UTF8?
Hello Everyone,
Good Day.
Our present production and non-production databases are configured with NLS_CHARACTERSET as UTF8. However, as we are migrating to a new server, we intend to configure the new databases with NLS_CHARACTERSET as AL32UTF8, which our research indicates is the recommended option. We have also learned that WebLogic schemas and repositories require NLS_CHARACTERSET to be AL32UTF8.
As we will be restoring from a backup to the new instance created on the new server, could you help us understand whether any issues might arise during the restore because the two character sets differ?
Warm Regards,
Vikram.
Hi Robin,
Thank you for the update. Our DB is very large and contains many schemas, so trying Data Pump is not practical. Hence we had planned a restore, which should be a simpler task with less downtime.
Perhaps one option would be to create the instances with the UTF8 character set itself and then change it once the migration activity has been completed.
Also, could you please throw some light on the two character sets as to which one is better and why?
Warm Regards,
Vikram. -
We have database with texts stored in mixed character sets.
The DB character set is CL8MSWIN1251, but the actual data is in CL8MSWIN1251, WE8MSWIN1252 and EE8MSWIN1250.
We want to convert this DB to the UTF8 character set. A simple export/import will not help in this situation: only the CL8MSWIN1251 data will be converted properly.
Does anyone know a solution for this situation?
Thank you in advance!
If the data is segmented so that character set 1 data is in one table and character set 2 data is in another table, then you may have a chance to salvage the data with help from Oracle Support. The idea would be:
1) Export and import only your CL8MSWIN1251 data to UTF8. Be careful that the character set in your NLS_LANG is CL8MSWIN1251 for the export, so that no conversion takes place. Confirm the import is successful and remove the CL8MSWIN1251 data from the database.
2) Oracle Support can now help you override the database character set via ALTER DATABASE to, say, WE8MSWIN1252. Now selectively export/import this data, again making sure NLS_LANG is set to WE8MSWIN1252 for the export so that no conversion takes place. Confirm the import is successful and remove the WE8MSWIN1252 data from the database.
3) Then repeat the same steps for the EE8MSWIN1250 data.
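Why a plain export/import only fixes the CL8MSWIN1251 rows can be seen with a single byte in Java. This is a simplified sketch, assuming Java's windows-1251 and windows-1252 charsets match Oracle's CL8MSWIN1251 and WE8MSWIN1252: the same stored byte means a different letter depending on which source character set the conversion assumes, so each group of data has to be converted with its real character set.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class MixedCharsetDemo {
    // Re-encode raw stored bytes to UTF-8, assuming they are in sourceCharset.
    static byte[] toUtf8(byte[] raw, String sourceCharset) {
        String decoded = new String(raw, Charset.forName(sourceCharset));
        return decoded.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // An "e with acute accent" typed by a Windows-1252 user is the single byte 0xE9.
        byte[] stored = { (byte) 0xE9 };
        // Converting as if it were CL8MSWIN1251 (what the database claims) gives Cyrillic.
        String wrong = new String(toUtf8(stored, "windows-1251"), StandardCharsets.UTF_8);
        // Converting with the data's real character set gives the intended letter.
        String right = new String(toUtf8(stored, "windows-1252"), StandardCharsets.UTF_8);
        // prints U+0439 vs U+00E9
        System.out.printf("U+%04X vs U+%04X%n", (int) wrong.charAt(0), (int) right.charAt(0));
    }
}
```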
-
Chinese character set (UTF8)
Hi All,
We have a problem displaying a report with Chinese characters.
We use Crystal Reports 9. When I retrieve the data from the DB with another application (Oracle SQL Developer), I can see the Chinese characters correctly.
When I use Crystal Reports, I get garbage.
In the Regional Settings I checked all the relevant checkboxes for Asian languages and code page conversion tables.
NLS_LANG is defined as AMERICAN_AMERICA.AL32UTF8.
1) Does Crystal Reports 9 support UTF8?
2) Is there a checklist I can use to verify that all the settings on my machine are correct?
3) Any other ideas?
Thanks,
Amos
Hi Amos,
It should work, but you will need to use a Unicode font. Try setting your field fonts in CR to Arial Unicode MS and test again.
Thank you
Don -
Character set migration to UTF8: error (urgent)
Hi
After we migrated from the AR8ISO8859P6 to the UTF8 character set, we are facing an error: when I try to compile a package through Forms, I get the error "program unit pu not found".
When I run the source code of that procedure directly from the database using SQL*Plus, it runs without any problem. How can I migrate these forms from AR8ISO8859P6 to the UTF8 character set? We migrated from an Oracle 8.1.7 database with AR8ISO8859P6 to an Oracle 9.2 database with character set UTF8 (Windows 2000); the export and import completed without any error.
I am using Oracle 11i, which calls Forms 6i and Reports 6i.
with regards
ramya
1) Yes, this is a server-side program. I get the error when connecting through Forms. When I run this program directly in SQL it works; I get this error only when compiling.
3) Yes, I am using 11i (11.5.10), which calls Forms 6i and Reports 6i. Why does this cause a problem in Forms? Is there a Forms NLS_LANG setting that needs to change?
With regards
Hi Ramya,
From your question, I understand that you are trying to compile a procedure from a Forms interface on the client side?
If so, you should check the code in the form that calls the compilation package.
Does it contain strings that might be affected by the character set change?
Tony G. -
Database Character Set Conversion from WE8ISO8859P1 to UTF8
Hi All
I want to migrate data from one database to another. My original database's character set is WE8ISO8859P1, but the target database's character set is UTF8.
Because of the character set difference, the Marathi data from the original database is not displayed;
it shows some symbols instead of the Marathi words.
Please help me out.
Thanking You
Gaurav Sontakke
Dear GauravSontakke,
Since your database version is unknown, I will point you to the online documentation on character set migration for 10gR2.
http://www.oracle.com/pls/db102/search?remark=quick_search&word=character+set+migration&tab_id=&format=ranked
http://download.oracle.com/docs/cd/B19306_01/server.102/b14225/ch11charsetmig.htm#sthref1442
http://download.oracle.com/docs/cd/B19306_01/server.102/b14225/ch11charsetmig.htm#NLSPG011
Please read those carefully.
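One possible explanation for the symbols, sketched in Java under stated assumptions: ISO-8859-1 (standing in for WE8ISO8859P1) has no Devanagari letters at all, so Marathi text can only have been stored as raw bytes passed through unconverted. The passThroughThenConvert method is hypothetical and assumes the client wrote UTF-8 bytes; if the migration then converts those bytes as Latin-1 characters, every Marathi letter turns into three unrelated symbols. The character set migration documentation linked above covers how to handle such data properly.

```java
import java.nio.charset.StandardCharsets;

public class MarathiLatin1 {
    // ISO-8859-1 stands in for WE8ISO8859P1 (assumption); it has no Devanagari letters.
    static boolean fitsLatin1(String s) {
        return StandardCharsets.ISO_8859_1.newEncoder().canEncode(s);
    }

    // Hypothetical pass-through scenario: a client wrote UTF-8 bytes into the
    // Latin-1 database unchanged, and the migration then converts each stored
    // byte to UTF8 as if it were a Latin-1 character.
    static String passThroughThenConvert(String original) {
        byte[] storedBytes = original.getBytes(StandardCharsets.UTF_8);
        return new String(storedBytes, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String marathi = "\u092E\u0930\u093E\u0920\u0940"; // "मराठी" (Marathi)
        System.out.println(fitsLatin1(marathi)); // false
        String migrated = passThroughThenConvert(marathi);
        // 5 Devanagari characters become 15 Latin-1 "symbols", starting with U+00E0.
        System.out.println(migrated.length()); // 15
    }
}
```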
Hope it Helps,
Ogan -
How to find Client Character set?
Hi,
I need to connect to a remote database that has a different character set from the client. Is there any way to display the client character set and the database character set from SQL*Plus? Could someone please help me?
Thanks in Advance
Sree.
I guess you're using PL/SQL Developer?
(because I get that warning message too ;) )
The warning also continues with:
You can set the client character set through the NLS_LANG environment variable or the NLS_LANG registry key.
If I execute some scripts from the client (client character set WE8MSWIN1252, database character set UTF8), will it cause any problems?
It depends on what kind of data you're loading/importing (Chinese characters, for example).
I never had any problems at all, since I'm not using 'exotic' characters.
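A minimal Java sketch of the kind of loss the answer hints at, assuming Java's windows-1252 charset behaves like a WE8MSWIN1252 client: Western European text survives the round trip, but Chinese characters have no mapping and come back as replacement characters. Java substitutes '?'; Oracle's own replacement character may differ.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class ClientCharsetLoss {
    // Simulates passing text through a client character set and back.
    // String.getBytes replaces unmappable characters with the charset's
    // default replacement ('?' for windows-1252).
    static String roundTrip(String s, Charset client) {
        return new String(s.getBytes(client), client);
    }

    public static void main(String[] args) {
        Charset we8mswin1252 = Charset.forName("windows-1252"); // assumed match for WE8MSWIN1252
        System.out.println(roundTrip("caf\u00E9", we8mswin1252).equals("caf\u00E9")); // true
        System.out.println(roundTrip("\u4E2D\u6587", we8mswin1252).equals("??"));    // true
        System.out.println(roundTrip("\u4E2D\u6587", StandardCharsets.UTF_8)
                .equals("\u4E2D\u6587"));                                           // true
    }
}
```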
You can find related threads on http://asktom.oracle.com/pls/asktom/asktom.search?p_string=%22UTF8%22
and more explanations in the Oracle Globalization Guide: http://download.oracle.com/docs/cd/B19306_01/server.102/b14225/toc.htm -
HOW can I enter text using Japanese character sets?
The "Text, Plates, Insets" section of the LOOKOUT (6.01) Help files states:
"Click the » button to the right of the Text field to expand the field for multiple line entries. You can enter text using international character sets such as Chinese, Korean, and Japanese."
Can someone please explain HOW to do this? Note, I have NO problem inputting Hiragana, Katakana, and Kanji into MS Word; the keyboard emulates the Japanese layout and characters (Romaji is the default), the IME converts Romaji correctly, and I can also select characters directly from the IME Pad. I have tried several different fonts successfully and am currently using MS UI Gothic.ttf as the default. Again, everything works normally and predictably within Word.
I cannot get this text into Lookout. I can't cut/paste from HTML pages or from text editors, even though both display it properly. In Lookout, with JP selected as the language/keyboard, when I try to type directly into the text field, the IME CORRECTLY displays Hiragana until <enter> is pressed, at which point all the text reverts to question marks (?? ???? ? ?????). Using the IME Pad does pretty much the same. I did manage to get the "Yen" symbol to display, though, if that's relevant. As I said, the font selected (in the text/plate font options) is MS UI Gothic with Japanese as the selected script. Oddly enough, at this point the "sample" window shows exactly the Hiragana character I want displayed in Lookout, but Lookout won't display it. I've also tried staying in English and copying Unicode characters from the Windows Character Map, with the same results (the Yen sign works, Hiragana WON'T).
Help me!
JW_Tech
JW_Tech,
Have you changed the regional setting to Japanese?
Doug M
Applications Engineer
National Instruments
-
XML data from BLOB to CLOB - character set conversion
Hi All,
I'm trying to solve a problem with a character set conversion in PL/SQL in the following scenario:
1. source is an XML as a BLOB variable.
2. target is an XML as a CLOB variable.
3. the problem I have is the following:
- database character set is set to UTF-8
- XML character set could be anything (UTF-8, ISO 8859-1, ISO 8859-2, ASCII, ...)
- I need to write a procedure which converts the source BLOB content into the target CLOB taking into account the XML encoding and converts it into the DB default character set (UTF8).
I've been able to implement a simple conversion function. However, this function expects static XML encoding ISO-8859-1. The main part of the function looks as follows:
-- Convert one chunk of the BLOB from ISO-8859-1 to UTF8 and cast it to text
buffer := UTL_RAW.cast_to_varchar2(
            UTL_RAW.convert(
                DBMS_LOB.SUBSTR(source_blob_variable, 16000, pos)
              , 'American_America.UTF8'
              , 'American_America.we8iso8859p1'));
Does anyone have an idea how to rewrite the code to handle "any" XML encoding in the source BLOB file? In other words, is there a function in Oracle which converts XML character set names into Oracle character set values (ISO-8859-1 to we8iso8859p1, UTF-8 to UTF8, ...)?
Thanks a lot for any help.
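The mapping asked about here (XML encoding name to character set) can be sketched in Java, where Charset.forName already accepts IANA names such as ISO-8859-1 and UTF-8. This is only an illustration outside the database, with simplifying assumptions noted in the comments (no BOM or UTF-16 detection, UTF-8 when no declaration is present). On the Oracle side, UTL_I18N.MAP_CHARSET with the IANA_TO_ORACLE flag may be worth checking in the documentation for your release.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class XmlBytesToString {
    // Matches encoding="..." or encoding='...' inside a leading XML declaration.
    private static final Pattern ENCODING =
        Pattern.compile("^<\\?xml[^>]*encoding\\s*=\\s*[\"']([A-Za-z0-9._-]+)[\"']");

    // Decode XML bytes using the encoding named in the declaration.
    // Simplifications: only ASCII-compatible encodings are handled (no BOM or
    // UTF-16 sniffing), and UTF-8 is assumed when no declaration is present.
    static String decode(byte[] xml) {
        int n = Math.min(xml.length, 200);
        // Latin-1 maps each byte to one char, so the ASCII prolog reads correctly.
        String head = new String(xml, 0, n, StandardCharsets.ISO_8859_1);
        Matcher m = ENCODING.matcher(head);
        Charset cs = m.find() ? Charset.forName(m.group(1)) : StandardCharsets.UTF_8;
        return new String(xml, cs);
    }

    public static void main(String[] args) {
        byte[] latin1 = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?><a>caf\u00E9</a>"
            .getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(decode(latin1).endsWith("caf\u00E9</a>")); // true
    }
}
```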
Julius
I want to pass a BLOB to some "createXML" procedure and get a proper XMLType in the UTF8 character set, properly converted from whatever character set the input is in.
As per the documentation, the generated XML always has its encoding set on the client side depending on NLS_LANG (default UTF-8), regardless of the input encoding, so I don't see a need to parse the PI of the XML:
C:\>echo %NLS_LANG%
%NLS_LANG%
C:\>sqlplus
SQL*Plus: Release 11.1.0.6.0 - Production on Wed Apr 30 08:54:12 2008
Copyright (c) 1982, 2007, Oracle. All rights reserved.
Connected to:
Oracle Database 11g Enterprise Edition Release 11.1.0.6.0 - Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options
SQL> var cur refcursor
SQL>
SQL> declare
2 b blob := utl_raw.cast_to_raw ('<a>myxml</a>');
3 begin
4 open :cur for select xmlroot (xmltype (utl_raw.cast_to_varchar2 (b))) xml from dual;
5 end;
6 /
PL/SQL procedure successfully completed.
SQL>
SQL> print cur
XML
<?xml version="1.0" encoding="UTF-8"?><a>myxml</a>
SQL> exit
Disconnected from Oracle Database 11g Enterprise Edition Release 11.1.0.6.0 - Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options
C:\>set NLS_LANG=GERMAN_GERMANY.WE8ISO8859P1
C:\>sqlplus
SQL*Plus: Release 11.1.0.6.0 - Production on Mi Apr 30 08:55:02 2008
Copyright (c) 1982, 2007, Oracle. All rights reserved.
SQL> var cur refcursor
SQL>
SQL> declare
2 b blob := utl_raw.cast_to_raw ('<a>myxml</a>');
3 begin
4 open :cur for select xmlroot (xmltype (utl_raw.cast_to_varchar2 (b))) xml from dual;
5 end;
6 /
PL/SQL-Prozedur erfolgreich abgeschlossen.
SQL>
SQL> print cur
XML
<?xml version="1.0" encoding="ISO-8859-1"?>
<a>myxml</a> -
Importing from a different character set
Oracle 8.1.7 / Windows NT
I'm trying to import a dump file which was created with character set WE8ISO8859P9. My database uses character set UTF8. Some of the records can't be inserted because of error "ORA-1401: Value too large for column". Is this because of the different character sets? If I switch my session to WE8ISO8859P9, imp says "character set conversion from x to y not supported."
How can I get these last records inserted? Here's an excerpt from the log:
Connected to: Oracle8i Enterprise Edition Release 8.1.7.0.0 - Production
With the Partitioning option
JServer Release 8.1.7.0.0 - Production
Export file created by EXPORT:V08.00.05 via conventional path
Warning: the objects were exported by NOC_ADMIN, not by you
import done in WE8ISO8859P9 character set and UTF8 NCHAR character set
import server uses UTF8 character set (possible charset conversion)
export server uses WE8ISO8859P9 NCHAR character set (possible charset conversion)
. importing NOC_ADMIN's objects into NOC_ADMIN
. . importing table "ACCESSROUTERIFS_" 782 rows imported
. . importing table "ITEM_"
IMP-00019: row rejected due to ORACLE error 1401
IMP-00003: ORACLE error 1401 encountered
ORA-01401: inserted value too large for column
Column 1 33886
Column 2
Column 3
Column 4 1323
Column 5
Column 6 11
Column 7 18600
Column 8 18600
Column 9 20-NOV-2000:00:00:00
Column 10 processing
Column 11 inactive
Column 12
Column 13
Column 14 35682.0
Column 15
Column 16
Column 17
Column 18 05.12.00: KD weiss noch nix neues, er wird uns inf...
Column 19
Column 20 kschmid
Column 21 09-FEB-2001:15:50:21
Column 22
Column 23 12
Column 24
Column 25 06-NOV-2000:00:00:00
Please try Oracle RDBMS support; this issue has to do with Oracle Import.
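A likely cause, offered as an assumption rather than a diagnosis: characters outside ASCII take 1 byte in WE8ISO8859P9 but 2 bytes in UTF8, so a value that exactly filled a byte-sized column in the source database no longer fits after conversion. The Java sketch below assumes Java's ISO-8859-9 charset matches WE8ISO8859P9 and uses a made-up 7-byte column width.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class ByteExpansion {
    // Number of bytes s occupies in the given character set.
    static int bytes(String s, Charset cs) {
        return s.getBytes(cs).length;
    }

    // Whether s fits a column that is maxBytes bytes wide in that character set.
    static boolean fits(String s, Charset cs, int maxBytes) {
        return bytes(s, cs) <= maxBytes;
    }

    public static void main(String[] args) {
        Charset latin5 = Charset.forName("ISO-8859-9"); // assumed match for WE8ISO8859P9
        String value = "M\u00FC\u015Fteri";              // "Müşteri", 7 characters
        int columnWidth = 7;                             // hypothetical column width in bytes
        System.out.println(bytes(value, latin5));                             // 7
        System.out.println(bytes(value, StandardCharsets.UTF_8));             // 9
        System.out.println(fits(value, latin5, columnWidth));                 // true
        System.out.println(fits(value, StandardCharsets.UTF_8, columnWidth)); // false
    }
}
```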
-
How to change the character set of the database
Hello All,
When i issue the command
ALTER DATABASE CHARACTER SET UTF8
It gives me the error that I can only change the character set to a superset of the existing character set.
Is there any way I can change the character set without recreating the database?
TIA
Naveen
The existing character set is the basis for the new character set. This is fair enough, because the character set determines how the actual data is stored in the database. Allowing new characters is a minor change: completely re-encoding your entire database is not.
I'm afraid export, recreate and import is your only option.
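The superset rule can be illustrated with a simplified Java sketch (an analogy, not Oracle's actual check): ALTER DATABASE CHARACTER SET does not rewrite stored bytes, so every byte already stored must keep its meaning under the new character set. That holds for US7ASCII to UTF8, which is why a 7-bit ASCII database can typically be moved to UTF8 this way, but not for a Latin-1 character set such as WE8ISO8859P1. Only single-byte source character sets are considered here.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class SupersetCheck {
    // True if every single byte 0..maxByte decodes to the same character in
    // 'sub' and 'sup', i.e. existing stored bytes keep their meaning.
    static boolean sameMeaningForAllBytes(Charset sub, Charset sup, int maxByte) {
        for (int b = 0; b <= maxByte; b++) {
            byte[] one = { (byte) b };
            if (!new String(one, sub).equals(new String(one, sup))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // ASCII bytes mean the same thing in UTF-8.
        System.out.println(sameMeaningForAllBytes(
            StandardCharsets.US_ASCII, StandardCharsets.UTF_8, 0x7F));   // true
        // Latin-1 bytes 0x80-0xFF are not valid stand-alone UTF-8.
        System.out.println(sameMeaningForAllBytes(
            StandardCharsets.ISO_8859_1, StandardCharsets.UTF_8, 0xFF)); // false
    }
}
```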
Cheers, APC -
Unrecognised character in GB2312 file using Java InputStreamReader?
I am reading the following Chinese GB2312 HTML file from
http://news.xinhuanet.com/local/2007-02/13/content_5732705.htm
using InputStreamReader with GB2312 encoding, as shown below:
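import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.InputStreamReader;

public class ReadGB2312Html {
    public static void main( String[] args ) {
        // TmpText's declaration was elided in the original post; a StringBuilder is assumed here.
        StringBuilder TmpText = new StringBuilder();
        // try-with-resources closes the reader (and the underlying stream) automatically.
        try ( BufferedReader br = new BufferedReader(
                new InputStreamReader( new FileInputStream( args[0] ), "GB2312" ) ) ) {
            String strLine;
            while ( (strLine = br.readLine()) != null ) {
                TmpText.append(strLine);
                TmpText.append("\r\n");
            }
        } catch ( Exception e ) {
            e.printStackTrace();
        }
    }
}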
The TmpText variable does not display the last character of the article properly: instead of (记者夏珺) it gives (记者夏?B).
Inside the HTML file, the unrecognised character is represented by �B. Why is this so?
In a web browser it is displayed and recognised as a Chinese GB2312 character, so why is it not recognised by Java's InputStreamReader?
Any help or explanation would be much appreciated.
Yes, it is not a GB2312 character.
The �B character is AC40 in hex, which is outside the GB2312 character range; it is in GBK.
Copied from Wikipedia:
GBK is an extension of the GB2312 character set for simplified Chinese characters, used in the People's Republic of China.
GB stands for National Standard, while K stands for Extension. GBK not only extended the old standard GB2312 with Traditional Chinese characters, but also with Chinese characters that were simplified after the establishment of GB2312 in 1981. With the arrival of GBK, certain names with characters formerly unrepresentable, like the "rong" (镕) character in former Chinese Premier Zhu Rongji's name, are now representable.
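A short Java sketch of the answer above, using the 镕 character from the quoted example: GB2312 cannot represent it, GBK can, and GBK bytes decoded as GB2312 do not give the character back. Browsers generally treat a page labelled GB2312 as GBK, which is likely why the page displayed correctly there. The class and method names are illustrative only.

```java
import java.nio.charset.Charset;

public class Gb2312VsGbk {
    // True if every character of s can be encoded in the named charset.
    static boolean representable(String s, String charsetName) {
        return Charset.forName(charsetName).newEncoder().canEncode(s);
    }

    // Encode s as GBK, then decode those bytes with another charset.
    static String gbkBytesDecodedAs(String s, String decodeWith) {
        byte[] gbkBytes = s.getBytes(Charset.forName("GBK"));
        return new String(gbkBytes, Charset.forName(decodeWith));
    }

    public static void main(String[] args) {
        String rong = "\u9555"; // "镕", from Zhu Rongji's name
        System.out.println(representable(rong, "GB2312")); // false
        System.out.println(representable(rong, "GBK"));    // true
        System.out.println(gbkBytesDecodedAs(rong, "GBK").equals(rong));    // true
        System.out.println(gbkBytesDecodedAs(rong, "GB2312").equals(rong)); // false
    }
}
```

In the reader code above, that amounts to passing "GBK" instead of "GB2312" to the InputStreamReader.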
Thanks a lot. I will use the GBK charset to read all GB2312 files, since GB2312 is a subset of GBK.