Character sets - UTF8 or Chinese

Hi,
I am looking into enhancing the application I have built in Oracle to save and display data in Chinese and English. I have been looking into how to change the character set of a database so that it accepts different languages, i.e. different characters.
From what I understand, I can either create a database that uses a Chinese character set (apparently English ASCII characters are also part of any Chinese character set) or set the database to use a Unicode multi-byte character set (UTF8), which seems to be okay for all languages.
Has anyone had any experience of a) changing an existing standard 7-bit ASCII database into a database that can handle Chinese, and/or b) the differences and implications of using a Chinese character set versus a Unicode character set?
I am using Oracle RDBMS 8.1.7 on SuSE Linux 7.2
Thanks in advance.
Dan
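A rough sketch of the trade-off asked about above, using Python's built-in codecs as stand-ins for the Oracle character sets. The pairing of `gbk` with a Chinese character set such as ZHS16GBK and of `utf-8` with UTF8 is an assumption made for illustration, and the sample strings are made up.

```python
# Compare how a Chinese code page and UTF-8 store the same text.
# The codec names are stand-ins (an assumption), not Oracle character sets.

def byte_lengths(text):
    """Return the encoded size of `text` in bytes for each codec."""
    return {name: len(text.encode(name)) for name in ("gbk", "utf-8")}

english = "Oracle"
chinese = "\u4e2d\u6587"  # two Chinese characters, meaning "Chinese language"

# ASCII characters are encoded identically in both character sets.
assert english.encode("gbk") == english.encode("utf-8") == english.encode("ascii")

print(byte_lengths(english))
print(byte_lengths(chinese))
```

If columns are sized in bytes, the same Chinese text needs more room in UTF-8 than in a Chinese code page, which is worth checking before converting existing tables.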


Similar Messages

  • NATIONAL CHARACTER SET UTF8

    Hi,
    on 10g, on Win 2003, how to verify if database is NATIONAL CHARACTER SET UTF8 ?
    Thank you.

    SELECT *
      FROM v$nls_parameters
    will show you all the NLS parameters. You're presumably looking for the row where parameter = 'NLS_NCHAR_CHARACTERSET', though there may be a few more parameters that you're interested in.
    Justin

  • How to SET CHARACTER SET UTF8

    I am looking to execute something like the stmt "SET CHARACTER SET UTF8"
    However, if I put the above statement in a SQLDBC_Statement_execute call, I get a failure message.
    I should also mention that the SQLDBC_Connection_connect call accepts the character set as one of its parameters. Is that the only way to set the character set from an application when using SQLDBC?
    Regards
    Raj

    Hi Lars,Elke and Thomas,
    Thanks to all of you for your valuable input. Honestly speaking, I'm a little lost on how to go about this requirement of Unicode support for my application. Please allow me some more time to investigate this and then get back to you.
    In my application, for all other databases, simply executing "SET CHARACTER SET UTF8" is all that's done to enable UTF-8 support. So I really need to figure out what changes need to be made in the app if this is not going to work.
    In the meantime something more caught my attention while i was using this command:
    sqlcli MAXDB1=> \dc domain.columns
    Table "DOMAIN.COLUMNS"
    Column Name | Type         | Length | Nullable | KEYPOS
    SCHEMANAME  | CHAR UNICODE | 32     | YES      |
    OWNER       | CHAR UNICODE | 32     | YES      |
    What does 'CHAR UNICODE' in the Type column mean here?
    Regards
    Raj

  • UTF8 character set conversion for chinese Language

    Hi friends,
    I would like a basic explanation of the UTF8 feature: how does it help when converting data from the Chinese language?
    I would also like to know which characters UTF8 will not support when converting from Chinese.
    Thanks & Regards
    Ramya Nomula

    Not exactly sure what you are looking for, but on MetaLink, there are numerous detailed papers on NLS character sets, conversions, etc.
    Bottom line is that most Chinese characters take three bytes each in UTF-8; only supplementary characters beyond U+FFFF (some rare ideographs, for example) need four bytes, and that is where UTF8 and AL32UTF8 differ. Some Middle Eastern scripts also need more than one byte per character.
    Do a google search on "utf8 al32utf8 difference", and you will get some good explanations.
    e.g., http://decipherinfosys.wordpress.com/2007/01/28/difference-between-utf8-and-al32utf8-character-sets-in-oracle/
    Recently, one of our clients had a question on the differences between these two character sets since they were in the process of making their application global. In an upcoming whitepaper, we will discuss in detail what it takes (from a RDBMS perspective) to address localization and globalization issues. As far as these two character sets go in Oracle, the only difference between AL32UTF8 and UTF8 character sets is that AL32UTF8 stores characters beyond U+FFFF as four bytes (exactly as Unicode defines UTF-8). Oracle’s “UTF8” stores these characters as a sequence of two UTF-16 surrogate characters encoded using UTF-8 (or six bytes per character). Besides this storage difference, another difference is better support for supplementary characters in AL32UTF8 character set.
    You may also consider posting your question on the Globalization Support forum, which pertains more to these types of questions.
    Globalization Support
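The storage difference described in the quoted article can be sketched in a few lines. This is only an illustration of the encoding scheme as the article describes it (two UTF-16 surrogates, each written as three bytes, a scheme generally known as CESU-8); the helper name `cesu8_encode` is made up here and this is not Oracle code.

```python
def cesu8_encode(text):
    """Encode text the way the article describes Oracle's legacy UTF8:
    characters beyond U+FFFF become two UTF-16 surrogates, and each
    surrogate is written as its own 3-byte sequence."""
    out = bytearray()
    for ch in text:
        cp = ord(ch)
        if cp > 0xFFFF:
            cp -= 0x10000
            high = 0xD800 + (cp >> 10)    # high (lead) surrogate
            low = 0xDC00 + (cp & 0x3FF)   # low (trail) surrogate
            for surrogate in (high, low):
                out += chr(surrogate).encode("utf-8", "surrogatepass")
        else:
            out += ch.encode("utf-8")     # BMP characters are unchanged
    return bytes(out)

bmp_char = "\u4e2d"        # a common Chinese character inside the BMP
supp_char = "\U00020000"   # a CJK ideograph beyond U+FFFF

for label, ch in (("BMP", bmp_char), ("supplementary", supp_char)):
    print(label, len(ch.encode("utf-8")), len(cesu8_encode(ch)))
```

For characters up to U+FFFF the two encodings agree byte for byte; only supplementary characters differ (four bytes versus six).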

  • Can a db with character set UTF8 be restored to AL32UTF8?

    Hello Everyone,
    Good Day.
    Our present production and non-production databases are configured with NLS_CHARACTERSET as UTF8. However, as we are in the process of migrating to a new server, we intend to configure the new databases with NLS_CHARACTERSET as AL32UTF8 (which is the recommended option as per our research. Moreover, came to know that for Weblogic schemas and repositories to work, NLS_CHARACTERSET must be AL32UTF8).
    As we would be restoring from a backup to the new instance created on the new server, kindly help us understand whether any issues might arise during the restore because the two character sets differ.
    Warm Regards,
    Vikram.

    Hi Robin,
    Thank you for the update. Our DB is too large and contains too many schemas to try Data Pump. Hence we had planned a restore, which might be a simpler task with less downtime.
    Perhaps one option would be to create the instances with the UTF8 character set and then change it once the migration activity has been completed.
    Also, could you please throw some light on the two character sets as to which one is better and why?
    Warm Regards,
    Vikram.

  • Mixed Character Sets - UTF8

    We have a database with text stored in mixed character sets.
    The DB character set is CL8MSWIN1251, but the real data is in CL8MSWIN1251, WE8MSWIN1252 and EE8MSWIN1250.
    We want to convert this DB to the UTF8 character set. A simple export/import will not help in this situation: only the CL8MSWIN1251 data will be converted properly.
    Does anyone know a solution for this situation?
    Thank you in advance!

    If the data is segmented so that character set 1 data is in a table and character set 2 data is in another table then you may have a chance to salvage the data with help from support. The idea would be to first export and import only your CL8MSWIN1251 data to UTF8. Be careful that your NLS_LANG is set to CL8MSWIN1251 for export so that no conversion takes place. Confirm the import is successful and remove CL8MSWIN1251 data from database. Oracle support can now help you override the character set via ALTER database to say MSWIN1252. Now selectively export/import this data, again make sure NLS_LANG is set to MSWIN1252 for export so that no conversion takes place. Confirm the import is successful and remove MSWIN1252 data from database. And then do the same steps for 1250 data.
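Why each group of data has to be converted with its own source character set can be shown with a toy example. The rows below are invented, and Python's `cp1250`, `cp1251` and `cp1252` codecs are used as stand-ins for EE8MSWIN1250, CL8MSWIN1251 and WE8MSWIN1252 (an assumption for illustration).

```python
# Hypothetical rows: raw bytes plus the code page they were really written in.
# The database itself claims CL8MSWIN1251 (cp1251) for all of them.
rows = [
    ("\u041f\u0440\u0438\u0432\u0435\u0442".encode("cp1251"), "cp1251"),  # Russian
    ("caf\u00e9".encode("cp1252"), "cp1252"),                              # Western European
    ("\u0142\u00f3d\u017a".encode("cp1250"), "cp1250"),                    # Central European
]

def to_utf8(raw, actual_codepage):
    """Convert using the code page the bytes were really written in."""
    return raw.decode(actual_codepage).encode("utf-8")

def naive_to_utf8(raw):
    """Convert trusting the declared database character set (cp1251)."""
    return raw.decode("cp1251", errors="replace").encode("utf-8")

for raw, codepage in rows:
    correct = to_utf8(raw, codepage).decode("utf-8")
    naive = naive_to_utf8(raw).decode("utf-8")
    print(codepage, repr(correct), repr(naive), correct == naive)
```

Only the rows that really are cp1251 survive the naive conversion, which matches the observation in the question and is why the reply suggests handling one character set at a time.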

  • China character set (UTF8)

    Hi All,
    We have a problem displaying a report that contains Chinese characters.
    We use Crystal Reports 9. When I fetch the data from the DB with another application (Oracle SQL Developer), the Chinese characters display correctly.
    When I use Crystal I get garbage.
    In the regional settings I checked all the relevant checkboxes for Asian languages and code page conversion tables.
    NLS_LANG is defined as AMERICAN_AMERICA.AL32UTF8.
    1) Does Crystal Reports 9 support UTF8?
    2) Is there a checklist I can use to verify that all the settings on my machine are correct?
    3) Any other ideas?
    Thanks,
    Amos

    Hi Amos,
    It should work, but you will need to use a Unicode font. Try setting your field fonts in CR to Arial Unicode MS and test again.
    Thank you
    Don

  • Character set migration error to UTF8 urgent

    Hi
    when we migrated from the AR8ISO8859P6 to the UTF8 character set, we started getting an error: when I try to compile a package through Forms, I get "program unit pu not found".
    When I run the source code of that procedure directly from the database using SQL*Plus, it runs without any problem. How can I migrate these forms from AR8ISO8859P6 to the UTF8 character set? We migrated from an Oracle 8.1.7 database with AR8ISO8859P6 to an Oracle 9.2 database with character set UTF8 (Windows 2000); the export and import completed without any error.
    I am using Oracle 11i, which calls Forms 6i and Reports 6i.
    with regards
    ramya
    1) This is a server-side program. When connecting through Forms I get the error. When I run the program directly in SQL it works; when I compile it through Forms I get this error.
    3) Yes, I am using 11i (11.5.10), which calls Forms 6i and Reports. Why does this cause a problem only in Forms? Is there an NLS_LANG setting that needs changing in Forms?
    with regards

    Hi Ramya
    What I understand from your question is that you are trying to compile a procedure from a Forms interface on the client side?
    If yes, you should check the code in the form that calls the compilation package.
    Does it contain strings that might be affected by the character set change?
    Tony G.

  • Database Character Set Conversion from WE8ISO8859P1 to UTF8

    Hi All
    I want to migrate data from one database to another. My original database character set is WE8ISO8859P1, but I want to migrate to a database whose character set is UTF8.
    Because of the character set difference, the Marathi data from the original database is not displayed correctly;
    it shows symbols in place of the Marathi words.
    Please help me out.
    Thanking You
    Gaurav Sontakke

    Dear GauravSontakke,
    Since your database version is unknown, i will show you an online documentation of character set migration for 10gR2.
    http://www.oracle.com/pls/db102/search?remark=quick_search&word=character+set+migration&tab_id=&format=ranked
    http://download.oracle.com/docs/cd/B19306_01/server.102/b14225/ch11charsetmig.htm#sthref1442
    http://download.oracle.com/docs/cd/B19306_01/server.102/b14225/ch11charsetmig.htm#NLSPG011
    Please read those carefully.
    Hope it Helps,
    Ogan

  • How to find Client Character set?

    Hi,
    I need to connect to a remote database which has a different character set than the client. Is there any method to display the client character set and the database character set from SQL*Plus? Could someone please help me?
    Thanks in Advance
    Sree.

    I guess you're using PL/SQL Developer?
    (because I get that warning message too ;) )
    The warning also continues with:
    You can set the client character set through the NLS_LANG environment variable or the NLS_LANG registry key.
    "If I execute some scripts from client (client character set WE8MSWIN1252 and database character set UTF8) will it cause any problem?"
    It depends on what kind of data you're loading/importing (Chinese characters, for example).
    I never had any problems at all, since I'm not using 'exotic' characters.
    You can find related threads on http://asktom.oracle.com/pls/asktom/asktom.search?p_string=%22UTF8%22
    and more explanations in the Oracle Globalization Guide: http://download.oracle.com/docs/cd/B19306_01/server.102/b14225/toc.htm
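Whether a given script is safe to run from a WE8MSWIN1252 client comes down to whether every character in it exists in that character set. A minimal sketch, using Python's `cp1252` codec as a stand-in for WE8MSWIN1252 (an assumption) and a made-up helper name:

```python
def survives_client_charset(text, client_codec="cp1252"):
    """Return True if every character of `text` exists in the client
    character set, so nothing is lost before it reaches the database."""
    try:
        text.encode(client_codec)
        return True
    except UnicodeEncodeError:
        return False

samples = {
    "plain ASCII": "SELECT 1 FROM dual",
    "Western European": "r\u00e9sum\u00e9",
    "Chinese": "\u4e2d\u6587",
}

for label, text in samples.items():
    print(label, survives_client_charset(text))
```

Characters that do not exist in the client character set are typically replaced by a substitute such as a question mark during conversion, so the loss is silent unless you check for it.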

  • HOW can I enter text using Japanese character sets?

    The "Text, Plates, Insets" section of the LOOKOUT(6.01) Help files states:
    "Click the » button to the right of the Text field to expand the field for multiple line entries. You can enter text using international character sets such as Chinese, Korean, and Japanese."
    Can someone please explain HOW to do this? Note, I have NO problem inputting Hiragana, Katakana, and Kanji into MS WORD; the keyboard emulates the Japanese layout and characters (Romaji is default), the IME works fine converting Romaji, and I can also select characters directly from the IME Pad. I have tried several different fonts with success and am currently using MS UI Gothic.ttf as default. Again, everything is normal and working in a predictable manner within Word.
    I cannot get these texts into Lookout. I can't cut/paste from HTML pages or from text editors, even though both display properly. Within Lookout with JP selected as language/keyboard, when trying to type directly into the text field, the IME CORRECTLY displays Hiragana until <enter> is pressed, at which point all text reverts to question marks (?? ???? ? ?????). If I use the IME Pad, it does pretty much the same. I managed to get the "Yen" symbol to display, though, if that's relevant. As I said, the font selected (in text/plate font options) is MS UI Gothic with Japanese as the selected script. Oddly enough, at this point the "sample" window is showing me the exact Hiragana character I want displayed in Lookout, but it won't. I've also tried staying in English and copying Unicode characters from the Windows Character Map. Same results (Yen sign works, Hiragana WON'T).
    Help me!
    JW_Tech

    JW_Tech,
    Have you changed the regional setting to Japanese?
    Doug M
    Applications Engineer
    National Instruments

  • XML data from BLOB to CLOB - character set conversion

    Hi All,
    I'm trying to solve a problem with a character set conversion in PL/SQL in the following scenario:
    1. source is an XML as a BLOB variable.
    2. target is an XML as a CLOB variable.
    3. the problem I have is the following:
    - database character set is set to UTF-8
    - XML character set could be anything (UTF-8, ISO 8859-1, ISO 8859-2, ASCII, ...)
    - I need to write a procedure which converts the source BLOB content into the target CLOB taking into account the XML encoding and converts it into the DB default character set (UTF8).
    I've been able to implement a simple conversion function. However, this function expects static XML encoding ISO-8859-1. The main part of the function looks as follows:
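    -- Convert one chunk of the BLOB from ISO 8859-1 to UTF8, then cast it to VARCHAR2
    buffer := UTL_RAW.cast_to_varchar2(
                UTL_RAW.convert(
                  DBMS_LOB.SUBSTR(source_blob_variable, 16000, pos),
                  'American_America.UTF8',          -- target character set
                  'American_America.we8iso8859p1'   -- source character set
                ));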
    Does anyone have an idea how to rewrite the code to handle "any" XML encoding in the source BLOB file? In other words, is there a function in Oracle which converts XML character set names into Oracle character set values (ISO-8859-1 to we8iso8859p1, UTF-8 to UTF8, ...)?
    Thanks a lot for any help.
    Julius
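The general idea asked about here (read the encoding named in the XML declaration, then convert from that encoding) can be sketched outside the database. Everything below is illustrative: the mapping table is deliberately incomplete, the function names are invented, and the approach only handles ASCII-compatible encodings such as those listed in the question.

```python
import re

# Illustrative, incomplete mapping from XML/IANA encoding names to Oracle
# character set names; extend it for the encodings you actually receive.
IANA_TO_ORACLE = {
    "UTF-8": "AL32UTF8",          # or UTF8 on older databases, as in the question
    "ISO-8859-1": "WE8ISO8859P1",
    "ISO-8859-2": "EE8ISO8859P2",
    "US-ASCII": "US7ASCII",
}

# Matches the encoding pseudo-attribute of an XML declaration at the start
# of the document (ASCII-compatible encodings only).
_DECL = re.compile(rb'^<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z0-9._-]+)["\']')

def declared_encoding(xml_bytes, default="UTF-8"):
    """Return the encoding named in the XML declaration, or the default."""
    match = _DECL.match(xml_bytes)
    return match.group(1).decode("ascii").upper() if match else default

def xml_bytes_to_text(xml_bytes):
    """Decode the raw document using the encoding it declares."""
    return xml_bytes.decode(declared_encoding(xml_bytes))

doc = '<?xml version="1.0" encoding="ISO-8859-2"?><a>\u0142</a>'.encode("iso-8859-2")
print(declared_encoding(doc), IANA_TO_ORACLE[declared_encoding(doc)])
print(xml_bytes_to_text(doc))
```

In PL/SQL the same two steps would apply: extract the declared name from the first bytes of the BLOB, map it to an Oracle character set name, and pass that name to the conversion call instead of a hard-coded one.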

    I want to pass a BLOB to some "createXML" procedure and get a proper XMLType in UTF8 character set, properly converted from whatever character set is the input in.
    As per the documentation, the generated XML always has its encoding set on the client side depending on NLS_LANG (default UTF-8), regardless of the input encoding, so I don't see a need to parse the PI of the XML:
    C:\>echo %NLS_LANG%
    %NLS_LANG%
    C:\>sqlplus
    SQL*Plus: Release 11.1.0.6.0 - Production on Wed Apr 30 08:54:12 2008
    Copyright (c) 1982, 2007, Oracle.  All rights reserved.
    Connected to:
    Oracle Database 11g Enterprise Edition Release 11.1.0.6.0 - Production
    With the Partitioning, OLAP, Data Mining and Real Application Testing options
    SQL> var cur refcursor
    SQL>
    SQL> declare
      2     b   blob := utl_raw.cast_to_raw ('<a>myxml</a>');
      3  begin
      4     open :cur for select xmlroot (xmltype (utl_raw.cast_to_varchar2 (b))) xml from dual;
      5  end;
      6  /
    PL/SQL procedure successfully completed.
    SQL>
    SQL> print cur
    XML
    <?xml version="1.0" encoding="UTF-8"?><a>myxml</a>
    SQL> exit
    Disconnected from Oracle Database 11g Enterprise Edition Release 11.1.0.6.0 - Production
    With the Partitioning, OLAP, Data Mining and Real Application Testing options
    C:\>set NLS_LANG=GERMAN_GERMANY.WE8ISO8859P1
    C:\>sqlplus
    SQL*Plus: Release 11.1.0.6.0 - Production on Mi Apr 30 08:55:02 2008
    Copyright (c) 1982, 2007, Oracle.  All rights reserved.
    SQL> var cur refcursor
    SQL>
    SQL> declare
      2     b   blob := utl_raw.cast_to_raw ('<a>myxml</a>');
      3  begin
      4     open :cur for select xmlroot (xmltype (utl_raw.cast_to_varchar2 (b))) xml from dual;
      5  end;
      6  /
    PL/SQL-Prozedur erfolgreich abgeschlossen.
    SQL>
    SQL> print cur
    XML
    <?xml version="1.0" encoding="ISO-8859-1"?>
    <a>myxml</a>

  • Importing from a different character set

    Oracle 8.1.7 / Windows NT
    I'm trying to import a dump file which was created with character set WE8ISO8859P9. My database uses character set UTF8. Some of the records can't be inserted because of error "ORA-1401: Value too large for column". Is this because of the different character sets? If I switch my session to WE8ISO8859P9, imp says "character set conversion from x to y not supported."
    How can I get these last records inserted? Here's an excerpt from the log:
    Connected to: Oracle8i Enterprise Edition Release 8.1.7.0.0 - Production
    With the Partitioning option
    JServer Release 8.1.7.0.0 - Production
    Export file created by EXPORT:V08.00.05 via conventional path
    Warning: the objects were exported by NOC_ADMIN, not by you
    import done in WE8ISO8859P9 character set and UTF8 NCHAR character set
    import server uses UTF8 character set (possible charset conversion)
    export server uses WE8ISO8859P9 NCHAR character set (possible ncharset conversion)
    . importing NOC_ADMIN's objects into NOC_ADMIN
    . . importing table "ACCESSROUTERIFS_" 782 rows imported
    . . importing table "ITEM_"
    IMP-00019: row rejected due to ORACLE error 1401
    IMP-00003: ORACLE error 1401 encountered
    ORA-01401: inserted value too large for column
    Column 1 33886
    Column 2
    Column 3
    Column 4 1323
    Column 5
    Column 6 11
    Column 7 18600
    Column 8 18600
    Column 9 20-NOV-2000:00:00:00
    Column 10 processing
    Column 11 inactive
    Column 12
    Column 13
    Column 14 35682.0
    Column 15
    Column 16
    Column 17
    Column 18 05.12.00: KD weiss noch nix neues, er wird uns inf...
    Column 19
    Column 20 kschmid
    Column 21 09-FEB-2001:15:50:21
    Column 22
    Column 23 12
    Column 24
    Column 25 06-NOV-2000:00:00:00

    Please try Oracle RDBMS support. This issue has to do with Oracle Import.
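One plausible cause of the ORA-01401 in the question is byte expansion: a character that takes one byte in WE8ISO8859P9 can take two in UTF8, so a value that filled a byte-sized column before no longer fits. A small sketch of that effect, with Python's `iso8859_9` codec standing in for WE8ISO8859P9 and a hypothetical 10-byte column:

```python
def fits_column(text, column_bytes, db_codec="utf-8"):
    """Return True if `text` fits a column whose size is given in bytes."""
    return len(text.encode(db_codec)) <= column_bytes

value = "\u015f" * 10  # ten Turkish s-cedilla characters

print(len(value.encode("iso8859_9")), fits_column(value, 10, "iso8859_9"))
print(len(value.encode("utf-8")), fits_column(value, 10, "utf-8"))
```

If this is what is happening, the rejected rows should be the ones with non-ASCII characters in columns that were already nearly full, and widening those columns in the target database before the import would be one way to test the theory.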

  • How to change the character set of the D/b

    Hello All,
    When i issue the command
    ALTER DATABASE CHARACTER SET UTF8
    It gives me the error that I can only change the character set to a superset of the existing character set.
    Is there any way i can change the character set without recreating the database.
    TIA
    Naveen

    The existing character set is the basis for the new character set. This is fair enough, because the character set determines how the actual data is stored in the database. Allowing new characters is a minor change: completely re-encoding your entire database is not.
    I'm afraid export, recreate and import is your only option.
    Cheers, APC
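The "superset" restriction can be illustrated with a small check: a character set change needs no re-encoding only if every byte that is valid in the old character set means the same thing in the new one. The sketch below tests this for single-byte source encodings using Python codecs as stand-ins; the function name is made up and this is not how Oracle performs the check.

```python
def is_binary_superset(old_codec, new_codec):
    """True if every single byte valid in old_codec encodes the same
    character as the same byte in new_codec (single-byte old_codec only)."""
    for byte in range(256):
        raw = bytes([byte])
        try:
            text = raw.decode(old_codec)
        except UnicodeDecodeError:
            continue  # this byte is unused in the old character set
        try:
            if text.encode(new_codec) != raw:
                return False  # same character, different bytes
        except UnicodeEncodeError:
            return False      # character missing from the new set
    return True

print(is_binary_superset("ascii", "utf-8"))
print(is_binary_superset("cp1252", "utf-8"))
```

A 7-bit ASCII database can be relabelled as UTF-8 because its stored bytes are already valid UTF-8, whereas an 8-bit Western European database cannot: its accented characters are stored as single bytes that would have to be rewritten.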

  • Unrecognised Char in GB2312 character set using java InputStreamReader??

    I am reading the following Chinese GB2312 HTML file from
    http://news.xinhuanet.com/local/2007-02/13/content_5732705.htm
    using an InputStreamReader with GB2312 encoding, as shown below:
    import java.io.BufferedReader;
    import java.io.FileInputStream;
    import java.io.InputStreamReader;
    public class ReadGB2312HtmlFile
    {
        //........TmpText declarations.....
        public static void main( String[] args )
        {
            try
            {
                FileInputStream is = new FileInputStream( args[0] );
                // Decode the file's bytes as GB2312 while reading
                BufferedReader br = new BufferedReader(
                    new InputStreamReader( is, "GB2312" ) );
                String strLine;
                while ( (strLine = br.readLine()) != null )
                {
                    TmpText.append(strLine);
                    TmpText.append("\r\n");
                }
                br.close();
                bw.close();
            }
            catch ( Exception e )
            {
                e.printStackTrace();
            }
        }
    }
    The TmpText variable does not display the last character in the article properly: instead of （记者夏珺） it gives （记者夏?B）.
    Inside the HTML file the unrecognised character is represented by �B. Why is this so?
    In the internet browser it is displayed and recognised as a Chinese GB2312 character, so why is it not recognised by the Java InputStreamReader?
    Any help or explanation would be much appreciated.

    Yes, it is not a GB2312 character
    The �B character is AC40 in hex format which is outside of the GB2312 character range, it is in GBK
    Copied from wikipedia,
    GBK is an extension of the GB2312 character set for simplified Chinese characters, used in the People's Republic of China.
    GB stands for National Standard, while K stands for Extension. GBK not only extended the old standard GB2312 with Traditional Chinese characters, but also with Chinese characters that were simplified after the establishment of GB2312 in 1981. With the arrival of GBK, certain names with characters formerly unrepresentable, like the "rong" (镕) character in former Chinese Premier Zhu Rongji's name, are now representable.
    Thanks a lot. I will use the GBK charset to read all GB2312 files, since GB2312 is a subset of GBK.
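The subset relationship that makes this workaround safe can be checked quickly. The sketch below uses Python's `gb2312` and `gbk` codecs rather than Java's charsets, so it illustrates the character sets themselves and not the InputStreamReader behaviour; the helper name is made up.

```python
def can_encode(text, codec):
    """Return True if every character in `text` exists in `codec`."""
    try:
        text.encode(codec)
        return True
    except UnicodeEncodeError:
        return False

common = "\u4e2d"  # a common character present in both GB2312 and GBK
rong = "\u9555"    # the "rong" of Zhu Rongji, present only in GBK

# Characters shared by both sets have identical bytes, so reading a
# GB2312 file as GBK changes nothing for them.
print(common.encode("gb2312") == common.encode("gbk"))

# The GBK-only character cannot be written as GB2312 ...
print(can_encode(rong, "gbk"), can_encode(rong, "gb2312"))

# ... and its GBK bytes are rejected by a strict GB2312 decoder.
try:
    rong.encode("gbk").decode("gb2312")
    decoded_as_gb2312 = True
except UnicodeDecodeError:
    decoded_as_gb2312 = False
print(decoded_as_gb2312)
```

Pages labelled GB2312 frequently contain GBK-only characters in practice, which is presumably why the browser displayed the character while the strict decoder did not.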
