Store Multi-Byte Characters in a WE8ISO8859P1 Database without Migration

Hi - I am looking for a way to store multi-byte characters in a WE8ISO8859P1 database.
Below are the DB NLS_PARAMETERS
NLS_CHARACTERSET = WE8ISO8859P1
NLS_NCHAR_CHARACTERSET = AL32UTF8
NLS_LENGTH_SEMANTICS = BYTE
Size of DB = 2 TB.
DB Version = 11.2.0.4
Currently there is a need to store Chinese characters in the NAME and ADDRESS columns only. Below is a description of the columns.
Column Name        Data Type
GIVEN_NAME_ONE     VARCHAR2(120 BYTE)
GIVEN_NAME_TWO     VARCHAR2(120 BYTE)
LAST_NAME          VARCHAR2(120 BYTE)
ADDR_LINE_ONE      VARCHAR2(100 BYTE)
ADDR_LINE_TWO      VARCHAR2(100 BYTE)
ADDR_LINE_THREE    VARCHAR2(100 BYTE)
What are my options here, without considering migration of the WE8ISO8859P1 DB to AL32UTF8?
1. Can I increase the size of the columns, i.e. make them n x 4, e.g. NAME becomes 480 bytes and ADDRESS 400 bytes? What are the pros and cons?
2. Convert the existing columns from VARCHAR2 to NVARCHAR2 with the same size, i.e. NVARCHAR2(120 BYTE)?
3. Extend the table with new NVARCHAR2 columns, e.g. NAME as NVARCHAR2(120 CHAR) and ADDRESS as NVARCHAR2(100 CHAR)?
4. The database has CLOBs, BLOBs, LONGs, etc. with varied data. Is it a good idea to migrate to AL32UTF8 with minimal downtime?
Please suggest the best alternatives.
Thanks,
Jitesh

Hi Jitesh,
NLS_NCHAR_CHARACTERSET can only be AL16UTF16 or UTF8, so your DB most likely has UTF8 rather than the AL32UTF8 you listed.
You can definitely insert Unicode characters into N-type columns. The size of an N-type column will depend on the characters you plan to store in it.
If you use N-types, make sure you use the N'...' syntax in your code, so that literals are marked as belonging to the national character set by the prepended letter 'N'.
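A minimal sketch of the N-column route, with illustrative table and column names (NVARCHAR2 lengths are always in characters, so no CHAR qualifier is needed):

-- hypothetical names; adds a parallel N-column rather than converting in place
ALTER TABLE customer ADD (given_name_one_n NVARCHAR2(120));
-- the N'...' prefix marks the literal as national-character-set data
INSERT INTO customer (given_name_one_n) VALUES (N'张伟');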
Although you can use them, N-types are not very well supported in third-party client/programming environments; you may need to adapt a lot of code to use N-types properly, and there are some limitations.
While using N-types for a few columns may at first seem like a good way to avoid converting the whole database, in many cases the end conclusion is that changing the NLS_CHARACTERSET is simply the easiest and fastest way to support more languages in an Oracle database.
So it depends on how much of your data will be Unicode and stored in N-type columns.
If you have access to My Oracle Support, see Note 276914.1, "The National Character Set (NLS_NCHAR_CHARACTERSET) in Oracle 9i, 10g, 11g and 12c", for more details.
With respect to your downtime: the actual conversion (CSALTER, or DMU if you use it) shouldn't take too much time, provided you have run CSSCAN on your DB and taken care of all your truncation, convertible and lossy data (if any).
It would be best to run CSSCAN first to gauge how much convertible/lossy/truncation data you need to take care of:
$ CSSCAN FROMCHAR=WE8ISO8859P1 TOCHAR=AL32UTF8 LOG=P1TOAl32UTF8 ARRAY=1000000 PROCESS=2 CAPTURE=Y FULL=Y
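Once the scan comes back clean, the conversion step itself is short. A sketch of the usual sequence, run as SYSDBA in SQL*Plus; verify against the CSALTER/DMU documentation for your exact version before running anything like this:

SHUTDOWN IMMEDIATE;
STARTUP RESTRICT;
-- csalter.plb checks that a recent FULL=Y CSSCAN found no exceptions
@?/rdbms/admin/csalter.plb
SHUTDOWN IMMEDIATE;
STARTUP;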
Regards,
Suntrupth

Similar Messages

  • Multi-byte characters are garbled in SQL Server Business Intelligence Development Studio (Visual Studio) 2008

    Hi,
    I'm revising an existing report which was developed by my predecessor. Though it works fine in the production environment, when I open the .rdl file with SQL Server Business Intelligence Development Studio (Visual Studio) 2008 on my client PC, I find all the multi-byte characters are garbled. When I open it with BIDS (the same version) on the server, it shows everything correctly.
    The fonts for the controls (labels) are Tahoma, which originally covers only Latin characters, but multi-byte characters are supposed to be displayed in MSGOTHIC via Font Link, as they are displayed correctly on the server.
    Could anyone advise me how to solve this issue? I know I can fix it by changing the fonts from Tahoma to MSGOTHIC for all the controls, but I don't want to do that.
    Environment:
    My PC:Windows7 64bit /Visual Studio 9.0.30729.1 / .NET Framework 3.5 SP1
    Server:Windows Server 2003 R2 /Visual Studio 9.0.30729.1 / .NET Framework 3.5 SP1
    Garbled characters sample:
    FontLink - SystemLink
    Please let me know if you need any more information. I would appreciate your advice!

    Hi nino_miya,
    According to your description, characters are garbled when you display the report on the client side.
    In your scenario, please check whether the Language setting is the same as for the report on the production server. Also please check whether the registry data for Tahoma on the client PC is the same as on the server. If those two settings are the same, please specify the font of each control as MSGOTHIC manually on the client PC.
    If you have any question, please feel free to ask.
    Best regards,
    Qiuyun Yu
    TechNet Community Support

  • JDBC 2.0 API and Multi-Byte Characters

    I use the JDBC 2.0 API with the thin driver 816 for JDK 1.2.x; it works well with English characters, but I get wrong results with multi-byte characters.
    Does anyone else know the reason?
    Thanks in advance.

    I have the same problem!
    Originally posted by huang Jian-chang: "I use the JDBC 2.0 API with the thin driver 816 for JDK 1.2.x; it works well with English characters, but I get wrong results with multi-byte characters. Does anyone else know the reason? Thanks in advance."

  • Migrating Multi-Byte Characters

    When migrating from Access 2000, all multi-byte characters are converted into single-byte. The database is running UTF8.
    Has anyone done this before?
    Thanks

    #1 should return you the encoded string.
    #2 should decode the string and return the correct characters.
    If it doesn't, it's probably because the string was improperly
    encoded.
    #3 should cause #1 to do the same as #2, but you have to set
    the property before JavaMail classes are loaded.

  • How to convert multi-byte characters from a US7ASCII database to UTF-8

    Hi Guys,
    We have a source database with character set US7ASCII and our target database has a character set of UTF-8. We have the "©" symbol in the source database, and when we insert this value into our target database it is converted to "¿".
    How can we make sure that the "©" symbol is inserted correctly in the target database? Both databases are on version 10.2 but have different character sets. The Oracle documentation mentions that this happens if the target database character set is not a superset of the source database character set, but in our case UTF-8 is a superset of US7ASCII.
    Thanks,
    Ramu Kalvakuntla
    Edited by: user11905624 on Sep 15, 2009 2:58 PM

    user11905624 wrote:
    When I tried DUMP(column_name, 1016), this is what I got:
    Typ=96 Len=1 CharacterSet=US7ASCII: a9
    Considering the 7-bit ASCII standard character set, the code 0xA9 is invalid.
    This has likely happened due to a pass-through scenario. See the [NLS_LANG FAQ|http://www.oracle.com/technology/tech/globalization/htdocs/nls_lang%20faq.htm] (example of a wrong setup). E.g. the Windows 125x code pages all define a 'copyright sign' character with encoding A9.
    If proper character set conversion takes place, I would expect the (illegal) codes 0x80-FF to be caught and converted to the replacement character (like U+FFFD).
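    For reference, a quick way to inspect what is actually stored (table and column names here are placeholders):

    -- 1016 = hexadecimal byte codes plus the column's character set name
    SELECT DUMP(column_name, 1016) FROM some_table WHERE id = 42;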
    Going back to the issue: how exactly are you transferring data, or retrieving and inserting it, from the source to the target database?
    Edited by: orafad on Sep 17, 2009 10:56 PM

  • Urgent: comparing multi-byte characters to a single byte character!

    Let's say I have two strings; they have the same contents but use different encodings. How do I compare them?
    String a = "GOLD";
    String b = "G O L D ";
    The method a.equals(b) doesn't seem to work.

    try this:
    String a = "GOLD";
    String b = "G O L D ";
    boolean bEqual = true;
    int j = 0;
    for (int i = 0; i < a.length(); i++) {
       // skip the padding characters in b
       while (b.charAt(j) == ' ')
          j++;
       if (a.charAt(i) != b.charAt(j)) {
          bEqual = false;
          break;
       }
       j++; // advance past the matched character
    }

  • Multi-byte character

    If the database character set is UTF-8, can I use VARCHAR2 to store multi-byte characters, or do I still have to use NVARCHAR2?
    Also, how many bytes (max) can VARCHAR2(1) and NVARCHAR2(1) store in the case of a UTF-8 character set?

    If you create VARCHAR2(1) then you possibly cannot store anything, as your first character might be multi-byte.
    My recommendation would be to consider defining lengths by character rather than by byte.
    CREATE TABLE tbyte (
      testcol VARCHAR2(20));
    CREATE TABLE tchar (
      testcol VARCHAR2(20 CHAR));
    The second will always hold 20 characters without regard to the byte count.
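    To make the difference concrete, a small demo against the two tables above (assuming an AL32UTF8 database, where each of these Japanese characters takes three bytes):

    -- 7 characters x 3 bytes = 21 bytes: too large for tbyte (ORA-12899)
    INSERT INTO tbyte (testcol) VALUES ('日本語日本語日');
    -- fits in tchar, whose limit is 20 characters rather than 20 bytes
    INSERT INTO tchar (testcol) VALUES ('日本語日本語日');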
    Demos here:
    http://www.morganslibrary.org/library.html

  • DEFECT: (Serious!) Truncates display of data in multi-byte environment

    I have an oracle 10g database set up with the following nls parameters:
    NLS_CALENDAR      GREGORIAN
    NLS_CHARACTERSET      AL32UTF8
    NLS_COMP      LINGUISTIC
    NLS_CURRENCY      $
    NLS_DATE_FORMAT      DD-MON-YYYY
    NLS_DATE_LANGUAGE      AMERICAN
    NLS_DUAL_CURRENCY      $
    NLS_ISO_CURRENCY      AMERICA
    NLS_LANGUAGE      AMERICAN
    NLS_LENGTH_SEMANTICS      CHAR
    NLS_NCHAR_CHARACTERSET      UTF8
    NLS_NCHAR_CONV_EXCP      TRUE
    NLS_NUMERIC_CHARACTERS      .,
    NLS_RDBMS_VERSION      10.2.0.3.0
    NLS_SORT BINARY
    NLS_TERRITORY      AMERICA
    NLS_TIMESTAMP_FORMAT      DD-MON-RR HH.MI.SSXFF AM
    NLS_TIMESTAMP_TZ_FORMAT      DD-MON-RR HH.MI.SSXFF AM TZR
    NLS_TIME_FORMAT      HH.MI.SSXFF AM
    NLS_TIME_TZ_FORMAT      HH.MI.SSXFF AM TZR
    I am querying a view in SQL Server 2000 via an ODBC database link.
    When I query a 26-character-wide column of the view in SQL Developer, it returns at most 13 characters of the data.
    When I query the exact same view in the exact same SQL Server database, from the exact same Oracle database, using the exact same ODBC database link, with SQL Navigator, I get the full 26 characters' worth of data.
    It also works just fine from the SQL command-line tool of 10g Express.
    Apparently, SQL Developer is confused about how to handle multi-byte data. If you ask it the length of the data in the column, it will tell you 26, but it will only show you 13.
    I have found a VERY PAINFUL workaround: CAST(column_name AS VARCHAR2(26)) when I query it. But I've got hundreds of views and queries...

    In all other respects, the settings I have appear to be working correctly.
    I can enter multi-byte characters into the sql worksheet to create a package, save it, and re-open the package with the multi-byte characters still visible.
    I'm using a fallback directory for my jdk with the correct font installed, so I can see and edit multi-byte data in the data grids.
    In this case, I noticed the problem on a column that only contains the standard ascii letters and digits.
    Environment->Encoding = UTF-16
    All the fonts are set to a font that properly displays western and ge'ez characters. The font has been in use for years, and is working correctly in all other circumstances.
    The Database->NLS Parameters tab under sql developer preferences shows:
    language: American
    territory : American
    sort: binary
    comp: binary
    length: char (I've also tried byte)
    If there are other settings that you think might be relevant, please let me know.
    I've done some more testing. I created an Oracle table with a single column and did an insert into ... select from statement across the database link. The correct, full-length data appeared in the Oracle table.
    So it's not a matter of whether the data is being returned or not; it is. It is simply not being displayed correctly. It appears that SQL Developer is making some unwarranted decisions about the data coming across the database link when it decides to display it, because SQL*Plus and SQL Navigator have no such issues.
    This is really a very serious problem, because if I cannot trust the data the tool shows me, I cannot trust the tool.
    It is also an invitation to make an error based upon the erroneous data display.

  • Problem displaying Japanese/multi-byte characters on WebLogic Server 9.1

    Hi experts
    We are running WebLogic 9.1 on a Linux box [RHEL v4] and trying to display Japanese characters embedded in some of the HTML files, but the Japanese characters are converted into question marks [?]. The HTML files that contain Japanese characters are stored properly in the file system and retain the Japanese characters as they should.
    I changed the character setting in the HTML header to shift_jis, but no luck. Then I added the encoding scheme for shift_jis in the jsp-descriptor and charset-parameter sections in weblogic.xml, but also no luck.
    I am wondering how I can properly display multi-byte/Japanese characters on WebLogic Server without setting up internationalization tools.
    I will appreciate for your advice.
    Thanks,
    yasushi

    This was fixed by removing everything except the following files from the original (8.1) domain directory:
    1. config.xml
    2. SerializedSystemIni.dat
    3. *.ldift
    4. applications directory
    Is this a bug in the upgrade tool ? Or did I miss a part of the documentation ?
    Thanks
    --sony

  • CUSTOM Service - multi-byte character issue

    Hi Experts,
    I wrote a custom service. What this service does is read some data from the database and then generate a CSV report. The code is working fine, but if we have multi-byte characters in the data, these characters are not shown properly in the report. Given below is my service code:
    byte[] bytes = CustomServiceHelper.getReport(this.m_binder, providerName);
    DataStreamWrapper wrapper = new DataStreamWrapper();
    wrapper.m_dataEncoding = "UTF-8";
    wrapper.m_dataType = "application/vnd.ms-excel;charset=UTF-8";
    wrapper.m_clientFileName = "Report.csv";
    wrapper.initWithInputStream(new ByteArrayInputStream(bytes), bytes.length);
    this.m_service.getHttpImplementor().sendStreamResponse(m_binder, wrapper);
    NOTE - This code works fine for multi-byte characters on my local UCM (Windows). But when I install this service on our DEV and staging servers (Solaris), the multi-byte character issue occurs.
    Thanks in Advance..!!
    Edited by: user4884609 on May 17, 2011 4:12 PM

    Please Help

  • How best to send double-byte characters as HTTP params

    Hi all
    I have a web app that accepts text that can be in many languages.
    I build up an HTTP string and send the text as parameters to another web server. Hence, whatever text I receive, I need to be able to represent on an HTTP query string.
    The parameters are sent as URL-encoded UTF8. They are decoded by the second web server back into Unicode and saved to the DB.
    Occasionally I find a character that I am unable to convert to a UTF8 string and send as a parameter (usually an SJIS character). When this occurs, the character is encoded as '3F', a question mark.
    What is the best way to send double-byte characters as HTTP parameters so they are always sent faithfully and not as question marks? Is my only option to use UTF16?
    example code
    <code>
    public class UTF8Test {
        public static void main(String[] args) {
            encodeString("\u7740", "%E7%9D%80"); // encoded UTF8 string contains question mark (3F)
            encodeString("\u65E5", "%E6%97%A5"); // this other japanese character converts fine
        }

        private static void encodeString(String unicode, String expectedResult) {
            try {
                // Note: no encoding argument on the String constructors -- this is
                // the bug analysed in the reply below; kept exactly as posted.
                String utf8 = new String(unicode.getBytes("UTF8"));
                String utf16 = new String(unicode.getBytes("UTF16"));
                String encoded = java.net.URLEncoder.encode(utf8);
                String encoded2 = java.net.URLEncoder.encode(utf16);
                System.out.println();
                System.out.println("encoded string is:" + encoded);
                System.out.println("expected encoding result was:" + expectedResult);
                System.out.println();
                System.out.println("encoded string16 is:" + encoded2);
                System.out.println();
            } catch (Exception e) {
                e.printStackTrace();
            }
        }
    }
    </code>
    Any help would be greatly appreciated. I have been struggling with this for quite some time, and I can hear the deadline approaching all too quickly.
    Thanks
    Matt

    Hi Matt,
    one last visit to the round trip issue:
    in the Sun example, note that UTF8 encoding is used in the method that produces the byte array as well as in the method that creates the second string. This is equivalent to calling:
    String roundTrip = new String(original.getBytes("UTF8"), "UTF8"); // Sun example
    Whereas, in your code you were calling:
    String utf8 = new String(unicode.getBytes("UTF8")); // Matt's code
    The difference is crucial. When you call the String constructor without a second (encoding) argument, the default encoding (usually Cp1252) is used. Therefore your code is equivalent to:
    String utf8 = new String(unicode.getBytes("UTF8"), "Cp1252"); // Matt's code
    i.e. you are encoding with one transformation format and decoding back with a different transformation format, so in general you won't get your original string back.
    Regarding safely sending multi-byte characters across the Internet, I'm not completely sure what the situation is because I don't do it myself. (When our program is run as an applet, the only interaction it has with the web server is to download various files). I've seen lots of people on this forum describing problems sending multi-byte characters and I can't tell whether the problem is with the software or with the programming. Two possible methods come to mind (of course you need to find out what your third party software is doing):
    1) use the DataOutput/InputStreams writeUTF/readUTF methods
    2) use the InputStreamReader/OutputStreamWriter pair with UTF8 encoding
    See this thread:
    http://forum.java.sun.com/thread.jsp?forum=16&thread=168630
    You should stick to UTF8. It is designed so that the bytes generated by encoding non-ASCII characters can be safely transmitted across the Internet. Bytes generated by UTF16 can be just about anything.
    Here's what I suggest:
    I am running a version of the Sun tutorial that has a program running on a server to which I can send a string and the program sends back the string reversed.
    http://java.sun.com/docs/books/tutorial/networking/urls/readingWriting.html
    I haven't tried sending multi-byte characters but I will do so and test whether there are any transmission problems. (Assuming that the Sun cgi program itself correctly handles characters).
    More later,
    regards,
    Joe
    P.S.
    I thought one of the reasons for the existence of UTF8 was to represent things like multi-byte characters in an ASCII format?
    Not exactly. UTF8 encodes ASCII characters into single bytes with the same byte values as ASCII encoding. This means that a document consisting entirely of ASCII characters is the same whether it was encoded as UTF8 or ASCII, and can consequently be read in any ASCII document reader (e.g. Notepad).

  • Double Byte Characters

    Which is better: store double-byte characters (from another RDBMS) as NUMBER or VARCHAR2?
    thanks in advance to those who will reply

    I think it's pretty standard to use varchar2 for this. I've never heard of anyone wanting to store double-byte character data as a NUMBER. What advantages could you imagine in such a scheme?
    Justin

  • Reparse=yes for multi-byte charset

    I try to use "include-xsql" with "reparse=yes", but my multi-byte characters become "???". Can anyone give me some hints?
    Thanks in advance!

    Would you show your XSQL? Did you attach any stylesheet?

  • Handling Multi-byte/Unicode (Japanese) characters in Oracle Database

    Hello,
    How do I handle Japanese characters with an Oracle database?
    I have a Java application which retrieves some values from the database, makes some changes to them (e.g. changes the value of a status column, adds comments to a VARCHAR2 column, etc.) and then performs an UPDATE back to the database.
    Everything works fine for English, but NOT for Japanese, which uses multi-byte/Unicode characters. The Japanese characters are garbled after performing the database UPDATE.
    I verified that Java by default uses UTF16 encoding, so there shouldn't be any problem with Java/JDBC.
    What do I need to change at #1- Oracle (Database) side or #2- at the OS (Linux) side?
    (I tried changing the NLS_LANG value in the OS and the NLS_SESSION_PARAMETERS settings in the database, and tried a test insert from SQL*Plus. But SQL*Plus converts all Japanese characters to a question mark (?), so I could not test it via SQL*Plus on my XP (English) edition.)
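    To see the database-side character sets (the client side is governed by NLS_LANG, which these views do not show), a standard check is:

    SELECT parameter, value
      FROM v$nls_parameters
     WHERE parameter LIKE '%CHARACTERSET%';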
    Any help will be really appreciated.
    Thanks

    Hello Sergiusz,
    Here are the values before & after Update:
    --BEFORE update:
    select tar_sid, DUMP(col_name, 1016) from table_name where tar_sid in ('6997593.880');
    /* Output copied from SQL-Developer: */
    6997593.88 Typ=1 Len=144 CharacterSet=UTF8: 54,45,53,54,5f,41,42,53,54,52,41,43,54,e3,81,ab,e3,81,a6,4f,52,41,2d,30,31,34,32,32,e7,99,ba,e7,94,9f,29,a,4d,65,74,61,6c,69,6e,6b,20,e3,81,a7,e7,a2,ba,e8,aa,8d,e3,81,84,e3,81,9f,e3,81,97,e3,81,be,e3,81,97,e3,81,9f,e3,81,8c,e3,80,81,52,31,30,2e,32,2e,30,2e,34,20,a,e3,81,a7,e3,81,af,e4,bf,ae,e6,ad,a3,e6,b8,88,e3,81,bf,e3,81,ae,e4,ba,8b,e4,be,8b,e3,81,97,e3,81,8b,e7,a2,ba,e8,aa,8d,e3,81,a7,e3,81,8d,e3,81,be,e3,81,9b,e3,82,93,2a
    --AFTER Update:
    select tar_sid, DUMP(col_name, 1016) from table_name where tar_sid in ('6997593.880');
    /* Output copied from SQL-Developer: */
    6997593.88 Typ=1 Len=144 CharacterSet=UTF8: 54,45,53,54,5f,41,42,53,54,52,41,43,54,e3,81,ab,e3,81,a6,4f,52,41,2d,30,31,34,32,32,e7,99,ba,e7,94,9f,29,a,4d,45,54,41,4c,49,4e,4b,20,e3,81,a7,e7,a2,ba,e8,aa,8d,e3,81,84,e3,81,9f,e3,81,97,e3,81,be,e3,81,97,e3,81,9f,e3,81,8c,e3,80,81,52,31,30,2e,32,2e,30,2e,34,20,a,e3,81,a7,e3,81,af,e4,bf,ae,e6,ad,a3,e6,b8,88,e3,81,bf,e3,81,ae,e4,ba,8b,e4,be,8b,e3,81,97,e3,81,8b,e7,a2,ba,e8,aa,8d,e3,81,a7,e3,81,8d,e3,81,be,e3,81,9b,e3,82,93,2a
    So the values BEFORE & AFTER Update are the same!
    The problem is that sometimes, the Japanese data in VARCHAR2 (abstract) column gets corrupted. What could be the problem here? Any clues?

  • How to store UTF-8 characters in an iso-8859-1 encoded oracle database?

    How can we store UTF-8 characters in an ISO-8859-1 encoded Oracle database? We cannot change the database encoding, but we need to store e.g. Polish or Russian characters besides other European languages.
    Is there any stable solution with good performance?
    We use Oracle 8.1.6 with ISO-8859-1 encoding, BEA WebLogic 7.0, JDK 1.3.1 and the following thin driver: "Oracle JDBC Driver version - 9.0.2.0.0".

    There are a couple of unsupported options, but I wouldn't consider using them on a production database running other critical applications. I would also strongly discourage their use unless you understand in detail how Oracle National Language Support (NLS) works, otherwise you could end up with corrupt data or worse.
    In a sense, you've been asked to do the impossible. The existing database character set does not support encoding the data you've been asked to store.
    Can you create a new database with an appropriate database character set and deploy your application there? That's probably the easiest solution.
    If that isn't an option, and you really need to store data in this database, you could use one of the binary data types (RAW and BLOB), but that would mean that it would be exceptionally difficult for applications other than yours to extract the data. You would have to ensure that the data was always encoded in the same character set, otherwise you wouldn't be able to properly decode it later. This would also add a lot of complexity to your application, since you couldn't send or receive string data from the database.
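    For what it's worth, a sketch of that RAW approach (names are illustrative; every reader and writer must agree on the encoding, because the database will not convert RAW data):

    CREATE TABLE messages (
      id       NUMBER PRIMARY KEY,
      body_raw RAW(2000)  -- UTF-8 bytes, encoded and decoded only by the application
    );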
    Unfortunately, I suspect you will have to choose from a list of bad options.
    Justin
    Distributed Database Consulting, Inc.
    http://www.ddbcinc.com/askDDBC
