Where is the Multi-Byte Character?

Hello All,
While reading data from the DB, our middleware interface gave the following error:
java.sql.SQLException: Fail to convert between UTF8 and UCS2: failUTF8Conv
I understand that this failure is caused by a multi-byte character, and that the 10g driver fixes this bug.
I suggested that the integration admin team replace the current 9i driver with the 10g one, and they are on it.
In addition to this, I wanted to tell the data input team where exactly the failure occurred.
I asked them for the dat file and got the download; my intention was to find out where exactly
the multi-byte character that caused this failure is located.
I wrote the following code to check this.
import java.io.*;

public class X {
    public static void main(String ar[]) {
        int linenumber = 1, columnnumber = 1;
        long totalcharacters = 0;
        try {
            File file = new File("inputfile.dat");
            FileInputStream fin = new FileInputStream(file);
            byte fileContent[] = new byte[(int) file.length()];
            fin.read(fileContent);
            for (int i = 0; i < fileContent.length; i++) {
                columnnumber++;
                totalcharacters++;
                if (fileContent[i] < 0 && fileContent[i] != 10 && fileContent[i] != 13 && fileContent[i] > 300) { // if invalid
                    System.out.println("failure at position: " + i);
                    break;
                }
                if (fileContent[i] == 10 || fileContent[i] == 13) { // if new line
                    linenumber++;
                    columnnumber = 1;
                }
            }
            fin.close();
            System.out.println("Finished successfully, total lines : " + linenumber + " total file size : " + totalcharacters);
        } catch (Exception e) {
            e.printStackTrace();
            System.out.println("Exception at Line: " + linenumber + " columnnumber: " + columnnumber);
        }
    }
}
But this shows that the file is good, with no issue found.
Whereas the middleware interface fails with the above exception while reading exactly the same input file.
Am I doing anything wrong in trying to locate that multi-byte character?
Greatly appreciate any help, everyone!
Thanks.

My challenge is to spot the multi-byte character hidden in this big dat file.
This is because the data entry team asked me to point out which record and column have the issue, out of
the lakhs of records they sent inside this file.
Let's have the validation code like this...

   if( (fileContent[i]<0 && fileContent[i]!=10 && fileContent[i]!=13) || fileContent[i]>300) // if invalid
   {System.out.println("failure at position: "+i);break;}

less than 0 - I saw some negative values when I was testing with other files (any byte with its high bit set, i.e. part of a multi-byte sequence, shows up as negative in Java).
greater than 300 - was a try to find out if any characters exceed the actual character range (though a Java byte can never exceed 127, so this clause never fires).
10 and 13 are for line feed and carriage return.
With this, I randomly placed Chinese and Korean characters and the program found them.
Is there an alternative (better code, of course) way to catch this black sheep?
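One alternative I am considering (a minimal sketch, assuming the file is supposed to be plain ASCII and, as in my code above, small enough to read into memory): let a java.nio CharsetDecoder report the exact offset of the first byte it cannot decode, instead of hand-rolling the byte checks.

import java.io.*;
import java.nio.*;
import java.nio.charset.*;

public class FindBadChar {
    public static void main(String[] args) throws IOException {
        File file = new File("inputfile.dat");
        byte[] content = new byte[(int) file.length()];
        try (FileInputStream fin = new FileInputStream(file)) {
            fin.read(content);
        }
        // Strict US-ASCII decoding: any byte >= 0x80 (i.e. any byte of a
        // multi-byte character) raises a MalformedInputException.
        CharsetDecoder decoder = StandardCharsets.US_ASCII.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        ByteBuffer in = ByteBuffer.wrap(content);
        try {
            decoder.decode(in);
            System.out.println("File is pure ASCII.");
        } catch (CharacterCodingException e) {
            int offset = in.position(); // first offending byte
            int line = 1, column = 1;
            for (int i = 0; i < offset; i++) {
                if (content[i] == '\n') { line++; column = 1; } else { column++; }
            }
            System.out.println("Suspect byte 0x"
                    + Integer.toHexString(content[offset] & 0xFF)
                    + " at offset " + offset + " (line " + line
                    + ", column " + column + ")");
        }
    }
}

With CodingErrorAction.REPORT the decoder stops at the first offending byte instead of silently replacing it, so the offset maps straight back to the record and column the data entry team needs. Does this look like the right direction?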
Edited by: Sanath_K on Oct 23, 2009 8:06 PM

Similar Messages

  • Multi-byte character encoding issue in HTTP adapter

    Hi Guys,
    I am facing a problem with multi-byte character conversion.
    Problem:
    I am posting data from SAP CRM to a third-party system using XI as middleware. I am using the HTTP adapter to communicate from XI to the third-party system.
    I have given the XML encoding as UTF-8 in the XI payload manipulation block.
    I am trying to post Chinese characters from SAP CRM to the third-party system, but junk characters are arriving at the third-party system. My assumption is that it is double encoding.
    Can you please guide me on how to proceed further.
    Please let me know if you need more info.
    Regards,
    Srini

    Srinivas,
    Can you go through the url:
    UTF-8 encoding problem in HTTP adapter
    ---Satish
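
    If double encoding really is the cause, it is quick to confirm outside XI (a minimal sketch; the sample string and the ISO-8859-1 assumption are mine, not from this thread): take the junk string, turn it back into bytes as ISO-8859-1, and decode those bytes once as UTF-8. If readable Chinese comes back, the payload was UTF-8 encoded twice.

    import java.nio.charset.StandardCharsets;

    public class DoubleEncodingCheck {
        public static void main(String[] args) {
            String original = "你好"; // what the sender meant to transmit
            // Simulate the accident: UTF-8 bytes misread as ISO-8859-1.
            String junk = new String(original.getBytes(StandardCharsets.UTF_8),
                                     StandardCharsets.ISO_8859_1);
            System.out.println("junk as received : " + junk); // ä½ å¥½
            // Reverse it: ISO-8859-1 bytes decoded once as UTF-8.
            String repaired = new String(junk.getBytes(StandardCharsets.ISO_8859_1),
                                         StandardCharsets.UTF_8);
            System.out.println("repaired         : " + repaired); // 你好 again
        }
    }

    The real fix is to make sure the payload is encoded exactly once on its way out of XI; this round trip only tells you whether double encoding is what you are seeing.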

  • How to set Multi Byte Character Set ( MBCS ) to Particular String In MFC VC++

    I use the Unicode Character Set in my MFC application (VC++).
    Currently I get output like ठ桔湡潹⁵潦⁲獵 and I want to convert this text to English (meaning MBCS),
    but I need Unicode in my application. When I switch the project to the Multi-Byte Character Set it gives correct output in English, but other objects (TreeCtrl selection) behave wrongly. So I need to convert just that particular string to MBCS.
    How can I do that in MFC?

    I assume the string read from your hardware device is a plain "C" (ANSI) string. This type of string has one byte per character; Unicode has two bytes per character.
    From the situation you explained, I'd convert the string returned by the hardware to a Unicode string using, e.g., MultiByteToWideChar with CP_ACP. You may also use mbstowcs or some similar function to convert your string to a Unicode string.
    Best regards
    Bordon
    Note: Posted code pieces may not have good programming style and may not be perfect. It is also possible that they do not work in all situations. Code pieces are only intended to explain something particular.

  • Multi-byte character

    If the DATABASE CHARACTER SET is UTF-8,
    can I use VARCHAR2 to store multi-byte characters, or do I still have to use
    NVARCHAR2?
    Also, how many bytes (max) can VARCHAR2(1) and NVARCHAR2(1) store in the case of a UTF-8 character set?

    If you create VARCHAR2(1) then you may not be able to store anything, as your first character might be multi-byte.
    My recommendation would be to consider defining by character rather than by byte.
    CREATE TABLE tbyte (
      testcol VARCHAR2(20));
    CREATE TABLE tchar (
      testcol VARCHAR2(20 CHAR));
    The second will always hold 20 characters without regard to the byte count.
    Demos here:
    http://www.morganslibrary.org/library.html
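
    To make the byte-versus-character point concrete, a small sketch (mine, not part of the original reply) showing how many bytes single characters need in UTF-8; a VARCHAR2(1) column with byte semantics only fits the first one:

    import java.nio.charset.StandardCharsets;

    public class Utf8Widths {
        public static void main(String[] args) {
            String[] samples = { "a", "é", "你", "𝄞" }; // 1, 2, 3 and 4 bytes in UTF-8
            for (String s : samples) {
                // VARCHAR2(1) under BYTE semantics holds exactly 1 byte,
                // so only "a" would fit.
                System.out.println("'" + s + "' needs "
                        + s.getBytes(StandardCharsets.UTF_8).length
                        + " byte(s) in UTF-8");
            }
        }
    }

    With VARCHAR2(20 CHAR), as in the tchar demo above, the limit is counted in characters, so each sample counts as one.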

  • Problem displaying Japanese/multi-byte characters on WebLogic Server 9.1

    Hi experts
    We are running WebLogic 9.1 on a Linux box [RHEL v4] and trying to display Japanese characters embedded in some HTML files, but the Japanese characters are converted into question marks [?]. The HTML files that contain Japanese characters are stored properly in the file system and retain the Japanese characters as they should.
    I changed the character setting in the HTML header to shift_jis, but no luck. Then I added the encoding scheme for shift_jis in the jsp-descriptor and charset-params sections in weblogic.xml, but also no luck.
    I am wondering how I can properly display multi-byte/Japanese characters on WebLogic Server without setting up internationalization tools.
    I will appreciate your advice.
    Thanks,
    yasushi

    This was fixed by removing everything except the following files from the original (8.1) domain directory:
    1. config.xml
    2. SerializedSystemIni.dat
    3. *.ldift
    4. applications directory
    Is this a bug in the upgrade tool ? Or did I miss a part of the documentation ?
    Thanks
    --sony

  • How to deal with two-byte characters in 'IF_IXML_NODE'

    We can create a DOM from an xstring:
    ostream = streamfactory->create_ostream_xstring( string = lv_xstring )
    but if one value of a node is a two-byte character,
    if_ixml_node->get_value( ) only returns a string value, so the result is
    wrong; it will display "##". How can this be fixed?
    Thanks a lot for your help.

  • Multi byte character set

    Hi,
    I am going to create an Oracle 8i database on Linux. In it I want to support English, Italian, and Chinese.
    My questions are:
    1. What parameters have to be set at the OS level for this multi-byte character set?
    2. How do I set this character set at the database level at creation time?
    3. I am also going to migrate one database to the new database, but the old one contains only English and Italian. So for the migration, what parameters do we have to set in the new database?
    Kindly provide some solutions..
    rgds..

    1) I'm not sure what you're asking here.
    2) While creating the database, you would want to set the NLS_CHARACTERSET to UTF8 (I don't believe AL32UTF8 was available in 8i).
    3) How are you migrating the database? Via export & import? If so, you'd need to ensure that the NLS_LANG on the client machine(s) that do the actual export and import are set appropriately.
    Justin

  • CUSTOM Service - multi Byte character issue

    Hi Experts,
    I wrote a custom service. What this service does is read some data from the database and then generate a CSV report. The code is working fine, but if we have multi-byte characters in the data, those characters are not shown properly in the report. Given below is my service code:
    byte bytes[] = CustomServiceHelper.getReport(this.m_binder, providerName);
    DataStreamWrapper wrapper = new DataStreamWrapper();
    wrapper.m_dataEncoding = "UTF-8";
    wrapper.m_dataType = "application/vnd.ms-excel;charset=UTF-8";
    wrapper.m_clientFileName = "Report.csv";
    wrapper.initWithInputStream(new ByteArrayInputStream(bytes), bytes.length);
    this.m_service.getHttpImplementor().sendStreamResponse(m_binder, wrapper);
    NOTE - This code works fine on my local UCM (Windows) for multi-byte characters. But when I install this service on our DEV and Staging servers (Solaris), the multi-byte character issue occurs.
    Thanks in Advance..!!
    Edited by: user4884609 on May 17, 2011 4:12 PM

    Please Help
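
    A likely suspect (my guess; the thread does not confirm it) is that the report bytes are produced with the JVM's default platform encoding, which differs between a Windows desktop and the Solaris servers. A minimal sketch of the fix under that assumption, with a hypothetical buildCsv() standing in for the CustomServiceHelper call: encode the CSV text explicitly as UTF-8 and prepend a BOM so Excel recognizes it:

    import java.io.ByteArrayOutputStream;
    import java.io.IOException;
    import java.nio.charset.StandardCharsets;

    public class CsvBytes {
        // Hypothetical stand-in for CustomServiceHelper.getReport(...).
        static String buildCsv() {
            return "name,city\n张伟,北京\n";
        }

        static byte[] toUtf8WithBom(String text) throws IOException {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            out.write(new byte[] { (byte) 0xEF, (byte) 0xBB, (byte) 0xBF }); // UTF-8 BOM for Excel
            out.write(text.getBytes(StandardCharsets.UTF_8)); // never the platform default
            return out.toByteArray();
        }

        public static void main(String[] args) throws IOException {
            byte[] bytes = toUtf8WithBom(buildCsv());
            System.out.println("CSV payload is " + bytes.length + " bytes");
        }
    }

    If the helper ends up calling String.getBytes() with no charset argument anywhere on its path, that is the line to change; as a quick test, starting the Solaris JVM with -Dfile.encoding=UTF-8 should also make the symptom disappear.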

  • COREid - where is the Multi-Language on/off setting stored

    Hi folks,
    I need to find out where the on/off setting for Multi-Language support is stored, whether it is in the LDAP directory or in a file.
    I know I can change it in the GUI, but I am attempting to fix an automation script, so that is not a good option at this stage.
    thanks,
    kam

    In LDAP, please go to the obid=<language_pack>,o=Oblix,<DC> node and set the "obenabled" attribute value to "true".

  • Converting from Single Byte to Multi Byte character set

    Hello,
    I'm trying to migrate one schema, including data, from a 10g (10.1.0.2.0) DB with the IW8ISO8859P8 character set to a 10g (10.2.0.1.0) DB with the AL32UTF8 character set.
    The original tables are using VARCHAR2 columns, including some VARCHAR2(1) columns.
    I'm trying to use exp and imp for the task, but during import I'm receiving errors like:
    IMP-00019: row rejected due to ORACLE error 12899
    IMP-00003: ORACLE error 12899 encountered
    ORA-12899: value too large for column "SHAMAUT"."TIKIM"."GAR_SET" (actual: 2, maximum: 1)
    These errors are not limited to the one-character columns only.
    Is there a way to export/import the data with AL32UTF8 in mind, so the system will automatically convert the data properly?
    Thanks for the help,
    Arie.

    It's not really a conversion problem that you have, but a space problem. Table columns are created by default with the length semantics given by the init parameter NLS_LENGTH_SEMANTICS:
    If NLS_LENGTH_SEMANTICS = BYTE,
    then 1 character = 1 byte, whatever the db character set.
    If NLS_LENGTH_SEMANTICS = CHAR,
    then 1 character = 1 character of the db character set, whatever its byte size.
    If this parameter is changed it is only taken into account for newly created tables or columns: existing columns are not changed.
    See http://download-uk.oracle.com/docs/cd/B10501_01/server.920/a96529/ch2.htm#104327
    The only solution I see is to enlarge your VARCHAR2 columns before running the import...
    Message was edited by:
    Pierre Forstmann
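
    Pierre's suggestion can be scripted rather than done by hand. A sketch (mine; the connection details are placeholders) that walks USER_TAB_COLUMNS over JDBC and prints an ALTER TABLE ... MODIFY statement switching every byte-sized VARCHAR2 column to character semantics before the import:

    import java.sql.*;

    public class CharSemantics {
        public static void main(String[] args) throws SQLException {
            try (Connection con = DriverManager.getConnection(
                    "jdbc:oracle:thin:@//dbhost:1521/orcl", "shamaut", "secret");
                 Statement st = con.createStatement();
                 ResultSet rs = st.executeQuery(
                     "SELECT table_name, column_name, char_length " +
                     "FROM user_tab_columns " +
                     "WHERE data_type = 'VARCHAR2' AND char_used = 'B'")) {
                while (rs.next()) {
                    // Same declared length, but counted in characters, not bytes.
                    System.out.printf("ALTER TABLE %s MODIFY (%s VARCHAR2(%d CHAR));%n",
                            rs.getString(1), rs.getString(2), rs.getInt(3));
                }
            }
        }
    }

    Running the generated statements in the target database before the import should avoid the ORA-12899 rejections, since the columns then count characters rather than bytes.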

  • Store Multi Byte Characters in WE8ISO8859P1 Database without Migration

    Hi - I am looking for a solution where I can store multi-byte characters in a WE8ISO8859P1 database.
    Below are the DB NLS_PARAMETERS
    NLS_CHARACTERSET = WE8ISO8859P1
    NLS_NCHAR_CHARACTERSET = AL32UTF8
    NLS_LENGTH_SEMANTICS = BYTE
    Size of DB = 2 TB.
    DB Version = 11.2.0.4
    Currently there is a need to store Chinese characters in the NAME and ADDRESS columns only. Below is the description of the columns:
    Column Name        Data Type
    GIVEN_NAME_ONE     VARCHAR2(120 BYTE)
    GIVEN_NAME_TWO     VARCHAR2(120 BYTE)
    LAST_NAME          VARCHAR2(120 BYTE)
    ADDR_LINE_ONE      VARCHAR2(100 BYTE)
    ADDR_LINE_TWO      VARCHAR2(100 BYTE)
    ADDR_LINE_THREE    VARCHAR2(100 BYTE)
    What are my options here, without considering migration of the WE8ISO8859P1 DB to AL32UTF8?
    1. Can I increase the size of the columns, i.e. make them n x 4, e.g. NAME would be 480 bytes and ADDRESS 400 bytes? What are the pros and cons?
    2. Convert the existing columns from VARCHAR2 to NVARCHAR2 with the same size, i.e. NVARCHAR2(120)?
    3. Extend the table with new NVARCHAR2 columns, e.g. NAME as NVARCHAR2(120 CHAR) and ADDRESS as NVARCHAR2(100 CHAR)?
    4. The database has CLOBs, BLOBs, LONGs etc. with varied data. Is it a good idea to migrate to AL32UTF8 with minimal downtime?
    Please suggest the best alternatives.
    Thanks
    Jitesh

    Hi Jitesh,
    NLS_NCHAR_CHARACTERSET can only be AL16UTF16 or UTF8, so most likely your DB has UTF8 (AL32UTF8 is not a valid national character set).
    You can definitely insert Unicode characters into N-type columns. The size of an N-type column will depend on the characters you plan to store in it.
    If you use N-types, do make sure you use the N'...' syntax when coding, so that literals are denoted as being in the national character set by prepending the letter N.
    Although you can use them, N-types are not very well supported in third-party client/programming environments; you may need to adapt a lot of code to use N-types properly, and there are some limitations.
    While at first using N-types for a (few) columns seems like a good idea to avoid converting the whole database, in many cases the end conclusion is that changing the NLS_CHARACTERSET is simply the easiest and fastest way to support more languages in an Oracle database.
    So it depends on how much of your data will be Unicode that you would store in N-type columns.
    If you have access to My Oracle Support, you can check Note 276914.1, "The National Character Set (NLS_NCHAR_CHARACTERSET) in Oracle 9i, 10g, 11g and 12c", for more details.
    With respect to your downtime: the actual conversion (CSALTER, or DMU if you use it) shouldn't take too much time, provided you have run CSSCAN on your DB and taken care of all your truncation, convertible and lossy data (if any).
    It would be best to run CSSCAN initially to gauge how much convertible/lossy/truncation data you need to take care of.
    $ CSSCAN FROMCHAR=WE8ISO8859P1 TOCHAR=AL32UTF8 LOG=P1TOAl32UTF8 ARRAY=1000000 PROCESS=2 CAPTURE=Y FULL=Y
    Regards,
    Suntrupth
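
    The N'...' advice has a JDBC counterpart: the value must reach the NVARCHAR2 column without passing through the database character set. A sketch (mine; the table name and connection details are assumptions, the column names are from the question) using setNString, which marks the parameter as national-character data:

    import java.sql.*;

    public class NTypeInsert {
        public static void main(String[] args) throws SQLException {
            try (Connection con = DriverManager.getConnection(
                    "jdbc:oracle:thin:@//dbhost:1521/orcl", "app", "secret");
                 PreparedStatement ps = con.prepareStatement(
                     "INSERT INTO customer (given_name_one, addr_line_one) " +
                     "VALUES (?, ?)")) {
                // setNString (JDBC 4.0) is the client-side analogue of N'...'.
                ps.setNString(1, "张伟");
                ps.setNString(2, "北京市朝阳区");
                ps.executeUpdate();
            }
        }
    }

    Older Oracle drivers may instead need oracle.jdbc.OraclePreparedStatement.setFormOfUse(..., FORM_NCHAR); check the documentation for your driver version.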

  • DEFECT: (Serious!) Truncates display of data in multi-byte environment

    I have an oracle 10g database set up with the following nls parameters:
    NLS_CALENDAR      GREGORIAN
    NLS_CHARACTERSET      AL32UTF8
    NLS_COMP      LINGUISTIC
    NLS_CURRENCY      $
    NLS_DATE_FORMAT      DD-MON-YYYY
    NLS_DATE_LANGUAGE      AMERICAN
    NLS_DUAL_CURRENCY      $
    NLS_ISO_CURRENCY      AMERICA
    NLS_LANGUAGE      AMERICAN
    NLS_LENGTH_SEMANTICS      CHAR
    NLS_NCHAR_CHARACTERSET      UTF8
    NLS_NCHAR_CONV_EXCP      TRUE
    NLS_NUMERIC_CHARACTERS      .,
    NLS_RDBMS_VERSION      10.2.0.3.0
    NLS_SORT BINARY
    NLS_TERRITORY      AMERICA
    NLS_TIMESTAMP_FORMAT      DD-MON-RR HH.MI.SSXFF AM
    NLS_TIMESTAMP_TZ_FORMAT      DD-MON-RR HH.MI.SSXFF AM TZR
    NLS_TIME_FORMAT      HH.MI.SSXFF AM
    NLS_TIME_TZ_FORMAT      HH.MI.SSXFF AM TZR
    I am querying a view in sqlserver 2000 via an odbc database link.
    When I query a 26 character wide column in the view in sql developer, it will only return up to 13 characters of the data.
    When I query the exact same view in the exact same SQL Server database, from the exact same Oracle database, using the exact same ODBC database link, via SQL Navigator, I get the full 26 characters' worth of data.
    It also works just fine from the sql command line tool from 10g express.
    Apparently, sql developer is confused about how to handle multi-byte data. If you ask it the length of the data in the column, it will tell you 26, but it will only show you 13.
    I have found a VERY PAINFUL workaround: doing a cast(column_name as varchar2(26)) when I query it. But I've got hundreds of views and queries...

    In all other respects, the settings I have appear to be working correctly.
    I can enter multi-byte characters into the sql worksheet to create a package, save it, and re-open the package with the multi-byte characters still visible.
    I'm using a fallback directory for my jdk with the correct font installed, so I can see and edit multi-byte data in the data grids.
    In this case, I noticed the problem on a column that only contains the standard ascii letters and digits.
    Environment->Encoding = UTF-16
    All the fonts are set to a font that properly displays western and ge'ez characters. The font has been in use for years, and is working correctly in all other circumstances.
    The Database->NLS Parameters tab under sql developer preferences shows:
    language: American
    territory : American
    sort: binary
    comp: binary
    length: char (I've also tried byte)
    If there are other settings that you think might be relevant, please let me know.
    I've done some more testing. I created an oracle table with a single column and did an insert into ... select from statement across the database link. The correct, full-length data appeared in the oracle table.
    So, it's not a matter of whether the data is being returned or not; it is. It is simply not being displayed correctly. It appears that SQL Developer is making some unwarranted decisions about the data coming across the database link when it decides to display it, because SQL*Plus and SQL Navigator have no such issues.
    This is really a very serious problem, because if I cannot trust the data the tool shows me, I cannot trust the tool.
    It is also an invitation to make an error based upon the erroneous data display.

  • Unable to generate single-byte character when using TO_SINGLE_BYTE

    Hi All,
    Can anyone help me get the output for the single-byte query below? When I try it, it says INVALID NUMBER.
    Step 1 :-
    select RAWTOHEX('2Z') from DUAL; -- 325A
    Step 2:-
    SELECT TO_SINGLE_BYTE(CHR('325A')) FROM DUAL;
    The above query when executed it says "ORA-01722: invalid number".
    I tried using VARCHAR2 instead of CHR; it throws the below exception:
    "ORA-00936: missing expression".
    But the same query works fine if no alphabetic characters are passed:
    SELECT TO_SINGLE_BYTE(CHR('3251')) FROM DUAL;
    Thanks,
    Ravi

    TO_SINGLE_BYTE is used to convert multi-byte characters to single-byte characters. '325A' is not a multi-byte character so can't be converted.
    Use HEXTORAW to convert the hex value back to a raw value.
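
    For what it's worth, the hex arithmetic is easy to sanity-check outside the database (a small sketch of mine): RAWTOHEX('2Z') is '325A' simply because '2' is byte 0x32 and 'Z' is byte 0x5A, and CHR() then fails because it expects a number, not the hex string. HEXTORAW is the function that reverses the pairing:

    import java.nio.charset.StandardCharsets;

    public class HexRoundTrip {
        public static void main(String[] args) {
            String s = "2Z";
            // What RAWTOHEX does: print each byte as two hex digits.
            StringBuilder hex = new StringBuilder();
            for (byte b : s.getBytes(StandardCharsets.US_ASCII)) {
                hex.append(String.format("%02X", b));
            }
            System.out.println(hex); // 325A
            // What HEXTORAW does: parse hex digit pairs back into bytes.
            byte[] raw = new byte[hex.length() / 2];
            for (int i = 0; i < raw.length; i++) {
                raw[i] = (byte) Integer.parseInt(hex.substring(2 * i, 2 * i + 2), 16);
            }
            System.out.println(new String(raw, StandardCharsets.US_ASCII)); // 2Z
        }
    }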

  • Single and multi byte settings

    Hello,
    We are trying to implement multibyte char loading and I have a few questions:
    1) Our current character encoding is UTF-8. What encoding should we use for multi-byte loading?
    2) In DDL, a column can be declared in BYTE or CHAR units, such as VARCHAR2(20 CHAR). For multi-byte data, we can either increase the size of the column or change the column definition from BYTE to CHAR. Which is the better implementation?
    3) Are there any other setting changes we need to be aware of when moving from single-byte to multi-byte?
    Regards

    First off, I'm a bit confused. If your database's character set is UTF-8, you already have a multi-byte character set. I'm not sure what it is that you're converting in this case.
    As to changing the table definition-- that depends primarily on your application(s). Generally, I find it easier to declare a field with character length semantics, which gives users in every language certainty about the number of characters a field can support. There are probably people that think the other way because they're allocating memory in a client application based on bytes and want to ensure that the definitions on the client and the server match.
    Since I don't quite understand what it is that you're converting, I'm hard pressed to come up with what "other setting changes" might be appropriate.
    Justin

  • Oracle Multi-Bytes vs Single-Byte

    Hi,
    We have to add Japanese to our application. I have successfully added Japanese data to our single-byte database,
    so why should we use a multi-byte DB?
    what is the gain to use a Multi byte DB vs a Single Byte?
    does intermedia work with japanese in Single Bytes?
    Is utf8 the best way to have an international DB?
    We will have to add a lot of other char-set in the future.
    Thanks

    so why should we use a Multi-byte DB? what is the gain to use a Multi byte DB vs a Single Byte?
    What you are doing is storing invalid multi-byte characters in a single-byte database, so each double-byte Japanese character is being treated as 2 separate single-byte characters. You are using an unsupported but common garbage-in, garbage-out approach, so in that sense you are using Oracle as a garbage container. :)
    Let's look at some of the issues that you are going to have:
    All SQL functions are based on the properties of the single-byte database character set WE8ISO8859P1, so LENGTH(), SUBSTR(), INSTR(), UPPER(), NLS_UPPER etc. will yield incorrect results. For example, a column with one Japanese character and one ASCII character will return a length of 3 characters rather than 2. And if you want to locate a specific character in a mixed ASCII and Japanese string using SUBSTR(), it will be very difficult, because to Oracle the string consists entirely of single-byte characters; it will not skip 2 bytes for a Japanese character. Even if you don't have mixed strings, you will need to write one routine for handling ASCII-only strings and another for Japanese strings.
    Invalid data conversion: if you need to talk to another db, over a dblink say, all character conversion will be based on the mapping from the single-byte character set to the target database character set, so the receiver will lose all the source Japanese characters and will get 2 single-byte characters for each Japanese char instead.
    Export and import will have identical problems: character set conversions are performed during these operations, so all Japanese characters will be lost. This also means that you cannot load correctly encoded Japanese data into your current single-byte DB using IMPORT or SQL*Loader without data corruption...
    does intermedia work with japanese in Single Bytes? No.
    Is utf8 the best way to have an international DB? Yes.
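
    The LENGTH() mismatch described above is easy to reproduce on the client side (a sketch of mine): treat a mixed Japanese/ASCII string the way a single-byte database does, namely as raw bytes:

    import java.nio.charset.Charset;

    public class ByteLength {
        public static void main(String[] args) {
            String s = "あA"; // one Japanese character plus one ASCII character
            // A single-byte database sees only bytes; in Shift_JIS the Japanese
            // character occupies two of them, so a byte-wise LENGTH() reports 3.
            byte[] bytes = s.getBytes(Charset.forName("Shift_JIS"));
            System.out.println("characters: " + s.length());           // 2
            System.out.println("byte-wise LENGTH(): " + bytes.length); // 3
        }
    }

    Every SUBSTR() or INSTR() on the database side trips over the same mismatch, which is the heart of the argument above.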
