Converting DB characterset to UTF8

I'm looking at: metalink article changing WE8... to UTF8


Is this correct at a high level:
1. csscan from=WE8... to=WE8... (check current)
2. csscan from=WE8... to=UTF8 (check changes)
Based on results, options are:
A. update rows to have less data (not an option)
B. update table: larger columns or different CHAR/BYTE semantics
C. pre-create table: larger columns or different CHAR/BYTE semantics
3. run csalter.plb (converts data in current) 
4. change db to UTF8 
5. export current 
6. import into UTF8 db 
7. release to prod

What's your Oracle version?
You only need to do export/import if you have Convertible Data or data subject to truncation.
Don't forget to back up your database before proceeding.

Similar Messages

Convert from utf16 to utf8 ?? er?

Dear list,
I have recently seen a sample to convert a utf16 string to utf8. I am a little bit confused. I thought utf16 was a superset of utf8. Could please someone explain why this is necessary sometimes ?
regards
Ben

> how can utf16 be a superset of utf8. I thought this
> relationship was similar to ASCII and utf8/utf16,
> where for example the space bar has a value of 32 in
> ASCII and Unicode (utf8 and utf16).... This being the
> case there is not much need for a utf8 to utf16
> conversion program.
I didn't say it was a superset. It is a different way of representing the same thing.
> You say that utf16 is ALWAYS 2 bytes, and utf8 is
> usually 8 bits but is variable when necessary. Is
> utf16 not a variable byte character set ?
Not for most text. Every character in the Basic Multilingual Plane takes exactly 2 bytes in utf16; only supplementary characters take 4 bytes (a surrogate pair).
> The name
> according to this, utf8 and utf16 is somewhat
> misleading as they are NOT always 8 or 16 bits.
And "java" is neither an island nor a beverage. The name does not convey the entirety of the subject.
> characters the first byte (or 2) is an 'escape' byte
> which means that more bytes are needed.
> What do you mean by first or (2). escape byte?
When something sees a given specific byte, it knows that a certain number of following bytes are needed to fully represent the character.
> I am still not convinced
Convinced?
If you do not find my explanation satisfactory then you might try writing some code that converts to UTF16 and UTF8 using String.getBytes(String).
You might also try to find the character set definitions.
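The experiment suggested in the reply can be sketched in a few lines. This is a minimal illustration in Python rather than the Java `String.getBytes(String)` call named above; the sample characters and the helper name `encoded_sizes` are arbitrary choices made for this sketch.

```python
# Encode the same characters as UTF-8 and as UTF-16 and compare byte counts.
# The sample characters are arbitrary picks for illustration.
samples = [" ", "A", "\u00e8", "\u20ac", "\U0001d11e"]  # space, A, e-grave, euro, G clef


def encoded_sizes(ch):
    """Return (UTF-8 byte count, UTF-16 byte count) for one character."""
    utf8 = ch.encode("utf-8")
    utf16 = ch.encode("utf-16-be")  # big-endian, no byte-order mark
    return len(utf8), len(utf16)


for ch in samples:
    n8, n16 = encoded_sizes(ch)
    print(f"U+{ord(ch):04X}: UTF-8 {n8} byte(s), UTF-16 {n16} byte(s)")
```

Both encodings represent the same code points; only the byte layout differs, which is why a conversion step is needed when moving bytes from one to the other.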

Problem in Database conversion from US7ASCII to UTF8

Hi,
We are facing the following problem after converting the database from US7ASCII to UTF8.
We recently changed the database character set from US7ASCII to UTF8 for internationalization.
We ran the Character Set Scanner utility, and it reported that some data might pose problems.
We followed the process below to convert to UTF8:
1) alter database character set utf8
2) alter database national character set utf8
Now we have a problem working with the old data in our Java-based application.
We get the following error: "java.sql.SQLException: Fail to convert between UTF8 and UCS2: failUTF8Conv".
We analyzed our data further and found some interesting things, e.g.:
DB - UTF8.
NLS_LANG also set to UTF8.
Select name from t1 where name like 'Gen%';
NAME
Genève
But when we check the length of the same data, it shows this:
NAME    LENGTH(NAME)  VSIZE(NAME)
Genève  4             6
The question is why it shows a length of only 4. When we use the substr function,
it extracts the following:
select name,substr(name,4,1) from t1 where name like 'Gen%';
NAME    SUB
Genève  ève
We executed the same queries on the US7ASCII DB and they work fine: the length shows as 6,
and SUBSTR extracts just 'è'.
We also used the dump function on the UTF8 DB for the above query. This is the result:
select name,length(name),vsize(name),dump(name) from t1 where name like 'Gen%';
NAME    LENGTH(NAME)  VSIZE(NAME)  DUMP(NAME)
Genève  4             6            Typ=1 Len=6: 71,101,110,232,118,101
We checked the data extensively, and it seems 'è' (accented e, byte 232) is causing the problem.
We want to know where the problem lies and how to overcome it.
Further, we tried all of the following :
1)
Export Server: US7ASCII
Export Client: did not set NLS_LANG / NLS_CHAR, so presumably US7ASCII as well
Import Client: did not set NLS_LANG / NLS_CHAR, so presumably US7ASCII as well
Import Server: UTF8
RESULT: Acute e became h
2)
Export Server: US7ASCII
Export Client: did not set NLS_LANG / NLS_CHAR, so presumably US7ASCII as well
Import Client: NLS_LANG=AMERICAN_AMERICA.UTF8 and NLS_CHAR=UTF8
Import Server: UTF8
RESULT: IMP 00016 error
3)
Export Server: US7ASCII
Export Client: NLS_LANG=AMERICAN_AMERICA.UTF8 and NLS_CHAR=UTF8
Import Client: did not set NLS_LANG / NLS_CHAR, so presumably US7ASCII as well
Import Server: UTF8
RESULT: Acute E became h
4)
Export Server: US7ASCII
Export Client: NLS_LANG=AMERICAN_AMERICA.UTF8 and NLS_CHAR=UTF8
Import Client: NLS_LANG=AMERICAN_AMERICA.UTF8 and NLS_CHAR=UTF8
Import Server: UTF8
RESULT: Acute e became h
5)
Tried using: update sys.props$
set value$='UTF8'
where name='NLS_CHARACTERSET'
RESULT: Acute e shows properly, but it causes a problem in the application:
"java.sql.SQLException: Fail to convert between UTF8 and UCS2: failUTF8Conv"
Looking further, we observed the following.
When you run this command on a column 'city' in a table which contains 'Genèva' (note the accented e after n), it shows:
command: select length(city), vsize(city),substr(city,4,1),city from cities
Result: 4 6 èva Genèva
If you look at the value of substr(city,4,1), you will see the problem. Also note that the length shows 4 and the size shows 6. Moreover, when these records are selected through the JDBC driver, it starts giving problems.
6)
Actually the above (point no. 5) is similar to changing the character set of the database with 'ALTER DATABASE CHARACTER SET UTF8'. The same problem is observed then too.
7)
We have also tried another method: changing the third byte of the export file, which specifies the character set, to the UTF8 code by editing the export file with a hexadecimal editor. After import, the same problem was observed as described in (5) and (6) above.
We have no more ideas on how to migrate without corrupting the data. Of course we know which records contain these characters from Oracle's csscan utility, but we do not want to manually rectify each and every record by replacing the character with an ASCII character. Any other idea as to how this can be accomplished?
Thanx
Ashok

The problem you have is that although your original database is defined as US7ASCII, the data it contains is more than is covered by this code page (as the reply on Sept 4 to the previous posting already said).
This has probably happened because the client was also defined as US7ASCII, and when the DB and client are defined as having the same character set, no conversion (or checking) takes place when data is passed between them. However, if you are using a Windows client then it will in fact be using Windows code page 1252 (Latin-1) or similar, and this allows many more characters, including è (accented e). So a user can enter all these characters and store them in the database, and similarly read them from the database, because data transfer is transparent.
When you did ALTER DATABASE CHARACTER SET UTF8, this only changed the label on the database; it did not affect the contents. However, only part of the contents is valid UTF8: any character above 7F (like è) is invalid. If your original client now uses the database, code page transformation will take place because the client and DB have different character sets defined. The invalid codes can then cause problems.
Without being able to explain what has happened in detail, it may help to see what your è (dec 232, x'E8') looks like. The actual data has not changed (you can see this as it is reported as 232). However, the binary code there (11101000) is invalid UTF8. UTF8 encodes a character in 1 to 4 bytes, and the first bits of a UTF8 character tell how many bytes it uses: 0xxx means it is one byte (same as the corresponding US-ASCII character), 110x means it uses 2 bytes, 1110 means it uses 3 bytes, etc. So if you interpret what is there as UTF8, it looks like the first byte of a 3-byte character, which explains why the substringing is giving you the other 2 bytes as well.
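The lead-byte reasoning above can be traced with a short sketch. It is an illustration of the argument only, not of how Oracle actually implements LENGTH or SUBSTR; the helper names `utf8_sequence_length` and `count_characters` are made up for this example, and the byte values are the ones from the DUMP(NAME) output in the question.

```python
def utf8_sequence_length(lead_byte):
    """Return the sequence length a UTF-8 lead byte announces, or 0 if it cannot start a character."""
    if lead_byte < 0x80:            # 0xxxxxxx: single byte, same as ASCII
        return 1
    if 0xC0 <= lead_byte < 0xE0:    # 110xxxxx: 2-byte sequence
        return 2
    if 0xE0 <= lead_byte < 0xF0:    # 1110xxxx: 3-byte sequence
        return 3
    if 0xF0 <= lead_byte < 0xF8:    # 11110xxx: 4-byte sequence
        return 4
    return 0                        # continuation byte or otherwise not a lead byte


def count_characters(data):
    """Count characters the way a reader that trusts lead bytes blindly would."""
    count = 0
    i = 0
    while i < len(data):
        i += max(utf8_sequence_length(data[i]), 1)
        count += 1
    return count


raw = bytes([71, 101, 110, 232, 118, 101])  # bytes reported by DUMP(NAME)

print(utf8_sequence_length(232))  # byte 232 announces a 3-byte character
print(count_characters(raw))      # G, e, n, then 232 swallows 118 and 101

# A strict decoder rejects the data, because 118 is not a continuation byte.
try:
    raw.decode("utf-8")
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False
print(strict_ok)
```

Under this reading, the six stored bytes count as four characters, which matches the LENGTH of 4 and VSIZE of 6 reported in the question.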
Can you fix this without losing data? I believe yes. First you should check what other characters are being flagged by the scan. See if these are contained in another standard character set. If they are only Western European accented characters then WE8ISO8859P1 is probably OK, but watch out for the euro sign, which Windows has at x'80', an undefined character in ISO8859-1.
You can see the contents of the Microsoft Windows Codepage 1252 at: http://www.microsoft.com/globaldev/reference/sbcs/1252.htm
For a listing of the US-ASCII defined characters see http://czyborra.com/charsets/iso646.html and for ISO 8859-1 see http://czyborra.com/charsets/iso8859.html#ISO-8859-1
If all is well, you can first ALTER DATABASE CHARACTER SET to WE8ISO8859P1. This will not change any data, but ensure that all the data gets exported. Then export the DB and import it to a UTF8 DB. This will convert the non-US-ASCII characters to Unicode. You will also have to change the clients character set to something other than USASCII or they will just see ? for the other characters.
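The effect of the export/import conversion described above can be sketched outside the database. This assumes Python's `cp1252` and `latin-1` codecs stand in for Windows code page 1252 and ISO 8859-1; it shows only the byte-level conversion, not the Oracle export and import tools.

```python
raw = bytes([71, 101, 110, 232, 118, 101])  # bytes stored under the US7ASCII label

text = raw.decode("cp1252")        # interpret them as Windows code page 1252
converted = text.encode("utf-8")   # the bytes a UTF8 database should hold instead

print(list(raw))
print(list(converted))             # byte 232 becomes the two bytes 195, 168

# The euro sign caveat: byte x'80' means different things in the two candidates.
euro = bytes([0x80])
euro_1252 = euro.decode("cp1252")      # the euro sign in code page 1252
euro_latin1 = euro.decode("latin-1")   # a C1 control character in ISO 8859-1
print(f"U+{ord(euro_1252):04X}", f"U+{ord(euro_latin1):04X}")
```

If the scan flags bytes such as x'80', relabeling as ISO 8859-1 would convert them to the wrong Unicode character, which is the reason for the warning about the euro sign.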
Good Luck!

Conversion from Unicode to UTF8

I want to convert some string having Unicode chars into a string with UTF8 char. I used following code snippet:
try {
    String str = new String(givenString);
    String utfStr = new String(str.getBytes("UTF-8"), "UTF-8");
    System.out.println("Converted:" + str + " to:" + utfStr);
} catch (Exception e) {
    e.printStackTrace(System.out);
}
I also tried:
Charset utf8Charset = Charset.forName("UTF-8");
CharsetEncoder encoder = utf8Charset.newEncoder();
CharsetDecoder decoder = utf8Charset.newDecoder();
ByteBuffer bbuf = encoder.encode(CharBuffer.wrap(givenString));
CharBuffer cbuf = decoder.decode(bbuf);
String dest = cbuf.toString();
When Java tries to encode Unicode to UTF-8 and runs into an unknown character (typically a character in the High ASCII range), it substitutes '?' or some other weird character.
How do I prevent this?

Where is this string coming from? Are you initializing it in your source code as a String literal?
String str = "Aéroport Princesse Béatrix";
If so, you need to make sure the .java file is saved in an encoding that can handle all of the characters. ISO-8859-1, windows-1252, and of course, UTF-8 will all suffice. You also need to make sure the compiler reads the source file with the correct encoding. For example, if you saved your source files as UTF-8, you would do this:
javac -encoding UTF-8 *.java
Finally, before you print the text to the console, you need to make sure the console is using an encoding that can handle it. On my WinXP box, the default encoding (or codepage, as they call it) for console windows is cp437, which doesn't support accented characters. You can change it with the "chcp" command, like so:
chcp 1252
Unfortunately, chcp won't accept UTF-8 or any other Unicode encoding, but cp1252 can handle the accented characters in your string. Note that you don't need to specify that encoding in your code; the Java runtime detects it automatically.
> If you see question marks or some other placeholder character when viewing output, that's probably because the terminal or whatever doesn't have the fonts available to render those characters.
No, question marks always indicate an encoding problem. If the character is valid but the font lacks a glyph for it, it shows up as a little rectangle.
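The difference between a lossless round trip and a '?' substitution can be shown with a small sketch. It is written in Python rather than the Java of the question, and the sample string is the one from the reply; the variable names are chosen for this illustration only.

```python
text = "A\u00e9roport Princesse B\u00e9atrix"  # the string literal from the reply

# A round trip through UTF-8 is lossless, so encoding and decoding with the
# same character set changes nothing.
round_trip = text.encode("utf-8").decode("utf-8")

# Encoding to a character set that lacks a character substitutes '?'.
as_ascii = text.encode("ascii", errors="replace")

# Code page 1252 contains the accented e, so no substitution happens.
as_cp1252 = text.encode("cp1252")

print(round_trip == text)
print(as_ascii)
print(as_cp1252)
```

The question marks appear only at the step where the target encoding cannot represent the character, which is consistent with the reply's point that they indicate an encoding problem rather than a font problem.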

Conversion from ASCII to UTF8 on Oracle 8.1.7 via PLSQL

I need to extract a string from an ASCII db, put it into a variable in a PL/SQL procedure, then convert it to utf8 with a 'magic box' and put it into a new utf8 database.
I need the magic box. Is there a tool, package or procedure that works like that?
Thanks in advance!

I suggest posting this message on the general RDBMS or PL/SQL forums.
Kuassi

Conversion from ASCII to UTF8

How do we convert Extended ASCII characters to UTF8 without using the ALTER DATABASE CHARACTER SET command?

Is the convert function what you are looking for? http://download-west.oracle.com/docs/cd/B19306_01/server.102/b14200/functions027.htm#i77037
SQL> select convert('a','utf8','us7ascii') from dual;
C
a

Encoding problem with convert and CLOB involving UTF8 and EBCDIC

Hi,
I have a task that requires me to call a procedure with a CLOB argument containing a string encoded in EBCDIC. This did not go well so I started narrowing down the problem. Here is some SQL to illustrate it:
SQL> select * from v$version;
BANNER
Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - 64bi
PL/SQL Release 10.2.0.4.0 - Production
CORE 10.2.0.4.0 Production
TNS for Solaris: Version 10.2.0.4.0 - Production
NLSRTL Version 10.2.0.4.0 - Production
SQL> select value from v$nls_parameters where parameter = 'NLS_CHARACTERSET';
VALUE
AL32UTF8
SQL> select convert(convert('abc', 'WE8EBCDIC500'), 'AL32UTF8', 'WE8EBCDIC500')
output from dual;
OUT
abc
SQL> select convert(to_char(to_clob(convert('abc', 'WE8EBCDIC500'))), 'AL32UTF8', 'WE8EBCDIC500') output from dual;
OUTPUT
╒╫¿╒╫¿╒╫¿
So converting to and from EBCDIC works fine when using varchar2, but (if I am reading this right) fails when involving CLOB conversion.
My question then is: Can anyone demonstrate how to put correct EBCDIC into a CLOB and maybe even explain why the examples do what they do.
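One possible reading of the garbled OUTPUT above, offered as a guess and not as a confirmed description of what Oracle does internally: if the EBCDIC bytes are treated as database-character-set text on their way into the CLOB, each invalid byte may be replaced by the Unicode replacement character, and the final convert then reinterprets those replacement bytes as EBCDIC. The sketch assumes Python's `cp500` codec corresponds to WE8EBCDIC500 and `cp437` to the console code page.

```python
ebcdic = "abc".encode("cp500")          # 'abc' in EBCDIC 500: bytes 0x81 0x82 0x83

# Treating those bytes as UTF-8 text fails; a lenient decoder substitutes
# U+FFFD (the replacement character) for each invalid byte.
as_text = ebcdic.decode("utf-8", errors="replace")

# Converting "from EBCDIC" afterwards reinterprets the UTF-8 bytes of the
# replacement characters (EF BF BD, three times) as EBCDIC.
back = as_text.encode("utf-8").decode("cp500")

# How those characters would look on a console that shows Latin-1 bytes
# through code page 437.
console_view = back.encode("latin-1").decode("cp437")

print(list(ebcdic))
print(len(as_text), len(back))
print(console_view)
```

The plain VARCHAR2 case never passes through such a lossy step, which would explain why the first query returns 'abc'. Binary data types (for example RAW or BLOB) are the usual way to keep bytes that are not valid in the database character set, though the thread itself does not confirm a fix.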

In order to work successfully with XML DB, it is recommended that you use 9.2.0.4
or above. You seem to have a lower version.
Okay, now to the problem: if the data that you want to send to the attributes is not larger than 32767 bytes, then you can use the PL/SQL varchar2 datatype to hold the data rather than a CLOB and avoid this problem.
Here is a sample. Use a function with the PL/SQL below to return the desired output.
SQL> declare
2 l_clob CLOB := 'Hello';
3 l_output CLOB;
4 begin
5 select xmlelement("test", xmlattributes(l_clob AS "a")).getclobval()
6 into l_output from dual;
7 end;
8 /
select xmlelement("test", xmlattributes(l_clob AS "a")).getclobval()
ERROR at line 5:
ORA-06550: line 5, column 44:
PL/SQL: ORA-00932: inconsistent datatypes: expected - got CLOB
ORA-06550: line 5, column 3:
PL/SQL: SQL Statement ignored
SQL> declare
2 l_vchar varchar2(32767) := 'Hello';
3 l_output CLOB;
4 begin
5 select xmlelement("test", xmlattributes(l_vchar AS "a")).getclobval()
6 into l_output from dual;
7 dbms_output.put_line(l_output);
8 end;
9 /
<test a="Hello"></test>
PL/SQL procedure successfully completed.

Convert characterset WE8MSWIN1252 to UTF8

Hi all
I am using Oracle 10g Database. The character set is currently WE8MSWIN1252, and I want to change it to UTF8. Is this possible?
Can anyone please post the steps involved?
Very urgent!
Regds
Nirmal

Subject: Changing WE8ISO8859P1/ WE8ISO8859P15 or WE8MSWIN1252 to (AL32)UTF8
Doc ID: Note:260192.1 Type: BULLETIN
Last Revision Date: 24-JUL-2007 Status: PUBLISHED
Changing the database character set to (AL32)UTF8
=================================================
When changing an Oracle Applications database:
Please see the following note for Oracle Applications databases:
Note 124721.1 Migrating an Applications Installation to a New Character Set
If you have any doubt, log an Oracle Applications TAR for assistance.
It might be useful to read this note even when using Oracle Applications,
since it explains what to do with "lossy" and "truncation" in the csscan output.
Scope:
You can't simply use "ALTER DATABASE CHARACTER SET" to go from WE8ISO8859P1 or
WE8ISO8859P15 or WE8MSWIN1252 to (AL32)UTF8 because (AL32)UTF8 is not a
binary superset of any of these character sets.
You will run into ORA-12712 or ORA-12710 because the code points for the
"extended ASCII" characters are different between these 3 character sets
and (AL32)UTF8.
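The "binary superset" statement can be checked byte by byte. The sketch below assumes Python's `latin-1`, `iso8859-15` and `cp1252` codecs correspond to WE8ISO8859P1, WE8ISO8859P15 and WE8MSWIN1252; the function name `unchanged_bytes` is made up for this illustration.

```python
def unchanged_bytes(source_charset, target_charset="utf-8"):
    """Return the byte values that are encoded identically in both character sets."""
    same = []
    for value in range(256):
        original = bytes([value])
        try:
            char = original.decode(source_charset)
        except UnicodeDecodeError:
            continue  # byte value not defined in the source character set
        if char.encode(target_charset) == original:
            same.append(value)
    return same


for name in ("latin-1", "iso8859-15", "cp1252"):
    same = unchanged_bytes(name)
    print(name, min(same), max(same), len(same))
```

Only byte values 0 to 127 survive unchanged, so relabeling the database without converting the data is safe only when every stored byte is plain ASCII.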
This note describes a method of still using
"ALTER DATABASE CHARACTER SET" in a limited way.
Note that we strongly recommend using the SAME flow when doing a full
export / import.
The choice between using FULL exp/imp and a PARTIAL exp/imp is made in point
7)
DO NOT USE THIS NOTE WITH ANY OTHER CHARACTERSETS
WITHOUT CHECKING THIS WITH ORACLE SUPPORT
THIS NOTE IS SPECIFIC TO CHANGING:
FROM: WE8ISO8859P1, WE8ISO8859P15 or WE8MSWIN1252
TO: AL32UTF8 or UTF8
AL32UTF8 and UTF8 are both Unicode character sets in the oracle database.
UTF8 encodes Unicode version 3.0 and will remain like that.
AL32UTF8 is kept up to date with the Unicode standard and encodes the Unicode
standards 3.0 (in database 9.0), 3.1 (database 9.2) or 3.2 (database 10g).
For the purposes of this note we shall only use AL32UTF8 from here on forward,
you can substitute that for UTF8 without any modifications.
If you use 8i or lower clients please have a look at
Note 237593.1 Problems connecting to AL32UTF8 databases from older versions (8i and lower)
WE8ISO8859P1, WE8ISO8859P15 or WE8MSWIN1252 are the 3 main character sets that
are used to store Western European or English/American data in.
All standard ASCII characters that are used for English/American do not have to
be converted into AL32UTF8 - they are the same in AL32UTF8. However, all other
characters, like accented characters, the Euro sign, MS "smart quotes", etc.
etc., have a different code point in AL32UTF8.
That means that if you make extensive use of these types of characters the
preferred way of changing to AL32UTF8 would be to export the entire database and
import the data into a new AL32UTF8 database.
However, if you mainly use standard ASCII characters and not a lot else (for
example if you only store English text, maybe with some Euro signs or smart
quotes here and there), then it could be a lot quicker to proceed with this
method.
In any case, please DO read this note before going to UTF8:
Note 119119.1 AL32UTF8 / UTF8 (unicode) Database Character Set Implications
and consider using CHAR semantics if on 9i or higher:
Note 144808.1 Examples and limits of BYTE and CHAR semantics usage
It's best to change the tables to CHAR semantics before the change
to UTF8.
This procedure is valid for Oracle 8i, 9i and 10g.
Note:
* If you are on 9i please make sure you are at least on Patch 9204, see
Note 250802.1 Changing character set takes a very long time and uses lots of rollback space
* If you have any function-based indexes on columns using CHAR length semantics
then these have to be removed and re-created after the character set has
been changed. Failure to do so will result in ORA-604 / ORA-2262 / ORA-904
when the "alter database character set" statement is used in step 8.
Actions to take:
1) install the csscan tool.
1A) For 10g use the csscan 2.x found in /bin, no need to install a newer version.
Go to 1C).
1B) For 9.2 and lower:
Please DO install version 1.2 or higher from TechNet for your version:
http://technet.oracle.com/software/tech/globalization/content.html
Copy all scripts and executables found in the zip file you downloaded
to your ORACLE_HOME, overwriting the old versions.
Go to 1C).
Note: do NOT use the CSSCAN of a 10g installation for 9i/8i!
1C)Run csminst.sql using these commands and SQL statements:
cd $ORACLE_HOME/rdbms/admin
set oracle_sid=<your SID>
sqlplus "sys as sysdba"
SQL>set TERMOUT ON
SQL>set ECHO ON
SQL>spool csminst.log
SQL> START csminst.sql
Check the csminst.log for errors.
If you get the error
"Character set migrate utility schema not compatible."
when running CSSCAN, then one of the following applies:
1ca) you are starting the old executable; please overwrite all old files with the files
from the newer version from TechNet (1.2 has more files than some older versions, that's normal).
1cb) you are not starting csscan from this ORACLE_HOME; check your PATH.
1cc) you have not run the csminst.sql from the newer version from TechNet.
More info is in Note 123670.1 Use Scanner Utility before Altering the Database Character Set
Please make sure you use/install csscan version 1.2.
2) Check if you have no invalid code points in the current character set:
Run csscan with the following syntax:
csscan FULL=Y FROMCHAR=<existing database character set> TOCHAR=<existing database character set> LOG=WE8check CAPTURE=Y ARRAY=1000000 PROCESS=2
Always run CSSCAN with 'sys as sysdba'
This will create 3 files :
WE8check.out a log of the output of csscan
WE8check.txt a Database Scan Summary Report
WE8check.err contains the rowid's of the rows reported in WE8check.txt
At this moment we are just checking that all data is stored correctly in the
current character set. Because you've entered the TO and FROM character sets as
the same you will not have any "Convertible" or "Truncation" data.
If all the data in the database is stored correctly at the moment then there
should only be "Changeless" data.
If there is any "Lossy" data then those rows contain code points that are not
currently stored correctly and they should be cleared up before you can continue
with the steps in this note. Please see the following note for clearing up any
"Lossy" data:
Note 225938.1 Database Character Set Healthcheck
Only if ALL data in WE8check.txt is reported as "Changeless" it is safe to
proceed to point 3)
NOTE:
If you have a WE8ISO8859P1 database with lossy data, then changing WE8ISO8859P1 to
WE8MSWIN1252 will most likely solve the lossy data.
Why? This is explained in
Note 252352.1 Euro Symbol Turns up as Upside-Down Questionmark
First run:
csscan FULL=Y FROMCHAR=WE8MSWIN1252 TOCHAR=WE8MSWIN1252 LOG=1252check CAPTURE=Y ARRAY=1000000 PROCESS=2
Always run CSSCAN with 'sys as sysdba'
For 9i, 8i:
Only if ALL data in 1252check.txt is reported as "Changeless" it is safe to
proceed to the next point. If not, log a tar and provide the 3 generated files.
Shut down the listener and any application that connects locally to the database.
There should be only ONE connection to the database during the WHOLE time, and that's
the sqlplus session where you do the change.
2.1. Make sure the parallel_server parameter in INIT.ORA is set to false or it is not set at all.
If you are using RAC see
Note 221646.1 Changing the Character Set for a RAC Database Fails with an ORA-12720 Error
2.2. Execute the following commands in sqlplus connected as "/ AS SYSDBA":
SPOOL Nswitch.log
SHUTDOWN IMMEDIATE;
STARTUP MOUNT;
ALTER SYSTEM ENABLE RESTRICTED SESSION;
ALTER SYSTEM SET JOB_QUEUE_PROCESSES=0;
ALTER SYSTEM SET AQ_TM_PROCESSES=0;
ALTER DATABASE OPEN;
ALTER DATABASE CHARACTER SET WE8MSWIN1252;
SHUTDOWN IMMEDIATE;
STARTUP RESTRICT;
SHUTDOWN;
The extra restart/shutdown is necessary in Oracle8(i) because of a SGA
initialization bug which is fixed in Oracle9i.
-- an alter database typically takes only a few minutes or less;
-- it depends on the number of columns in the database, not the amount of data
2.3. Restore the parallel_server parameter in INIT.ORA, if necessary.
2.4. STARTUP;
Now go to point 3) of this note. Your database is then WE8MSWIN1252, so
you need to replace <existing database character set> with WE8MSWIN1252 from now on.
For 10g and up:
When using CSSCAN 2.x (10g database) you should see in 1252check.txt this:
All character type data in the data dictionary remain the same in the new character set
All character type application data remain the same in the new character set
and
The data dictionary can be safely migrated using the CSALTER script
IF you see this then you need first to go to WE8MSWIN1252
If not, log a tar and provide all 3 generated files.
Shut down the listener and any application that connects locally to the database.
There should be only ONE connection to the database during the WHOLE time, and that's
the sqlplus session where you do the change.
Then you do in sqlplus connected as "/ AS SYSDBA":
-- check if you are using spfile
sho parameter pfile
-- if the value shown contains "spfile" then you are using an spfile;
-- in that case note the current values of
sho parameter job_queue_processes
sho parameter aq_tm_processes
-- (this is Bug 6005344 fixed in 11g )
-- then do
shutdown immediate
startup restrict
SPOOL Nswitch.log
@@?\rdbms\admin\csalter.plb
-- Csalter will ask for confirmation - do not copy and paste all the actions in one go
-- sample Csalter output:
-- 3 rows created.
-- This script will update the content of the Oracle Data Dictionary.
-- Please ensure you have a full backup before initiating this procedure.
-- Would you like to proceed (Y/N)?y
-- old 6: if (UPPER('&conf') <> 'Y') then
-- New 6: if (UPPER('y') <> 'Y') then
-- Checking data validility...
-- begin converting system objects
-- PL/SQL procedure successfully completed.
-- Alter the database character set...
-- CSALTER operation completed, please restart database
-- PL/SQL procedure successfully completed.
-- Procedure dropped.
-- if you are using spfile then you need to also
-- ALTER SYSTEM SET job_queue_processes=<original value> SCOPE=BOTH;
-- ALTER SYSTEM SET aq_tm_processes=<original value> SCOPE=BOTH;
shutdown
startup
The 10g database will now be WE8MSWIN1252.
Now go to point 3) of this note. Your database is then WE8MSWIN1252, so
you need to replace <existing database character set> with WE8MSWIN1252 from now on.
3) Check which rows contain data for which the code point will change
Run csscan with the following syntax:
csscan FULL=Y FROMCHAR=<your database character set> TOCHAR=AL32UTF8 LOG=WE8TOUTF8 CAPTURE=Y ARRAY=1000000 PROCESS=2
Always run CSSCAN with 'sys as sysdba'
This will create 3 files :
WE8TOUTF8.out a log of the output of csscan
WE8TOUTF8.txt a Database Scan Summary Report
WE8TOUTF8.err contains the rowids of the rows reported in WE8TOUTF8.txt
+ You should have NO entries under Lossy, because they should have been filtered
out in step 2). If you have data under Lossy then please redo step 2).
+ If you have any entries under Truncation then go to step 4).
+ If you only have entries for Convertible (and Changeless) then solve those in
step 5).
+ If you have NO entries under Convertible, Truncation or Lossy,
and all data is reported as "Changeless", then proceed to step 6).
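The four csscan categories used above can be approximated for a single stored value. This is a rough sketch of the idea only and not csscan's actual logic; the function `classify`, its parameters and the sample values are hypothetical, and Python's `cp1252` codec is assumed to stand in for WE8MSWIN1252.

```python
def classify(raw, source_charset, column_bytes):
    """Roughly mimic the csscan categories for one stored value (illustration only)."""
    try:
        text = raw.decode(source_charset)
    except UnicodeDecodeError:
        return "Lossy"          # bytes that are not valid in the declared character set
    converted = text.encode("utf-8")
    if converted == raw:
        return "Changeless"     # pure ASCII: identical bytes before and after
    if len(converted) > column_bytes:
        return "Truncation"     # convertible, but no longer fits the column
    return "Convertible"        # bytes change, and the result still fits


print(classify(b"Geneva", "cp1252", 10))
print(classify(b"Gen\xe8ve", "cp1252", 10))
print(classify(b"Gen\xe8ve", "cp1252", 6))
print(classify(b"\x81", "cp1252", 10))
```

The `column_bytes` argument plays the role of a column defined with BYTE length semantics, which is what makes the third value fall under Truncation.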
4) If you have Truncation entries.
Whichever way you migrate from WE8(...) to AL32UTF8, you will always have to
solve the entries under Truncation.
Standard ASCII characters require 1 byte of storage space both in WE8(...) and
in AL32UTF8. Other characters (like accented characters and the Euro
sign) require only 1 byte of storage space in WE8(...), but they require 2 or
more bytes of space in AL32UTF8.
That means that the total amount of space needed to store a string can exceed
the defined column size.
For more information about this see:
Note 119119.1 AL32UTF8 / UTF8 (unicode) Database Character Set Implications
and
"Truncation" data is always also "Convertible" data, which means that whatever
else you do, these rows have to be exported before the character set is changed
and re-imported after the character set has changed. If you proceed with that
without dealing with the truncation issue then the import will fail on these
columns because the size of the data exceeds the maximum size of the column.
So these truncation issues will always require some work, there are a number of
ways to deal with them:
A) Update these rows in the source database so that they contain less data
B) Update the table definition in the source database so that it can contain
longer data. You can do this by either making the column larger, or by using
CHAR length semantics instead of BYTE length semantics (only possible in
Oracle9i and higher).
C) Pre-create the table before the import so that it can contain 'longer' data.
Again you have a choice between simply making it larger, or switching from BYTE
to CHAR length semantics.
If you've chosen option A or B then please rerun csscan to make sure there is no
Truncation data left. If that also means there is no Convertible data left then
proceed to step 6), otherwise proceed to step 5).
To find out how much the data expands, simply check the csscan output.
You can find that in the .err file as "Max Post Conversion Data Size".
For example, check in the .txt file which table has "Truncation".
Let's assume you have a row there that says:
-- snip from WE8TOUTF8.txt
[Distribution of Convertible, Truncated and Lossy Data by Table]
USER.TABLE        Convertible  Truncation  Lossy
SCOTT.TESTUTF8             69           6      0
-- snip from WE8TOUTF8.txt
Then look in the .err file for "TESTUTF8" until you find an entry where the
"Max Post Conversion Data Size" is bigger than the column size for that table.
User : SCOTT
Table : TESTUTF8
Column: ITEM_NAME
Type : VARCHAR2(80)
Number of Exceptions : 6
Max Post Conversion Data Size: 81
-> the max size after going to UTF8 will be 81 bytes for this column.
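A value that reproduces the numbers in this example can be constructed by hand. The string below is hypothetical (79 ASCII letters plus one accented character), and Python's `cp1252` codec is assumed to stand in for the WE8 source character set.

```python
def post_conversion_size(text):
    """Return the number of bytes the text needs once stored as UTF-8."""
    return len(text.encode("utf-8"))


value = "x" * 79 + "\u00e9"            # hypothetical 80-character item name

before = len(value.encode("cp1252"))   # 1 byte per character in the WE8 character set
after = post_conversion_size(value)    # the accented character now needs 2 bytes

print(len(value), before, after)
```

The value fits a VARCHAR2(80) column defined in bytes before the change but needs 81 bytes afterwards. With CHAR length semantics the limit would be counted in characters, and the value is still 80 characters long.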
5) If you have Convertible entries.
This is where you have to make a choice whether or not you want to continue
on this path or if it's simpler to do a complete export/import in the
traditional way of changing character sets.
All the data that is marked as Convertible needs to be exported and then
re-imported after the character set has changed.
6) check if you have functional indexes on CHAR based columns and purge the RECYCLEBIN.
select OWNER, INDEX_NAME , INDEX_TYPE, TABLE_OWNER, TABLE_NAME, STATUS,
FUNCIDX_STATUS from ALL_INDEXES where INDEX_TYPE not in
('NORMAL', 'BITMAP','IOT - TOP') and TABLE_NAME in (select unique
(table_name) from dba_tab_columns where char_used ='C');
if this gives rows back then the change will fail with
ORA-30556: functional index is defined on the column to be modified
if you have functional indexes on CHAR based columns you need to drop the
index and recreate after the change , note that a disable will not be enough.
On 10g, check while connected as sysdba whether there are objects in the recyclebin:
SQL> show recyclebin
If so, also do a PURGE DBA_RECYCLEBIN; otherwise you will receive an ORA-38301 during CSALTER.
7) Choose on how to do the actual change
you have 2 choices now:
Option 1 - exp/imp the entire database and stop using the rest of this note.
a. Export the current entire database (with NLS_LANG set to <your old
database character set>)
b. Create a new database in the AL32UTF8 character set
c. Import all data into the new database (with NLS_LANG set to <your old database character set>)
d. The conversion is complete, do not continue with this note.
note that you do need to deal with truncation issues described in step 4), even
if you use the export/import method.
Option 2 - export only the convertible data and continue using this note.
For 9i and lower:
a. If you have "convertible" data for the sys objects SYS.METASTYLESHEET,
SYS.RULE$ or SYS.JOB$ then follow the following note for those objects:
Note 258904.1 Convertible data in data dictionary: Workarounds when changing character set
make sure to combine the next steps in the example script given in that note.
b. Export all the tables that csscan shows have convertible data
(make sure that the character set part of the NLS_LANG is set to the current
database character set during the export session)
c. Truncate those tables
d. Run csscan again to verify you only have "changeless" application data left
e. If this now reports only Changeless data then proceed to step 8), otherwise
do the same again for the rows you've missed out.
For 10g and up:
a. Export all the USER tables that csscan shows have convertible data
(make sure that the character set part of the NLS_LANG is set to the current
database character set during the export session)
b. Fix any "convertible" in the SYS schema, note that the 10g way to change
the characterset (= the CSALTER script) will deal with any CLOB data in the
sys schema. All "no 9i only" fixes in
Note 258904.1 Convertible data in data dictionary: Workarounds when changing character set
should NOT be done in 10g
c. Truncate the exported user tables.
d. Run csscan again to verify you only have "changeless" application data left
e. If this now reports only Changeless data then proceed to step 8), otherwise
do the same again for the rows you've missed out.
When using CSSCAN 2.x (10g database) you should see in WE8TOUTF8.txt this:
The data dictionary can be safely migrated using the CSALTER script
If you do NOT have this when working on a 10g system CSALTER will NOT work and this
means you have missed something or not followed all steps in this note.
8) Perform the character set change:
Perform a backup of the database.
Check the backup.
Double-check the backup.
For 9i and below:
Then use the "alter database" command, this changes the current database
character set definition WITHOUT changing the actual stored data.
Shutdown the listener and any application that connects locally to the database.
There should be only ONE connection to the database during the WHOLE time and that's
the sqlplus session where you do the change.
1. Make sure the parallel_server parameter in INIT.ORA is set to false or it is not set at all.
If you are using RAC see
Note 221646.1 Changing the Character Set for a RAC Database Fails with an ORA-12720 Error
2. Execute the following commands in sqlplus connected as "/ AS SYSDBA":
SPOOL Nswitch.log
SHUTDOWN IMMEDIATE;
STARTUP MOUNT;
ALTER SYSTEM ENABLE RESTRICTED SESSION;
ALTER SYSTEM SET JOB_QUEUE_PROCESSES=0;
ALTER SYSTEM SET AQ_TM_PROCESSES=0;
ALTER DATABASE OPEN;
ALTER DATABASE CHARACTER SET INTERNAL_USE AL32UTF8;
SHUTDOWN IMMEDIATE;
-- an ALTER DATABASE typically takes only a few minutes or less;
-- it depends on the number of columns in the database, not the amount of data
3. Restore the parallel_server parameter in INIT.ORA, if necessary.
4. STARTUP;
Without INTERNAL_USE you get an ORA-12712: new character set must be a superset of old character set
WARNING WARNING WARNING
NEVER use "INTERNAL_USE" unless you have followed the guidelines in this note STEP BY STEP
and you have a good idea of what you are doing.
NEVER use "INTERNAL_USE" to "fix" display problems; follow Note 225938.1 instead.
If you use the INTERNAL_USE clause on a database where there is data listed
as convertible without exporting that data first, then the data will be corrupted by
changing the database character set!
For 10g and up:
Shutdown the listener and any application that connects locally to the database.
There should be only ONE connection to the database during the WHOLE time and that's
the sqlplus session where you do the change.
Then you do in sqlplus connected as "/ AS SYSDBA":
-- check if you are using spfile
sho parameter pfile
-- if this shows a value for "spfile" then you are using an spfile;
-- in that case note the current values of:
sho parameter job_queue_processes
sho parameter aq_tm_processes
-- (this is Bug 6005344 fixed in 11g )
-- then do
shutdown
startup restrict
SPOOL Nswitch.log
@@?\rdbms\admin\csalter.plb
-- Csalter will ask for confirmation - do not copy and paste all the commands at once
-- sample Csalter output:
-- 3 rows created.
-- This script will update the content of the Oracle Data Dictionary.
-- Please ensure you have a full backup before initiating this procedure.
-- Would you like to proceed (Y/N)?y
-- old 6: if (UPPER('&conf') <> 'Y') then
-- new 6: if (UPPER('y') <> 'Y') then
-- Checking data validility...
-- begin converting system objects
-- PL/SQL procedure successfully completed.
-- Alter the database character set...
-- CSALTER operation completed, please restart database
-- PL/SQL procedure successfully completed.
-- Procedure dropped.
-- if you are using spfile then you need to also
-- ALTER SYSTEM SET job_queue_processes=<original value> SCOPE=BOTH;
-- ALTER SYSTEM SET aq_tm_processes=<original value> SCOPE=BOTH;
shutdown
startup
and the 10g database will be AL32UTF8
9) Reload the data pump packages after a change to AL32UTF8 / UTF8 in Oracle10
If you use Oracle10 then the datapump packages need to be reloaded after
a conversion to UTF8/AL32UTF8. In order to do this run the following 3
scripts from $ORACLE_HOME/rdbms/admin in sqlplus connected as "/ AS SYSDBA":
For 10.2.X:
catnodp.sql
catdph.sql
catdpb.sql
For 10.1.X:
catnodp.sql
catdp.sql
10) Reimporting the exported data:
If you exported any data in step 5) then you now need to reimport that data.
Make sure that the character set part of the NLS_LANG is still set to the
original database character set during the import session (just as it was during
the export session).
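The note repeatedly asks for the character set part of NLS_LANG to match the original database character set during both the export and the import session. A minimal sketch of how a wrapper script might enforce that is shown below. The helper names `with_charset` and `session_env` and the fallback value `AMERICAN_AMERICA` are illustrative assumptions, not part of the note.

```python
def with_charset(nls_lang: str, charset: str) -> str:
    """Return nls_lang with its character set part replaced by charset.

    NLS_LANG has the form LANGUAGE_TERRITORY.CHARSET; everything before the
    first dot is kept and everything after it is replaced.
    """
    prefix, _, _old_charset = nls_lang.partition(".")
    return f"{prefix}.{charset}"


def session_env(base_env: dict, db_charset: str) -> dict:
    """Copy base_env and force the NLS_LANG character set to db_charset.

    The AMERICAN_AMERICA fallback is an assumption used only when NLS_LANG
    is not set at all.
    """
    env = dict(base_env)
    env["NLS_LANG"] = with_charset(env.get("NLS_LANG", "AMERICAN_AMERICA"), db_charset)
    return env


if __name__ == "__main__":
    # Hypothetical values: a client left on UTF8, old database in WE8MSWIN1252.
    print(session_env({"NLS_LANG": "AMERICAN_AMERICA.UTF8"}, "WE8MSWIN1252")["NLS_LANG"])
```

The returned dictionary can be passed as the `env=` argument of `subprocess.run` when launching the export or import tool, so that both sessions run with the same setting.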
11) Verify the clients NLS_LANG:
Make sure your clients are using the correct NLS_LANG setting:
Regards,
Chotu,
Bangalore

PDF printing with UTF8 characterset

Hi,
Stats:
E-Biz version: 11.5.10.2
DB: 10.2.0.3
Report Builder: 6i
Character set: UTF8
I have some doubts about viewing/printing PDF reports from Oracle Applications. Can you please suggest whether it is possible to print PDFs from Oracle Apps with the UTF8 character set and without having 3rd party software to convert the PDF report to PostScript and a printer that can understand PostScript? We do have PASTA and IX configured in our environment. The reports are created in BI/XML Publisher.
We currently have one 3rd party tool for printing and want to get rid of it due to financial constraints. Can you please suggest whether it is possible to print PDFs from Oracle Apps without a 3rd party tool? I have read a lot of documents and am completely lost. In one place Oracle says it is possible, and in another it says 'If you are on UTF8 character set then you need to have XML/BI publisher with PASTA and 3rd party software', yet in my environment some PDF reports bypass the 3rd party software for printing. I am just lost.
Any help would be appreciated
Thanks,
JD

Hi,
This line confuses me, do I require 3rd party software to print from Oracle Apps if my character set is UTF8 (although I am using XML Publisher)?
I believe yes -- see Note: 422508.1 - About Oracle XML Publisher Release 5.6.3, Step 8, Enable PDF Printing in Oracle Applications (System Administrator).
Workaround section says
6. From Oracle Applications via the Adobe Acrobat Reader, a PDF output file can be viewed and
printed.
>>>> No, I do not want to use it.
This is a workaround to print PDF directly from the application. If you do not want to use this approach, use the other one mentioned above.
7. Although, it may be possible to create a custom print driver or print program using the Adobe
Acrobat Distiller, viable instructions on how to perform such a custom setup is very scarce, see
Note 262657.1 -- intended for Latin 1 character set environments.
>>>> I am on UTF8.
I would not recommend this approach as this may (or may not) work.
8. Use Pasta, an Oracle post printing program, to change a copy of the report output file to the
desired or printable output format.
>>>> I have PASTA, but does that mean I can print all my PDF reports using it? Don't I need a 3rd party tool now?
See Note: 239196.1 - PASTA 3.0 Release Information.
9. Lastly, use another format like Postscript, that is fully supported for viewing and printing.
>>>> How to use it? Can anyone please explain?
Change the output of the concurrent program to Postscript (Concurrent > Program > Define), and you can view the output using Ghost Viewer.
Note: 117112.1 - How to read Postscript File Formats on a MS Windows Operating System and Convert To Another File Format
Thanks,
Hussein

Convertion of Danish characters in UTF8

Hello everybody,
I am facing a strange situation on my project.
We are not able to convert a special/national Danish character into UTF (e.g. convert “JordbÃ¿Å r” to “Jordbær JordbÿŠr”).
I am using the built-in function CONVERT and an Oracle 9.2 database.
Do you have any idea how can I solve this?
Thanks,
Ionut

Hi Ionut,
How do you tell it doesn't work?
In my system I have plenty of Jordbær:
SQL> select * from nls_database_parameters where parameter='NLS_CHARACTERSET';
PARAMETER VALUE
NLS_CHARACTERSET WE8MSWIN1252
SQL> select 'Jordbær', convert('Jordbær', 'WE8MSWIN1252', 'UTF8') from dual;
'JORDBæ CONVERT
Jordbær Jordb+r
SQL>
@Ravi Kumar, you reversed the arguments, I think.
Regards
Peter
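Garbled output of the kind discussed in this thread typically appears when UTF-8 bytes are read as if they were a single-byte Western character set. The following is a minimal sketch of that effect outside the database, assuming Python's `cp1252` codec as the counterpart of WE8MSWIN1252. It does not call Oracle's CONVERT function; it only illustrates the byte-level behaviour.

```python
original = "Jordbær"

# In UTF-8 the letter æ (U+00E6) becomes the two bytes C3 A6.
utf8_bytes = original.encode("utf-8")

# Reading those bytes as Windows-1252 turns each byte into its own character.
misread = utf8_bytes.decode("cp1252")

# Reversing the wrong step recovers the text, provided no byte was lost.
repaired = misread.encode("cp1252").decode("utf-8")

print(utf8_bytes.hex(" "))
print(misread)
print(repaired)
```

If the stored value already looks like the `misread` string, the data was most likely converted once too often or in the wrong direction, which would be consistent with the reversed-arguments remark above.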

How to convert CLOB to UTF8

Hi all,
We have a batch job that runs daily and produces XML with Java code. The generated XML is used for various purposes.
Recently the job failed because of some special characters in the XML that fall outside the ANSI encoding standards.
So we need to convert the CLOB data (the input to the Java code) to UTF8-encoded XML.
We have not been able to achieve this.
Right now the CLOB data is converted to an ASCII stream, which doesn't produce well-formed XML under the UTF8 encoding standard. See the code below:
Clob xmlCLOB = (Clob)clobInfo.get("clobfield");
InputStream is = xmlCLOB.getAsciiStream();
Any thoughts on how to convert this CLOB to UTF8?
Regards,
NaG
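The likely reason an ASCII stream breaks the XML is that ASCII has no representation for characters above code point 127. The sketch below is a language-neutral illustration in Python rather than a fix for the Java code, and the sample XML string is hypothetical. In Java, the analogous approach would probably be to read characters with `Clob.getCharacterStream()` and write them through an `OutputStreamWriter` created with the UTF-8 charset.

```python
xml_text = '<?xml version="1.0" encoding="UTF-8"?><item>Jordbær</item>'

# Forcing the text through ASCII destroys every non-ASCII character.
ascii_bytes = xml_text.encode("ascii", errors="replace")

# Encoding as UTF-8 keeps the character and matches the XML declaration.
utf8_bytes = xml_text.encode("utf-8")

print(ascii_bytes)
print(utf8_bytes)
print(utf8_bytes.decode("utf-8") == xml_text)
```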

Joan,
I don't know if this will help with conversion of your BFILE, but at
http://www.xml.com/lpt/a/2000/04/26/encodings/xmlparser.html
and at
http://xmlsoft.org/encoding.html
there is some information on conversion to UTF8.
Hope it helps. Let us know.
Dave

How to convert from UNICODE (UTF16) to UTF8 and vice-versa in JAVA.

Hi
I want to insert a string in UTF16 format into the database. How do I convert from UTF16 to UTF8 and vice versa in Java? What type must the database field be? Do I need a special setup for the database (Oracle 8i)?
thanks

I'm not sure if this is the correct topic, but we are having problems accessing our Japanese data stored in UTF-8 in our Oracle database using the JDBC thin driver. The data is submitted and extracted correctly using ODBC drivers, but inspection of the same data retrieved from the GetString() call using the JDBC thin driver shows frequent occurrences of bytes like "FF", which are not expected in either UTF8 or UCS2. My understanding is that accessing UTF8 in Java should involve NO NLS translation, since we are simply going from one Unicode encoding to another.
We are using Oracle version 8.0.4.
Can you tell me what we are doing wrong?
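One thing that can be checked independently of the driver: the byte FF never occurs in well-formed UTF-8, but it is quite normal in UCS2/UTF-16 for Japanese text, because fullwidth and halfwidth forms live in the range U+FF00 to U+FFEF. The sketch below only demonstrates that property with a hypothetical sample string; whether it explains the bytes seen through the thin driver is an assumption that would have to be verified against the actual data.

```python
sample = "ＡＢＣ！"  # fullwidth letters and punctuation, U+FF21..U+FF23 and U+FF01

ucs2 = sample.encode("utf-16-be")  # two bytes per character, high byte first
utf8 = sample.encode("utf-8")      # three bytes per character here

print(ucs2.hex(" "))
print(utf8.hex(" "))
print("FF in UCS2 form:", 0xFF in ucs2)
print("FF in UTF-8 form:", 0xFF in utf8)
```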

Latin-1 Characterset Translation Issues

I have an Oracle 9.2.0.5 database on OpenVMS 7.3-2. Currently, there are 101 incorrect Latin-1 to Latin-1 character set translations being loaded into my Oracle database (incorrect OS conversion tables when data is transferred from the source system).
NLS DB parameters (nls parameters not listed are default values):
nls_language string AMERICAN
nls_length_semantics string BYTE
nls_nchar_conv_excp string FALSE
nls_territory string AMERICA
example:
Source Data : Résine de PolyPropylène
Loaded in my database after OS translation: R©sine de PolyPropyl¬ne
The invalid translations happen outside the Oracle database, at the OS level. My problem is that I need to correct all the invalid characters that are in my database. The database is currently 3.5TB in size, so I have to do this in an efficient manner. I know the before values (invalid translations, in hex) and the after values (correct translations, in hex).
Is there a PL/SQL program or Oracle tool that can help me correct these values across millions of rows of data in Oracle (basically a character set translation program)?
I have a C program that converts the character sets if the data is in a CSV file. The problem is that it takes too long to extract the data from Oracle into CSV files for tables with many millions of rows.
Any help is appreciated.
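Since the wrong and the correct byte values are known, the repair amounts to a fixed byte-for-byte substitution. A minimal sketch of that idea follows. The two pairs in `FIXES` are inferred from the example in this post, assuming Latin-1 code points (© is A9, é is E9, ¬ is AC, è is E8); the real table would hold all 101 pairs. It also assumes that a wrong byte value never occurs legitimately in the data, which would need checking first.

```python
# Wrong byte -> correct byte. Illustrative pairs only, taken from the
# "R©sine de PolyPropyl¬ne" example; not the poster's actual list.
FIXES = {
    0xA9: 0xE9,  # © should be é
    0xAC: 0xE8,  # ¬ should be è
}

# 256-entry translation table: every byte maps to itself unless listed above.
TABLE = bytes(FIXES.get(b, b) for b in range(256))


def repair(raw: bytes) -> bytes:
    """Apply the single-byte substitutions in one pass over the data."""
    return raw.translate(TABLE)


if __name__ == "__main__":
    damaged = "R©sine de PolyPropyl¬ne".encode("latin-1")
    print(repair(damaged).decode("latin-1"))
```

Inside the database the same mapping could probably be expressed with the SQL TRANSLATE function in an UPDATE, which would avoid extracting the rows to CSV at all. That is a suggestion to evaluate, not something tested against a 3.5TB database.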

It looks like the Latin-1 string was not converted to UTF8 during the insertion from ASP. Hence you are storing Latin-1 encoding inside a UTF-8 database.
I thought it would automatically be handled by OO4O.
True. Did you set the character set of the NLS_LANG environment variable for the OO4O client to WE8ISO8859P1? If it was set to UTF8 then Oracle will assume that the data coming through the ASP page is in UTF-8, hence no conversion takes place.
You should also check the CODEPAGE directive and Charset property in your ASP.

SQLLOAD with various Charactersets

Hello,
I am working on an Oracle EBS project (version 11i) with Oracle DB 11G hosted on Oracle Linux.
Character set is UTF8 on the database since we have some Polish users.
Our ERP is logically interfaced with many systems from which we receive ASCII datafiles that we need to upload in our DB using SQLLOAD utility.
The problem we have is that depending on the sending system, the characterset of a datafile can vary among following values :
- EE8MSWIN1250 => files sent by our Polish Subsidiary
- WE8MSWIN1252 => files sent by WINDOWS systems
- WE8ISO8859P1 => files sent by some Unix Systems
We have developed a specific Linux shell script that submits SQLLOAD for datafiles with the appropriate control file "CHARACTERSET" option.
The problem is that until now I have not been able to detect the character set of a given datafile precisely.
- the Linux command "file -i" returns "text/plain; charset=iso-8859-1" even for a Windows file encoded with WINDOWS-1252 or WINDOWS-1250
- I also tried the Linux command iconv to convert the file to UTF8, but this command succeeds whatever "from" character set we specify (ISO-8859-1 / WINDOWS-1252 / WINDOWS-1250)
My Question :
How can I determine precisely the characterset of a given ASCII datafile in order to set correctly the CHARACTERSET option of SQLLOAD control file ?
(in batch mode on Linux)
Browsers such as IE, Chrome or Firefox are able to do that (detect the character set of a web page to display it correctly), so I suppose that a tool or command should exist for that purpose.
Thanks in advance for helping and sharing experience.
Karim Helali
Toshiba France

Thank you Sergiusz: the lcsscan tool gives quite good results and it may be a solution for us.
The only issue is that lcsscan is only available in recent Oracle DB releases (10 and 11).
Although our database server is on release 11g, the EBS applications server is on Oracle 8i due to Oracle Forms restrictions.
As SQLLOAD is run from the applications server, I have to run the lcsscan tool over SSH on the DB server.
So I will leave the question open for a few more days in case someone knows a Linux command or tool that does the same check as lcsscan.
Note: We are also considering the other solution you mention, i.e. assigning an agreed character set to each sending system.
Thank you again and best regards
Karim Helali
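Single-byte character sets cannot be told apart with certainty, which is why iconv succeeds for every "from" value: each byte is valid in all three candidates. What remains is a heuristic. The sketch below is one possible approach restricted to the three character sets named in this thread, not a replacement for lcsscan. The function name `guess_oracle_charset`, the marker bytes and the 0.3 threshold are all assumptions chosen for illustration and would need tuning on real files.

```python
# Bytes that are Polish lowercase letters in Windows-1250 but rare symbols
# in Windows-1252: ł (B3), ą (B9), ż (BF), ś (9C), ź (9F).
POLISH_MARKERS = frozenset(b"\xb3\xb9\xbf\x9c\x9f")


def guess_oracle_charset(data: bytes, polish_ratio: float = 0.3) -> str:
    """Guess an Oracle character set name for a datafile (heuristic only)."""
    high = [b for b in data if b >= 0x80]
    if not high:
        return "US7ASCII"            # pure 7-bit data loads the same everywhere
    try:
        data.decode("utf-8")         # valid multi-byte UTF-8 is unlikely by accident
        return "AL32UTF8"
    except UnicodeDecodeError:
        pass
    markers = sum(1 for b in high if b in POLISH_MARKERS)
    if markers / len(high) >= polish_ratio:
        return "EE8MSWIN1250"
    # 80..9F are control codes in ISO-8859-1 but printable in Windows-1252.
    if any(0x80 <= b <= 0x9F for b in high):
        return "WE8MSWIN1252"
    return "WE8ISO8859P1"


if __name__ == "__main__":
    print(guess_oracle_charset("Zażółć gęślą jaźń".encode("cp1250")))
    print(guess_oracle_charset("prix: 10 € – Résine".encode("cp1252")))
    print(guess_oracle_charset("Résine de PolyPropylène".encode("latin-1")))
```

A file that contains only letters shared by Windows-1252 and ISO-8859-1 is reported as WE8ISO8859P1, which is harmless because those bytes mean the same in both. Agreeing on a character set per sending system remains the more reliable option.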

ADADMIN Convert Character Set CUSTOM_TOP

Hi All,
I am in the process of converting my 11i instance to UTF8. The database has been converted using DMU. The application folder structure is converted using the ADADMIN utility.
The conversion of APPL_TOP went fine, with no issues.
The conversion of CUSTOM_TOP caused issues: the file permissions have changed and none of the files in the custom top has execute permission.
Has anyone done a character set conversion in the past?
How should the CUSTOM_TOP character set conversion be handled? Can we just exclude CUSTOM_TOP from the conversion to UTF8?
Regards
Sridhar M

The files that were present in the CUSTOM_TOP during the conversion had their permissions changed after the conversion, i.e. none of the files had the execute privilege and hence the users were getting permission denied errors on the CUSTOM objects.
You can simply set the execute permission on those files.
If you have followed any metalink document please share it with me.
Please see the docs referenced in these threads.
Character Set Migration
Oracle E-Biz Applications Database characterset conversion
MLS Data Character Set Conversion
http://forums.oracle.com/forums/search.jspa?threadID=&q=Convert+AND+Characterset&objID=c3&dateRange=all&userID=&numResults=15&rankBy=10001
http://forums.oracle.com/forums/search.jspa?threadID=&q=Migrate+AND+Characterset&objID=c3&dateRange=all&userID=&numResults=15&rankBy=10001
Thanks,
Hussein
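Setting the execute bit by hand works, but it requires knowing which files had it before. One way to avoid guessing on the next run is to record the file modes before the conversion and reapply them afterwards. The following is a minimal sketch of that idea; `snapshot_modes` and `restore_modes` are hypothetical helper names, and the approach assumes the snapshot is taken before the conversion (or from an unconverted copy with the same paths).

```python
import os
import stat


def snapshot_modes(root: str) -> dict:
    """Record the permission bits of every regular file under root."""
    modes = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            modes[path] = stat.S_IMODE(os.stat(path).st_mode)
    return modes


def restore_modes(modes: dict) -> list:
    """Reapply recorded permission bits; return the paths that were changed."""
    changed = []
    for path, mode in modes.items():
        if os.path.exists(path) and stat.S_IMODE(os.stat(path).st_mode) != mode:
            os.chmod(path, mode)
            changed.append(path)
    return changed
```

Typical use would be to call `snapshot_modes` on the custom top directory before the conversion, keep the result, and call `restore_modes` with it once the conversion has finished.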
