Non-Latin character sets and accented Latin characters with reFind

I need to use reFind to deal with strings containing accented
characters like žittâ lísu, but it doesn't seem to
find them. Also, when I use it with Cyrillic characters, it won't
find individual characters, but if I test for [\w] it works.
I found a LiveDocs page that says CF uses the Java Unicode
standard for characters. Is it possible to use reFind with non-Latin
or accented characters, or do I have to write my
own Java?

ogre11 wrote:
> I need to use refind to deal with strings containing accented characters like
> žittâ lísu, but it doesn't seem to find them. Also when using it with cyrillic
> characters , it won't find individual characters, but if I test for [\w] it'll
> work.
Works fine for me using Unicode data:
<cfprocessingdirective pageencoding="utf-8">
<cfscript>
t="Tá mé in ann gloine a ithe; Ní chuireann sé isteach nó amach orm";
s="á";
writeoutput("search:=#t#<br>for:=#s#<br>found at:=#reFind(s,t,1,false)#");
</cfscript>
What's the encoding for your data?
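
The question about the data's encoding matters because a regex search can only match characters that were decoded correctly in the first place. The following is a minimal sketch in Python rather than ColdFusion; the helper `find` is hypothetical and only mimics reFind's 1-based position with 0 for "not found". It assumes the data arrives as UTF-8 bytes and compares decoding them correctly with decoding them as Latin-1.

```python
import re

text = "Tá mé in ann gloine a ithe"
raw = text.encode("utf-8")              # bytes as they might arrive from a file or database

decoded_ok = raw.decode("utf-8")        # correct decoding
decoded_wrong = raw.decode("latin-1")   # wrong decoding: every accented letter becomes two characters

def find(pattern, s):
    """Hypothetical helper: 1-based position of the first match, 0 if none."""
    m = re.search(pattern, s)
    return m.start() + 1 if m else 0

print(find("á", decoded_ok))      # the accented letter is found
print(find("á", decoded_wrong))   # not found: the text no longer contains that character
print(find("ж", "книжка"))        # single Cyrillic characters match the same way
```

If the search fails only on real data, checking how that data was decoded is usually a better first step than changing the pattern.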

Similar Messages

Character sets and conversions

Hi all,
we're facing a quite complex problem, and I'm not even able to say where it is going wrong or what needs configuring, partly for lack of experience and partly because it combines different technical areas, of which I'm only responsible for some.
So I'll briefly sketch the situation, and hopefully you can give me some guidelines or hints as to where to look.
The setup: a web application (so clients access it with a browser) on a WebLogic/Linux platform, Tuxedo on iSeries, and, as far as I understand, some DB internal to the iSeries where the data is stored.
Data is entered in the DB by use of some data-entry application that comes with the iSeries.
The problem: when consulting data through the web application, some characters don't show up correctly, e.g. @ in email addresses, e's with accents, ...
For the chain "browser <-> WL <-> Tuxedo <-> DB", the problem might be at different points. But with tracing activated, we could see that the response going out of Tuxedo to WL is not correct...
Any hint as to what to look for, or which configuration is important, would be welcome...
Some sub-questions:
- I understand Tuxedo is always "installed" in English, with no other option. This means that, for example, logs are in English.
But can I, or do I need to, define some character set?
- Between Tuxedo <-> DB, can you use some conversion tables?
Any help would be appreciated, we're quite lost...

Hi,
Given that you are running Tuxedo on iSeries, I'm guessing you are running Tuxedo 6.5 as the port for the current Tuxedo release on iSeries hasn't been released yet. Tuxedo 6.5 does not directly support multi-byte character strings. The two common buffer formats for string data in Tuxedo are STRING which doesn't support multi-byte characters, or CARRAY which does support multi-byte characters as a CARRAY is essentially a blob. Do you know what buffer type the Tuxedo application is using to send data to WebLogic Server?
In Tuxedo 9.0 and later, direct support for multi-byte strings was added in the form of the MBSTRING buffer type. This buffer type supports multi-byte strings with a variety of character sets and encodings.
Regards,
Todd Little
Oracle Tuxedo Chief Architect

UTF/Japanese character set and my application

Blankfellaws...
a simple query about the internationalization of an enterprise application..
I have a considerably large application running as 4 layers.. namely..
1) presentation layer - I have a servlet here
2) business layer - I have an EJB container here with EJBs
3) messaging layer - I have either Weblogic JMS here in which case it is an
application server or I will have MQSeries in which case it will be a
different machine all together
4) adapter layer - something like a connector layer with some specific or
rather customized modules which can talk to enterprise repositories
The Database has a few messages in UTF format, and they are Japanese
characters.
My requirement: I need those messages to be picked up from the database by
the business layer and passed on to the client screen, which is a web browser,
through the presentation layer.
What are the various points to be noted to get this done?
Where do I need to set the character set, and what would be the ideal
character set to use to support the most characters?
Is there anything specific to be done in my application code regarding
this?
Is this just a matter of setting the character sets in the application
servers / web servers / web browsers?
Please enlighten me on these areas, as I am into something similar to this and
trying to figure out what's wrong in my current application. When the data
comes to the screen through my application, it looks corrupted. But the same
message, when read through a simple servlet, displays without a problem.
I am confused!!
Thanks in advance
Manesh

Hello Manesh,
For the database I would recommend using UTF-8.
As for the character problems, could you elaborate which version of WebLogic
are you using and what is the nature of the problem.
If your problem is that of displaying the characters from the db and are
using JSP, you could try putting
<%@ page language="java" contentType="text/html; charset=UTF-8"%> on the
first line,
or if a servlet .... response.setContentType("text/html; charset=UTF-8");
Also to automatically select the correct charset by the browser, you will
have to include
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8"> in the
jsp.
You could replace the "UTF-8" with other charsets you are using.
I hope this helps...
David.

[urgent] oracle character set and national character set !!(dictionary)

Hi. everyone.
What is the oracle dictionary that contains information of
oracle character set and national character set?
I checked v$database, but there was not the information.
It seems that there are some differences between "nls_* " init parameters
and the database character set.
"Alter database backup controlfile to trace" gave me the character set of db,
but I would like to know whether there are oracle dictionary regarding them.
Thanks in advance. Have a nice day.
Best Regards.

I found the dictionary view which contains the character set and
national character set of the database.
select * from nls_database_parameters
where parameter like '%CHARACTERSET';
Thanks for reading.
Have a good day.
Best Regards.

Hi.have a g5 mac,dual core 2.3 unit.i bought it with no hard drive.have got hard drive,formatted for mac.i am trying to load osx.i get a grey screen with a small box in the centre with 2 character faces,and then grey apple with loading icon spinning.help?

hi.have a g5 mac,dual core 2.3 unit.i bought it with no hard drive.have got hard drive,formatted for mac.i am trying to load osx.i get a grey screen with a small box in the centre with 2 character faces,and then grey apple with loading icon spinning.nothing is loading tho.

I see 10.6.3 in your profile. Is that what you are trying to load? If so, it won't work. No PowerPC Mac like your G5 can run a Mac OS version higher than 10.5.8.

Oracle 8.1.5 install on Linux Redhat 6.0: character set (and other) problem(s)

I am trying to install Oracle 8i on Linux and it does not work: once the install is finished, I get a message saying "Character Set not found".
I am running a French version of Linux (fr-latin 1) and I am trying to install Oracle with French and English as languages.
Another problem with this install: Oracle does not seem to recognize that I have 6.9 GB for it to install into, and says that I do not have enough space for the install...
And at the end of the install, it takes ages (about 15 minutes) during which nothing seems to happen. On one machine I got out of this phase, but on the other I never saw it finish; it looks as if the computer crashed. Is that normal?
I went through all the initialization phases, set the correct environment variables...
thanks
Solange

I've been dealing with the same problems in the English version but could bypass this by doing the following.
- Just ignore the disk space stuff
- Ignore the charset message, also
- When creating a database, choose custom and then select the WE8ISO8859P1 char set. It worked for Portuguese, so it should work for French also.
- Everyone here recommended, and I do the same, leaving the database creation for later, not during installation.
Good Luck!

Character sets and ado

I have a table with a clob field on an Oracle 8.1.7.4 database. When querying the clob field via odbc and ado the value is truncated. The Oracle server and client are using a WE8ISO8859P1 character set. Has anyone come across this before.
Thanks.

I believe the data should be able to be represented by ISO-8859. The data is a long random string of characters that represents a fingerprint image.
We seem to only get 996 characters back from the database. If I do a getchunk on the data then I get 996 characters of data, then 996 NULLS, then 996 characters of data and so on. The 996 NULLS should be data.
The data is in the database because I can do a dbms_lob.substr and get the correct info back.

Oracle Database Character set and DRM

Hi,
I see the below context in the Hyperion EPM Installation document.
We need to install only Hyperion DRM and not the entire Hyperion product suite. Do we really have to create the database in one of the UTF-8 character sets?
Why does it say that we must create the database this way?
Any help is appreciated.
Oracle Database Creation Considerations:
The database must be created using Unicode Transformation Format UTF-8 encoding
(character set). Oracle supports the following character sets with UTF-8 encoding:
- AL32UTF8 (UTF-8 encoding for ASCII platforms)
- UTF8 (backward-compatible encoding for Oracle)
- UTFE (UTF-8 encoding for EBCDIC platforms)
Note: The UTF-8 character set must be applied to the client and to the Oracle database.
Edited by: 851266 on Apr 11, 2011 12:01 AM

Srini,
Thanks for your reply.
I would assume that the ConvertToClob function would understand the byte order mark for UTF-8 in the blob and not include any part of it in the clob. The byte order mark for UTF-8 consists of the byte sequence EF BB BF. The last byte, BF, corresponds to the upside-down question mark '¿' in ISO-8859-1. To me, it seems as if ConvertToClob is not converting correctly.
Am I missing something?
BTW, the database version is 10.2.0.3 on Solaris 10 x86_64
Kind Regards,
Eyðun
Edited by: Eyðun E. Jacobsen on Apr 24, 2009 8:26 PM
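
The byte values mentioned in this post can be checked outside the database. The sketch below is in Python and does not involve ConvertToClob at all; the sample payload is an assumption made up for illustration. It shows the UTF-8 byte order mark, how its bytes look when read as ISO-8859-1, and the difference between a decoder that keeps the mark and one that strips it.

```python
import codecs

bom = codecs.BOM_UTF8                      # the three bytes EF BB BF
payload = bom + "Eyðun".encode("utf-8")    # assumed sample: a UTF-8 blob that starts with a BOM

as_latin1 = payload.decode("latin-1")      # each BOM byte becomes a visible character
with_bom = payload.decode("utf-8")         # BOM survives as the invisible U+FEFF
stripped = payload.decode("utf-8-sig")     # BOM is recognised and removed

print(bom.hex())
print(as_latin1[:3])
print(repr(with_bom[0]))
print(stripped)
```

A stray '¿' at the start of converted text is therefore consistent with the last BOM byte being carried over and interpreted as ISO-8859-1, which is what the post suspects.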

MySQL Character Set and Collation

Hey There,
Can somebody please tell me why MySQL's PKGBUILD contains:
--with-charset=latin1 --with-collation=latin1_general_ci
line ? I mean why not utf8 and utf8_general_ci but latin1 ?


Conversions between character sets when using exp and imp utilities

I use EE8ISO8859P2 character set on my server,
when exporting database with NLS_LANG not set
then conversion should be done between
EE8ISO8859P2 and US7ASCII charsets, so some
characters not present in US7ASCII should not be
successfully converted.
But when I import such a dump, all characters not
present in US7ASCII charset are imported to the database.
I thought that some characters should be lost when
doing such a conversions, can someone tell me why is it not so?

Not exactly. If the import is done into the same DB character set, it doesn't matter how the data was exported. Conversion (corruption) may happen if the destination DB has a different character set. See this example:
[ora102 work db102]$ echo $NLS_LANG
AMERICAN_AMERICA.WE8ISO8859P15
[ora102 work db102]$ sqlplus test/test
SQL*Plus: Release 10.2.0.1.0 - Production on Tue Jul 25 14:47:01 2006
Copyright (c) 1982, 2005, Oracle. All rights reserved.
Connected to:
Oracle Database 10g Enterprise Edition Release 10.2.0.1.0 - Production
With the Partitioning, OLAP and Data Mining options
TEST@db102 SQL> create table test(col1 varchar2(1));
Table created.
TEST@db102 SQL> insert into test values(chr(166));
1 row created.
TEST@db102 SQL> select * from test;
C
¦
TEST@db102 SQL> exit
Disconnected from Oracle Database 10g Enterprise Edition Release 10.2.0.1.0 - Production
With the Partitioning, OLAP and Data Mining options
[ora102 work db102]$ export NLS_LANG=AMERICAN_AMERICA.EE8ISO8859P2
[ora102 work db102]$ sqlplus test/test
SQL*Plus: Release 10.2.0.1.0 - Production on Tue Jul 25 14:47:55 2006
Copyright (c) 1982, 2005, Oracle. All rights reserved.
Connected to:
Oracle Database 10g Enterprise Edition Release 10.2.0.1.0 - Production
With the Partitioning, OLAP and Data Mining options
TEST@db102 SQL> select col1, dump(col1) from test;
C
DUMP(COL1)
©
Typ=1 Len=1: 166
TEST@db102 SQL> exit
Disconnected from Oracle Database 10g Enterprise Edition Release 10.2.0.1.0 - Production
With the Partitioning, OLAP and Data Mining options
[ora102 work db102]$ echo $NLS_LANG
AMERICAN_AMERICA.EE8ISO8859P2
[ora102 work db102]$ exp test/test file=test.dmp tables=test
Export: Release 10.2.0.1.0 - Production on Tue Jul 25 14:48:47 2006
Copyright (c) 1982, 2005, Oracle. All rights reserved.
Connected to: Oracle Database 10g Enterprise Edition Release 10.2.0.1.0 - Production
With the Partitioning, OLAP and Data Mining options
Export done in EE8ISO8859P2 character set and AL16UTF16 NCHAR character set
server uses WE8ISO8859P15 character set (possible charset conversion)
About to export specified tables via Conventional Path ...
. . exporting table TEST 1 rows exported
Export terminated successfully without warnings.
[ora102 work db102]$ sqlplus test/test
SQL*Plus: Release 10.2.0.1.0 - Production on Tue Jul 25 14:48:56 2006
Copyright (c) 1982, 2005, Oracle. All rights reserved.
Connected to:
Oracle Database 10g Enterprise Edition Release 10.2.0.1.0 - Production
With the Partitioning, OLAP and Data Mining options
TEST@db102 SQL> drop table test purge;
Table dropped.
TEST@db102 SQL> exit
Disconnected from Oracle Database 10g Enterprise Edition Release 10.2.0.1.0 - Production
With the Partitioning, OLAP and Data Mining options
[ora102 work db102]$ imp test/test file=test.dmp
Import: Release 10.2.0.1.0 - Production on Tue Jul 25 14:49:15 2006
Copyright (c) 1982, 2005, Oracle. All rights reserved.
Connected to: Oracle Database 10g Enterprise Edition Release 10.2.0.1.0 - Production
With the Partitioning, OLAP and Data Mining options
Export file created by EXPORT:V10.02.01 via conventional path
import done in EE8ISO8859P2 character set and AL16UTF16 NCHAR character set
import server uses WE8ISO8859P15 character set (possible charset conversion)
. importing TEST's objects into TEST
. importing TEST's objects into TEST
. . importing table "TEST" 1 rows imported
Import terminated successfully without warnings.
[ora102 work db102]$ export NLS_LANG=AMERICAN_AMERICA.WE8ISO8859P15
[ora102 work db102]$ sqlplus test/test
SQL*Plus: Release 10.2.0.1.0 - Production on Tue Jul 25 14:49:34 2006
Copyright (c) 1982, 2005, Oracle. All rights reserved.
Connected to:
Oracle Database 10g Enterprise Edition Release 10.2.0.1.0 - Production
With the Partitioning, OLAP and Data Mining options
TEST@db102 SQL> select col1, dump(col1) from test;
C
DUMP(COL1)
¦
Typ=1 Len=1: 166
TEST@db102 SQL>
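
The symbols in the transcript above can be reproduced with plain codec conversions. This is a sketch in Python, not Oracle's own conversion code, and it rests on one assumption that the transcript does not state: that the terminal displays raw bytes as ISO-8859-1. Under that assumption, byte 166 stored in a WE8ISO8859P15 database appears as ¦ when no conversion takes place and as © once the client asks for EE8ISO8859P2, while the stored byte (what DUMP reports) never changes.

```python
# Byte 166 (0xA6), as inserted with chr(166) into the WE8ISO8859P15 database.
stored = bytes([166])

# Assumption (not stated in the transcript): the terminal renders bytes as ISO-8859-1.
TERMINAL = "latin-1"

# What the stored byte means in the database character set.
character = stored.decode("iso8859_15")

# Client NLS_LANG = WE8ISO8859P15: same as the database, so the byte passes through as-is.
shown_p15 = stored.decode(TERMINAL)

# Client NLS_LANG = EE8ISO8859P2: the character is re-encoded for the client.
sent_p2 = character.encode("iso8859_2")
shown_p2 = sent_p2.decode(TERMINAL)

# Export in P2, then import back into the P15 database.
round_trip = sent_p2.decode("iso8859_2").encode("iso8859_15")

print(repr(character), shown_p15, sent_p2.hex(), shown_p2, round_trip == stored)
```

The round trip is lossless here only because the character exists in both character sets; a character missing from the intermediate set would not survive the same path.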

CHARACTER SET CONVERSION PROBLEM BETWEEN WIN XP (SOURCE EXPORT) AND WIN 7

Hi colleagues, please assist:
I have a laptop running Windows 7 Professional. It is also running Oracle Database 10g release 10.2.0.3.0. I need to import a dump into this database. The dump originates from a client PC running Windows XP and Oracle 10g release 10.2.0.1.0. When I use the import utility in my database (on the laptop), the following happens:
Import: Release 10.2.0.3.0 - Production on Tue Nov 9 17:03:16 2010
Copyright (c) 1982, 2005, Oracle. All rights reserved.
Username: system/password@orcl
Connected to: Oracle Database 10g Enterprise Edition Release 10.2.0.3.0 - Production
With the Partitioning, OLAP and Data Mining options
Import file: EXPDAT.DMP > F:\uyscl.dmp
Enter insert buffer size (minimum is 8192) 30720>
Export file created by EXPORT:V08.01.07 via conventional path
Warning: the objects were exported by UYSCL, not by you
import done in WE8MSWIN1252 character set and AL16UTF16 NCHAR character set
export client uses WE8ISO8859P1 character set (possible charset conversion)
export server uses WE8ISO8859P1 NCHAR character set (possible ncharset conversion)
List contents of import file only (yes/no): no >
When I press Enter, the import window terminates prematurely without completing the process. What should I do to fix this problem?

Import: Release 10.2.0.3.0 - Production on Fri Nov 12 14:57:27 2010
Copyright (c) 1982, 2005, Oracle. All rights reserved.
Username: system/password@orcl
Connected to: Oracle Database 10g Enterprise Edition Release 10.2.0.3.0 - Production
With the Partitioning, OLAP and Data Mining options
Import file: EXPDAT.DMP > F:\Personal\DPISIMBA.dmp
Enter insert buffer size (minimum is 8192) 30720>
Export file created by EXPORT:V10.02.01 via conventional path
import done in WE8MSWIN1252 character set and AL16UTF16 NCHAR character set
List contents of import file only (yes/no): no >
Ignore create error due to object existence (yes/no): no >
Import grants (yes/no): yes >
Import table data (yes/no): yes >
Import entire export file (yes/no): no >
Username:

Character set issue after import?

Hi,
Source DB version:10.2.0.1
OS:Red hat Linux
Target DB version:10.2.0.1
OS:Windows server
source database character set:AL32UTF8
Performed the export as below
$export NLS_LANG=AMERICAN.AL32UTF8
Performed the full database export and it finished successfully with out any warnings
Export done in AL32UTF8 character set and AL16UTF16 NCHAR character set
Now imported into the target database as below.
target database character set:AL32UTF8
c:\>set NLS_LANG=AMERICAN.AL32UTF8
now run import command which imported successfully with out any warnings.
However, I'm having problems with Greek characters. Most of them are shown as ?, while some of them are converted to Latin characters.
For example:
This was supposed to be Αγγελική but shows as ???e????
And this was supposed to be Κουκουτσάκη but shows as ??????ts???
While this one should be Δήμητρα but shows as ??µ?t?a
From the import log file I can see 'import done in AL32UTF8 character set and AL16UTF16 NCHAR character set', which I believe is correct.
Can anyone tell me how I can overcome this problem with Greek characters?
Thank you all.

PARAMETER                 VALUE
NLS_LANGUAGE              AMERICAN
NLS_TERRITORY             AMERICA
NLS_CURRENCY              $
NLS_ISO_CURRENCY          AMERICA
NLS_NUMERIC_CHARACTERS
NLS_CHARACTERSET          AL32UTF8
NLS_CALENDAR              GREGORIAN
NLS_DATE_FORMAT           DD-MON-RR
NLS_DATE_LANGUAGE         AMERICAN
NLS_SORT                  BINARY
NLS_TIME_FORMAT           HH.MI.SSXFF AM
NLS_TIMESTAMP_FORMAT      DD-MON-RR HH.MI.SSXFF AM
NLS_TIME_TZ_FORMAT        HH.MI.SSXFF AM TZR
NLS_TIMESTAMP_TZ_FORMAT   DD-MON-RR HH.MI.SSXFF AM TZR
NLS_DUAL_CURRENCY         $
NLS_COMP                  BINARY
NLS_LENGTH_SEMANTICS      BYTE
NLS_NCHAR_CONV_EXCP       FALSE
NLS_NCHAR_CHARACTERSET    AL16UTF16
NLS_RDBMS_VERSION         10.2.0.1.0
20 rows selected.

Character set during export

Hi,
In the database both the NLS_CHARACTERSET and NLS_NCHAR_CHARACTERSET values are UTF8.
When I looked at the export log file, I was quite surprised to see:
Export done in WE8MSWIN1252 character set and UTF8 NCHAR character set
server uses UTF8 character set (possible charset conversion)
How come WE8MSWIN1252 comes into the picture?
Any answer is welcome....

It depends on NLS_LANG environment variable setting (or non-setting). Try the following :
SQL> select * from nls_database_parameters
2 where parameter in('NLS_LANGUAGE', 'NLS_TERRITORY', 'NLS_CHARACTERSET');
PARAMETER            VALUE
NLS_LANGUAGE         AMERICAN
NLS_TERRITORY        AMERICA
NLS_CHARACTERSET     WE8ISO8859P15
SYS@db102 SQL>
Before exporting, set NLS_LANG:
C:\> set NLS_LANG=AMERICAN_AMERICA.WE8ISO8859P15
C:\> exp ........
of course use your values.

Checking server character set

I think this is a simple question, but for some reason I am not able to find the result.
How can I check the server's default character set?
We have HP-UX (unix) servers, and most databases are set to AL32UTF8/AL16UTF16.
I thought it would be something simple like typing 'locale', but on the HP-UX box(es), it doesn't show any setting for LANG.
I also executed 'env' and looked at all settings, but nothing seems relevant.
Unless I explicitly set the NLS_LANG=AMERICAN_AMERICA.AL32UTF8, when I do an export, it shows:
Export done in US7ASCII character set and AL16UTF16 NCHAR character set
server uses AL32UTF8 character set (possible charset conversion)
Where is it getting the US7ASCII?
If I select * from nls_database_parameters; there is no setting like US7ASCII.
When I do a 'man -k character', I get numerous possible commands, but none of them appear to be something that displays the default character set.
But, as I noted above, when I run an export of our database (without setting the NLS_LANG), it shows me that the default character set is US7ASCII.
So, how do I show this, and if you might also have any ideas how to change this to UTF8.
Thanks.
removed references in this posting to my linux boxes...

The default O/S character set for Unix is 7-bit ASCII. This is an element of the so-called "C locale". This locale is used by all applications that do not declare themselves sensitive to user locale (i.e. they do not call setlocale() at application startup) and by all applications that have no locale parameters (LANG, or LC_xxxx) set in their environment.
If you call 'locale' in a Unix session that has no locale environment set, the "C locale" should be reported.
"C locale" uses US locale formatting conventions and binary collation.
## "Export done in US7ASCII character set and AL16UTF16 NCHAR character set
## server uses AL32UTF8 character set (possible charset conversion)"
Here, US7ASCII is the default character set of an Oracle Client. It is not directly related to the default O/S character set. US7ASCII is always the default Oracle client character set on all non-EBCDIC platforms, used if NLS_LANG is not explicitly set.
-- Sergiusz
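
The rule described in this reply can be sketched as a small lookup. The Python below is only an illustration, not how the Oracle client is implemented: the function names are hypothetical, it assumes NLS_LANG has the shape LANGUAGE_TERRITORY.CHARSET as in the examples in this thread, and it ignores other places a platform may define NLS_LANG (for example an installer-provided setting).

```python
def parse_nls_lang(value):
    """Split an NLS_LANG value into (language, territory, charset).

    Hypothetical helper; any missing component comes back as None.
    """
    lang_terr, _, charset = value.partition(".")
    language, _, territory = lang_terr.partition("_")
    return (language or None, territory or None, charset or None)

def client_charset(environ):
    """Character set a client would use, per the rule quoted above:
    the charset part of NLS_LANG if present, else US7ASCII."""
    charset = parse_nls_lang(environ.get("NLS_LANG", ""))[2]
    return charset or "US7ASCII"

print(parse_nls_lang("AMERICAN_AMERICA.AL32UTF8"))
print(client_charset({"NLS_LANG": "AMERICAN_AMERICA.AL32UTF8"}))
print(client_charset({}))   # NLS_LANG not set
```

This matches the export banner in the question: with NLS_LANG unset the client side reports US7ASCII, regardless of what nls_database_parameters says about the server.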

Change character set

Hi
Can anyone tell me how to change the character set?
I tried with ALTER SESSION but it doesn't work.
thanks

Article from Metalink
Doc ID: Note:66320.1
Subject: Changing the Database Character Set or the Database National Character Set
Type: BULLETIN
Status: PUBLISHED
Content Type: TEXT/PLAIN
Creation Date: 23-OCT-1998
Last Revision Date: 12-DEC-2003
PURPOSE ======= To explain how to change the database character set or national character set of an existing Oracle8(i) or Oracle9i database without having to recreate the database. 1. SCOPE & APPLICATION ====================== The method described here is documented in the Oracle 8.1.x and Oracle9i documentation. It is not documented but it can be used in version 8.0.x. It does not work in Oracle7. The database character set is the character set of CHAR, VARCHAR2, LONG, and CLOB data stored in the database columns, and of SQL and PL/SQL text stored in the Data Dictionary. The national character set is the character set of NCHAR, NVARCHAR2, and NCLOB data. In certain database configurations the CLOB and NCLOB data are stored in the fixed-width Unicode encoding UCS-2. If you are using CLOB or NCLOB please make sure you read section "4. HANDLING CLOB AND NCLOB COLUMNS" below in this document. Before changing the character set of a database make sure you understand how Oracle deals with character sets. Before proceeding please refer to [NOTE:158577.1] "NLS_LANG Explained (How Does Client-Server Character Conversion Work?)". See also [NOTE:225912.1] "Changing the Database Character Set - an Overview" for general discussion about various methods of migration to a different database character set. If you are migrating an Oracle Applications instance, read [NOTE:124721.1] "Migrating an Applications Installation to a New Character Set" for specific steps that have to be performed. If you are migrating from 8.x to 9.x please have a look at [NOTE:140014.1] "ALERT: Oracle8/8i to Oracle9i Using New "AL16UTF16"" and other referenced notes below. Before using the method described in this note it is essential to do a full backup of the database and to use the Character Set Scanner utility to check your data. See the section "2. USING THE CHARACTER SET SCANNER" below. 
Note that changing the database or the national character set as described in this document does not change the actual character codes, it only changes the character set declaration. If you want to convert the contents of the database (character codes) from one character set to another you must use the Oracle Export and Import utilities. This is needed, for example, if the source character set is not a binary subset of the target character set, i.e. if a character exists in the source and in the target character set but not with the same binary code. All binary subset-superset relationships between characters sets recognized by the Oracle Server are listed in [NOTE:119164.1] "Changing Database Character Set - Valid Superset Definitions". Note: The varying width character sets (like UTF8) are not supported as national character sets in Oracle8(i) (see [NOTE:62107.1]). Thus, changing the national character set from a fixed width character set to a varying width character set is not supported in Oracle8(i). NCHAR types in Oracle8 and Oracle8i were designed to support special Oracle specific fixed-width Asian character sets, that were introduced to provide higher performance processing of Asian character data. Examples of these character sets are : JA16EUCFIXED ,JA16SJISFIXED , ZHT32EUCFIXED. For a definition of varying width character sets see also section "4. HANDLING CLOB AND NCLOB COLUMNS" below. WARNING: Do not use any undocumented Oracle7 method to change the database character set of an Oracle8(i) or Oracle9i database. This will corrupt the database. 2. USING THE CHARACTER SET SCANNER ================================== Character data in the Oracle 8.1.6 and later database versions can be efficiently checked for possible character set migration problems with help of the Character Set Scanner utility. 
This utility is included in the Oracle Server 8.1.7 software distribution and the newest Character Set Scanner version can be downloaded from the Oracle Technology Network site, http://otn.oracle.com The Character Set Scanner on OTN is available for limited number of platforms only but it can be used with databases on other platforms in the client/server configuration -- as long as the database version matches the Character Set Scanner version and platforms are either both ASCII-based or both EBCDIC-based. It is recommended to use the newest Character Set Scanner version available from the OTN site. The Character Set Scanner is documented in the following manuals: - "Oracle8i Documentation Addendum, Release 3 (8.1.7)", Chapter 3 - "Oracle9i Globalization Support Guide, Release 1 (9.0.1)", Chapter 10 - "Oracle9i Database Globalization Support Guide, Release 2 (9.2)", Chapter 11 Note: The Character Set Scanner coming with Oracle 8.1.7 and Oracle 9.0.1 does not have a separate version number. It reports the database release number in its banner. This version of the Scanner does not check for illegal character codes in a database if the FROMCHAR and TOCHAR (or FROMNCHAR and TONCHAR) parameters have the same value (i.e. you simulate migration from a character set to itself). The Character Set Scanner 1.0, available on OTN, reports its version number as x.x.x.1.0, where x.x.x is the database version number. This version adds a few bug fixes and it supports FROMCHAR=TOCHAR provided it is not UTF8. The Character Set Scanner 1.1, available on OTN and with Release 2 (9.2) of the Oracle Server, reports its version number as v1.1 followed by the database version number. This version adds another bug fixes and the full support for FROMCHAR=TOCHAR. None of the above versions of the Scanner can correctly analyze CLOB or NCLOB values if the database or the national character set, respectively, is multibyte. The Scanner reports such values randomly as Convertible or Lossy. 
The version 1.2 of the Scanner will mark all such values as Changeless (as they are always stored in the Unicode UCS-2 encoding and thus they do not change when the database or national character set is changed from one multibyte to another). Character Set Scanner 2.0 will correctly check CLOBs and NCLOBs for possible data loss when migrating from a multibyte character set to its subset. To verify that your database contains only valid codes, specify the new database character set in the TOCHAR parameter and/or the new national character set in the TONCHAR parameter. Specify FULL=Y to scan the whole database. Set ARRAY and PROCESS parameters depending on your system's resources to speed up the scanning. FROMCHAR and FROMNCHAR will default to the original database and national character sets. The Character Set Scanner should report only Changless data in both the Data Dictionary and in application data. If any Convertible or Exceptional data are reported, the ALTER DATABASE [NATIONAL] CHARACTER SET statement must not be used without further investigation of the source and type of these data. In situations in which the ALTER DATABASE [NATIONAL] CHARACTER SET statement is used to repair an incorrect database character set declaration rather than to simply migrate to a new wider character set, you may be advised by Oracle Support Services analysts to execute the statement even if Exceptional data are reported. For more information see also [NOTE:225912.1] "Changing the Database Character Set - a short Overview". 3. CHANGING THE DATABASE OR THE NATIONAL CHARACTER SET ====================================================== Oracle8(i) introduces a new documented method of changing the database and national character sets. 
The method uses two SQL statements, which are described in the Oracle8i National Language Support Guide: ALTER DATABASE [<db_name>] CHARACTER SET <new_character_set> ALTER DATABASE [<db_name>] NATIONAL CHARACTER SET <new_NCHAR_character_set> The database name is optional. The character set name should be specified without quotes, for example: ALTER DATABASE CHARACTER SET WE8ISO8859P1 To change the database character set perform the following steps. Note that some of them have been erroneously omitted from the Oracle8i documentation: 1. Use the Character Set Scanner utility to verify that your database contains only valid character codes -- see "2. USING THE CHARACTER SET SCANNER" above. 2. If necessary, prepare CLOB columns for the character set change -- see "4. HANDLING CLOB AND NCLOB COLUMNS" below. Omitting this step can lead to corrupted CLOB/NCLOB values in the database. If SYS.METASTYLESHEET (STYLESHEET) is populated (9i and up only) then see [NOTE:213015.1] "SYS.METASTYLESHEET marked as having convertible data (ORA-12716 when trying to convert character set)" for the actions that need to be taken. 3. Make sure the parallel_server parameter in INIT.ORA is set to false or it is not set at all. 4. Execute the following commands in Server Manager (Oracle8) or sqlplus (Oracle9), connected as INTERNAL or "/ AS SYSDBA": SHUTDOWN IMMEDIATE; -- or NORMAL <do a full database backup> STARTUP MOUNT; ALTER SYSTEM ENABLE RESTRICTED SESSION; ALTER SYSTEM SET JOB_QUEUE_PROCESSES=0; ALTER SYSTEM SET AQ_TM_PROCESSES=0; ALTER DATABASE OPEN; ALTER DATABASE CHARACTER SET <new_character_set>; SHUTDOWN IMMEDIATE; -- OR NORMAL STARTUP RESTRICT; 5. Restore the parallel_server parameter in INIT.ORA, if necessary. 6. Execute the following commands: SHUTDOWN IMMEDIATE; -- OR NORMAL STARTUP; The double restart is necessary in Oracle8(i) because of a SGA initialization bug, fixed in Oracle9i. 7. If necessary, restore CLOB columns -- see "4. HANDLING CLOB AND NCLOB COLUMNS" below. 
To change the national character set, replace the ALTER DATABASE CHARACTER SET statement with ALTER DATABASE NATIONAL CHARACTER SET. You can issue both statements together if you wish.

Error Conditions
----------------

A number of error conditions may be reported when trying to change the database or national character set.

In Oracle8(i) the ALTER DATABASE [NATIONAL] CHARACTER SET statement will return:

   ORA-01679: database must be mounted EXCLUSIVE and not open to activate

- if you do not enable restricted session
- if you start up the instance in PARALLEL/SHARED mode
- if you do not set the number of queue processes to 0
- if you do not set the number of AQ time manager processes to 0
- if anybody is logged in apart from you.

This error message is misleading. The command requires the database to be open, but only one session, the one executing the command, is allowed.

For the above error conditions Oracle9i will report one of the errors:

   ORA-12719: operation requires database is in RESTRICTED mode
   ORA-12720: operation requires database is in EXCLUSIVE mode
   ORA-12721: operation cannot execute when other sessions are active

Oracle9i can also report:

   ORA-12718: operation requires connection as SYS

if you are not connected as SYS (INTERNAL, "/ AS SYSDBA").

If the specified new character set name is not recognized, Oracle will report one of the errors:

   ORA-24329: invalid character set identifier
   ORA-12714: invalid national character set specified
   ORA-12715: invalid character set specified

The ALTER DATABASE [NATIONAL] CHARACTER SET command will only work if the old character set is considered a binary subset of the new character set. Oracle Server 8.0.3 to 8.1.5 recognizes US7ASCII as the binary subset of all ASCII-based character sets. It also treats each character set as a binary subset of itself. No other combinations are recognized. Newer Oracle Server versions recognize additional subset/superset combinations, which are listed in [NOTE:119164.1].
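The binary subset requirement described above can be illustrated outside the database. The following is a minimal sketch, not Oracle's actual check: it assumes Python codec names as rough stand-ins for Oracle character sets ("ascii" for US7ASCII, "latin-1" for WE8ISO8859P1, "utf-8" for UTF8), and the helper name is_binary_subset is hypothetical. The authoritative list of accepted combinations remains [NOTE:119164.1].

```python
# Illustration only. Assumed stand-ins (not Oracle APIs):
#   "ascii"   ~ US7ASCII
#   "latin-1" ~ WE8ISO8859P1
#   "utf-8"   ~ UTF8

def is_binary_subset(old, new, code_points):
    """Return True if every listed character has identical bytes in both encodings.

    This is the property that lets a character set be relabelled without
    rewriting any stored character codes.
    """
    for cp in code_points:
        ch = chr(cp)
        old_bytes = ch.encode(old)
        try:
            new_bytes = ch.encode(new)
        except UnicodeEncodeError:
            # The new encoding cannot represent this character at all.
            return False
        if old_bytes != new_bytes:
            return False
    return True


ascii_repertoire = range(0x80)    # all 7-bit codes
latin1_repertoire = range(0x100)  # all 8-bit codes

print(is_binary_subset("ascii", "latin-1", ascii_repertoire))   # True
print(is_binary_subset("ascii", "utf-8", ascii_repertoire))     # True
print(is_binary_subset("latin-1", "utf-8", latin1_repertoire))  # False
```

The last call returns False because codes above 0x7F occupy one byte in the single-byte set but two bytes in UTF-8, so relabelling such data without converting it would leave invalid codes behind.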
If the old character set is not recognized as a binary subset of the new character set, the ALTER DATABASE [NATIONAL] CHARACTER SET statement will return:

- in Oracle 8.1.5 and above:
   ORA-12712: new character set must be a superset of old character set
- in Oracle 8.0.5 and 8.0.6:
   ORA-12710: new character set must be a superset of old character set
- in Oracle 8.0.3 and 8.0.4:
   ORA-24329: invalid character set identifier

You will also get these errors if you try to change the character set of a US7ASCII database that was started without a (correct) ORA_NLSxx parameter. See [NOTE:77442.1].

It may be necessary to switch off the superset check to allow changes between formally incompatible character sets, to solve certain character set problems or to speed up migration of huge databases. Oracle Support Services may pass the necessary information to customers after verifying the safety of the change for the customers' environments.

If in Oracle9i an ALTER DATABASE NATIONAL CHARACTER SET is issued and there are N-type columns that contain data, then this error is returned:

   ORA-12717: Cannot ALTER DATABASE NATIONAL CHARACTER SET when NCLOB data exists

The error message mentions only NCLOB, but NCHAR and NVARCHAR2 columns are also checked. See [NOTE:2310895.9] for bug [BUG:2310895].

4. HANDLING CLOB AND NCLOB COLUMNS
==================================

Background
----------

In a fixed width character set, codes of all characters have the same number of bytes. Fixed width character sets are: all single-byte character sets and those multibyte character sets which have names ending with 'FIXED'. In Oracle9i the character set AL16UTF16 is also fixed width.

In a varying width character set, codes of different characters may have different numbers of bytes. All multibyte character sets except those with names ending with FIXED (and except the Oracle9i AL16UTF16 character set) are varying width.

Single-byte character sets are character sets with names of the form xxx7yyyyyy and xxx8yyyyyy.
Each character code of a single-byte character set occupies exactly one byte. Multibyte character sets are all other character sets (including UTF8). Some -- usually most -- character codes of a multibyte character set occupy more than one byte.

CLOB values in a database whose database character set is fixed width are stored in this character set.

CLOB values in an Oracle 8.0.x database whose database character set is varying width are not allowed. They have to be NULL.

CLOB values in an Oracle >= 8.1.5 database whose database character set is varying width are stored in the fixed width Unicode UCS-2 encoding.

The same holds for NCLOB values and the national character set.

The UCS-2 storage format of character LOB values, as implemented in Oracle8i, ensures that calculation of character positions in LOB values is fast. Finding the byte offset of a character stored in a varying width character set would require reading the whole LOB value up to this character (possibly 4GB). In the fixed width character sets the byte offsets are simply character offsets multiplied by the number of bytes in a character code. In UCS-2 byte offsets are simply twice the character offsets.

As the Unicode character set contains all characters defined in any other Oracle character set, there is no data loss when a CLOB/NCLOB value is converted to UCS-2 from the character set in which it was provided by a client program (usually the NLS_LANG character set).

CLOB Values and the Database Character Set Change
-------------------------------------------------

In Oracle 8.0.x CLOB values are invalid in varying width character sets. Thus you must delete all CLOB column values before changing the database character set to a varying width character set.

In Oracle 8.1.5 and later CLOB values are valid in varying width character sets but they are converted to Unicode UCS-2 before being stored. But UCS-2 encoding is not a binary superset of any other Oracle character set.
Even codes of the basic ASCII characters are different, e.g. the single-byte code for "A"=0x41 becomes the two-byte code 0x0041. This implies that even if the new varying width character set is a binary superset of the old fixed width character set and thus VARCHAR2/LONG character codes remain valid, the fixed width character codes in CLOB values will no longer be valid in UCS-2.

As mentioned above, the ALTER DATABASE [NATIONAL] CHARACTER SET statement does not change character codes. Thus, before changing a fixed width database character set to a varying width character set (like UTF8) in Oracle 8.1.5 or later, you first have to export all tables containing non-NULL CLOB columns, then truncate these tables, then change the database character set and, finally, import the tables back to the database. The import step will perform the required conversion.

If you omit the steps above, the character set change will succeed in Oracle8(i) (Oracle9i disallows the change in such a situation) and the CLOBs may appear to be correctly legible, but as their encoding is incorrect, they will cause problems in further operations. For example, CREATE TABLE AS SELECT will not correctly copy such CLOB columns. Also, after installation of the 8.1.7.3 server patchset the CLOB columns will no longer be legible.

LONG columns are always stored in the database character set and thus they behave like CHAR/VARCHAR2 with respect to the character set change.

BLOBs and BFILEs are binary raw datatypes and their processing does not depend on any Oracle character set setting.

NCLOB Values and the National Character Set Change
--------------------------------------------------

The above discussion about changing the database character set and exporting and importing CLOB values is theoretically applicable to the change of the national character set and to NCLOB values.
But as varying width character sets are not supported as national character sets in Oracle8(i), changing the national character set from a fixed width character set to a varying width character set is not supported at all.

Preparing CLOB Columns for the Character Set Change
---------------------------------------------------

Take a backup of the database.

If using Advanced Replication or deferred transactions functionality, make sure that there are no outstanding deferred transactions with CLOB parameters, i.e. the DEFLOB view must have no rows with a non-NULL CLOB_COL column. To make sure that the replication environment remains consistent, use only recommended methods of purging the deferred transaction queue, preferably quiescing the replication environment.

Then:

- If changing the database character set from a fixed width character set to a varying width character set in Oracle 8.0.x, set all CLOB column values to NULL -- you are not allowed to use CLOB columns after the character set change.

- If changing the database character set from a fixed width character set to a varying width character set in Oracle 8.1.5 or later, perform table-level export of all tables containing CLOB columns, including SYSTEM's tables. Set NLS_LANG to the old database character set for the Export utility. Then truncate these tables.

Restoring CLOB Columns after the Character Set Change
-----------------------------------------------------

In Oracle 8.1.5 or later, after changing the character set as described above (steps 3. to 6.), restore CLOB columns exported in step 2. by importing them back into the database. Set NLS_LANG to the old database character set for the Import utility to avoid IMP-16 errors and data loss.
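The reason CLOB values have to be exported and re-imported, rather than simply relabelled, can be seen with a few lines of Python. This is only a sketch under stated assumptions: "latin-1" stands in for a single-byte database character set, "utf-8" for a varying width one, and "utf-16-be" for UCS-2 (the two are identical for Basic Multilingual Plane characters). The helper names are hypothetical and none of this is an Oracle API.

```python
# Illustration only. Assumed stand-ins (not Oracle APIs):
#   "latin-1"   ~ a single-byte (fixed width) database character set
#   "utf-8"     ~ a varying width database character set
#   "utf-16-be" ~ UCS-2 (identical for Basic Multilingual Plane characters)

def ucs2_byte_offset(char_index: int) -> int:
    """Fixed width: every character is two bytes, so no scanning is needed."""
    return 2 * char_index


def utf8_byte_offset(text: str, char_index: int) -> int:
    """Varying width: all preceding characters must be encoded and measured."""
    return len(text[:char_index].encode("utf-8"))


sample = "naïve €5"

# The same character has different codes in the two storage formats.
print("A".encode("latin-1").hex())    # 41
print("A".encode("utf-16-be").hex())  # 0041

# Byte offset of the last character ("5", character index 7).
print(ucs2_byte_offset(7))            # 14
print(utf8_byte_offset(sample, 7))    # 10

# Single-byte codes left in place but read back as UCS-2 turn into
# unrelated characters: two one-byte codes are taken as one two-byte code.
misread = "AB".encode("latin-1").decode("utf-16-be")
print(len(misread), hex(ord(misread)))  # 1 0x4142
```

The last lines show the kind of corruption the note warns about: the stored bytes are unchanged, but once they are interpreted as UCS-2 they no longer represent the original text, which is why a converting export/import step is needed.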
RELATED DOCUMENTS:
==================

[NOTE:13856.1] V7: Changing the Database Character Set -- This note has limited distribution, please contact Oracle Support
[NOTE:62107.1] The National Character Set in Oracle8
[NOTE:119164.1] Changing Database Character set - Valid Superset definitions
[NOTE:118242.1] ALERT: Changing the Database or National Character Set Can Corrupt LOB Values
[NOTE:158577.1] NLS_LANG Explained (How Does Client-Server Character Conversion Work?)
[NOTE:140014.1] ALERT: Oracle8/8i to Oracle9i using New "AL16UTF16"
[NOTE:159657.1] Complete Upgrade Checklist for Manual Upgrades from 8.X / 9.0.1 to Oracle9i (incl. 9.2)
[NOTE:124721.1] Migrating an Applications Installation to a New Character Set

Oracle8i National Language Support Guide
Oracle8i Release 3 (8.1.7) Readme - Section 18.12 "Restricted ALTER DATABASE CHARACTER SET Command Support (CLOB and NCLOB)"
Oracle8i Documentation Addendum, Release 3 (8.1.7) - Chapter 3 "New Character Set Scanner Utility"
Oracle8i Application Developer's Guide - Large Objects (LOBs), Release 2 - Chapter 2 "Basic Components"
Oracle8 Application Developer's Guide, Release 8.0 - Chapter 6 "Large Objects (LOBs)", Section "Introduction to LOBs"
Oracle9i Globalization Guide, Release 1 (9.0.1)
Oracle9i Database Globalization Guide, Release 2 (9.2)

For further NLS / Globalization information you may start here:
[NOTE:150091.1] Globalization Technology (NLS) Library index
Copyright (c) 1995,2000 Oracle Corporation. All Rights Reserved.
Joel Pérez
