Character set mapping (western european to US ascii)

Hi everyone,
I am trying to convert a string from a Western European character set to the US-ASCII character set. The idea is to strip the accents from accented characters.
For example, À should be converted to A.
I have tried the following, but some characters are not converted:
SELECT decompose(to_single_byte('ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖ×ØÙÚÛÜÝÞßïîíìëêéèçæåäãâáà'))
FROM dual;
The result is:
AAA?A??CEEEEIIII??OOO?O??UUUUY??iiiieeeec??a?aaa
As you can see, 13 characters were converted to question marks, meaning they were not converted properly like the others.
I would appreciate any suggestions or code examples showing how people have converted a Western European character set to US-ASCII.
Thanks

According to the documentation:
"Where a character does not exist in the destination character set, a replacement character appears."
I can understand that × Ø Æ Þ ß are not ASCII characters, but it is strange that some characters with ~ and ° "accents" are not mapped to ASCII. Maybe you can write your own PL/SQL function to handle these special cases and then call it from SQL.
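One way to picture such a custom function: decompose each character, drop the combining marks, and fall back to a hand-written table for letters that have no decomposition. The sketch below is in Python rather than PL/SQL, purely to illustrate the logic. The `to_ascii` name and the replacement choices in `FALLBACK` (for example `ß` to `ss`) are assumptions made for illustration, not anything Oracle defines.

```python
import unicodedata

# Letters with no canonical decomposition. These replacements are
# illustrative choices, not an official mapping.
FALLBACK = {
    "Æ": "AE", "æ": "ae", "Ø": "O", "ø": "o",
    "Ð": "D", "ð": "d", "Þ": "Th", "þ": "th",
    "ß": "ss", "×": "x",
}

def to_ascii(text, unknown="?"):
    """Strip accents; anything still outside ASCII becomes `unknown`."""
    out = []
    for ch in text:
        if ch in FALLBACK:
            out.append(FALLBACK[ch])
            continue
        # NFD splits e.g. "Ã" into "A" followed by a combining tilde.
        decomposed = unicodedata.normalize("NFD", ch)
        stripped = "".join(c for c in decomposed if not unicodedata.combining(c))
        out.append(stripped if stripped.isascii() else unknown)
    return "".join(out)

print(to_ascii("ÀÃÅÑÕåã"))  # the tilde and ring cases from the question
```

With this approach the tilde and ring characters that came back as `?` in the query above reduce to their base letters, and only characters with no mapping at all fall through to the replacement character.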

Similar Messages

Approach to converting database character set from Western European to Unicode

Hi All,
EBS:12.2.4 upgraded
O/S: Red Hat Linux
I am looking for the information below. Any help would be great!
INFORMATION NEEDED: Approach to converting database character set from Western European to Unicode for source systems with large data exceptions
DETAIL: We are looking to convert the Oracle EBS database character set from Western European to Unicode to support Kanji characters. Our scan results show both "lossy" (approx. 110K) and "truncation" (approx. 26K) exceptions in the database, which need to be fixed before the database is converted to Unicode.
Oracle Support has suggested fixing all open and closed transactions in the source Production instance using forms and scripts.
We're looking for information or creative approaches from anyone who has performed a similar exercise without having to manipulate data in the source instance.
Any help in this regard would be greatly appreciated!
Thanks for your time!
Regards,

There are two aspects here:
1. Why do you have such a large number of lossy characters? Is this data coming from some very old eBS release, i.e. from before the times of the Java applet interface to Oracle Forms? Have you analyzed the nature of this lossy data?
2. There is no easy way around truncation issues as you cannot modify eBS metadata (make columns wider). You must shorten or remove the data manually through the documented eBS interfaces. eBS does not support direct manipulation of data in the database due to complex consistency rules enforced by the application itself (e.g. forms).
Thanks,
Sergiusz
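To make the truncation exceptions mentioned above concrete: a value that fits a byte-sized column in a single-byte Western European character set can outgrow it in UTF-8, because each accented letter takes two bytes. The sketch below is a Python illustration only. It assumes `latin-1` as a stand-in for the Western European character set and a column limit expressed in bytes; neither detail is stated in the thread.

```python
def utf8_growth(value, source_encoding="latin-1"):
    """Return (bytes in the single-byte source encoding, bytes in UTF-8)."""
    return len(value.encode(source_encoding)), len(value.encode("utf-8"))

def would_truncate(value, column_bytes):
    """True if the UTF-8 form no longer fits a column sized in bytes."""
    return len(value.encode("utf-8")) > column_bytes

sample = "Crème brûlée"               # 12 characters, 3 of them accented
print(utf8_growth(sample))            # byte counts before and after
print(would_truncate(sample, 12))     # hypothetical 12-byte column
```

Here the sample is 12 bytes in the single-byte encoding and 15 bytes in UTF-8, so it would no longer fit a 12-byte column.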

Website not displaying correctly. Firefox is changing the character set to Western (ISO-8859-1) automatically.

Normally I have Firefox set (or it's set by default) to Character Set Unicode (UTF-8) and everything displays perfectly. I've never had a problem before.
Now, however, whenever I upload my own website, for some bizarre reason on that particular tab (and only that tab) the Character Set is changed over to Western (ISO-8859-1), and then there are a few characters within my site that do not display correctly, namely apostrophes and hyphens.
It definitely isn't my software (Serif WebPlus X4) because the page displays correctly in every other browser. Plus it displays correctly in Firefox if I change the Character set back to Unicode.
PS The site is a work in progress

That happens because the server sends a content type (text/html; charset=ISO-8859-1) via the HTTP response headers, and in that case that content type prevails. The page is saved with a UTF-8 byte order mark, which is what you see as ï»¿ in this case.
*http://web-sniffer.net/?url=http%3A%2F%2Fwww.valuevisionglasses.co.uk&http=1.1&gzip=yes&type=HEAD&uak=0
*http://httpd.apache.org/docs/current/mod/mod_mime.html#AddType
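The effect described in this reply can be reproduced outside the browser. The following Python sketch decodes UTF-8 bytes (with a byte order mark) under the wrong character set. The page text is a made-up example, and `cp1252` is used because browsers treat the ISO-8859-1 label as Windows-1252.

```python
import codecs

bom = codecs.BOM_UTF8                        # b"\xef\xbb\xbf"
page = bom + "It\u2019s fine".encode("utf-8")  # hypothetical page with a curly apostrophe

# What the browser shows when the header says ISO-8859-1:
as_seen = page.decode("cp1252")
# What it shows when the bytes are read as UTF-8 (BOM consumed):
as_intended = page.decode("utf-8-sig")

print(as_seen)
print(as_intended)
```

The first three characters of `as_seen` are the ï»¿ sequence mentioned above, and the apostrophe turns into three stray characters, which matches the symptom in the question.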

Character set error oracle 10g

I have a 10g TARGET database with a single-byte Western European character set and a 9i SOURCE database with the multibyte character set UTF8. Since the character sets are different, to load data from 9i into 10g I am using national character set (NCHAR) columns on the target database to store the multibyte data.
This is the table I am loading:
CREATE TABLE RAN_TEST1_MDL
( MODEL_ID NUMBER(15) NOT NULL,
PRODUCT_ID NUMBER(15) NULL,
MODEL_CODE NVARCHAR2(540) NULL,
ODM_CODE NVARCHAR2(900) NULL,
MODEL_DESC NVARCHAR2(1200) NULL )
tablespace csn_d_01 LOGGING NOCOMPRESS NOCACHE NOPARALLEL MONITORING
The table is a test table on the Oracle 10g database.
This is the query I am running:
INSERT /*+append*/ INTO WORK_HIER_MDL(
MODEL_ID,
PRODUCT_ID,
MODEL_CODE,
ODM_CODE,
MODEL_DESC
)
SELECT
MODEL_ID,
PRODUCT_ID,
MODEL_CODE,
ODM_CODE,
MODEL_DESC
FROM SHLD_HIER_MDL
SHLD_HIER_MDL is the source table in the Oracle 9i multibyte UTF8 database.
WORK_HIER_MDL is the target table in the Oracle 10g single-byte Western European database.
Error: ORA-29275: partial multibyte character
When I describe the source table SHLD_HIER_MDL (on Oracle 9i, accessed through a db link), I get the following error:
ORA-01460: unimplemented or unreasonable conversion requested
I think ORA-29275 and ORA-01460 are correlated. Can anyone suggest what could be the cause? Thanks

Error: ORA-29275
Text: partial multibyte character
Cause: The requested read operation could not complete because a partial multibyte character was found at the end of the input.
Action: Ensure that the complete multibyte character is sent from the remote server and retry the operation. Or read the partial multibyte character as RAW.
You can export the table and import it on 10g. Rename the table, create your test table and use IAS.
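What "partial multibyte character" means can be shown with any UTF-8 data: if a buffer ends in the middle of a two-byte character, the tail cannot be decoded. This is a generic Python illustration of that condition, not a reproduction of Oracle's internal check; the sample value is invented.

```python
def decodes_cleanly(data):
    """True if `data` is complete, valid UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

payload = "model ą".encode("utf-8")   # 8 bytes; "ą" takes the last two

print(decodes_cleanly(payload))        # whole value
print(decodes_cleanly(payload[:7]))    # cut in the middle of "ą"
```

The second call fails because the final byte is only the first half of a character, which is the situation the Cause text above describes.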

Customising character sets

We are trying to migrate to an Oracle 8.1.7 database with UTF8 as the database character set. Many of the (Windows) clients will still be using a company-specific character set, so there was a need to define, compile and install this character set in the NLS directory of Oracle using .nlt files and the lxinst utility.
While trying to do this, some questions emerged that were not addressed in the Oracle documentation for NLS and Globalisation.
The custom character set is based on the US-ASCII character set, so I thought I would use
base_char_set = US7ASCII
and define the other characters starting from 0x80 and ending with 0xff.
- But when I tried to compile this charset on AIX (server side), it gave a lot of warnings:
LXI-WARN-00510: In lx22712.nlt at line 88, unicode 0x300 out of private use range
LXI-WARN-00512: In lx22712.nlt at line 88, character 0x80 is remapped
LXI-WARN-00510: In lx22712.nlt at line 89, unicode 0x302 out of private use range
LXI-WARN-00512: In lx22712.nlt at line 89, character 0x81 is remapped
- When I tried to compile this on Windows 2000 (client side), lxinst generated an application error.
Because this base_char_set definition did not work (or I made a mistake), I left it out of the definition file and mapped the characters using:
character_data = {
0x00 - 0x7f : 0x0000 - 0x007f,
This compiles fine on both platforms. But it means that I have to fill in the classification list and the upper-to-lower and lower-to-upper relationships for all these characters, so it would be a lot easier if base_char_set worked.
Other questions:
- About the character classification list: can such a list contain more than 2 classifications? For example
0xC0 = {LETTER, UPPER, PRINTABLE}
What exactly are the differences between those classifications? (For example, can a character be a LETTER but not PRINTABLE?)
- What about combining characters? In most character sets (including ours) a combining character precedes the character it combines with in a string. In Unicode the combining character comes after the character it combines with. When Oracle converts strings between two such charsets, will the combining characters be handled according to the character set it is converting to, or will the order of characters in the string stay the same?
--Janick
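The ordering difference raised in the last question can be illustrated independently of Oracle. The Python sketch below takes text in which each combining mark precedes its base letter and moves the mark behind the base, as Unicode expects, before composing. It says nothing about what Oracle's conversion actually does; the function name and the sample input are hypothetical.

```python
import unicodedata

def legacy_order_to_unicode(text):
    """Move combining marks that precede their base to after it, then compose."""
    out = []
    pending = []                      # marks waiting for their base character
    for ch in text:
        if unicodedata.combining(ch):
            pending.append(ch)
        else:
            out.append(ch)
            out.extend(pending)
            pending.clear()
    out.extend(pending)               # trailing marks with no base are kept as-is
    return unicodedata.normalize("NFC", "".join(out))

# U+0300 (combining grave) placed before "a", as in the legacy ordering
print(legacy_order_to_unicode("\u0300a"))
```

If the order were left unchanged, the mark would attach to whatever character came before it, so a converter that maps code points one by one without reordering would produce different text.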

Thanks for the help.
The reason for needing a customised character set is not a special input device. Our current client uses this character set (not a standard) to support scripts for different languages. Since we want future clients to use the Unicode standard, and want to support more scripts in the future, we are migrating our DB to UTF8, but our old clients should still be able to connect to and query the DB (for now).
I cannot find this developer's version of the OLB. In
Oracle Technology Network > Software >
there is no entry in the drop-down boxes for this (or am I looking in the wrong place?).
Thanks a lot for the help.
Regards,
--Janick

Urgent : Character set problem

We have one test server in India and another one in Europe. The character set configured on the Indian server is US7ASCII, and on the European server it is WE8ISO8859P1.
We have a routine that encrypts the password using the 'crypt' command of HP-UX 10.20. On both servers the crypt functionality works fine. On the Indian server the crypted value is stored in the database unchanged. On the European server, however, when the crypted password is stored in the database, it is replaced with a value different from the crypted value. This happens only when the table is updated.
To test this, we changed the character set of our database in India to the European character set. It failed in the same way as the European database, storing a different value.
We are using Oracle 8.0.6 on HP-UX 10.20.
Could anyone please provide a solution as soon as possible? Is there any common character set that we can use for both regions?
Thanks for your help.

If you haven't already read the document referred to by Pierre (yes, I do acknowledge you would rather have others do your work),
please do so now, especially
http://www.oracle.com/technology/tech/globalization/htdocs/nls_lang%20faq.htm#_Toc110410552
Also: could you please drop the MSN language?
This forum isn't especially geared at 12 year olds.
Sybrand Bakker
Senior Oracle DBA

Checking server character set

I think this is a simple question, but for some reason I am not able to find the result.
How can I check the server's default character set?
We have HP-UX (unix) servers, and most databases are set to AL32UTF8/AL16UTF16.
I thought it would be something simple like typing 'locale', but on the HP-UX box(es), it doesn't show any setting for LANG.
I also executed 'env' and looked at all settings, but nothing seems relevant.
Unless I explicitly set the NLS_LANG=AMERICAN_AMERICA.AL32UTF8, when I do an export, it shows:
Export done in US7ASCII character set and AL16UTF16 NCHAR character set
server uses AL32UTF8 character set (possible charset conversion)
Where is it getting the US7ASCII?
If I select * from nls_database_parameters; there is no setting like US7ASCII.
When I do a 'man -k character', I get numerous possible commands, but none of them appear to be something that displays the default character set.
But, as I noted above, when I run an export of our database (without setting the NLS_LANG), it shows me that the default character set is US7ASCII.
So, how do I show this? And do you have any ideas on how to change it to UTF8?
Thanks.
removed references in this posting to my linux boxes...

The default O/S character set for Unix is 7-bit ASCII. This is an element of the so-called "C locale". This locale is used by all applications that do not declare themselves sensitive to user locale (i.e. they do not call setlocale() at application startup) and by all applications that have no locale parameters (LANG, or LC_xxxx) set in their environment.
If you call 'locale' in a Unix session that has no locale environment set, the "C locale" should be reported.
"C locale" uses US locale formatting conventions and binary collation.
## "Export done in US7ASCII character set and AL16UTF16 NCHAR character set
## server uses AL32UTF8 character set (possible charset conversion)"
Here, US7ASCII is the default character set of an Oracle Client. It is not directly related to the default O/S character set. US7ASCII is always the default Oracle client character set on all non-EBCDIC platforms, used if NLS_LANG is not explicitly set.
-- Sergiusz
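The rule in this reply (the client character set comes from NLS_LANG, and is US7ASCII when NLS_LANG is not set) can be written down as a small lookup. This Python sketch only parses the environment variable; it does not query Oracle. The function name is hypothetical, and the handling of an NLS_LANG value with no character set part is an assumption.

```python
import os

DEFAULT_CLIENT_CHARSET = "US7ASCII"   # default named in the reply above

def client_charset(env):
    """Return the character set part of NLS_LANG, or the default if unset."""
    nls_lang = env.get("NLS_LANG", "")
    # NLS_LANG has the form <language>_<territory>.<charset>
    _, dot, charset = nls_lang.partition(".")
    # Assumption: a value without a charset part falls back to the default.
    return charset if dot and charset else DEFAULT_CLIENT_CHARSET

print(client_charset(os.environ))     # what an export in this shell would use
```

Running it in a session without NLS_LANG gives the same US7ASCII that the export banner reported, which is why nothing in `locale` or `env` appeared to explain it.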

Western European and Polish character set

Dear gurus,
I work on an Oracle Database 10g R2 server with a Western European character set. We now want to include Polish characters to suit a new customer. In order to keep our Western European languages (English, Danish, Swedish, etc.) and add Polish, I converted the database server and the Windows web server client to Unicode UTF8, but I am unable to retrieve some Polish characters such as (ą ć ę ł ń ś ż ź), as they are displayed differently from the original characters.
If you know the best character set, or how to configure the database to support both Western European (not only English) and Polish characters, please share your view.
Regards

Before you can determine whether the characters are displayed correctly, you need to determine that they are stored correctly.
The dump function is useful for this.
For example, if I dump this entire string
select dump('ą ć ę ł ń ś ż ź') from dual;
I get the following
Typ=96 Len=23: 196,133,32,196,135,32,196,153,32,197,130,32,197,132,32,197,155,32,197,188,32,197,186
and dumping individual characters, e.g.
select dump('ą') from dual;
Typ=96 Len=2: 196,133
I ran this test with Oracle SQL Developer running on the database server, which has a UTF8 character set.
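The byte values in the dump output above are simply the UTF-8 encoding of each character, so they can be cross-checked without a database. A small Python sketch follows; the `dump` helper here is a stand-in written for this example and only mimics the length and byte list, not Oracle's `Typ=` field.

```python
def dump(value, encoding="utf-8"):
    """Show the encoded length and decimal byte values of a string."""
    data = value.encode(encoding)
    return f"Len={len(data)}: " + ",".join(str(b) for b in data)

print(dump("ą"))
print(dump("ą ć ę ł ń ś ż ź"))
```

If the values stored in the database match these, the data is stored correctly and any remaining problem lies in retrieval or display.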
Check out the Globalisation guide. The basic principle here is that the NLS_LANG character set setting on your client should be the same as on your server.
E.g. if you are using a Windows client, your server NLS_LANG is set to AMERICAN_AMERICA.UTF8, and the NLS_LANG setting in the Windows registry is AMERICAN_AMERICA.US7ASCII, then you would need to change the NLS_LANG setting in the Windows registry to AMERICAN_AMERICA.UTF8.
This would then resolve the issues with client display.
So the steps here are (in order):
1) Ensure your data is stored correctly on the server (insert on server)
2) Ensure your data is retrieved correctly on the server (retrieve on server)
3) Ensure your data is displayed correctly on the client (retrieve on client)
That should hopefully resolve your problem. The Globalisation guide, as mentioned earlier, is your place of reference.

Oracle 10G support for both Cyrillic and Western European Character Sets

Dear all,
Our DB currently supports Western European character sets, but we also need to support Russian characters.
Is there a common character set for both, or some trick that does the job?
Thanks.
DB: Oracle 10G R2
OS: Linux
Current Char Set:
NLS_CHARACTERSET     WE8ISO8859P1
NLS_CALENDAR     GREGORIAN
NLS_NCHAR_CHARACTERSET     AL16UTF16

AL32UTF8 will always do the job.
CL8ISO8859P5
CL8MSWIN1251
could do the job according to http://download.oracle.com/docs/cd/B19306_01/server.102/b14225/applocaledata.htm#sthref1960.
Edited by: P. Forstmann on 9 août 2011 17:41
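Whether one of the single-byte candidates is enough depends on which Western European characters actually occur in the data. A quick way to check a sample is to try encoding it under each candidate. The Python sketch below assumes that Python's `latin_1`, `iso8859_5`, `cp1251` and `utf_8` codecs correspond to the Oracle character sets named in this thread; that pairing is an assumption, and the sample strings are invented.

```python
# Oracle name -> Python codec (assumed equivalents, for illustration only)
CANDIDATES = {
    "WE8ISO8859P1": "latin_1",
    "CL8ISO8859P5": "iso8859_5",
    "CL8MSWIN1251": "cp1251",
    "AL32UTF8": "utf_8",
}

def supports(text, codec):
    """True if every character of `text` exists in `codec`."""
    try:
        text.encode(codec)
        return True
    except UnicodeEncodeError:
        return False

def coverage(samples):
    """Map each candidate to whether it can hold all sample strings."""
    return {name: all(supports(s, codec) for s in samples)
            for name, codec in CANDIDATES.items()}

print(coverage(["Привет", "café"]))   # Russian plus an accented Latin letter
print(coverage(["Привет", "cafe"]))   # Russian plus plain ASCII
```

Under these assumptions the Cyrillic single-byte sets hold Russian plus unaccented Latin text, while data containing accented letters such as é only fits the Unicode character set.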

Lenovo ix4 300d and European character set for Samba shares

Hi,
I am installing a couple of ix4-300d units at a client site and struggling to set the Samba character set for the Windows shares. Here in Italy everyone is used to putting Western European characters into filenames (for example è ò ° €), but the NAS acts weirdly when it finds such characters in folder and file names.
As you know, Samba makes it possible to declare dos charset = ISO8859-1 and unix charset = ISO8859-1 in the smb.conf file, but the option is not present in the ix4 setup pages.
Is there an easy way to work around the problem and set this option? As it is now, the machine is fairly useless, at least if you need to migrate files and folders from a server or storage with the ISO8859 charset already applied.
Thanks a lot
np

This issue has been raised on all Iomega drives since day 1. Although 'documented', it is still a shame that units sold to 'r-o-w' (rest of world, i.e. outside the US) are basically stuck with A-Z, a-z and 0-9, whereas client systems (Windows, Linux, iOS) don't care and do what they are supposed to do.
Take it or leave it (I mean, choose another vendor). I do not believe anyone has ever taken this seriously or ever will.
Various PCs / Laptops ( sorry I still really love Dell and Fujitsu ;-))
Supporting Customers ix2s and ix4s -- Love Networking ( not only technically ).
I am not a Lenovo Employee.

How to set my character coding to 'Western ISO-8859-1' permanently?

I am having a REALLY annoying problem here. I want to set my character encoding to 'Western ISO-8859-1' permanently. I am able to successfully change it for a while but a couple of minutes barely pass and it reverts back to 'Unicode', even if I am on the same page and idle. I need immediate help here, as I am not able to do an urgent task that requires my encoding to be set at 'Western'.

Go to Tools -> Options and click on the Content tab. In the group "Fonts & Colours" click the Advanced button. In the window that pops up, set the Default Character Encoding to Western (ISO-8859-1).

Western ISO 8859-1 character set is gone, "Other (including Western)" does not work. Why did you take it out?

Some websites need the Western ISO 8859-1 character set to run properly, and you have taken it away in favour of "Other (including Western)"; now the site does not work properly. Why did you take it out, and can you please put it back?

ISO-8859-1 and Windows-1252 should be equivalent. Can you provide an example of a page that doesn't display properly?
If you have to manually select a character encoding to view the page correctly, then the site is broken and you should notify its owner that it needs to be fixed. Websites specify the character encoding in one of two ways:
1. The Content-Type response header.
2. The meta tag in the page source.
See "Handling character encodings in HTML and CSS" (W3C): http://www.w3.org/International/tutorials/tutorial-char-enc/
Henri Sivonen (:hsivonen) wrote:
"We are in the process of implementing http://encoding.spec.whatwg.org/ . The process involves removing support for legacy character decoders that aren't really necessary for supporting existing Web content."
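The two places a site can declare its encoding can be inspected with a few lines of code. The Python sketch below reads the charset from a Content-Type header value and from a meta tag, and prefers the header when both are present, as described earlier on this page. It is a simplified illustration: real browsers follow the more detailed WHATWG rules, and the regular expression here is a rough approximation rather than an HTML parser.

```python
import re
from email.message import Message

META_CHARSET = re.compile(r"<meta[^>]+charset=[\"']?([\w-]+)", re.IGNORECASE)

def header_charset(content_type):
    """Charset parameter of a Content-Type header value, lower-cased, or None."""
    if not content_type:
        return None
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()

def meta_charset(html):
    """Charset declared in the first matching meta tag, lower-cased, or None."""
    match = META_CHARSET.search(html)
    return match.group(1).lower() if match else None

def effective_charset(content_type, html):
    """Header wins; the meta tag is only consulted when the header is silent."""
    return header_charset(content_type) or meta_charset(html)

print(effective_charset("text/html; charset=ISO-8859-1", '<meta charset="utf-8">'))
print(effective_charset("text/html", '<meta charset="utf-8">'))
```

A mismatch between the two values, as in the first call, is the kind of site-side problem the reply suggests reporting to the site owner.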

Mapping trademark character u00AE from character set 1406 to character set 1505

Hello!
I have the following problem.
In character set 1505 I do not see the registered trademark character ®.
Character set 1505 is for Russia. This character may be invisible on the screen, but it should be visible on paper.
This character has code 0+174 in character set 1406.
How can I copy this character from character set 1406 to character set 1505?
Is it possible?
Please help me.
Regards
Bogdan

Hello
I have solved this problem myself.
The solution is very simple. The following string needs to be put in the SAPscript: <347>
This is the registered sign for the 1505 character set.
Regards
Bogdan

Problem with character set (ubuntu linux)

Hello everyone.
I have already installed oracle-xe-universal_10.2.0.1-1.1_i386 on Ubuntu.
The problem is that Greek characters from the DB appear like ??????.
How can I set the right NLS_LANG character set to solve the problem on Ubuntu Linux?
Thank you in advance!

Character code point translation is in the realm of the client, try setting NLS_LANG environment variable.
If your client programs handle UTF8, as many Linux utilities do, that is the best choice. Set NLS_LANG to <language>_<locale>.<characterset>, e.g.
export NLS_LANG=AMERICAN_AMERICA.AL32UTF8
Or french_france, or german_germany ... depends on the locale you want to use. Try a `putty` session to the host, the terminal can be set for many different character sets, i.e. ISO8859-<n> a.k.a. the WE8<n> or Western European, UTF, etc. under Window/Translation.
Try a few select ... from dual; statements in sqlplus with a literal, and different unicode values- the unistr() and dump() functions can come in quite handy.
select unistr('\20ac euro' ) from dual;
... € euro ...
select dump( unistr('\20ac euro' ) ) from dual;
... <ascii codes for the character values> ...
select dump( unistr('\20ac euro' ),16 ) from dual; -- this one for hex dump
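For comparison when reading the dump output: the euro sign is code point U+20AC, and in a UTF-8 client it should come back as three bytes. The Python sketch below prints those bytes in decimal and hex so they can be matched against what dump() reports. It is a cross-check only and assumes the client character set is UTF-8.

```python
euro = "\u20ac"                        # U+20AC EURO SIGN
utf8 = euro.encode("utf-8")

decimal_bytes = list(utf8)             # comparable to dump(x)
hex_bytes = " ".join(f"{b:02x}" for b in utf8)   # comparable to dump(x, 16)

print(decimal_bytes)
print(hex_bytes)
```

If the terminal shows ? or different byte values for the same character, the NLS_LANG setting and the terminal's own encoding do not agree.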

Oracle XE and character set

Hello all,
I installed Oracle XE on RHEL 4 Linux and found that the database character set is AL32UTF8. Does anyone know why Oracle chose this character set? Maybe because of the NLS_LANG environment variable? Is it possible to change it to EE8ISO8859P2? Since the database is still empty, I can drop it and create a new database.
Do you think it is possible to set some environment variables and do a new Oracle XE installation, including a database with an ISO charset?
I want the EE8ISO8859P2 charset because I will be doing exp/imp from another Oracle ISO database to Oracle XE, and it is much easier to do this without charset conversion.
Any help will be appreciated.
regards,
Miha

When you download XE, you have a choice - take the 'western european' character set download, or the 'unicode' download.
No other choices.
Join us over in the XE forum where people have discussed this and found workarounds. Info about finding that forum at Re: Oracle XE Installation failed
