Conversion from ASCII to UTF8

How do we convert Extended ASCII characters to UTF8 without using the ALTER DATABASE CHARACTER SET command?

Is the CONVERT function (http://download-west.oracle.com/docs/cd/B19306_01/server.102/b14200/functions027.htm#i77037) what you are looking for?
SQL> select convert('a','utf8','us7ascii') from dual;
C
a

Similar Messages

Conversion from ASCII to UTF8 on Oracle 8.1.7 via PL/SQL

I need to extract a string from an ASCII database, put it into a variable in a PL/SQL procedure, then, with a 'magic box', convert it to UTF8 and insert it into a new UTF8 database.
I need the magic box: does a tool, package, or procedure exist that works like that?
Thanks in advance!

I suggest posting this message on the general RDBMS or PL/SQL forums.
Kuassi

Problem in database conversion from US7ASCII to UTF8

Hi,
We are facing the following problem while converting the database from US7ASCII to UTF8:
We have recently changed the database character set from US7ASCII to UTF8 for the internationalization
purpose. We ran the Character set scanner utility and it did report that some data may pose problems.
We followed the process below to convert to UTF8:
1) alter database character set utf8
2) alter database national character set utf8.
Now we have problems working with the old data in our Java-based application.
We are getting the following error "java.sql.SQLException: Fail to convert between UTF8 and UCS2: failUTF8Conv".
We further analyzed our data and found some interesting things :
e.g.
DB - UTF8.
NLS_LANG also set to UTF8.
Select name from t1 where name like 'Gen%';
NAME
Genève
But when we check the length of the same data, it shows this:
NAME LENGTH(NAME) VSIZE(NAME)
Genève 4 6
The question is: why does it show a length of only 4? Also, when we use the SUBSTR function, it extracts the following:
select name,substr(name,4,1) from t1 where name like 'Gen%';
NAME SUB
Genève ève
We executed the above queries on the US7ASCII DB and they work fine: the length is 6, and SUBSTR extracts just 'è'.
We also used the DUMP function on the UTF8 DB for the above query; this is the result:
select name,length(name),vsize(name),dump(name) from t1 where name like 'Gen%';
NAME LENGTH(NAME) VSIZE(NAME) DUMP(NAME)
Genève 4 6 Typ=1 Len=6: 71,101,110,232,118,101
We checked the data extensively, and it seems 'è' (accented e) is causing the problem.
We want to know where the problem is and how to overcome it.
Further, we tried all of the following:
1)
Export Server: US7ASCII
Export Client: did not set NLS_LANG / NLS_CHAR, so presumably US7ASCII as well
Import Client: did not set NLS_LANG / NLS_CHAR, so presumably US7ASCII as well
Import Server: UTF8
RESULT: Acute e became h
2)
Export Server: US7ASCII
Export Client: did not set NLS_LANG / NLS_CHAR, so presumably US7ASCII as well
Import Client: NLS_LANG=AMERICAN_AMERICA.UTF8 and NLS_CHAR=UTF8
Import Server: UTF8
RESULT: IMP-00016 error
3)
Export Server: US7ASCII
Export Client: NLS_LANG=AMERICAN_AMERICA.UTF8 and NLS_CHAR=UTF8
Import Client: did not set NLS_LANG / NLS_CHAR, so presumably US7ASCII as well
Import Server: UTF8
RESULT: Acute e became h
4)
Export Server: US7ASCII
Export Client: NLS_LANG=AMERICAN_AMERICA.UTF8 and NLS_CHAR=UTF8
Import Client: NLS_LANG=AMERICAN_AMERICA.UTF8 and NLS_CHAR=UTF8
Import Server: UTF8
RESULT: Acute e became h
5)
Tried updating sys.props$ directly:
Update sys.props$
set value$='UTF8'
where name='NLS_CHARACTERSET'
RESULT: Acute e shows properly, but it causes problems in the application:
"java.sql.SQLException: Fail to convert between UTF8 and UCS2: failUTF8Conv"
Looking further, we observed the following:
when you run this command on a column 'city' in a table that contains 'Genèva' (note the accented e after n):
command: select length(city), vsize(city),substr(city,4,1),city from cities
Result: 4 6 èva Genèva
If you look at the value of substr(city,4,1), you will see the problem. Also note that the length shows 4 and the size shows 6. Moreover, when these records are selected through the JDBC driver, it starts giving problems.
6)
Actually, the above (point 5) is similar to changing the character set of the database with 'ALTER DATABASE CHARACTER SET UTF8'. The same problem is observed then too.
7)
We also tried another method: changing the third byte of the export file, which specifies the character set, to the UTF8 code by editing the export file with a hexadecimal editor. After import, we observed the same problem as described in (5) and (6) above.
We have no more ideas how to migrate without corrupting the data. Of course, we know the records where these characters occur through Oracle's csscan utility, but we do not want to manually rectify each and every record by replacing the character with an ASCII character. Any other idea as to how this can be accomplished?
Thanx
Ashok

The problem you have is that although your original database is defined as US7ASCII, the data it contains is more than is covered by this code page (as the reply on Sept 4 to the previous posting already said).
This has probably happened because the client was also defined as US7ASCII, and when the DB and client are defined as having the same character set, no conversion (or checking) takes place when data is passed between them. However, if you are using a Windows client, then it will in fact be using Windows code page 1252 (Latin-1) or similar, and this allows many more characters, including è (accented e). So a user can enter all these characters and store them in the database, and similarly read them from the database, because data transfer is transparent.
When you ran ALTER DATABASE CHARACTER SET UTF8, this only changed the label on the database, not its contents. However, only part of the contents is valid UTF8: any single byte above 7F (like è) is invalid on its own. If your original client now uses the database, code page conversion will take place because the client and DB have different character sets defined. The invalid codes can then cause problems.
Without being able to explain what has happened in detail, it may help to see what your è (dec 232, x'E8') looks like. The actual data has not changed (you can see this because it is reported as 232). However, the binary code there (11101000) is invalid UTF8. UTF8 encodes a character in 1 to 4 bytes, and the first bits of a UTF8 character tell how many bytes it uses: 0xxx means it is one byte (the same as the corresponding US-ASCII character), 110x that it uses 2 bytes, 1110 that it uses 3 bytes, and so on. So if you interpret what is there as UTF8, it looks like the first byte of a 3-byte character, which explains why the substringing is giving you the other 2 bytes as well.
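To make this concrete, here is a minimal Java sketch that groups the bytes from the DUMP output purely by their UTF-8 lead-byte prefixes. It is a simplified model, not Oracle's actual implementation. The class name Utf8LeadBytes is hypothetical, and treating a stray continuation byte as a one-byte character is an assumption. Under this model, the 6 stored bytes form 4 "characters", and the 4th one is the x'E8' byte plus "ve". That matches the LENGTH, VSIZE and SUBSTR results reported in the question.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: split bytes into "characters" the way a lenient
// UTF-8 reader would, trusting the length announced by each lead byte.
final class Utf8LeadBytes {
    // Number of bytes a UTF-8 sequence claims, judged only by its first byte.
    static int sequenceLength(int b) {
        if ((b & 0x80) == 0x00) return 1; // 0xxxxxxx: single byte (US-ASCII)
        if ((b & 0xE0) == 0xC0) return 2; // 110xxxxx: 2-byte sequence
        if ((b & 0xF0) == 0xE0) return 3; // 1110xxxx: 3-byte sequence
        if ((b & 0xF8) == 0xF0) return 4; // 11110xxx: 4-byte sequence
        return 1;                         // 10xxxxxx or invalid: assume 1
    }

    static List<int[]> split(int[] bytes) {
        List<int[]> chars = new ArrayList<>();
        int i = 0;
        while (i < bytes.length) {
            // Never read past the end of the data, even if the lead byte asks to.
            int n = Math.min(sequenceLength(bytes[i]), bytes.length - i);
            chars.add(Arrays.copyOfRange(bytes, i, i + n));
            i += n;
        }
        return chars;
    }

    public static void main(String[] args) {
        // DUMP(NAME) from the post: Typ=1 Len=6: 71,101,110,232,118,101
        int[] stored = {71, 101, 110, 232, 118, 101};
        List<int[]> chars = split(stored);
        System.out.println("bytes (VSIZE)  = " + stored.length); // 6
        System.out.println("chars (LENGTH) = " + chars.size());  // 4
        // The 4th "character" swallows the two bytes after x'E8'.
        System.out.println(Arrays.toString(chars.get(3)));      // [232, 118, 101]
    }
}
```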
Can you fix this without losing data? I believe so. First, check which other characters are being flagged by the scan, and see whether they are contained in another standard character set. If they are only Western European accented characters, then WE8ISO8859P1 is probably OK, but watch out for the euro sign, which Windows has at x'80', an undefined character in ISO 8859-1.
You can see the contents of the Microsoft Windows Codepage 1252 at: http://www.microsoft.com/globaldev/reference/sbcs/1252.htm
For a listing of the US-ASCII defined characters see http://czyborra.com/charsets/iso646.html and for ISO 8859-1 see http://czyborra.com/charsets/iso8859.html#ISO-8859-1
If all is well, you can first ALTER DATABASE CHARACTER SET to WE8ISO8859P1. This will not change any data, but will ensure that all the data gets exported. Then export the DB and import it into a UTF8 DB. This will convert the non-US-ASCII characters to Unicode. You will also have to change the clients' character set to something other than US7ASCII, or they will just see ? for the other characters.
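The export/import step relies on mapping each single-byte WE8ISO8859P1 code to its UTF8 sequence. Below is a minimal Java sketch of that mapping. It assumes the stored bytes really are ISO 8859-1, and the class name is hypothetical. With this mapping, è (x'E8') becomes the two bytes x'C3' x'A8'. Java's ISO-8859-1 decoder maps x'80' to the control character U+0080 rather than to the euro sign, which illustrates the Windows caveat above.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper modelling the Latin-1 -> UTF-8 conversion that an
// import into a UTF8 database performs once the source is labelled correctly.
final class Latin1ToUtf8 {
    // Every byte 0x00-0xFF maps to U+0000-U+00FF in ISO-8859-1; the text is
    // then re-encoded as UTF-8, where codes above 0x7F take two bytes.
    static byte[] convert(byte[] latin1) {
        String text = new String(latin1, StandardCharsets.ISO_8859_1);
        return text.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // "Gen" + e-grave + "ve" as stored in a single-byte database
        byte[] stored = {71, 101, 110, (byte) 0xE8, 118, 101};
        byte[] utf8 = convert(stored);
        StringBuilder sb = new StringBuilder();
        for (byte b : utf8) {
            sb.append(String.format("%02X ", b & 0xFF));
        }
        System.out.println(sb.toString().trim()); // 47 65 6E C3 A8 76 65
    }
}
```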
Good Luck!

Convert from utf16 to utf8 ?? er?

Dear list,
I have recently seen a sample that converts a UTF-16 string to UTF-8, and I am a little confused. I thought UTF-16 was a superset of UTF-8. Could someone please explain why this is sometimes necessary?
regards
Ben

> How can UTF-16 be a superset of UTF-8? I thought this relationship was similar to ASCII and UTF-8/UTF-16, where for example the space has a value of 32 in ASCII and in Unicode (UTF-8 and UTF-16). This being the case, there is not much need for a UTF-8 to UTF-16 conversion program.
I didn't say it was a superset. It is a different way of representing the same thing.
> You say that UTF-16 is ALWAYS 2 bytes, and UTF-8 is usually 8 bits but is variable when necessary. Is UTF-16 not a variable-byte character set?
No.
> The names UTF-8 and UTF-16 are, according to this, somewhat misleading, as they are NOT always 8 or 16 bits.
And "java" is neither an island nor a beverage. The name does not convey the entirety of the subject.
> "...characters, the first byte (or 2) is an 'escape' byte, which means that more bytes are needed." What do you mean by first (or 2) escape byte?
When something sees a given specific byte, it knows that a certain number of bytes after it are needed to fully represent the character.
> I am still not convinced.
Convinced? If you do not find my explanation satisfactory, then you might try writing some code that converts to UTF-16 and UTF-8 using String.getBytes(String).
You might also try to find the character set definitions.
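Following that suggestion, here is a small Java sketch that counts the bytes each encoding uses per character. The class name is hypothetical, and it uses the StandardCharsets overload of getBytes (Java 7+) instead of the String-name overload. UTF_16BE is chosen so the count excludes the byte-order mark that the plain "UTF-16" encoder writes. The last sample lies outside the Basic Multilingual Plane, so UTF-16 needs a surrogate pair (4 bytes) for it.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper comparing encoded sizes of the same text.
final class EncodedLengths {
    static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // UTF_16BE: same code units as "UTF-16", but without the 2-byte BOM.
    static int utf16Length(String s) {
        return s.getBytes(StandardCharsets.UTF_16BE).length;
    }

    public static void main(String[] args) {
        String[] samples = {
            " ",            // U+0020 space
            "\u00e8",       // U+00E8 e-grave
            "\u4e2d",       // U+4E2D, a common CJK ideograph
            "\uD83D\uDE00"  // U+1F600, encoded in UTF-16 as a surrogate pair
        };
        for (String s : samples) {
            System.out.printf("U+%04X  UTF-8=%d bytes  UTF-16=%d bytes%n",
                    s.codePointAt(0), utf8Length(s), utf16Length(s));
        }
    }
}
```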

Conversion from Unicode to UTF8

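I want to convert a string containing Unicode characters into a string with UTF-8 characters. I used the following code snippet:
try {
    // Round trip through UTF-8 bytes. A Java String is always UTF-16 internally,
    // so for a well-formed string utfStr ends up equal to str.
    String str = new String(givenString);
    String utfStr = new String(str.getBytes("UTF-8"), "UTF-8");
    System.out.println("Converted:" + str + " to:" + utfStr);
} catch (Exception e) {
    e.printStackTrace(System.out);
}
I also tried:
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.*; // Charset, CharsetEncoder, CharsetDecoder
// encode() and decode() throw the checked CharacterCodingException
Charset utf8Charset = Charset.forName("UTF-8");
CharsetEncoder encoder = utf8Charset.newEncoder();
CharsetDecoder decoder = utf8Charset.newDecoder();
ByteBuffer bbuf = encoder.encode(CharBuffer.wrap(givenString));
CharBuffer cbuf = decoder.decode(bbuf);
String dest = cbuf.toString();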
When Java tries to encode Unicode to UTF-8 and runs into an unknown character (typically a character in the high ASCII range), it substitutes it with '?' or some other weird character.
How do I prevent this?

Where is this string coming from? Are you initializing it in your source code as a String literal?
String str = "Aéroport Princesse Béatrix";
If so, you need to make sure the .java file is saved in an encoding that can handle all of the characters; ISO-8859-1, windows-1252 and, of course, UTF-8 will all suffice. You also need to make sure the compiler reads the source file with the correct encoding. For example, if you saved your source files as UTF-8, you would do this:
javac -encoding UTF-8 *.java
Finally, before you print the text to the console, you need to make sure the console is using an encoding that can handle it. On my WinXP box, the default encoding (or codepage, as they call it) for console windows is cp437, which doesn't support accented characters. You can change it with the "chcp" command, like so:
chcp 1252
Unfortunately, chcp won't accept UTF-8 or any other Unicode encoding, but cp1252 can handle the accented characters in your string. Note that you don't need to specify that encoding in your code; the Java runtime detects it automatically.
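A further option, not mentioned in the reply, is to keep non-ASCII bytes out of the source file entirely by using Java Unicode escapes. javac translates these before parsing, so the literal no longer depends on the -encoding setting, as long as the file is read with an ASCII-compatible encoding. A minimal sketch with a hypothetical class name:

```java
// Hypothetical example: the accented letters are written as Unicode escapes,
// so this source file contains only ASCII bytes.
final class EscapedLiteral {
    // U+00E9 is LATIN SMALL LETTER E WITH ACUTE.
    static final String AIRPORT = "A\u00e9roport Princesse B\u00e9atrix";

    public static void main(String[] args) {
        System.out.println(AIRPORT.length()); // 26
        // Whether this line displays correctly still depends on the console code page.
        System.out.println(AIRPORT);
    }
}
```

The console caveat in the reply still applies: escapes only fix how the string gets into the program, not how it is displayed.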
> If you see question marks or some other placeholder character when viewing output, that's probably because the terminal or whatever doesn't have the fonts available to render those characters.
No, question marks always indicate an encoding problem. If the character is valid but the font lacks a glyph for it, it shows up as a little rectangle.
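A common source of such question marks is encoding text into a character set that cannot represent it. String.getBytes silently substitutes the charset's replacement byte, which is '?' for US-ASCII. Encoding a well-formed string to UTF-8, as in the original snippets, never needs a substitute, so the '?' most likely appears later, for example when the text is written out in another encoding. The minimal sketch below (class and method names are hypothetical) contrasts that silent substitution with a CharsetEncoder set to CodingErrorAction.REPORT, which throws instead:

```java
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

final class QuestionMarks {
    // getBytes always replaces unmappable characters with the charset's
    // replacement bytes; for US-ASCII that is '?' (0x3F).
    static byte[] lenient(String s) {
        return s.getBytes(StandardCharsets.US_ASCII);
    }

    // An encoder configured with REPORT throws instead of substituting.
    static boolean strictlyEncodable(String s) {
        CharsetEncoder enc = StandardCharsets.US_ASCII.newEncoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            enc.encode(CharBuffer.wrap(s));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String s = "A\u00e9roport";
        System.out.println(new String(lenient(s), StandardCharsets.US_ASCII)); // A?roport
        System.out.println(strictlyEncodable(s));                              // false
    }
}
```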

How to convert from UNICODE (UTF16) to UTF8 and vice-versa in JAVA.

Hi
I want to insert a string in UTF16 format into the database. How do I convert from UTF16 to UTF8 and vice versa in Java? What type must the database field be? Do I need a special setup for the database (Oracle 8i)?
thanks
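On the Java side, a String is already stored as UTF-16 code units, so the conversion amounts to encoding to UTF-8 bytes and decoding them back. Below is a minimal sketch with hypothetical class and method names. It does not answer the column-type or Oracle 8i setup questions. In a JDBC application, the driver normally converts between Java Strings and the database character set itself when you pass a String.

```java
import java.nio.charset.StandardCharsets;

final class Utf16Utf8 {
    // "UTF-16 to UTF-8": encode the String's characters as UTF-8 bytes.
    static byte[] toUtf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // "UTF-8 to UTF-16": decode the bytes back into a (UTF-16) String.
    static String fromUtf8(byte[] utf8) {
        return new String(utf8, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = "Gen\u00e8ve \u4e2d"; // Latin letters, e-grave, a CJK ideograph
        byte[] bytes = toUtf8(original);
        // 8 UTF-16 code units, 11 UTF-8 bytes
        System.out.println(original.length() + " UTF-16 code units, " + bytes.length + " UTF-8 bytes");
        System.out.println(fromUtf8(bytes).equals(original)); // true
    }
}
```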

I'm not sure if this is the correct topic, but we are having problems accessing our Japanese data stored in UTF-8 in our Oracle database using the JDBC thin driver. The data is submitted and extracted correctly using ODBC drivers, but inspection of the same data retrieved from the getString() call using the JDBC thin driver shows frequent occurrences of bytes like "FF", which are not expected in either UTF8 or UCS2. My understanding is that accessing UTF8 in Java should involve NO NLS translation, since we are simply going from one Unicode encoding to another.
We are using Oracle version 8.0.4.
Can you tell me what we are doing wrong?

9.2: convert ASCII to UTF8 (Welsh language)

hello
I have a 9.2 ASCII database that I can't convert to UTF8 yet.
1. For an output (util file), I need to convert an ASCII text string to UTF-8 on export.
2. I have two characters that are not supported by ASCII, ŵ and ŷ; the users will represent these by typing w^ and y^.
I tried using UNISTR, but none of the characters below are correctly converted:
SELECT UNISTR(ASCIISTR('ÂâîÊêô')) FROM DUAL;
How would you recommend converting an extended ASCII (Latin-1) string to UTF-8 for export?
Is it sensible to use the character replacement plan above for ŵ and ŷ?
thanks
james
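One quick way to check which of these letters a single-byte Windows code page can hold is CharsetEncoder.canEncode. The sketch below assumes the JVM provides the windows-1252 charset; standard JDKs do, although the Java specification does not require it. It also sketches the proposed w^/y^ convention as a plain string replacement before encoding to UTF-8. Class and method names are hypothetical. In this check, Â, â, î, Ê, ê and ô are encodable in windows-1252, while ŵ (U+0175) and ŷ (U+0177) are not.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;

final class WelshText {
    // Hypothetical helper for the proposed convention: users type "w^" and "y^",
    // which are turned into U+0175 (w circumflex) and U+0177 (y circumflex).
    static String expandPlaceholders(String s) {
        return s.replace("w^", "\u0175").replace("y^", "\u0177");
    }

    public static void main(String[] args) {
        // Assumption: windows-1252 is available in this JVM.
        CharsetEncoder cp1252 = Charset.forName("windows-1252").newEncoder();
        // A/a/i/E/e/o with circumflex, then w and y with circumflex
        String letters = "\u00c2\u00e2\u00ee\u00ca\u00ea\u00f4\u0175\u0177";
        for (char c : letters.toCharArray()) {
            System.out.printf("U+%04X in windows-1252: %b%n", (int) c, cp1252.canEncode(c));
        }
        byte[] utf8 = expandPlaceholders("y^ w^").getBytes(StandardCharsets.UTF_8);
        System.out.println(utf8.length); // 5: two 2-byte letters plus a space
    }
}
```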

Probably the unconverted characters are not contained in the first character set, if this is right:
http://en.wikipedia.org/wiki/Windows-1252
...there is no conversion for values outside the first character set.
But I may have made a mistake.
Are you sure Â, â, î, Ê, ê and ô are in the 1252 character set?
I cannot tell whether there is a difference between the similar characters in the table on Wikipedia and the ones you posted; that is why I asked.
Anyway, this output seems to confirm my suspicion.
Processing ...
SELECT convert ('Ââî€Êêô','WE8MSWIN1252','UTF8') FROM DUAL
Query finished, retrieving results...
CONVERT('¨âî€¨êô','WE8MSWIN1252','UTF8')
¨¨¨¨
1 row(s) retrieved
Processing ...
SELECT convert ('Ââî€Êêô','UTF8','UTF8') FROM DUAL
Query finished, retrieving results...
CONVERT('¨âî€¨êô','UTF8','UTF8')
¨âî€¨êô
1 row(s) retrieved
Processing ...
SELECT convert ('Ââî€Êêô','UTF8','WE8MSWIN1252') FROM DUAL
Query finished, retrieving results...
CONVERT('¨âî€¨êô','UTF8','WE8MSWIN1252')
¶¨Ç?¶îÇ?¶¨Ç?¶ô
1 row(s) retrieved
Processing ...
SELECT convert ('Ââî€Êêô','WE8PC858','UTF8') FROM DUAL
Query finished, retrieving results...
CONVERT('¨âî€¨êô','WE8PC858','UTF8')
1 row(s) retrieved
Processing ...
SELECT convert ('Ââî€Êêô','UTF8','WE8PC858') FROM DUAL
Query finished, retrieving results...
CONVERT('¨âî€¨êô','UTF8','WE8PC858')
ƒ??Ç?¶¯Ç?ƒ??ƒ??Ç?
1 row(s) retrieved
Some characters are not supported on my DB, so try these queries on yours to prove it.
SELECT convert ('Ââî€Êêô','WE8MSWIN1252','UTF8') FROM DUAL;
SELECT convert ('Ââî€Êêô','UTF8','UTF8') FROM DUAL;
SELECT convert ('Ââî€Êêô','UTF8','WE8MSWIN1252') FROM DUAL;
SELECT convert ('Ââî€Êêô','WE8PC858','UTF8') FROM DUAL;
SELECT convert ('Ââî€Êêô','UTF8','WE8PC858') FROM DUAL;
Bye
Alessandro

Two spaces being converted from ASCII 32,32 to ASCII 160,32

Two spaces are being converted from ASCII 32,32 to ASCII 160,32 when the page is loaded by the browser.
This problem appears when viewing an address page in personal data that contains two spaces in the address field, e.g. APT102-  102 Elm St.
We put debug traces in to display the ascii character values of the spaces before
and after the record is brought up. This debug showed that the two spaces change
from
Before: space,space or ASCII(32),ASCII(32)
After: ASCII(160) ASCII(32)
This is causing the application to display a record change warning. This occurs
on IE 5.5, 6.0, Netscape 6, but DOES NOT occur using the Opera 6 browser. For
what it's worth, the database is Non-unicode.
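If the 160 cannot be avoided on the browser side, one hedged workaround is to normalize non-breaking spaces before the change check. This is a minimal sketch with hypothetical names. It assumes a non-breaking space never carries meaning in these fields, and it does not explain why IE and Netscape produce the character.

```java
// Hypothetical server-side workaround: compare field values after mapping
// U+00A0 (non-breaking space, code 160) to an ordinary space (code 32).
final class SpaceNormalizer {
    static String normalizeSpaces(String s) {
        return s.replace('\u00a0', ' ');
    }

    static boolean fieldChanged(String before, String after) {
        return !normalizeSpaces(before).equals(normalizeSpaces(after));
    }

    public static void main(String[] args) {
        String stored = "APT102-  102 Elm St.";       // 32, 32
        String posted = "APT102-\u00a0 102 Elm St.";  // 160, 32 from the browser
        System.out.println(fieldChanged(stored, posted)); // false
    }
}
```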

Forgot to add:
This is WEBLOGIC SERVER 5.1 sp 9 running on Win2k

Convert 10.2.0.4 RDBMS from WE8ISO8859P1 to UTF8 without installing a new language

We are on E-Business Suite 11.5.10.2 with a 10.2.0.4 RDBMS. We are thinking of taking advantage of a downtime to convert the database character set from WE8ISO8859P1 to UTF8 right now, even though we only want to install and configure a new language in a year or two. I have a lot of questions and hope someone can answer them.
1) Is it OK to convert the database character set without doing anything on the apps side? Not even changing any setting in apps?
2) I know Oracle recommends AL32UTF8 for E-Business Suite, but for Release 12 only. Do I have the right information?
3) I found a post in one of the forums here from someone who used CSSCAN to scan the database but realized it only changes the metadata, not the user data. I thought CSSCAN only scans and reports potential data problems and should not change anything in the database. I wonder if he meant CSALTER instead... but only metadata?
4) I have also read that some people use expdp/impdp to perform the character set conversion, but I have also read that Oracle recommends using the exp/imp utilities only. Is that right? Which is the better method for character set conversion if both are valid paths?
Thanks, Fushan

Hi
In addition to Srini's input, here is mine.
*1) Is it OK to convert the database character set without doing anything on the apps side? Not even changing any setting in apps?*
There is no issue converting without adding a new language in EBS.
*2) I know Oracle recommends AL32UTF8 for E-Business Suite, but for Release 12 only. Do I have the right information?*
Please note that AL32UTF8 is not certified for Oracle E-Business Suite 11i.
Note.124721.1 Migrating an Applications Installation to a New Character Set:
This is documented in Note:222663.1
Note 179133.1 The correct NLS_LANG in a Windows Environment
Note 264157.1 The correct NLS_LANG on Unix Environments
*3) I found a post in one of the forums here from someone who used CSSCAN to scan the database but realized it only changes the metadata, not the user data. I thought CSSCAN only scans and reports potential data problems and should not change anything in the database. I wonder if he meant CSALTER instead... but only metadata?*
Please follow the notes and links below; they will give you a clear picture:
Changing the Database Character Set ( NLS_CHARACTERSET ) [ID 225912.1]
Note 276914.1 The National Character Set in Oracle 9i and 10g
Note 458122.1 Installing and Configuring Csscan in 8i and 9i (Database Character Set Scanner)
Note 745809.1 Installing and configuring Csscan in 10g and 11g (Database Character Set Scanner)
Note 444701.1 Csscan output explained
http://www.oracle-base.com/articles/10g/CharacterSetMigration.php
http://repettas.wordpress.com/2008/05/16/national-character-set-in-oracle-9i-and-10g/
http://avdeo.com/2010/11/01/converting-migerating-database-character-set/
*4) I have also read that some people use expdp/impdp to perform the character set conversion, but I have also read that Oracle recommends using the exp/imp utilities only. Is that right? Which is the better method for character set conversion if both are valid paths?*
Yes.
Note 227332.1 NLS considerations in Import/Export - Frequently Asked Questions

Converting server from Big5 to UTF8.

I am trying to convert an Oracle 8i server from Big5 to UTF8 by using "alter database character set utf8", but it gives me "ORA-12712: new character set must be a superset of old character set". Is there any way to force this conversion? I understand there will be data conversion problems, and I don't care about them.
Rick Lin

Why would you want to do this? In UTF8, all your 2-byte Chinese Big5 characters will grow to 3 bytes in length. Basically, each character needs to be converted when you migrate your database to UTF8, since their current binary representations are no longer valid.
The ALTER DATABASE CHARACTER SET command doesn't perform any character conversion; if your operation proceeded without the error, all your Chinese data would be lost!
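Both the size increase and the changed byte values can be checked in Java. Below is a minimal sketch for two common Chinese characters, with a hypothetical class name. Big5 ships with standard JDKs, but the Java platform does not require it, so the code checks for it first.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class Big5Widths {
    static int utf8Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        String s = "\u4e2d\u6587"; // U+4E2D, U+6587: two common Chinese characters
        System.out.println("UTF-8 bytes: " + utf8Bytes(s)); // 6
        if (Charset.isSupported("Big5")) {
            byte[] big5 = s.getBytes(Charset.forName("Big5"));
            System.out.println("Big5 bytes: " + big5.length); // 4
            // The first byte differs between the encodings, which is why
            // relabelling the data with ALTER DATABASE cannot work here.
            System.out.printf("first Big5 byte 0x%02X, first UTF-8 byte 0x%02X%n",
                    big5[0] & 0xFF, s.getBytes(StandardCharsets.UTF_8)[0] & 0xFF);
        }
    }
}
```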

Convert String from ASCII to ANSI

Hi,
A command line instruction run via the LabVIEW function "System Exec.vi" returns a string in ASCII format. Is there a function to convert a string from ASCII to ANSI format? I use LabVIEW 8.5 (German installation).
Kind Regards
Christian
Test Engineering
digades GmbH
www.digades.com

Hallo Christian,
AFAIK there is no such function in LabVIEW...
But you can:
- use "Search and replace string" to search for ASCII chars and replace them with their corresponding ANSI chars; do this in a loop for all chars to be replaced (acceptable speed for small strings...)
- convert the string to a U8 array and use a lookup table to convert all bytes from ASCII to ANSI, then convert back to a string (may be faster for long strings...)
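The second suggestion, a byte lookup table, is independent of LabVIEW. Here is a Java sketch of it rather than LabVIEW code. It assumes the command line output is in OEM code page 850 and that "ANSI" means windows-1252. That is typical for a German Windows installation but should be verified. Class and method names are hypothetical, and characters with no single-byte equivalent in the target become '?'.

```java
import java.nio.charset.Charset;

final class ByteTranslationTable {
    // Build a 256-entry table mapping each byte of one single-byte code page
    // to the byte for the same character in another code page.
    static byte[] buildTable(Charset from, Charset to) {
        byte[] table = new byte[256];
        for (int b = 0; b < 256; b++) {
            String ch = new String(new byte[] {(byte) b}, from);
            byte[] mapped = ch.getBytes(to);
            table[b] = mapped.length == 1 ? mapped[0] : (byte) '?';
        }
        return table;
    }

    // One array lookup per byte: the fast path suggested for long strings.
    static byte[] translate(byte[] input, byte[] table) {
        byte[] out = new byte[input.length];
        for (int i = 0; i < input.length; i++) {
            out[i] = table[input[i] & 0xFF];
        }
        return out;
    }

    public static void main(String[] args) {
        // Assumption: OEM code page 850 in, windows-1252 out.
        byte[] table = buildTable(Charset.forName("IBM850"), Charset.forName("windows-1252"));
        // u-umlaut, a-umlaut, o-umlaut and sharp s as encoded in CP850
        byte[] oem = {(byte) 0x81, (byte) 0x84, (byte) 0x94, (byte) 0xE1};
        for (byte b : translate(oem, table)) {
            System.out.printf("%02X ", b & 0xFF);
        }
        System.out.println(); // FC E4 F6 DF
    }
}
```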
Best regards,
GerdW
CLAD, using 2009SP1 + LV2011SP1 + LV2014SP1 on WinXP+Win7+cRIO

JDeveloper 3.1.1.2 problem: convert from UTF8 to UCS2 failed

Oracle team:
I tested JDeveloper 3.1.1.2, and it has a problem at runtime: "conversion from UTF8 to UCS2 failed", AttributeLoadException (our language is Chinese).
I found that oracle\jbo\server\QueryCollection.class in dacf.zip may have a problem. When I used this class from JDeveloper 3.1 to replace the class of the same name in JDeveloper 3.1.1.2, the above problem disappeared,
but because this class does not suit JDeveloper 3.1.1.2, other problems appeared.
So please work out this problem; I hope it will run correctly.

I searched this forum and the SQLJ/JDBC forum, and found a few occurrences of this problem. Among the things people suggested:
* Changing JDBC drivers (experience varied as to which one fixed the problem)
* Adding nls_charset1x.zip to your CLASSPATH
* Ensuring you're using the same character set on the client and server.
I suggest you take a look at the following discussion threads: http://technet.oracle.com:89/ubb/Forum8/HTML/001810.html http://technet.oracle.com:89/ubb/Forum8/HTML/000065.html http://technet.oracle.com:89/ubb/Forum2/HTML/000820.html
Blaise

How to convert an ascii file into dBase .dbf file type

Does anyone out there know how to convert an ASCII file (generated from a PL/SQL script) into a .dbf (dBase III) file? Thanks in advance.

I haven't worked with dBase for about 20 years, but I seem to recall it having an IMPORT command for that purpose. But maybe I'm wrong...

Convert from latin iso-1 charset to varchar2

Hi,
how can i convert from latin iso-1 charset to varchar2?
for example, the & sign, which is &amp;
Cheers.

You cannot do conversion from ISO 8859-1 to UTF-8 in-place because the UTF-8 version will generally be longer (unless you convert a pure ASCII file, which does not need conversion in the first place). Therefore, you would have to overwrite what you have not read yet. Instead, convert to a new file with a temporary name, drop the original and rename the temporary back to original. This is not that complicated.
If the problem is that you want to overwrite a file already open by the database, then rename the incoming file first and then convert copying to the target.
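A minimal Java sketch of that approach, assuming the input is plain ISO 8859-1 text (class and method names are hypothetical). The converted copy is written to a temporary file in the same directory and then moved over the original. The original is therefore replaced only after the copy has succeeded.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

final class Latin1FileToUtf8 {
    static void convertInPlace(Path file) throws IOException {
        // Temporary file next to the original, so the final move stays on one file system.
        Path tmp = Files.createTempFile(file.toAbsolutePath().getParent(), "conv", ".tmp");
        try (BufferedReader in = Files.newBufferedReader(file, StandardCharsets.ISO_8859_1);
             BufferedWriter out = Files.newBufferedWriter(tmp, StandardCharsets.UTF_8)) {
            char[] buf = new char[8192];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n); // copies line endings unchanged
            }
        } catch (IOException e) {
            Files.deleteIfExists(tmp);
            throw e;
        }
        // Drop the original and give the converted copy its name.
        Files.move(tmp, file, StandardCopyOption.REPLACE_EXISTING);
    }

    public static void main(String[] args) throws IOException {
        Path f = Files.createTempFile("latin1", ".txt");
        Files.write(f, new byte[] {71, 101, 110, (byte) 0xE8, 118, 101}); // "Gen" + e-grave + "ve"
        convertInPlace(f);
        System.out.println(Files.size(f)); // 7: the e-grave now takes two bytes
        Files.delete(f);
    }
}
```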
-- Sergiusz

Convert from NVARCHAR2 to Unicode in SQL Plus

I need to convert from NVARCHAR2 column data to Unicode format in a query. How can I do this?

> I need to convert from NVARCHAR2 column data to Unicode format in a query
Maybe with CONVERT:
SQL> select convert(n'ABC', 'utf8') abc from dual;
AB
┐
1 row selected.?
