To Determine Unicode Datatype encoding

Hi,
Going through the Oracle documentation, I found that the Oracle Unicode datatypes (NCHAR and NVARCHAR2) support the AL16UTF16 and UTF8 Unicode encodings.
Is there a way to determine which encoding is being used by the Oracle Unicode datatypes through the OCI interface?
Thanks,
Sachin

That's a rather hard problem. You would, realistically, either have to make a bunch of simplifying assumptions based on the data or you would want to buy a commercial tool that does character set detection.
There are a number of different ways to encode Unicode (UTF-8, UTF-16, UTF-32, UCS-2, etc.) and a number of different versions of the Unicode standard. UTF-8 is one of the more common ways to encode Unicode. But it is popular precisely because the first 128 characters (which cover the majority of what you'd find in English text) are encoded identically to 7-bit ASCII. Depending on the size and contents of the document, it may not be possible to determine whether the data is encoded in 7-bit ASCII, UTF-8, or one of the various single-byte character sets that are built off of 7-bit ASCII (ISO 8859-15, Windows-1252, ISO 8859-1, etc.).
Depending on how many different character sets you are trying to distinguish between, you'd have to look for binary values that are valid in one character set and not in another.
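For the question as originally asked (which encoding the NCHAR/NVARCHAR2 columns are stored in), note that the national character set is a single database-wide setting, so it can simply be queried rather than detected; the heuristics above only matter for raw bytes of unknown origin. A minimal sketch using JDBC (the same SELECT can be issued through OCI with the usual statement and define handles); connection details are placeholders:
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class ShowNcharCharset {
    public static void main(String[] args) throws Exception {
        // Connection details are placeholders.
        try (Connection con = DriverManager.getConnection(
                 "jdbc:oracle:thin:@//dbhost:1521/ORCL", "scott", "tiger");
             Statement st = con.createStatement();
             ResultSet rs = st.executeQuery(
                 "SELECT parameter, value FROM nls_database_parameters " +
                 "WHERE parameter IN ('NLS_CHARACTERSET', 'NLS_NCHAR_CHARACTERSET')")) {
            while (rs.next()) {
                // NLS_NCHAR_CHARACTERSET (AL16UTF16 or UTF8) governs
                // NCHAR/NVARCHAR2/NCLOB storage.
                System.out.println(rs.getString(1) + " = " + rs.getString(2));
            }
        }
    }
}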
Justin

Similar Messages

  • What determines the file encoding for ${C:file.txt} = 'abc' ?

    What determines the file encoding for  
    ${C:file.txt} = 'abc'
    I'm always getting ASCII as the encoding for file.txt after executing that assignment.

    Thanks so much. I'll keep looking for the MSFT doc on this. I scanned Bruce Payette's book and did not find anything there.
    It turns out to be one of those "by rote" things you have to learn about PowerShell.
    My concern about the lack of documentation is that MSFT might change the underlying code in the future to use Unicode, and that might break some existing code. If there were some MSFT-provided documentation declaring ASCII as the intended encoding, they might provide plenty of warning if they do switch the encoding.
    I also note that if you try to write characters outside the ASCII set (see the example below), character substitution happens to find an ASCII character to use in place of the one outside the ASCII set. In the example below a 'v' is substituted for the '√' character:
    ${C:xo.txt} = '√'

  • CMP Bean's  Field Mapping with oracle unicode Datatypes

    Hi,
    I have a CMP bean that maps to an RDBMS table, and the table has some Unicode datatypes such as NVARCHAR and NCHAR.
    Now I was wondering how the OC4J/Oracle EJB container handles queries with Unicode datatypes.
    What do I have to do in order to properly develop and deploy a CMP bean whose fields are mapped onto UNICODE database columns?
    Regards
    atif

    Based on the sun-cmp-mapping descriptor file entry
    <schema>Rol</schema>
    a file called Rol.schema is expected to be packaged with the ejb.jar. Did you run capture-schema after you created your table?

  • Gujarati unicode font encoding problem in E-63

    I can't read e-mails containing a Gujarati Unicode font on my E-63, even though I have selected Unicode as the encoding. The same mail is easily readable on a 6303i classic. What changes should I make on my E-63 to enable Gujarati Unicode fonts?

    Dear gujarati, written below is a sample of the "Shruti" Gujarati Unicode font: આ ફોન્ટ ’શ્રુતિ‘ યુનિકોડ ફોન્ટ છે.જે હું મારા નોકિયા E-63માં ઉકેલી શક્તો નથી. અક્ષરને બદલે માત્ર લંબચોરસ ચોકઠા જ દેખાય છે.આપનો ખૂબ આભાર. (Translation: "This font is the 'Shruti' Unicode font, which I cannot read on my Nokia E-63. Instead of the characters, only rectangular boxes appear. Thank you very much.") Thanks.

  • Unicode datatypes v.s. unicode databases

    We have a legacy system that is implemented in PowerBuilder and C++. We are pretty sure about which columns we need to convert to support Unicode. Besides, some of our clients have a corporate standard (AMERICAN_AMERICA.WE8MSWIN1252) for NLS_LANG in their Oracle client setup.
    Therefore, we decided to use the Unicode datatypes approach and only change the identified columns to NVARCHAR2 and NCLOB, with AL16UTF16 as the national character set. Our understanding is that this is the safe and easy way for our situation, since both C++ and PowerBuilder support the UTF-16 standard by default. This will not require any change to the NLS_LANG setup.
    However, one of our clients seems to have strong opinions against the Unicode datatypes option and would rather migrate the entire database to Unicode. The client mentioned that "AL16UTF16 has to be used in a Unicode database with UTF8 or AL32UTF8 as the database character set in order to display characters correctly". To our knowledge we have not heard of this requirement, and I didn't see anything like it in the official Oracle documentation.
    Could anyone advise whether a Unicode database is really better than the Unicode datatype option?
    Thanks!

    Regarding "some of our clients have a corporate standard (AMERICAN_AMERICA.WE8MSWIN1252) for NLS_LANG in their Oracle client setup": this might even be a necessary requirement, since they are using the Windows-1252 code page.
    Regarding "AL16UTF16 has to be used in a Unicode database with UTF8 or AL32UTF8 as the database character set in order to display characters correctly": hard to say without knowing what they refer to specifically.
    They might have been thinking about the requirement to use AL32UTF8, depending on how binds are done. If you insert string literals, which are interpreted in the database character set, into NCHAR columns, you obviously need a database character set that supports all the characters you are going to insert (i.e., AL32UTF8 in the Unicode case).
    This is described very clearly by Sergiusz Wolicki, in Re: store/retrieve data in lang other than eng when CHARACTERSET is not UTF8.
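    On the client side, one way to avoid the literal problem altogether is to bind the Unicode text instead of embedding it in the SQL statement. A minimal JDBC sketch, assuming a hypothetical table t with an NVARCHAR2 column ncol (setNString is standard JDBC 4.0; older Oracle drivers may need the driver-specific form-of-use handling instead):
    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;

    public class NcharBindExample {
        public static void main(String[] args) throws Exception {
            // Connection details, table and column names are placeholders.
            try (Connection con = DriverManager.getConnection(
                    "jdbc:oracle:thin:@//dbhost:1521/ORCL", "scott", "tiger")) {
                PreparedStatement ps =
                    con.prepareStatement("INSERT INTO t (ncol) VALUES (?)");
                // Bind the value as national character data so it should not be
                // damaged by conversion through the WE8MSWIN1252 database character set.
                ps.setNString(1, "\u0639\u0631\u0628\u064A");  // Arabic sample text
                ps.executeUpdate();
                ps.close();
            }
        }
    }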

  • Unicode datatype

    Hi,
    · Unicode database (changing the database character set to AL32UTF8) is working fine; we tested it with an ASP.NET application and we are able to see English, Japanese, Arabic, and Urdu.
    · Unicode datatype (database character set left at the default “WE8MSWIN1252”), with the column datatype as NVARCHAR2: we are able to enter any language, but when querying from the database the values are displayed as inverted question marks (“???????”). We tried the above as per the Oracle documentation (Globalization Support Guide, Chapter 5, "Supporting Multilingual Databases with Unicode", a96529.pdf) but it still displays only junk characters.
    Is there any client setting I am missing here?
    Thanks in Advance.

    There is no character set that supports both Arabic and Japanese data other than a Unicode character set. The restriction you are encountering should only apply to string literals you are trying to load into Unicode datatypes. For literals in this scenario, where the database character set does not support the characters in the literal string, the only workaround is to use UNISTR. This problem with Unicode datatypes and literals was addressed in 10gR2.
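    For illustration, here is the UNISTR workaround issued from JDBC; the table and column names are made up, and the escaped values are just sample Arabic code points:
    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.Statement;

    public class UnistrExample {
        public static void main(String[] args) throws Exception {
            // Connection details, table and column names are placeholders.
            try (Connection con = DriverManager.getConnection(
                     "jdbc:oracle:thin:@//dbhost:1521/ORCL", "scott", "tiger");
                 Statement st = con.createStatement()) {
                // UNISTR rebuilds the characters from their Unicode code points on
                // the server, so the literal itself only needs ASCII and is not
                // damaged by a WE8MSWIN1252 database character set.
                st.executeUpdate(
                    "INSERT INTO t (ncol) VALUES (UNISTR('\\0627\\0628\\062C\\062F'))");
            }
        }
    }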

  • How to Determine Text File Encoding is UNICODE

    Hi Gurus,
    How to determine whether the file is a UNICODE format or not?
    I have the file stored as a BLOB column in a table
    Thanks,
    Sombit

    That's a rather hard problem. You would, realistically, either have to make a bunch of simplifying assumptions based on the data or you would want to buy a commercial tool that does character set detection.
    There are a number of different ways to encode Unicode (UTF-8, UTF-16, UTF-32, UCS-2, etc.) and a number of different versions of the Unicode standard. UTF-8 is one of the more common ways to encode Unicode. But it is popular precisely because the first 128 characters (which cover the majority of what you'd find in English text) are encoded identically to 7-bit ASCII. Depending on the size and contents of the document, it may not be possible to determine whether the data is encoded in 7-bit ASCII, UTF-8, or one of the various single-byte character sets that are built off of 7-bit ASCII (ISO 8859-15, Windows-1252, ISO 8859-1, etc.).
    Depending on how many different character sets you are trying to distinguish between, you'd have to look for binary values that are valid in one character set and not in another.
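    Since the file here sits in a BLOB, a realistic first pass is to read the leading bytes and check for a byte-order mark, then fall back to a strict UTF-8 decode of the whole value. Many Unicode files carry no BOM, and valid UTF-8 includes plain ASCII, so this is only a heuristic, exactly as described above. A sketch with made-up connection details, table, and column names:
    import java.nio.ByteBuffer;
    import java.nio.charset.CharacterCodingException;
    import java.nio.charset.CodingErrorAction;
    import java.nio.charset.StandardCharsets;
    import java.sql.Blob;
    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    public class BlobEncodingSniffer {

        // Best-guess label from a byte-order mark, or null if there is none.
        static String bomEncoding(byte[] b) {
            if (b.length >= 3 && b[0] == (byte) 0xEF && b[1] == (byte) 0xBB && b[2] == (byte) 0xBF)
                return "UTF-8 (BOM)";
            if (b.length >= 2 && b[0] == (byte) 0xFE && b[1] == (byte) 0xFF)
                return "UTF-16BE (BOM)";
            if (b.length >= 2 && b[0] == (byte) 0xFF && b[1] == (byte) 0xFE)
                return "UTF-16LE (BOM)";
            return null;
        }

        // True if the bytes decode as strict UTF-8 (note: pure ASCII also passes).
        static boolean isValidUtf8(byte[] data) {
            try {
                StandardCharsets.UTF_8.newDecoder()
                        .onMalformedInput(CodingErrorAction.REPORT)
                        .onUnmappableCharacter(CodingErrorAction.REPORT)
                        .decode(ByteBuffer.wrap(data));
                return true;
            } catch (CharacterCodingException e) {
                return false;
            }
        }

        public static void main(String[] args) throws Exception {
            // Connection details, table and column names are placeholders.
            try (Connection con = DriverManager.getConnection(
                     "jdbc:oracle:thin:@//dbhost:1521/ORCL", "scott", "tiger");
                 Statement st = con.createStatement();
                 ResultSet rs = st.executeQuery("SELECT file_data FROM file_store WHERE id = 1")) {
                if (rs.next()) {
                    Blob blob = rs.getBlob(1);
                    byte[] bytes = blob.getBytes(1, (int) blob.length());  // Blob positions are 1-based
                    String bom = bomEncoding(bytes);
                    if (bom != null)
                        System.out.println("Looks like " + bom);
                    else
                        System.out.println(isValidUtf8(bytes)
                                ? "Valid UTF-8 (or plain ASCII)"
                                : "Not UTF-8; likely a single-byte or legacy character set");
                }
            }
        }
    }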
    Justin

  • How do I determine what text encoding a database is using?

    Hello,
    How can I determine the multibyte text encoding (UTF8, UTF16, etc.) that a database is using? I presume that I can query a system table, but I haven't been able to determine which one.
    Thanks.
    Bob

    SQL> select * from nls_database_parameters where parameter like '%SET%';
    PARAMETER                      VALUE
    NLS_CHARACTERSET               WE8MSWIN1252
    NLS_NCHAR_CHARACTERSET         AL16UTF16

  • UTF-8  to non-unicode RFC - encoding

    Hi,
    I get data via SOAP (UTF-8) and send it on, with some simple mappings, to a non-Unicode RFC receiver. How can I post special characters?
    In the sender payload I see  and I expected to get A & B in SAP, but I get A & B in SAP. In the RFC receiver adapter I don't see any setting for the code page.
    Does anyone have an idea?
    Regards
    Jörg

    Hi,
    The first thing is that XI is Unicode, so if you are sending special characters that are UTF-8 encoded, you should be seeing them in XI in the payload monitor. Can you check that first?
    Now, if the target is not Unicode, what is the encoding on the target side? You can encode these special characters in XI and then pass them on in the encoding format the target system expects. One more thing: in transaction SM59 in XI, you can specify the character set encoding for the receiving R/3 system. Make sure you check that, and based on it your encoding should work. Try this out.
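    Since XI mapping logic is typically written in Java, the conversion itself, once the target code page is known, is a single re-encode. A minimal sketch, assuming purely for illustration that the receiver expects ISO 8859-1; the real target code page has to come from the receiving system's configuration:
    import java.nio.charset.StandardCharsets;

    public class CodepageConvert {
        public static void main(String[] args) {
            // The payload arrives in XI as a Java String (internally Unicode).
            String text = "Jörg – Größe";
            // Re-encode for a receiver that expects ISO 8859-1. Characters the
            // target code page cannot represent (the dash above, for instance)
            // are silently replaced with '?', so map or escape them first.
            byte[] target = text.getBytes(StandardCharsets.ISO_8859_1);
            System.out.println(new String(target, StandardCharsets.ISO_8859_1));
        }
    }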
    thanks
    Ashish

  • How to determine the character encoding of a string

    I'm under the impression (however misguided this may be) that one of our databases (it is set to 8859) is outputting its values in the 8859 charset to Java, which in turn is preserving this encoding.
    Printing the contents of data.get(info) yields garbage.
    However, doing
    value = new String((data.get(info).getBytes("ISO-8859-1")), "UTF-8");
    and printing 'value' yields the proper Asian characters.
    Is there a way to determine the Charset of a string somehow?

    Regarding "Whereas some database drivers for other languages (C, C++) make it possible to set the encoding on the client side irrespective of the database encoding setting. So you can skew your database with those programs.":
    Some JDBC drivers allow one to specify the encoding as part of the connection URL. This may be a general property of JDBC drivers but, since I have now retired, my JDBC knowledge is getting as dated as I am, so I don't really know.
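    As one concrete example of the connection-URL approach, MySQL's Connector/J accepts a characterEncoding property; property names differ per driver, and some drivers (the Oracle thin driver, for instance) manage the conversion themselves, so treat this as driver-specific. A minimal sketch with placeholder host, database, and credentials:
    import java.sql.Connection;
    import java.sql.DriverManager;

    public class EncodingUrlExample {
        public static void main(String[] args) throws Exception {
            // characterEncoding tells Connector/J which character set to use when
            // converting between Java strings and the bytes sent over the wire.
            Connection con = DriverManager.getConnection(
                "jdbc:mysql://dbhost:3306/mydb?characterEncoding=UTF-8",
                "user", "password");
            // ... use the connection ...
            con.close();
        }
    }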

  • How do I determine correct media encoder settings for Blu-Ray

    I am successfully producing hi-def Blu-ray disks by using one of the presets available in the Premiere media encoder, specifically the HDTV 1080i 25 High Quality setting. They play back in beautiful high definition, but I think I was just lucky. I would really like to know what determines the settings I should use.
    1. I live in Australia, so I assume PAL.
    2. My camera is a Sony HDV, HCR HC9, 1440 x 1080i. I assume this is irrelevant to the export encoding settings for Blu-ray.
    3. The export encoder preset has field order set to Upper. I would have thought I should either match the project settings, which are interlaced to match my camera and project, OR more likely match the audience TV, which would be a widescreen plasma or LCD TV. I don't know if these TVs are interlaced or progressive. OR is it more a question of what field order the Blu-ray player expects me to send on the disk?
    4. I will play on a TV capable of 1920 x 1080 pixels and on slightly lower-resolution widescreens. The encoder preset is for 1920 x 1080, so I assume that's OK for all widescreen TVs, even lower-definition ones.
    As you can tell I am a learner. Any suggested links to basic reading on the subject would help me.
    Robert
    PS Put this item in wrong forum previously.

    Since there was no response, I've been searching and found the following.
    When encoding to make a Blu-ray disk from my HDV 1080i footage:
    1. Tell the encoder the source is 1080i, that is, interlaced, and it's TOP field first for HDV footage.
    2. Tell it to encode to 1440 x 1080 resolution. You could tell it 1920 x 1080, but it will take much longer to encode. It will look good, but why bother; all BD players will automatically scale the 1440 up to 1920 full screen.
    3. Beware, not all players support BD-R and BD-RE recordable disks.
    Seems Panasonic is good. Samsung probably good. Sony and Sharp probably not able to read (yet).
    4. All players read MPEG-2 compression and H.264 compression. Use a bit rate of 25 Mbps for an MPEG-2 encode, or 18 Mbps for an equivalent result with an H.264 encode. You can use higher rates; these are minimums for a very good result.
    5. One pass encode is pretty damn good but use two pass if you have the time to wait for the encode.

  • Determining Unicode Values

    Is there a reliable way to determine the Unicode value of any character? For example, the following code
    String string = "Việt";
    for (int i = 0; i < string.length(); ++i) {
       char c = string.charAt(i);
       System.out.println(c + " = " + Character.getNumericValue(c));
    }
    gives this output:
    V = 31
    i = 18
    ? = -1
    t = 29
    Not too helpful, since the one character I really care about is shown as -1.

    Regarding "Well, I'm still somewhat confused (and, in my corporate environment here, am limited to J2SE 1.4). Why would a character not have a numeric value? In other words, under what circumstances is the -1 returned?":
    I think you're misconstruing the meaning of "numeric value" as it's intended here. Numeric value just means that "3" has a numeric value of 3, "7" -> 7, and so on. It doesn't mean that characters like '%', '(', etc. have a numeric value. From the API:
    Returns the int value that the specified Unicode character represents. For example, the character '\u216C' (the roman numeral fifty) will return an int with a value of 50.
    The letters A-Z in their uppercase ('\u0041' through '\u005A'), lowercase ('\u0061' through '\u007A'), and full width variant ('\uFF21' through '\uFF3A' and '\uFF41' through '\uFF5A') forms have numeric values from 10 through 35. This is independent of the Unicode specification, which does not assign numeric values to these char values.
    If the character does not have a numeric value, then -1 is returned. If the character has a numeric value that cannot be represented as a nonnegative integer (for example, a fractional value), then -2 is returned.
    To get the Unicode value of a char, just cast it to an int, as previously shown by DrClap (and myself, though less clearly so).
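    As a small illustration of that cast (codePointAt, charCount, and printf need Java 5 or later; on the J2SE 1.4 VM mentioned above, the plain (int) cast alone works for characters in the Basic Multilingual Plane):
    public class CodePointValues {
        public static void main(String[] args) {
            String s = "Vi\u1EC7t";   // "Việt"
            for (int i = 0; i < s.length(); ) {
                int cp = s.codePointAt(i);                  // real Unicode code point
                System.out.printf("%c = U+%04X (%d)%n", cp, cp, cp);
                i += Character.charCount(cp);               // step over surrogate pairs
            }
            // On J2SE 1.4, a simple cast gives the same result for BMP characters:
            System.out.println((int) 'V');        // 86
            System.out.println((int) '\u1EC7');   // 7879
        }
    }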
    ~

  • How do we determine the JVM encoding of an R/3 system sending idoc to EAI?

    Hi experts,
    Can you please help me find out the JVM encoding of our R/3 system? We are currently sending an IDOC via an RFC connection to webMethods. The receiving side is not able to process special characters like the trademark (TM) superscript, and they suspect that the JVM encoding is not in sync. I have already searched for this in the R/3 system but have not found any clue. I am a bit skeptical though; this is the first time I have heard of a JVM being involved in RFC.
    Thanks in Advance!
    Regards,
    Alden

    Ken, take a look at the value being mapped to the E1EDK01-BSART field.  Is it 'CRME' or 'INVO' or other?  This value controls the setting of the variable 'bsart_cremem' and subsequent debit/credit handling from there in the IDOC process code.
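    Coming back to the JVM encoding question itself: a quick way to see what any given JVM (for example the one running webMethods) defaults to is to print its default charset, which normally derives from the file.encoding system property and can be forced with -Dfile.encoding at startup. A minimal sketch:
    import java.nio.charset.Charset;

    public class ShowJvmEncoding {
        public static void main(String[] args) {
            // The platform default charset the JVM uses when none is specified.
            System.out.println("Default charset : " + Charset.defaultCharset());
            System.out.println("file.encoding   : " + System.getProperty("file.encoding"));
        }
    }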

  • Unicode - DataType Currency error

    Hi experts.
    Please can you help me?
    I used the method below instead of a MOVE clause.
    I can transfer <wa_table> to the buffer.
    But I found contents like ##―ఀ###ఀ ###ఀ in the buffer.
    The corresponding field of the buffer is a CURR(15,2) datatype.
    Can you please tell me how to solve this problem?
    Thanks.
    DATA: buffer(30000) OCCURS 10 WITH HEADER LINE.
    DATA: st_table TYPE REF TO data,
          tb_table TYPE REF TO data.
    FIELD-SYMBOLS: <wa_table>  TYPE ANY,
                   <it_table>  TYPE STANDARD TABLE,
                   <wa_table2> TYPE ANY.
    CREATE DATA: tb_table TYPE TABLE OF (query_table),   "create the objects
                 st_table TYPE (query_table).
    ASSIGN: tb_table->* TO <it_table>,   "internal table
            st_table->* TO <wa_table>.   "work area
    SELECT * FROM (query_table)
      INTO CORRESPONDING FIELDS OF TABLE <it_table>
      WHERE (options).
    LOOP AT <it_table> INTO <wa_table>.
      CLEAR buffer.
      CALL METHOD cl_abap_container_utilities=>fill_container_c
        EXPORTING
          im_value               = <wa_table>
        IMPORTING
          ex_container           = buffer
        EXCEPTIONS
          illegal_parameter_type = 1
          OTHERS                 = 2.
      APPEND buffer.
    ENDLOOP.

    Hello Kalidas
    Here is a simple "smoke test": try to see if the system accepts the following statement:
    " NOTE: Try to write the packed field only
    WRITE: / i_z008-packed_field.
    If you receive an error you cannot WRITE packed values directly.
    Alternative solution: write your structure to a string.
    DATA:
      ls_z008  LIKE LINE OF i_z008,
      ld_string  TYPE string.
    LOOP AT i_z008 INTO ls_z008.
      CALL METHOD cl_abap_container_utilities=>fill_container_c
        EXPORTING
          im_value = ls_z008
        IMPORTING
          ex_container = ld_string.
      WRITE: / ld_string.
    ENDLOOP.
    Regards
      Uwe

  • Moving to unicode datatype for an entire database - SQL Server 2012

    Hi,
    I have a SQL Server 2012 database with several tables that contain many char and varchar columns.
    I'd like to quickly change the char columns to nchar columns and the varchar columns to nvarchar columns.
    Is it possible to do this quickly, please? Many thanks.

    Hello,
    Creating a script could do it quickly as shown in the following article:
    http://blog.sqlauthority.com/2010/10/18/sql-server-change-column-datatypes/
    But creating the scripts may take you some time.
    You will find more options here:
    https://social.technet.microsoft.com/Forums/sqlserver/en-US/e7b70add-f390-45ee-8e3e-8ed6c6fa0f77/changing-data-type-to-the-fields-of-my-tables?forum=transactsql
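    As a sketch of the script-generation idea from the first link, the following reads INFORMATION_SCHEMA.COLUMNS and prints candidate ALTER TABLE statements instead of executing them. Connection details are placeholders; review the output before running it, since lengths over 4000, indexes, constraints, and defaults on those columns need separate handling:
    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    public class GenerateNcharAlters {
        public static void main(String[] args) throws Exception {
            // Connection details are placeholders (Microsoft JDBC driver URL form).
            try (Connection con = DriverManager.getConnection(
                     "jdbc:sqlserver://dbhost;databaseName=MyDb;user=sa;password=secret");
                 Statement st = con.createStatement();
                 ResultSet rs = st.executeQuery(
                     "SELECT TABLE_SCHEMA, TABLE_NAME, COLUMN_NAME, DATA_TYPE, " +
                     "       CHARACTER_MAXIMUM_LENGTH, IS_NULLABLE " +
                     "FROM INFORMATION_SCHEMA.COLUMNS " +
                     "WHERE DATA_TYPE IN ('char', 'varchar')")) {
                while (rs.next()) {
                    int maxLen = rs.getInt("CHARACTER_MAXIMUM_LENGTH");  // -1 means varchar(max)
                    String len = (maxLen == -1) ? "max" : Integer.toString(maxLen);
                    String nullability = "YES".equals(rs.getString("IS_NULLABLE"))
                            ? "NULL" : "NOT NULL";
                    // Print the statement; nchar/nvarchar only go up to 4000, so
                    // longer columns need nvarchar(max) or a manual decision.
                    System.out.printf("ALTER TABLE [%s].[%s] ALTER COLUMN [%s] n%s(%s) %s;%n",
                            rs.getString("TABLE_SCHEMA"),
                            rs.getString("TABLE_NAME"),
                            rs.getString("COLUMN_NAME"),
                            rs.getString("DATA_TYPE"),   // char -> nchar, varchar -> nvarchar
                            len,
                            nullability);
                }
            }
        }
    }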
    Hope this helps.
    Regards,
    Alberto Morillo
    SQLCoffee.com

    Hi experts, first of all, I'm new in the whole SAP-Application-Server business. I hope, i will give you enough and the correct information you need to follow and answer my problem. I'm experimenting with Web Service Reliable Messaging. It works prett