Xmlagg is executing slowly on a Unicode database

The database is Oracle 11.2.0.3.0 running on Exadata.
I am trying to create XML using the following:
select
xmlserialize(DOCUMENT
  xmlroot(
    xmlelement("InterchangeEducationOrgCalendar",
      xmlattributes(xml_text3 as "xmlns:xsi", xml_text1 as "xsi:schemaLocation", xml_text2 as "xmlns"),
      (select xmlagg(
                xmlelement("CalendarDate",
                  xmlelement("Date", to_char(caldate, t_date_format)),
                  xmlelement("CalendarEvent", 'xxxxxxxxxxxxxx'),
                  xmlelement("EducationOrgReference",
                    xmlelement("EducationalOrgIdentity",
                      xmlelement("StateOrganizationId", a.educationorganizationid)))))
         from calendardate a)   -- CalendarDate
    ),
    version '1.0')
  AS CLOB) XMLRESULT
from (select t_xsi_location xml_text1, default_namespace xml_text2, xsi_namespace xml_text3 from xml_ctl);
If I run the above SQL statement in my non-Unicode database it runs in about 45 seconds, but in the Unicode database it takes about 2 minutes and 20 seconds. The databases (init.ora parameters) are the same except that one is Unicode and the other is not. The table has the same stats, same indexes, and the same number of rows in both databases as well.
The above query is selecting 2,000,000 rows from calendardate.
Can anyone tell me why it is slower in the unicode database?
Thanks in advance.
Edited by: 972551 on Apr 6, 2013 5:36 PM

Kit.net wrote:
I would expect to see slower performance since unicode is twice the character size.
Not necessarily twice; it depends on the encoding scheme: UTF-8, UCS-2 or UTF-16.
In the latest versions of the database, Unicode is supported via the AL32UTF8 db character set. It's a variable-width multibyte character set that encodes characters using 1 to 4 bytes.
So for occidental languages mostly based on the ASCII range, there's little overhead in storage size.
I agree though that UTF-8 string processing ought to be slower than a fixed-width encoding.
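A quick way to see that storage point, as a sketch assuming an AL32UTF8 database like the one in the question:
-- LENGTH counts characters, LENGTHB counts bytes; the gap is the multibyte overhead
select length('Montréal')  as char_count,   -- 8 characters
       lengthb('Montréal') as byte_count    -- 9 bytes in AL32UTF8: the é takes two bytes
from dual;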
Back to the specific issue with SQL/XML functions, one thing that could be tried too is the NOENTITYESCAPING option.
It disables the runtime scan of element and attribute values that escapes XML special characters such as &, < or >.
If we're sure the values don't contain such characters, it can save time:
SQL> select xmlserialize(content
  2           xmlelement(noentityescaping "emp",
  3             xmlattributes(noentityescaping empno as "id")
  4           )
  5         )
  6  from scott.emp
  7  where empno = 7369 ;
XMLSERIALIZE(CONTENTXMLELEMENT
<emp id="7369"></emp>
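And a rough, untested sketch of how the option might be applied to the inner aggregation of the original query (it is only safe if the Date, CalendarEvent and StateOrganizationId values can never contain &, < or >):
-- same inner XMLAGG as in the question, with escaping disabled on each element
select xmlagg(
         xmlelement(noentityescaping "CalendarDate",
           xmlelement(noentityescaping "Date", to_char(caldate, t_date_format)),
           xmlelement(noentityescaping "CalendarEvent", 'xxxxxxxxxxxxxx'),
           xmlelement(noentityescaping "EducationOrgReference",
             xmlelement(noentityescaping "EducationalOrgIdentity",
               xmlelement(noentityescaping "StateOrganizationId", a.educationorganizationid)))))
from calendardate a;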

Similar Messages

  • Oracle Text Indexing performance in Unicode database

    Forum folks,
    I'm looking for overall performance thoughts in Text Indexing within a Unicode database. Part of our internal testing suites includes searching on values using contains filters over indexed binary and text documents. We've architected these tests such that they could be run in a suite or on their own, thus, the data is loaded at the beginning of each test and then the text indexes are created and populated prior to running any of the actual testing.
    We have the same tests running on non-Unicode instances of Oracle 11gR2 just fine, but when we run them against a Unicode instance, we almost always see timing issues where the indexes haven't finished populating; our tests then report only n hits when we are expecting n + 50 or in some cases n + 150 records to be returned.
    We are just looking for some general information in regards to text indexing performance in a unicode database. Will we need to add sleep time to the testing to allow for the indexes to populate? How much time? We would rather not get into having to create different tests for unicode vs non-unicode, but perhaps that is necessary.
    Any insight you could provide would be most appreciated.
    Thanks in advance,
    Dan

    Roger,
    Thanks much for your quick reply...
    When you talk about Unicode, do you mean AL32UTF8?
    --> Yes, this is the Unicode charset we are using.
    Is the data the same in both cases, or are you indexing simple 7-bit ascii data in the one database, and foreign text (maybe Chinese?) in the UTF8 database?
    With the same data, there should be virtually no difference in performance due to the AL32UTF8 database character set.
    --> We have a data generation tool we utilize. For non-unicode data, we generate using all 256 characters in the ISO-8859-1 set. With our Unicode data for clobs, we generate using only the first 1,000 characters of UTF8 by setting up an array of code points...0 - 1000. For Blobs, we have sets of sample word documents and pdfs that are inserted, then indexed.
    I'm not sure I understand your testing methodology. Do you run ( load-data, index-data, run-queries ) sequentially?
    --> That is correct. We utilize the ctx_ddl package to populate the pending table and then to sync the index....The following is an example of the ddl we generate to create and populate the index:
    create index "DBMEARSPARK_ORA80"."RESRESUMEDOC" on "DBMEARSPARK_ORA80"."RESUME" ("RESUMEDOC") indextype is CTXSYS.CONTEXT parameters(' nopopulate sync (every "SYSTIMESTAMP + INTERVAL ''30'' MINUTE" PARALLEL 2) filter ctxsys.auto_filter ') PARALLEL 2;
    execute ctx_ddl.populate_pending('"DBMEARSPARK_ORA80"."RESRESUMEDOC"',null);
    execute ctx_ddl.sync_index('"DBMEARSPARK_ORA80"."RESRESUMEDOC"',null,null,2);
    If so, there should be no way that the indexes can be half-created. If not, don't you have some check to see if the index creation has finished before running the query test?
    --> Excellent question....is there such a check? I have not found a way to do that yet...
    Were you just lucky with the "non-unicode" tests that the indexing just happened to have always finished by the time you ran the queries?
    --> This is quite possible as well. If there is a check to see if the index is ready, then we could add that into our infrastructure.
    --> Thanks, again, for responding so quickly.
    Edited by: djulson on Feb 12, 2013 7:13 AM
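    A hedged sketch of one such check, on the assumption that the CTXSYS user views are available to the index owner: CTX_USER_PENDING lists the rows still queued for SYNC_INDEX, and CTX_USER_INDEX_ERRORS lists rows that failed to index.
    -- rows still queued for SYNC_INDEX; 0 means the sync has caught up
    select count(*) as rows_waiting
    from   ctx_user_pending
    where  pnd_index_name = 'RESRESUMEDOC';
    -- rows that failed to index can be checked separately
    select count(*) as rows_in_error
    from   ctx_user_index_errors
    where  err_index_name = 'RESRESUMEDOC';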

  • Use of nvarchar field in a Unicode database

    Hi,
    I'm using an application (SPI) whose manual clearly states, in the chapter on creating the Oracle database:
    i. Under Database Character Set, select Use Unicode (AL32UTF8).
    ii. Under National Character Set, select AL16UTF16 (the default).
    Then, in the latest version of SPI, all varchar fields were changed to nvarchar fields.
    Now I'm wondering what the advantage could be of this setup.
    In a Unicode database, I can already store any international character set in a varchar field, so why would they use nvarchar?
    I do see the disadvantage, because in various SQL statements against this database I get ORA-12704 (character set mismatch) when using
    select tag from ..
    union
    select '-' from ...
    Instead I have to add a cast:
    select tag from ..
    union
    select cast('-' as nvarchar2(1)) from ...

    Now I'm wondering what the advantage could be of this setup.
    Hard to say, but plenty of disadvantages.
    See [url https://forums.oracle.com/forums/thread.jspa?threadID=2302983#9955254]Dear Gurus: Can u pls explain the difference between VARCHAR2 & NVARCHAR2??
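    A small alternative sketch for the ORA-12704 workaround above (the table name is hypothetical): an N-quoted literal is already national character data, so it unions with an NVARCHAR2 column without the CAST:
    -- N'...' makes the literal NCHAR data, matching the nvarchar2 column type
    select tag from some_spi_table
    union
    select N'-' from dual;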

  • A record selection problem with a string field in a UNICODE database

    We used report files made by Crystal Reports 9 which access string fields
    (char / varchar2 type) of NON-UNICODE database tables.
    Now, our new product needs to deal with UNICODE database, therefore,
    we created another database schema changing table definition as below.
    (The table name and column name are not changed.)
        char type -> nchar type
        varchar2 type -> nvarchar2 type
    When we tried to access the above table, and output a report,
    the SQL statement created from the report seemed to be wrong.
    We confirmed the SQL statement using Oracle trace function.
        SELECT (abbr.) WHERE "XXXVIEW"."YYY"='123'.
    We think the above '123' should be N'123' because UNICODE string
    is stored in nchar / nvarchar2 type field.
    Question:
    How can we obtain the correct SQL statement in this case?
    Is there any option setting?
    FYI:
    The environment are as follows.
        Oracle version: 11.2.0
        ODBC version: 11.2.0.1
        National character set: AL16UTF16

    With further investigation, we found patterns that worked well.
    Patterns that worked:
        Oracle version: 11.2.0
        ODBC version: 11.2.0.1
        National character set: AL16UTF16
        Report file made by Crystal Reports 2011
        Crystal Reports XI
    Patterns that did not work:
        Oracle version: 11.2.0 (same above)
        ODBC version: 11.2.0.1 (same above)
        National character set: AL16UTF16 (same above)
        Report file made by Crystal Reports 2011 (same above)
        Crystal Reports 2008 / 2011
    We think this phenomenon is degraded behavior in Crystal Reports 2008 / 2011.
    But we have to use the patterns that do not work.
    Are we doing anything wrong? Please help.
    -Nobuhiko

  • Peoplesoft convert Oracle non-unicode database to unicode database

    I am following doc 1437384.1 to convert a PeopleSoft database from a non-Unicode database to a Unicode database.
    I use the following export statement (as user PS)
    SET NO TRACE;
    SET OUTPUT output_file.dat;
    SET NO DATA;
    EXPORT *;
    And the following import statement (as user sysadm)
    SET NO TRACE;
    SET NO DATA;
    SET INPUT output_file;
    SET LOG log_file;
    SET UNICODE ON;
    SET STATISTICS OFF;
    SET ENABLED_DATATYPE 9.0;
    IMPORT *;
    Before I do the datapump import, I am comparing the objects
    SQL> select object_type, count(*) from dba_objects where owner = 'SYSADM' group by object_type order by 1 asc;
    OBJECT_TYPE COUNT(*)
    INDEX 33797
    LOB 2775
    TABLE 28829
    TRIGGER 9
    VIEW 21208
    on oracsc63 (targetdb):
    SQL> select object_type, count(*) from dba_objects where owner = 'SYSADM' group by object_type order by 1 asc;
    OBJECT_TYPE COUNT(*)
    INDEX 23748
    LOB 2170
    TABLE 19727
    I don't have the same number of objects. When I do the import, this means that around 10,000 tables will not have the UTF-8 format.
    Any ideas how I can solve this? Who has experience with these PeopleSoft conversions?
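    One way to pin down exactly which SYSADM objects are missing on the target is a set difference over dba_objects; a hedged sketch, where the database link name is hypothetical and the source is assumed reachable from the target:
    -- objects present on the source but absent on the target
    select object_type, object_name
    from   dba_objects@source_db_link
    where  owner = 'SYSADM'
    minus
    select object_type, object_name
    from   dba_objects
    where  owner = 'SYSADM';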

    Hello Jacques,
    please check sapnote #808505 (Secondary connection to Oracle DB w/ different character set).
    Regards
    Stefan

  • Data errors/changes in unicode database Once all code is unicode compliant

    Hi All,
    This is regarding unicode project.
    We have currently made all the programs Unicode compliant,
    and the database we are using is not a Unicode database.
    We are moving now the entire code to the Unicode database system.
    1> Could anyone tell us what kind of data errors might be encountered due to this new database system?
    2> What kind of changes regarding the format/data might we observe in the output files generated?
    Any expertise and experience with a similar upgrade will be very helpful.
    Thank you all in advance

    Hi Kumar,
    each code page encodes characters into a binary representation. ASCII is maybe the best known. It encodes 128 characters with seven bits. The first 32 characters are control characters for printers and terminals, like carriage return and bell. Then there are some special characters like space and comma, followed by digits and the characters of the Roman alphabet in upper case and lower case. Unicode is another code page, defined in the Unicode standard documentation. Because Unicode characters are wider than one byte (the current standard contains almost 100,000 characters), different encodings are used in applications. The most used encoding is probably UTF-8, which is used by DB2 and Oracle. MaxDB uses UTF-16, which uses much more space for the most commonly used characters. Languages use characters from a code page to build words. You may have multiple code pages in one system (MDMP) or a Unicode system which supports all languages on a single code page.
    I hope this helps you to understand the difference between a code page and a language. Maybe also check out the links [http://www.asciitable.com|http://www.asciitable.com] and [http://unicode.org|http://unicode.org].
    Best regards
    Ralph
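    To make the size difference between the encodings concrete, here is an illustrative sketch assuming an Oracle AL32UTF8 database such as the one in the original question (TO_NCHAR yields AL16UTF16 national character data):
    -- the same ASCII text is 3 bytes in UTF-8 but 6 bytes in UTF-16
    select dump('ABC', 16)           as utf8_bytes,   -- Len=3: 41,42,43
           dump(to_nchar('ABC'), 16) as utf16_bytes   -- Len=6: 0,41,0,42,0,43
    from dual;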

  • Errors while performing non-Unicode database export

    Hi,
    I am exporting a non-Unicode database (to perform a Unicode conversion). The export completed without any problems, but I frequently got the following messages in the logfile.
    Error 1- UMGCOMCHAR read check skip, no data found; probably old SPUMG
    Error 2- environment variable I18N_POOL_WIDTH is not set. Checks are active
    Error 3- I18N_NAMETAB_TIMESTAMPS not in env: checks are ON (Note 738858)
    My questions are:
    1- Are the above 3 errors an issue?
    2- When I import the already exported files into a Unicode database, will it cause any problems or loss of data?
    3- What is the fix for this issue?
    Points to be awarded for any kind of small help.
    Thanks

    Depending on the data that you have in your non-Unicode database, data expansion or data loss may potentially occur.
    Data expansions
    For example, a 1-byte character in a VARCHAR2(1) column may expand to 2 or 3 bytes in a Unicode (UTF8) database; hence you may need to re-define your schema prior to importing the data into your new Unicode database.
    Data Loss
    This happens only if you have invalid characters inside your non-Unicode database. For example, you may have some 8-bit non-ASCII characters inside your US7ASCII database; during export these characters will be converted to replacement characters (?).
    However you can use the character set scanner (csscan) to scan your source database to detect both of the above scenarios. Please visit the Globalization Support section of OTN for more info - http://technet.oracle.com/tech/globalization/content.html
    Regards
    Nat
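    Csscan is the thorough, database-wide way to check this; as a quick per-column spot check, here is a hedged sketch with hypothetical table and column names, run in the source (non-Unicode) database:
    -- rows whose value would no longer fit a 1-byte column after conversion to AL32UTF8
    select count(*) as rows_that_would_expand
    from   source_table
    where  lengthb(convert(source_col, 'AL32UTF8')) > 1;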

  • Unicode datatypes v.s. unicode databases

    We have a legacy system that is implemented in PowerBuilder and C++. We are pretty sure which columns we need to convert to support Unicode. Besides, some of our clients have a corporate standard (AMERICAN_AMERICA.WE8MSWIN1252) for NLS_LANG on the Oracle client setup.
    Therefore, we decided to use the Unicode datatypes approach and update only the identified columns to NVARCHAR2 and NCLOB, with AL16UTF16 as the national character set. Our understanding is that this is the safe and easy way for our situation, since both C++ and PowerBuilder support the UTF-16 standard by default. This will not require any change to the NLS_LANG setup.
    However, one of our clients seems to have strong opinions against the Unicode datatypes option and would rather migrate the entire database to Unicode. The client mentioned that "AL16UTF16 has to be used in a Unicode database with UTF8 or AL32UTF8 as the database character set in order to display characters correctly". To our knowledge we have not heard of this requirement, and I didn't see anything like it in the official Oracle documentation.
    Could anyone advise if Unicode database is really better than Unicode datatype option?
    Thanks!

    Besides, some of our clients have a corporate standard (AMERICAN_AMERICA.WE8MSWIN1252) for NLS_LANG on the Oracle client setup.
    This might even be a necessary requirement, since they are using the Windows-1252 code page.
    that "AL16UTF16 has to be used in a Unicode database with UTF8 or AL32UTF8 as the database character set in order to display characters correctly".
    Hard to say without knowing what they refer to specifically.
    They might have been thinking about the requirement to use AL32UTF8, depending on how binds are done. If you insert string literals, which are interpreted in the database character set, into NCHAR columns, you obviously need a character set that supports all the characters you are going to insert (i.e. AL32UTF8 in the Unicode case).
    This is described very clearly by Sergiusz Wolicki, in Re: store/retrieve data in lang other than eng when CHARACTERSET is not UTF8.
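    A small sketch of that literal pitfall, with a hypothetical table and on the assumption of a WE8MSWIN1252 database character set and AL16UTF16 national character set:
    create table t_nls_demo (val nvarchar2(10));
    -- the plain literal is interpreted in the database character set first,
    -- so a character outside Windows-1252 can already be lost before it
    -- reaches the NVARCHAR2 column
    insert into t_nls_demo values ('漢');
    -- unistr builds the value from the Unicode code point, so it survives
    insert into t_nls_demo values (unistr('\6F22'));
    select val, dump(val, 1016) as stored_bytes from t_nls_demo;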

  • Unicode/non-unicode database

    Hi,
    I have two Oracle 8.1.7 databases: a Unicode database and a non-Unicode database.
    Can I export a schema from the non-Unicode database into the Unicode database without problems? What is the real impact?
    Thank you in advance for your help,
    Nicolas.

    Depending on the data that you have in your non-Unicode database, data expansion or data loss may potentially occur.
    Data expansions
    For example, a 1-byte character in a VARCHAR2(1) column may expand to 2 or 3 bytes in a Unicode (UTF8) database; hence you may need to re-define your schema prior to importing the data into your new Unicode database.
    Data Loss
    This happens only if you have invalid characters inside your non-Unicode database. For example, you may have some 8-bit non-ASCII characters inside your US7ASCII database; during export these characters will be converted to replacement characters (?).
    However you can use the character set scanner (csscan) to scan your source database to detect both of the above scenarios. Please visit the Globalization Support section of OTN for more info - http://technet.oracle.com/tech/globalization/content.html
    Regards
    Nat
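    Regarding the data-expansion point above, one common mitigation (sketched here with a hypothetical table, assuming the target database is AL32UTF8) is to declare affected columns with character length semantics before importing:
    -- the limit is now 1 character rather than 1 byte, so a 2- or 3-byte
    -- UTF-8 character still fits
    create table t_expand_demo (c varchar2(1 char));
    insert into t_expand_demo values ('é');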

  • Can we read an oracle non-unicode database to an sap unicode dataset????

    Hi
    Can we read an Oracle non-Unicode database from an SAP Unicode environment using OPEN DATASET / TRANSFER, over a connection?
    Regards

    Hello Jacques,
    please check sapnote #808505 (Secondary connection to Oracle DB w/ different character set).
    Regards
    Stefan

  • Execute query with non database block

    How do I execute a query with a non-database block in the WHEN-NEW-FORM-INSTANCE trigger?

    Hi Kame,
    Execute_Query does not work with a non-database block. Instead, open a cursor and assign values to the non-database block's items programmatically; see the following example:
    DECLARE
    BEGIN
         GO_BLOCK('block');                        -- make the non-database block current
         FOR i IN (SELECT col1, col2 FROM some_table) LOOP
                :block.item1 := i.col1;            -- copy the fetched columns into the block items
                :block.item2 := i.col2;
                NEXT_RECORD;                       -- move to a new record for the next row
         END LOOP;
    END;
    Please mark as helpful or correct if this helps you.
    Regards,
    Danish

  • Can't run non-UNICODE-DbSl against UNICODE database

    Hi everyone. I get this error during the update of Solution Manager 7.1 SPS4 (dual stack) to SPS10. It is a green install. This happens during the phase PREP_EXTRACT/PREIMP! of the ABAP stack, step 5.2. I do not see a lot of people with this problem, so I assume it is something I am doing wrong, as it is my first time using the tool SUM.
    Details of the system:
    Solution Manager 7.1 SPS4
    Windows 2008 R2 64 bit
    Microsoft SQL 2008 R2 64 bit SP1
    Central installation type with all components on the same host
    Kernel level 600
    Here is what I have done so far:
    Downloaded the media as described in the installation guide.
    Worked my way through the install using the Software Provision Manager 1.0.
    Updated the kernel from level 401 to 600, by copying the files from the download folder to the F:\usr\sap\SM1\SYS\exe\uc\NTAMD64 folder. This is the folder that is assigned to the DIR_CT_RUN parameter.
    I start the STARTUP.BAT file in the SUM folder.
    I then start the SUM GUI and run through the steps, and in step 3 I map the stack configuration file (XML) to the one listed in the folder “51047130\DATA” called “SPS10_stack.xml”.
    Then I get to 5.2, where the system stops with this error:
    Severe error(s) occurred in phase PREP_EXTRACT/PREIMP!
    Last error code set: Single errors (code <= 8) found in logfile 'PREIMP.ELG'
    ERROR: Detected the following errors:
    # F:\usr\SUM\abap\log\R710VPE.<DB>:
          4 ETW000 TRACE-INFO: 19:  [    dev trc,00000]  Driver: sqlncli10.dll Driver release: 10.50.2769                                3938  0.039733
          4 ETW000 TRACE-INFO: 20:  [    dev trc,00000]  GetDbRelease: 10.50.2769.00                                                      1603  0.041336
          4 ETW000 TRACE-INFO: 21:  [    dev trc,00000]  GetDbRelease: Got DB release numbers (10,50,2769,0)                                21  0.041357
          4 ETW000 TRACE-INFO: 22:  [    dev trc,00000]  Can't run non-UNICODE-DbSl against UNICODE database                              2647  0.044004
          4 ETW000 TRACE-INFO: 23:  [    dev trc,00000]  CheckCodepageType failed. Connect terminated.                                      13  0.044017
          2EETW169 no connect possible: "DBMS = MSSQL                            ---  SERVER = '<SERVER>' DBNAME = '<DB>'"
    Can anyone please help?
    PS:
    The admin user that I use does have access to the database. It is listed with the sysadmin role, and its default DB is the Solution Manager DB.

    ERROR: Detected the following errors:
    # F:\usr\SUM\abap\log\R710VPE.<DB>:
          4 ETW000 TRACE-INFO: 19:  [    dev trc,00000]  Driver: sqlncli10.dll Driver release: 10.50.2769                                3938  0.039733
          4 ETW000 TRACE-INFO: 20:  [    dev trc,00000]  GetDbRelease: 10.50.2769.00                                                      1603  0.041336
          4 ETW000 TRACE-INFO: 21:  [    dev trc,00000]  GetDbRelease: Got DB release numbers (10,50,2769,0)                                21  0.041357
          4 ETW000 TRACE-INFO: 22:  [    dev trc,00000]  Can't run non-UNICODE-DbSl against UNICODE database                              2647  0.044004
          4 ETW000 TRACE-INFO: 23:  [    dev trc,00000]  CheckCodepageType failed.
    As per the error, you have used the wrong kernel. Please make sure that you use a Unicode kernel, as your system is also Unicode.
    Thanks,
    Sunny

  • SQL Azure - query with row_number() executes slow if columns with nvarchar of big size are included

    I am linking my question from Stack Overflow here. The link: http://stackoverflow.com/questions/27943913/sql-azure-query-with-row-number-executes-slow-if-columns-with-nvarchar-of-bi
    Appreciate your help!
    Gorgi

    Hi,
    Thanks for posting here.
    I suggest you check these links and optimize your query on SQL Azure.
    http://www.sqlusa.com/articles/query-optimization/
    http://sqlblog.com/blogs/paul_white/archive/2011/02/23/Advanced-TSQL-Tuning-Why-Internals-Knowledge-Matters.aspx
    Also check this thread, which had a similar issue.
    https://social.msdn.microsoft.com/Forums/en-US/c1da08b4-265d-4ec8-a252-8d7090234e3e/simple-select-query-takes-long-time-to-execute-with-nvarchar-columns?forum=transactsql
    Girish Prajwal

  • Transaction executing slowly and database undo log growing quickly

    Dears,
    I developed a transaction which queries the database many times in a repeater loop and finally generates an SAP MII XML output document, which I want to display via an HTML hyperlink in MII Navigation (using XacuteQuery and iGrid).
    I found that
    1. if I execute it in the SAP MII Workbench, the transaction executes very slowly, and the database undo log in D:\oracle\TMI\sapdata2\undo_1\UNDO.DATA1 grows quickly.
    2. If I use MII Schedule Edit to run the transaction, it executes fast.
    Does anyone know why?
    Is there any setting that can make it execute fast in the MII Workbench?
    Many thanks!
    Ivan

    Hi,
    Can you explain why it behaves differently in the MII Workbench and the Scheduler, depending on the SQL joins and logic?
    My transaction logic is basically as below.
    1. query qualified sfc in SAPME tables
    SELECT *
      FROM (SELECT   s.site, s.sfc, ss.operation_bo, ss.qty_in_queue,
                     ss.qty_in_work, s.priority, s.item_bo, s.shop_order_bo,
                     s.status_bo, ss.sfc_router_bo, ss.step_id, ss.step_sequence,
                     st.status_description, cf.ATTRIBUTE, cf.VALUE
                FROM sfc_step ss,
                     sfc s,
                     sfc_router sr,
                     sfc_routing srt,
                     status st,
                     custom_fields cf
               WHERE sr.handle = ss.sfc_router_bo
                 AND srt.handle = sr.sfc_routing_bo
                 AND s.handle = srt.sfc_bo
                 AND st.handle = s.status_bo
                 AND SUBSTR (s.status_bo, -3) IN ('402', '403', '404')
                 AND sr.handle = ss.sfc_router_bo
                 AND sr.in_use = 'true'
                 AND ((ss.qty_in_queue > 0) OR (ss.qty_in_work > 0))
                 AND s.site = '[Param.1]'
                 AND cf.handle(+) = s.handle
                 AND cf.ATTRIBUTE(+) = 'QTIMECONTROL'
                 [Param.2]
            ORDER BY s.priority DESC, s.sfc)
    WHERE (ATTRIBUTE = 'QTIMECONTROL' AND VALUE != 'N') OR VALUE IS NULL
    2. use Repeater to query sfc's activity_log table
    SELECT   al.site, al.sfc, al.operation, al.operation_revision, op.description,
             al.step_id,
                TO_CHAR (NEW_TIME (date_time, 'PST', 'GMT'), 'YYYY-MM-DD')
             || 'T'
             || TO_CHAR (NEW_TIME (date_time, 'PST', 'GMT'), 'HH24:MI:SS')
                                                                     AS date_time,
             TO_CHAR (NEW_TIME (date_time, 'PST', 'GMT'),
                      'YYYY/MM/DD HH24:MI:SS'
                     ) AS complete_date,
             cf.VALUE AS qtime, action_code, ss.operation_bo AS current_op,
             ss.step_id AS current_step_id,
             ss.step_sequence AS current_setp_sequence,
             al.item || ',' || al.item_revision AS item,
             TO_CHAR (SYSDATE, 'YYYY/MM/DD HH24:MI:SS') AS check_time,
             (sysdate - NEW_TIME (date_time, 'PST', 'GMT')) * 24 * 60 as difference
        FROM activity_log al,
             custom_fields cf,
             sfc s,
             sfc_routing srg,
             sfc_router sr,
             sfc_step ss,
             operation op
       WHERE al.site = '[Param.1]'
         AND (action_code IN( 'COMPLETE' , 'START' , 'SIGNOFF'))
         AND al.sfc = '[Param.2]'
         AND cf.handle(+) =
                   'OperationBO:'
                || al.site
                || ','
                || al.operation
                || ','
                || al.operation_revision
         AND cf.ATTRIBUTE(+) = 'QTIME'
         AND s.handle = srg.sfc_bo
         AND srg.handle = sr.sfc_routing_bo
         AND 'true' = sr.in_use
         AND sr.handle = ss.sfc_router_bo
         AND 0 < ss.qty_in_queue + ss.qty_in_work
         AND s.sfc = al.sfc
         AND al.operation = op.operation
         AND al.operation_revision = op.revision
         AND op.site= '[Param.1]'
         AND al.operation NOT LIKE '%-W'
    ORDER BY date_time DESC
    3. call another transaction to parse the input data and get the output data
    4. parse the returned data to form an MII XML output document.
    Thanks!

  • Struggling with exporting data out of a Unicode database

    Background information
    Server: Sun Solaris 5.10; 10g
    Client: Windows 2000; 10g, TOAD, Oracle ODBC 10.2.0.1
    select * from v$NLS_PARAMETERS
    NLS_LANGUAGE AMERICAN
    NLS_TERRITORY AMERICA
    NLS_CURRENCY $
    NLS_ISO_CURRENCY AMERICA
    NLS_NUMERIC_CHARACTERS .,
    NLS_CALENDAR GREGORIAN
    NLS_DATE_FORMAT DD-MON-RR
    NLS_DATE_LANGUAGE AMERICAN
    NLS_CHARACTERSET AL32UTF8
    NLS_SORT BINARY
    NLS_TIME_FORMAT HH.MI.SSXFF AM
    NLS_TIMESTAMP_FORMAT DD-MON-RR HH.MI.SSXFF AM
    NLS_TIME_TZ_FORMAT HH.MI.SSXFF AM TZR
    NLS_TIMESTAMP_TZ_FORMAT DD-MON-RR HH.MI.SSXFF AM TZR
    NLS_DUAL_CURRENCY $
    NLS_NCHAR_CHARACTERSET AL16UTF16
    NLS_COMP BINARY
    NLS_LENGTH_SEMANTICS BYTE
    NLS_NCHAR_CONV_EXCP FALSE
    We import SAS data (Windows Latin character set) into Oracle, use OWB for ETL, export the results to SAS. Per regulatory requirements, character columns cannot exceed 200 in length.
    Problem scenario
    Data that cause the trouble (200 characters, with a degree sign at the 77th position):
    XXX PT PRIOR TO MSFC NOT TO USE WALKER, PT STATED SHE NEEDED IT LAST VISIT 2° BAD HEADACHE DECREASED BALANCE, WHICH WAS LATER FOUND TO BE SINUS INFECTION. ASKED PT NOT TO USE WALKER THIS TIME, PT SAID
    Degree sign is U+00B0 in UTF-8, or 0xB0 (176) in ASCII. Though, I found out select ascii('°') from dual would return 49840 (or, 0xC2 0xB0).
    In order to accommodate the import, Source.COMMENTX is VARCHAR2(201). Using OWB, we are mapping this to Target.COVAL which is VARCHAR2(200).
    To get around ORA-12899: value too large for column, we use the expression convert(Source.COMMENTX, 'WE8ISO8859P1', 'AL32UTF8')).
    Although viewing Target.COVAL shows a ¿ (true in TOAD, SQL*Plus), dump(COVAL) confirms the 77th character is 176:
    DUMP(COVAL)
    Typ=1 Len=200: [...],32,50,176,32,[...]
    Desirable outcome
    Store and display the text in a VARCHAR2(200) column without compromising the high-bit ASCII characters, e.g., degree sign, micro sign (i.e., Greek character mu), copyright sign, etc.
    Questions
    1. Is it a wrong assumption that AL32UTF8 supports the high-bit ASCII characters (i.e., characters between 128 and 255)? If not, why do the clients display the inverted question mark instead of degree sign when executing select chr(176) from dual?
    2. The aforementioned DUMP statement seems to confirm ASCII 0xB0 (i.e., not 0xC2 0xB0, or 0xBF) is being stored in the database at the 77th position. Why do my applications via ODBC interpret and replace it as 0xBF, which is the inverted question mark?
    Avenues attempted without the desirable outcome
    1. Changing Target.COVAL from VARCHAR2(200) to NVARCHAR2(200) or VARCHAR2(200 CHAR) would make SAS (data access through ODBC) think the length is 400 or 800, respectively [Note: The vendor claims it is ODBC 3.0 compliant]
    2. Through Microsoft's ODBC Test software, this is the output for describe column all against select COLVAL from Target:
    icol, szColName, pcbColName, pfSqlType, pcbColDef, pibScale, *pfNullable
    1, COMMENTX, 8, SQL_WVARCHAR=-9, 200, 0, SQL_NULLABLE=1

    Degree sign is U+00B0 in UTF-8, or 0xB0 (176) in ASCII. Though, I found out select ascii('°') from dual would return 49840 (or, 0xC2 0xB0).
    Well, U+00B0 represents 'degree sign' in Unicode, and the UTF-8 encoded value for this code point is C2 B0. ASCII does not include a degree sign, and 176 is not an ASCII code value (only 0-127). The function ascii will just return the decimal form of the encoded value, in the character set of the database (not necessarily ASCII, or US7ASCII as it is called in Oracle).
    >
    To get around ORA-12899: value too large for column, we use the expression convert(Source.COMMENTX, 'WE8ISO8859P1', 'AL32UTF8')).
    This part I don't understand. And where are you storing this? In the same AL32UTF8 database? I think this might be your problem.
    >
    Although viewing Target.COVAL shows a ¿ (true in TOAD, SQL*Plus), dump(COVAL) confirms the 77th character is 176:
    Yes, since 176 is an invalid value in UTF-8. U+0079 is encoded as 79, U+0080 is encoded as C2 80 - notice the "leap" there. If I input 176 into a "UTF-8 decode" I would get "out of range" or NaN back. Similarly, if you have managed to illegally store 176 as a character-encoded value in an AL32UTF8 database, and are trying to retrieve it, involving a conversion to the client character set, you will get the replacement character ¿, meaning "bad conversion".
    >
    DUMP(COVAL)
    Typ=1 Len=200: [...],32,50,176,32,[...]
    Try
    select chr(49840) from dual;
    - but you need to do this from a tool such as Oracle SQL Developer (it's free) that can handle Unicode output.
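    As a follow-up sketch reusing the table and column names from the thread above: DUMP in decimal, together with the character set name, makes it easy to compare what is actually stored with the correct UTF-8 encoding of the degree sign (194,176 = 0xC2 0xB0). If the CONVERT call is removed so the data stays in AL32UTF8, the stored bytes should come back as 194,176 and clients should render the degree sign instead of ¿.
    -- what is stored today (shows ...,32,50,176,32,... per the DUMP output above)
    select dump(coval, 1010) as stored_bytes from target;
    -- what a correctly encoded degree sign looks like in an AL32UTF8 database
    select dump(chr(49840), 1010) as utf8_degree_sign from dual;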
