XML & Cyrillic Character Sets
Hey all.
Bit of an interesting problem, methinks. I'm building a
multilingual multiple-choice CBT CD. I'm using XML to store the
test questions to allow for easy updates. The problem I'm having is
that when I try to import the XML into Director using the XML
parser and it's a Cyrillic language (Russian, etc.), it comes out all
wrong.
I've tried a few different ways to get it to work and haven't
been able to crack it. The XML file has the correct Unicode,
and I've got a CYR font being used, but still nothing.
Anyone have any ideas as to the best way to go about using
XML with Cyrillic Character sets?
Cheers
Col
AFAIK, Director doesn't support Unicode: try loading &
displaying your
XML data with a Flash animation embedded in your dir movie.
hth
Ned
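A quick way to rule out the XML file itself is to parse it outside Director. Here is a minimal sketch in Python (the element name is hypothetical) confirming that Cyrillic text survives a parse of a UTF-8-encoded document with a matching encoding declaration:

```python
# Sketch: verify a Cyrillic XML snippet is well-formed and correctly
# encoded, independent of Director. Element name is hypothetical.
import xml.etree.ElementTree as ET

xml_bytes = '<?xml version="1.0" encoding="UTF-8"?><question>Вопрос</question>'.encode("utf-8")

root = ET.fromstring(xml_bytes)  # the parser honors the declared encoding
print(root.text)                 # -> Вопрос
```

If a parse like this succeeds, the file and its declaration agree, and the problem is on the consuming side.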
Similar Messages
-
Discoverer 3i and Cyrillic character set
I have a Discoverer 3i EUL set up and I'm trying to pull out data that is stored in the database using the Cyrillic character set. All I get, however, when I view the data is ? substituted for the Cyrillic characters. Is Discoverer able to handle Cyrillic, and if so, how do I get it to show the actual Russian characters?
This is possibly a patch issue. For 3i this requires version 3.3.59.
-
Problem inserting XML doc (character set)
Hi all,
I'm having trouble trying to insert XML, either "posting" it (XSQL) or "putting" it
(OracleXML putXML).
The error that I get: "not supported
oracle-character-set-174".
The environment is:
Oracle 8i 8.1.5
(NLS_CHARACTERSET EL8MSWIN1253 for greek)
JDK 1.2.2
Apache Web Server 1.3.11
Apache JServ 1.1
XSQL v 0.9.9.1 and
XMLSQL, XML parser v2 that comes with it.
I had dropped all Java classes and reloaded
them using the oraclexmlsqlload batch file,
but I'm still getting the same error.
The thing is that I am
able to insert an XML doc that was generated
with an authoring tool called W4F, which extracts data from HTML pages and maps it to
an XML document, even with Greek characters
in it. But it fails when the XML is generated using
an editor or a servlet like the following:
newschedule.xsql like
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="latestschedules.xsl"?>
<page connection="dtv" xmlns:xsql="urn:oracle-xsql">
<xsql:insert-request date-format="DD'/'MM'/'YYYY" table="schedule_details_view"
transform="request-to-newschedule.xsl"/>
<xsql:query table="schedule"
tag-case="lower" max-rows="5" rowset-element="latestschedules"
row-element="schedule">
select *
from schedules
order by schedule_id desc
</xsql:query>
</page>
request-to-newschedule.xsl like
<?xml version = '1.0'?>
<ROWSET xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xsl:version="1.0">
<xsl:for-each select="request/parameters">
<ROW>
<SCHEDULE_ID><xsl:value-of select="Schedule_id_field"/></SCHEDULE_ID>
<DESCRIPTION><xsl:value-of select="Description_field"/></DESCRIPTION>
<DETAILS>
<DETAILS_ITEM>
<STARTING_TIME><xsl:value-of select="Starting_Time_field_1"/></STARTING_TIME>
<DURATION><xsl:value-of select="Duration_field_1"/></DURATION>
</DETAILS_ITEM>
<DETAILS_ITEM>
<STARTING_TIME><xsl:value-of select="Starting_Time_field_2"/></STARTING_TIME>
<DURATION><xsl:value-of select="Duration_field_2"/></DURATION>
</DETAILS_ITEM>
<DETAILS_ITEM>
<STARTING_TIME><xsl:value-of select="Starting_Time_field_3"/></STARTING_TIME>
<DURATION><xsl:value-of select="Duration_field_3"/></DURATION>
</DETAILS_ITEM>
<DETAILS_ITEM>
<STARTING_TIME><xsl:value-of select="Starting_Time_field_4"/></STARTING_TIME>
<DURATION><xsl:value-of select="Duration_field_4"/></DURATION>
</DETAILS_ITEM>
<DETAILS_ITEM>
<STARTING_TIME><xsl:value-of select="Starting_Time_field_5"/></STARTING_TIME>
<DURATION><xsl:value-of select="Duration_field_5"/></DURATION>
</DETAILS_ITEM>
</DETAILS>
</ROW>
</xsl:for-each>
</ROWSET>
Hope that someone could help me on this ...
Any advice is highly appreciated.
Thanks in advance
Nicos Gabrielides
email: [email protected]
Hi,
How about applying an XSL stylesheet to the existing XML doc to create another XML doc that filters out the table columns not found in the target DB table, so that all the columns match, and then using putXML to load?
Hope that helps.
OTN team@IDC -
What is needed to read/use Cyrillic character sets?
Hello all,
I'm working with a friend in Russia who is sending me documents typed in Russian, but my computers here don't show anything but gibberish, apart from any non-Cyrillic words included in the text.
What do I need to do to view this on my computer? Do I need a character set, or just a Cyrillic font?
Thanks in advance for your help!
Sincerely,
wordman
Adjust the page encoding in the head section. That's all you need. Any browser should correctly render the page based on that using Unicode.
Mylenium -
PL/SQL XML generation: Character set issues
Hi,
I am using the PL/SQL DOM (wrappers to the Java DOM), to generate XML from bits of database information. (On Oracle 8i).
The output XML must be in UTF-8, regardless of what the database character set happens to be. So I call
setCharset(doc, 'UTF8')
at the beginning, and I call
writeToClob(doc, xmllob, 'UTF8')
at the end, just to cover all eventualities.
However, any character outside ASCII gets
replaced with the character string "\xBF\xBF", which is rather tedious.
If, instead, we go via
writeToBuffer(doc, xmlbuf, 'UTF8')
and then dump the buffer contents into a clob, the UTF8 encoding is preserved, and everything works.
(This latter method is not good enough for my needs; I need more than 32K of data...)
So I was wondering if any kind soul could tell me what I am doing wrong.
Thanks,
<< Mike Alexander >>
I have the same problem. Is there any solution found?
Only xslprocessor.valueOf returns values from the XML document without losing special symbols. -
XML data from BLOB to CLOB - character set conversion
Hi All,
I'm trying to solve a problem with a character set conversion in PL/SQL in the following scenario:
1. source is an XML as a BLOB variable.
2. target is an XML as a CLOB variable.
3. the problem I have is the following:
- database character set is set to UTF-8
- XML character set could be anything (UTF-8, ISO 8859-1, ISO 8859-2, ASCII, ...)
- I need to write a procedure which converts the source BLOB content into the target CLOB taking into account the XML encoding and converts it into the DB default character set (UTF8).
I've been able to implement a simple conversion function. However, this function expects static XML encoding ISO-8859-1. The main part of the function looks as follows:
buffer := UTL_RAW.cast_to_varchar2(
UTL_RAW.convert(
DBMS_LOB.SUBSTR(source_blob_variable, 16000, pos)
, 'American_America.UTF8'
, 'American_America.we8iso8859p1'));
Does anyone have an idea how to rewrite the code to handle "any" XML encoding in the source BLOB file? In other words, is there a function in Oracle which converts XML character set names into Oracle character set values (ISO-8859-1 to we8iso8859p1, UTF-8 to UTF8, ...)?
Thanks a lot for any help.
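On the name-mapping question, there is no single built-in translation between IANA encoding names and Oracle character set names, but a small lookup table plus prolog sniffing covers the common cases. A sketch outside the database (the mapping entries are assumptions to be extended from the Oracle NLS documentation):

```python
# Sketch: sniff the encoding from an XML prolog and map common IANA names
# to their Oracle character set equivalents. The table below is a partial
# assumption -- extend it from Oracle's NLS documentation.
import re

IANA_TO_ORACLE = {
    "UTF-8": "AL32UTF8",
    "ISO-8859-1": "WE8ISO8859P1",
    "ISO-8859-2": "EE8ISO8859P2",
    "US-ASCII": "US7ASCII",
}

def oracle_charset_of(xml_bytes: bytes) -> str:
    m = re.match(rb'<\?xml[^>]*encoding=["\']([A-Za-z0-9._-]+)["\']', xml_bytes)
    # Per the XML spec, a document with no declaration defaults to UTF-8
    iana = m.group(1).decode("ascii").upper() if m else "UTF-8"
    return IANA_TO_ORACLE.get(iana, "AL32UTF8")  # fall back to Unicode

print(oracle_charset_of(b'<?xml version="1.0" encoding="ISO-8859-1"?><a/>'))
```

The result could then be fed into UTL_RAW.convert as the source NLS name.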
Julius
I want to pass a BLOB to some "createXML" procedure and get a proper XMLType in the UTF8 character set, properly converted from whatever character set the input is in.
As per the documentation, the generated XML always has the encoding set on the client side depending on NLS_LANG (default UTF-8), regardless of the input encoding, so I don't see a need to parse the PI of the XML:
C:\>echo %NLS_LANG%
%NLS_LANG%
C:\>sqlplus
SQL*Plus: Release 11.1.0.6.0 - Production on Wed Apr 30 08:54:12 2008
Copyright (c) 1982, 2007, Oracle. All rights reserved.
Connected to:
Oracle Database 11g Enterprise Edition Release 11.1.0.6.0 - Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options
SQL> var cur refcursor
SQL>
SQL> declare
2 b blob := utl_raw.cast_to_raw ('<a>myxml</a>');
3 begin
4 open :cur for select xmlroot (xmltype (utl_raw.cast_to_varchar2 (b))) xml from dual;
5 end;
6 /
PL/SQL procedure successfully completed.
SQL>
SQL> print cur
XML
<?xml version="1.0" encoding="UTF-8"?><a>myxml</a>
SQL> exit
Disconnected from Oracle Database 11g Enterprise Edition Release 11.1.0.6.0 - Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options
C:\>set NLS_LANG=GERMAN_GERMANY.WE8ISO8859P1
C:\>sqlplus
SQL*Plus: Release 11.1.0.6.0 - Production on Mi Apr 30 08:55:02 2008
Copyright (c) 1982, 2007, Oracle. All rights reserved.
SQL> var cur refcursor
SQL>
SQL> declare
2 b blob := utl_raw.cast_to_raw ('<a>myxml</a>');
3 begin
4 open :cur for select xmlroot (xmltype (utl_raw.cast_to_varchar2 (b))) xml from dual;
5 end;
6 /
PL/SQL-Prozedur erfolgreich abgeschlossen.
SQL>
SQL> print cur
XML
<?xml version="1.0" encoding="ISO-8859-1"?>
<a>myxml</a> -
Database character set = UTF-8, but mismatch error on XML file upload
Dear experts,
I am having problems trying to upload an XML file into an XMLType table. The Database is 9.2.0.5.0, with the character set details:
SELECT *
FROM SYS.PROPS$
WHERE name like '%CHA%';
Query results:
NLS_NCHAR_CHARACTERSET UTF8 NCHAR Character set
NLS_SAVED_NCHAR_CS UTF8
NLS_NUMERIC_CHARACTERS ., Numeric characters
NLS_CHARACTERSET UTF8 Character set
NLS_NCHAR_CONV_EXCP FALSE NLS conversion exception
To upload the XML file into the XMLType table, I am using the command:
insert into XMLTABLE
values(xmltype(getClobDocument('ServiceRequest.xml','UTF8')));
However, I get the error:
ORA-31011: XML parsing failed
ORA-19202: Error occurred in XML processing
LPX-00200: could not convert from encoding UTF-8 to UCS2
Error at line 1
ORA-06512: at "SYS.XMLTYPE", line 0
ORA-06512: at line 1
Why does it mention UCS2, as can't see that on the Database character set?
Many thanks for your help,
Mark
UCS2 is known as AL16UTF16 (LE/BE) by Oracle...
Try using AL32UTF8 as the character set name.
AFAIK the main difference between Oracle's UTF8 and AL32UTF8 character sets is that the UTF8 character set does not support those UTF-8 characters that require 4 bytes.
-Mark -
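The 4-byte point above can be made concrete: Oracle's UTF8 stores supplementary characters in a CESU-8-like form (two 3-byte surrogate sequences), while AL32UTF8 is standard UTF-8. A sketch of the byte counts:

```python
# Sketch: the practical difference between Oracle's UTF8 and AL32UTF8.
# AL32UTF8 is standard UTF-8: a character outside the BMP takes 4 bytes.
# Oracle's older UTF8 is CESU-8-like: the same character is stored as its
# two UTF-16 surrogate halves, 3 bytes each (6 bytes total).
ch = "\U0001D11E"  # MUSICAL SYMBOL G CLEF, a supplementary character

al32utf8 = ch.encode("utf-8")
print(len(al32utf8))  # -> 4

# Reconstruct the CESU-8 form from the UTF-16 surrogate pair
units = ch.encode("utf-16-be")
hi, lo = int.from_bytes(units[:2], "big"), int.from_bytes(units[2:], "big")
cesu8 = (chr(hi).encode("utf-8", "surrogatepass")
         + chr(lo).encode("utf-8", "surrogatepass"))
print(len(cesu8))  # -> 6
```

This is why a database that must hold supplementary characters (emoji, rare CJK, musical symbols) should be AL32UTF8 rather than UTF8.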
Write XML to file - character set problem
I have a package that generates XML from relational data using SQLX. I want to write the resulting XML to the unix file system. I can do this with the following code :
DECLARE
v_xml xmltype;
doc dbms_xmldom.DOMDocument;
BEGIN
v_xml := create_my_xml; -- returns an XMLType value
doc := dbms_xmldom.newDOMDocument(v_xml);
dbms_xmldom.writeToFILE(doc, '/mydirectory/myfile.xml');
END;
This creates the file, but characters such as å, ä and ö are getting 'corrupted' and the resultant XML is invalid. (I've checked the XML within SQL*Plus and the characters are OK.)
I assume the character set of the Unix operating system doesn't support these characters. How can I overcome this?
Hi,
Do you mean that you would like to write output to an external file somewhere on flash disk, or perhaps even inside the directory where the MIDlet is located? To be able to do so you will need manufacturer-specific APIs extending FileConnection (a JSR; don't know the number right now...). The default MIDP I/O library does not support direct action on a file.
However, such a FileConnection method invocation requires an import statement that is manufacturer-specific...
To keep your MIDlet CLDC MIDP compliant you can try using RMS with which you can write data that will be stored in a 'database' within the 'res' directory inside your MIDlet suite. If you're new to RMS, please check the web for tutorials, etc etc.
Cheers for now,
Jasper -
XML as target file - how can i change its character set?
Hi all,
I need to create my target as an XML file and save all my information there, but with another character set (not the default). In other words I must have in the XML file header
<?xml version="1.0" encoding="ISO-8859-15"?>
Now I have
<?xml version="1.0"?>
What can i do?
Thanks in advance.
I don't think Finder does this (I've tried).
iTunes does though. Where you can set artwork or the "poster frame"...
This may not be what you want but, if it helps, I know 2 ways to do this:
Open the video in QuicktimePlayer7 | View | Set Poster Frame (even then, you might need to save it as .mov (ie in a 'mov container').
Drag the file into iTunes and set the artwork (as in http://www.dummies.com/how-to/content/adding-album-cover-art-or-images-in-itunes.html)
From there, iTunes will use that frame as the "poster frame" ie the photo/frame that shows when you browse your videos. Which is what you want, but limited to iTunes.
When I do either of these above, the frame I set does not show when exploring files in "Finder" (or in the other Explorer tool I use called "Pathfinder").
So it may be that exactly what you want is not possible. -
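Back to the original XML question: most XML writers will emit the declaration for you once you name the target encoding. A sketch with Python's ElementTree (the element names are hypothetical):

```python
# Sketch: emitting XML with an explicit encoding declaration,
# here ISO-8859-15 as the question asks. Element names are hypothetical.
import io
import xml.etree.ElementTree as ET

root = ET.Element("target")
ET.SubElement(root, "info").text = "données"  # Latin-9-encodable text

buf = io.BytesIO()
ET.ElementTree(root).write(buf, encoding="ISO-8859-15", xml_declaration=True)
print(buf.getvalue().decode("iso-8859-15"))
```

The bytes written are actually re-encoded into ISO-8859-15, not just relabeled, which is the part that matters for a consumer honoring the declaration.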
Oracle 10G support for both Cyrillic and Western European Character Sets
Dear all,
Our DB currently supports Western European character sets, but we need to also support Russian characters.
Is there a common character set for both? or some trick that does the job?
Thanks.
DB: Oracle 10G R2
OS: Linux
Current Char Set:
NLS_CHARACTERSET WE8ISO8859P1
NLS_CALENDAR GREGORIAN
NLS_NCHAR_CHARACTERSET AL16UTF16
AL32UTF8 will always do the job.
CL8ISO8859P5
CL8MSWIN1251
could do the job according to http://download.oracle.com/docs/cd/B19306_01/server.102/b14225/applocaledata.htm#sthref1960.
Edited by: P. Forstmann on 9 Aug 2011 17:41
ORA-12712 error while changing nls character set to AL32UTF8
Hi,
It is strongly recommended to use database character set AL32UTF8 whenever a database is going to be used with XML capabilities. The database character set in the installed DB is WE8MSWIN1252. To make use of XML DB features, I need to change it to AL32UTF8. But when I try doing this, I get ORA-12712: new character set must be a superset of old character set. Is there a way to solve this issue?
Thanks in advance,
Divya.
Hi,
a change from WE8MSWIN1252 to AL32UTF8 is not directly possible. This is because AL32UTF8 is not a binary superset of WE8MSWIN1252.
There are 2 options:
- use full export and import
- use ALTER DATABASE CHARACTER SET in a restricted way
The method you can choose depends on the characters in the database: if it is only ASCII then the second one can work; in other cases the first one is needed.
It is all described in the Support Note 260192.1, "Changing the NLS_CHARACTERSET to AL32UTF8 / UTF8 (Unicode)". Get it from the support/metalink site.
You can also read the chapters about this issue in the Globalization Guide: [url http://download.oracle.com/docs/cd/E11882_01/server.112/e10729/ch11charsetmig.htm#g1011430]Change characterset.
Herald ten Dam
http://htendam.wordpress.com -
I have a project that uses shared fonts. The fonts are all
contained in a single swf ("fonts.swf"), are embedded in that swf's
library, and are set to export for ActionScript and runtime sharing.
The text in the project is dynamic and is loaded in from
external XML files. The text is formatted via styles contained in a
CSS object.
This project needs to be localized into 20 or so different
languages.
Everything works great with one exception: I can’t
figure out how to set which character set gets exported for runtime
sharing. i.e. I want to create a fonts.swf that contains Korean
characters, change the XML based text to Korean and have the text
display correctly.
I’ve tried changing the language of my OS (WinXP) and
re-exporting but that doesn’t work correctly. I’ve also
tried adding substitute font keys to the registry (at:
HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows
NT\CurrentVersion\FontSubstitutes) as outlined here:
http://www.quasimondo.com/archives/000211.php
but the fonts I added did not show up in Flash's font menu.
I’ve also tried the method outlined here:
http://www.adobe.com/cfusion/knowledgebase/index.cfm?id=tn_16275
to no avail.
I know there must be a simple solution that will allow me to
embed language specific character sets for the fonts embedded in
the library but I have yet to discover what it is.
Any insight would be greatly appreciated.
http://www.quasimondo.com/archives/000211.php
http://www.adobe.com/cfusion/knowledgebase/index.cfm?id=tn_16275
Thanks Jim,
I know that it is easy to specify the language you want to
use when setting the embed font properties for a specific text
field but my project has hundreds of text fields and I'm setting
the font globally by referencing the font symbols in a single swf.
I have looked at the info you've pointed out but wasn't
helped by it. What I'd like to be able to do is to tell Flash to
embed a language specific character-set for the font symbols in the
library. It currently is only embedding Latin characters even
though I know the fonts specified contains characters for other
languages.
For example, I have a font symbol in the library named
"Font1". When I look at its properties I can see it is specified as
Tahoma. I know the Tahoma font on my system contains the characters
for Korean, but when I compile the swf it only contains Latin
characters (glyphs) - this corresponds to the language of my OS (US
English). I want to know how to tell Flash to embed the Korean-language
characters rather than, or as well as, the Latin characters
for any given FONT SYMBOL. If I could do that, then when I enter
Korean text into my XML files the correct characters will be
available to Flash. As it is now, the characters are not available
and thus the text doesn't display.
Make sense?
Many thanks,
Mike -
Java Character set error while loding data using iSetup
Hi,
I am getting the following error while migrating settup data from R12 (12.1.2) Instance to another R12 (12.1.2) Instance, Both the Database has same DB character set (AL32UTF8)
we are getting this error while migrating any setup data
Actual error is
Downloading the extract from central instance
Successfully copied the Extract
Time taken to download Extract and write as zip file = 0 seconds
Validating Primary Extract...
Source Java Charset: AL32UTF8
Target Java Charset: UTF-8
Target Java Charset does not match with Source Java Charset
java.lang.Exception: Target Java Charset does not match with Source Java Charset
at oracle.apps.az.r12.common.cpserver.PreValidator.validate(PreValidator.java:191)
at oracle.apps.az.r12.loader.cpserver.APILoader.callAPIs(APILoader.java:119)
at oracle.apps.az.r12.loader.cpserver.LoaderContextImpl.load(LoaderContextImpl.java:66)
at oracle.apps.az.r12.loader.cpserver.LoaderCp.runProgram(LoaderCp.java:65)
at oracle.apps.fnd.cp.request.Run.main(Run.java:157)
Error while loading apis
java.lang.NullPointerException
at oracle.apps.az.r12.loader.cpserver.APILoader.callAPIs(APILoader.java:158)
at oracle.apps.az.r12.loader.cpserver.LoaderContextImpl.load(LoaderContextImpl.java:66)
at oracle.apps.az.r12.loader.cpserver.LoaderCp.runProgram(LoaderCp.java:65)
at oracle.apps.fnd.cp.request.Run.main(Run.java:157)
Please help in identifying and resolving the issue
Sachin
The Source and Target DB character sets are the same.
Output from the query
------------- Source --------------
SQL> select value from nls_database_parameters where parameter='NLS_CHARACTERSET';
VALUE
AL32UTF8
And target Instance
-------------- Target----------------------
SQL> select value from nls_database_parameters where parameter='NLS_CHARACTERSET';
VALUE
AL32UTF8
The error is about the Source and Target JAVA character sets.
I will check the PreValidator XML from "How to use iSetup" and update the note.
Thanks
Sachin -
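The mismatch above ("AL32UTF8" vs "UTF-8") is a naming problem rather than an encoding problem, since both names denote the same encoding. A sketch of comparing charset names only after normalizing through an alias table (the table entries are assumptions; extend as needed):

```python
# Sketch: "AL32UTF8" (Oracle) and "UTF-8" (Java/IANA) name the same
# encoding, so a raw string comparison will always report a mismatch.
# Normalizing both sides through an alias table avoids the false alarm.
ALIASES = {"AL32UTF8": "UTF-8", "UTF8": "UTF-8"}  # assumed, partial table

def same_charset(source: str, target: str) -> bool:
    norm = lambda name: ALIASES.get(name.upper().replace("-", ""), name.upper())
    return norm(source) == norm(target)

print(same_charset("AL32UTF8", "UTF-8"))  # -> True
```

Any validator that compares the names literally, as the stack trace suggests, would need a normalization step like this.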
Character set Conversion (US7ASCII to AL32UTF8) -- ORA-31011 problem
Hello,
We've run into some problems as part of our character set conversion from US7ASCII to AL32UTF8. The latest problem is that we have a query that works in US7ASCII, but after converting to AL32UTF8 it no longer works and generates an ORA-31011 error. This is very concerning to us as this error indicates an XML parsing problem and we are doing no XML whatsoever in our DB. We do not have XML columns (nor even CLOBs or BLOBs) nor XML tables and it's not XMLDB.
For reference, we're running 11.2.0.2.0 over Solaris.
Has anyone seen this kind of problem before?
If need be, I'll find a way to post table definitions. However, it's safe to assume that we are only using DATE, VARCHAR2 and NUMBER column types in these tables. All of the tables are local to the DB.
Thanks
We converted the database using scripts I developed. I'm not quite sure how we converted is relevant, other than saying that we did not use the Oracle conversion utility (not csscan, but the GUI Java tool).
A summary:
1) We replaced the lossy characters by parsing a csscan output file
2) After re-scanning with csscan and coming up clean, our DBA converted the database to AL32UTF8 (changed the parameter file, changing the character set, switched the semantics to char, etc).
3) Final step was changing existing tables to use char semantics by changing the table schema for VARCHAR2 columns
Any specific steps I cannot easily answer, I worked with a DBA at our company to do this work. I handled the character replacement / DDL changes and the DBA ran csscan & performed the database config changes.
Our actual error message:
ORA-31011: XML parsing failed
ORA-19202: Error occurred in XML processing
LPX-00210: expected '<' instead of '�'
Error at line 1
31011. 00000 - "XML parsing failed"
*Cause: XML parser returned an error while trying to parse the document.
*Action: Check if the document to be parsed is valid.
Error at Line: 24 Column: 15
This seems to match the document ID referenced below. I will ask our DBA to pull it up and review it.
Please advise if more information is needed from my end. -
Fixing a US7ASCII - WE8ISO8859P1 Character Set Conversion Disaster
In hopes that it might be helpful in the future, here's the procedure I followed to fix a disastrous unintentional US7ASCII on 9i to WE8ISO8859P1 on 10g migration.
BACKGROUND
Oracle has multiple character sets, ranging from US7ASCII to AL32UTF8.
US7ASCII, of course, is a cheerful 7 bit character set, holding the basic ASCII characters sufficient for the English language.
However, it also has a handy feature: character fields under US7ASCII will accept characters with values > 128. If you have a web application, users can type (or paste) Us with umlauts, As with macrons, and quite a few other funny-looking characters.
These will be inserted into the database, and then -- if appropriately supported -- can be selected and displayed by your app.
The problem is that while these characters can be present in a VARCHAR2 or CLOB column, they are not actually legal. If you try within Oracle to convert from US7ASCII to WE8ISO8859P1 or any other character set, Oracle recognizes that these characters with values greater than 127 are not valid, and will replace them with a default "unknown" character. In the case of a change from US7ASCII to WE8ISO8859P1, it will change them to 191, the upside down question mark.
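A sketch of that substitution behavior, mimicking what CONVERT does to a byte that is invalid in US7ASCII (the sample bytes match the NCLEX-PN dump shown in this write-up, where 174 is the ® sign):

```python
# Sketch of the damage: byte 174 is not valid US7ASCII, so the conversion
# replaces it with the target set's default substitution character,
# 191 (the inverted question mark in WE8ISO8859P1).
raw = bytes([78, 67, 76, 69, 88, 45, 80, 78, 174])  # "NCLEX-PN" + byte 174

def mimic_convert_us7ascii(b: bytes) -> bytes:
    # Any byte > 127 is invalid US7ASCII input and becomes 191
    return bytes(x if x < 128 else 191 for x in b)

converted = mimic_convert_us7ascii(raw)
print(list(converted))  # -> [78, 67, 76, 69, 88, 45, 80, 78, 191]
```

Once a byte has been collapsed to 191, the original value is unrecoverable from the converted copy, which is why the fix below has to reach back to the pre-upgrade clone.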
Oracle has a native utility, introduced in 8i, called csscan, which assists in migrating to different character sets. This has been replaced in newer versions with the Database MIgration Assistant for Unicode (DMU), which is the new recommended tool for 11.2.0.3+.
These tools, however, do no good unless they are run. For my particular client, the operations team took a database running 9i and upgraded it to 10g, and as part of that process the character set was changed from US7ASCII to WE8ISO8859P1. The database had a large number of special characters inserted into it, and all of these abruptly turned into upside-down question marks. The users of the application didn't realize there was a problem until several weeks later, by which time they had put a lot of new data into the system. Rollback was not possible.
FIXING THE PROBLEM
How fixable this problem is and the acceptable methods which can be used depend on the application running on top of the database. Fortunately, the client app was amenable.
(As an aside note: this approach does not use csscan -- I had done something similar previously on a very old system and decided it would take less time in this situation to revamp my old procedures and not bring a new utility into the mix.)
We will need to separate approaches -- one to fix the VARCHAR2 & CHAR fields, and a second for CLOBs.
In order to set things up, we created two environments. The first was a clone of production as it is now, and the second a clone from before the upgrade & character set change. We will call these environments PRODCLONE and RESTORECLONE.
Next, we created a database link, OLD6. This allows PRODCLONE to directly access RESTORECLONE. Since they were cloned with the same SID, establishing the link needed the global_names parameter set to false.
alter system set global_names=false scope=memory;
CREATE PUBLIC DATABASE LINK OLD6
CONNECT TO DBUSERNAME
IDENTIFIED BY dbuserpass
USING 'restoreclone:1521/MYSID';
Testing the link...
SQL> select count(1) from users@old6;
COUNT(1)
454
Here is a row in a table which contains illegal characters. We are accessing RESTORECLONE from PRODCLONE via our link.
PRODCLONE> select dump(title) from my_contents@old6 where pk1=117286;
DUMP(TITLE)
Typ=1 Len=49: 78,67,76,69,88,45,80,78,174,32,69,120,97,109,32,83,116,121,108,101
,32,73,110,116,101,114,97,99,116,105,118,101,32,82,101,118,105,101,119,32,81,117
,101,115,116,105,111,110,115
By comparison, a dump of that row on PRODCLONE's my_contents gives:
PRODCLONE> select dump(title) from my_contents where pk1=117286;
DUMP(TITLE)
Typ=1 Len=49: 78,67,76,69,88,45,80,78,191,32,69,120,97,109,32,83,116,121,108,101
,32,73,110,116,101,114,97,99,116,105,118,101,32,82,101,118,105,101,119,32,81,117
,101,115,116,105,111,110,115
Note that the "174" on RESTORECLONE was changed to "191" on PRODCLONE.
We can manually insert CHR(174) into our PRODCLONE and have it display successfully in the application.
However, I tried a number of methods to copy the data from RESTORECLONE to PRODCLONE through the link, but entirely without success. Oracle would recognize the character as invalid and silently transform it.
Eventually, I located a clever workaround at this link:
https://kr.forums.oracle.com/forums/thread.jspa?threadID=231927
It works like this:
On RESTORECLONE you create a view, vv, with UTL_RAW:
RESTORECLONE> create or replace view vv as select pk1,utl_raw.cast_to_raw(title) as title from my_contents;
View created.
This turns the title to raw on the RESTORECLONE.
You can now convert from RAW to VARCHAR2 on the PRODCLONE database:
PRODCLONE> select dump(utl_raw.cast_to_varchar2 (title)) from vv@old6 where pk1=117286;
DUMP(UTL_RAW.CAST_TO_VARCHAR2(TITLE))
Typ=1 Len=49: 78,67,76,69,88,45,80,78,174,32,69,120,97,109,32,83,116,121,108,101
,32,73,110,116,101,114,97,99,116,105,118,101,32,82,101,118,105,101,119,32,81,117
,101,115,116,105,111,110,115
The above works because oracle on PRODCLONE never knew that our TITLE string on RESTORE was originally in US7ASCII, so it was unable to do its transparent character set conversion.
PRODCLONE> update my_contents set title=( select utl_raw.cast_to_varchar2 (title) from vv@old6 where pk1=117286) where pk1=117286;
PRODCLONE> select dump(title) from my_contents where pk1=117286;
DUMP(UTL_RAW.CAST_TO_VARCHAR2(TITLE))
Typ=1 Len=49: 78,67,76,69,88,45,80,78,174,32,69,120,97,109,32,83,116,121,108,101
,32,73,110,116,101,114,97,99,116,105,118,101,32,82,101,118,105,101,119,32,81,117
,101,115,116,105,111,110,115
Excellent! The "174" character has survived the transfer and is now in place on PRODCLONE.
Now that we have a method to move the data over, we have to identify which columns/tables have character data that was damaged by the conversion. We decided we could ignore anything with a length smaller than 10 -- such fields in our application would be unlikely to have data with invalid characters.
RESTORECLONE> select count(1) from user_tab_columns where data_type in ('CHAR','VARCHAR2') and data_length > 10;
COUNT(1)
533
By converting a field to WE8ISO8859P1, and then comparing it with the original, we can see if the characters change:
RESTORECLONE> select count(1) from my_contents where title != convert (title,'WE8ISO8859P1','US7ASCII') ;
COUNT(1)
10568
So 10568 rows have characters which were transformed into 191s as part of the original conversion.
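The CONVERT-based inequality above is essentially a round-trip test: a value is flagged when it does not survive being interpreted as pure US7ASCII. A sketch of the same idea outside the database (the sample rows are hypothetical):

```python
# Sketch: flag rows whose bytes cannot be pure US7ASCII, i.e. the rows
# the character set migration would have mangled. Sample data is made up.
def has_non_ascii(value: bytes) -> bool:
    try:
        value.decode("ascii")   # succeeds only if every byte is < 128
        return False
    except UnicodeDecodeError:
        return True

rows = [b"plain title", b"NCLEX-PN\xae Exam"]  # \xae is byte 174
damaged = [r for r in rows if has_non_ascii(r)]
print(len(damaged))  # -> 1
```

The SQL version has the advantage that it runs where the data lives, but the logic it encodes is just this byte-range check.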
[ As an aside, we can't use CONVERT() on LOBs -- for them we will need another approach, outlined further below.
RESTOREDB> select count(1) from my_contents where main_data != convert (convert(main_DATA,'WE8ISO8859P1','US7ASCII'),'US7ASCII','WE8ISO8859P1') ;
select count(1) from my_contents where main_data != convert (convert(main_DATA,'WE8ISO8859P1','US7ASCII'),'US7ASCII','WE8ISO8859P1')
ERROR at line 1:
ORA-00932: inconsistent datatypes: expected - got CLOB ]
Anyway, now that we can identify VARCHAR2 fields which need to be checked, we can put together a PL/SQL stored procedure to do it for us:
create or replace procedure find_us7_strings
(table_name varchar2,
fix_col varchar2 )
authid current_user
as
orig_sql varchar2(1000);
begin
orig_sql:='insert into cnv_us7(mytablename,myindx,mycolumnname) select '''||table_name||''',pk1,'''||fix_col||''' from '||table_name||' where '||fix_col||' != CONVERT(CONVERT('||fix_col||',''WE8ISO8859P1''),''US7ASCII'') and '||fix_col||' is not null';
-- Uncomment if debugging:
-- dbms_output.put_line(orig_sql);
execute immediate orig_sql;
end;
And create a table to store the information as to which tables, columns, and rows have the bad characters:
drop table cnv_us7;
create table cnv_us7 (mytablename varchar2(50), myindx number, mycolumnname varchar2(50) ) tablespace myuser_data;
create index list_tablename_idx on cnv_us7(mytablename) tablespace myuser_indx;
With a SQL-generating SQL script, we can iterate through all the tables/columns we want to check:
--example of using the data: select title from my_contents where pk1 in (select myindx from cnv_us7)
set head off pagesize 1000 linesize 120
spool runme.sql
select 'exec find_us7_strings ('''||table_name||''','''||column_name||'''); ' from user_tab_columns
where
data_type in ('CHAR','VARCHAR2')
and table_name in (select table_name from user_tab_columns where column_name='PK1' and table_name not in ('HUGETABLEIWANTTOEXCLUDE','ANOTHERTABLE'))
and char_length > 10
order by table_name,column_name;
spool off;
set echo on time on timing on feedb on serveroutput on;
spool output_of_runme
@./runme.sql
spool off;
Which eventually gives us the following inserted into CNV_US7:
20:48:21 SQL> select count(1),mycolumnname,mytablename from cnv_us7 group by mytablename,mycolumnname;
4 DESCRIPTION MY_FORUMS
21136 TITLE MY_CONTENTS
Out of 533 VARCHAR2s and CHARs, we only had five or six columns that needed fixing.
We create our views on RESTOREDB:
create or replace view my_forums_vv as select pk1,utl_raw.cast_to_raw(description) as description from forum_main;
create or replace view my_contents_vv as select pk1,utl_raw.cast_to_raw(title) as title from my_contents;
And then we can fix it directly via sql:
update my_contents taborig1 set TITLE= (select utl_raw.cast_to_varchar2 (TITLE) from my_contents_vv@old6 where pk1=taborig1.pk1)
where pk1 in (
select tabnew.pk1 from my_contents@old6 taborig,my_contents tabnew,cnv_us7@old6
where taborig.pk1=tabnew.pk1
and myindx=tabnew.pk1
and mycolumnname='TITLE'
and mytablename='MY_CONTENTS'
and convert(taborig.TITLE,'US7ASCII','WE8ISO8859P1') = tabnew.TITLE );
Note this part:
"and convert(taborig.TITLE,'US7ASCII','WE8ISO8859P1') = tabnew.TITLE "
This checks to verify that the TITLE field on the PRODCLONE and RESTORECLONE are the same (barring character set issues). This is there because if the users have changed TITLE -- or any other field -- on their own between the time of the upgrade and now, we do not want to overwrite their changes. We make the assumption that as part of the process, they may have changed the bad character on their own.
We can also create a stored procedure which will execute the SQL for us:
create or replace procedure fix_us7_strings
(TABLE_NAME varchar2,
FIX_COL varchar2 )
authid current_user
as
orig_sql varchar2(1000);
TYPE cv_type IS REF CURSOR;
orig_cur cv_type;
begin
orig_sql:='update '||TABLE_NAME||' taborig1 set '||FIX_COL||'= (select utl_raw.cast_to_varchar2 ('||FIX_COL||') from '||TABLE_NAME||'_vv@old6 where pk1=taborig1.pk1)
where pk1 in (
select tabnew.pk1 from '||TABLE_NAME||'@old6 taborig,'||TABLE_NAME||' tabnew,cnv_us7@old6
where taborig.pk1=tabnew.pk1
and myindx=tabnew.pk1
and mycolumnname='''||FIX_COL||'''
and mytablename='''||TABLE_NAME||'''
and convert(taborig.'||FIX_COL||',''US7ASCII'',''WE8ISO8859P1'') = tabnew.'||FIX_COL||')';
dbms_output.put_line(orig_sql);
execute immediate orig_sql;
end;
exec fix_us7_strings('MY_FORUMS','DESCRIPTION');
exec fix_us7_strings('MY_CONTENTS','TITLE');
commit;
To validate this before and after, we can run something like:
select dump(description) from my_forums where pk1 in (select myindx from cnv_us7@old6 where mytablename='MY_FORUMS');
The above process fixes all the VARCHAR2s and CHARs. Now what about the CLOB columns?
Note that we're going to have some extra difficulty here, not just because we are dealing with CLOBs, but because we are doing it in 9i, whose DBMS_LOB package offers less CLOB-related functionality than later releases.
This procedure finds invalid US7ASCII strings inside a CLOB in 9i:
create or replace procedure find_us7_clob
(table_name varchar2,
fix_col varchar2)
authid current_user
as
orig_sql varchar2(1000);
type cv_type is REF CURSOR;
orig_table_cur cv_type;
my_chars_read NUMBER;
my_offset NUMBER;
my_problem NUMBER;
my_lob_size NUMBER;
my_indx_var NUMBER;
my_total_chars_read NUMBER;
my_output_chunk VARCHAR2(4000);
my_problem_flag NUMBER;
my_clob CLOB;
my_total_problems NUMBER;
ins_sql VARCHAR2(4000);
BEGIN
DBMS_OUTPUT.ENABLE(1000000);
orig_sql:='select pk1,dbms_lob.getlength('||FIX_COL||') as cloblength,'||fix_col||' from '||table_name||' where dbms_lob.getlength('||fix_col||') >0 and '||fix_col||' is not null order by pk1';
open orig_table_cur for orig_sql;
my_total_problems := 0;
LOOP
FETCH orig_table_cur INTO my_indx_var,my_lob_size,my_clob;
EXIT WHEN orig_table_cur%NOTFOUND;
my_offset :=1;
my_chars_read := 512;
my_problem_flag :=0;
WHILE my_offset < my_lob_size and my_problem_flag =0
LOOP
DBMS_LOB.READ(my_clob,my_chars_read,my_offset,my_output_chunk);
my_offset := my_offset + my_chars_read;
IF my_output_chunk != CONVERT(CONVERT(my_output_chunk,'WE8ISO8859P1'),'US7ASCII')
THEN
-- DBMS_OUTPUT.PUT_LINE('Problem with '||my_indx_var);
-- DBMS_OUTPUT.PUT_LINE(my_output_chunk);
my_problem_flag:=1;
END IF;
END LOOP;
IF my_problem_flag=1
THEN my_total_problems := my_total_problems +1;
ins_sql:='insert into cnv_us7(mytablename,myindx,mycolumnname) values ('''||table_name||''','||my_indx_var||','''||fix_col||''')';
execute immediate ins_sql;
END IF;
END LOOP;
DBMS_OUTPUT.PUT_LINE('We found '||my_total_problems||' problem rows in table '||table_name||', column '||fix_col||'.');
END;
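The chunked scan that find_us7_clob performs can be approximated outside the database like this. A rough sketch only: a byte string stands in for the CLOB locator, and the function names are invented.

```python
# Walk a large byte string in 512-byte windows, the way DBMS_LOB.READ is
# used above, and flag the value as soon as one window fails the
# US7ASCII check (i.e., contains a byte above 0x7F).
CHUNK = 512

def clob_needs_fixing(data: bytes) -> bool:
    offset = 0
    while offset < len(data):
        window = data[offset:offset + CHUNK]
        if any(b > 0x7F for b in window):
            return True          # my_problem_flag := 1
        offset += len(window)    # my_offset := my_offset + my_chars_read
    return False

assert not clob_needs_fixing(b"x" * 2000)
assert clob_needs_fixing(b"x" * 1000 + b"\xe9" + b"x" * 1000)
```

Stopping at the first bad window, as the PL/SQL version does with my_problem_flag, avoids reading the rest of a large CLOB once we already know the row needs fixing.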
And we can use SQL-generating SQL to find out which CLOBs have issues, out of all the ones in the database:
RESTOREDB> select 'exec find_us7_clob('''||table_name||''','''||column_name||''');' from user_tab_columns where data_type='CLOB';
exec find_us7_clob('MY_CONTENTS','DATA');
After completion, the CNV_US7 table looked like this:
RESTOREDB> set linesize 120 pagesize 100;
RESTOREDB> select count(1),mytablename,mycolumnname from cnv_us7
where mytablename||' '||mycolumnname in (select table_name||' '||column_name from user_tab_columns
where data_type='CLOB' )
group by mytablename,mycolumnname;
COUNT(1) MYTABLENAME MYCOLUMNNAME
69703 MY_CONTENTS DATA
On RESTOREDB, our 9i version, we will use this procedure (found many years ago on the internet):
create or replace procedure CLOB2BLOB (p_clob in out nocopy clob, p_blob in out nocopy blob) is
-- transforming CLOB to BLOB
l_off number default 1;
l_amt number default 4096;
l_offWrite number default 1;
l_amtWrite number;
l_str varchar2(4096 char);
begin
loop
dbms_lob.read ( p_clob, l_amt, l_off, l_str );
l_amtWrite := utl_raw.length ( utl_raw.cast_to_raw( l_str) );
dbms_lob.write( p_blob, l_amtWrite, l_offWrite,
utl_raw.cast_to_raw( l_str ) );
l_offWrite := l_offWrite + l_amtWrite;
l_off := l_off + l_amt;
l_amt := 4096;
end loop;
exception
when no_data_found then
-- DBMS_LOB.READ raises NO_DATA_FOUND once l_off passes the end of the
-- CLOB; that exception is the intended exit from the loop above.
NULL;
end;
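As a sanity check on the offset bookkeeping, the same copy loop can be sketched in Python, with byte strings standing in for the LOB locators. This is an illustration of the loop's logic, not Oracle code.

```python
# Mirror of the CLOB2BLOB loop: read 4096-unit chunks, append them at a
# separately tracked destination offset, and stop when the source is
# exhausted (the PL/SQL version exits via the NO_DATA_FOUND handler).
def copy_in_chunks(src: bytes, chunk_size: int = 4096) -> bytes:
    dest = bytearray()
    read_off = 0                                      # l_off
    while read_off < len(src):
        piece = src[read_off:read_off + chunk_size]   # DBMS_LOB.READ
        dest += piece                                 # DBMS_LOB.WRITE
        read_off += len(piece)                        # l_off := l_off + l_amt
    return bytes(dest)

payload = bytes(range(256)) * 40      # 10240 bytes, spans three chunks
assert copy_in_chunks(payload) == payload
```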
We can test out the transformation of CLOBs to BLOBs with a single row like this:
drop table my_contents_lob;
Create table my_contents_lob (pk1 number,data blob);
DECLARE
v_clob CLOB;
v_blob BLOB;
BEGIN
SELECT data INTO v_clob FROM my_contents WHERE pk1 = 16 ;
INSERT INTO my_contents_lob (pk1,data) VALUES (16,empty_blob() );
SELECT data INTO v_blob FROM my_contents_lob WHERE pk1=16 FOR UPDATE;
clob2blob (v_clob, v_blob);
END;
select dbms_lob.getlength(data) from my_contents_lob;
DBMS_LOB.GETLENGTH(DATA)
329
SQL> select utl_raw.cast_to_varchar2(data) from my_contents_lob;
UTL_RAW.CAST_TO_VARCHAR2(DATA)
Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam...
Now we need to push it through a loop. Unfortunately, I had trouble making the "SELECT INTO" dynamic. Thus I used a version of the procedure for each table. It's aesthetically displeasing, but at least it worked.
create table my_contents_lob(pk1 number,data blob);
create index my_contents_lob_pk1 on my_contents_lob(pk1) tablespace my_user_indx;
create or replace procedure blob_conversion_my_contents
(table_name varchar2,
fix_col varchar2)
authid current_user
as
orig_sql varchar2(1000);
type cv_type is REF CURSOR;
orig_table_cur cv_type;
my_chars_read NUMBER;
my_offset NUMBER;
my_problem NUMBER;
my_lob_size NUMBER;
my_indx_var NUMBER;
my_total_chars_read NUMBER;
my_output_chunk VARCHAR2(4000);
my_problem_flag NUMBER;
my_clob CLOB;
my_blob BLOB;
my_total_problems NUMBER;
new_sql VARCHAR2(4000);
BEGIN
DBMS_OUTPUT.ENABLE(1000000);
orig_sql:='select pk1,dbms_lob.getlength('||FIX_COL||') as cloblength,'||fix_col||' from '||table_name||' where pk1 in (select myindx from cnv_us7 where mytablename='''||TABLE_NAME||''' and mycolumnname='''||FIX_COL||''') order by pk1';
open orig_table_cur for orig_sql;
LOOP
FETCH orig_table_cur INTO my_indx_var,my_lob_size,my_clob;
EXIT WHEN orig_table_cur%NOTFOUND;
new_sql:='INSERT INTO '||table_name||'_lob(pk1,'||fix_col||') values ('||my_indx_var||',empty_blob() )';
dbms_output.put_line(new_sql);
execute immediate new_sql;
-- Here's the bit that I had trouble making dynamic. Feel free to let me know what I am doing wrong.
-- new_sql:='SELECT '||fix_col||' INTO my_blob from '||table_name||'_lob where pk1='||my_indx_var||' FOR UPDATE';
-- dbms_output.put_line(new_sql);
select data into my_blob from my_contents_lob where pk1=my_indx_var FOR UPDATE;
clob2blob(my_clob,my_blob);
END LOOP;
CLOSE orig_table_cur;
DBMS_OUTPUT.PUT_LINE('Completed program');
END;
exec blob_conversion_my_contents('MY_CONTENTS','DATA');
Verify that things work properly:
select dump( utl_raw.cast_to_varchar2(data)) from my_contents_lob where pk1=xxxx;
This should let you see characters > 150. Thus, the method works.
We can now take this data, export it from RESTORECLONE
exp file=a.dmp buffer=4000000 userid=system/XXXXXX tables=my_user.my_contents rows=y
and import the data on prodclone
imp file=a.dmp fromuser=my_user touser=my_user userid=system/XXXXXX buffer=4000000;
For paranoia's sake, double check that it worked properly:
select dump( utl_raw.cast_to_varchar2(data)) from my_contents_lob;
On our 10g PRODCLONE, we'll use these stored procedures:
CREATE OR REPLACE FUNCTION CLOB2BLOB(L_CLOB CLOB) RETURN BLOB IS
L_BLOB BLOB;
L_SRC_OFFSET NUMBER;
L_DEST_OFFSET NUMBER;
L_BLOB_CSID NUMBER := DBMS_LOB.DEFAULT_CSID;
V_LANG_CONTEXT NUMBER := DBMS_LOB.DEFAULT_LANG_CTX;
L_WARNING NUMBER;
L_AMOUNT NUMBER;
BEGIN
DBMS_LOB.CREATETEMPORARY(L_BLOB, TRUE);
L_SRC_OFFSET := 1;
L_DEST_OFFSET := 1;
L_AMOUNT := DBMS_LOB.GETLENGTH(L_CLOB);
DBMS_LOB.CONVERTTOBLOB(L_BLOB,
L_CLOB,
L_AMOUNT,
L_SRC_OFFSET,
L_DEST_OFFSET,
1,
V_LANG_CONTEXT,
L_WARNING);
RETURN L_BLOB;
END;
CREATE OR REPLACE FUNCTION BLOB2CLOB(L_BLOB BLOB) RETURN CLOB IS
L_CLOB CLOB;
L_SRC_OFFSET NUMBER;
L_DEST_OFFSET NUMBER;
L_BLOB_CSID NUMBER := DBMS_LOB.DEFAULT_CSID;
V_LANG_CONTEXT NUMBER := DBMS_LOB.DEFAULT_LANG_CTX;
L_WARNING NUMBER;
L_AMOUNT NUMBER;
BEGIN
DBMS_LOB.CREATETEMPORARY(L_CLOB, TRUE);
L_SRC_OFFSET := 1;
L_DEST_OFFSET := 1;
L_AMOUNT := DBMS_LOB.GETLENGTH(L_BLOB);
DBMS_LOB.CONVERTTOCLOB(L_CLOB,
L_BLOB,
L_AMOUNT,
L_SRC_OFFSET,
L_DEST_OFFSET,
1,
V_LANG_CONTEXT,
L_WARNING);
RETURN L_CLOB;
END;
And now, for the pièce de résistance, we need a BLOB-to-CLOB conversion that assumes the BLOB data is initially stored in WE8ISO8859P1.
To find correct CSID for WE8ISO8859P1, we can use this query:
select nls_charset_id('WE8ISO8859P1') from dual;
Gives "31"
create or replace FUNCTION BLOB2CLOBASC(L_BLOB BLOB) RETURN CLOB IS
L_CLOB CLOB;
L_SRC_OFFSET NUMBER;
L_DEST_OFFSET NUMBER;
L_BLOB_CSID NUMBER := 31; -- treat blob as WE8ISO8859P1
V_LANG_CONTEXT NUMBER := 31; -- treat resulting clob as WE8ISO8859P1
L_WARNING NUMBER;
L_AMOUNT NUMBER;
BEGIN
DBMS_LOB.CREATETEMPORARY(L_CLOB, TRUE);
L_SRC_OFFSET := 1;
L_DEST_OFFSET := 1;
L_AMOUNT := DBMS_LOB.GETLENGTH(L_BLOB);
DBMS_LOB.CONVERTTOCLOB(L_CLOB,
L_BLOB,
L_AMOUNT,
L_SRC_OFFSET,
L_DEST_OFFSET,
L_BLOB_CSID,
V_LANG_CONTEXT,
L_WARNING);
RETURN L_CLOB;
END;
select dump(dbms_lob.substr(blob2clobasc(data),4000,1)) from my_contents_lob;
Now, we can compare these:
select dbms_lob.compare(blob2clob(old.data),new.data) from my_contents new,my_contents_lob old where new.pk1=old.pk1;
DBMS_LOB.COMPARE(BLOB2CLOB(OLD.DATA),NEW.DATA)
0
0
0
Vs
select dbms_lob.compare(blob2clobasc(old.data),new.data) from my_contents new,my_contents_lob old where new.pk1=old.pk1;
DBMS_LOB.COMPARE(BLOB2CLOBASC(OLD.DATA),NEW.DATA)
-1
-1
-1
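The differing compare results make sense if you think of BLOB2CLOB vs. BLOB2CLOBASC as two decodes of the same bytes. A hedged Python analogy follows; the codec names are Python's, not Oracle's, and the sample value is invented.

```python
# The stored bytes were written by clients that meant ISO-8859-1 (CSID 31).
# Decoding them as ISO-8859-1 recovers the intended text, which is what
# BLOB2CLOBASC does with L_BLOB_CSID = 31.
raw = b"Expos\xe9"                     # 0xE9: e-acute in ISO-8859-1

assert raw.decode("iso-8859-1") == "Expos\u00e9"   # BLOB2CLOBASC behaviour

# The same bytes are not valid UTF-8, which is why the default-CSID
# conversion produces different text and dbms_lob.compare returns -1.
try:
    raw.decode("utf-8")
    failed = False
except UnicodeDecodeError:
    failed = True
assert failed
```

So compare = 0 against blob2clob means "the row was never corrupted", and compare = -1 against blob2clobasc is expected for exactly the rows we are about to repair.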
update my_contents a set data=(select blob2clobasc(data) from my_contents_lob b where a.pk1= b.pk1)
where pk1 in (select al.pk1 from my_contents_lob al where dbms_lob.compare(blob2clob(al.data),a.data) =0 );
SQL> select dump(dbms_lob.substr(data,4000,1)) from my_contents where pk1 in (select pk1 from my_contents_lob);
Confirms that we're now working properly.
To run across all the _LOB tables we've created:
[oracle@RESTORECLONE ~]$ exp file=all_fixed_lobs.dmp buffer=4000000 userid=my_user/mypass tables=MY_CONTENTS_LOB,MY_FORUM_LOB...
[oracle@RESTORECLONE ~]$ scp all_fixed_lobs.dmp jboulier@PRODCLONE:/tmp
And then on PRODCLONE we can import:
imp file=all_fixed_lobs.dmp buffer=4000000 userid=system/XXXXXXX fromuser=my_user touser=my_user
Instead of running the above update statement for all the affected tables, we can use a simple stored procedure:
create or replace procedure fix_us7_CLOBS
(TABLE_NAME varchar2,
FIX_COL varchar2 )
authid current_user
as
orig_sql varchar2(1000);
bak_sql varchar2(1000);
begin
dbms_output.put_line('Creating '||TABLE_NAME||'_PRECONV to preserve the original data in the table');
bak_sql:='create table '||TABLE_NAME||'_preconv as select pk1,'||FIX_COL||' from '||TABLE_NAME||' where pk1 in (select pk1 from '||TABLE_NAME||'_LOB) ';
execute immediate bak_sql;
orig_sql:='update '||TABLE_NAME||' tabnew set '||FIX_COL||'= (select blob2clobasc ('||FIX_COL||') from '||TABLE_NAME||'_LOB taborig where tabnew.pk1=taborig.pk1)
where pk1 in (
select a.pk1 from '||TABLE_NAME||'_LOB a,'||TABLE_NAME||' b
where a.pk1=b.pk1
and dbms_lob.compare(blob2clob(a.'||FIX_COL||'),b.'||FIX_COL||') = 0 )';
-- dbms_output.put_line(orig_sql);
execute immediate orig_sql;
end;
Now we can run the procedure and it fixes everything for our previously-broken tables, keeping the changed rows -- just in case -- in a table called table_name_PRECONV.
set serveroutput on time on timing on;
exec fix_us7_clobs('MY_CONTENTS','DATA');
commit;
After confirming with the client that the changes work -- and haven't noticeably broken anything else -- the same routines can be carefully run against the actual production database.

We converted the database using scripts I developed. I'm not quite sure how we converted is relevant, other than to say that we did not use the Oracle conversion utility (not csscan, but the GUI Java tool).
A summary:
1) We replaced the lossy characters by parsing a csscan output file
2) After re-scanning with csscan and coming up clean, our DBA converted the database to AL32UTF8 (changed the parameter file, changing the character set, switched the semantics to char, etc).
3) Final step was changing existing tables to use char semantics by changing the table schema for VARCHAR2 columns
Any specific steps I cannot easily answer, I worked with a DBA at our company to do this work. I handled the character replacement / DDL changes and the DBA ran csscan & performed the database config changes.
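Step 1 above (replacing the lossy characters) can be sketched under assumptions. The real csscan report format differs and is not reproduced here; this only illustrates the per-value replacement, not the parsing.

```python
# Replace any byte US7ASCII cannot represent (value above 0x7F) with '?'
# so a re-scan with csscan comes back clean. Which rows and columns to
# touch would come from parsing the csscan output, omitted here.
def replace_lossy(value: bytes, replacement: int = ord("?")) -> bytes:
    return bytes(b if b < 0x80 else replacement for b in value)

assert replace_lossy(b"abc\xe9def") == b"abc?def"
assert replace_lossy(b"pure ascii") == b"pure ascii"
```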
Our actual error message:
ORA-31011: XML parsing failed
ORA-19202: Error occurred in XML processing
LPX-00210: expected '<' instead of '�Error at line 1
31011. 00000 - "XML parsing failed"
*Cause: XML parser returned an error while trying to parse the document.
*Action: Check if the document to be parsed is valid.
Error at Line: 24 Column: 15
This seems to match the document ID referenced below. I will ask our DBA to pull it up and review it.
Please advise if more information is needed from my end.