Character Set questions on setup

I am trying to determine the best setup recommendations for creating non-English Oracle 10g databases. I have not had much experience building databases for non-English locales, so this is getting a little overwhelming as I research Oracle's Database Globalization Support Guide. It has a wealth of information, and I am trying to determine what applies to us at this point in time.
Generally when someone buys our product they create a new Oracle instance for our app. I need to be able to recommend proper database settings/parameters for potential global customers who purchase our software to run on Oracle.
Currently my biggest question is what to recommend for the database character set at DB creation. The character set we currently recommend (for standard U.S. installs on Windows) is the default WE8MSWIN1252. Our application is non-Unicode. An outside consultant has recommended that we "must" use UTF-8 for both the database and national character sets, as opposed to WE8MSWIN1252 or WE8ISO8859P1. I should mention that our focus at this point in time is a solution for French, German, and Spanish. We are also more concerned with a single-language setup than a multilanguage one, although that is a definite future consideration.
What impact can using UTF-8, as opposed to WE8MSWIN1252 or WE8ISO8859P1, have on a non-Unicode application? I hope I am explaining the situation well enough; I am fairly new and still getting to know our application. I am kind of getting thrown into the i18n fire...
Any input is greatly appreciated. Thanks.

Your questions are certainly valid, but you have not given any details about your application: what it does, what technologies and access drivers are employed, and what client operating systems are supported. This determines how much effort is required to make the application Unicode-enabled and what risks come with each of the possible approaches.
As long as your application works with single-byte character sets only, is not expected to contain multibyte data, and supports Windows only, the Oracle character set corresponding to the relevant Windows ANSI code page is the correct choice. For English, French, German, Spanish, and other Western European languages, WE8MSWIN1252 is the right one.
Processing WE8MSWIN1252 data is easier and somewhat faster than processing AL32UTF8 (i.e. UTF-8) data. One character corresponds to one byte, which simplifies some aspects of text processing.
On the other hand, the world becomes smaller and smaller in the Internet era. Companies that never did any business abroad start talking to customers around the world because somebody found their website. Western European companies take advantage of the European Union enlargement and start doing business in new countries. Therefore, it is dangerous to assume that a company currently interested in a monolingual, single-byte solution will not want to migrate to a multilingual, multibyte solution in a few years.
If you follow a few rules in database design and programming, you can run your single-byte application against an AL32UTF8 database, even if you do not get a multilingual system that way. Such a configuration has the huge advantage of avoiding the complex and resource-consuming task of migrating the database character set to Unicode later, when your customer asks for multilingual support. Upgrading the binaries of your application to a Unicode-enabled version is usually fast; migrating the database character set is not.
The main rules you should follow are:
1) Use character length semantics to define column and PL/SQL variable lengths, i.e. say VARCHAR2(10 CHAR) instead of VARCHAR2(10 [BYTE]). If you do not want to modify all creation scripts to include the CHAR keyword, issue ALTER SESSION SET NLS_LENGTH_SEMANTICS=CHAR at the beginning of each script. I recommend modifying the scripts. (See the sketch after this list.)
2) Do not use VARCHAR2 columns longer than 1000 characters, CHAR columns longer than 500 characters, and PL/SQL VARCHAR2/CHAR variables longer than 8190 characters. This guarantees that in the future no AL32UTF8 string will exceed the hard limit of 4000/2000/32760 bytes. Use CLOB for longer text.
3) Use SUBSTR/LENGTH/INSTR in place of SUBSTRB/LENGTHB/INSTRB. Use SUBSTRB/LENGTHB/INSTRB only when dealing with legacy stuff or Data Dictionary that still use byte length semantics.
4) Define the client setting - mainly NLS_LANG - to correctly correspond to the character set processed by your application.
5) Modify interfaces to other databases, if any, to cope with the character length semantics. You do not have to do much if the other databases follow the same rules.
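A minimal sketch of rule 1, assuming hypothetical table and column names (not from your application):
create table customers (
  id        number primary key,
  last_name varchar2(40 char)  -- 40 characters, up to 160 bytes in AL32UTF8
);
-- Or set the default semantics for a whole creation script:
alter session set nls_length_semantics = CHAR;
create table suppliers (
  id   number primary key,
  name varchar2(40)            -- now interpreted as 40 characters
);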
The cost of running the database in Unicode is not high for most languages, though languages that do not use Latin script, such as Russian, Greek, or Japanese, need significantly more storage for their textual data (but only the textual data in those languages - usually only a fraction of all data in the database). Processing is a few percent slower than with single-byte character sets (unless a lot of text processing is performed in the database, in which case the percentage may be higher - a benchmark is recommended). These costs can usually be compensated by adding some more computing power (GHz and disks). Unless your application needs a VLDB (very large database) and almost saturates the system, you should not notice a big difference.
-- Sergiusz

Similar Messages

  • Character set conversion UTF-8 -- ISO-8859-1 generates question mark (?)

    I'm trying to convert an XML file in UTF-8 format to another file with character set ISO-8859-1.
    My problem is that the ISO-8859-1 file gets a question mark (?) as a prefix.
    ?<?xml version="1.0" encoding="UTF-8"?>
    <ns0:messagetype xmlns:ns0="urn:olof">
    <underkat>testv���rde</underkat>
    </ns0:messagetype>
    Is there a way to do the conversion without getting the question mark?
    My code looks as follows:
     import java.io.*;

     public class ConvertEncoding {
          public static void main(String[] args) {
               String from = "UTF-8", to = "ISO-8859-1";
               String infile = "C:\\temp\\infile.xml", outfile = "C:\\temp\\outfile.xml";
               try {
                    convert(infile, outfile, from, to);
               } catch (Exception e) {
                    System.out.println(e.getMessage());
                    System.exit(1);
               }
          }

          private static void convert(String infile, String outfile,
                                      String from, String to)
                    throws IOException, UnsupportedEncodingException {
               //Set up byte streams
               InputStream in = null;
               OutputStream out = null;
               if (infile != null) {
                    in = new FileInputStream(infile);
               }
               if (outfile != null) {
                    out = new FileOutputStream(outfile);
               }
               //Set up character streams
               Reader r = new BufferedReader(new InputStreamReader(in, from));
               Writer w = new BufferedWriter(new OutputStreamWriter(out, to));
               /*Copy characters from input to output.
                * The InputStreamReader converts from the input encoding to Unicode;
                * the OutputStreamWriter converts from Unicode to the output encoding.
                * Characters that cannot be represented in
                * the output encoding are output as '?'. */
               char[] buffer = new char[4096];
               int len;
               while ((len = r.read(buffer)) != -1) { //Read a block of input
                    w.write(buffer, 0, len);
               }
               r.close();
               w.flush();
               w.close();
          }
     }

    Yes the next character is the '<'
    The file that I read from is generated by an integration platform. I send a plain file to it (supposedly in UTF-8 encoding) and it returns another file (in between, I call my Java class that converts the character set from UTF-8 to ISO-8859-1). The file that I get back contains '���' if the conversion doesn't work and '?' if the conversion worked.
    My solution so far is to skip the leading "junk characters" when reading from the input stream. Something like:
    private static final char UTF_BOM = '\uFEFF'; //the UTF BOM; shows up as '?' after a lossy conversion

    String from = "UTF-8", to = "ISO-8859-1";
    if (from != null && from.toLowerCase().startsWith("utf-")) { //Are we reading a UTF-encoded file?
         /*Read the first character of the UTF-encoded file.
          * It will be the BOM if the file starts with one;
          * if so, we skip that character in the read. */
         try {
              r.mark(1); //Only allow one char to be read back, so the reset function works
              int i = r.read();
              char c = (char) i;
              if (String.valueOf(UTF_BOM).equalsIgnoreCase(String.valueOf(c))) {
                   r.reset(); //reset to start position
                   r.skip(1); //Skip the BOM when reading from the stream
              } else {
                   r.reset();
              }
         } catch (IOException e) {
              e.getMessage();
              //return null;
         }
    }

  • Character sets and conversions

    Hi all,
    we're facing a quite complex problem, for which I'm not even able to specify where it is going wrong or what needs configuring, partly for lack of experience and partly because it combines different technical areas of which I'm only responsible for some.
    So I'll sketch the situation briefly, and hopefully you might give me some guidelines or hints as to where to look.
    The setup: web application (so clients access by use of a browser) on WebLogic on a Linux platform, Tuxedo on iSeries, and as far as I understand some DB internal to the iSeries where data is stored.
    Data is entered in the DB by use of some data-entry application that comes with the iSeries.
    The problem: when consulting data through the web application, some characters don't show up correctly, e.g. @ in email addresses, e's with accents, ...
    For the chain being "browser <-> WL <-> Tuxedo <-> DB", the problem might lie at different points. But from the traces we activated, we could see that the response going out of Tuxedo to WL is already not correct...
    Any hint as to what to look for, or what configuration is important, would be welcome ...
    Some sub-questions:
    - I understand Tuxedo is always "installed" in English, with no other option. This means that e.g. logs are in English.
    But can/should we define some character set?
    - Between Tuxedo <-> DB, can you use some conversion tables?
    Any help would be appreciated, we're quite lost ..

    Hi,
    Given that you are running Tuxedo on iSeries, I'm guessing you are running Tuxedo 6.5, as the port of the current Tuxedo release for iSeries hasn't been released yet. Tuxedo 6.5 does not directly support multi-byte character strings. The two common buffer formats for string data in Tuxedo are STRING, which doesn't support multi-byte characters, and CARRAY, which does support multi-byte characters, as a CARRAY is essentially a blob. Do you know what buffer type the Tuxedo application is using to send data to WebLogic Server?
    In Tuxedo 9.0 and later, direct support for multi-byte strings was added in the form of the MBSTRING buffer type. This buffer type supports multi-byte strings with a variety of character sets and encodings.
    Regards,
    Todd Little
    Oracle Tuxedo Chief Architect

  • Character Set issues.  Please advise

    I have a client who uses a 10gR2 DB that stores both English and French data. Several times they will send us a .dmp file which we load into ours.
    2 questions.
    What would be the best character sets to use in this setup?
    I am assuming we would use
    NLS_CHARACTERSET = WE8ISO8859P1
    NLS_NCHAR_CHARACTERSET = AL16UTF16
    Also if someone can confirm for me.
    NLS_CHARACTERSET = database character set ???
    NLS_NCHAR_CHARACTERSET = national character set???

    So is it better to say that I should use the AL32UTF8 instead of AL16UTF16?
    It's not an instead-of situation. AL32UTF8 is a valid setting for the database character set, which controls CHAR and VARCHAR2 columns. AL16UTF16 is a valid setting for the national character set, which controls NCHAR and NVARCHAR2 columns.
    Could you tell me the difference?
    The difference between the two encodings comes down to how many bytes are required to store a particular code point (character). AL32UTF8 is a variable-length character set, so 1 character will require between 1 and 3 bytes of storage (4 for the supplemental characters, but those are rather rare). AL16UTF16 is a fixed-width character set, so 1 character will require 2 bytes of storage (4 for the rare supplemental characters again).
    Also could you tell me the difference between WE8ISO8859P15 and WE8ISO8859P1?
    There's a Wikipedia article that discusses the differences and has links to the two code tables.
    Werner's point is an excellent one as well. I was assuming that we were talking about how to set up both sides of this proposed system. If the source system already exists, there are additional considerations, like ensuring that your target system supports a superset of the characters supported by the source system. Regardless, when doing imports & exports, as Werner points out, you need to ensure that NLS_LANG is set appropriately.
    Justin
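    To confirm that mapping, you can query the data dictionary (this is the standard view, not anything specific to this setup):
    select parameter, value
      from nls_database_parameters
     where parameter in ('NLS_CHARACTERSET', 'NLS_NCHAR_CHARACTERSET');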

  • Client-side greek character set problem

    hi everyone,
    i am a .net developer and absolutely new to oracle, it's my first project where i have to make oracle and .net co-operate and it's proving to be a nightmare!
    i'll try and provide as much info as possible to you
    we have a unix sap server with oracle 10.something on it (i can check exactly what version if that's of importance) and i am writing a .net app which accesses the oracle db and retrieves some data we need
    the locale settings of the db are american_america.we8dec and i cannot tamper with that machine, it is out of the question
    i have made the connection to the db using just connection string info, not tns and stuff, and i have set my dev machine's nls_lang to we8dec everywhere i could find it with a 'find' in the registry
    however my app displays the data (which is in greek) showing random symbols that obviously make no sense
    furthermore when i connect to the db via sql developer or navicat i still get the same symbols
    i understand it is something to do with my client system's settings but i can't work it out
    i would really appreciate your help, i am sorry if it is something very simple but i just can't find it
    thanks

    user9084006 wrote:
    i am a .net developer and absolutely new to oracle, it's my first project where i have to make oracle and .net co-operate and it's proving to be a nightmare!
    Welcome. It's a brand new universe, but if you take it step by step I'm sure you'll do fine. Get someone near with Oracle dev and dba background to guide you, so you don't get stuck or go down broken roads.
    we have a unix sap server with oracle 10.something on it (i can check exactly what version if that's of importance)
    It is of importance most of the time. Down to at least 4 positions (for example, "10g" is just a marketing label and mostly useless in a technical context - 10.2.0.5 is more like it).
    Also mention platform (unix OS) details.
    and i am writing a .net app which accesses the oracle db and retrieves some data we need
    How is the to-be-retrieved data loaded or entered to begin with?
    the locale settings of the db are american_america.we8dec and i cannot tamper with that machine, it is out of the question
    Please post output from:
    SQL> select * from nls_database_parameters where parameter like '%CHARACTERSET';
    and i have set my dev machine's nls_lang to we8dec everywhere i could find it with a 'find' in the registry
    Does your dev (client) machine use a "we8dec" character set? Likely not. As you seem to be on Windows, you should have set the nls_lang client character set part to match win-1252 or whatever client character set (code page) is in use.
    however my app displays the data (which is in greek) showing random symbols that obviously make no sense
    No surprise, since the DEC character set does not have a single greek character in its repertoire. If I'm not mistaken, DEC MCS (or similar) is the character set to which WE8DEC corresponds. You could check against the mapping table in Locale Builder, but the available characters should be per the following link or close enough.
    http://czyborra.com/charsets/iso8859.html#ISO-8859-1 (second chart is DEC-MCS)
    Please provide a sample of those "random symbols".
    furthermore when i connect to the db via sql developer or navicat i still get the same symbols
    If that's Oracle SQL Developer, it should give a true picture of what's actually stored - which, unfortunately, seems to be invalid character data.
    i understand it is something to do with my client system's settings but i can't work it out
    See above. But even if you correctly set up the client environment, the db won't be able to store the data if the database character set is in fact WE8DEC.
    i would really appreciate your help i am sorry if it is something very simple but i just can't find it
    I'm not sure, but it would seem you have been given unfeasible conditions to work from. So maybe it's not up to you, until a character set migration has taken place.
    In general, the database character set should be a suitable superset (repertoire) that covers all current (and hopefully future) language alphabet requirements.
    You might want to search the forum for 'we8dec' to find previous related discussions.

  • JSF and Character Sets (UTF-8)

    Hi all,
    This question might have been asked before, but I'm going to ask it anyway because I'm completely puzzled by how this works in JSF.
    Let's begin with the basics. I have an application running on an OC4J servlet container, and am using JSF 1.1 (MyFaces). The problem I am having with this setup is that the character encodings I want the server/client to use are not coming across correctly. I'm trying to force the application to be UTF-8, but after the response is rendered to my client, I've magically been reverted to ISO-8859-1, which is the main character set for the Netherlands. However, I'm building the application to support proper internationalization, which means I NEED to use UTF-8.
    I've executed the following steps to reach this goal:
    - All JSP files contain page directives, noting the character set:
    <%@ page pageEncoding="UTF-8" contentType="text/html; charset=UTF-8" %>
    I've checked the generated source that comes from the JSPs; it looks as expected.
    - I've created a servlet filter to set the character set directly on the request and response objects:
        public void doFilter(ServletRequest req, ServletResponse res, FilterChain chain) throws IOException, ServletException {
            // Set the character encoding for the request and response streams.
            req.setCharacterEncoding("UTF-8");
            res.setContentType("text/html; charset=UTF-8");
            // Complete (continue) the processing chain.
            chain.doFilter(req, res);
        }
    I've debugged the code, and this works fine, except for where JSF comes in. If I use the above setup without going through JSF, my pages come back UTF-8. When I go through JSF, my pages come back as ISO-8859-1. I'm baffled as to what is causing this. On several forums, writing a filter was proposed as the solution; however, this doesn't do it for me.
    It looks like somewhere internally in JSF the character set is changed to ISO. I've been through the sources, and I've found several pieces of code that support that theory. I've seen portions of code where the character set for the response is set to that of the request. Which in my case coming from a dutch system, will be ISO.
    How can this be prevented? Can anyone give some good insight on the inner workings of JSF with regards to character sets in specific? Could this be a servlet container problem?
    Many thanks in advance for your assistance,
    Jarno

    Jarno,
    I've been investigating JSF and character encodings a bit this weekend. And I have to say it's more than a little confusing. But I may have a little insight as to what's going on here.
    I have a post here:
    http://forum.java.sun.com/thread.jspa?threadID=725929&tstart=45
    where I have a number of open questions regarding JSF 1.2's intended handling of character encodings. Please feel free to comment, as you're clearly struggling with some of the same questions I have.
    In MyFaces JSF 1.1 and JSF-RI 1.2 the handling appears to be dependent on the raw Content-Type header. Looking at the MyFaces implementation here -
    http://svn.apache.org/repos/asf/myfaces/legacy/tags/JSF_1_1_started/src/myfaces/org/apache/myfaces/application/jsp/JspViewHandlerImpl.java
    (which I'm not sure is the correct code, but it's the best I've found) it looks like the raw header Content-Type header is being parsed in handleCharacterEncoding. The resulting value (if not null) is used to set the request character encoding.
    The JSF-RI 1.2 code is similar - calculateCharacterEncoding(FacesContext) in ViewHandler appears to parse the raw header, as opposed to using the CharacterEncoding getter on ServletRequest. This is understandable, as this code should be able to handle PortletRequests as well as ServletRequests. And PortletRequests don't have set/getCharacterEncoding methods.
    My first thought is that calling setCharacterEncoding on the request in the filter may not update the raw Content-Type header. (I haven't checked if this is the case.) If it doesn't, then the raw header may be getting reparsed and the request encoding reset in the ViewHandler. I'd suggest that you check the state of the Content-Type header before and after your call to req.setCharacterEncoding("UTF-8"). If the header charset value is unset or unchanged after this call, you may want to update it manually in your Filter.
    If that doesn't work, I'd suggest writing a simple ViewHandler which prints out the request's character encoding and the value of the Content-Type header to your logs before and after the calls to the underlying ViewHandler for each major method (i.e. renderView, etc.)
    Not sure if that's helpful, but it's my best advice based on the understanding I've reached to date. And I definitely agree - documentation on this point appears to be lacking. Good luck
    Regards,
    Peter

  • Oracle to Mysql character set conversion problem!!! PLZ HELP

    Hi Experts,
    I have created a database link from Oracle 10g to Mysql 5.
    I have installed Oracle Gateway 11g for this purpose.
    When i retrieve the data from sqlplus the text is displayed as question marks.
    Oracle 10g Database character set is WE8MSWIN1252
    Mysql character set --->latin1
    Character set of the ODBC connector for mysql is latin7
    Character set in the parameter file of the HS folder is WE8MSWIN1252
    When i retrieve data from sql developer the text is fine (as i think it directly takes the character set of the target) but
    when i login from sqlplus i get question marks!
    Appreciate your help,
    regards
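    For reference, the gateway's side of the character set conversion is controlled by the HS_LANGUAGE parameter in the gateway init file; a sketch, assuming a gateway SID of "mysql" and a hypothetical ODBC DSN name:
    # $ORACLE_HOME/hs/admin/initmysql.ora
    HS_FDS_CONNECT_INFO = mysql_odbc_dsn
    HS_LANGUAGE = AMERICAN_AMERICA.WE8MSWIN1252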

    thank you for replying damorgan,
    my previous two threads in the "Heterogeneous Connectivity" forum were for different issues: one was to enquire how i could connect from oracle to mysql (which i have marked as answered), the other is for an error i get when i try accessing data (which i am still facing on my office machine).
    I followed the steps from these two threads and was able to successfully connect to mysql on my personal PC at home, but faced some problem with text not being displayed, so i created this thread.
    I had created another thread similar to this in globalisation support as i was facing issues with the character sets in a heterogeneous setup, so i wasn't clear as to which forum would be suitable for this issue.
    My apologies to everyone if this has offended you.

  • Lanovo ix4 300d and european character set for samba shares

    Hi,
    I am installing a couple of IX4-300d at a client site, struggling to set and fix the samba character set for the Windows shares. Here in Italy everyone is used to putting western european characters into filenames (for example è ò ° €), but the NAS acts weirdly when it finds such characters in folder names and file names.
    As you know, samba gives you the possibility to declare dos charset = ISO8859-1 and unix charset = ISO8859-1 in the smb.conf file, but the option is not present in the IX4 setup pages. (See the sketch below.)
    Is there an easy way to circumvent the problem and fix this option? As it is now, the machine is fairly useless, at least if you need to migrate files and folders from a server or storage with an ISO8859 charset already applied.
    thanks a lot
    np
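    For reference, on a stock samba install the options mentioned above would live in the [global] section of smb.conf; a sketch of what the IX4 web UI does not expose:
    [global]
        dos charset = ISO8859-1
        unix charset = ISO8859-1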

    This issue has been raised on all iomega drives since day 1. Although 'documented', it is still a shame that units sold to 'r-o-w' (rest-of-world, outside the US) are basically stuck to Aa-Zz09, whereas client systems (windows, linux, ios) don't care and take what they are supposed to.
    Take it or leave it (I mean, take another vendor). I do not believe someone has ever taken that seriously or ever will......

  • Character Set Inquiry

    I have a database which is installed with ASCII as the character set, and I can store Chinese in it. I understand it is possible to install the DB with the BIG5 or GB character set for storing Chinese. What's the difference?
    Moreover, how does oracle perform sorting for BIG5 and GB?
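    For reference on the sorting question: by default Oracle sorts in the binary order of the database character set (so BIG5 and GB data sort by their respective code values), and linguistic sorts can be requested via NLS_SORT. A sketch, assuming a hypothetical table t with a name column:
    -- session-wide stroke-count ordering for Traditional Chinese:
    alter session set nls_sort = TCHINESE_STROKE_M;
    select name from t order by name;
    -- or per expression, e.g. pinyin ordering for Simplified Chinese:
    select name from t order by nlssort(name, 'NLS_SORT = SCHINESE_PINYIN_M');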

    Exactly 1TB? Slightly under/slightly over?
    Traditional Solaris disk labels are in VTOC format, but this format cannot describe disks larger than 1TB. So EFI labels must be used on disks larger than 1TB. Setup is slightly different.
    Are these physical disks or LUNs from a SAN array? If they are array LUNS, it is often the case that they don't have a Sun label of any type. So...
    #1 Apply a Solaris label
    If the LUNs don't have a label, then when selected in 'format', it gives a warning that no label is present and offers to apply a label immediately. When run non-interactively, format assumes "yes" for any questions, so all you'd have to do is select every disk to have it apply labels to any unlabeled disk. Run 'format' once and find the highest number (maybe it's 50 for you). Create a text file that looks like this:
    disk 1
    disk 2
    disk 3
    disk 50
    Then feed that to format like this:
    # format -f /tmp/disklist
    or whatever you've named the file.
    #2 Apply the partition layout to all disks you want.
    You asked if you should do the same procedure, but I don't see that you've actually done anything above other than print out the existing layout. Take one of your 48 drives and partition it the way you want manually (set the slices to the sizes that you want). Then you can copy the layout of that disk to others. You only want to do this between disks/LUNs of the same size. As an example, if you've explicitly partitioned c1t0d0 and you want to apply this to c1t1d0, do this:
    # prtvtoc /dev/rdsk/c1t0d0s2 | fmthard -s - /dev/rdsk/c1t1d0s2
    Repeat for all of your other disks.
    Darren

  • HOW can I enter text using Japanese character sets?

    The "Text, Plates, Insets" section of the LOOKOUT(6.01) Help files states:
    "Click the » button to the right of the Text field to expand the field for multiple line entries. You can enter text using international character sets such as Chinese, Korean, and Japanese."
    Can someone please explain HOW to do this? Note, I have NO problem inputting Hiragana, Katakana, and Kanji into MS WORD; the keyboard emulates the Japanese layout and characters (Romaji is default), the IME works fine converting Romaji, and I can also select characters directly from the IME Pad. I have tried several different fonts with success and am currently using MS UI Gothic.ttf as default. Again, everything is normal and working in a predictable manner within Word.
    I cannot get these texts into Lookout. I can't cut/paste from HTML pages or from text editors, even though both display properly. Within Lookout with JP selected as language/keyboard, when trying to type directly into the text field, the IME CORRECTLY displays Hiragana until <enter> is pressed, at which point all text reverts to question marks (?? ???? ? ?????). If I use the IME Pad, it does pretty much the same. I managed to get the "Yen" symbol to display, though, if that's relevant. As I said, the font selected (in text/plate font options) is MS UI Gothic with Japanese as the selected script. Oddly enough, at this point the "sample" window is showing me the exact Hiragana character I want displayed in Lookout, but it won't. I've also tried staying in English and copying unicode characters from the Windows Character Map. Same results (Yen sign works, Hiragana WON'T).
    Help me!
    JW_Tech

    JW_Tech,
    Have you changed the regional setting to Japanese?
    Doug M
    Applications Engineer
    National Instruments

  • How to add a new character set encoding?

    Hello,
    can anybody please explain to me how to add a new character set encoding to Mac OS Tiger?
    I have two Mac laptops, a new one with Snow Leopard and an older one with Tiger, and on the old one i cannot use or enable anywhere the "Russian (DOS)" character set encoding, which i need to be able to use some old text files.
    On Snow Leopard, this encoding is present in the list of available encodings in TextWrangler, but not in Tiger.
    If i have understood correctly, this is not a problem of TextWrangler, and the same encodings are available systemwide.
    So, the question is: how to add new encodings to Tiger (or to Mac OS in general)?
    Thanks.

    I think possibly that's in the Get Info window of Finder?
    I don't think either that or the input menu have any effect on available encoding choices. Adding languages to system prefs/international/languages can do that, but once you have added Russian there, I don't know of any way to add an additional Russian encoding (there are quite a number of them).

  • Java Character set error while loding data using iSetup

    Hi,
    I am getting the following error while migrating setup data from one R12 (12.1.2) instance to another R12 (12.1.2) instance. Both databases have the same DB character set (AL32UTF8).
    We get this error while migrating any setup data.
    Actual error is
    Downloading the extract from central instance
    Successfully copied the Extract
    Time taken to download Extract and write as zip file = 0 seconds
    Validating Primary Extract...
    Source Java Charset: AL32UTF8
    Target Java Charset: UTF-8
    Target Java Charset does not match with Source Java Charset
    java.lang.Exception: Target Java Charset does not match with Source Java Charset
         at oracle.apps.az.r12.common.cpserver.PreValidator.validate(PreValidator.java:191)
         at oracle.apps.az.r12.loader.cpserver.APILoader.callAPIs(APILoader.java:119)
         at oracle.apps.az.r12.loader.cpserver.LoaderContextImpl.load(LoaderContextImpl.java:66)
         at oracle.apps.az.r12.loader.cpserver.LoaderCp.runProgram(LoaderCp.java:65)
         at oracle.apps.fnd.cp.request.Run.main(Run.java:157)
    Error while loading apis
    java.lang.NullPointerException
         at oracle.apps.az.r12.loader.cpserver.APILoader.callAPIs(APILoader.java:158)
         at oracle.apps.az.r12.loader.cpserver.LoaderContextImpl.load(LoaderContextImpl.java:66)
         at oracle.apps.az.r12.loader.cpserver.LoaderCp.runProgram(LoaderCp.java:65)
         at oracle.apps.fnd.cp.request.Run.main(Run.java:157)
    Please help in identifying and resolving the issue
    Sachin

    The Source and Target DB character set is the same.
    Output from the query
    ------------- Source --------------
    SQL> select value from nls_database_parameters where parameter='NLS_CHARACTERSET';
    VALUE
    AL32UTF8
    And target Instance
    -------------- Target----------------------
    SQL> select value from nls_database_parameters where parameter='NLS_CHARACTERSET';
    VALUE
    AL32UTF8
    The Error is about the Source and Target JAVA character set
    I will check the PreValidator xml from "How to use iSetup" and update the note
    Thanks
    Sachin

  • Character set migration error to UTF8 urgent

    Hi
    when we migrated from the AR8ISO8859P6 character set to UTF8 we are facing one error: when i try to compile one package through forms i get the error "program unit pu not found".
    When i run the source code of that procedure directly from the database using sqlplus, it runs without any problem. How can i migrate these forms from AR8ISO8859P6 to the UTF8 character set? We migrated from an oracle 8.1.7 database with AR8ISO8859P6 to an oracle 9.2 database with character set UTF8 (windows 2000); export and import were done without any error.
    I am using oracle 11i, which internally calls forms 6i and reports 6i
    with regards
    ramya
    1) this is a server side program; when connecting with forms i get the error. When i run this program using direct sql it works; when i compile it i get this error.
    3) yes i am using 11i (11.5.10), which internally calls forms 6i and reports. Why is this giving a problem with forms? Is there any setting to change in the forms nls_lang?
    with regards

    Hi Ramya
    what i understand from your question is that you are trying to compile a procedure from a forms interface at the client side?
    if yes, you should check the code in the form that is calling the compilation package.
    does it contain strings that might be affected by the character set change???
    Tony G.

  • Precautions i need to take when changing the Character set

    Hi,
    ORACLE VERSION: 10G Release 1 (10.1.0.3.0)
    I am going to change my database's character set from AL32UTF8 to the WE8MSWIN1252 character set and the AL16UTF16 NCHAR character set. So i have a few questions for you.
    1. What is the difference between Character Set and National Character set? Do i have to set both?
    2. What are precautions that i need to take while changing the characterset?
    3. What are JOB_QUEUE_PROCESSES and AQ_TM_PROCESSES parameters in Plain English? Why do i have to set these parameters to 0 as mentioned in this post below.
    Storing Chinese in Oracle Database

    1) The database character set controls (and specifies) the character set of CHAR & VARCHAR2 columns. The national character set controls the character set of NCHAR & NVARCHAR2 columns.
    2) Please make sure that you read the section of the Globalization manual that discusses character set migration. In particular, going from UTF-8 to Windows-1252 is going to require a bit more work since the latter is a subset (and not a strict binary subset) of the former.
    Justin
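    A sketch of a csscan pre-check for such a migration (the log name and the array/process values are arbitrary examples):
    csscan \"sys/<password> as sysdba\" FULL=Y TOCHAR=WE8MSWIN1252 LOG=precheck CAPTURE=Y ARRAY=1024000 PROCESS=2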

  • Fixing a US7ASCII - WE8ISO8859P1 Character Set Conversion Disaster

    In hopes that it might be helpful in the future, here's the procedure I followed to fix a disastrous unintentional US7ASCII-on-9i to WE8ISO8859P1-on-10g migration.
    BACKGROUND
    Oracle has multiple character sets, ranging from US7ASCII to AL32UTF8.
    US7ASCII, of course, is a cheerful 7 bit character set, holding the basic ASCII characters sufficient for the English language.
    However, it also has a handy feature: character fields under US7ASCII will accept characters with values greater than 127. If you have a web application, users can type (or paste) U's with umlauts, A's with macrons, and quite a few other funny-looking characters.
    These will be inserted into the database, and then -- if appropriately supported -- can be selected and displayed by your app.
    The problem is that while these characters can be present in a VARCHAR2 or CLOB column, they are not actually legal. If you try within Oracle to convert from US7ASCII to WE8ISO8859P1 or any other character set, Oracle recognizes that these characters with values greater than 127 are not valid, and will replace them with a default "unknown" character. In the case of a change from US7ASCII to WE8ISO8859P1, it will change them to 191, the upside down question mark.
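    You can see this replacement in action from SQL (a sketch; per the DUMP comparisons further below, byte 174 should come back as 191 when converted out of US7ASCII):
    select dump(convert(chr(174), 'WE8ISO8859P1', 'US7ASCII')) from dual;
    -- expected: Typ=1 Len=1: 191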
    Oracle has a native utility, introduced in 8i, called csscan, which assists in migrating to different character sets. It has been replaced in newer versions by the Database Migration Assistant for Unicode (DMU), which is the recommended tool for 11.2.0.3+.
    These tools, however, do no good unless they are run. For my particular client, the operations team took a database running 9i and upgraded it to 10g, and as part of that process the character set was changed from US7ASCII to WE8ISO8859P1. The database had a large number of special characters inserted into it, and all of these abruptly turned into upside-down question marks. The users of the application didn't realize there was a problem until several weeks later, by which time they had put a lot of new data into the system. Rollback was not possible.
    FIXING THE PROBLEM
    How fixable this problem is and the acceptable methods which can be used depend on the application running on top of the database. Fortunately, the client app was amenable.
    (As an aside note: this approach does not use csscan -- I had done something similar previously on a very old system and decided it would take less time in this situation to revamp my old procedures and not bring a new utility into the mix.)
    We will need two separate approaches -- one to fix the VARCHAR2 & CHAR fields, and a second for the CLOBs.
    In order to set things up, we created two environments. The first was a clone of production as it is now, and the second a clone from before the upgrade & character set change. We will call these environments PRODCLONE and RESTORECLONE.
    Next, we created a database link, OLD6. This allows PRODCLONE to directly access RESTORECLONE. Since they were cloned with the same SID, establishing the link needed the global_names parameter set to false.
    alter system set global_names=false scope=memory;
    CREATE PUBLIC DATABASE LINK OLD6
    CONNECT TO DBUSERNAME
    IDENTIFIED BY dbuserpass
    USING 'restoreclone:1521/MYSID';
    Testing the link...
    SQL> select count(1) from users@old6;
      COUNT(1)
           454
    Here is a row in a table which contains illegal characters. We are accessing RESTORECLONE from PRODCLONE via our link.
    PRODCLONE> select dump(title) from my_contents@old6 where pk1=117286;
    DUMP(TITLE)
    Typ=1 Len=49: 78,67,76,69,88,45,80,78,174,32,69,120,97,109,32,83,116,121,108,101
    ,32,73,110,116,101,114,97,99,116,105,118,101,32,82,101,118,105,101,119,32,81,117
    ,101,115,116,105,111,110,115
    By comparison, a dump of that row on PRODCLONE's my_contents gives:
    PRODCLONE> select dump(title) from my_contents where pk1=117286;
    DUMP(TITLE)
    Typ=1 Len=49: 78,67,76,69,88,45,80,78,191,32,69,120,97,109,32,83,116,121,108,101
    ,32,73,110,116,101,114,97,99,116,105,118,101,32,82,101,118,105,101,119,32,81,117
    ,101,115,116,105,111,110,115
    Note that the "174" on RESTORECLONE was changed to "191" on PRODCLONE.
    We can manually insert CHR(174) into our PRODCLONE and have it display successfully in the application.
    However, I tried a number of methods to copy the data from RESTORECLONE to PRODCLONE through the link, but entirely without success. Oracle would recognize the character as invalid and silently transform it.
    Eventually, I located a clever workaround at this link:
    https://kr.forums.oracle.com/forums/thread.jspa?threadID=231927
    It works like this:
    On RESTORECLONE you create a view, vv, with UTL_RAW:
    RESTORECLONE> create or replace view vv as select pk1,utl_raw.cast_to_raw(title) as title from my_contents;
    View created.
    This turns the title to raw on the RESTORECLONE.
    You can now convert from RAW to VARCHAR2 on the PRODCLONE database:
    PRODCLONE> select dump(utl_raw.cast_to_varchar2 (title)) from vv@old6 where pk1=117286;
    DUMP(UTL_RAW.CAST_TO_VARCHAR2(TITLE))
    Typ=1 Len=49: 78,67,76,69,88,45,80,78,174,32,69,120,97,109,32,83,116,121,108,101
    ,32,73,110,116,101,114,97,99,116,105,118,101,32,82,101,118,105,101,119,32,81,117
    ,101,115,116,105,111,110,115
    The above works because Oracle on PRODCLONE never knew that our TITLE string on RESTORECLONE was originally in US7ASCII, so it was unable to do its transparent character set conversion.
    PRODCLONE> update my_contents set title=( select utl_raw.cast_to_varchar2 (title) from vv@old6 where pk1=117286) where pk1=117286;
    PRODCLONE> select dump(title) from my_contents where pk1=117286;
    DUMP(UTL_RAW.CAST_TO_VARCHAR2(TITLE))
    Typ=1 Len=49: 78,67,76,69,88,45,80,78,174,32,69,120,97,109,32,83,116,121,108,101
    ,32,73,110,116,101,114,97,99,116,105,118,101,32,82,101,118,105,101,119,32,81,117
    ,101,115,116,105,111,110,115
    Excellent! The "174" character has survived the transfer and is now in place on PRODCLONE.
    Now that we have a method to move the data over, we have to identify which columns/tables have character data that was damaged by the conversion. We decided we could ignore anything with a length smaller than 10 -- such fields in our application would be unlikely to have data with invalid characters.
    RESTORECLONE> select count(1) from user_tab_columns where data_type in ('CHAR','VARCHAR2') and data_length > 10;
       COUNT(1)
        533
    By converting a field to WE8ISO8859P1, and then comparing it with the original, we can see if the characters change:
    RESTORECLONE> select count(1) from my_contents where title != convert (title,'WE8ISO8859P1','US7ASCII') ;
      COUNT(1)
         10568
    So 10568 rows have characters which were transformed into 191s as part of the original conversion.
    [ As an aside, we can't use CONVERT() on LOBs -- for them we will need another approach, outlined further below.
    RESTOREDB> select count(1) from my_contents where main_data != convert (convert(main_DATA,'WE8ISO8859P1','US7ASCII'),'US7ASCII','WE8ISO8859P1') ;
    select count(1) from my_contents where main_data != convert (convert(main_DATA,'WE8ISO8859P1','US7ASCII'),'US7ASCII','WE8ISO8859P1')
    ERROR at line 1:
    ORA-00932: inconsistent datatypes: expected - got CLOB ]
    Anyway, now that we can identify VARCHAR2 fields which need to be checked, we can put together a PL/SQL stored procedure to do it for us:
    create or replace procedure find_us7_strings
    (table_name varchar2,
    fix_col varchar2 )
    authid current_user
    as
    orig_sql varchar2(1000);
    begin
    orig_sql:='insert into cnv_us7(mytablename,myindx,mycolumnname)  select '''||table_name||''',pk1,'''||fix_col||''' from '||table_name||' where '||fix_col||' !=  CONVERT(CONVERT('||fix_col||',''WE8ISO8859P1''),''US7ASCII'') and '||fix_col||' is not null';
    -- Uncomment if debugging:
    -- dbms_output.put_line(orig_sql);
      execute immediate orig_sql;
    end;
    And create a table to store the information as to which tables, columns, and rows have the bad characters:
    drop table cnv_us7;
    create table cnv_us7 (mytablename varchar2(50), myindx number,      mycolumnname varchar2(50) ) tablespace myuser_data;
    create index list_tablename_idx on cnv_us7(mytablename) tablespace myuser_indx;
    With a SQL-generating SQL script, we can iterate through all the tables/columns we want to check:
    --example of using the data: select title from my_contents where pk1 in (select myindx from cnv_us7)
    set head off pagesize 1000 linesize 120
    spool runme.sql
    select 'exec find_us7_strings ('''||table_name||''','''||column_name||'''); ' from user_tab_columns
          where
              data_type in ('CHAR','VARCHAR2')
              and table_name in (select table_name from user_tab_columns where column_name='PK1' and  table_name not  in ('HUGETABLEIWANTTOEXCLUDE','ANOTHERTABLE'))
              and char_length > 10
              order by table_name,column_name;
    spool off;
    set echo on time on timing on feedb on serveroutput on;
    spool output_of_runme
    @./runme.sql
    spool off;
    Which eventually gives us the following inserted into CNV_US7:
    20:48:21 SQL> select count(1),mycolumnname,mytablename from cnv_us7 group by mytablename,mycolumnname;
             4 DESCRIPTION                                        MY_FORUMS
         21136 TITLE                                              MY_CONTENTS
    Out of 533 VARCHAR2s and CHARs, we only had five or six columns that needed fixing.
    We create our views on RESTOREDB:
    create or replace view my_forums_vv as select pk1,utl_raw.cast_to_raw(description) as description from forum_main;
    create or replace view my_contents_vv as select pk1,utl_raw.cast_to_raw(title) as title from my_contents;
    And then we can fix it directly via sql:
    update my_contents taborig1 set TITLE= (select utl_raw.cast_to_varchar2 (TITLE) from my_contents_vv@old6 where pk1=taborig1.pk1)
    where pk1 in (
    select tabnew.pk1 from my_contents@old6 taborig,my_contents tabnew,cnv_us7@old6
          where taborig.pk1=tabnew.pk1
              and myindx=tabnew.pk1
              and mycolumnname='TITLE'
              and mytablename='MY_CONTENTS'
              and convert(taborig.TITLE,'US7ASCII','WE8ISO8859P1') = tabnew.TITLE );
    Note this part:
          "and convert(taborig.TITLE,'US7ASCII','WE8ISO8859P1') = tabnew.TITLE "
    This checks to verify that the TITLE field on PRODCLONE and RESTORECLONE are the same (barring character set issues). This is there because if the users have changed TITLE -- or any other field -- on their own between the time of the upgrade and now, we do not want to overwrite their changes. We make the assumption that as part of the process, they may have changed the bad character on their own.
    We can also create a stored procedure which will execute the SQL for us:
    create or replace procedure fix_us7_strings
    (TABLE_NAME varchar2,
    FIX_COL varchar2 )
    authid current_user
    as
    orig_sql varchar2(1000);
    TYPE cv_type IS REF CURSOR;
    orig_cur cv_type;
    begin
    orig_sql:='update '||TABLE_NAME||' taborig1 set '||FIX_COL||'= (select utl_raw.cast_to_varchar2 ('||FIX_COL||') from '||TABLE_NAME||'_vv@old6 where pk1=taborig1.pk1)
    where pk1 in (
    select tabnew.pk1 from '||TABLE_NAME||'@old6 taborig,'||TABLE_NAME||' tabnew,cnv_us7@old6
          where taborig.pk1=tabnew.pk1
              and myindx=tabnew.pk1
              and mycolumnname='''||FIX_COL||'''
              and mytablename='''||TABLE_NAME||'''
              and convert(taborig.'||FIX_COL||',''US7ASCII'',''WE8ISO8859P1'') = tabnew.'||FIX_COL||')';
    dbms_output.put_line(orig_sql);
    execute immediate orig_sql;
    end;
    exec fix_us7_strings('MY_FORUMS','DESCRIPTION');
    exec fix_us7_strings('MY_CONTENTS','TITLE');
    commit;
    To validate this before and after, we can run something like:
    select dump(description) from my_forums where pk1 in (select myindx from cnv_us7@old6 where mytablename='MY_FORUMS');
    The above process fixes all the VARCHAR2s and CHARs. Now what about the CLOB columns?
    Note that we're going to have some extra difficulty here, not just because we are dealing with CLOBs, but because we are working with CLOBs in 9i, whose functions have less CLOB-related functionality.
    This procedure finds invalid US7ASCII strings inside a CLOB in 9i:
    create or replace procedure find_us7_clob
    (table_name varchar2,
    fix_col varchar2)
    authid current_user
    as
      orig_sql varchar2(1000);
      type cv_type is REF CURSOR;
      orig_table_cur cv_type;
      my_chars_read NUMBER;
      my_offset NUMBER;
      my_problem NUMBER;
      my_lob_size NUMBER;
      my_indx_var NUMBER;
      my_total_chars_read NUMBER;
      my_output_chunk VARCHAR2(4000);
      my_problem_flag NUMBER;
      my_clob CLOB;
      my_total_problems NUMBER;
      ins_sql VARCHAR2(4000);
    BEGIN
       DBMS_OUTPUT.ENABLE(1000000);
       orig_sql:='select pk1,dbms_lob.getlength('||FIX_COL||') as cloblength,'||fix_col||' from '||table_name||' where dbms_lob.getlength('||fix_col||') >0 and '||fix_col||' is not null order by pk1';
       open orig_table_cur for orig_sql;
       my_total_problems := 0;
       LOOP
            FETCH orig_table_cur INTO my_indx_var,my_lob_size,my_clob;
                    EXIT WHEN orig_table_cur%NOTFOUND;
            my_offset :=1;
            my_chars_read := 512;
            my_problem_flag :=0;
            WHILE my_offset < my_lob_size and my_problem_flag =0
                    LOOP
                    DBMS_LOB.READ(my_clob,my_chars_read,my_offset,my_output_chunk);
                    my_offset := my_offset + my_chars_read;
                    IF my_output_chunk != CONVERT(CONVERT(my_output_chunk,'WE8ISO8859P1'),'US7ASCII')
                            THEN
                            -- DBMS_OUTPUT.PUT_LINE('Problem with '||my_indx_var);
                            -- DBMS_OUTPUT.PUT_LINE(my_output_chunk);
                            my_problem_flag:=1;
                    END IF;
            END LOOP;
            IF my_problem_flag=1
                    THEN my_total_problems := my_total_problems +1;
                    ins_sql:='insert into cnv_us7(mytablename,myindx,mycolumnname) values ('''||table_name||''','||my_indx_var||','''||fix_col||''')';
                    execute immediate ins_sql;
                    END IF;
       END LOOP;
       DBMS_OUTPUT.PUT_LINE('We found '||my_total_problems||' problem rows in table '||table_name||', column '||fix_col||'.');
    END;
    And we can use SQL-generating SQL to find out which CLOBs have issues, out of all the ones in the database:
    RESTOREDB> select 'exec find_us7_clob('''||table_name||''','''||column_name||''');' from user_tab_columns where data_type='CLOB';
    exec find_us7_clob('MY_CONTENTS','DATA');
    After completion, the CNV_US7 table looked like this:
    RESTOREDB> set linesize 120 pagesize 100;
    RESTOREDB>  select count(1),mytablename,mycolumnname from cnv_us7
       where mytablename||' '||mycolumnname in (select table_name||' '||column_name from user_tab_columns
             where data_type='CLOB' )
          group by mytablename,mycolumnname;
      COUNT(1) MYTABLENAME                                        MYCOLUMNNAME
         69703 MY_CONTENTS                                  DATA
    On RESTOREDB, our 9i version, we will use this procedure (found many years ago on the internet):
    create or replace procedure CLOB2BLOB (p_clob in out nocopy clob, p_blob in out nocopy blob) is
    -- transforming CLOB to BLOB
    l_off number default 1;
    l_amt number default 4096;
    l_offWrite number default 1;
    l_amtWrite number;
    l_str varchar2(4096 char);
    begin
    loop
    dbms_lob.read ( p_clob, l_amt, l_off, l_str );
    l_amtWrite := utl_raw.length ( utl_raw.cast_to_raw( l_str) );
    dbms_lob.write( p_blob, l_amtWrite, l_offWrite,
    utl_raw.cast_to_raw( l_str ) );
    l_offWrite := l_offWrite + l_amtWrite;
    l_off := l_off + l_amt;
    l_amt := 4096;
    end loop;
    exception
    when no_data_found then
    NULL;
    end;
    We can test out the transformation of CLOBs to BLOBs with a single row like this:
    drop table my_contents_lob;
    Create table my_contents_lob (pk1 number,data blob);
    DECLARE
          v_clob CLOB;
          v_blob BLOB;
        BEGIN
          SELECT data INTO v_clob FROM my_contents WHERE pk1 = 16 ;
          INSERT INTO my_contents_lob (pk1,data) VALUES (16,empty_blob() );
          SELECT data INTO v_blob FROM my_contents_lob WHERE pk1=16 FOR UPDATE;
          clob2blob (v_clob, v_blob);
        END;
    select dbms_lob.getlength(data) from my_contents_lob;
    DBMS_LOB.GETLENGTH(DATA)
                                 329
    SQL> select utl_raw.cast_to_varchar2(data) from my_contents_lob;
    UTL_RAW.CAST_TO_VARCHAR2(DATA)
    Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam...
    Now we need to push it through a loop. Unfortunately, I had trouble making the "SELECT INTO" dynamic. Thus I used a version of the procedure for each table. It's aesthetically displeasing, but at least it worked.
    create table my_contents_lob(pk1 number,data blob);
    create index my_contents_lob_pk1 on my_contents_lob(pk1) tablespace my_user_indx;
    create or replace procedure blob_conversion_my_contents
    (table_name varchar2,
    fix_col varchar2)
    authid current_user
    as
      orig_sql varchar2(1000);
      type cv_type is REF CURSOR;
      orig_table_cur cv_type;
      my_chars_read NUMBER;
      my_offset NUMBER;
      my_problem NUMBER;
      my_lob_size NUMBER;
      my_indx_var NUMBER;
      my_total_chars_read NUMBER;
      my_output_chunk VARCHAR2(4000);
      my_problem_flag NUMBER;
      my_clob CLOB;
      my_blob BLOB;
      my_total_problems NUMBER;
      new_sql VARCHAR2(4000);
    BEGIN
      DBMS_OUTPUT.ENABLE(1000000);
       orig_sql:='select pk1,dbms_lob.getlength('||FIX_COL||') as cloblength,'||fix_col||' from '||table_name||' where pk1 in (select myindx from cnv_us7 where mytablename='''||TABLE_NAME||''' and mycolumnname='''||FIX_COL||''') order by pk1';
       open orig_table_cur for orig_sql;
       LOOP
            FETCH orig_table_cur INTO my_indx_var,my_lob_size,my_clob;
                    EXIT WHEN orig_table_cur%NOTFOUND;
            new_sql:='INSERT INTO '||table_name||'_lob(pk1,'||fix_col||') values ('||my_indx_var||',empty_blob() )';
            dbms_output.put_line(new_sql);
          execute immediate new_sql;
    -- Here's the bit that I had trouble making dynamic. Feel free to let me know what I am doing wrong.
    -- new_sql:='SELECT '||fix_col||' INTO my_blob from '||table_name||'_lob where pk1='||my_indx_var||' FOR UPDATE';
    --        dbms_output.put_line(new_sql);
            select data into my_blob from my_contents_lob where pk1=my_indx_var FOR UPDATE;
          clob2blob(my_clob,my_blob);
       END LOOP;
       CLOSE orig_table_cur;
      DBMS_OUTPUT.PUT_LINE('Completed program');
    END;
    exec blob_conversion_my_contents('MY_CONTENTS','DATA');
    Verify that things work properly:
    select dump( utl_raw.cast_to_varchar2(data))  from my_contents_lob where pk1=xxxx;
    This should let you see see characters > 150. Thus, the method works.
    We can now take this data, export it from RESTORECLONE
    exp file=a.dmp buffer=4000000 userid=system/XXXXXX tables=my_user.my_contents rows=y
    and import the data on prodclone
    imp file=a.dmp fromuser=my_user touser=my_user userid=system/XXXXXX buffer=4000000;
    For paranoia's sake, double check that it worked properly:
    select dump( utl_raw.cast_to_varchar2(data))  from my_contents_lob;
    On our 10g PRODCLONE, we'll use these stored procedures:
    CREATE OR REPLACE FUNCTION CLOB2BLOB(L_CLOB CLOB) RETURN BLOB IS
    L_BLOB BLOB;
    L_SRC_OFFSET NUMBER;
    L_DEST_OFFSET NUMBER;
    L_BLOB_CSID NUMBER := DBMS_LOB.DEFAULT_CSID;
    V_LANG_CONTEXT NUMBER := DBMS_LOB.DEFAULT_LANG_CTX;
    L_WARNING NUMBER;
    L_AMOUNT NUMBER;
    BEGIN
    DBMS_LOB.CREATETEMPORARY(L_BLOB, TRUE);
    L_SRC_OFFSET := 1;
    L_DEST_OFFSET := 1;
    L_AMOUNT := DBMS_LOB.GETLENGTH(L_CLOB);
    DBMS_LOB.CONVERTTOBLOB(L_BLOB,
    L_CLOB,
    L_AMOUNT,
    L_SRC_OFFSET,
    L_DEST_OFFSET,
    1,
    V_LANG_CONTEXT,
    L_WARNING);
    RETURN L_BLOB;
    END;
    CREATE OR REPLACE FUNCTION BLOB2CLOB(L_BLOB BLOB) RETURN CLOB IS
    L_CLOB CLOB;
    L_SRC_OFFSET NUMBER;
    L_DEST_OFFSET NUMBER;
    L_BLOB_CSID NUMBER := DBMS_LOB.DEFAULT_CSID;
    V_LANG_CONTEXT NUMBER := DBMS_LOB.DEFAULT_LANG_CTX;
    L_WARNING NUMBER;
    L_AMOUNT NUMBER;
    BEGIN
    DBMS_LOB.CREATETEMPORARY(L_CLOB, TRUE);
    L_SRC_OFFSET := 1;
    L_DEST_OFFSET := 1;
    L_AMOUNT := DBMS_LOB.GETLENGTH(L_BLOB);
    DBMS_LOB.CONVERTTOCLOB(L_CLOB,
    L_BLOB,
    L_AMOUNT,
    L_SRC_OFFSET,
    L_DEST_OFFSET,
    1,
    V_LANG_CONTEXT,
    L_WARNING);
    RETURN L_CLOB;
    END;
    And now, for the pièce de résistance, we need a BLOB to CLOB conversion that assumes the BLOB data is stored initially in WE8ISO8859P1.
    To find correct CSID for WE8ISO8859P1, we can use this query:
    select nls_charset_id('WE8ISO8859P1') from dual;
    Gives "31"
    create or replace FUNCTION BLOB2CLOBASC(L_BLOB BLOB) RETURN CLOB IS
    L_CLOB CLOB;
    L_SRC_OFFSET NUMBER;
    L_DEST_OFFSET NUMBER;
    L_BLOB_CSID NUMBER := 31;      -- treat blob as  WE8ISO8859P1
    V_LANG_CONTEXT NUMBER := 31;   -- treat resulting clob as  WE8ISO8859P1
    L_WARNING NUMBER;
    L_AMOUNT NUMBER;
    BEGIN
    DBMS_LOB.CREATETEMPORARY(L_CLOB, TRUE);
    L_SRC_OFFSET := 1;
    L_DEST_OFFSET := 1;
    L_AMOUNT := DBMS_LOB.GETLENGTH(L_BLOB);
    DBMS_LOB.CONVERTTOCLOB(L_CLOB,
    L_BLOB,
    L_AMOUNT,
    L_SRC_OFFSET,
    L_DEST_OFFSET,
    L_BLOB_CSID,
    V_LANG_CONTEXT,
    L_WARNING);
    RETURN L_CLOB;
    END;
    select dump(dbms_lob.substr(blob2clobasc(data),4000,1)) from my_contents_lob;
    Now, we can compare these:
    select dbms_lob.compare(blob2clob(old.data),new.data) from  my_contents new,my_contents_lob old where new.pk1=old.pk1;
    DBMS_LOB.COMPARE(BLOB2CLOB(OLD.DATA),NEW.DATA)
                                                                 0
                                                                 0
                                                                 0
    Vs
    select dbms_lob.compare(blob2clobasc(old.data),new.data) from  my_contents new,my_contents_lob old where new.pk1=old.pk1;
    DBMS_LOB.COMPARE(BLOB2CLOBASC(OLD.DATA),NEW.DATA)
                                                                   -1
                                                                   -1
                                                                   -1
    update my_contents a set data=(select blob2clobasc(data) from my_contents_lob b where a.pk1= b.pk1)
        where pk1 in (select al.pk1 from my_contents_lob al where dbms_lob.compare(blob2clob(al.data),a.data) =0 );
    SQL> select dump(dbms_lob.substr(data,4000,1)) from my_contents where pk1 in (select pk1 from my_contents_lob);
    Confirms that we're now working properly.
    To run across all the _LOB tables we've created:
    [oracle@RESTORECLONE ~]$ exp file=all_fixed_lobs.dmp buffer=4000000 userid=my_user/mypass tables=MY_CONTENTS_LOB,MY_FORUM_LOB...
    [oracle@RESTORECLONE ~]$ scp all_fixed_lobs.dmp jboulier@PRODCLONE:/tmp
    And then on PRODCLONE we can import:
    imp file=all_fixed_lobs.dmp buffer=4000000 userid=system/XXXXXXX fromuser=my_user touser=my_user
    Instead of running the above update statement for all the affected tables, we can use a simple stored procedure:
    create or replace procedure fix_us7_CLOBS
      (TABLE_NAME varchar2,
         FIX_COL varchar2 )
        authid current_user
        as
         orig_sql varchar2(1000);
         bak_sql  varchar2(1000);
        begin
        dbms_output.put_line('Creating '||TABLE_NAME||'_PRECONV to preserve the original data in the table');
        bak_sql:='create table '||TABLE_NAME||'_preconv as select pk1,'||FIX_COL||' from '||TABLE_NAME||' where pk1 in (select pk1 from '||TABLE_NAME||'_LOB) ';
        execute immediate bak_sql;
        orig_sql:='update '||TABLE_NAME||' tabnew set '||FIX_COL||'= (select blob2clobasc ('||FIX_COL||') from '||TABLE_NAME||'_LOB taborig where tabnew.pk1=taborig.pk1)
       where pk1 in (
       select a.pk1 from '||TABLE_NAME||'_LOB a,'||TABLE_NAME||' b
          where a.pk1=b.pk1
                 and dbms_lob.compare(blob2clob(a.'||FIX_COL||'),b.'||FIX_COL||') = 0 )';
        -- dbms_output.put_line(orig_sql);
        execute immediate orig_sql;
       end;
    Now we can run the procedure and it fixes everything for our previously-broken tables, keeping the changed rows -- just in case -- in a table called table_name_PRECONV.
    set serveroutput on time on timing on;
    exec fix_us7_clobs('MY_CONTENTS','DATA');
    commit;
    After confirming with the client that the changes work -- and haven't noticeably broken anything else -- the same routines can be carefully run against the actual production database.

    We converted the database using scripts I developed. I'm not quite sure how we converted it is relevant, other than saying that we did not use the Oracle conversion utility (not csscan, but the GUI Java tool).
    A summary:
    1) We replaced the lossy characters by parsing a csscan output file
    2) After re-scanning with csscan and coming up clean, our DBA converted the database to AL32UTF8 (changed the parameter file, changed the character set, switched the semantics to char, etc.)
    3) The final step was changing existing tables to use char semantics by changing the table schema for VARCHAR2 columns (see the sketch below)
    Any specific steps I cannot easily answer; I worked with a DBA at our company to do this work. I handled the character replacement / DDL changes and the DBA ran csscan & performed the database config changes.
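    For step 3, the per-column change looks like this (a sketch with hypothetical table and column names; the stored data is untouched, only the length semantics change):
    alter table my_contents modify (title varchar2(100 char));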
    Our actual error message:
    ORA-31011: XML parsing failed
    ORA-19202: Error occurred in XML processing
    LPX-00210: expected '<' instead of '¿'
    Error at line 1
    31011. 00000 - "XML parsing failed"
    *Cause:    XML parser returned an error while trying to parse the document.
    *Action:   Check if the document to be parsed is valid.
    Error at Line: 24 Column: 15
    This seems to match the document ID referenced below. I will ask our DBA to pull it up and review it.
    Please advise if more information is needed from my end.
