UTF-8 characters in database

Hi there,
I'm having problems with UTF-8 characters displaying incorrectly. The problem seems to be that the Content-Type HTTP headers have the character set as "Windows 1252" when it should be UTF-8. There is a demonstration of the problem here:
http://teasel.homeunix.net/~rah/screenshots-unicode-db/apex-unicode-display.html
I thought I'd solved this previously, because I changed the nls_lang variable in wdbsvr.app to use single-quotes instead of double-quotes. After I did this, the server started sending Content-Type HTTP headers with UTF-8 in them and the characters in the example above displayed fine. For some reason, it seems to have reverted to sending "Windows 1252" and I don't know why.
I noticed that the logs contain the following lines:
[Wed Nov  1 09:50:45 2006] [alert] mod_plsql: Wrong language for NLS_LANG 'ENGLISH_UNITED KINGDOM.AL32UTF8' for apex DAD
[Wed Nov  1 09:50:45 2006] [alert] mod_plsql: Wrong charset for NLS_LANG 'ENGLISH_UNITED KINGDOM.AL32UTF8' for apex DAD
Any help getting it to send UTF-8 would be greatly appreciated.
Thanks,
Robert

However, I believe that it is correct to use double quotes rather than single quotes where the setting in nls_lang contains a space.

Well, this is odd; using double quotes gives the same error, but with the double quotes instead of the single quotes in the error message:
[Thu Nov  2 09:43:42 2006] [alert] mod_plsql: Wrong language for NLS_LANG "ENGLISH_UNITED KINGDOM.AL32UTF8" for apex DAD
[Thu Nov  2 09:43:42 2006] [alert] mod_plsql: Wrong charset for NLS_LANG "ENGLISH_UNITED KINGDOM.AL32UTF8" for apex DAD
This implies that the parser is pulling the string out verbatim and including the quotes when it shouldn't. Lo and behold, removing any quotes:
nls_lang = ENGLISH_UNITED KINGDOM.AL32UTF8
causes the error to go away, and causes the HTTP headers to declare UTF-8; problem solved.
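For anyone checking the same symptom, the charset the server actually declares is visible in the Content-Type response header; a minimal Java sketch of reading it (the URL is just a placeholder for whichever DAD page is being tested):

import java.net.HttpURLConnection;
import java.net.URL;

// Minimal check of which charset the server declares in its Content-Type header.
// The URL below is a placeholder for the mod_plsql / APEX page being tested.
public class CharsetCheck {
    public static void main(String[] args) throws Exception {
        URL url = new URL("http://localhost:7777/pls/apex/f?p=100:1");
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        System.out.println(conn.getContentType());   // e.g. "text/html; charset=UTF-8" once nls_lang is right
        conn.disconnect();
    }
}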
I'm loath to step ahead again and say that the docs need updating, but it certainly looks that way.
Robert

Similar Messages

  • How to store UTF-8 characters in an iso-8859-1 encoded oracle database?

    How can we store UTF-8 characters in an iso-8859-1 encoded oracle database? We can NOT change the database encoding but need to store e.g. Polish or Russian characters besides other European languages.
    Is there any stable solution with good performance?
    We use Oracle 8.1.6 with iso-8859-1 encoding, Bea WebLogic 7.0, JDK 1.3.1 and the following thin driver: "Oracle JDBC Driver version - 9.0.2.0.0".

    There are a couple of unsupported options, but I wouldn't consider using them on a production database running other critical applications. I would also strongly discourage their use unless you understand in detail how Oracle National Language Support (NLS) works, otherwise you could end up with corrupt data or worse.
    In a sense, you've been asked to do the impossible. The existing database character set does not support encoding the data you've been asked to store.
    Can you create a new database with an appropriate database character set and deploy your application there? That's probably the easiest solution.
    If that isn't an option, and you really need to store data in this database, you could use one of the binary data types (RAW and BLOB), but that would mean that it would be exceptionally difficult for applications other than yours to extract the data. You would have to ensure that the data was always encoded in the same character set, otherwise you wouldn't be able to properly decode it later. This would also add a lot of complexity to your application, since you couldn't send or receive string data from the database.
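    As a rough JDBC sketch of that RAW/BLOB idea (the NOTES table and its columns are invented for illustration, and the syntax is more modern than the JDK 1.3.1 mentioned above, so treat it purely as a sketch):

    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;

    // The application encodes and decodes UTF-8 itself, so the ISO-8859-1 database
    // only ever sees opaque bytes. Hypothetical table: NOTES(ID NUMBER, TEXT_UTF8 RAW(2000)).
    public class RawUtf8Demo {
        static void store(Connection conn, int id, String text) throws Exception {
            try (PreparedStatement ps =
                     conn.prepareStatement("INSERT INTO notes (id, text_utf8) VALUES (?, ?)")) {
                ps.setInt(1, id);
                ps.setBytes(2, text.getBytes("UTF-8"));   // encode before it reaches the database
                ps.executeUpdate();
            }
        }

        static String load(Connection conn, int id) throws Exception {
            try (PreparedStatement ps =
                     conn.prepareStatement("SELECT text_utf8 FROM notes WHERE id = ?")) {
                ps.setInt(1, id);
                try (ResultSet rs = ps.executeQuery()) {
                    return rs.next() ? new String(rs.getBytes(1), "UTF-8") : null;   // decode on the way out
                }
            }
        }
    }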
    Unfortunately, I suspect you will have to choose from a list of bad options.
    Justin
    Distributed Database Consulting, Inc.
    http://www.ddbcinc.com/askDDBC

  • GetClob not working with UTF-8 characters

    Hi,
    I have a column with data type CLOB in a table in Oracle DB. I want to store
    and retrieve CJK(Chinese, Japanese, Korean) data. I have tried all the
    resultset functions provided by Oracle but I was not able to get UTF-8
    characters from CLOB column. Please let me know how can I get the UTF-8 data
    from a CLOB column in an Oracle DB.
    Thanks,
    Naval.

    CLOB may support Unicode, but isn't NCLOB specifically for Unicode?
    as the document "Migration to Unicode Datatypes for Multilingual Databases and
    Applications" says "Unicode datatypes were introduced in Oracle9i. Unicode datatypes are supported through the SQL
    NCHAR datatypes: NCHAR, NVARCHAR2, and NCLOB. (In this paper, “Unicode datatypes” refers
    to SQL NCHAR types.) SQL NCHAR datatypes have existed since Oracle8. However, in Oracle9i
    forward, they have been redefined and their length semantics have been changed to meet customer
    globalization requirements. Data stored in columns of SQL NCHAR datatypes are exclusively stored in
    a Unicode encoding regardless of the database character set. These Unicode columns allow users to
    store Unicode in a database, which may not use Unicode as the database character set. Therefore,
    developers can build Unicode applications without dependence on the database character set. The
    Unicode datatypes also make it easier for customers to incrementally migrate existing applications and
    databases to support Unicode."
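    For the JDBC side of the question, a minimal sketch of reading such a column as characters (the table and column names are invented, not the poster's schema; JDBC hands back Java strings, which are already Unicode, so no byte-level decoding is needed):

    import java.io.Reader;
    import java.io.StringWriter;
    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;

    // Reads a CLOB/NCLOB column through a character stream.
    public class ClobReadDemo {
        static String readBody(Connection conn, int id) throws Exception {
            try (PreparedStatement ps =
                     conn.prepareStatement("SELECT body FROM cjk_docs WHERE id = ?")) {
                ps.setInt(1, id);
                try (ResultSet rs = ps.executeQuery()) {
                    if (!rs.next()) return null;
                    try (Reader r = rs.getCharacterStream(1)) {
                        StringWriter sw = new StringWriter();
                        char[] buf = new char[4096];
                        int n;
                        while ((n = r.read(buf)) > 0) sw.write(buf, 0, n);
                        return sw.toString();
                    }
                }
            }
        }
    }

    If a UTF-8 byte representation is needed afterwards, readBody(...).getBytes("UTF-8") produces it; asking the driver for "UTF-8 data" directly is not meaningful because a String is not a sequence of bytes.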

  • UTF-8 characters not displaying in IE6

    Dear Sirs,
    I have an issue in displaying UTF-8 characters in Internet explorer 6.
    I set up all my JSP pages' encoding as UTF-8.
    Characters from any language (like Chinese, Tamil etc.) display perfectly in the Firefox browser.
    But in Internet Explorer, the characters are not displaying; it displays them like ?! ..
    Could any body help me out?
    Thanks
    mulaimaran

    Thanks Viravan,
    But, I have added this line in my jsp before html tag.
    <%@ page contentType="text/html;charset=UTF-8" pageEncoding="UTF-8" %>
    After the html tag, I added this meta tag.
    <META http-equiv="Content-Type" content="text/html;charset=UTF-8">
    So, the UTF-8 encoding is capable of showing different language characters in the Firefox browser.
    But in Internet Explorer 6 the other language characters are not displaying..
    > jsp sends out the UTF-8 BOM (hex: EF BB BF) before the HTML tag.
    I can't understand this line. I'm new to Java.
    So, please help me out.
    Thanks
    mullaimaran
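    For reference, the quoted line about the BOM refers to the three bytes EF BB BF at the very start of the output. A small sketch of checking for them (the file name is a placeholder; any saved copy of the rendered page, or the .jsp source itself, can be checked the same way):

    import java.io.FileInputStream;

    // Prints whether the given file starts with the UTF-8 byte order mark EF BB BF.
    public class BomCheck {
        public static void main(String[] args) throws Exception {
            try (FileInputStream in = new FileInputStream("page.jsp")) {
                int b1 = in.read(), b2 = in.read(), b3 = in.read();
                boolean hasBom = (b1 == 0xEF && b2 == 0xBB && b3 == 0xBF);
                System.out.println(hasBom ? "UTF-8 BOM present" : "no UTF-8 BOM");
            }
        }
    }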

  • UTF-8 characters (e.g À ) are not supporting in batch file

    I am trying to add a user using a batch file, and I have used UTF-8 characters in the username. I am running the command below:
    net user Àdmin "sdf" /ADD /FULLNAME:Àdmin /COMMENT:"description"
    I observed that the user is added successfully, but if I look at the local users it shows ├Çdmin in place of Àdmin.
    If I run the above command manually in cmd it works as expected, but it does not work if I execute the command using a bat file.
    It looks like a limitation of bat files.
    I have gone through a few forums and found that this could be a problem with the DOS version.
    I am using DOS version 6.1.7600.
    Could anybody help with this?
    Thanks,

    chcp 65001 helped me to run UTF-8 batches
    example for unicode character ⬥:
    chcp 65001
    C:\Tools\Code128\Code128Gen.exe 0 "aCLa" 2 D:\Out\128_N2T60_⬥CL⬥.bmp
    C:\Tools\Code128\Code128Gen.exe 0 "aCLAa" 2 D:\Out\128_N2T60_⬥CLA⬥.bmp
    Important:
    if your .bat file begins with a BOM mark, remove it (switch the encoding to "UTF-8 without BOM"), otherwise the interpreter will complain about the first line of your batch, for example:
    C:\Tools\Code128>´╗┐chcp 65001
    '´╗┐chcp' is not recognized as an internal or external command,
    operable program or batch file.
    After some time, the original UTF-8 batch file stopped working normally at commands which contained non-ASCII characters. The commands were executed normally as before (producing correct output), but this misformatted message was shown in the output of each:
    C:\Tools\Code128>C:\Tools\Code128\Code128Gen.exeThe system cannot write to the specified device.

  • How to validate UTF-8 characters using Regex?

    Hi All,
    In one of my applications, I need to include the UTF-8 character set for validation of a certain string, which I am validating using a regex.
    However, I do not know how to include UTF-8 characters in a regex, or if we can specify UTF-8 characters in a regex at all.
    Please Help!! Its Urgent!!!
    Thanks in Advance,
    Rajat Aggarwal

    Ok, Let me re-state my problem again, and exactly what i am looking for:
    I have an XML file with the following header: <?xml version="1.0" encoding="UTF-8"?>
    This XML file contains a tag whose text is to be validated against a syntax: Operand operator Operand.
    Now, the operand on the right hand side of the operator could be a variable, or a string literal, which may contain some permissible special characters (as said above), and may or may not contain UTF-8 characters as well.
    I am using the xerces SAXParser to parse the XML document, and am retrieving the text of the element tag with the method <code>element.getChildText("<tagName>")</code>
    According to the org.jdom.Element API Docs,
    the getChildText() method is defined as follows:
    getChildText
    public java.lang.String getChildText(java.lang.String name)
    Returns the textual content of the named child element, or null if there's no such child. This method is a convenience because calling getChild().getText() can throw a NullPointerException.
    Parameters: name - the name of the child
    Returns: text content for the named child, or null if no such child
    Now, I am not sure if the String that I am reading is in UTF-8 Format. Is there any special way of reading a string in that format, or for that matter, convert a string to UTF-8 encoding?
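    On the reading side there is nothing extra to do: the parser hands back Java strings, which are already Unicode, so there is no separate "UTF-8 format" for a String. For the validation itself, java.util.regex understands Unicode classes such as \p{L}, which match letters from any script. A small sketch (the operator set and the allowed literal characters are assumptions, not something stated in the thread):

    import java.util.regex.Pattern;

    // Validates "operand operator operand" where operands may contain letters from any script.
    public class OperandValidator {
        private static final Pattern EXPR = Pattern.compile(
            "^[\\p{L}\\p{Nd}_]+\\s*(=|!=|<|>|<=|>=)\\s*([\\p{L}\\p{Nd}_]+|\"[^\"]*\")$");

        public static void main(String[] args) {
            System.out.println(EXPR.matcher("flughöhe >= 1000").matches());   // true
            System.out.println(EXPR.matcher("name = \"München\"").matches()); // true
            System.out.println(EXPR.matcher("bad expression").matches());     // false
        }
    }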

  • Any recommendations for an editor to help find invalid UTF-8 characters?

    I'm generating some XML programmatically. The content has some old invalid characters which are not encoded as UTF-8. For example, in one case there is a long dash character that I can clearly see is ISO-8859-1 in a browser, because if I force the browser to view in ISO-8859-1 I can see the character, but if I view in UTF-8 the characters look like the black-diamond-with-a-question-mark.
    BBEdit will warn me that the file contains invalid UTF-8. But it doesn't show me where those characters are.
    If I try to load the XML in a browser, like Chrome, it will tell me where the first instance is of an invalid character, but not where all of them are. So I was able to locate the one you see in the screenshot and go in and manually fix that one entry.. But in BBEdit it again just shows a default invalid character symbol.
    What I'd like to be able to do are two things:
    (1) Find all invalid characters so I can then go and fix them all at once without repeated "find the first invalid character" fails when loading the XML in a browser.
    (2) Know what the characters are (rather than generically seeing a bad character symbol) so I can programmatically strip them out or substitute them when generating the XML. So I need to know what the character values (e.g. the hex values) are for those characters so I can use the replace() method in my server-side JavaScript to get rid of them.
    Anybody know a good editor I can use for these purposes?
    Thanks,
    Doug

    Well, now BBEdit doesn't complain anymore about invalid UTF-8 characters. I've gotten rid of all of them as far as BBEdit is concerned. But Chrome and other browsers still report a few. I'm trying to quash them now, but because the browsers only report them one-by-one I have to generate the XML multiple times to track them down, which takes one hour per run.
    I think there are only a few left. One at line 180,000. The next at like 450,000. There are only about 600,000 lines in the file, so I think I'll be done soon. Still... it would be nice to have a Mac tool that would locate all the invalid characters the browsers are choking on so I could fix them in one sweep. It would save hours.
    Does anybody know of such a tool for the Mac?
    Thanks,
    Doug
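    Since the browsers only report the first bad byte, one way to get the whole list in a single pass is a small scanner built on Java's CharsetDecoder in REPORT mode; a sketch (not a polished tool) that prints the byte offset and hex value of every invalid sequence:

    import java.io.IOException;
    import java.nio.ByteBuffer;
    import java.nio.CharBuffer;
    import java.nio.charset.CharsetDecoder;
    import java.nio.charset.CoderResult;
    import java.nio.charset.CodingErrorAction;
    import java.nio.charset.StandardCharsets;
    import java.nio.file.Files;
    import java.nio.file.Paths;

    // Usage: java Utf8Scanner file.xml
    // Reads the whole file into memory and reports every byte sequence that is not valid UTF-8.
    public class Utf8Scanner {
        public static void main(String[] args) throws IOException {
            byte[] bytes = Files.readAllBytes(Paths.get(args[0]));
            ByteBuffer in = ByteBuffer.wrap(bytes);
            CharBuffer out = CharBuffer.allocate(8192);
            CharsetDecoder dec = StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT);
            while (in.hasRemaining()) {
                CoderResult r = dec.decode(in, out, true);
                if (r.isError()) {
                    int offset = in.position();               // start of the invalid sequence
                    System.out.printf("invalid sequence at byte %d:", offset);
                    for (int i = 0; i < r.length(); i++)
                        System.out.printf(" %02X", bytes[offset + i] & 0xFF);
                    System.out.println();
                    in.position(offset + r.length());         // skip the bad bytes and keep scanning
                }
                out.clear();                                  // decoded characters are not needed
            }
        }
    }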

  • Importing a thesaurus with UTF-8 characters in it?

    I have a *.syn file with UTF-8 characters in it (Danish characters) and the file is encoded in UTF-8. I can import the thesaurus without any problems, and if I export it again it looks all good; it is encoded in UTF-8 and the Danish characters look good.
    However, if I try to search with broaden or narrow I never get any hits when there are Danish characters involved, but it works fine for ASCII characters.
    Maybe I should note that searching with Danish characters without using the thesaurus works fine.
    Any ideas where I should look for my problem?
    Thank you
    Søren

    NLS_LANG is a system environment variable.
    See the Globalization Support Guide:
    http://download.oracle.com/docs/cd/B10500_01/server.920/a96529/ch3.htm#5014
    How are you loading the thesaurus? Via the command line admin utility or via web services? If it's web services then setting the environment variable may not work.

  • Non-latin utf-8 characters on ssh host displayed as "???" in terminal

    Both the client and the host use the same locale: en_US.UTF-8
    I can input non-latin just fine but the resulting string appears in the terminal as question marks (e.g. "touch 表示してくれ; ls" gives me "??????????????????"), however if I check the same string with my SFTP client (Filezilla), the strings appears as intended so the character encoding seems to be handled correctly.
    I'm using "gnome-terminal" on my client, in case that makes any difference (terminal parameters: on the client: TERM="xterm" and COLORTERM="gnome-terminal"; on the host: TERM="xterm").
    Also, am I correct in assuming that I don't need to have any fonts installed on the host, as that's irrelevant to ssh?
    UPDATE: I noticed a "cat" on a utf-8 text file on the ssh host shows the characters correctly, but looking at the same file through "nano" shows garbled characters. On my local machine (with practically identical Arch software versions) on the other hand, both "cat" and "nano" show files with these characters correctly.
    Last edited by Lazar (2010-09-19 15:13:59)

    Hi,
    Like BluShadow I'm curious to see a test case...
    Depending on the way it's been created, the encoding declared in the XML prolog doesn't necessarily reflect the actual encoding of the content.
    What's your database character set, and version?
    Please also post the error message you get (LPX-00200 probably?).

  • BUG?? UTF-8 non-Latin database chars in IR csv export file not export right

    Hello,
    I have this issue: my database character set is UTF-8 (AL32UTF8) and a table used in an IR contains data that are Greek (non-Latin). While I can see them displayed correctly in the IR, and also via select in the Object Browser in SQL Workshop, when I try to Download as CSV the produced CSV does not have the Greek characters exported correctly, while the Latin ones are OK.
    This problem is the same whether I try IE or Firefox. Also, the export to HTML works successfully and I see the Greek characters there correctly!
    Is there any issue with UTF-8 and non-Latin characters in export to CSV from IRs? Can someone confirm this, or does someone have a similar export problem with a UTF-8 DB and non-Latin characters?
    How could i solve this issue ?
    TIA

    Hello Joel,
    thanks for taking the time to answer to my Issue. Well this does not work for my case as the source of data (Database character set) is UTF-8. The Data inside the database that are shown in the IR on the Screen is UTF-8 and this is done correctly. You can see this in my example. The actual Data in the Database are from multiple languages, English, Greek, German, Bulgarian etc that's why i selected the UTF-8 character set when implementing the Database and this requirement was for all character data. Also the suggested character set from Oracle is Unicode when you create a Database and you have to support data from multiple languages.
    The requirement is that what I see in the IR (I mean in the display) I need to export to the CSV file correctly, and this is what I expect the Download as CSV feature to achieve. I understand that you had Excel in mind when implementing this feature, but a CSV is just an easy way to export the data - a Comma Separated Values file - not necessarily meant to be opened directly in Excel. Also, I want to add here that in Excel you can import the data in UTF-8 encoding when importing from CSV, which is fine for my customer. Also, Excel 2008 and later understands a UTF-8 CSV file if you have placed the UTF-8 BOM character at the start of the file (well, it drops you into the wizard, but it's almost the same as importing).
    Since the feature you describe, if I understood correctly, always creates an ANSI encoded file in every case, even when the database character set is UTF-8, it is impossible to export correctly if I have data that are neither Latin nor among the other 128 country-specific characters I choose in the Globalization attributes, and these data are what I see in the display and need to export to CSV. I believe that when the database character set is UTF-8 this feature should create a CSV file that is UTF-8 encoded and export correctly what I see on the screen, and I suspect that others would also expect this behaviour. Or at least you could allow/implement(?) this behaviour when Automatic CSV Encoding is set to No. But I strongly believe - and especially from the eyes of a user - that having different things on the screen and in the produced CSV file is a bug, not a feature.
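    For what it's worth, the kind of file being asked for is simple to produce; a minimal sketch of a UTF-8 CSV with a leading BOM so Excel recognises the encoding (file name and rows are made up):

    import java.io.FileOutputStream;
    import java.io.OutputStreamWriter;
    import java.io.Writer;

    // Writes a UTF-8 encoded CSV whose first bytes are the UTF-8 BOM (EF BB BF).
    public class Utf8CsvDemo {
        public static void main(String[] args) throws Exception {
            try (Writer w = new OutputStreamWriter(new FileOutputStream("report.csv"), "UTF-8")) {
                w.write('\uFEFF');            // the BOM character, written as EF BB BF in UTF-8
                w.write("ID,NAME\n");
                w.write("1,Αθήνα\n");         // Greek data survives the round trip
                w.write("2,München\n");
            }
        }
    }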
    I would like to have comments on this from other people here too.
    Dionyssis

  • Need to find out extended ASCII characters in database

    Hi All,
    I am looking for a query that can fetch a list of all tables and columns where there is an extended ASCII character (from 128 to 256). Can anyone help me?
    Regards
    Yadala

    yadala wrote:
    Hi All,
    I am looking for a query that can fetch a list of all tables and columns where there is an extended ASCII character (from 128 to 256). Can anyone help me?
    Regards
    Yadala
    This should match your requirement:
    select t.TABLE_NAME, t.COLUMN_NAME from ALL_TAB_COLUMNS t
    where length(asciistr(t.TABLE_NAME))!=length(t.TABLE_NAME) 
    or length(asciistr(t.COLUMN_NAME))!=length(t.COLUMN_NAME);
    The ASCIISTR function returns an ASCII version of the string in the database character set.
    Non-ASCII characters are converted to the form \xxxx, where xxxx represents a UTF-16 code unit.
    The CHR function is the opposite of the ASCII function. It returns the character based on the NUMBER code.
    ASCII code 174
    SQL> select CHR(174) from dual;
    CHR(174)
    Ž
    SQL> select ASCII(CHR(174)) from dual;
    ASCII(CHR(174))
                174
    SQL> select ASCIISTR(CHR(174)) from dual;
    ASCIISTR(CHR(174))
    \017D
    ASCII code 74
    SQL> select CHR(74) from dual;
    CHR(74)
    J
    SQL> select ASCII(CHR(74)) from dual;
    ASCII(CHR(74))
                74
    SQL> select ASCIISTR(CHR(74)) from dual;
    ASCIISTR(CHR(74))
    J

  • Unable to insert Chinese characters in Database

    My problem is that I am not able to insert Chinese
    (Traditional Chinese) characters into my tables in the
    database.
    I have changed the character set to UTF8 while creating the
    database and also tried the alter session command in SQL to
    alter the NLS_LANGUAGE and NLS_TERRITORY (to say traditional chinese).
    But this did not solve my problem.
    I have also tried all possibilities, like getting Chinese characters
    into my notepad by copy-paste from a Chinese web site,
    but when issuing the insert into command in my database
    it takes some junk values.
    Someone PLEASE HELP!!! URGENT!!!
    Thanks in advance.
    RKP

    You mentioned in your first note that you have set your database character set to UTF-8? If so, then you are able to store and retrieve multilingual data, including Chinese and Japanese characters. Your issue is not the database. Your client OS must be able to support these languages as well. It is likely that your version of the OS supports only Latin and Western European characters. By the way, changing your NT regional setting only affects sorting, date formats etc. It doesn't help you change the languages that your keyboard will support.
    1. Determine your Win32 operating system's current ANSI CodePage (ACP). This can be found by bringing up the registry editor (Start --> Run..., type "regedit.exe", and click "OK") and looking at the
    registry entry HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Nls\CodePage\ACP (there are many registry entries with very similar names, so please make sure that you are looking at the right place in the registry).
    2. Find the character set in the table below based on the ACP you got above.
    ANSI CodePage (ACP) Client character set (3rd part of NLS_LANG) (*1)
    1250 EE8MSWIN1250
    1251 CL8MSWIN1251
    1252 WE8MSWIN1252
    1253 EL8MSWIN1253
    1254 TR8MSWIN1254
    1255 IW8MSWIN1255
    1256 AR8MSWIN1256
    1257 BLT8MSWIN1257
    1258 VN8MSWIN1258
    874 TH8TISASCII
    932 JA16SJIS
    936 ZHS16GBK
    949 KO16MSWIN949
    950 ZHT16MSWIN950
    others UTF8 (*2)
    (*1) The character sets listed here are compatible with Win32's non-Unicode graphical user interface (GUI). Since Win32's MSDOS Box (Command Prompt) uses different character sets, NLS_LANG needs to be manually set in the MSDOS Box (or set in a batch script) in order to handle the difference
    between Win32's GUI and MSDOS Box. (Please see "NLS_LANG Settings in MS-DOS Mode and Batch Mode" in the Oracle8i Installation Guide Release 2 (8.1.6) for Windows NT, part# A73010-01.)
    (*2) If you use UTF8 for the 3rd part of NLS_LANG on Win32, client programs that you can use on this operating system would be limited to the ones that explicitly support this configuration. Recent versions of Oracle Forms' Client/Server mode (Fat-Client) on NT4.0 would be an example of such client
    programs. This is because the user interface of Win32 is not UTF8, therefore the client programs have to perform explicit conversions between UTF8 (used in Oracle side) and UTF16 (used in Win32 side).

  • Get rows not containing UTF-8 characters.

    Hi guys,
    The first part is just general information. If you want to go straight to my question go to the bottom.
    Our business has recently loaded a lot of data from a legacy system to our database.
    I have found out that there are some problems with the data added to our database.
    Some characters which are not UTF-8(?) characters have been added to a column (LONG) in the table fnd_documents_long_text.
    I noticed the data error because XML Publisher gave me the following warning:
    java.io.UTFDataFormatException: Invalid UTF8 encoding.
    I then copied the output XML from the server to my own desktop. When I try to open the XML file in my editor it replaces a character with a Unicode substitution character:
    Some bytes have been replaced with the Unicode substitution character while..
    Now. I found the line causing the error in XML Publisher and my editor.
    The line is:
    <LONG_TEXT>COPPER WIRE TO BS 4516 PT.1 [xCF].25 PVA GR.2 (LEWMEX) MAX REEL SIZE 25KG</LONG_TEXT>
    The [xCF] is what raises the whole problem. When I try to copy the character into the editor in the forum it shows ϱ
    QUESTION
    I want to find all the attachments (rows in fnd_documents_long_text), which will cause XML Publisher to fail executing a report request.
    I have created the following PL/SQL Block to identify the rows that needs correction:
    DECLARE
      CURSOR c1 IS
        SELECT  media_id,
                long_text
          FROM  fnd_documents_long_text;
      v_media_id              fnd_documents_long_text.media_id%TYPE;
      v_long_text             fnd_documents_long_text.long_text%TYPE;
      l_test                  varchar2(2000);
    BEGIN
      dbms_output.put_line('START');
      IF (c1%ISOPEN)
      THEN
        CLOSE c1;
      END IF;
      OPEN c1;
      LOOP
        FETCH c1 INTO v_media_id, v_long_text;
        EXIT WHEN c1%NOTFOUND;
          l_test := REGEXP_REPLACE(v_long_text,'[\x80-\xFF]','');
          IF (l_test != ' ')
          THEN
            dbms_output.put_line('Media: ' || v_media_id || ', Text: ' || v_long_text);
          END IF;
      END LOOP;
      CLOSE c1;
      dbms_output.put_line('END');
    END;
    My problem is that the list I get back contains rows where æøå üä etc. exist. How do I get the list so it only contains rows where screwed-up characters like ϱ exist?
    Thank you in advance.
    /Kenneth

    Kenneth_ wrote:
    Which means that Š is a UTF-8 character?
    I have tried with several different UTF16 characters instead (水) but with the same result... :/
    Kenneth_ wrote:
    I do not want to convert characters. I want to get a "list" of rows with non-utf8 characters in it, so the business can decide the next action.
    imho this is not a problem of "non-UTF" characters. Greek letters as you mentioned in your first post are "part" of it, as Š and (水) are.
    The difference between utf-8 and utf-16 is not the letters representable but the bytes they use for the representation.
    Your problem seems more to be a configuration problem of xml-publisher (or the xml-Document). There are some threads on this under the business suite forum.
    regards chris

  • How do I supress the encoding of UTF-8 characters in a f:param element

    Hello,
    I have a keyboard displayed on my page, which won't work properly because of the German characters used.
    I have an icon for every button embedded in a link, which adds the selected character to the searchstring.
    For example adding an a works like this:
    from keyboard.xhtml:
    <s:link><f:param name="#{keyname}" value="#{keyword}a"/><h:graphicImage value="key_a.png"/></s:link>
    keyname and keyword are parameters submitted by the including form:
    from myform.xhtml:
    <ui:param name="keyword" value="#{end}"/>
    <ui:param name="keyname" value="end"/>
    This works great as long as the character is a standard one, but as soon as I have a German umlaut in the string, the umlaut gets encoded/escaped with every single character that I add to the searchstring:
    The string makes its way correctly to the keyboard template; I can use an h:outputText to show it on the page and it doesn't get escaped.
    So, how can I prevent the escaping of my characters in the f:param elements?
    I really need to get this to work, so any hint or even solution would be fabulous.
    Thanks in advance, Peter
    PS: maybe my web server is doing something nasty, so it would be nice, if someone can check this code:
    <s:link><f:param name="test" value="�"/>INIT</s:link><br/>
    <s:link><f:param name="test" value="#{test}"/>REPEAT</s:link><br/>
    INFO: <h:outputText value="#{test}" /><br/>
    here is the same one with h:outputLink
    <h:outputLink><f:param name="test" value="�"/>INIT</h:outputLink><br/>
    <h:outputLink><f:param name="test" value="#{test}"/>REPEAT</h:outputLink><br/>
    INFO: <h:outputText value="#{test}" /><br/>
    EDIT: I found the solution, it was my beloved JBoss application server; after adding a parameter to the server.xml, everything worked as expected:
    use page settings:
    <Connector port="8080" .....
    useBodyEncodingForURI="true" ..../>
    hardcoded:
    <Connector port="8080" .....
    URIEncoding="UTF-8" ..../>
    Edited by: pete007 on Mar 12, 2008 1:47 PM
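    For reference, this is roughly what a correctly encoded parameter looks like once URIEncoding="UTF-8" (or useBodyEncodingForURI) is in place; a small sketch with an arbitrary sample value:

    import java.net.URLDecoder;
    import java.net.URLEncoder;

    // Shows the %-encoded form the container expects to receive and decode back.
    public class ParamEncodingDemo {
        public static void main(String[] args) throws Exception {
            String value = "Grüße";
            String encoded = URLEncoder.encode(value, "UTF-8");
            System.out.println(encoded);                              // Gr%C3%BC%C3%9Fe
            System.out.println(URLDecoder.decode(encoded, "UTF-8"));  // Grüße
        }
    }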

    "Encoding" refers to the charset used to convert the Unicode data into bytes. But since you're writing to a String, you aren't converting the data to bytes and therefore UTF-16 is the appropriate encoding. It doesn't make sense to ask for your data to be encoded in UTF-8 when you aren't producing bytes.
    You could read this tutorial about XML and Unicode and encodings for more information:
    http://skew.org/xml/tutorial/
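    A tiny illustration of that distinction (the sample string is arbitrary): an encoding only comes into play once the characters are turned into bytes.

    // Characters vs. bytes: the same String yields different byte counts per charset.
    public class EncodingDemo {
        public static void main(String[] args) throws Exception {
            String s = "Åström";                                  // in memory: UTF-16 chars
            System.out.println(s.length());                       // 6 characters
            System.out.println(s.getBytes("UTF-8").length);       // 8 bytes (Å and ö take two each)
            System.out.println(s.getBytes("ISO-8859-1").length);  // 6 bytes
        }
    }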

  • Entry of non-English characters into database

    Hi
    We are facing a problem in inserting non-English characters into the database. For example, we have a company name field which can accept German characters. This field has been defined as of varchar2 type of size 50 in the db. When we enter 49 English characters and then one German character, the database throws the error that the inserted value is too large for the column. Is it that the German character is taken as equivalent to two English characters? Or is there any database-level setting that can be done for this? For the time being we have identified certain critical fields and have doubled the size of their fields in the db. But I guess there has to be another solution to this....
    Please help.
    TIA
    Vinoj

    Indeed, your German character is using two bytes to store itself. Consult the Oracle JDBC Developer's Guide.
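    A quick way to see the same thing from the application side, assuming the database character set is UTF-8 as the two-byte behaviour suggests (a sketch; the value is made up):

    // Counts characters vs. UTF-8 bytes for a value that overflows a VARCHAR2(50)
    // column declared with byte-length semantics.
    public class ByteLengthDemo {
        public static void main(String[] args) throws Exception {
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < 49; i++) sb.append('A');           // 49 plain ASCII characters
            sb.append('Ü');                                        // one German character
            String name = sb.toString();
            System.out.println(name.length());                     // 50 characters
            System.out.println(name.getBytes("UTF-8").length);     // 51 bytes: too long for VARCHAR2(50 BYTE)
        }
    }

    On Oracle9i and later, declaring the column with character-length semantics, for example VARCHAR2(50 CHAR), avoids having to double column sizes by hand.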
