UTF-8 and Chinese Characters

I have a JSP with the following line at the very top:
<%@ page contentType="text/html; charset=utf-8"%>
This is so that it will use UTF-8 encoding to display non-English characters. Doing this allows me to display Arabic, Hebrew, and English characters that are encoded in UTF-8 format (e.g. \u0643). However, I still cannot display Chinese characters. For example, I have the String \u4E2D being read from a file and output to the JSP (no different from my other non-English characters), and it does not display properly (I only see a box in its place). Can anyone tell me why this is?
I do not have the proper Chinese character set downloaded; however, I don't understand how the Hebrew and Arabic display properly when I never explicitly downloaded any sort of character set for them.
Thanks in advance.

;-D
I'm only human!
I certainly agree that UTF-8 should work. Just thought that trying a couple of other encodings might work faster than trying to figure out why UTF-8 wasn't doing the job!
As for where the character set is stored... both IE6 and the JDK will have knowledge of the character set. However, this doesn't automatically mean that they are able to display it. Both require the right font to be able to do this, and neither English Windows IE6 nor the JDK carries a font as standard that is able to display the Chinese character set. By installing the Chinese language pack, the font has now been provided, which is why everything's working happily.
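To separate the encoding question from the font question, it can help to look at the bytes directly. The sketch below is only an illustration (the class and method names are made up here, not taken from the original JSP); it prints the UTF-8 bytes for the two characters mentioned in the question. If those bytes are what reaches the browser, the encoding side is fine and a box most likely points to a missing font.

```java
import java.nio.charset.StandardCharsets;

public class Utf8Check {
    // Formats a byte array as upper-case hex, two digits per byte.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    // Returns the UTF-8 encoding of s as a hex string.
    static String utf8Hex(String s) {
        return hex(s.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        System.out.println(utf8Hex("\u4E2D")); // Chinese character from the question: E4B8AD
        System.out.println(utf8Hex("\u0643")); // Arabic character from the question: D983
    }
}
```

Both characters encode without trouble, which is consistent with the reply: the encoding can represent them, but displaying them still needs a font that contains the glyph.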
As for being able to prompt the user in downloading this, I'm not entirely sure whether this is possible these days. This certainly happened in Windows 9x/NT4, where IE prompted you to download the pack, but this proved to be such an unpopular method that M$ took the prompt out, and now expect you to install it off disc as of Win2K.
Hope that helps!
Martin Hughes

Similar Messages

  • UTF-8 encoding Funny characters in DB

    Dear All,
    I am facing a critical issue that has been dragging on for 3 days, and I still can't figure it out.
    I have an XML file which I am reading from a server using
    <cffile action="READ" file="#application.settings.paths"
    variable="xml" charset="utf-8">
    I then use XMLPARSE to parse that XML and try to insert the XML data into the database. In the XML I have these characters:
    master’s
    which should be a single quote. But when saving to the database using a ColdFusion query, the data is saved as funny characters (i.e., master?s).
    The XML encoding is UTF-8, and I don't know how to convert those junk characters to normal characters like (master's), with a single quote.
    Here are the things I tried:
    i) In ColdFusion Administrator I added the connection string
    "useUnicode=true&characterEncoding=UTF-8"
    and checked the box which says "enable unicode for datasources configured for non-latin characters".
    ii) Used a ConvertCharset function, passing the XML object:
    <cfscript>
    // Takes the bytes of str in charsetFrom and decodes them again as charsetTo.
    function convertCharset(str, charsetFrom, charsetTo) {
        var resultStr = "";
        var javaString = "";
        var byteArray = "";
        javaString = CreateObject("java", "java.lang.String");
        javaString.init(str);
        byteArray = javaString.getBytes(charsetFrom);
        resultStr = CreateObject("java", "java.lang.String");
        resultStr.init(byteArray, charsetTo);
        return resultStr.toString();
    }
    </cfscript>
    <cfcontent type="text/html; charset=UTF-8">
    <cfset setEncoding("URL", "UTF-8")>
    <cfset setEncoding("Form", "UTF-8")>
    I also tried this method:
    http://www.bennadel.com/blog/1206-Content-Is-Not-Allowed-In-Prolog-ColdFusion-XML-And-The-Byte-Order-Mark-BOM-.htm
    Please let me know if I need to do anything other than the
    above methods.
    Thanks

    I am using a SQL Server 2005 database.
    The field is "Description", Varchar(2000).
    > did you perform your test using the same table, code, etc.?
    Yes.
    > did you read in & dump out the xml file?
    Yes, I dumped the XML file, and if I open it in Notepad as UTF-8 (file type) then I see a single quote instead of that different character.
    > is it really utf-8?
    So I think it's UTF-8.
    > if your mojibake chars are from an ms word document, then they're not utf-8 but a superset of latin-1.
    They are not from MS Word. I got an XML file which has all the course and presentation information, structured properly except for those characters, like
    ("younus has a Bachelor’s degree). I see that in UTF-8.
    So I want to know which format I need to convert to when saving to the database (SQL Server 2005).
    Thanks.
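The two symptoms in this thread can be reproduced outside ColdFusion, which may help narrow down where the conversion goes wrong. The following is a minimal Java sketch with hypothetical names, under the assumption that the curly quote is U+2019: reading its UTF-8 bytes as windows-1252 gives the three-character sequence seen in the XML dump, and squeezing it into a charset that lacks the character gives the "?" seen in the database.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class QuoteMojibake {
    // RIGHT SINGLE QUOTATION MARK, assumed to be the character in the XML.
    static final String RIGHT_QUOTE = "\u2019";

    // Encodes s as UTF-8, then decodes those bytes with the wrong charset.
    static String misread(String s, Charset wrong) {
        return new String(s.getBytes(StandardCharsets.UTF_8), wrong);
    }

    // Round-trips s through a charset that may not contain all its characters;
    // getBytes substitutes '?' for anything that cannot be mapped.
    static String squeeze(String s, Charset narrow) {
        return new String(s.getBytes(narrow), narrow);
    }

    public static void main(String[] args) {
        String text = "master" + RIGHT_QUOTE + "s";
        // Three junk characters appear in place of the quote.
        System.out.println(misread(text, Charset.forName("windows-1252")));
        // The quote is replaced by a question mark.
        System.out.println(squeeze(text, StandardCharsets.US_ASCII));
    }
}
```

If the "?" form is what ends up in the table, the loss probably happens on the way into the Varchar column rather than when the file is read, so a Unicode column type (NVarchar) would be one thing to test; that is a suggestion to try, not a confirmed fix for this setup.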

  • DW CS3 rewrites and destroys characters

    DW CS3 rewrites characters wrongly in the text in the HTML
    when an image is placed into the document by Fireworks. I have
    tested it on a correctly formatted document with all the important
    UTF-8 and doctype stuff. On a blank document containing only the
    initial image placeholder from DW, characters are also rewritten,
    but with different characters than on an XHTML page.
    What I do is insert an image placeholder in DW and edit the
    placeholder by pressing Edit on the Properties panel. I save both
    the PNG and a GIF with Fireworks. Fireworks makes DW rewrite the
    image tag. When this rewrite occurs, all Nordic characters are
    rewritten as some crazy stuff.

    Put this somewhere between <head> and </head>
    <head>
    <meta http-equiv="Content-Type"
    content="text/html;charset=UTF-8">
    Dave
    "Jane Smith 2300" <[email protected]> wrote
    in message
    news:gqe3pp$jqd$[email protected]..
    > Hi,
    > I have a form for submitting answers to questions at my Japanese website. The form is an .aspx file.
    >
    > I want to be able to type some Japanese that would show up on the form. When viewers see the Japanese (it's a question for them to answer), then they type in Japanese in the form response section.
    >
    > My problem is that I cannot see the form properly in design view of DW CS4 but I can easily find and change the correct line in CODE view. But, in code view I can't type in Japanese. I keep getting the error mentioned below.
    >
    > So.....is there some way to be able to type in Japanese in code view, so that the question I am typing shows up?
    >
    > The error that shows up is:
    >
    > The document's current encoding can not correctly save all of the characters within the document. You may want to change to UTF-8 or an encoding that supports the special characters in this document.
    >
    > How do I change to UTF-8 coding, etc. so I can type in CODE view in Japanese????
    >
    > Thanks in advance for any help offered.
    >
    > Jane
    >

  • XMLParser and Special Characters

    Hi,
    I'm trying to read in an XML Document from a stream (e.g. a file) using XMLParser. The document contains german text (i.e. lots of special characters like umlauts �, �, � and others).
    If I read this stream into a text string all these special characters are perfectly handled (i.e. � looks like an �, etc.).
    However, if I import the stream into an XMLParser.Document using ImportDocument the umlauts seem to be scrambled. If the imported document is without any changes exported again to a stream (using ExportDocument) the umlauts are not displayed correctly anymore.
    Example Stream:
    <?xml version="1.0" encoding="iso-8859-1" ?>
    <UserID>M�ller</UserID>
    If this stream is imported into an XMLParser.Document and then exported again it contains
    <UserID>M��ller</UserID>
    I'm using correct XML encoding iso-8859-1 which is for western european languages and I guess it should not be a Forte locale issue since simple string handling of the stream works fine.
    Thanks for any hints,
    Daniel

    Let's start at the basics. Right now you are quite limited by your database character set as US7ASCII. You need to migrate to something that will support Latin and Greek characters at least. Maybe EL8ISO8859P7, or UTF-8. Please look at documentation Scanner Utility, available for Oracle 8.1.6 and above to make sure migration is safe before doing any import/export. The title of paper is: Database Character Set Migration, at: http://technet.oracle.com/products/oracle8i/listing.htm#nls
    UTF-8 will give you more versatility in the languages that your customer supports now or in the future. There is some performance overhead using Unicode but how much depends? I would base a large part of the Unicode decision on how likely it would be that other languages would need to be supported in the future and special character support.
    The special characters that your customer would like to support may already exist in Unicode. IF they don't or you choose another character set then your customer will need to look at the National Language Support Guide, Appendix 'B' "Customizing Locale Data"
    Are you running Greek windows? Otherwise how will you enter Greek characters? If you are using Greek windows you probably need to set your client NLS_LANG to EL8MSWIN1253.
    On your Forms questions you might want to take a look at the following :
    1. Chapter 4 of "Oracle Forms Developer and Reports Developer Release 6i: Guidelines for Building
    Applications" discusses How to design MultiLingual Applications.
    http://otn.oracle.com/docs/products/forms/doc_index.htm

  • Unicode, UTF-8 and java servlet woes

    Hi,
    I'm writing a content management system for a website about Russian music.
    One problem I'm having is trying to get a Java servlet to talk Unicode to the content management client.
    The client makes a request for a band, the server then sends the XML to the client.
    The XML reading works fine and the client displays the unicode fine from an XML file read locally (so the XMLReader class works fine).
    The servlet unmarshals the request perfectly (its just a filename).
    I then find the correct class and pass it through the XML writer. That returns the XML as a string, which I simply put into the output stream:
    out.write(XMLWrite(selectedBand));
    I have set the correct header property:
    response.setContentType("text/xml; charset=UTF-8");
    And to read it I use:
             //Make our URL
             URL url = new URL(pageURL);
             HttpURLConnection conn = (HttpURLConnection)url.openConnection();
             conn.setRequestMethod("POST");
             conn.setDoOutput(true); // want to send
             conn.setRequestProperty( "Content-type", "application/x-www-form-urlencoded" );
             conn.setRequestProperty( "Content-length", Integer.toString(request.length()));
             conn.setRequestProperty("Content-Language", "en-US");
             //Add our parameters
             OutputStream ost = conn.getOutputStream();
             PrintWriter pw = new PrintWriter(ost);
             pw.print("myRequest=" + URLEncoder.encode(request, "UTF-8")); // here we "send" our body!
             pw.flush();
             pw.close();
             //Get the input stream
             InputStream ois = conn.getInputStream();
             InputStreamReader read = new InputStreamReader(ois);
             //Read
             int i;
             String s="";
             Log.Debug("XMLServerConnection", "Response follows:");
             while((i = read.read()) != -1 ){
              System.out.print((char)i);
              s += (char)i;
             }
             return s;
    Now when I print
    read.getEncoding()
    it claims:
    ISO8859_1
    Something's wrong there, so if I force it to accept UTF-8:
    InputStreamReader read = new InputStreamReader(ois,"UTF-8");
    it now claims it's
    UTF8
    However, all of the data has lost its Unicode: any Unicode character is replaced with a question mark! This happens even when I don't force the input stream to be UTF-8.
    More so if I view the page in my browser, it does the same thing.
    I've had a look around and I can't see a solution to this. Have I set something up wrong?
    I've set, "-encoding utf8" as a compiler flag, but I don't think this would affect it.

    I don't know what your problem is but I do have a couple of comments -
    1) In conn.setRequestProperty( "Content-length", Integer.toString(request.length())); the length of your content is not request.length(). It is the length of the URL encoded data.
    2) Why do you need to send URL encoded data? Why not just send the bytes?
    3) If you send bytes then you can write straight to the OutputStream and you won't need to convert to characters to write to the PrintWriter.
    4) Since you are reading from the connection you need to setDoInput() to true.
    5) You need to get the character encoding from the response so that you can specify the encoding in InputStreamReader read = new InputStreamReader(ois, characterEncoding);
    6) Reading a single char at a time from an InputStream is very inefficient.
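Points 5 and 6 can be sketched roughly as follows. This is only an illustration under assumptions: the class and helper names are invented, and the Content-Type value would in practice come from something like conn.getContentType() on the client side. The idea is to take the charset from the response header instead of relying on the platform default, and to read in buffered chunks rather than one char at a time.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class ResponseDecoding {
    // Pulls the charset parameter out of a Content-Type value such as
    // "text/xml; charset=UTF-8"; returns the fallback if none is usable.
    static Charset charsetOf(String contentType, Charset fallback) {
        if (contentType != null) {
            for (String part : contentType.split(";")) {
                String p = part.trim();
                if (p.regionMatches(true, 0, "charset=", 0, 8)) {
                    String name = p.substring(8).trim().replace("\"", "");
                    try {
                        return Charset.forName(name);
                    } catch (IllegalArgumentException e) {
                        return fallback; // unknown or malformed charset name
                    }
                }
            }
        }
        return fallback;
    }

    // Reads the whole stream as text in the given charset, using a buffer.
    static String readAll(InputStream in, Charset cs) {
        StringBuilder sb = new StringBuilder();
        try (Reader r = new BufferedReader(new InputStreamReader(in, cs))) {
            char[] buf = new char[4096];
            int n;
            while ((n = r.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Stand-in for the servlet response: Cyrillic text sent as UTF-8.
        byte[] body = "\u0427\u0430\u0439\u0444".getBytes(StandardCharsets.UTF_8);
        Charset cs = charsetOf("text/xml; charset=UTF-8", StandardCharsets.ISO_8859_1);
        String text = readAll(new ByteArrayInputStream(body), cs);
        System.out.println(text.length()); // 4 characters, not 8 bytes
    }
}
```

If the client decodes correctly and question marks still arrive, the characters were probably lost on the server before the bytes were written, so the servlet's writer and its encoding would be the next place to look.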

  • How do we represent the largest code point in UTF-8 and UTF-16, and what's the difference?

    How do we represent the largest code point in UTF-8 and UTF-16, and what's the difference?
    Points will be awarded.

    These are standards for character encoding.
    See below for a brief description:
    UTF-16 (16-bit Unicode Transformation Format) is a variable-length character encoding for Unicode, capable of encoding the entire Unicode repertoire. The encoding form maps code points (characters) into a sequence of 16-bit words, called code units. For characters in the Basic Multilingual Plane (BMP) the resulting encoding is a single 16-bit word. For characters in the other planes, the encoding will result in a pair of 16-bit words, together called a surrogate pair. All possible code points from U+0000 through U+10FFFF, except for the surrogate code points U+D800–U+DFFF, are uniquely mapped by UTF-16 regardless of the code point's current or future character assignment or use.
    UTF-8 (8-bit UCS/Unicode Transformation Format) is a variable-length character encoding for Unicode. It is able to represent any universal character in the Unicode standard, yet the initial encoding of byte codes and character assignments for UTF-8 is consistent with ASCII (requiring little or no change for software that handles ASCII but preserves other values). For these reasons, it is steadily becoming the preferred encoding for e-mail, web pages, and other places where characters are stored or streamed.
    Check this site for details.
    http://unicode.org/.
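For the specific question about the largest code point, U+10FFFF, a small Java sketch (names chosen here for illustration) shows both forms side by side: four bytes in UTF-8, and a surrogate pair of two 16-bit code units in UTF-16.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class LargestCodePoint {
    // Encodes a single code point in the given charset and returns hex.
    static String encodedHex(int codePoint, Charset cs) {
        String s = new String(Character.toChars(codePoint));
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(cs)) {
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    static String utf8Hex(int codePoint) {
        return encodedHex(codePoint, StandardCharsets.UTF_8);
    }

    // Big-endian UTF-16 without a byte order mark, so the hex reads as code units.
    static String utf16Hex(int codePoint) {
        return encodedHex(codePoint, StandardCharsets.UTF_16BE);
    }

    public static void main(String[] args) {
        System.out.println(utf8Hex(0x10FFFF));  // F48FBFBF (four bytes)
        System.out.println(utf16Hex(0x10FFFF)); // DBFFDFFF (surrogate pair DBFF, DFFF)
        System.out.println(utf16Hex(0x4E2D));   // 4E2D (BMP character, one code unit)
    }
}
```

The surrogate pair follows from subtracting 0x10000 from the code point and splitting the remaining 20 bits: the top ten are added to 0xD800 and the bottom ten to 0xDC00.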

  • Chromium 8 and Chinese characters

    Chromium 8 can't upload any file with Chinese characters in its filename. Does anybody have the same problem? I hope there are many Chinese users here.
    Thanks!

    Maybe it's the website that doesn't accept Chinese filenames? What encoding are your files named in (probably GB, Big5, or UTF-8)? It works fine for me with UTF-8 and Gmail.

  • Conversion of a String to UTF-8 and vice versa

    I have a problem: from an IBM host via CICS I get
    the German letters, for example: "üöä" = x'81' x'94' x'84'. In a C program (with JNI) I use the method NewStringUTF to convert these characters to a Java String. The result seems to be correct: I can see the exact German characters in Swing and AWT components.
    Then, to convert this String back to the original host codes with the method GetStringUTFChars in the same C program, I get 2 unknown, confused bytes for the 1 correct byte I expected. This effect occurs only with the special German characters äöüÄÖÜß!
    Who can help?

    Having been through these kinds of problems a few times, I MAY be able to point you in the right direction.
    1. You need to be VERY sure what you are seeing at each stage of the conversion. DON'T TRUST ANY DISPLAYS EXCEPT HEX DISPLAYS.
    2. If you are operating on a Windows machine, you might investigate OEMTOChar and CharToOEM. I mention this because I suspect that your original encoding is not UTF-8, and so NewStringUTF is doing something strange.
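The suspicion in point 2 can be checked with a strict decoder. The sketch below uses invented names and only standard Java charsets; it tests whether the three bytes from the question (x'81' x'94' x'84') are valid UTF-8 at all, and shows why one byte turns into two on the way back: in UTF-8 a character such as ü (U+00FC) always takes two bytes. Note that NewStringUTF and GetStringUTFChars work with modified UTF-8, so single-byte host codes would need an explicit conversion step rather than being passed through directly.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class HostBytes {
    // Returns true only if the bytes form well-formed UTF-8.
    static boolean isValidUtf8(byte[] bytes) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    // Number of bytes a string occupies once encoded as UTF-8.
    static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        byte[] hostBytes = { (byte) 0x81, (byte) 0x94, (byte) 0x84 };
        System.out.println(isValidUtf8(hostBytes)); // false: these are not UTF-8
        System.out.println(utf8Length("\u00FC"));   // 2: one character, two bytes
    }
}
```

This fits the advice in point 1 as well: comparing hex values at each stage shows which step changes the bytes.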

  • OWB showing unrecognisable/Chinese characters for non-Oracle schema

    Hi
    I am trying to implement a warehouse using OWB (11g) with a Peoplesoft data source.
    I have installed the database gateway for SQL Server and created a database link which allows me to query the PeopleSoft data through SQL plus, no problem.
    However, when I try to import the metadata (create the tables) in OWB, the table list within the Object selection appears in unrecognisable/Chinese characters. This is also the case if I try to select a schema within the Location setup for this connection.
    Any ideas anyone please?

    Thanks for the reply.
    The target db has the following:
    NLS_LANGUAGE = AMERICAN
    NLS_TERRITORY = AMERICA
    NLS_CHARACTERSET = AL32UTF8
    I have tried setting the following parameters within the init<>.ora file:
    HS_LANGUAGE=AMERICAN_AMERICA.WE8MSWIN1252
    HS_NLS_NCHAR=UCS2
    but no joy

  • Oracle Report output is coming in English and junk characters

    Hi ,
    I am facing an issue with Oracle Report output in R12.
    The report output is coming out in English and junk characters.
    This is a custom report,
    migrated from an 11i to an R12 instance. It works fine in 11i with output as PDF.
    Sample output is attached.

    Pl see if MOS Doc 1321874.1 is relevant

  • How to put in subscript and superscript characters

    Hi:
    Does anyone know how to insert subscript and superscript
    characters into Dreamweaver. I am doing a site for an
    engineering/manufacturing firm and I am trying to create a
    subscript 2 like one would see in H2O. The "2" should be small and
    lower than the "H" and the "O". I've tried creating one in WORD and
    cutting and pasting but that will not work and when I use the
    <sub>2</sub> code that does not seem ideal as it tends
    to slightly push down the line below once the character is in and I
    don't feel like changing the line spacing on the entire site. The
    other thing I can't seem to do is make a tiny "Registered" ®
    character. Simply cutting and pasting from WORD, the small
    superscripted symbol gets made into a large ® once I paste it
    in. Can anyone assist?

    Add this to your CSS -
    sub { position: relative; bottom: 0; left: .2ex; font-size: 80%; }
    Then use the <sub> tag.
    Murray --- ICQ 71997575
    Adobe Community Expert

  • Chinese and Korean characters not displaying in navigation pane

    I have an issue with Chinese and Korean characters not displaying on the tabs in the navigation pane:
    I have 2 RoboHelp projects (using RoboHelp 8 with the updates installed) to generate WebHelp, one in Simplified Chinese, the other Korean. The HTML files, .HHC and .HHK were sent out for translations. I have set the appropriate language in the project settings, everything almost works, except for the text on the Contents, Index, and Search tabs. (I'm not using a skin.) I have read in various threads that the lng file should be edited using the RoboHelp interface, and this seems to be the crux of the problem. The characters do not display correctly through the RoboHelp UI.
    The computer on which I generate the WebHelp is Windows Server 2003, and I do not have the language packs installed. (And am having problems getting hold of the language packs to install to see if this fixes the problem.)
    Aside from installing the language packs, is there anything else I can try to help resolve the problem? (The characters in the content are displaying as expected.)
    Any assistance is greatly appreciated

    Perhaps something in the Translation Info section of this page might help? (The specifics are for Japanese, but I believe the issue would apply to all double-byte languages).
    http://helpware.net/FAR/far_faq.htm#JapComp

  • Using SQL*Loader to Load Russian and Chinese Characters

    We are testing our new 11.2.0.1 database using Oracle Linux 6. We created the database using the AL32UTF8 NLS character set. We have tried using sqlldr to insert a few records that contain Russian and Chinese characters as a test. We cannot seem to get them into the database in the correct format. For example, we can see the correct characters in the file we are trying to load on the Linux server, but once we load them into a table in the database, some of the characters are not displayed correctly (using SQL*Developer to select them out).
    We can set the values within a column on the table by inserting them into the table and then select them out, and they are correct, so it appears the problem is not in the database but in the way sqlldr inserts them. We have tried several settings on the Linux server to set the NLS_LANG environment to AMERICAN_AMERICA.AL32UTF8, AMERICAN_AMERICA.UTF8, etc. without success.
    Can someone provide us with any guidance on this? Would really appreciate any advice as to what we are not getting here.
    Thanks!!

    The characterset of the database does not change the language used in your input data file. The character set of the datafile can be set up by using the NLS_LANG parameter or by specifying a SQL*Loader CHARACTERSET parameter. I suggest to move this question to the appropriate forum: Export/Import/SQL Loader & External Tables for closer topic alignment.

  • Identify UTF-8 and UTF-16 formats

    hi,
    Clients submit their Unicode messages (Arabic, Telugu, etc.) in hex format; our application then accepts the message and processes it.
    But there are many tools on the market which will convert the Unicode to UTF-8 and UTF-16 formats,
    so I need to identify whether the message is in
    UTF-8 or
    UTF-16 or
    hex(no problem)
    something like
    isUTF8(String message)
    isUTF16(String message)
    so that i can convert them back to hex and dump it into database.
    regards
    Heral raj

    You can identify whether it is UTF-16 or UTF-8 by looking at its BOM (byte order mark), if one is present. The BOM is the first 2 bytes of the stream for UTF-16 (FE FF or FF FE) and the first 3 bytes for UTF-8 (EF BB BF).
    Check this link http://www.websina.com/bugzero/kb/unicode-bom.html
    I do not think implementation should be a problem
    Thanks
    Gaurav
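A BOM check along these lines could be sketched as below. The class name and the returned labels are made up for illustration, and the method works on raw bytes (for example, the hex message decoded to a byte array) rather than on a String as in the isUTF8/isUTF16 signatures from the question. Since the BOM is optional, "unknown" here means only that no BOM was found, not that the data is in some other encoding.

```java
public class BomSniffer {
    // Checks whether data starts with the given byte values.
    static boolean startsWith(byte[] data, int... prefix) {
        if (data.length < prefix.length) {
            return false;
        }
        for (int i = 0; i < prefix.length; i++) {
            if ((data[i] & 0xFF) != prefix[i]) {
                return false;
            }
        }
        return true;
    }

    // Returns a label for the BOM at the start of data, or "unknown".
    // Caveat: FF FE followed by 00 00 is also the UTF-32LE BOM; this
    // sketch does not try to tell the two apart.
    static String detect(byte[] data) {
        if (startsWith(data, 0xEF, 0xBB, 0xBF)) {
            return "UTF-8";
        }
        if (startsWith(data, 0xFE, 0xFF)) {
            return "UTF-16BE";
        }
        if (startsWith(data, 0xFF, 0xFE)) {
            return "UTF-16LE";
        }
        return "unknown";
    }

    public static void main(String[] args) {
        System.out.println(detect(new byte[] { (byte) 0xEF, (byte) 0xBB, (byte) 0xBF, 0x41 }));
        System.out.println(detect(new byte[] { (byte) 0xFE, (byte) 0xFF, 0x06, 0x43 }));
        System.out.println(detect(new byte[] { 0x41, 0x42 }));
    }
}
```

For messages without a BOM, a strict UTF-8 decode attempt is a common fallback, since most non-UTF-8 byte sequences fail validation.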

  • How do I remove spaces and special characters from the file name during rendering?

    I understand that I can set LR_renamingTokensOn to true, but I would like to replace all spaces in the file name with an underscore and remove characters not in the range A-Z and 0-9. What's the easiest way to achieve this?

    local photo = catalog:getTargetPhoto()
    local sesn = LrExportSession {
        photosToExport = { photo },
        exportSettings = {
            -- ... (determine from export preset) - whatever you want, just be sure you set the export directory: LR_export_destinationPathPrefix
            LR_tokens = "{{custom_token}}",
            LR_tokenCustomString = LrPathUtils.removeExtension( photo:getFormattedMetadata( 'fileName' ) ):gsub( "[ %c]", "" ), -- remove spaces and control characters
        },
    }
    sesn:doExportOnNewTask()
