UTF-8 and Chinese Characters

I have a JSP with the following line at the very top:
<%@ page contentType="text/html; charset=utf-8"%>
This is so that it will use UTF-8 encoding to display non-English characters. Doing this allows me to display Arabic, Hebrew, and English characters that are encoded in UTF-8 format (e.g. \u0643). However, I still cannot display Chinese characters. For example, I have the String \u4E2D being read from a file and output to the JSP (no different from my other non-English characters) and it does not display properly (I only see a box in its place). Can anyone tell me why this is?
I do not have the proper Chinese character set downloaded; however, I don't understand how the Hebrew and Arabic display properly when I never explicitly downloaded any sort of character set for them.
Thanks in advance.

;-D
I'm only human!
I certainly agree that UTF-8 should work. Just thought that trying a couple of other encodings might work faster than trying to figure out why UTF-8 wasn't doing the job!
As for where the character set is stored... both IE6 and the JDK will have knowledge of the character set. However, this doesn't automatically mean that they are able to display it. Both require the right font to be able to do this, and neither English Windows IE6 nor the JDK carries a font as standard that is able to display the Chinese character set. By installing the Chinese language pack, the font has now been provided, which is why everything's working happily.
As for being able to prompt the user to download this, I'm not entirely sure whether this is possible these days. This certainly happened in Windows 9x/NT4, where IE prompted you to download the pack, but this proved to be such an unpopular method that M$ took the prompt out, and as of Win2K they expect you to install it off disc.
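To separate an encoding problem from a font problem, it can help to look at the bytes actually produced. What follows is only a minimal sketch in plain Java with no JSP involved; the class and method names are made up for illustration. If \u4E2D comes out as the three bytes E4 B8 AD, the UTF-8 side is doing its job and the box on screen is most likely a missing font.

```java
import java.nio.charset.StandardCharsets;

public class EncodeCheck {
    /** Returns the UTF-8 bytes of s as space-separated upper-case hex. */
    static String utf8Hex(String s) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            sb.append(String.format("%02X ", b & 0xFF));
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        System.out.println(utf8Hex("\u0643")); // Arabic kaf: D9 83
        System.out.println(utf8Hex("\u4E2D")); // Chinese "middle": E4 B8 AD
    }
}
```

Both characters encode cleanly, so the difference between them has to be in what the browser can draw, not in the charset declaration.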
Hope that helps!
Martin Hughes

Similar Messages

UTF-8 encoding Funny characters in DB

Dear All,
I am facing a critical issue that has been dragging on for three days, and I still can't figure it out.
I have an XML file that I read from a server using
<cffile action="READ" file="#application.settings.paths" variable="xml" charset="utf-8">
and then parse with XMLPARSE, and I try to insert the XML data into a database. In the XML I have the characters "masterâ€™s", where the odd characters should be a single quote, but when saving to the database using a ColdFusion query the data is stored as funny characters (i.e., "master?s").
The XML encoding is UTF-8, and I don't know how to convert those junk characters to normal characters like "master's" (with a single quote).
Here are the things I tried.
i) In ColdFusion Administrator I added the connection string "useUnicode=true&characterEncoding=UTF-8" and checked the box which says "enable unicode for datasources configured for non-latin characters".
ii) Used a ConvertCharset function, passing the XML object:
<cfscript>
function convertCharset(str, charsetFrom, charsetTo) {
    var resultStr = "";
    var javaString = "";
    var byteArray = "";
    // Encode the string to bytes in charsetFrom, then decode those bytes as charsetTo
    javaString = CreateObject("java", "java.lang.String");
    javaString.init(str);
    byteArray = javaString.getBytes(charsetFrom);
    resultStr = CreateObject("java", "java.lang.String");
    resultStr.init(byteArray, charsetTo);
    return resultStr.toString();
}
</cfscript>
<cfcontent type="text/html; charset=UTF-8">
<cfset setEncoding("URL", "UTF-8")>
<cfset setEncoding("Form", "UTF-8")>
I also tried this method:
http://www.bennadel.com/blog/1206-Content-Is-Not-Allowed-In-Prolog-ColdFusion-XML-And-The-Byte-Order-Mark-BOM-.htm
Please let me know if I need to do anything other than the above methods.
Thanks

I am using a SQL Server 2005 database.
The field is "Description", Varchar(2000).
> did you perform your test using the same table, code, etc.?
Yes.
> did you read in & dump out the xml file?
Yes, I dumped the XML file, and if I open it in Notepad as UTF-8 (file type) then I see a single quote instead of that different character.
> is it really utf-8?
So I think it's UTF-8.
> if your mojibake chars are from an ms word document, then they're not utf-8 but a superset of latin-1.
They are not from MS Word. I got an XML file which has all the course and presentation information, structured properly except for those characters, like ("younus has a Bachelorâ€™s degree"). I see that in UTF-8.
So I want to know which format I need to convert to when saving to the database (SQL Server 2005).
Thanks.
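For anyone trying to picture where "â€™" and "?" come from, here is a rough Java sketch of one plausible mechanism. It assumes the original character is the right single quotation mark U+2019 and that windows-1252 is the charset being wrongly applied; neither is confirmed in the thread, and the class and method names are invented for the example.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class MojibakeDemo {
    // Assumption: the wrong charset in play is windows-1252.
    static final Charset CP1252 = Charset.forName("windows-1252");

    /** Decodes the UTF-8 bytes of s with the wrong charset, reproducing the garbling. */
    static String garble(String s) {
        return new String(s.getBytes(StandardCharsets.UTF_8), CP1252);
    }

    /** Reverses garble(): re-encodes as windows-1252, then decodes as UTF-8. */
    static String repair(String s) {
        return new String(s.getBytes(CP1252), StandardCharsets.UTF_8);
    }

    /** Pushes s through a charset that cannot represent U+2019; Java substitutes '?'. */
    static String lossy(String s) {
        byte[] latin1 = s.getBytes(StandardCharsets.ISO_8859_1);
        return new String(latin1, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String original = "master\u2019s";
        System.out.println(garble(original));          // masterâ€™s
        System.out.println(repair(garble(original)));  // master’s
        System.out.println(lossy(original));           // master?s
    }
}
```

The repair step only round-trips when every garbled byte has a windows-1252 mapping, so it is better treated as a diagnostic than as a fix. The "?" case is lossy: once the character has been replaced, nothing downstream can bring it back.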

DW CS3 rewrites and destroys characters

DW CS3 rewrites characters wrongly in the text in the HTML when an image is placed into the document by Fireworks. I have tested it on a correctly formatted document with all the important UTF-8 and doctype stuff. On a blank document containing only the initial image placeholder from DW, characters are also rewritten, but with different characters than on an XHTML page.
So what I do is insert an image placeholder in DW, then edit the placeholder by pressing Edit on the Properties panel. I save both the PNG and a GIF with Fireworks. Fireworks makes DW rewrite the image tag. When this rewrite occurs, all Nordic characters are rewritten as some crazy stuff.

Put this somewhere between <head> and </head>
<head>
<meta http-equiv="Content-Type" content="text/html;charset=UTF-8">
Dave
"Jane Smith 2300" <[email protected]> wrote in message
news:gqe3pp$jqd$[email protected]..
> Hi,
> I have a form for submitting answers to questions at my Japanese website. The
> form is an .aspx file.
>
> I want to be able to type some Japanese that would show up on the form. When
> viewers see the Japanese (it's a question for them to answer), then they type
> in Japanese in the form response section.
>
> My problem is that I cannot see the form properly in design view of DW CS4 but
> I can easily find and change the correct line in CODE view. But, in code view I
> can't type in Japanese. I keep getting the error mentioned below.
>
> So.....is there some way to be able to type in Japanese in code view, so that
> the question I am typing shows up?
>
> The error that shows up is:
>
> The document's current encoding can not correctly save all of the characters
> within the document. You may want to change to UTF-8 or an encoding that
> supports the special characters in this document.
>
> How do I change to UTF-8 coding, etc.so I can type in CODE view in Japanese????
>
> Thanks in advance for any help offered.
>
> Jane
>

XMLParser and Special Characters

Hi,
I'm trying to read in an XML document from a stream (e.g. a file) using XMLParser. The document contains German text (i.e. lots of special characters like the umlauts ä, ö, ü and others).
If I read this stream into a text string, all these special characters are perfectly handled (i.e. ü looks like a ü, etc.).
However, if I import the stream into an XMLParser.Document using ImportDocument, the umlauts seem to be scrambled. If the imported document is exported again to a stream without any changes (using ExportDocument), the umlauts are not displayed correctly anymore.
Example Stream:
<?xml version="1.0" encoding="iso-8859-1" ?>
<UserID>Müller</UserID>
If this stream is imported into an XMLParser.Document and then exported again it contains
<UserID>M��ller</UserID>
I'm using correct XML encoding iso-8859-1 which is for western european languages and I guess it should not be a Forte locale issue since simple string handling of the stream works fine.
Thanks for any hints,
Daniel

Let's start with the basics. Right now you are quite limited by your database character set, US7ASCII. You need to migrate to something that will support Latin and Greek characters at least, maybe EL8ISO8859P7 or UTF-8. Please look at the documentation for the Scanner Utility, available for Oracle 8.1.6 and above, to make sure the migration is safe before doing any import/export. The title of the paper is: Database Character Set Migration, at: http://technet.oracle.com/products/oracle8i/listing.htm#nls
UTF-8 will give you more versatility in the languages that your customer supports now or in the future. There is some performance overhead in using Unicode, but how much varies. I would base a large part of the Unicode decision on how likely it is that other languages will need to be supported in the future, and on special character support.
The special characters that your customer would like to support may already exist in Unicode. If they don't, or you choose another character set, then your customer will need to look at the National Language Support Guide, Appendix 'B' "Customizing Locale Data".
Are you running Greek windows? Otherwise how will you enter Greek characters? If you are using Greek windows you probably need to set your client NLS_LANG to EL8MSWIN1253.
On your Forms questions you might want to take a look at the following :
1. Chapter 4 of "Oracle Forms Developer and Reports Developer Release 6i: Guidelines for Building
Applications" discusses How to design MultiLingual Applications.
http://otn.oracle.com/docs/products/forms/doc_index.htm

Unicode, UTF-8 and java servlet woes

Hi,
I'm writing a content management system for a website about Russian music.
One problem I'm having is trying to get a Java servlet to talk Unicode to the content management client.
The client makes a request for a band, the server then sends the XML to the client.
The XML reading works fine and the client displays the unicode fine from an XML file read locally (so the XMLReader class works fine).
The servlet unmarshals the request perfectly (its just a filename).
I then find the correct class, and pass it through the XML writer. that returns the XML as string, that I simply put into the output stream.
out.write(XMLWrite(selectedBand));
I have set the correct header property:
response.setContentType("text/xml; charset=UTF-8");
And to read it I use:
         //Make our URL
         URL url = new URL(pageURL);
         HttpURLConnection conn = (HttpURLConnection)url.openConnection();
         conn.setRequestMethod("POST");
         conn.setDoOutput(true); // want to send
         conn.setRequestProperty( "Content-type", "application/x-www-form-urlencoded" );
         conn.setRequestProperty( "Content-length", Integer.toString(request.length()));
         conn.setRequestProperty("Content-Language", "en-US");
         //Add our paramaters
         OutputStream ost = conn.getOutputStream();
         PrintWriter pw = new PrintWriter(ost);
         pw.print("myRequest=" + URLEncoder.encode(request, "UTF-8")); // here we "send" our body!
         pw.flush();
         pw.close();
         //Get the input stream
         InputStream ois = conn.getInputStream();
            InputStreamReader read = new InputStreamReader(ois);
         //Read
         int i;
         String s="";
         Log.Debug("XMLServerConnection", "Responce follows:");
         while((i = read.read()) != -1 ){
          System.out.print((char)i);
          s += (char)i;
         }
         return s;
Now when I print
read.getEncoding()
it claims:
ISO8859_1
Something's wrong there, so I forced it to accept UTF-8:
InputStreamReader read = new InputStreamReader(ois,"UTF-8");
It now claims it's
UTF8
However, all of the data has lost its Unicode: every non-ASCII character is replaced with a question mark! This happens even when I don't force the input stream to be UTF-8.
More so if I view the page in my browser, it does the same thing.
I've had a look around and I can't see a solution to this. Have I set something up wrong?
I've set, "-encoding utf8" as a compiler flag, but I don't think this would affect it.

I don't know what your problem is but I do have a couple of comments -
1) In conn.setRequestProperty( "Content-length", Integer.toString(request.length())); the length of your content is not request.length(). It is the length in bytes of the URL-encoded data.
2) Why do you need to send URL-encoded data? Why not just send the bytes?
3) If you send bytes then you can write straight to the OutputStream and you won't need to convert to characters to write to a PrintWriter.
4) Since you are reading from the connection you need to set setDoInput() to true.
5) You need to get the character encoding from the response so that you can specify the encoding in InputStreamReader read = new InputStreamReader(ois, characterEncoding);
6) Reading a single char at a time from an InputStream is very inefficient.
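Points 5 and 6 above could be sketched roughly like this. The helper names (charsetOf, readAll) are made up for the example, and falling back to ISO-8859-1 when the header carries no charset is an assumption of this sketch, not something the thread settles.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.util.Locale;

public class ResponseCharset {
    /** Extracts the charset parameter from a Content-Type value, or returns the fallback. */
    static Charset charsetOf(String contentType, Charset fallback) {
        if (contentType == null) {
            return fallback;
        }
        for (String part : contentType.split(";")) {
            String p = part.trim();
            if (p.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                String name = p.substring("charset=".length()).trim().replace("\"", "");
                try {
                    return Charset.forName(name);
                } catch (IllegalArgumentException e) {
                    // Unknown or malformed charset name: use the fallback.
                    return fallback;
                }
            }
        }
        return fallback;
    }

    /** Reads the whole stream in blocks, decoding with the given charset. */
    static String readAll(InputStream in, Charset cs) {
        StringBuilder sb = new StringBuilder();
        try (Reader r = new BufferedReader(new InputStreamReader(in, cs))) {
            char[] buf = new char[4096];
            int n;
            while ((n = r.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }
}
```

With the connection from the question this would be used as readAll(conn.getInputStream(), charsetOf(conn.getContentType(), StandardCharsets.ISO_8859_1)). Note that it can only help if the servlet really sends UTF-8 bytes; if the question marks are already in the response body, the damage is happening on the server side.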

How do we represent the largest code point in UTF-8 and UTF-16? What's the difference?

How do we represent the largest code point in UTF-8 and UTF-16, and what's the difference?
Points will be awarded.

These are standards for character encoding.
See below for a brief description:
UTF-16 (16-bit Unicode Transformation Format) is a variable-length character encoding for Unicode, capable of encoding the entire Unicode repertoire. The encoding form maps code points (characters) into a sequence of 16-bit words, called code units. For characters in the Basic Multilingual Plane (BMP) the resulting encoding is a single 16-bit word. For characters in the other planes, the encoding will result in a pair of 16-bit words, together called a surrogate pair. All possible code points from U+0000 through U+10FFFF, except for the surrogate code points U+D800 to U+DFFF, are uniquely mapped by UTF-16 regardless of the code point's current or future character assignment or use.
UTF-8 (8-bit UCS/Unicode Transformation Format) is a variable-length character encoding for Unicode. It is able to represent any universal character in the Unicode standard, yet the initial encoding of byte codes and character assignments for UTF-8 is consistent with ASCII (requiring little or no change for software that handles ASCII but preserves other values). For these reasons, it is steadily becoming the preferred encoding for e-mail, web pages, and other places where characters are stored or streamed.
Check this site for details.
http://unicode.org/.
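To make the difference concrete for the largest code point, U+10FFFF, here is a small Java sketch (class and method names are only for illustration). UTF-8 needs four bytes for it, while UTF-16 needs two 16-bit code units, i.e. a surrogate pair.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class MaxCodePoint {
    /** Encodes a single code point in the given charset and returns the bytes as hex. */
    static String hex(int codePoint, Charset cs) {
        String s = new String(Character.toChars(codePoint));
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(cs)) {
            sb.append(String.format("%02X ", b & 0xFF));
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        int max = Character.MAX_CODE_POINT; // 0x10FFFF
        System.out.println(hex(max, StandardCharsets.UTF_8));    // F4 8F BF BF
        System.out.println(hex(max, StandardCharsets.UTF_16BE)); // DB FF DF FF
    }
}
```

The UTF-16 result is the high surrogate DBFF followed by the low surrogate DFFF, shown here in big-endian byte order without a byte order mark.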

Chromium 8 and Chinese characters

Chromium 8 can't upload any file with Chinese characters in its filename. Does anybody have the same problem? I hope there are many Chinese users here.
Thanks!

Maybe it's the website that doesn't accept Chinese filenames? What encoding are your files named in (probably GB, Big5, or UTF-8)? It works fine for me with UTF-8 and Gmail.

Converting a String to UTF-8 and vice versa

I have a problem: from an IBM host via CICS I get
the German letters, for example: "üöä" = x'81' x'94' x'84'. In a C program (with JNI) I use the method NewStringUTF to convert these characters to a Java String. The result seems to be correct. I can see the exact German characters in Swing and AWT components.
Then, to convert this String back to the original host codes with the method GetStringUTFChars in the same C program, I get 2 unknown, confused bytes for the 1 correct byte I expected. This effect takes place only with the special German characters üöä!
Who can help?

Having been through these kinds of problems a few times, I MAY be able to point you in the right direction.
1. You need to be VERY sure what you are seeing at each stage of the conversion. DON'T TRUST ANY DISPLAYS EXCEPT HEX DISPLAYS.
2. If you are operating on a Windows machine, you might investigate OEMTOChar and CharToOEM. I mention this because I suspect that your original encoding is not UTF-8, and so NewStringUTF is doing something strange.
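If the suspicion in point 2 is right, one way to check it from the Java side is sketched below. This is only a sketch under an assumption: the byte values x'81' x'94' x'84' match ü, ö, ä in the DOS code pages 437 and 850, so IBM850 is used here as a guess at the real encoding, and it has to be available in your JDK. The class and method names are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class HostBytes {
    // Assumption: the host data arrives in code page 850, not UTF-8.
    static final Charset OEM = Charset.forName("IBM850");

    /** Decodes single-byte host data into a Java String. */
    static String decode(byte[] hostBytes) {
        return new String(hostBytes, OEM);
    }

    /** Encodes a Java String back into single-byte host data. */
    static byte[] encode(String s) {
        return s.getBytes(OEM);
    }

    public static void main(String[] args) {
        byte[] fromHost = { (byte) 0x81, (byte) 0x94, (byte) 0x84 };
        String text = decode(fromHost);
        System.out.println(text.length());                                // 3
        System.out.println(encode(text).length);                          // 3
        System.out.println(text.getBytes(StandardCharsets.UTF_8).length); // 6
    }
}
```

The last line shows why two bytes come back for each umlaut: in UTF-8 each of these letters takes two bytes, and GetStringUTFChars returns a UTF-8 style encoding rather than the original code page. Passing a byte[] across JNI and converting with an explicit charset on the Java side avoids that mismatch.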

OWB showing unrecognisable/Chinese characters for Non-Oracle schema

Hi
I am trying to implement a warehouse using OWB (11g) with a Peoplesoft data source.
I have installed the database gateway for SQL Server and created a database link which allows me to query the PeopleSoft data through SQL plus, no problem.
However, when I try to import the metadata (create the tables) in OWB, the table list within the Object selection appears in unrecognisable/Chinese characters. This is also the case if I try to select a schema within the Location setup for this connection.
Any ideas anyone please?

Thanks for the reply.
The target db has the following:
NLS_LANGUAGE = AMERICAN
NLS_TERRITORY = AMERICA
NLS_CHARACTERSET = AL32UTF8
I have tried setting the following parameters within the init<>.ora file:
HS_LANGUAGE=AMERICAN_AMERICA.WE8MSWIN1252
HS_NLS_NCHAR=UCS2
but no joy

Oracle Report output is coming in English and junk characters

Hi ,
I am facing an issue with Oracle report output in R12.
The report output is coming out in English and junk characters.
This report is a custom report.
It was migrated from an 11i to an R12 instance. It is working fine in 11i with output as PDF.
Sample output is attached.

Pl see if MOS Doc 1321874.1 is relevant

How to put in subscript and superscript characters

Hi:
Does anyone know how to insert subscript and superscript characters into Dreamweaver? I am doing a site for an engineering/manufacturing firm and I am trying to create a subscript 2 like one would see in H2O. The "2" should be small and lower than the "H" and the "O". I've tried creating one in Word and cutting and pasting, but that will not work, and when I use the <sub>2</sub> code that does not seem ideal, as it tends to slightly push down the line below once the character is in, and I don't feel like changing the line spacing on the entire site. The other thing I can't seem to do is make a tiny "Registered" ® character. Simply cutting and pasting from Word, the small superscripted symbol gets made into a large ® once I paste it in. Can anyone assist?

Add this to your CSS -
sub { position: relative; bottom: 0; left:.2ex; font-size: 80%;}
Then use the <sub> tag.
Murray --- ICQ 71997575
Adobe Community Expert
(If you *MUST* email me, don't LAUGH when you do so!)
==================
http://www.projectseven.com/go - DW FAQs, Tutorials & Resources
http://www.dwfaq.com - DW FAQs, Tutorials & Resources
==================

Chinese and Korean characters not displaying in navigation pane

I have an issue with Chinese and Korean characters not displaying on the tabs in the navigation pane:
I have 2 RoboHelp projects (using RoboHelp 8 with the updates installed) to generate WebHelp, one in Simplified Chinese, the other Korean. The HTML files, .HHC and .HHK were sent out for translations. I have set the appropriate language in the project settings, everything almost works, except for the text on the Contents, Index, and Search tabs. (I'm not using a skin.) I have read in various threads that the lng file should be edited using the RoboHelp interface, and this seems to be the crux of the problem. The characters do not display correctly through the RoboHelp UI.
The computer on which I generate the WebHelp is Windows Server 2003, and I do not have the language packs installed. (And am having problems getting hold of the language packs to install to see if this fixes the problem.)
Aside from installing the language packs, is there anything else I can try to help resolve the problem? (The characters in the content are displaying as expected.)
Any assistance is greatly appreciated

Perhaps something in the Translation Info section of this page might help? (The specifics are for Japanese, but I believe the issue would apply to all double-byte languages).
http://helpware.net/FAR/far_faq.htm#JapComp

Using SQL*Loader to Load Russian and Chinese Characters

We are testing our new 11.2.0.1 database using Oracle Linux 6. We created the database using the AL32UTF8 NLS Character set. We have tried using sqlldr to insert a few records that contain Russian and Chinese characters as a test. We can not seem to get them into the database in the correct format. For example, we can see the correct characters in the file we are trying to load on the Linux server, but once we load them into a table in the database, some of the characters are not displayed correctly (using SQL*Developer to select them out).
We can set the values within a column on the table by inserting them into the table directly and then select them out, and they are correct, so it appears the problem is not in the database but in the way sqlldr inserts them. We have tried several settings on the Linux server to set the NLS_LANG environment variable to AMERICAN_AMERICA.AL32UTF8, AMERICAN_AMERICA.UTF8, etc. without success.
Can someone provide us with any guidance on this? Would really appreciate any advice as to what we are not getting here.
Thanks!!

The character set of the database does not change the encoding used in your input data file. The character set of the data file can be set by using the NLS_LANG parameter or by specifying the SQL*Loader CHARACTERSET parameter. I suggest moving this question to the appropriate forum, Export/Import/SQL Loader & External Tables, for closer topic alignment.

Identify UTF-8 and UTF-16 formats

hi,
Clients submit their Unicode messages (Arabic, Telugu, etc.) in hex format; our application then accepts each message and processes it.
But there are many tools on the market which will convert the Unicode to UTF-8 and UTF-16 formats.
So I need to identify whether the message is in
UTF-8 or
UTF-16 or
hex(no problem)
something like
isUTF8(String message)
isUTF16(String message)
so that i can convert them back to hex and dump it into database.
regards
Heral raj

You can often identify whether it is UTF-16 or UTF-8 by looking at its BOM (byte order mark), if one is present. The BOM is the first 2 bytes of the stream for UTF-16 (FE FF or FF FE) and the first 3 bytes for UTF-8 (EF BB BF).
Check this link http://www.websina.com/bugzero/kb/unicode-bom.html
I do not think the implementation should be a problem.
Thanks
Gaurav
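A BOM check along those lines could look like the sketch below. The class and method names are made up, and it only covers the UTF-8 and UTF-16 marks. A BOM is optional, so "unknown" does not mean the data is not UTF-8 or UTF-16; it only means no mark was found.

```java
public class BomSniffer {
    /** Returns the encoding implied by a leading byte order mark, or "unknown". */
    static String detect(byte[] b) {
        if (b.length >= 3
                && (b[0] & 0xFF) == 0xEF
                && (b[1] & 0xFF) == 0xBB
                && (b[2] & 0xFF) == 0xBF) {
            return "UTF-8";
        }
        if (b.length >= 2 && (b[0] & 0xFF) == 0xFE && (b[1] & 0xFF) == 0xFF) {
            return "UTF-16BE";
        }
        // Caveat: FF FE followed by 00 00 is the UTF-32LE mark; not handled here.
        if (b.length >= 2 && (b[0] & 0xFF) == 0xFF && (b[1] & 0xFF) == 0xFE) {
            return "UTF-16LE";
        }
        return "unknown";
    }

    public static void main(String[] args) {
        byte[] sample = { (byte) 0xEF, (byte) 0xBB, (byte) 0xBF, 'h', 'i' };
        System.out.println(detect(sample)); // UTF-8
    }
}
```

Since the messages arrive as hex text, they would need to be converted to raw bytes first before a check like this can be applied.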

How do I remove spaces and special characters from the file name during rendering?

I understand that I can set LR_renamingTokensOn to true, but I would like to replace all spaces in the file name with an underscore and remove characters not in the range A-Z and 0-9. What's the easiest way to achieve this?

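local LrApplication = import 'LrApplication'
local LrExportSession = import 'LrExportSession'
local LrPathUtils = import 'LrPathUtils'
local catalog = LrApplication.activeCatalog()
local photo = catalog:getTargetPhoto()
local baseName = LrPathUtils.removeExtension( photo:getFormattedMetadata( 'fileName' ) ):gsub( "[ %c]", "" ) -- remove spaces and control characters (only the first gsub result, the string, is kept)
local sesn = LrExportSession {
    photosToExport = { photo },
    exportSettings = {
        -- ... (determine from export preset) - whatev you want, just be sure you set export directory: LR_export_destinationPathPrefix
        LR_tokens = "{{custom_token}}",
        LR_tokenCustomString = baseName,
    },
}
sesn:doExportOnNewTask()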
