Storing unicode (khmer) woes

Hey all,
I am working on an application that needs to accept the Khmer language in various text inputs and store it in an MSSQL database. I have gotten most of it working, but still have one bug. I can display Khmer characters if they are typed in. If I copy and paste Khmer text directly into my database and query for it, it comes out properly. The issue is when I take Khmer text from a form field and insert it: then it is transformed into a bunch of ?????.
Here are the steps I've taken so far to enable Unicode on my website:
- Configured the datasource to accept high ascii values and unicode
- Configured the database table columns to be of type nvarchar
- Added
     <cfscript>
        SetEncoding("form","utf-8");
        SetEncoding("url","utf-8");
       </cfscript>
       <cfcontent type="text/html; charset=utf-8">
to my application.cfm file.
- Added <META http-equiv="Content-Type" content="text/html; charset=utf-8"> in the head of my pages.
- Added <cfprocessingdirective pageEncoding="utf-8"> on my page that attempts to update the database.
It's weird. If I copy and paste Khmer directly into the DB and query for it, that works fine. If I hard-code some Khmer on a page, that displays fine too. If I type Khmer into a form and dump the form value back out, that works. It's only when a form value is saved to the database and pulled back out that it is mangled. You can see an example of what I'm talking about here.
http://www.psasmart.com/test.cfm
And here is the code that makes that page.
<cfscript>
        SetEncoding("form","utf-8");
        SetEncoding("url","utf-8");
</cfscript>
<cfcontent type="text/html; charset=utf-8">
<META http-equiv="Content-Type" content="text/html; charset=utf-8">
<cfprocessingdirective pageEncoding="utf-8">
If you have the Khmer language pack installed this: <h2>ម៉ោងផ្សាយ-រលកធ</h2> should appear as cambodian text.
<hr />
<form name="submitForm" method="post" accept-charset="utf-8">
     Now enter some Khmer text to save to the database: <input name="text" type="text" value="ម៉ោងផ្សាយ-រលកធ">
    <br />
    <input type="submit" name="submit" value="submit" >
</form>
<cfoutput>
     <cfif isdefined("form.submit")>
          This is the same text as entered in the form: <h2>#form.text#</h2><br />
          <cfquery name="update" datasource="#application.dsn#" >
               Update serverSettings
               SET khmerReadWriteTest ='#form.text#'
          </cfquery>
     </cfif>
     <Cfquery name="getKhemer" datasource="#application.dsn#">
          select khmerReadTest, khmerReadWriteTest
          from serverSettings
     </Cfquery>
     This is the same text as entered in the form but saved to the db and queried for then displayed: <h2>#getKhemer.khmerReadWriteTest#</h2>
     This is some Sample Khmer Text Inputed Directly in the database then queried for and displayed: <h2>#getKhemer.khmerReadTest#</h2>
</cfoutput>

On 2/3/2011 12:17 AM, kenji776 said:
> Hey all, I am working on an application that needs to accept the khmer
> language in various text inputs and mssql database. I have gotten most of it
> working, but still have one bug. I can display khmer characters if they are
> typed in. If I copy and paste khmer text directly in my database, and query
> for it, it comes out properly. The issue is when I take khmer text from a
> form field and insert it, then it is transformed into a bunch of ?????.
you should already know the answer to this. btw it's not just khmer, it's any
unicode encoded text.
first the usual suspects: what db driver? 100% sure you're using the correct dsn?
then this caught my eye: SET khmerReadWriteTest ='#form.text#'
uh, either use cfqueryparam (good practice; besides, you turned on unicode in the
dsn anyway) or unicode hinting:
SET khmerReadWriteTest=N'#form.text#'
guess you didn't look close enough at my "greek test" code
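
The run of question marks is what you would expect if the string literal is treated as non-Unicode text before it reaches the nvarchar column. A minimal Python sketch of that effect, assuming the conversion goes through a single-byte code page such as Windows-1252 (the real code page depends on the database collation, and `to_legacy_code_page` is a hypothetical helper used only for illustration):

```python
# Illustration only: mimics squeezing Unicode text through a single-byte
# code page that has no Khmer characters. Not ColdFusion or SQL Server code.
def to_legacy_code_page(text, code_page="cp1252"):
    """Encode to a legacy code page, substituting '?' for unmappable characters."""
    return text.encode(code_page, errors="replace").decode(code_page)

khmer = "ម៉ោង"
mangled = to_legacy_code_page(khmer)
print(mangled)                      # one '?' per Khmer code point
print(to_legacy_code_page("abc"))   # plain ASCII survives unchanged
```

This is why text pasted straight into the table looks fine while the unprefixed literal does not: the damage happens on the way in, not in the column itself.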

Similar Messages

How to install and use the Khmer unicode Khmer MEF1 and Khmer MEF2?

How to install and use the Khmer unicode Khmer MEF1 and Khmer MEF2?
From www.mef.gov.kh, at the bottom of the page, you can get the Unicode for MS Office.

I think that stuff is for Windows; it's a waste of time trying to use it on a Mac.
There is no need to download anything for Khmer on a Mac. Apple provides Khmer fonts and keyboards with OS X.
OS X Mountain Lion: Type in another language
MS Word for Mac does not support Khmer. Use TextEdit, Pages, Nisus Writer, or LibreOffice instead.

Unicode Khmer font display problems in FrameMaker 10?

How do I get Khmer OS or any Unicode Khmer font to display properly in FrameMaker 10?
I am working with Cambodian files in FrameMaker 10.
This is what it looks like in Word (this is how it should look)
This is how the same text looks in FrameMaker 10...
The font is NiDA Chenla, and I was able to fix this in ID CS5.5 by turning on the World-Composer.
Is there something similar in FrameMaker 10?

Hi CTSRisk,
The Spread and Stretch options in the Character Designer might get it fixed. As I don't know the language, I am assuming the text itself is OK and the only problem is the spacing between the characters.
If so, kindly go to the Character Designer and change the spacing between them.
I apologize for the late response; sometimes it's not possible for us to answer quickly on the forums.
Let me know in case I understood the question wrong.
Harpreet.

Storing Unicode data in SQL Server 2000

hello,
I'm currently developing a website which must store Cyrillic characters in a SQL Server 2000 database. I know the database can store the data correctly because when I use the MS Front end to connect to the database I am able to copy/paste the text into the database columns correctly.
Retrieving the data from the database also works correctly and is displayed with the correct characters.
The problem I am having is that the text seems to get garbled during the SQL insert by the database driver. If I print out the SQL just before it is inserted the characters are still correct, but once they get into the database they are wrong. I'm using the JDBC driver to connect to the database.
Any help would be appreciated,
Alan

I changed the way I was storing the data: instead of storing Unicode text, I get the char value of each letter (i.e. 1041 for Б, which is U+0411) and store those numbers in the database. Then I parse the numbers when I bring them back out and combine them into the original String.
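
The workaround described above can be sketched roughly as follows. This is a Python illustration rather than the poster's Java code; the space-separated decimal format and both function names are assumptions made for the example:

```python
# Hypothetical helpers: store text as decimal code points, then rebuild it.
def to_code_points(text):
    """Turn each character into its decimal code point, space-separated."""
    return " ".join(str(ord(ch)) for ch in text)

def from_code_points(stored):
    """Rebuild the original string from space-separated decimal code points."""
    return "".join(chr(int(number)) for number in stored.split())

stored = to_code_points("Привет")
print(stored)
print(from_code_points(stored))
```

It sidesteps the driver problem because only ASCII digits ever reach the database, at the cost of making the column unreadable and unsearchable as text.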

Storing Unicode to backend - Urgent

Hi,
I have a word like "Ungu\u00ED", which is the equivalent of Unguí (the hex value of í in Unicode is \u00ED).
I have to store it in the database.
How can I store it as Unguí in the database?
I am thinking of using Java code to do that, but am still not clear about how to do it.
TIA
RSrinivasan

Typically you must configure your database to use a single charset. If you intend on storing data from several scripts, you should seriously consider the common Unicode encodings: UTF-16, UTF-8.
Once you've configured your db charset properly, storing "Ungu\u00ED" should be as simple as storing "ABCD"...and the method for storing either string will be the same.
Regards,
John O'Conner
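
If the escape really is present as literal text (a backslash, a "u" and four hex digits) rather than being interpreted by the compiler, it has to be decoded before storing. A minimal Python sketch of that step, assuming only four-digit `\uXXXX` escapes occur; `decode_u_escapes` is a hypothetical name:

```python
import re

# Matches a literal backslash, 'u', and exactly four hex digits.
_ESCAPE = re.compile(r"\\u([0-9a-fA-F]{4})")

def decode_u_escapes(text):
    """Replace each literal \\uXXXX sequence with the character it names."""
    return _ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), text)

raw = r"Ungu\u00ED"          # raw string: the backslash is kept as-is
decoded = decode_u_escapes(raw)
print(decoded)
```

In Java source code the compiler already performs this translation for string literals, so the decoding step only matters when the escaped form arrives as data.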

Plz its URGENT : Storing unicode data in MS SQL Server 2000 through JSPs

Hello All,
I'm trying to store unicode data, entered from JSP page into the SQL Server. For that I've tried the following :
1> I put tag -
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
2> Also set in JSP tag -
<%@page contentType="text/html;charset=UTF-8"%>
But the data is still being entered in ISO-8859-1 format, and I don't know why. I tried a conversion function, private String toUniCode(String strPar); it successfully shows me the Unicode data in an alert message, but it doesn't enter Unicode data in SQL Server. In SQL Server only '??????' gets entered. I set the data type in SQL Server to nvarchar to store the data as Unicode.
Would it be possible for me to accept the data as UTF-8 itself and store it in SQL Server as it is? How can I do that? I'm accepting data in the Marathi language.
Please, anybody help me; I have been trying this for more than a week.
Thanks in advance for any replies!

Hello dmorris800,
Thanks for your help. In fact I tried a lot of alternatives. Later I realised that it was a problem with the driver, not with the code. I was using the jdbc:odbc driver, which doesn't support Unicode, or at least I don't know what the problem with it was.
But I downloaded the driver named 'TaveConnect30C'. It is the connection-optimised driver from Atinav.com. This is the JDBC3 Type 4 driver for MS SQL Server 6.5/7.0/2000, and a trial version can be downloaded from http://www.atinav.com/download.htm
It is a really very good Type 4 driver.
Cheers
-Yogesh

RegExp unicode range woes

Hi
I have a simple regex:
var rtl_match: RegExp = /[\u0041-\u007A]/g
which will match any letter upper case or lower case and it works. Fine.
However when I change the range to look for hebrew characters:
var rtl_match: RegExp = /[\u0590–\u05FF]/g;
it simply refuses to work. In fact it no longer treats the regex as unicode syntax as the above will match the characters u059 and F the actual characters which make up the regex, not the unicode range they define.
You can verify this for yourself if you copy this into a new fla:
//var rtl_match: RegExp = /[\u0041-\u007A]/g;
var rtl_match: RegExp = /[\u0590–\u05FF]/g;
var heb: String = 'השידור לא זמין. אנא נסה שוב או בחר בשידור אחר.';
var lat: String = 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJLKMNOPQRSTUVWXYZ1234567890';
trace(heb.replace(rtl_match, '!'));
trace(lat.replace(rtl_match, '!'));
you should see in the second traced string what I'm talking about.
So that is my question, why on earth should the hebrew unicode range cause my syntax to break down? Is there a work around?
Thanks

No one?
In the end I had to create a more verbose regex stipulating each character of the hebrew alphabet, hardly ideal. I'm a bit surprised no one seems to have encountered this before, it doesn't seem to be an issue on other platforms. There should at least be a comment in the help/live docs detailing what ranges are permitted if what I've observed is correct, it cost me numerous hours trying to work out and fix what was going wrong.
One for Adobe? Is there an official channel through which I can escalate this?
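
One thing worth checking, assuming the pattern in the source file really contains what was posted here: the Hebrew pattern uses an en dash (U+2013) between the two escapes, while the working Latin pattern uses an ASCII hyphen. An en dash does not form a range inside a character class; it is just a third literal character. The Python sketch below illustrates the difference (it is not ActionScript, and the forum software may also have altered the character):

```python
import re

# ASCII hyphen between the endpoints: a real range covering the Hebrew block.
HEBREW_RANGE = re.compile("[\u0590-\u05FF]")
# En dash (U+2013) instead: a class of three literal characters, no range.
HEBREW_EN_DASH = re.compile("[\u0590\u2013\u05FF]")

heb = "השידור לא זמין"

fixed = HEBREW_RANGE.sub("!", heb)     # every Hebrew letter is replaced
broken = HEBREW_EN_DASH.sub("!", heb)  # nothing matches, text is unchanged

print(fixed)
print(broken == heb)
```

If the en dash is the culprit, retyping the hyphen by hand should avoid the need to list every letter of the alphabet.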

Aperture storing and sorting woes

Hello!
I have some trouble coming to terms with Aperture's library, vaults and storing capabilities.
My question is this:
How can I make Aperture keep my pictures folder organized, without having to store the photos in a photo-library file ?
I would like to have my photos stored on one computer, while being able to display them on another, so I have chosen to store the files outside of Aperture's library, so that the files are referenced and the library file itself is small enough to be copied.
A bit of a "hack" if you will, but it is apparently the only way to do something like that.
Now, because of this, I have no way of sorting my photos.
When importing, I can select a folder into which the photos will be dumped, but afterwards, if I move them in Aperture, they stay in the same position. Yes, I know it's pretty obvious, but my problem is this:
I import a lot of photographs, with photos of, say, a MacBook and an iPod. I would first like to dump them in a folder/album called "unsorted", and then later move the MacBook photos into a folder/album called MacBook, and the iPod photos into an iPod album.
But if I do so, the photos themselves stay where they were.

This is not how Aperture works. Instead of fighting with it I would recommend learning how it does work and then making decisions on how you could make the best use of its features.
These may help you:
http://photo.rwboyer.com/2008/09/managing-aperture-2-on-multiple-computers/
http://photo.rwboyer.com/2008/07/apple-aperture-21-organization/
http://photo.rwboyer.com/2008/10/aperture2-vs-lightroom2-file-management/
RB

UTF-8 stored in VARCHAR2 on a non-Unicode DB

Hi there,
we have a company that implements storing Unicode data in Oracle in the following way:
A plain VARCHAR2 on a non-Unicode DB (charset is actually WE8MSWIN1252) receives UTF-8 coded data.
As client and server have the same setting for NLS_LANG, no conversion takes place, and the app will run fine.
(in my eyes, a clean way to set this up would be utilizing NVARCHAR fields for this, but this is no option)
But: how can I query based on these columns without getting garbage for each non-ASCII character?
I imagine setting up views for that purpose, but I need the syntax on how to re-interpret the UTF-8 data coming from a VARCHAR2 field.
I tried the following:
SELECT CONVERT(column, 'WE8MSWIN1252', 'AL32UTF8') FROM table where ...
This will give me the right data on a client with cp 1252 set up, with the restriction to 8 bit output.
Now I would like a Unicode-capable application like SQL*Developer to be fully capable of dealing with the Unicode data, but I guess for that to work I would need the DB to deliver NVARCHAR2 output from the above query?
Any help and comments appreciated.
Tom
Message was edited by: snmdla

> we have a company that implements storing Unicode data in Oracle in the following way:
No - they don't. They are NOT storing unicode data - they are storing individual one-byte characters and using that VARCHAR2 column as a BLOB. Ask them how, or if, they query the data.
> A plain VARCHAR2 on a non-Unicode DB (charset is actually WE8MSWIN1252) receives UTF-8 coded data.
No - it doesn't. It receives a string of one-byte characters in the WE8MSWIN1252 character set. It does not know, or care, what those one-byte characters represent. All you are doing is storing BINARY data in that VARCHAR2 column one byte at a time. When you query it you will get one or several bytes back - but since Oracle thinks it is really character data, when it is actually binary, you can only match it by matching those one-byte characters.
> I imagine setting up views for that purpose, but I need the syntax on how to re-interpret the UTF-8 data coming from a VARCHAR2 field.
You don't have UTF-8 data - you have a BLOB that you need to convert to UTF-8 data. You can use the DBMS_LOB.CONVERTTOCLOB procedure to do the conversion and specify the character set to use. See the DBMS_LOB API
http://docs.oracle.com/cd/B28359_01/appdev.111/b28419/d_lob.htm#i1020356
CONVERTTOCLOB Procedure
This procedure takes a source BLOB instance, converts the binary data in the source instance to character data using the character set you specify, writes the character data to a destination CLOB or NCLOB instance, and returns the new offsets.
See this AskTom article for further review
http://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUESTION_ID:3575852900346063772
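
The "bytes stored as one-byte characters" point can be illustrated outside Oracle. The Python sketch below assumes Python's `cp1252` codec as a stand-in for WE8MSWIN1252; both function names are hypothetical, and this is not the DBMS_LOB conversion itself:

```python
def stored_view(text):
    """What a Windows-1252 client sees when the UTF-8 bytes are read as characters."""
    return text.encode("utf-8").decode("cp1252")

def reinterpret(stored):
    """Recover the text by treating the one-byte characters as UTF-8 bytes again."""
    return stored.encode("cp1252").decode("utf-8")

original = "Grüße €"
garbage = stored_view(original)
print(garbage)                 # several one-byte characters per original character
print(reinterpret(garbage))    # the original text again
```

The round trip only works while every byte survives unchanged. Python's `cp1252` codec, for instance, leaves a handful of byte values undefined, so some UTF-8 sequences cannot pass through it at all, which is one more reason to treat such a column as binary data.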

Khmer unicode font doesn't on premier pro

I'm working on a subtitle project in Premiere Pro CS6 and typing in a Khmer Unicode font. The text appears, but it isn't rendered properly: the consonants and vowels are separated. What should I do?

I reinstalled it already, but nothing changed.

Unable to show Unicode Data in Oracle RESTful Service JSON

Hi Everyone.
I have stored Unicode data in an Oracle database, and when I retrieve it with a SQL query it shows correctly. But when I retrieve the data as JSON using an Oracle RESTful web service (GET), it comes back with unknown characters, as shown below.
next: {},$ref: "http://000.00.00.00:8085/ords/mobile/sch/loginm/?user=SURESH&pwd=123&page=1"
items: [
uri: {},$ref: "http://000.00.00.00:8085/ords/mobile/sch/loginm/41"
stud_id: 41,
stud_code: "1001",
stud_name: "à®…à®ªà¯à®¤à¯à®²à¯ à®œà®ªà¯à®ªà®¾à®°à¯"
My Database Setup as below:
SQL> SELECT name,value$ FROM sys.props$;
NAME                            VALUE$
DICT.BASE                       2
DEFAULT_TEMP_TABLESPACE         TEMP
DEFAULT_PERMANENT_TABLESPACE    USERS
DEFAULT_EDITION                 ORA$BASE
Flashback Timestamp TimeZone    GMT
TDE_MASTER_KEY_ID
DBTIMEZONE                      -07:00
DST_UPGRADE_STATE               NONE
DST_PRIMARY_TT_VERSION          11
DST_SECONDARY_TT_VERSION        0
DEFAULT_TBS_TYPE                SMALLFILE
NLS_LANGUAGE                    AMERICAN
NLS_TERRITORY                   AMERICA
NLS_CURRENCY                    $
NLS_ISO_CURRENCY                AMERICA
NLS_NUMERIC_CHARACTERS          .,
NLS_CHARACTERSET                AL32UTF8
NLS_CALENDAR                    GREGORIAN
NLS_DATE_FORMAT                 DD-MON-RR
NLS_DATE_LANGUAGE               AMERICAN
NLS_SORT                        BINARY
NLS_TIME_FORMAT                 HH.MI.SSXFF AM
NLS_TIMESTAMP_FORMAT            DD-MON-RR HH.MI.SSXFF AM
NLS_TIME_TZ_FORMAT              HH.MI.SSXFF AM TZR
NLS_TIMESTAMP_TZ_FORMAT         DD-MON-RR HH.MI.SSXFF AM TZR
NLS_DUAL_CURRENCY               $
NLS_COMP                        BINARY
NLS_LENGTH_SEMANTICS            BYTE
NLS_NCHAR_CONV_EXCP             FALSE
NLS_NCHAR_CHARACTERSET          AL16UTF16
NLS_RDBMS_VERSION               11.2.0.1.0
GLOBAL_DB_NAME                  MOBILE
EXPORT_VIEWS_VERSION
SQL> select DECODE(parameter, 'NLS_CHARACTERSET', 'CHARACTER SET',
                   'NLS_LANGUAGE', 'LANGUAGE',
                   'NLS_TERRITORY', 'TERRITORY') name,
            value from v$nls_parameters
     WHERE parameter IN ( 'NLS_CHARACTERSET', 'NLS_LANGUAGE', 'NLS_TERRITORY');
NAME          VALUE
LANGUAGE      AMERICAN
TERRITORY     AMERICA
CHARACTER SET AL32UTF8
          8
WORKLOAD_CAPTURE_MODE
WORKLOAD_REPLAY_MODE
Awaiting your solution.
-- Abdul Jabbar
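
The garbled stud_name starts with "à®…", which is exactly what the UTF-8 bytes of the Tamil letter அ look like when read as Windows-1252. That suggests the data is intact in the database and is being decoded with the wrong charset somewhere between the service and the viewer. A small Python sketch of the check, where `looks_like_utf8_read_as_cp1252` is a hypothetical diagnostic helper:

```python
def looks_like_utf8_read_as_cp1252(text):
    """True if the text turns into different, valid text when its characters
    are mapped back to Windows-1252 bytes and decoded as UTF-8."""
    try:
        repaired = text.encode("cp1252").decode("utf-8")
    except UnicodeError:
        return False   # not representable as cp1252 bytes, or not valid UTF-8
    return repaired != text

as_seen = "அ".encode("utf-8").decode("cp1252")
print(as_seen)                                   # the first characters of the garbled name
print(looks_like_utf8_read_as_cp1252(as_seen))
print(looks_like_utf8_read_as_cp1252("SURESH"))
```

If the check comes out true for the raw response body, the place to look is the charset used to read the response (or declared in its Content-Type header), not the database character set.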

Kumar,
FTPing the PG.xml to the mds folder will not get the page into the MDS directory.
You have to import the file using xmlimporter.
I understand you have done the import, but it was not successful.
Could you please post the script you used to import the PG.xml
and the output you got when you ran it?
Maybe you can refer to this URL for the scripts:
http://apps2fusion.com/at/61-kv/331-oa-framework-scripts
With regards,
Kali.
OSSI.

PDF417 and Unicode

Hello all,
I am trying to stuff some data into a PDF417 barcode field. For one of the forms I will need to put in Unicode characters; however, I found that the capacity of the barcode field is reduced drastically when storing Unicode characters. I did some simple testing and found the capacity is reduced to one seventh of the capacity for ASCII characters.
Has anybody faced this situation, and does anybody have an explanation for it?
Thank you,
Yasser M. Maree

Hi Yasser,
I did send you a note directly, but as I had mentioned, there seems to be some kind of an issue, since "Byte" mode encoding should be 1.2 bytes per code word and "Text" should be 2 characters per code word.
So you should be looking at a 50% reduction when using data that would be 100% "Byte".
Lee.
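
Taking the two figures in the reply at face value (1.2 bytes per code word in Byte mode, 2 characters per code word in Text mode), the expected capacity can be worked out for different character sizes. The sketch below is plain arithmetic under those assumptions; how the form software actually encodes Unicode text is not stated in the thread:

```python
from fractions import Fraction

TEXT_CHARS_PER_CODEWORD = Fraction(2)      # figure quoted in the reply
BYTE_BYTES_PER_CODEWORD = Fraction(6, 5)   # 1.2 bytes per code word

def relative_capacity(bytes_per_char):
    """Characters that fit in Byte mode, as a fraction of Text-mode capacity."""
    chars_per_codeword = BYTE_BYTES_PER_CODEWORD / bytes_per_char
    return chars_per_codeword / TEXT_CHARS_PER_CODEWORD

for size in (1, 2, 3):
    print(size, "byte(s) per character ->", relative_capacity(size))
```

On these numbers, multi-byte characters cost noticeably more than single-byte data in Byte mode, so a large drop for Unicode text is plausible, although the observed one seventh is still lower than this simple model predicts.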

Get unicode from web hex format

I am using Java/MSSQL
I have a field to store a Chinese name, which is defined as nvarchar. But due to some problem, whenever I submit the JSP form the fields are stored in web hex format,
i.e. 主 主 ... etc
I know ideally it should be storing Unicode, but for corporate reasons I cannot change that code, so the only option left for me is to detect these characters, i.e. chars which are encoded inside "&#XXXXX;", get the XXXXX, convert it into decimal, then convert that to a string and output it as \uDDDD. I need this conversion for FOP; it seems FOP does not understand the web hex format. That's the workaround I am thinking of.
OK, my question is how I can detect "&#"....";", how to get the chars in between, and lastly how to convert them to decimal. Can anybody give me an idea about it, please?
I hope it's not complicated,
regards
Manisha
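
The detection and conversion asked about above can be sketched with a regular expression. This is a Python illustration of the idea rather than Java; the function names are hypothetical, and it assumes the stored values are numeric character references in either decimal (`&#20027;`) or hex (`&#x4E3B;`) form:

```python
import re

# Group 1 captures hex digits after "&#x"; group 2 captures decimal digits.
_NCR = re.compile(r"&#(?:[xX]([0-9a-fA-F]+)|([0-9]+));")

def decode_ncrs(text):
    """Replace each numeric character reference with the character it names."""
    def replace(match):
        hex_digits, dec_digits = match.groups()
        code = int(hex_digits, 16) if hex_digits else int(dec_digits)
        return chr(code)
    return _NCR.sub(replace, text)

def to_u_escapes(text):
    """Write non-ASCII characters as \\uXXXX (Basic Multilingual Plane only)."""
    return "".join(ch if ord(ch) < 128 else f"\\u{ord(ch):04X}" for ch in text)

name = decode_ncrs("&#20027;&#x4E3B; ok")
print(name)
print(to_u_escapes(name))
```

Python's standard library also offers `html.unescape`, which handles named entities as well; the explicit regex is shown because it maps more directly onto a hand-written Java version.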

Which external library are you using? Do you mean you are creating an ExternalObject using a DLL or shared library that you supply? If so, then the problem is within code that you have written or supplied and that we cannot see.
Perhaps give an example ,but this doesn't sound like something we could solve without being able to see your code!!

AL32UTF8 - VARCHAR2 ok for Unicode ? No need for NVARCHAR2 ?

If a 10gR2 database character set is AL32UTF8, then do VARCHAR2 columns suffice for storing unicode data of any language ? I'm hearing conflicting advice - some people are telling me NVARCHAR2 is necessary, others are saying VARCHAR2 is fine with AL32UTF8.
I'll be using AL32UTF8 anyway due to needs of XML data, but will also have some character data in "normal" SQL columns, so need to choose between VARCHAR2 and NVARCHAR2.
Thanks,
Andy Mackie.

If you are referring to a default installation, i.e. AL32UTF8 as the database character set and AL16UTF16 as the national character set, both support Unicode happily. What you do need to take into account is:
1) Performance
2) Length semantics
3) Sizing
All are well documented in free online documents.
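
The sizing and length-semantics points come down to how many bytes each character needs in the two encodings: AL32UTF8 is UTF-8 and AL16UTF16 is UTF-16. A quick Python sketch of the difference, with sample characters chosen only for illustration:

```python
def encoded_sizes(ch):
    """Return (bytes in UTF-8, bytes in UTF-16) for a single character."""
    return len(ch.encode("utf-8")), len(ch.encode("utf-16-be"))

samples = {"Latin a": "a", "Greek omega": "Ω", "Khmer ka": "ក", "Emoji": "😀"}

for label, ch in samples.items():
    utf8_bytes, utf16_bytes = encoded_sizes(ch)
    print(f"{label}: {utf8_bytes} byte(s) in UTF-8, {utf16_bytes} in UTF-16")
```

With byte length semantics, a VARCHAR2(20) column in an AL32UTF8 database therefore holds 20 Latin letters but only 6 Khmer characters, which is why the column should be declared with character semantics or sized for the worst case.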

Cannot Store Greek Characters in NVARCHAR2 columns

Can someone please help.
I have a table TEST with one column A of type NVARCHAR2(20) and when I try to insert the greek character Ω(Omega) - it gets stored as O instead.
I am inserting using SQL Developer using the 'N' prefix and my environment is as follows:
Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - 64bit Production
NLS_CHARACTERSET = WE8MSWIN1252
NLS_NCHAR_CHARACTERSET = AL16UTF16
NLS_LANG setting on client side is ENGLISH_UNITED KINGDOM.WE8MSWIN1252.
Why can I not insert greek characters with the above setup and what do I need to do/change in order to be able to insert greek characters in a database using the national characterset for storing unicode data ?

Result of running SELECT a, dump(a) FROM TEST after insert an 'O' and an 'Ω' is as follows:
A DUMP(A)
O     Typ=1 Len=2: 0,79
O     Typ=1 Len=2: 0,79
I added a VARCHAR column, column B, there is no difference in what is getting stored in the NVARCHAR column when inserting 'Ω', result is below:
INSERT INTO TEST (A, B)
VALUES(N'Ω', 'Ω');
SELECT a, dump(a, 1016), b, dump(b, 1016) FROM TEST;
A DUMP(A, 1016) B DUMP(B, 1016)
O     Typ=1 Len=2 CharacterSet=AL16UTF16: 0,4f     O     Typ=1 Len=1 CharacterSet=WE8MSWIN1252: 4f
Am I missing something here, since my understanding is that NVARCHAR2 columns should be able to store Unicode data?
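
One plausible explanation, not confirmed in this thread: the text of the SQL statement is converted to the database character set (WE8MSWIN1252 here) before it is parsed, so the Ω inside N'Ω' is already replaced by the time the literal is read. A common way around that is to keep the statement pure ASCII with Oracle's UNISTR function, or to use a bind variable. The Python sketch below only builds such a literal; `to_unistr` is a hypothetical helper:

```python
def to_unistr(text):
    """Build an ASCII-only UNISTR('...') literal from UTF-16 code units."""
    units = text.encode("utf-16-be")
    parts = []
    for i in range(0, len(units), 2):
        code = int.from_bytes(units[i:i + 2], "big")
        ch = chr(code)
        if ch.isascii() and ch.isalnum():
            parts.append(ch)                # safe to keep as-is
        else:
            parts.append(f"\\{code:04X}")   # everything else as \XXXX
    return "UNISTR('" + "".join(parts) + "')"

print("INSERT INTO TEST (A) VALUES (" + to_unistr("Ω") + ");")
```

Because the generated statement contains only ASCII, it passes through any client or database character set conversion unchanged, and the conversion to the national character set happens inside the database.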
