Unicode and Chinese

This is driving me nuts.
Created a page with a mix of English and Chinese, used Unicode,
and it worked fine.
But then I created another page exactly the same, and now the
Unicode is not being converted.
First link is fine
http://www.destinationcdg.com/Bonaparte/BonaparteC.cfm
But this link is all screwed up.
http://www.destinationcdg.com/Bonaparte/areaC.cfm
Any ideas please.
DW8.02 CFMX7 and Apache2

Hi guys
I've just realised that the solution here isn't totally complete. If you are still interested in helping I would be really grateful.
Quick re-cap:
The problem was that Java was miscalculating the length of Unicode strings.
e.g. ...
String nihao = "你好";  // should read two Chinese characters; may display here as ??
System.out.println(nihao.length()); ... would print 6 or something, but not 2 as it should.
I was recommended to use a parameter when invoking javac which fixed this problem.
javac -encoding UTF-8 ClassName.java
Now, this solved the problem so far.
However!!!! What I assumed would work, and didn't test until now, is this:
System.out.println(nihao);
But it doesn't work.
So, in a nutshell: if I have a class which contains Unicode strings outside the usual Latin set, encode that source file as Unicode, and use the -encoding UTF-8 parameter when compiling, Java still prints ?? to the command line.
Is it my shell or is it Java?
I'm using the Bash shell.
If I had a file whose name is two Chinese characters plus .txt (displayed here as ??.txt) and used ls, the name would not display properly; I would get ??.txt.
To get the file name to display properly I need to use ls -v. This -v flag makes things work.
I've tried it with the java command but java doesn't like it.
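For what it's worth, here is a minimal sketch of how I'd separate the two problems (the UTF-8-capable terminal is an assumption): \uXXXX escapes take the source-file encoding out of the equation entirely, and wrapping System.out in an explicit UTF-8 PrintStream rules out the platform default encoding on output.

```java
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;

public class Nihao {
    public static void main(String[] args) throws UnsupportedEncodingException {
        // "ni hao" written as Unicode escapes: compiles the same under any -encoding
        String nihao = "\u4f60\u597d";
        System.out.println(nihao.length()); // prints 2: length is counted in chars, not bytes

        // System.out uses the platform default charset; force UTF-8 explicitly.
        PrintStream out = new PrintStream(System.out, true, "UTF-8");
        out.println(nihao); // displays correctly only on a UTF-8 terminal (e.g. LANG=en_US.UTF-8)
    }
}
```

If the escaped version reports length 2 but still shows ?? on screen, the remaining problem is the shell/terminal encoding rather than javac.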
This is really doing my head in. If anyone has any ideas please help.
Thanks.
Chinese characters don't seem to be uploading to this website, which makes this post difficult. Where you are supposed to see Chinese I have said so; it might display as ??.
I can't award Duke Dollars to this post as I did it already. I have posted a fresh version of this problem in the Java Programming forum. I have allocated Duke Dollars to that post so best to reply there if you have any ideas :)
Message was edited by:
stanton_ian

Similar Messages

  • Layout in Arabic, Russian and Chinese. Exporting text from a PDF

I am laying out long documents in Arabic, Russian and Chinese. The text has been provided as a PDF; when I copy and paste this into InDesign it comes up as boxes, question marks and other characters having nothing to do with the text I am trying to lay out. I have set the typeface to Myriad Arabic and the Arabic dictionary, and still nothing resembling Arabic, or any language for that matter. Same with Chinese and Russian. Any suggestions on how to get the text in from the PDF in the actual language? Appreciate any help with this.  Thank you.

    Thanks for the callout, Ellis
Soooo, KK: you are in for a world of hurt. The initials "WP" at the beginning of these fonts mean that the text came out of WordPerfect. Doing multilingual layouts in WP was annoying, but possible. It was developed in the pre-Unicode world where every single method of complex-script layout was a dirty hack. If you like knowing All of the Nerdy Dirty Details, I can tell you how it worked, but suffice it to say that trying to harvest non-Latin-script text from WP and repurpose it for use in InDesign is just pure pain. The WordPerfect-specific codepages were never really supported anywhere outside of WP.
That being said, I have a script lying around somewhere for conversion of WP-Cyrillic into Unicode. (Actually, I think it does Windows CP 1251, but that works just as well.) But that is only one out of forty-five languages? And the Chinese has been rasterized? And the PDFs were originally generated by Distiller 3? If you have any choice, it's time to walk away. If you don't have any choice, I really hope you are billing hourly. My experience in this area (painfully extensive) is that it will cost three to five times as much to extract the text as it would to have a translation professional rekey the text, and then to have a second translation professional review the rekeyed text looking for typos.
    Russian OCR is pretty damn good these days, but Chinese OCR is hit-or-miss. I have never seen good Arabic OCR - doesn't mean it's not out there, but I couldn't help you find it.  But chances that all 45 languages have reliable OCR available, and that the result of said OCRing will not need to be reviewed by someone who knows the language, are basically nil.

  • Unicode and non-unicode

What is the difference between Unicode and non-Unicode?
Briefly explain Unicode.
Thanks in advance.

Unicode is variously defined as:
- A 16-bit character encoding scheme allowing characters from Western European, Eastern European, Cyrillic, Greek, Arabic, Hebrew, Chinese, Japanese, Korean, Thai, Urdu, Hindi and all other major world languages, living and dead, to be encoded in a single character set. The Unicode specification also includes standard compression schemes and a wide range of typesetting information required for worldwide locale support. Symbian OS fully implements Unicode.
- A 16-bit code to represent the characters used in most of the world's scripts. UTF-8 is an alternative encoding in which one or more 8-bit bytes represents each Unicode character.
- A 16-bit character set defined by ISO 10646.
- A code similar to ASCII, used for representing commonly used symbols in digital form. Unlike ASCII, however, Unicode uses a 16-bit dataspace and so can support a wide variety of non-Roman alphabets, including Cyrillic, Han Chinese, Japanese, Arabic, Korean, Bengali, and so on. Supporting common non-Roman alphabets is of interest to community networks, which may want to promote multicultural aspects of their systems.
    ABAP Development under Unicode
    Prior to Unicode the length of a character was exactly one byte, allowing implicit typecasts or memory-layout oriented programming. With Unicode this situation has changed: One character is no longer one byte, so that additional specifications have to be added to define the unit of measure for implicit or explicit references to (the length of) characters.
Character-like data in ABAP is always represented in the UTF-16 standard (also used in Java and other development tools like Microsoft's Visual Basic); this format is not related to the encoding of the underlying database.
    A Unicode-enabled ABAP program (UP) is a program in which all Unicode checks are effective. Such a program returns the same results in a non-Unicode system (NUS) as in a Unicode system (US). In order to perform the relevant syntax checks, you must activate the Unicode flag in the screens of the program and class attributes.
    In a US, you can only execute programs for which the Unicode flag is set. In future, the Unicode flag must be set for all SAP programs to enable them to run on a US. If the Unicode flag is set for a program, the syntax is checked and the program executed according to the rules described in this document, regardless of whether the system is a US or a NUS. From now on, the Unicode flag must be set for all new programs and classes that are created.
    If the Unicode flag is not set, a program can only be executed in an NUS. The syntactical and semantic changes described below do not apply to such programs. However, you can use all language extensions that have been introduced in the process of the conversion to Unicode.
    As a result of the modifications and restrictions associated with the Unicode flag, programs are executed in both Unicode and non-Unicode systems with the same semantics to a large degree. In rare cases, however, differences may occur. Programs that are designed to run on both systems therefore need to be tested on both platforms.
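The "one character is no longer one byte" point can be sketched in Java, which, like Unicode ABAP, represents character data in UTF-16; the character count and byte count of a string diverge as soon as you leave the single-byte range:

```java
import java.nio.charset.StandardCharsets;

public class Lengths {
    public static void main(String[] args) {
        String s = "\u4f60\u597d"; // two Chinese characters
        System.out.println(s.length());                                   // 2 UTF-16 code units
        System.out.println(s.getBytes(StandardCharsets.UTF_8).length);    // 6 bytes in UTF-8
        System.out.println(s.getBytes(StandardCharsets.UTF_16LE).length); // 4 bytes in UTF-16
    }
}
```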
    Regards,
    Santosh

  • Substring between unicode and non-unicode

    Hi, experts,
We are upgrading our system from 4.7 to 6.0 in Chinese, but after that there is a problem:
we have a program which handles some txt files created by a non-SAP, non-Unicode system.
For example, there is a line containing '你好      1234', and we extract the information as below:
    data: field1 type string, field2 type string.
    field1 = line+0(10).
    field2 = line+10(4).
    the result in 4.7 is:
    field1 = '你好'
    field2 = '1234'
    but, in ECC, field1 is '你好12'  and field2 is '34'.
    can any one help me? thank you!

Hey Max, thanks for your help!
I am sorry I did not state my question clearly.
There are six spaces between '你好' and '1234' in the line '你好      1234', and the first 10 characters of the line may be all numbers, all Chinese characters, or numbers and Chinese characters together.
In 4.7, line+0(10) was always correct, but in ECC, because it is a Unicode system, it is correct only when the string contains only single-byte characters and no double-byte characters.
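The behaviour difference can be sketched in Java (a hypothetical illustration, not ABAP; GBK is assumed here as the non-Unicode Chinese code page, since the original system's code page isn't stated): byte-oriented offsets see '你好' as 4 bytes, character-oriented offsets see it as 2 characters, so the same +0(10) slice grabs different text.

```java
public class Offsets {
    public static void main(String[] args) throws Exception {
        String line = "\u4f60\u597d      1234"; // '你好' + six spaces + '1234'

        // Non-Unicode (4.7-style) semantics: offsets count bytes in the code page.
        byte[] gbk = line.getBytes("GBK"); // 2*2 + 6 + 4 = 14 bytes
        String byBytes = new String(gbk, 0, 10, "GBK"); // '你好' + six spaces

        // Unicode (ECC-style) semantics: offsets count characters.
        String byChars = line.substring(0, 10); // '你好' + six spaces + '12'

        System.out.println("[" + byBytes + "]");
        System.out.println("[" + byChars + "]");
    }
}
```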

  • Unicode and mdmp

    lads,
Can somebody send the docs related to Unicode and MDMP?
    james

    Dear James,
MDMP stands for Multi-Display, Multi-Processing.
    A Multi-Display, Multi-Processing code pages system (MDMP system) uses more than a single code page on the application server. Depending on the login language, it is possible to switch dynamically between the installed code pages. MDMP therefore provides a vehicle for using languages from different code pages in a single system.
    MDMP was the solution SAP developed for support of combinations of multiple code pages in one system prior to the availability of unicode database support. MDMP effectively enabled an SAP ERP system to be installed with a non-unicode database, and to support connections to the ERP application by users with language combinations not supported by a single code page. Example: support of one ERP system with English, French, Japanese, and Chinese.
    MDMP implementations implemented strict rules and restrictions in order to ensure data consistency and avoid data corruption.
    MDMP was only supported for SAP R/3, SAP R/3 Enterprise, and mySAP ERP applications. No other SAP applications or SAP NetWeaver components support MDMP.
    SAP's Unicode Strategy
    SAP commits itself fully to providing you with a Unicode-based mySAP.com e-business platform.
To help customers transition smoothly to future-proof technologies, future versions of SAP applications will be exclusively 64-bit and Unicode starting in 2007.
    Global business processes require IT systems to support multilingual data without any restrictions - Unicode represents the first technology capable of meeting these requirements.
    Web interfaces open the door to a global customer base, and IT systems must consequently be able to support multiple local languages simultaneously.
    With J2EE integration, the mySAP.com e-business platform fully supports web standards, and with Unicode, it now can take full advantage of XML and Java.
    Only Unicode makes it possible to seamlessly integrate in homogeneous SAP and non-SAP system landscapes, enabling truly collaborative business.
    Regards,
    Rakesh

  • Unicode and non-unicode string data types Issue with 2008 SSIS Package

    Hi All,
I am converting a 2005 SSIS package to 2008. I have a task which has SQL Server as the source and Oracle as the destination. I copy the data from a SQL Server view with a field nvarchar(10) to a field of an Oracle table varchar(10). The package executes fine on my local machine when I use the data conversion task to convert to DT_STR. But when I deploy the dtsx file on the server and try to run it from a SQL Agent job, it gives me the unicode and non-unicode string data types error for the field. I have checked the registry settings and they are the same on my local machine and the server. I tried both the Data Conversion task and the Derived Column task, but with no luck. Please suggest what changes are required in my package to run it from the SQL Agent job.
    Thanks.

    What is Unicode and non Unicode data formats
    Unicode : 
A Unicode character takes more bytes to store in the database. Many global industries want to grow their business worldwide, and to do so they widen their services to customers worldwide by supporting languages like Chinese, Japanese, Korean and Arabic. Many websites these days support international languages to attract more customers, which makes life easier for both parties.
To store the customer data, the database must support a mechanism for storing international characters. Storing these characters is not easy, and many database vendors had to revise their strategies and come up with new mechanisms to support or store these international characters. Big vendors like Oracle, Microsoft, IBM and others started providing international character support so that data can be stored and retrieved accordingly, avoiding any hiccups while doing business with international customers.
    The difference in storing character data between Unicode and non-Unicode depends on whether non-Unicode data is stored by using double-byte character sets. All non-East Asian languages and the Thai language store non-Unicode characters
    in single bytes. Therefore, storing these languages as Unicode uses two times the space that is used specifying a non-Unicode code page. On the other hand, the non-Unicode code pages of many other Asian languages specify character storage in double-byte character
    sets (DBCS). Therefore, for these languages, there is almost no difference in storage between non-Unicode and Unicode.
    Encoding Formats: 
Some of the common encoding formats for Unicode (UCS-2, UTF-8, UTF-16, UTF-32) have been made available by database vendors to their customers. For SQL Server 7.0 and higher, Microsoft uses the UCS-2 encoding format to store Unicode data. Under this mechanism, all Unicode characters are stored using 2 bytes.
    Unicode data can be encoded in many different ways. UCS-2 and UTF-8 are two common ways to store bit patterns that represent Unicode characters. Microsoft Windows NT, SQL Server, Java, COM, and the SQL Server ODBC driver and OLEDB
    provider all internally represent Unicode data as UCS-2.
    The options for using SQL Server 7.0 or SQL Server 2000 as a backend server for an application that sends and receives Unicode data that is encoded as UTF-8 include:
    For example, if your business is using a website supporting ASP pages, then this is what happens:
    If your application uses Active Server Pages (ASP) and you are using Internet Information Server (IIS) 5.0 and Microsoft Windows 2000, you can add "<% Session.Codepage=65001 %>" to your server-side ASP script.
    This instructs IIS to convert all dynamically generated strings (example: Response.Write) from UCS-2 to UTF-8 automatically before sending them to the client.
    If you do not want to enable sessions, you can alternatively use the server-side directive "<%@ CodePage=65001 %>".
    Any UTF-8 data sent from the client to the server via GET or POST is also converted to UCS-2 automatically. The Session.Codepage property is the recommended method to handle UTF-8 data within a web application. This Codepage
    setting is not available on IIS 4.0 and Windows NT 4.0.
    Sorting and other operations :
    The effect of Unicode data on performance is complicated by a variety of factors that include the following:
    1. The difference between Unicode sorting rules and non-Unicode sorting rules 
    2. The difference between sorting double-byte and single-byte characters 
    3. Code page conversion between client and server
    Performing operations like >, <, ORDER BY are resource intensive and will be difficult to get correct results if the codepage conversion between client and server is not available.
    Sorting lots of Unicode data can be slower than non-Unicode data, because the data is stored in double bytes. On the other hand, sorting Asian characters in Unicode is faster than sorting Asian DBCS data in a specific code page,
    because DBCS data is actually a mixture of single-byte and double-byte widths, while Unicode characters are fixed-width.
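A rough check of those storage claims, sketched in Java (UTF-16LE standing in for UCS-2 storage, GBK as an example DBCS code page):

```java
import java.nio.charset.StandardCharsets;

public class StorageSizes {
    public static void main(String[] args) throws Exception {
        String latin = "hello";
        String chinese = "\u4f60\u597d"; // two Chinese characters

        // Non-East Asian text doubles in size when stored as Unicode (UCS-2/UTF-16).
        System.out.println(latin.getBytes(StandardCharsets.ISO_8859_1).length); // 5
        System.out.println(latin.getBytes(StandardCharsets.UTF_16LE).length);   // 10

        // East Asian text is already double-byte in its DBCS code page,
        // so storing it as Unicode costs almost nothing extra.
        System.out.println(chinese.getBytes("GBK").length);                     // 4
        System.out.println(chinese.getBytes(StandardCharsets.UTF_16LE).length); // 4
    }
}
```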
    Non-Unicode :
Non-Unicode is the opposite of Unicode. Non-Unicode storage handles languages like English easily, but not Asian languages that need more bytes per character to store correctly; otherwise truncation will occur.
    Now, let’s see some of the advantages of not storing the data in Unicode format:
1. It takes less space to store the data in the database, so we save a lot of hard disk space.
2. Moving database files from one server to another takes less time.
3. Backup and restore of the database take less time, which is good for DBAs.
Non-Unicode vs. Unicode Data Types: Comparison Chart
The primary difference between Unicode and non-Unicode data types is the ability of Unicode to easily handle the storage of foreign-language characters, which also requires more storage space.
Non-Unicode (char, varchar, text) vs. Unicode (nchar, nvarchar, ntext):
- Both store data in fixed or variable length.
- char: data is padded with blanks to fill the field size (for example, if a char(10) field contains 5 characters, the system pads it with 5 blanks). nchar: same as char.
- varchar: stores the actual value and does not pad with blanks. nvarchar: same as varchar.
- Non-Unicode requires 1 byte of storage per character; Unicode requires 2 bytes.
- char and varchar can store up to 8000 characters; nchar and nvarchar up to 4000.
- Non-Unicode is best suited for US English: "One problem with data types that use 1 byte to encode each character is that the data type can only represent 256 different characters. This forces multiple encoding specifications (or code pages) for different alphabets such as European alphabets, which are relatively small. It is also impossible to handle systems such as the Japanese Kanji or Korean Hangul alphabets that have thousands of characters."
- Unicode is best suited for systems that need to support at least one foreign language: "The Unicode specification defines a single encoding scheme for most characters widely used in businesses around the world. All computers consistently translate the bit patterns in Unicode data into characters using the single Unicode specification. This ensures that the same bit pattern is always converted to the same character on all computers. Data can be freely transferred from one database or computer to another without concern that the receiving system will translate the bit patterns into characters incorrectly."
    https://irfansworld.wordpress.com/2011/01/25/what-is-unicode-and-non-unicode-data-formats/
    Thanks Shiven:) If Answer is Helpful, Please Vote

  • Is iPod multi-lingual (English and Chinese simultaneously)?

    On my iCal and Addressbook on my Mac I have some entries in Chinese, while the rest is either in Finnish or English. Will the iPod (especially Nano) be able to show the entries correctly; that is, even if the language is set to English (or Finnish if possible) will the entries in Chinese be shown correctly?
    Second question: Do the attached notes made in iCal (calendar and tasks) show in the iPod, or does the iPod only show the topic and time?

    I found an article from Apple Support (http://docs.info.apple.com/article.html?artnum=61894) stating:
    "By default all note files are considered to be encoded in Latin1, unless the iPod language preference is set to Japanese, Korean, or traditional or simplified Chinese, in which case all note files are assumed to be in that encoding.
    Note: You can tag a note file with a different encoding by including the following line: <?xml encoding="MacJapanese"?>
    iPod handles these encodings: Latin1, MacRoman, MacJapanese, Korean, simplified Chinese, traditional Chinese, UTF8 Unicode and UTF16 Unicode.
    The only way to display multiple encodings in the same note is to use Unicode."
    So that means, if I interpret it correctly, that by adding a tag saying the text (including both Chinese and English) is in Unicode, iPod will display it correctly?
(And isn't text created with the Mac's built-in TextEdit automatically in Unicode format?)
    How about the Calender and Address Book; if they contain both Chinese and English, will they be displayed correctly on iPod (are those applications on iPod automatically Unicode-savvy?)?

  • What is the programming (ABAP) difference between Unicode and non Unicode?

    What is the programming(ABAP) difference between Unicode and non Unicode?
    Edited by: NIV on Apr 12, 2010 1:29 PM

    Hi
The difference when programming in Unicode vs. non-Unicode is that you must make some adjustments to your "Z" programs so that they comply with the Unicode standard checks.
In the past, SAP developments used multiple systems to encode the characters of different alphabets, for example ASCII, EBCDIC, or double-byte code pages.
These coding systems mostly use 1 byte per character, which can encode up to 256 characters. However, alphabets such as Japanese or Chinese use a larger number of characters, which is why those systems used double-byte code pages with 2 bytes per character.
In order to unify the different alphabets, it was decided to implement a single coding system that uses 2 bytes per character regardless of the language. That system is called Unicode.
    Unicode is also the official way to implement ISO/IEC 10646 and is supported in many operating systems and all modern browsers.
The way to verify whether a program has been adjusted is to run the UCCHECK transaction. Additionally, you can check via the syntax check (making sure the Unicode check flag is active).
The main adjustments/replacements are, for example (old -> new):
ASSIGN TEXT+H-SY-INDEX TO <F1>.  ->  ASSIGN TEXT+H-SY-INDEX(*) TO <F1>.
DATA INIT(50) VALUE '/'.  ->  DATA INIT(1) VALUE '/'.
DESCRIBE FIELD text LENGTH lengh2.  ->  DESCRIBE FIELD text LENGTH lengh2 IN CHARACTER MODE.
T_ZSMY_DEMREG_V1 = record_tab.  ->  MOVE-CORRESPONDING record_tab TO t_zsmy_demreg_v1.
escape_trick = hot3.  ->  escape_trick-x1 = hot3.
itab_txt TYPE wt  ->  itab_txt TYPE TABLE OF TEXTPOOL
DATA string3(3) TYPE x VALUE '3'.  ->  DATA string3(6) TYPE c VALUE '3'.
OPEN DATASET file_name IN TEXT MODE.  ->  OPEN DATASET file_name FOR INPUT IN TEXT MODE ENCODING NON-UNICODE. (or ENCODING DEFAULT)
TRANSLATE record FROM CODE PAGE a_codepage.  ->  TRANSLATE record USING a_codepage.
CALL FUNCTION 'DOWNLOAD' / 'WS_DOWNLOAD'  ->  CALL METHOD cl_gui_frontend_services=>gui_download
CALL FUNCTION 'UPLOAD' / 'WS_UPLOAD'  ->  CALL METHOD cl_gui_frontend_services=>gui_upload
PERFORM APPEND_XFEBRE USING HEAD+2.  ->  PERFORM APPEND_XFEBRE USING HEAD+2(98).
Best Regards
    Fabio Rodriguez

  • How to make English and Chinese pdf files searchable?

I have scanned many A4 English and Chinese print-outs into PDF format. How do I make these files searchable? People mention OCR. Will installing Adobe Acrobat make this easy and manageable?

    I know Acrobat does Japanese OCR very well; I can only assume that it works for Chinese as well.
    P.S. I just checked: Chinese (Simplified) and Chinese (Traditional) are both language selection options for OCR.

  • Using SQL*Loader to Load Russian and Chinese Characters

    We are testing our new 11.2.0.1 database using Oracle Linux 6. We created the database using the AL32UTF8 NLS Character set. We have tried using sqlldr to insert a few records that contain Russian and Chinese characters as a test. We can not seem to get them into the database in the correct format. For example, we can see the correct characters in the file we are trying to load on the Linux server, but once we load them into a table in the database, some of the characters are not displayed correctly (using SQL*Developer to select them out).
We can set the values within a column by inserting them into the table directly and then selecting them out, and they are correct, so it appears the problem is not in the database but in the way sqlldr inserts them. We have tried several settings on the Linux server for the NLS_LANG environment (AMERICAN_AMERICA.AL32UTF8, AMERICAN_AMERICA.UTF8, etc.) without success.
    Can someone provide us with any guidance on this? Would really appreciate any advice as to what we are not getting here.
    Thanks!!

    The characterset of the database does not change the language used in your input data file. The character set of the datafile can be set up by using the NLS_LANG parameter or by specifying a SQL*Loader CHARACTERSET parameter. I suggest to move this question to the appropriate forum: Export/Import/SQL Loader & External Tables for closer topic alignment.
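As the reply notes, you can declare the data file's character set in the SQL*Loader control file instead of relying on NLS_LANG. A minimal sketch (the table, column and file names are made up for illustration; adjust to your schema):

```sql
-- load_utf8.ctl (hypothetical names): tell SQL*Loader the data file is UTF-8
LOAD DATA
CHARACTERSET AL32UTF8
INFILE 'names_utf8.dat'
APPEND INTO TABLE test_names
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
(id, name CHAR(100))
```

Invoked as, e.g., sqlldr control=load_utf8.ctl. With CHARACTERSET set in the control file, the conversion into the database's AL32UTF8 character set no longer depends on each user's NLS_LANG setting.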

  • Russian and Chinese Flash movies - general advice needed please

    Hi all -
    This is a plea for some general 'jumping off' advice. I am an experienced Flash developer but now have a request to convert an existing xml-fed movie into both Russian and Chinese. I speak neither of these languages so we have had the content of the movie translated by a professional translation service.
    The movie contains both png/jpgs with embedded text - created in Fireworks and also (for the bulk of the content) external xml files. I still need to be able to develop in an English environment - so purchasing a full version of Flash/Fireworks in Russian/Chinese would be folly. How should I go about this? If it is a matter of fonts - where should I get them from? And are there any considerations to be met with regards the xml files? Basically, I would really appreciate some general advice on this subject as it is completely new ground for me.
    Much obliged,
    Hugh


  • Display artist and title of japanese and chinese songs in itunes?

    hi,
does anyone know how I can display the artist and title of Japanese and Chinese songs in iTunes? Thanks.
It used to work on my old computer with the old iTunes, but when I moved it to my new computer (also a Dell), it comes up with gibberish squares.
    no issues on ipod, though.
    thanks
    dell c521   Windows XP  

    Have a look at the iTunes and iPod section of Tom Gewecke's web page for some good information.

  • What is alignment in unicode and what are restrictions

What is alignment in Unicode and what are the restrictions? Don't explain Unicode in general; I only want to know about alignment in Unicode.
Points will be awarded if useful.

    Hi,
    Check the following Threads,
    what is internal and external encoding in unicode
    Unicode
    UNICODE
    Regards,
    Padmam.

  • Unicode and Java

    Hi
    As we all know Java treat character literals as Unicode characters. I have been studying Unicode and the way they treat characters and I have a doubt which is not specific to Java code but specific to Unicode.
    Unicode states that each character is assigned a number which is unique, this number is called code point.
    The relationship between characters and code points is 1:1.
Eg: the String "hello" (which is a sequence of character literals) can be represented by the following code points:
\u0065 \u0048 \u006c \u006c \u006f
I also read that a certain character code point must be recognized by a specific encoding or else a question mark (?) is output in place of the character. Not all code points can be recognized by an encoding.
So, the letter ל would not be recognized by all encodings and should be replaced by a question mark (?), right?
The interesting thing is that this code point represents a different character, and not a "?", in other encodings. It should print the same character.
This is the HTML code I used for tests (save it to your hard disk and open it in your browser, then select the following encodings: UTF-16, ISO-8859-1):
    <html>
    <body>
    &#1502;&#1506;&#1497;&#1500; &#1488;&#1495;&#1491; &#1489;&#1490;&#1513;&#1501;, &#1500;&#1497;&#1500;&#1492; &#1513;&#1500; &#1488;&#1508;&#1512;&#1497;&#1500;
    &#1504;&#1508;&#1514;&#1495; &#1499;&#1502;&#1493; &#1506;&#1504;&#1503;, &#1493;&#1512;&#1506;&#1501; &#1488;&#1494; &#1502;&#1488;&#1497;&#1512;
    &#1502;&#1506;&#1497;&#1500; &#1488;&#1495;&#1491; &#1489;&#1490;&#1513;&#1501;, &#1500;&#1497;&#1500;&#1492; &#1495;&#1501; &#1493;&#1511;&#1512;
    &#1504;&#1508;&#1512;&#1505; &#1499;&#1502;&#1493; &#1495;&#1493;&#1508;&#1492;, &#1493;&#1502;&#1514;&#1495;&#1514; &#1488;&#1504;&#1497; &#1513;&#1512;
    &#1502;&#1506;&#1497;&#1500; &#1488;&#1495;&#1491; &#1489;&#1490;&#1513;&#1501;, &#1512;&#1496;&#1493;&#1489;, &#1500;&#1502;&#1497; &#1488;&#1499;&#1508;&#1514;
    &#1488;&#1504;&#1497; &#1500;&#1488; &#1506;&#1500; &#1492;&#1488;&#1512;&#1509;, &#1488;&#1497;&#1514;&#1498; &#1500;&#1502;&#1506;&#1500;&#1492; &#1513;&#1496;
    &#1512;&#1493;&#1495; &#1489;&#1508;&#1504;&#1497;&#1501;, &#1496;&#1497;&#1508;&#1493;&#1514; &#1492;&#1490;&#1513;&#1501; &#1492;&#1488;&#1495;&#1512;&#1493;&#1504;&#1493;&#1514;
    &#1504;&#1493;&#1490;&#1506;&#1493;&#1514; &#1489;&#1500;&#1495;&#1497;&#1497;&#1501;, &#1489;&#1508;&#1504;&#1497;&#1497;&#1498; &#1502;&#1513;&#1495;&#1511;&#1493;&#1514;
    &#1488;&#1502;&#1510;&#1506; &#1492;&#1512;&#1495;&#1493;&#1489;, &#1499;&#1493;&#1500;&#1501; &#1499;&#1489;&#1512; &#1497;&#1513;&#1504;&#1497;&#1501;
    &#1492;&#1497;&#1497;&#1514;&#1492; &#1506;&#1491;&#1492; &#1492;&#1512;&#1493;&#1495; &#1493;&#1506;&#1493;&#1491; &#1513;&#1504;&#1497; &#1499;&#1493;&#1499;&#1489;&#1497;&#1501;
    &#1488;&#1502;&#1510;&#1506; &#1492;&#1512;&#1495;&#1493;&#1489;, &#1499;&#1493;&#1500;&#1501; &#1499;&#1489;&#1512; &#1497;&#1513;&#1504;&#1497;&#1501;,
    &#1492;&#1497;&#1497;&#1514;&#1492; &#1506;&#1491;&#1492; &#1492;&#1512;&#1493;&#1495; &#1493;&#1506;&#1493;&#1491; &#1513;&#1504;&#1497; &#1499;&#1493;&#1499;&#1489;&#1497;&#1501;
    &#1512;&#1488;&#1497;&#1514;&#1497; &#1494;&#1493;&#1490; &#1506;&#1497;&#1504;&#1497;&#1497;&#1501;, &#1502;&#1505;&#1512;&#1489;&#1493;&#1514; &#1500;&#1492;&#1497;&#1508;&#1514;&#1495;
    &#1510;&#1493;&#1500;&#1500;&#1514; &#1488;&#1500; &#1506;&#1510;&#1502;&#1498; &#1506;&#1502;&#1493;&#1511; &#1489;&#1497;&#1501; &#1513;&#1500;&#1498;,
    &#1502;&#1491;&#1497; &#1508;&#1506;&#1501; &#1488;&#1514; &#1506;&#1493;&#1500;&#1492;, &#1500;&#1493;&#1511;&#1495;&#1514; &#1511;&#1510;&#1514; &#1488;&#1493;&#1497;&#1512;
    &#1500;&#1488; &#1512;&#1493;&#1510;&#1492; &#1500;&#1492;&#1497;&#1505;&#1495;&#1507;, &#1502;&#1499;&#1497;&#1512;&#1492; &#1488;&#1514; &#1492;&#1502;&#1495;&#1497;&#1512;
    &#1488;&#1502;&#1510;&#1506; &#1492;&#1512;&#1495;&#1493;&#1489;, &#1499;&#1493;&#1500;&#1501; &#1499;&#1489;&#1512; &#1497;&#1513;&#1504;&#1497;&#1501;...
    </body>
</html>
I would appreciate if you correct me in case I am wrong!
    Edited by: charllescuba1008 on Mar 31, 2009 2:08 PM

charllescuba1008 wrote:
"Unicode states that each character is assigned a number which is unique, this number is called code point."
Right.
"The relationship between characters and code points is 1:1."
Uhm .... let's assume "yes" for the moment. (Note that the relationship between the Java type char and code points is not 1:1, and there are other exceptions ...)
"Eg: the String "hello" can be represented by the following code points: \u0065 \u0048 \u006c \u006c \u006f"
Those are Java String unicode escapes. If you want to talk about Unicode code points, then the correct notation for "Hello" would be U+0048 U+0065 U+006C U+006C U+006F. Note that you swapped the H and e.
"I also read that a certain character code point must be recognized by a specific encoding or else a question mark (?) is output in place of the character."
This one is Java specific. If Java tries to translate some Unicode character to bytes using an encoding that doesn't support that character, then it will output the byte(s) for "?" instead.
"Not all code points can be recognized by an encoding."
Some encodings (such as UTF-8) can encode all code points; others (such as ISO-8859-*, EBCDIC or UCS-2) can not.
"So, the letter ל would not be recognized by all encodings and should be replaced by a question mark (?) right?"
Only in a very specific case in Java. This is not a general Unicode-level rule.
(Disclaimer: the HTML code presented was using decimal XML entities to represent the Unicode characters.)
What you are seeing is possibly the replacement character that your text rendering system uses to represent characters that it knows, but can't display (possibly because the current font has no glyph for them).
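To inspect the actual code points (as opposed to the \uXXXX source escapes), Java can print them directly; this also makes the H/e ordering for "Hello" obvious:

```java
public class CodePoints {
    public static void main(String[] args) {
        "Hello".codePoints()
               .forEach(cp -> System.out.printf("U+%04X ", cp));
        System.out.println();
        // prints: U+0048 U+0065 U+006C U+006C U+006F
    }
}
```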

  • Cannot convert between unicode and non-unicode string datatypes

      My source is having 3 fields :
    ItemCode nvarchar(50)
    DivisionCode nvarchar(50)
    Salesplan (float)
    My destination is : 
    ItemCode nvarchar(50)
    DivisionCode nvarchar(50)
    Salesplan (float)
    But still I am getting this error : 
    Column ItemCode cannot convert between unicode and non-unicode string datatypes.
    As I am new to SSIS , please show me step by step.
    Thanks In Advance.

Hi Subu,
There is some information gap: what is your source? Are there any transformations in between?
If it's a SQL Server source and destination and the data types are as you have mentioned, I don't think you should be getting such errors. To be sure, check the advanced properties of your source and check the metadata of your source columns.
Just check a simple OLE DB source such as:
SELECT TOP 1 ItemCode = cast('111' as nvarchar(50)), DivisionCode = cast('222' AS nvarchar(50)), Salesplan = cast(3.3 As float) FROM sys.sysobjects
and the destination as you mentioned; it should work.
Somewhere in your package the source column metadata is not right, and you need to convert it or fix the source.
Hope that helps
-- Kunal
