Unicode escapes
When I receive input containing Unicode escapes, I can work with it as a String, but as soon as I add \u0022 (that is, a double quote) to the input, it no longer compiles and the compiler says ";" expected.
What is the way around this problem? Any advice, please?
{code}
public class Unicode {
    // Can't add \u0022 to this string; as soon as it is added,
    // compilation fails with ";" expected
    String uniSt = "\u0074\u0065\u0073\u0074";

    public Unicode() {
        System.out.println(uniSt);
    }

    public static void main(String[] args) {
        new Unicode();
    }
}
{code}
The Unicode escape \u0022 is translated to a real double quote before the compiler tokenizes the source, so inside a string literal it acts as one and terminates the literal. If the string also has an ordinary closing double quote, that quote starts a new, unterminated literal, which is the error. This compiles, with \u0022 itself serving as the closing quote:
{code}
String s = "\u0074\u0065\u0073\u0074\u0022;
{code}
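A minimal sketch of the workaround, assuming the goal is a string value that actually contains a double quote: escape the quote with \" so that it survives the early Unicode translation step.

```java
public class QuoteDemo {
    public static String quoted() {
        // Unicode translation happens first and leaves the backslash-quote
        // pair \" intact, so the literal stays properly terminated
        String a = "\u0074\u0065\u0073\u0074\""; // value is: test"
        return a;
    }

    public static void main(String[] args) {
        System.out.println(quoted()); // prints: test"
    }
}
```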
Similar Messages
-
How to get the unicode escapes for characters outside a characterset
Hi!
I'm trying to edit an RTF file and have been fairly successful so far. But living outside the U.S. I need some characters outside ASCII. Those characters are supposed to be escaped as Unicode escapes, e.g. \u45. But I can't find a way to get the escape sequence for the Unicode characters that live outside ASCII.
I'm guessing that this is a very simple thing to do but I have not been lucky with google so far.
So, how do I get the unicode escapes for characters outside a characterset?
Thanks in advance
Roland Carlsson

You are asking about RTF and not Java, correct?
As a guess: Unicode code points need more than two hex digits, so \u45 as written is ambiguous; padded to four digits it would be \u0045, and something like \u1e45 would probably work for a character outside ASCII. -
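For the RTF side (not Java): RTF's own Unicode escape is \uN, where N is a signed 16-bit decimal value followed by a fallback character for old readers. A rough sketch, assuming '?' is an acceptable fallback:

```java
public class RtfEscape {
    // Sketch: escape non-ASCII chars as RTF \uN? sequences. RTF's N is a
    // signed 16-bit *decimal* number (unlike Java's hex \uXXXX), and the
    // character after it is a fallback for non-Unicode readers.
    public static String escape(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            if (c < 128) {
                sb.append(c);
            } else {
                // the (short) cast yields the signed value RTF expects
                sb.append("\\u").append((short) c).append('?');
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(escape("café")); // prints: caf\u233?
    }
}
```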
Chinese characters to Unicode Escape
I'd like to implement a function that converts a Chinese string into Unicode escape codes, just like native2ascii does. I can convert single bytes with charToHex but have no clue how to deal with double-byte characters. Any hints?

I think Unicode escapes can be obtained from a file using the native2ascii tool. But if you want code, the following might be an example:
{code}
public class UnicodeTool {

    // Returns the two-digit hex representation of byte b
    static String byteToHex(byte b) {
        char[] hexDigit = {'0', '1', '2', '3', '4', '5', '6', '7',
                           '8', '9', 'a', 'b', 'c', 'd', 'e', 'f'};
        char[] array = { hexDigit[(b >> 4) & 0x0f], hexDigit[b & 0x0f] };
        return new String(array);
    }

    // Returns the four-digit hex representation of char c
    static String charToHex(char c) {
        byte hi = (byte) (c >>> 8);
        byte lo = (byte) (c & 0xff);
        return byteToHex(hi) + byteToHex(lo);
    }

    // Returns the \uxxxx Unicode escape for char c
    static String toUnicodeFormat(char c) {
        String body = charToHex(c);
        String zeros = "000";
        return "\\u" + zeros.substring(0, 4 - body.length()) + body;
    }

    public static void main(String[] args) {
        String str = "09Az"; // example of a string
        for (char ch : str.toCharArray()) {
            System.out.println(toUnicodeFormat(ch));
        }
    }
}
{code}
-
Hi,
I'm looking for a way to convert a string, read from a file, that contains escape codes.
In short, the file contains a string e.g. "Some text\nOn a new line" which I want to get into a String object as if it were the result of String s = new String("Some text\nOn a new line"); I've been looking in the Java docs but didn't find a function to do that conversion (though the compiler has to do it all the time...). Any ideas?

That's not what I'm looking for.
What i've got is a file that look like this:
1="Somestring"
2="another message\nAnd some more text"
3="text with\tTabs\n\tTo get some layout"
etc...
It's used as a string table for a program that has to be available in multiple languages; there are versions of the file in different languages.
What I want is to be able to get e.g. string number 2 out of it in such a way that System.out.println(string2); will give the following result:
another message
And some more text
instead of:
"another message\nAnd some more text" -
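A pointer that may help: java.util.Properties already implements exactly this decoding (\n, \t and \uXXXX escapes) for key=value files, assuming the surrounding double quotes can be dropped from the file format. A sketch:

```java
import java.io.StringReader;
import java.util.Properties;

public class StringTable {
    // Looks up one entry; Properties.load decodes \n, \t and \uXXXX
    // escape sequences much like the compiler does for string literals
    public static String lookup(String fileContents, String key) throws Exception {
        Properties p = new Properties();
        p.load(new StringReader(fileContents));
        return p.getProperty(key);
    }

    public static void main(String[] args) throws Exception {
        // the file itself contains a literal backslash-n, not a newline
        String file = "2=another message\\nAnd some more text\n";
        System.out.println(lookup(file, "2"));
    }
}
```

Running it prints "another message" and "And some more text" on two separate lines, which is the behavior the question asks for.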
How can I get unicode escape for £ in java
Is there any API which can translate the symbol £ to its corresponding unicode escape?
SurfManNL wrote:
Found one as well, but in the currency section: 20A4
http://www.unicode.org/charts/PDF/U20A0.pdf

That's the lira sign, for the Italian lira, which was replaced by the euro some years ago ;) -
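For the record, the pound sign £ is U+00A3 in the Latin-1 Supplement block, not U+20A4 (the lira sign ₤) from the currency block. A character's escape can be computed directly:

```java
public class PoundEscape {
    // Formats any char as its \uXXXX Unicode escape
    public static String toEscape(char c) {
        return String.format("\\u%04X", (int) c);
    }

    public static void main(String[] args) {
        System.out.println(toEscape('\u00A3')); // prints: \u00A3 (pound sign)
    }
}
```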
Converting HTML Escaping to Unicode Escaping characters in Java
Hi,
I am getting HTML-escaped special characters like pound, space, dollar etc. from the database (entities rendering as £, ® and so on), which I want to convert to their Unicode escape equivalents such as U+00A3 and U+0026. Java only converts the ampersand entity to & (U+0026), but the rest of the characters are not converted. If there is any API or way to do this, please reply.
Note: I can't change the database, as there are already thousands of records, and my front end only allows Java to do these conversions.

I have posted a method that does what you want. It was a long time ago since I wrote it, and you should probably use a StringBuilder instead of a StringBuffer if you are going to use it in Java 5 or later. You can find the method in this thread:
http://forum.java.sun.com/thread.jspa?threadID=652630 -
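If a library is an option, Commons Lang's StringEscapeUtils can unescape the HTML entities first. Purely as an illustration, here is a hand-rolled sketch with a tiny, deliberately incomplete entity table (a real application needs the full HTML entity list):

```java
import java.util.Map;

public class EntityToEscape {
    // Hypothetical three-entry table for illustration only
    private static final Map<String, Character> ENTITIES =
            Map.of("pound", '\u00A3', "reg", '\u00AE', "amp", '\u0026');

    // Replaces known &name; entities with \uXXXX escape text
    public static String toUnicodeEscapes(String html) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < html.length()) {
            char c = html.charAt(i);
            int semi;
            if (c == '&' && (semi = html.indexOf(';', i)) > i) {
                Character ch = ENTITIES.get(html.substring(i + 1, semi));
                if (ch != null) {
                    out.append(String.format("\\u%04X", (int) ch.charValue()));
                    i = semi + 1;
                    continue;
                }
            }
            out.append(c); // unknown entity or plain text: copy through
            i++;
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(toUnicodeEscapes("&pound; &reg;")); // \u00A3 \u00AE
    }
}
```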
Should I use escape even when I can enter Unicode characters directly
Unicode escape sequences: what is their purpose? Should I use them even when I can enter Unicode characters directly using a text editor?

Then how does it fit into the i18n concept, which provides for sending out text files with text strings for translation? How would one translate escapes? -
(Internationalization) - Unicode and Other ... Encoding Schemes
Hello,
I am developing a application that requires multiple languages
(Chinese/Japanese/English, French/German) support.
I plan to use utf-8 encoding, and not individual encoding for each language
like SHIFT_JIS for Japanese, BIG5 for Chinese etc.
This is more so because I would need to display multiple languages on the same page, and allow the user to enter data in any language he/she chooses.
1. So, is the assumption correct that nothing but UTF-8 can be used here?
2. If this is the case, why do people go for SHIFT_JIS for Japanese or BIG5 for Chinese at all? After the advent of Unicode, why can't they just use UTF-8?
3. I am using WebLogic 6, and my app is composed of JSPs alone at the moment. It is working fine with UTF-8 encoding, without me setting anything at all in properties files anywhere. I am getting data entered by users in forms (in Chinese/Japanese etc.) fine, and I am able to insert it into the database and get it back too, without any problems.
So why is it that people talk about parameters to be set in properties files to tell the app about the encoding being used, etc.?
4. My resource bundles are ASCII text files (.properties) which have name-value pairs. Hex Unicode escapes of the form \uXXXX represent the values, and this works fine.
For example :
UserNameLabel = \u00e3\ufffd\u2039\u00e3
instead of -
UserNameLabel = ãf¦ãf¼ã
If the properties files contain the original characters where the values should be, my Java code is not able to read the name-value pairs in the ResourceBundle.
Am I following the right approach?
The problem with the current approach is that after I create the resource bundles, I must run the native2ascii tool to convert the characters into their Unicode escape equivalents.
Thanks
JSB

charllescuba1008 wrote:
"Unicode states that each character is assigned a number which is unique; this number is called a code point."
Right.
"The relationship between characters and code points is 1:1."
Uhm... let's assume "yes" for the moment. (Note that the relationship between the Java type char and code points is not 1:1, and there are other exceptions.)
"E.g. the String "hello" (which is a sequence of character literals) can be represented by the following code points: \u0065 \u0048 \u006c \u006c \u006f"
Those are Java String Unicode escapes. If you want to talk about Unicode code points, then the correct notation for "Hello" would be
U+0048 U+0065 U+006C U+006C U+006F
Note that you also swapped the H and the e.
"I also read that a certain character code point must be recognized by a specific encoding or else a question mark (?) is output in place of the character."
This one is Java-specific. If Java tries to translate some Unicode character to bytes using an encoding that doesn't support that character, then it will output the byte(s) for "?" instead.
"Not all code points can be recognized by an encoding."
Some encodings (such as UTF-8) can encode all code points; others (such as ISO-8859-*, EBCDIC or UCS-2) cannot.
"So, the letter ל would not be recognized by all encodings and should be replaced by a question mark (?), right?"
Only in a very specific case in Java. This is not a general Unicode-level rule.
(Disclaimer: the HTML code presented was using decimal XML entities to represent the Unicode characters.)
What you are seeing is possibly the replacement character that your text rendering system uses to represent characters that it knows but can't display (possibly because the current font has no glyph for them). -
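To get the code-point notation right programmatically, String.codePoints() (Java 8+) walks the string by code point rather than by char; a small sketch:

```java
public class CodePoints {
    // Returns the U+XXXX notation for every code point in s, in order
    public static String codePointsOf(String s) {
        StringBuilder sb = new StringBuilder();
        s.codePoints().forEach(cp -> sb.append(String.format("U+%04X ", cp)));
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        // prints: U+0048 U+0065 U+006C U+006C U+006F
        System.out.println(codePointsOf("Hello"));
    }
}
```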
Convert UTF-8 (Unicode) Hex to Hex Byte Sequence while reading file
Hi all,
When java reads a utf-8 character, it does so in hex e.g \x12AB format. How can we read the utf-8 chaacter as a corresponding byte stream (e.g \x0905 is hex for some hindi character (an Indic language) and it's corresponding byte sequence is \xE0\x45\x96).
Can the method for reading a UTF-8 character's byte sequence be used to read any other character set's byte sequence (other than UTF-8, say some proprietary font)?

First, there's no such thing as a "UTF-8 character". UTF-8 is a character encoding that can be used to encode any character in the Unicode database.
If you want to read the raw bytes, use an InputStream. If you want to read text that's encoded as UTF-8, wrap the InputStream in an InputStreamReader and specify UTF-8 as the encoding. If the text is in some other encoding, specify that instead of UTF-8 when you construct the InputStreamReader.
{code}
import java.io.*;

public class Test {

    // DEVANAGARI LETTER A (अ), U+0905, in UTF-8 encoding
    static final byte[] source = { (byte) 0xE0, (byte) 0xA4, (byte) 0x85 };

    public static void main(String[] args) throws Exception {
        // print raw bytes
        InputStream is = new ByteArrayInputStream(source);
        int read = -1;
        while ((read = is.read()) != -1) {
            System.out.printf("0x%02X ", read);
        }
        System.out.println();
        is.reset();

        // print character as Unicode escape
        Reader r = new InputStreamReader(is, "UTF-8");
        while ((read = r.read()) != -1) {
            System.out.printf("\\u%04X ", read);
        }
        System.out.println();
        r.close();
    }
}
{code}
Does that answer your question? -
I'm new to Java and am trying to experiment with escape sequences and Unicode characters. I have tried this one but am having trouble:
{code}
public class XmasTree {
    public static void main(String[] args) {
        // to print the " mark
        System.out.println("\u0022");
    }
}
{code}
When I compile, this error comes up: Tool completed with exit code 3.
When I run it, this error message comes up: cannot locate client JVM in... (it gives the path of where I have installed the JSDK).

"System.out.println("\u0022"); // to print the " mark"
That won't work; Unicode escapes are processed before the code is even passed to the compiler. What the compiler will see is System.out.println("""); and that's not proper syntax. Not even the code prettyprinter in these forums can handle it properly.

"When I compile this error comes up: Tool completed with exit code 3 ... cannot locate client JVM in... (the path of where I have installed the JSDK)"
If those are the only error messages you get, you have misconfigured your development environment. What do you use to write code, TextPad? -
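For comparison, two ways to print the quote character that do work, since neither relies on a Unicode escape inside the literal:

```java
public class PrintQuote {
    public static void main(String[] args) {
        System.out.println("\"");      // escape the quote in the literal
        System.out.println((char) 34); // or build it from its code point
    }
}
```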
Unicode: non-Latin characters in identifiers and data
I would like to use Unicode escapes in identifiers, say create
a variable name that is Japanese. But I can't seem to get this
to work.
I have a product (Japanese Partner) that lets me key in latin
characters then converts these to Japanese (kana or Kanji,
depending on various options) in Unicode and passes them
to the input line.
But I can't get these to compile.
Also, if I code:
char Jletter = '\u2f80';
System.out.println("Jletter = " + Jletter);
The runtime output is:
Jletter = ?
I thought it was supposed to display as a Unicode escape.
TIA for any help.

Perhaps, but I'm going on:
"Programs are written in Unicode (§3.1), but lexical translations are provided (§3.2) so that Unicode escapes (§3.3) can be used to include any Unicode character using only ASCII characters."
Then ...
"3.2 Lexical Translations
A raw Unicode character stream is translated into a sequence of tokens, using the following three lexical translation steps, which are applied in turn:
1. A translation of Unicode escapes (§3.3) in the raw stream of Unicode characters to the corresponding Unicode character. A Unicode escape of the form \uxxxx, where xxxx is a hexadecimal value, represents the UTF-16 code unit whose encoding is xxxx. This translation step allows any program to be expressed using only ASCII characters.
2. A translation of the Unicode stream resulting from step 1 into a stream of input characters and line terminators (§3.4).
3. A translation of the stream of input characters and line terminators resulting from step 2 into a sequence of input elements (§3.5) which, after white space (§3.6) and comments (§3.7) are discarded, comprise the tokens (§3.5) that are the terminal symbols of the syntactic grammar (§2.3)."
I take this to mean you can use Unicode escapes for Unicode characters. But it doesn't seem to work, so maybe my understanding is deficient. Maybe the docs need to be clearer. -
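A likely explanation for the identifier failure, offered as a guess: \u2f80 is a Kangxi radical, which Unicode classifies as a symbol rather than a letter, so it is not legal in an identifier even though the escape itself is translated correctly. Ordinary CJK ideographs and accented letters are letters and do work. A sketch:

```java
public class IdentifierDemo {
    public static void main(String[] args) {
        // \u00e9 is 'é', a Unicode letter, so this identifier is legal
        int caf\u00e9 = 42;
        System.out.println(caf\u00e9); // prints: 42

        // Character explains why '\u2f80' would be rejected:
        System.out.println(Character.isJavaIdentifierPart('\u00e9')); // true
        System.out.println(Character.isJavaIdentifierPart('\u2f80'));
    }
}
```

The runtime "?" output is a separate issue: the console's encoding cannot display the character, so the JVM substitutes a question mark; the char value itself is intact.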
Dear all,
I am currently dealing with a character displaying problem on the MAM.
We will soon go live in China. Until now we only had European countries, with a Latin alphabet.
Now however this changes, so we need to use Unicode to display all characters correctly.
Therefore I have converted all our custom language files to language files with Unicode escape sequences.
e.g.:
EQUIPMENTS_EQU_MAT_NR=设备材料号码
Now the strange thing is that when we login in Chinese, everything is displayed correctly, but when we login in German or Polish (countries which also have some special characters), we don't see everything displayed correctly.
This is the code how we display an entry from the language file on the screen:
<%@page language="java" contentType="text/html; charset=UTF-8"%>
<meta http-equiv="Content-Type" content="text-html; charset=UTF-8">
<jsp:useBean id="res" class="com.sap.ip.me.api.services.MEResourceBundle" scope="session"></jsp:useBean>
<%=PageUtil.ConvertISO8859toUTF8(res.getString("CONFIRMATIONS_HEADER_DETAIL"))%>
For Chinese language, the characters are displayed correctly in this way.
e.g.: 最后一次同步时间
However Polish characters and German characters are not (always) displayed correctly.
e.g.: Wskaźnik pierwszego usuniÄu2122cia usterki
The only 'difference' that I can see is that for China, every character in the language file has a special Unicode notation, while for Polish and German characters, only the special characters are displayed in special Unicode notation.
e.g.:
EQUIPMENTS_EQU_MAT_NR=Numer materia\u00c5‚u sprz\u00c4™tu
FYI, I've converted the files to Unicode escape characters with the java util native2ascii.exe.
Is there anyone who knows how to solve this issue?
Thanks already in advance!
Best regards,
Diederik
Edited by: Diederik Van Wassenhove on Jul 6, 2009 2:34 PM

Dear all,
I've found the reason for this problem.
Thanks anyway for your time!
The problem was that when converting the language files to Unicode escape characters, the source format was wrong. The files were saved as UTF-8, but the Java tool native2ascii does not take UTF-8 as its default input format. So the resulting file did not contain the correct Unicode escapes.
I've re-generated the language files with the parameter -encoding UTF-8, and now everything is displayed correctly!
Have a good day!
Diederik -
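A side note on avoiding the problem altogether: since Java 6, Properties.load(Reader) accepts already-decoded characters, so the native2ascii round-trip is unnecessary if the UTF-8 file is read through a UTF-8 Reader. A sketch (the byte array stands in for a real file):

```java
import java.io.ByteArrayInputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import java.util.Properties;

public class Utf8Bundle {
    // Loads a UTF-8 .properties file without any native2ascii conversion
    public static Properties loadUtf8(byte[] fileBytes) throws Exception {
        Properties p = new Properties();
        try (Reader r = new InputStreamReader(
                new ByteArrayInputStream(fileBytes), StandardCharsets.UTF_8)) {
            p.load(r); // Java 6+: load from a Reader, not a byte stream
        }
        return p;
    }
}
```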
Unicode value of a non-ASCII character
Hi,
Suppose, the unicode value of the character ् is '\u094d'.
Is there any Java function which can get this unicode value of a non-ASCII character.
Like:
char c='्';
String s=convertToUnicode(c);
System.out.println("The unicode value is "+ s);
Output:
The unicode value is \u094d
Thanks in advance

Ranjan_Yengkhom wrote:
I have tried with the parameter
c:\ javac -encoding utf8 filename.java
Still I am getting the same print, i.e. \u3f

If it comes out as "\u3f" (instead of failing to compile or producing any other value), then your source code already contains the question mark. So you already saved the file wrong and have to re-type it (at least the single character).
Then I studied a tutorial regarding this issue:
http://vietunicode.sourceforge.net/howto/java/encoding.html
It says that we need to save the Java file in UTF-8 format. I have explored most of the editors like NetBeans, Eclipse, JCreator, etc., and there is no option to save the Java file in UTF-8 format.

That's one way. But since that is so problematic (you'll have to remember/make sure to always save it that way and to compile it using the correct switch), the better solution by far is not to use any non-ASCII characters in your source code.
I already told you two possible ways to achieve that: unicode escapes or externalized strings.
Also read http://www.joelonsoftware.com/articles/Unicode.html (just because it's related, essential information and I just posted that link somewhere else). -
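For completeness, the convertToUnicode function the question asks for can be sketched in a few lines (the method name is taken from the question itself):

```java
public class UnicodeValue {
    // Returns the \uxxxx Unicode escape for char c, as the question asks
    public static String convertToUnicode(char c) {
        return String.format("\\u%04x", (int) c);
    }

    public static void main(String[] args) {
        // prints: The unicode value is \u094d
        System.out.println("The unicode value is " + convertToUnicode('\u094d'));
    }
}
```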
Oracle 10g R2 - Unicode is converted to unknown ?����
Ok, I've read a thousand posts. Tried everything still stuck.
I am using Oracle 10g : 10.2.0.5.0
I'm on a Mac, using SQL Developer 3.1.07
When I enter "ė€£¥©", the insertion into the database converts it to: ?����
This is straight from a SQL INSERT command, in SQLDeveloper, no java code or anything.
I tried adding a file called SQLDeveloper.app/Resources/sqldeveloper/sqldeveloper/sqldeveloper.conf and added these lines:
-Doracle.jdbc.defaultNChar=true
-Doracle.jdbc.convertNcharLiterals=true
Didn't help.
I tried creating a test column with the datatype NVARCHAR2: same results. A VARCHAR column shows the same corruption, but that's expected.
I tried N'©' - doesn't work
I tried INSERT INTO testchar (column1) VALUES ( unistr('©') );
results is still : �
I checked NLS_NCHAR_CHARACTERSET and it's UTF8
NLS_LANGUAGE AMERICAN
NLS_TERRITORY AMERICA
NLS_CURRENCY $
NLS_ISO_CURRENCY AMERICA
NLS_NUMERIC_CHARACTERS .,
NLS_CHARACTERSET US7ASCII
NLS_CALENDAR GREGORIAN
NLS_DATE_FORMAT DD-MON-RR
NLS_DATE_LANGUAGE AMERICAN
NLS_SORT BINARY
NLS_TIME_FORMAT HH.MI.SSXFF AM
NLS_TIMESTAMP_FORMAT DD-MON-RR HH.MI.SSXFF AM
NLS_TIME_TZ_FORMAT HH.MI.SSXFF AM TZR
NLS_TIMESTAMP_TZ_FORMAT DD-MON-RR HH.MI.SSXFF AM TZR
NLS_DUAL_CURRENCY $
NLS_COMP BINARY
NLS_LENGTH_SEMANTICS BYTE
NLS_NCHAR_CONV_EXCP FALSE
NLS_NCHAR_CHARACTERSET UTF8
NLS_RDBMS_VERSION 10.2.0.5.0

I don't know what else to do. I am not a DBA. I'm just trying to fix a bug, or at least tell a customer what's wrong. Entering special characters is required. Can anyone help? Thank you!
I have read that the insertion into the database does its own conversion but I don't know if that applies to me since I'm in 10g R2
Edited by: 947012 on Aug 14, 2012 11:43 AM
My mac's locales are also :
$ locale
LANG="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_CTYPE="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_ALL=
Edited by: 947012 on Aug 14, 2012 11:45 AM
Edited by: 947012 on Aug 14, 2012 11:49 AM

NLS_NCHAR_CHARACTERSET specifies the encoding of NVARCHAR2 variables, but not of the SQL statements. SQL statements, when they arrive at the database server, are always encoded in the database character set (NLS_CHARACTERSET). This is true for the whole statement, including any character literals. Therefore, even if a literal is valid in SQL Developer (which works in Unicode), when the literal arrives at the database it has already been stripped of all characters that the database character set does not support. The characters are not actually removed, but they are converted to a replacement character. In the case of the US7ASCII database character set, anything non-ASCII is lost.
The trick to avoid this problem is to:
- mark the literals that need to be preserved for NVARCHAR2 columns as NVARCHAR2 literals by prefixing them with "n".
- set the mentioned property to activate the client-side encoding mechanism, which basically does a rough parse of the statement and replaces all N-literals with U-literals. The undocumented U-literals are similar to UNISTR calls: they encode non-ASCII characters with Unicode escape sequences. For example, n'€' (U+20AC) is encoded as u'\20AC'. As u'\20AC' contains only ASCII characters, it arrives at the database unchanged. The SQL parser recognizes U-literals and converts (unescapes) them to NVARCHAR2 constants before further SQL processing, such as INSERT.
-- Sergiusz -
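As an illustration of the mechanism Sergiusz describes, and not Oracle's actual implementation, the N-literal to U-literal rewriting can be sketched like this:

```java
public class ULiteral {
    // Sketch: turn the text of an N-literal into an ASCII-safe U-literal,
    // escaping each non-ASCII char as a four-digit hex \XXXX sequence
    public static String toULiteral(String text) {
        StringBuilder sb = new StringBuilder("u'");
        for (char c : text.toCharArray()) {
            if (c == '\'') {
                sb.append("''");              // double embedded quotes, SQL-style
            } else if (c < 128) {
                sb.append(c);                 // plain ASCII passes through
            } else {
                sb.append(String.format("\\%04X", (int) c));
            }
        }
        return sb.append('\'').toString();
    }

    public static void main(String[] args) {
        System.out.println(toULiteral("\u20ac")); // prints: u'\20AC'
    }
}
```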
Greetings. I have two TrueType fonts I'm trying to display in my java application. Both fonts have been "installed" on my Windows machine, one of them works just find (I can see Graphics.drawString("A", ...) and the corresponding A symbol is displayed. However, when using the other font, I get the missing character symbol, unless I use a unicode escape sequence, e.g. "\uf040". Both fonts, incidentally are military-symbol fonts, and they are very similar in nature. Why does one font require I use the escape sequence, while one does not, and is there anyway I can configure the JRE so that I can avoid having to use the escape code?
When I open the font that isn't working directly (using a font viewer), the same letter "A" that is mapped to, say, "Times New Roman" is also mapped to the letter "A" for this font (in other words, it's not using characters outside of my keyboard capability). To put this another way, MS Word displays the characters just fine when I type them in with my keyboard-- Java 2D does not.
The reason I don't want to use Unicode escapes is that I can't figure out an easy way to display all the characters between, say, 40 and 200 in a loop. In other words, how would I go about constructing the Unicode escape string in a loop? You can't simply say "\u" + loopVariable, because it's not a string literal.
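No escape string is needed at runtime: \uXXXX is source-code syntax only. A char can be built from the loop variable directly with a cast (the 0xF040 range here is taken from the question's example and assumes a symbol font mapped into the private use area):

```java
public class FontProbe {
    public static void main(String[] args) {
        // Build each character from its numeric value; no \u string required
        for (int i = 0xF040; i <= 0xF04A; i++) {
            String s = String.valueOf((char) i);
            System.out.println(Integer.toHexString(i) + " -> " + s);
        }
    }
}
```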