Unicode representation

Hi,
I'm new to Java programming and I would like to know how I can get the Unicode representation (a hexadecimal number of length 4) of a character.
e.g. Given '0' as input, the output should be 0030.

Thanks, but I was looking for a function or something that takes a character as input and returns its Unicode representation. You see, I want to encode a text. If anybody has an idea, please share it with me.
Thank You
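
Not from the thread, but a minimal sketch of one way to do this in Java (the class and method names are my own):

    public class UnicodeHex {
        // A char is a UTF-16 code unit; for BMP characters its numeric value is the code point
        static String toUnicodeHex(char c) {
            return String.format("%04x", (int) c);   // '0' -> "0030"
        }

        public static void main(String[] args) {
            System.out.println(toUnicodeHex('0'));   // prints 0030
        }
    }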

Similar Messages

  • Converting non-English characters to their unicode representation

    I have a series of files/templates where each contains a locale-specific language such as Chinese, Japanese or German. I need to find out how to get their Unicode representations so I can send them as HTML-formatted email.
    I can already send the English template as HTML-formatted email without a problem. I was able to find a sample Unicode representation of Japanese and send that as a test. But how do I take the templates that I have and convert their contents into Unicode?
    Thanks in advance.
    Please disregard. I figured it out.
    chehrehk

    You need to know what character encoding was used for the template text. For example, you could have Japanese text encoded using UTF-8 or encoded using ISO-2022-JP, and the same Japanese characters would be represented as a different sequence of bytes. Without knowing which charset was used, you won't be able to convert the byte sequence back into Unicode characters (e.g., to store in a Java String).
    If you do know which charset was used, java.io.Reader will convert the byte stream into Unicode characters.
    If the charset information is not available, there are heuristics that you can use to try to guess the correct charset, but by their nature they're going to be wrong sometimes.
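
    As a sketch of the suggestion above (the file name and the UTF-8 charset are assumptions; use whatever the template was actually saved in):

        import java.io.*;
        import java.nio.charset.StandardCharsets;

        public class TemplateReader {
            public static void main(String[] args) throws IOException {
                StringBuilder text = new StringBuilder();
                // InputStreamReader decodes the bytes into Unicode characters using the given charset
                try (Reader in = new InputStreamReader(
                        new FileInputStream("template_ja.html"), StandardCharsets.UTF_8)) {
                    int ch;
                    while ((ch = in.read()) != -1) {
                        text.append((char) ch);
                    }
                }
                System.out.println(text);
            }
        }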

  • Convert a string to its unicode representation

    How can I convert characters such as "ащщ" to their unicode numbers (1072, 1097, 1097)? I know it's gotta be so easy, but it's going over my head.
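
    A minimal Java sketch of getting those numbers (the reply below deals with a ColdFusion encoding issue):

        public class CodePoints {
            public static void main(String[] args) {
                String s = "ащщ";
                for (int i = 0; i < s.length(); i++) {
                    // For BMP characters this is the same value as (int) s.charAt(i)
                    System.out.println(s.codePointAt(i));   // 1072, 1097, 1097
                }
            }
        }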

    ogre11 wrote:
    > when I give any Cyrillic character to a cf function it gets converted into a question mark at some point before the function runs, so asc returns 63 on all cyrillic characters
    encoding issue. where's the data coming from? if db, which one? what driver? is the data really unicode?
    do you have a simple example showing the problem?

  • How do I tell if a File is ANSI, unicode or UTF8?

    I have a jumble of file types - they should all be the same, but they are not.
    How do I tell which type a file has been saved in?
    (and how do I tell a file to save in a certain type?)
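
    A minimal sketch of one heuristic (checking for a byte-order mark; files saved without a BOM cannot be identified this way, and the file name is hypothetical):

        import java.io.FileInputStream;
        import java.io.IOException;

        public class BomSniffer {
            public static String sniff(String path) throws IOException {
                try (FileInputStream in = new FileInputStream(path)) {
                    int b0 = in.read(), b1 = in.read(), b2 = in.read();
                    if (b0 == 0xEF && b1 == 0xBB && b2 == 0xBF) return "UTF-8 (with BOM)";
                    if (b0 == 0xFE && b1 == 0xFF) return "UTF-16BE";
                    if (b0 == 0xFF && b1 == 0xFE) return "UTF-16LE";
                    return "no BOM - could be ANSI, UTF-8 without BOM, ...";
                }
            }

            public static void main(String[] args) throws IOException {
                System.out.println(sniff("somefile.txt")); // file name is hypothetical
            }
        }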

    "unicode or UTF-8" ?? UTF-8 is unicode !NO! UTF-8 is not UNICODE. Yes it is !!No it is not.
    And to prove it I refer to your links.........
    You simply cannot say "unicode or UTF-8" just because
    UTF is Unicode Transformation Format.UTF is a transfomation of UNICODE but it is not UNICODE. This is not playing with words. One of the big problems I see on these forums is people saying the Java uses UTF-8 to represent Strings but it does not, it uses UNICODE point values.
    You can say "UTF-8 or UTF16-BE or UTF-16LE" because
    all three are different Unicode representations. But
    all three are unicode.No! They are UNICODE transformations but not UNICODE.
    >
    So please don't play on words, I wanted to notify the
    original poster that "unicode or UTF-8" is
    meaningless, he/she would probably have said :
    "unicode (as UTF-8 or UTF-16 or...)"You are playing with words, not me. UTF-8 is not UNICODE, it is a transformation of UNICODE to a multibyte representation - http://www.unicode.org/faq/utf_bom.html#14 .

  • £ symbol displaying with a Unicode U+00C2 character in front of it.

    Using the same java application code and the same j2sdk_1.4.1_02fcs java package I get a display difference between Redhat 7.3 and Redhat EL4 AS.
    nf = NumberFormat.getCurrencyInstance(new Locale(lan, con));
    String res = nf.format(money);
    This results in a single strange character preceding the monetary symbol for the UK pound symbol.
    Instead of just displaying a £ character in front of monetary values I am getting a Unicode U+00C2 character in front of it, as shown:
    Good = £1.23
    Bad = Â£1.23
    I used the following simple test program to show this:

        import java.text.NumberFormat;
        import java.util.Locale;

        public class Test555 {
            public static void main(String[] args) {
                NumberFormat nf = NumberFormat.getCurrencyInstance(new Locale("en", "GB"));
                System.out.println(nf.format(1.23));
            }
        }

    I compiled this and ran the class on both machines...
    on a Redhat 7.3 machine: £1.23
    on a Redhat EL4 AS machine: Â£1.23
    The /bin/unicode_start program only works on the console in a VT or xwindows with a TERM type of xterm, but allows the console to properly display the characters.

    The upgrade to Red Hat 8.0 and beyond changed the default character encoding from ISO-8859-15 to UTF-8. The UTF-8 encoding scheme represents the Unicode code point for the Pound Sterling as a two-byte sequence, prepending a 0xC2 to the 0xA3 (the Pound). It is this 0xC2 that we see represented as the capital A circumflex or the "T" symbol we noticed earlier.
    Is there a way to remove the prepended 0xC2 that was added by the two-byte UTF-8 representation?
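
    A minimal sketch illustrating where the 0xC2 comes from: the pound sign is a single byte in ISO-8859-1/15 but two bytes (0xC2 0xA3) in UTF-8.

        import java.nio.charset.StandardCharsets;

        public class PoundBytes {
            public static void main(String[] args) {
                String pound = "\u00a3";
                // One byte when encoded as Latin-1
                for (byte b : pound.getBytes(StandardCharsets.ISO_8859_1)) {
                    System.out.printf("%02x ", b & 0xff);   // a3
                }
                System.out.println();
                // Two bytes when encoded as UTF-8
                for (byte b : pound.getBytes(StandardCharsets.UTF_8)) {
                    System.out.printf("%02x ", b & 0xff);   // c2 a3
                }
                System.out.println();
            }
        }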

  • Why unicode symbol (copyright) in textfield to drawString not interpreted?

    I have a text field that I can type stuff into and have printed on a JPanel. However, when I type in some unicode, it's printed as the escape code, not the character it means. Why is this? How do I fix this?
    NOTE: to try this sample program, you have to type "\u00a9" (without the quotes) into the text box and then click the button.
    import javax.swing.*;
    import java.awt.*;
    import java.awt.event.*;

    public class TestSymbol extends JPanel {
        private static JTextField field;

        public void paintComponent(Graphics g) {
            super.paintComponent(g);
            // This prints a copyright symbol (the literal is converted by the compiler)
            g.drawString("\u00a9 Something", 50, 50);
            // This prints the unicode escape exactly as the user typed it
            g.drawString(field.getText(), 50, 100);
        }

        public static void main(String[] args) {
            JFrame f = new JFrame();
            f.setSize(500, 500);
            JPanel p = new JPanel();
            p.setLayout(new FlowLayout());
            // The panel that prints everything
            final TestSymbol t = new TestSymbol();
            setExactSize(t, 200, 200);
            p.add(t);
            // The text field
            field = new JTextField("", 10);
            field.setSize(100, 25);
            p.add(field);
            JButton button = new JButton("click");
            button.addActionListener(new ActionListener() {
                public void actionPerformed(ActionEvent e) {
                    t.repaint();
                }
            });
            p.add(button);
            f.getContentPane().add(p);
            f.setVisible(true);
        }

        private static void setExactSize(JComponent component, int width, int height) {
            Dimension dim = new Dimension(width, height);
            component.setPreferredSize(dim);
            component.setMaximumSize(dim);
            component.setMinimumSize(dim);
            component.setSize(width, height);
        }
    }

    sabre150 wrote:
    Unicode representations in String literals are parsed by the compiler. A Unicode representation entered by the user has to be parsed by your program, which should substitute the appropriate character for the unicode escape string. See reply #1 of http://forums.sun.com/thread.jspa?threadID=5226995&tstart=28049 .
    Edited by: sabre150 on May 14, 2009 5:26 PM
    Thanks! I just figured that out too.
    This isn't the best, but it properly converts all unicode within a string from a text field (you can add it to my previous program):
    private static final String UNICODE_START = "\\u";
    private static final int UNICODE_LENGTH = 6;

    private static String convertUnicode(String base) {
        int unicodePos = 0;
        int currSpot = 0;
        // String length
        int length = base.length();
        // Our return string
        StringBuffer buffer = new StringBuffer(length);
        // Loop over every "\\uXXXX" escape found in the input and convert it
        while ((unicodePos = base.indexOf(UNICODE_START, unicodePos)) != -1) {
            // The candidate escape sequence, e.g. "\\u00a9" (empty if too close to the end)
            String unicode = unicodePos + UNICODE_LENGTH <= length
                    ? base.substring(unicodePos, unicodePos + UNICODE_LENGTH) : "";
            // Is actually a unicode escape
            if (isUnicode(unicode)) {
                // Put everything up to this into the buffer
                if (currSpot < unicodePos) buffer.append(base.substring(currSpot, unicodePos));
                // Now append our new character
                buffer.append((char) Integer.parseInt(unicode.substring(2), 16));
                // Move to the end of this escape
                unicodePos += UNICODE_LENGTH;
                currSpot = unicodePos;
            } else {
                // Not a valid escape; keep scanning from the next character
                unicodePos++;
            }
        }
        // Put in everything left
        if (currSpot < length) buffer.append(base.substring(currSpot));
        return buffer.toString();
    }

    private static boolean isUnicode(String unicode) {
        // Check length
        if (unicode.length() != UNICODE_LENGTH) return false;
        // Make sure it starts with the unicode prefix
        if (!unicode.startsWith(UNICODE_START)) return false;
        try {
            // Can only have hex digits after the prefix
            Integer.parseInt(unicode.substring(2), 16);
        } catch (NumberFormatException e) {
            return false;
        }
        return true;
    }

  • Unicode (numerical) character conversion

    What is the problem?
    Some characters written in unicode format do not get transformed into their appropriate font character.
    Example
    Numerical version:    &#8805; is printed as &#8805;
    String version:       &ge; is printed correctly
    Both have the same unicode value, yet LiveCycle DS will only convert the "string" version and not the numerical version.
    Question
    Is there a way to make LiveCycle convert the numerical unicode representation of special characters as well? If so, how do I do this using LiveCycle Designer? I am working with textFields.
    Workaround
    I have written the following workaround in my java project; obviously I'd appreciate it if LC did this work for me, as I'm not willing to implement a full-scale unicode table while LC has it by itself already (&ge; works ...):
    private String getNodeText(NodeList nl) {
        if (nl == null) {
            return null;
        }
        if (nl.getLength() == 0) {
            return null;
        }
        String unescapedString = StringEscapeUtils.unescapeXml(nl.item(0).getTextContent());
        return (unescapedString.equals("null") ? null : replaceUnicodeNumbers(unescapedString));
    }

    private String replaceUnicodeNumbers(String unescapedString) {
        return unescapedString.replace("&#8805;", "&ge;"); // I changed this code to something more usable
    }

    You want the LPAD function
    select LPAD( col1 , 9, ' ') from my_table;
    Hi everybody!
    Can someone solve this for me? I have a varchar2(9) variable, var, with the value '1234'.
    Now I want to convert this to having leading blanks, i.e. ' 1234'.
    How do I do this?
    If I use the to_char(var,'fmt') function I get an error message that says there are too many declarations of to_char. I guess because the variable already is a varchar2.
    Thanx in advance
    Vrjan

  • String unicode -- understanding what is happening

    Hi,
    Can someone please explain how the following works with strings?
    Case 1:
    1) Assume I have a set of bytes encoded in UTF-8.
    2) I am reading the UTF-8 bytes using an InputStreamReader and storing them in a String.
    3) I know the String stores it in Unicode -- but what encoding? (UCS-2 or UTF-8, etc.)
    Case 3:
    4) Suppose SJIS bytes are encoded using UTF-8 and I am reading using UTF-8 and storing it in a String and then printing it using System.out.println() (locale is set to SJIS); who does the conversion from the Unicode representation of the string to SJIS?
    Case 2:
    * Bytes are SJIS encoded
    * Read it using UTF-8 (wrong encoding used to read)
    * Store it in a String. How will it be stored?
    * Print it using System.out.println on a machine whose locale is set to SJIS.
    * Who is doing the conversion from Unicode to SJIS?

    Guessing...
    > Case 1:
    > 1) Assume I have a set of bytes encoded in UTF-8.
    > 2) I am reading the UTF-8 bytes using an InputStreamReader and storing them in a String.
    > 3) I know the String stores it in Unicode -- but what encoding? (UCS-2 or UTF-8, etc.)
    Depends on how you 'store' it. If you use String explicitly then the default encoding of the system is used. If you use String explicitly with an encoding then that encoding is used. Most, but not all (bugs), implicit conversions for readers use the default encoding.
    > Case 3:
    > 4) Suppose SJIS bytes are encoded using UTF-8 and I am reading using UTF-8 and storing it in a String and then printing it using System.out.println() (locale is set to SJIS); who does the conversion from the Unicode representation of the string to SJIS?
    What conversion? If it is a String then it is UTF-8. If you 'display' it then it uses the OS encoding.
    > Case 2:
    > * Bytes are SJIS encoded
    > * Read it using UTF-8 (wrong encoding used to read)
    > * Store it in a String. How will it be stored?
    Garbage for the most part.
    > * Print it using System.out.println on a machine whose locale is set to SJIS.
    > * Who is doing the conversion from Unicode to SJIS?
    Garbage in, garbage out.
    By the way, there is an Internationalization forum and presumably you could get answers there.
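
    A minimal sketch of the "garbage in, garbage out" point (the charset names are assumed to be available on the JRE):

        import java.nio.charset.Charset;

        public class EncodingDemo {
            public static void main(String[] args) throws Exception {
                String original = "日本語";                        // Japanese text
                byte[] sjisBytes = original.getBytes("Shift_JIS");  // encode as SJIS
                // Correct decode: bytes interpreted with the charset they were written in
                String good = new String(sjisBytes, "Shift_JIS");
                // Wrong decode: the same bytes read as UTF-8 produce replacement characters
                String bad = new String(sjisBytes, "UTF-8");
                System.out.println(good); // 日本語
                System.out.println(bad);  // mojibake / '?' characters
                System.out.println("Default charset: " + Charset.defaultCharset());
            }
        }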

  • Unicode and mdmp

    Lads,
    Can somebody send me the docs related to Unicode and MDMP?
    James

    Dear James,
    MDMP stands for Multi Display, Multi Processing.
    A Multi-Display, Multi-Processing code pages system (MDMP system) uses more than a single code page on the application server. Depending on the login language, it is possible to switch dynamically between the installed code pages. MDMP therefore provides a vehicle for using languages from different code pages in a single system.
    MDMP was the solution SAP developed for support of combinations of multiple code pages in one system prior to the availability of unicode database support. MDMP effectively enabled an SAP ERP system to be installed with a non-unicode database, and to support connections to the ERP application by users with language combinations not supported by a single code page. Example: support of one ERP system with English, French, Japanese, and Chinese.
    MDMP implementations enforced strict rules and restrictions in order to ensure data consistency and avoid data corruption.
    MDMP was only supported for SAP R/3, SAP R/3 Enterprise, and mySAP ERP applications. No other SAP applications or SAP NetWeaver components support MDMP.
    SAP's Unicode Strategy
    SAP commits itself fully to providing you with a Unicode-based mySAP.com e-business platform.
    To help their customers transition smoothly to future-proof technologies, future versions of SAP applications will be exclusively 64-bit and Unicode starting in 2007.
    Global business processes require IT systems to support multilingual data without any restrictions - Unicode represents the first technology capable of meeting these requirements.
    Web interfaces open the door to a global customer base, and IT systems must consequently be able to support multiple local languages simultaneously.
    With J2EE integration, the mySAP.com e-business platform fully supports web standards, and with Unicode, it now can take full advantage of XML and Java.
    Only Unicode makes it possible to seamlessly integrate in homogeneous SAP and non-SAP system landscapes, enabling truly collaborative business.
    Regards,
    Rakesh

  • Unicode Line Separators

    The Unicode Newline Guidelines suggest that an LS (Line Separator) character should be used. I'm currently using a '\n' in my strings but would like to use the Unicode representation of this. It looks like the Unicode code point for LS is U+2028. However, my tests result in a question mark (?) instead of a newline. Am I using the wrong code?
    sample code:

        public class Test {
            public static void main(String[] args) {
                String line = "First Name \u2028 Last Name";
                System.out.println(line);
            }
        }

    actual output:
    First Name ? Last Name
    desired output:
    First Name
    Last Name

    My understanding is that if you use the following:
    String newline = System.getProperty("line.separator");
    Java will use the correct line separator for the platform. Most likely the reason that you are getting ? is because the output encoding on your system cannot represent the Unicode character \u2028, so it is replaced with a question mark.
    V.V.
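
    A minimal sketch of the suggestion above; System.lineSeparator() (Java 7+) is shorthand for the same property:

        public class LineSeparatorDemo {
            public static void main(String[] args) {
                // Use the platform line separator instead of a hard-coded '\n' or U+2028
                String newline = System.lineSeparator();
                System.out.println("First Name" + newline + "Last Name");
            }
        }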

  • How to convert unicode to characters

    Can anyone help me? In Java, how can we convert a unicode representation to a character and display it in a text box?
    That is, a function which takes the unicode as input and gives as output the corresponding character which the unicode represents, and then displays this character in a textarea.
    If anyone can help me I will be thankful to him/her.
    thankx
    prince arora

    > that is a function which takes as input the unicode and gives as output the corresponding character which the unicode represents and then displays this character on a textarea
    Yes, there is such functionality.
    How do you get your input?
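
    How the input arrives is a question of its own, but as a minimal sketch (all names are my own), converting a 4-digit hex unicode value to its character and showing it in a Swing text area could look like this:

        import javax.swing.JFrame;
        import javax.swing.JTextArea;

        public class UnicodeToChar {
            public static void main(String[] args) {
                String hex = "00a9";                                  // e.g. entered by the user
                int codePoint = Integer.parseInt(hex, 16);
                String character = new String(Character.toChars(codePoint));

                JTextArea textArea = new JTextArea(character, 5, 20);
                JFrame frame = new JFrame("Unicode to character");
                frame.getContentPane().add(textArea);
                frame.pack();
                frame.setVisible(true);
            }
        }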

  • Convert xml referenced entity to unicode?

    Hi,
    Anyone know how one converts all XML Referenced Entities, of the form "&#xxx;", which appear in a String object, to their unicode representation?
    For example:
    If String A contains the phrase "two letters: &#193; &#225;", I would like to get a new String which contains the phrase "two letters: Á á"

    nobody71 wrote:
    > I can write some code to do the conversion by parsing the string, finding each and every XML reference and then converting each to unicode... I know how to do that...
    Then go ahead and do it.
    > but there must already be something that does this in Java's standard API/library...
    Actually there isn't.
    > For example, when this type of XML entity reference is found in the text of an XML node, and this node is read into a java DOM object using java's API/library, there is some code somewhere which does the conversion I am inquiring about, because when I view the String representing that text, it now contains the unicode characters. So, there must be a quick and already existing way to do this.
    The code must exist somewhere, yes. It doesn't follow that the code must be encapsulated in a public method. It's a specialized requirement of XML parsers so there's really no need to make it available outside the parsers where it exists.
    > I guess in the time it took me to write this post I could have written the converter... :-(
    Why do you need to do that, anyway? Why not just let an XML parser do it for you?
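
    For anyone who does want the do-it-yourself route, a minimal sketch that handles only numeric character references (decimal and hex; named entities are not covered, and the class name is my own):

        import java.util.regex.Matcher;
        import java.util.regex.Pattern;

        public class NumericEntityDecoder {
            // Matches &#193; (decimal) and &#xC1; (hexadecimal) style references
            private static final Pattern REF = Pattern.compile("&#(x?)([0-9a-fA-F]+);");

            public static String decode(String input) {
                Matcher m = REF.matcher(input);
                StringBuffer sb = new StringBuffer();
                while (m.find()) {
                    int radix = m.group(1).isEmpty() ? 10 : 16;
                    int codePoint = Integer.parseInt(m.group(2), radix);
                    m.appendReplacement(sb,
                            Matcher.quoteReplacement(new String(Character.toChars(codePoint))));
                }
                m.appendTail(sb);
                return sb.toString();
            }

            public static void main(String[] args) {
                System.out.println(decode("two letters: &#193; &#225;")); // two letters: Á á
            }
        }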

  • Fatal Error encountered while creating a Shopping Cart.

    Hello people,
    I'm an amateur ABAP/WDJ developer, and I am facing a serious issue. I would be so glad if anyone can help me out with this.
    When I create a shopping cart on the portal, it goes all smoothly until the checkout. When I click on checkout, it flashes an error...
    "Fatal Error: com.sap.engine.lib.xml.parser.ParserException: Incorrect encoded sequence detected at character (hex) 0xa0, (bin) 10100000. Check whether the input parsed contains correctly encoded characters. Encoding used is: 'utf-8'(:main:, row:2, col:59)Exception"
    I have tried taking traces and have gone through the FMs that are called from the SRM, but with no luck.
    Can anyone please help me out with this?
    -Regards

    Raghav,
    Did you get this issue resolved? We're having a similar problem. When the data is entered into the Shopping Cart Name field, it sometimes has hex "C2A0", which is the UTF-8 encoding of the Unicode no-break space character (U+00A0). This is user-entered data and my initial thought was that they were copying and pasting from a web site, but the user claims not.
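
    A minimal Java-side sketch (class name is my own) of replacing the no-break space before the text reaches the parser; whether this is the right place to fix it in an SRM system is another question:

        public class NbspCleaner {
            // U+00A0 is encoded as 0xC2 0xA0 in UTF-8; replace it with an ordinary space
            static String clean(String input) {
                return input.replace('\u00a0', ' ');
            }

            public static void main(String[] args) {
                System.out.println(clean("Cart\u00a0Name").equals("Cart Name")); // true
            }
        }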

  • PDF conversion

    Hi all,
    I am using Adobe Reader 10.0.1 on Windows XP.
    I am using XPDF 3.02pls 'pdftotext.exe -enc utf-8 myfile.pdf' to convert a Tamil-language PDF file to text.
    I am getting the text file, but some of the characters are not shown and some are broken.
    Will anyone help on this issue of how to convert a non-English PDF into a txt file with all of its characters retained?
    Thanking you,
    A.Araskumar

    Extracting plain text from a PDF file is a complex task, and it's not uncommon for a PDF file to have incomplete lookup tables (so the glyphs on screen don't have a Unicode representation). This results in errors and omissions in the exported text, and there's not a whole lot anyone can do about it other than re-creating the PDF file properly from the original source material. Adobe Acrobat may do a better job of the conversion, but there are no guarantees.
    Please note that these forums are for discussion of Adobe products and related topics; we do not provide support for non-Adobe software.

  • Characters conversion

    Hello,
    I'm working on a conversion problem between Windows and Unicode representations of characters.
    I would like to get, for instance for the euro character, the Windows encoding value (128) from its Unicode encoding value (8364), and vice versa.
    €        8364          128   (unicode / windows)
    ‚        8218          130
    ƒ        402           131
    „        8222          132
    …        8230          133
    †        8224          134
    ...I've found on the internet a way to get 128 from 8364:
    String s = "€";
    byte b[] = s.getBytes();
    int code = (int) (b[0] & 0xff);
    By the way, could someone explain to me how it works... ;)
    I'm looking now for a way to do the opposite, to get 8364 from 128...
    Thank you a lot in advance.
    Bye!
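
    A minimal sketch of one way to do both directions, letting the windows-1252 charset do the mapping instead of a hand-written table (class and method names are my own):

        import java.nio.charset.Charset;

        public class Cp1252Mapper {
            private static final Charset CP1252 = Charset.forName("windows-1252");

            // Map a windows-1252 byte value to its Unicode code point, e.g. 128 -> 8364
            static int toUnicode(int windowsValue) {
                byte[] b = { (byte) windowsValue };
                return new String(b, CP1252).charAt(0);
            }

            // Map a Unicode code point back to its windows-1252 byte value, e.g. 8364 -> 128
            static int toWindows(int unicodeValue) {
                String s = String.valueOf((char) unicodeValue);
                return s.getBytes(CP1252)[0] & 0xff;
            }

            public static void main(String[] args) {
                System.out.println(toUnicode(128));  // 8364
                System.out.println(toWindows(8364)); // 128
            }
        }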

    Thank you for your answers.
    hunter90000, you're right, the "windows" charset is windows-1252. But what I'm looking for is a way to get 8364 from 128. If someone could help me understand the code below, I think I could find it:
    String s = "�";
    System.out.println( (int) s.charAt(0));                 --> 128
    byte b[] = s.getBytes();
    int code = (int) (b[0] & 0xff);
    System.out.println(code);                               --> 8364
    System.out.println((char) code);                        --> €
    The reason why I want to get 8364:
    I'm manipulating an xml file to send data to a web browser via an ajax function. This data comes from an Oracle database, in which the euro character has the value 128.
    The only way I've found to display the euro character correctly in the browser is to encode it as & # 8364; in the xml file, even though the charset of this file and of the JSP is 'ISO-8859-15'...
    The problem is not limited to the euro character, but applies to all the characters in the following list:
       (8364 . 128)
        (8218 . 130)
        (402 . 131)
        (8222 . 132)
        (8230 . 133)
        (8224 . 134)
        (8225 . 135)
        (710 . 136)
        (8240 . 137)
        (352 . 138)
        (8249 . 139)
        (338 . 140)
        (381 . 142)
        (8216 . 145)
        (8217 . 146)
        (8220 . 147)
        (8221 . 148)
        (8226 . 149)
        (8211 . 150)
        (8212 . 151)
        (732 . 152)
        (8482 . 153)
        (353 . 154)
        (8250 . 155)
        (339 . 156)
        (382 . 158)
        (376 . 159) 
