Unicode(String) to actual Unicode !

Hi, I have Unicode data retrieved from a database. It is stored as an escaped string (the literal characters \u4eba\u53c2). How can I turn it into the actual Unicode characters that "\u4eba\u53c2" denotes? My problem is that the value from the database doesn't give me the actual characters, just the escape string itself. Please comment. Thanks.
_calv

Hi Calv,
I'm pretty sure that the conversion from the ASCII string to Unicode is not available within the API. (If someone knows otherwise, please jump in.) However, it should be fairly easy to program this conversion yourself: for example, you could split your string into the six-character substrings that each represent one character, strip off the \u, parse the remaining four hex digits as a sixteen-bit integer, and cast that integer to a char.
In case it's helpful, I am pasting a couple of methods I wrote to go in the opposite direction:
// Returns an (ASCII) string that represents the specified character by a Unicode escape sequence.
static public String toUnicodeString(char character) {
    short unicode = (short) character;
    char hexDigit[] = {
        '0', '1', '2', '3', '4', '5', '6', '7',
        '8', '9', 'a', 'b', 'c', 'd', 'e', 'f'
    };
    char[] array = {
        hexDigit[(unicode >> 12) & 0x0f], hexDigit[(unicode >> 8) & 0x0f],
        hexDigit[(unicode >> 4) & 0x0f], hexDigit[unicode & 0x0f]
    };
    String result = "\\u" + new String(array);
    return result;
}
// Returns an (ASCII) string representing the Java string argument,
// e.g. -> "\u1234\u5678"
static public String toUnicodeString(String string) {
    String result = "\"";
    for (int index = 0; index < string.length(); index++) {
        result = result + toUnicodeString(string.charAt(index));
    }
    result = result + "\"";
    return result;
}
Regards,
Joe
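For the direction the original question asks about (escaped string to real characters), here is a minimal sketch of the parsing approach described above. The class name UnicodeUnescape and the method name unescape are made up for illustration. It assumes every escape is a backslash, a lowercase u, and exactly four hex digits, and it copies anything else through unchanged.

```java
class UnicodeUnescape {
    /** Replaces each backslash-u escape (four hex digits) with the character it names. */
    static String unescape(String input) {
        StringBuilder out = new StringBuilder(input.length());
        int i = 0;
        while (i < input.length()) {
            if (input.startsWith("\\u", i) && i + 6 <= input.length()) {
                // Accumulate the four hex digits into one 16-bit value.
                int value = 0;
                boolean valid = true;
                for (int k = 2; k < 6; k++) {
                    int digit = Character.digit(input.charAt(i + k), 16);
                    if (digit < 0) {
                        valid = false;
                        break;
                    }
                    value = value * 16 + digit;
                }
                if (valid) {
                    out.append((char) value);
                    i += 6;
                    continue;
                }
            }
            // Not a well-formed escape: copy the character as it is.
            out.append(input.charAt(i));
            i++;
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String fromDatabase = "\\u4eba\\u53c2"; // twelve literal characters, as stored
        String text = unescape(fromDatabase);
        System.out.println(text.length());
        System.out.println(text.equals("\u4eba\u53c2"));
    }
}
```

Whether the decoded characters then display correctly still depends on the encoding and font of wherever they are printed.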

Similar Messages

Barcode reader and insert String into actual selected JTextField

Hi everyone,
I can't come up with a good approach for my idea. I would like to write a program for barcode reading. I have working code which gets a text string from a reader connected over RS232. But I have to send this string to the currently selected JTextField in another Java program. I am thinking of using the clipboard to get around this problem, but I'm not sure it's a good solution: copy the string to the clipboard and auto-paste... Any ideas?
Please help me!
Many thanks for any advice :)

Hmm... I missed that bit about having to poke it into
another Java program. In that case I would
look into modifying the other Java program instead of
trying to write a separate program to deal with it.
Otherwise you run into management issues like making
sure the other program is running, and not minimized,
and located at the right place on the screen, and has
the JTextField in question in focus, and so on.
In most cases, I would agree. But if his Java program is headless and just responds to the serial events and calls Robot.keyPress() and Robot.keyRelease(), he will just be imitating the keyboard, which is exactly what most barcode readers can already do. And this would work in any program that can get keyboard input, no matter what language it was written in.
We are currently doing this with a web-based application. The web page just has a text field and when they scan the barcode it submits the page. Of course the barcode reader we are using just imitates the keyboard, no mucking around converting serial data into keyboard events.
I bet if the OP looks around he could find software that will already convert the barcode RS232 data into keyboard events.

What does String.intern() actually improve?

Everyone says that comparing strings via String.intern() is much better than calling String.equals(). I ran a quick test to check before using it in my program. To my surprise, over a loop of 10 million iterations the equals() version took 800 ms while the intern() version took 3000 ms, which is much longer.
Does anyone have any idea why? Is intern() really an improvement?
My code:
class testIntern {
    public static void main(String[] args) {
        String sTest = new String("Hello Everyone");
        System.out.println(sTest.intern() == "Hello Everyone");
        System.out.println("version".intern() == "version");
        long x = System.currentTimeMillis();
        int j = 0;
        for (int i = 0; i < 10000000; i++) {
            sTest = new String("Hello Everyone");
            if (sTest.intern() == "Hello Everyone") {
                j++;
            }
        }
        System.out.println(System.currentTimeMillis() - x);
        System.out.println(j);
    }
}

Grepping the JDK 1.5 source for the pattern "intern *(", I found 56 files. So far, I'm considering it an example of how not to use intern(). For example, in com.sun.org.apache.xml.internal.utils.NamespaceSupport2, we have this gem:
    void declarePrefix (String prefix, String uri)
    {
                                // Lazy processing...
        if (!tablesDirty) {
            copyTables();
        }
        if (declarations == null) {
            declarations = new Vector();
        }
        prefix = prefix.intern();
        uri = uri.intern();
        if ("".equals(prefix)) {
            if ("".equals(uri)) {
                defaultNS = null;
            } else {
                defaultNS = uri;
            }
        } else {
            prefixTable.put(prefix, uri);
            uriTable.put(uri, prefix); // may wipe out another prefix
        }
        declarations.addElement(prefix);
    }
So, they intern both the namespace prefix and URI, taking up space in the permgen. Then they use .equals() to compare those to a constant string, and put both values in standard Hashtables!
There are several cases where intern() is used to prevent the compiler from performing constant substitution:
    public final static String PREFIX_XMLNS = "xmlns".intern();
It's a nice little trick when your constants are likely to change: no need to recompile dependent classes. However, since this particular prefix is defined by the W3C namespace spec, the trick is of very dubious value here.
There are a few cases, such as in java.lang.Class, where intern() is used so that you can do a fast string search over a fixed array. I suspect these cases exist to avoid a dependency on Map. Whether or not a simple equals() would have been sufficient is an open question.
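To make the trade-off concrete, here is a small sketch of what intern() changes about comparisons. The class InternDemo and its helper are invented for illustration. The point is that == only works once both sides are interned, and every intern() call is itself a lookup in the string pool, which is probably why calling it inside the timed loop above does not beat equals().

```java
class InternDemo {
    /** True only when both variables point at the very same String object. */
    static boolean sameReference(String a, String b) {
        return a == b;
    }

    public static void main(String[] args) {
        String literal = "Hello Everyone";            // literals are interned automatically
        String copy = new String("Hello Everyone");   // a distinct object with equal content

        System.out.println(sameReference(copy, literal));          // false: different objects
        System.out.println(copy.equals(literal));                  // true: same characters
        System.out.println(sameReference(copy.intern(), literal)); // true: intern() returns the pooled instance
    }
}
```

Any gain would come from interning a value once and then comparing it by reference many times, not from interning on every comparison.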

Characters in String : Unicode 16-bit to custom 32-bit

I understand that internally in Java, characters in Strings are actually Unicode characters, with each character represented with 16 bits.
So, character 'L' in Unicode is 0x004C
which is also 0000 0000 0100 1100
Now, I wish to encode each of the four 4-bit groups (hex digits) above as an individual ASCII character:
= 0 0 4 C
= 0x30 0x30 0x34 0x43
= 00110000 00110000 00110100 01000011
So, from the original 16-bit character in Java, I want a final 32-bit value.
Eventually, I'll need to send the final result over the network, via OutputStream/Writer and a socket.
Can someone help me on this ? Or give me some ideas... Thanks.

One trick: prepend a 1 to the number and use substring. For example, int charWith1 = c + 0x10000. That puts charWith1 in the form 0x1XXXX. Then call Integer.toHexString on it to get a string like "1XXXX", and drop the leading 1 with a call to substring(1).
Of course, there are methods that use only bit operations and additions, making it a bit faster. Note that adding '0' alone is only right for the digits 0-9; the values 10-15 need 'A'-'F':
int nibble0 = c & 0x000F;
byte byte0 = (byte) (nibble0 < 10 ? nibble0 + '0' : nibble0 - 10 + 'A');
int nibble1 = (c & 0x00F0) >> 4;
byte byte1 = (byte) (nibble1 < 10 ? nibble1 + '0' : nibble1 - 10 + 'A');
...
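Both suggestions can be written out in full. The following is a sketch only; the class and method names are invented, and it assumes uppercase hex digits are wanted, as in the 0x43 ('C') of the question.

```java
import java.nio.charset.StandardCharsets;
import java.util.Locale;

class HexExpand {
    /** The "prepend a 1" trick: adding 0x10000 forces four hex digits after the leading 1. */
    static byte[] viaHexString(char c) {
        String hex = Integer.toHexString(c + 0x10000).substring(1).toUpperCase(Locale.ROOT);
        return hex.getBytes(StandardCharsets.US_ASCII);
    }

    /** The same result using only shifts, masks, and additions. */
    static byte[] viaBitOps(char c) {
        byte[] out = new byte[4];
        for (int i = 0; i < 4; i++) {
            // Take the 4-bit groups from most significant to least significant.
            int nibble = (c >> (12 - 4 * i)) & 0x0F;
            out[i] = (byte) (nibble < 10 ? '0' + nibble : 'A' + nibble - 10);
        }
        return out;
    }

    public static void main(String[] args) {
        for (byte b : viaBitOps('L')) {
            System.out.printf("0x%02X ", b);
        }
        System.out.println();
    }
}
```

The resulting four bytes per character can be written straight to an OutputStream.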

How to deal with such Unicode source data in BI 7.0?

I encountered error when activating DSO data. It turned out that the source data is Unicode in the HTML representation style. For example, the source character string is:
ABCDEFG& #65288;XYZ (I added a space in between & and # so that it won't be interpreted to Unicode in SDN by web browser)
After some analysis, I see it's actually the Unicode string
ABCDEFG（XYZ
Please notice the wide left parenthesis. It's the actual character behind the HTML &#xxx; style reference above. To compare, here is the Unicode parenthesis '（' and here is the ASCII one '(' . You can see they are different.
My question is: as I have trouble loading the &#... string, I think I should translate the string into the actual Unicode characters (like '（' in this case). But how can I achieve this?
Thanks!
Message was edited by:
Tom Jerry

I found this is called "Numeric character reference", or NCR, in HTML term. So the question is how to convert string in NCR fashion back to Unicode. Thanks.
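The conversion itself is mechanical, whatever language the load routine ends up using. As an illustration only, written in Java rather than in a BI transformation routine, a sketch of decoding numeric character references could look like this. The class name NcrDecoder is made up, and it handles only the numeric forms (decimal and hexadecimal), not named entities such as &amp;amp;.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NcrDecoder {
    // Group 1 captures a hexadecimal reference body, group 2 a decimal one.
    private static final Pattern NCR =
            Pattern.compile("&#(?:[xX]([0-9a-fA-F]{1,6})|([0-9]{1,7}));");

    static String decode(String input) {
        Matcher m = NCR.matcher(input);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            int codePoint = m.group(1) != null
                    ? Integer.parseInt(m.group(1), 16)
                    : Integer.parseInt(m.group(2));
            // Leave references outside the Unicode range untouched.
            String replacement = Character.isValidCodePoint(codePoint)
                    ? new String(Character.toChars(codePoint))
                    : m.group();
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String decoded = decode("ABCDEFG&#65288;XYZ");
        System.out.println(decoded.equals("ABCDEFG\uFF08XYZ"));
    }
}
```

Here 65288 is 0xFF08, the fullwidth left parenthesis from the example.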

TABLE_ENTRIES_GET_VIA_RFC in unicode system

Hi all,
I know this is going to be a long initial post, but please please take the time to read it. Otherwise there would be many unnecessary questions.
We are using a middleware for mobile devices that connects to SAP for reading table data (via the above-mentioned RFC) and posting RFCs/BAPIs.
Now we are trying to connect to a Unicode SAP system (6.20). The statement
SELECT * FROM (TABNAME) INTO TABLE TABENTRY WHERE (SEL_TAB)
where TABENTRY is a table of type char(2048) does not work any more as in unicode systems the structure of the db table and the internal table have to be the same.
So we found the fm CRM_CODEX_GET_TABLE_VIA_RFC in CRM which is built from a copy of the a.m. fm and solves this problem by
1.) creating dynamically an internal table of the same type as the db table.
2.) select the data into this new internal table
3.) loop over the internal table, converting each field of its structure to a char variable and appending it to the result char(2048).
Theoretically everything's ok. The fm works now and returns correct data. But there's still one problem, the middleware doesn't convert the data correctly, as the values of fields of type 'p' are passed differently.
non unicode (standard fm):
1000000000000280401011000COMPDL 200408 ###E8##############################
unicode (changed fm):
1000000000000280401016000COMPDL 200504 10.000 0.000 0.000 0.000
As you can see, the select statement from the top of this post just puts the data into the string without actually converting the numbers in fields of type p (or QUAN, CURR in db).
The changed fm with converting every field also converts the number values, now they appear as char fields.
The middleware tries to convert the number values, but always returns 0 (I can only see the results, as the actual programming is a black box for me).
Has anyone any idea how to solve this problem? (beside getting help from the middleware vendor, which is difficult, as there is a new release working with unicode systems. But we will stay on the old release for some months from now...)

Hi Raja,
thanks for your answer.
I had already searched the forum and found your document about RFC_READ_TABLE which I think is quite interesting and a good solution.
But unfortunately, I cannot change the middleware's RFC logic, e. g. change the BAPI or make changes to the in-/output streams.
I now live with a workaround:
I modified the RFC to convert all p type fields to character fields and also changed the metadata RFCs accordingly, which works OK.
For all RFCs I use to post data to SAP, I write a wrapper RFC with character only structures and convert them to the internal RFCs inside SAP.
This is not my preferred solution, but I am very short of time and it works pretty well.
Regards,
Hans

Unicode characters longer than 2 bytes

It seems that Flex 3 only handles double-byte Unicode characters. Unicode has characters outside the BMP (Basic Multilingual Plane), which have codes greater than 2^16 and cannot be encoded in two bytes, but can be encoded in UTF-8. Will such characters be supported in the future, e.g. in Flex 4?
Thanks,
Francisco

How to tell whether a "character" (really a UTF-16 code unit) in an AS String is actually part of a surrogate pair:
D800..DBFF: high surrogate
DC00..DFFF: low surrogate
everything else: a character
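Java strings are indexed by UTF-16 code units too, so the same ranges can be illustrated there. The sketch below is Java, not ActionScript, and the class and method names are invented. It counts a well-formed surrogate pair as one character.

```java
class SurrogateCheck {
    static boolean isHighSurrogate(char unit) {
        return unit >= 0xD800 && unit <= 0xDBFF;
    }

    static boolean isLowSurrogate(char unit) {
        return unit >= 0xDC00 && unit <= 0xDFFF;
    }

    /** Counts characters, treating a high surrogate followed by a low surrogate as one. */
    static int countCharacters(String s) {
        int count = 0;
        for (int i = 0; i < s.length(); i++) {
            count++;
            if (isHighSurrogate(s.charAt(i))
                    && i + 1 < s.length()
                    && isLowSurrogate(s.charAt(i + 1))) {
                i++; // skip the second half of the pair
            }
        }
        return count;
    }

    public static void main(String[] args) {
        String s = "a\uD83D\uDE00"; // 'a' plus one character outside the BMP
        System.out.println(s.length());
        System.out.println(countCharacters(s));
    }
}
```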

Display String - UCS2

Hi,
I have a web based application. The default language is English.
I have some data in UCS2. Using Java, how can I display the decoded value of the UCS2 String on the screen? Say, if the UCS2 String is actually Chinese/Thailand Language Character, how can I do so?
Thank you,

I have some data in UCS2. Using Java, how can I
display the decoded value of the UCS2 String on the
screen? Say, if the UCS2 String is actually
Chinese/Thailand Language Character, how can I do
so?
What do you mean by 'on the screen'? If you're using System.out.println(), it may be impossible. See
http://forum.java.sun.com/thread.jspa?threadID=525433&messageID=2519054
if you want to learn more, maybe take a look at
http://www.jorendorff.com/articles/unicode/java.html
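If the UCS2 data arrives as raw bytes, the decoding step itself is short. This is a sketch under assumptions the question does not confirm: that the bytes are big-endian with no byte order mark, and that treating UCS-2 as UTF-16 is acceptable (UCS-2 covers the same 16-bit values, without surrogate pairs). The class name Ucs2Decode is invented.

```java
import java.nio.charset.StandardCharsets;

class Ucs2Decode {
    /** Decodes big-endian two-byte units into a Java String. */
    static String decodeBigEndian(byte[] ucs2) {
        return new String(ucs2, StandardCharsets.UTF_16BE);
    }

    public static void main(String[] args) {
        byte[] data = {0x4E, (byte) 0xBA, 0x53, (byte) 0xC2}; // two Chinese characters
        String text = decodeBigEndian(data);
        System.out.println(text.length());
    }
}
```

Once the data is a String, showing it in a web page is a matter of sending the response in an encoding that can represent it, such as UTF-8.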

Null String and Empty String problem

Hello everyone,
since I am totally new to JSP, I am having a problem handling strings.
Suppose I have a variable users = ""; then
I want to ask when to use:
if (users.equals(""))
and
if(users == "")
In my code, the variable users has the value "regional" for regional users,
and I am checking it as:
if (users.equals{"regional")) {
out.print ("I am inside code");
}
At that point, the code throws an error (a run-time error),
and when I changed the code to:
if (users == "regional") {
out.print ("I am inside code");
}
this time the code does not generate an error, but the message "I am inside code" is not displayed. Execution never enters the if block.
I hope you understand my problem. Can anybody help me out with this?

This has basically nothing to do with JSP, but with basic Java knowledge.
When you use the '==' operator to compare Objects (yes, String is actually a subclass of Object), it checks whether they are the same reference. Using the '==' operator to compare primitive datatypes (int, boolean, char, etc.) checks whether they have the same value.
That is why the Object class has the equals() method: to give you the ability to compare with other objects. And you can only invoke it when the Object is actually instantiated, that is, when it is not null.
if (string != null && string.equals("somevalue")) {
}
// or
if ("somevalue".equals(string)) {
}
should work.
Edit rym82: this will not throw a NPE, but an ordinary compilation error ;)
Message was edited by:
BalusC
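A small runnable sketch of the points above, using the "regional" value from the question. The class and method names are invented. Putting the constant on the left makes the check safe even when the variable is null.

```java
class StringCompare {
    /** Null-safe content comparison: the literal is never null, so equals() can always be called on it. */
    static boolean isRegional(String users) {
        return "regional".equals(users);
    }

    public static void main(String[] args) {
        // Built at run time, so it is a different object from the literal "regional".
        String users = new StringBuilder("region").append("al").toString();

        System.out.println(users == "regional"); // false: compares references
        System.out.println(isRegional(users));   // true: compares contents
        System.out.println(isRegional(null));    // false, and no NullPointerException
    }
}
```

This is likely what happens in the JSP: a value read from a request or a database is a separate object, so == fails even though the text matches.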

Using a String in the "IN" clause

Hello folks,
I am trying to output results from a table based on a String which I was planning on using in the "IN" clause. When I run the query through the PL/SQL procedure, I get no results.
In the PL/SQL program, I have a variable p_string where I am appending the ID's in a loop. So, when I print the string, I am seeing '1001','1002','1003' but when I do the following I get nothing.
select * from test_tb
where ID IN (p_string);
Is this because there is an extra quote at the beginning and end of the string and the string is actually ''1001','1002','1003''?
create table test_tb(ID varchar2(4), description varchar2(20));
INSERT INTO TEST_TB (ID, DESCRIPTION) VALUES ('1001', 'Testing 1001');
INSERT INTO TEST_TB (ID, DESCRIPTION) VALUES ('1002', 'Testing 1002');
INSERT INTO TEST_TB (ID, DESCRIPTION) VALUES ('1003', 'Testing 1003');
INSERT INTO TEST_TB (ID, DESCRIPTION) VALUES ('1004', 'Testing 1004');
INSERT INTO TEST_TB (ID, DESCRIPTION) VALUES ('1005', 'Testing 1005');
Thanks

Thanks for the link, Greg.
I was able to find another link which worked for me as I am not too familiar with Collections. I used Regular Expressions instead.
https://blogs.oracle.com/aramamoo/entry/how_to_split_comma_separated_string_and_pass_to_in_clause_of_select_statement

"scan from string" to timestamp doesn't work for 18:00:00 (6PM)

I just found a strange issue in LabVIEW. I hope I'm doing something silly, but I just may have found an unusual bug.
Run the snippet below with the following input string: 03:00:00,18:00:00,17:00:00
Time converts fine for just about any other time EXCEPT 18:00:00 (6 PM), which is returned as 00:00:00 (midnight). If you add even a second to it (18:00:01), you get back the expected result.
Here's hoping I'm not losing my mind
Matt Holt
Certified LabVIEW Architect
Attachments:
TimeParseBug.vi ‏11 KB

As annoying as it may seem, this exact scenario is an abuse of the timestamp. A timestamp is meant to be used for absolute times. And that includes a date. As Ravens Fan already pointed out, the 0 seconds since January 1, 1904 GMT is used in all timestamp display routines to mean the canonical invalid timestamp and hence the timestamp control displays the default string indicating the actual date/time format rather than a specific date/time.
If you need an absolute timestamp, for instance because you do want to have a local time indication, although the date is not relevant, adding an offset of 86400 to all values would fix it once and for all. Now the timezone offset can't cause the timestamp to reach 0 ever, even if you reside west of GMT, and it will be fine (until you start to do timestamp arithmetic that involves subtraction of relative timespans, then you would have to make the offset big enough that this will never get an issue). The current date would serve as a nice offset for that, which would be MattH's last suggestion. Nice to see that the Scan from String routine actually does use the passed in timestamp as default value and only replaces the values it is configured to parse.
Rolf Kalbermatter
CIT Engineering Netherlands
a division of Test & Measurement Solutions

Obtaining the " #text " string in a nodeList !!!!

hello guys,
I just have a bizarre problem: I'm getting the #text string in a NodeList even though this string does not exist in any of the .xml files.
Here is my function:
private Vector<String> getClassAttributs(Document doc)
{
      String nN="";
      NodeList listeClassAttributes = doc.getElementsByTagName("attribut");
      for(int i=0;i<listeClassAttributes.getLength();i++)
      {
         nN=listeClassAttributes.item(i).getParentNode().getNodeName();
      // System.out.println(nN);
         if (nN.equals("classe"))
         {
             NodeList children=listeClassAttributes.item(i).getChildNodes();
             for(int j=0;j<children.getLength();j++){
                System.out.println( children.item(j).getNodeName());// here it displays, among other strings (that actually exist), the #text (which does not exist)
               //if(children.item(j).getNodeName().equals("valeur"))
                 // System.out.println(children.item(j).getNodeValue());
              //classAttributes.add(children.item(i).getNodeValue());
             }
         }
      }
      return classAttributes;
    }
Thank you

In your for loop check for node type before printing the value. Something like this...
if (children.item(j).getNodeType()== Node.ELEMENT_NODE)
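The #text entries are the whitespace (line breaks and indentation) between the tags, which a DOM parser reports as text nodes by default. A self-contained sketch of the node-type check follows. The class name ElementChildren and the sample XML are made up; only the element names attribut and valeur come from the question.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

class ElementChildren {
    /** Lists the node names of the root element's children, optionally keeping elements only. */
    static List<String> childNames(String xml, boolean elementsOnly) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList children = doc.getDocumentElement().getChildNodes();
            List<String> names = new ArrayList<>();
            for (int j = 0; j < children.getLength(); j++) {
                Node child = children.item(j);
                // Whitespace between tags shows up as text nodes named "#text".
                if (!elementsOnly || child.getNodeType() == Node.ELEMENT_NODE) {
                    names.add(child.getNodeName());
                }
            }
            return names;
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<attribut>\n  <valeur>x</valeur>\n</attribut>";
        System.out.println(childNames(xml, false));
        System.out.println(childNames(xml, true));
    }
}
```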

Scan From String White Space

Hi all,
I'm trying to use Scan From String in order to parse some data coming in from UDP.
Input String: ASCII [00 01 02 03 ... FF]
What I want: s[00 .. 30] d[12], d[34], d[56] leftover s[37 38 39 ... FF]
ATTEMPT1
Format String: %49s%2d%2d%2d
What I get: s[00-09] RUNTIME ERROR!
ATTEMPT2
Format String: %49[^]%2d%2d%2d
What I get: Only allows first output. Will error out if I use any additional outputs from Scan From String
ATTEMPT3
Format String: %49[^(0xFF)]%2d%2d%2d Value in () is ASCII character FF.
What I get: s[00 .. 30] d[12], d[34], d[56] leftover s[37 38 39 ... FF]
It appears that when I use %##[^], it thinks I'm looking for the ENTIRE string, so it will not let me add any more formatting. If I add a delimiter after the ^ it will run, and it will work provided that character isn't within the first 49 characters... and I can't guarantee that it won't be.
I'm aware I can parse my string using subsets and whatnot... but Scan From String is so elegant. It would be great if %s allowed for white space... or if %##[^] would simply take the first ## characters and allow me to format after that.
Is there a simple, elegant way to do this? I wish my dataset was only 3-4 outputs. It'd be ideal if I could. Thanks.
Edit:
It might be more helpful if I provide a less abstract example:
I have an ASCII Header (Finite Length String), a Sender IP (Finite Length String), a Timestamp, a Message ID (Finite Length Decimal), A Message in ASCII ( '1' actually means 0x31, not 0x01) And for some ungodly reason... no delimiters.
So I was HOPING %##s%##s%<%H:%M:%s>t%##d (With leftover string to be my message) would work, but if any white space is contained within there... it messes up.

I cannot provide exact strings because the string is actually ASCII characters, most of which aren't displayable.
I have a string where I have:
24 ASCII Characters representing 6x U32 Header Data
13 ASCII Characters representing the sender IP (String)
12 ASCII Characters representing the name of the message (String)
12 ASCII Characters representing 3x U32 Data
12 ASCII Characters representing the name of packet (String)
12 ASCII Characters representing 12x U8 Data
256 ASCII Characters representing 256x U8 Data
etc...
It would be ideal to simply scan from the string and output the data with the appropriate data types already assigned, instead of splitting the string and type casting each piece individually. But if, for example, my header starts with an ASCII representation of a U32 of 2560 (decimal), it would look like this: [00][00][0A][00]. ASCII 0A is considered white space. So my header would only contain 2 ASCII characters instead of the desired 24.

From string to method

Hi all,
I have a property file.
I need to read a string from the file and invoke a method with the same name.
I know how to read the file.
My question is how to get from the string to the actual method.
Gabi

How much of this are you doing? If it's a lot, then you may be reinventing Spring.
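For a small amount of this, plain reflection is enough. Below is a minimal sketch, assuming the target methods are public and take no arguments. The Actions class and its start and stop methods are hypothetical stand-ins for whatever the property file names.

```java
import java.lang.reflect.Method;

class MethodByName {
    /** Hypothetical class whose method names match strings from the property file. */
    public static class Actions {
        public String start() { return "started"; }
        public String stop() { return "stopped"; }
    }

    /** Looks up a public no-argument method by name and invokes it on the target. */
    static Object call(Object target, String methodName) {
        try {
            Method method = target.getClass().getMethod(methodName);
            return method.invoke(target);
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException("Cannot call method: " + methodName, e);
        }
    }

    public static void main(String[] args) {
        String nameFromFile = "start"; // would come from the property file
        System.out.println(call(new Actions(), nameFromFile));
    }
}
```

Since the name comes from a file, it is worth restricting which names are allowed before invoking anything.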

Eliminating ###'s after hex reconversion to char string

This is interesting.
I was originally required to pass a segment to my subroutine. I had to do so using text literals. The   symbol is not recognizable by text literals. Therefore, I had to convert the entire string to hex.
I then passed the hex segment to the subroutine, and reconverted it back to char string.
After this I found a ton of unexpected # symbols in my segment. I tried doing a
REPLACE '###' WITH '' INTO CHARSTRING.
, but unsuccessfully. There is not much documentation on hex_to_char conversion, there are 2 FMs on 46, but I cannot get them to work. If anyone knows how to eliminate these unexpected # symbols please let me know.

DATA: HEXSTRINGER(6000) TYPE 'X'.
DATA: STRINGER(3000) TYPE 'C'.
FIELD-SYMBOLS: <FSHEX>, <RECONVERT2>.
CALL FUNCTION 'ARCHIVE_GET_NEXT_RECORD'
               EXPORTING
                     ARCHIVE_HANDLE       = READ_HANDLE
               IMPORTING
                     RECORD               = ARC_BUFFER-SEGMENT
                     RECORD_STRUCTURE     = ARC_BUFFER-RNAME
           MOVE ARC_BUFFER-SEGMENT TO SEGSTER.
           assign SEGSTER to <fsHEX> type 'X'.
           MOVE <fsHEX> to HEXSTRINGER.
APPEND 'REPORT ZSUBR.' TO CODE.
APPEND 'FORM DYN1 USING HEXSTRINGER.' TO CODE.
APPEND 'DATA: SEGSTER2(3000) TYPE ''C''.' TO CODE.
APPEND 'ASSIGN HEXSTRINGER TO <RECONVERT2> TYPE ''C'' .' TO CODE.
APPEND 'MOVE <RECONVERT2> TO SEGSTER2.' TO CODE.
APPEND 'WRITE:/ SEGSTER2.' TO CODE.
APPEND 'ENDFORM.' TO CODE.
I write out Segster2 and compare it to segster.
I've done some variations of this where, after I change the char string to a hex string, I replace 'C' with '7C' in hexstringer, then re-replace it in the subroutine, and then reconvert it to a char string, but actually I don't think that was a necessary step. I believe I can just convert it to hex, and then reconvert it back to char in the subroutine.
