Converting \u00e0 to appropriate character.
Dear Java folks,
I'm trying to import some Java source code produced on one system into another.
I have a zip file which contains some .java source files in which non-ascii characters have been converted to their equivalent Unicode string, for example:
String s1 = "\u00e0" ;
The original code looked like this:
String s1 = "à" ; <- that's an 'a' with a grave accent over it.
I'm reading the source code from a zip file using an InputStreamReader.
Is there any way to convert the "\u00e0" and other Unicode characters back into the correct characters as I read the zip entry?
FYI, I'm running on Windows XP and the original code is part of a Lotus Notes database. In Notes the characters look fine, but when the code is 'exported' Notes converts them to escapes such as \u00e0, and there's not much I can do about that. Both the export and the import run on the same computer.
TIA, Keith
If not, my next question is this, are there any
classes that will convert a Unicode escaped character
of the form \uxxxx to the appropriate Modified UTF-8
2,3 or 4 byte character required by Java?

I have no idea what those last two lines mean. Which
part of Java is it that you think requires UTF-8
encoding? Certainly the compiler doesn't.

To OP and to DrClap:
Indeed the compiler doesn't require UTF-8.
The compiler is perfectly happy accepting
either UTF-8 or \uxxxx.
However, the OP had lots of files (in UTF8 presumably) in Lotus.
When exported, Lotus turns the UTF8 into \uxxxx, which is very
hard for the OP to read on the screen.
So the OP wants to write a function that translates \uxxxx back into UTF8.
For example: whenever he sees \u05D0,
he would like to write out 2 bytes: 0xD7 and 0x90
(which is the UTF-8 encoding of the code point 0x05D0)
This way, when he loads the file in a text editor,
he would see the international character (rather than an obscure code like \u05D0)
To OP:
It's really simple. The conversion table is in the following link:
http://en.wikipedia.org/wiki/UTF8
There are resources on that page on the encoding.
But here is slightly more detailed pseudocode:
public void write(int n) {
    byte b=(byte)(n&0xFF);
    output.write(b);
    // This output object must be a Stream, not a Writer.
    // Otherwise, Java will interpret the number as a character, and mess it up
}
Whenever you see \uxxxx
First, parse xxxx as a hexadecimal number, and store it in an integer variable n
if (n<=0x7F) {
    write(n);
} else if (n<=0x7FF) {
    write(0xC0|(n>>6));
    write(0x80|(n&0x3F));
} else if (n<=0xFFFF) {
    write(0xE0|(n>>12));
    write(0x80|((n>>6)&0x3F));
    write(0x80|(n&0x3F));
} else {
    // Only reached for code points above 0xFFFF, i.e. after two
    // surrogate escapes have been combined into one code point
    write(0xF0|(n>>18));
    write(0x80|((n>>12)&0x3F));
    write(0x80|((n>>6)&0x3F));
    write(0x80|(n&0x3F));
}
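For what it's worth, the same idea can be sketched in Java without hand-rolling the UTF-8 bit arithmetic: decode each escape into a char, then let the standard UTF-8 encoder produce the bytes. This is only a sketch under some assumptions. The class name UnicodeUnescape and the method unescape are made up for illustration, and an escaped backslash in front of the u (two backslashes, which in Java source means a literal backslash rather than an escape) is not handled.

```java
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class UnicodeUnescape {
    // A backslash, one or more 'u' characters, then exactly four hex digits.
    private static final Pattern ESCAPE = Pattern.compile("\\\\u+([0-9a-fA-F]{4})");

    /** Replaces each four-hex-digit Unicode escape with the UTF-16 code unit it names. */
    public static String unescape(String source) {
        Matcher m = ESCAPE.matcher(source);
        StringBuilder out = new StringBuilder();
        int last = 0;
        while (m.find()) {
            out.append(source, last, m.start());                  // text before the escape
            out.append((char) Integer.parseInt(m.group(1), 16));  // the decoded character
            last = m.end();
        }
        out.append(source, last, source.length());                // remaining tail
        return out.toString();
    }

    public static void main(String[] args) {
        String line = "String s1 = \"\\u00e0\" ;";
        String decoded = unescape(line);
        // The encoder handles 2-, 3- and 4-byte sequences, including surrogate pairs.
        byte[] utf8 = decoded.getBytes(StandardCharsets.UTF_8);
        System.out.println(decoded);
        System.out.println(utf8.length + " bytes as UTF-8");
    }
}
```

To apply this to the zip entries, each line read from the InputStreamReader could be passed through unescape and then written out through an OutputStreamWriter created with the UTF-8 charset.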
Similar Messages
-
How to convert bits into a character
The code below, which I got from this forum, converts a String into bits. I could successfully use it in the bit-stuffing method on the client side. On the server side I need to convert these bits into characters again.
class StringBits {
    public static void main(String[] args) {
        String str = "This is a string";
        char[] ch = str.toCharArray();
        for (int i = 0; i < ch.length; i++)
            System.out.println(Integer.toBinaryString(ch[i]));
    }
}
In this code each character is converted into its binary form. My problem is how to do the reverse. I don't even know the return type of the method Integer.toBinaryString(char ch).
Please, can anyone let me know which method to use to convert a given binary code into characters?
i.e. if I have 0111100, how do I convert these bits into a character again?
Thanks in advance
Deepika
1. Can't you do this by sending whole bytes instead of bits? It would make everything a lot easier.
2. The solution you have probably is not going to work as is, since Integer.toBinaryString does not pad with 0's. If you have to have the bits, you should do something like:
String s = ...
byte[] bytes = s.getBytes();
StringBuilder sb = new StringBuilder();
for (int i = 0; i < bytes.length; i++) {
    // append the 8 bits of each byte, least significant bit first
    for (int bit = 0; bit < 8; bit++) {
        sb.append((bytes[i] >> bit) & 1);
    }
}
This is of course little-endian (least significant bit first), so be aware of that when translating on the other side -
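A possible round trip for the server side, as a rough sketch: if every byte is always written as exactly 8 bits, the receiver can cut the bit string into groups of 8 and parse each group with Integer.parseInt(group, 2). The class and method names (BitCodec, toBits, fromBits) are invented for this example, UTF-8 is an assumed encoding, and the bit order here is most significant bit first, so it is not interchangeable with a least-significant-first sender.

```java
import java.nio.charset.StandardCharsets;

public class BitCodec {
    /** Encodes each UTF-8 byte as exactly 8 characters '0'/'1', most significant bit first. */
    public static String toBits(String text) {
        StringBuilder sb = new StringBuilder();
        for (byte b : text.getBytes(StandardCharsets.UTF_8)) {
            for (int bit = 7; bit >= 0; bit--) {
                sb.append((b >> bit) & 1);
            }
        }
        return sb.toString();
    }

    /** Reverses toBits: every group of 8 bits becomes one byte again. */
    public static String fromBits(String bits) {
        byte[] bytes = new byte[bits.length() / 8];
        for (int i = 0; i < bytes.length; i++) {
            String group = bits.substring(i * 8, i * 8 + 8);
            bytes[i] = (byte) Integer.parseInt(group, 2);   // radix 2 parses "01000001" as 65
        }
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String bits = toBits("This is a string");
        System.out.println(bits);
        System.out.println(fromBits(bits));
    }
}
```

Padding to a fixed width is what makes the stream decodable; with unpadded Integer.toBinaryString output the receiver cannot tell where one character ends and the next begins.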
Convert/Interpret XML to Character/ABAP
Hi,
I am consuming a web service via the client-proxy method. For this I've created the client proxy and called one of the methods of the class in my program.
After passing the input value (country code) to the web service I receive the result (country name) in a string variable, but the returned data is in XML format.
example:
<NewDataSet> <Table> <countrycode>in</countrycode> <name>India</name> </Table> <Table> <countrycode>in</countrycode> <name>India</name> </Table> </NewDataSet>
How can I get the value of the <name> element from the output?
Is there a way to convert this XML into character format and read the value of <name>, or to parse every field of the output into an internal table? What would be the approach to do this?
/Mike
Try using the code below for XML parsing...
TYPE-POOLS: ixml.
TYPE-POOLS : abap.
FIELD-SYMBOLS: <dyn_table> TYPE STANDARD TABLE,
<dyn_table1> TYPE STANDARD TABLE,
<dyn_wa>,
<dyn_fieldvalue>,
<dyn_wa1>,
<dyn_field>,
<dyn_field1>,
<fs_1> TYPE table,
<fs_2> TYPE ANY,
<fs_3> TYPE ANY,
<fs_5> TYPE ANY.
FIELD-SYMBOLS: <fs_fields>.
DATA: dy_table TYPE REF TO data,
dy_line TYPE REF TO data,
dy_datatype TYPE REF TO data,
dy_table1 TYPE REF TO data,
dy_line1 TYPE REF TO data,
new_line TYPE REF TO data,
xfc TYPE lvc_s_fcat,
ifc TYPE lvc_t_fcat.
TYPES: BEGIN OF t_xml_line,
data(256) TYPE x,
END OF t_xml_line.
TYPES: BEGIN OF gs_elem_value,
element(30) TYPE c,
value(30) TYPE c,
recordid TYPE i,
END OF gs_elem_value.
DATA: gi_elem_value TYPE TABLE OF gs_elem_value ,
gw_elem_value TYPE gs_elem_value.
DATA: l_ixml TYPE REF TO if_ixml,
l_streamfactory TYPE REF TO if_ixml_stream_factory,
l_parser TYPE REF TO if_ixml_parser,
l_istream TYPE REF TO if_ixml_istream,
l_document TYPE REF TO if_ixml_document,
l_node TYPE REF TO if_ixml_node,
l_xmldata TYPE string.
DATA: l_elem TYPE REF TO if_ixml_element,
l_root_node TYPE REF TO if_ixml_node,
l_next_node TYPE REF TO if_ixml_node,
l_name TYPE string,
l_iterator TYPE REF TO if_ixml_node_iterator.
DATA: l_xml_table TYPE TABLE OF t_xml_line,
l_xml_line TYPE t_xml_line,
l_xml_table_size TYPE i.
DATA: l_filename TYPE string.
DATA : gv_projectdetails TYPE string .
DATA : xref TYPE REF TO cx_dynamic_check .
PERFORM get_complete_path USING p_path2 p_file2 CHANGING gv_complete_path .
* Creating the main iXML factory
l_ixml = cl_ixml=>create( ).
* Creating a stream factory
l_streamfactory = l_ixml->create_stream_factory( ).
PERFORM get_xml_table CHANGING l_xml_table_size l_xml_table.
* wrap the table containing the file into a stream
l_istream = l_streamfactory->create_istream_itable( table = l_xml_table
size = l_xml_table_size ).
* Creating a document
l_document = l_ixml->create_document( ).
* Create a Parser
l_parser = l_ixml->create_parser( stream_factory = l_streamfactory
istream = l_istream
document = l_document ).
* Validate a document
IF pa_val EQ 'X'.
l_parser->set_validating( mode = if_ixml_parser=>co_validate ).
ENDIF.
* Parse the stream
IF l_parser->parse( ) NE 0.
IF l_parser->num_errors( ) NE 0.
DATA: parseerror TYPE REF TO if_ixml_parse_error,
str TYPE string,
i TYPE i,
count TYPE i,
index TYPE i.
count = l_parser->num_errors( ).
WRITE: count, ' parse errors have occurred:'.
index = 0.
WHILE index < count.
parseerror = l_parser->get_error( index = index ).
i = parseerror->get_line( ).
WRITE: 'line: ', i.
i = parseerror->get_column( ).
WRITE: 'column: ', i.
str = parseerror->get_reason( ).
WRITE: str.
index = index + 1.
ENDWHILE.
SKIP 2.
WRITE : 'The input xml ' , p_file , ' is invalid and does not conform to the inset DTD. '.
EXIT.
ENDIF.
* Process the document if there are no errors
ELSEIF l_parser->is_dom_generating( ) EQ 'X'.
PERFORM process_dom USING l_document.
ENDIF.
*& Form get_xml_table
FORM get_xml_table CHANGING l_xml_table_size TYPE i
l_xml_table TYPE STANDARD TABLE.
* Local variable declaration
DATA: l_len TYPE i,
l_len2 TYPE i,
l_tab TYPE tsfixml,
l_content TYPE string,
l_str1 TYPE string,
c_conv TYPE REF TO cl_abap_conv_in_ce,
l_itab TYPE TABLE OF string.
l_filename = p_file.
* upload a file from the client's workstation
CALL METHOD cl_gui_frontend_services=>gui_upload
EXPORTING
filename = l_filename
filetype = 'BIN'
IMPORTING
filelength = l_xml_table_size
CHANGING
data_tab = l_xml_table
EXCEPTIONS
OTHERS = 19.
IF sy-subrc <> 0.
MESSAGE ID sy-msgid TYPE sy-msgty NUMBER sy-msgno
WITH sy-msgv1 sy-msgv2 sy-msgv3 sy-msgv4.
ENDIF.
ENDFORM. "get_xml_table
*& Form process_dom
FORM process_dom USING document TYPE REF TO if_ixml_document.
DATA: node TYPE REF TO if_ixml_node,
iterator TYPE REF TO if_ixml_node_iterator,
nodemap TYPE REF TO if_ixml_named_node_map,
attr TYPE REF TO if_ixml_node,
name TYPE string,
prefix TYPE string,
value TYPE string,
indent TYPE i,
count TYPE i,
index TYPE i.
node ?= document.
CHECK NOT node IS INITIAL.
ULINE.
IF node IS INITIAL. EXIT. ENDIF.
* create a node iterator
iterator = node->create_iterator( ).
* get current node
node = iterator->get_next( ).
* loop over all nodes
WHILE NOT node IS INITIAL.
indent = node->get_height( ) * 2.
indent = indent + 20.
CASE node->get_type( ).
WHEN if_ixml_node=>co_node_element.
* element node
name = node->get_name( ).
nodemap = node->get_attributes( ).
gw_elem_value-element = name.
IF NOT nodemap IS INITIAL.
* attributes
count = nodemap->get_length( ).
DO count TIMES.
index = sy-index - 1.
attr = nodemap->get_item( index ).
name = attr->get_name( ).
prefix = attr->get_namespace_prefix( ).
value = attr->get_value( ).
ENDDO.
ENDIF.
WHEN if_ixml_node=>co_node_text OR
if_ixml_node=>co_node_cdata_section.
* text node
value = node->get_value( ).
TRANSLATE value TO UPPER CASE.
gw_elem_value-value = value.
IF gw_elem_value-element = 'table_name'.
gv_id = gv_id + 1.
ENDIF.
gw_elem_value-recordid = gv_id.
APPEND gw_elem_value TO gi_elem_value.
CLEAR gw_elem_value.
ENDCASE.
* advance to next node
node = iterator->get_next( ).
ENDWHILE.
ENDFORM. "process_dom -
IMP-00069: Could not convert to environment national character set's handle
While importing database objects from dmp we are getting the following Error
C:\>imp chem/chem@chemdb full=y file='E:\eiproject\expdat.dmp' log=y;
Import: Release 8.1.5.0.0 - Production on Thu Sep 13 10:28:54 2001
(c) Copyright 1999 Oracle Corporation. All rights reserved.
Connected to: Oracle8i Enterprise Edition Release 8.1.5.0.0 - Production
With the Partitioning and Java options
PL/SQL Release 8.1.5.0.0 - Production
Export file created by EXPORT:V08.01.07 via conventional path
import done in WE8ISO8859P1 character set and WE8ISO8859P1 NCHAR character set
IMP-00069: Could not convert to environment national character set's handle
IMP-00000: Import terminated unsuccessfully
Hi James,
IMP-69 can occur if you try to use an EARLIER version of IMPORT against an export (.dmp) file produced by a LATER version of EXPORT.
How about trying this:
Use the 8.1.5 EXPORT utility from Win2K to connect to your Solaris 8.1.7 database; then use the 8.1.5 IMPORT utility to import the file into the 8.1.5 W2K database.
Nat -
Prevent XI from converting & #163; to pound character
The requirement: the input XML file contains & #163;, and by default XI converts that character reference to the £ sign in the output XML file.
How do I retain & #163; in the output XML file without writing an adapter module?
Points will be awarded.
Edited by: gvs kiran on Nov 7, 2008 6:46 AM
Edited by: gvs kiran on Nov 7, 2008 6:47 AM
I do not see a way. It seems that all & #xxx; references are replaced by their character equivalents.
You would have to convert the £ back again with a Java mapping or an adapter module after the graphical mapping.
Regards
Stefan -
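The core of such a Java mapping could look roughly like the sketch below: a plain string function that rewrites every non-ASCII character as a decimal numeric character reference. The class and method names are hypothetical, and the XI mapping interface that would wrap this function is deliberately left out rather than guessed at.

```java
public class NumericEntityEscaper {
    /** Rewrites every character above 0x7F as a decimal numeric character reference. */
    public static String escapeNonAscii(String xml) {
        StringBuilder out = new StringBuilder(xml.length());
        for (int i = 0; i < xml.length(); ) {
            int cp = xml.codePointAt(i);            // full code point, also for surrogate pairs
            if (cp > 0x7F) {
                out.append("&#").append(cp).append(';');
            } else {
                out.append((char) cp);
            }
            i += Character.charCount(cp);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // The pound sign is U+00A3, decimal 163.
        System.out.println(escapeNonAscii("<price>\u00a3 10</price>"));
    }
}
```

Note that this escapes all non-ASCII characters, not only the pound sign; restricting it to a single character would be a one-line change in the condition.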
How 2 convert numeric value to character value?
Hi friends,
I want to convert a numeric value to a character value.
Is there any FM available?
Points rewarded soon.
Regards
Ronn
REPORT ZSPELL.
TABLES SPELL.
DATA : T_SPELL LIKE SPELL OCCURS 0 WITH HEADER LINE.
DATA : PAMOUNT LIKE SPELL-NUMBER VALUE '1234510'.
SY-TITLE = 'SPELLING NUMBER'.
PERFORM SPELL_AMOUNT USING PAMOUNT 'USD'.
WRITE: 'NUMBERS', T_SPELL-WORD, 'DECIMALS ', T_SPELL-DECWORD.
FORM SPELL_AMOUNT USING PWRBTR PWAERS.
CALL FUNCTION 'SPELL_AMOUNT'
EXPORTING
AMOUNT = PAMOUNT
CURRENCY = PWAERS
FILLER = SPACE
LANGUAGE = 'E'
IMPORTING
IN_WORDS = T_SPELL
EXCEPTIONS
NOT_FOUND = 1
TOO_LARGE = 2
OTHERS = 3.
ENDFORM. " SPELL_AMOUNT -
Converting Japanese two Byte Character...
Hi,
I am working on an outbound scenario from R/3.
I trigger the message via proxy in Japanese and send it to XI.
In XI, we get a mapping error.
Some records in the message contain single-byte characters and some contain double-byte characters.
For single-byte characters, XI is able to generate the target structure in the mapping, but it cannot convert the double-byte characters to the target structure.
Can anyone help me resolve this issue?
Thanks in advance...
Regards,
Vasu
Hi,
Is the Japanese data Shift JIS encoded? Changing the encoding (or the encoding declaration) might help.
Chris -
Is it possible to convert � to a hex character?
I'm looking for a character that represents the hex value of �
I've been looking through the internet but haven't found any encouraging leads or clues. I'm thinking it may not even be possible.
Any help will be appreciated.
Thanks in advance
At http://www.unicode.org you should be able to download tables. Alternatively you could write the character to a file and read it in as bytes and display the bytes in hex.
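The "display the bytes in hex" suggestion can also be tried without a file, since String.getBytes gives the same bytes an encoder would write. A small sketch, with the caveat that the character from the question did not survive here, so U+00E0 (a with grave accent) is used purely as an assumed stand-in, and the class name HexOfChar is made up:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class HexOfChar {
    /** Returns the bytes of text in the given charset as space-separated hex pairs. */
    public static String hex(String text, Charset cs) {
        StringBuilder sb = new StringBuilder();
        for (byte b : text.getBytes(cs)) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));   // mask to avoid sign extension
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String s = "\u00e0";   // assumed example character
        System.out.println(String.format("U+%04X", s.codePointAt(0)));
        System.out.println("ISO-8859-1: " + hex(s, StandardCharsets.ISO_8859_1));
        System.out.println("UTF-8:      " + hex(s, StandardCharsets.UTF_8));
    }
}
```

The hex value depends on the encoding chosen, which is why a single "hex character" for a non-ASCII letter is ambiguous until the charset is fixed.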
-
How to convert ascii value into character and vice versa
Hello the java world people,
I want to convert each character in my array into its corresponding ASCII value and vice versa. How can I do that?
The term "ASCII" is often used very loosely.
Java char values are Unicode, and the ASCII codes are identical to the Unicode characters in the range 0 .. 127. Unicode values 128 and above don't have corresponding ASCII values, though 128-255 correspond to ISO-8859-1, which is one of the encodings often called "extended ASCII".
As shown above, you can convert between chars and the corresponding int values simply with a cast, but you should be aware that the more exotic characters will not give you sensible values. -
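A minimal sketch of the cast approach for whole arrays, in case it helps. The class and method names are invented for the example; the isAscii check reflects the point about the 0 .. 127 range.

```java
import java.util.Arrays;

public class CharCodes {
    /** Widens each char to its int code (equal to the ASCII code for values 0..127). */
    public static int[] toCodes(char[] chars) {
        int[] codes = new int[chars.length];
        for (int i = 0; i < chars.length; i++) {
            codes[i] = chars[i];          // implicit widening char -> int
        }
        return codes;
    }

    /** Narrows each int code back to a char; an explicit cast is required. */
    public static char[] toChars(int[] codes) {
        char[] chars = new char[codes.length];
        for (int i = 0; i < codes.length; i++) {
            chars[i] = (char) codes[i];
        }
        return chars;
    }

    /** True only for characters that really have an ASCII code. */
    public static boolean isAscii(char c) {
        return c <= 127;
    }

    public static void main(String[] args) {
        int[] codes = toCodes("Hi!".toCharArray());
        System.out.println(Arrays.toString(codes));
        System.out.println(new String(toChars(codes)));
    }
}
```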
Converting local character styles (overrides) to proper styles
Hi all,
I've been handed a document that has no styles defined at all. All styling is done with local character formatting ("overrides").
This tutorial http://www.lynda.com/InDesign-tutorials/160-Convert-local-formatting-character-styles/85324/192449-4.html shows how to run a preflight that detects all of the overrides, which is great, but you still have to fix them manually.
The fastest way I've found so far is to do a search for a particular font-size/font-type, and then apply a proper character style to each match, until the overrides are all gone. But maybe there's a faster way? E.g. is it possible to "select all matching text" (as you can do in Google Docs), that would select all text matching that font-size/type, and then with one click apply a style to all selected text?
Or maybe there's another way to convert all text matching e.g. font-size/font-type into a character style?
Thanks!
Bjoern
The auto-char styles script tries to be semi-intelligent: It ignores the underlying paragraph style completely. Instead, for each paragraph, it checks which is the longest continuous run of formatting in that paragraph (textStyleRange). That formatting is considered the "underlying formatting of the paragraph", and any other formatting in that paragraph is considered an override, and has an appropriate character style created (if necessary) and applied to it.
The only exception to that is "italics", which is always considered an override, even if there is more italics than regular in the paragraph (something that can happen in a bibliography entry, for instance).
So, for instance, with the formatting you give as an example ("No style, + Arial Bold + size: 24 + Leading 28"), where everything but the final paragraph return has that applied, Auto-Char Styles will not apply a character style to most of the text, and apply an appropriate char style to the final enter only.
That way, "all" that remains is to go through the document creating and applying paragraph styles as necessary, knowing that all local overrides have been styled with an appropriate char style. -
How to convert character streams to byte streams?
Hi,
I know InputStreamReader can convert byte streams to character streams. But how do I convert character streams back to byte streams? Is there a Java class for that?
Thanks in advance.
When do you have to do this? There's probably another way. If you start out using only InputStreams you shouldn't have that problem.
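For the record, the counterpart of InputStreamReader in the standard library is java.io.OutputStreamWriter: it wraps an OutputStream and encodes the characters written to it. A short sketch follows; the class name CharsToBytes is made up, and UTF-8 is just an assumed choice of encoding.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

public class CharsToBytes {
    /** Pushes characters through an OutputStreamWriter and returns the encoded bytes. */
    public static byte[] encode(String text) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        // Closing the writer flushes any buffered characters into the byte stream.
        try (Writer writer = new OutputStreamWriter(bytes, StandardCharsets.UTF_8)) {
            writer.write(text);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) {
        byte[] encoded = encode("h\u00e9llo");
        System.out.println(encoded.length + " bytes");
        System.out.println(new String(encoded, StandardCharsets.UTF_8));
    }
}
```

Always naming the charset explicitly on both the reading and the writing side avoids depending on the platform default.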
-
Approach to converting database character set from Western European to Unicode
Hi All,
EBS:12.2.4 upgraded
O/S: Red Hat Linux
I am looking for the information below. Any help would be great!
INFORMATION NEEDED: Approach to converting the database character set from Western European to Unicode for source systems with a large number of data exceptions
DETAIL: We are looking to convert the Oracle EBS database character set from Western European to Unicode to support Kanji characters. Our scan results show
both “lossy” (approx. 110K) and “truncation” (approx. 26K) exceptions in the database, which need to be fixed before the database is converted to Unicode.
Oracle Support has suggested fixing all open and closed transactions in the source Production instance using forms and scripts.
We’re looking for information or creative approaches from anyone who has performed a similar exercise without having to manipulate data in the source instance.
Any help in this regard would be greatly appreciated!
Thanks for your time!
Regards,
There are two aspects here:
1. Why do you have such a large number of lossy characters? Is this data coming from some very old eBS release, i.e. from before the times of the Java applet interface to Oracle Forms? Have you analyzed the nature of this lossy data?
2. There is no easy way around truncation issues as you cannot modify eBS metadata (make columns wider). You must shorten or remove the data manually through the documented eBS interfaces. eBS does not support direct manipulation of data in the database due to complex consistency rules enforced by the application itself (e.g. forms).
Thanks,
Sergiusz -
Import error converting character set
Hi there, this is the first time I've ever posted something, so I hope I give enough information for my question to be answered.
I have an Oracle 8.1.6 database on Windows NT4. On the same machine I have installed Designer 6.
Everything is working just fine; I just can't import a dump file made from Repository 6. It's not a full dump, so I first created a whole new repository.
Now, when I try to import the dump file from the command prompt, using imp.exe (present in C:\<oracle_home>\bin), the following error occurs:
IMP-00069: Could not convert to environment national character set's handle
IMP-00000: Import terminated unsuccessfully.
As the import is about to start, it gives the following message:
import done in WE8ISO8859P1 character set and WE8ISO8859P1 NCHAR character set.
Then the error occurs.
Now I wonder if this can be solved. I cannot see in the dump file which character set was used. I did not make the export myself. Is there a way to convert the character set in the dump file? Or should I solve this problem by making changes in the database?
Thanks for your help. I have been trying to solve this problem myself for over a week now and I'm getting tired of it.
Hi friend,
I have faced the same problem, and I found that it is due only to a version mismatch: you are attempting to import data from a higher version into a lower one.
Just check that at your end as well.
Bye
Tehzeeb Ahmed -
Convert chinese character to unicode
Hello, I have a problem.
How can I find the Unicode value of a Chinese character in a file?
For example, this character ' 称 ' in a file a.txt.
How can I read a.txt and then get the Unicode value of the character?
I really need help. I want to know the algorithm and the code
of that tool.
Thanks
You might consider downloading the open source project and looking directly at the code for that tool. I imagine that you could create something similar in...oh, maybe 25 lines of code.
retrieve the command line args
open the specified file using the charset encoding specified
for (all characters in the file) {
    if character is > 0xFF convert to \uXXXX
    output character to new file
}
John O'Conner -
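The outline above might translate into Java roughly as follows. This is a sketch, not the code of the tool being discussed: the class name UnicodeEscaper is invented, the two command-line arguments (input file and charset name) are assumptions, and it prints to standard output instead of writing a new file. The 0xFF threshold is taken from the outline.

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Paths;

public class UnicodeEscaper {
    /** Replaces every char above 0xFF with a backslash-u escape of four hex digits. */
    public static String escape(String text) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c > 0xFF) {
                sb.append(String.format("\\u%04X", (int) c));
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 2) {
            System.err.println("usage: UnicodeEscaper <file> <charset>");
            return;
        }
        // Decoding with the right charset matters: the same bytes give
        // different characters under, say, UTF-8 and GBK.
        byte[] raw = Files.readAllBytes(Paths.get(args[0]));
        String text = new String(raw, Charset.forName(args[1]));
        System.out.print(escape(text));
    }
}
```

Characters outside the Basic Multilingual Plane come out as two escapes (a surrogate pair), which is also how Java source files represent them.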
Convert Character to ASCII Number
Hello.
I would greatly appreciate it if someone could tell me how to convert a number or character to its ASCII code. For example, if I have the digit 9, I would like to save its ASCII code 0x39 (decimal 57) instead.
Thank YOU!!
By the way, getNumericValue doesn't return the unicode value of the character but the numeric value - for instance it returns the int 9 for the character '9' and the int 15 for the character 'F' (since 15 is F in hex).
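To make the difference concrete, here is a small sketch contrasting the character code with the numeric value. The class and helper names are made up for illustration; Character.getNumericValue and Integer.toHexString are the standard library methods.

```java
public class DigitCodes {
    /** The character code of c, which is its ASCII code for characters up to 127. */
    public static int asciiCode(char c) {
        return c;
    }

    /** The value a digit or letter stands for, e.g. 9 for '9' and 15 for 'F'. */
    public static int numericValue(char c) {
        return Character.getNumericValue(c);
    }

    /** Turns a single decimal digit 0..9 into its character. */
    public static char digitToChar(int digit) {
        return (char) ('0' + digit);
    }

    public static void main(String[] args) {
        char c = digitToChar(9);
        System.out.println("code, decimal: " + asciiCode(c));
        System.out.println("code, hex:     " + Integer.toHexString(asciiCode(c)));
        System.out.println("numeric value: " + numericValue(c));
    }
}
```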