Native2ascii

Hi,
Can I use native2ascii on files that are not properties files, e.g. XML configuration files or any other files? Has anyone tried that?
Thanks,
Rajesh Mittal

native2ascii doesn't really care what the file is. You can use it on any text file.
John O'Conner
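
For example, to escape the non-ASCII characters in a UTF-8 XML file (a hypothetical invocation; adjust -encoding to match your file):

    native2ascii -encoding UTF-8 config.xml config-escaped.xml

One caveat: \uXXXX escapes mean something to a Java reader, not to an XML parser, so this is only useful if the output is read back by Java code that understands the escapes.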

Similar Messages

  • Using native2ascii

    Hi,
    we are using an Oracle 8 database. Our application uses XML, and while inserting data, illegal XML characters are somehow getting into the records. When the records are retrieved, the XML parser throws an exception about illegal XML characters.
    I have read about the native2ascii tool, but it was not clear to me how to use it for our purpose of dealing with the illegal XML characters. If I use
    native2ascii -encoding 646 p1.txt p2.txt
    and p1.txt contains illegal XML characters, how are they represented in p2.txt (the output file)? I need to know how the illegal characters are represented.
    Please advise me.
    Thanks

    Hi Ron,
    This looks like an interesting problem. I am familiar with native2ascii but have very little knowledge of XML. I'm not sure exactly what you're thinking of doing, but here's how native2ascii works:
    native2ascii assumes that you have a file of correctly formed characters in some encoding, and that you want the Java reader to be able to input this file (for example, a .java or .properties file). Suppose the file p1.txt is encoded in UTF-8. Then you would call
    native2ascii -encoding UTF8 p1.txt p2.txt
    In file p2.txt, all ASCII characters (all characters from p1.txt with code points less than 0x0080) are represented as single-byte ASCII characters. Any character with a code point >= 0x0080 is represented as a sequence of six single-byte ASCII characters giving its Unicode code point. For example, if the input file contains the Chinese character for tea, it is represented in the output file as \u8336 and is read back correctly by the Java reader.
    What is that 646 encoding you used in your example?
    Regards,
    Joe
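
    A minimal Java sketch of the rule Joe describes (an illustration only, not the real tool): ASCII code points pass through unchanged, everything else becomes a six-character \uXXXX escape.

        import java.io.*;

        public class MiniNative2Ascii {
            public static void main(String[] args) throws IOException {
                // Read the input in its actual encoding (UTF-8 here, as in Joe's example).
                Reader in = new BufferedReader(
                        new InputStreamReader(new FileInputStream(args[0]), "UTF8"));
                PrintWriter out = new PrintWriter(
                        new OutputStreamWriter(new FileOutputStream(args[1]), "ASCII"));
                int c;
                while ((c = in.read()) != -1) {
                    if (c < 0x0080) {
                        out.write(c);              // ASCII passes through unchanged
                    } else {
                        out.printf("\\u%04x", c);  // six ASCII characters per escape
                    }
                }
                in.close();
                out.close();
            }
        }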

  • Is it possible to read native encodings without using native2ascii

    Hi All,
    I have a text file containing Japanese characters, saved in UTF-8 format. I can read it into my Java application after converting it with the native2ascii tool.
    However, I'm wondering whether I can read it directly into my Java application without going through native2ascii, since the file is already in UTF-8 format. I have tried, but it doesn't work. Please advise.
    Thanks.

    You missed reading the java.util.Properties documentation:
    "When saving properties to a stream or loading them from a stream, the ISO 8859-1 character encoding is used. For characters that cannot be directly represented in this encoding, Unicode escapes are used; however, only a single 'u' character is allowed in an escape sequence. The native2ascii tool can be used to convert property files to and from other character encodings."
    What's so bad about using native2ascii?
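
    If you can require Java 6 or later, Properties.load(Reader) avoids the ISO 8859-1 restriction entirely by letting you supply a reader in the right encoding. A minimal sketch (the file name and key are made up):

        import java.io.*;
        import java.util.Properties;

        public class LoadUtf8Properties {
            public static void main(String[] args) throws IOException {
                Properties props = new Properties();
                // load(Reader) (Java 6+) honors the reader's encoding,
                // so no \uXXXX escaping is needed in the file.
                Reader reader = new InputStreamReader(
                        new FileInputStream("messages.properties"), "UTF-8");
                try {
                    props.load(reader);
                } finally {
                    reader.close();
                }
                System.out.println(props.getProperty("Title"));
            }
        }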

  • Native2ascii as Java code

    Hi all,
    I need to write a function that does the work of the native2ascii command, in order to read a file containing Japanese characters and write their \uxxxx representations to another one (without calling native2ascii via exec()).
    I have written this function, but it is buggy: it sometimes adds characters to the converted String.
    Here is the function:
    public static String convertToUnicode(String entry, String encoding) {
        byte[] bytes = entry.getBytes();
        String temp = "";
        try {
            temp = new String(bytes, encoding);
        } catch (Exception e) {
            e.printStackTrace();
        }
        char[] chtb = temp.toCharArray();
        temp = "";
        for (int i = 0; i < chtb.length; i++) {
            temp += "\\u" + Integer.toHexString((int) chtb[i]);
        }
        return temp;
    }
    Regards,
    Hani

    I can see multiple possible problems there:
    byte[] bytes = entry.getBytes();
    Conversion will happen in the default character encoding; you can get anything as a result. It would be safer to use one of the 16-bit Unicode encodings or to get the data as an array of characters.
    temp = new String(bytes, encoding);
    I don't quite understand why you do this... if "encoding" is different from the system's default character encoding you'll most probably get nonsense back.
    temp += "\\u" + Integer.toHexString((int) chtb[i]);
    Integer.toHexString will not pad zeroes in front of the value, so for instance the question mark will come out as "\u3f" instead of "\u003f". And by the way, you really should use StringBuffer instead of String in that loop...
    Here's my try:
    public static String convertToUnicode(String entry) {
        char[] chtb = entry.toCharArray();
        StringBuffer temp = new StringBuffer(6 * chtb.length);
        int BIT_MASK = (1 << 16);
        for (int i = 0; i < chtb.length; i++) {
            temp.append("\\u");
            // OR-ing with the mask guarantees a five-digit hex string;
            // substring(1) drops the leading 1, leaving four zero-padded digits.
            temp.append(Integer.toHexString(chtb[i] | BIT_MASK).substring(1));
        }
        return temp.toString();
    }
    There's still some space left for optimization...
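
    For what it's worth, on Java 5 and later the mask-and-substring trick can be replaced with formatting; the loop body would become something like:

        // Equivalent zero-padded escape using String.format (Java 5+):
        temp.append(String.format("\\u%04x", (int) chtb[i]));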

  • Native2ascii failure - Asian locales - Solaris specific

    I am trying to run the native2ascii utility to convert my Asian-locale property files to Unicode escapes. The utility works on Windows, HP-UX, and IBM AIX, but fails on Solaris. It does work with the ASCII and ISO8859_1 encodings. Below is the error I am getting. I am using 1.4.2_03-b02 on Solaris. Note that when I look in the /usr/j2se/jre/lib directory, the charsets.jar file is there. Any ideas what I may be missing?
    /usr/j2se/bin/native2ascii -encoding EUC_JP properties/PLMVisResourceBundle_ja_JP.properties com/eai/visweb/properties/PLMVisResourceBundle_ja_JP.properties
    Exception in thread "main" java.lang.NoSuchMethodError: sun.io.CharToByteJIS0208.getIndex1()[S
    at sun.nio.cs.ext.JIS_X_0208$Encoder.<init>(JIS_X_0208.java:77)
    at sun.nio.cs.ext.EUC_JP$Encoder.<init>(EUC_JP.java:210)
    at sun.nio.cs.ext.EUC_JP$Encoder.<init>(EUC_JP.java:199)
    at sun.nio.cs.ext.EUC_JP.newEncoder(EUC_JP.java:46)
    at sun.tools.native2ascii.Main.initializeConverter(Main.java:331)
    at sun.tools.native2ascii.Main.convert(Main.java:110)
    at sun.tools.native2ascii.Main.main(Main.java:376)

    I was able to figure this out using some of the other native2ascii options. Running with -J-verbose showed me that the CharToByteJIS0208 class was being found in i18n.jar and not in charsets.jar as I had presumed: because of the order in which Java loads the JRE jar files, i18n.jar was searched first. I overcame the problem by adding the -J-Xbootclasspath/p:"path to charsets.jar" option to the native2ascii command, to ensure it finds the class in the right jar file.
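
    For reference, the full invocation would look something like this (the charsets.jar path is taken from the /usr/j2se layout above and may differ on your installation):
    /usr/j2se/bin/native2ascii -J-Xbootclasspath/p:/usr/j2se/jre/lib/charsets.jar -encoding EUC_JP properties/PLMVisResourceBundle_ja_JP.properties com/eai/visweb/properties/PLMVisResourceBundle_ja_JP.properties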

  • Native2ascii - same input - different output!??

    Hello all,
    I have to convert this UTF-8 property string:
    label_from=От
    into "ISO 8859-1 with Unicode escapes" with native2ascii.
    On my Linux machine (Java 5) I get:
    label_from=\u041e\u0442
    This is right and leads to correct output :)
    On a Windows machine (which I unfortunately have to work with from time to time) with Java 6 I get:
    label_from=\u00d0\u017e\u00d1\u201a
    This leads to broken output :(
    Can anybody tell me why this happens? Is there a way to make the Windows machine behave like the Linux machine (besides "format c:" and then installing Linux...)?
    Thanks a lot in advance!
    Best regards
    Stephan

    I think you need to specify the encoding:
    native2ascii -encoding UTF-8 <infile> <outfile>
    Otherwise it treats <infile> as if it were in the system default encoding, which just happens to be UTF-8 on your Linux box but is a legacy codepage such as windows-1252 on Windows.
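
    You can reproduce the broken Windows result in a few lines; this sketch decodes the UTF-8 bytes of "От" as windows-1252, which yields exactly the \u00d0\u017e\u00d1\u201a escapes above:

        public class MojibakeDemo {
            public static void main(String[] args) throws Exception {
                // UTF-8 bytes of "От" (Cyrillic O, te): D0 9E D1 82
                byte[] utf8 = "\u041e\u0442".getBytes("UTF-8");
                // Decoded as windows-1252 they become U+00D0 U+017E U+00D1 U+201A.
                String broken = new String(utf8, "windows-1252");
                System.out.println(broken);
            }
        }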

  • Native2ascii.exe error when using 1.4.1_01

    I am facing a strange problem that I did not encounter with 1.4.0. I have a properties file "LabelCommon_it.properties" on which I run native2ascii. The first time I run it, the output file is created correctly in the desired directory. But when I run it a second time, it gives an error:
    java.lang.Exception: c:\test\LabelCommon_it.properties could not be written.
    If I manually remove the previously created file and then run native2ascii again, no exception is thrown and the file is created correctly.
    Has anyone come across this problem? Any ideas why it occurs? I did not find anything in the 1.4.1_01 release notes.
    Thanks

    I don't think it's a bug, but automatic overwrite functionality should be provided in a future JDK release; the current behavior is, in my opinion, a little annoying. Since we have the source properties file, overwriting should be allowed when the target localized properties file is generated. Fortunately, manually removing the old target file is no more than a couple of mouse clicks before the next run. I haven't tested native2ascii.exe in the latest beta of JDK 1.4.2 yet, either.

  • How to use native2ascii tool?

    Hi Friends,
    I created a property file in English named string.properties for my application. I need to translate that property file into different languages like Chinese, Arabic, Japanese, Spanish, etc.
    I found that the native2ascii tool is used to convert property files, but I have no idea how to use it. Can anyone please explain, with a proper example if possible?
    I would appreciate your help.
    Thank you

    native2ascii [options] [inputfile [outputfile]]
    -reverse
    Perform the reverse operation: convert a file with Latin-1 and/or Unicode-escaped characters to one with native-encoded characters.
    -encoding encoding_name
    Specify the encoding used by the conversion procedure. The default encoding is taken from the system property file.encoding. The encoding_name string must be taken from the first column of the table in the Supported Encodings document.
    -Joption
    Pass option to the Java virtual machine, where option is one of the options described on the reference page for the java launcher. For example, -J-Xms48m sets the startup memory to 48 megabytes.
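
    A concrete example (the file names are made up): keep a UTF-8 master file per language and generate the escaped bundle from it:
    native2ascii -encoding UTF-8 string_ja.properties.utf8 string_ja.properties
    And to go back the other way for editing:
    native2ascii -reverse -encoding UTF-8 string_ja.properties string_ja.properties.utf8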

  • Why is native2ascii embedding \ufeff at the start of my properties file

    Hi,
    I am new to the Java internationalization world, and I was astonished to find that it is not possible to use non-ANSI-encoded property files. Anyhow, I tried to use the native2ascii utility to convert a UTF-8 encoded property file to ANSI using this command:
    native2ascii -encoding UTF-8 MessageBundle-UTF8.properties MessageBundle_en_US.properties
    In response I got an ANSI-encoded file with this text:
    \ufeffTitle = Window Title
    Now, after setting the locale to English (US), I tried to access it with this Java code:
    Locale sysLocale = Locale.getDefault();
    ResourceBundle messages = ResourceBundle.getBundle("MessageBundle", sysLocale);
    System.out.println(messages.getString("Title"));
    This code throws the exception "Can't find resource for bundle java.util.PropertyResourceBundle, key Title" when it executes messages.getString("Title").
    However, if I remove \ufeff from the start of the text in the properties file, the code works fine. \ufeff was not in the original UTF-8 file and was generated by native2ascii.
    Can you guys please tell me what is wrong?
    Thanks
    Saqib

    What you are seeing is the BOM (byte order mark), and if you see it in the result file from native2ascii, then the BOM was definitely also present in the original file; the reason you can see it now is that native2ascii turned it into the visible \ufeff escape. So native2ascii did not generate anything.
    Try looking at your source file in a hex editor, then you will see the BOM.
    You probably saved your file in Notepad or another application that automatically inserts a BOM in UTF-8 files. Use a proper editor to save your files (one that does not insert a BOM), and you won't have the problem.
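
    If you cannot control the editor, you can also strip a leading BOM yourself before running native2ascii; a minimal sketch, assuming a UTF-8 input file:

        import java.io.*;

        public class StripBom {
            public static void main(String[] args) throws IOException {
                Reader in = new InputStreamReader(new FileInputStream(args[0]), "UTF-8");
                Writer out = new OutputStreamWriter(new FileOutputStream(args[1]), "UTF-8");
                int first = in.read();
                // U+FEFF at the very start of the file is a BOM, not content: drop it.
                if (first != -1 && first != '\ufeff') {
                    out.write(first);
                }
                for (int c; (c = in.read()) != -1; ) {
                    out.write(c);
                }
                in.close();
                out.close();
            }
        }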

  • How to use Native2ASCII Extension

    Hello,
    I need sample code for the Native2ASCII extension: any example of how to use it in a project?

    Here is the basic explanation of what Native2ASCII is:
    http://java.sun.com/j2se/1.4.2/docs/tooldocs/windows/native2ascii.html
    To activate the extension just right click on a file in the application navigator and choose the option.

  • Native2ascii tool

    I need to show Japanese text in my web application. This is how I am doing it:
    First I develop my code on Windows NT, then prepare a jar file and migrate it to the Unix server which will host the application.
    On my Windows NT machine I convert the native characters to ASCII using the native2ascii tool and use those Unicode escapes in my properties file. Everything works fine, and when I host the application on my Windows NT machine I can see the Japanese text correctly.
    Now when I create the jar file and migrate to the Unix server, the Japanese characters get screwed up. Why? Are the Unicode escapes different on Windows NT and the Unix server? Do I have to convert the characters on the Unix machine and use those Unicode escapes? Assuming I did that, are the characters different from one version of Solaris to another? Should the JDK version be the same?
    On my Windows machine:
    java version "1.3.1_02"
    Java(TM) 2 Runtime Environment, Standard Edition (build 1.3.1_02-b02)
    Java HotSpot(TM) Client VM (build 1.3.1_02-b02, mixed mode)
    On my Unix server:
    java version "Solaris_JDK_1.2.2_12"
    Solaris VM (build Solaris_JDK_1.2.2_12, native threads, sunwjit)
    Thanks,
    bala

    If I were you I would try this:
    1. Check which encoding you are using for native2ascii. If it's SJIS, I would first check whether I can browse some other web pages with SJIS encoding, to determine whether it's only my web application that can't show Japanese text correctly on the Unix box.
    2. If your web browser cannot display Japanese text with SJIS encoding, then you need to install some fonts.
    You can find some clues at the following link:
    http://users.erols.com/eepeter/chinesecomputing/programming/java.html
    You can test your Japanese text by changing the following sample code to use a Unicode escape for your Japanese string.
    Displaying Chinese
    Finding Chinese fonts: Java 2 allows the programmer to directly access the fonts on the machine. The code sample below gets a list of all the fonts on the system, then checks each font to see whether it can display a sample Chinese string. Matching fonts are printed. Variations of the code below can be used to automatically find Chinese fonts and set the font of the Swing components accordingly. A bug in the JVM currently also lists all the logical fonts, whether they support Chinese or not, so I remove them from the list.
    // Determine which fonts support Chinese here ...
    // (ctextarea is a Swing text component and cfontchooser a combo box defined elsewhere.)
    Font[] allfonts = GraphicsEnvironment.getLocalGraphicsEnvironment().getAllFonts();
    int fontcount = 0;
    String chinesesample = "\u4e00";
    for (int j = 0; j < allfonts.length; j++) {
        // canDisplayUpTo returns -1 when the font can display the entire string.
        if (allfonts[j].canDisplayUpTo(chinesesample) == -1) {
            if (fontcount == 0) {
                ctextarea.setFont(new Font(allfonts[j].getFontName(), Font.PLAIN, 12));
            }
            // Skip the logical fonts, which are listed whether they support Chinese or not.
            if (!allfonts[j].getFontName().startsWith("dialog") &&
                !allfonts[j].getFontName().startsWith("monospaced") &&
                !allfonts[j].getFontName().startsWith("sansserif") &&
                !allfonts[j].getFontName().startsWith("serif")) {
                cfontchooser.addItem(allfonts[j].getFontName());
                fontcount++;
            }
        }
    }

  • Native2ascii problem

    Hello Group,
    I have some problems with the native2ascii ant task for non-English languages. When I run native2ascii through the ant task it generates junk characters for Portuguese characters, e.g.:
    Input: Efectuar ligação
    Output: Efectuar liga\ufffd\ufffdo
    But if I run native2ascii on the command line, outside of ant, then it generates the correct characters:
    Input: Efectuar ligação
    Output: Efectuar liga\u00e7\u00e3o
    which is correct.
    Could you please help me find out what the problem is when it runs through the ant task?
    - BR

    Hold on: do you have only languages that can actually be represented in 8859-1? If you do, then you don't even need to run native2ascii, since Java properties files are expected to be in 8859-1: http://java.sun.com/javase/6/docs/api/java/util/PropertyResourceBundle.html
    But if you support 12 languages I would think it highly likely that you do have languages that cannot be represented in 8859-1, such as Japanese, Chinese, Russian, etc. In that case you either need to use a different legacy encoding for each distinct script (and therefore run native2ascii with a different -encoding option for each group of scripts) or use UTF-8 for all of them (allowing you to run native2ascii once, on all files). It shouldn't be difficult to choose the better of those two options.
    With respect to ensuring that the properties files are in the correct encoding: simply saving a file with one of the options you see in Notepad does not convert the file to the encoding in question; it simply saves the bytes as they are, and Notepad assumes that you know what kind of text you have in your file. In other words, if you have a file encoded in 8859-1, open it in Notepad and then save it as UTF-8, the bytes in the file are still encoded as 8859-1, and treating it as UTF-8 will just result in errors.
    Of the Notepad options, 'ANSI' is the one you would choose for a file encoded in 8859-1 if you are on a system where the default system codepage is 1252; if you are on a system where the default codepage is, for example, Russian, 'ANSI' would result in the file being saved in the legacy Russian Windows codepage. 'ANSI' is a Windows invention and simply means whatever non-Unicode codepage the Windows box happens to be running.
    To actually convert a file from one encoding to another you need a conversion tool such as iconv. To use such tools correctly you still need to know which codepage the file is in, so you really need to keep track of the encoding of the files you are dealing with. The easiest way to do that is to always use UTF-8, and these days that is fairly easy.
    There are tools that can help if you have somehow lost track of the encoding, or a file comes to you from unknown sources, etc. Encoding detection is part of ICU, for instance, and a good Unicode editor like SC Unipad will at least tell you whether a file is encoded in Unicode or not. But none of those methods work 100%, so again, you are back to keeping track of the encoding yourself if you want to be sure.
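
    Coming back to the original ant symptom: \ufffd is the Unicode replacement character, which suggests the ant task decoded the source files with the wrong default encoding. The Ant native2ascii task accepts an encoding attribute (per the Ant manual); a sketch, assuming the source files are ISO 8859-1 and live under src/resources:
    <native2ascii encoding="ISO-8859-1" src="src/resources" dest="build/resources" includes="**/*.properties"/>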

  • Cannot find native2ascii.exe (JDK 1.5.0_05 on Windows)

    Hi,
    I am sorry if this is not the right forum for this issue, but I have a small problem:
    I recently installed JDK 1.5 on my Windows XP box, and I need to use the native2ascii tool. For JDK 1.3 and 1.4, I have found this tool in the bin/ folder of the jdk installation. However, I cannot find the native2ascii tool in my JDK 1.5 installation, even though it is described in the documentation...
    Any ideas?
    Should I try re-installing the 1.5 JDK?

    Never mind, I reinstalled as soon as I got the chance, and there it is, in the bin directory...

  • (Internationalization) - Unicode and Other ... Encoding Schemes

    Hello,
    I am developing an application that requires support for multiple languages (Chinese/Japanese/English, French/German).
    I plan to use UTF-8 encoding, and not an individual encoding for each language like SHIFT_JIS for Japanese, BIG5 for Chinese, etc. This is more so because I need to display multiple languages on the same page, and allow the user to enter data in any language he/she chooses.
    1. So, is the assumption correct that nothing but UTF-8 can be used here?
    2. If this is the case, why do people go for SHIFT_JIS for Japanese or BIG5 for Chinese at all? After the advent of Unicode, why can't they just use UTF-8?
    3. I am using WebLogic 6, and my app is composed of JSPs alone at the moment. It is working fine with UTF-8 encoding, without me setting anything at all in properties files etc. anywhere. I am getting data entered by users in forms (in Chinese/Japanese etc.) fine, and I am able to insert it into the database and get it back too, without any problems. So why is it that people talk of parameters to be set in properties files to tell the app about the encoding being used, etc.?
    4. My resource bundles are ASCII text files (.properties) with name/value pairs. Unicode escapes of the form \uXXXX represent the values, and this works fine.
    For example:
    UserNameLabel = \u00e3\ufffd\u2039\u00e3
    instead of:
    UserNameLabel = ãf¦ãf¼ã
    If the properties files contain the original characters where the values should be, my Java code is not able to read the name/value pairs in the resource bundle.
    Am I following the right approach? The problem with the current approach is that after I create the resource bundles, I must use the native2ascii converter to turn the characters into their equivalent escape values.
    Thanks
    JSB

    charllescuba1008 wrote:
    "Unicode states that each character is assigned a number which is unique; this number is called a code point."
    Right.
    "The relationship between characters and code points is 1:1."
    Uhm... let's assume "yes" for the moment. (Note that the relationship between the Java type char and code points is not 1:1, and there are other exceptions...)
    "E.g. the String "hello" (which is a sequence of character literals) can be represented by the following code points: \u0065 \u0048 \u006c \u006c \u006f"
    Those are Java String Unicode escapes. If you want to talk about Unicode code points, the correct notation for "Hello" would be
    U+0048 U+0065 U+006C U+006C U+006F
    Note that you swapped the H and e.
    "I also read that a certain character code point must be recognized by a specific encoding or else a question mark (?) is output in place of the character."
    This one is Java specific. If Java tries to translate some Unicode character to bytes using an encoding that doesn't support that character, then it will output the byte(s) for "?" instead.
    "Not all code points can be recognized by an encoding."
    Some encodings (such as UTF-8) can encode all code points; others (such as ISO-8859-*, EBCDIC or UCS-2) cannot.
    "So, the letter &#1500; would not be recognized by all encodings and should be replaced by a question mark (?), right?"
    Only in a very specific case in Java. This is not a general Unicode-level rule. (Disclaimer: the HTML code presented was using decimal XML entities to represent the Unicode characters.)
    What you are seeing is possibly the replacement character that your text rendering system uses to represent characters that it knows, but can't display (possibly because the current font has no glyph for them).
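
    A small Java sketch of the char/code-point mismatch mentioned above; U+1D11E (the musical G clef) lies outside the BMP and needs two chars:

        public class CodePointDemo {
            public static void main(String[] args) {
                // U+1D11E is stored as the surrogate pair D834 DD1E.
                String clef = "\uD834\uDD1E";
                System.out.println(clef.length());                          // 2 chars
                System.out.println(clef.codePointCount(0, clef.length()));  // 1 code point
                System.out.printf("U+%04X%n", clef.codePointAt(0));         // U+1D11E
            }
        }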

  • Weird character filter

    I used to code on RedHat. Something happened (something nasty) and I changed to Mandrake to give it a try. Among my source files I had a class with a method that filtered characters: some usual characters (dollar, punctuation) and some not so usual (yen, euro, dot in the middle of the line, long hyphen, long underline, ...). When I reopened my source file in Mandrake, all of the characters in the second group had turned into "?\200", "?\202\2004" and things of the sort, sometimes even worse. I guess this has to do with encoding and Mandrake, but it's not only a question of display: my classes (which have compiled for a long time without any problems) can no longer be compiled against this source file, due to the character stuff. Is there a way of working around this? I've tried to find out what character set those escape sequences come from, but I've been unable to.
    Any ideas?
    Here's (a version of) the method:
    private String filterDoc(String in) {
        char c;
        char prec;
        StringBuffer out = new StringBuffer();
        for (int i = 0; i < in.length(); i++) {
            c = in.charAt(i);
            if ((c >= 'a') && (c <= 'z')) out.append(c);
            else switch (c) {
            case '.': {
                prec = in.charAt(i > 1 ? i - 2 : i + 2);
                if ((prec == '.') || (prec == ' ')) out.append(".");
                else out.append("\n");
            } break;
            case '\u00b7': out.append('.'); break;  // this line: mid-height dot
            case ',': out.append('\n'); break;
            case ';': out.append('\n'); break;
            case ':': out.append('\n'); break;
            case '?': out.append('\n'); break;
            case '$': out.append('\n'); break;
            case '\u00a3': out.append('\n'); break; // this line: pound symbol
            case '\u20ac': out.append('\n'); break; // this line: euro symbol
            case '\u00a5': out.append('\n'); break; // this line: yen symbol
            case '\u00a9': out.append('\n'); break; // this line: copyright symbol
            case '*': out.append('\n'); break;
            case '-': out.append('-'); break;
            case '\u2014': out.append('-'); break;  // this line: long hyphen
            case '_': out.append('-'); break;
            default: out.append(c); break;
            }
        }
        return out.toString();
    }
    I say it's "a version" of the method because on Windows it looks just like on Mandrake, and when I copy-paste it here I can see that "some" things have changed; for instance, I see squares (or black squares) up here, but I don't see any squares in emacs, where I get the \20-something sequences.

    The Java compiler uses the platform's default character encoding to read source files, so for portability you shouldn't write non-ASCII characters directly; instead you should use \uXXXX Unicode escape sequences, for instance \u20AC for euro.
    It appears that on RedHat your default encoding was UTF-8, in which the characters you mention are encoded with multiple bytes, and when you transferred to Mandrake the default encoding changed to one where each character is encoded as a single byte.
    There are a number of things you can do:
    o Convert the source files to use Unicode escape sequences. The easiest way is to use the native2ascii tool that comes with the SDK; run this command on all your source files:
        native2ascii -encoding UTF-8 inputfile outputfile
    Also, if you are looking for the escape sequence of a particular character, you can find it from http://www.unicode.org/charts/
    o Continue using UTF-8 and pass the option -encoding UTF-8 to the compiler:
        javac -encoding UTF-8 MyClass.java
    o Change the platform's default character encoding by modifying the locale settings, for instance:
        export LANG=en_US.UTF-8
    You can use the command "locale -a" to list all available locales.
