Supported Encodings

Hi,
I found the following statements in the internationalization tutorial that I need help understanding:
"The list of supported character encodings is not part of the Java programming language specification. Therefore the character encodings supported by the APIs may vary with platform. To see which encodings the Java Development Kit supports, see the Supported Encodings document."
So do they mean that the character encodings supported on Unix are different from those supported on Windows, and so on? If so, isn't the next statement contradicting it? If the list of supported encodings is not part of the Java programming language, what is the meaning of the encodings that the JDK supports?
Thanks
Pratima

http://java.sun.com/j2se/1.4.2/docs/guide/intl/encoding.doc.html
This link should explain things for you.
Mark
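
To make the quoted passage a little more concrete: the Charset API documentation requires every Java implementation to support a small fixed set (US-ASCII, ISO-8859-1, UTF-8, UTF-16BE, UTF-16LE, UTF-16), and anything beyond that depends on the particular JDK or JRE build. A minimal sketch for probing a given runtime, assuming Java 8 or later; the class name CharsetProbe and the list of names are only illustrative.

```java
import java.nio.charset.Charset;
import java.util.LinkedHashMap;
import java.util.Map;

class CharsetProbe {
    // Illustrative names: UTF-8 is required everywhere, the rest are optional.
    static final String[] NAMES = {"UTF-8", "windows-1252", "Big5-HKSCS", "KOI8-U", "UTF-7"};

    // Returns name -> whether this runtime can look the charset up.
    static Map<String, Boolean> probe(String... names) {
        Map<String, Boolean> result = new LinkedHashMap<>();
        for (String name : names) {
            result.put(name, Charset.isSupported(name));
        }
        return result;
    }

    public static void main(String[] args) {
        probe(NAMES).forEach((name, ok) ->
                System.out.println(name + " -> " + (ok ? "supported" : "not supported")));
    }
}
```

The output for the optional names will differ between installations, which is exactly the variation the tutorial is warning about.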

Similar Messages

Find supported encodings at runtime

Is it possible to find out the list of characters encodings (UTF-8, cp1252) supported by the JVM dynamically at runtime?
I am aware that there is a list of supported encodings listed on the Sun site for Sun's JVM. However, I also know that the supported encodings can vary depending upon the implementation of the JVM. Is there a method that would allow me to get all the supported encodings at runtime? Thanks :)

Try the following code (it needs imports for java.util.SortedMap and java.nio.charset.Charset):
        // Print the platform default, then every charset this JVM knows about.
        System.out.println("default: " + Charset.defaultCharset());
        SortedMap<String, Charset> map = Charset.availableCharsets();
        for (Charset cs : map.values())
            System.out.println(cs);

Problem with Character Set after upgrade

Hello,
I have a problem and was wondering if anyone has seen this before.
I have been running Java 1.4.2_13 for a while now on some Windows servers. About a month ago we upgraded to Java 1.6.0_12.
We are reading input files that are in the character set Big5 HKSCS. After the upgrade our application started to report certain characters as invalid. Some of the characters are 0xD843, 0xD844.
These should be valid characters and were in the previous version of Java (1.4.2_13).
Has anyone seen this?
Thanks in advance,
Jerry

The [Supported Encodings|http://java.sun.com/javase/6/docs/technotes/guides/intl/encoding.doc.html] page for the Java 6 release describes that charset as
Big5 with Hong Kong extensions, Traditional Chinese (incorporating 2001 revision)
You would know more than I do about that 2001 revision (or at least, not less than I do, since I know nothing about it). Perhaps that's the source of your problem?
Edit: especially since the [Supported Encodings|http://www.j2ee.me/j2se/1.4.2/docs/guide/intl/encoding.doc.html] page for Java 1.4 doesn't mention the 2001 revision.
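
If it helps to narrow down which bytes the newer runtime rejects, one option is to decode the input strictly and note where decoding first fails. This is only a sketch: the class and method names are made up, and it says nothing about why Big5-HKSCS behaves differently between releases.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;
import java.nio.charset.CodingErrorAction;

class StrictDecodeCheck {
    // Returns the byte offset of the first malformed or unmappable sequence,
    // or -1 if the whole array decodes cleanly in the given charset.
    static int firstBadOffset(byte[] data, Charset cs) {
        CharsetDecoder decoder = cs.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        ByteBuffer in = ByteBuffer.wrap(data);
        CharBuffer out = CharBuffer.allocate(1024);
        while (true) {
            CoderResult result = decoder.decode(in, out, true);
            if (result.isError()) {
                return in.position(); // decoder stops at the offending bytes
            }
            if (result.isUnderflow()) {
                return -1;            // all input consumed without error
            }
            out.clear();              // overflow: discard decoded chars, keep going
        }
    }
}
```

Running the same check under both Java versions, e.g. with Charset.forName("Big5-HKSCS") and the bytes of one of the failing files, would show whether the two releases disagree on the same byte sequences.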

Reading UTF-7 XML document in Java

I have an XML document and I want to add a ProcessingInstruction to it.
<?xml version="1.0" encoding="UTF-7" ?>
<root_element>
<sections>
<section_introduction>
<order-no>108800674</order-no>
<order-type>219</order-type>
<created-date>5. november 2008</created-date>
when trying to parse it like
File tmpFile = new File("C:\\ProvisioningEventError58412.xml");
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
factory.setValidating(false);
Document doc = factory.newDocumentBuilder().parse(tmpFile);
System.out.println("end");
I get an exception on Document doc = factory.newDocumentBuilder().parse(tmpFile);
org.xml.sax.SAXParseException: Invalid encoding name "UTF-7".
[Fatal Error] ProvisioningEventError58412.xml:1:40: Invalid encoding name "UTF-7".
Is it possible to set the encoding on DocumentBuilder so it can work with UTF-7, or is there any other way to parse it for adding a ProcessingInstruction? Actually, I want to add a CSS/XSL file for formatting the XML file.
Edited by: aamaam on Jul 15, 2009 5:18 AM

As you will see from [this Supported Encodings document|http://java.sun.com/javase/6/docs/technotes/guides/intl/encoding.doc.html], it's Java which doesn't support UTF-7, not just DocumentBuilder.
So you have a couple of options. You could send the document back to whoever produced it and tell them that you can't process it the way it is. That's a legitimate thing to do because [the XML Recommendation|http://www.w3.org/TR/REC-xml/] only requires parsers to support UTF-8 and UTF-16. Support for other encodings is optional.
Or you could write your own subclass of InputStreamReader which converts a stream of UTF-7 bytes into a stream of chars, and use that as a Reader which you pass to the parser. Somebody may even have done that already and posted it on the web somewhere.
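
The second option comes down to doing the byte-to-char decoding yourself and handing the parser a Reader instead of a File. A rough sketch of that hand-off follows. It uses a plain InputStreamReader with a supported charset as a stand-in, because the UTF-7 decoding Reader itself would still have to be written or found; the class name ReaderParse is made up, and checked exceptions are wrapped only to keep the sketch short. I have not checked how a given parser treats an encoding="UTF-7" declaration when it is fed characters rather than bytes, so treat that part as an assumption to test.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class ReaderParse {
    // Decodes the bytes ourselves and gives the parser a character stream.
    static Document parse(InputStream bytes, Charset cs) {
        Reader reader = new InputStreamReader(bytes, cs); // swap in a UTF-7 Reader here
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(false);
            return factory.newDocumentBuilder().parse(new InputSource(reader));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("could not parse document", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?><root><a>caf\u00e9</a></root>";
        InputStream in = new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8));
        Document doc = parse(in, StandardCharsets.UTF_8);
        System.out.println(doc.getDocumentElement().getTagName());
    }
}
```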

Array of bytes..

hi..
how can I convert an array of bytes to a string and vice versa?
thanks

You can convert the string to a char array with String.toCharArray(), but if you want them as a byte array you have to first decide what character encoding should be used with the conversion. You can e.g. use "8859_1" if you only work with characters in the range 0 to 255. Then it would be String.getBytes("8859_1"). You have to add a try-catch block. It will throw an exception if you use an unsupported encoding. ISO 8859-1 is one of the basic supported encodings, so you will not get any exceptions when you use this.
To go the other way, you use one of the String constructors.
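
For what it's worth, on Java 7 and later the same round trip can be written with the constants in java.nio.charset.StandardCharsets, which avoids the try-catch because no charset name has to be looked up. A small sketch; the class and method names are just for illustration.

```java
import java.nio.charset.StandardCharsets;

class BytesAndStrings {
    // String -> bytes, one byte per char for characters in the range 0 to 255.
    static byte[] toBytes(String text) {
        return text.getBytes(StandardCharsets.ISO_8859_1);
    }

    // bytes -> String, using the String(byte[], Charset) constructor.
    static String fromBytes(byte[] bytes) {
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        byte[] encoded = toBytes("caf\u00e9");
        System.out.println(encoded.length);
        System.out.println(fromBytes(encoded).equals("caf\u00e9"));
    }
}
```

Characters outside the charset's range do not survive the trip (they are replaced on encoding), so the choice of charset still matters.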

Some japanese character encoded to "?". Please help me..

My system is below listed.
J2SE 1.4.1
MySql Ver 11.18 Distrib 3.23.52,
Resin 2.1.4
My Java application loads an HTML document encoded in 'SHIFT_JIS' using HttpURLConnection,
and reads the document as 'SHIFT_JIS'.
Almost everything appears properly, but some of the characters are printed as '?'.
hm....
I will show my source code.
private String readDocument(URL url) {
String METHOD_NM = ".readDocument()";
try {
HttpURLConnection URLCon = (HttpURLConnection)url.openConnection();
BufferedReader in = new BufferedReader( new InputStreamReader(url.openStream(), "SJIS"));
String inputLine;
On my web server, if I type the same character (the one printed as '?') into a text box in IE6 (Japanese language pack) and submit it, the character is translated to a code (e.g. #4575; ).
That code is inserted into the DB, and I select the tuple from the DB
and display it in IE6. That works fine.
But when the text is loaded from the Japanese HTML, '?' is inserted into the DB
and '?' is displayed in IE6.
I want to translate the character to a code (e.g. #5455;).
OK?
Please reply...
p.s I'm sorry my poor english..

Thank you for reply.
but I've already tried that.
I have tried all japanese encoding of "Supported Encodings" from java.
http://java.sun.com/j2se/1.3/docs/guide/intl/encoding.doc.html
I want to convert that.
ex)
?fe? => & # 4532; fe & # 3456; => the blanks are removed from the original character code
This is convert '?' to code number.
In this case '?' is Japanese character.
Please let me know the way..
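
A sketch of the conversion being asked for, i.e. turning every non-ASCII character of a String into a decimal numeric character reference. The class and method names are invented for the example. Note the assumption: this only helps if the text was decoded correctly in the first place. If the String already contains a literal '?', the original character is gone and no conversion can bring it back.

```java
class NumericRefs {
    // Replaces each non-ASCII code point with "&#NNNN;" and leaves ASCII alone.
    static String toNumericRefs(String text) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            if (cp < 128) {
                sb.append((char) cp);
            } else {
                sb.append("&#").append(cp).append(';');
            }
            i += Character.charCount(cp); // step over surrogate pairs as one unit
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // U+3042 is HIRAGANA LETTER A, decimal 12354.
        System.out.println(toNumericRefs("a\u3042b"));
    }
}
```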

Custom Installation for non European languages

Hi,
While installing the JRE (particularly 1.4.2_XX) in custom mode, the second option lets the user add support for additional languages.
The stated disk size requirement is 18 MB, but after installation I found that it takes ~8 MB.
The documentation at http://java.sun.com/j2se/1.4.2/docs/guide/intl/locale.doc.html#jfc-table states that the JDK on Windows has language support installed by default, but that the JRE requires a custom installation.
I have the following questions related to this installation:
- What files are updated by this installation? I found that a few files, e.g. charset.jar and font files, are installed within the JRE installation directory.
- Does this installation update or add any Windows system files, fonts, or registry entries?
The question arose because our Swing application hangs with the JRE, but works fine with the JDK, when a Thai character is inserted into a JTextPane.
Any help will be highly appreciated!!!
Thanks,
Advait

Hello Advait,
The custom feature you are referring to is related to this RFE:
4508848 "Small" J2RE for Windows should include Western European languages
http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4508848
It implies selecting "support for additional languages" would include lib/charset.jar and lib/ext/localedata.jar.
This feature was also described in http://java.sun.com/j2se/1.4.2/changes.html
Internationalization
The following changes have been made to internationalization functionality in J2SE 1.4.2.
* Installation of the J2RE on Windows has changed, with impact on the supported locales. There now is a single installer, which by default installs a runtime with support for European languages if the Windows host system only supports such languages, or a runtime with support for all languages if the Windows host system supports at least one non-European language. Users can request installation of additional languages in a custom setup. For complete information on which locales and encodings are supported with which installation, see the Supported Locales and Supported Encodings documents. For general information, see internationalization.
Regarding your second question, I received the answer from our install engineer:
Yes, the installer creates many registries, and we copy java.exe/javaw.exe/javaws.exe to the Windows system directory.
I am not sure why your Swing application hangs with JRE. Did you select the "support for additional language" option during installation? (I assume you did, correct?)
thanks,
miko

Java.io.UnsupportedEncodingException: KOI8-U

I'm getting a host of exceptions from my bounce mail processor.
Does anyone have any link or book recommendations where I can find out more about this? Or suggestions?
14:58:01,130 ERROR [BouncedMailJob] Unknown encoding type: null
java.io.UnsupportedEncodingException: KOI8-U
        at sun.io.Converters.getConverterClass(Converters.java:218)
        at sun.io.Converters.newConverter(Converters.java:251)
        at sun.io.ByteToCharConverter.getConverter(ByteToCharConverter.java:68)
        at sun.nio.cs.StreamDecoder$ConverterSD.<init>(StreamDecoder.java:224)
        at sun.nio.cs.StreamDecoder$ConverterSD.<init>(StreamDecoder.java:210)
        at sun.nio.cs.StreamDecoder.forInputStreamReader(StreamDecoder.java:77)
        at java.io.InputStreamReader.<init>(InputStreamReader.java:83)
        at com.sun.mail.handlers.text_plain.getContent(text_plain.java:64)
        at javax.activation.DataSourceDataContentHandler.getContent(DataHandler.java:745)
        at javax.activation.DataHandler.getContent(DataHandler.java:501)
        at javax.mail.internet.MimeMessage.getContent(MimeMessage.java:1342)
        at com.imc.quartz.jobs.BouncedMailJob.dumpPart(BouncedMailJob.java:540)
        at com.imc.quartz.jobs.BouncedMailJob.execute(BouncedMailJob.java:260)
        at org.quartz.core.JobRunShell.run(JobRunShell.java:195)
        at org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:520)
Thanks!

I found a "Supported Encodings" page in the documentation for my Java installation. It says that Java 1.4 supports KOI8-R but not KOI8-U. So the Ukrainians are left out again :)
I had similar problems when I had to read e-mail messages encoded in UTF-7, which apparently is not uncommon in e-mail but still not supported by Java. So I had code like this:
ContentType ct = new ContentType(p.getContentType());
String charset = ct.getParameter("charset");
Object o = null;
// If the charset is UTF-7, call the method for converting the bytes to a String.
// We have to do this because UTF-7 isn't a supported charset in Java.
if ("UTF-7".equalsIgnoreCase(charset) || "UNICODE-1-1-UTF-7".equalsIgnoreCase(charset)) {
    InputStream in = p.getInputStream();
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    int c;
    while ((c = in.read()) != -1) {
        out.write(c);
    }
    ByteToCharUTF7 btc = new ByteToCharUTF7();
    byte[] bytes = out.toByteArray();
    char[] chars = new char[bytes.length];
    int realLength = btc.convert(bytes, 0, bytes.length, chars, 0, chars.length);
    o = new String(chars, 0, realLength);
} else {
    o = p.getContent();
}
Where that ByteToCharUTF7 is a class that extends sun.io.ByteToCharConverter and overrides a method whose signature is
public int convert(
        byte[] bytes, int byteStart, int byteEnd, char[] chars, int charStart, int charEnd)
        throws sun.io.ConversionBufferFullException, sun.io.UnknownCharacterException
The purpose of that method is to take an array of bytes and decode it to an array of chars, using (in your case) the rules for decoding KOI8-U. You would have to write that yourself, but if they are similar to the rules for KOI8-R then perhaps you could use KOI8-R to help you.
I believe that in Java 5 there is a way to insert new charsets into the system, so if you are using Java 5 then you should investigate that. You would still have to write the charset decoder yourself, though, as Java 5 still doesn't support KOI8-U. Unless you can find it already done on the Internet.
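
To give an idea of what writing the charset yourself involves with the java.nio.charset API, here is a deliberately tiny sketch, not a real KOI8-U implementation. It defines a toy single-byte charset that passes ASCII through and knows one sample mapping (byte 0xA4 to U+0454); the name X-TOY-8BIT and that single mapping are assumptions for illustration and should be checked against the actual KOI8-U table before anyone relies on them.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CoderResult;

class ToyCharset extends Charset {
    ToyCharset() {
        super("X-TOY-8BIT", new String[0]); // hypothetical name, no aliases
    }

    @Override
    public boolean contains(Charset cs) {
        return cs instanceof ToyCharset || cs.name().equals("US-ASCII");
    }

    @Override
    public CharsetDecoder newDecoder() {
        return new CharsetDecoder(this, 1.0f, 1.0f) {
            @Override
            protected CoderResult decodeLoop(ByteBuffer in, CharBuffer out) {
                while (in.hasRemaining()) {
                    if (!out.hasRemaining()) return CoderResult.OVERFLOW;
                    int b = in.get(in.position()) & 0xFF;   // peek, do not consume yet
                    char c;
                    if (b < 0x80) c = (char) b;             // ASCII passes through
                    else if (b == 0xA4) c = '\u0454';       // the one sample mapping
                    else return CoderResult.unmappableForLength(1);
                    in.get();                               // consume the byte
                    out.put(c);
                }
                return CoderResult.UNDERFLOW;
            }
        };
    }

    @Override
    public CharsetEncoder newEncoder() {
        return new CharsetEncoder(this, 1.0f, 1.0f) {
            @Override
            protected CoderResult encodeLoop(CharBuffer in, ByteBuffer out) {
                while (in.hasRemaining()) {
                    if (!out.hasRemaining()) return CoderResult.OVERFLOW;
                    char c = in.get(in.position());         // peek, do not consume yet
                    byte b;
                    if (c < 0x80) b = (byte) c;
                    else if (c == '\u0454') b = (byte) 0xA4;
                    else return CoderResult.unmappableForLength(1);
                    in.get();                               // consume the char
                    out.put(b);
                }
                return CoderResult.UNDERFLOW;
            }
        };
    }
}
```

An instance can be passed directly to new String(bytes, charset) or to an InputStreamReader. Making it available by name through Charset.forName additionally needs a java.nio.charset.spi.CharsetProvider subclass registered under META-INF/services, which is left out here.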

"error opening file" on html files

I have an existing site with several html files that edge code displays an "error opening file... the file could not be read" alert on. These html files display fine, and open in my other text/html editors.

Yep -- unfortunately it's a known issue that Edge Code and Brackets don't support encodings other than UTF-8 yet. Please add your vote here: https://trello.com/c/DShzQztM/731-support-non-utf-8-encodings
- Peter

How to use native2ascii tool?

Hi Friends,
I created one property file in english named string.property for my application. I need to convert that property file for different languages like chinese, arabic, japanese, spanish, etc.
I found that the native2ascii tool is used to convert property files, but I have no idea how to use it. Can anyone please explain it to me, with a proper example if possible?
I would appreciate your help for this.
Thank you

native2ascii [options] [inputfile [outputfile]]
-reverse
Perform the reverse operation: convert a file with Latin-1 and/or Unicode encoded characters to one with native-encoded characters.
-encoding encoding_name
Specify the encoding name which is used by the conversion procedure. The default encoding is taken from System property file.encoding. The encoding_name string must be taken from the first column of the table of supported encodings in the Supported Encodings document.
-Joption
Pass option to the Java virtual machine, where option is one of the options described on the reference page for the java application launcher. For example, -J-Xms48m sets the startup memory to 48 megabytes.
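
Going by the synopsis above, a typical call would look like: native2ascii -encoding UTF-8 strings_ja.txt strings_ja.properties (the file names here are made up). The tool does not translate anything; it only rewrites non-ASCII characters as \uXXXX escapes so that the properties file is plain ASCII. A rough Java sketch of that escaping step, with a made-up class name, not the tool's actual source:

```java
class AsciiEscaper {
    // Leaves ASCII as it is and writes every other char as a \\uXXXX escape.
    static String escape(String text) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c < 128) {
                sb.append(c);
            } else {
                sb.append(String.format("\\u%04x", (int) c));
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(escape("caf\u00e9"));
    }
}
```

The translated text for each language still has to be written by a person (or a translation service) first; the escaping is only the last step before the file is loaded as a properties file.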

Recording live vocals using loopback

I would like to record live vocals through mainstage2 using the loopback plugin. Hoping to record multiple loops to play around with layering vocals and harmonies. Is this possible? I'm a brand new mainstage user and have no idea how to do this. If this is at all possible, can someone give me the step-by-step process on how to set this up?

Firstly, I have no idea whether you want to perform a live broadcast. If so, you may not need to save the content as a file, but you will need a rebroadcast server.
OK, for your encoding problem: if you are using JMF to implement your application, please notice that JMF supports only a limited set of AVI payloads/encodings. If the encoding doesn't match what the playing client supports, then obviously it can't be played. Try going to the JMF page to see the supported encodings you can choose from.
http://java.sun.com/products/java-media/jmf/2.1.1/formats.html

Char set conversion to ASMO 708, KOI8-U, x-IA5-German

hi,
I am doing some charset conversions to support various char sets widely used on the internet. The Character Sets I am supposed to support also include
"iso-8859-8-i", "ASMO-708", "koi8-u", "x-IA5-German".
However I don't find the canonical names for these encodings in the list of Supported Encodings by java.
I checked the i18n.jar but the ByteToChar classes for these encodings were not there. Is it possible to develop your own ByteToChar classes. If yes how?
Can anybody advise how to proceed forward with this?
Thanks,
Khurram

Hi Khurram,
You could catch the exceptions thrown by the String(byte[] bytes, String encoding) constructor and by the getBytes(String encoding) method, and then dispatch to your own encoding/decoding code. Something like the following.
Post back if you need more elaboration.
Regards,
Joe
import java.io.UnsupportedEncodingException;

public class MyConverter {
        // Try the JDK's built-in decoders first, then fall back to our own.
        public static String makeString(byte[] bytes, String encoding)
        throws UnsupportedEncodingException {
          try {
               return new String(bytes, encoding);
          } catch (UnsupportedEncodingException e1) {
               // Not known to the JDK: dispatch to the custom decoders below.
               return convertBytes(bytes, encoding);
          }
        }
        public static String convertBytes(byte[] bytes, String encoding)
        throws UnsupportedEncodingException {
          if (encoding.equalsIgnoreCase("x-IA5-German")) {
               return bytesToStringx_IA5_German(bytes);
          } else {
               throw new UnsupportedEncodingException(encoding);
          }
        }
        public static String bytesToStringx_IA5_German(byte[] bytes) {
          //...here's where you put the actual decoding Code
          throw new UnsupportedOperationException("decoder not written yet");
        }
}

Char Set Conversion

hi,
I am doing some charset conversions to support various char sets widely used on the internet. The Character Sets I am supposed to support also include
"iso-8859-8-i", "ASMO-708", "koi8-r", "Johab".
However I don't find the canonical names for these encodings in the list of Supported Encodings by java.
Can anybody advise how to proceed forward with this?
Thanks,
Khurram

First, look for a file called "i18n.jar" in your JVM or JRE installation. If you don't have that, you will have to get it from Sun's downloads. Once you have it, look inside it (using any zip utility). You will find a large number of files that have names like "ByteToCharBig5.class"; each of them defines a supported encoding, in the case of that example it would be "Big5". It's probably possible to create your own encoding if you don't find it in that list, but I don't know how. You might find out by searching the Internationalization forum.

I/o streams and charsets....

Sorry for posting this in two forums but I got no response in the reg java forum....either my question is too advanced or no one knows??
I was wondering if anyone could help clarify something for me. I don't really know how the character sets work so by all means let me know if my thinking is all wrong. Sorry in advance for long post.
Say user 1 has a system default charset of ASCII. They write a message in a JTextArea and hit the save button. The pgm calls JTextArea.write(myFileWriter) which saves the text to a file (using system default charset). They send the file to another user whose default charset is UTF-16. If the pgm simply loads the file into the JTextArea using JTextArea.read(myFileReader), wouldn't the text message get jumbled up? The UTF-16 machine would be reading two bytes per character when in fact the file was written out as 1 byte per character. Same is true the other way around. When the ASCII user loaded a UTF-16 file, it would treat each byte as 1 character when in fact two bytes represent one character. That is where the confusion is.
The only way I could see to control this was to have a rule that says the files will always be in a specific format, say ASCII? Then before writing the contents to file, I would call String.getBytes("ASCII") on the of the JTextArea -- when doing this on the UTF-16 machine, I assume if it encountered a char whose value was > 255 it would simply convert it to some char like "?" whose value was <= 255 so it would fit in 8 bits? Then write that byte[] to the output stream.
Then to load the file, instead of using JTextArea.read(), I would have to read the bytes into a byte array then create a new String using String(byte[], "ASCII") and pass that to the JTextArea?
Any dropped information from the UTF-16 file would simply show up as "?" on the ASCII machine. On the UTF-16 machine, everything would look fine? No double spaced characters or such?
Is there another way? Anyone????
Jim

"wouldn't the text message get jumbled up?"
Yes, I agree.
"I assume if it encountered a char whose value was > 255 it would simply convert it to some char like "?" whose value was <= 255 so it would fit in 8 bits?"
Values in the ASCII character set are 7 bits. When a Unicode character is converted to an ASCII character, a value > 127 is converted to 63. 63 == '?'.
"Any dropped information from the UTF-16 file would simply show up as "?" on the ASCII machine."
Yes.
"I assume then that my thinking is correct...unless there is some conversion going on, the text will not show up correctly on the different machines?"
Yes.
"files will be written out/saved in a predefined format: 8 bits per character."
ASCII characters are 7 bits, whereas ISO-8859-1 (ISO Latin Alphabet No. 1, a.k.a. ISO-LATIN-1) characters are 8 bits. You might consider using the ISO-8859-1 character set. According to the Charset API documentation, every implementation of the Java Platform is required to support US-ASCII and ISO-8859-1.
FYI, I found the following document called Supported Encodings in the SDK, C:\j2sdk1.4.0\docs\guide\intl\encoding.doc.html
"but I am also curious about the actual conversion process from unicode to bytes? Is my thinking correct in that if the unicode character is too "large" to fit in 8 bits (byte) the system just defaults to some byte value?"
I haven't read the source code, but according to my experiments, when I convert from Unicode to ASCII, values > 127 are converted to 63.
Also, according to my experiments, if a byte has a value between 0x80 and 0xff (unsigned 128 and 255), an attempt to convert from ASCII to Unicode results in a value of 65533 (no typo here, not 65535).
When responding to this thread, I had to make some simple programs to simulate writing/reading sending/receiving and converting character sets to make sure what I am saying is correct. Here is an idea for your own inquiries.
import java.nio.charset.Charset;
import java.io.*;
class Test {
    Charset cs = Charset.forName("ASCII");
    void m() {
        char c = (char)('x' + 20); //120 + 20 > 127
        System.out.println((int)c);
        // Encode the char to bytes using ASCII.
        ByteArrayOutputStream bout = new ByteArrayOutputStream();
        OutputStreamWriter out = new OutputStreamWriter(bout, cs);
        try {
            out.write(c);
            out.flush();
        } catch (IOException e) {
            return;
        }
        byte[] b = bout.toByteArray();
        System.out.println(b[0]);
        // Decode the bytes again with the same charset, not the platform default.
        ByteArrayInputStream bin = new ByteArrayInputStream(b);
        InputStreamReader in = new InputStreamReader(bin, cs);
        try {
            int i = in.read();
            System.out.println((char)i);
        } catch (IOException e) {
            return;
        }
    }
    public static void main(String[] args) {
        new Test().m();
    }
}
Also, I found this helpful.
import java.nio.charset.Charset;
import java.util.SortedMap;
import java.util.Set;
import java.util.Iterator;
class Test {
    // Prints the canonical name of every charset this JVM supports.
    void m() {
        SortedMap<String, Charset> m = Charset.availableCharsets();
        Set<String> s = m.keySet();
        Iterator<String> it = s.iterator();
        while (it.hasNext()) {
            System.out.println(it.next());
        }
    }
    public static void main(String[] args) {
        new Test().m();
    }
}
And this.
import java.nio.charset.Charset;
import java.util.Set;
import java.util.Iterator;
class Test {
    // Prints the display name and all aliases of the named charset.
    void m(String name) {
        Charset s = Charset.forName(name);
        System.out.println("display name= " + s.displayName());
        Set<String> aliases = s.aliases();
        Iterator<String> it = aliases.iterator();
        while (it.hasNext()) {
            String x = it.next();
            System.out.println("alias= " + x);
        }
    }
    public static void main(String[] args) {
        if (args.length > 0) new Test().m(args[0]);
    }
}

Adding support for non-typical encodings (charsets)

Hi,
I need to support encodings that are not included in the standard charset library lib/charsets.jar (in J2SE 5.0). For instance gb_2312-80, which is a Chinese charset and a valid entry in the IANA Charset Registry.
Can anyone help me find out where to get the metadata with the charset definition, and what I need to do to include it in my app? Thanks.

Native menus encompass more than just the Mac application
window and the MS Windows window menu. You can display a native
menu anywhere on the stage using the NativeMenu class display()
method. This sounds like it would do at least most of what you
want.
Also, in AIR, the context menus do not have built-in items.
The context menu will not even display by default.
