Java / Linux - why ??? instead of non-ascii symbols

Hi All,
I've run into a problem when creating a new file from a Java program on Gentoo Linux.
All non-ASCII characters in the file name are converted into ??? symbols.
I've tried running the program with -Dfile.encoding=UTF-8 -Dsun.jnu.encoding=UTF-8, and setting LC_ALL='UTF-8', but the problem still exists.
At the same time, I can create files with non-ASCII characters in the file manager.
Could you advise how I can fix this issue?
Regards,
Boris

Is it related to the file name or the file content?
If it is related to the file name, Linux might have restrictions on file name characters.
About the file content: write a file reader program in Java and check whether you can read the content back. If you can, the fault lies with your file manager, which is not interpreting the content of your file correctly.
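A quick way to narrow this down is to print the encodings the JVM picked up at startup and to check whether a given charset can represent the file name at all. The sketch below is only an illustration: the class name, the helper names and the sample Cyrillic file name are made up for this example. It relies on the fact that String.getBytes replaces characters a charset cannot encode with '?'. It may also be worth checking the locale itself, since on glibc systems locale names usually look like en_US.UTF-8 rather than a bare UTF-8; whether that is the cause in this thread is an assumption.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class FileNameEncodingCheck {

    // True if every character of the name can be encoded in the given charset.
    static boolean canRepresent(String name, Charset charset) {
        return charset.newEncoder().canEncode(name);
    }

    // Encodes and decodes the name with the same charset. String.getBytes
    // substitutes '?' for every character the charset cannot encode.
    static String lossyRoundTrip(String name, Charset charset) {
        return new String(name.getBytes(charset), charset);
    }

    public static void main(String[] args) {
        // Values the JVM derived from the environment at startup.
        System.out.println("sun.jnu.encoding = " + System.getProperty("sun.jnu.encoding"));
        System.out.println("file.encoding    = " + System.getProperty("file.encoding"));
        System.out.println("default charset  = " + Charset.defaultCharset());

        // Hypothetical Cyrillic file name, written with escapes to stay encoding-safe.
        String name = "\u0444\u0430\u0439\u043b.txt";
        System.out.println("ASCII can represent it: "
                + canRepresent(name, StandardCharsets.US_ASCII));
        System.out.println("after ASCII round trip: "
                + lossyRoundTrip(name, StandardCharsets.US_ASCII));
    }
}
```

If the first line prints something like ANSI_X3.4-1968 instead of UTF-8, the JVM did not see a UTF-8 locale, which would be consistent with the question marks in the file name.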

Similar Messages

How to enter non-ASCII symbols in a JTextArea

A user asked me a question today that I have been unable to answer from a Google search or review of the Java documentation. If an application has a JTextArea and the user operating an American-style keyboard wishes to enter a non-ASCII character (a tilde, an accented e, an umlaut, a copyright symbol, etc.), are there key sequences that will allow him to do so? I was thinking that there might be some kind of alt key combination that would let him enter unicode values. Is anything like that supported?

Thanks. I'd not heard of this technique.
Throwback that I am, I'm running under Windows-XP. My results were a bit spotty, I'd be interested in hearing if anyone has different results.
Microsoft describes two techniques for entering "special" characters by typing numeric values. The best description I found was at http://www.windowsvistasecret.net/secret.asp?haber=54 Also, there is a page with a nice table of symbol values at http://www.irongeek.com/alt-numpad-ascii-key-combos-and-chart.html (the author recommends using alternate key codes for passwords, but given my mixed success with the technique, that seems a bit risky).
Anyway, the first method is to turn Num Lock on and then press ALT followed by a three-digit decimal code value for the character (you must enter the values on the numeric keypad, not the main keypad). This seems to work great in a Java text field. There is an alternate method, where you can specify a hexadecimal Unicode value by typing a string in the form U+00A9 and then pressing ALT-X. This requires a special registry setting which may not be the default on your computer. For me, at least, it works in WordPad, but not in the JTextArea.
Finally, I think that the engineer at Microsoft who decided to name this technique the Alt<+>xxx key sequence had a pretty twisted sense of humor. You can guess what a Google search yields for "ALT XXX".
Tomorrow, when I have access to a Linux machine, I'll have to revisit this. Thanks again for all your help.
Gary

Dynamo Admin response is cut off when get non-ASCII symbol

I use Dynamo Admin to get information about assets. When I try to print an item in the ContentRepository that contains a non-ASCII character in a property value, the response is cut off at this character. How can I solve this problem?
(as example I have added property with value "€" in BCC)
ATG 9.2
JBoss-4.3.0
Thanks.
Edited by: user9140564 on Dec 22, 2011 3:35 AM

Hi,
I am not sure there is a solution for that. Basically dyn/admin does not work well with special characters using print-item and add-item. This should be done directly on the database. Dyn/admin is more of a "comfort" tool.
If you are doing this to update your DB dynamically without restarting the instance, you should avoid it if you will be using special characters... believe me... we work in French :-)
Cheers,
Vina.

Non ASCII characters are converted to '?' or ASCII characters

Non-ASCII symbols like æ ø in the XML file have been converted to ? or other ASCII characters.
What could be the reason behind this?

Mayil wrote:
This file we are loading through the Flex application in the front end.
Through a Java class file we are making changes to this city.xml file, adding and deleting information in city.xml.
Now suddenly, I don't know what happened, 'ø' in the city name has been replaced with '?'.
If we try to change it back to 'ø', it changes to '?' again.
I don't know how to rectify this error.
I would suggest you start by finding out when it happens. Does it happen as soon as you change the XML through this mysterious "java class file"? Or does it happen when Flex reads it? And is the underlying file actually changing, or are you just seeing those question marks after Flex handles the file?
In short, a much better problem description is necessary.

Java 5, Linux, 64-bit: Non-ASCII chars over socket

Hi,
I am having issues with reading non-ASCII chars from a socket. I send a mixed message, with the first part in ASCII and the last bit in non-ASCII. There are no issues with reading the non-ASCII characters on Windows, but the problem appears when I run the server on Linux. The following is a message sample:
Start message<CRLF>
фвафывафвыафыв<CRLF>
The second part (which is encoded in either Windows-1250 or KOI8-R) comes out as 3F (when you look at the bytes) on Linux.
Any suggestions?
Thanks,
Max

You must be using Readers and Writers, and you need to make sure you specify the same charsets when constructing them. Don't leave this to the default, as this seems to vary across platforms and definitely has varied across releases.
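The advice above can be sketched as follows. This is a minimal illustration, not the poster's code: in-memory byte streams stand in for socket.getOutputStream() and socket.getInputStream(), the class and method names are invented for the example, and UTF-8 stands in for whichever charset both ends agree on (a charset such as KOI8-R could be obtained with Charset.forName if the runtime provides it).

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class ExplicitCharsetStreams {

    // Writes the text to a byte stream using an explicitly named charset.
    static byte[] send(String text, Charset charset) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (Writer writer = new OutputStreamWriter(out, charset)) {
            writer.write(text);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    // Reads one line back, decoding with an explicitly named charset.
    static String receiveLine(byte[] wire, Charset charset) {
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(new ByteArrayInputStream(wire), charset))) {
            return reader.readLine();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String message = "\u0444\u0432\u0430\r\n"; // short Cyrillic sample line
        Charset agreed = StandardCharsets.UTF_8;   // stand-in for the agreed charset

        // Same charset on both ends: the text survives.
        System.out.println(receiveLine(send(message, agreed), agreed));

        // A writer whose charset has no Cyrillic emits byte 3F for each letter.
        System.out.println(receiveLine(send(message, StandardCharsets.US_ASCII), agreed));
    }
}
```

The second call reproduces the 3F bytes from the question: they are produced on the writing side, by a Writer whose charset cannot encode the characters, so the charset has to be pinned on both ends.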

Problems with non-ASCII characters on Linux Unit Test Import

I found a problem with non-ASCII characters in the Unit Test Import for Linux. This problem does not appear in the Unit Test Import for Windows.
I have attached a Unit Test export called PROC1.XML It tests a procedure that is included in another attachment called PROC1.txt. The unit test includes 2 implementations. Both implementations pass non-ASCII characters to the procedure and return them unchanged.
In Linux, the unit test import changes the non-ASCII characters in the XML file to xFFFD. If I copy/paste the non-ASCII characters into the Unit Test after the import, they are stored and executed correctly.
Amazon Ubuntu 3.13.0-45-generic / lubuntu-core
Oracle 11g Express Edition - AL32UTF8
SQL*Developer 4.0.3.16 Build MAIN-16.84
Java(TM) SE Runtime Environment (build 1.7.0_76-b13)
Java HotSpot(TM) 64-Bit Server VM (build 24.76-b04, mixed mode)
In Windows, the unit test will import the non-ASCII characters unchanged from the XML file.
Windows 7 Home Premium, Service Pack 1
Oracle 11g Express Edition - AL32UTF8
SQL*Developer 4.0.3.16 Build MAIN-16.84
Java(TM) SE Runtime Environment (build 1.8.0_31-b13)
Java HotSpot(TM) 64-Bit Server VM (build 25.31-b07, mixed mode)
If SQL*Developer is coded the same way on Windows and Linux, the JVM must be causing the problem.
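One common source of platform differences like this is code that reads a file with the platform default charset, which a JVM takes from the operating system and which can differ between a Windows and a Linux installation. Whether SQL*Developer does this is not known from the report; the sketch below only shows, with invented class and method names and a temporary file, how UTF-8 bytes decoded with a charset that cannot represent them turn into U+FFFD.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class ExplicitCharsetFileRead {

    // Writes the text as UTF-8 to a temporary file, then reads the first
    // line back while decoding with the given charset.
    static String writeUtf8ThenRead(String text, Charset readCharset) {
        try {
            Path file = Files.createTempFile("charset-demo", ".xml");
            try {
                Files.write(file, text.getBytes(StandardCharsets.UTF_8));
                try (BufferedReader reader = new BufferedReader(
                        new InputStreamReader(Files.newInputStream(file), readCharset))) {
                    return reader.readLine();
                }
            } finally {
                Files.deleteIfExists(file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String text = "\u00e6\u00f8"; // two non-ASCII letters

        System.out.println("platform default: " + Charset.defaultCharset());
        System.out.println(text.equals(writeUtf8ThenRead(text, StandardCharsets.UTF_8)));

        // Decoding the same bytes as US-ASCII yields replacement characters.
        String damaged = writeUtf8ThenRead(text, StandardCharsets.US_ASCII);
        for (char c : damaged.toCharArray()) {
            System.out.printf("U+%04X%n", (int) c);
        }
    }
}
```

Comparing the "platform default" line on the two machines would show quickly whether the defaults differ.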

Can Java be started in a directory that contains non ascii char

I installed a product developed in Java in a folder whose name contains non-ASCII characters, such as Japanese or German characters.
This causes an error saying: unable initialise java virtual machine, error code -1
Someone said Java doesn't like being started in a directory that contains non-ASCII characters. There appears to be no way of passing it Unicode parameters.
Has anyone hit a similar issue, or does anyone know the root cause of this problem?
Thanks

Yes, you can use your Web Start application console. To enter the data your application requires, it is a better idea to use a Java application that runs in console mode, although you may try running the Windows console and then reading the data from its input stream.

Does Java 5 accept non-ascii chars as identifiers?

I am surprised to find out that Java 5.0 accepts non-ASCII chars as identifiers.
Is it true that Java 5 really accepts non-ASCII chars?
Thanks.

Here is the code:
public class non��name {
    private static void ��() {
        System.out.println("this is called from a function with a chinese name!");
        int ��1 = 1, ��2 = 2;
        System.out.println("��1 = " + ��1 + ", ��2=" + ��2 + ", �� = " + (��1+��2));
    }
    public static void main(String[] args) {
        ��();
    }
}
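To the question itself: yes, Java identifiers may contain Unicode letters and digits, not only ASCII. The sketch below is an illustration with invented names. It checks candidate identifiers with Character.isJavaIdentifierStart and Character.isJavaIdentifierPart, and it declares two variables whose names are written as Unicode escapes so that the file compiles regardless of the source encoding (the escaped characters are arbitrary CJK characters chosen for the example).

```java
public class IdentifierCheck {

    // True if the string is syntactically usable as a Java identifier.
    // Keywords and reserved literals are not excluded by this check.
    static boolean isJavaIdentifier(String s) {
        if (s.isEmpty()) {
            return false;
        }
        int first = s.codePointAt(0);
        if (!Character.isJavaIdentifierStart(first)) {
            return false;
        }
        for (int i = Character.charCount(first); i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (!Character.isJavaIdentifierPart(cp)) {
                return false;
            }
            i += Character.charCount(cp);
        }
        return true;
    }

    public static void main(String[] args) {
        // Two variables whose names consist of CJK characters plus a digit,
        // spelled with escapes so the source stays pure ASCII.
        int \u6570\u5b571 = 1, \u6570\u5b572 = 2;
        System.out.println(\u6570\u5b571 + \u6570\u5b572);

        System.out.println(isJavaIdentifier("\u6570\u5b571")); // CJK name
        System.out.println(isJavaIdentifier("1abc"));          // starts with a digit
    }
}
```

When the identifiers are typed directly instead of escaped, the source file has to be compiled with the encoding it was saved in (for example javac -encoding UTF-8), otherwise the names can end up damaged.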

Why does non-ASCII text display improperly?

One of the things that has long baffled me about OS X is the occasionally improper display of text on web sites. Sometimes, though less than before, the Mac still can't properly display non-ASCII characters. Today, for instance, I bought a GPS from Amazon, and the word nüvi has junk characters where the umlaut "ü" should be, as the text image below should show. Why is this? Is there a setting that corrects the problem?

Hi Yawder, do you want to file a bug report on the problem that when Firefox generates the faux bold face for Droid Sans Mono it is doing a bad job compared with other browsers?
You can submit that here: https://bugzilla.mozilla.org/

GSSName is corrupted for non ascii chars

Hi,
I have a setup where a web application is deployed to use SPNEGO for user authentication (using Kerberos V) and authorization.
We have several users with non-English characters in the user ID. Kerberos authentication succeeds for such users (the KDC / Active Directory returns a valid Kerberos ticket, which the client embeds in the SPNEGO token). However, passing the SPNEGO token to the GSS API and extracting the user name from it returns an incorrect user name: all non-ASCII characters in the user name are replaced with junk byte sequences.
We use the JGSS API (with JRE 1.4.08) to extract the SPNEGO token and create a GSS security context object. Later, the GSS name is extracted from the GSS context object.
Currently I am testing SPNEGO authentication for a user with user ID 123<sp char>. The Unicode value of <sp char> is FE and its UTF-8 encoded byte sequence is C3 BE. However, if I invoke the 'export' method of the GSSName object and examine the returned byte sequence, the byte sequence EF BF BD EF BF BD is present instead of C3 BE. The byte sequences for the other, English characters are correct.
Is this a defect in the GSS-API? Or am I not using GSS properly?
Do I need any special setup / configuration to use JGSS with Kerberos V for users with non-ASCII characters in the user ID?
Please advise.
Regards,
Jayaram.
Message was edited by:
s_jayaram_s
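For what it is worth, EF BF BD is the UTF-8 encoding of U+FFFD, the Unicode replacement character, so the reported bytes look like two replacement characters. One way such a pattern can arise (an assumption, not a diagnosis of JGSS) is that the two UTF-8 bytes C3 BE were decoded with a charset that does not accept them, such as US-ASCII, and the resulting string was later encoded as UTF-8. The sketch below, with invented class and method names, reproduces that sequence of steps.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class ReplacementBytes {

    // Decodes the input with one charset, then encodes the result as UTF-8.
    static byte[] decodeThenEncodeUtf8(byte[] input, Charset decodeWith) {
        String text = new String(input, decodeWith);
        return text.getBytes(StandardCharsets.UTF_8);
    }

    // Formats bytes as space-separated upper-case hex pairs.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] name = {(byte) 0xC3, (byte) 0xBE}; // UTF-8 bytes of U+00FE

        // Decoded as UTF-8, the bytes survive the round trip.
        System.out.println(hex(decodeThenEncodeUtf8(name, StandardCharsets.UTF_8)));

        // Decoded as US-ASCII, each byte becomes U+FFFD before re-encoding.
        System.out.println(hex(decodeThenEncodeUtf8(name, StandardCharsets.US_ASCII)));
    }
}
```

The second line printed is EF BF BD EF BF BD, the same six bytes as in the report, which at least suggests looking for a decoding step that does not use UTF-8.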

I understand that this is an older thread.
We have spent a lot of time on the internet looking for possible workarounds or permanent solutions to enable multi-byte character support for username / password / SPN with Kerberos and Java, but no luck so far :(
Are there any new updates on this i18n issue ?
Thanks,
Venkatesh

Unicode value of a non-ASCII character

Hi,
Suppose the Unicode value of the character ् is '\u094d'.
Is there any Java function which can get this Unicode value of a non-ASCII character?
Like:
char c='्';
String s=convertToUnicode(c);
System.out.println("The unicode value is "+ s);
Output:
The unicode value is \u094d
Thanks in advance

Ranjan_Yengkhom wrote:
I have tried with the parameter
c:\ javac -encoding utf8 filename.java
Still I am getting the same print i.e. \u3f
If it comes out as "\u3f" (instead of failing to compile or any other value), then your source code already contains the question mark. So you already saved it wrong and have to re-type it (at least the single character).
>
Then I studied one tutorial regarding this issue
http://vietunicode.sourceforge.net/howto/java/encoding.html
It says that we need to save the java file in UTF-8 format. I have explored most of the editors like netbean, eclipse, JCreator, etc... there is no option to save the java file in UTF-8 format.
That's one way. But since that is so problematic (you'll have to remember/make sure to always save it that way and to compile it using the correct switch), the better solution by far is not to use any non-ASCII characters in your source code.
I already told you two possible ways to achieve that: unicode escapes or externalized strings.
Also read http://www.joelonsoftware.com/articles/Unicode.html (just because it's related, essential information and I just posted that link somewhere else).
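As for the original question, a char is already a UTF-16 code unit, so its numeric value can simply be formatted as four hex digits. The helper below borrows the name convertToUnicode from the question but is otherwise a sketch of my own. The character is written as an escape in the source, following the advice above to keep source files pure ASCII.

```java
public class UnicodeEscape {

    // Formats the numeric value of a char as a backslash, the letter u
    // and four lower-case hex digits.
    static String convertToUnicode(char c) {
        return String.format("\\u%04x", (int) c);
    }

    public static void main(String[] args) {
        char c = '\u094d'; // Devanagari sign virama, written as an escape
        System.out.println("The unicode value is " + convertToUnicode(c));
    }
}
```

Characters outside the Basic Multilingual Plane do not fit in a single char. For those, String.codePointAt returns the full code point, which can be formatted the same way with more digits.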

Non-ASCII chars in applets?

hi,
Spent 4 hours to find a way to use non-ASCII chars in applets (buttons, textareas), but didn't make it.
Simply saying
TextFieldObj.setText("\uxxxx");
//or any equivalent obj. Ex. of \uxxxx: \u015F
doesn't work. I even went into the Graphics.paint() example, but it too can paint only ASCII chars.
My hunch is that it has something to do with Character.Subset, but I still can't figure out how to do it.
Please SOS,
Reshat.

Hi,
I just managed to get Buttons to show Greek characters, so it appears that static buttons are fine.
However, i still face the same problem for TextField's:
TextFields work fine for IE, but in NN they sometimes convert into ASCII and sometimes give ? The same in HotJava.
So there are 2 questions in my head:
1. why can't NN use the fonts used by IE to display Non-ASCII chars?
2. What is the safest font to use for Non-ASCII chars, to cover the widest possible audience.
P.S. Java solves most cross-platform-browser problems, but the font issue still seems to be dependent on a user and his/her browser. It appears Java is not font-independent in non-ASCII context. If so, it would be nice to develop a plug-in to make sure that if the user doesn't have the font, then a Java-standardized Unicode-based font is used. Otherwise, non-ASCII world is still w/o a real Java.)
Thank you for your feedback,
Reshat.

Can't get the attachment filename out of a Part (with non ascii characters)

Hello, all and happy new year :)
My issue is with non-ASCII file names in attachments... Yes, I've read the FAQ: http://www.oracle.com/technetwork/java/faq-135477.html#encodefilename
I can't get the file name out of the BodyPart for those kinds of attachments.
here's my unit test :
     /** contains various parts from various mailer encoded in different ways... */
     private enum EncodedFileNamePart{
          OUTLOOK("Content-Type: text/plain;\n name=\"=?iso-8859-1?Q?c'estd=E9j=E0no=EBl=E7ac'estcool.txt?=\" \nContent-Transfer-Encoding: 7bit\nContent-Disposition: attachment;\n filename=\"=?iso-8859-1?Q?c'estd=E9j=E0no=EBl=E7ac'estcool.txt?=\" \n\nnoel 2010\n","c'estdéjànoëlçac'estcool.txt"),
          GMAIL("Content-Type: text/plain; charset=US-ASCII; name=\"=?ISO-8859-1?B?ZOlq4G5v62znYWNlc3Rjb29sLnR4dA==?=\"\nContent-Disposition: attachment; filename=\"=?ISO-8859-1?B?ZOlq4G5v62znYWNlc3Rjb29sLnR4dA==?=\"\nContent-Transfer-Encoding: base64\nX-Attachment-Id: f_giityr5r0\n\namluZ2xlIGJlbGxzIQo=\n","déjànoëlçacestcool.txt"),
          THUNDERBIRD("Content-Type: text/plain;\n name=\"=?ISO-8859-1?Q?d=E9j=E0no=EBl=E7acestcool=2Etxt?=\"\nContent-Transfer-Encoding: 7bit\nContent-Disposition: attachment;\n filename*0*=ISO-8859-1''%64%E9%6A%E0%6E%6F%EB%6C%E7%61%63%65%73%74%63%6F;\n filename*1*=%6F%6C%2E%74%78%74\n\njingle bells!\n","déjànoëlçacestcool.txt"),
          EVOLUTION("Content-Disposition: attachment; filename*=ISO-8859-1''d%E9j%E0no%EBl.txt\nContent-Type: text/plain; name*=ISO-8859-1''d%E9j%E0no%EBl.txt; charset=\"UTF-8\" \nContent-Transfer-Encoding: 7bit\n\njingle bells\n","déjànoël.txt");
          String content=null;
          String target=null;
          private EncodedFileNamePart(String content,String target){
               this.content=content;
               this.target=target;
          }
          public Part get(){
               try{
                    ByteArrayInputStream bis = new ByteArrayInputStream(this.content.getBytes());
                    Part part = new MimeBodyPart(bis);
                    bis.close();
                    return part;
               }
               catch(Throwable e){
                    return null;
               }
          }
          public String getTarget(){
               return this.target;
          }
     }
     @Test
     public void testJavamailDecode() throws MessagingException, UnsupportedEncodingException{
          System.setProperty("mail.mime.encodefilename", "true");
          System.setProperty("mail.mime.decodefilename", "true");
          for(EncodedFileNamePart part : EncodedFileNamePart.values())
               assertEquals(part.name(),MimeUtility.decodeText(part.get().getFileName()),part.getTarget());
     }
I get a NullPointerException in decodeText because getFileName() returns null for the EVOLUTION case; it works well with OUTLOOK, THUNDERBIRD and GMAIL.
Evolution's header is "Content-Disposition: attachment; filename*=ISO-8859-1''d%E9j%E0no%EBl.txt", which doesn't look like the others (it looks like the RFC 2616 or RFC 5987 way of doing it).
How can I handle this situation other than by writing my own decoder?
Thanks for your answers!
Edited by: user13619058 on 4 janv. 2011 07:44

Set the System property "mail.mime.decodeparameters" to "true" to enable the RFC 2231 support.
See the javadocs for the javax.mail.internet package for the list of properties.
Yes, the FAQ entry should contain those details as well.
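To make the RFC 2231 form less mysterious: an extended parameter value such as the one Evolution sends consists of a charset name, an optional language tag and percent-encoded bytes, separated by single quotes. The sketch below decodes such a value by hand using only the JDK. It is an illustration of the format, not JavaMail's implementation, and the class and method names are invented. In a real application, setting the property mentioned above and letting the library do the work is the simpler route.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class Rfc2231Sketch {

    // Decodes a value of the form charset'language'percent-encoded-text.
    // Values without the two quotes are returned unchanged.
    static String decodeExtendedValue(String value) {
        int firstQuote = value.indexOf('\'');
        int secondQuote = value.indexOf('\'', firstQuote + 1);
        if (firstQuote < 0 || secondQuote < 0) {
            return value;
        }
        String charsetName = value.substring(0, firstQuote);
        // An empty charset name is treated as US-ASCII in this sketch.
        Charset charset = charsetName.isEmpty()
                ? StandardCharsets.US_ASCII
                : Charset.forName(charsetName);

        String encoded = value.substring(secondQuote + 1);
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        for (int i = 0; i < encoded.length(); i++) {
            char c = encoded.charAt(i);
            if (c == '%' && i + 2 < encoded.length()) {
                // Two hex digits after the percent sign form one byte.
                bytes.write(Integer.parseInt(encoded.substring(i + 1, i + 3), 16));
                i += 2;
            } else {
                bytes.write(c);
            }
        }
        return new String(bytes.toByteArray(), charset);
    }

    public static void main(String[] args) {
        // The filename* value from the EVOLUTION test case.
        System.out.println(decodeExtendedValue("ISO-8859-1''d%E9j%E0no%EBl.txt"));
    }
}
```

The sketch does not handle the continuation form (filename*0*, filename*1*) seen in the THUNDERBIRD case, where the segments have to be concatenated before decoding.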

Non ascii characters being sent from a parameter in a form

Hi!
I have seen many topics posted on passing non-ASCII characters through parameters from one servlet to another and converting them into whatever format is necessary.
However, I have not seen anyone answer the following question. I have a JSP page (HTML) with the character encoding set to utf-8. The user inputs some data into a text field inside a form. The data could be in non-ASCII characters such as Hebrew or Arabic. This form is then sent to another JSP, where I try to retrieve the data from the text field. No matter what I do, I cannot get the data presented correctly. It is either question marks or other weird symbols.
I have tried every permutation of the encoding of the actual HTML page, the encoding of the string from request.getParameter, etc., but it still is not presented correctly on the new HTML page.
Can anyone help?
Spencer

Ok, I solved the problem.
I had to put at the top request.setCharacterEncoding("utf-8");
Spencer
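The reason this helps: the browser sends the form data as UTF-8 bytes, and unless told otherwise a servlet container typically decodes request parameters as ISO-8859-1. request.setCharacterEncoding has to be called before the first parameter is read, because that is when the decoding happens. The sketch below uses only the JDK, with invented names, to show the mismatch and the often-seen repair of re-encoding the damaged string. It illustrates the mechanism and is not servlet code.

```java
import java.nio.charset.StandardCharsets;

public class FormDecodingMismatch {

    // What happens when UTF-8 bytes are decoded as ISO-8859-1.
    static String decodeAsLatin1(byte[] utf8Bytes) {
        return new String(utf8Bytes, StandardCharsets.ISO_8859_1);
    }

    // Undoes the wrong decoding: ISO-8859-1 maps every byte to exactly one
    // char, so the original bytes can be recovered and decoded as UTF-8.
    static String repair(String misdecoded) {
        return new String(misdecoded.getBytes(StandardCharsets.ISO_8859_1),
                StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String typed = "\u05e9\u05dc\u05d5\u05dd"; // a Hebrew word, as escapes
        byte[] sentByBrowser = typed.getBytes(StandardCharsets.UTF_8);

        String misdecoded = decodeAsLatin1(sentByBrowser);
        System.out.println(misdecoded.equals(typed));         // the text is damaged
        System.out.println(repair(misdecoded).equals(typed)); // and recoverable
    }
}
```

Setting the request encoding up front is the cleaner fix. The repair step is mostly useful when the decoding happens somewhere that cannot be configured.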

Replacing non-ASCII characters with HTML character references

Hi All,
In Oracle 10g or greater is there a built-in function that will convert a string with non-ASCII characters like this
a b č 뮼
into an ASCII string with HTML character references like this?
a b & # x 0 1 0 D ; & # x B B B C ;
(note I had to include spaces between each character in the sample code for message to prevent the forum software from converting my text)
I tried using
utl_i18n.escape_reference( val, 'us7ascii' )
but for some reason it returns
a b c & # x B B B C ;
Note how it converted the Western European character "č" to its unaccented counterpart "c", not "& # x 0 1 0 D ;" (is this a bug?).
I also tried a custom solution using regexp_replace and asciistr (which I can't include here because the forum software chokes on it) but it only returns the correct result for values <=4000 characters long. Unfortunately asciistr doesn't appear to accept CLOB values larger than 4000 characters. It returns an error message like
(ORA-22835: Buffer too small for CLOB to CHAR or BLOB to RAW conversion (actual: 30251, maximum: 4000) ).
I'm looking for a solution that works on CLOB data of any size.
Thanks in advance for any insight you can provide.
Joe Fuda

So with that (UTF8) in mind, let's take another look.....
As shown below, I used a AL32UTF8 database.
Note: I did not use a unicode capable tool for querying. So I set console mode code page to 1250 just to have č displayed properly (instead of posing as an è).
Also, as a result of using windows-1250 for client character set, in the val column and in the second select's ncr column (iso8859-1), è (00e8) has been replaced with e through character set conversion going from server back to client.
Running the same code on a database with a db character set such as we8mswin1252, that doesn't define the č (latin small c with caron) character, would yield results with a c in the ncr column.
C:\>chcp 1250
Aktuell teckentabell: 1250
C:\>set nls_lang=.ee8mswin1250
C:\>sqlplus test/test
SQL*Plus: Release 11.1.0.6.0 - Production on Fri May 23 21:25:29 2008
Copyright (c) 1982, 2007, Oracle. All rights reserved.
Connected to:
Oracle Database 11g Enterprise Edition Release 11.1.0.6.0 - Production
With the OLAP option
SQL> select * from nls_database_parameters where parameter like '%CHARACTERSET';
PARAMETER              VALUE
NLS_CHARACTERSET       AL32UTF8
NLS_NCHAR_CHARACTERSET AL16UTF16
SQL> select unistr('\010d \00e8') val, utl_i18n.escape_reference(unistr('\010d \00e8'),'us7ascii') NCR from dual;
VAL NCR
č e c e
SQL> select unistr('\010d \00e8') val, utl_i18n.escape_reference(unistr('\010d \00e8'),'we8iso8859p1') NCR from dual;
VAL NCR
č e &# x10d; e     <- "è"
SQL> select unistr('\010d \00e8') val, utl_i18n.escape_reference(unistr('\010d \00e8'),'ee8iso8859p2') NCR from dual;
VAL NCR
č e č &# xe8;
SQL> select unistr('\010d \00e8') val, utl_i18n.escape_reference(unistr('\010d \00e8'),'cl8iso8859p5') NCR from dual;
VAL NCR
č e &# x10d; &# xe8;
In the US7ASCII case, where it should be possible for all non-ASCII characters to be escaped, it seems as if the actual escape step is skipped over.
Hope this helps to understand whether utl_i18n is usable or not in your case.
Message was edited by:
orafad
Fixed replaced character references :)
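If the conversion can be done on the Java side instead of in the database, the escaping itself is only a few lines and has no 4000-character limit. The sketch below is not an Oracle built-in: the class and method names are invented, and how the CLOB content arrives as a Java String (for example through JDBC) is left out. It replaces every code point above 127 with a hexadecimal character reference padded to at least four digits, which matches the output asked for in the question.

```java
public class HtmlCharacterReferences {

    // Replaces every non-ASCII code point with a hexadecimal character
    // reference and leaves ASCII characters untouched.
    static String toCharacterReferences(String text) {
        StringBuilder out = new StringBuilder(text.length());
        text.codePoints().forEach(cp -> {
            if (cp < 128) {
                out.append((char) cp);
            } else {
                out.append(String.format("&#x%04X;", cp));
            }
        });
        return out.toString();
    }

    public static void main(String[] args) {
        // The sample from the question, written with escapes.
        System.out.println(toCharacterReferences("a b \u010d \ubbbc"));
    }
}
```

Working on code points rather than chars means characters outside the Basic Multilingual Plane come out as a single reference instead of two surrogate halves.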
