File encoding

How do I get the encoding of a file? I am reading files that have been saved in a mixture of UTF-8, ANSI, Unicode, etc. How can my code determine which encoding a file was saved in?

There must be a way to examine what format files have been saved in, in the same way you can determine whether a file is read-only. Actually, no. If it's an XML file, the XML specification tells you how to determine the file's encoding (assuming it was produced correctly). Otherwise, it's up to you and the producer of the file to communicate that information between you.
There are heuristics that allow you to make an informed guess, but nothing nearly as simple as the read-only flag.
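To illustrate what such a heuristic looks like, here is a minimal byte-order-mark (BOM) sniffer in Java. The class and method names are my own; note that files without a BOM simply cannot be identified this way, which is exactly the limitation described above.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;

public class BomSniffer {

    // Returns the encoding implied by a BOM at the start of the file,
    // or null if no BOM is present (encoding unknown).
    public static String guessEncoding(byte[] head) {
        if (head.length >= 3 && (head[0] & 0xFF) == 0xEF
                && (head[1] & 0xFF) == 0xBB && (head[2] & 0xFF) == 0xBF) {
            return "UTF-8";
        }
        if (head.length >= 2 && (head[0] & 0xFF) == 0xFF && (head[1] & 0xFF) == 0xFE) {
            return "UTF-16LE";
        }
        if (head.length >= 2 && (head[0] & 0xFF) == 0xFE && (head[1] & 0xFF) == 0xFF) {
            return "UTF-16BE";
        }
        return null; // no BOM: fall back to convention or metadata
    }

    public static void main(String[] args) throws IOException {
        byte[] head = new byte[3];
        try (InputStream in = Files.newInputStream(Paths.get(args[0]))) {
            in.read(head);
        }
        System.out.println(guessEncoding(head));
    }
}
```

Many editors write UTF-8 without a BOM, so a null result here is common and genuinely ambiguous.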

Similar Messages

  • File encoding cp1252 problem

    Hi there,
    I have a problem concerning the file encoding in a web application.
    I'll sketch the problem for you.
I'm working on adjustments and bug fixes for an e-mail archive at the company I work for. With this archive, users can search e-mails through a Struts 1.0 / JSP web application, read them, and send them back to their mail inbox.
    Recently a bug has appeared, concerning character sets.
    We have mails with french characters or other uncommon characters in it.
    Like the following mail:
    Subject: Test E-mail archief co�rdinatie Els
    Content: Test co�rdinatie r�d�marrage ... test weird characters � � �
In the web application itself everything is fine, but when I send this mail back to my inbox, the subject gets all messed up:
    =?ANSI_X3.4-1968?Q?EMAILARCHIVE_*20060419007419*_Tes?=
    =?ANSI_X3.4-1968?Q?t_E-maill_archief_co=3Frdinatie_Els?=
    The content appears to be fine.
    We discovered this problem recently, and a lot of effort and searching has been done to solve it.
    Our solution was to put the following line in catalina.sh , with what our Tomcat 4.1 webserver starts.
    CATALINA_OPTS="-server -Dfile.encoding=cp1252"
On my local Win2K computer the encoding didn't pose a problem, so catalina.sh wasn't changed. It was only a problem (during testing) on our Linux test server, a VMware server that is a copy of our production environment.
On the VMware server, I added the line to the catalina.sh file, and it worked fine.
    Problem Solved !
Yesterday we put the archive into production. On our production server ... BANG --> NullPointerException.
We thought it had something to do with jars it couldn't find, older jars, Tomcat's cache ... but none of this solved the problem.
We put the old version back into production, but the same NullPointerException occurred.
We then commented out the "CATALINA_OPTS="-server -Dfile.encoding=cp1252"" line ... and then it worked again.
    We put the new version into production (without the file encoding line), and it worked perfectly, except for those weird ANSI characters.
    Anyone have any experience with this?
I use that same file.encoding setting to start a batch job, but there I call it Cp1252 (with a capital C) ... might that be the problem? I have to be sure, because the problem doesn't occur in the test environment, and I can't just test in production and switch off the server whenever I'd like to.
Does anyone see whether changing cp1252 to Cp1252 might be a solution, or does anyone have another solution?
    Thanks in advance.
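On the cp1252-versus-Cp1252 question specifically: Java's Charset API matches charset names case-insensitively, so both spellings resolve to the same charset. Whether the launcher's handling of the -Dfile.encoding value is equally tolerant is JVM-specific, but the name matching itself can be checked with standard API (a quick sketch):

```java
import java.nio.charset.Charset;

public class CaseCheck {
    public static void main(String[] args) {
        // Charset name lookup ignores case, so "cp1252" and "Cp1252"
        // resolve to the same canonical charset (windows-1252)
        System.out.println(Charset.forName("cp1252").equals(Charset.forName("Cp1252")));
        // prints: true
    }
}
```

So the capitalization difference is unlikely to be the cause by itself; the NullPointerException more likely comes from code that misbehaves under the changed default encoding.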


  • How to set File Encoding to UTF-8 On Save action in JDeveloper 11G R2?

    Hello,
I am facing an issue when modifying a file using JDeveloper 11g R2: JDeveloper changes the encoding of the file to the system default encoding (ANSI) instead of UTF-8. I have set the encoding to UTF-8 in the "Tools | Preferences | Environment | Encoding" option and restarted JDeveloper. I have also set the "Project Properties | Compiler | Character Encoding" option to UTF-8. Neither works.
    I am using below version of JDeveloper,
    Oracle JDeveloper 11g Release 2 11.1.2.3.0
    Studio Edition Version 11.1.2.3.0
    Product Version: 11.1.2.3.39.62.76.1
I created a file in UTF-8 encoding, opened it, made some changes, and saved it.
When I open the "Properties" tab via the "Help | About" menu, I can see that JDeveloper's properties show the encoding as Cp1252. Is that related?
    Properties
    sun.jnu.encoding
    Cp1252
    file.encoding
    Cp1252
    Any idea how to make sure JDeveloper saves the File in UTF-8 always?
    - Sujay

I have already done that; it is the first thing I did, as mentioned in my thread. I have also added the two options below to jdev.conf and restarted JDeveloper, but that did not work either.
    AddVMOption -Dfile.encoding=UTF-8
    AddVMOption -Dsun.jnu.encoding=UTF-8
    - Sujay
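Independent of the IDE settings, code can force UTF-8 when it writes the file: a writer constructed with an explicit charset bypasses file.encoding entirely. A minimal sketch (the file name is illustrative):

```java
import java.io.IOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class Utf8Save {

    // Write text as UTF-8 regardless of the JVM's or IDE's default encoding.
    public static void save(Path path, String text) throws IOException {
        try (Writer w = Files.newBufferedWriter(path, StandardCharsets.UTF_8)) {
            w.write(text);
        }
    }

    public static void main(String[] args) throws IOException {
        Path p = Paths.get("out.txt"); // illustrative file name
        save(p, "coördinatie");
        // ö takes two bytes in UTF-8, so the file is one byte longer
        // than the character count
        System.out.println(Files.readAllBytes(p).length);
    }
}
```

This does not fix how JDeveloper itself saves sources, but it makes any file the application produces independent of the environment.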

  • Jinitiator 1.3.1.2.6 on win 7 64 and win xp (different file.encoding)

    Hello,
our customer has moved from Windows XP to Windows 7, and he uses JInitiator 1.3.1.2.6.
In some Forms I have implemented a PJC to save data from a CLOB to the local file system.
But there is a problem:
If I run the application on Windows XP I get file.encoding=Cp1250, which is OK.
If I run the same application on Windows 7 (64-bit) I get file.encoding=Cp1252, and there is the problem.
Is there any way to run JInitiator with file.encoding set to Cp1250?
Or is this a locale problem in Windows?
    thank you..

    First, I will start by saying that JInitiator was not intended to run on Win7, especially 64bit. So, it may be time to think about moving to the Java Plugin. Preferably one which is certified with your Forms version.
    To your issue, I suspect you need to change the "Region and Language" settings on the client machine. This can be found on the Control Panel. If that doesn't help, take a look at this:
    http://stackoverflow.com/questions/4850557/convert-string-from-codepage-1252-to-1250

  • How to set the file.encoding in jvm?

I have errors displaying Chinese characters from a servlet.
Someone told me I can change the JVM's file.encoding for the zh_CN locale. How can I do that?

Add the Java argument in your servlet engine, e.g.
    java -Dfile.encoding=ISO8859-1
    garycafe
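To verify what the flag actually did, the default charset the JVM resolved can simply be printed (a sketch; the class name is my own):

```java
import java.nio.charset.Charset;

public class ShowDefault {
    public static void main(String[] args) {
        // The charset the JVM actually resolved, after file.encoding
        // and the platform locale have been taken into account
        System.out.println(Charset.defaultCharset().name());
    }
}
```

Run it once with and once without -Dfile.encoding=... to see whether the servlet engine is really passing the argument through.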

  • File encoding in sender file comunication channel

hello everybody,
I have a strange situation.
Two PI 7.0 installations, development and production, identical: same SP level, Java VM, etc.
I have a file-to-IDoc interface.
The file sender communication channels use FTP with content conversion.
They are identical!
but...
in production I added the parameter File Encoding = ISO-8859-1, because when strange characters are present it works better.
The same files work in the development installation without this parameter.
Why?
Is there a place, maybe in the Config Tool or the J2EE admin tool, where this parameter is set?
thanks in advance
Edited by: apederiva on Mar 12, 2010 3:55 PM

    Hi,
Make sure both of your systems are Unicode so that you will not have any issues. Also, please see this document on how to work with character encodings in PI. We don't have any special configuration for this in the J2EE admin tool.
    http://www.sdn.sap.com/irj/scn/go/portal/prtroot/docs/library/uuid/502991a2-45d9-2910-d99f-8aba5d79fb42?quicklink=index&overridelayout=true
    Regards,
    ---Satish

How to specify file.encoding=ANSI format in the J2SE adapter

    Hi All,
we are using the J2SE plain adapter, and we need the output data in ANSI format.
The default is file.encoding=UTF-8.
How can we achieve this?
    thanks in advance.
    Regards,
    Mohamed Asif KP

The file adapter would behave in a similar fashion on J2EE. Here is a link to an ongoing discussion:
is ANSI ENCODING possible using file/j2ee adapter
    Regards,
    Prateek

  • Text File Encoding used by TextEdit/OS X

    Hi all folks,
does someone know which code page is used by the text file encoding "Western (EBCDIC US)",
available from the "Customize Encodings List" in TextEdit's "Plain Text File Encoding" preferences?
The text file encoding "Western (EBCDIC Latin 1)" works well, but "EBCDIC US" does not;
its character set is very limited.
    Thanks for any help,
    Lutz

    Yeah unfortunately they're all listed as 0kb files. I guess that means the faulty hard drive didn't transfer them properly, even though the Mac did the copy confirmation sound.
    Hundreds of folio files... all gone. ;___;

  • XI File Adapter Custom File Encoding for  issues between SJIS and CP932

    Dear SAP Forum,
    Has anybody found a solution for the difference between the JVM (IANA) SJIS and MS SJIS implementation ?
    When users enter characters in SAPGUI, the MS SJIS implementation is used, but when the XI file adapter writes SJIS, the JVM SJIS implementation is used, which causes issues for 7 characters:
1. FULLWIDTH TILDE / EFBD9E         8160   ~   〜
2. PARALLEL TO / E288A5             8161   ∥   ‖
3. FULLWIDTH HYPHEN-MINUS / EFBC8D  817C   -   −
4. FULLWIDTH CENT SIGN / EFBFA0     8191   ¢   \u00A2
5. FULLWIDTH POUND SIGN / EFBFA1    8192   £   \u00A3
6. FULLWIDTH NOT SIGN / EFBFA2      81CA   ¬   \u00AC
7. REVERSE SOLIDUS                  815F   \   \u005C
    The following line of code can solve the problem (either in an individual mapping or in a module)
String sOUT = myString.replace('~','〜').replace('∥','‖').replace('-','−').replace('¢','\u00A2').replace('£','\u00A3').replace('¬','\u00AC');
But I would prefer to add a custom character set for the file encoding. Has anybody tried this?


  • File.encoding - Help

    Hi,
We are converting German and English data to XML. The German data (UTF-8) runs through the system fine when I log in and test it myself. But strangely it gives "??" characters when I run as a different user. (Sun Solaris / JDK 1.4)
When I look at the properties,
for me:
file.encoding = ISO8859-1
and for the other user:
file.encoding = ISO646-US
Where is this property being set, and how can I change it? This is
killing me. I could not find anything on my system that enforces the
character set. I am not using any special writers or readers.
    Thanks in advance
    - Ravi

I do not think that the Latin-1 Supplement characters
\u00c4\u00e4\u00df\u00dc\u00fc\u00d6\u00f6
(used in German),
which are encoded in UTF-8, can be displayed correctly with any encoding other than UTF-8, to my knowledge.
I suppose your browser automatically sets the encoding to UTF-8 when reading those data, while other users' browsers do not.

  • File encoding problen (charset) on glassfish / Sun App Server

    Hi all!
I hope someone here can point me the right way, since I have been trying to solve this problem for quite some time now. First my setup: a SUSE Linux box with GlassFish V2. I am creating files for users of our website with an enterprise bean in an EJB module. The users are supposed to be able to choose the file encoding themselves. The files are created from Lucene index files that are UTF-8 encoded. When writing a result file I use an OutputStreamWriter with a CharsetEncoder object and the user-chosen encoding. This works perfectly when the result is UTF-8 too. But whenever I try to generate ISO-8859-1 files, the encoding of the output files is messed up; it's neither UTF-8 nor Latin-1 nor any other valid encoding. On my Windows development machine it seemed to work just fine.
So thanks in advance, and many greetings from Germany!
    Phil
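For comparison, here is a sketch of one way to build an ISO-8859-1 writer with an explicit encoder and defined error handling; it is not Phil's original code, and the class and method names are illustrative. Pinning down the encoder and its error actions rules out the platform default leaking in somewhere.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class Latin1Out {

    // Encode a string as ISO-8859-1, substituting '?' for any character
    // that Latin-1 cannot represent (instead of throwing or corrupting).
    public static byte[] encode(String s) throws IOException {
        CharsetEncoder enc = StandardCharsets.ISO_8859_1.newEncoder()
                .onUnmappableCharacter(CodingErrorAction.REPLACE)
                .onMalformedInput(CodingErrorAction.REPLACE);
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (Writer w = new OutputStreamWriter(buf, enc)) {
            w.write(s);
        }
        return buf.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] out = encode("über");
        // every Latin-1 character is exactly one byte
        System.out.println(out.length);
    }
}
```

If output built this way is still wrong on the server, the corruption is likely happening before the writer (e.g. when the UTF-8 index data is read) rather than in the encoding step itself.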

    For future references:
    this happens to me too and I found that the cause is that the AM server you are going to configure, is already registered into the directory server.
    Try running this command (with the obvious parameters substituted)
ldapsearch -B -T -D 'cn=directory manager' -w YOUR_CREDENTIALS -b ou=1.0,ou=iPlanetAMPlatformService,ou=services,YOUR_BASEDN -s base objectclass=* | grep YOUR_SERVERNAME
If you find that the server you are configuring is listed here, go to the AM server console (if you have another AM server configured) and browse to Configuration -> System Properties -> Platforms. If the server is there, remove it; if not, just hit Save (very important).
    If this is your first AM and is a first installation you can just get rid of the Directory Server suffix and recreate it with the Top Entry alone.
    Edited by: flistello on Mar 27, 2008 4:30 PM

  • File.encoding won't swap

trying to help someone who has a Windows file server originally installed for the Russian locale. They managed to swap it back to what they require (en_GB), but the JVM's file encoding is still stuck on cp1251. They say they've changed the JVM config's file encoding ("-Dfile.encoding=cp1252"), yet the system still reports cp1251 for the file encoding (even after several reboots). I don't see anything in the bug parade, and I'm at a loss as to why this box won't swap its JVM file encoding.
    any ideas?
    thanks.

    which version of the videoencoder are you using?
    2.0.0.494 edition (Brand New)
    Remember that you can't crop all files at once . .
    Oh, didn't see that in the manual!
    Where the heck is it listed?
    I tried to save a profile with the compression setting -
    including cropping - but upon trying to use that saved profile - a
    dialog box comes up with:
    Warning: some of the settings (crop, trim, cue point . . . .
    Too bad, so sad!
    2,000+ movies is gonna take a long time!
They're huge files, broadcast DVC-Pro, and the overscan and the
interlace at top & bottom show up . . . . Aaarrrgggg!
By the way, is it 4 pixels cropped, top & bottom, to
clean it up?
    And shouldn't a few (maybe 2 pixels) on each side be cropped
    as well?
    Any other suggestions on how to automate what I need to do
    (deinterlace, crop 4 pixels top & bottom & 2 pixels sides -
    exported "High Quality / large)
    Thanks for responding so far - sure helps clarify things!
    Luke

  • File.encoding = Cp1252

    Dear List,
    I have noticed in my system.properties that my windows xp 1.4 jdk has
    file.encoding = Cp1252
What does this mean generally? In my JSP web application, I have noticed that the JSP pages themselves, which have HTML company headers, have been corrupted somewhat; e.g. the copyright symbol becomes something else (it looks like UTF-8!). Could this have something to do with it? Does this mean that text stream IO readers etc. will adopt this encoding as the default?
What effects can it have if I change this using the setProperty method?
    regards
    Ben

Setting it with
java -Dfile.encoding=<encoding> works:
public class a {
     public static void main(String[] args) {
          System.out.println(System.getProperty("file.encoding") + " \u0150\u0151\u0170\u0171");
     }
}
D:\doku\source\colors\src\web>java a
Cp1252 ????
D:\doku\source\colors\src\web>java -Dfile.encoding=utf8 a
utf8 ┼�┼�┼░┼▒
D:\doku\source\colors\src\web>java -Dfile.encoding=latin2 a
latin2 ╒⌡█√

  • File encoding with UTF-8

    Hello all,
    My scenario is IDoc -> XI -> File (txt).
Everything was working fine until I had to handle an Eastern European language with unusual characters.
So in my receiver file adapter I'm using the file encoding UTF-8, and when I look at my fields in the output, everything is fine.
BUT when I look at the binary, the length of these fields is no longer fixed, because a special character takes 2 bytes instead of one.
I would like to know whether it's possible to handle those characters with the UTF-8 file encoding in a fixed-length field of, for example, 40 characters; I don't want a variable length for my fields.
    Thanks by advance,
    JP
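One common approach, sketched below under the assumption that the fixed width is measured in bytes: pad or truncate each field by UTF-8 bytes, taking care never to cut a multi-byte character in half. The class and method names are my own, not part of any adapter API.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class FixedField {

    // Pad (with spaces) or truncate a field to exactly `width` BYTES of UTF-8.
    // Truncation backs up so it never splits a multi-byte sequence.
    public static byte[] toFixedBytes(String s, int width) {
        byte[] raw = s.getBytes(StandardCharsets.UTF_8);
        if (raw.length == width) return raw;
        if (raw.length < width) {
            byte[] padded = Arrays.copyOf(raw, width);
            Arrays.fill(padded, raw.length, width, (byte) ' ');
            return padded;
        }
        int end = width;
        // back up while the cut point lands on a UTF-8 continuation byte (10xxxxxx)
        while (end > 0 && (raw[end] & 0xC0) == 0x80) end--;
        byte[] cut = Arrays.copyOf(raw, width);
        Arrays.fill(cut, end, width, (byte) ' ');
        return cut;
    }
}
```

The trade-off is inherent to UTF-8: a field that is fixed in bytes can hold a variable number of characters, and vice versa, so the receiving system must agree on which of the two is fixed.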

I agree with you. In XI I don't have this problem; I have it in my output file, when I look at the text file in binary mode!
My field should be 40 characters, but the special symbols that take 2 bytes instead of 1 make the length of my output fields variable!
My question was whether there is a way to have a fixed length in my output file.
Sorry if I wasn't clear in my first post.
    JP

  • Convert Text file encoding in perticular format(Unicode)

    Hi Expert,
I have a requirement to transfer a text file in a particular encoding to the application server. By default the SAP system generates it in ANSI. Is it possible to convert it to a Unicode format like UTF-8? If so, how do I generate the text file in Unicode?
    Thanks,
    Regards

    Check
    Note 752835 - Usage of the file interfaces in Unicode systems
    Markus

  • File.encoding in windows influence  by the locale

How can I set file.encoding on the Windows platform so that it is not influenced by the locale?
For example, in Control Panel -> Regional Options the locale is set to Russian,
and what I get is file.encoding Cp1251 even though I pass the parameter on the command line:
-Dfile.encoding=Cp1252
(I want Cp1252, not Cp1251; Cp1252 is the Windows Western default and Cp1251 is Windows Cyrillic.)
I run a Java program to see what encoding I use:
D:\ProgramFiles\jdk1.3.1\bin\java -Dfile.encoding=Cp1252 TestEncoding
The locale on my PC is Russian and the result is:
System.getProperty("file.encoding") == Cp1252
Default ByteToChar Class == sun.io.ByteToCharCp1251
Default CharToByte Class == sun.io.CharToByteCp1251
Default CharacterEncoding == Cp1251
OutputStreamWriter encoding == Cp1251
InputStreamReader encoding == Cp1251
TestEncoding.java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
class TestEncoding {
    public static void main(String[] args) {
        String encProperty = System.getProperty("file.encoding");
        System.out.println("System.getProperty(\"file.encoding\") == " + encProperty);
        String byteToCharClass = sun.io.ByteToCharConverter.getDefault().getClass().getName();
        System.out.println("Default ByteToChar Class == " + byteToCharClass);
        String charToByteClass = sun.io.CharToByteConverter.getDefault().getClass().getName();
        System.out.println("Default CharToByte Class == " + charToByteClass);
        String defaultCharset = sun.io.ByteToCharConverter.getDefault().getCharacterEncoding();
        System.out.println("Default CharacterEncoding == " + defaultCharset);
        ByteArrayOutputStream buf = new ByteArrayOutputStream(10);
        OutputStreamWriter writer = new OutputStreamWriter(buf);
        System.out.println("OutputStreamWriter encoding == " + writer.getEncoding());
        byte[] byteArray = new byte[10];
        InputStream inputStream = new ByteArrayInputStream(byteArray);
        InputStreamReader reader = new InputStreamReader(inputStream);
        System.out.println("InputStreamReader encoding == " + reader.getEncoding());
    }
}

    What are you really trying to accomplish? Applications should avoid relying on undocumented or implementation dependent features, such as the file.encoding property and sun.* classes (see http://java.sun.com/products/jdk/faq/faq-sun-packages.html).
    On the other hand, there's plenty of documented public API that lets you work with specific character encodings. For example, you can specify the character encoding for conversion between byte arrays and String objects (see the String class specification) or when reading or writing files (see the InputStreamReader and OutputStreamWriter classes in java.io).
    The default encoding is needed by the Java runtime when accessing the Windows file system, for example file names, so changing it would likely result in erroneous behavior.
    Norbert Lindenberg
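A short sketch of the documented APIs mentioned above, with every charset passed explicitly rather than relying on file.encoding. The example strings and charset choices are mine.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.Charset;

public class ExplicitCharsets {
    public static void main(String[] args) throws IOException {
        Charset cp1251 = Charset.forName("windows-1251");

        // String <-> bytes with an explicit charset, never the platform default
        byte[] bytes = "Привет".getBytes(cp1251);
        String roundTrip = new String(bytes, cp1251);
        System.out.println(roundTrip.equals("Привет")); // nothing was lost

        // Readers and writers accept a charset too, overriding file.encoding
        InputStreamReader reader =
                new InputStreamReader(new ByteArrayInputStream(bytes), cp1251);
        System.out.println(reader.getEncoding());
        reader.close();
    }
}
```

Code written this way behaves identically regardless of the locale or any -Dfile.encoding setting, which is exactly the point of the advice above.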
