UTF-8 encoding problem on Solaris

Hello all.
I am using WebLogic 9.2 and I am facing a very weird problem regarding encoding. I fetch data from the DB (Informix, btw) and forward the data as UTF-8 to JSPs. I have set up everything successfully in my web.xml and weblogic.xml, and all JSPs include the page directive for UTF-8. When I deploy my application on a Windows 2000 machine everything goes smoothly. But when the deployment happens on a Solaris machine, my JSPs show "?" instead of letters. Has anyone faced this problem before? Could you please direct me towards a solution? This thing has taken me days and days and I still haven't managed to find a solution.
Thanks in advance,
axel

Hi,
Start the app and hook an Eclipse debug session to it. Check whether the encoding problem occurs while retrieving from the DB or while generating the response. If the issue is on the DB side, you may need to define the encoding on the connection (I am not sure what driver you are using, but you should be able to check this out). If the issue is while generating the response, just XML-escape every character.
Regards,
LG
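
(Editor's note: the classic cause of "?" appearing only on Solaris is the platform default encoding, Cp1252 on Windows vs. an ISO646/EUC locale on Solaris, silently entering some byte-to-String conversion. A minimal sketch of LG's connection-side suggestion, assuming the IBM Informix JDBC driver; the host, port, database and credentials are placeholders:)

    import java.sql.Connection;
    import java.sql.DriverManager;

    public class InformixUtf8Connect {
        public static void main(String[] args) throws Exception {
            Class.forName("com.informix.jdbc.IfxDriver"); // pre-JDBC-4 driver registration
            // DB_LOCALE/CLIENT_LOCALE tell the driver how character data is encoded
            String url = "jdbc:informix-sqli://dbhost:1526/mydb:"
                       + "INFORMIXSERVER=myserver;"
                       + "DB_LOCALE=en_US.utf8;CLIENT_LOCALE=en_US.utf8";
            Connection con = DriverManager.getConnection(url, "user", "password");
            System.out.println("connected; character data should now round-trip as UTF-8");
            con.close();
        }
    }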

Similar Messages

  • File encoding problem on Solaris SPARC

    hi there,
    I wrote a simple program to create a text file with Japanese characters in its content, the same way I did on a Windows JP platform.
    However, the encoding comes out different from what Windows produces.
    Interestingly enough, viewing the same data from Oracle works fine, and that Oracle runs on the same machine (Solaris SPARC).
    I believe I missed some internationalization setting for the JDK on Solaris, but I'm not familiar with Solaris. Hence I would like to ask: has any one of you come across this before? Please help.
    Here is the string:
    \u6771\u9999\u91CC\u5357\u753A５８－５－１０１
    Please note that the 58-5-101 is in double-width (full-width) JP characters.
    The main problem is the encoding of the full-width "－" on Solaris: it changes to '?'.
    My env is:
    - JDK 1.3.1_03
    - Solaris SPARC, LANG=ja
    Please feel free to ask for more information if you need further details.
    regards,
    elvis
    scjp

    I don't fully understand what you're asking for, but I wonder if this might help: the default encoding of a Solaris machine is probably EUC (on Windows it is SJIS). So you'll probably have to read the file using the EUC encoding. Have a look at InputStreamReader(InputStream in, String enc), which lets you set the encoding of the input stream, as in the sketch below.
    (Note that there is also a forum called 'Internationalization' at java.sun.com.)
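
    A minimal sketch of that suggestion; the file name is a placeholder, and whether "EUC-JP" or "Shift_JIS" is right depends on which platform actually wrote the file:

        import java.io.*;

        public class ReadWithEncoding {
            public static void main(String[] args) throws IOException {
                // "EUC-JP" for a file written with Solaris-JP defaults,
                // "Shift_JIS" for one written on Windows JP
                BufferedReader in = new BufferedReader(
                        new InputStreamReader(new FileInputStream("japanese.txt"), "EUC-JP"));
                String line;
                while ((line = in.readLine()) != null) {
                    System.out.println(line);
                }
                in.close();
            }
        }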

  • UTF-8 encoding problem in HTTP adapter

    Hi guys,
    I am facing a problem with UTF-8 multi-byte character conversion.
    Problem:
    I am posting data from SAP CRM to a third-party system, using XI as middleware. I am using the HTTP adapter to connect XI to the third-party system.
    In the HTTP configuration I have set the XML code to UTF-8 in the XI payload manipulation block.
    I am trying to post Chinese characters from SAP CRM to the third-party system, but junk characters arrive at the third-party system. My assumption is that it is double encoding.
    I have checked the XML messages in message monitoring in XI, and I can see the Chinese characters in the XML files. But the third-party system shows them as junk characters.
    Can anyone please help me with this issue?
    Please let me know if you need more info.
    Regards,
    Srini

    Srinivas,
    Can you please go through SAP Note 856597, Question No. 3, which may resolve your issue? Also, have you checked SAP Notes 761608, 639882, 666574, 913116 and 779981, which might help?
    ---Satish

  • Encoding problem on Solaris

    I generated a document on a Solaris machine which follows the "ISO-8859-1" encoding scheme. This document contains symbols such as the British pound currency symbol.
    Now I am passing this document as input to my application. My application runs on another machine, which follows the "US-ASCII 7bit" encoding scheme. There the input file undergoes some processing by my application, and when I look at the file afterwards, the British pound symbols have been converted to question marks.
    To stop this, can I use the -Dfile.encoding option, as in
    java -Dfile.encoding=ISO-8859-1 -classpath.......
    Is the format right? Will I get any problem changing it? Once my application finishes running, will the encoding scheme revert to the default one?
    Thanks
    Shiv

    That should be fine, but you should avoid setting file.encoding just to read one file. What will you do when your program grows and needs to communicate with the system using the system's own default character encoding? It's better to set the encoding in the program itself, like this:

        final String ENCODING = "ISO-8859-1";
        Reader in = new InputStreamReader(new FileInputStream(file), ENCODING);

    The encoding does not have to be hardcoded; instead you could use System.getProperty("my.program.encoding", "ISO-8859-1") or whatever.
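
    A runnable sketch of that property-based variant (the property name my.program.encoding comes from the post above; the file comes from the command line):

        import java.io.*;

        public class ConfigurableEncodingReader {
            public static void main(String[] args) throws IOException {
                // falls back to ISO-8859-1 unless overridden with -Dmy.program.encoding=...
                String enc = System.getProperty("my.program.encoding", "ISO-8859-1");
                BufferedReader in = new BufferedReader(
                        new InputStreamReader(new FileInputStream(args[0]), enc));
                String line;
                while ((line = in.readLine()) != null) {
                    System.out.println(line);
                }
                in.close();
            }
        }

    Run it as, e.g., java -Dmy.program.encoding=ISO-8859-1 ConfigurableEncodingReader pounds.txt; this leaves file.encoding, and therefore the rest of the JVM, untouched.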

  • UTF-8 encoded JSP compilation problem

    Hi,
    I'm using WebLogic 9.0 Beta. I have an XML-format UTF-8 encoded JSP (with the proper encoding declarations). I can see that this is compiled into a UTF-8 Java servlet by WebLogic.
    At the compilation to a class file, though, the encoding is corrupted. I guess that the Java compiler is assuming a system-encoded (which would be ISO-8859-1) Java file instead of the actual UTF-8 encoding.
    This problem did not occur with WebLogic 8.1.
    I have tried to explicitly tell the Java compiler to treat the source files as UTF-8 in weblogic.xml, i.e.
    <jsp-param>
    <param-name>compileFlags</param-name>
    <param-value>-encoding UTF8</param-value>
    </jsp-param>
    but that had no effect.
    Anyone else noticed this?
    I assume that the correct behaviour is for WebLogic to preserve the encoding from JSP to servlet to class file, rather than for me to set the encoding in weblogic.xml. Is that correct?
    Is there a workaround?
    Thanks for any help you can offer!

    Solved.
    It is about Tomcat's character encoding, not about the code.
    For more info:
    [http://wiki.apache.org/tomcat/Tomcat/UTF-8]
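
    (Editor's note: the reply above refers to Tomcat rather than WebLogic. For the Tomcat variant it points to, the linked page centres on how the Connector decodes request URIs; a hedged sketch of the relevant server.xml attribute, with a placeholder port:)

        <!-- conf/server.xml: decode request URIs as UTF-8 -->
        <Connector port="8080" protocol="HTTP/1.1" URIEncoding="UTF-8" />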

  • Need help with a UTF-8 encoding comparison problem

    Hi all,
    I'm doing some unit testing that compares two XML results. The problem is that they use the "UTF-8" encoding. After I grab them, I don't know how to compare them in order to get the expected result. I have tried IOUtils, and setting the system property with System.setProperty("file.encoding", "UTF-8"); neither of them works. Can anyone give me a hand? Thanks
    Edited by: AllenZhao on Nov 12, 2007 1:38 PM

    rayon.m wrote:
    hunter9000 wrote:
    What exactly are you trying to compare? Are they files, Strings in memory, streams? Remember that encodings only come into play when you convert text from Strings in memory to bytes on disk (or some other external source, like an output stream).
    eh? Encodings are rather important when you read a file.
    Yes, you're right, thanks. I got lazy and assumed the OP knew I was only talking about outputting Strings, without actually stating that, so I forgot to describe the other half of the process. Posting this close to quitting time is a bad idea ;)
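
    A sketch of one way out, assuming the two XML results live in files (the class and file names here are made up): decode both byte streams explicitly as UTF-8, so the platform default encoding never enters the comparison.

        import java.io.*;

        public class Utf8Compare {
            static String readUtf8(File f) throws IOException {
                ByteArrayOutputStream bos = new ByteArrayOutputStream();
                InputStream in = new FileInputStream(f);
                byte[] buf = new byte[4096];
                int n;
                while ((n = in.read(buf)) != -1) {
                    bos.write(buf, 0, n);
                }
                in.close();
                return new String(bos.toByteArray(), "UTF-8"); // decode explicitly
            }

            public static void main(String[] args) throws IOException {
                String expected = readUtf8(new File(args[0]));
                String actual = readUtf8(new File(args[1]));
                System.out.println(expected.equals(actual) ? "match" : "differ");
            }
        }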

  • [SOLVED] Problems opening folders with UTF-8 encoded characters

    Hello everyone. I'm having an issue when I access folders in all my programs (except the Dolphin file manager). Every time I open the folder navigation window in my programs, folders with UTF-8 encoded characters (such as "ç", "á", "ó", "í", etc.) are not shown, or the folder names do not show these characters; therefore I cannot open documents inside these folders.
    However, as you saw, I can type these characters normally. Here's my "locale.conf":
    LANG="en_US.UTF-8:ISO-8859-1"
    LC_TIME="pt_BR.UTF-8:ISO-8859-1"
    And here's the output of the command "locale -a" :
    C
    en_US.utf8
    POSIX
    Last edited by regmoraes (2015-04-17 12:55:19)

    Thing is, when I run locale -a, I get
    $ locale -a
    C
    de_DE@euro
    de_DE.iso885915@euro
    de_DE.utf8
    en_US
    en_US.iso88591
    en_US.utf8
    ja_JP
    ja_JP.eucjp
    ja_JP.ujis
    ja_JP.utf8
    japanese
    japanese.euc
    POSIX
    So there is an entry for every locale I have uncommented in my locale.gen. Just making sure: by "following the steps in the beginner's guide", do you also mean running locale-gen?
    Are those folders on a Linux filesystem like ext4, or on a Windows one (NTFS)?
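
    (Editor's sketch of the fix implied above: locale variables take a single locale name, not a colon-separated list, so a corrected /etc/locale.conf would look something like this, after uncommenting the matching entries in /etc/locale.gen and re-running locale-gen.)

        # /etc/locale.conf -- one locale per variable
        LANG=en_US.UTF-8
        LC_TIME=pt_BR.UTF-8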

  • Encoding Problem - can't read UTF-8 file correctly

    Windows XP, JDK 7, same with JDK 6
    I can't read a UTF-8 file correctly:
    Content of File (utf-8, thai string):
    เม็ดเลือดขาว
    When opened in Editor and copy pasted to JTextField, characters are displayed correctly:
    String text = jtf.getText();
    text.getBytes("utf-8");
    -32 -71 -128 -32 -72 -95 -32 -71 -121 -32 -72 -108 -32 -71 -128 -32 -72 -91 -32 -72 -73 -32 -72 -83 -32 -72 -108 -32 -72 -126 -32 -72 -78 -32 -72 -89
    Read file with FileReader/BufferedReader:
    line = br.readLine();
    buffs = line.getBytes("utf-8"); //get bytes with UTF-8 encoding
    -61 -65 -61 -66 32 0 64 14 33 14 71 14 20 14 64 14 37 14 55 14 45 14 20 14 2 14 50 14 39 14
    buffs = line.getBytes(); // get bytes with default encoding
    -1 -2 32 0 64 14 33 14 71 14 20 14 64 14 37 14 55 14 45 14 20 14 2 14 50 14 39 14
    Read file with:
    FileInputStream fis...
    InputStreamReader isr = new InputStreamReader(fis,"utf-8");
    BufferedReader brx = new BufferedReader(isr);
    line = br.readLine();
    buffs = line.getBytes("utf-8");
    -17 -65 -67 -17 -65 -67 32 0 64 14 33 14 71 14 20 14 64 14 37 14 55 14 45 14 20 14 2 14 50 14 39 14
    buffs = line.getBytes();
    63 63 32 0 64 14 33 14 71 14 20 14 64 14 37 14 55 14 45 14 20 14 2 14 50 14 39 14
    Does anybody have an idea? The file seems to be UTF-8 encoded. What could be wrong here?

    akeiser wrote:
    Windows XP, JDK 7, same with JDK 6
    I can't read a UTF-8 file correctly:
    Content of File (utf-8, thai string):
    เม็ดเลือดขาว
    When opened in Editor and copy pasted to JTextField, characters are displayed correctly:
    String text = jtf.getText();
    text.getBytes("utf-8");
    -32 -71 -128 -32 -72 -95 -32 -71 -121 -32 -72 -108 -32 -71 -128 -32 -72 -91 -32 -72 -73 -32 -72 -83 -32 -72 -108 -32 -72 -126 -32 -72 -78 -32 -72 -89

    These values are the bytes of your original string "เม็ดเลือดขาว" UTF-8 encoded, with no BOM (Byte Order Mark) prefix.

    Read file with FileReader/BufferedReader:
    line = br.readLine();
    buffs = line.getBytes("utf-8"); //get bytes with UTF-8 encoding
    -61 -65 -61 -66 32 0 64 14 33 14 71 14 20 14 64 14 37 14 55 14 45 14 20 14 2 14 50 14 39 14
    buffs = line.getBytes(); // get bytes with default encoding
    -1 -2 32 0 64 14 33 14 71 14 20 14 64 14 37 14 55 14 45 14 20 14 2 14 50 14 39 14
    Read file with:
    FileInputStream fis...
    InputStreamReader isr = new InputStreamReader(fis,"utf-8");
    BufferedReader brx = new BufferedReader(isr);
    line = br.readLine();
    buffs = line.getBytes("utf-8");
    -17 -65 -67 -17 -65 -67 32 0 64 14 33 14 71 14 20 14 64 14 37 14 55 14 45 14 20 14 2 14 50 14 39 14
    buffs = line.getBytes();
    63 63 32 0 64 14 33 14 71 14 20 14 64 14 37 14 55 14 45 14 20 14 2 14 50 14 39 14

    These values are the bytes of your original string UTF-16LE encoded, with a UTF-16LE BOM prefix (-1 -2, i.e. 0xFF 0xFE).
    This means the reading code is doing exactly what it was told; the file itself is UTF-16LE encoded (with a BOM), not UTF-8, so read it with the "UTF-16" charset instead (sketch below).
    Edited by: sabre150 on Aug 1, 2008 5:48 PM
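
    A sketch of that fix (the file name is a placeholder): Java's "UTF-16" charset consumes the BOM and picks the right byte order automatically.

        import java.io.*;

        public class ReadUtf16 {
            public static void main(String[] args) throws IOException {
                // "UTF-16" honours the 0xFF 0xFE BOM at the start of the file
                BufferedReader br = new BufferedReader(
                        new InputStreamReader(new FileInputStream("thai.txt"), "UTF-16"));
                String line = br.readLine();
                System.out.println(line);
                br.close();
            }
        }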

  • Romaji yen sign in Terminal in the UTF-8 encoding

    Hello all,
    I have a MacBook Pro with a Japanese keyboard running Mac OS X 10.6.2. In Romaji mode, the Japanese keyboard has a dedicated yen sign (¥) key, and Option-¥ produces a backslash (\). In Terminal, for some reason, the ¥ key produces \ without the Option modifier. (Option-¥ also produces \ in Terminal, which is normal behavior.)
    A similar situation was discussed in an older topic, http://discussions.apple.com/thread.jspa?messageID=10665836 , where the problem was diagnosed as having the Shift JIS encoding enabled in Terminal. However, this doesn't reflect my situation, since the only encoding that is enabled in my Terminal is UTF-8, and there's certainly a yen sign available in UTF-8.
    I am able to type other UTF-8 characters in Terminal in Romaji mode; for example, I can type Option-e e to produce é, and entering the command *echo é | od -x* within Terminal shows that the correct UTF-8 byte sequence is generated for é. Since the command *echo -e '\0302\0245'* within Terminal will produce a yen sign there, the problem seems to be connected to the key mapping rather than to a stty interface problem.
    Is there anyone running 10.6.2 with a Japanese keyboard who can type the ¥ key in Romaji mode in Terminal with the UTF-8 encoding enabled, and have a yen sign appear rather than a backslash?
    (This topic was initially posted in the +Installation and Setup+ forum, and I've taken the advice of a kind soul there to repost it in this forum.)

    I don't know the exact reason why ¥ is forcefully converted to \ in Terminal (even in the UTF-8 encoding); it would be better to have an option to turn off this conversion (or there may already be a hidden option which I can't find).
    But the conversion may be helpful for many users, for the following reasons:
    I guess there is no key for backslash on the Japanese keyboard of the MacBook Pro. If this is the case, then being able to input \ by just hitting the ¥ key (instead of typing Option-¥) may be useful for many Terminal users, because \ is used much more frequently than ¥ in programs. Kotoeri has an option to swap the ¥ and Option-¥ keys (so hitting the ¥ key inputs \ and Option-¥ inputs ¥), but this setting is global (i.e., not restricted to Terminal.app), so making it the default would confuse most Japanese users: they don't use Terminal.app at all, but do use ¥ as the currency symbol in other apps. Even Terminal users would use ¥ more frequently than \ in apps other than Terminal, so they don't want to modify the global setting.
    Another reason may be that there are still many Japanese programming textbooks which use ¥ as the escape character (I guess you know why). For example, the first C program looks like: printf("Hello World!¥n"); So many beginners would try to input ¥ as written in the textbook, without knowing that the escape character should be \, not ¥. Converting ¥ to \ may be helpful for these users (of course they would be surprised to see \ rather than ¥ appear on the screen, but the program would work anyway).
    You can send a bug report or feature request at:
    http://www.apple.com/feedback/macosx.html

  • UTF-8 encoding trouble

    I need to use UTF8 encoding throughout a site. For that purpose, I have the following
    tags on JSP:
    <%@ page contentType="text/html; charset=UTF-8" %>
    <meta http-equiv="Content-Type" CONTENT="text/html; charset=UTF-8">
    Next, in my weblogic.xml, I have the following:
    <jsp-param>
         <param-name>encoding</param-name>
         <param-value>UTF8</param-value>
    </jsp-param>
    <charset-params>
    <input-charset>
    <resource-path>*.jsp</resource-path>
    <java-charset-name>UTF8</java-charset-name>
    </input-charset>
    </charset-params>
    Having configured this, I have two simple JSP files. The first one submits a field
    (whose contents I enter in Greek), and the second page writes them to a file. The
    code for writing to a file looks like this:
    FileOutputStream of = new FileOutputStream (fileName, false);
    OutputStreamWriter ow = new OutputStreamWriter (of, "UTF-8");
    ow.write (request.getParameter("test"));
    When I enter the Greek character Alpha as input, the file gets a weird string "+I" in it. To fix the problem, I did the following (and it works):
    String s = request.getParameter("TestName");
    byte[] b = s.getBytes();
    s = new String(b, "UTF-8");
    writeToFile(s);
    Which means that for some reason the page gets the right bytes, but the String seems to have been decoded with the default encoding (not UTF-8). When I convert it back into bytes with the default encoding and create another String from those same bytes using UTF-8, I get a correctly decoded UTF-8 string. Please also note that the same problem occurs with the DB as well (Oracle 8.1.7 with UTF8 on Win2k), and fixing the code as above fixes the problem at both the file and the database level.
    Rather than the above workaround, what's the proper way to accomplish this?
    Thanks,
    Raja
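
    (Editor's sketch of the usual answer to this question: with Servlet 2.3 or later, declare the request body's encoding before the first getParameter() call, and no byte-level workaround is needed. The parameter name and the fileName variable come from the post above; whether this alone suffices on WebLogic of that era is an assumption.)

        <%@ page contentType="text/html; charset=UTF-8" %>
        <%
            // must run before any getParameter() call on this request
            request.setCharacterEncoding("UTF-8");
            String s = request.getParameter("TestName");   // now decoded as UTF-8

            java.io.Writer ow = new java.io.OutputStreamWriter(
                    new java.io.FileOutputStream(fileName, false), "UTF-8");
            ow.write(s);
            ow.close();
        %>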

    In GlassFish I have now changed the following. Under each listener, both for Network Listeners and Protocols, there is an HTTP tab, and under that I have changed this:
    Network Config
    Network Listeners
    http-listeners-1
    http-listeners-2
    admin-listeners
    Protocols
    http-listeners-1
    http-listeners-2
    admin-listeners
    URI Encoding: UTF-8
    Default Response Type: text/plain; charset=UTF-8
    Forced Response Type: text/plain; charset=UTF-8
    So when I run curl in a terminal window I get this response:
    Macintosh:~ jespernyqvist$ curl -I http://neptunediving.com/neptune/index.jsp
    HTTP/1.1 200 OK
    Date: Mon, 17 May 2010 04:14:17 GMT
    Server: GlassFish v3
    X-Powered-By: JSP/2.1
    Content-Type: text/html;charset=UTF-8
    Content-Language: en-US
    Transfer-Encoding: chunked
    Set-Cookie: JSESSIONID=478269c08e050484d1d6fa29fc44; Path=/neptune
    As you can see, my HTTP header is looking good now; no more charset=iso-8859-1. The only problem I have here is that there is no space in text/html;charset=UTF-8. Should it be text/html; charset=UTF-8 instead? I have noticed that these values are very case sensitive, so maybe this is a problem for me?
    At the top of my page I have this:
    <%@page import="com.neptunediving.*"%>
    <%@include file="WEB-INF/include/LangSupport.jsp"%>
    <%@page language="java" contentType="text/html; charset=UTF-8" pageEncoding="UTF-8"%>
    In my header I have this:
    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
    I have changed the preferences for Eclipse to use UTF-8. I have gone through all the properties files in my project and changed them to UTF-8 also. So what else is there to change?
    Still my page is not displayed properly, in any browser: Safari, Firefox, Opera and Internet Explorer alike. So what is wrong with my page, since this doesn't work for me? Can anybody please explain this to me?

  • UTF-8 encoding

    Hi,
    I'm having trouble with parsing XML stored in NCLOB column using UTF-8 encoding.
    Here is what I'm running:
    Windows NT 4.0 Server
    Oracle 8i (8.1.5) EE
    JDeveloper 3.0, JDK 1.1.8
    Oracle XML Parser v2 (2.0.2.5?)
    The following XML sample that I loaded into the database contains two UTF-8 multi-byte characters:
    <?xml version="1.0" encoding="UTF-8"?>
    <G><A>GBotingen, BrC<ck_W</A></G>
    G(0xc2, 0x82)otingen, Br(0xc3, 0xbc)ck_W
    If I'm not mistaken, both multibyte characters are valid UTF-8 encodings and they are defined in ISO-8859-1 as:
    0xC2 LATIN CAPITAL LETTER A WITH CIRCUMFLEX
    0xFC LATIN SMALL LETTER U WITH DIAERESIS
    I wrote a Java stored function that uses the default connection object to connect to the database, runs a Select query, gets the OracleResultSet, calls the getCLOB method and calls the getAsciiStream() method on the CLOB object. Then it executes the following piece of code to get the XML into a DOM object:
    DOMParser parser = new DOMParser();
    parser.setPreserveWhitespace(true);
    parser.parse(istr); // istr getAsciiStream
    XMLDocument xmldoc = parser.getDocument();
    Before the stored function can do other things, this code seems to throw an exception complaining that the above XML contains "Invalid UTF8 encoding".
    Now, when I remove the first mutlibyte character (0xc2, 0x82) from the XML, it parses fine.
    Also, when I do not remove this character but connect via the JDBC oracle:thin driver (note that now I'm not running inside the RDBMS as a stored function anymore), the XML is parsed with no problem and I can do whatever I want with the XMLDocument. Note that I loaded the sample XML into the database using the thin JDBC driver.
    One more thing, I tried two database configurations with WE8ISO8859P1/WE8ISO8859P1 and WE8ISO8859P1/UTF8 and both showed the same problem.
    I'll appreciate any help with this issue. Thanks...

    I inserted the document once using the OCI8 driver and once using the thin driver. Then I used the DBMS_LOB package to look at the individual characters and convert them using the ASCII function.
    It looks like when I inserted the document using the OCI8 driver, they got converted into a pair of 191 (0xbf) characters. However, when I used the thin driver, they ended up being stored as 195 (0xc3) and 130 (0x82).
    So it looks like the OCI8 driver is corrupting the individual characters, and when the characters are not corrupted they cause the following exception to be thrown:
    Error: 440, SQL execution error, ORA-29532: Java call terminated by uncaught Java exception: java.io.UTFDataFormatException: Invalid UTF8 encoding. ORA-06512: at "SYSTEM.GETWITHSTYLE", line 0 ORA-06512: at line 1
    Note that my other example of a multi-byte character (C<) also gets corrupted by the OCI8 driver, but does not cause the above exception if it's inserted via the thin driver.
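
    (Editor's note, hedged: CLOB.getAsciiStream() flattens anything outside 7-bit ASCII, which fits the corruption seen only inside the RDBMS. A sketch of the character-stream alternative, assuming the v2 parser accepts a Reader:)

        import java.io.Reader;
        import java.sql.Clob;
        import oracle.xml.parser.v2.DOMParser;
        import oracle.xml.parser.v2.XMLDocument;

        public class ClobXmlParser {
            public static XMLDocument parse(Clob clob) throws Exception {
                Reader reader = clob.getCharacterStream(); // characters, not ASCII-filtered bytes
                DOMParser parser = new DOMParser();
                parser.setPreserveWhitespace(true);
                parser.parse(reader);
                return parser.getDocument();
            }
        }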

  • UTF-16 parsing problem in XI

    Hi! We are having a problem with XML encoding. An external system A sends us an XML message through HTTP. An example of the XML is:
    <?xml version="1.0" encoding="UTF-16"?>
    <Envelope version="01.00">
    </Envelope>
    System A uses UTF-16 for the encoding. We wrote a Java mapping that manipulates the data, writes it out as UTF-8 and puts UTF-8 in the header; then a message mapping runs. The trace shows that the Java mapping executes successfully, but a ParserException is thrown in the message mapping. This is a copy of the trace log:
    <Trace level="1" type="T">RuntimeException during appliction Java mapping com/sap/xi/tf/_Can_HRXML_to_RFC_Req_</Trace>
    <Trace level="1" type="T">com.sap.aii.utilxi.misc.api.BaseRuntimeException: Fatal Error: com.sap.engine.lib.xml.parser.ParserException: Invalid char #0x0(:main:, row:3, col:0)
      at com.sap.aii.mappingtool.tf3.Transformer.checkParserException(Transformer.java:41)
      at com.sap.aii.mappingtool.tf3.Transformer.start(Transformer.java:79)
      at com.sap.aii.mappingtool.tf3.AMappingProgram.execute(AMappingProgram.java:232)
      at com.sap.aii.ibrun.server.mapping.JavaMapping.executeStep(JavaMapping.java:63)
      at com.sap.aii.ibrun.server.mapping.Mapping.execute(Mapping.java:91)
      at com.sap.aii.ibrun.server.mapping.SequenceMapping.executeStep(SequenceMapping.java:55)
      at com.sap.aii.ibrun.server.mapping.Mapping.execute(Mapping.java:91)
      at com.sap.aii.ibrun.server.mapping.MappingHandler.run(MappingHandler.java:78)
      at com.sap.aii.ibrun.sbeans.mapping.MappingRequestHandler.handleMappingRequest(MappingRequestHandler.java:88)
      at com.sap.aii.ibrun.sbeans.mapping.MappingRequestHandler.handleRequest(MappingRequestHandler.java:63)
      at com.sap.aii.ibrun.sbeans.mapping.MappingServiceImpl.processFunction(MappingServiceImpl.java:79)
      at com.sap.aii.ibrun.sbeans.mapping.MappingServiceObjectImpl0.processFunction(MappingServiceObjectImpl0.java:131)
      at sun.reflect.GeneratedMethodAccessor294.invoke(Unknown Source)
      at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
      at java.lang.reflect.Method.invoke(Method.java:324)
      at com.sap.engine.services.ejb.session.stateless_sp5.ObjectStubProxyImpl.invoke(ObjectStubProxyImpl.java:187)
      at $Proxy42.processFunction(Unknown Source)
      at sun.reflect.GeneratedMethodAccessor568.invoke(Unknown Source)
      at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
      at java.lang.reflect.Method.invoke(Method.java:324)
      at com.sap.engine.services.rfcengine.RFCDefaultRequestHandler.handleRequest(RFCDefaultRequestHandler.java:95)
      at com.sap.engine.services.rfcengine.RFCJCOServer.handleRequestInternal(RFCJCOServer.java:113)
      at com.sap.engine.services.rfcengine.RFCJCOServer$ApplicationRunnable.run(RFCJCOServer.java:171)
      at com.sap.engine.core.thread.impl3.ActionObject.run(ActionObject.java:37)
      at java.security.AccessController.doPrivileged(Native Method)
      at com.sap.engine.core.thread.impl3.SingleThread.execute(SingleThread.java:95)
      at com.sap.engine.core.thread.impl3.SingleThread.run(SingleThread.java:159)</Trace>
    When I run SXMB_MONI to show the processed XML message, the message from system A isn't shown correctly. This is what it says:
    The XML page cannot be displayed
    Switch from current encoding to specified encoding not supported. Error processing resource 'file:///C:/Documents and Setti...
    <?xml version="1.0" encoding="UTF-16"?>
    I tried using an HTTP post tool to post the XML message; if I changed UTF-16 to UTF-8, the message was processed successfully.
    How do we resolve this encoding problem? External system A could change its encoding scheme to UTF-8, but the header is hard-coded and will remain UTF-16 whatever scheme it uses. Any input will be appreciated.
    Julie

    Hi,
    I have a similar problem.
    Input is XML with UTF-8 encoding, output is XML with UTF-8 encoding.
    In spite of that simple situation, the Transformer returns bad output:
    TransformerFactory transformerFactory = TransformerFactory.newInstance();
    Transformer transformer = transformerFactory.newTransformer();
    DOMSource source = new DOMSource(document);
    StreamResult result = new StreamResult(outputStream);
    /* this didn't help:
    Properties properties = transformer.getOutputProperties();
    properties.setProperty(OutputKeys.ENCODING, "UTF-8");
    transformer.setOutputProperties(properties); */
    transformer.transform(source, result);
    // source --> correct in UTF-8
    // result --> after transformation is incorrect in UTF-8
    And the result is:
    An invalid character was found in text content. Error processing resource 'file:///C:/Documents and Settings/rlatta/Local S...
    <?xml version="1.0" encoding="utf-8" ?><SDS_XSD_ZPPM_POB><pob>070</pob><skratkaPobocky>SE</...
    Does anybody know the answer?
    p.s. We recently installed support packages SAP_BASIS 13 and SAP_ABA 13. It worked before that, as far as I know.
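
    A sketch of the usual JAXP fix for the reply above: set the output encoding as a single output property (setOutputProperty is standard JAXP) rather than copying the whole property set.

        import java.io.OutputStream;
        import javax.xml.transform.OutputKeys;
        import javax.xml.transform.Transformer;
        import javax.xml.transform.TransformerException;
        import javax.xml.transform.TransformerFactory;
        import javax.xml.transform.dom.DOMSource;
        import javax.xml.transform.stream.StreamResult;
        import org.w3c.dom.Document;

        public class Utf8Transform {
            public static void write(Document document, OutputStream outputStream) throws TransformerException {
                Transformer transformer = TransformerFactory.newInstance().newTransformer();
                transformer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
                // the XML declaration the transformer writes now matches the actual bytes
                transformer.transform(new DOMSource(document), new StreamResult(outputStream));
            }
        }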

  • Parsing a UTF-8 encoded XML Blob object

    Hi,
    I am having a really strange problem. I am fetching a database BLOB object containing XML and then parsing the XML. The XML contains some UTF-8 encoded characters, and when I read the XML from the BLOB these characters lose their encoding. I have tried several things, but by no means am I able to retain the UTF-8 encoding. The characters causing real problems are mainly double quotes, inverted commas and apostrophes. I am attaching the piece of code below, where you can see some of the things I ended up trying. What else can I try? I am using the JAXP parser, but I don't think changing the parser will help, because I store the XML exactly as I get it from the database, it gets corrupted at this very first stage, and I have to retain the UTF-8 encoding. I tried to get the encoding info from the XML and it reports cp1252 encoding; where did that come into the picture, and how do I get it back to UTF-8?
    The temp.xml here itself gets corrupted. I have spent some 3 days on this issue. Help needed!!!
    ResultSet rs = null;
        Statement stmt = null;
        Connection connection = null;
        InputStream inputStream = null;
        long cifElementId = -1;
        //Blob xmlData = null;
        BLOB xmlData=null;
        String xmlText = null;
        RubricBean rubricBean = null;
        ArrayList arrayBean = new ArrayList();
          rs = stmt.executeQuery(strQuery);
         // Iterate till result set has data
          while (rs.next()) {
            rubricBean = new RubricBean();
            cifElementId = rs.getLong("CIF_ELEMENT_ID");
                    // get xml data which is in Blob format
            xmlData = (oracle.sql.BLOB)rs.getBlob("XML");
            // Read Input stream from blob data
             inputStream =(InputStream)xmlData.getBinaryStream(); 
            // Reading the inputstream of data into an array of bytes.
            byte[] bytes = new byte[(int)xmlData.length()];
             inputStream.read(bytes);  
           // Get the String object from byte array
             xmlText = new String(bytes);
           // xmlText=new String(szTemp.getBytes("UTF-8"));
            //xmlText = convertToUTF(xmlText);
            File file = new File("C:\\temp.xml");
            file.createNewFile();
            // Write to temp file
            java.io.BufferedWriter out = new java.io.BufferedWriter(new java.io.FileWriter(file));
            out.write(xmlText);
            out.close();
          } // end of while loop

    What the code you posted is doing:

        // Read Input stream from blob data
        inputStream = (InputStream) xmlData.getBinaryStream();

    Here you have a stream containing binary octets which encode some text in UTF-8.

        // Reading the inputstream of data into an array of bytes.
        byte[] bytes = new byte[(int) xmlData.length()];
        inputStream.read(bytes);

    Here you are reading between zero and xmlData.length() octets into a byte array. read(byte[]) returns the number of bytes read, which may be less than the size of the array, and you don't check it.

        xmlText = new String(bytes);

    Here you are creating a string from the data in the byte array, using the platform's default character encoding. Since you mention cp1252, I'm guessing your platform is Windows.

        // xmlText = new String(szTemp.getBytes("UTF-8"));

    I don't know what szTemp is, but xmlText = new String(bytes, "UTF-8"); would create a string from the UTF-8 encoded characters; but you don't need to create a string here anyway.

        //xmlText = convertToUTF(xmlText);
        File file = new File("C:\\temp.xml");
        file.createNewFile();
        // Write to temp file
        java.io.BufferedWriter out = new java.io.BufferedWriter(new java.io.FileWriter(file));

    This creates a Writer that writes to the file using the platform's default character encoding, i.e. cp1252.

        out.write(xmlText);

    This writes the string to out using cp1252.
    So you have created a string treating UTF-8 as cp1252, then written that string to a file as cp1252, which is to be read as UTF-8. So it gets mis-decoded twice.
    As the data is already UTF-8 encoded and you want UTF-8 output, just write the binary data to the output file without trying to convert it to a string and back again:

        // not tested, as I don't have your Oracle classes
        final InputStream inputStream = new BufferedInputStream((InputStream) xmlData.getBinaryStream());
        final long length = xmlData.length();
        final int BUFFER_SIZE = 1024;                // these two can be
        final byte[] buffer = new byte[BUFFER_SIZE]; // allocated outside the method
        final OutputStream out = new BufferedOutputStream(new FileOutputStream(file));
        for (long count = 0; count < length; ) {
            final int bytesRead = inputStream.read(buffer, 0, (int) Math.min(BUFFER_SIZE, length - count));
            if (bytesRead < 0) break;                // stream ended early
            out.write(buffer, 0, bytesRead);
            count += bytesRead;
        }
        out.close();

    Pete

  • How to read a UTF-8 encoded text file randomly?

    I am trying to read a text file which has been encoded in UTF-8. The problem is that I need to access the file randomly. RandomAccessFile is a low-level class, and there seems to be no way to wrap it in an InputStreamReader so that UTF-8 decoding can be done on the fly. Is there any easy way to do that? Below is the simplified version of my program.
    import java.io.*;
    public class Test {
            public Test(String filename) {
                    try {
                            RandomAccessFile rafTemIn = new RandomAccessFile(new File(filename), "r");
                            while (true) {
                                    char chr = rafTemIn.readChar();
                                    System.err.println(chr);
                            }
                    } catch (EOFException e) {
                            System.err.println("File read.");
                    } catch (IOException e) {
                            System.err.println("File input error");
                    }
            }
            public static void main(String[] args) {
                    Test t = new Test("template.idx");
            }
    }

    The file that I am going to read could be a few hundred MBs, or even GBs. Hence, I will index interesting items in the file. The index file contains the keyword and the byte offset into the file, so I will need to seek to an arbitrary byte and read from there. The file could be UTF-8 encoded XML or UTF-8 encoded plain text.
    I would also like to add that in the sample program above I am reading the file sequentially. The actual class has another method which does the reading randomly. If this helps, I am pasting the simplified version of the code again, this time including that method.
    import java.io.*;
    public class Test {
            long bloc;
            long eloc;
            RandomAccessFile rafTemIn;
            public Test(String filename) {
                    bloc = 0L;
                    eloc = 0L;
                    try {
                            rafTemIn = new RandomAccessFile(new File(filename), "r");
                            while (true) {
                                    char chr = rafTemIn.readChar();
                                    System.err.println(chr);
                            }
                    } catch (EOFException e) {
                            System.err.println("File read.");
                    } catch (IOException e) {
                            System.err.println("File input error");
                    }
            }
            public String getVal(String templateName) {
                    String stemval = null;
                    try {
                            rafTemIn.seek(bloc); // bloc is a long value for the beginning location to read from. It changes.
                            byte[] b = new byte[(int) (eloc - bloc + 1L)];
                            rafTemIn.read(b, 0, (int) (eloc - bloc + 1L));
                            stemval = new String(b, "UTF-8");
                    } catch (IOException eio) {
                            System.err.println("Template Dump file IO error.");
                    }
                    return stemval;
            }
            public static void main(String[] args) {
                    Test t = new Test("template.idx");
                    System.out.println(t.getVal("wikipedia"));
            }
    }
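
    (Editor's sketch of the piece the thread leaves implicit: RandomAccessFile.readChar() reads two-byte UTF-16 units, not UTF-8, so the sequential loop above prints garbage for UTF-8 data. For the index-based access, seek to the stored byte offset, read the raw slice and decode just that slice as UTF-8. The offsets recorded in the index must fall on character boundaries, since UTF-8 continuation bytes have the form 10xxxxxx.)

        import java.io.*;

        public class IndexedUtf8Reader {
            // begin/end are byte offsets taken from the index file
            public static String readSlice(RandomAccessFile raf, long begin, long end) throws IOException {
                raf.seek(begin);
                byte[] b = new byte[(int) (end - begin + 1L)];
                raf.readFully(b);              // unlike read(), fills the whole array
                return new String(b, "UTF-8"); // decode just this slice
            }
        }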

  • How to save a UTF-8 encoded text file?

    hi People
    I have a little script which reads the source text from a layer and saves it to a .txt file. This is on a Mac, and all was good until recently, when I tried opening the .txt file on a PC in Notepad and found my ˚ degree symbols all whack.
    Resaving the .txt file in TextEdit with Unicode (UTF-8) encoding solved the problem; it now opens fine in Notepad.
    But ideally I'd like the script to output the .txt as UTF-8 in the first place. It's currently Western (Mac OS Roman). I've tried adding in myfile.encoding = "UTF8", but the resulting file is still Western (and the special characters have wigged out again).
    any help greatly appreciated../daniel
    var theComp = app.project.activeItem;
    var dataRO = theComp.layer("dataRO").sourceText;
    // prompt user to save file
    var theFile = new File("~/Desktop/" + theComp.name + "_output.txt");
    theFile = theFile.saveDlg("Save an ASCII export file.");
    if (theFile != null) {          // check user didn't cancel dialog
        theFile.lineFeed = "windows";
        //theFile.encoding = "UTF8";
        theFile.open("w", "TEXT", "????");
        theFile.writeln("move details:");
        theFile.writeln(dataRO.value.toString());
        theFile.close();
    }

    Hi,
    Got it, it seems: the UTF-8 standard uses two-byte (and longer) encodings for accents and special characters.
    I found some info, with some code, at http://ivoronline.com/Coding/Theory/Tutorials/Encoding%20-%20Text%20-%20UTF%208.php
    However, there were some errors, so I fixed it (the continuation-byte mask must be 0x3F, and the UTF-8 BOM is EF BB BF; the 3- and 4-byte branches are untested).
    So here is the code:
    function convertCharToUTF(character) {
        var utfBytes = "";
        var c = character.charCodeAt(0);
        if (c < 0x80) {
            utfBytes = String.fromCharCode(c);
        }
        else if (c < 0x800) {
            utfBytes = String.fromCharCode(0xC0 | c >> 6);
            utfBytes += String.fromCharCode(0x80 | c & 0x3F);
        }
        else if (c < 0x10000) {
            utfBytes = String.fromCharCode(0xE0 | c >> 12);
            utfBytes += String.fromCharCode(0x80 | c >> 6 & 0x3F);
            utfBytes += String.fromCharCode(0x80 | c & 0x3F);
        }
        else if (c < 0x200000) {
            utfBytes = String.fromCharCode(0xF0 | c >> 18);
            utfBytes += String.fromCharCode(0x80 | c >> 12 & 0x3F);
            utfBytes += String.fromCharCode(0x80 | c >> 6 & 0x3F);
            utfBytes += String.fromCharCode(0x80 | c & 0x3F);
        }
        return utfBytes;
    }
    function convertStringToUTF(stringToConvert) {
        var utfString = "";
        for (var i = 0; i < stringToConvert.length; i++) {
            utfString = utfString + convertCharToUTF(stringToConvert.charAt(i));
        }
        return utfString;
    }
    var theFile = new File("~/Desktop/_output.txt");
    theFile.open("w", "TEXT");
    theFile.encoding = "BINARY";
    theFile.linefeed = "Unix";
    theFile.write(""); // or write a UTF-8 BOM first: String.fromCharCode(0xEF) + String.fromCharCode(0xBB) + String.fromCharCode(0xBF)
    theFile.write(convertStringToUTF("Your stuff éàçËôù"));
    theFile.close();
