UTF-8 encoding trouble

I need to use UTF8 encoding throughout a site. For that purpose, I have the following
tags on JSP:
<%@ page contentType="text/html; charset=UTF-8" %>
<meta http-equiv="Content-Type" CONTENT="text/html; charset=UTF-8">
Next, in my weblogic.xml, I have the following:
<jsp-param>
<param-name>encoding</param-name>
<param-value>UTF8</param-value>
</jsp-param>
<charset-params>
<input-charset>
<resource-path>*.jsp</resource-path>
<java-charset-name>UTF8</java-charset-name>
</input-charset>
</charset-params>
Having configured this, I have two simple JSP files. The first one submits a field
(whose contents I enter in Greek), and the second page writes them to a file. The
code for writing to a file looks like this:
FileOutputStream of = new FileOutputStream (fileName, false);
OutputStreamWriter ow = new OutputStreamWriter (of, "UTF-8");
ow.write (request.getParameter("test"));
When I enter the Greek character Alpha as input, the file has a weird string +I in
it. To fix the problem, I did the following (and it works):
String s = request.getParameter ("TestName");
byte b[] = new byte [5000];
b = s.getBytes ();
s = new String (b, "UTF-8");
writeToFile (s);
Which means that for some reason, the page gets the right String, but it seems to
be encoded with default encoding (not UTF8). When I convert it into bytes, and create
another String using the same byte-stream but a different encoding, what I get is
correct UTF-8 encoded string. Please also note that the same problem occurs with
DB as well (Oracle 8.1.7 with UTF8 on Win2k), and fixing the above code fixes problem
at both file and database level.
Rather than the above workaround, what's the proper way to accomplish this?
Thanks,
Raja
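The symptom described above is consistent with the container decoding the UTF-8 request bytes using a single-byte default charset. The sketch below reproduces that effect with the standard library only. It assumes ISO-8859-1 as the wrongly applied charset (the post only says "default encoding"), and the class and method names are made up for illustration. In a servlet or JSP, the usual way to avoid the round trip is to call request.setCharacterEncoding("UTF-8") before the first getParameter call; whether the weblogic.xml settings shown above already do this on the poster's version is not something this sketch can confirm.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; not part of any servlet API.
final class MojibakeDemo {

    // Simulates a container that decodes UTF-8 form bytes as ISO-8859-1 (assumed default).
    static String misdecode(String original) {
        byte[] utf8 = original.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    // The workaround from the post, with explicit charsets: recover the raw bytes,
    // then decode them as UTF-8.
    static String repair(String garbled) {
        byte[] raw = garbled.getBytes(StandardCharsets.ISO_8859_1);
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String alpha = "\u0391";                 // GREEK CAPITAL LETTER ALPHA
        String garbled = misdecode(alpha);       // two chars, one per UTF-8 byte
        System.out.println(garbled.length());
        System.out.println(repair(garbled).equals(alpha));
    }
}
```

The repair step only works when the wrongly applied charset maps every byte to a character, which ISO-8859-1 does; with other defaults some bytes may be lost.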

In GlassFish I have now changed the settings listed below. Each listener, under both Network Listeners and Protocols, has an HTTP tab, and on that tab I changed the following:
Network Config
Network Listeners
http-listeners-1
http-listeners-2
admin-listeners
Protocols
http-listeners-1
http-listeners-2
admin-listeners
URI Encoding: UTF-8
Default Response Type: text/plain; charset=UTF-8
Forced Response Type: text/plain; charset=UTF-8
So when i run curl in a terminal window i get this response:
Macintosh:~ jespernyqvist$ curl -I http://neptunediving.com/neptune/index.jsp
HTTP/1.1 200 OK
Date: Mon, 17 May 2010 04:14:17 GMT
Server: GlassFish v3
X-Powered-By: JSP/2.1
Content-Type: text/html;charset=UTF-8
Content-Language: en-US
Transfer-Encoding: chunked
Set-Cookie: JSESSIONID=478269c08e050484d1d6fa29fc44; Path=/neptune
As you can see, my HTTP header now looks good: no more charset=iso-8859-1. The only oddity is that there is no space in text/html;charset=UTF-8. Should it be text/html; charset=UTF-8 instead? I have noticed that these values are very case sensitive, so could this be a problem for me?
At the top of my page I have this:
<%@page import="com.neptunediving.*"%>
<%@include file="WEB-INF/include/LangSupport.jsp"%>
<%@page language="java" contentType="text/html; charset=UTF-8" pageEncoding="UTF-8"%>
In my HTML head I have this:
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
I have changed the Eclipse preferences to use UTF-8. I have gone through all the properties files in my project and changed them to UTF-8 as well. So what else is there to change?
My page is still not displayed properly, now in all browsers such as Safari, Firefox, Opera and Internet Explorer. So what is wrong with my page, since this doesn't work for me? Can anybody please explain this to me?
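On the question of the missing space: in an HTTP Content-Type header the whitespace after the semicolon is optional, and the charset name is matched without regard to case, so text/html;charset=UTF-8 and text/html; charset=utf-8 mean the same thing. A minimal sketch of a tolerant parser, using only java.nio.charset (the class and method names are hypothetical):

```java
import java.nio.charset.Charset;

// Hypothetical helper: extracts the charset parameter from a Content-Type value.
final class ContentTypeCharset {

    // Returns the declared charset, or null when no charset parameter is present.
    static Charset charsetOf(String contentType) {
        String[] parts = contentType.split(";");
        for (int i = 1; i < parts.length; i++) {        // parts[0] is the media type
            String param = parts[i].trim();             // tolerate optional whitespace
            int eq = param.indexOf('=');
            if (eq > 0 && param.substring(0, eq).trim().equalsIgnoreCase("charset")) {
                String name = param.substring(eq + 1).trim();
                if (name.length() >= 2 && name.startsWith("\"") && name.endsWith("\"")) {
                    name = name.substring(1, name.length() - 1);  // strip quotes
                }
                return Charset.forName(name);           // lookup is case-insensitive
            }
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(charsetOf("text/html;charset=UTF-8"));
        System.out.println(charsetOf("text/html; charset=utf-8"));
    }
}
```

Since the header shown by curl is already correct, the remaining display problem is more likely in how the page content itself is stored or read than in the header formatting.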

Similar Messages

UTF-8 encoding

Hi,
I'm having trouble with parsing XML stored in NCLOB column using UTF-8 encoding.
Here is what I'm running:
Windows NT 4.0 Server
Oracle 8i (8.1.5) EE
JDeveloper 3.0, JDK 1.1.8
Oracle XML Parser v2 (2.0.2.5?)
The following XML sample that I loaded into the database contains two UTF-8 multi-byte characters:
<?xml version="1.0" encoding="UTF-8"?>
<G><A>GBotingen, BrC<ck_W</A></G>
G(0xc2, 0x82)otingen, Br(0xc3, 0xbc)ck_W
If I'm not mistaken, both multibyte characters are valid UTF-8 encodings and they are defined in ISO-8859-1 as:
0xC2 LATIN CAPITAL LETTER A WITH CIRCUMFLEX
0xFC LATIN SMALL LETTER U WITH DIAERESIS
I wrote a Java stored function that uses the default connection object to connect to the database, runs a Select query, gets the OracleResultSet, calls the getCLOB method and calls the getAsciiStream() method on the CLOB object. Then it executes the following piece of code to get the XML into a DOM object:
DOMParser parser = new DOMParser();
parser.setPreserveWhitespace(true);
parser.parse(istr); // istr getAsciiStream
XMLDocument xmldoc = parser.getDocument();
Before the stored function can do anything else, this code throws an exception complaining that the above XML contains "Invalid UTF8 encoding".
Now, when I remove the first multibyte character (0xc2, 0x82) from the XML, it parses fine.
Also, when I do not remove this character but connect via the jdbc:oracle:thin driver (note that now I'm no longer running inside the RDBMS as a stored function), the XML is parsed with no problem and I can do whatever I want with the XMLDocument. Note that I loaded the sample XML into the database using the thin JDBC driver.
One more thing, I tried two database configurations with WE8ISO8859P1/WE8ISO8859P1 and WE8ISO8859P1/UTF8 and both showed the same problem.
I'll appreciate any help with this issue. Thanks...

I inserted the document once using the OCI8 driver and once using the thin driver. Then I used the DBMS_LOB package to look at the individual characters and converted them with the ASCII function.
It looks like when I inserted the document using the OCI8 driver, the characters were converted into a pair of 191 (0xbf) characters. However, when I used the thin driver, they ended up being stored as 195 (0xc3) and 130 (0x82).
So it looks like the OCI8 driver corrupts the individual characters, and if the characters are not corrupted they cause the following exception to be thrown:
Error: 440, SQL execution error, ORA-29532: Java call terminated by uncaught Java exception: java.io.UTFDataFormatException: Invalid UTF8 encoding. ORA-06512: at "SYSTEM.GETWITHSTYLE", line 0 ORA-06512: at line 1
Note that my other example of a multi-byte character (C<) also gets corrupted by the OCI8 driver but does not cause the above exception if it is inserted via the thin driver.
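When chasing an "Invalid UTF8 encoding" error, it can help to check the exact byte sequences outside the database. The following is a minimal sketch of a strict well-formedness check using the standard CharsetDecoder; the class name is hypothetical, and it says nothing about what the Oracle parser or driver does internally.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for checking byte sequences taken from a dump.
final class Utf8Check {

    // Returns true only if the bytes form well-formed UTF-8.
    static boolean isValidUtf8(byte[] bytes) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)       // fail instead of substituting
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isValidUtf8(new byte[] {(byte) 0xC2, (byte) 0x82}));
        System.out.println(isValidUtf8(new byte[] {(byte) 0xC3, (byte) 0xBC}));
        System.out.println(isValidUtf8(new byte[] {(byte) 0xBF, (byte) 0xBF}));
    }
}
```

Both pairs quoted in the question (0xc2 0x82 and 0xc3 0xbc) are well-formed two-byte sequences, while a pair of 0xbf bytes is not, because 0xbf can only appear as a continuation byte.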

Romaji yen sign in Terminal in the UTF-8 encoding

Hello all,
I have a MacBook Pro with a Japanese keyboard running Mac OS X 10.6.2. In Romaji mode, the Japanese keyboard has a dedicated yen sign (¥) key, and Option-¥ produces a backslash (\). In Terminal, for some reason, the ¥ key produces \ without the Option modifier. (Option-¥ also produces \ in Terminal, which is normal behavior.)
A similar situation was discussed in an older topic, http://discussions.apple.com/thread.jspa?messageID=10665836 , where the problem was diagnosed as having the Shift JIS encoding enabled in Terminal. However, this doesn't reflect my situation, since the only encoding that is enabled in my Terminal is UTF-8, and there's certainly a yen sign available in UTF-8.
I am able to type other UTF-8 characters in Terminal in Romaji mode; for example, I can type Option-e e to produce é, and entering the command *echo é | od -x* within Terminal shows that the correct UTF-8 byte sequence is generated for é. Since the command *echo -e '\0302\0245'* within Terminal will produce a yen sign there, the problem seems to be connected to the key mapping rather than to a stty interface problem.
Is there anyone running 10.6.2 with a Japanese keyboard who can type the ¥ key in Romaji mode in Terminal with the UTF-8 encoding enabled, and have a yen sign appear rather than a backslash?
(This topic was initially posted in the Installation and Setup forum, and I've taken the advice of a kind soul there to repost the topic in this forum.)
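As a cross-check of the byte values in the echo command above, this small sketch prints the UTF-8 bytes of a string as octal escapes. The helper name is made up for illustration; the yen sign is U+00A5 and the backslash is U+005C.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper: renders the UTF-8 bytes of a string as \0NNN octal escapes.
final class YenBytes {

    static String octalEscapes(String s) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            sb.append(String.format("\\0%03o", b & 0xFF));  // mask to treat the byte as unsigned
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(octalEscapes("\u00A5"));  // yen sign
        System.out.println(octalEscapes("\\"));      // backslash
    }
}
```

The yen sign comes out as the two bytes used in the echo command, while the backslash is a single ASCII byte, so the two characters are clearly distinct in UTF-8.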

I don't know the exact reason why ¥ is forcefully converted to \ in Terminal (even in UTF-8 encoding), and anyway it would be better to add an option to turn off this conversion (or there may already be a hidden option which I can't find).
But the conversion may be helpful for many users, as expected from the following reasons:
I guess there is no key for backslash on the Japanese keyboard of the MacBook Pro. If this is the case, then being able to input \ by just hitting the ¥ key (instead of typing Option-¥) may be "useful" for many Terminal users (because \ is used much more frequently than ¥ in programs). Kotoeri has an option to swap the ¥ and Option-¥ keys (so hitting the ¥ key inputs \ and Option-¥ inputs ¥), but this setting is global (i.e., not restricted to Terminal.app), so making it the default setting might confuse most Japanese users (they don't use Terminal.app at all, but use ¥ as the currency symbol in other apps). Even Terminal users would use ¥ more frequently than \ in apps other than Terminal, so they would not want to modify the global setting.
Another reason may be that there are still many Japanese programming textbooks which use ¥ as the escape character (I guess you know why). For example, the first C program looks like: printf("Hello World!¥n"); So many beginners would try to input ¥ as written in the textbook, without knowing that the escape character in UTF-8 should be \, not ¥. Converting ¥ to \ may be helpful for these users (of course they would be surprised to see \ rather than ¥ appear on the screen, but the program would work anyway).
You can send a bug report or feature request at:
http://www.apple.com/feedback/macosx.html

XML stream utf-8 encoding

Hi folks,
I'm trying to establish a CSTRING XML stream with utf-8 encoding. I've only managed to do this in XSTRING so far.
If I use this code, I get binary output.
DATA: gt_result TYPE TABLE OF string,
         l_result type string.
constants: encoding type string value 'UTF-8'.
data: g_ixml type ref to if_ixml.
data: g_stream_factory type ref to IF_IXML_STREAM_FACTORY.
data: g_encoding type ref to if_ixml_encoding.
g_ixml = cl_ixml=>create( ).
g_stream_factory = g_ixml->CREATE_STREAM_FACTORY( ).
g_encoding = g_ixml->create_encoding( character_set = 'utf-8'
                                        byte_order = 0 ).
data: resstream type ref to if_ixml_ostream.
resstream = g_stream_factory->create_ostream_cstring( l_result ).
call method resstream->set_encoding
    EXPORTING
      encoding = g_encoding.
* Transform to XML
CALL TRANSFORMATION id_indent
    SOURCE     itab = it_Itab
    RESULT XML resstream.
* Temporary: create the XML file
refresh gt_result.
APPEND l_result TO gt_result.
CALL METHOD cl_gui_frontend_services=>gui_download
    EXPORTING
      filename         = 'c:test.xml'
    CHANGING
      data_tab         = gt_result
    EXCEPTIONS
      file_write_error = 1.
Without this expression :
g_encoding = g_ixml->create_encoding(
             character_set = 'utf-8' byte_order = 0 ).
I get a cstring stream, but in utf-16.
My question now is, how do I manage to get a utf-8 encoded stream in cstring?
Thanks for your help.
Cheers
Daniel

This is the solution:
METHOD TRANSFORM_XML.
TYPE-POOLS TRUXS.
*********************** XML ***********************************
DATA: GT_RESULT TYPE TABLE OF STRING,
         L_RESULT TYPE ETXML_LINE_STR.
CONSTANTS: ENCODING     TYPE STRING VALUE 'UTF-8'.
DATA: G_IXML TYPE REF TO IF_IXML.
DATA: G_STREAM_FACTORY TYPE REF TO IF_IXML_STREAM_FACTORY.
DATA: G_ENCODING TYPE REF TO IF_IXML_ENCODING.
G_IXML = CL_IXML=>CREATE( ).
G_STREAM_FACTORY = G_IXML->CREATE_STREAM_FACTORY( ).
G_ENCODING = G_IXML->CREATE_ENCODING( CHARACTER_SET = ENCODING
                                        BYTE_ORDER = 0 ).
DATA: RESSTREAM TYPE REF TO IF_IXML_OSTREAM.
RESSTREAM = G_STREAM_FACTORY->CREATE_OSTREAM_XSTRING( L_RESULT ).
CALL METHOD RESSTREAM->SET_ENCODING
    EXPORTING
      ENCODING = G_ENCODING.
* Transform to XML
CALL TRANSFORMATION ID_INDENT
    SOURCE     ITAB = IT_ITAB
    RESULT XML RESSTREAM.
* XString to String
CALL FUNCTION 'ECATT_CONV_XSTRING_TO_STRING'
    EXPORTING
      IM_XSTRING = L_RESULT
      IM_ENCODING = 'UTF-8'
    IMPORTING
      EX_STRING   = E_XML.
* Temporary: create the XML file
refresh gt_result.
APPEND e_xml TO gt_result.
CALL METHOD cl_gui_frontend_services=>gui_download
    EXPORTING
      filename         = 'c:test.xml'
    CHANGING
      data_tab         = gt_result
    EXCEPTIONS
      file_write_error = 1.
ENDMETHOD.
How can I give the 10 points to myself?
Cheers
Daniel

How to get UTF-8 encoding when create XML using DBMS_XMLGEN and UTL_FILE ?

Hi,
I generate XML files using DBMS_XMLGEN and write them out with UTL_FILE,
but the XML data file I end up with does not seem to be really UTF-8 encoded
(for example, it does not validate correctly in XMLSpy).
My database is:
NLS_CHARACTERSET          = WE8MSWIN1252
NLS_NCHAR_CHARACTERSET     = AL16UTF16
NLS_RDBMS_VERSION     = 10.2.0.1.0
I generate it like this:
declare
xmldoc CLOB;
ctx number ;
bHandle utl_file.file_type;
begin
-- generate from xml-view :
ctx := DBMS_XMLGEN.newContext('select xml from xml_View');
DBMS_XMLGEN.setRowSetTag(ctx, null);
DBMS_XMLGEN.setRowTag(ctx, null );
DBMS_XMLGEN.SETCONVERTSPECIALCHARS(ctx,TRUE);
-- create xml-file:
xmldoc := DBMS_XMLGEN.getXML(ctx);
-- put data to host-file:
vblob_len := DBMS_LOB.getlength(xmldoc);
DBMS_LOB.READ (xmldoc, vblob_len, 1, vBuffer);
bHandle := utl_file.fopen(vPATH,vFileName,'W',32767);
UTL_FILE.put_line(bHandle, vbuffer, FALSE);
UTL_FILE.fclose(bHandle);
end ;
Maybe UTL_FILE changes the encoding while it writes the file?
How can this be solved?
Thank you
Norbert
Edited by: astramare on Feb 11, 2009 12:39 PM with database charsets

Marco,
I tried working with dbms_xslprocessor.clob2file,
and that works well,
but what about the UTF-8 encoding in this case?
In my understanding, the XMLType created should be UTF8 (16),
but when I open the XML file in XMLSpy as UTF-8,
it is not displayed correctly (German characters like Ä, Ö ...):
My database is:
NLS_CHARACTERSET = WE8MSWIN1252
NLS_NCHAR_CHARACTERSET = AL16UTF16
NLS_RDBMS_VERSION = 10.2.0.1.0
-- test:
create table nh_test ( s0 number, s1 varchar2(20) ) ;
insert into nh_test (select 1,'hallo' from dual );
insert into nh_test (select 2,'straße' from dual );
insert into nh_test (select 3,'mäckie' from dual );
insert into nh_test (select 4,'euro_€' from dual );
commit;
select * from nh_test ;
S0     S1
1     hallo
1     hallo
2     straße
3     mäckie
4     euro_€
declare
rc sys_refcursor;
begin
open rc FOR SELECT * FROM ( SELECT s0,s1 from nh_test );
dbms_xslprocessor.clob2file( xmltype( rc ).getclobval( ) , 'XML_EXPORT_DIR','my_xml_file.xml');
end;
(It is the same when writing the output with DBMS_XMLDOM.WRITETOFILE.)
Opened in XMLSpy, it looks like this:
<?xml version="1.0"?>
<ROWSET>
<ROW>
<S0>1</S0>
<S1>hallo</S1>
</ROW>
<ROW>
<S0>2</S0>
<S1>straޥ</S1>
</ROW>
<ROW>
<S0>3</S0>
<S1>m㢫ie</S1>
</ROW>
<ROW>
<S0>4</S0>
<S1>euro_</S1>
</ROW>
</ROWSET>
regards
Norbert
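The file shown above starts with an XML declaration that has no encoding attribute, so an XML tool will assume UTF-8. If the bytes on disk are actually in the database character set WE8MSWIN1252 (an assumption, the thread does not confirm it), the umlauts and the euro sign will be misread in exactly this way. The sketch below transcodes such bytes to UTF-8 outside the database. The class name is hypothetical, and it relies on the JVM providing the windows-1252 charset, which common JDKs do but the Java specification does not guarantee.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: re-encodes bytes from a known source charset to UTF-8.
final class Transcode {

    static byte[] toUtf8(byte[] input, String sourceCharset) {
        // Decode with the charset the bytes were really written in ...
        String text = new String(input, Charset.forName(sourceCharset));
        // ... then encode the resulting characters as UTF-8.
        return text.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] sharpS = {(byte) 0xDF};                    // "ß" in windows-1252
        byte[] utf8 = toUtf8(sharpS, "windows-1252");
        System.out.println(utf8.length);                  // two bytes in UTF-8
    }
}
```

On the database side, the equivalent step would be to convert the data to UTF-8 before writing, or to write an encoding declaration that matches the bytes actually produced.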

UTF-8 Encoding errors during nightly batch runs

My boss recently tasked me with researching (and hopefully resolving) why our XML frequently has UTF-8 encoding errors.
I've been in the IS world for less than a year now so please bear with me when it comes to terms, data flow, etc.
Overview:
Our Oracle DB writes the XML for the nightly batch runs to a file location, let's say C:\xPression\CustomerData\Certificate.xml. The XML is in the Courier New font, but some characters that aren't supported make their way into it. The big one is the elongated ' - ' character. Just one instance of this and the entire XML fails.
When the batch job is run, there are sometimes encoding errors (¿, ¡, -, etc.), and every morning I have to come in, find the invalid character, fix it and have the job re-run.
I want to know if there's a way so that the XML that comes out is always in the Courier New font, or is there a way to convert it.

I want to know if there's a way so that the XML that comes out is always in the Courier New font, or is there a way to convert it.
First things first: an XML file is a text file; it doesn't have a "font" but an encoding.
The font is the graphical representation of characters, and it is related to whatever client tool you're using to view the content, not to the content itself.
That being said, a lot of fonts do not support the full range of Unicode characters, so you may get replacement characters in some cases.
We're missing some information to provide an answer :
- what's the database version?
- what's the character set of the database?
- how are you generating and writing the XML to the file ? UTL_FILE, dbms_xslprocessor, dbms_xmldom?
If the file is generated using UTF-8 encoding, then the issue might just be that you're not using a UTF-8-enabled editor.

UTF-8 encoding vs ISO 8859-1 encoding

The iTunes tech specs call for UTF-8 encoding of the XML feed file; a friend of mine uses feed generator software through his blog that uses ISO 8859 encoding. Is there a way to convert the latter to UTF-8 so that iTunes tags may be successfully added?
When I tried editing his XML file, I got error messages when I submitted the file to RSS feed validator sites (such as http://feedvalidator.org/). Any help or knowledge is appreciated because I am not the least bit expert in this coding arena.

You don't need to convert iso 8859-1 (us-ascii) to utf-8 unless you have nonstandard characters. Basically, ascii is a subset of utf-8 and for English it will serve you just fine. You can have iTunes tags in the xml file even if the file itself is encoded in iso 8859-1.
The error you see at feedvalidator.org is most likely a warning.
Hope this helps!
- Andy Kim
Potion Factory
http://www.potionfactory.com
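One detail worth making concrete: only the 7-bit ASCII range is byte-for-byte identical in ISO 8859-1 and UTF-8. Any byte of 0x80 or above (accented letters, for example) is encoded differently and does need converting. A minimal sketch, with hypothetical helper names, that checks whether a feed is pure ASCII and converts it otherwise:

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helpers for deciding whether a Latin-1 file needs conversion.
final class Latin1Feed {

    // True when every byte is below 0x80, i.e. the data is plain ASCII
    // and therefore already valid UTF-8.
    static boolean isAsciiOnly(byte[] data) {
        for (byte b : data) {
            if ((b & 0x80) != 0) {
                return false;
            }
        }
        return true;
    }

    // Decodes the bytes as ISO 8859-1 and re-encodes them as UTF-8.
    static byte[] latin1ToUtf8(byte[] data) {
        return new String(data, StandardCharsets.ISO_8859_1)
                .getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] eAcute = {(byte) 0xE9};                        // "é" in ISO 8859-1
        System.out.println(isAsciiOnly(eAcute));
        System.out.println(latin1ToUtf8(eAcute).length);      // two bytes after conversion
    }
}
```

After converting the bytes, the encoding named in the XML declaration of the feed has to be changed to match as well.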

Steps to UTF-8 Encoding with Oracle 8i and Weblogic 6.1SP1

What are the Steps to UTF-8 Encoding with Oracle 8i and Weblogic 6.1SP1?
I have:
- Oracle 8.1.5 database created with character set=UTF8 and national character set=UTF8
- Weblogic 6.1SP1 without any encoding mechanism set
(though I did play with
<jsp-param><param-name>encoding</param-name>
<param-value>UTF-8</param-value>
</jsp-param>
in the weblogic.xml for a while though it seemed not to make a difference)
- JSP pages set to content='text/html; charset=UTF-8'
- JSP form POSTs set to enctype="UTF-8"
I can copy and paste Chinese Kanji from a UTF8 encoded web page into form text boxes but when I post the data it comes back as different Kanji. Then once it is posted the Kanji stays the same on repeated posts. The same Kanji text also looks different when viewed in a form text box than when viewed as straight text on the page.
Is there anything else? Or am I already encoding characters twice?
Please help!
Mel Christie


Hi Experts,
Please correct me if I am asking the question in the wrong way.
I have ArcGIS with an Oracle 10gR2 database on the production server.
My task is to connect AutoCAD (on a client computer connected to the LAN) to ArcGIS in order to access the toposheets available in the SDE user.
When I try to connect, I get this error: The specified credentials are not valid or provider is not able to establish a connection.
I checked the path to the production server by pinging it, and I checked the user/passcode too, but that did not help.
Please help me with this; it is very urgent.
Thanks.
Edited by: user13355644 on Jul 3, 2010 3:53 AM
Edited by: user13355644 on Jul 22, 2011 2:55 AM

How to write csv or txt file through utl_file with UTF-8 Encoding

Hi All,
I need your help to write the data from DB to csv or txt file with UTF-8 encoding through utl_file.
Database character set:AL32UTF8
Database version:10G
All the columns in the DB are of varchar2 type.
Please let me know if there is any way of doing it.

What was wrong with the info provided in the link(s) given?
http://download.oracle.com/docs/cd/B19306_01/server.102/b14200/functions027.htm#SQLRF00620

Parsing a UTF-8 encoded XML Blob object

Hi,
I am having a really strange problem. I am fetching a database BLOB object containing XML documents and then parsing them. The XML contains some UTF-8 encoded characters, and when I read the XML from the BLOB these characters lose their encoding. I have tried several things, but I have not found a way to retain the UTF-8 encoding. The characters causing the real problem are mainly double quotes, inverted commas, and apostrophes. I am attaching the piece of code below, and you can see some of the things I ended up doing. What else can I try? I am using the JAXP parser, but I don't think changing the parser would help, because here I am storing the XML file as I get it from the database, and it gets corrupted at that very first stage, where I have to retain the UTF-8 encoding. I tried to get the encoding info from the XML and it tells me cp1252. Where did that come from? I could not convert it back to UTF-8.
Here the temp.xml itself gets corrupted. I have spent some 3 days on this issue. Help needed!
ResultSet rs = null;
    Statement stmt = null;
    Connection connection = null;
    InputStream inputStream = null;
    long cifElementId = -1;
    //Blob xmlData = null;
    BLOB xmlData=null;
    String xmlText = null;
    RubricBean rubricBean = null;
    ArrayList arrayBean = new ArrayList();
      rs = stmt.executeQuery(strQuery);
     // Iterate till result set has data
      while (rs.next()) {
        rubricBean = new RubricBean();
        cifElementId = rs.getLong("CIF_ELEMENT_ID");
                // get xml data which is in Blob format
        xmlData = (oracle.sql.BLOB)rs.getBlob("XML");
        // Read Input stream from blob data
         inputStream =(InputStream)xmlData.getBinaryStream();
        // Reading the inputstream of data into an array of bytes.
        byte[] bytes = new byte[(int)xmlData.length()];
         inputStream.read(bytes);
       // Get the String object from byte array
         xmlText = new String(bytes);
       // xmlText=new String(szTemp.getBytes("UTF-8"));
        //xmlText = convertToUTF(xmlText);
        File file = new File("C:\\temp.xml");
        file.createNewFile();
        // Write to temp file
        java.io.BufferedWriter out = new java.io.BufferedWriter(new java.io.FileWriter(file));
        out.write(xmlText);
        out.close();

What the code you posted is doing:
// Read Input stream from blob data
inputStream =(InputStream)xmlData.getBinaryStream();
Here you have a stream containing binary octets which encode some text in UTF-8.
// Reading the inputstream of data into an array of bytes.
byte[] bytes = new byte[(int)xmlData.length()];
inputStream.read(bytes);
Here you are reading between zero and xmlData.length() octets into a byte array. read(byte[]) returns the number of bytes read, which may be less than the size of the array, and you don't check it.
xmlText = new String(bytes);
Here you are creating a string with the data in the byte array, using the platform's default character encoding.
Since you mention cp1252, I'm guessing your platform is Windows.
// xmlText=new String(szTemp.getBytes("UTF-8"));
I don't know what szTemp is, but xmlText = new String(bytes, "UTF-8"); would create a string from the UTF-8 encoded characters; but you don't need to create a string here anyway.
//xmlText = convertToUTF(xmlText);
File file = new File("C:\\temp.xml");
file.createNewFile();
// Write to temp file
java.io.BufferedWriter out = new java.io.BufferedWriter(new java.io.FileWriter(file));
This creates a Writer to write to the file using the platform's default character encoding, i.e. cp1252.
out.write(xmlText);
This writes the string to out using cp1252.
So you have created a string treating UTF-8 as cp1252, then written that string to a file as cp1252, which is to be read as UTF-8. So it gets mis-decoded twice.
As the data is already UTF-8 encoded, and you want UTF-8 output, just write the binary data to the output file without trying to convert it to a string and then back again:
// not tested, as I don't have your Oracle classes
final InputStream inputStream = new BufferedInputStream((InputStream)xmlData.getBinaryStream());
final long length = xmlData.length();          // Blob.length() returns a long
final int BUFFER_SIZE = 1024;                  // these two can be
final byte[] buffer = new byte[BUFFER_SIZE];   // allocated outside the method
final OutputStream out = new BufferedOutputStream(new FileOutputStream(file));
for (long count = 0; count < length; ) {
   final int bytesRead = inputStream.read(buffer, 0, (int) Math.min(BUFFER_SIZE, length - count));
   if (bytesRead < 0) break;                   // stream ended earlier than expected
   out.write(buffer, 0, bytesRead);
   count += bytesRead;
}
out.close();
Pete
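The same idea, copying the bytes without ever decoding them, can be tried without the Oracle classes. This is a minimal sketch under that assumption: it takes any InputStream (in the real code it would come from the BLOB's getBinaryStream()) and any OutputStream, and the class name is hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: byte-for-byte copy, so the encoding is never touched.
final class StreamCopy {

    // Copies everything from in to out and returns the number of bytes copied.
    static long copy(InputStream in, OutputStream out) {
        byte[] buffer = new byte[1024];
        long total = 0;
        try {
            int n;
            while ((n = in.read(buffer)) != -1) {   // read may return fewer bytes than requested
                out.write(buffer, 0, n);            // write only what was actually read
                total += n;
            }
            out.flush();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return total;
    }

    public static void main(String[] args) {
        byte[] xml = "<a>\u201Cquoted\u201D</a>".getBytes(StandardCharsets.UTF_8);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        System.out.println(copy(new ByteArrayInputStream(xml), out) == xml.length);
    }
}
```

Because no String is created along the way, the platform default charset never comes into play, and curly quotes and apostrophes survive unchanged.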

[SOLVED] Problems opening folders with UTF-8 encoded characters

Hello everyone, I'm having an issue when I access folders in all my programs (except the Dolphin file manager). Every time I open the folder navigation window in a program, folders with UTF-8 encoded characters in their names (such as "ç", "á", "ó", "í", etc.) are either not shown or shown without these characters, so I cannot open documents inside these folders.
However, as you can see, I can type these characters normally. Here's my "locale.conf":
LANG="en_US.UTF-8:ISO-8859-1"
LC_TIME="pt_BR.UTF-8:ISO-8859-1"
And here's the output of the command "locale -a" :
C
en_US.utf8
POSIX
Last edited by regmoraes (2015-04-17 12:55:19)

Thing is, when I run locale -a, I get
$ locale -a
C
de_DE@euro
de_DE.iso885915@euro
de_DE.utf8
en_US
en_US.iso88591
en_US.utf8
ja_JP
ja_JP.eucjp
ja_JP.ujis
ja_JP.utf8
japanese
japanese.euc
POSIX
So an entry for every locale I have uncommented in my locale.conf. Just making sure, by "following the steps in the beginner's guide", you also mean running locale-gen?
Are those folders on a Linux filesystem like ext4 or on a Windows one (NTFS?)?

UTF-8 encoded JSPs compilation problem

Hi,
I'm using Weblogic 9.0 Beta. I have an XML-format UTF-8 encoded JSP (with the proper encoding declarations). I can see that this is compiled into a UTF-8 Java servlet by WebLogic.
At the compilation to a class file though, the encoding is corrupted. I guess that the Java compiler is assuming a system-encoded (which would be ISO-8859-1) Java file instead of the actual UTF-8 encoding.
This problem did not occur with WebLogic 8.1.
I have tried to explicitly tell the Java compiler to treat the source files as UTF-8 in weblogic.xml, i.e.
<jsp-param>
<param-name>compileFlags</param-name>
<param-value>-encoding UTF8</param-value>
</jsp-param>
but that had no effect.
Anyone else noticed this?
I assume that correct behaviour is for WebLogic to preserve encoding from JSP to servlet to class file, rather than for me to set encoding in weblogic.xml. Is that correct?
Is there a workaround?
Thanks for any help you can offer!

Solved
It is about Tomcat's character encoding, not about the code.
For more info:
http://wiki.apache.org/tomcat/Tomcat/UTF-8

How to read UTF-8 encoded text file randomly?

I am trying to read a text file which has been encoded in UTF-8. The problem is that I need to access the file randomly. RandomAccessFile is a low-level class and there seems to be no way to wrap it in an InputStreamReader so that UTF-8 decoding can be done on the fly. Is there any easy way to do that? Below is the simplified version of my program.
import java.io.*;
public class Test{
        public Test(String filename){
                try{
                        RandomAccessFile rafTemIn = new RandomAccessFile(new File(filename), "r");
                        while(true){
                                // note: readChar() reads two bytes as one UTF-16 code unit, not a UTF-8 character
                                char chr = rafTemIn.readChar();
                                System.err.println(chr);
                        }
                } catch (EOFException e) {
                        System.err.println("File read.");
                } catch (IOException e) {
                        System.err.println("File input error");
                }
        }
        public static void main(String[] args){
                Test t= new Test("template.idx");
        }
}

The file that I am going to read could be few hundreds of MBs or GBs. Hence, I will index interesting items in the file. The index file contain the keyword and the byte offset in the file. So, I will need to seek to any byte to read it. The file could be UTF-8 encoded XML or UTF-8 encoded plain text.
Also, would like to add-up that in the sample program above I am reading the file sequentially. The concerned class has another method which actually does the reading randomly. If this helps, I am pasting the simplified version of code again but this also includes the said method.
import java.io.*;
public class Test{
        long bloc;
        long eloc;
        RandomAccessFile rafTemIn;
        public Test(String filename){
                bloc=0L;
                eloc=0L;
                try{
                        rafTemIn = new RandomAccessFile(new File(filename), "r");
                        while(true){
                                // note: readChar() reads two bytes as one UTF-16 code unit, not a UTF-8 character
                                char chr = rafTemIn.readChar();
                                System.err.println(chr);
                        }
                } catch (EOFException e) {
                        System.err.println("File read.");
                } catch (IOException e) {
                        System.err.println("File input error");
                }
        }
        public String getVal(String templateName){
                String stemval=null;
                try {
                        rafTemIn.seek(bloc); // bloc is a long value for the beginning location to read from. It changes.
                        byte[] b = new byte[(int)(eloc - bloc + 1L)];
                        rafTemIn.readFully(b); // readFully fills the whole array; read() may return fewer bytes
                        stemval = new String(b,"UTF-8");
                } catch(IOException eio) {
                        System.err.println("Template Dump file IO error.");
                }
                return stemval;
        }
        public static void main(String[] args){
                Test t= new Test("template.idx");
                System.out.println(t.getVal("wikipedia"));
        }
}
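Since the index stores byte offsets, one workable approach is to seek to the offset, read the exact byte range, and only then decode it as UTF-8. The sketch below shows that, plus a small helper that moves an offset forward to the next character boundary in case it lands in the middle of a multi-byte sequence. The class and method names are made up for illustration and are not part of the original program.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for random access into a UTF-8 file by byte offset.
final class Utf8RangeReader {

    // Reads the bytes from begin to end (both inclusive) and decodes them as UTF-8.
    static String readRange(RandomAccessFile raf, long begin, long end) throws IOException {
        byte[] buf = new byte[(int) (end - begin + 1)];
        raf.seek(begin);
        raf.readFully(buf);                      // fills the whole buffer or throws EOFException
        return new String(buf, StandardCharsets.UTF_8);
    }

    // Advances offset past any UTF-8 continuation bytes (bit pattern 10xxxxxx),
    // so that the returned offset points at the first byte of a character.
    static long alignToCharStart(RandomAccessFile raf, long offset) throws IOException {
        long length = raf.length();
        while (offset < length) {
            raf.seek(offset);
            int b = raf.read();
            if ((b & 0xC0) != 0x80) {
                break;                           // lead byte or ASCII: a character starts here
            }
            offset++;
        }
        return offset;
    }
}
```

A call such as Utf8RangeReader.readRange(raf, bloc, eloc) would take the place of the body of getVal. This only works if the offsets in the index were recorded as byte positions at character boundaries, which is why the alignment helper may be useful when they were not.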

How to save a UTF-8 encoded text file ?

hi People
I have a little script which reads the source text from a layer and saves it to a .txt file. This is on a Mac, and all was good until recently, when I tried opening the .txt file on a PC in Notepad and found my ˚ degree symbols all whack.
Resaving the .txt file in TextEdit with Unicode (UTF-8) encoding solved the problem; it now opens fine in Notepad.
But ideally I'd like the script to output the .txt as UTF-8 in the first place. It's currently Western (Mac OS Roman). I've tried adding myfile.encoding = "UTF8", but the resulting file is still Western (and the special characters have wigged out again).
Any help greatly appreciated../daniel
    var theComp = app.project.activeItem;
    var dataRO = theComp.layer("dataRO").sourceText;
    // prompt user to save file
    var theFile = new File ("~/Desktop/"+ theComp.name + "_output.txt");
    theFile = theFile.saveDlg("Save an ASCII export file.");
    if (theFile != null) {          // check user didn't cancel dialog
        theFile.lineFeed = "windows";
        //theFile.encoding = "UTF8";
        theFile.open("w","TEXT","????");
        theFile.writeln("move details:");
        theFile.writeln(dataRO.value.toString());
        theFile.close();
    }

Hi,
Got it. It seems the UTF-8 standard uses encodings of two or more bytes for accented and special characters.
I found some info there with some code http://ivoronline.com/Coding/Theory/Tutorials/Encoding%20-%20Text%20-%20UTF%208.php
However, there was an error, so I fixed it. (I didn't test it for 3- and 4-byte characters, though, so maybe you'll have to change the 0xbf back to 0x3f or something else.)
So here is the code.
function convertCharToUTF(character){
    var utfBytes = "";
    var c = character.charCodeAt(0);
    if (c < 0x80) {
        utfBytes = String.fromCharCode (c);
    }
    else if (c < 0x800) {
        utfBytes = String.fromCharCode (0xC0 | c>>6);
        utfBytes += String.fromCharCode (0x80 | c & 0xbF);
    }
    else if (c < 0x10000) {
        utfBytes = String.fromCharCode (0xE0 | c>>12);
        utfBytes += String.fromCharCode (0x80 | c>>6 & 0xbF);
        utfBytes += String.fromCharCode (0x80 | c & 0xbF);
    }
    else if (c < 0x200000) {
        utfBytes += String.fromCharCode (0xF0 | c>>18);
        utfBytes += String.fromCharCode (0x80 | c>>12 & 0xbF);
        utfBytes += String.fromCharCode (0x80 | c>>6 & 0xbF);
        utfBytes += String.fromCharCode (0x80 | c & 0xbF);
    }
    return utfBytes;
}
function convertStringToUTF(stringToConvert){
    var utfString = "";
    for (var i = 0 ; i < stringToConvert.length; i++){
        utfString = utfString + convertCharToUTF(stringToConvert.charAt (i));
    }
    return utfString;
}
var theFile= new File("~/Desktop/_output.txt");
theFile.open("w", "TEXT");
theFile.encoding = "BINARY";
theFile.lineFeed = "Unix";
theFile.write("ï»¿");//or theFile.write(String.fromCharCode (0xEF) + String.fromCharCode (0xBB) + String.fromCharCode (0xBF))
theFile.write(convertStringToUTF("Your stuff éàçËôù"));
theFile.close();
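For comparison, here is the same bit arithmetic as a Java sketch that can be checked against the built-in encoder. It is an illustration rather than a translation of the script above: it works on whole code points (so characters outside the Basic Multilingual Plane reach the four-byte branch), it uses the conventional 0x3F mask for continuation bytes, and the class name is hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper: hand-rolled UTF-8 encoder for well-formed strings.
final class ManualUtf8 {

    static byte[] encode(String s) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);               // combines surrogate pairs into one code point
            i += Character.charCount(cp);
            if (cp < 0x80) {                         // 1 byte: 0xxxxxxx
                out.write(cp);
            } else if (cp < 0x800) {                 // 2 bytes: 110xxxxx 10xxxxxx
                out.write(0xC0 | (cp >> 6));
                out.write(0x80 | (cp & 0x3F));
            } else if (cp < 0x10000) {               // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
                out.write(0xE0 | (cp >> 12));
                out.write(0x80 | ((cp >> 6) & 0x3F));
                out.write(0x80 | (cp & 0x3F));
            } else {                                 // 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
                out.write(0xF0 | (cp >> 18));
                out.write(0x80 | ((cp >> 12) & 0x3F));
                out.write(0x80 | ((cp >> 6) & 0x3F));
                out.write(0x80 | (cp & 0x3F));
            }
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        String s = "Your stuff \u00e9\u00e0\u00e7\u00cb\u00f4\u00f9";
        System.out.println(Arrays.equals(encode(s), s.getBytes(StandardCharsets.UTF_8)));
    }
}
```

In the script above the 0xbF mask gives the same result as 0x3F, because the extra bit it lets through is the one that the 0x80 term sets anyway.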

Mail adapter and UTF-7 encoded messages

Hi,
a customer of ours wants to know whether it is possible to receive UTF-7 encoded messages using the Mail adapter.
Is there a configurable parameter for this, or is the only solution to change the Mail adapter code? If so, are there examples available?
Thanks!

Tamil,
The best thing would be to create alerts for both mapping and adapter errors. If there is a mapping failure, an email will be sent to you. With adapter alerts, if there are any errors in the adapters, you will get a mail as well. You don't need a fault message for it.
Check these weblogs for creating alerts:
/people/michal.krawczyk2/blog/2005/09/09/xi-alerts--step-by-step
/people/michal.krawczyk2/blog/2005/09/09/xi-alerts--troubleshooting-guide
---Satish
