UTF-8 encoding and BOM

I'm reading in a file that's encoded in UTF-8 and begins with the byte-order mark EF BB BF. I'm curious why a byte-order mark is needed for something encoded in UTF-8: some tutorials I've seen say BOMs are only used to determine endianness, which isn't an issue with UTF-8. But then again, UTF-8 can use multiple bytes to encode a character, in which case endianness does matter, right?
So my question is: does endianness matter with UTF-8?
Also, at the online converter at http://macchiato.com/unicode/convert.html, why can you convert from FE FF in UTF-16 to EF BB BF in UTF-8, but not the other way around? Is that a problem with the converter, or something to do with the encodings themselves?
Thanks.

http://www.unicode.org/faq/utf_bom.html#25
Short answer: no, endianness doesn't matter with UTF-8. The order of the bytes within each multi-byte sequence is fixed by the definition of the encoding, so a given character always produces the same byte sequence on every platform; the BOM in UTF-8 is merely an optional signature, not a byte-order indicator (see the FAQ entry above).
As for that converter, it's unlikely that anyone here knows exactly what's wrong with it. It may simply not expect UTF-8 to carry a BOM (one isn't needed there anyway), and so instead of treating EF BB BF as a signature it converts those bytes as if they were regular characters.
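A quick way to see the endianness point concretely (a standalone sketch of mine, not from the original thread):

import java.nio.charset.StandardCharsets;

// UTF-8 fixes the order of the bytes inside each multi-byte sequence as
// part of the encoding itself, so this prints the same on any machine,
// big-endian or little-endian.
public class Utf8Order {
    public static void main(String[] args) {
        for (byte b : "é".getBytes(StandardCharsets.UTF_8)) {
            System.out.printf("%02X ", b);   // always prints: C3 A9
        }
        System.out.println();
    }
}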

Similar Messages

  • Problems with RFC Adapter, utf-8 encoding and special characters

    Hi,
    How can I change the encoding from UTF-8 to ISO-8859-1 in my RFC adapter sender?
    Regards,
    Sérgio

    Hi,
    To change the encoding of the XML you can either use this piece of XSLT after your mapping, as a next step:
    <xsl:stylesheet version="1.0"
        xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
      <xsl:output method="xml" encoding="ISO-8859-1"/>
      <xsl:template match="/">
        <xsl:copy-of select="*" />
      </xsl:template>
    </xsl:stylesheet>
    or use an adapter module, though I can't remember its name right now.
    Best regards,
    Wojciech

  • Resource bundle UTF-8 encoding and Turkish special characters problem

    Hi dear developers,
    I'm developing a WebCenter Portal App in JDev 11.1.1.6.0 version. My project uses resource bundle for being multilingual. I have two bundles :
    1) <bundleName>_tr.properties (this the default one)
    2) <bundleName>_en.properties (this is the supported locale.)
    Now I have a problem with the Turkish characters in the Turkish bundle. When I run my project, it looks just like this --> http://postimage.org/image/wr8nrm345/ (browser view) and http://postimage.org/image/3mf3fp2kl/ (bundle view)
    (Watch the "?" chars!)
    You can see the bundle doesn't support Turkish special characters. How can I overcome this problem?
    Thanks in advance. Regards,
    erdo
    Edited by: erdo on 09.Oca.2013 17:24
    Edited by: erdo on 09.Oca.2013 17:25

    If you want to edit Unicode text resources you need a Unicode-aware text editor.
    You can search the Internet, but I had a good experience with Notepad++
    You may also ask in the JDeveloper forum ( JDeveloper and ADF ) if there is a way to resolve it. I found this: http://blog.newtrics.com/?p=242 , so I believe it may be possible somehow.
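    One likely cause (an assumption on my part, since the thread doesn't say which Java version is involved): ResourceBundle reads .properties files as ISO-8859-1 up to Java 8, so Turkish characters saved as raw UTF-8 bytes get mangled. Either run the files through native2ascii so they contain only \uXXXX escapes, or load them through a custom Control; a minimal sketch:
    import java.io.IOException;
    import java.io.InputStream;
    import java.io.InputStreamReader;
    import java.nio.charset.StandardCharsets;
    import java.util.Locale;
    import java.util.PropertyResourceBundle;
    import java.util.ResourceBundle;

    // A minimal sketch: read the .properties bytes as UTF-8 instead of the
    // ISO-8859-1 default that ResourceBundle assumes up to Java 8.
    public class Utf8Control extends ResourceBundle.Control {
        @Override
        public ResourceBundle newBundle(String baseName, Locale locale, String format,
                                        ClassLoader loader, boolean reload) throws IOException {
            String resource = toResourceName(toBundleName(baseName, locale), "properties");
            try (InputStream in = loader.getResourceAsStream(resource)) {
                if (in == null) return null;
                return new PropertyResourceBundle(new InputStreamReader(in, StandardCharsets.UTF_8));
            }
        }
    }
    Bundles would then be fetched with ResourceBundle.getBundle("<bundleName>", new Utf8Control()).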

  • Romaji yen sign in Terminal in the UTF-8 encoding

    Hello all,
    I have a MacBook Pro with a Japanese keyboard running Mac OS X 10.6.2. In Romaji mode, the Japanese keyboard has a dedicated yen sign (¥) key, and Option-¥ produces a backslash (\). In Terminal, for some reason, the ¥ key produces \ without the Option modifier. (Option-¥ also produces \ in Terminal, which is normal behavior.)
    A similar situation was discussed in an older topic, http://discussions.apple.com/thread.jspa?messageID=10665836 , where the problem was diagnosed as having the Shift JIS encoding enabled in Terminal. However, this doesn't reflect my situation, since the only encoding enabled in my Terminal is UTF-8 – and there's certainly a yen sign available in UTF-8.
    I am able to type other UTF-8 characters in Terminal in Romaji mode; for example, I can type Option-e e to produce é, and entering the command *echo é | od -x* within Terminal shows that the correct UTF-8 byte sequence is generated for é. Since the command *echo -e '\0302\0245'* within Terminal will produce a yen sign there, the problem seems to be connected to the key mapping rather than to a stty interface problem.
    Is there anyone running 10.6.2 with a Japanese keyboard who can type the ¥ key in Romaji mode in Terminal with the UTF-8 encoding enabled, and have a yen sign appear rather than a backslash?
    (This topic was initially posted in the +Installation and Setup+ forum, and I've taken the advice of a kind soul there to repost the topic in this forum.)

    I don't know the exact reason why ¥ is forcefully converted to \ in Terminal (even in UTF-8 encoding), and anyway it would be better to add an option to turn off this conversion (or there may already be a hidden option which I can't find).
    But the conversion may be helpful for many users, for the following reasons:
    I guess there is no dedicated key for backslash on the Japanese keyboard of the MacBook Pro. If that's the case, being able to input \ by just hitting the ¥ key (instead of typing Option-¥) may be "useful" for many Terminal users, because \ is used much more frequently than ¥ in programs. Kotoeri has an option to swap the ¥ and Option-¥ keys (so hitting the ¥ key inputs \ and Option-¥ inputs ¥), but this setting is global (i.e., not restricted to Terminal.app), and making it the default would confuse most Japanese users, who don't use Terminal.app at all but do use ¥ as the currency symbol in other apps. Even Terminal users would use ¥ more frequently than \ in apps other than Terminal, so they wouldn't want to modify the global setting.
    Another reason may be that there are still many Japanese programming textbooks that use ¥ as the escape character (I guess you know why). For example, the first C program looks like: printf("Hello World!¥n"); So many beginners would try to input ¥ as written in the textbook, without knowing that the escape character should be \, not ¥. Converting ¥ to \ may be helpful for these users (of course they would be surprised to see \ rather than ¥ appear on the screen, but at least the program would work).
    You can send a bug report or feature request at:
    http://www.apple.com/feedback/macosx.html

  • Parsing a UTF-8 encoded XML Blob object

    Hi,
    I am having a really strange problem. I am fetching a database BLOB object containing XML and then parsing the XML. The XML contains some UTF-8 encoded characters, and when I read the XML from the BLOB, these characters lose their encoding. I have tried several things, but by no means am I able to retain the UTF-8 encoding. The characters causing real problems are mainly double quotes, inverted commas, and apostrophes. I am attaching the piece of code below, and you can see certain things I ended up doing. What else can I try? I am using the JAXP parser, but I don't think changing the parser will help, because I am storing the XML exactly as I get it from the database and it gets corrupted at that very first stage; I have to retain the UTF-8 encoding. I tried to get the encoding info from the XML and it reports cp1252 encoding. Where did that come into the picture, and how do I get back to UTF-8?
    Even the temp.xml itself gets corrupted here. I have spent some 3 days on this issue. Help needed!!!
    ResultSet rs = null;
        Statement stmt = null;
        Connection connection = null;
        InputStream inputStream = null;
        long cifElementId = -1;
        //Blob xmlData = null;
        BLOB xmlData=null;
        String xmlText = null;
        RubricBean rubricBean = null;
        ArrayList arrayBean = new ArrayList();
          rs = stmt.executeQuery(strQuery);
         // Iterate till result set has data
          while (rs.next()) {
            rubricBean = new RubricBean();
            cifElementId = rs.getLong("CIF_ELEMENT_ID");
                    // get xml data which is in Blob format
            xmlData = (oracle.sql.BLOB)rs.getBlob("XML");
            // Read Input stream from blob data
             inputStream =(InputStream)xmlData.getBinaryStream(); 
            // Reading the inputstream of data into an array of bytes.
            byte[] bytes = new byte[(int)xmlData.length()];
             inputStream.read(bytes);  
           // Get the String object from byte array
             xmlText = new String(bytes);
           // xmlText=new String(szTemp.getBytes("UTF-8"));
            //xmlText = convertToUTF(xmlText);
            File file = new File("C:\\temp.xml");
            file.createNewFile();
            // Write to temp file
            java.io.BufferedWriter out = new java.io.BufferedWriter(new java.io.FileWriter(file));
            out.write(xmlText);
            out.close();

    What the code you posted is doing:

    // Read Input stream from blob data
    inputStream =(InputStream)xmlData.getBinaryStream();

    Here you have a stream containing binary octets which encode some text in UTF-8.

    // Reading the inputstream of data into an array of bytes.
    byte[] bytes = new byte[(int)xmlData.length()];
    inputStream.read(bytes);

    Here you are reading between zero and xmlData.length() octets into a byte array. read(byte[]) returns the number of bytes read, which may be less than the size of the array, and you don't check it.

    xmlText = new String(bytes);

    Here you are creating a string from the data in the byte array, using the platform's default character encoding. Since you mention cp1252, I'm guessing your platform is Windows.

    // xmlText=new String(szTemp.getBytes("UTF-8"));

    I don't know what szTemp is, but xmlText = new String(bytes, "UTF-8"); would create a string from the UTF-8 encoded characters; you don't need to create a string here anyway, though.

    //xmlText = convertToUTF(xmlText);
    File file = new File("C:\\temp.xml");
    file.createNewFile();
    // Write to temp file
    java.io.BufferedWriter out = new java.io.BufferedWriter(new java.io.FileWriter(file));

    This creates a Writer that writes to the file using the platform's default character encoding, i.e. cp1252.

    out.write(xmlText);

    This writes the string to out using cp1252.

    So you have created a string by treating UTF-8 data as cp1252, then written that string to a file as cp1252, which will in turn be read as UTF-8. It gets mis-decoded twice.
    As the data is already UTF-8 encoded, and you want UTF-8 output, just write the binary data to the output file without trying to convert it to a string and back again:

    // not tested, as I don't have your Oracle classes
    final InputStream inputStream = new BufferedInputStream((InputStream) xmlData.getBinaryStream());
    final int length = (int) xmlData.length();     // BLOB.length() returns a long
    final int BUFFER_SIZE = 1024;                  // these two can be
    final byte[] buffer = new byte[BUFFER_SIZE];   // allocated outside the method
    final OutputStream out = new BufferedOutputStream(new FileOutputStream(file));
    for (int count = 0; count < length; ) {
        final int bytesRead = inputStream.read(buffer, 0, Math.min(BUFFER_SIZE, length - count));
        if (bytesRead < 0) break;                  // stream ended early
        out.write(buffer, 0, bytesRead);
        count += bytesRead;
    }
    out.close();
    inputStream.close();

    Pete
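    (An aside, hedged because it needs a newer JDK than this thread used: from Java 9 on, InputStream.transferTo does the whole buffered copy loop in one call.)
    import java.io.FileOutputStream;
    import java.io.IOException;
    import java.io.InputStream;
    import java.io.OutputStream;
    import java.sql.Blob;
    import java.sql.SQLException;

    // Java 9+ sketch: copy the BLOB bytes straight to the file, with no
    // String round-trip; transferTo() handles the read/write loop itself.
    public class BlobToFile {
        static void dump(Blob xmlData, String path) throws IOException, SQLException {
            try (InputStream in = xmlData.getBinaryStream();
                 OutputStream out = new FileOutputStream(path)) {
                in.transferTo(out);
            }
        }
    }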

  • How to get UTF-8 encoding when create XML using DBMS_XMLGEN and UTL_FILE ?

    How to get UTF-8 encoding when create XML using DBMS_XMLGEN and UTL_FILE ?
    Hi,
    I do generate XML-Files by using DBMS_XMLGEN with output by UTL_FILE
    but it seems the XML data file I get in the end is not really UTF-8 encoded
    (for example, it cannot be verified correctly in XMLSpy).
    my dbms is
    NLS_CHARACTERSET          = WE8MSWIN1252
    NLS_NCHAR_CHARACTERSET     = AL16UTF16
    NLS_RDBMS_VERSION     = 10.2.0.1.0
    I do generate it in this matter :
    declare
      xmldoc    CLOB;
      ctx       number;
      bHandle   utl_file.file_type;
      vBuffer   varchar2(32767);
      vblob_len number;
      vPATH     varchar2(128);   -- directory name, elided in the original post
      vFileName varchar2(128);   -- file name, elided in the original post
    begin
      -- generate from xml-view:
      ctx := DBMS_XMLGEN.newContext('select xml from xml_View');
      DBMS_XMLGEN.setRowSetTag(ctx, null);
      DBMS_XMLGEN.setRowTag(ctx, null);
      DBMS_XMLGEN.SETCONVERTSPECIALCHARS(ctx, TRUE);
      -- create xml-file:
      xmldoc := DBMS_XMLGEN.getXML(ctx);
      -- put data to host-file:
      vblob_len := DBMS_LOB.getlength(xmldoc);
      DBMS_LOB.READ(xmldoc, vblob_len, 1, vBuffer);
      bHandle := utl_file.fopen(vPATH, vFileName, 'W', 32767);
      UTL_FILE.put_line(bHandle, vBuffer, FALSE);
      UTL_FILE.fclose(bHandle);
    end;
    Maybe the encoding gets changed while UTL_FILE writes the file?
    How can this be solved?
    Thank you
    Norbert
    Edited by: astramare on Feb 11, 2009 12:39 PM with database charsets

    Marco,
    I tried to work with dbms_xslprocessor.clob2file,
    which works well,
    but what about the UTF-8 encoding in this case?
    In my understanding, the XMLType created should be UTF-8 (or UTF-16),
    but when I open the XML file in XMLSpy as UTF-8,
    it is not right (German characters like Ä, Ö ...):
    my dbms is
    NLS_CHARACTERSET = WE8MSWIN1252
    NLS_NCHAR_CHARACTERSET = AL16UTF16
    NLS_RDBMS_VERSION = 10.2.0.1.0
    -- test:
    create table nh_test ( s0 number, s1 varchar2(20) ) ;
    insert into nh_test (select 1,'hallo' from dual );
    insert into nh_test (select 2,'straße' from dual );
    insert into nh_test (select 3,'mäckie' from dual );
    insert into nh_test (select 4,'euro_€' from dual );
    commit;
    select * from nh_test ;
    S0     S1
    1     hallo
    2     straße
    3     mäckie
    4     euro_€
    declare
    rc sys_refcursor;
    begin
    open rc FOR SELECT * FROM ( SELECT s0,s1 from nh_test );
    dbms_xslprocessor.clob2file( xmltype( rc ).getclobval( ) , 'XML_EXPORT_DIR','my_xml_file.xml');
    end;
    ( its the same when using output with DBMS_XMLDOM.WRITETOFILE )
    open in xmlSpy is:
    <?xml version="1.0"?>
    <ROWSET>
    <ROW>
    <S0>1</S0>
    <S1>hallo</S1>
    </ROW>
    <ROW>
    <S0>2</S0>
    <S1>straޥ</S1>
    </ROW>
    <ROW>
    <S0>3</S0>
    <S1>m㢫ie</S1>
    </ROW>
    <ROW>
    <S0>4</S0>
    <S1>euro_€</S1>
    </ROW>
    </ROWSET>
    regards
    Norbert

  • Steps to UTF-8 Encoding with Oracle 8i and Weblogic 6.1SP1

    What are the steps to UTF-8 encoding with Oracle 8i and Weblogic 6.1SP1?
    I have:
    - Oracle 8.1.5 database created with character set=UTF8 and national character set=UTF8
    - Weblogic 6.1SP1 without any encoding mechanism set (though I did play with
    <jsp-param><param-name>encoding</param-name>
    <param-value>UTF-8</param-value>
    </jsp-param>
    in the weblogic.xml for a while, though it seemed not to make a difference)
    - JSP pages set to content='text/html; charset=UTF-8'
    - JSP form POSTs set to enctype="UTF-8"
    I can copy and paste Chinese Kanji from a UTF8 encoded web page into form text boxes, but when I post the data it comes back as different Kanji. Once posted, the Kanji stays the same on repeated posts. The same Kanji text also looks different when viewed in a form text box than when viewed as straight text on the page.
    Is there anything else? Or am I already encoding characters twice?
    Please help!
    Mel Christie

    Hi Experts,
    Please correct me if I am asking the question in the wrong way.
    I have ARCGIS with an Oracle Database 10gR2 production server.
    My task is to connect AUTOCAD software (on a client computer connected over the LAN) to ARCGIS in order to access the toposheets available in the SDE user.
    When I try to connect I get this error: The specified credentials are not valid or the provider is not able to establish a connection.
    I checked the path to the production server by pinging, and the user/passcode too, but that didn't help.
    Please help me with this, it's very urgent.
    Thanks.
    Edited by: user13355644 on Jul 3, 2010 3:53 AM
    Edited by: user13355644 on Jul 22, 2011 2:55 AM

  • Mail adapter and UTF-7 encoded messages

    Hi,
    a customer of us wants to know if it is possible to receive UTF-7 encoded messages using the Mail adapter.
    Is there a configurable parameter to do this? Or is the only solution to change the Mail adapter code? If so, are there examples available?
    Thanks!

    Tamil,
    The best thing would be to create alerts for both mapping and adapter errors. If there is any mapping failure, an email will be sent to you. With adapter alerts, if there are any errors on the adapters, it will send you a mail. You don't need a fault message for it.
    Check these weblogs for creating alerts:
    /people/michal.krawczyk2/blog/2005/09/09/xi-alerts--step-by-step
    /people/michal.krawczyk2/blog/2005/09/09/xi-alerts--troubleshooting-guide
    ---Satish

  • FCC with ASCII and UTF-8 encoding issue

    Hi,
    I have a File-to-IDoc scenario and I am doing FCC on a file that contains Japanese chars (PO with HEADER,1,ITEMS,*).
    I have specified UTF-8 encoding in the file adapter to process the file.
    Earlier, my source file was in ASCII format, which had junk chars; my file was picked up and the IDoc posted had junk chars.
    Then I used UTF-8 encoding for my source file to correct this issue. XI now showed proper Japanese chars, but this time the Header part is missing.
    Do I have to specify the encoding in a "module" for the File adapter?
    Regards,

    Thanks for your replies Chirag/Gabriel,
    ISO encoding didn't work.
    My source file will be in UTF-8 format.
    One correction: it is ANSI encoding, not ASCII as in the subject.
    I still have this issue when my document offset is 0.
    I tried to play around with FCC and found this odd thing.
    When the first line of my input file is blank and I omit reading it with offset 1, the file is read in its entirety.
    But when I remove this blank line, so the file starts with the Header, and use offset 0 in the File adapter, my Header part is missing again.
    What to do?
    Regards,
    AV

  • [svn:fx-trunk] 7661: Change from charset="iso-8859-1" to charset="utf-8" and save file with utf-8 encoding.

    Revision: 7661
    Author:   [email protected]
    Date:     2009-06-08 17:50:12 -0700 (Mon, 08 Jun 2009)
    Log Message:
    Change from charset="iso-8859-1" to charset="utf-8" and save file with utf-8 encoding.
    QA Notes:
    Doc Notes:
    Bugs: SDK-21636
    Reviewers: Corey
    Ticket Links:
        http://bugs.adobe.com/jira/browse/SDK-21636
    Modified Paths:
        flex/sdk/trunk/templates/swfobject/index.template.html

    same problem here with wl8.1
    have you solved it, and if yes, how?
    thanks

  • Export SQL View to Flat File with UTF-8 Encoding

    I've setup a package in SSIS to export a SQL view to a flat file and it's working fine.  I now need to make that flat file UTF-8 encoded.  The package executes but still shows the files as ANSI encoded.
    My package consists of a Source (SQL View) -> Derived Column (casts the fields to DT_WSTR) -> Destination Flat File (Set to output UTF-8 file).
    I don't get any errors to help me troubleshoot further.  I'm running SQL Server 2005 SP2.

    Unless there is a Byte-Order-Marker (BOM - hex file prefix: EF BB BF) at the beginning of the file, and unless your data contains non-ASCII characters, I'm unsure there is a technical difference in the files, Paul.
    That is, even if the file is "encoded" UTF-8, if your data is only ASCII values (decimal values 0-127, hex 00-7F), UTF-8 doesn't really serve a purpose over ANSI encoding.  Now if you're looking for UTF-8 with specifically the BOM included, and your data is all standard ASCII, the Flat File Connection Manager can't do that, it seems.
    What the flat file connection manager is doing correctly though, is encoding values that are over decimal 127/hex 7F in UTF-8 when the encoding of the connection manager is set to 65001 (UTF-8).
    Example:
    Input data built with a script component as a source (code at the bottom of this post) and with only one WSTR output column hooked to a flat file destination component:
    a string containing only the character with decimal value 225 (Latin small letter a with acute - á)
    Encoding set to ANSI 1252 looks like:
    E1 0D 0A (which is the ANSI encoding of the decimal character value 225 (E1) plus a CR-LF (0D 0A))
    Encoding set to UTF-8 65001 looks like:
    C3 A1 0D 0A (which is the UTF-8 encoding of the decimal character value 225 (C3 A1) plus a CR-LF (0D 0A))
    Note that for values over decimal 127, UTF-8 takes at least two bytes and up to four for the remaining values available.
    So, I'm comfortable now, after sitting down and going through this, that the flat file connection manager is working correctly, unless you need a BOM.
    Imports System
    Imports System.Data
    Imports System.Math
    Imports Microsoft.SqlServer.Dts.Pipeline.Wrapper
    Imports Microsoft.SqlServer.Dts.Runtime.Wrapper

    Public Class ScriptMain
        Inherits UserComponent

        Public Overrides Sub CreateNewOutputRows()
            Output0Buffer.AddRow()
            Output0Buffer.col1 = ChrW(225)
        End Sub

    End Class
    Phil

  • UTF-8 files with BOM crash DOMParser?

    Hi.
    We are storing XML-documents in an 8i databse with UTF-8 encoding (in CLOBS).
    Problem: If the Unicode XML-document contains a BOM the oracle.xml.parser.v2.DOMParser's
    parse()-method throws an exception.
    I get the following output when using the ParseXMLFromURL.java class supplied in JDeveloper 3.2 samples directory:
    Sample output >>>>
    System Output: XML parse error in file http://localhost/UTF-8_With_BOM.xml
    System Output: at line 1, character 1
    System Output: Start of root element expected.
    <<<<<<< Sample output
    If I change the XML-file not to include a BOM the parser works fine.
    (I set/unset the BOM using EmEditor from http://www.emurasoft.com/ if you'd like to try for yourselves).
    To me it looks like DOMParser interprets the BOM at the start of the XML file as XML content instead of as a Unicode signature.
    IE 5.5 can handle both formats; shouldn't DOMParser also be able to handle that?
    Any ideas how I can get DOMParser to work with UTF-8 (BOM) XML files?
    Regards,
    Jan-Erik
    Sample XML:
    <?xml version="1.0" encoding='UTF-8'?>
    <newsdoc>
    <news>
    <newstitle>
    Document contains no BOM
    </newstitle>
    <introduction>
    See http://www.unicode.org/unicode/faq/utf_bom.html for info on BOM
    </introduction>
    </news>
    </newsdoc>

    I have the same problem when trying to store UTF-8 encoded XML files with BOM marks in iFS version 1.1.9.0.7.
    The database is 8.1.7.1.1 created with UTF-8 charset.
    I have loaded the XDK for PLSQL 9.0.2.0.0A into the database and replaced the original %ORACLE_HOME%\lib\xmlparserv2.jar with the one distributed in this XDK.
    I get the following error message:
    Wed Aug 01 10:10:06 GMT+02:00 2001: \public\CV-Bank\CV_Patrik_Johansson_intDTD_BOM.xml:
    oracle.ifs.common.IfsException: IFS-12608: Error while pre-parsing with the SAXParser: at line (1), column (1): oracle.xml.parser.v2.XMLParseException: Start of root element expected.
    at oracle.ifs.beans.parsers.IfsXmlParser.preParse(IfsXmlParser.java, Compiled Code)
    at java.lang.Exception.<init>(Exception.java, Compiled Code)
    at oracle.ifs.common.IfsException.<init>(IfsException.java, Compiled Code)
    at oracle.ifs.common.IfsException.<init>(IfsException.java, Compiled Code)
    at oracle.ifs.beans.parsers.IfsXmlParser.preParse(IfsXmlParser.java, Compiled Code)
    at oracle.ifs.beans.parsers.IfsXmlParser.getParserName(IfsXmlParser.java, Compiled Code)
    at oracle.ifs.beans.parsers.IfsXmlParser.parse(IfsXmlParser.java, Compiled Code)
    at oracle.ifs.beans.parsers.IfsXmlParser.parse(IfsXmlParser.java, Compiled Code)
    at oracle.ifs.utils.common.ParserHelper.parseExistingDocument(ParserHelper.java, Compiled Code)
    at oracle.ifs.protocols.ntfs.server.FileProxy.parseFile(FileProxy.java, Compiled Code)
    at oracle.ifs.protocols.ntfs.server.FileProxy.cleanupFile(FileProxy.java, Compiled Code)
    at oracle.ifs.protocols.ntfs.server.FileProxy.runFileProxy(Native Method)
    at oracle.ifs.protocols.ntfs.server.FileProxy.run(FileProxy.java, Compiled Code)
    This is a serious problem since we use an XML editor that adds BOMs.
    Regards
    Patrik Johansson
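    A common workaround (a sketch of mine, not from either poster) is to strip the signature yourself before handing the stream to the parser, since the parser chokes on the three EF BB BF octets:
    import java.io.IOException;
    import java.io.InputStream;
    import java.io.PushbackInputStream;

    // A minimal sketch: consume a leading UTF-8 BOM (EF BB BF) if present,
    // so the XML parser never sees it.
    public final class BomStripper {
        public static InputStream strip(InputStream in) throws IOException {
            PushbackInputStream pin = new PushbackInputStream(in, 3);
            byte[] head = new byte[3];
            int n = 0;
            while (n < 3) {                       // read up to 3 bytes, tolerating short reads
                int r = pin.read(head, n, 3 - n);
                if (r < 0) break;
                n += r;
            }
            boolean bom = n == 3
                    && (head[0] & 0xFF) == 0xEF
                    && (head[1] & 0xFF) == 0xBB
                    && (head[2] & 0xFF) == 0xBF;
            if (!bom && n > 0) {
                pin.unread(head, 0, n);           // not a BOM: push the bytes back
            }
            return pin;
        }
    }
    The DOMParser would then be given BomStripper.strip(rawStream) instead of the raw stream.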

  • XML, XSL - HTML(JSP), bad UTF-8 encoding on Tomcat

    I use a simple transformation to create an HTML (JSP) page from XML and XSL (both encoded in UTF-8). When doing this in the embedded web server in JD9i developer, the result is fine. But when run on Tomcat 4.0.3, UTF-8 chars are displayed as 2 characters. It's interesting that the transformation gives different results:
    - in JD9i the UTF-8 chars are left unchanged and shown correctly in the IE browser
    - in Tomcat the chars with an acute accent (´) are transformed into "&iacute;", "&aacute;" and so on and shown properly, whereas chars with a caron (ˇ) are encoded and shown badly.
    Does somebody know solution to this ? Thanks.

    Doing some more experiments reveals a subtle difference between GET and POST with Tomcat - it may apply to other application servers, but I have only tested with Tomcat 5 and Tomcat 6, so I can't say.
    The following is essentially what most people have indicated you need to do to have your JSP handle UTF-8:
    <%@ page language="java" contentType="text/html; charset=UTF-8" pageEncoding="UTF-8" %>
    <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
    <html>
    <head>
      <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
      <title>My Title</title>
    </head>
    <body>
    </body>
    </html>

    This is what I couldn't get working with my GET (the default for forms); it ended up working when I specified the URIEncoding for the connector in the server.xml file.
    It turns out that the above will work with no modifications to the server.xml file if the form method is POST. If you need to be able to use GETs then you will need to modify the connector; otherwise, it would seem nothing you do at the JSP or servlet level will make a difference, unless you want to convert each parameter manually:
      String parameter = new String( request.getParameter("myParam").getBytes("ISO-8859-1"), "UTF-8" );

    Edited by: ajmasx on Sep 27, 2007 10:20 AM
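    For reference, the connector change mentioned above is a single attribute on the HTTP connector in server.xml. A sketch (URIEncoding is Tomcat's documented attribute; the surrounding attributes are just illustrative defaults):
    <!-- server.xml: decode GET query strings as UTF-8 on this connector -->
    <Connector port="8080" protocol="HTTP/1.1"
               connectionTimeout="20000"
               redirectPort="8443"
               URIEncoding="UTF-8" />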

  • UTF-8 encoding trouble

    I need to use UTF8 encoding throughout a site. For that purpose, I have the following
    tags on JSP:
    <%@ page contentType="text/html; charset=UTF-8" %>
    <meta http-equiv="Content-Type" CONTENT="text/html; charset=UTF-8">
    Next, in my weblogic.xml, I have the following:
    <jsp-param>
         <param-name>encoding</param-name>
         <param-value>UTF8</param-value>
    </jsp-param>
    <charset-params>
    <input-charset>
    <resource-path>*.jsp</resource-path>
    <java-charset-name>UTF8</java-charset-name>
    </input-charset>
    </charset-params>
    Having configured this, I have two simple JSP files. The first one submits a field
    (whose contents I enter in Greek), and the second page writes them to a file. The
    code for writing to a file looks like this:
    FileOutputStream of = new FileOutputStream (fileName, false);
    OutputStreamWriter ow = new OutputStreamWriter (of, "UTF-8");
    ow.write (request.getParameter("test"));
    When I enter the Greek character Alpha as input, the file has a weird string +I in
    it. To fix the problem, I did the following (and it works):
    String s = request.getParameter("TestName");
    byte[] b = s.getBytes();     // bytes in the platform default encoding
    s = new String(b, "UTF-8");
    writeToFile(s);
    Which means that for some reason the page gets the right String, but it seems to be encoded with the default encoding (not UTF-8). When I convert it into bytes and create another String from the same bytes using a different encoding, what I get is a correct UTF-8 encoded string. Please also note that the same problem occurs with the DB as well (Oracle 8.1.7 with UTF8 on Win2k), and fixing the code as above fixes the problem at both the file and database level.
    Rather than the above workaround, what's the proper way to accomplish this?
    Thanks,
    Raja

    In GlassFish I have now changed the following. Under each listener, both for Network Listeners and Protocols, there is an HTTP tab, and under that I have changed this:
    Network Config
    Network Listeners
    http-listeners-1
    http-listeners-2
    admin-listeners
    Protocols
    http-listeners-1
    http-listeners-2
    admin-listeners
    URI Encoding: UTF-8
    Default Response Type: text/plain; charset=UTF-8
    Forced Response Type: text/plain; charset=UTF-8
    So when I run curl in a terminal window I get this response:
    Macintosh:~ jespernyqvist$ curl -I http://neptunediving.com/neptune/index.jsp
    HTTP/1.1 200 OK
    Date: Mon, 17 May 2010 04:14:17 GMT
    Server: GlassFish v3
    X-Powered-By: JSP/2.1
    Content-Type: text/html;charset=UTF-8
    Content-Language: en-US
    Transfer-Encoding: chunked
    Set-Cookie: JSESSIONID=478269c08e050484d1d6fa29fc44; Path=/neptune
    As you can see, my HTTP header now looks good: no more charset=iso-8859-1. The only problem is that there is no space in text/html;charset=UTF-8. Shouldn't it be text/html; charset=UTF-8 instead? I have noticed these values are very case sensitive, so maybe this is a problem for me?
    At the top of my page I have this:
    <%@page import="com.neptunediving.*"%>
    <%@include file="WEB-INF/include/LangSupport.jsp"%>
    <%@page language="java" contentType="text/html; charset=UTF-8" pageEncoding="UTF-8"%>
    In my header I have this:
    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
    I have changed the preferences in Eclipse to use UTF-8, and I have gone through all the properties files in my project and changed them to UTF-8 as well. So what else is there to change?
    Still my page is not displayed properly, in all browsers: Safari, Firefox, Opera and Internet Explorer. So what is wrong with my page, since this doesn't work for me? Can anybody please explain this to me?
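    Going back to Raja's original question: the usual fix (a sketch using the standard Servlet API, not anything WebLogic-specific) is to declare the request encoding before the first call to getParameter(), for example in a filter:
    import java.io.IOException;
    import javax.servlet.Filter;
    import javax.servlet.FilterChain;
    import javax.servlet.FilterConfig;
    import javax.servlet.ServletException;
    import javax.servlet.ServletRequest;
    import javax.servlet.ServletResponse;

    // A minimal sketch: force UTF-8 decoding of request parameters before
    // anything reads them, so no byte-level re-decoding is needed.
    public class Utf8Filter implements Filter {
        public void init(FilterConfig cfg) {}
        public void doFilter(ServletRequest req, ServletResponse res, FilterChain chain)
                throws IOException, ServletException {
            req.setCharacterEncoding("UTF-8"); // only effective before parameters are parsed
            chain.doFilter(req, res);
        }
        public void destroy() {}
    }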

  • UTF-8 Encoding errors during nightly batch runs

    My boss recently tasked me with researching (and hopefully resolving) why our XML frequently has UTF-8 encoding errors.
    I've been in the IS world for less than a year now so please bear with me when it comes to terms, data flow, etc.
    Overview:
    Our Oracle DB spits out XML for the nightly batch runs into a file location, let's say C:\xPression\CustomerData\Certificate.xml. The XML is in Courier New font, but some characters that aren't supported make their way into the XML. The big one is the elongated ' - ' character. Just one instance of this and the entire XML fails.
    When the batch job runs there are sometimes encoding errors (¿, ¡, -, etc.), and every morning I have to come in, find the invalid character, fix it, and have the job re-run.
    I want to know if there's a way to ensure the XML that comes out is always in the Courier New font, or a way to convert it.

    I want to know if there's a way so that the XML that comes out is always in the Courier New font, or is there a way to convert it.
    First thing first, an XML file is a text file, it doesn't have a "font" but an encoding.
    The font is the graphical representation of characters and it is related to whatever client tool you're using to view the content, not to the content itself.
    That being said, a lot of fonts do not support the full range of unicode characters so you may get replacement characters in some case.
    We're missing some information to provide an answer :
    - what's the database version?
    - what's the character set of the database?
    - how are you generating and writing the XML to the file ? UTL_FILE, dbms_xslprocessor, dbms_xmldom?
    If the file is generated using UTF-8 encoding then the issue might just be that you're not using a UTF-8-enabled editor.
