Encoding autodetection chooses ISO-8859-5 instead of UTF-8

A certain site produces HTML in UTF-8 without meta tags. Firefox detects the encoding as ISO-8859-5, an absolutely dead and useless 'standard'. How do I exclude it from autodetection and make Firefox treat the page as proper Unicode?

It's a strange guess, if it's a guess. Are you sure the server isn't sending a header that specifies that encoding? To view the Content-Type header, you can use an add-on or, if it's not a secure page, an external proxy (there is also a small code-based check sketched after this list):
* Live HTTP Headers extension: https://addons.mozilla.org/en-us/firefox/addon/live-http-headers/
* Firebug extension: https://addons.mozilla.org/en-US/firefox/addon/firebug/
* Fiddler debugging proxy: http://www.fiddler2.com/fiddler2/
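
If you'd rather check from code than install an add-on, here is a rough sketch in Java (the URL is only a placeholder; substitute the page Firefox is mis-detecting). It prints the Content-Type header the server actually sends:

import java.net.HttpURLConnection;
import java.net.URL;

public class ContentTypeCheck {
    public static void main(String[] args) throws Exception {
        // Placeholder URL - replace with the page whose encoding is being guessed.
        URL url = new URL("http://example.com/page.html");
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        // getContentType() returns the raw Content-Type header, e.g. "text/html; charset=ISO-8859-5",
        // or no charset at all - which is when Firefox falls back to autodetection.
        System.out.println("Content-Type: " + conn.getContentType());
        conn.disconnect();
    }
}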

Similar Messages

  • JasperReports Integration PDF is created with ISO-8859-1 instead of UTF-8

    Hi,
    This is a strange problem I'm struggling with. For about a month the PDFs have been created with the wrong character set: instead of UTF-8 (which is also passed as a parameter when calling the procedure, as you see below), the PDFs are always created with ISO-8859-1.
    declare
    l_jasper_report_url VARCHAR2(100) DEFAULT 'http://localhost:8180/JasperReportsIntegration/report';
    l_rep_name VARCHAR2(80) DEFAULT'SLRReports/SLR_Statistic_Service_Level_Report_Monat';
    l_rep_format VARCHAR2(10) DEFAULT 'pdf';
    l_data_source VARCHAR2(20) DEFAULT 'default';
    l_rep_locale VARCHAR2(10) DEFAULT 'de_CH';
    l_rep_encoding VARCHAR2(10) DEFAULT 'UTF-8';
    l_StartDate DATE;
    l_EndDate DATE;
    l_MinDowntime INTEGER DEFAULT 1;
    l_ServiceHour CHAR DEFAULT 'Y';
    l_name VARCHAR2(50);
    l_typ VARCHAR2(20) DEFAULT 'monatsreport_ccps';
    l_additional_params VARCHAR2(200);
    l_mime_type VARCHAR2(30);
    l_blob BLOB;
    BEGIN
    l_StartDate := to_date('01.'||to_char( add_months(SYSDATE,-12),'MM.YYYY'),'DD.MM.YYYY');
    l_EndDate := to_date('30.'||to_char( add_months(SYSDATE,-1),'MM.YYYY'),'DD.MM.YYYY');
    l_name := 'slr-statistik-'||to_char(l_StartDate,'YYYY-MM')||'.pdf';
    l_additional_params := 'UserName=SYSTEM'||chr(38)||'String_StartDate='||to_char(l_StartDate,'dd.mm.yyyy')||chr(38)||'String_EndDate='||to_char(l_EndDate,'dd.mm.yyyy')||chr(38)||'SLR_MinDowntime='||l_MinDowntime||chr(38)||'SLR_ServiceHour='||chr(38)||l_ServiceHour;
    xlog ('PRC_GET_REPORT_TUNNEL', 'url (orig):' || l_rep_name||':'||l_additional_params);
    -- generate the report and return in BLOB
    xlib_jasperreports.set_report_url (l_jasper_report_url);
    xlib_jasperreports.get_report (
    p_rep_name => l_rep_name,
    p_rep_format => l_rep_format,
    p_data_source => l_data_source,
    p_rep_locale => l_rep_locale,
    p_rep_encoding => l_rep_encoding,
    p_additional_params => l_additional_params,
    p_out_blob => l_blob,
    p_out_mime_type => l_mime_type );
    dbms_output.put_line('p_out_mime_type: '||l_mime_type);
    -- insert into report (id,name,typ,mime_type,lob_text,datum) values (p_report_seq.nextval,l_name,l_typ,l_mime_type,l_blob,sysdate);
    -- commit;
    -- apex_application.g_unrecoverable_error := TRUE;
    EXCEPTION
    WHEN OTHERS THEN
    xlog ('PUT_BLB', SQLERRM, 'ERROR');
    RAISE;
    END;
    It looks like the procedure UTL_HTTP.get_header ignores the encoding parameter and overwrites it with ISO-8859-1. I also tried to set it with UTL_HTTP.set_header before calling UTL_HTTP.get_header, but that didn't help either.
    Then I started the JVM with the UTF-8 option, and that made no change.
    There was no new installation on the Oracle, Tomcat or JasperReports side, so I'm wondering what has caused this strange behaviour.
    Has anyone had this problem before, or does anyone have a solution for it?
    I appreciate your help very much.
    Thanks and regards,
    Chris


  • IE 9 incorrectly encoding Unicode characters in URIs to ISO-8859-1 instead of UTF8

    Let's take the example word
    präsentation
    In Firefox, if I specify that as a CGI parameter, on the receiving end I receive:
    pr\303\244sentation
    which, decoded as UTF-8, gives me pr{U+00E4}sentation, i.e. my submitted word präsentation.
    What does IE give me? Well, let's see:
    pr\344sentation
    which doesn't decode as UTF-8, because octal 344 is 0xE4.
    ä in Unicode is at the code point 0xE4, which, as we've seen above, encodes to UTF-8 as
    0xC3 0xA4
    So the question boils down to this:
    Why does IE9 use ISO-8859-1 instead of UTF-8 for non-ASCII characters in URIs?
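    For comparison (this example is mine, not from the thread), the two behaviours described above can be reproduced with the standard java.net.URLEncoder, using the same word and the two charsets in question:
    import java.net.URLEncoder;
    public class UriEncodingDemo {
        public static void main(String[] args) throws Exception {
            String word = "präsentation";
            // Percent-encoding with UTF-8: ä (U+00E4) becomes the two bytes 0xC3 0xA4.
            System.out.println(URLEncoder.encode(word, "UTF-8"));      // pr%C3%A4sentation
            // Percent-encoding with ISO-8859-1: ä becomes the single byte 0xE4, matching what IE9 sends.
            System.out.println(URLEncoder.encode(word, "ISO-8859-1")); // pr%E4sentation
        }
    }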

    Hi,
    As my understanding, you could choose the encoding ways by yourself:
    Change your Internet Explorer 9 language
    encoding settings
    Alex Zhao
    Please remember to click “Mark as Answer” on the post that helps you, and to click “Unmark as Answer” if a marked post does not actually answer your question. This can be beneficial to other community members reading the thread.

  • [svn:fx-trunk] 7661: Change from charset="iso-8859-1" to charset="utf-8" and save file with utf-8 encoding.

    Revision: 7661
    Author:   [email protected]
    Date:     2009-06-08 17:50:12 -0700 (Mon, 08 Jun 2009)
    Log Message:
    Change from charset="iso-8859-1" to charset="utf-8" and save file with utf-8 encoding.
    QA Notes:
    Doc Notes:
    Bugs: SDK-21636
    Reviewers: Corey
    Ticket Links:
        http://bugs.adobe.com/jira/browse/SDK-21636
    Modified Paths:
        flex/sdk/trunk/templates/swfobject/index.template.html

    same problem here with wl8.1
    have you solved it, and if yes, how?
    thanks

  • Encoding XML in ISO-8859-1 from a unicode system

    Hello
    I want to generate an XML file with ISO-8859-1 encoding. I'm on a Unicode platform.
    I've done the following program:
    It works well with the 'encoding UTF-16' line.
    With the 'encoding ISO ...' line, I get special characters in the string xml_string.
    NB: the program works correctly on a non-Unicode platform.
    Can you help me?
    Thank you
    REPORT .
    DATA : xml_string TYPE string.
    DATA : BEGIN OF l_id,
             numero(10),
             systeme   TYPE gsval,
             date      TYPE d,
             heure     TYPE uzeit,
             type(7),
             nb_nid TYPE i,
           END OF l_id.
    DATA: ixml            TYPE REF TO if_ixml,
          streamfactory   TYPE REF TO if_ixml_stream_factory,
          encoding        TYPE REF TO if_ixml_encoding,
          ixml_ostream    TYPE REF TO if_ixml_ostream.
    START-OF-SELECTION.
      l_id-date    = sy-datum.
      l_id-heure   = sy-uzeit.
      l_id-type    = 'BATCH'.
      ixml = cl_ixml=>create( ).
      streamfactory = ixml->create_stream_factory( ).
      ixml_ostream = streamfactory->create_ostream_cstring( xml_string ).
      encoding = ixml->create_encoding( character_set = 'ISO-8859-1' byte_order = 0 ).
    encoding = ixml->create_encoding( character_set = 'UTF-16' byte_order = 0 ).
      ixml_ostream->set_encoding( encoding = encoding ).
      CALL TRANSFORMATION ztest_xml
            SOURCE id   = l_id
            RESULT XML ixml_ostream.
      BREAK-POINT.

    Forum rules say: no mail (we must share the solution)
    I didn't understand exactly what his issue was, or what he meant by "then to convert with the good encoding".
    His first sentence means that he used the following program (using xstring instead of string):
    REPORT .
    DATA : xml_xstring TYPE xstring.
    DATA : BEGIN OF l_id,
    numero(10),
    systeme TYPE gsval,
    date TYPE d,
    heure TYPE uzeit,
    type(7),
    nb_nid TYPE i,
    END OF l_id.
    DATA: ixml TYPE REF TO if_ixml,
    streamfactory TYPE REF TO if_ixml_stream_factory,
    encoding TYPE REF TO if_ixml_encoding,
    ixml_ostream TYPE REF TO if_ixml_ostream.
    START-OF-SELECTION.
    l_id-date = sy-datum.
    l_id-heure = sy-uzeit.
    l_id-type = 'BATCH'.
    ixml = cl_ixml=>create( ).
    streamfactory = ixml->create_stream_factory( ).
    ixml_ostream = streamfactory->create_ostream_xstring( xml_xstring ).
    encoding = ixml->create_encoding( character_set = 'ISO-8859-1' byte_order = 0 ).
    ixml_ostream->set_encoding( encoding = encoding ).
    CALL TRANSFORMATION id
    SOURCE id = l_id
    RESULT XML ixml_ostream.
    * in debug here, you'll see that xml_xstring contains
    * XML result in ISO-8859-1 encoding
    BREAK-POINT.

  • SPA504G SPA514G Default Character Encoding stay in ISO-8859-1

    Hi,
    I have configure like:
      <Dictionary_Server_Script ua="na">serv=http://{{ provisioning.server }}/telecom/language/;d0=English;x0=spa50x_30x_en_v754.xml;d1=French;x1=spa50x_30x_fr_v754.xml;</Dictionary_Server_Script>
      <Language_Selection ua="na">French</Language_Selection>
      <Default_Character_Encoding ua="na">UTF-8</Default_Character_Encoding>
      <Locale ua="na">fr-FR</Locale>
    Dictionary and Provisioning Profile are encoded in UTF-8.
    but when the phone starts after provisioning, the Default_Character_Encoding is set to ISO-8859-1
    and the line labels are misprinted:
    Ligne 1
    Ligne 2
    Olivier
    Françoise
    instead of
    Ligne 1
    Ligne 2
    Olivier
    Françoise
    Any idea ?

    I got an answer from the developer.
    Pasted here.
    I think the default encoding is set back to ISO-8859-1 after the customer downloads the dictionary.
    Here is the reason: after 7.5.3, the SPA 50x will parse the trkLocaleName in the dictionary; for French it will set the phone's default encoding to ISO-8859-1, since that suits French.
    French
    =================================
    •1.         If the customer wants to use UTF-8 after the xml download, please modify the trkLocaleName in the French dictionary xml as follows:
    croatian
    It is a workaround, but it's strange that a French user would want UTF-8. Thanks.
    •2.         Another way is that the user can manually set the default encoding value to UTF-8 after the xml download.

  • XML data encoding iso-8859-1 . Currently utf-16 is default encoding

    Hello ABAP Gurus ,
    Need your help.
    Scenario: We have SAP 4.7 enterprise version which we have now converted to a Unicode system. There is a BSP application which talks to an external web application (non-Unicode) through HTTP and sends data as XML.
    Problem: The problem arises when the BSP application prepares the XML. Before the conversion to a Unicode environment the encoding was "iso-8859-1", but now after the Unicode conversion the encoding is "UTF-16".
    The XML data looks like
    <?xml version="1.0" encoding="utf-16"?><DATA><ACTION>CREATE_TICKET</ACTION><CRI
    I have tried replacing "utf-16" by "iso-8859-1". The interface works, but at the receiving end, i.e. the external web application, the German umlauts appear as garbage values.
    I know we need to enforce the encoding. I have tried the following code but could not succeed; the encoding still appears as "utf-16".
    Following is the section of code which I have written in BSP application.
    " Convert the data into a DOM tree
        CALL FUNCTION 'SDIXML_DATA_TO_DOM'
          EXPORTING
            name        = 'DATA'
            dataobject  = ls_cr_xml
          IMPORTING
            data_as_dom = if_dom
          CHANGING
            document    = if_document
          EXCEPTIONS
            OTHERS      = 1.
        IF sy-subrc NE 0.
          error_out text-f47 text-f48 space space.
        ENDIF.
    " Convert the DOM tree into a character stream
        if_pixml = cl_ixml=>create( ).
        IF if_pixml IS INITIAL.
          error_out text-f50 text-f53 space space.
        ENDIF.
        if_pstreamfact = if_pixml->create_stream_factory( ).
        IF if_pstreamfact IS INITIAL.
          error_out text-f51 text-f53 space space.
        ENDIF.
        if_postream = if_pstreamfact->create_ostream_cstring( string = xml_doc ).
        IF if_pstreamfact IS INITIAL.
          error_out text-f52 text-f53 space space.
        ENDIF.
    --Encoding--
    data: gv_str type string.
    data: gv_l_xml_encoding type ref to if_ixml_encoding.
    data gv_l_resultb type boolean.
    gv_str = 'ISO-8859-1' .
    clear gv_l_xml_encoding.
          call method if_pixml->create_encoding
            EXPORTING
              byte_order    = 0
              character_set = gv_str
            RECEIVING
              rval          = gv_l_xml_encoding.
          clear gv_l_resultb.
          call method gv_l_xml_encoding->set_character_set
            EXPORTING
              charset = gv_str
            RECEIVING
              rval    = gv_l_resultb.
            call method if_document->set_encoding
            EXPORTING
              encoding = gv_l_xml_encoding.
    ----Append child -
    CALL METHOD if_document->append_child
          EXPORTING
            new_child = if_dom
          RECEIVING
            rval      = lv_return.
        IF lv_return NE 0.
          error_out text-f47 text-f49 space space.
        ENDIF.
    --Render the XML data to output stream--
        CALL METHOD if_document->render
          EXPORTING
            ostream = if_postream.
    After this section of code is executed, the variable xml_doc gets filled with XML data:
    <?xml version="1.0" encoding="utf-16"?><DATA><ACTION>CREATE_TICKET</ACTION><CRI
    Can anybody help me enforce the encoding?
    Regards,
    Laxman Nayak.

    Hi Aslam Riaz,
    Did you find any solution..?
    Kindly help me on this issue.
    Thanks and Regards,
    Shailaja Chityala

  • Any Other "standard encoding  character Like ISO 8859-1 "

    hi all
    Can anybody suggest some standard encoding character set like
    ISO-8859-1?
    It's giving me a problem:
    it's not replacing a space with %20.
    Thanks

    See, what I am doing here is first getting the default URL and then appending a parameter like msgtxt="BAL SO000IN" to the URL.
    To encode this space I am using
    EncodingUtil.formUrlEncode(nameValuePairs, Constants.ISO_8859_1);
    but the HTTP side is not supporting this type of character encoding format
    like ISO-8859-1.
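    As a side note (my own example, not from the post): standard form-style URL encoding turns a space into '+' rather than %20, which may be why the space appears "not replaced"; if the receiving side insists on %20, the '+' has to be converted afterwards. With the plain java.net.URLEncoder and the parameter value from the post:
    import java.net.URLEncoder;
    public class SpaceEncodingDemo {
        public static void main(String[] args) throws Exception {
            String value = "BAL SO000IN";
            // Form-style (x-www-form-urlencoded) encoding: the space becomes '+'.
            String formEncoded = URLEncoder.encode(value, "ISO-8859-1");   // BAL+SO000IN
            // Convert to %20 explicitly if the server requires it; any literal '+' in the
            // input was already encoded as %2B, so this replace only touches former spaces.
            String percentEncoded = formEncoded.replace("+", "%20");       // BAL%20SO000IN
            System.out.println(formEncoded);
            System.out.println(percentEncoded);
        }
    }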

  • Error(3): Invalid encoding specified, expecting ISO-8859-1, got windows-125

    What to do about this? I don't know why this isn't working even when I change the charset to ISO-8859-1 in my code file. What's the solution? Please help.

    I was missing the pageEncoding parameter in my include line, but even after having included it, it's giving me this error message:
    " Error: recursive include directive "
    I just feel some conflict might be arising between my code file and web.xml. The web.xml file's code is:
    <?xml version = '1.0' encoding = 'ISO-8859-1'?>
    <web-app xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://java.sun.com/xml/ns/j2ee http://java.sun.com/xml/ns/j2ee/web-app_2_4.xsd" version="2.4" xmlns="http://java.sun.com/xml/ns/j2ee">
    <description>Empty web.xml file for Web Application</description>
    <session-config>
    <session-timeout>35</session-timeout>
    </session-config>
    <mime-mapping>
    <extension>html</extension>
    <mime-type>text/html</mime-type>
    </mime-mapping>
    <mime-mapping>
    <extension>txt</extension>
    <mime-type>text/plain</mime-type>
    </mime-mapping>
    <jsp-config>
         <jsp-property-group>
         <url-pattern>*.jsp</url-pattern>
         <include-prelude>/header2.jsp</include-prelude>
         <include-coda>/footer2.jsp</include-coda>     
         </jsp-property-group>
    </jsp-config>
    </web-app>
    -----------------------------------------------------------------------------------------------

  • Reverting from UTF-8 to ISO-8859-1

    Hi,
    I have a database installed in UTF-8. It's a new installation, and the guides I had didn't mention any restrictions on the character set for the teams that were migrating.
    Well, the problem is that some teams are moving some of their projects to the new server and can't insert, for example, the word "não" into a VARCHAR2(3).
    My question is: can I change the whole database to ISO-8859-1 instead of UTF-8 in order to have words like "não" inserted correctly? If so, is it a simple ALTER DATABASE or a more complicated operation?
    Another question: is there any possibility of leaving the database as is and making it work without expanding the fields' size restriction?
    Alx

    You can't change a database character set from UTF-8 to ISO-8859-1. You can only move from one character set to a strict superset, which doesn't apply here. The supported way to change the character set here would be to create a new database with the ISO-8859-1 character set, export the existing data, and import it into the new system. That assumes, of course, that all the existing characters have an ISO-8859-1 representation (characters like the Euro symbol or Microsoft's curly quotes do not).
    By default, a VARCHAR2(3) allocates 3 bytes of space for data. That gets complicated when you use a multi-byte character set like UTF-8, where a character like 'ã' requires 2 bytes of storage. You can define the columns as VARCHAR2(3 CHAR) to allocate 3 characters of storage regardless of the character set. You can also set the parameter NLS_LENGTH_SEMANTICS to CHAR so that character rather than byte length semantics are the default when you create a table. Personally, if I'm creating a UTF8 database, I'd want to set NLS_LENGTH_SEMANTICS to CHAR.
    Justin
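    As a quick illustration of the byte-versus-character point above (my own throwaway example, not part of the thread), you can see in Java why "não" no longer fits in three bytes once the data is UTF-8:
    public class ByteLengthDemo {
        public static void main(String[] args) throws Exception {
            String word = "não";
            // ISO-8859-1: one byte per character, so 3 bytes - this fits a byte-semantics VARCHAR2(3).
            System.out.println(word.getBytes("ISO-8859-1").length); // 3
            // UTF-8: 'ã' takes two bytes, so the same word needs 4 bytes and overflows VARCHAR2(3 BYTE).
            System.out.println(word.getBytes("UTF-8").length);      // 4
        }
    }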

  • How to set the Xml Encoding ISO-8859-1 to Transformer or DOMSource

    I have an xml string and it already contains an xml declaration with encoding="ISO-8859-1". (In my real product, since some of the element/attribute values contain Spanish characters, I need to use this encoding instead of UTF-8.) Also, in my program, I need to add more attributes or manipulate the xml string dynamically, so I create a DOM Document object for that. And then I use a Transformer to convert this Document to a stream.
    My problem is: firstly, once converted through the Transformer, the xml encoding changes to the default UTF-8. Secondly, I wanted to check whether the DOM Document created from the xml string maintains the xml encoding of ISO-8859-1 or not, so I called Document.getXmlEncoding(), but it is throwing a runtime error - unknown method.
    Is there any way I can maintain the original xml encoding of ISO-8859-1 when I use either the DOMSource or the Transformer? I am using JDK 1.5.0-12.
    Following is my sample program you can use.
    I would appreciate any help, because so far, I cannot find any answer to this using the JDK documentation at all.
    Thanks,
    Jin Kim
    import java.io.*;
    import org.w3c.dom.Document;
    import org.w3c.dom.Element;
    import org.w3c.dom.Node;
    import org.w3c.dom.NodeList;
    import org.w3c.dom.Attr;
    import org.xml.sax.InputSource;
    import javax.xml.parsers.DocumentBuilder;
    import javax.xml.parsers.DocumentBuilderFactory;
    import javax.xml.transform.Templates;
    import javax.xml.transform.Transformer;
    import javax.xml.transform.TransformerException;
    import javax.xml.transform.TransformerFactory;
    import javax.xml.transform.TransformerConfigurationException;
    import javax.xml.transform.dom.DOMSource;
    import javax.xml.transform.Source;
    import javax.xml.transform.stream.StreamSource;
    import javax.xml.transform.stream.StreamResult;
    public class XmlEncodingTest
    {
        StringBuffer xmlStrBuf = new StringBuffer();
        TransformerFactory tFactory = null;
        Transformer transformer = null;
        Document document = null;
        public void performTest()
        {
            xmlStrBuf.append("<?xml version=\"1.0\" encoding=\"iso-8859-1\"?>\n")
                     .append("<TESTXML>\n")
                     .append("<ELEM ATT1=\"Yes\" />\n")
                     .append("</TESTXML>\n");
            // the encoding is set to iso-8859-1 in the xml declaration.
            System.out.println("initial xml = \n" + xmlStrBuf.toString());
            try
            {
                //Test1: Use the transformer to output the xmlStrBuf.
                // This shows the xml encoding result from the transformer, which will change to UTF-8
                tFactory = TransformerFactory.newInstance();
                transformer = tFactory.newTransformer();
                StreamSource ss = new StreamSource( new StringBufferInputStream( xmlStrBuf.toString()));
                System.out.println("Test1 result = ");
                transformer.transform( ss, new StreamResult(System.out));
                //Test2: Create a DOM document object for xmlStrBuf and manipulate it by adding an attribute ATT2="No"
                DocumentBuilderFactory dfactory = DocumentBuilderFactory.newInstance();
                DocumentBuilder builder = dfactory.newDocumentBuilder();
                document = builder.parse( new StringBufferInputStream( xmlStrBuf.toString()));
                // skip adding attribute since it does not affect the test result
                // Use a Transformer to output the DOM document. the encoding becomes UTF-8
                DOMSource source = new DOMSource(document);
                StreamResult result = new StreamResult(System.out);
                System.out.println("\n\nTest2 result = ");
                transformer.transform(source, result);
            }
            catch (Exception e)
            {
                System.out.println("<performTest> Exception caught. " + e.toString());
            }
        }
        public static void main( String arg[])
        {
            XmlEncodingTest xmlTest = new XmlEncodingTest();
            xmlTest.performTest();
        }
    }

    Thanks DrClap for your answer. With your information, I rewrote the sample program as follows, and it works well now as I intended! About UTF-8 and Spanish characters, I think you are right. It looks like there can be many factors involved on this subject, though - for example, the real character set used to create an xml document and the declared xml encoding both matter. The special character I had trouble with was u00F3, and somehow I found out that the SAX parser, and even the DocumentBuilder parser, does not like this character when the encoding is set to "UTF-8" in the xml document. My sample program below may not be a perfect example, but if you replace ISO-8859-1 with UTF-8 and do not set the encoding property on the transformer, you may notice that the special character in my example is broken in Test1 and Test2. In my sample, I decided to use ByteArrayInputStream instead of StringBufferInputStream because the documentation says StringBufferInputStream may have a problem with converting characters into bytes.
    Thanks again for your help!
    Jin Kim
    import java.io.*;
    import java.util.*;
    import org.w3c.dom.Document;
    import org.w3c.dom.Element;
    import org.w3c.dom.Node;
    import org.w3c.dom.NodeList;
    import org.w3c.dom.Attr;
    import org.xml.sax.InputSource;
    import javax.xml.parsers.DocumentBuilder;
    import javax.xml.parsers.DocumentBuilderFactory;
    import javax.xml.transform.Templates;
    import javax.xml.transform.Transformer;
    import javax.xml.transform.TransformerException;
    import javax.xml.transform.TransformerFactory;
    import javax.xml.transform.TransformerConfigurationException;
    import javax.xml.transform.dom.DOMSource;
    import javax.xml.transform.Source;
    import javax.xml.transform.stream.StreamSource;
    import javax.xml.transform.stream.StreamResult;
    /** XML encoding test for Transformer */
    public class XmlEncodingTest2
    {
        StringBuffer xmlStrBuf = new StringBuffer();
        TransformerFactory tFactory = null;
        Document document = null;
        public void performTest()
        {
            xmlStrBuf.append("<?xml version=\"1.0\" encoding=\"iso-8859-1\"?>\n")
                     .append("<TESTXML>\n")
                     .append("<ELEM ATT1=\"Resolución\">\n")
                     .append("Special charactered attribute test")
                     .append("\n</ELEM>")
                     .append("\n</TESTXML>\n");
            // the encoding is set to iso-8859-1 in the xml declaration.
            System.out.println("**** Initial xml = \n" + xmlStrBuf.toString());
            try
            {
                //TransformerFactoryImpl transformerFactory = new TransformerFactoryImpl();
                //Test1: Use the transformer to output the xmlStrBuf.
                tFactory = TransformerFactory.newInstance();
                Transformer transformer = tFactory.newTransformer();
                byte xmlbytes[] = xmlStrBuf.toString().getBytes("ISO-8859-1");
                StreamSource streamSource = new StreamSource( new ByteArrayInputStream( xmlbytes ));
                ByteArrayOutputStream xmlBaos = new ByteArrayOutputStream();
                Properties transProperties = transformer.getOutputProperties();
                transProperties.list( System.out); // prints out current transformer properties
                System.out.println("**** setting the transformer's encoding property to ISO-8859-1.");
                transformer.setOutputProperty("encoding", "ISO-8859-1");
                transformer.transform( streamSource, new StreamResult( xmlBaos));
                System.out.println("**** Test1 result = ");
                System.out.println(xmlBaos.toString("ISO-8859-1"));
                //Test2: Create a DOM document object for xmlStrBuf to add a new attribute ATT2="No"
                DocumentBuilderFactory dfactory = DocumentBuilderFactory.newInstance();
                DocumentBuilder builder = dfactory.newDocumentBuilder();
                document = builder.parse( new ByteArrayInputStream( xmlbytes));
                // skip adding attribute since it does not affect the test result
                // Use a Transformer to output the DOM document.
                DOMSource source = new DOMSource(document);
                xmlBaos.reset();
                transformer.transform( source, new StreamResult( xmlBaos));
                System.out.println("\n\n****Test2 result = ");
                System.out.println(xmlBaos.toString("ISO-8859-1"));
                //xmlBaos.flush();
                //xmlBaos.close();
            }
            catch (Exception e)
            {
                System.out.println("<performTest> Exception caught. " + e.toString());
            }
            finally
            {
                // nothing to clean up in this sample
            }
        }
        public static void main( String arg[])
        {
            XmlEncodingTest2 xmlTest = new XmlEncodingTest2();
            xmlTest.performTest();
        }
    }

  • Java App on Linux : Unable to read iso-8859-1 encoded file correctly.

    I have a file which is encoded as iso-8859-1, and contains characters such as ô .
    I am reading this file with java code, something like:
    File in = new File("myfile.csv");
    InputStream fr = new FileInputStream(in);
    byte[] buffer = new byte[4096];
    while (true) {
        int byteCount = fr.read(buffer, 0, buffer.length);
        if (byteCount <= 0) {
            break;
        }
        String s = new String(buffer, 0, byteCount, "ISO-8859-1");
        System.out.println(s);
    }
    However, the ô character is always garbled, usually printing as a '?'.
    I am running this on a Linux machine. It works fine on my XP machine.
    I have verified that I can see the correct characters when I cat the file on the terminal.
    (Interestingly, though I think maybe only by coincidence, it works when I run with the -Dfile.encoding=UTF16 option, but not with UTF8, although this appears to be a hack rather than a fix, since this option was not intended for developer use by Sun - but I thought mentioning it might provide some clues as to what is going on.)

    I think your main problem is with the console. When you send text to the console, it's sent in the system default encoding. On an English-locale system that might be ASCII, ISO-8859-1, windows-1252, UTF-8, MacRoman, and probably several other possibilities. Then the console decodes the bytes using whatever encoding it feels like using--on my WinXP machine, it uses cp437 by default (just for laughs, as far as I can tell). If the text happens to be pure, seven-bit ASCII, there's no problem, since all those encodings are identical in that range.
    But if you need to output anything other than ASCII characters, avoid the console. Send the output to a file and specify an encoding that you know will be able to handle your characters--UTF-8 can handle anything. Then open the file with an editor that can read that encoding; most of them can handle UTF-8 these days, and many will even detect it automatically. You also need to be using a font that can display your characters.
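    A minimal sketch of that suggestion (the class and file names here are placeholders, not from the original post): write the text to a file in an explicit encoding rather than printing it to the console:
    import java.io.*;
    public class WriteUtf8 {
        public static void main(String[] args) throws IOException {
            // Write in UTF-8 so characters such as ô survive regardless of the platform default.
            Writer w = new OutputStreamWriter(new FileOutputStream("out.txt"), "UTF-8");
            w.write("ô and any other character are preserved in UTF-8");
            w.close();
        }
    }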
    However, you're also going about the reading part wrong. Instead of reading the text in as bytes and passing them to a String constructor, you should use an InputStreamReader and read it as text from the beginning:
    BufferedReader br = new BufferedReader(
      new InputStreamReader(
        new FileInputStream("myfile.csv"), "ISO-8859-1"));
    I am curious about your statement that "it works" when you run with the -Dfile.encoding=UTF16 option. I wouldn't be surprised to see it output the correct characters (ASCII characters, anyway), but I would expect to see the characters interspersed with blank spaces or rectangles.

  • File adapter ISO-8859-1 encoding problems in XI 3.0

    We are using the XI 3.0 file adapter and are experiencing some XML encoding troubles.
    A SAP R/3 system is delivering an IDoc outbound. XI picks up the IDoc and converts it to an externally defined .xml file. The .xml file is sent to a connected FTP server. At the remote FTP server the file generates an error, as it is expected to arrive in ISO-8859-1 encoding. The Transfer Mode is set to Binary, File Type to Text, and Encoding to ISO-8859-1.
    The .xml file is encoded correctly in ISO-8859-1, but the problem is that the XML encoding declaration has the wrong value 'UTF-8'.
    Does anybody know of a workaround to change the encoding declaration to 'ISO-8859-1' in the message mapping program?

    An example of the XSL code might be as follows:
    <?xml version='1.0'?>
    <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output method='xml' encoding='ISO-8859-1' />
    <xsl:template match="/">
         <xsl:copy-of select="*" />
    </xsl:template>
    </xsl:stylesheet>

  • Problems with reading XML files with ISO-8859-1 encoding

    Hi!
    I'm trying to read an RSS file. The script below works with XML files in UTF-8 encoding but not ISO-8859-1. How do I fix it so it works with both?
    Here's the code:
    import java.io.File;
    import javax.xml.parsers.*;
    import org.w3c.dom.*;
    import java.net.*;
    /** @author gustav */
    public class RSSDocument {
        /** Creates a new instance of RSSDocument */
        public RSSDocument(String inurl) {
            String url = new String(inurl);
            try{
                DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
                Document doc = builder.parse(url);
                NodeList nodes = doc.getElementsByTagName("item");
                for (int i = 0; i < nodes.getLength(); i++) {
                    Element element = (Element) nodes.item(i);
                    NodeList title = element.getElementsByTagName("title");
                    Element line = (Element) title.item(0);
                    System.out.println("Title: " + getCharacterDataFromElement(line));
                    NodeList des = element.getElementsByTagName("description");
                    line = (Element) des.item(0);
                    System.out.println("Des: " + getCharacterDataFromElement(line));
                }
            } catch (Exception e) {
                e.printStackTrace();
            }
        }
        public String getCharacterDataFromElement(Element e) {
            Node child = e.getFirstChild();
            if (child instanceof CharacterData) {
                CharacterData cd = (CharacterData) child;
                return cd.getData();
            }
            return "?";
        }
    }
    And here's the error message:
    org.xml.sax.SAXParseException: Teckenkonverteringsfel: "Malformed UTF-8 char -- is an XML encoding declaration missing?" (radnumret kan vara för lågt).
        at org.apache.crimson.parser.InputEntity.fatal(InputEntity.java:1100)
        at org.apache.crimson.parser.InputEntity.fillbuf(InputEntity.java:1072)
        at org.apache.crimson.parser.InputEntity.isXmlDeclOrTextDeclPrefix(InputEntity.java:914)
        at org.apache.crimson.parser.Parser2.maybeXmlDecl(Parser2.java:1183)
        at org.apache.crimson.parser.Parser2.parseInternal(Parser2.java:653)
        at org.apache.crimson.parser.Parser2.parse(Parser2.java:337)
        at org.apache.crimson.parser.XMLReaderImpl.parse(XMLReaderImpl.java:448)
        at org.apache.crimson.jaxp.DocumentBuilderImpl.parse(DocumentBuilderImpl.java:185)
        at javax.xml.parsers.DocumentBuilder.parse(DocumentBuilder.java:124)
        at getrss.RSSDocument.<init>(RSSDocument.java:25)
        at getrss.Main.main(Main.java:25)

    I read files from the web, but there is an XML tag with the encoding attribute in the RSS file.
    If you are quite sure that you have an encoding attribute set to ISO-8859-1, then I expect that your RSS file has a non-ISO-8859-1 character, though I thought all bytes -128 to 127 were valid ISO-8859-1 characters!
    Many years ago I had a problem with an XML file with invalid characters. I wrote a simple filter (using FilterInputStream) that made sure that all the bytes it processed were ASCII. My problem turned out to be characters with value zero, which the Microsoft XML parser failed to process. It put the parser in an infinite loop!
    In the filter, as each byte is read you could write out the Hex value. That way you should be able to find the offending character(s).
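    A rough sketch of that kind of diagnostic filter (the class name is made up; wrap it around the input stream before handing it to the parser). It passes bytes through untouched but prints the hex value of anything outside the seven-bit ASCII range, so the offending characters can be located:
    import java.io.FilterInputStream;
    import java.io.IOException;
    import java.io.InputStream;
    public class AsciiCheckInputStream extends FilterInputStream {
        public AsciiCheckInputStream(InputStream in) {
            super(in);
        }
        public int read() throws IOException {
            int b = super.read();
            check(b);
            return b;
        }
        public int read(byte[] buf, int off, int len) throws IOException {
            int n = super.read(buf, off, len);
            for (int i = 0; i < n; i++) {
                check(buf[off + i] & 0xFF);
            }
            return n;
        }
        private void check(int b) {
            // Report anything that is not seven-bit ASCII.
            if (b > 127) {
                System.err.println("non-ASCII byte: 0x" + Integer.toHexString(b));
            }
        }
    }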

  • Change encoding from utf-8 to iso-8859-1 in JMS receiver!

    Hi.
    I have some problems regarding encoding.
    The simple setup: dummy datatype as input, XSLT mapping and standard XI output (to JMS).
    Is there any way to tell the JMS adapter to deliver the message in ISO-8859-1 and not UTF-8?
    Regards Peter

    > Hi Henrique.
    >
    > This sounds like an idea. Can you guide me to some
    > documentation, that describes adding mapping in the
    > jms adapter module?
    >
    > Regards Peter
    To use modules in JMS adapter: http://help.sap.com/saphelp_nw2004s/helpdata/en/0f/80243b4a66ae0ce10000000a11402f/frameset.htm
    Now, you add the MessageTransformBean module to use the XSLT mapping. Check the end of this blog to learn how to use XSLT mapping with MessageTransformBean: /people/michal.krawczyk2/blog/2005/11/01/xi-xml-node-into-a-string-with-graphical-mapping
    Regards,
    Henrique.
