Character encoding with xsql

I have a similar application running to the xsql document demo. However, when I am pulling the xml clob out of the database, the character set is incorrect and the xml is not parsed correctly.
Instead of a "<" in the returned xml source, I get the following characters:
& # 6 0 ;
(No spaces in between them).
I seem to have this problem regardless of what type of field I pull the data out of (varchar2, char).
What's going on here?
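For what it's worth, "&#60;" is the XML numeric character reference for "<" (code point 60). That is what a serializer emits when the CLOB is placed in the result as plain text rather than as markup, so this looks like escaping rather than a character-set fault. A minimal Python sketch of the effect, using only the standard library; the DOC wrapper and the sample fragment are made up for illustration and this is not XSQL code:

```python
import xml.etree.ElementTree as ET

# Hypothetical XML fragment as it might be stored in the CLOB column.
clob_text = "<PatName>Smith</PatName>"

# Putting the CLOB into the result as *text* forces the serializer to escape it.
doc = ET.Element("DOC")
doc.text = clob_text
serialized = ET.tostring(doc, encoding="unicode")
print(serialized)  # the angle brackets come out as &lt; and &gt;

# Parsing the escaped form gives the original string back, still as text...
round_trip = ET.fromstring(serialized).text

# ...and only a second parse (a "reparse") turns it into real elements.
fragment = ET.fromstring(round_trip)
print(fragment.tag, fragment.text)

# Numeric references such as &#60; behave exactly like &lt;.
numeric = ET.fromstring("<DOC>&#60;x/&#62;</DOC>").text
print(numeric)
```

The second parse is the step that a stylesheet using disable-output-escaping, or an include with reparse, is standing in for.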

Thanks for all your help! I have just encountered one more problem that is blocking me from completing this.
I use the following two xsql pages:
<?xml version='1.0'?>
<?xml:stylesheet type="text/xsl" href="doctorv3_IE5.xsl" ?>
<xsql:include-xsql connection="demo" xmlns:xsql="urn:oracle-xsql"
href="doc_detail.xsql?id={@id}"/>
and
<?xml version="1.0" encoding="Windows-1250" ?>
<?xml-stylesheet type="text/xsl" href="clob.xsl" result-type="text/xml"?>
<xsql:query connection="demo" rowset-element="" row-element="" max-rows="1"
xmlns:xsql="urn:oracle-xsql">
select /* x.xml_clob */ x.xml_clob DOC
from user_xml x
where x.ihc_user_id= {@id}
</xsql:query>
combined with the following two stylesheets
doctorv3_ie5.xsl:
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="*|/"><xsl:apply-templates/></xsl:template>
<xsl:template match="text()|@*"><xsl:value-of select="."/></xsl:template>
<xsl:template match="/">
<head>
<title>Sample XSL Stylesheet for Doctor Viewing Data</title>
</head>
<body>
<center>
<table width="100%" height="5%" border="5" cellspacing="0">
<tr>
<td bgcolor="#000077"><font size="+5" color="#FFFF00"><b><center>CLINICARE Patient Data</center></b></font></td>
</tr>
</table>
</center>
<p>
<br></br>
</p>
<center>
<table width="90%" height="5%" border="5" cellspacing="0">
<tr>
<td colspan="2" bgcolor="#000077"><font size="+3" color="#FFFF00"><b><center>Demographic Data</center></b></font></td>
</tr>
<tr>
<td bgcolor="#000077" height="18" width="30%"><font size="+2" color="#FFFF00"><b>Surname</b></font></td>
<td bgcolor="#FFFFFF" height="18"><font size="+2" color="#000077">
<xsl:value-of select="/PatientSummary/ADMINISTRATIVE/PATIENT/PatName"/></font></td>
</tr>
<tr>
<td bgcolor="#000077" height="18" width="30%"><font size="+2" color="#FFFF00"><b>First Name</b></font></td>
<td bgcolor="#FFFFFF" height="18"><font size="+2" color="#000077">
<xsl:value-of select="/PatientSummary/ADMINISTRATIVE/PATIENT/PatFirstName"/></font></td>
</tr>
<tr>
<td bgcolor="#000077" height="18" width="30%"><font size="+2" color="#FFFF00"><b>Gender</b></font></td>
<td bgcolor="#FFFFFF" height="18"><font size="+2" color="#000077">
<xsl:value-of select="/PatientSummary/ADMINISTRATIVE/PATIENT/PatGenderCode"/></font></td>
</tr>
<tr>
<td bgcolor="#000077" height="18" width="30%"><font size="+2" color="#FFFF00"><b>Date of Birth</b></font></td>
<td bgcolor="#FFFFFF" height="18"><font size="+2" color="#000077">
<xsl:value-of select="/PatientSummary/ADMINISTRATIVE/PATIENT/PatBirthDtm"/></font></td>
</tr>
<tr>
<td bgcolor="#000077" height="18" width="30%"><font size="+2" color="#FFFF00"><b>Personal Health Number</b></font></td>
<td bgcolor="#FFFFFF" height="18"><font size="+2" color="#000077">
<xsl:value-of select="/PatientSummary/ADMINISTRATIVE/PATIENT/PatIDNum"/></font></td>
</tr>
<tr>
<td bgcolor="#000077" height="18" width="30%"><font size="+2" color="#FFFF00"><b>Address</b></font></td>
<td bgcolor="#FFFFFF" height="18"><font size="+2" color="#000077">
<xsl:value-of select="/PatientSummary/ADMINISTRATIVE/PATIENT/PatPrimAddress/PatPrimAddressStreet"/></font></td>
</tr>
<tr>
<td bgcolor="#000077" height="18" width="30%"><font size="+2" color="#FFFF00"><b>Postal Code</b></font></td>
<td bgcolor="#FFFFFF" height="18"><font size="+2" color="#000077">
<xsl:apply-templates select="/PatientSummary/ADMINISTRATIVE/PATIENT/PatPrimAddress/PatPrimAddressPost"/></font></td>
</tr>
<tr>
<td bgcolor="#000077" height="18" width="30%"><font size="+2" color="#FFFF00"><b>City</b></font></td>
<td bgcolor="#FFFFFF" height="18"><font size="+2" color="#000077">
<xsl:value-of select="/PatientSummary/ADMINISTRATIVE/PATIENT/PatPrimAddress/PatPrimAddressCity"/></font></td>
</tr>
<tr>
<td bgcolor="#000077" height="18" width="30%"><font size="+2" color="#FFFF00"><b>Phone Number</b></font></td>
<td bgcolor="#FFFFFF" height="18"><font size="+2" color="#000077">
<xsl:value-of select="/PatientSummary/ADMINISTRATIVE/PATIENT/PatPrimAddress/PatPrimPhoneNum"/></font></td>
</tr>
</table>
</center>
<p>
<br></br>
</p>
<center>
<table width="90%" height="5%" border="5" cellspacing="0">
<tr>
<td colspan="2" bgcolor="#000077"><font size="+3" color="#FFFF00"><b><center>Primary Care Physician</center></b></font></td>
</tr>
<tr>
<td bgcolor="#000077" height="18" width="30%"><font size="+2" color="#FFFF00"><b>Surname</b></font></td>
<td bgcolor="#FFFFFF" height="18"><font size="+2" color="#000077">
<xsl:apply-templates select="/PatientSummary/ADMINISTRATIVE/PRIMECAREPHYSICIAN/PatPrimCarePhysName"/></font></td>
</tr>
<tr>
<td bgcolor="#000077" height="18" width="30%"><font size="+2" color="#FFFF00"><b>First Name</b></font></td>
<td bgcolor="#FFFFFF" height="18"><font size="+2" color="#000077">
<xsl:apply-templates select="/PatientSummary/ADMINISTRATIVE/PRIMECAREPHYSICIAN/PatPrimCarePhysFirstName"/></font></td>
</tr>
<tr>
<td bgcolor="#000077" height="18" width="30%"><font size="+2" color="#FFFF00"><b>Specialty</b></font></td>
<td bgcolor="#FFFFFF" height="18"><font size="+2" color="#000077">
<xsl:apply-templates select="/PatientSummary/ADMINISTRATIVE/PRIMECAREPHYSICIAN/PatPrimCarePhys"/></font></td>
</tr>
</table>
</center>
<p>
<br></br>
</p>
<center>
<table width="90%" height="5%" border="5" cellspacing="0">
<tr>
<td bgcolor="#000077"><font size="+3" color="#FFFF00"><b><center>Patient History</center></b></font></td>
</tr>
</table>
</center>
<p>
<br></br>
</p>
<xsl:for-each select="PatientSummary/HEALTHITEMS/PHYSICALEXAMS" order-by="HExamDate">
<hr></hr>
<p><font color="#000077" size="+1"><b>Notes Entry Date:</b><xsl:value-of select="HExamDate"/></font></p>
<p><font color="#000077" size="+1"><b>Doctor's Name:</b><xsl:value-of select="HExamExaminerName"/></font></p>
<p><font color="#000077" size="+1"><b>Doctor's Number:</b><xsl:value-of select="HExamExaminerNUM"/></font></p>
<xsl:for-each select="HExamItem">
<p><font color="#000077" size="+1"><b>Category Description:</b><xsl:value-of select="HExamItemIDName"/></font></p>
<p><font color="#000077" size="+1"><b>Category Code:</b><xsl:value-of select="HExamItemIDCode"/></font></p>
<p><font color="#000077" size="+1"><b>Notes Entry:</b></font></p>
<xsl:for-each select="HExamText">
<dd><font color="#000077" size="+1"><xsl:value-of/></font></dd>
</xsl:for-each>
</xsl:for-each>
</xsl:for-each>
<p>
<br></br>
</p>
<center>
<table width="90%" height="5%" border="5" cellspacing="0">
<tr>
<td bgcolor="#000077"><font size="+3" color="#FFFF00"><b><center>Lab Test History</center></b></font></td>
</tr>
</table>
</center>
<p>
<br></br>
</p>
<xsl:for-each select="PatientSummary/HEALTHITEMS/TESTS/CLINICALTESTS" order-by="HExamDate">
<hr></hr>
<p><font color="#000077" size="+1"><b>Report Date:</b><xsl:value-of select="PHProbNum"/></font></p>
<p><font color="#000077" size="+1"><b>Report Time:</b><xsl:value-of select="DXProcSpecTypeCode"/></font></p>
<p><font color="#000077" size="+1"><b>Requisition #:</b><xsl:value-of select="DXOrdIDNum"/></font></p>
<xsl:for-each select="DXClinLabTest">
<p><font color="#000077" size="+1"><b>Test Name:</b><xsl:value-of select="DXClinLabTestName"/></font></p>
<p><font color="#000077" size="+1"><b>Result Comments:</b></font></p>
<xsl:for-each select="DXProcReportText">
<dd><font color="#000077" size="+1"><xsl:value-of/></font></dd>
</xsl:for-each>
</xsl:for-each>
<xsl:for-each select="DXProcNmeasAnalyte">
<p><font color="#000077" size="+1"><b>Test Value:</b><xsl:value-of select="DXProcNmeasAnalValQty"/></font></p>
<p><font color="#000077" size="+1"><b>Normality:</b><xsl:value-of select="DXProcNmeasAnalInterpCode"/></font></p>
</xsl:for-each>
</xsl:for-each>
</body>
</xsl:template>
<xsl:template match="/PatientSummary/ADMINISTRATIVE/PRIMECAREPHYSICIAN/PatPrimCarePhysName">
<xsl:value-of select="/PatientSummary/ADMINISTRATIVE/PRIMECAREPHYSICIAN/PatPrimCarePhysName"/>
</xsl:template>
<xsl:template match="/PatientSummary/ADMINISTRATIVE/PRIMECAREPHYSICIAN/PatPrimCarePhysFirstName">
<xsl:value-of select="/PatientSummary/ADMINISTRATIVE/PRIMECAREPHYSICIAN/PatPrimCarePhysFirstName"/>
</xsl:template>
<xsl:template match="/PatientSummary/ADMINISTRATIVE/PRIMECAREPHYSICIAN/PatPrimCarePhys">
<xsl:value-of select="/PatientSummary/ADMINISTRATIVE/PRIMECAREPHYSICIAN/PatPrimCarePhys"/>
</xsl:template>
<xsl:template match="/PatientSummary/ADMINISTRATIVE/PATIENT/PatPrimAddress/PatPrimAddressPost">
<xsl:value-of select="/PatientSummary/ADMINISTRATIVE/PATIENT/PatPrimAddress/PatPrimAddressPost"/>
</xsl:template>
</xsl:stylesheet>
and clob.xsl:
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<xsl:output method="xml" omit-xml-declaration="yes"/>
<xsl:template match="@*|node()">
<xsl:copy>
<xsl:apply-templates select="@*|node()"/>
</xsl:copy>
</xsl:template>
<xsl:template match="DOC">
<xsl:value-of select="." disable-output-escaping="yes"/>
</xsl:template>
</xsl:stylesheet>
If I remove the reference to the doctorv3_IE5 stylesheet, I receive properly formatted XML, thanks to the clob.xsl stylesheet. When I try to reference the stylesheet, however, it isn't applied correctly: IE5 displays only the XSL stylesheet, not the XSL combined with the XML.
Saving this XML to a file and then combining it with the XSL works exactly as it should, but I cannot get the xsql page to combine the two properly.
Any ideas on what's wrong?

Similar Messages

  • Character problems with xsql:include-xsql reparse="yes"

    I have a problem retrieving XML-fragments from CLOB columns.
    Danish ISO-8859-1 characters (aelig, oslash, aring) are returned as "?" from Apache/Jserv when using xsql:include-xsql reparse="yes".
    My platform is Solaris9/Oracle-9.2.0.2/XDK-9.2.0.4.
    Database characterset is we8iso8859p1.
    I'm using the Apache/Jserv that comes with Oracle 9.2.0.1.
    Steps to reproduce problem:
    -- Table data:
    create table tab1 (id number,clob_col clob);
    insert into tab1 values(1, '<x>æøå</x>');
    /*inserted characters are aelig(230), oslash(248), aring(229)*/
    commit;
    -- test.xsql:
    <?xml version="1.0" encoding="ISO-8859-1"?>
    <testdata xmlns:xsql="urn:oracle-xsql" connection="pnrtest">
    <xsql:include-xsql reparse="yes" href="inc.xsql" />
    </testdata>
    -- inc.xsql:
    <?xml version="1.0" encoding="ISO-8859-1"?>
    <?xml-stylesheet type="text/xsl" href="unquote_clob_col.xsl"?>
    <xsql:query
    xmlns:xsql="urn:oracle-xsql"
    connection="pnrtest"
    tag-case="lower"
    >
    select clob_col
    from tab1
    </xsql:query>
    -- unquote_clob_col.xsl:
    <xsl:stylesheet
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    version="1.0">
    <xsl:output method="xml" indent="yes" omit-xml-declaration="no" encoding="ISO-8859-1"/>
    <xsl:include href="identity.xsl"/>
    <xsl:template match="clob_col">
    <clob_col>
    <xsl:value-of select="." disable-output-escaping="yes"/>
    </clob_col>
    </xsl:template>
    </xsl:stylesheet>
    -- identity.xsl:
    <!-- The Identity Transformation -->
    <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <!-- Whenever you match any node or any attribute -->
    <xsl:template match="node()|@*">
    <!-- Copy the current node -->
    <xsl:copy>
    <!-- Including any attributes it has and any child nodes -->
    <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
    </xsl:template>
    </xsl:stylesheet>
    -- Notes:
    Running test.xsql works fine with XSQL command-line, but FAILS through Apache/Jserv (danish characters are returned as "?").
    inc.xsql works fine through XSQL command-line and Apache/Jserv, problem only happens with xsql:include-xsql reparse="yes" (e.g. test.xsql).
    xsql:include-xml works fine, but I cannot use this, because in my real business case I'm selecting more than one row from the database.
    I've checked and double-checked my jserv.properties several times, and believe it to be correct.
    The xsql:include-xsql reparse="yes" technique works fine in our Solaris9/Oracle-8.1.7/iAS-1.0.2.2 environment.
    Any suggestions ?
    -- Peter

    If I put the following line in jserv.properties:
    wrapper.env=LANG=en_US.ISO8859-1
    the problem with xsql:include-xsql reparse="yes" seems to go away.
    Really strange, since Oracle products in my experience normally only use NLS_LANG, not LANG.
    Also, we're accessing several databases with different charactersets from the same ApacheJserv installation, so I don't understand why LANG (or NLS_LANG) should be set to a particular value.
    Can anybody explain ?
    -- Peter
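One possible explanation (an assumption, not something confirmed in this thread): the reparse step serializes the included page to bytes and reads it back using the JVM's default encoding, which on Unix is derived from the LANG/LC_ALL locale rather than from NLS_LANG. With no locale set, that default is typically plain ASCII, which cannot represent æ, ø or å. The Python sketch below only illustrates what happens when such text is pushed through an ASCII encoder; it is not XSQL code.

```python
text = "<x>æøå</x>"

# Encoding with a charset that lacks the characters substitutes "?" for each one.
as_ascii = text.encode("ascii", errors="replace")
print(as_ascii)  # b'<x>???</x>'

# ISO-8859-1 has all three characters, one byte each.
as_latin1 = text.encode("iso-8859-1")
code_values = list(as_latin1[3:6])
print(code_values)  # [230, 248, 229]

# Round-tripping through ISO-8859-1 is lossless; through ASCII it is not.
lossless = as_latin1.decode("iso-8859-1") == text
lossy = as_ascii.decode("ascii") != text
print(lossless, lossy)  # True True
```

If that is what is going on, the LANG setting matters only for this intermediate byte round trip, which would explain why it is independent of the character sets of the databases being queried.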

  • Character encoding with CF and MySQL

    Okay, I thought this should be rather straightforward but
    apparently not. I have set up my site to use UTF-8: my cfm
    pages, the MySQL table, even Dreamweaver. The problem is when I
    input international characters via a form they get written correctly
    to the MySQL table; however, when I retrieve them in a query and
    display them on the page I get them displayed incorrectly.
    On my input.cfm page I'll enter the string
    "Téstïñg" in the textbox and submit it. If I look at
    the record via the MySQL Browser it appears as it should. However
    when I display it on my output.cfm page it shows the record as
    "T�st��g" and will do so until I change the
    meta tag to use charset=ISO-8859-1. Am I missing something or is
    this how it is supposed to work?
    My input.cfm page is set up with both the
    <cfprocessingdirective suppresswhitespace="YES"
    pageencoding="UTF-8">
    <meta http-equiv="Content-Type" content="text/html;
    charset=UTF-8">
    tags and a regular input formfield that writes to the MySQL
    database.
    The MySQL table is configured to use the utf8 char set and
    utf8_unicode_ci collation.
    And just to be safe I included
    useUnicode=true&characterEncoding=utf8&characterSetResults=utf8
    in the connection string on the CF Admin datasource setup page.
    I'm running CF 6.1, MySQL 4.1, the latest version of Apache
    Server on a Win2K3 box. I was running the 3.0.16 MySQL JDBC driver
    but I upgraded it to the 5.0.6 this morning thinking that may fix
    my issue.

    I'm still unsure why this works but I've found a solution. I
    switched all my pages over to character set ISO-8859-1 with the
    exception of my database table and it works. I get all the normal
    range characters along with the extended Unicode characters to write
    to the database and output correctly. Unicode characters are actually
    written to the table as their HTML-coded characters.
    If someone feels the need to enlighten me as to why this
    works please feel free, I'm always willing to learn.
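A guess at what happened, sketched in Python rather than ColdFusion: the garbled output is what you get when the bytes reaching the browser are ISO-8859-1 but the page declares UTF-8. Each accented letter is then a single byte that is not valid UTF-8 on its own, so it is shown as the replacement character. Which layer produced the ISO-8859-1 bytes (driver, CF, or web server) is not something this thread establishes.

```python
original = "Téstïñg"

# Bytes as an ISO-8859-1 producer would send them: one byte per character.
latin1_bytes = original.encode("iso-8859-1")

# A consumer that trusts a UTF-8 declaration cannot decode the accented bytes.
shown_as_utf8 = latin1_bytes.decode("utf-8", errors="replace")
print(shown_as_utf8)  # T�st��g

# Declaring what was really sent (charset=ISO-8859-1) fixes the display.
shown_as_latin1 = latin1_bytes.decode("iso-8859-1")
print(shown_as_latin1)  # Téstïñg

# The opposite mismatch (UTF-8 bytes read as ISO-8859-1) looks different.
mojibake = original.encode("utf-8").decode("iso-8859-1")
print(mojibake)  # TÃ©stÃ¯Ã±g
```

The pattern of the damage is a useful clue: one replacement character per accented letter points to single-byte data read as UTF-8, while two odd characters per accented letter points to UTF-8 data read as a single-byte codepage.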

  • Character encoding with cfntauthenticate

    I've been authenticating users of my web app against Active Directory using the cfntauthenticate tag, which works pretty well and was simple to implement.  We have one or two European users who cannot log in because they are getting an invalid password error.  Their usernames and passwords were checked as being valid by using other non-CF applications that also authenticate against the domain.  The common thread seems to be that they use umlauted or other European accented characters.  These characters can be multi-byte in a Unicode environment, so I've tried a number of methods of altering the encoding of the page so that the password is recognized as valid, but have had no luck so far.  Has anyone encountered this or have any information on the encoding of the form field when passing it through this tag?
    Brian

    I don't have an answer for your issue, but I'm surprised that it works at all.  The docs for cfntauthenticate (at least the ColdFusion 10/11 docs) specifically say that it won't work with Active Directory, only with a Windows NT domain.  You might have better luck using CFLDAP.
    -Carl V.

  • Detecting character encoding from BLOB stream... (PLSQL)

    I'm looking for a procedure/function which can return the character encoding of a "text/xml/csv/slk" file stored in a BLOB.
    For example:
    I have 4 files in different encodings (UTF8, Utf8BOM, ISO8859_2, Windows1252).
    With Java I can simply detect the character encoding with JuniversalCharDet (http://code.google.com/p/juniversalchardet/).
    thank you

    Solved...
    On my local PC I have installed Java 1.5.0_00 (because on DB is 1.5.0_10)...
    With Jdeveloper I have recompiled source code from:
    http://juniversalchardet.googlecode.com/svn/trunk/src/org/mozilla/universalchardet
    http://code.google.com/p/juniversalchardet/
    After that I have made a JAR file and uploaded it with loadjava to my database...
    C:\>loadjava -grant r_inis_prod -force -schema insurance2 -verbose -thin -user username/password@ip:port:sid chardet.jar
    After that I wrote a Java procedure and a PL/SQL wrapper; example below:
       // Needs: java.io.InputStream, java.io.BufferedInputStream,
       // oracle.sql.BLOB, org.mozilla.universalchardet.UniversalDetector
       public static String verifyEncoding(BLOB p_blob) {
           // "-1" signals a null input BLOB
           if (p_blob == null) return "-1";
           try {
               InputStream is = new BufferedInputStream(p_blob.getBinaryStream());
               UniversalDetector detector = new UniversalDetector(null);
               byte[] buf = new byte[p_blob.getChunkSize()];
               int nread;
               // Feed the detector chunk by chunk until it is confident or the data ends
               while ((nread = is.read(buf)) > 0 && !detector.isDone()) {
                   detector.handleData(buf, 0, nread);
               }
               detector.dataEnd();
               is.close();
               return detector.getDetectedCharset();
           } catch (Exception ex) {
               // "-2" signals any exception
               return "-2";
           }
       }
    As you can see, I used -2 for an exception and -1 if the input blob is null.
    Then I made a PL/SQL function:
    function f_preveri_encoding(p_blob in blob) return varchar2 is
    language Java name 'Zip.Zip.verifyEncoding(oracle.sql.BLOB) return java.lang.String';
    After that I uploaded 2 different txt files into my blob field (the first one is encoded with UTF-8, the second one with WINDOWS-1252).
    example how to call:
    declare
       l_blob blob;
       l_encoding varchar2(100);
    begin
    select vsebina into l_blob from dok_vsebina_dokumenta_blob where id = 401587359 ;
    l_encoding := zip_util.f_preveri_encoding(l_blob);
    if l_encoding = 'UTF-8' then
       dbms_output.put_line('file is encoded with UTF-8');
    elsif l_encoding = 'WINDOWS-1252' then
       dbms_output.put_line('file is encoded with WINDOWS-1252');
    else
        dbms_output.put_line('other enc...');
    end if;
    end;
    Now I can get the encoding from the blob, convert it to the database encoding, and store the data in a CLOB field.
    Here you have a chardet.jar file if you need this functionality..
    https://docs.google.com/open?id=0B6Z9wNTXyUEeVEk3VGh2cDRYTzg
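For readers without juniversalchardet to hand, a much cruder heuristic can be sketched in a few lines of Python. It only checks for a byte-order mark and then tests whether the bytes are valid UTF-8. The function name guess_encoding and the Windows-1252 fallback are assumptions for illustration; unlike a statistical detector, this cannot tell single-byte codepages such as ISO-8859-2 and Windows-1252 apart.

```python
import codecs

def guess_encoding(data: bytes, fallback: str = "windows-1252") -> str:
    """Return a best-guess encoding name for the given bytes."""
    # A byte-order mark is the only unambiguous signal.
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"
    if data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "utf-16"
    # Strict UTF-8 decoding rejects most text written in legacy codepages.
    try:
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        return fallback

print(guess_encoding("čas".encode("utf-8")))       # utf-8
print(guess_encoding(codecs.BOM_UTF8 + b"abc"))    # utf-8-sig
print(guess_encoding("čas".encode("iso-8859-2")))  # windows-1252 (the fallback)
```

Note the weakness shown by the last call: the ISO-8859-2 file is reported as the fallback codepage, and a short legacy-encoded file can occasionally happen to be valid UTF-8, which is why a real detector looks at character statistics as well.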

  • FF character encoding issue in Mageia 2 ?

    Hi everyone,
    I'm running Mozilla Firefox 17.0.8 in a KDE distro of Linux called Mageia 2. I'm having problems in character encoding with certain web pages, meaning that certain icons like the ones next to menu entries (Login, Search box etc.) and in section headlines don't appear properly. Instead they appear either in some arabic character or as little grey boxes with numbers and letters written in it.
    I've tried experimenting with different encoding systems: Western (ISO 8859-1), (ISO 8859-15), (Windows 1252), Unicode (UTF-8), Central European (ISO 8859-2) but none of them does the job. Currently the char encoding is set to UTF-8. The same web page in Chrome (UTF-8) gives no such problem.
    Can you help me, please?

    Thank you!
    I solved my problem, however I find fonts are too small for certain web pages when compared to Chrome (see attached pictures of nytimes.com).
    Chrome's font size are set to "Medium".

  • Reading Advance Queuing with XMLType payload and JDBC Driver character encoding

    Hi
    I've got a problem retrieving the message from the queue with XMLType payload in Java.
    It was working fine in the 10g database, but after the switch to 11g it returns a corrupted string instead of the real XML message. Database NLS_LANG setting is AL32UTF8
    It is said that the JDBC driver should deal with that automatically, but it obviously doesn't in this case. When I dequeue the message using database functionality (DBMS_AQ package) it looks fine, but not when using the JDBC driver, so I think it is a character encoding issue or similar. The message itself is enqueued by the database and is supposed to be retrieved by a dedicated EJB.
    Driver file used: ojdbc6.jar
    Additional libraries: aqapi.jar, xdb.jar
    All files were taken from the 11g database installation.
    What should I do to get the XML message correctly?

    Do you mean NLS_LANG is AL32UTF8 or the database character set is AL32UTF8? What is the database character set (SELECT value FROM nls_database_parameters WHERE parameter='NLS_CHARACTERSET')?
    Thanks,
    Sergiusz

  • Character Encoding for IDOC to JMS scenario with foreign characters

    Dear Experts,
    The scenario is described as follows:
    Issue Description:
    There is an IDOC which is created after extracting data from different countries (but only one country at a time). So, for instance first time the data is picked in Greek and Latin and corresponding IDOC is created and sent to PI, the next time plain English and sent to PI and next Chinese and so on. As of now every time this IDOC reaches PI ,it comes with UTF-8 character encoding as seen in the IDOC XML.
    I am converting this IDOC XML into a single-string flat file (currently taking the default encoding UTF-8) and sending it to the receiver JMS Queue (MQ Series). Now when this data is picked up by the end recipient from the corresponding queue in MQ Series, they see ? wherever there are Greek/Latin characters (maybe because that should have a different encoding like ISO-8859_7). This is causing issues at their end.
    My Understanding
    SAP system should trigger the IDOC with the right code page i.e if the IDOC is sent with Greek/Latin code page should be ISO-8859_7, if this same IDOC is sent with Chinese characters the corresponding code page else UTF-8 or default code page.
    Once this is sent correctly from SAP, Java Mapping should have to use the correct code page when writing the bytes to the output stream, and then we would also need to set the right code page as a JMS Header before putting the message in the JMS queue so that the receiver can interpret it.
    Queries:
    1. Is my approach for the scenario correct, if not please guide me to the right approach.
    2. Does SAP support different code page being picked for the same IDOC based on different data set. If so how is it achieved.
    3. What is the JMS Header property to set the right code page? (I think there should be some JMS Header defined by MQ Series for character encoding which I should be setting correctly.) I find that there is a property to set the CCSID in the JMS Receiver Adapter, but that only refers to non-ASCII names and doesn't refer to the payload content.
    I would appreciate if anybody can give me pointers on how to resolve this issue.
    Thanks,
    Pratik

    Hi Pratik,
         http://www.sdn.sap.com/irj/scn/go/portal/prtroot/docs/library/uuid/502991a2-45d9-2910-d99f-8aba5d79fb42?quicklink=index&overridelayout=true
    This link might help.
    regards
    Anupam

  • Problems with Forms and character encoding

    I'm having problems trying to read unicode data inputted into a Form on my JSP page.
    I've used the meta tag <meta http-equiv="Content-Type" content="text/html; charset=utf-8"/> to set the charset of the page to UTF-8. I've inputted some Chinese characters into my form, and when I try to read the subsequent request parameter in my servlet using request.getParameter() the string returned is this
    "&#26469;&#28304;" which is the escape sequence required by HTML to display these characters.
    From what I've read on the subject this doesn't seem like the expected value. I've tried other ways of getting the correct string value such as setting the character encoding request.setCharacterEncoding("UTF-8") and then converting the bytes using this encoding value but it doesn't seem to work.
    I could write a method to split up the string using the ; as a token and working out the correct unicode character but this doesn't seem like the right thing to do.
    Any help on how to pass the correct information from the Form in the JSP page to the servlet would be greatly appreciated
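The two references in the question decode to two Chinese characters, so the text itself arrived intact; it was just delivered as HTML numeric character references. Browsers commonly do this when a form is submitted in a charset that cannot represent what was typed, so getting the form submitted as UTF-8 is the real cure. For data already captured this way, a standard-library call can undo it. A small Python sketch of the decoding (the servlet side would need an equivalent in Java, which is not shown here):

```python
import html

raw = "&#26469;&#28304;"

# html.unescape turns numeric (and named) character references into characters.
decoded = html.unescape(raw)
code_points = [ord(ch) for ch in decoded]

print(decoded)      # 来源
print(code_points)  # [26469, 28304]
```

This avoids the hand-written splitting on ";" that the question considers, and it also copes with named references such as "&amp;".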

    I don't believe that is correct, but if it's returning HTML escapes instead of URL Encoded characters, then it's the browser doing it. This is my test page for playing with Chinese...
    <%@ page language="java" contentType="text/html; charset=UTF-8" %>
    <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
    <html>
    <head>
         <title></title>
         <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
    </head>
    <body bgcolor="#ffffff" background="" text="#000000" link="#ff0000" vlink="#800000" alink="#ff00ff">
    <%
    request.setCharacterEncoding("UTF-8");
    String str = "\u7528\u6237\u540d";
    String name = request.getParameter("name");
    %>
    req enc: <%= request.getCharacterEncoding() %><br />
    rsp enc: <%= response.getCharacterEncoding() %><br />
    str: <%= str %><br />
    name: <%= name %><br />
    <form method="GET" action="_lang.jsp" encoding="UTF-8">
    Name: <input type="text" name="name" value="" >
    <input type="submit" name="submit" value="GET Submit" />
    </form>
    <form method="POST" action="_lang.jsp" encoding="UTF-8">
    Name: <input type="text" name="name" value="" >
    <input type="submit" name="submit" value="POST Submit" />
    </form>
    </body>
    </html>

  • Web pages display OK, but print with garbage characters. I think it's character encoding, but don't know WHICH I should use. Have tried all Western and UTF options. Firefox 3.6.12

    I used to only have troubles with headers & footers printing out as garbage characters. I tried changing Character Encoding, now entire pages have garbage characters, even though pages view ok when browsing.

    If the pages look OK when you are browsing then it is not a problem with the encoding.
    It can be a problem with the font that is used, and you can try to disable website fonts and possibly try a few different default fonts to see if that helps.
    Tools > Options > Content : Fonts & Colors: Advanced (Allow pages to choose their own fonts, instead of my selections above)

  • What every developer should know about character encoding

    This was originally posted (with better formatting) elsewhere (Moderator edit: link removed). I'm posting because lots of people trip over this.
    If you write code that touches a text file, you probably need this.
    Let's start off with two key items
    1. Unicode does not solve this issue for us (yet).
    2. Every text file is encoded. There is no such thing as an unencoded file or a "general" encoding.
    And let's add a codicil to this – most Americans can get by without having to take this into account – most of the time. Because the first 128 byte values (0 – 127) in the vast majority of encoding schemes map to the same set of characters (more accurately called glyphs). And because we only use A-Z without any other characters, accents, etc. – we're good to go. But the second you use those same assumptions in an HTML or XML file that has characters outside that range – then the trouble starts.
    The computer industry started with disk space and memory at a premium. Anyone who suggested using 2 bytes for each character instead of one would have been laughed at. In fact we're lucky that the byte worked best as 8 bits or we might have had fewer than 256 values for each character. There of course were numerous character sets (or codepages) developed early on. But we ended up with most everyone using a standard set of codepages where the first 128 values were identical on all and the upper 128 were unique to each set. There were sets for America/Western Europe, Central Europe, Russia, etc.
    And then for Asia, because 256 characters were not enough, some of the range 128 – 255 had what was called DBCS (double byte character sets). For each value of a first byte (in these higher ranges), the second byte then identified one of 256 characters. This gave a total of 128 * 256 additional characters. It was a hack, but it kept memory use to a minimum. Chinese, Japanese, and Korean each have their own DBCS codepage.
    And for a while this worked well. Operating systems, applications, etc. mostly were set to use a specified code page. But then the internet came along. A website in America using an XML file from Greece to display data to a user browsing in Russia, where each is entering data based on their country – that broke the paradigm.
    Fast forward to today. The two file formats where we can explain this the best, and where everyone trips over it, are HTML and XML. Every HTML and XML file can optionally have the character encoding set in its header metadata. If it's not set, then most programs assume it is UTF-8, but that is not a standard and not universally followed. If the encoding is not specified and the program reading the file guesses wrong – the file will be misread.
    Point 1 – Never treat specifying the encoding as optional when writing a file. Always write it to the file. Always. Even if you are willing to swear that the file will never have characters out of the range 1 – 127.
    Now let's look at UTF-8, because as the de facto standard, and given the way it works, it gets people into a lot of trouble. UTF-8 became popular for two reasons. First, it matched the standard codepages for the first 128 characters, so most existing HTML and XML would match it. Second, it was designed to use as few bytes as possible, which mattered a lot back when it was designed and many people were still using dial-up modems.
    UTF-8 borrowed from the DBCS designs of the Asian codepages. The first 128 byte values are all single-byte representations of characters. Then for the next most common set, it uses a block in the second 128 values as the first byte of a double-byte sequence, giving us more characters. But wait, there's more. For the less common characters there's a first byte which leads to a series of second bytes. Those then each lead to a third byte, and those three bytes define the character. The original design went up to 6-byte sequences; the current definition (RFC 3629) stops at 4 bytes, which is enough for every Unicode code point. Using this MBCS (multi-byte character set) you can write the equivalent of every Unicode character. And assuming what you are writing is not a list of seldom-used Chinese characters, do it in fewer bytes.
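To make the variable-length scheme concrete, here is a small Python sketch that prints how many bytes a few characters take in UTF-8. The sample characters are arbitrary choices for illustration.

```python
samples = ["A", "ß", "€", "中", "😀"]

for ch in samples:
    encoded = ch.encode("utf-8")
    # Show the code point, the number of bytes, and the bytes in hex.
    print(f"U+{ord(ch):04X} {len(encoded)} byte(s): {encoded.hex(' ')}")

lengths = [len(ch.encode("utf-8")) for ch in samples]
print(lengths)  # [1, 2, 3, 3, 4]
```

Everything in the ASCII range stays one byte, most European accented letters take two, most CJK characters take three, and characters outside the Basic Multilingual Plane (such as emoji) take four.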
    But here is what everyone trips over – they have an HTML or XML file, it works fine, and they open it up in a text editor. They then, using the codepage for their region, insert a character like ß and save the file. Of course it must be correct – their text editor shows it correctly. But feed it to any program that reads according to the declared encoding and that byte is now the first byte of a 2-byte sequence. You either get a different character or, if the second byte is not a legal value for that first byte, an error.
    Point 2 – Always create HTML and XML in a program that writes it out correctly using the declared encoding. If you must create it with a text editor, then view the final file in a browser.
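The ß scenario can be reproduced in a few lines of Python. This sketch assumes the editor saved the file as Windows-1252 while the file is read as UTF-8; the word "Straße" is just sample data.

```python
# The editor saves ß using a Western single-byte codepage: one byte, 0xDF.
saved = "Straße".encode("windows-1252")
print(saved)  # b'Stra\xdfe'

# A reader that follows a UTF-8 declaration sees 0xDF as the start of a
# two-byte sequence; the next byte (0x65, "e") is not a valid continuation.
try:
    saved.decode("utf-8")
    outcome = "decoded"
except UnicodeDecodeError as err:
    outcome = f"error at byte {err.start}"
print(outcome)  # error at byte 4

# If the following byte happens to be a continuation byte, you silently
# get a different character instead of an error.
wrong = b"\xdf\xbf".decode("utf-8")
print(f"U+{ord(wrong):04X}")  # U+07FF
```

The silent case is the more dangerous one, because nothing fails and the wrong character simply ends up in the output.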
    Now, what about when the code you are writing will read or write a file? We are not talking binary/data files where you write it out in your own format, but files that are considered text files. Java, .NET, etc all have character encoders. The purpose of these encoders is to translate between a sequence of bytes (the file) and the characters they represent. Lets take what is actually a very difficlut example – your source code, be it C#, Java, etc. These are still by and large "plain old text files" with no encoding hints. So how do programs handle them? Many assume they use the local code page. Many others assume that all characters will be in the range 0 – 127 and will choke on anything else.
    Here's a key point about these text files – every program is still using an encoding. It may not be setting it in code, but by definition an encoding is being used.
    Point 3 – Always set the encoding when you read and write text files. Not just for HTML & XML, but even for files like source code. It's fine if you set it to use the default codepage, but set the encoding.
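    As a rough illustration of Point 3, the Java sketch below passes the charset explicitly on both the write and the read instead of relying on the platform default. The class name, the helper roundTrip and the temp-file prefix are invented for the example.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class ExplicitEncodingIo {
    /**
     * Writes text to a temporary file using one charset and reads it
     * back using another, so the effect of a mismatch is visible.
     */
    static String roundTrip(String text, Charset writeAs, Charset readAs) {
        try {
            Path file = Files.createTempFile("encoding-demo", ".txt");
            try {
                Files.write(file, text.getBytes(writeAs));
                return new String(Files.readAllBytes(file), readAs);
            } finally {
                Files.deleteIfExists(file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String text = "Stra\u00DFe";
        String same = roundTrip(text, StandardCharsets.UTF_8, StandardCharsets.UTF_8);
        String mixed = roundTrip(text, StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1);
        System.out.println("matching charsets preserved the text: " + text.equals(same));
        System.out.println("mismatched charsets preserved the text: " + text.equals(mixed));
    }
}
```

    Even when the charset you want is the platform default, naming it (for example Charset.defaultCharset()) documents the decision instead of leaving it implicit.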
    Point 4 – Use the most complete encoder possible. You can write your own XML as a text file encoded for UTF-8. But if you write it using an XML encoder, then it will include the encoding in the metadata and you can't get it wrong. (It also adds the endian preamble to the file.)
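    One way to follow Point 4 in Java is to let an XML writer produce the declaration. The sketch below uses the JDK's StAX writer (javax.xml.stream); the class name and the echoData element are just placeholders. Whether a given XML library also emits a byte-order preamble varies, and this sketch makes no claim about that.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

public class XmlDeclarationDemo {
    /** Serializes a one-element document as UTF-8 bytes, declaration included. */
    static byte[] writeDocument(String text) {
        try {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            XMLStreamWriter writer = XMLOutputFactory.newInstance()
                    .createXMLStreamWriter(out, "UTF-8");
            // The writer emits the XML declaration with the encoding in it.
            writer.writeStartDocument("UTF-8", "1.0");
            writer.writeStartElement("echoData");
            writer.writeCharacters(text);
            writer.writeEndElement();
            writer.writeEndDocument();
            writer.close();
            return out.toByteArray();
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        byte[] bytes = writeDocument("Dev Team \u00FC\u00F6\u00E4\u00DF");
        System.out.println(new String(bytes, StandardCharsets.UTF_8));
    }
}
```

    Because the same object both encodes the bytes and writes the declaration, the two cannot drift apart the way they can in a hand-edited file.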
    OK, you're reading and writing files correctly, but what about inside your code? What happens there? This is where it's easy – Unicode. That's what those encoders created in the Java and .NET runtimes are designed to do. You read in and get Unicode. You write Unicode and get an encoded file. That's why the char type is 16 bits and is a unique core type that is for characters. This you probably have right because languages today don't give you much choice in the matter.
    Point 5 – (For developers on languages that have been around awhile) – Always use Unicode internally. In C++ this is called wide chars (or something similar). Don't get clever to save a couple of bytes; memory is cheap and you have more important things to do.
    Wrapping it up
    I think there are two key items to keep in mind here. First, make sure you are taking the encoding into account on text files. Second, this is actually all very easy and straightforward. People rarely screw up how to use an encoding; it's when they ignore the issue that they get into trouble.
    Edited by: Darryl Burke -- link removed

    DavidThi808 wrote:
    This was originally posted (with better formatting) at Moderator edit: link removed/what-every-developer-should-know-about-character-encoding.html. I'm posting because lots of people trip over this.
    If you write code that touches a text file, you probably need this.
    Let's start off with two key items:
    1. Unicode does not solve this issue for us (yet).
    2. Every text file is encoded. There is no such thing as an unencoded file or a "general" encoding.
    And let's add a codicil to this – most Americans can get by without having to take this into account – most of the time. Because the characters for the first 127 bytes in the vast majority of encoding schemes map to the same set of characters (more accurately called glyphs). And because we only use A-Z without any other characters, accents, etc. – we're good to go. But the second you use those same assumptions in an HTML or XML file that has characters outside the first 127 – then the trouble starts.
    Pretty sure most Americans do not use character sets that only have a range of 0-127. I don't think I have ever used a desktop OS that did. I might have used some big iron boxes before that, but at that time I wasn't even aware that character sets existed.
    They might only use that range, but that is a different issue, especially since that range is exactly the same as in the UTF-8 character set anyway.
    The computer industry started with disk space and memory at a premium. Anyone who suggested using 2 bytes for each character instead of one would have been laughed at. In fact we're lucky that the byte worked best as 8 bits or we might have had fewer than 256 values for each character. There were of course numerous character sets (or codepages) developed early on. But we ended up with most everyone using a standard set of codepages where the first 128 byte values were identical on all and the second 128 were unique to each set. There were sets for America/Western Europe, Central Europe, Russia, etc.
    And then for Asia, because 256 characters were not enough, some of the range 128 – 255 had what was called DBCS (double byte character sets). For each value of a first byte (in these higher ranges), the second byte then identified one of 256 characters. This gave a total of 128 * 256 additional characters. It was a hack, but it kept memory use to a minimum. Chinese, Japanese, and Korean each have their own DBCS codepage.
    And for awhile this worked well. Operating systems, applications, etc. mostly were set to use a specified code page. But then the internet came along. A website in America using an XML file from Greece to display data to a user browsing in Russia, where each is entering data based on their country – that broke the paradigm.
    The above is only true for small volume sets. If I am targeting a processing rate of 2000 txns/sec with a requirement to hold data active for seven years then a column with a size of 8 bytes is significantly different than one with 16 bytes.
    Fast forward to today. The two file formats where we can explain this the best, and where everyone trips over it, are HTML and XML. Every HTML and XML file can optionally have the character encoding set in its header metadata. If it's not set, then most programs assume it is UTF-8, but that is not a standard and not universally followed. If the encoding is not specified and the program reading the file guesses wrong – the file will be misread.
    The above is out of place. It would be best to address this as part of Point 1.
    Point 1 – Never treat specifying the encoding as optional when writing a file. Always write it to the file. Always. Even if you are willing to swear that the file will never have characters out of the range 1 – 127.
    Now let's look at UTF-8, because as the standard, and because of the way it works, it gets people into a lot of trouble. UTF-8 became popular for two reasons. First, it matched the standard codepages for the first 128 characters (byte values 0 – 127), so most existing HTML and XML would match it. Second, it was designed to use as few bytes as possible, which mattered a lot back when it was designed and many people were still using dial-up modems.
    UTF-8 borrowed from the DBCS designs of the Asian codepages. The first 128 byte values are all single-byte representations of characters. Then, for the next most common set, it uses a block in the second 128 byte values as the start of a double-byte sequence, giving us more characters. But wait, there's more. For the less common characters there's a first byte which leads to a series of second bytes. Those then each lead to a third byte, and those three bytes define the character. The original design went up to 6-byte sequences; the current standard (RFC 3629) stops at 4 bytes, which is enough for every Unicode character. Using this MBCS (multi-byte character set) you can write the equivalent of every Unicode character. And assuming what you are writing is not a list of seldom-used Chinese characters, you do it in fewer bytes.
    The first part of that paragraph is odd. The first 128 characters of unicode, all unicode, is based on ASCII. The representational format of UTF8 is required to implement unicode, thus it must represent those characters. It uses the idiom supported by variable width encodings to do that.
    But here is what everyone trips over – they have an HTML or XML file, it works fine, and they open it up in a text editor. They then insert a character like ß, which the text editor encodes using the codepage for their region, and save the file. Of course it must be correct – their text editor shows it correctly. But feed it to any program that reads according to the declared encoding, and that byte is now the first byte of a 2-byte sequence. You either get a different character or, if the second byte is not a legal value for that first byte, an error.
    Not sure what you are saying here. If a file is supposed to be in one encoding and you insert invalid characters into it, then it is invalid. End of story. It has nothing to do with HTML/XML.
    Point 2 – Always create HTML and XML in a program that writes it out correctly using the encoding. If you must create with a text editor, then view the final file in a browser.
    The browser still needs to support the encoding.
    Now, what about when the code you are writing will read or write a file? We are not talking binary/data files where you write it out in your own format, but files that are considered text files. Java, .NET, etc. all have character encoders. The purpose of these encoders is to translate between a sequence of bytes (the file) and the characters they represent. Let's take what is actually a very difficult example – your source code, be it C#, Java, etc. These are still by and large "plain old text files" with no encoding hints. So how do programs handle them? Many assume they use the local code page. Many others assume that all characters will be in the range 0 – 127 and will choke on anything else.
    I know java files have a default encoding - the specification defines it. And I am certain C# does as well.
    Point 3 – Always set the encoding when you read and write text files. Not just for HTML & XML, but even for files like source code. It's fine if you set it to use the default codepage, but set the encoding.
    It is important to define it. Whether you set it is another matter.
    Point 4 – Use the most complete encoder possible. You can write your own XML as a text file encoded for UTF-8. But if you write it using an XML encoder, then it will include the encoding in the metadata and you can't get it wrong. (It also adds the endian preamble to the file.)
    OK, you're reading and writing files correctly, but what about inside your code? What happens there? This is where it's easy – Unicode. That's what those encoders created in the Java and .NET runtimes are designed to do. You read in and get Unicode. You write Unicode and get an encoded file. That's why the char type is 16 bits and is a unique core type that is for characters. This you probably have right because languages today don't give you much choice in the matter.
    Unicode character escapes are replaced prior to actual code compilation. Thus it is possible to create strings in java with escaped unicode characters which will fail to compile.
    Point 5 – (For developers on languages that have been around awhile) – Always use Unicode internally. In C++ this is called wide chars (or something similar). Don't get clever to save a couple of bytes; memory is cheap and you have more important things to do.
    No. A developer should understand the problem domain represented by the requirements and the business, and create solutions that are appropriate to that. Thus there is absolutely no point for someone who is creating an inventory system for a stand-alone store to craft a solution that supports multiple languages.
    And another example is with high volume systems moving/storing bytes is relevant. As such one must carefully consider each text element as to whether it is customer consumable or internally consumable. Saving bytes in such cases will impact the total load of the system. In such systems incremental savings impact operating costs and marketing advantage with speed.

  • XML Character Encoding Using UTL_DBWS

    Hi,
    I have a database with WINDOWS-1252 character encoding. I'm using UTL_DBWS to call a web service method which echoes a given string. For this purpose, I do the following:
    DECLARE
        v_wsdl CONSTANT VARCHAR2(500) := 'http://myhost/myservice?wsdl';
        v_namespace CONSTANT VARCHAR2(500) := 'my.namespace';
        v_service_name CONSTANT UTL_DBWS.QNAME := UTL_DBWS.to_qname(v_namespace, 'MyService');
        v_service_port CONSTANT UTL_DBWS.QNAME := UTL_DBWS.to_qname(v_namespace, 'MySoapServicePort');
        v_ping CONSTANT UTL_DBWS.QNAME := UTL_DBWS.to_qname(v_namespace, 'ping');
        v_wsdl_uri CONSTANT URITYPE := URIFACTORY.getURI(v_wsdl);
        v_str_request CONSTANT VARCHAR2(4000) :=
    '<?xml version="1.0" encoding="UTF-8" ?>
    <ping>
        <pingRequest>
            <echoData>Dev Team üöäß</echoData>
        </pingRequest>
    </ping>';
        v_service UTL_DBWS.SERVICE;
        v_call UTL_DBWS.CALL;
        v_request XMLTYPE := XMLTYPE (v_str_request);
        v_response SYS.XMLTYPE;
    BEGIN
        DBMS_JAVA.set_output(20000);
        UTL_DBWS.set_logger_level('FINE');
        v_service := UTL_DBWS.create_service(v_wsdl_uri, v_service_name);
        v_call := UTL_DBWS.create_call(v_service, v_service_port, v_ping);
        UTL_DBWS.set_property(v_call, 'oracle.webservices.charsetEncoding', 'UTF-8');
        v_response := UTL_DBWS.invoke(v_call, v_request);
        DBMS_OUTPUT.put_line(v_response.getStringVal());
        UTL_DBWS.release_call(v_call);
        UTL_DBWS.release_all_services;
    END;
    /
    Here is the SERVER OUTPUT:
    ServiceFacotory: oracle.j2ee.ws.client.ServiceFactoryImpl@a9deba8d
    WSDL: http://myhost/myservice?wsdl
    Service: oracle.j2ee.ws.client.dii.ConfiguredService@c881d39e
    *** Created service: -2121202561 - oracle.jpub.runtime.dbws.DbwsProxy$ServiceProxy@afb58220 ***
    ServiceProxy.get(-2121202561) = oracle.jpub.runtime.dbws.DbwsProxy$ServiceProxy@afb58220
    Collection Call info: port={my.namespace}MySoapServicePort, operation={my.namespace}ping, returnType={my.namespace}PingResponse, params count=1
    setProperty(oracle.webservices.charsetEncoding, UTF-8)
    dbwsproxy.add.map: ns, my.namespace
    Attribute 0: my.namespace: xmlns:ns, my.namespace
    dbwsproxy.lookup.map: ns, my.namespace
    createElement(ns:ping,null,my.namespace)
    dbwsproxy.add.soap.element.namespace: ns, my.namespace
    Attribute 0: my.namespace: xmlns:ns, my.namespace
    dbwsproxy.element.node.child.3: 1, null
    createElement(echoData,null,null)
    dbwsproxy.text.node.child.0: 3, Dev Team üöäß
    request:
    <ns:ping xmlns:ns="my.namespace">
       <pingRequest>
          <echoData>Dev Team üöäß</echoData>
       </pingRequest>
    </ns:ping>
    Jul 8, 2008 6:58:49 PM oracle.j2ee.ws.client.StreamingSender _sendImpl
    FINE: StreamingSender.response:<?xml version = '1.0' encoding = 'UTF-8'?>
    <env:Envelope xmlns:env="http://schemas.xmlsoap.org/soap/envelope/"><env:Header/><env:Body><ns0:pingResponse xmlns:ns0="my.namespace"><pingResponse><responseTimeMillis>0</responseTimeMillis><resultCode>0</resultCode><echoData>Dev Team üöäß</echoData></pingResponse></ns0:pingResponse></env:Body></env:Envelope>
    response:
    <ns0:pingResponse xmlns:ns0="my.namespace">
       <pingResponse>
          <responseTimeMillis>0</responseTimeMillis>
          <resultCode>0</resultCode>
          <echoData>Dev Team üöäß</echoData>
       </pingResponse>
    </ns0:pingResponse>
    As you can see, the character encoding is broken in the request and in the response, i.e. the SOAP encoder does not take the UTF-8 encoding into consideration.
    I tracked down the problem to the method oracle.jpub.runtime.dbws.DbwsProxy.dom2SOAP(org.w3c.dom.Node, java.util.Hashtable); and more specifically to the calls of oracle.j2ee.ws.saaj.soap.soap11.SOAPFactory11.
    My question is: is there a way to make the SOAP encoder use the correct character encoding?
    Thanks a lot in advance!
    Greetings,
    Dimitar

    I found a workaround of the problem:
        v_response := XMLType(v_response.getBlobVal(NLS_CHARSET_ID('CHAR_CS')), NLS_CHARSET_ID('AL32UTF8'));
    Ugly, but I'm tired of decompiling and debugging Java classes ;)
    Greetings,
    Dimitar

  • Steps to UTF-8 Encoding with Oracle 8i and Weblogic 6.1SP1

    What are the steps to UTF-8 encoding with Oracle 8i and Weblogic 6.1SP1?
    I have:
    - Oracle 8.1.5 database created with character set=UTF8 and national character set=UTF8
    - Weblogic 6.1SP1 without any encoding mechanism set (though I did play with
    <jsp-param><param-name>encoding</param-name>
    <param-value>UTF-8</param-value>
    </jsp-param>
    in the weblogic.xml for a while, though it seemed not to make a difference)
    - JSP pages set to content='text/html; charset=UTF-8'
    - JSP form POSTs set to enctype="UTF-8"
    I can copy and paste Chinese Kanji from a UTF8 encoded web page into form text boxes, but when I post the data it comes back as different Kanji. Then once it is posted the Kanji stays the same on repeated posts. The same Kanji text also looks different when viewed in a form text box than when viewed as straight text on the page.
    Is there anything else? Or am I already encoding characters twice?
    Please help!
    Mel Christie

    Hi Experts,
    Please correct me if I am asking the question in the wrong way.
    I have ARCGIS with Oracle Database 10gR2 on a production server.
    My task is to connect the AUTOCAD software (on a client computer connected over the LAN) to ARCGIS in order to access the toposheets available to the SDE user.
    When I try to connect I get this error: The specified credentials are not valid or provider is not able to establish a connection.
    I checked the path to the production server by pinging it, and checked the user/passcode too, but that did not help.
    Please help me with this; it is very urgent.
    Thanks.
    Edited by: user13355644 on Jul 3, 2010 3:53 AM
    Edited by: user13355644 on Jul 22, 2011 2:55 AM

  • XML parser not detecting character encoding

    Hi,
    I am using Jdeveloper 9.0.5 preview and the same problem is happening in our production AS 9.0.2 release.
    The character encoding of an xml document is not correctly being detected by the oracle v2 parser even though the xml declaration correctly contains
    <?xml version="1.0" encoding="ISO-8859-1" ?>
    instead it treats the document as UTF8 encoding which is fine until a document comes along with an extended character which then causes a
    java.io.UTFDataFormatException: Invalid UTF8 encoding.
    at oracle.xml.parser.v2.XMLUTF8Reader.checkUTF8Byte(XMLUTF8Reader.java:160)
    at oracle.xml.parser.v2.XMLUTF8Reader.readUTF8Char(XMLUTF8Reader.java:187)
    at oracle.xml.parser.v2.XMLUTF8Reader.fillBuffer(XMLUTF8Reader.java:120)
    at oracle.xml.parser.v2.XMLByteReader.saveBuffer(XMLByteReader.java:448)
    at oracle.xml.parser.v2.XMLReader.fillBuffer(XMLReader.java:2023)
    at oracle.xml.parser.v2.XMLReader.tryRead(XMLReader.java:972)
    at oracle.xml.parser.v2.XMLReader.scanXMLDecl(XMLReader.java:2589)
    at oracle.xml.parser.v2.XMLReader.pushXMLReader(XMLReader.java:485)
    at oracle.xml.parser.v2.XMLReader.pushXMLReader(XMLReader.java:192)
    at oracle.xml.parser.v2.XMLParser.parse(XMLParser.java:144)
    as you can see it is explicitly casting the XMLUTF8Reader to perform the read.
    I can get around this by hard coding the xml input stream to be processed by a reader
    XMLSource = new StreamSource(new InputStreamReader(XMLInStream,"ISO-8859-1"));
    however the manual documents that the character encoding is automatically picked up from the xml file and casting into a reader is not necessary, so I should be able to write
    XMLSource = new StreamSource(XMLInStream)
    Does anyone else experience this same problem?
    having to hardcode the encoding causes my software to lose flexibility.
    Jarrod Sharp.

    An XML document should be created with 'ISO-8859-1' encoding to be parsed as 'ISO-8859-1' encoding.
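    For comparison, here is a small sketch of declaration-based detection using the JDK's built-in JAXP parser rather than the Oracle v2 parser from the question, so it says nothing about how oracle.xml.parser.v2 behaves. The class and method names are invented. The point it illustrates is the one made in the reply above: the bytes must really be in the encoding the declaration names.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

public class DeclaredEncodingParse {
    /** Builds a document whose declaration and bytes both say ISO-8859-1. */
    static byte[] latin1Document(String text) {
        String xml = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?>"
                + "<name>" + text + "</name>";
        return xml.getBytes(StandardCharsets.ISO_8859_1);
    }

    /** Parses raw bytes; the parser picks the charset from the declaration. */
    static String parseRootText(byte[] xmlBytes) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xmlBytes));
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException("parse failed", e);
        }
    }

    public static void main(String[] args) {
        String text = "Caf\u00E9";
        String parsed = parseRootText(latin1Document(text));
        System.out.println("text survived the round trip: " + text.equals(parsed));
    }
}
```

    Handing the parser an InputStream (bytes) is what lets it honor the declaration; wrapping the stream in a Reader, as in the workaround above, makes the caller responsible for choosing the charset.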

  • What's the difference of character encoding between 1.4.0 and 1.4.2 in Linux

    As far as I can tell, the character encoding for Chinese in jdk1.4.2 is no longer the same as in jdk1.4.0.
    In jdk1.4.0, the default character encoding followed the "file.encoding" system property; we often set the property to "gb2312".
    But in jdk1.4.2, I find that the default character encoding no longer follows the "file.encoding" system property.
    Does anyone know the reason?
    Test Program:
    public class B {
        public static void main(String[] args) throws Exception {
            // Four bytes: two Chinese characters encoded as GB2312
            byte[] bytes = new byte[]{(byte)0xD6, (byte)0xD0, (byte)0xCE, (byte)0xC4};
            // Decoded with the platform default charset
            String s1 = new String(bytes);
            // Decoded with whatever the file.encoding property currently says
            String s2 = new String(bytes, System.getProperty("file.encoding"));
            System.out.println("s1=" + s1 + " , s2=" + s2);
            System.out.println("s1.length=" + s1.length() + " , s2.length=" + s2.length());
        }
    }
    Run four times, with these results:
    [root@app15 component]# /usr/local/j2sdk1.4.0/bin/java -Dfile.encoding=ISO-8859-1 -cp . B
    s1=中文 , s2=中文
    s1.length=4 , s2.length=4
    [root@app15 component]# /usr/local/j2sdk1.4.0/bin/java -Dfile.encoding=gb2312 -cp . B
    s1=中文 , s2=中文
    s1.length=2 , s2.length=2
    [root@app15 component]# /usr/local/j2sdk1.4.2/bin/java -Dfile.encoding=ISO-8859-1 -cp . B
    s1=中文 , s2=中文
    s1.length=4 , s2.length=4
    [root@app15 component]# /usr/local/j2sdk1.4.2/bin/java -Dfile.encoding=gb2312 -cp . B
    s1=中文 , s2=??
    s1.length=4 , s2.length=2
    s1.length=4 , s2.length=2
    [root@app15 component]#

    I don't know for sure, but:
    -- The API documentation for String says that "new String(byte[])" uses "the platform's default charset".
    -- The API documentation for Charset says "The default charset is determined during virtual-machine startup and typically depends upon the locale and charset being used by the underlying operating system."
    You'll notice that it doesn't say anything about using the file.encoding system value, so presumably (based on your experiments) it doesn't. I did a search for "java default charset" and didn't find anything specific, but this site says "As of Java 1.4.1, the default Charset varies from platform to platform" and suggests you explicitly hard-code your charset. I would agree with that.
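    A minimal sketch of the "hard-code your charset" suggestion, using the same four bytes as the test program. It assumes the running JVM ships the GB2312 charset (it is part of the extended charsets in a full JDK, but that is an assumption about the installation); the class and method names are invented for the example.

```java
import java.nio.charset.Charset;

public class ExplicitCharsetDecode {
    /** The four bytes from the test program above. */
    static final byte[] BYTES = {
        (byte) 0xD6, (byte) 0xD0, (byte) 0xCE, (byte) 0xC4
    };

    /** Decodes with a named charset instead of the platform default. */
    static String decode(byte[] bytes, String charsetName) {
        return new String(bytes, Charset.forName(charsetName));
    }

    public static void main(String[] args) {
        System.out.println("default charset: " + Charset.defaultCharset());
        // ISO-8859-1 maps every byte to one char, so the length is 4.
        System.out.println("ISO-8859-1 length: " + decode(BYTES, "ISO-8859-1").length());
        if (Charset.isSupported("GB2312")) {
            String s = decode(BYTES, "GB2312");
            // Print code points rather than glyphs so the result does not
            // depend on what the console can display.
            System.out.printf("GB2312 length: %d, first code point: U+%04X%n",
                    s.length(), s.codePointAt(0));
        } else {
            System.out.println("GB2312 is not available in this JVM");
        }
    }
}
```

    Decoding this way gives the same answer regardless of the JDK version, the locale, or what -Dfile.encoding was set to.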
