ISO-8859-1 characters in xmlDom.domDocument

-- Oracle8i Enterprise Edition Release 8.1.7.0.0 - Production
-- JServer Release 8.1.7.0.0 - Production
-- Oracle XML Parser 2.0.2.9.0 Production
-- OS: Windows 2000 Professional
-- NLS_LANG in Oracle is: AMERICAN_AMERICA.UTF8
-- NLS_LANG in registry (client) is set to: SWEDISH_SWEDEN.WE8ISO8859P1
-- Description: Getting corrupt characters instead of the
-- Swedish characters "åäö" after parsing the clob.
-- Output after running this script in sqlplus:
--| BEFORE
--| -----------------------------------------------------------------------
--| AFTER
--| -----------------------------------------------------------------------
--| <?xml version="1.0" encoding="ISO-8859-1"?><asdf>aaa åäö aaa</asdf>
--| <?xml version = '1.0' encoding = 'ISO-8859-1'?>
--| <asdf>aaa ??? aaa</asdf>
--|
set serveroutput on
drop table xmltest;
create table xmltest
(before clob
,after clob);
declare
  beforeClob clob;
  afterClob  clob;
  xdoc       xmldom.domdocument;
  parser     xmlparser.Parser;
begin
  -- Store the source XML in BEFORE and get a locator for the empty AFTER CLOB
  insert into xmltest
  values ('<?xml version="1.0" encoding="ISO-8859-1"?><asdf>aaa åäö aaa</asdf>', empty_clob())
  returning after into afterClob;
  select before
  into beforeClob
  from xmltest;
  -- Parse the BEFORE CLOB into a DOM document
  parser := xmlparser.newParser;
  xmlparser.parseCLOB(parser, beforeClob);
  xdoc := xmlparser.getDocument(parser);
  xmlparser.freeParser(parser);
  dbms_output.put_line('Parsed xml charset: ' || xmldom.getCharset(xdoc));
  -- Serialize the document back into the AFTER CLOB
  xmldom.writeToClob(xdoc, afterClob, 'WE8ISO8859P1');
  commit;
end;
/
select * from xmltest;

Hi,
This is a known issue. Within CLOB, Oracle DB will always store data in UTF-8, so the encoding setups will not work.
Thanks.

Similar Messages

Problems reading Latin2 (ISO 8859-2) characters

Hello!
I want to read the content of an MS Access table (in an MDB file) using the JDBC:ODBC driver.
The program works well but there is a character conversion problem when I read text fields from the table.
The Latin2 (ISO 8859-2) characters like áéíóőűüöÁÉÍÓÜÖŰŐ are replaced by the "?" character.
I use the ResultSet object's getString() method.
Any idea about how to solve this problem?

Try changing the session encoding from the default to ISO-8859-2.
This would probably help:
http://download.oracle.com/javase/1.4.2/docs/guide/jdbc/bridge.html
What's New with the JDBC-ODBC Bridge?
* A jdbc:odbc: connection can now have a charSet property, to specify a Character Encoding Scheme other than the client default.
For possible values, see the Internationalization specification on the Web Site.
The following code fragment shows how to set 'Big5' as the character set for all character data.
// Load the JDBC-ODBC bridge driver (the class name must be a string)
Class.forName("sun.jdbc.odbc.JdbcOdbcDriver");
// Set up the properties
java.util.Properties prop = new java.util.Properties();
prop.put("charSet", "Big5");
prop.put("user", username);
prop.put("password", password);
// Connect to the database
java.sql.Connection con = java.sql.DriverManager.getConnection(url, prop);
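The "?" replacement is exactly what Java's own charset conversion produces when a character has no mapping in the target charset. A minimal sketch, assuming (this is not confirmed in the thread) that the bridge converts text through a charset such as ISO-8859-1 that lacks Latin2 letters like ő and ű; `QuestionMarkDemo` and `roundTrip` are illustrative names:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class QuestionMarkDemo {
    // Encode to the charset and decode back, as a lossy conversion layer would.
    // String.getBytes(Charset) silently substitutes the charset's replacement
    // byte ('?' for ISO-8859-1) for characters it cannot represent.
    static String roundTrip(String text, Charset charset) {
        return new String(text.getBytes(charset), charset);
    }

    public static void main(String[] args) {
        // o and u with double acute: present in ISO-8859-2, absent from ISO-8859-1
        String latin2 = "\u0151\u0171";
        System.out.println(roundTrip(latin2, StandardCharsets.ISO_8859_1)); // ??
        System.out.println(roundTrip(latin2, Charset.forName("ISO-8859-2")).equals(latin2)); // true
    }
}
```

Setting the `charSet` property as in the quoted fragment is the bridge-level way to choose the charset; this sketch only reproduces the symptom.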

Polish (iso-8859-2) characters in JSP don't display properly...

I created a test JSP file:
<%@page contentType="text/html; charset=iso-8859-2"%>
<html>
<head>
<meta http-equiv="Content-type" content="text/html; charset=iso-8859-2">
</head>
<body>
2+2=<%= 2+2 %><br>

ąćęłńóśźż
ĄĆĘŁŃÓŚŹŻ
<br>
</body>
</html>
The problem is that one of polish-specific characters gets turned into
a question mark (the character o-dashed, "ó" and (capitalized) "Ó").
I searched the group archives but didn't find anything related to this
problem.
Lukasz Kowalczyk

Try using ISO8859_2 in place of iso-8859-2 on the @page directive and the charset=. Also in the weblogic.properties file, in the WEBLOGIC JSP PROPERTIES section, add the following lines:
verbose=true,\
encoding=ISO8859_2
It will work. I have done the same thing for SJIS just now.
Keep me informed about it.
Nikhil
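On the Java side, `ISO8859_2` (the historical java.io name) and `ISO-8859-2` are aliases for the same charset, so whether the rename helps presumably depends on how WebLogic's JSP machinery looks the name up. The sketch also shows the kind of mojibake (e.g. ±æê³ñó¶¼¿) that appears when Polish letters encoded as ISO-8859-2 are displayed as ISO-8859-1. `Latin2Names` and `misreadAsLatin1` are illustrative names:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class Latin2Names {
    static final Charset LATIN2 = Charset.forName("ISO-8859-2");

    // Encode text as ISO-8859-2, then (wrongly) decode the bytes as ISO-8859-1.
    static String misreadAsLatin1(String text) {
        return new String(text.getBytes(LATIN2), StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // The legacy java.io name resolves to the same charset object.
        Charset legacy = Charset.forName("ISO8859_2");
        System.out.println(legacy.equals(LATIN2)); // true
        System.out.println(legacy.name());         // ISO-8859-2

        // a-ogonek, c-acute, e-ogonek, l-stroke, n-acute, o-acute,
        // s-acute, z-acute, z-dot (the lowercase letters from the test JSP)
        String polish = "\u0105\u0107\u0119\u0142\u0144\u00f3\u015b\u017a\u017c";
        System.out.println(misreadAsLatin1(polish)
                .equals("\u00b1\u00e6\u00ea\u00b3\u00f1\u00f3\u00b6\u00bc\u00bf")); // true
    }
}
```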

Problems with reading XML files with ISO-8859-1 encoding

Hi!
I'm trying to read an RSS file. The code below works with XML files in UTF-8 encoding but not with ISO-8859-1. How do I fix it so it works with both?
Here's the code:
package getrss;

import javax.xml.parsers.*;
import org.w3c.dom.*;

/**
 * @author gustav
 */
public class RSSDocument {
    /** Creates a new instance of RSSDocument */
    public RSSDocument(String inurl) {
        String url = inurl;
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(url);
            NodeList nodes = doc.getElementsByTagName("item");
            for (int i = 0; i < nodes.getLength(); i++) {
                Element element = (Element) nodes.item(i);
                NodeList title = element.getElementsByTagName("title");
                Element line = (Element) title.item(0);
                System.out.println("Title: " + getCharacterDataFromElement(line));
                NodeList des = element.getElementsByTagName("description");
                line = (Element) des.item(0);
                System.out.println("Des: " + getCharacterDataFromElement(line));
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    // Returns the text of the element's first child, or "?" if it has none.
    public String getCharacterDataFromElement(Element e) {
        Node child = e.getFirstChild();
        if (child instanceof CharacterData) {
            CharacterData cd = (CharacterData) child;
            return cd.getData();
        }
        return "?";
    }
}
And here's the error message:
org.xml.sax.SAXParseException: Character conversion error: "Malformed UTF-8 char -- is an XML encoding declaration missing?" (the line number may be too low).
    at org.apache.crimson.parser.InputEntity.fatal(InputEntity.java:1100)
    at org.apache.crimson.parser.InputEntity.fillbuf(InputEntity.java:1072)
    at org.apache.crimson.parser.InputEntity.isXmlDeclOrTextDeclPrefix(InputEntity.java:914)
    at org.apache.crimson.parser.Parser2.maybeXmlDecl(Parser2.java:1183)
    at org.apache.crimson.parser.Parser2.parseInternal(Parser2.java:653)
    at org.apache.crimson.parser.Parser2.parse(Parser2.java:337)
    at org.apache.crimson.parser.XMLReaderImpl.parse(XMLReaderImpl.java:448)
    at org.apache.crimson.jaxp.DocumentBuilderImpl.parse(DocumentBuilderImpl.java:185)
    at javax.xml.parsers.DocumentBuilder.parse(DocumentBuilder.java:124)
    at getrss.RSSDocument.<init>(RSSDocument.java:25)
    at getrss.Main.main(Main.java:25)
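The message says the parser decoded the stream as UTF-8. A minimal sketch with the JDK's built-in JAXP DOM parser (not the Crimson parser from the trace, so the exact message differs): when raw bytes are passed as an InputStream, the parser follows the XML declaration, so ISO-8859-1 content with a matching declaration parses fine, while the same bytes without a declaration are read as UTF-8 and rejected. `Latin1XmlDemo` and `rootText` are illustrative names:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.SAXException;

class Latin1XmlDemo {
    // Parse raw bytes and return the root element's text. Passing an
    // InputStream (not a Reader) lets the parser apply the declared encoding.
    static String rootText(byte[] xml)
            throws ParserConfigurationException, SAXException, IOException {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml));
        return doc.getDocumentElement().getTextContent();
    }

    public static void main(String[] args) throws Exception {
        String body = "<item>r\u00e4ksm\u00f6rg\u00e5s</item>";

        // Declared and actual encoding agree: the text comes back intact.
        byte[] declared = ("<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?>" + body)
                .getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(rootText(declared).equals("r\u00e4ksm\u00f6rg\u00e5s")); // true

        // Same Latin-1 bytes with no declaration: the parser assumes UTF-8
        // and rejects the non-ASCII bytes as malformed.
        byte[] undeclared = body.getBytes(StandardCharsets.ISO_8859_1);
        try {
            rootText(undeclared);
        } catch (SAXException | IOException e) {
            System.out.println("Rejected: " + e.getMessage());
        }
    }
}
```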

I read the files from the web, but the RSS file does have an XML declaration with the encoding attribute.

If you are quite sure that the encoding attribute is set to ISO-8859-1, then I expect that your RSS file contains non-ISO-8859-1 characters, though I thought all byte values (-128 to 127) were valid ISO-8859-1 characters!
Many years ago I had a problem with an XML file with invalid characters. I wrote a simple filter (using FilterInputStream) that made sure that all the bytes it processed were ASCII. My problem turned out to be characters with value zero, which the Microsoft XML parser failed to process. It put the parser in an infinite loop!
In the filter, as each byte is read you could write out the Hex value. That way you should be able to find the offending character(s).
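The filter idea above can be sketched as a `FilterInputStream` subclass. This version (class and method names are illustrative) reports only bytes outside 7-bit ASCII, with their offsets, rather than every byte; wrapping the stream handed to the parser with it would show where the non-ASCII bytes sit:

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

// Passes bytes through unchanged while recording every byte above 0x7F
// together with its offset in the stream.
class NonAsciiReportingStream extends FilterInputStream {
    private long offset = 0;
    private final StringBuilder report = new StringBuilder();

    NonAsciiReportingStream(InputStream in) {
        super(in);
    }

    private void inspect(int b) {
        if (b > 0x7F) {
            report.append(String.format("offset %d: 0x%02X%n", offset, b));
        }
        offset++;
    }

    @Override
    public int read() throws IOException {
        int b = super.read();
        if (b != -1) {
            inspect(b);
        }
        return b;
    }

    @Override
    public int read(byte[] buf, int off, int len) throws IOException {
        int n = super.read(buf, off, len); // delegates to the wrapped stream
        for (int i = 0; i < n; i++) {
            inspect(buf[off + i] & 0xFF);
        }
        return n;
    }

    String report() {
        return report.toString();
    }

    public static void main(String[] args) throws IOException {
        byte[] data = {'a', (byte) 0xE5, 'b', (byte) 0xF6};
        NonAsciiReportingStream s = new NonAsciiReportingStream(new ByteArrayInputStream(data));
        while (s.read() != -1) {
            // drain the stream; a parser would normally do the reading
        }
        System.out.print(s.report()); // offset 1: 0xE5, then offset 3: 0xF6
    }
}
```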

HTTP-Receiver: Code page conversion error from UTF-8 to ISO-8859-1

Hello experts,
In one of our interfaces we are using the payload manipulation of the HTTP receiver channel to change the payload code page from UTF-8 to ISO-8859-1. And from time to time we are facing the following error:
“Code page conversion error UTF-8 from system code page to code page ISO-8859-1”
I’m quite sure that this error occurs because of non-ISO-8859-1 characters in the processed message. And here comes my question:
Is it possible to change the error behaviour of the code page converter, so that the error will be ignored?
Perhaps the converter could replace the disruptive character with e.g. “#”?
Thank you in advance.
Best regards,
Thomas

Hello.
I'm not 100% sure this will help, but it's good reading material on the subject (:
[How to Work with Character Encodings in Process Integration (NW7.0)|http://www.sdn.sap.com/irj/scn/index?rid=/library/uuid/502991a2-45d9-2910-d99f-8aba5d79fb42]
The part about XSLT / Java mapping might come in handy in your situation:
you can check for problematic characters in the mapping code.
Good luck,
Imanuel Rahamim.
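Whether the HTTP receiver's code page converter itself can be told to substitute characters is not answered in the thread. In a Java mapping, though, a conversion that replaces unmappable characters with "#" instead of failing could look like this sketch using `java.nio.charset.CharsetEncoder` (class and method names are illustrative):

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

class Latin1WithReplacement {
    // Encode text as ISO-8859-1, substituting '#' for every character
    // ISO-8859-1 cannot represent instead of raising an error.
    static byte[] encode(String text) throws CharacterCodingException {
        CharsetEncoder encoder = StandardCharsets.ISO_8859_1.newEncoder()
                .onMalformedInput(CodingErrorAction.REPLACE)
                .onUnmappableCharacter(CodingErrorAction.REPLACE)
                .replaceWith(new byte[] {(byte) '#'});
        ByteBuffer buffer = encoder.encode(CharBuffer.wrap(text));
        byte[] out = new byte[buffer.remaining()];
        buffer.get(out);
        return out;
    }

    public static void main(String[] args) throws CharacterCodingException {
        // e-acute is in ISO-8859-1; the euro sign and en dash are not.
        String input = "caf\u00e9 \u20ac5 \u2013 ok";
        String result = new String(encode(input), StandardCharsets.ISO_8859_1);
        System.out.println(result.equals("caf\u00e9 #5 # ok")); // true
    }
}
```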

Big5 to ISO-8859-1

Hi, I want to convert a Big5 Chinese character to an ISO-8859-1 numeric character reference (&#xxxxx;).
Here is my code:
String record = "��~"; //a big5 string
System.out.println("BIG5: " + record); //Display ok
byte[] b = record.getBytes("ISO-8859-1");
String target = new String(b, "ISO-8859-1");
System.out.println("Target: " + target);
But instead of the &#xxxxx; form I want, I get ???.
Please advise.
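`getBytes("ISO-8859-1")` cannot produce `&#xxxxx;` references: Chinese characters have no ISO-8859-1 byte, so each one becomes `?`. Also, a Java `String` is already Unicode, so there is no "Big5 string" at that point. The references have to be generated explicitly; a minimal sketch (names are illustrative) that escapes every code point above U+00FF as a decimal reference:

```java
class NumericCharRefs {
    // Keep characters that fit in ISO-8859-1 (escaping '&' so the result stays
    // valid HTML) and turn everything else into a decimal reference like &#22909;.
    static String toNcr(String text) {
        StringBuilder sb = new StringBuilder();
        text.codePoints().forEach(cp -> {
            if (cp == '&') {
                sb.append("&amp;");
            } else if (cp <= 0xFF) {
                sb.appendCodePoint(cp);
            } else {
                sb.append("&#").append(cp).append(';');
            }
        });
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(toNcr("\u597d\u4eba")); // &#22909;&#20154;
    }
}
```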

ISO-8859-1 DO have chinese characters
In JSP, when I submit a Chinese value in a form, I receive the Chinese characters in ISO-8859-1 encoding: for 好人 I get "¦n¤H".
And when I put these ISO-8859-1 characters in the value field of a form, the HTML shows the Chinese characters in Big5 correctly.
I just don't know how to convert ISO-8859-1 to Big5 in Java and vice versa.
Please help.

ISO-8859-1 doesn't support Chinese characters.
The reason you received Chinese characters from an HTML form is that the charset of that HTML page was set to Big5. When a user makes a request, all request parameters and values are encoded with the charset of the HTML page.
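If the form data has already been decoded as ISO-8859-1, the original bytes can still be recovered, because ISO-8859-1 maps every byte value to exactly one character; re-decoding those bytes as Big5 gives the Chinese text back. A minimal sketch (names are illustrative). Calling `request.setCharacterEncoding("Big5")` before reading any parameter is the cleaner Servlet-level alternative, assuming the container applies it to the request data in question:

```java
import java.io.UnsupportedEncodingException;
import java.nio.charset.StandardCharsets;

class Big5Repair {
    // Undo a wrong ISO-8859-1 decode: getBytes(ISO_8859_1) returns the
    // original bytes exactly, which are then decoded with the real charset.
    static String reinterpret(String misdecoded, String realCharset)
            throws UnsupportedEncodingException {
        byte[] raw = misdecoded.getBytes(StandardCharsets.ISO_8859_1);
        return new String(raw, realCharset);
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        String original = "\u597d\u4eba";
        // Simulate a container decoding Big5 form bytes as ISO-8859-1.
        String misdecoded = new String(original.getBytes("Big5"), StandardCharsets.ISO_8859_1);
        System.out.println(misdecoded.length());                              // 4: one char per byte
        System.out.println(reinterpret(misdecoded, "Big5").equals(original)); // true
    }
}
```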

How to get ISO-8859 characters from DOM ?

Hi,
I have problems getting the ISO-8859 characters from the DOM. I parse an XML file and read the values from the DOM. All umlauts are scrambled.
The details:
The first line of the XML file is:
<?xml version="1.0" encoding="ISO-8859-1"?>
It contains lines like:
<tag>Äpfel</tag>
I perform the following steps:
I'm parsing the XML file ...
xmlparser.parse( parser, full_fname_xml );
... get the document ...
doc := xmlparser.getDocument( parser );
... and read the value from the TEXT_NODE:
nodevalue := xmlparser.getNodeValue( node );
And here is the problem: nodevalue does
not contain 'Äpfel' but '?pfel'.
It's not a database problem: explicit INSERTs with umlauts work fine.
This problem occurs with the XML Parser for PLSQL 1.0.0.1.0 on NT.
[Pls do not recommend to use the newer version. I did a test with the newer version and know it works there.]
My questions relates only to this older version:
Will it never work with this version
or what am I doing wrong ?
Any hints welcome !
Tnx
Franz

Tnx for the quick answer.
Sorry, indeed I cannot use the newer
version at the moment.
So I urgently need a solution or any
workaround for this version.
F.

IE 9 incorrectly encoding Unicode characters in URIs to ISO-8859-1 instead of UTF8

Lets take the example word
präsentation
In Firefox, if I specify that as a CGI parameter, on the receiving end I receive:
pr\303\244sentation
which, decoded as UTF-8, gives me pr{U+00E4}sentation, i.e. my submitted word präsentation.
What does IE give me? Let's see:
pr\344sentation
which doesn't decode as UTF-8, because octal 344 is 0xE4, the ISO-8859-1 byte for ä.
ä is Unicode code point U+00E4, which, as we've seen above, is encoded in UTF-8 as
0xC3 0xA4
So the question boils down to this:
Why does IE9 use ISO-8859-1 instead of UTF8 for non-ASCII characters in URIs?
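The escapes above are octal byte values. A minimal sketch (names are illustrative) that reproduces both outputs by encoding the word with each charset and printing non-ASCII bytes in octal:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class CgiOctalDump {
    // Encode text with the given charset and render bytes >= 0x80 as
    // backslash-octal escapes, leaving ASCII bytes as they are.
    static String octalEscape(String text, Charset charset) {
        StringBuilder sb = new StringBuilder();
        for (byte b : text.getBytes(charset)) {
            int v = b & 0xFF;
            if (v < 0x80) {
                sb.append((char) v);
            } else {
                sb.append('\\').append(Integer.toOctalString(v));
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String word = "pr\u00e4sentation";
        System.out.println(octalEscape(word, StandardCharsets.UTF_8));      // pr\303\244sentation
        System.out.println(octalEscape(word, StandardCharsets.ISO_8859_1)); // pr\344sentation
    }
}
```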

Hi,
As I understand it, you can choose the encoding yourself by changing the Internet Explorer 9 language encoding settings.
Alex Zhao

How to display " ISO-8859-1 " (eg - " &eacute " ) characters in LabelField .

Hi,
How to display " ISO-8859-1 " (eg - " &eacute " ) characters in LabelField .
Regards
Ratheesh R Kurup
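The thread has no answer for this. Outside any BlackBerry-specific API, é can be written in a Java string literal as \u00e9; if the text instead arrives containing HTML entities such as &eacute;, it presumably needs decoding first, since a plain label is not expected to interpret HTML. A minimal sketch with a deliberately tiny, hypothetical entity table (`EntityDecoder` is an illustrative name, not a library class):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class EntityDecoder {
    // Illustrative subset only; a real decoder needs the full HTML entity list.
    private static final Map<String, String> NAMED = new HashMap<>();
    static {
        NAMED.put("eacute", "\u00e9");
        NAMED.put("amp", "&");
        NAMED.put("lt", "<");
        NAMED.put("gt", ">");
        NAMED.put("quot", "\"");
    }
    private static final Pattern ENTITY = Pattern.compile("&(#[0-9]{1,7}|[A-Za-z]+);");

    // Replace &name; and &#NNN; with the characters they stand for;
    // unknown or invalid entities are left untouched.
    static String decode(String text) {
        Matcher m = ENTITY.matcher(text);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String token = m.group(1);
            String replacement = m.group();
            if (token.startsWith("#")) {
                int cp = Integer.parseInt(token.substring(1));
                if (Character.isValidCodePoint(cp)) {
                    replacement = new String(Character.toChars(cp));
                }
            } else if (NAMED.containsKey(token)) {
                replacement = NAMED.get(token);
            }
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

The decoded `String` can then be passed to whatever label component is in use.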

Hi,
Perhaps
with a as (
  select 'TESTING' text from dual
)
select level, substr(a.text, level, 1)
from a
connect by level <= length(a.text)

WIN-1252 characters in ISO-8859-1 textfields

My question concerns the following situation:
The APEX DAD is configured as WE8ISO8859P1, and the database has the same character set.
When submitting a text field in an APEX page, a user can submit Windows-1252 characters (chr 128-159), although the page encoding is ISO-8859-1.
These characters are even displayed on the apex-page(in IE, FF on windows) as the represented win1252 character.
However when these characters are inserted in the database, I have non-display characters.
My question is, how can I prevent these characters from being entered in the database ?
I've tried these
* I can assume that all input arrives in Win1252 and convert all data in the database. (seems not the optimal solution)
* I can configure the dad to be UTF8, so there will be a conversion from UTF-8-> ISO-8859-1.
However some characters aren't translated properly.
Partly this is acceptable: chr 128 (euro) isn't available in ISO-8859-1, but chr 146 (right single quotation mark) is 'translated' to ¿ (inverted question mark).
Perhaps someone had the same experience and can give me some more info or tips.
Thanks in advance,
Art
Edited by: amels on Sep 19, 2008 1:11 PM
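The "non-display characters" are most likely C1 control characters: bytes 0x80-0x9F are printable in Windows-1252 (0x80 is €, 0x92 is ’) but are the control codes U+0080 to U+009F in ISO-8859-1. A minimal sketch (names are illustrative) of that difference and of a range check that could reject such input before it is stored; in APEX the same check would have to be written in PL/SQL:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class Cp1252VsLatin1 {
    // True if the text contains C1 control characters (U+0080..U+009F), which is
    // what Windows-1252 bytes 0x80-0x9F become when decoded as ISO-8859-1.
    static boolean containsC1Controls(String text) {
        return text.chars().anyMatch(c -> c >= 0x80 && c <= 0x9F);
    }

    public static void main(String[] args) {
        byte[] bytes = {(byte) 0x80, (byte) 0x92}; // euro sign, right single quote in cp1252
        String asCp1252 = new String(bytes, Charset.forName("windows-1252"));
        String asLatin1 = new String(bytes, StandardCharsets.ISO_8859_1);
        System.out.println(asCp1252.equals("\u20ac\u2019")); // true
        System.out.println(containsC1Controls(asLatin1));     // true
        System.out.println(containsC1Controls(asCp1252));     // false
    }
}
```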

Hello Art,
>> Do you know why this is / should be ?
This setting became mandatory in version 2.0, when the AJAX framework was introduced. Since then, the APEX environment itself has used this technology heavily. Although the XMLHttpRequest object can support different character sets, its default is UTF-8. Using AL32UTF8 in the DAD simplifies the “behind the scenes” AJAX support. This configuration also simplifies import/export of APEX applications and data, minimizing some client-server character set conversions, and there are probably other reasons I don't know of that led to making this configuration mandatory.
The important thing to remember is that it is a mandatory setting, and ignoring it will definitely cause you problems in functionality, both in the development environment and in the run-time environment.
Regards,
Arie.

Encoding Characters ISO-8859-2

You say:
You need to set your $ORACLE_HOME environment variable to point to the database of interest.
I mean:
We work under WinNT (SP4) with MS DevStudio 6.0. What can I do to encode the characters?
I put the line <?xml version="1.0" encoding="ISO-8859-2"?> in the XML header, but the Oracle parser reports LPX-00201.
Help, I need somebody's help! Thanks, Matthias

I've found a solution. The problem was that when I wrote Polish characters into an Excel file, they came out as garbage, because I didn't set the cell encoding. After fixing the problem, my code looks like this:
HSSFCell cell = row.createCell((short)0);
// And below is the line I was looking for
cell.setEncoding(HSSFCell.ENCODING_UTF_16);
cell.setCellValue("Some text with polish characters");
Now it works great. Thanks anyway.

How to store UTF-8 characters in an iso-8859-1 encoded oracle database?

How can we store UTF-8 characters in an iso-8859-1 encoded oracle database? We can NOT change the database encoding but need to store e.g. Polish or Russian characters besides other European languages.
Is there any stable solution with good performance?
We use Oracle 8.1.6 with iso-8859-1 encoding, Bea WebLogic 7.0, JDK 1.3.1 and the following thin driver: "Oracle JDBC Driver version - 9.0.2.0.0".

There are a couple of unsupported options, but I wouldn't consider using them on a production database running other critical applications. I would also strongly discourage their use unless you understand in detail how Oracle National Language Support (NLS) works, otherwise you could end up with corrupt data or worse.
In a sense, you've been asked to do the impossible. The existing database character set does not support encoding the data you've been asked to store.
Can you create a new database with an appropriate database character set and deploy your application there? That's probably the easiest solution.
If that isn't an option, and you really need to store data in this database, you could use one of the binary data types (RAW and BLOB), but that would make it exceptionally difficult for applications other than yours to extract the data. You would have to ensure that the data was always encoded in the same character set, otherwise you wouldn't be able to decode it properly later. This would also add a lot of complexity to your application, since you couldn't send or receive string data from the database.
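The binary-column approach only works if one fixed encoding is used in both directions. A minimal sketch of that discipline (names are illustrative; UTF-8 is an assumed choice); the resulting bytes would be bound with JDBC's `PreparedStatement.setBytes` and read back with `ResultSet.getBytes`:

```java
import java.nio.charset.StandardCharsets;

class FixedEncodingBytes {
    // Always encode with the same charset before writing to a RAW/BLOB column...
    static byte[] toStoredBytes(String text) {
        return text.getBytes(StandardCharsets.UTF_8);
    }

    // ...and decode with that same charset when reading it back.
    static String fromStoredBytes(byte[] stored) {
        return new String(stored, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Polish "Zazolc" with diacritics plus Russian "Privet" in Cyrillic
        String mixed = "Za\u017c\u00f3\u0142\u0107 \u041f\u0440\u0438\u0432\u0435\u0442";
        byte[] stored = toStoredBytes(mixed);
        System.out.println(fromStoredBytes(stored).equals(mixed)); // true
    }
}
```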
Unfortunately, I suspect you will have to choose from a list of bad options.
Justin
Distributed Database Consulting, Inc.
http://www.ddbcinc.com/askDDBC

UTF-8 encoding vs ISO 8859-1 encoding

The iTunes tech specs call for UTF-8 encoding of the XML feed file; a friend of mine uses feed generator software through his blog that uses ISO 8859 encoding. Is there a way to convert the latter to UTF-8 so that iTunes tags may be successfully added?
When I tried editing his XML file, I got error messages when I submitted the file to RSS feed validator sites (such as http://feedvalidator.org/). Any help or knowledge is appreciated because I am not the least bit expert in this coding arena.

You don't need to convert ISO-8859-1 to UTF-8 unless you have non-ASCII characters. US-ASCII is a subset of both ISO-8859-1 and UTF-8, and for English text it will serve you just fine. You can have iTunes tags in the XML file even if the file itself is encoded in ISO-8859-1.
The error you see at feedvalidator.org is most likely a warning.
Hope this helps!
- Andy Kim
Potion Factory
http://www.potionfactory.com
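If the feed does contain non-ASCII characters, converting it means decoding the ISO-8859-1 bytes, re-encoding them as UTF-8, and updating the encoding declaration to match. A naive sketch (names are illustrative; the regex rewrites only the first `encoding="ISO-8859-1"` it finds and assumes that is the XML declaration):

```java
import java.nio.charset.StandardCharsets;

class FeedToUtf8 {
    // Decode ISO-8859-1 bytes, rewrite the declared encoding, re-encode as UTF-8.
    static byte[] latin1FeedToUtf8(byte[] latin1Bytes) {
        String xml = new String(latin1Bytes, StandardCharsets.ISO_8859_1);
        xml = xml.replaceFirst("(?i)encoding\\s*=\\s*([\"'])ISO-8859-1\\1", "encoding=\"UTF-8\"");
        return xml.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] feed = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?><title>Caf\u00e9</title>"
                .getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(new String(latin1FeedToUtf8(feed), StandardCharsets.UTF_8));
    }
}
```

For a purely ASCII feed the bytes are identical in both encodings, so only the declaration changes.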

Mail Receiver - Send file in ISO-8859-1 encoding

Hi,
I'm sending mail with an attachment using the mail adapter, but instead of the specified ISO-8859-1, the file is converted to UTF-8 without BOM. Because of that, some characters (ñ, ç, etc.) are not transferred properly.
Settings:
Message protocol: XIPAYLOAD
No mail package.
Transform.ContentType: multipart/mixed; boundary=--AaZz; charset=ISO-8859-1
Payload:
multipart/mixed; boundary=AaZz; charset=ISO-8859-1</Content_Type><Content>--AaZz
Content-Type: text/plain; charset=ISO-8859-1
Content-Disposition: inline
File attachment
AaZz
Content-Type: text/plain; charset= ISO-8859-1
Content-Disposition: attachment; filename=TestFile
iso-8859 characters ñ ç ñ ñ
AaZz--
</Content></ns:Mail>
I need advice on how to force the file to be created with ISO-8859-1 encoding.
Thanks in advance.
Regards,
Iván.

Hi Jean-Philippe,
Yes, please check my first post, if you use same settings, and create message as mine, it should work, the TestFile is created as an attachment.
Include this line in the module configuration with transform key:
Transform.ContentType: multipart/mixed; boundary=--AaZz;
If you still have issues, please give me a description of the error.
Regards,
Ivan.

HTTP adapter - change encoding from UTF-8 to ISO-8859-1

Hi,
I am trying to change the encoding used by the HTTP sender adapter in a scenario.
However, when I enter ISO-8859-1 in the XML Code field under XI Payload Manipulation on the communication channel, it has no effect: the payload still shows as UTF-8 in SXI_MONITOR.
Am I missing a step or entering the field incorrectly?
Thanks
Colin.

Hi,
From help
Enhancing the Payload
Some external systems, for example, Web servers in marketplaces, can only process data if it is sent as an HTML form using HTTP.
A typical HTML form comprises named fields. When transferring a completed form to the server or a CGI program, the data must be transferred in such a way that the CGI script can recognize the fields that make up the form, and which data was entered in which field.
The plain HTTP adapter constructs this format using a prolog and an epilog. Therefore, there is a particular code method that separates form fields and their data from each other. This code method uses the following rules:
* Individual form elements, including their data, are separated from each other by the character &.
* The name and data of a form element are separated from each other by an equals sign (=).
* Blanks in the entered data (for example, in multiple words) are replaced by a plus sign (+).
* All characters with the (enhanced) ASCII values 128 to 255 (hexadecimal 80 to FF) are transcribed using a hexadecimal sequence, beginning with a percentage sign (%) followed by the hexadecimal value of the character (for example, the German umlaut ö in the character set ISO-8859-1 is transcribed as %F6).
* All characters that occur in these rules as control characters (&, +, =, and %) are also transcribed hexadecimally in the same way as high-value ASCII characters.
http://help.sap.com/saphelp_nw2004s/helpdata/en/44/79973cc73af456e10000000a114084/content.htm
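The rules above describe the `application/x-www-form-urlencoded` format, and the charset decides which bytes follow each `%`: ö becomes %F6 in ISO-8859-1 but %C3%B6 in UTF-8. A minimal sketch with `java.net.URLEncoder` (names are illustrative); note that `URLEncoder` escapes more characters than the five rules list, for example `/` and `:`:

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

class FormEncoder {
    // Build a form body: name=value pairs joined by '&', spaces as '+',
    // other special and non-ASCII bytes as %XX in the given charset.
    static String encodeForm(Map<String, String> fields, String charset)
            throws UnsupportedEncodingException {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : fields.entrySet()) {
            if (sb.length() > 0) {
                sb.append('&');
            }
            sb.append(URLEncoder.encode(e.getKey(), charset))
              .append('=')
              .append(URLEncoder.encode(e.getValue(), charset));
        }
        return sb.toString();
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("name", "J\u00f6rg M\u00fcller");
        fields.put("q", "a&b=c+d%");
        System.out.println(encodeForm(fields, "ISO-8859-1"));
        // name=J%F6rg+M%FCller&q=a%26b%3Dc%2Bd%25
    }
}
```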
Regards
Chilla
