Validate whether XML document content is UTF-8 encoded using Java

Hi all,
I hope everyone is doing well.
I am new to XML and need some help from the forum.
I want to check whether the content of an XML document is UTF-8 encoded or not, and I have to do it in Java.
Could you provide some sample code?
Any kind of help would be appreciated. It's very urgent!
Thanks in advance,
Sneha

Hi,
Possibly the data you are receiving is not UTF-8. Have you checked with the data provider?
Try another encoding, such as ISO-8859-1, for example:
oracle.soa.common.util.Base64Decoder.decode(zipname.getBytes("ISO-8859-1"));
Cheers,
Vlad
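For the original question (checking whether bytes are well-formed UTF-8), here is a minimal sketch using only the JDK. The class and method names (Utf8Check, isValidUtf8) are made up for illustration. It relies on a CharsetDecoder configured to report errors instead of silently replacing bad bytes. Note that passing this check only means the bytes are decodable as UTF-8, not that the producer intended UTF-8; plain ASCII passes too.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of any library.
class Utf8Check {

    /** Returns true if the bytes form a well-formed UTF-8 sequence. */
    static boolean isValidUtf8(byte[] bytes) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            // Throws on the first malformed or truncated sequence.
            decoder.decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String xml = "<name>G\u00f6bly\u00f6s</name>";
        // Same text, encoded two different ways.
        System.out.println(isValidUtf8(xml.getBytes(StandardCharsets.UTF_8)));
        System.out.println(isValidUtf8(xml.getBytes(StandardCharsets.ISO_8859_1)));
    }
}
```

In practice you would read the file into a byte array first (for example with java.nio.file.Files.readAllBytes) and pass that in.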

Similar Messages

How does one know whether or not they use "Java applets"?

The support doc for the recent Java update (Update 8, for Snow Leopard), entitled "About Java for Mac OS X 10.6 Update 8," advises the following:
If you do not use Java applets, it is recommended that you disable the Java web plug-in in your web browser.
How does one know whether or not they use "Java applets"?
Thanks.
URL: http://support.apple.com/kb/HT5243

K.S. wrote:
dymar wrote:
Also, how would I know that a missing applet was causing some feature(s) not to work in a situation where no error message was returned?
Sometimes you have to dig to find out: http://earthnow.usgs.gov/earthnow_app.html
doesn't tell you directly, but it is mentioned in the FAQ that Java is required. If the content is appropriate, you can always ask here.
Thanks. According to that webpage, my "Java is out of date." When an error message like that is returned, I guess it's clear that "it's a Java problem." Presumably, one would then just go to java.com and download the applet if he/she wanted to view the webpage.
I was wondering more about situations where unexplained problems involving missing Java applets aren't noted in error messages.
But maybe I'm worrying about something that doesn't really need to be worried about.

How to modify the content of a xml element using java?

Hi all,
In my use case I need to export some data from the database to an external file (MS Word) using Java.
The data which I get from the database will be in the form of an XML file. In that XML file
I have to modify the content of an attribute of an XML element in the Java class. Kindly help me
to achieve this.
Thanks,
Phanindra.
Edited by: 887737 on Dec 5, 2011 5:52 AM

Why don't you try Xerces2?
Why don't you tell him to use the javax.xml APIs that are already built in? And that use Xerces2 under the hood? Instead of throwing out a suggestion that might lead to him adding another copy of Xerces into his application?
@OP there are several techniques:
- string replacement as suggested by jschell
- XML parsing to a DOM and then use the DOM API
- XSLT
Which you should use depends on the complexity of your requirement.
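A rough sketch of the second technique (parse to a DOM, change the attribute, serialize again) with the built-in javax.xml APIs. The class name AttributeEdit, the method name and the sample element and attribute names are invented for the example; adapt them to the real document.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Hypothetical helper for illustration only.
class AttributeEdit {

    /** Sets attr=value on the first element named tag and returns the new XML. */
    static String setAttribute(String xml, String tag, String attr, String value)
            throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        Element el = (Element) doc.getElementsByTagName(tag).item(0);
        if (el == null) {
            throw new IllegalArgumentException("No element named " + tag);
        }
        el.setAttribute(attr, value);

        // Serialize the modified DOM back to text with an identity transform.
        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        t.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<report><section title=\"draft\">text</section></report>";
        System.out.println(setAttribute(xml, "section", "title", "final"));
    }
}
```

For a one-off change in a small file this is probably enough; XSLT starts to pay off when many elements have to be rewritten by rule.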

BinaryXML/securefile encoding using Java

Hey All,
I am trying to save to a BinaryXML column stored as SecureFile using the following code fragment, but I get this exception at the last line: java.sql.SQLException: ORA-31011: XML parsing failed. This is for an ALLOW NON SCHEMA column (I need the flexibility) over the Thin JDBC driver.
BinXMLProcessor proc = BinXMLProcessorFactory.createProcessor();
               BLOB blob = BLOB.createTemporary(conn, true, BLOB.DURATION_SESSION);
               blob.open(BLOB.MODE_READWRITE);
               OutputStream blobOutpStream = blob.setBinaryStream(0L);
               BinXMLStream outbin = proc.createBinXMLStream(blob);
               BinXMLEncoder enc = outbin.getEncoder();
               enc.setProperty(BinXMLEncoder.ENC_SCHEMA_AWARE, false);
               enc.setProperty(BinXMLEncoder.ENC_INLINE_TOKEN_DEFS, true);
               ContentHandler hdlr = enc.getContentHandler();
               XMLReader parser = new SAXParser();
               ((SAXParser)parser).setValidationMode(SAXParser.NONVALIDATING);
               ((SAXParser)parser).setContentHandler(hdlr);
               parser.parse(new InputSource(new FileReader(inputDoc)));
               blobOutpStream.close();
               blob.close();
               String query = "INSERT INTO testxml(data) VALUES (?)";
               OraclePreparedStatement sqlStatement = null;
               sqlStatement = (OraclePreparedStatement) conn.prepareStatement(query);
               XMLType xmlData = new XMLType(conn, blob, CharacterSet.AL32UTF8_CHARSET);
               sqlStatement.setObject(1,xmlData);//blob);
               sqlStatement.execute();
When I debug through this code, the blob does not seem to be updated after the statement at parser.parse(...). But if I specify a disk file as the argument to createBinXMLStream, that file is updated, but I am not able to read back XML from that file.
I am new to XML programming, and suspect I am not doing some part right. Any idea what is the issue here?
Thanks
Adi

Closing

How to validate UTF-8 characters using Regex?

Hi All,
In one of my applications, I need to allow the UTF-8 character set when validating a certain string, which I am validating using a regex.
However, I do not know how to include UTF-8 characters in a regex, or whether we can specify UTF-8 characters in a regex at all.
Please help! It's urgent!
Thanks in Advance,
Rajat Aggarwal

Ok, Let me re-state my problem again, and exactly what i am looking for:
I have an XML file with the following header: <?xml version="1.0" encoding="UTF-8"?>
This XML file contains a tag whose text is to be validated against the syntax: Operand operator Operand.
Now, the operand on the right-hand side of the operator could be a variable, or a string literal, which may contain some permissible special characters (as said above), and may or may not contain UTF-8 characters as well.
I am using the Xerces SAXParser to parse the XML document, and am retrieving the text of the element tag with the method element.getChildText("<tagName>").
According to the org.jdom.Element API docs, the getChildText() method is defined as follows:
public java.lang.String getChildText(java.lang.String name)
Returns the textual content of the named child element, or null if there's no such child. This method is a convenience because calling getChild().getText() can throw a NullPointerException.
Parameters: name - the name of the child
Returns: text content for the named child, or null if no such child
Now, I am not sure if the String that I am reading is in UTF-8 Format. Is there any special way of reading a string in that format, or for that matter, convert a string to UTF-8 encoding?
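One way to look at this: once the parser has handed you a java.lang.String, it is already decoded (Java strings are UTF-16 internally), so "UTF-8" only matters when converting to or from bytes. A regex can then accept non-ASCII letters through Unicode classes such as \p{L}. The sketch below assumes a small operator set and operand syntax, since the post does not spell them out; the class name OperandSyntax is made up.

```java
import java.util.regex.Pattern;

// Hypothetical validator; the operator list and operand rules are assumptions.
class OperandSyntax {

    // operand := Unicode letters/digits/underscore, or a double-quoted literal (right side only)
    private static final Pattern EXPR = Pattern.compile(
            "^\\s*([\\p{L}\\p{N}_]+)\\s*(==|!=|<=|>=|<|>)\\s*"
            + "([\\p{L}\\p{N}_]+|\"[^\"]*\")\\s*$");

    /** Returns true if text looks like: Operand operator Operand. */
    static boolean isValid(String text) {
        return text != null && EXPR.matcher(text).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValid("name == \"G\u00f6bly\u00f6s T\u00fcnde\""));
        System.out.println(isValid("gr\u00f6\u00dfe >= limit"));
        System.out.println(isValid("name =="));
    }
}
```

If you do need the UTF-8 bytes of the string afterwards, text.getBytes(StandardCharsets.UTF_8) produces them.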

Can't use UTF-16 encoding with XML Parser for Java v2.

This is my XML Document:
<?xml version="1.0" encoding="UTF-16" ?>
<Content>
<Title>Documento de Prueba de gestión de contenidos.</Title>
<Creator>Roberto Pérez Lita</Creator>
</Content>
This is the way in which I parse the document:
DOMParser parser=new DOMParser();
parser.setPreserveWhitespace(true);
parser.setErrorStream(System.err);
parser.setValidationMode(false);
parser.showWarnings(true);
parser.parse(
new FileInputStream(new File("PruebaA3Ingles.xml")));
I've got this error:
XML-0231 : (Error) Encoding 'UTF-16' is not currently supported.
I am using the XML Parser for Java v2_0_2_5 and I am a little
confused because the documentation says that the UTF-16 encoding
is supported in this version of the Parser.
Does anybody know how can I parse documents containing spanish
accents?
Thanks in advance.
Roberto Pérez.

Oracle just uploaded a new release of V2 Parser. It should
support UTF-16.
Yet, other utilities still have some problems with UTF-16
encoding. Seems we just
have to wait this one out.
BTW, I'm trying to use Japanese. We, also, have some problems
with JServer.
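For comparison, here is a small sketch that parses a UTF-16 document with the JAXP parser bundled in current JDKs rather than the Oracle XML Parser v2 discussed above, so it says nothing about that parser's behaviour. The class name Utf16Parse is invented. The point it illustrates is to hand the parser raw bytes (an InputStream) so it can detect the encoding from the byte-order mark and the XML declaration itself.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

// Hypothetical example class; uses the JDK's built-in JAXP parser.
class Utf16Parse {

    /** Parses XML bytes and returns the text of the first Title element. */
    static String titleOf(byte[] xmlBytes) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xmlBytes));
        return doc.getElementsByTagName("Title").item(0).getTextContent();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-16\"?>"
                + "<Content><Title>gesti\u00f3n de contenidos</Title></Content>";
        // Java's "UTF-16" charset writes a byte-order mark followed by big-endian data.
        byte[] bytes = xml.getBytes(StandardCharsets.UTF_16);
        System.out.println(titleOf(bytes));
    }
}
```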

How to get UTF-8 encoding when creating XML using DBMS_XMLGEN and UTL_FILE?

Hi,
I generate XML files using DBMS_XMLGEN, with output via UTL_FILE,
but it seems the XML data file I end up with is not really UTF-8 encoded
(for example, it does not verify correctly in XMLSpy).
my dbms is
NLS_CHARACTERSET          = WE8MSWIN1252
NLS_NCHAR_CHARACTERSET     = AL16UTF16
NLS_RDBMS_VERSION     = 10.2.0.1.0
I generate it in this manner:
declare
xmldoc CLOB;
ctx number ;
bHandle utl_file.file_type;
begin
-- generate fom xml-view :
ctx := DBMS_XMLGEN.newContext('select xml from xml_View');
DBMS_XMLGEN.setRowSetTag(ctx, null);
DBMS_XMLGEN.setRowTag(ctx, null );
DBMS_XMLGEN.SETCONVERTSPECIALCHARS(ctx,TRUE);
-- create xml-file:
xmldoc := DBMS_XMLGEN.getXML(ctx);
-- put data to host-file:
vblob_len := DBMS_LOB.getlength(xmldoc);
DBMS_LOB.READ (xmldoc, vblob_len, 1, vBuffer);
bHandle := utl_file.fopen(vPATH,vFileName,'W',32767);
UTL_FILE.put_line(bHandle, vbuffer, FALSE);
UTL_FILE.fclose(bHandle);
end ;
Maybe UTL_FILE changes the encoding while writing?
How can this be solved?
Thank you
Norbert
Edited by: astramare on Feb 11, 2009 12:39 PM with database charsets

Marco,
I tried working with dbms_xslprocessor.clob2file,
and that works well,
but what about the UTF-8 encoding in this case?
In my understanding, the XMLType created should be UTF8 (16),
but when I open the XML file in XMLSpy as UTF-8,
it is not right (German characters like Ä, Ö ...):
my dbms is
NLS_CHARACTERSET = WE8MSWIN1252
NLS_NCHAR_CHARACTERSET = AL16UTF16
NLS_RDBMS_VERSION = 10.2.0.1.0
-- test:
create table nh_test ( s0 number, s1 varchar2(20) ) ;
insert into nh_test (select 1,'hallo' from dual );
insert into nh_test (select 2,'straße' from dual );
insert into nh_test (select 3,'mäckie' from dual );
insert into nh_test (select 4,'euro_€' from dual );
commit;
select * from nh_test ;
S0     S1
1     hallo
1     hallo
2     straße
3     mäckie
4     euro_€
declare
rc sys_refcursor;
begin
open rc FOR SELECT * FROM ( SELECT s0,s1 from nh_test );
dbms_xslprocessor.clob2file( xmltype( rc ).getclobval( ) , 'XML_EXPORT_DIR','my_xml_file.xml');
end;
(It's the same when using output with DBMS_XMLDOM.WRITETOFILE.)
Opened in XMLSpy it is:
<?xml version="1.0"?>
<ROWSET>
<ROW>
<S0>1</S0>
<S1>hallo</S1>
</ROW>
<ROW>
<S0>2</S0>
<S1>straޥ</S1>
</ROW>
<ROW>
<S0>3</S0>
<S1>m㢫ie</S1>
</ROW>
<ROW>
<S0>4</S0>
<S1>euro_</S1>
</ROW>
</ROWSET>
regards
Norbert

Not able to handle Special Character in Bpel using UTF-8 Encoded XML

I am using UTF-8 encoded XML for my application, but during conversion of XML from one source to another using a simple BIOS Java program, I am not able to convert special characters like (Göblyös Tünde, Makaróni etc.). They all get converted to (G�bly�s T�nde, Makar�ni etc.). As a result, data corruption occurred. Please let me know if anyone has come across this issue.

Hi,
Possibly the data you are receiving is not UTF-8. Have you checked with the data provider?
Try another encoding, such as ISO-8859-1, for example:
oracle.soa.common.util.Base64Decoder.decode(zipname.getBytes("ISO-8859-1"));
Cheers,
Vlad

Exception while unmarshalling UTF-8 encoded XML String, using JAXB.

Hi folks. First of all, thank you for contributing to my queries so far.
Problem statement:
- This happens when I try to unmarshal a web service response, which is nothing but a simple UTF-8 encoded XML string in a SOAP envelope.
- 0xae is the registered sign character: ®.
- My next step was to ensure that my code works without this character, so I removed all occurrences from my XML. It worked just fine.
- So what do you suggest to get rid of this problem?
- Any suggestion will be treated as a valuable resource.
- Is there some kind of encoding setting in JAXB?
An invalid XML character (Unicode: 0xae) was found in the element content of the document.
org.xml.sax.SAXParseException: An invalid XML character (Unicode: 0xae) was found in the element content of the document.
     at weblogic.apache.xerces.framework.XMLParser.reportError(XMLParser.java:1273)
     at weblogic.apache.xerces.framework.XMLDocumentScanner.reportFatalXMLError(XMLDocumentScanner.java:603)
     at weblogic.apache.xerces.framework.XMLDocumentScanner$ContentDispatcher.dispatch(XMLDocumentScanner.java:1319)
     at weblogic.apache.xerces.framework.XMLDocumentScanner.parseSome(XMLDocumentScanner.java:396)
     at weblogic.apache.xerces.framework.XMLParser.parse(XMLParser.java:1119)
     at weblogic.xml.jaxp.WebLogicXMLReader.parse(WebLogicXMLReader.java:135)
     at weblogic.xml.jaxp.RegistryXMLReader.parse(RegistryXMLReader.java:133)
     at com.sun.xml.bind.unmarshaller.UnmarshallerImpl.unmarshal(UnmarshallerImpl.java:139)
     at javax.xml.bind.helpers.AbstractUnmarshallerImpl.unmarshal(AbstractUnmarshallerImpl.java:129)
     at javax.xml.bind.helpers.AbstractUnmarshallerImpl.unmarshal(AbstractUnmarshallerImpl.java:166)
     at com.hp.wwopsit.econfigure.helper.JaxbUtils.xmlStringToJaxbObject(JaxbUtils.java:66)
     at com.hp.wwopsit.econfigure.core.transformation.IPCAdapterMapper.x2oLoadConfig(IPCAdapterMapper.java:376)
     at com.hp.wwopsit.econfigure.core.adapter.IPCAdapter.loadConfiguration(IPCAdapter.java:144)
     at com.hp.wwopsit.econfigure.core.adapter.IPCAdapter.main(IPCAdapter.java:291)
--------------- linked to ------------------
javax.xml.bind.UnmarshalException
- with linked exception:
[org.xml.sax.SAXParseException: An invalid XML character (Unicode: 0xae) was found in the element content of the document.]
     at javax.xml.bind.helpers.AbstractUnmarshallerImpl.createUnmarshalException(AbstractUnmarshallerImpl.java:284)
     at com.sun.xml.bind.unmarshaller.UnmarshallerImpl.unmarshal(UnmarshallerImpl.java:143)
     at javax.xml.bind.helpers.AbstractUnmarshallerImpl.unmarshal(AbstractUnmarshallerImpl.java:129)
     at javax.xml.bind.helpers.AbstractUnmarshallerImpl.unmarshal(AbstractUnmarshallerImpl.java:166)
     at com.hp.wwopsit.econfigure.helper.JaxbUtils.xmlStringToJaxbObject(JaxbUtils.java:66)
     at com.hp.wwopsit.econfigure.core.transformation.IPCAdapterMapper.x2oLoadConfig(IPCAdapterMapper.java:376)
     at com.hp.wwopsit.econfigure.core.adapter.IPCAdapter.loadConfiguration(IPCAdapter.java:144)
     at com.hp.wwopsit.econfigure.core.adapter.IPCAdapter.main(IPCAdapter.java:291)
***Jaxb Exception while converting xml file to object. Possible cause, Invalid schema or unrecognized elements in input XML. Actuall exception message:javax.xml.bind.UnmarshalException
- with linked exception:
[org.xml.sax.SAXParseException: An invalid XML character (Unicode: 0xae) was found in the element content of the document.]
End..
Output completed (44 sec consumed) - Normal Termination

This is what the XML looks like:
<?xml version="1.0" encoding="UTF-8" ?>
<configresponse xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<csticShortText>processor</csticShortText>
<csticValues>
<csticValue id="024" selected="false">
<desc>Pentium� 4 1.7GHz/400MHz</desc>
</configresponse>
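A possible explanation, offered as a guess: in real UTF-8 the ® sign is the two bytes C2 AE, so a lone 0xAE byte suggests the payload is actually ISO-8859-1 (or windows-1252) while its declaration claims UTF-8. The proper fix is at the producer. As a stopgap, the bytes can be decoded with the charset they really use and the parser given a character stream, in which case the JDK parser works from the already-decoded characters instead of decoding the bytes as UTF-8. The sketch uses plain JAXP rather than JAXB, and the class name Latin1Rescue is invented; a JAXB Unmarshaller can also read from a Reader.

```java
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical workaround; assumes the payload is really ISO-8859-1.
class Latin1Rescue {

    /** Decodes raw bytes as ISO-8859-1 and returns the text of the first desc element. */
    static String descOf(byte[] raw) throws Exception {
        // Decode explicitly instead of trusting the declared encoding.
        String text = new String(raw, StandardCharsets.ISO_8859_1);
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(text)));
        return doc.getElementsByTagName("desc").item(0).getTextContent();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
                + "<desc>Pentium\u00ae 4 1.7GHz/400MHz</desc>";
        // Simulate the mislabeled payload: Latin-1 bytes under a UTF-8 declaration.
        byte[] raw = xml.getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(descOf(raw));
    }
}
```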

UTF-8 encoding Funny characters in DB

Dear All,
I am facing a critical issue that has been dragging on
for 3 days, and I still can't figure it out.
I have an XML file which I am getting from a server using
<cffile action="READ" file="#application.settings.paths"
variable="xml" charset="utf-8">
and I use XmlParse to parse that XML and try to insert
the XML data into the database. In the XML I have these
characters: masterâ€™s, which should be a single quote.
But when saving to the database using a ColdFusion query,
the data is saved with funny characters (i.e., master?s).
The XML encoding is UTF-8 and I don't know how to convert those
junk characters to normal characters like (master's), with a
single quote.
Here are the things I tried.
i) In the ColdFusion Administrator I added the connection string
"useUnicode=true&characterEncoding=UTF-8"
and checked the box which says "enable unicode for datasources
configured for non-latin characters".
ii) Used a convertCharset function, passing the XML object:
<cfscript>
// Re-encodes str: get its bytes in charsetFrom, then decode them as charsetTo.
function convertCharset(str, charsetFrom, charsetTo) {
    var resultStr = "";
    var javaString = "";
    var byteArray = "";
    javaString = CreateObject("java", "java.lang.String");
    javaString.init(str);
    byteArray = javaString.getBytes(charsetFrom);
    resultStr = CreateObject("java", "java.lang.String");
    resultStr.init(byteArray, charsetTo);
    return resultStr.toString();
}
</cfscript>
<cfcontent type="text/html; charset=UTF-8">
<cfset setEncoding("URL", "UTF-8")>
<cfset setEncoding("Form", "UTF-8")>
I also tried this method:
http://www.bennadel.com/blog/1206-Content-Is-Not-Allowed-In-Prolog-ColdFusion-XML-And-The-Byte-Order-Mark-BOM-.htm
Please let me know if i need to do anything.. other than the
above methods,
Thanks

I am using SQL SERVER 2005 Database,
Field is "Description" Varchar(2000)
did you perform your test using the same table, code, etc.?
Yes
did you read in & dump out the xml file? Yes, I dumped
the xml file and if i open in NOTEPAD in UTF-8 (filetype) then i
see a single quote instead of that different character.
is it really utf-8?
so i think it's utf-8,
if your mojibake. chars are from an ms word document, then
they're not utf-8 but a superset of
latin-1.
they are not from MS WORD, i got an XML file which has all
the course and presentation information..structured properly except
those characters.. like
("younus has a Bachelorâ€™s degree). i see
that in UTF-8
so i want to know to which format do i need to convert to when
saving in Database (SQL SERVER 2005)
Thanks.
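For what it's worth, the sequence â€™ is exactly what the right single quotation mark U+2019 looks like when its UTF-8 bytes (E2 80 99) are decoded as windows-1252, so somewhere in the chain UTF-8 data is probably being read with the wrong charset. The Java sketch below (the class name MojibakeRepair is made up) reverses that specific mistake. It is a repair for already-damaged text, not a substitute for reading the data with the right charset in the first place.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper; only undoes the UTF-8-read-as-windows-1252 mistake.
class MojibakeRepair {

    private static final Charset CP1252 = Charset.forName("windows-1252");

    /** Re-encodes the garbled text as windows-1252 and decodes the bytes as UTF-8. */
    static String repair(String garbled) {
        byte[] originalBytes = garbled.getBytes(CP1252);
        return new String(originalBytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // "Bachelorâ€™s" written with escapes so the source file encoding does not matter.
        String garbled = "Bachelor\u00e2\u20ac\u2122s";
        System.out.println(repair(garbled));
    }
}
```

Text that was never garbled should not be passed through this, because non-ASCII characters in it would be damaged instead.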

XML, XSL - HTML(JSP), bad UTF-8 encoding on Tomcat

I use a simple transformation to create an HTML (JSP) page from XML and XSL (both are encoded in UTF-8). When doing this in the embedded web server in JD9i developer, the result is fine. But when run on Tomcat 4.0.3, UTF-8 chars are displayed as 2 characters. It's interesting that the transformation gives different results:
- in JD9i the UTF-8 chars are left unchanged and shown correctly in the IE browser
- in Tomcat the chars with an acute (´) are transformed into "í", "á" and so on and shown properly, whereas chars with a caron (ˇ) are encoded and shown badly.
Does somebody know a solution to this? Thanks.

Doing some more experiments reveals a subtle difference between GET and POST with Tomcat - it may apply to other application servers, but I have only tested with Tomcat 5 and Tomcat 6, so I can't say.
The following is essentially what most people have indicated you need to do to have your JSP handle UTF-8:
<%@ page language="java" contentType="text/html; charset=UTF-8" pageEncoding="UTF-8" %>
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
<title>My Title</title>
</head>
<body>
</body>
</html>
This is what I couldn't get working with my GET (the default for forms) and ended up working when I specified the URIEncoding for the connector in the server.xml file.
It turns out that the above will work with no modifications to the server.xml file if the form method is POST. If you need to be able use GETs then you will need to modify the connector, otherwise it would seem nothing you do at the JSP or servlet level will make a difference, unless you want to convert each parameter manually:
String parameter = new String( request.getParameter("myParam").getBytes("ISO-8859-1"), "UTF-8" );
Edited by: ajmasx on Sep 27, 2007 10:20 AM
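To see why the manual conversion works, here is a container-free sketch using java.net.URLDecoder to stand in for what the server does with a GET query string. The class name QueryDecode is invented. A percent-encoded UTF-8 character decoded as ISO-8859-1 comes out as two wrong characters, and because ISO-8859-1 maps every byte to exactly one character, the original bytes can be recovered and decoded again as UTF-8.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;

// Hypothetical demo; URLDecoder stands in for the server's query-string decoding.
class QueryDecode {

    /** Decodes a percent-encoded value using the given charset name. */
    static String decode(String raw, String charsetName) throws UnsupportedEncodingException {
        return URLDecoder.decode(raw, charsetName);
    }

    /** Undoes an ISO-8859-1 decode of bytes that were really UTF-8. */
    static String redecode(String wronglyDecoded) {
        return new String(wronglyDecoded.getBytes(StandardCharsets.ISO_8859_1),
                StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        String raw = "%C5%A1";                          // s with caron (U+0161) in UTF-8
        String right = decode(raw, "UTF-8");            // one character
        String wrong = decode(raw, "ISO-8859-1");       // two unrelated characters
        System.out.println(right.length() + " vs " + wrong.length());
        System.out.println(redecode(wrong).equals(right));
    }
}
```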

UTF-8 encoding and BOM

I'm reading in a file that's encoded in UTF-8 and begins with the byte-order mark of EF BB BF. I'm curious to know why a byte-order mark is needed for something encoded in UTF-8, because aren't BOMs only used to figure out endianness, which isn't an issue with UTF-8 as some tutorials I've seen say. But then again, UTF-8 can use multiple bytes to specify a character, in which case endianness does matter, right?
My question is therefore whether endianness matters with UTF-8?
Also, at the online converter at the URL http://macchiato.com/unicode/convert.html, why can you convert from FE FF in UTF-16 to EF BB BF in UTF-8, but cannot do the conversion in the opposite way? Is it a problem with the converter, or something to do with the encodings themselves?
Thanks.

http://www.unicode.org/faq/utf_bom.html#25
As for that URL, not likely that anyone here would know what's wrong with it. Could be that it doesn't assume UTF-8 will have a BOM, as it's not needed anyway, so doesn't treat it as such and converts it as if they were regular characters.
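On the endianness question: UTF-8 is defined as a sequence of single bytes in a fixed order, so there is nothing to swap, and the BOM at the start of a UTF-8 file is only a signature. It is simply U+FEFF encoded in UTF-8, which gives EF BB BF. Java's UTF-8 decoder does not strip it, so it is common to skip it by hand. A small sketch (the class name BomStrip is made up):

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper for skipping a UTF-8 signature.
class BomStrip {

    /** Returns data without a leading EF BB BF, or data itself if there is none. */
    static byte[] stripUtf8Bom(byte[] data) {
        if (data.length >= 3
                && (data[0] & 0xFF) == 0xEF
                && (data[1] & 0xFF) == 0xBB
                && (data[2] & 0xFF) == 0xBF) {
            return Arrays.copyOfRange(data, 3, data.length);
        }
        return data;
    }

    public static void main(String[] args) {
        // U+FEFF encoded as UTF-8 is exactly the three signature bytes.
        byte[] bom = "\uFEFF".getBytes(StandardCharsets.UTF_8);
        System.out.printf("%02X %02X %02X%n", bom[0] & 0xFF, bom[1] & 0xFF, bom[2] & 0xFF);

        byte[] file = "\uFEFF<root/>".getBytes(StandardCharsets.UTF_8);
        System.out.println(new String(stripUtf8Bom(file), StandardCharsets.UTF_8));
    }
}
```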

UTF-8 encoding trouble

I need to use UTF8 encoding throughout a site. For that purpose, I have the following
tags on JSP:
<%@ page contentType="text/html; charset=UTF-8" %>
<meta http-equiv="Content-Type" CONTENT="text/html; charset=UTF-8">
Next, in my weblogic.xml, I have the following:
<jsp-param>
<param-name>encoding</param-name>
<param-value>UTF8</param-value>
</jsp-param>
<charset-params>
<input-charset>
<resource-path>*.jsp</resource-path>
<java-charset-name>UTF8</java-charset-name>
</input-charset>
</charset-params>
Having configured this, I have two simple JSP files. The first one submits a field
(whose contents I enter in Greek), and the second page writes them to a file. The
code for writing to a file looks like this:
FileOutputStream of = new FileOutputStream (fileName, false);
OutputStreamWriter ow = new OutputStreamWriter (of, "UTF-8");
ow.write (request.getParameter("test"));
When I enter the Greek character Alpha as input, the file has a weird string +I in
it. To fix the problem, I did the following (and it works):
String s = request.getParameter ("TestName");
byte b[] = new byte [5000];
b = s.getBytes ();
s = new String (b, "UTF-8");
writeToFile (s);
Which means that for some reason, the page gets the right String, but it seems to
be encoded with default encoding (not UTF8). When I convert it into bytes, and create
another String using the same byte-stream but a different encoding, what I get is
correct UTF-8 encoded string. Please also note that the same problem occurs with
DB as well (Oracle 8.1.7 with UTF8 on Win2k), and fixing the above code fixes problem
at both file and database level.
Rather than the above workaround, what's the proper way to accomplish this?
Thanks,
Raja

In GlassFish I have now changed the settings below. Under each listener, for both Network Listeners and Protocols, there is an HTTP tab, and under that one I have changed this:
Network Config
Network Listeners
http-listeners-1
http-listeners-2
admin-listeners
Protocols
http-listeners-1
http-listeners-2
admin-listeners
URI Encoding: UTF-8
Default Response Type: text/plain; charset=UTF-8
Forced Response Type: text/plain; charset=UTF-8
So when i run curl in a terminal window i get this response:
Macintosh:~ jespernyqvist$ curl -I http://neptunediving.com/neptune/index.jsp
HTTP/1.1 200 OK
Date: Mon, 17 May 2010 04:14:17 GMT
Server: GlassFish v3
X-Powered-By: JSP/2.1
Content-Type: text/html;charset=UTF-8
Content-Language: en-US
Transfer-Encoding: chunked
Set-Cookie: JSESSIONID=478269c08e050484d1d6fa29fc44; Path=/neptune
As you can see, my HTTP header now looks good, with no more charset=iso-8859-1. The only problem I have here is that there is no space in text/html;charset=UTF-8. Should it be text/html; charset=UTF-8 instead? I have noticed that they are very case sensitive, so maybe this is a problem for me?
At the top of my page I have this:
<%@page import="com.neptunediving.*"%>
<%@include file="WEB-INF/include/LangSupport.jsp"%>
<%@page language="java" contentType="text/html; charset=UTF-8" pageEncoding="UTF-8"%>
In my header i have this;
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
I have changed the preferences in Eclipse to use UTF-8. I have gone through all the properties files in my project and changed them to UTF-8 as well. So what else is there to change?
Still my page is not displayed properly in any browser, including Safari, Firefox, Opera and Internet Explorer. So what is wrong with my page, since this doesn't work for me? Can anybody please explain this to me?

UTF-8 Encoding errors during nightly batch runs

My boss recently tasked me with researching (and hopefully resolving) why our XML frequently has UTF-8 encoding errors.
I've been in the IS world for less than a year now so please bear with me when it comes to terms, data flow, etc.
Overview:
Our Oracle DB spits out XML for the nightly batch runs into a file location, let's say C:\xPression\CustomerData\Certificate.xml. The XML is in Courier New font, but some characters make their way into the XML that aren't supported. The big one is the elongated ' - ' character. Just one instance of this and the entire XML fails.
When the batch job is run, sometimes there are encoding errors (¿, ¡, -, etc.) and every morning I have to come in, find the invalid character, fix it and have the job re-run.
I want to know if there's a way so that the XML that comes out is always in the Courier New font, or is there a way to convert it.

I want to know if there's a way so that the XML that comes out is always in the Courier New font, or is there a way to convert it.
First thing first, an XML file is a text file, it doesn't have a "font" but an encoding.
The font is the graphical representation of characters and it is related to whatever client tool you're using to view the content, not to the content itself.
That being said, a lot of fonts do not support the full range of unicode characters so you may get replacement characters in some case.
We're missing some information to provide an answer :
- what's the database version?
- what's the character set of the database?
- how are you generating and writing the XML to the file ? UTL_FILE, dbms_xslprocessor, dbms_xmldom?
If the file is generated using UTF-8 encoding then the issue might just be that you're not using a UTF-8-enabled editor.
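While waiting for those details, the morning hunt for the bad character could be shortened with a small Java check that reports the byte offset of the first sequence that is not valid UTF-8. This is only a sketch with an invented class name (FirstBadByte). The sample data assumes the "elongated dash" arrives as the single windows-1252 byte 0x96, which is a guess and not something the post confirms.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;
import java.nio.charset.StandardCharsets;

// Hypothetical diagnostic helper.
class FirstBadByte {

    /** Returns the byte offset of the first malformed UTF-8 sequence, or -1 if none. */
    static int firstInvalidOffset(byte[] bytes) {
        // A fresh decoder reports errors by default instead of replacing them.
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder();
        ByteBuffer in = ByteBuffer.wrap(bytes);
        // UTF-8 never yields more chars than input bytes, so this cannot overflow.
        CharBuffer out = CharBuffer.allocate(bytes.length + 1);
        CoderResult result = decoder.decode(in, out, true);
        // On error, the input position is left at the start of the bad sequence.
        return result.isError() ? in.position() : -1;
    }

    public static void main(String[] args) {
        // "<a>x", then 0x96 (assumed windows-1252 en dash), then "y</a>".
        byte[] data = {'<', 'a', '>', 'x', (byte) 0x96, 'y', '<', '/', 'a', '>'};
        System.out.println(firstInvalidOffset(data));
    }
}
```

With the offset in hand, a hex viewer shows the offending byte directly, which usually says which source encoding the data came from.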

UTF-8 encoding

Hi,
I'm having trouble with parsing XML stored in NCLOB column using UTF-8 encoding.
Here is what I'm running:
Windows NT 4.0 Server
Oracle 8i (8.1.5) EE
JDeveloper 3.0, JDK 1.1.8
Oracle XML Parser v2 (2.0.2.5?)
The following XML sample that I loaded into the dabase contains two UTF-8 multi-byte characters:
<?xml version="1.0" encoding="UTF-8"?>
<G><A>GBotingen, BrC<ck_W</A></G>
G(0xc2, 0x82)otingen, Br(0xc3, 0xbc)ck_W
If I'm not mistaken, both multibyte characters are valid UTF-8 encodings and they are defined in ISO-8859-1 as:
0xC2 LATIN CAPITAL LETTER A WITH CIRCUMFLEX
0xFC LATIN SMALL LETTER U WITH DIAERESIS
I wrote a Java stored function that uses the default connection object to connect to the database, runs a Select query, gets the OracleResultSet, calls the getCLOB method and calls the getAsciiStream() method on the CLOB object. Then it executes the following piece of code to get the XML into a DOM object:
DOMParser parser = new DOMParser();
parser.setPreserveWhitespace(true);
parser.parse(istr); // istr getAsciiStream
XMLDocument xmldoc = parser.getDocument();
Before the stored function can do other things, this code seems to throw an exception complaining that the above XML contains "Invalid UTF8 encoding".
Now, when I remove the first multibyte character (0xc2, 0x82) from the XML, it parses fine.
Also, when I do not remove this character, but connect via the jdbc:oracle:thin driver (note that now I'm not running inside the RDBMS as a stored function anymore), the XML is parsed with no problem and I can do whatever I want with the XMLDocument. Note that I loaded the sample XML into the database using the thin JDBC driver.
One more thing, I tried two database configurations with WE8ISO8859P1/WE8ISO8859P1 and WE8ISO8859P1/UTF8 and both showed the same problem.
I'll appreciate any help with this issue. Thanks...

I inserted the document once by using the oci8 driver and once by using the thin driver. Then I used the DBMS_LOB package to look at the individual characters and convert those characters using the ASCII function.
It looks like when I inserted the document using the OCI8 driver, the characters got converted into a pair of 191 (0xbf) characters. However, when I used the thin driver they ended up being stored as 195 (0xc3) and 130 (0x82).
So it looks like the OCI8 driver is corrupting the individual characters, and if the characters are not corrupted they cause the following exception to be thrown:
Error: 440, SQL execution error, ORA-29532: Java call terminated by uncaught Java exception: java.io.UTFDataFormatException: Invalid UTF8 encoding. ORA-06512: at "SYSTEM.GETWITHSTYLE", line 0 ORA-06512: at line 1
Note that my other example of a multi-byte character (C<) also gets corrupted by the OCI8 driver but does not cause the above exception to be thrown if it's inserted via the thin driver.
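As a side note on the byte values in this thread, it may help to check what the pairs actually decode to. In UTF-8, C2 82 is U+0082 (a C1 control character, not the letter with code 0xC2), C3 BC is U+00FC (u with diaeresis), and the pair C3 82 seen after the thin-driver insert is U+00C2 (A with circumflex). A tiny Java check, with an invented class name (BytePairs):

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper for inspecting two-byte UTF-8 sequences.
class BytePairs {

    /** Decodes a two-byte UTF-8 sequence and returns its code point. */
    static int codePointOf(int b1, int b2) {
        String s = new String(new byte[] {(byte) b1, (byte) b2}, StandardCharsets.UTF_8);
        return s.codePointAt(0);
    }

    public static void main(String[] args) {
        System.out.printf("C2 82 -> U+%04X%n", codePointOf(0xC2, 0x82));
        System.out.printf("C3 BC -> U+%04X%n", codePointOf(0xC3, 0xBC));
        System.out.printf("C3 82 -> U+%04X%n", codePointOf(0xC3, 0x82));
    }
}
```

Separately, reading a CLOB that holds non-ASCII text through getAsciiStream() is a likely place for bytes to be mangled; a character stream is usually the safer choice for handing text to a parser.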
