Question about special characters in XML documents

Hi,
I need to create some XML documents based on documents containing elements containing strings containing accentuated and special characters such as ��� or Asian characters.
I have found the following code claiming that XML documents created by the xformer will have a utf-8 format:
// Write the DOM document to the file
Transformer xformer = TheTF.newTransformer();
xformer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");and I have found the InputSource object to be used to read UTF-8 XML documents:
InputSource in = new InputSource(new FileReader(filename));
in.setEncoding("UTF-8");Is this enough to make sure that accentuated and Asian characters (or any other characters) will be converted properly when creating and re-reading the XML document? If one of my elements contain the following text '���33445tata', will the string remain the same when re-read from the XML? Do I need more code to make sure that all conversion will be performed properly and transparently? If yes, can one provide an example?
Thanks!

Hi Dr.Clap,
Thanks for your answer, but I am not really sure I understand you. I have performed a test to generate a simple document containing one element. An attribute containing some special characters is added to the element. The element text also contains specials characters.
When I generate the XML into a byte array input and print the content, I get:
<?xml version="1.0" encoding="utf-8" standalone="no"?><UTF8TEST BIBI="��ii��������">tre33������</UTF8TEST>When I parse the document, and print the result, I get:
-- Begin of Parsing
Element Name = UTF8TEST
Attribute Name = BIBI
Attribute Value = �ii����
Found text : tre33���
-- End of Parsingwhich is correct. I am worried by the ��ii� ������-like characters generated in the XML. Should I be worried or shouldn't I? Should I convert these into something looking a more like ASCII, or is this fine?
I am including the code of my example (excluding imports):
public class XMLUTF8_Test extends DefaultHandler {
    // Logger
    private final static Logger LOG = Logger.getLogger(XMLUTF8_Test.class.getName());
    // Static
    public static DocumentBuilderFactory DOCUMENT_BUILDER_FACTORY;
    public static DocumentBuilder DOCUMENT_BUILDER;
    public static TransformerFactory TRANSFORMER_FACTORY;
    private static SAXParserFactory FACTORY;
    private static SAXParser SAXPARSER;   
    private static XMLReader XMLREADER;
    static {
        // Preparing the writting
        DOCUMENT_BUILDER_FACTORY = DocumentBuilderFactory.newInstance();
        try {
            DOCUMENT_BUILDER = DOCUMENT_BUILDER_FACTORY.newDocumentBuilder();
        } catch (ParserConfigurationException Ex) {
            Debugger.LogUnreachableCodeReachedSituation(LOG, Ex);
        TRANSFORMER_FACTORY = TransformerFactory.newInstance();
        // Preparing the reading
        try {
            FACTORY = SAXParserFactory.newInstance();
            SAXPARSER = FACTORY.newSAXParser();
            XMLREADER = SAXPARSER.getXMLReader();
        } catch (ParserConfigurationException Ex) {
            Debugger.LogUnreachableCodeReachedSituation(LOG, Ex);
        } catch (SAXException Ex) {
            Debugger.LogUnreachableCodeReachedSituation(LOG, Ex);
    public static void main(String[] args) {
        // Perform test
        new XMLUTF8_Test();
    public XMLUTF8_Test() {
        // Saving to File
        Document NewDoc = DOCUMENT_BUILDER.newDocument();
        Element Root = NewDoc.createElement("UTF8TEST");
        Root.setAttribute("BIBI", "�ii����");
        Root.setTextContent("tre33���");
        NewDoc.appendChild(Root);
        ByteArrayOutputStream TheBAOS = new ByteArrayOutputStream();
        try {
            Source source = new DOMSource(NewDoc);
            Result result = new StreamResult(TheBAOS);
            Transformer xformer = TRANSFORMER_FACTORY.newTransformer();
            xformer.setOutputProperty(OutputKeys.ENCODING, "utf-8");
            xformer.transform(source, result);
        } catch (TransformerConfigurationException Ex) {
            Debugger.LogUnreachableCodeReachedSituation(LOG, Ex);
        } catch (TransformerException Ex) {
            Debugger.LogUnreachableCodeReachedSituation(LOG, Ex);
        // Printing result
        System.out.println(TheBAOS.toString());
        // Parsing the result
        ByteArrayInputStream TheBAIS = new ByteArrayInputStream(TheBAOS.toByteArray());
        XMLREADER.setContentHandler(this);
        try {
            XMLREADER.parse(new InputSource(TheBAIS));
        } catch (IOException Ex) {
            Debugger.LogUnreachableCodeReachedSituation(LOG, Ex);
        } catch (SAXException Ex) {
            Debugger.LogUnreachableCodeReachedSituation(LOG, Ex);
    @Override
    public void startDocument() throws SAXException {
        System.out.println("-- Begin of Parsing");
    @Override
    public void endDocument() throws SAXException {
        System.out.println("-- End of Parsing");
    @Override
    public void startElement(String namespaceURI, String LocalName, String QualifiedName,
            Attributes TheAttributes) throws SAXException {
        String ElementName = LocalName;
        if ("".equals(ElementName)) {
            ElementName = QualifiedName; // namespaceAware = false
        System.out.println("Element Name = " + ElementName);
        if (TheAttributes != null) {
            for (int i = 0; i < TheAttributes.getLength(); i++) {
                String AttributeName = TheAttributes.getLocalName(i); // Attr name
                if ("".equals(AttributeName)) AttributeName = TheAttributes.getQName(i);
                System.out.println("Attribute Name = " + AttributeName);
                System.out.println("Attribute Value = " + TheAttributes.getValue(AttributeName));
    @Override
    public void endElement(String namespaceURI, String SimpleName, String QualifiedName) throws SAXException {
    @Override
    public void characters(char[] buf, int offset, int len) throws SAXException {
        String s = new String(buf, offset, len);
        System.out.println("Found text : " + s);
}Thanks!

Similar Messages

  • Replacing special characters from xml document/text inside element

    Hi
    Is there any way to replace the xml codes to special characters inside an entire xml document/ for a text in element.
    I want to get the xml codes to get replaced to respective special character(for any special character)
    Please see the sample xml xml element below
    <Details>Advance is applicable only for &lt; 1000. This is mentioned in Seller&apos;s document</Details>
    Thanks in advance .. any help will be highly appreciated.

    So, just to be sure I understand correctly, you want this :
    <Details>Advance is applicable only for &lt; 1000. This is mentioned in Seller&apos;s document</Details>
    to be converted to :
    <Details>Advance is applicable only for < 1000. This is mentioned in Seller's document</Details>
    If so, I'll say again : the resulting XML document will be invalid.
    Extensible Markup Language (XML) 1.0 (Fifth Edition)
    The ampersand character (&) and the left angle bracket (<) MUST NOT appear in their literal form, except when used as markup delimiters, or within a comment, a processing instruction, or a CDATA section. If they are needed elsewhere, they MUST be escaped using either numeric character references or the strings " &amp; " and " &lt; " respectively. The right angle bracket (>) may be represented using the string " &gt; ", and MUST, for compatibility, be escaped using either " &gt; " or a character reference when it appears in the string " ]]> " in content, when that string is not marking the end of a CDATA section.
    Ask whoever "they" are why they would want to work with not wellformed XML.

  • Special characters in XML document

    I have a Flash file saved as version 8 with the following script calling an xml file:
    //init TextArea component
    myText.html = true;
    myText.wordWrap = false;
    myText.multiline = false;
    myText.label.condenseWhite=true;
    //load in XML
    xml = new XML();
    xml.ignoreWhite = true;
    xml.onLoad = function(success){
        if(success){
            myText.text = xml;
    xml.load("titletext.xml");
    My xml file contains the following:
    <?xml version="1.0" encoding="iso-8859-1"?>
    <![CDATA[Smith's dog & cat are "crazy"]]>
    When posted online my flash file displays the encoding tag in the xml file.
    AND the apostrophe, ampersand and quote marks display as html code instead of the actual character.
    I can take the encoding tag out of the xml file but my characters still don't display correctly.
    My dynamic text field in flash (myText) does have special characters embedded, plus I have them entered manually in the field for 'include these characters'.
    Does anyone have suggestions for me?
    You can view this test file at http://wilddezign.com/preshomes_name2.html
    TIA

    Perhaps you need a slightly different approach to loading the XML. Instead of loading the entire XML file, what if you loaded only the child you were looking for? This is what I usually do:
    var xml:XML = new XML();
    xml.ignoreWhite = true;
    xml.load("some.xml");
    xml.onLoad = parse;
    function parse(success){
    if (success){
      root = xml.firstChild;
      _global.numberItems = root.attributes.items;
      itemNode = root.firstChild;
      var i:Number = 0;
      while(itemNode != null){
         myText.text = itemNode.attributes.description;
         itemNode = itemNode.nextSibling;
         i++;
      else {
      trace("XML Bad!!");
    And your XML would be structured like this:
    <?xml version="1.0" encoding="utf-8"?>
    <sample>
    <item description="This is the text that I want to appear on this MC!" />
    </sample>

  • Trying to add emjoi from special characters to my document which is a table it seems to go through the motions but can not see the emjoi. cheers

    Question: Trying to add a emjoi from special characters to my document which has tables.  when adding can not see the emjoi picture.  Can anyone help.  Thank you

    Emoji is not supported by the iWork apps.
    Send Feedback to Apple via the Pages menu.
    Jerry

  • How to remove special characters in xml

    Dear friends,
    How to remove the special character  from the xml. I am placing the xml file and  fetching through file adapter.
    The problem is when there is any special character in xml. i am not able to pass to target system smoothly.
    Customer asking schedule the file adapter in order to do that the source xml should not have any special charatcters
    How to acheive this friends,
    Thanx in advance.
    Take care

    Hi Karthik,
    Go throgh the following links how to handle special character
    https://www.sdn.sap.com/irj/scn/weblogs?blog=/pub/wlg/9420 [original link is broken] [original link is broken] [original link is broken]
    https://www.sdn.sap.com/irj/sdn/go/portal/prtroot/docs/library/uuid/502991a2-45d9-2910-d99f-8aba5d79fb42
    Restricting special characters in XML within XI..
    Regards
    Goli Sridhar

  • ? is shown for Norwegian characters when XML document is parsed using DOM

    Hi,
    I've a sample program that creates a XML document with a single element book having Norwegian characters. Encoding is UTF-8. When i parse the XML document and try to access the value of that element then ? are shown for Norwegian characters. XML document file name is "Sample.xml"
                DocumentBuilderFactory dbfac = DocumentBuilderFactory.newInstance();
                DocumentBuilder docBuilder = dbfac.newDocumentBuilder();
                Document doc = docBuilder.newDocument();
                Element root = doc.createElement("root");
                root.setAttribute("value", "Á á &#260; &#261; ä É é &#280;");
                doc.appendChild(root);
                TransformerFactory transfac = TransformerFactory.newInstance();
                Transformer trans = transfac.newTransformer();
                trans.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
                trans.setOutputProperty(OutputKeys.INDENT, "yes");
                trans.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
                //create string from xml tree
                java.io.ByteArrayOutputStream baos = new java.io.ByteArrayOutputStream();
                StreamResult result = new StreamResult(baos);
                DOMSource source = new DOMSource(doc);
                trans.transform(source, result);
                writeToFile("Sample.xml", baos.toByteArray());
                InputSource is = new InputSource(new java.io.ByteArrayInputStream(readFile("Sample.xml")));
                is.setEncoding("UTF-8");
                DocumentBuilder obj_db = DocumentBuilderFactory.newInstance().newDocumentBuilder();
                Document obj_doc = obj_db.parse(is);
                obj_doc.normalize();
                System.out.println("Value is : " + new String(((Element) obj_doc.getElementsByTagName("root").item(0)).getAttribute("value").getBytes()));writeFile() - Writes the document bytes in Sample.xml file
    readFile() - Reads the Sample.xml file
    When i run this program XML editor shows the characters correctly but Java code output is: Á á ? ? ä É é ?
    What's the problematic area in my java code. I didn't get any help from any source. Please suggest me the solution of this problem.
    Thanx in advance.

    Hi,
    I'm using JBuilder 2005 and i mentioned encoding UTF-8 for saving Java source files and also for compilation. I've modified my source code also. But the problem persists. After applying changing the dumped sample.xml file doesn't display these characters correctly in IE, but earlier it was displaying it correctly at IE.
    Modified code is:
    DocumentBuilderFactory dbfac = DocumentBuilderFactory.newInstance();
                DocumentBuilder docBuilder = dbfac.newDocumentBuilder();
                Document doc = docBuilder.newDocument();
                Element root = doc.createElement("root");
                root.setAttribute("value", "Á á &#260; &#261; ä É é &#280;");
                doc.appendChild(root);
                OutputFormat output = new OutputFormat(doc, "UTF-8", true);
                java.io.ByteArrayOutputStream baos = new java.io.ByteArrayOutputStream();
                OutputStreamWriter osw = new OutputStreamWriter(baos, "UTF-8");
                XMLSerializer s = new XMLSerializer(osw, output);
                s.asDOMSerializer();
                s.serialize(doc);
                writeToFile("Sample5.xml", baos.toByteArray());
                InputSource o = new InputSource(new java.io.ByteArrayInputStream(readFile("Sample5.xml")));
                o.setEncoding("UTF-8");
                com.sun.org.apache.xerces.internal.parsers.DOMParser obj_parser = new com.sun.org.apache.xerces.internal.parsers.DOMParser();
                obj_parser.parse(o);
                Document obj_doc = obj_parser.getDocument();
                System.out.println("Value : " + new String(((Element) obj_doc.getElementsByTagName("root").item(0)).getAttribute("value").getBytes()));I'm hanged on this issue. Can u please provide me the code snippet that works with these characters or suggest solution.
    Thanx

  • Special characters in XML barcode content

    Hello,
    I made a barcoded form with a custom script that creates a custom XML as barcode content.
    The decoding happens well when the user write plain text in the text fields, but whenever it inputs some special characters (for XML syntax), like ",<,>,=,etc... the content of barcode it is decoded as:
    <barcode>
    <!CDATA[... true content ...]>
    </barcode>
    how can I handle this situation?
    I have to handle what the user writes or I have to change the decode activity?
    Thank you very much for your support!
    Fabio

    Steve,
    I have already encoded decode operation in UTF-8. In form level, because it is an acrobat form, no option to choose the encoding as in LC Designer. In further tests, if I change the extractToXML output to XDP instead of XFDF, then I will receive data rather than &# sequence. It is strange. Don't understand why XDP and XFDF would give out different encoding.
    Tim

  • Special Characters in XML Publisher PDF Output

    Hi,
    I'm printing "Long Text" in report output in every line based on tab.
    Report output is having special characters like \n.
    I was using below to print in next line, any suggestions for removing \n.
    Below is what was happening:
    ===================
    RDF:
    =====
    lv_notes := replace(:CF_LONG_TEXT_DESC, chr(9), ' ') ;
    lv_notes1 := replace(lv_notes, chr(10), ' ') ;
    lv_notes2 := replace(lv_notes1, chr(13), ' ') ;
    XML
    ===
    <CF_LONG_TEXT_desc>
    Initial Billing Amount: $549,180.00 \n \n Computation: \n a) Estimated Number of Full-Time Students: 12,000 \n
    b) Estimated Number of Calendar Days: 113 \n
    c) Calendar Date (From - To): 1/18/2011 - 5/10/2011 \n
    d) Multiply by: $0.81 \n
    e) Estimated Total Costs: $1,098,360.00 \n
    f) Initial Billing Amount represents 50% of Estimated Total Costs. \n
    \n \n
    If you have questions about your invoice please contact Darud Akbar at (312) 681-2724.
    </CF_LONG_TEXT_desc>
    PDF Output
    ==========
    Initial Billing Amount: $549,180.00 \n \n Computation: \n a) Estimated Number of Full-Time Students: 12,000 \n
    b) Estimated Number of Calendar Days: 113 \n
    c) Calendar Date (From - To): 1/18/2011 - 5/10/2011 \n
    d) Multiply by: $0.81 \n
    e) Estimated Total Costs: $1,098,360.00 \n
    f) Initial Billing Amount represents 50% of Estimated Total Costs. \n
    \n \n
    If you have questions about your invoice please contact Darud Akbar at (312) 681-2724.
    Thanks.

    >
    Initial Value
    =======
    <?CF_LONG_TEXT_desc?>
    Changed to (Below gives me error)
    ========
    <?<xsl:value-of select="translate(CF_LONG_TEXT_desc,'\n','')"/>?>
    Changed to (Below doesn't fetch data)
    ========
    <xsl:value-of select="translate(CF_LONG_TEXT_desc,'\n','')"/>
    >
    must be in field as
    <xsl:value-of select="translate(CF_LONG_TEXT_desc,'\n','')"/>

  • Special characters in XML (UTF8, escapes etc)

    This may seem like a simple problem, but I'm kind of a slow learner, so any help would be appreciated.
    I'm generating XML output via ordinary PL/SQL procedures i.e. not using the Oracle XSQL stuff. The data is built up into a XML document, which will be parsed within the server, then passed out to a client, probably as a CLOB.
    The data may contain "special" characters such as "<&>" and so on. Is there an existing function to translate these into an "XML-safe" form within the text, or do I need to trap and translate them individually?
    Also, the database may contain foreign characters e.g. Thai or Chinese. Will this be handled transparently by my XML-extraction stuff - which just selects the data from the tables and writes it to a CLOB inside XML tags - or do I have to do something to make sure the XML comes out in a safe UTF format?
    Thanks,
    Chris

    use & instead of &
    see http://www.w3.org/TR/2000/REC-xml-20001006#syntax

  • Replace special characters in xml to its HTML equivalent

    Hello All,
    I did a small xml processor which will parse the given xml document and retrieves the necessary values. I used XPath with DocumentBuilder to implement the same.
    The problem which had faced initially was, i could not able parse the document when the value comes with the '&' special character. For example,
    <description>a & b</description>I did some analysis and found that the '&' should be replaced with its corresponding HTML equivalent
    &amp; So the problem had solved and i was able to process the xml document as expected. I used the replaceAll() method to the entire document.
    Then i thought, there would be some other special character which may cause the same error in future. I found '<' is also will cause the same kind of error. For example,
    <description>a < b</description>Here i couldn't able to use the replaceAll(), because even the '<' in the xml element tags was replaced. So i was not able to parse the xml document.
    Did anyone face this kind of issue? Can anyone help me to get rid of this issue.
    Thanks
    kms

    Thats the thing about XML. It has to be correct, or it doesn't pass the gate. This is nothing really, seen it many times before. It becomes funny when you start to run into character set mismatches.
    In this case the XML data should have either been escaped already, or the data should have been wrapped in cdata blocks, as described here:
    http://www.w3schools.com/xml/xml_cdata.asp
    Manually "fixing" files is not what you want to be doing. The file has to be correct according to the very simple yet strict XML specifications, or it isn't accepted.

  • Handling special characters in XML

    Hi,
    I am using Oracle 10g 'XMLType' datatype to store XML files. Before storing I parse the XML document using Java Xerces Parser. If it parses successfuly, then I perform some business rule execution based on XML file which was parsed. So till this stage there is no problems. But when XML file contains some special characters like copy-paste of some description from MS-Word document into XML tags, then Xerces parser will parse such characters with out any exceptions, but while inserting XML document, Oracle database just throws exception saying unable to handle special characters.. So how to avoid such exceptions or silent such exceptions with any specific settings respect to XMLType datatype in 10g DB.
    Please advice!
    Arvind Patil - IN

    Monica--
    In XI 2.0, we've noticed a number of issues processing special characters, primarily caused by the version of JCO that we're running.  It sounds like SAP has spent some time in the past few months focusing on these errors, so make sure you're on the most recent patchlevels of all your middleware components, including any of the middleware libraries that BC uses. In XI, we had to update the 3 files that make up the RFC library and JCO library.  SDM couldn't update the libraries for us -- we had to manually move the files to the right place.
    Escaped XML characters like "&amp;" "&#34;" "&quot;" were fixed as of JCO 2.0.10 (the current patchlevel on AIX/UNIX), the special character "&apos;" is fixed in the next release, JCO 2.0.11, due out in a few weeks (hotfixes are available).  I don't know the equivalent versions on other platforms.  By default, XI 2.0 appears to have shipped with JCO 2.0.5.  I would expect many XI 3.0 users to also be affected.
    This may or may not apply to BC, because I don't know what BC uses to talk to SAP under the covers.
    --Dan King
    Capgemini

  • Handle special characters in xml

    Hi,
      Our end users tend to copy the description text from Word documents to pdf form and submits it.
    If that text contains any special characters, its getting carried to the extracted xml. In the next step, when I try to assign a task to user with template and this xml, Managers cannot able to open the form and showing the error. When I assign the xml without special characters, its running fine.
    Please assist on how to handle this?
    My expectation is that user should be prompted in the form when he pastes any special characters or they should be auto-corrected to null values. if that is not possible, atleast we should able to filter the xml and eliminate special characters before the form go to next stage.
    Appreciate your help.
    Thanks,
    Krishna

    In first instance, I would have followed this way:
    http://www.dvteclipse.com/documentation/svlinter/How_to_use_special_characters_in_XML.3F.h tml
    so, I would have parsed the submitted text in a Validate event and changed any special chars to UTF-8 numeric reference.
    However, I found this:
    http://blog.mark-mclaren.info/2007/02/invalid-xml-characters-when-valid-utf8_5873.html
    which seems to state that not all UTF-8 characters are possible in XML.
    In fact, those allowed are listed here:
    http://www.w3.org/TR/2000/REC-xml-20001006#NT-Char
    so, I would still use a Validate event script but based on the XML specs' Character Range. Exactly as Mark McLaren did in Java.
    This will permit to keep those special chars that are allowed. Your Managers will thank you.
    Hope it helps.

  • Special characters in XML structure when prepared using String

    Hi,
       I am preparing an XML structure using 'String'. I print the prepared XML structure in the server log. Issue is that I am seeing extra characters([[ and ]]) that I am not printing.
    Please let me know how to get rid of those.
    Code Excerpt
            String xmlHeader = "<?xml version=\"1.0\" encoding=\"utf-8\" ?>";
            String lsb_xmlcon = xmlHeader;
            logger.info("ReqXMLString Process  1  --->" + lsb_xmlcon);
            lsb_xmlcon = lsb_xmlcon +("\n");
            logger.info("ReqXMLString Process  1.1  --->" + lsb_xmlcon);
            lsb_xmlcon = lsb_xmlcon +("<REQUEST>");
            lsb_xmlcon = lsb_xmlcon +("\n");
            logger.info("ReqXMLString Process  1.2  --->" + lsb_xmlcon);
    Log
    ReqXMLString Process  1  ---><?xml version="1.0" encoding="utf-8" ?>
    ReqXMLString Process  1.1  ---><?xml version="1.0" encoding="utf-8" ?>[[
    ReqXMLString Process  1.2  ---><?xml version="1.0" encoding="utf-8" ?>[[
    <REQUEST>
    Thanks,
    Message was edited by: 996913
    This issue is observed only while running the code in server, not from Jdev.
    When we append the additional tags without new line character, "\n", there are no extra characters being added. Also, in other case also. where we used "Marshaller" to prepare the XML, we have seen this issue.
    After we set the below property to false, we got rid of the extra characters.
                            jaxbMarshaller.setProperty(Marshaller.JAXB_FORMATTED_OUTPUT, false);
    Apparently the insertion of new line when the code runs on server(Weblogic 10.3.6.0) is creating the issue.
    Please let me know if anyone has come across a similar scenario.
    Thanks,

    I am building this XML in a servlet so ,right, DOM does process XML (even though a valid HTML file can be loaded into a DOM object) but if you build XML using DOM then write the XML out using PrintWriter and Transformer objects this will cause the XML to print out in your browser. If you view source on this XML you will see that the DOM object has translated all special characters to there &xxxx; equivalent. For a example: when the string (I know "cool" java) gets loaded into a attribute using the DOM object then wrote back out it looks like (I know &xxx;cool&xxx; java) if you view the source in your browser. This is what it should do, but the DOM object is not change the � to the "&xxxxx;". This servlet is acting as a gateway between a Java API and a windows asp. The asp will call the servlet expecting to get XML back and load it directly into a DOM object. When the windows DOM object gets the xml that I am returning in this servlet is throws a exception "invalid character" because the � was not translated to &xxxx; like the other characters were. According to the book HTML 4 in 24 hours (and other references) the eacute; or #233; are how you say "�" in HTML or XML. How do you say it?

  • Special characters in xml

    Hello!
    I am trying to generate XML from database by using fnd_file package. Trouble is that application NLS_LANG is not set to UTF8 and it should stay so due to the fact that a lot of custom reports written using Oracle Reports may not display special characters correctly. So i have problem with characters like 'ö','ļ'.... Is there a way to set encoding for only XML output? Also in my last efforts the output XML for some reason is concatenating a strange square shaped character into the end of file. Has anybody seen similar problem and maybe solved it?
    Thanks
    Aleksei

    Refer to
    http://java.sun.com/j2ee/1.4/docs/tutorial/doc/JAXPSAX6.html

  • Special characters in XML built using the DOM object

    I am using the DOM object to build xml but I am having problems with special characters. I have a Element object that I create attributes in. Most special characters like the &, ", or ' are translated for me (amp; quot;, acute; put a & in front of those, if I do it here the browser will translate them into the logical character) but I am having a problem with the � character. The DOM object does not translate this. If I do it my self to(eacute; same here add a & in front) before adding the string to the attribute then the DOM object translates it to (amp;eacute; add a & in front of the amp; not in front of the eacute;). As you can see the DOM object translates the & instead of recognizing that it is a HTML character. Can anyone give me a hand?

    I am building this XML in a servlet so ,right, DOM does process XML (even though a valid HTML file can be loaded into a DOM object) but if you build XML using DOM then write the XML out using PrintWriter and Transformer objects this will cause the XML to print out in your browser. If you view source on this XML you will see that the DOM object has translated all special characters to there &xxxx; equivalent. For a example: when the string (I know "cool" java) gets loaded into a attribute using the DOM object then wrote back out it looks like (I know &xxx;cool&xxx; java) if you view the source in your browser. This is what it should do, but the DOM object is not change the � to the "&xxxxx;". This servlet is acting as a gateway between a Java API and a windows asp. The asp will call the servlet expecting to get XML back and load it directly into a DOM object. When the windows DOM object gets the xml that I am returning in this servlet is throws a exception "invalid character" because the � was not translated to &xxxx; like the other characters were. According to the book HTML 4 in 24 hours (and other references) the eacute; or #233; are how you say "�" in HTML or XML. How do you say it?

Maybe you are looking for