Convert invalid xml characters to HTML-Entity

Hi,
How can I convert invalid XML characters like ä, ü, ö, ... to the HTML entities &auml; &uuml; &ouml;?
Is there any method or class that can take an input string and transform the invalid characters?
Or is there another way to mask these characters so that an XML parser does not throw an error when parsing the document?
Best regards,
Michael
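Note that &auml; and its relatives are HTML entities. XML only predefines &amp;, &lt;, &gt;, &quot; and &apos;, so &auml; would itself be a parse error unless a DTD declares it. Numeric character references such as &#228; work in any XML document. Below is a minimal sketch of such a conversion, assuming the text ends up in element content or an attribute value; the class and method names (XmlEscape, escape, isXmlChar) are made up for illustration. It also drops characters that XML 1.0 forbids outright, since those cannot be written even as &#...; references.

```java
class XmlEscape {

    // XML 1.0 Char production: #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
    static boolean isXmlChar(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
                || (cp >= 0x20 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    // Escapes the five markup characters, writes every non-ASCII character as a
    // numeric character reference (e.g. &#228; for U+00E4) and drops characters
    // that XML 1.0 does not allow at all.
    static String escape(String text) {
        StringBuilder sb = new StringBuilder(text.length());
        text.codePoints().forEach(cp -> {
            if (!isXmlChar(cp)) {
                return; // cannot be written in XML 1.0, not even as a reference
            }
            switch (cp) {
                case '&': sb.append("&amp;"); break;
                case '<': sb.append("&lt;"); break;
                case '>': sb.append("&gt;"); break;
                case '"': sb.append("&quot;"); break;
                case '\'': sb.append("&apos;"); break;
                default:
                    if (cp < 0x80) {
                        sb.append((char) cp);
                    } else {
                        sb.append("&#").append(cp).append(';');
                    }
            }
        });
        return sb.toString();
    }

    public static void main(String[] args) {
        String userText = "Gr\u00f6\u00dfe < 10 & " + (char) 0x0B + "more";
        System.out.println(escape(userText)); // prints: Gr&#246;&#223;e &lt; 10 &amp; more
    }
}
```

Running the user's text through escape() before concatenating it into the XML string means ä, ü and ö arrive as &#228;, &#252; and &#246;, which any conforming parser turns back into the original characters.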

OK, sorry. I'll give you more details about what I want to do and where I'm having problems.
I have the following xml string:
<font family="Times New Roman" size="14" color="#333333">This is a sample Text</font>
The XML string can contain any characters, because the content comes from a text pane where the user can type anything.
I use the DOM parser to parse this input string and get the attributes and the text content.
And that's my problem: how can I make sure that this string won't throw any exceptions when I parse it with DOM?
I parse the string with the following code:
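//imports needed by the class that contains parse() and build()
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.NamedNodeMap;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public XMLElement parse(String sourceString) {
        //create a new xml element
        XMLElement xmlElement = new XMLElement();
        //create a new document builder
        DocumentBuilder builder = build();
        //parse through a Reader rather than sourceString.getBytes(): getBytes() uses the
        //platform default encoding, but without an encoding declaration the parser reads
        //the bytes as UTF-8, so characters such as ä, ü, ö can make it fail
        InputSource is = new InputSource(new StringReader(sourceString));
        Document document = null;
        try {
            document = builder.parse(is);
        } catch (SAXException e) {
            System.out.println("SAXError while parsing the document: " + e.getMessage());
            //no valid document
            return null;
        } catch (IOException e) {
            System.out.println("IO Error while parsing the document: " + e.getMessage());
            //no valid document
            return null;
        }
        //get the element
        org.w3c.dom.Element element = document.getDocumentElement();
        if (element != null) {
            xmlElement.setNodeName(element.getNodeName());
            xmlElement.setNodeValue(element.getTextContent());
            //get the attributes, if defined
            NamedNodeMap attributes = element.getAttributes();
            for (int i = 0; i < attributes.getLength(); i++) {
                xmlElement.addAttribute(
                        attributes.item(i).getNodeName(),
                        attributes.item(i).getTextContent());
            }
        }
        return xmlElement;
    }
XMLElement is my own class.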
The builder:
private DocumentBuilder build() {
        DocumentBuilder docBuilder = null;
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            docBuilder = factory.newDocumentBuilder();
        } catch (ParserConfigurationException pce) {
            System.out.println("Error while creating a DocumentBuilder: " + pce.getMessage());
        }
        //return the document builder
        return docBuilder;
    }
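Another way to avoid the problem is not to build the XML string by hand at all. Below is a sketch, assuming the <font> element shown above, that builds it with the standard DOM API and serializes it with an identity Transformer, so escaping of '<', '&' and quotes is left to the serializer. The class and method names are hypothetical. This does not remove characters that XML forbids entirely (such as 0x0B); those still need filtering first.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class FontXmlBuilder {

    // Builds the <font> element through the DOM API and serializes it.
    // The serializer escapes markup characters in the user's text itself.
    static String buildFontXml(String family, String size, String color, String text) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().newDocument();
            Element font = doc.createElement("font");
            font.setAttribute("family", family);
            font.setAttribute("size", size);
            font.setAttribute("color", color);
            font.setTextContent(text);
            doc.appendChild(font);

            Transformer serializer = TransformerFactory.newInstance().newTransformer();
            serializer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            serializer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (ParserConfigurationException | TransformerException e) {
            // rethrown unchecked only to keep the sketch short
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(buildFontXml("Times New Roman", "14", "#333333", "Fish & Chips <cheap>"));
    }
}
```

The resulting string can be handed to parse() as before; since the serializer produced it, the markup characters in the text are already escaped.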

Similar Messages

  • Invalid XML characters

    When parsing String to XML, I get org.xml.sax.SAXParseException, with the message: An invalid XML character (Unicode: 0xb) was found in the element content of the document.
    What are invalid XML characters? How do I avoid this Exception?
    Thanks.

    Here is what the XML Recommendation says are valid characters for XML:
    Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
    Invalid characters are anything else. It should be obvious that to avoid that exception you should not attempt to parse files that contain any invalid characters.
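    The Char production translates directly into a Java check. Below is a minimal sketch (the class and method names are made up) that also reports where the first offending character sits, which helps track down a message like "Unicode: 0xb". It works on code points, so characters outside the Basic Multilingual Plane are handled and unpaired surrogates are reported as illegal.

```java
class XmlChars {

    // Direct translation of the XML 1.0 Char production.
    static boolean isXmlChar(int codePoint) {
        return codePoint == 0x9 || codePoint == 0xA || codePoint == 0xD
                || (codePoint >= 0x20 && codePoint <= 0xD7FF)
                || (codePoint >= 0xE000 && codePoint <= 0xFFFD)
                || (codePoint >= 0x10000 && codePoint <= 0x10FFFF);
    }

    // Returns the char index of the first code point that XML 1.0 does not
    // allow, or -1 if the whole string is legal.
    static int firstIllegalIndex(String s) {
        int i = 0;
        while (i < s.length()) {
            int cp = s.codePointAt(i);
            if (!isXmlChar(cp)) {
                return i;
            }
            i += Character.charCount(cp);
        }
        return -1;
    }

    public static void main(String[] args) {
        String text = "abc" + (char) 0x0B + "def";
        int idx = firstIllegalIndex(text);
        if (idx >= 0) {
            System.out.printf("Illegal character 0x%X at index %d%n", text.codePointAt(idx), idx);
        }
    }
}
```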

  • How to convert from xml file to html using java code

    How to convert from xml file to html file using java code

    Get yourself Apache Xalan, Saxon, or some other XSLT processor (the JDK also ships an XSLT 1.0 processor behind the standard javax.xml.transform API):
    import java.io.*;
    import javax.xml.transform.*;
    import javax.xml.transform.stream.*;
    String styleSheet = "/YourXSLTStylesheet.xsl";
    String dataSource = "/YourXMLDocument.xml";
    //load both files from the classpath, relative to the TransformMe class
    InputStream stylesheetSource = TransformMe.class.getResourceAsStream(styleSheet);
    InputStream dataSourceStream = TransformMe.class.getResourceAsStream(dataSource);
    //try-with-resources closes the output file once the transformation is done
    try (OutputStream transformedOut = new FileOutputStream("filename.html")) {
        TransformerFactory tFactory = TransformerFactory.newInstance();
        Transformer transformer = tFactory.newTransformer(new StreamSource(stylesheetSource));
        transformer.transform(new StreamSource(dataSourceStream), new StreamResult(transformedOut));
    }
    You'll also need to learn XSLT if you don't already know it. Here's a good place to start:
    http://www.w3schools.com/xsl/
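    If you'd rather skip the extra dependency, the JDK's own JAXP API (javax.xml.transform) can do the transformation. Below is a self-contained sketch that keeps both the document and the stylesheet in strings, so it runs without any files. The sample XML, the stylesheet and the class name are invented for illustration, and TransformerException is wrapped in an unchecked exception only to keep it short.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class XmlToHtml {

    // Invented sample stylesheet: puts /doc/title into an HTML paragraph.
    static final String SAMPLE_XSLT =
            "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
            + "<xsl:output method='html'/>"
            + "<xsl:template match='/'><p><xsl:value-of select='/doc/title'/></p></xsl:template>"
            + "</xsl:stylesheet>";

    // Applies the stylesheet to the document (both given as strings) with the
    // JDK's built-in JAXP transformer and returns the output as a string.
    static String transform(String xml, String xslt) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(xslt)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalArgumentException("XSLT transformation failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(transform("<doc><title>Hello</title></doc>", SAMPLE_XSLT));
    }
}
```

    For real files, build the StreamSource objects from a File or InputStream instead of a StringReader.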

  • Javax.xml.parsers.DocumentBuilder to skip invalid XML characters?

    Hi,
    I convert XML files into flat files. In doing so I call the API DocumentBuilder.parse(File f). If the XML file f contains an invalid XML character, the API throws a SAXException.
    My question is: while I know that certain invalid XML chars are not part of the data and therefore can be safely ignored in the conversion, is there a way to tell the API to skip those chars?

    Nope. Compliant XML parsers are required to parse the XML as it is and not to "repair" it in any way. Your best option is not to have badly-formed XML in the first place, so if you are the one generating the XML, you should fix that process. But if you have bozo customers generating it, and you can't make them do it right, then pre-process the XML to drop the bad characters.
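    Below is a minimal sketch of that pre-processing step, assuming the document has already been read into a String with the correct encoding: it drops every code point outside the XML 1.0 Char production and then parses the cleaned text with DocumentBuilder. The names are hypothetical. It only removes raw characters; a character reference such as &#x1C; in the text is left alone and will still be rejected by the parser.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class StripInvalid {

    // Keeps only code points allowed by the XML 1.0 Char production.
    static String stripInvalidXmlChars(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        s.codePoints()
                .filter(cp -> cp == 0x9 || cp == 0xA || cp == 0xD
                        || (cp >= 0x20 && cp <= 0xD7FF)
                        || (cp >= 0xE000 && cp <= 0xFFFD)
                        || (cp >= 0x10000 && cp <= 0x10FFFF))
                .forEach(sb::appendCodePoint);
        return sb.toString();
    }

    // Cleans the text, then parses it with the standard DOM parser.
    // Checked parser exceptions are rethrown unchecked to keep the sketch short.
    static Document parseLeniently(String xml) {
        try {
            InputSource source = new InputSource(new StringReader(stripInvalidXmlChars(xml)));
            return DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(source);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse document", e);
        }
    }

    public static void main(String[] args) {
        String dirty = "<row>abc" + (char) 0x1C + "def</row>";
        System.out.println(parseLeniently(dirty).getDocumentElement().getTextContent());
    }
}
```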

  • Replace invalid XML characters using SQL query

    Hi,
    I am populating a dataset in .NET with output from a SQL Server 2005 database. One of the columns in the table is of type varchar(max). The dataset is then converted to XML using WriteXml and written to an .xml document, but because of invalid characters in the data, this process errors out.
    Is there any way using which these invalid characters can be replaced at the database level itself when querying on the table?
    The error that is produced is as follows:
    '', hexadecimal value 0x1C, is an invalid character. Line 32201, position 924. 
    Thanks,
    Nisha

    I see,
    So we have a certain character that the XML processor does not like. What do you want to do with this character? Even if you somehow manage to create an XML file with it, you will get the same problem when another application tries to read it.
    You should probably replace those characters before converting the values to XML.
    Another option is to put the values in CDATA sections. This is harder, because the query gets a little tricky. Here is an example that might help you.
    CREATE TABLE CDataTest (SomeValue NVARCHAR(50))
    INSERT INTO CDataTest (SomeValue) SELECT 'Some Value ' + CHAR(25) + 'Some OtherValue'
    SELECT * FROM CDataTest FOR XML AUTO, TYPE
    -- error:
    -- FOR XML could not serialize the data for node 'SomeValue' because it contains
    -- a character (0x0019) which is not allowed in XML. To retrieve this data using
    -- FOR XML, convert it to binary, varbinary or image data type and use the
    -- BINARY BASE64 directive.
    -- option using CDATA
    SELECT
        1 AS Tag,
        NULL AS Parent,
        (SELECT SomeValue AS 'data()'
         FROM CDataTest
         FOR XML PATH('')) AS 'SomeValue!1!SomeValue!cdata'
    FROM CDataTest
    FOR XML EXPLICIT, TYPE
    -- result:
    <SomeValue>
    <SomeValue><![CDATA[Some Value &#x19;Some OtherValue]]></SomeValue>
    </SomeValue>

  • HOW TO CONVERT A XML FILE TO HTML FILE FORMAT IN WINDOWS APPLICATION

    Hi, I am a fresher working on a project in which I have to convert the data in an XML file to an HTML file. I don't have any idea how to do this; can anyone help me convert the XML file to HTML format? So far I have written the code to read the XML file, but now I am stuck on how to write the code that converts it to HTML.
    Thanks and Regards,
    Dileep.

    Hello,
    For converting an XML file to HTML, you could refer to the approach shared in the following article, which uses an XSLT stylesheet and the XslTransform class to transform the XML into another format:
    http://www.codeproject.com/Articles/12047/How-to-Convert-XML-Files-to-HTML
    Regards.
    Carl

  • Converting XML document to HTMl using xsl

    Hi,
    I'm trying to convert an XML document into an HTML page using XSL, but when I open the page in the browser nothing shows up.
    I'm not sure if I am using the PrintWriter correctly.
    StreamResult result = new StreamResult(new PrintWriter(new (File"text.html")));
    Please help.

    Oops! I wrote the parenthesis wrong in the previous mail
    This is the correct one I use in my program.
    StreamResult result = new StreamResult(new PrintWriter(new File("text.html")));
    Please help. It's urgent.

  • HTML Entity Escape Character Conversion

    The requirement is to convert UTF-8 encoded special-language characters to HTML entity escapes. For example, the source has a Description field with the value "Caractéristiques" ('Characteristics' in French), which needs to be converted to "Caract&eacute;ristiques" when it is sent to the receiver, i.e. special-language symbols like é become &eacute; (HTML entity format).
    Below is the link to a list of HTML entity characters:
    http://www.theukwebdesigncompany.com/articles/article.php?article=11
    Could anybody please suggest how this can be achieved in the mapping? Any UDF or encoding techniques?
    Many thanks.

    Hi Veera,
    this is Ajay. Here is code for your problem:
    String ToHTMLEntity(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        boolean lastWasBlankChar = false;
        int len = s.length();
        char c;
        for (int i = 0; i < len; i++) {
            c = s.charAt(i);
            if (c == ' ') {
                // every second space of a run becomes &nbsp; so browsers don't collapse them
                if (lastWasBlankChar) {
                    lastWasBlankChar = false;
                    sb.append("&nbsp;");
                } else {
                    lastWasBlankChar = true;
                    sb.append(' ');
                }
            } else {
                lastWasBlankChar = false;
                // HTML special chars
                if (c == '"')
                    sb.append("&quot;");
                else if (c == '&')
                    sb.append("&amp;");
                else if (c == '<')
                    sb.append("&lt;");
                else if (c == '>')
                    sb.append("&gt;");
                else if (c == '\n')
                    // handle newline
                    sb.append("<br/>");
                else {
                    int ci = c;
                    if (ci < 160)
                        sb.append(c);
                    else {
                        // numeric character reference, e.g. &#233; for é
                        sb.append("&#").append(ci).append(';');
                    }
                }
            }
        }
        return sb.toString();
    }

  • How to filter invalid XML character

    I need to build an XML document in my program, and it contains some invalid characters like 0x8.
    How can I remove the invalid XML characters?
    Is there any existing tool (function) I can use to check whether a character is a valid or an invalid XML character?
    Thank you.
    waterii

    You can set the dataProvider's filterFunction to filter the data:
    http://help.adobe.com/en_US/FlashPlatform/reference/actionscript/3/mx/collections/ListCollectionView.html#filterFunction

  • Invalid XML character in web service answer of MS Exchange

    Hello Forum!
    We have to look up contacts in the global address list of a Microsoft Exchange server.
    The current solution uses the web services that have been introduced in version 2007 of MS Exchange.
    Unfortunately, some records returned by the MS Exchange server cause a javax.xml.stream.XMLStreamException. The exception tells us that a parser error occurred; it says:
    Message: Character reference "&#x7" is an invalid XML character.
    The Java classes used for accessing the Exchange web services are generated using the jaxws plugin and the application
    is running on the Glassfish application server v2 ur1.
    The only solution we can think of right now is to access the XML stream returned by the Exchange server before it is handed over
    to the parser in order to replace the invalid characters.
    Can anyone point me to some documentation or give me an example of how to intercept the XML parsing process used by the jaxws
    component?
    Any other ideas for a solution are of course also appreciated.
    Thanks for your help in advance,
    Henning Malzahn

    hm@collogia wrote:
    > In addition to that MS is not very responsive when it comes to Java questions.
    Yes, but "Your software is producing malformed XML" is not a Java question.
    > I can imagine that filtering the stream isn't very easy - are you able to provide some links to additional information that can help us get started in that direction?
    A subclass of FilterInputStream whose read() method calls the superclass's read() method again, skipping the byte, whenever the input is one of the invalid XML characters (0x00 to 0x1F, apart from tab, line feed and carriage return)?
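    Since the Exchange error is about a character reference (the text &#x7;) rather than a raw control byte, a filter that only skips bytes below 0x20 would not catch it. Below is a sketch of the rewriting step, assuming the response is available as a String before it reaches the parser: it deletes decimal and hexadecimal references whose target is not a legal XML 1.0 character and leaves everything else untouched. The class name is hypothetical, and wiring this into the JAX-WS client is not shown.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CharRefCleaner {

    // Decimal (&#7;) or hexadecimal (&#x7;) numeric character references.
    private static final Pattern CHAR_REF =
            Pattern.compile("&#(?:x([0-9a-fA-F]+)|([0-9]+));");

    // XML 1.0 Char production.
    static boolean isXmlChar(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
                || (cp >= 0x20 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    // Deletes references that point at characters XML 1.0 forbids;
    // legal references and all other text are copied unchanged.
    static String removeIllegalCharRefs(String xml) {
        Matcher m = CHAR_REF.matcher(xml);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            int cp;
            try {
                cp = (m.group(1) != null)
                        ? Integer.parseInt(m.group(1), 16)
                        : Integer.parseInt(m.group(2));
            } catch (NumberFormatException e) {
                cp = -1; // too large for an int, so certainly not a valid code point
            }
            String replacement = isXmlChar(cp) ? m.group() : "";
            m.appendReplacement(sb, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(removeIllegalCharRefs("<name>Bell&#x7; &#228;</name>"));
    }
}
```

    It works on the raw text, so it would also touch a literal &#x7; inside a CDATA section or a comment; for a web service response that is usually an acceptable trade-off.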

  • How to generate the XML file to HTML?

    Hi all,
    I am new to XML.
    Can I somehow see the HTML output of the XML file, given that I also have the XSL file, without using an XML editor (XMLSpy) or FOP? I don't want to use any additional tools, only the database tools.
    What do I need for this?
    Do I need the XSLT file too?
    ============================
    My test.xml file:
    <?xml version="1.0" encoding="UTF-8"?>
    <?xml-stylesheet type="text/xsl" href="test.xsl"?>
    <ROW num="1">
         <TABLE_NAME>ABR_ART_ABR_DAU_MATRIX</TABLE_NAME>
         <TABLESPACE_NAME>PS2000_STAMM</TABLESPACE_NAME>
         <PCT_FREE>10</PCT_FREE>
         <INI_TRANS>1</INI_TRANS>
         <MAX_TRANS>255</MAX_TRANS>
         <INITIAL_EXTENT>516096</INITIAL_EXTENT>
         <NEXT_EXTENT>65536</NEXT_EXTENT>
         <MIN_EXTENTS>1</MIN_EXTENTS>
         <MAX_EXTENTS>2147483645</MAX_EXTENTS>
         <PCT_INCREASE>0</PCT_INCREASE>
         <LOGGING>YES</LOGGING>
         <BACKED_UP>N</BACKED_UP>
         <NUM_ROWS>33</NUM_ROWS>
         <BLOCKS>20</BLOCKS>
         <EMPTY_BLOCKS>0</EMPTY_BLOCKS>
         <AVG_SPACE>0</AVG_SPACE>
         <CHAIN_CNT>0</CHAIN_CNT>
         <AVG_ROW_LEN>100</AVG_ROW_LEN>
         <AVG_SPACE_FREELIST_BLOCKS>0</AVG_SPACE_FREELIST_BLOCKS>
         <NUM_FREELIST_BLOCKS>0</NUM_FREELIST_BLOCKS>
         <DEGREE> 1</DEGREE>
         <INSTANCES> 1</INSTANCES>
         <CACHE> N</CACHE>
         <TABLE_LOCK>ENABLED</TABLE_LOCK>
         <SAMPLE_SIZE>33</SAMPLE_SIZE>
         <LAST_ANALYZED>1/8/2004 11:45:1</LAST_ANALYZED>
         <PARTITIONED>NO</PARTITIONED>
         <TEMPORARY>N</TEMPORARY>
         <SECONDARY>N</SECONDARY>
         <NESTED>NO</NESTED>
         <BUFFER_POOL>DEFAULT</BUFFER_POOL>
         <ROW_MOVEMENT>DISABLED</ROW_MOVEMENT>
         <GLOBAL_STATS>YES</GLOBAL_STATS>
         <USER_STATS>NO</USER_STATS>
         <SKIP_CORRUPT>DISABLED</SKIP_CORRUPT>
         <MONITORING>NO</MONITORING>
         <DEPENDENCIES>DISABLED</DEPENDENCIES>
         <COMPRESSION>DISABLED</COMPRESSION>
    </ROW>
    ============================
    ============================
    My test.xsl-file:
    <?xml version="1.0" encoding="UTF-8"?>
    <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:fo="http://www.w3.org/1999/XSL/Format">
    <xsl:output method="html"/>
    <xsl:template match="/">
      <html>
        <head>
          <title>Test XSL ALL_TABLES</title>
        </head>
        <body>
          <xsl:for-each select="ROW">
            <h2>Tabelle: <xsl:value-of select="TABLE_NAME"/></h2>
            <hr/>
            <table border="1" cellpadding="0">
              <tr>
                <td><xsl:value-of select="TABLESPACE_NAME"/></td>
                <td><xsl:value-of select="PCT_FREE"/></td>
              </tr>
            </table>
          </xsl:for-each>
        </body>
      </html>
    </xsl:template>
    </xsl:stylesheet>
    ============================
    I look forward to your answers, with examples if possible.
    Regards
    Leonid Pavlov

    XSLT to convert the XML document to Html:
    <?xml version="1.0" encoding="UTF-8"?>
    <xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output method="html"/>
    <xsl:template match="/">
    <html>
    <head>
    <title>Test XSL ALL_TABLES</title>
    </head>
    <body>
    <table border="1" cellspacing="0">
    <tr>
    <th>TABLE NAME</th>
    <th>TABLESPACE NAME</th>
    <th>PCT FREE</th>
    </tr>
    <xsl:for-each select="ROW">
    <tr>
    <td><xsl:value-of select="TABLE_NAME"/></td>
    <td><xsl:value-of select="TABLESPACE_NAME"/></td>
    <td><xsl:value-of select="PCT_FREE"/></td>
    </tr>
    </xsl:for-each>
    </table>
    </body>
    </html>
    </xsl:template>
    </xsl:stylesheet>

  • Invalid XML characters & Cryptography

    HI,
    I am using JSR 172 to communicate between my mobile phone and a remote server. I encrypt my data and then send it over the internet through a GPRS connection in the form of XML, but because of the encryption, invalid XML characters also appear in the XML document. Can anybody tell me how I can remove the illegal XML characters from the XML document? I would be deeply thankful to anyone who can guide me on this.
    Wasif Ehsan.

    Sounds like you need to employ an encryption algorithm that produces encrypted data adhering to XML's required character set:
    http://www.w3.org/TR/2000/REC-xml-20001006#NT-Char
    Any 'illegal' characters put into your XML file will most likely throw an exception while being marshalled via your web service calls.
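    The usual way to get arbitrary bytes such as ciphertext into XML is to Base64-encode them: every Base64 output character (A-Z, a-z, 0-9, '+', '/', '=') is a legal XML character. Below is a sketch with hypothetical names. Note that java.util.Base64 needs Java 8 or later and is not part of the J2ME/JSR 172 environment, so on the device you would need a small Base64 routine of your own or a library.

```java
import java.util.Base64;

class CipherTextInXml {

    // Wraps arbitrary bytes (e.g. ciphertext) as element text. Base64 output
    // consists only of A-Z, a-z, 0-9, '+', '/' and '=', all legal in XML.
    static String toXmlElement(String name, byte[] data) {
        return "<" + name + ">" + Base64.getEncoder().encodeToString(data) + "</" + name + ">";
    }

    // Turns the element text back into the original bytes.
    static byte[] fromElementText(String text) {
        return Base64.getDecoder().decode(text.trim());
    }

    public static void main(String[] args) {
        byte[] cipherText = {0x00, 0x07, 0x1B, (byte) 0xFF};
        System.out.println(toXmlElement("payload", cipherText)); // <payload>AAcb/w==</payload>
    }
}
```

    The receiver parses the document normally and calls the decoder on the element text before decrypting.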

  • Invalid XML character in castor XML

    I am using castor API for converting an object into XML. When I marshal the object, following exception occur:
    java.io.IOException: The character '' is an invalid XML character
         at org.apache.xml.serialize.BaseMarkupSerializer.characters(Unknown Source)
         at org.exolab.castor.xml.Marshaller.marshal(Unknown Source)
         at org.exolab.castor.xml.Marshaller.marshal(Unknown Source)
         at org.exolab.castor.xml.Marshaller.marshal(Unknown Source)
    Following is the code snippet which I am using:
    StringWriter writer = new StringWriter(500);
    Marshaller marshal = new Marshaller(writer);
    marshal.setEncoding("windows-1251"); //I have tried all these encodings as well: UTF-8, UTF-7, ASCII, ISO-8859-1, ISO-8859-5, windows-1251
    //marshal.marshal(token, writer);     // This is commented, since the encoding is not applied if I use this method, next statement works fine
    marshal.marshal(token);
    Here, token is the object which I am trying to marshal. I have tried different encodings, but the problem is not resolved. Could anyone help?
    Castor reference:
    http://www.castor.org/xml-framework.html

    Do you want this encoding to be reversible? Take, for example, the character \u001b in your string. You have to represent it by something different in your XML. If you want to get the same thing back when you convert your XML back into Java, you can't just translate that character into an existing character, because then you have lost information. You have to translate it into some special series of codes. Then, when you convert the XML back, you have to recognize that special series of codes and convert it back into the \u001b character.
    So yeah, you could write your own custom encoding which did that. I'm not aware of any existing software that does that; it wouldn't be very useful, because it would result in XML documents which used non-standard encodings and hence couldn't be sent to anybody else.
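    Below is a sketch of the kind of custom, reversible encoding described above. The scheme is made up for illustration: a backslash is doubled, and every character XML 1.0 forbids is written as a backslash, a 'u' and four hex digits. Because the escape character itself is escaped, decode() restores the original string exactly. The receiver has to know these rules, so this only makes sense between parties that agree on them.

```java
class ReversibleEscape {

    // XML 1.0 Char production.
    static boolean isXmlChar(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
                || (cp >= 0x20 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    // A backslash becomes two backslashes; each forbidden code point becomes
    // backslash + 'u' + four hex digits (all forbidden code points are below 0x10000).
    static String encode(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        s.codePoints().forEach(cp -> {
            if (cp == '\\') {
                sb.append("\\\\");
            } else if (isXmlChar(cp)) {
                sb.appendCodePoint(cp);
            } else {
                sb.append(String.format("\\u%04X", cp));
            }
        });
        return sb.toString();
    }

    // Inverse of encode(): undoes the two escape forms produced above.
    static String decode(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        int i = 0;
        while (i < s.length()) {
            char c = s.charAt(i);
            if (c == '\\' && i + 1 < s.length() && s.charAt(i + 1) == '\\') {
                sb.append('\\');
                i += 2;
            } else if (c == '\\' && i + 5 < s.length() && s.charAt(i + 1) == 'u') {
                sb.append((char) Integer.parseInt(s.substring(i + 2, i + 6), 16));
                i += 6;
            } else {
                sb.append(c);
                i++;
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String original = "ESC" + (char) 0x1B + " and a backslash \\";
        String encoded = encode(original);
        System.out.println(encoded);
        System.out.println(decode(encoded).equals(original));
    }
}
```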

  • Getting invalid xml character while marshalling

    Hi
    I have a text which contains all kinds of characters, including some special characters. I am replacing the characters &, >, <, \ and " with their HTML codes. I am building the XML file and trying to marshal it, but I am getting "The character '' is an invalid XML character". I am using castor-0.9.6.jar. Can anyone tell me an easy way to handle special characters like this one, rather than replacing each and every character?
    Please let me know why I am getting the above error. (Is the special character an end-of-file character? I am actually reading from a string, not from a file.)
    Thanks & Regards,
    Prasanth

    As a guess, it's because you are treating CDATA as if it meant 'binary', which it doesn't. The characters in CDATA must still be valid XML characters.
    If you want binary data then base64 encode it and put that in the document - and you won't need CDATA at all then, it will just be regular element text.

  • Invalid XML character

    Hi,
    Could anyone please tell me how to parse the invalid characters below &#32;? The encoding I am using is UTF-8, and it's not possible for me to change it to ISO-8859-1.
    The exception i am getting is :
    org.apache.xml.utils.WrappedRuntimeException: Can not load requested doc: Character reference "&#2" is an invalid XML character.
    Scenario
    My application generates a report by picking data from the database and displays it in HTML, XML or CSV format.
    But while generating the report, Xerces is unable to parse the data, which results in the above error.
    I'd appreciate it if anyone could help me.
    Regards,
    Viswanath.

    If you put an illegal character into an XML document it will not be parseable. So, it's your choice. Put the illegal character in there and have an unusable document. Or don't put it in there and have a usable document. I know which I would prefer but it's up to you.
    If you're asking how you should represent illegal characters, should you decide not to put them into the document as is, that's up to you too. You can't just put in the standard XML encoding for the illegal character, either, because that's the same as putting in the character.
    I already mentioned one possibility. If you don't like that and want to do something else, that's fine too. Just remember that you have to represent all of your data as a sequence of legal XML characters. And the receiver of the document needs to know the rules so that it can decode the data back to the binary form.
