SAX parser splits up character data; I expected ignorableWhitespace

I am working on an XML parser for loading data from some back-end
systems into an Oracle 8i database. I am using the SAX parser for
this purpose. After doing some tests with larger amounts of XML data
(> 1 MB), I found some unexpected behaviour. The parser sometimes
splits character data into two chunks. The XML looks as follows:
<TAGNAME> this is the character data </TAGNAME>
The parser raises the following events:
1 startElement name = "TAGNAME"
2 characters chbuf = " "
3 characters chbuf = "this is the character data "
4 endElement name = "TAGNAME"
I expected an ignorableWhitespace event at step 2. The XML
document contains repetitive tag names. The strange thing about
the parse process is that the parser splits up the character
data only sometimes, and I can't find any logic behind this
behaviour. Most occurrences of exactly the same tag name and
character data are parsed as I expected.
Am I dealing with correct behaviour here, or is it a bug?
Rolf.

Oracle XML Team wrote:
: Rolf van Deursen (guest) wrote:
: The behavior is expected and correct. Could you elaborate on why
: you would expect the parser to treat the whitespace signalled in
: step 2 as ignorable?
: Oracle XML Team
: http://technet.oracle.com
: Oracle Technology Network
Thank you for your quick response.
In my test XML, there are about 27,500 tags containing character
data. All character data starts with a whitespace character.
After parsing the XML, the whitespace of only 5 (!) tags is
reported as a separate characters event (so two characters events
are raised in succession). The remaining tags all raise only ONE
characters event for the entire character data. I can't explain
the difference in treatment.
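The team's answer can be illustrated with a small experiment. This sketch uses the JDK's built-in SAX parser rather than Oracle's XDK parser, and the DTD in it is hypothetical, written only for the demo. Two rules from the SAX `ContentHandler` contract are at work: `characters()` may deliver one run of text in several chunks (where the split falls is up to the parser's internals, so it can hit 5 elements out of 27,500 with no visible pattern), and `ignorableWhitespace()` is meant for whitespace in element-only content as declared by a DTD. A space inside an element whose content is text is ordinary character data, so it is never ignorable.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class WhitespaceDemo {
    // Hypothetical DTD: root has element-only content, TAGNAME holds text (#PCDATA).
    public static final String XML =
        "<!DOCTYPE root [<!ELEMENT root (TAGNAME)*><!ELEMENT TAGNAME (#PCDATA)>]>\n"
        + "<root>\n<TAGNAME> this is the character data </TAGNAME>\n</root>";

    /** Logs each event as C[...] (characters) or W[...] (ignorableWhitespace). */
    public static List<String> events(String xml, boolean validating) throws Exception {
        SAXParserFactory spf = SAXParserFactory.newInstance();
        spf.setValidating(validating);
        List<String> log = new ArrayList<>();
        spf.newSAXParser().parse(new InputSource(new StringReader(xml)), new DefaultHandler() {
            @Override
            public void characters(char[] ch, int start, int length) {
                log.add("C[" + new String(ch, start, length) + "]");
            }

            @Override
            public void ignorableWhitespace(char[] ch, int start, int length) {
                log.add("W[" + new String(ch, start, length) + "]");
            }
        });
        return log;
    }

    public static void main(String[] args) throws Exception {
        // The leading space inside TAGNAME always arrives via characters(), possibly
        // in more than one chunk; only the newlines directly inside root are ignorable.
        System.out.println(events(XML, true));
    }
}
```

With validation switched on, the newlines directly inside `root` show up as `W[...]` entries, while the space before "this" always arrives through `C[...]` entries, in one chunk or several. On the application side, the robust approach is to append every chunk to a buffer and use the text in `endElement()`.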

Similar Messages

  • SAX Parser and special character problem

    Hi,
    Could anyone help with the following problem?
    Background:
    1. Using a SAX Parser (Oracle Implementation) to read XML from a CLOB, convert this XML to another format to be inserted in the database.
    2. Due to performance issues we parse the input stream from the CLOB.
    3. For the same reason, we are not using XSL.
    This is the problem we face:
    1. Values of some of the tags in the XML contain special characters (e.g. &, <, >).
    2. The SAX parser calls our element handler function with the value of these tags broken up, as if each of these characters were a tag delimiter.
    Ex: <Description>SomeText_A & SomeText_B</Description>
    is treated as SomeText_A in first call to the handler; SomeText_B in the second call.
    The handler function does not get to see the "&" character.
    Thus, the final conversion is
    Say, <Description> is to be converted to <FreeText>
    we, get <FreeText>SomeText_A</FreeText>
    <FreeText>SomeText_B</FreeText>
    We tried using &amp; but it then breaks it up into SomeText_A, & and SomeText_B.
    How can we get the whole value of the <Description> tag in the characters() function of the SAXParser, so that we can convert it to <FreeText>SomeText_A & SomeText_B</FreeText>?
    Thanks in advance..
    Chirdeep.

    We already tried that. It looks like the forum converted the entity reference in the line where I mentioned it into a plain ampersand:
    "We tried using <entity reference for &> but it then breaks it up into SomeText_A, & and SomeText_B."
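    A minimal sketch, assuming the goal is to stream the document back out with `Description` renamed to `FreeText` (attributes are ignored, and the class name is hypothetical). It also assumes the input is well-formed, i.e. the CLOB contains `&amp;`. The parser does pass the `&` to `characters()`, just as a separate chunk; if each chunk is appended rather than assigned, the split stops mattering. The converted output has to escape `&` again to be well-formed XML, so the result contains `&amp;`:

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

/** Copies a document, renaming Description to FreeText (hypothetical, attribute-free sketch). */
public class RenameDescription extends DefaultHandler {
    private final StringBuilder out = new StringBuilder();

    private static String rename(String qName) {
        return qName.equals("Description") ? "FreeText" : qName;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        out.append('<').append(rename(qName)).append('>'); // attributes are dropped in this sketch
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        out.append("</").append(rename(qName)).append('>');
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // Every chunk is appended, never assigned, so it does not matter how many
        // characters() calls the parser uses for "SomeText_A & SomeText_B".
        for (int i = start; i < start + length; i++) {
            char c = ch[i];
            switch (c) {
                case '&': out.append("&amp;"); break; // re-escape so the output stays well-formed
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                default: out.append(c);
            }
        }
    }

    public static String convert(String xml) throws Exception {
        RenameDescription h = new RenameDescription();
        SAXParserFactory.newInstance().newSAXParser().parse(new InputSource(new StringReader(xml)), h);
        return h.out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(convert("<Root><Description>SomeText_A &amp; SomeText_B</Description></Root>"));
        // prints <Root><FreeText>SomeText_A &amp; SomeText_B</FreeText></Root>
    }
}
```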

  • OBIEE Writeback error Sax parser Expected entity name with ampersand gt/lt

    Hi,
    I've successfully enabled writeback for an OBIEE report (OBIEE 10.1.3.3.2 on Suse Linux 9.x); however, if any of the fields contain XML special characters such as ampersand, less-than or greater-than symbols, I get this error when I save:
    An error occurred while writing to the server. Please check to make sure you have entered appropriate values. If the problem persists, contact your system administrator.
    Sax parser returned an exception. Message: Expected entity name for reference, Entity publicId: , Entity systemId: , Line number: 1, Column number: 795
    Error Details
    Error Codes: UH6MBRBC:E6MUPJPH
    Xml parsed: <writeBack template="entry"><record action="update"><value columnID="c10">C</value><value columnID="c2">Jun-08</value><value columnID="c5">0001</value><value columnID="c1">NET RESULT</value><value columnID="c7">T000</value><value columnID="c3">Total X & X</value>...
    I don't really want to have to use the replace function on all the fields in the report to remove special characters, and train users not to enter them in editable fields.
    Anyone got any ideas how to get around this?
    Thanks,
    Gareth

    Hi
    I am getting this error in writeback while submitting
    An error occurred while writing to the server. Please check to make sure you have entered appropriate values. If the problem persists, contact your system administrator.
    Sax parser returned an exception. Message: Unterminated entity reference, 'M', Entity publicId: , Entity systemId: , Line number: 1, Column number: 85
    Error Details
    Error Codes: UH6MBRBC:E6MUPJPH
    Xml parsed: <writeBack template="CPE_writeback"><record action="update"><value columnID="c0">H&M SWEDEN</value><value columnID="c1">7/5/2010</value><value columnID="c2">8/26/2010</value><value columnID="c11">Administrator</value><value columnID="c7">BOOKED</value><value columnID="c10"> y</value><value columnID="c9">H&M SWEDEN ;7/5/2010 ;8/26/2010 ;BOOKED</value></record></writeBack>
    I think the problem is with '&', or it might be something else: if I choose other options from the dropdowns I don't get any error, only for this customer "H&M SWEDEN".
    Can anyone please tell me what workaround I can use? The OBI version is 10.1.3.2.
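    Both messages come from raw ampersands in the generated XML. In `Total X & X` the `&` is followed by a space, hence "Expected entity name"; in `H&M SWEDEN` the parser reads `M` as the start of an entity name and never finds the closing `;`, hence "Unterminated entity reference, 'M'". OBIEE builds this payload itself, so the following is not a patch for OBIEE. It only shows, for anyone generating a similar payload in their own code, how an XML writer API escapes values automatically. It uses the JDK's StAX `XMLStreamWriter`; the element and attribute names are copied from the error messages above, and the class name is hypothetical:

```java
import java.io.StringWriter;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamWriter;

public class WriteBackXml {
    /** Builds a writeBack-style payload; values[i][0] is the columnID, values[i][1] the cell text. */
    public static String build(String template, String[][] values) throws Exception {
        StringWriter sw = new StringWriter();
        XMLStreamWriter w = XMLOutputFactory.newInstance().createXMLStreamWriter(sw);
        w.writeStartElement("writeBack");
        w.writeAttribute("template", template);
        w.writeStartElement("record");
        w.writeAttribute("action", "update");
        for (String[] v : values) {
            w.writeStartElement("value");
            w.writeAttribute("columnID", v[0]);
            w.writeCharacters(v[1]); // the writer escapes markup characters such as & and <
            w.writeEndElement();
        }
        w.writeEndElement(); // record
        w.writeEndElement(); // writeBack
        w.flush();
        w.close();
        return sw.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(build("CPE_writeback",
                new String[][] {{"c0", "H&M SWEDEN"}, {"c3", "Total X & X"}}));
    }
}
```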

  • Sax parser returned an exception. Message: Invalid character (Unicode: 0x12)

    Hi,
    I'm getting the error 'Sax parser returned an exception. Message: Invalid character (Unicode: 0x12), Entity publicId: , Entity systemId: , Line number: 47, Column number: 75'
    when I try and run a report in BI Answers 10g.
    Apparently it's to do with a Java applet (or piece of Java code anyway) not being sent to an exception so it can't handle it. The problem is that I can't run the report to get at some of the views that seem only available at run time, e.g. charts.
    Does anyone know how I can see behind the scenes of views without running the report?
    Or does anyone know how to get rid of this error anyway?
    Many thanks,
    - Jenny

    Hi Satya,
    I am facing the same issue, but only in the Mozilla Firefox browser. If I modify the same reports in Internet Explorer I don't get this error. It seems this issue occurs only in OBIEE 10g, and Oracle has provided some patches for it in OBIEE 11g.
    You posted this question a long time ago. If you found a solution, please let me know too.
    Thanks
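    For reference, the message itself means the XML contained U+0012, a control character that XML 1.0 does not allow anywhere in a document, not even as a character reference such as `&#x12;`. If that character comes from the data behind the report (rather than from the browser issue described in the reply), one option is to filter such characters out before the XML is built. A minimal sketch of that filter, based on the `Char` production of the XML 1.0 specification; the class and method names are hypothetical:

```java
public class XmlCharFilter {
    /** True if the code point is allowed by the Char production of XML 1.0. */
    public static boolean isXml10Char(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
            || (cp >= 0x20 && cp <= 0xD7FF)
            || (cp >= 0xE000 && cp <= 0xFFFD)
            || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    /** Drops every character XML 1.0 forbids, e.g. U+0012 or an unpaired surrogate. */
    public static String strip(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        s.codePoints().filter(XmlCharFilter::isXml10Char).forEach(sb::appendCodePoint);
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(strip("Sales\u0012Report")); // prints SalesReport
    }
}
```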

  • SAX Parser - Decoding request data

    I have implemented the SAX Parser on a web application. To do this I create a BufferedReader from the request and then call the SAX parse() method.
    BufferedReader reader = request.getReader();
    InputSource inputSource = new InputSource( reader );
    xmlReader.parse( inputSource );
    The problem is that when the XML data is posted to our web application the data from the request is encoded, and thus I need to decode it before calling parse().
    Has anyone encountered this problem? If so, what was their solution?

    I'm not sure, but are you posting the XML data as a form field? In that case you need to fetch the string data for the given parameter from the request object using getParameter() and feed that to the XML parser. What you are fetching now is the raw HTTP request data.
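    To illustrate the reply: if the client posts an HTML form (`application/x-www-form-urlencoded`), the body is URL-encoded, and in a servlet `request.getParameter(...)` undoes that for you. The sketch below does the same decoding by hand with `java.net.URLDecoder` (the `Charset` overload needs Java 10 or later), only to show what happens; the field name `xml`, the UTF-8 charset, and the class name are assumptions:

```java
import java.io.StringReader;
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class FormBodyXml {
    /** Returns the decoded value of one field of a form-urlencoded body, or null if absent. */
    public static String field(String body, String name) {
        for (String pair : body.split("&")) {
            int eq = pair.indexOf('=');
            String key = eq < 0 ? pair : pair.substring(0, eq);
            if (URLDecoder.decode(key, StandardCharsets.UTF_8).equals(name)) {
                return eq < 0 ? "" : URLDecoder.decode(pair.substring(eq + 1), StandardCharsets.UTF_8);
            }
        }
        return null;
    }

    public static void main(String[] args) throws Exception {
        String body = "xml=%3Croot%3Ehello%3C%2Froot%3E"; // hypothetical request body
        String xml = field(body, "xml");
        // The decoded text is plain XML again and can be handed to the SAX parser.
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), new DefaultHandler());
        System.out.println(xml); // prints <root>hello</root>
    }
}
```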

  • Why can't the SAX parser support a special character like "¡"?

    I do not understand why the SAX parser cannot handle a special character like &iexcl;. It does replace &quot; &amp; &lt; &gt; with ", &, <, >, but other characters are replaced with an empty character.
    Can somebody give me any suggestions or solutions? Thanks.
    Edited by: 844086 on 2011-3-14 2:27 AM

    I quote:
    "Alternatively implement an EntityResolver that resolves the desired escapes."
    You are again an example that people only read/register the first thing written in a post.
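    Background: XML predefines only five entities (`amp`, `lt`, `gt`, `quot`, `apos`); `&iexcl;` is an HTML entity, so the parser only knows it if a DTD declares it. The EntityResolver approach quoted above fits when the declarations live in an external DTD file. A smaller sketch, assuming you are able to add an internal DTD subset to the document, declares the entity inline (class and element names are hypothetical):

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class LatinEntities {
    /** Returns all character data of the document, with chunks joined. */
    public static String text(String xml) throws Exception {
        StringBuilder sb = new StringBuilder();
        SAXParserFactory.newInstance().newSAXParser().parse(new InputSource(new StringReader(xml)), new DefaultHandler() {
            @Override
            public void characters(char[] ch, int start, int length) {
                sb.append(ch, start, length);
            }
        });
        return sb.toString();
    }

    public static void main(String[] args) throws Exception {
        // The internal subset declares iexcl as U+00A1, so &iexcl; expands normally.
        String xml = "<!DOCTYPE msg [<!ENTITY iexcl \"&#161;\">]><msg>&iexcl;Hola!</msg>";
        System.out.println(text(xml)); // prints ¡Hola!
    }
}
```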

  • Using the SAX parser to split up a document to be processed by a DOMParser

    I need to process a potentially large document which could prove too large to be read into memory at once by the DOM parser. The document would look something like..
    (sorry about the formatting - can't use tabs!)
    <Root>
      <Parent>
        <Child>
          <Blah.....................
          </Blah>
        </Child>
      </Parent>
      <Parent>
        <Child.......
        </Child>
      </Parent>
      etc...
      etc...
    </Root>
    The number of Parent elements could potentially be in the thousands. What I would like to do would be to use the SAX parser to read the document and when a Parent element is encountered, write the parent element and all its children to a stream in order that an input source can be created. The input source can then be parsed by a DOM parser. Once complete, the next Parent element encountered by the SAX parser could be passed to the DOM parser and so on until all of the Parent elements have been processed.
    This way I can combine the ease of parsing the document using the DOM parser without having to worry about the overhead on memory.
    Does anyone have any ideas as to the best approach? I could use the SAX parser for the whole thing but the XML is quite complex and lends itself to DOM parsing much more conveniently.

    Can you read the file line by line:
    start the reading at <child>
    pause reading at </child>
    copy the string to a var
    wrap it with respective tags (<?xml<doctype etc...) and parse
    proceed to the next <child>xx</child>
    and repeat till you hit the closing of the file...
    - Ravi
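    Reading line by line depends on the exact layout of the file. An alternative that does what the original question describes, without loading the whole document, is to let the SAX handler build a small DOM for each `Parent` element itself and hand it over as soon as the element closes. A minimal sketch; the class name, the callback, and the assumption that attributes need no namespace handling are illustrative choices, not from the thread:

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.function.Consumer;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

/** Builds one small DOM per split element (e.g. each Parent) and passes it to a callback. */
public class ParentSplitter extends DefaultHandler {
    private final String splitName;
    private final Consumer<Document> onElement;
    private final DocumentBuilder builder;
    private final Deque<Node> stack = new ArrayDeque<>(); // open nodes of the current fragment
    private Document current;                             // null while outside a split element

    public ParentSplitter(String splitName, Consumer<Document> onElement) throws ParserConfigurationException {
        this.splitName = splitName;
        this.onElement = onElement;
        this.builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if (current == null) {
            if (!qName.equals(splitName)) return; // outside any Parent: nothing to build
            current = builder.newDocument();
            stack.push(current);
        }
        Element el = current.createElement(qName);
        for (int i = 0; i < atts.getLength(); i++) {
            el.setAttribute(atts.getQName(i), atts.getValue(i));
        }
        stack.peek().appendChild(el);
        stack.push(el);
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (current != null) {
            stack.peek().appendChild(current.createTextNode(new String(ch, start, length)));
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (current == null) return;
        stack.pop();                        // the element that just closed
        if (stack.size() == 1) {            // only the Document is left: the Parent is complete
            stack.pop();
            Document done = current;
            current = null;
            done.getDocumentElement().normalize(); // merge text nodes from split characters() calls
            onElement.accept(done);
        }
    }

    /** Example driver: returns the text content of every Parent element. */
    public static List<String> parentTexts(String xml) throws Exception {
        List<String> texts = new ArrayList<>();
        ParentSplitter h = new ParentSplitter("Parent", doc -> texts.add(doc.getDocumentElement().getTextContent()));
        SAXParserFactory.newInstance().newSAXParser().parse(new InputSource(new StringReader(xml)), h);
        return texts;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<Root><Parent><Child>a</Child></Parent><Parent><Child>b</Child></Parent></Root>";
        System.out.println(parentTexts(xml)); // prints [a, b]
    }
}
```

    Only one `Parent` subtree is held in memory at a time; inside the callback the fragment can be processed with ordinary DOM code.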

  • Code to read an XML file and display its data using a SAX parser

    Hi,
    My problem: I have to read an XML file and display its contents on the console using a SAX parser.

    here you go
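    The code attached to this reply is missing here. As a stand-in, a minimal sketch (class name hypothetical) that prints an indented outline of elements, attributes, and text. Text is buffered and printed when the next tag arrives, because `characters()` may be called several times for one piece of text:

```java
import java.io.File;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class ConsoleDump extends DefaultHandler {
    private final StringBuilder out = new StringBuilder();
    private final StringBuilder text = new StringBuilder(); // pending character data
    private int depth = 0;

    private void indent() {
        for (int i = 0; i < depth; i++) out.append("  ");
    }

    // Prints the buffered text (if any) at the current depth and clears the buffer.
    private void flushText() {
        String t = text.toString().trim();
        if (!t.isEmpty()) {
            indent();
            out.append("text: ").append(t).append('\n');
        }
        text.setLength(0);
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        flushText();
        indent();
        out.append(qName);
        for (int i = 0; i < atts.getLength(); i++) {
            out.append(' ').append(atts.getQName(i)).append("=\"").append(atts.getValue(i)).append('"');
        }
        out.append('\n');
        depth++;
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length); // may arrive in several chunks
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        flushText();
        depth--;
    }

    public static String dump(InputSource in) throws Exception {
        ConsoleDump h = new ConsoleDump();
        SAXParserFactory.newInstance().newSAXParser().parse(in, h);
        return h.out.toString();
    }

    public static void main(String[] args) throws Exception {
        // args[0]: path of the XML file to display
        System.out.print(dump(new InputSource(new File(args[0]).toURI().toString())));
    }
}
```

    Run it as `java ConsoleDump file.xml`.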

  • Urgent: SAX parser bean is not working in JSP page

    Hi All,
    I have created a bean "ReadAtts" and included it in a JSP file using
    "useBean", but it is not working. I tried all possibilities but failed. Please help me.
    Below are the details:
    Java Bean: ReadAtts.java
    package sax;
    import java.io.*;
    import javax.xml.parsers.SAXParser;
    import javax.xml.parsers.SAXParserFactory;
    import java.util.*;
    import org.xml.sax.*;
    import org.xml.sax.helpers.DefaultHandler;
    import javax.xml.parsers.ParserConfigurationException;
    public class ReadAtts extends DefaultHandler implements java.io.Serializable {
        private Vector attNames = new Vector(); //Stores all the att names from the XML
        private Vector attValues = new Vector();
        private Vector att = new Vector();
        private Locator locator;
        // Instance fields rather than static ones: static fields would be shared by
        // every ReadAtts instance, i.e. by every request that uses the bean.
        private String start="",end="",QueryString="",QString1="",QString2="";
        private boolean start_collecting=false;
        public ReadAtts() {
        }
        public Vector parse(String filename,String xpath) throws Exception {
            QueryString=xpath;
            StringTokenizer QueryString_ST = new StringTokenizer(QueryString,"/");
            int stLen = QueryString_ST.countTokens();
            while(QueryString_ST.hasMoreTokens()) {
                if((QueryString_ST.countTokens())>1)
                    QString1 = QueryString_ST.nextToken();
                else if((QueryString_ST.countTokens())>0)
                    QString2 = QueryString_ST.nextToken(); // last step of the path
            }
            SAXParserFactory spf = SAXParserFactory.newInstance();
            spf.setValidating(false);
            // Without namespace awareness SAX passes "" as localName, so the
            // comparisons in startElement/endElement below could never match.
            spf.setNamespaceAware(true);
            SAXParser saxParser = spf.newSAXParser();
            // create an XML reader
            XMLReader reader = saxParser.getXMLReader();
            FileReader file = new FileReader(filename);
            // set handler
            reader.setContentHandler(this);
            // call parse on an input source
            reader.parse(new InputSource(file));
            file.close();
            att.add("This is now added");
            //return attNames;
            return att;
        }
        public void setDocumentLocator(Locator locator) {
            this.locator = locator;
        }
        public void startDocument() {   }
        public void endDocument() {  }
        public void startPrefixMapping(String prefix, String uri) { }
        public void endPrefixMapping(String prefix) {  }
        /** The opening tag of an element. */
        public void startElement(String namespaceURI, String localName,String qName, Attributes atts) {
            start=localName;
            if(start.equals(QString2))
                start_collecting=true; //start collecting nodes
            if(start_collecting) {
                if((atts.getLength())>0) {
                    for(int i=0;i<=(atts.getLength()-1);i++) {
                        attNames.add((String)atts.getLocalName(i));
                        attValues.add((String)atts.getValue(i));
                    }
                }
            }
        }
        /** The closing tag of an element. */
        public void endElement(String namespaceURI, String localName, String qName) {
            end = localName;
            if(end.equals(QString2))
                start_collecting=false; //stop collecting nodes
        }
        /** Character data. */
        public void characters(char[] ch, int start, int length) { }
        /** Ignorable whitespace character data. */
        public void ignorableWhitespace(char[] ch, int start, int length){ }
        /** Processing Instruction */
        public void processingInstruction(String target, String data) { }
        /** A skipped entity. */
        public void skippedEntity(String name) { }
        public static void main(String[] args) throws Exception {
            String fname=args[0];
            String Xpath=args[1];
            System.out.println("\n from main() "+(new ReadAtts().parse(fname,Xpath)));
            //System.out.println("\n from main() "+new ReadAtts().attNames());
            //System.out.println("\n from main() "+new ReadAtts().attValues());
        }
    }
    JSP File:
    <%@ page import="sax.*,java.io.*,java.util.*,java.lang.*,java.text.*" %>
    <jsp:useBean id="p" class="sax.ReadAtts"/>
    Data after Parsing is.....
    <%=p.parse("E:/Log.xml","/acq/service/metrics/system/stackUsage")%>
    Expected Output:
    The jsp file should print all the vector objects from the "ReadAtts" bean
    Actual Output:
    Data after Parsing.......[]
    Thanks for your time.....
    Newton
    Bangalore. INDIA
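    One likely reason the bean collects no attributes: `SAXParserFactory` is not namespace-aware by default, and in that mode SAX passes an empty string as `localName` (only `qName` carries the tag name). The bean compares `localName` with the last step of the path, so that comparison cannot succeed. A minimal sketch showing the difference with the JDK parser; the element names are borrowed from the XPath in the JSP, and the class name is hypothetical:

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class LocalNameDemo {
    /** Returns "localName|qName" for every start tag. */
    public static List<String> names(String xml, boolean namespaceAware) throws Exception {
        SAXParserFactory spf = SAXParserFactory.newInstance();
        spf.setNamespaceAware(namespaceAware);
        List<String> seen = new ArrayList<>();
        spf.newSAXParser().parse(new InputSource(new StringReader(xml)), new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                seen.add(localName + "|" + qName);
            }
        });
        return seen;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<system><stackUsage used=\"12\"/></system>";
        System.out.println(names(xml, false)); // prints [|system, |stackUsage]
        System.out.println(names(xml, true));  // prints [system|system, stackUsage|stackUsage]
    }
}
```

    The fix is either to make the factory namespace-aware or to compare `qName` instead of `localName`.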

    The problem is not caused by the Java code inside the JSP page.
    I have removed everything but the form and it is still not working.
    Here is the modified code:
    <!-- add news-->
    <%
    if(request.getParameter("addBTN") != null){
            out.print("addBTN");
    }
    %>
    <!-- end of add news-->
    <form action="" method="post" enctype="multipart/form-data" name="upform" >
      <table width="99%" border="0" align="center" cellpadding="1" cellspacing="1">
        <tr>
          <td colspan="2" align="right" bgcolor="#EAEAEA" class="borderdTable"><p>Add a new news item</p></td>
        </tr>
        <tr>
          <td width="87%" align="right"><label>
            <input name="title" type="text" class="rightText" id="title">
          </label></td>
          <td width="13%" align="right">News title</td>
        </tr>
        <tr>
          <td align="right"><textarea name="elm1" cols="50" rows="10" id="elm1" style="direction:rtl" >
              </textarea></td>
          <td align="right">News details</td>
        </tr>
        <tr>
          <td align="right"><label>
            <input type="file" name="filename" id="filename">
          </label></td>
          <td align="right">Image</td>
        </tr>
        <tr>
          <td align="right"><label>
            <input name="addBTN" type="submit" class="btn" id="addBTN" value="  Add news ">
          </label></td>
          <td align="right"> </td>
        </tr>
      </table>
    </form>
    <!-- TinyMCE -->
    <script type="text/javascript" src="jscripts/tiny_mce/tiny_mce.js"></script>
    <script type="text/javascript">
            tinyMCE.init({
                    mode : "textareas",
                    theme : "simple",
                    directionality : "rtl"
            });
    </script>
    <!--end of TinyMCE -->

  • Problem with sax parser

    Hello..
    I have the following problem. When I parse an XML document with blank spaces and numbers with decimals, the text sometimes comes out as one string and sometimes as two. For example, "First A" sometimes comes out as "First" and "A" and sometimes as "First A", which is how it's stored in the XML file. The same happens with numbers like 19.20. I'm enclosing a little of my code:
    public void characters(char buf[], int offset, int len)
    throws SAXException
    if (textBuffer != null) {
    SaveString = ""+textBuffer;
    if(i>-1)
    numbers = SaveString;
    What's wrong, and how do I fix it?
    Best Regards Dan
    PS: I have more code, plus input and output data, if needed. DS

    Hello,
    I do not know if this is your problem, yet please find hereafter an excerpt of the SAX API:
    public void characters(char[] ch,
                           int start,
                           int length)
                    throws SAXException
    ... SAX parsers may return all contiguous character data in a single chunk, or they may split it into several chunks;...
    ... Note that some parsers will report whitespace in element content using the ignorableWhitespace method rather than this one (validating parsers must do so)...
    In other words, I am afraid that your issue is the "standard behaviour" of a SAX parser.
    I hope it helps.

  • SAX parser problem (very odd)

    Hi,
    I'm trying to parse an XML file using SAX. It worked fine until I tested with a larger file (about 12 MB). In my characters() implementation I load the value into an object, but the value of the element that arrives in characters() is wrong: it arrives, but with fewer bytes.
    Explanation:
    I print the values together with their offset and length using System.out, and most of the values are fine except some that come with one byte less:
    value : blabla , offset : 456 , length : 6
    value : blabl , offset : 6662 , length : 5
    Does anyone know what the hell is going on in this class?
    PS: I've extended the class org.xml.sax.helpers.DefaultHandler.
    PS2: the XML file is fine!! The values are OK!!!

    From the documentation for the characters method of org.xml.sax.ContentHandler:
    "SAX parsers may return all contiguous character data in a single chunk, or they may split it into several chunks..."

  • SAX Parser method characters trouble

    Hi,
    I am using a SAX parser to load an XML file of over 1 GB. The problem is that the content gets truncated at random.
    The truncation happens in the characters method of the SAX parser. I am sure of this because I logged the content I get from the characters method before starting my processing. For example:
    <content>signal1,signal960</content>
    <content>signal1,signal970</content>
    On parsing the above snippet, the first content "signal1,signal960"
    gets extracted completely; however, the next one is truncated, and the truncation is random. This happens for some tags, and then the extraction resumes normally. The truncation starts occurring again after a few more tags have been extracted.
    Also, the first truncation occurred only after parsing around 200 MB of the 1 GB file.
    Could anyone tell me if there is some limitation in XERCES or CRIMSON,
    or whether there is another parser that I can use?
    Regards,
    R.

    To quote from the Documentation of ContentHandler.characters:
    -- start quote --
    The Parser will call this method to report each chunk of character data. SAX parsers may return all contiguous character data in a single chunk, or they may split it into several chunks; however, all of the characters in any single event must come from the same external entity so that the Locator provides useful information.
    -- end quote --
    So providing the content in two separate calls is perfectly valid and must be handled by your code. It's probably a result of the internal workings of the XML parser, and allowing it in the parser specification probably permits some optimizations that would otherwise be impossible (a constant buffer size, for example).

  • SAX Parser Gotcha

    For what it's worth, the Java SAX XML parser had/has a killer gotcha that one must compensate for, or it does not work.
    Essentially, a SAX parser has five routines one has to program:
    startDocument
    startElement
    characters
    endElement
    endDocument
    The characters routine is where the contents of a tag appear.
    Suppose the underlying disk read routine does a read of the input stream that does not terminate at a tag; that has to happen occasionally. Then all the characters routine will see is part of a tag's contents. The next read of the input stream will cause characters to be called again with the second half of the tag's contents in the buffer. The application has to be smart enough to append the second half of the tag's contents to the first half, or there will be an error. My solution was to initialize a string in startElement, always append to it in characters, and use it as the contents of the tag in endElement. This seems to work for me.
    It took weeks to find and fix this error. And as far as I know there is no defense against it other than the one I outlined above. I have read several books on XML and have never seen this problem described.
    Has anyone else ever seen this problem, and does anyone have a better solution for it?
    Charles Elliott

    From the API documentation for the characters() method of org.xml.sax.ContentHandler:
    "The Parser will call this method to report each chunk of character data. SAX parsers may return all contiguous character data in a single chunk, or they may split it into several chunks; however, all of the characters in any single event must come from the same external entity so that the Locator provides useful information."
    You ask if anyone else has ever seen the problem? Actually it's an FAQ in this forum; it appears every couple of weeks. You've found the correct way to handle it. (Although using a StringBuilder instead of a String to accumulate the data might be a tiny bit better.)
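    Resetting a single string in `startElement()` works as long as an element with text has no child elements. With mixed content such as `<a>x<b>y</b>z</a>`, the child's `startElement()` would wipe out the parent's partial text. A minimal sketch (class name hypothetical) that keeps one `StringBuilder` per open element and reads the text only in `endElement()`:

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class TextCollector extends DefaultHandler {
    private final Deque<StringBuilder> open = new ArrayDeque<>(); // one buffer per open element
    private final List<String> results = new ArrayList<>();       // "name=text" in end-tag order

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        open.push(new StringBuilder());
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (!open.isEmpty()) {
            open.peek().append(ch, start, length); // append every chunk, never assign
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        // Only here is the element's own text known to be complete.
        results.add(qName + "=" + open.pop());
    }

    public static List<String> collect(String xml) throws Exception {
        TextCollector h = new TextCollector();
        SAXParserFactory.newInstance().newSAXParser().parse(new InputSource(new StringReader(xml)), h);
        return h.results;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(collect("<a>x<b>y</b>z</a>")); // prints [b=y, a=xz]
    }
}
```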

  • Problem with SAX parser

    Hi everybody!
    I'm trying to parse an XML file using a SAX parser, but I found a problem.
    In the method
    public void characters(char[] ch, int start, int length)
    I try to print the content this way:
    for (int i = start; i <= length; i++)
    System.out.print(ch[i]);
    but nothing is printed except in 2 cases.
    The parsed file starts like this:
    <BatchMaint xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="http://www.ncr.com/bosphoenix/WSMaintenance_5.0.xsd">
    <Id>1000950</Id>
    <DateTime>2009-02-11T11:37:09</DateTime>
    <FromPC>TD185008-1J7</FromPC>
    <FromAppl>BatchToPosWS.dll</FromAppl>
    <FromVersion>001.000.000.000</FromVersion>
    The method prints only "1000950" and "2009-02-11T11:37:09".
    Does anybody have an idea how to fix the problem?

    From the API documentation for the characters() method:
    "The Parser will call this method to report each chunk of character data. SAX parsers may return all contiguous character data in a single chunk, or they may split it into several chunks; however, all of the characters in any single event must come from the same external entity so that the Locator provides useful information."

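    Besides chunking, the loop in the question has an indexing bug: `start` is an offset into `ch`, and `length` counts characters from that offset. The loop therefore has to run while `i < start + length`; with `i <= length` it prints nothing whenever `start` is larger than `length`, which would fit output that shows up for only a few elements. A minimal sketch of the corrected loop (class name hypothetical):

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class PrintChars extends DefaultHandler {
    private final StringBuilder seen = new StringBuilder();

    @Override
    public void characters(char[] ch, int start, int length) {
        // Valid indexes are ch[start] .. ch[start + length - 1].
        for (int i = start; i < start + length; i++) {
            seen.append(ch[i]);
        }
        // Equivalent and shorter: seen.append(ch, start, length);
    }

    public static String seenText(String xml) throws Exception {
        PrintChars h = new PrintChars();
        SAXParserFactory.newInstance().newSAXParser().parse(new InputSource(new StringReader(xml)), h);
        return h.seen.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(seenText("<BatchMaint><Id>1000950</Id><FromPC>TD185008-1J7</FromPC></BatchMaint>"));
        // prints 1000950TD185008-1J7
    }
}
```
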
  • How to deal with empty tags in a SAX Parser

    Hi,
    I hope someone can help me with the problem I am having!
    Basically, I have written an xml-editor application. When an XML file is selected, I parse the file with a SAX parser and save the start and end locations of all the tags and character data. This enables me to display the xml file with the tags all nicely formatted with pretty colours. Truly it is a Joy To Behold. However, I have a problem with tags in this form:
    <package name="boo"/>
    because the SAX parser treats them like this:
    <package name = boo>
    </package>
    For various complex reasons the latter is unacceptable, so my question is: is there some fiendishly clever method to detect tags of this type as they occur, so that I can treat them accordingly?
    Thanks,
    Chris

    I spent some time googling for code that does this, but found nothing better than what I had to write myself.
    So, it would be something like this. Enjoy :)
    package comd;
    import org.xml.sax.helpers.DefaultHandler;
    import org.xml.sax.SAXException;
    import org.xml.sax.Attributes;
    import java.util.Stack;
    import java.util.Enumeration;
    // Note: SAX reports <a/> and <a></a> identically, so both are written back as <a/>.
    public class EmptyTagsHandler extends DefaultHandler {
        private StringBuilder xmlBuilder;
        private Stack<XmlElement> elementStack;
        private String processedXml;
        private class XmlElement {
            private String name;
            private boolean isEmpty = true;
            public XmlElement(String name) {
                this.name = name;
            }
            public void setNotEmpty() {
                isEmpty = false;
            }
            @Override
            public String toString() {
                return name; // used by getElementXPath()
            }
        }
        public EmptyTagsHandler() {
            xmlBuilder = new StringBuilder();
            elementStack = new Stack<XmlElement>();
        }
        // Re-escape markup characters that the parser has already decoded.
        private static String escape(String s) {
            return s.replace("&", "&amp;").replace("<", "&lt;").replace("\"", "&quot;");
        }
        private String getElementXPath() {
            StringBuilder builder = new StringBuilder();
            for (Enumeration<XmlElement> en = elementStack.elements(); en.hasMoreElements();) {
                builder.append(en.nextElement());
                builder.append("/");
            }
            return builder.toString();
        }
        public String getXml() {
            return processedXml;
        }
        public void startDocument() throws SAXException {
            xmlBuilder = new StringBuilder();
            elementStack.clear();
            processedXml = null;
        }
        public void endDocument() throws SAXException {
            processedXml = xmlBuilder.toString();
        }
        public void startElement(String uri, String localName, String qName, Attributes attributes) throws SAXException {
            if (!elementStack.empty()) {
                XmlElement elem = elementStack.peek();
                elem.setNotEmpty();
            }
            xmlBuilder.append("<");
            xmlBuilder.append(qName);
            for (int i = 0; i < attributes.getLength(); i++) {
                xmlBuilder.append(" ");
                xmlBuilder.append(attributes.getQName(i));
                xmlBuilder.append("=\"");                          // quote the value
                xmlBuilder.append(escape(attributes.getValue(i)));
                xmlBuilder.append("\"");
            }
            xmlBuilder.append(">");
            elementStack.push(new XmlElement(qName));
        }
        public void endElement(String uri, String localName, String qName) throws SAXException {
            XmlElement elem = elementStack.peek();
            if (elem.isEmpty) {
                // nothing was written since the start tag: turn "<name ...>" into "<name .../>"
                xmlBuilder.insert(xmlBuilder.length() - 1, "/");
            } else {
                xmlBuilder.append("</");
                xmlBuilder.append(qName);
                xmlBuilder.append(">");
            }
            elementStack.pop();
        }
        public void characters(char ch[], int start, int length) throws SAXException {
            if (!elementStack.empty()) {
                XmlElement elem = elementStack.peek();
                elem.setNotEmpty();
            }
            String str = new String(ch, start, length);
            xmlBuilder.append(escape(str));
        }
        public void ignorableWhitespace(char ch[], int start, int length) throws SAXException {
            if (!elementStack.empty()) {
                // otherwise endElement would insert "/" after this whitespace
                elementStack.peek().setNotEmpty();
            }
            String str = new String(ch, start, length);
            xmlBuilder.append(str);
        }
    }
