SAX performance problem

Hi all,
I am supposed to replace an existing parser, which is very slow, with one built on SAX, DOM or JDOM.
My problem is that my SAX program takes just as long to analyse the document as the other program. My question is: is there a faster way to get all the values of an XML document than this?
public void startElement(String uri, String name, String qname, Attributes atts) throws SAXException {
//      System.out.print(" uri:"+uri+'\n');
//      System.out.print(" name:"+name+'\n');
//      System.out.print(" qname:"+qname+'\n');
//      System.out.print("\n att:"+atts+'\n');
     if (name.trim().equals("MEMBRE")) {
          a++;
          if (objetMembre.nomMembre != null) {
               objetVPN.membres.addElement(objetMembre);
               objetMembre = new ObjetMembre();
          }
     }
/*   if (name.trim().equals("SITE")) {
          a++;
//        System.out.print(a+"un membre");
          if (objetMembre.nomMembre != null) {
//             System.out.println(a+"ajout dans objetVPN");
               objetVPN.membres.addElement(objetMembre);
//             System.out.println("new Membre");
               objetMembre = new ObjetMembre();
          }
     }
     if (name.trim().equals("UTILISATEUR")) {
          a++;
//        System.out.print(a+"un membre");
          if (objetMembre.nomMembre != null) {
//             System.out.println(a+"ajout dans objetVPN");
               objetVPN.membres.addElement(objetMembre);
//             System.out.println("new Membre");
               objetMembre = new ObjetMembre();
          }
     } */
     else if (name.trim().equals("VERSION_OUTIL"))
          balise = VERSION_OUTIL;
}
public void characters(char[] ch, int start, int length) {
     StringBuffer temp = new StringBuffer();
     switch (balise) {
     case VERSION_OUTIL:
          temp.append(ch, start, length);
          if (!temp.toString().trim().equals("")) {
               versionOutil = temp.toString();
               temp = new StringBuffer();
          }
          break;
     }
}

Hi Cecile,
The performance problem may be in the code you have hooked into the method
public void characters(char[] ch,int start,int length)
Depending on the parser, this method may be called once for every character, or once for every value, or any arbitrary number of times, particularly if the node contains entity references such as &amp;
It also gets called for whitespace between nodes.
Anyhow, in your implementation of this method, you're creating new objects, which can get performance-costly.
For best performance, try the following approach:
Create a CharArrayWriter object as a member of your class. Inside the characters method, simply write the characters to the CharArrayWriter instance.
Inside the startElement method, call reset() on the writer.
Inside the endElement method, call toString() on the writer to extract the value of the current node. This is very convenient because you have the name and the complete value of the node available to do your switch.
Using this in combination with Xerces (or another SAX parser) should give you pretty good results.
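
The three steps above can be sketched roughly as follows. This is only an illustration under assumptions: the class name ValueCollector, the values map and the parse helper are made up for the example and are not part of the original program. It also resets the writer in endElement, which the steps above do not require, so that whitespace after a closing tag is not attributed to the parent element. Elements with mixed content would need more care.

```java
import java.io.CharArrayWriter;
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class ValueCollector extends DefaultHandler {
    // One reusable buffer for the whole parse, instead of a new StringBuffer per call.
    private final CharArrayWriter text = new CharArrayWriter();
    // Hypothetical result store: element name -> trimmed text value.
    final Map<String, String> values = new LinkedHashMap<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        text.reset(); // drop whitespace and text seen before this tag
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        text.write(ch, start, length); // may be called several times for one value
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        String value = text.toString().trim();
        if (!value.isEmpty()) {
            values.put(qName, value); // name and complete value are both known here
        }
        text.reset();
    }

    // Parses an XML string and returns the collected leaf values.
    public static Map<String, String> parse(String xml) throws Exception {
        ValueCollector handler = new ValueCollector();
        SAXParserFactory.newInstance().newSAXParser()
            .parse(new InputSource(new StringReader(xml)), handler);
        return handler.values;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<VPN><VERSION_OUTIL> 2.1 </VERSION_OUTIL><MEMBRE>a &amp; b</MEMBRE></VPN>";
        System.out.println(parse(xml));
    }
}
```

The switch on the element name would then move from characters into endElement, where the full value is available in one piece.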

Similar Messages

  • SAX parser problem in JDK1.5

    I am parsing an XML file using SAX in a JDK 1.5 environment. During parsing, some of the text content is lost.
    For example:
    <employeeid>
    <empid>1001</empid>
    <empid>1002</empid>
    </employeeid>
    When I parse the above XML file using SAX in a JDK 1.5 environment, the output is
    1001
    100
    During parsing, the SAX parser drops the final digit, 2.
    When I parse the same XML file using SAX in a JDK 1.4 environment, it works fine.
    What is the problem in JDK 1.5?
    Please help.
    bala

    I expect this is the problem that was discussed recently in a topic titled "SAX Parser Problems", started on March 14 at 6:59 AM by JackoBS.
    Read that and see if it fixes your problem. If so, it is not a JDK 1.5 problem, but a user error.
    Dave Patterson

  • Sax parser problem

    Hi,
    I am assuming the problem is with the SAX parser, but I can't be sure. I am parsing an XML file (about 1.4 MB) with some data in it. The parser I have created reads the XML file correctly for the most part, but at some point the
    "public void characters(char buf[], int offset, int len) throws SAXException"
    function stops working correctly, i.e. it doesn't fully read the data between the "<start>" and "</start>" elements. Say it reads about 100 IDs correctly; for the 101st ID it does this. This is just an example. Since the problem might be with how the
    "public void characters(char buf[], int offset, int len) throws SAXException"
    function is reading the data, I was wondering if anybody else had encountered this problem. Please let me know if I need to change something in the code. Here is part of the code:
    Basically, I have created three classes to enter data into three MySQL tables, and as I parse the data I fill in the columns by matching the column header with the tag name.
    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.SQLException;
    import java.sql.*;
    import java.io.*;
    import java.util.ArrayList;
    import java.lang.Object;
    import org.xml.sax.*;
    import org.xml.sax.helpers.DefaultHandler;
    import java.util.*;
    import javax.xml.parsers.SAXParserFactory;
    import javax.xml.parsers.ParserConfigurationException;
    import javax.xml.parsers.SAXParser;
    public class Echo03 extends DefaultHandler
    StringBuffer textBuffer;
    int issuedValue, prodValue;
    OrdHeader header = new OrdHeader();
    OrdDetail detail = new OrdDetail();
    Member memInfo = new Member();
    //new addition to store the dynamic value of the products
    TestOrdheader prod = new TestOrdheader();
    int counter;
    String tag, newTag;
    SetValue setVal = new SetValue();
    String test;
    public static void main(String argv[])
    if (argv.length != 1) {
    System.err.println("Usage: cmd filename");
    System.exit(1);
    // Use an instance of ourselves as the SAX event handler
    DefaultHandler handler = new Echo03();
    // Use the default (non-validating) parser
    SAXParserFactory factory = SAXParserFactory.newInstance();
    try {
    // Set up output stream
    out = new OutputStreamWriter(System.out, "UTF8");
    // Parse the input
    SAXParser saxParser = factory.newSAXParser();
    saxParser.parse( new File(argv[0]), handler);
    } catch (Throwable t) {
    t.printStackTrace();
    System.exit(0);
    static private Writer out;
    private String indentString = " "; // Amount to indent
    private int indentLevel = 0;
    //===========================================================
    // SAX DocumentHandler methods
    //===========================================================
    public void startDocument()
    throws SAXException
    nl();
    nl();
    emit("START DOCUMENT");
    nl();
    emit("<?xml version='1.0' encoding='UTF-8'?>");
    header.assign();
    public void endDocument()
    throws SAXException
    nl(); emit("END DOCUMENT");
    try {
    nl();
    out.flush();
    } catch (IOException e) {
    throw new SAXException("I/O error", e);
    public void startElement(String namespaceURI,
    String lName, // local name
    String qName, // qualified name
    Attributes attrs)
    throws SAXException
    indentLevel++;
    nl(); //emit("ELEMENT: ");
    String eName = lName; // element name
    if ("".equals(eName)) eName = qName; // namespaceAware = false
    if (qName.equals("Billing")){
    issuedValue = 1;
    }else if (qName.equals("Shipping")){
    issuedValue = 2;
    }else if (qName.equals("ShippingTotal")){
    issuedValue = 3;
    //check to see if "Product" is the name of the element thats coming next
    if (qName.equals("Product")){
    if (issuedValue != 3){
    prodValue = 1;
    prod.addCounter();
    }else{
    prodValue = 0;
    tag = eName;
    if (attrs != null) {
    for (int i = 0; i < attrs.getLength(); i++) {
    String aName = attrs.getLocalName(i); // Attr name
    if ("".equals(aName)) aName = attrs.getQName(i);
    nl();
    emit(" ATTR: ");
    emit(aName);
    emit("\t\"");
    emit(attrs.getValue(i));
    emit("\"");
    if (attrs.getLength() > 0) nl();
    public void endElement(String namespaceURI,
    String sName, // simple name
    String qName // qualified name
    throws SAXException
    nl();
    String eName = sName; // element name
    if ("".equals(eName)){
    eName = qName; // not namespaceAware
    if ("Order".equals(eName)){          
    //enter into database
         databaseEnter();
    textBuffer = null;
    indentLevel--;
    public void characters(char buf[], int offset, int len)
    throws SAXException
    {
        nl();
        try {
            String s = new String(buf, offset, len);
            if (!s.trim().equals("")) {
                settag(tag, s);
            }
            s = null;
        } catch (NullPointerException E) {
            System.out.println("Null pointer Exception:" + E);
        }
    }
    //===========================================================
    // Utility Methods ...
    //===========================================================
    // Wrap I/O exceptions in SAX exceptions, to
    // suit handler signature requirements
    private void emit(String s)
    throws SAXException
    try {
    out.write(s);
    out.flush();
    } catch (IOException e) {
    throw new SAXException("I/O error", e);
    // Start a new line
    // and indent the next line appropriately
    private void nl()
    throws SAXException
    String lineEnd = System.getProperty("line.separator");
    try {
    out.write(lineEnd);
    for (int i=0; i < indentLevel; i++) out.write(indentString);
    } catch (IOException e) {
    throw new SAXException("I/O error", e);
    ===================================================================
    ///User defined methods
    ===================================================================
    private String strsplit(String splitstr){
    String delimiter = new String("=");
    String[] value = splitstr.split(delimiter);
    value[1] = value[1].replace(':', ' ');
    return value[1];
    public void settag(String tag, String s){         
    String pp_transid = null, pp_respmsg = null,pp_authid = null, pp_avs = null, pp_avszip = null;
    if ((tag.equals("OrderDate")) || (tag.equals("OrderProcessingInfo"))){
    if (tag.equals("OrderDate")){
    StringTokenizer st = new StringTokenizer(s);
    String orddate = st.nextToken();
    String ordtime = st.nextToken();
    header.put("ordDate", orddate);
    header.put("ordTime", ordtime);
    }else if (tag.equals("OrderProcessingInfo")){
    StringTokenizer st1 = new StringTokenizer(s);
    int tokenCount = 1;
    while (tokenCount <= st1.countTokens()){
    switch(tokenCount){
    case 1:
    String extra = st1.nextToken();
    break;
    case 2:
    String Opp_transid = st1.nextToken();
    pp_transid = strsplit(Opp_transid);
    break;
    case 3:
    String Opp_respmsg = st1.nextToken();
    pp_respmsg = strsplit(Opp_respmsg);
    break;
    case 4:
    String Opp_authid = st1.nextToken();
    pp_authid = strsplit(Opp_authid);
    break;
    case 5:
    String Opp_avs = st1.nextToken();
    pp_avs = strsplit(Opp_avs);
    break;
    case 6:
    String Opp_avszip = st1.nextToken();
    pp_avszip = strsplit(Opp_avszip);
    break;
    tokenCount++;
    header.put("pp_transid", pp_transid);
    header.put("pp_respmsg", pp_respmsg);
    header.put("pp_authid", pp_authid);
    header.put("pp_avs", pp_avs);
    header.put("pp_avszip", pp_avszip);
    }else{
    newTag = new String(setVal.set_name(tag, issuedValue));
    header.put(newTag, s);
    //detail.put(newTag, s);
    prod.put(newTag, s);
    memInfo.put(newTag,s);
    //Check to see-- if we should add this product to the database or not
    boolean check = prod.checkValid(newTag, prodValue);
    if (check){
    prod.addValues(s);
    setVal.clearMod();
    ==================================================================
    Here's the error that I get:
    java.util.NoSuchElementException
    at org.apache.crimson.parser.Parser2.parseInternal(Parser2.java:691)
    at org.apache.crimson.parser.Parser2.parse(Parser2.java:337)
    at org.apache.crimson.parser.XMLReaderImpl.parse(XMLReaderImpl.java:448)
    at javax.xml.parsers.SAXParser.parse(SAXParser.java:345)
    at javax.xml.parsers.SAXParser.parse(SAXParser.java:281)
    at Echo03.main(Echo03.java:47)

    I haven't gone through your code, but I had a similar error, and in my case the exception was caused by a raw "&" instead of the entity reference &amp; in one of the element values. I use a non-validating parser; if you use a validating one, this might not be the reason for your exception.
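
    To check whether a raw "&" is the culprit, something like the sketch below may help. It only demonstrates that a conforming SAX parser rejects an unescaped ampersand in element text and accepts the escaped form; it does not claim that this is what produced the NoSuchElementException above. The class name AmpersandCheck and both helper methods are made up for the example.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class AmpersandCheck {
    // Returns true if the XML parses as well-formed, false if the parser rejects it.
    static boolean isWellFormed(String xml) throws Exception {
        try {
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), new DefaultHandler());
            return true;
        } catch (SAXException e) {
            return false; // DefaultHandler rethrows fatal errors as SAXParseException
        }
    }

    // Escapes the characters that must not appear raw in element text.
    // "&" is replaced first so the entities produced here are not escaped again.
    static String escapeText(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    public static void main(String[] args) throws Exception {
        String raw = "Tom & Jerry";
        System.out.println(isWellFormed("<v>" + raw + "</v>"));
        System.out.println(isWellFormed("<v>" + escapeText(raw) + "</v>"));
    }
}
```

    If the file is generated by another program, the escaping belongs in that generator; the parser itself cannot be told to accept the raw character.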

  • SAX Java problem

    Hi,
    This is my first time on your forum. I would be grateful if someone could tell me where I'm going wrong.
    My program uses a SAX parser in a Java application. It should read an XML file and extract certain attributes if they appear.
    Here is the program.
    import java.io.*;
    import org.xml.sax.*;
    import org.xml.sax.helpers.*;
    import javax.xml.parsers.*;
    public class PacketValues extends DefaultHandler {
        public static void main(String[] arguments) {
            if (arguments.length == 1) {
                PacketValues pv = new PacketValues(arguments[0]);
            } else {
                System.out.println("input just the xml file please");
            }
        }
        PacketValues(String xmlFile) {
            File input = new File(xmlFile);
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setValidating(false);
            try {
                SAXParser sax = factory.newSAXParser();
                PacketValueProcessor pvp = new PacketValueProcessor();
                sax.parse(input, pvp);
                System.out.println("The value is " + pvp.a);
            } catch (ParserConfigurationException pce) {
                System.out.println("Could not create that parser. ");
                System.out.println(pce.getMessage());
            } catch (SAXException se) {
                System.out.println("Problem with the SAX parser. ");
                System.out.println(se.getMessage());
            } catch (IOException ioe) {
                System.out.println("Error reading file.");
                System.out.println(ioe.getMessage());
            }
        }
    }
    class PacketValueProcessor extends DefaultHandler {
        String a;
        PacketValueProcessor() {
            super();
        }
        public void startElement(String URI, String localName,
                                 String qName, Attributes atts) {
            System.out.println("here in startElement");
            if (localName.equals("field")) {
                System.out.println("localname equals field");
                String n = atts.getValue("","name");
                System.out.println("n is " + n);
                if (n.equals("timestamp")) {
                    String a = atts.getValue("","value");
                    System.out.println("a is " + a);
                }
                if (n.equals("tcp.seq")) {
                    String a = atts.getValue("","value");
                    System.out.println("a is " + a);
                }
            }
        }
    }
    It aims to extract both the value of a field element which has a name attribute "timestamp" and the value of a field element which has the attribute "tcp.seq".
    Here is part of the xml file
    <?xml version="1.0"?>
    <pdml version="0" creator="ethereal/0.10.0">
    <packet>
    <proto name="geninfo" pos="0" showname="General information" size="74">
    <field name="num" pos="0" show="1" showname="Number" value="1" size="74"/>
    <field name="len" pos="0" show="74" showname="Packet Length" value="4a" size="74"/>
    <field name="caplen" pos="0" show="74" showname="Captured Length" value="4a" size="74"/>
    <field name="timestamp" pos="0" show="Jan 22, 2004 12:04:58.646896000" showname="Captured Time" value="1074773098.646896000" size="74"/>
    </proto>
    <proto name="frame" showname="Frame 1 (74 bytes on wire, 74 bytes captured)" size="74" pos="0">
    <field name="frame.marked" showname="Frame is marked: False" size="0" pos="0" show="0"/>
    <field name="frame.time" showname="Arrival Time: Jan 22, 2004 12:04:58.646896000" size="0" pos="0" show="Jan 22, 2004 12:04:58.646896000"/>
    <field name="frame.time_delta" showname="Time delta from previous packet: 0.000000000 seconds" size="0" pos="0" show="0.000000000"/>
    <field name="frame.time_relative" showname="Time since reference or first frame: 0.000000000 seconds" size="0" pos="0" show="0.000000000"/>
    <field name="frame.number" showname="Frame Number: 1" size="0" pos="0" show="1"/>
    <field name="frame.pkt_len" showname="Packet Length: 74 bytes" size="0" pos="0" show="74"/>
    <field name="frame.cap_len" showname="Capture Length: 74 bytes" size="0" pos="0" show="74"/>
    </proto>
    <proto name="eth" showname="Ethernet II, Src: 00:05:5d:6d:a0:87, Dst: 00:50:da:4f:6d:83" size="14" pos="0">
    <field name="eth.dst" showname="Destination: 00:50:da:4f:6d:83 (3com_4f:6d:83)" size="6" pos="0" show="00:50:da:4f:6d:83" value="0050da4f6d83"/>
    <field name="eth.src" showname="Source: 00:05:5d:6d:a0:87 (D-Link_6d:a0:87)" size="6" pos="6" show="00:05:5d:6d:a0:87" value="00055d6da087"/>
    <field name="eth.addr" showname="Source or Destination Address: 00:50:da:4f:6d:83 (3com_4f:6d:83)" size="6" pos="0" show="00:50:da:4f:6d:83" value="0050da4f6d83"/>
    <field name="eth.addr" showname="Source or Destination Address: 00:05:5d:6d:a0:87 (D-Link_6d:a0:87)" size="6" pos="6" show="00:05:5d:6d:a0:87" value="00055d6da087"/>
    <field name="eth.type" showname="Type: IP (0x0800)" size="2" pos="12" show="0x0800" value="0800"/>
    </proto>
    <proto name="ip" showname="Internet Protocol, Src Addr: 192.168.2.21 (192.168.2.21), Dst Addr: 192.168.3.10 (192.168.3.10)" size="20" pos="14">
    <field name="ip.version" showname="Version: 4" size="1" pos="14" show="4" value="45"/>
    <field name="ip.hdr_len" showname="Header length: 20 bytes" size="1" pos="14" show="20" value="45"/>
    <field name="ip.dsfield" showname="Differentiated Services Field: 0x00 (DSCP 0x00: Default; ECN: 0x00)" size="1" pos="15" show="0" value="00">
    <field name="ip.dsfield.dscp" showname="0000 00.. = Differentiated Services Codepoint: Default (0x00)" size="1" pos="15" show="0x00" value="00"/>
    <field name="ip.dsfield.ect" showname=".... ..0. = ECN-Capable Transport (ECT): 0" size="1" pos="15" show="0" value="00"/>
    <field name="ip.dsfield.ce" showname=".... ...0 = ECN-CE: 0" size="1" pos="15" show="0" value="00"/>
    </field>
    <field name="ip.len" showname="Total Length: 60" size="2" pos="16" show="60" value="003c"/>
    <field name="ip.id" showname="Identification: 0xd779 (55161)" size="2" pos="18" show="0xd779" value="d779"/>
    <field name="ip.flags" showname="Flags: 0x04" size="1" pos="20" show="0x04" value="40">
    <field name="ip.flags.df" showname=".1.. = Don't fragment: Set" size="1" pos="20" show="1" value="40"/>
    <field name="ip.flags.mf" showname="..0. = More fragments: Not set" size="1" pos="20" show="0" value="40"/>
    </field>
    <field name="ip.frag_offset" showname="Fragment offset: 0" size="2" pos="20" show="0" value="4000"/>
    <field name="ip.ttl" showname="Time to live: 64" size="1" pos="22" show="64" value="40"/>
    <field name="ip.proto" showname="Protocol: TCP (0x06)" size="1" pos="23" show="0x06" value="06"/>
    <field name="ip.checksum" showname="Header checksum: 0xdcd2 (correct)" size="2" pos="24" show="0xdcd2" value="dcd2"/>
    <field name="ip.src" showname="Source: 192.168.2.21 (192.168.2.21)" size="4" pos="26" show="192.168.2.21" value="c0a80215"/>
    <field name="ip.addr" showname="Source or Destination Address: 192.168.2.21 (192.168.2.21)" size="4" pos="26" show="192.168.2.21" value="c0a80215"/>
    <field name="ip.dst" showname="Destination: 192.168.3.10 (192.168.3.10)" size="4" pos="30" show="192.168.3.10" value="c0a8030a"/>
    <field name="ip.addr" showname="Source or Destination Address: 192.168.3.10 (192.168.3.10)" size="4" pos="30" show="192.168.3.10" value="c0a8030a"/>
    </proto>
    <proto name="tcp" showname="Transmission Control Protocol, Src Port: 32862 (32862), Dst Port: 5001 (5001), Seq: 0, Ack: 0, Len: 0" size="40" pos="34">
    <field name="tcp.srcport" showname="Source port: 32862 (32862)" size="2" pos="34" show="32862" value="805e"/>
    <field name="tcp.dstport" showname="Destination port: 5001 (5001)" size="2" pos="36" show="5001" value="1389"/>
    <field name="tcp.port" showname="Source or Destination Port: 32862" size="2" pos="34" show="32862" value="805e"/>
    <field name="tcp.port" showname="Source or Destination Port: 5001" size="2" pos="36" show="5001" value="1389"/>
    <field name="tcp.len" showname="TCP Segment Len: 0" size="4" pos="34" show="0" value="805e1389"/>
    <field name="tcp.seq" showname="Sequence number: 0" size="4" pos="38" show="0" value="8dc936ab"/>
    <field name="tcp.hdr_len" showname="Header length: 40 bytes" size="1" pos="46" show="40" value="a0"/>
    <field name="tcp.flags" showname="Flags: 0x0002 (SYN)" size="1" pos="47" show="0x02" value="02">
    <field name="tcp.flags.cwr" showname="0... .... = Congestion Window Reduced (CWR): Not set" size="1" pos="47" show="0" value="02"/>
    <field name="tcp.flags.ecn" showname=".0.. .... = ECN-Echo: Not set" size="1" pos="47" show="0" value="02"/>
    <field name="tcp.flags.urg" showname="..0. .... = Urgent: Not set" size="1" pos="47" show="0" value="02"/>
    <field name="tcp.flags.ack" showname= etc etc
    The output is
    here in startElement
    here in startElement
    here in startElement
    here in startElement
    here in startElement
    here in startElement
    here in startElement
    here in startElement
    etc
    etc
    here in startElement
    here in startElement
    The value is null
    This of course means that the program only gets as far as startElement;
    it never gets beyond the condition
    if (localName.equals("field")) {
    Can someone tell me why? I'm not an experienced programmer.
    Thanks in advance.

    I based my program on this program:
    import java.io.*;
    import org.xml.sax.*;
    import org.xml.sax.helpers.*;
    import org.apache.xerces.parsers.SAXParser;
    public class Flour extends DefaultHandler {
        float amount = 0;
        public void startElement(String namespaceURI, String localName,
                                 String qName, Attributes atts) {
            if (namespaceURI.equals("http://recipes.org") && localName.equals("ingredient")) {
                String n = atts.getValue("","name");
                if (n.equals("flour")) {
                    String a = atts.getValue("","amount"); // assume 'amount' exists
                    amount = amount + Float.valueOf(a).floatValue();
                }
            }
        }
        public static void main(String[] args) {
            Flour f = new Flour();
            SAXParser p = new SAXParser();
            p.setContentHandler(f);
            try { p.parse(args[0]); }
            catch (Exception e) {e.printStackTrace();}
            System.out.println(f.amount);
        }
    }
    which has as the data
    <?xml version="1.0" encoding="UTF-8" ?>
    <?dsd href="recipes.dsd"?>
    <collection xmlns="http://recipes.org" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://recipes.org recipes.xsd">
    <description>Some recipes used in the XML tutorial.</description>
    <recipe>
    <title>Beef Parmesan with Garlic Angel Hair Pasta</title>
    <ingredient name="beef cube steak" amount="1.5" unit="pound" />
    <ingredient name="onion, sliced into thin rings" amount="1" />
    <ingredient name="green bell pepper, sliced in rings" amount="1" />
    <ingredient name="Italian seasoned bread crumbs" amount="1" unit="cup" />
    <ingredient name="grated Parmesan cheese" amount="0.5" unit="cup" />
    <ingredient name="olive oil" amount="2" unit="tablespoon" />
    <ingredient name="spaghetti sauce" amount="1" unit="jar" />
    <ingredient name="shredded mozzarella cheese" amount="0.5" unit="cup" />
    <ingredient name="angel hair pasta" amount="12" unit="ounce" />
    <ingredient name="minced garlic" amount="2" unit="teaspoon" />
    <ingredient name="butter" amount="0.25" unit="cup" />
    <preparation>
    <step>Preheat oven to 350 degrees F (175 degrees C).</step>
    <step>Cut cube steak into serving size pieces. Coat meat with the bread crumbs and parmesan cheese. Heat olive oil in a large frying pan, and saute 1 teaspoon of the garlic for 3 minutes. Quick fry (brown quickly on both sides) meat. Place meat in a casserole baking dish, slightly overlapping edges. Place onion rings and peppers on top of meat, and pour marinara sauce over all.</step>
    <step>Bake at 350 degrees F (175 degrees C) for 30 to 45 minutes, depending on the thickness of the meat. Sprinkle mozzarella over meat and leave in the oven till bubbly.</step>
    <step>Boil pasta al dente. Drain, and toss in butter and 1 teaspoon garlic. For a stronger garlic taste, season with garlic powder. Top with grated parmesan and parsley for color. Serve meat and sauce atop a mound of pasta!</step>
    </preparation>
    My program doesn't need to account for namespaces. Apart from that, I can't see why this one would work and mine would not.
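
    A likely explanation, offered as a sketch rather than a confirmed diagnosis: a parser obtained from SAXParserFactory is not namespace-aware unless setNamespaceAware(true) is called, so localName arrives empty and the comparison with "field" never succeeds, whereas the Xerces SAXParser class used by Flour has namespace processing switched on by default. The PacketValueProcessor handler also declares a new local String a inside each if block, so the field a that is printed at the end would stay null even once the condition matches. The class name FieldValues, its two fields and the parse helper below are made up for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class FieldValues extends DefaultHandler {
    String timestamp; // value attribute of <field name="timestamp" .../>
    String tcpSeq;    // value attribute of <field name="tcp.seq" .../>

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        // localName is empty when the parser is not namespace-aware, so fall back to qName.
        String element = localName.isEmpty() ? qName : localName;
        if (!"field".equals(element)) {
            return;
        }
        String n = atts.getValue("name"); // lookup by qualified attribute name
        if ("timestamp".equals(n)) {
            timestamp = atts.getValue("value"); // assign the field, no new local variable
        } else if ("tcp.seq".equals(n)) {
            tcpSeq = atts.getValue("value");
        }
    }

    // Parses an XML string with a namespace-aware parser and returns the filled handler.
    static FieldValues parse(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true); // makes localName carry the element name
        SAXParser parser = factory.newSAXParser();
        FieldValues handler = new FieldValues();
        parser.parse(new InputSource(new StringReader(xml)), handler);
        return handler;
    }

    public static void main(String[] args) throws Exception {
        FieldValues v = parse("<pdml><packet><proto name=\"geninfo\">"
            + "<field name=\"timestamp\" value=\"1074773098.646896000\"/>"
            + "<field name=\"tcp.seq\" value=\"8dc936ab\"/>"
            + "</proto></packet></pdml>");
        System.out.println(v.timestamp + " " + v.tcpSeq);
    }
}
```

    Writing the constant first, as in "timestamp".equals(n), also avoids a NullPointerException for field elements that have no name attribute.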

  • SAX parsing problem

    Thanks for reading this msg!
    I am using SAX to parse XML. Everything looks OK except for one thing I don't understand: the text of one element is being split, while all the others are fine, even with the same data.
    XML element :
    <item> RO(contains 160 FT)</item>
    After parsing, the data looks like this: 'RO(c', 'ontains 160 FT)'
    Thanks for your help!

    This is a FAQ, search the forums or look at the API documentation
    http://java.sun.com/j2se/1.4.2/docs/api/org/xml/sax/ContentHandler.html#characters(char[],%20int,%20int)
    Pete

  • SAX (xerces) problem

    I have a big problem with Apache Xerces2 Java.
    I have to parse and extract data from very large XML files (100 MB to 20 GB). Because the files are so large, I have to use a SAX parser.
    If I use the internal Xerces in any update of JDK/JRE 1.6, the whole document is loaded into memory. I have found a related bug report at http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6536111 . I am not sure that the fix will solve my problem, and it has not been delivered yet. According to the bug report, it is going to be delivered with JDK 6 update 14 in mid-May 2009.
    I thought maybe the problem was with the internal SAX parser, so I started to use the Xerces source (the latest version, 2.9.1). At this point I discovered that parsing takes more time and needs 24 bytes for each node. Sometimes the XML files have 80,000,000 nodes. That will take 1.5 to 2 GB of RAM, which I don't have. Even if I had that much RAM, I could not use it on a 32-bit Windows platform (OS limits).
    Does anyone have an idea or a solution?
    Thanks..

    Thank you both Toll and DrClap for your help. I'll take a look at Saxon, but I'm still intrigued why nobody is complaining about a tool (SAX) that's almost a standard for stream parsing... and yet not working for XSLT transformations! Maybe you were right after all when you said stream processing might not be possible for my XSLT file but I doubt it because the XML is representing "sort of" a table and therefore it's made up of thousands of structurally identical <row> elements which can be individually transformed...I can't think of anything more suitable for streaming transformation.
    Thanks again for your time.
    Edited: in the end I decided to parse the document using SAX (which in my tests uses almost no memory at all and performs lightning-fast) and then apply an XSL transformation to each parsed node (I can do that in my case). Transforming a whole document will still be a huge problem, in terms of memory usage, for those who don't have a repeating pattern in their XML, although I guess that 99% of the time there will be one in big documents.
    I think this will be very useful for other programmers facing the same problem I had. This code divides an XML file into several chunks according to a repeating pattern (in this case InsurancePolicyData/Record) using SAX and then processes each chunk of XML separately, optimizing the use of memory:
    import java.io.BufferedWriter;
    import java.io.File;
    import java.io.FileWriter;
    import java.io.Writer;
    import org.dom4j.io.SAXReader;
    public class SingleThreadSplitXSLT
    {
      public static void main(String[] args)
        throws Exception
      {
        if (args.length != 3)
        {
          System.err.println(
            "Error: Please provide 3 inputs:" +
            " inputXML XSLT outputXML");
          System.exit(-1);
        }
        long startTimeMs = System.currentTimeMillis();
        File xmlFile = new File(args[0]);
        File xsltFile = new File(args[1]);
        BufferedWriter outputWriter = new
          BufferedWriter(new FileWriter(args[2]));
        styleDocument(xmlFile, xsltFile, outputWriter);
        outputWriter.close();
        long executionTime =
          System.currentTimeMillis() - startTimeMs;
        System.err.println("Successful transformation took "
          + executionTime);
      }
      public static void styleDocument(File xmlFile,
        File xsltFile, Writer outputWriter)
        throws Exception
      {
        // start the output file
        outputWriter.write(
          "<?xml version=\"1.0\" encoding=\"UTF-8\"?>");
        outputWriter.write("<InsurancePolicyData>");
        // read the input file incrementally
        SAXReader reader = new SAXReader();
        reader.addHandler( "/InsurancePolicyData/Record",
          new SplitFileElementHandler(
            xsltFile, outputWriter));
        reader.read(xmlFile);
        // finish output file
        outputWriter.write("</InsurancePolicyData>");
      }
    }
    (I found it at http://www.devx.com/xml/Article/34677/1954)
    That's exactly what I was looking for, hope it helps others as well :)
    Edited by: Isana on Jun 4, 2009 7:56 AM

  • Weird (well for me anyway) SAX parsing problem

    Hi,
    I have to use SAX to parse huge XML files (the biggest is around 8 GB; genomic data) and load them into MySQL. I get an ArrayIndexOutOfBoundsException when I try to parse a perfectly valid string from an element after a few thousand entries.
    Let's say that my string is "G/T". I use split("/") to get the string array and read item[0] and item[1]. The error always comes up when I try to read item[1]. The weird part is that if I take out the part of the file that has already been read, parsing works perfectly up until a few thousand elements in, and then fails again with the same error.
    Any insights would be greatly appreciated.
    Sylvain Foisy, Ph. D.
    Bio-informatician
    Inflammgen.org
    Montreal Cardiology Institute
    Montreal,Qc

    Hi,
    In the characters() method from DefaultHandler, it seems that every once in a while the string returned is not read in full by the parser. The string returned for this element can only be something like "G/T". When the parser breaks, I get the string "G/" instead. The logic of the parser goes like this:
    -look for the tag OBSERVED;
    -when found, get the string attached to it (something like "G/T");
    -split it into two chars: G and T;
    -assign the chars to variables.
    I know that the problem comes from the parser not getting the whole string, just part of it, hence my ArrayIndexOutOfBoundsException.
    I am looking for a way to make the parser return the whole string.
    A+
    Sylvain Foisy, Ph. D.
    Bio-informatician
    Inflammgen.org
    Montreal Cardiology Institute
    Montreal,Qc
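
    One way around this, sketched under assumptions: characters() is allowed to deliver one text node in several pieces (for instance when the value straddles the parser's internal buffer), so the split("/") can be postponed until endElement, once all the pieces have been appended. The class name ObservedHandler and the alleles list are made up for the example; only the OBSERVED tag name comes from the post.

```java
import java.util.ArrayList;
import java.util.List;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;

public class ObservedHandler extends DefaultHandler {
    private final StringBuilder text = new StringBuilder();
    private boolean inObserved = false;
    // Hypothetical result list: one {first, second} pair per OBSERVED element.
    final List<String[]> alleles = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if ("OBSERVED".equals(qName)) {
            inObserved = true;
            text.setLength(0); // start a fresh value
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (inObserved) {
            text.append(ch, start, length); // only accumulate, never interpret here
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if ("OBSERVED".equals(qName)) {
            inObserved = false;
            String[] parts = text.toString().trim().split("/");
            if (parts.length == 2) { // skip malformed values instead of indexing blindly
                alleles.add(parts);
            }
        }
    }

    public static void main(String[] args) {
        // Simulates a parser that delivers "G/T" in two chunks.
        ObservedHandler h = new ObservedHandler();
        h.startElement("", "", "OBSERVED", new AttributesImpl());
        h.characters("G/".toCharArray(), 0, 2);
        h.characters("T".toCharArray(), 0, 1);
        h.endElement("", "", "OBSERVED");
        System.out.println(h.alleles.get(0)[0] + " " + h.alleles.get(0)[1]);
    }
}
```

    Splitting the first chunk "G/" on its own yields a one-element array, which matches the failure on item[1] described above.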

  • Please Help: JAX-RPC Problem (Web services Problem)

    Hi ,
    I am getting an error when I run the client for a static stub (JAX-RPC).
    Please tell me what the solution is.
    Errors:
    Exception in thread "main" java.lang.NoSuchMethodError
    at com.sun.xml.rpc.encoding.simpletype.XSDDateTimeDateEncoder.<clinit>(XSDDateTimeDateEncoder.java:195)
    at com.sun.xml.rpc.encoding.soap.StandardSOAPTypeMappings.<init>(StandardSOAPTypeMappings.java:563)
    at com.sun.xml.rpc.encoding.StandardTypeMappings.getSoap(StandardTypeMappings.java:32)
    at com.sun.xml.rpc.client.BasicService.createSoapMappings(BasicService.java:230)
    at com.sun.xml.rpc.client.BasicService.createStandardTypeMappingRegistry(BasicService.java:202)
    at sstub.MyFirstService_SerializerRegistry.getRegistry(MyFirstService_SerializerRegistry.java:25)
    at sstub.MyFirstService_Impl.<init>(MyFirstService_Impl.java:25)
    at sstub.MathClient.createProxy(MathClient.java:31)
    at sstub.MathClient.main(MathClient.java:12)

    I had the same error. It was caused in my case by using an incorrect version of Java.
    I was using JWSDP 1.5 with Java 1.3.1. I consulted the JWSDP documentation and discovered that it had only been tested with Java 1.4.2 and later.
    I switched to Java 1.4.2 and the problem was solved.

  • SAX parser problem, please help

    Hi!
    I'm using a SAX parser to parse an XML file, and the file parses successfully. After parsing, I want the value of a particular tag. For example, I have a tag such as
    <filtername>datasource</filtername>
    I want to get datasource after parsing the XML, without using DOM. How do I do it? Please help.

    Hi, I have one method that starts the parsing and a startElement handler.
    With these the complete XML gets parsed, but I still cannot work out how to collect the characters once it has been parsed.
    // Needs imports from java.io, javax.xml.parsers and org.xml.sax.
    // Parses the XML resource named by contextFile, with this object as the content handler.
    private void startParsing() {
        try {
            InputStream instream = getClass().getClassLoader().getResourceAsStream(contextFile);
            InputSource in = new InputSource(instream);
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            XMLReader reader = parser.getXMLReader();
            reader.setContentHandler(this);
            reader.parse(in);
        } catch (ParserConfigurationException pce) {
            System.out.println(pce);
        } catch (SAXException se) {
            System.out.println(se);
        } catch (IOException ie) {
            System.out.println(ie);
        }
    }

    // Called for every opening tag; so far it only logs the element names.
    public void startElement(String uri, String localName, String qName, Attributes attributes) {
        System.out.format("uri : %s localName %s qName %s %n", uri, localName, qName);
    }
    So now I have the element name, but I need the content between the opening and closing tags. For example,
    the element is name:
    <name>jack</name>
    I am successful in getting name, but now I want jack.
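
    One way this is commonly done, sketched under the assumption that the handler extends DefaultHandler: startElement() only announces the tag, the text between the tags arrives through characters(), and it is complete once endElement() fires. The class TagValueHandler, its values map and the filter root element are hypothetical names; filtername and name come from the posts above.

```java
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: remembers the text of each element, keyed by tag name.
class TagValueHandler extends DefaultHandler {
    private final StringBuilder current = new StringBuilder();
    final Map<String, String> values = new HashMap<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes) {
        current.setLength(0); // a new element starts: forget earlier text
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        current.append(ch, start, length); // text between the tags arrives here
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        // Closing tag reached: the buffered text belongs to this element.
        values.put(qName, current.toString().trim());
        current.setLength(0);
    }

    // Convenience wrapper (an assumption of this sketch): parse XML held in a String.
    static Map<String, String> parse(String xml) {
        try {
            TagValueHandler handler = new TagValueHandler();
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
            return handler.values;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Map<String, String> values = parse(
                "<filter><filtername>datasource</filtername><name>jack</name></filter>");
        System.out.println(values.get("filtername")); // prints "datasource"
    }
}
```

    If the same tag occurs several times, this sketch keeps only the last value; a list per tag name would be needed to keep them all.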

  • SAX parser problem (very odd)

    Hi,
    I'm trying to parse an XML file using SAX. It worked fine until I tested with a larger file (about 12 MB). In my characters() implementation I load the value into an object, but the value of the element that arrives in characters() is wrong: it arrives, but with fewer bytes.
    Explanation:
    I print the offset and the length of the element values with System.out, and most of the values come out fine, except for some that come one byte short:
    value : blabla , offset : 456 , length : 6
    value : blabl , offset : 6662 , length : 5
    Does anyone know what the hell is going on in this class?
    PS: I've extended the class org.xml.sax.helpers.DefaultHandler.
    PS2: the XML file is fine!! The values are OK!!!

    From the documentation for the characters method of org.xml.sax.ContentHandler:
    "SAX parsers may return all contiguous character data in a single chunk, or they may split it into several chunks..."

  • Java SAX mapping "&" problem

    Hi, I'm using SAX mapping, but I have a problem when some field, for example a company name, has "&" in its content:
         public void characters(char buf[], int offset, int len)
              throws SAXException {
              String s = new String(buf, offset, len);
    For the string "name m&s" it returns only "s". It loops through three chunks (the string is split into 3):
    1. "name m"
    2. "&"
    3. "s"
    and in this case I get only "s", which is completely wrong. How can I avoid that?
    br
    mario

    Hi
      Can you convert the character & to "&amp;" (written without spaces)?
      "&amp;" is the escape that XML requires for a literal & in content; otherwise, try another character set.
    Regards
    Ivá
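
    What is probably happening here, assuming the XML file already contains the escaped form &amp;: the parser is free to deliver the text around an entity reference in several characters() calls, and a handler that builds a new String on each call keeps only the last chunk. Below is a minimal sketch that appends every chunk to a buffer and reads it in endElement(). The company element and the CompanyNameHandler class are hypothetical names, and the number of chunks depends on the parser.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler for a <company> element (name chosen for illustration).
class CompanyNameHandler extends DefaultHandler {
    private final StringBuilder buffer = new StringBuilder();
    int chunks;  // number of characters() calls seen for the element
    String name; // complete text, available once endElement() has run

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if ("company".equals(qName)) {
            buffer.setLength(0);
            chunks = 0;
        }
    }

    @Override
    public void characters(char[] buf, int offset, int len) {
        chunks++;
        buffer.append(buf, offset, len); // append each chunk, do not overwrite
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if ("company".equals(qName)) {
            name = buffer.toString();
        }
    }

    // Convenience wrapper (an assumption of this sketch): parse XML held in a String.
    static CompanyNameHandler parse(String xml) {
        try {
            CompanyNameHandler handler = new CompanyNameHandler();
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
            return handler;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        CompanyNameHandler h = parse("<company>name m&amp;s</company>");
        // The chunk count varies between parsers; the assembled name should not.
        System.out.println(h.chunks + " chunk(s): " + h.name);
    }
}
```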

  • Performance problem

    Hi All,
    I have written a SELECT statement on the table COEP based on the objnr and vrgng fields (both are required).
    When I execute the query, it takes too much time even for a single record.
    Can you suggest an alternative table, or another way to get the data faster?
    Thanks in advance for your guidance
    Best Regards
    Sudhakar

    Hi All,
    Please find the code below:
      SELECT wtgbtr objnr vrgng
      FROM coep
      INTO TABLE itb_coep
      FOR ALL ENTRIES IN itb_aufk
      WHERE objnr = itb_aufk-objnr
      and vrgng ne 'KOAO'.
    The above query is taking too much time to fetch a single record.
    Thank you for your guidance
    Regards
    Sudhakar

  • To know Reporting time

    Hi,
    I have a problem. One of our reports is taking a long time, so there is a performance problem. To fix it I have to modify the update rule logic. I want to know how much time the report takes on the front end and on the back end. How can I find out? Can anyone tell me the procedure?
    Sridhar

    Sridhar,
    You can use the technical content,
    OR
    go to transaction ST03N.
    Go to RSRT, enter the query name and look at the technical properties; there you will find the query generation time.
    For the execution time, choose Execute and Debug and, in the options, select Display Statistics.
    Then execute the query.
    How to know the Query Execution time
    Query Run time

  • Database 10g partitioning performance reported

    Dear sir,
    I have a sales history table that holds 68 million records. As a data architect, I have partitioned the table by RANGE on the client ID, which is a sequence.
    Currently the table holds 120,000 physical partitions. The performance problem was reported by the developers when they tried to insert a bulk of 8 million records into the table. They are trying to do so in a new partition; I provided them with partitioning syntax to insert directly into the table, but it has been running for hours without completing.
    Only a PK on the ID is created; the tablespace is OK and server space is OK. I'm trying to move some of the data into a backup table, but this is a manual job that can't be done in production.
    Is there any restriction on the number of partitions in a table? I did some searching and found nothing specified.
    What would be a solution for this? Is there any other technology that I might adopt?
    Thank you for your support,
    IG

    user10651321 wrote:
    Hi Gurus,
    We have recently introduced partitioning on about 40 tables, each of which had grown to more than 5 GB, and I wanted to see what I should keep an eye on in AWR to spot any overall performance impact on the database.
    In other words, which sections of AWR, apart from Load Profile, should I look into to be able to say whether the impact on the database server was lesser or greater?
    Env: Linux, 10.2.0.4
    Regards,
    MS
    1) First of all, what type of partitioning have you implemented? List, Hash, Range or other?
    2) What was the idea behind implementing partitioning? Manageability or performance?
    3) Did you see any impact, i.e. any user complaints about query performance after you implemented partitioning?
    The more appropriate way to check the performance difference is to look at the explain plan and the trace file.
    The explain plan will show you whether partition pruning is happening; if not, performance will be nearly the same as with a non-partitioned table.
    Tracing with tkprof will show you whether you get any benefit in consistent/physical I/Os.
    With a local partitioned index you may see more consistent gets than normal.

  • Inbound processing Workflow r12.1.3

    Dear all,
    We would like to set up inbound processing in r12.1.3, so that once a notification is approved or rejected it is processed.
    I have created the folders PROCESS & DISCARD. Below are the issues that I am currently facing.
    1) If the Email Style in the user preferences is HTML mail or HTML mail with attachments, the mail is not sent,
    i.e. in wf_notifications, mail_status shows failed.
    Whereas if the Email Style is plain text, we receive the emails without the Approve, Reject etc. buttons (plain text).
    Thanks

    Thanks, the workflow log actually gave the issue details.
    [Jul 20, 2010 10:15:36 AM EST]:1279584936805:-1:-1:server.name:xxx.xxxx.xxx:-1:-1:1:20420:SYSADMIN(0):-1:Thread[outboundThreadGroup1,5,outboundThreadGroup]:1564257481:16495:1279583974073:12:ERROR:[SVC-GSM-WFMLRSVC-301317-10006 : oracle.apps.fnd.wf.mailer.SMTPMessageHandler.prepareMessages(String)]:FormatterException -> oracle.apps.fnd.wf.mailer.FormatterException: Problem parsing XML-> org.xml.sax.SAXException: Problem obtaining the RESOURCE content -> java.net.MalformedURLException
    at oracle.apps.fnd.wf.mailer.NotificationFormatter.handleResEndTag(NotificationFormatter.java:3470)
    at oracle.apps.fnd.wf.mailer.NotificationFormatter.endElement(NotificationFormatter.java:578)
    at oracle.xml.parser.v2.XMLContentHandler.endElement(XMLContentHandler.java:210)
    at oracle.xml.parser.v2.NonValidatingParser.parseElement(NonValidatingParser.java:1345)
    at oracle.xml.parser.v2.NonValidatingParser.parseRootElement(NonValidatingParser.java:362)
    I followed note ID 1116718.1 and the problem was resolved.
