TransformerHandler throws OutOfMemoryError with large xml files

i'm using TransformerHandler to convert any content to SAX events and transform it using XSLT into an XML file.
the problem is that for large amount of content i get a OutOfMemoryError.
it seams that the content is kept in memory and only flushed when i call handler.endDocument();
i tried using auto flush writers as the Result, or call the flush() method myself, but nothing.
here is the example - pls help!
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.xml.sax.helpers.AttributesImpl;
public class Test
      * test handler memory usage
      * @param loops no of loops - when large enogh - OutOfMemoryError !!!
      * @param xsltFilePath xslt file
      * @param targetXmlFile output xml file
      * @throws Exception
     public static void testHandlerMemUsage(int loops, String xsltFilePath, String targetXmlFile)throws Exception
          //verify SAX support
          TransformerFactory factory = TransformerFactory.newInstance();
          if(!factory.getFeature(SAXTransformerFactory.FEATURE))
               throw new UnsupportedOperationException("SAX tranformations not supported");
          TransformerHandler handler=
               ((SAXTransformerFactory)factory).newTransformerHandler(new StreamSource(xsltFilePath));
          handler.setResult(new StreamResult(targetXmlFile));
          handler.startDocument();
          handler.startElement(null,"root","root",new AttributesImpl());
          //loop
          for(int i=0;i<loops;i++)
               handler.startElement(null,"el-"+i,"el-"+i,new AttributesImpl());
               handler.characters("value".toCharArray(),0,"value".length());
               handler.endElement(null,"el-"+i,"el-"+i);
          handler.endElement(null,"root","root");
          //System.out.println("end document");
          //only after endDocument() starts to print..
          handler.endDocument();
          //System.out.println("ended document");
     public static void main(String[] args)throws Exception
          System.out.println("--starting..");
          testHandlerMemUsage(500000,"/copy.xslt","/testHandlerMemUsage.xml");
          System.out.println("--we are still here -- increase loops..");
}

Did you try increasing memeory when starting java with the -Xmx parameter? You know that java uses only 64MB by default, so you might need to increase it to e.g. 256MB for your XML to work.

Similar Messages

  • Does the parser work with large XML files?

    Is there a restriction on the XML file size that can be loaded into the parser?
    I am getting a out of memory exception reading in large XML file(10MB) using the commands
    DOMParser parser = new DOMParser();
    URL url = createURL(argv[0]);
    parser.setErrorStream(System.err);
    parser.setValidationMode(true);
    parser.showWarnings(true);
    parser.parse(url);
    Win NT 4.0 Server
    Sun JDK 1.2.2
    ===================================
    Error output
    ===================================
    Exception in thread "main" java.lang.OutOfMemoryError
    at oracle.xml.parser.v2.ElementDecl.getAttrDecls(ElementDecl.java, Compi
    led Code)
    at java.util.Hashtable.<init>(Unknown Source)
    at oracle.xml.parser.v2.DTDDecl.<init>(DTDDecl.java, Compiled Code)
    at oracle.xml.parser.v2.ElementDecl.getAttrDecls(ElementDecl.java, Compi
    led Code)
    at oracle.xml.parser.v2.ValidatingParser.checkDefaultAttributes(Validati
    ngParser.java, Compiled Code)
    at oracle.xml.parser.v2.NonValidatingParser.parseAttributes(NonValidatin
    gParser.java, Compiled Code)
    at oracle.xml.parser.v2.NonValidatingParser.parseElement(NonValidatingPa
    rser.java, Compiled Code)
    at oracle.xml.parser.v2.ValidatingParser.parseRootElement(ValidatingPars
    er.java:97)
    at oracle.xml.parser.v2.NonValidatingParser.parseDocument(NonValidatingP
    arser.java:199)
    at oracle.xml.parser.v2.XMLParser.parse(XMLParser.java:146)
    at TestLF.main(TestLF.java:40)
    null

    We have a number of test files that are that size and it works without a problem. However using the DOMParser does require significantly more memory than your doc size.
    What is the memory configuration of the JVM that you are running with? Have you tried increasing it? Are you using our latest version 2.0.2.6?
    Oracle XML Team

  • Problems with Large XML files

    I have tried increasing the memory pool using the -mx and -ms options. It doesnt work. I am using your latest XML parser for Java v2. Please let me know if there are some specific options I should be using.
    Thanx,
    -Sameer
    We have a number of test files that are that size and it works without a problem. However using the DOMParser does require significantly more memory than your doc size.
    What is the memory configuration of the JVM that you are running with? Have you tried increasing it? Are you using our latest version 2.0.2.6?
    Oracle XML Team
    Is there a restriction on the XML file size that can be loaded into the parser?
    I am getting a out of memory exception reading in large XML file(10MB) using the commands
    DOMParser parser = new DOMParser();
    URL url = createURL(argv[0]);
    parser.setErrorStream(System.err);
    parser.setValidationMode(true);
    parser.showWarnings(true);
    parser.parse(url);
    Win NT 4.0 Server
    Sun JDK 1.2.2
    ===================================
    Error output
    ===================================
    Exception in thread "main" java.lang.OutOfMemoryError
    at oracle.xml.parser.v2.ElementDecl.getAttrDecls(ElementDecl.java, Compi
    led Code)
    at java.util.Hashtable.<init>(Unknown Source)
    at oracle.xml.parser.v2.DTDDecl.<init>(DTDDecl.java, Compiled Code)
    at oracle.xml.parser.v2.ElementDecl.getAttrDecls(ElementDecl.java, Compi
    led Code)
    at oracle.xml.parser.v2.ValidatingParser.checkDefaultAttributes(Validati
    ngParser.java, Compiled Code)
    at oracle.xml.parser.v2.NonValidatingParser.parseAttributes(NonValidatin
    gParser.java, Compiled Code)
    at oracle.xml.parser.v2.NonValidatingParser.parseElement(NonValidatingPa
    rser.java, Compiled Code)
    at oracle.xml.parser.v2.ValidatingParser.parseRootElement(ValidatingPars
    er.java:97)
    at oracle.xml.parser.v2.NonValidatingParser.parseDocument(NonValidatingP
    arser.java:199)
    at oracle.xml.parser.v2.XMLParser.parse(XMLParser.java:146)
    at TestLF.main(TestLF.java:40)
    null

    You might try using a different JDK/JRE - either a 1.1.6+ or 1.3 version as 1.2 in our experience has the largest footprint. If this doesn't work can you give us some details about your system configuration. Finally you might try the SAX interface as it does not need to load the entire DOM tree into memory.
    Oracle XML Team

  • OSB - Iterating over large XML files with content streaming

    Hi @ll
    I have to iterate over all item in large XML files and insert into a oracle database.
    The file is about 200 MB and contains around 500'000, and I am using OSB 10gR3.
    The XML structure is something like this:
    <allItems>
    <item>.....</item>
    <item>.....</item>
    <item>.....</item>
    <item>.....</item>
    <item>.....</item>
    </allItems>
    Actually I thought about using a proxy service with enabled content streaming and a "for each" action for iterating
    over all items. But for this the whole XML structure has to be materialized into a variable otherwise it is not possible!
    More about streaming large files can be found here:
    [http://download.oracle.com/docs/cd/E13159_01/osb/docs10gr3/userguide/context.html#large_messages]
    There is written "When you enable streaming for large message processing, you cannot use the ... for each...".
    And for accessing single items you should should use an assign action with a xpath like "$body/allItems/item[1]";
    this works fine and not the whole XML stream has to be materialized.
    So my idea was to use the "for each" action and processing seqeuntially all items with a xpath like:
    $body/allItems/item[$counter]
    But the "for each" action just allows iterating over a sequence of xml items by defining an selection xpath
    and the variable that contains all items. I would like to have a "repeat until" construct that iterates as long
    $body/allItems/item[$counter] returns not null. Or can I use the "for each" action differently?
    Does the OSB provides any other iterating mechanism? I know there is this spli-join construct that supports
    different looping techniques, but as far I know it does not support content streaming, is this correct?
    Did I miss somehting?
    Thanks a lot for helping!
    Cheers
    Dani
    Edited by: user10095731 on 29.07.2009 06:41

    Hi Dani,
    Yes, according to me this would be the best approach. You can use content-streaming to pass this large xml to ejb and once it passes successfully EJB should operate on this. If you want any result back (for further routing), you can get it back from EJB.
    EJB gives you power of java to process this file and from java perspective 150 MB is not a very LARGE data. Ensure that you are using buffering. Check out this link for an explanation on Java IO Streams and, in particular, buffered streams -
    http://java.sun.com/developer/technicalArticles/Streams/ProgIOStreams/
    Try dom4J with xpp (XML Pull Parser) parser in case you have parsing requirement. We had worked with 1.2GB file using this technique.
    Regards,
    Anuj

  • Transform Large XML files with XSL

    HELP, LARGE XML FILES
    I have got 30 - 50 MB large xml file, and i would like to transform it
    with xslt, i tried but i have got OutOfMemory Exception.
    I tried to find out the solution on JAVA site, but i didn't find it.
    I can not displit my xml file. I hope for some help.
    I tried really everything.
    Thanks a lot

    What is your machine configuration ?
    The above 2 suggestions would help, but it does depend on how your software is written.
    Please post more info about your environment and object design.
    Chintan

  • Problem with parsing large xml files

    Hello All,
    I am parsing a large xml file of 20MB and I use DocumentBuilder.parse(File). This method works for small xml files with size less than 20MB but the application hangs and doesn't through any error message when parsing 20MB xml files. Please let me know what I have to do at this point ?
    Thanks & Regards,
    Kumar.

    Well... i can't agree.
    If you have such structure:
    <task>
      <task/>
      <task>
         <task>
            <task/>
         </task>
         <task/>
      </task>
    </task>
    ...you may always keep stack of tasks (at startElement push to top, and at endElement pop), so at every leaf of tree you will have all parents of that leaf.
    for such structure:
    <task id="1" parent="0"/>
    <task id="2" parent="1"/>
    <task id="3" parent="1"/>
    <task id="4" parent="2"/>
    <task id="5" parent="3"/>
    ...it will be much faster to go thro document by sax several times to build tree of tasks, than to load all document into memory...

  • Problem with parsing large XML files chunked over HTTP

    I'm trying to isolate a bug that was introduced when upgrading the JRE in use from Java 7u51 to 7u71 without changing any code. The problem appears to be very similar to: Bug ID: JDK-8027359 XML parser returns incorrect parsing results.
    Further investigation showed that it was also introduced in the same versions (7u71) where that fix was applied. Unlike that bug though, my XML is marked as version 1.0. It also appears to be with only large XML files, on the order of 10MB or so.
    The closest I've been able to narrow it down to is the code is using JAXB to unmarshall a stream that the debugger tells me is a org.apache.http.com.EofSensorInputStream / org.apache.http.impl.io.ChunkedInputStream. The exception I get is not consistent, but typically appears to be from chunks being overwritten or shuffled, resulting in letters appearing in attributes that are actually numbers, or like the following where an attribute "testAttribute" gets partially overwritten by the end of a timestamp that was in a different section of the XML.
    javax.xml.bind.UnmarshalException
    - with linked exception:
    [javax.xml.stream.XMLStreamException: ParseError at [row,col]:[1,98748]
    Message: Attribute name "testAttribu00Z" associated with an element type "testElement" must be followed by the ' = ' character.]
      at com.sun.xml.internal.bind.v2.runtime.unmarshaller.UnmarshallerImpl.handleStreamException(UnmarshallerImpl.java:421)
      at com.sun.xml.internal.bind.v2.runtime.unmarshaller.UnmarshallerImpl.unmarshal0(UnmarshallerImpl.java:357)
      at com.sun.xml.internal.bind.v2.runtime.unmarshaller.UnmarshallerImpl.unmarshal(UnmarshallerImpl.java:334)
    Caused by: javax.xml.stream.XMLStreamException: ParseError at [row,col]:[1,98748]
    Message: Attribute name "testAttribu00Z" associated with an element type "testElement" must be followed by the ' = ' character.
      at com.sun.org.apache.xerces.internal.impl.XMLStreamReaderImpl.next(XMLStreamReaderImpl.java:598)
      at com.sun.xml.internal.bind.v2.runtime.unmarshaller.StAXStreamConnector.bridge(StAXStreamConnector.java:181)
      at com.sun.xml.internal.bind.v2.runtime.unmarshaller.UnmarshallerImpl.unmarshal0(UnmarshallerImpl.java:355)
      ... 6 more
    Here's some code that seems to reproduce it if you can connect to an XML server that returns a large chunked XML file:
      SchemeRegistry registry = new SchemeRegistry();
      registry.register(
                    new Scheme("http", 80, PlainSocketFactory.getSocketFactory()));
      HttpClient client = new DefaultHttpClient(new BasicClientConnectionManager(registry));
      String url = "http://someUrlReturningAlargeChunkedXML";
      HttpGet method = new HttpGet(url);
      HttpResponse response = client.execute(method);
      InputStream inputStream = response.getEntity().getContent();
      XMLStreamReader responseReader = factory.createXMLStreamReader(inputStream);
      JAXBElement<JaxBObjectOfResponse> wot = unmarshaller.unmarshal(responseReader, JaxBObjectOfResponse.class);
    If you connect using URL.openStream() to the same service there is no error. If I read bytes directly and write to a file, there is no error. The error only happens when I try to unmarshal it, and it's large, and I'm using Java 7u71 (or later). It can be consistently repeated with the jsp webapp that I'm using, but didn't show the error when I used the same code with a Wikipedia dump XML file.
    How can I unmarshal in a different way to avoid this problem? Or, how can I better isolate the bug so it can be posted to the appropriate bug system?

    Apparently, adding the Woodstox XML libraries avoids the bug. Is there anyone who can reproduce this on another system? Was there any changes to the Stax implementation between u67 and u71 that may have introduced a bug like this?
    Edit: When setting the logging level to DEBUG, I once saw the overwritten buffer being logged as if that was what was received (as in the testAttribu00Z example above). I can't repeat that anymore though, and very rarely it does parses with no exception (though it may have still been corrupted). Now the error seems to be consistently on one of the buffer boundaries, as in:
    17:08:09,705 DEBUG wire:63 - << "2000[\r][\n]"
    17:08:09,705 DEBUG wire:77 - << "trend>....OTHER XML...<trend hours=""
    17:08:09,705 DEBUG wire:77 - << "634.0972777777778" datetime="2013-05-21T00:43:48.350Z" t"
    17:08:09,705 DEBUG wire:63 - << "[\r][\n]"
    17:08:09,705 DEBUG wire:63 - << "2000[\r][\n]"
    17:08:09,705 DEBUG wire:77 - << "rend-mode="0">
    Exception in thread "main" java.lang.NumberFormatException: t34.0972777777778
      at com.sun.xml.internal.bind.DatatypeConverterImpl._parseDouble(DatatypeConverterImpl.java:213)
      at mypackage.Trend_JaxbXducedAccessor_hours.parse(TransducedAccessor_field_Double.java:48)
      at com.sun.xml.internal.bind.v2.runtime.unmarshaller.StructureLoader.startElement(StructureLoader.java:194)
      at com.sun.xml.internal.bind.v2.runtime.unmarshaller.UnmarshallingContext._startElement(UnmarshallingContext.java:486)
      at com.sun.xml.internal.bind.v2.runtime.unmarshaller.UnmarshallingContext.startElement(UnmarshallingContext.java:465)
      at com.sun.xml.internal.bind.v2.runtime.unmarshaller.InterningXmlVisitor.startElement(InterningXmlVisitor.java:60)
      at com.sun.xml.internal.bind.v2.runtime.unmarshaller.StAXStreamConnector.handleStartElement(StAXStreamConnector.java:231)
      at com.sun.xml.internal.bind.v2.runtime.unmarshaller.StAXStreamConnector.bridge(StAXStreamConnector.java:165)
      at com.sun.xml.internal.bind.v2.runtime.unmarshaller.UnmarshallerImpl.unmarshal0(UnmarshallerImpl.java:355)
      at com.sun.xml.internal.bind.v2.runtime.unmarshaller.UnmarshallerImpl.unmarshal(UnmarshallerImpl.java:334)
    Or:
    17:19:12,563 DEBUG wire:63 - << "2000[\r][\n]"
    17:19:12,563 DEBUG wire:77 - << ...OTHER XML...<trend index="5"
    17:19:12,563 DEBUG wire:77 - << "" label="N"
    17:19:12,563 DEBUG wire:63 - << "[\r][\n]"
    Exception in thread "main" java.lang.NumberFormatException: Not a number: N
      at com.sun.xml.internal.bind.DatatypeConverterImpl._parseInt(DatatypeConverterImpl.java:106)
      at com.sun.xml.internal.bind.DatatypeConverterImpl._parseShort(DatatypeConverterImpl.java:118)

  • Veeery Slooow (Dom +quite large xml file)

    I have an xml file, where is 50000 to 100000 xml elements like this:
    <mittaus>
    <pvm>25.1.2006</nimi>
    <klo>12.10.27.234</klo>
    <arvo>44</arvo>
    </mittaus>
    When I try to print the contents of the XML file with this code:
    File file = new File(gv.FilePath() +nimi+".xml");
    Document doc = builder.parse(file);
        //Juurielementti
        Element root = doc.getDocumentElement();
        // Kaikki mittaus -lohkot
        NodeList time = root.getElementsByTagName("mittaus");
        for (int i=0; i < time.getLength() ; i++) {
        Element element = (Element)time.item(i);
        NodeList arvoList = element.getElementsByTagName("arvo");
        Element arvoElement = (Element)arvoList.item(0);
        String arvo = arvoElement.getFirstChild().getNodeValue();
        NodeList timeList = element.getElementsByTagName("klo");
        Element fileElement = (Element)timeList.item(0);
        String timeS = fileElement.getFirstChild().getNodeValue();
        NodeList dateList = element.getElementsByTagName("pvm");
        Element dateElement = (Element)dateList.item(0);
        String dateS = dateElement.getFirstChild().getNodeValue();
        palautus = arvo +";" +timeS +";" +dateS; The palautus -String is then printed on the screen.
    It takes about 1sec per couple elements, so the whole file printing takes about 20000 seconds!
    Is there something wrong with the code? If the file size over 100000 elements, I get also an out of memory failure (java.lang.memoryoutof... or something like that).
    What the heck is wrong? With smaller xml files (under 1000 elements) it works ok.

    If you use a SAX parsing methodology, then the memory problem goes away. We are using it here at work to parse large streams. It is very fast, and uses little resource.
    Open up your file as an InputStream, send it to the SAX parser, and handle the tags in the given event handler.
    Inside the event handler, send your data to your print or other output stream.
    Example fragments:
    import javax.xml.parsers.*;
    import org.xml.sax.*;
    import org.xml.sax.helpers.*;
    StringBuffer someStringCollector = new StringBuffer();
    //parser
    public void some xmlParser() throws Exception{
    DefaultHandler handler = new MyHandler();
    SAXParserFactory factory = SAXParserFactory.newInstance();
    factory.newSAXParser().parse((InputStream) XMLByteStream,DefaultHandler) handler);
    //parse event handler (not all events represented in the fragment)
    //simply overload the handler event methods as specified in http://java.sun.com/j2ee/1.4/docs/api/index.html
    class MyHandler extends Default Handler{
    //handle events
    public void startElement(String namespaceURI, String localName, String qName, Attributes atts){
    //code to deal with the start of the element.
    public void end Element(String namespaceURI, String localname, String qName){
    //send contents of current string buffer to printer, or other handling as necessary and reset charcter buffer of type StringBuffer perhaps.
    if(qName.equals("something")){
    //handle data
    someStringCollector.setLength(0);
    public void characters(char [] ch, int start, int length){
    //gather up data in a String Buffer for handling when endElement reached.
    someStringCollector.append(ch, start, length);
    Some good examples can be found at: http://userpage.fu-berlin.de/~ram/pub/pub_jf47htuHHt/java_sax_parser_en

  • PostMethod with large XML passed in setRequestEntity is truncated

    Hi,
    I use PostMethod to transfer large XML to a servlet.
    CharArrayWriter chWriter = new CharArrayWriter();
    post.marshal(chWriter);
    //The chWriter length is 120KB
    HttpClient httpClient = new HttpClient();
    PostMethod postMethod = new PostMethod(urlStr);
    postMethod.setRequestEntity(new StringRequestEntity(chWriter.toString(), null, null));
    postMethod.setRequestHeader("Content-type", "text/xml");
    int responseCode = httpClient.executeMethod(postMethod);
    String responseBody = postMethod.getResponseBodyAsString();
    postMethod.releaseConnection();
    When I open the request in doPost method in sevlet:
    Reader inpReader = request.getReader();
    char[] chars = MiscUtils.ReaderToChars(inpReader);
    inpReader = new CharArrayReader(chars);
    //The data is truncated (in chars[]) ???
    static public char[] ReaderToChars(Reader r) throws IOException, InterruptedException {
    //IlientConf.logger.debug("Start reading from stream");
    BufferedReader br = new BufferedReader(r);
    StringBuffer fileData = new StringBuffer("");
    char[] buf = new char[1024];
    int numRead=0;
    while((numRead=br.read(buf)) != -1){
    String readData = String.valueOf(buf, 0, numRead);
    fileData.append(readData);
    buf = new char[1024];
    return fileData.toString().toCharArray();
    Any idies what can be the problem??
    Lior.

    Hi,
    I use the same code and have 2 tomcats running with the same servlet on each of them.
    One running on Apache/2.0.52 (CentOS) Server and the second running on Apache/2.0.52 (windows XP) Server.
    I managed to post large XML from (CentOS) Server to (windows XP) Server successfully and failed to post large XML
    from (windows XP) Server to (CentOS) Server.
    I saw somthing called mod_isapi that might be disabling posting large XML files.
    Can anyone help me on going over that limitation?
    Thanks,
    Lior.

  • How to set SAXParser at command-line interface to create a large XML file

    Hi,
    I am trying to create a large XML file (more than 50 MB) by selecting from Oracle database but failed because of "out of memory" error. According to "Oracle XML Developer Guide", we should use SAXParser to parsing a large XML file. But there is no example to show how to set SAXParser at command-line
    Following is what I use to get xml files. It works only when the file is small.
    java OracleXML getXML -DateFormat -withDTD -rowsetTag PO_HDR -conn
    "jdbc:oracle:oci8:@server_name" -user "ID/password" "select * from table_name"
    When I set SAXParser at the way below,
    java oracle.xml.parser.v2.SAXParser OracleXML getXML -DateFormat -withDTD -rowsetTag PO_HDR -conn
    "jdbc:oracle:oci8:@server_name" -user "ID/password" "select * from table_name"
    it failed with the error message: "In class oracle.xml.parser.v2.SAXParser: void main(String argv[]) is not defined"
    Does anyone know how to solve the problem? I'll be appreciated very much for your help.
    Yi

    here are my ideas.
    register the xml schema.
    using xmldom, generate the desired xml output and return as xmltype.
    then you can use something like this to check.
    declare
    xmldoc xmltype ;
    begin
       -- populate xmldoc from you xmldom function
       -- validate against XML schema
       xmldoc.isSchemaValid(schema_url, root_element);
       if xmldoc.isSchemaValid = 1 then
            --valid schema
       else
            --invalid
       end if;
    end

  • I want to load large raw XML file in firefox and parse by DOM. But, for large XML file the firefox very slow some time crashed . Is there any option to increase DOM handling memory in Firefox

    Actually i am using an off-line form to load very large XML file and using firefox to load that form. But, its taking more time to load and some time the browser crashed. through DOM parsing this XML file to my form. Is there any option to increase DOM handler size in firefox

    Thank you for your suggestion. I have a question,
    though. If I use a relational database and try to
    access it for EACH and EVERY click the user makes,
    wouldn't that take much time to populate the page with
    data?
    Isn't XML store more efficient here? Please reply me.You have the choice of reading a small number of records (10 children per element?) from a database, or parsing multiple megabytes. Reading 10 records from a database should take maybe 100 milliseconds (1/10 of a second). I have written a web application that reads several hundred records and returns them with acceptable response time, and I am no expert. To parse an XML file of many megabytes... you have already tried this, so you know how long it takes, right? If you haven't tried it then you should. It's possible to waste a lot of time considering alternatives -- the term is "analysis paralysis". Speculating on how fast something might be doesn't get you very far.

  • Query in a large xml file

    Hello,
    I'm trying to work with very large xml files which are created from csv files. These files may be very large - up to 1 GB ! Untill now I have managed to do several validations on these big xml files, and the only thing that works for me is SAX parser, DOM is out of the question because it fills up memory.
    My next task is to do queries on these files, smth like:
    select field1,field2 from file.xml
    where field3 = 'A'
    and (fileld4>'B' or field1='C')
    order by field2.
    I searched the net about finding out how to make queries on xml files (since I have never done queries on xml before), but I couldn't find which "query language" is best for large files. If I use XPath (XSLT) will that not cause me memory problems because XSLT represents the file as a memory object?
    My idea is to parse the file with SAX and check every row if it fits the where condition and then write it immediately to a result xml file. But validating the where statement can be very complicated without using some tool. Also the order by statement is another problematic issue.
    Does anyone have some more intelegent ideas about how I can do this? Please help! :(
    The xml file looks like this:
    <doc>
    <row id ="1">
    <column id="1" name="column1">value</column>
    <column id="N" name="columnN">value</column>
    </row>
    <row id ="M">
    <column id="1" name="column1">value</column>
    <column id="N" name="columnN">value</column>
    </row>
    </doc>

    Hi all,
    Thank you very much for your replies.
    First, saxon didn't work because it uses an in-memory parser, and that is what I was trying to avoid.
    Different database is also out of the question, because the customer insist on XML, and also there are some files that can never be converted to a database table, because eventually with some transformations thay are changed and are not completely like the standard csv format.
    I think that maybe http://exist.sourceforge.net is the rigth solution for me, but I will probably try it in the next version of my project.
    For now I have managed to make the project with only SAXParser and a lot of back - end programming and it works ok, althoug it was very hard to make it, and will be harded to maintain, so I will try to look at the eXist project.
    Thanks everyone for the help.

  • Best technology to navigate through a very large XML file in a web page

    Hi!
    I have a very large XML file that needs to be displayed in my web page, may be as a tree structure. Visitors should be able to go to any level depth nodes and access the children elements or text element of those nodes.
    I thought about using DOM parser with Java but dropped that idea as DOM would be stored in memory and hence its space consuming. Neither SAX works for me as every time there is a click on any of the nodes, my SAX parser parses the whole document for the node and its time consuming.
    Could anyone please tell me the best technology and best parser to be used for very large XML files?

    Thank you for your suggestion. I have a question,
    though. If I use a relational database and try to
    access it for EACH and EVERY click the user makes,
    wouldn't that take much time to populate the page with
    data?
    Isn't XML store more efficient here? Please reply me.You have the choice of reading a small number of records (10 children per element?) from a database, or parsing multiple megabytes. Reading 10 records from a database should take maybe 100 milliseconds (1/10 of a second). I have written a web application that reads several hundred records and returns them with acceptable response time, and I am no expert. To parse an XML file of many megabytes... you have already tried this, so you know how long it takes, right? If you haven't tried it then you should. It's possible to waste a lot of time considering alternatives -- the term is "analysis paralysis". Speculating on how fast something might be doesn't get you very far.

  • Efficient searching in a large XML file for specific elements

    Hi
    How can I search in a large XML file for a specific element efficiently (fast and memory savvy?) I have a large (approximately 32MB with about 140,000 main elements) XML file and I have to search through it for specific elements. What stable and production-ready open source tools are available for such tasks? I think PDOM is a solution but I can't find any well-known and stable implementations on the web.
    Thanks in advance,
    Behrang Saeedzadeh.

    The problem with DOM parsers is that the whole document needs to be parsed!
    So with large documents this uses up a lot of memory.
    I suggest you look at sometthing like a pull parser (Piccolo or MPX1) which is a fast parser that is program driven and not event driven like SAX. This has the advantage of not needing to remember your state between events.
    I have used Piccolo to extract events from large xml based log files.
    Carl.

  • What are the best tools for opening very large XML files and examining the tree and confirming they are valid?

    I am generating some very large XML files (600,000+ lines, 50MB+ characters). I finally have them all being valid XML and valid UTF-8.
    But the files are so large Safari and Chrome will often not open them. FireFox will though.
    Instead of these browsers, I was wondering if there are there any other recommended apps for the Mac for opening and viewing the XML, getting an error message if they are not valid for some reason and examing the XML tree?
    I opened the file in the default app for XML which is Xcode, but that is just like opening it in a plain text editor. You can't expand/collapse the XML tree like you can with a browser, and it doesn't report errors.
    Thanks,
    Doug

    Hi Tom,
    I had not seen that list. I'll look it over.
    I'm also in touch with the developer of BBEdit (they are quite responsive) and they are willing to look at the file in question and see why it is not reporting UTF-8 errors while Chrome is.
    For now I have all the invalid characters quashed and things are working. But it would be useful in the future.
    By the by, some of those editors are quite pricey!
    doug

Maybe you are looking for