Too big XML file to parse

I have this big xml file which is 13MB which I am trying to parse by Java. The problem is that I get the message
java.lang.OutOfMemoryError: Java heap spaceI am loading from xml file as following:
DocumentBuilderFactory docBuilderFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder docBuilder = docBuilderFactory.newDocumentBuilder();
Document doc = docBuilder.parse (new File("xmlGeneSynonyms.xml"));  //Here I get problem!!!Do you have any suggestion how I can work this out?
thanx

I agree with the poster who said SAX may be a better choice. 13 MB, though (probably) smaller than the memory space of your machine, is still going to take a while to parse & build the nodes in memory. Unless you are going to keep the parsed XML tree in a memory for a LONG time and do a LOT with it, SAX is likely a better choice.
With SAX, you just parse a little at a time until you find what you're looking for, then quit. This would be a better wa to go, for example, if want only few parts of the 13 MB document.
And how many 13 MB documents do you have? If more than just one or two, you're going to take a huge performance hit by trying to parse each one & stuff the whole thing into memory if you're only interested in a small part of it.

Similar Messages

  • How quickly parse big XML file (60 MB) ???

    How quickly parse big XML file (60 MB) ???

    I assume you mean load it into XML DB ?. Fundamentally your document is about the upper limit for 9.2.x. I would strongly recommend trying to break it up into a set of smaller documents using a SAX parser before trying to load it into XML DB. In 10g it should be possible to load much bigger documents than this.

  • Slow extraction in big XML-Files with PL/SQL

    Hello,
    i have a performance problem with the extraction from attributes in big XML Files. I tested with a size of ~ 30 mb.
    The XML file is a response of a webservice. This response include some metadata of a document and the document itself. The document is inline embedded with a Base64 conversion.  Here is an example of a XML File i want to analyse:
    <soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
       <soap:Body>
          <ns2:GetDocumentByIDResponse xmlns:ns2="***">
             <ArchivedDocument>
                <ArchivedDocumentDescription version="1" currentVersion="true" documentClassName="Allgemeines Dokument" csbDocumentID="***">
                   <Metadata archiveDate="2013-08-01+02:00" documentID="123">
                      <Descriptor type="Integer" name="fachlicheId">
                         <Value>123<Value>
                      </Descriptor>
                      <Descriptor type="String" name="user">
                         <Value>***</Value>
                      </Descriptor>
                      <InternalDescriptor type="Date" ID="DocumentDate">
                         <Value>2013-08-01+02:00</Value>
                      </InternalDescriptor>
                      <!-- Here some more InternalDescriptor Nodes -->
                   </Metadata>
                   <RepresentationDescription default="true" description="Description" documentPartCount="1" mimeType="application/octet-stream">
                      <DocumentPartDescription fileName="20mb.test" mimeType="application/octet-stream" length="20971520 " documentPartNumber="0" hashValue=""/>
                   </RepresentationDescription>
                </ArchivedDocumentDescription>
                <DocumentPart mimeType="application/octet-stream" length="20971520 " documentPartNumber="0" representationNumber="0">
                   <Data fileName="20mb.test">
                      <BinaryData>
                        <!-- Here is the BASE64 converted document -->
                      </BinaryData>
                   </Data>
                </DocumentPart>
             </ArchivedDocument>
          </ns2:GetDocumentByIDResponse>
       </soap:Body>
    </soap:Envelope>
    Now i want to extract the filename and the Base64 converted document from this XML response.
    For the extraction of the filename i use the following command:
    v_filename := apex_web_service.parse_xml(v_xml, '//ArchivedDocument/ArchivedDocumentDescription/RepresentationDescription/DocumentPartDescription/@fileName');
    For the extraction of the binary data i use the following command:
    v_clob := apex_web_service.parse_xml_clob(v_xml, '//ArchivedDocument/DocumentPart/Data/BinaryData/text()');
    My problem is the performance of this extraction. Here i created some summary of the start and end time for the commands:
    Start Time
    End Time
    Difference
    Command
    10.09.13 - 15:46:11,402668000
    10.09.13 - 15:47:21,407895000
    00:01:10,005227
    v_filename_bcm := apex_web_service.parse_xml(v_xml, '//ArchivedDocument/ArchivedDocumentDescription/RepresentationDescription/DocumentPartDescription/@fileName');
    10.09.13 - 15:47:21,407895000
    10.09.13 - 15:47:22,336786000
    00:00:00,928891
    v_clob := apex_web_service.parse_xml_clob(v_xml, '//ArchivedDocument/DocumentPart/Data/BinaryData/text()');
    As you can see the extraction of the filename is slower then the document extraction. For the Extraction of the filename i need ~01
    I wonder about it and started some tests.
    I tried to use an exact - non dynamic - filename. So i have this commands:
    v_filename := '20mb_1.test';
    v_clob := apex_web_service.parse_xml_clob(v_xml, '//ArchivedDocument/DocumentPart/Data/BinaryData/text()');
    Under this Conditions the time for the document extraction soar. You can see this in the following table:
    Start Time
    End Time
    Difference
    Command
    10.09.13 - 16:02:33,212035000
    10.09.13 - 16:02:33,212542000
    00:00:00,000507
    v_filename_bcm := '20mb_1.test';
    10.09.13 - 16:02:33,212542000
    10.09.13 - 16:03:40,342396000
    00:01:07,129854
    v_clob := apex_web_service.parse_xml_clob(v_xml, '//ArchivedDocument/DocumentPart/Data/BinaryData/text()');
    So i'm looking for a faster extraction out of the xml file. Do you have any ideas? If you need more informations, please ask me.
    Thank you,
    Matthias
    PS: I use the Oracle 11.2.0.2.0

    Although using an XML schema is a good advice for an XML-centric application, I think it's a little overkill in this situation.
    Here are two approaches you can test :
    Using the DOM interface over your XMLType variable, for example :
    DECLARE
      v_xml    xmltype := xmltype('<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/"> 
           <soap:Body> 
              <ns2:GetDocumentByIDResponse xmlns:ns2="***"> 
                 <ArchivedDocument> 
                    <ArchivedDocumentDescription version="1" currentVersion="true" documentClassName="Allgemeines Dokument" csbDocumentID="***"> 
                       <Metadata archiveDate="2013-08-01+02:00" documentID="123"> 
                          <Descriptor type="Integer" name="fachlicheId"> 
                             <Value>123</Value> 
                          </Descriptor> 
                          <Descriptor type="String" name="user"> 
                             <Value>***</Value> 
                          </Descriptor> 
                          <InternalDescriptor type="Date" ID="DocumentDate"> 
                             <Value>2013-08-01+02:00</Value> 
                          </InternalDescriptor> 
                          <!-- Here some more InternalDescriptor Nodes --> 
                       </Metadata> 
                       <RepresentationDescription default="true" description="Description" documentPartCount="1" mimeType="application/octet-stream"> 
                          <DocumentPartDescription fileName="20mb.test" mimeType="application/octet-stream" length="20971520 " documentPartNumber="0" hashValue=""/> 
                       </RepresentationDescription> 
                    </ArchivedDocumentDescription> 
                    <DocumentPart mimeType="application/octet-stream" length="20971520 " documentPartNumber="0" representationNumber="0"> 
                       <Data fileName="20mb.test"> 
                          <BinaryData> 
                            ABC123 
                          </BinaryData> 
                       </Data> 
                    </DocumentPart> 
                 </ArchivedDocument> 
              </ns2:GetDocumentByIDResponse> 
           </soap:Body> 
        </soap:Envelope>');
      domDoc    dbms_xmldom.DOMDocument;
      docNode   dbms_xmldom.DOMNode;
      node      dbms_xmldom.DOMNode;
      nsmap     varchar2(2000) := 'xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/" xmlns:ns2="***"';
      xpath_pfx varchar2(2000) := '/soap:Envelope/soap:Body/ns2:GetDocumentByIDResponse/';
      istream   sys.utl_characterinputstream;
      buf       varchar2(32767);
      numRead   pls_integer := 1;
      filename       varchar2(30);
      base64clob     clob;
    BEGIN
      domDoc := dbms_xmldom.newDOMDocument(v_xml);
      docNode := dbms_xmldom.makeNode(domdoc);
      filename := dbms_xslprocessor.valueOf(
                    docNode
                  , xpath_pfx || 'ArchivedDocument/ArchivedDocumentDescription/RepresentationDescription/DocumentPartDescription/@fileName'
                  , nsmap
      node := dbms_xslprocessor.selectSingleNode(
                docNode
              , xpath_pfx || 'ArchivedDocument/DocumentPart/Data/BinaryData/text()'
              , nsmap
      --create an input stream to read the node content :
      istream := dbms_xmldom.getNodeValueAsCharacterStream(node);
      dbms_lob.createtemporary(base64clob, false);
      -- read the content in 32k chunk and append data to the CLOB :
      loop
        istream.read(buf, numRead);
        exit when numRead = 0;
        dbms_lob.writeappend(base64clob, numRead, buf);
      end loop;
      -- free resources :
      istream.close();
      dbms_xmldom.freeDocument(domDoc);
    END;
    Using a temporary XMLType storage (binary XML) :
    create table tmp_xml of xmltype
    xmltype store as securefile binary xml;
    insert into tmp_xml values( v_xml );
    select x.*
    from tmp_xml t
       , xmltable(
           xmlnamespaces(
             'http://schemas.xmlsoap.org/soap/envelope/' as "soap"
           , '***' as "ns2"
         , '/soap:Envelope/soap:Body/ns2:GetDocumentByIDResponse/ArchivedDocument/DocumentPart/Data'
           passing t.object_value
           columns filename    varchar2(30) path '@fileName'
                 , base64clob  clob         path 'BinaryData'
         ) x

  • Big XML Files

    Hello,
    does anybody have experience with large XML documents in oracle XML DB? I think bigger than the purchaseOrder example, for example a whole book. I am looking for a storage method for documents at least 10 MByte large.
    Thanks
    Krisztian

    Sometimes the chapters are so big like a whole book. (ca 200-300 pages, legislation commentaries). What I'm looking for is a way to store big XML files and access them flexible at different levels. E.g. A law can have 50 articles and some times even 2400 articles. If I need to share the editing work the editor can get the whole document but sometimes even only fragments. Much better would be if more than one editor could work at one document, even at different fragments. But the fragments must be created dinamically.

  • Building big XML file from scratch - Urgent

    Oracle 8.1.7.3 on windows NT platform
    What is the best way to generate a quiet big XML file from multiple tables ?
    I have information stored in many relational tables from which I need to generate a XML flat file either stored in a CLOB field or in a text file in a system directory. This XML file will be then used
    as an input to generate a report either in HTML using XSLT, or in a PDF file using apache-fop or in a MS Excel file using SoftArtisan ExcelWriter.
    My XML file has many levels in it structure, I mean that it is composed of one root element with 2 children, each children has a 3 children. One on these 3 children has 2 children and so on. Actually there are more or less 10 nested levels.
    To generate this XML file, I tried to use XSU for PL/SQL with the nested cursor() feature plus XSLT to transform the raw XML file to my requirements.
    But obviously there are some limitations using this method. Particularly, if the inner cursor returns an empty set I get exhausted resultset java error... A TAR confirmed that limitation.
    So I had to give up this method to use basic nested PL/SQL cursors. Each fetched row is then inserted into a table (varchar2) with a sequence number so that with a cursor like select xml_chunk from my_table order by sequence, I get the whole XML file that I save either in a flat file or in a CLOB (using append method).
    This method works fine, but it takes time and it's not flexible at all as I have to construct each XML tag. I guest this way of proceeding is not the more efficient...
    Using DOM method won't be better as I still need PL/SQL cursor to select each level of my XML structure and in addition I might for sure encounter a problem of memory.
    So what solutions would you suggest to generate this XML file. It must be quiet fast. The XML file can be up to 2Mo big. My system is actually a kind of on-the-fly reports generation. I mean that the XML file needs to be created with up-to-date data many times during the working hours !
    Quick answers or suggestions would be greatly appreciated. It's very urgent !!
    Thanks

    I looks like the best way is to using the SAX processing for your application? Do you know the DTD or XML schema of your output XML document?
    Would you send me the sample code for the method "to use XSU for PL/SQL with the nested cursor() feature plus XSLT to transform the raw XML file" to reproduce the problem?

  • Breaking BIG XML files in to 4 different XML Files

    Hi:
    What will be a problem if I break the BIG XML file in to a number of different XML FILES?
    My main reason is to create XMLVIEW.
    Please help
    ALI_2

    Would Suite Spot Studios "AA Translator" achieve this?  If you navigate to his web site you can test the app by submitting a test project to him and he/they will attempt to "convert" it for you.  (Only one of you, you or your buddy, would need to buy and have the app installed).
    I realise this is something of a "long shot" since "suitespot" does not use a Mac and this "new" session format (.sesx) may not be readable by his app.
    Jeff

  • SQL*Loader problem - not efficient, parsing error for big xml files

    Hi Experts,
    First of all, I would like to store xml files in object relation way. Therefore I created a schema and a table for it (see above).
    I wants to propagate it (by using generated xml files), hence I created a control file for sql loader (see above).
    I have two problems for it.
    1, It takes a lot of time. It means I can upload a ~80MB file in 2 hours and a half.
    2, At bigger files, I got the following error messages (OCI-31011: XML parsing failed OCI-19202: Error occurred in XML processing LPX-00243: element attribute value must be enclosed in quotes). It is quite interesting because my xml file is generated and I could generated and uploaded the first and second half of the file.
    Can you help me to solve these problems?
    Thanks,
    Adam
    Control file
    UNRECOVERABLE
    LOAD DATA
    CHARACTERSET UTF8
    INFILE *
    APPEND
    INTO TABLE coll_xml_objrel
    XMLTYPE(xml)
    FIELDS
    ident constant 2
    ,file_name filler char(100)
    ,xml LOBFILE (file_name) TERMINATED BY EOF
    BEGINDATA
    generated1000x10000.xml
    Sql Loader command
    sqlldr.exe username/password@//localhost:1521/SID control='loader.ctl' log='loadr.log' direct=true
    Schema
    <?xml version="1.0" encoding="UTF-8"?>
    <schema targetNamespace="http://www.something.com/shema/simple_searches" elementFormDefault="qualified" xmlns="http://www.w3.org/2001/XMLSchema" xmlns:tns="http://www.something.com/shema/simple_searches">
        <element name="searches" type="tns:searches_type"></element>
        <element name="search" type="tns:search_type"></element>
        <element name="results" type="tns:results_type"></element>
        <element name="result" type="tns:result_type"></element>
        <complexType name="searches_type">
            <sequence>
                <element ref="tns:search" maxOccurs="unbounded"></element>
            </sequence>
        </complexType>
        <complexType name="search_type">
            <sequence>
                <element ref="tns:results"></element>
            </sequence>
            <attribute ref="tns:id" use="required"></attribute>
            <attribute ref="tns:type" use="required"></attribute>
        </complexType>
        <complexType name="results_type">
            <sequence maxOccurs="unbounded">
                <element ref="tns:result"></element>
            </sequence>
        </complexType>
        <complexType name="result_type">
            <attribute ref="tns:id" use="required"></attribute>
        </complexType>
        <simpleType name="type_type">
            <restriction base="string">
                <enumeration value="value1"></enumeration>
                <enumeration value="value2"></enumeration>
            </restriction>
        </simpleType>
        <attribute name="type" type="tns:type_type"></attribute>
        <attribute name="id" type="string"></attribute>
    </schema>
    Create table
    create table coll_xml_objrel
    ident Number(20) primary key,
    xml xmltype)
    Xmltype column xml
    store as object relational
    xmlschema "http://www.something.com/schema/simple_searches.xsd"
    Element "searches";

    Hi Odie_63,
    Thanks for your answer.
    I will post this question in the XML DB forum too (edit: I realized that you have done it. Thanks for it).
    1, Version: Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - 64bit Production
    2, see above
    3, I have registered my schema with using dbms_xmlschema.registerSchema function.
    Cheers,
    Adam
    XML generator:
    import java.io.FileNotFoundException;
    import java.io.FileOutputStream;
    import javax.xml.stream.XMLOutputFactory;
    import javax.xml.stream.XMLStreamException;
    import javax.xml.stream.XMLStreamWriter;
    public class mainGenerator {
        public static void main(String[] args) throws FileNotFoundException, XMLStreamException {
            // TODO Auto-generated method stub
            final long numberOfSearches = 500;
            final long numberOfResults = 10000;
            XMLOutputFactory xof = XMLOutputFactory.newFactory();
            XMLStreamWriter writer = xof.createXMLStreamWriter(new FileOutputStream("C:\\Working\\generated500x10000.xml"));
            writer.writeStartDocument();
            writer.writeStartElement("tns","searches", "http://www.something.com/schema/simple_searches");
            writer.writeNamespace("tns", "http://www.something.com/schema/simple_searches");
            for (long i = 0; i < numberOfSearches; i++){
                Long help = new Long(i);
                writer.writeStartElement("tns","search", "http://www.something.com/schema/simple_searches);
                writer.writeAttribute("tns", "http://www.something.com/schema/simple_searches", "type", "value1");
                writer.writeAttribute("tns", "http://www.something.com/schema/simple_searches", "id", help.toString());
                writer.writeStartElement("tns","results", "http://www.something.com/schema/simple_searches");
                for (long j = 0; j < numberOfResults; j++){
                    writer.writeStartElement("tns","result", "http://www.something.com/schema/simple_searches");
                    Long helper = new Long(i*numberOfResults+j);
                    writer.writeAttribute("tns", "http://www.something.com/schema/simple_searches", "id", helper.toString());
                    writer.writeEndElement();
                writer.writeEndElement();
                writer.writeEndElement();
            writer.writeEndElement();
            writer.writeEndDocument();
            writer.close();
    registerSchema:
    begin
    dbms_xmlschema.registerSchema(
    'http://www.something.com/schema/simple_searches',
    '<?xml version="1.0" encoding="UTF-8"?>
    <schema targetNamespace="http://www.something.com/schema/simple_searches" elementFormDefault="qualified" xmlns="http://www.w3.org/2001/XMLSchema" xmlns:tns="http://www.something.com/schema/simple_searches">
        <element name="searches" type="tns:searches_type"></element>
        <element name="search" type="tns:search_type"></element>
        <element name="results" type="tns:results_type"></element>
        <element name="result" type="tns:result_type"></element>
        <complexType name="searches_type">
            <sequence>
                <element ref="tns:search" maxOccurs="unbounded"></element>
            </sequence>
        </complexType>
        <complexType name="search_type">
            <sequence>
                <element ref="tns:results"></element>
            </sequence>
            <attribute ref="tns:id" use="required"></attribute>
            <attribute ref="tns:type" use="required"></attribute>
        </complexType>
        <complexType name="results_type">
            <sequence maxOccurs="unbounded">
                <element ref="tns:result"></element>
            </sequence>
        </complexType>
        <complexType name="result_type">
            <attribute ref="tns:id" use="required"></attribute>
        </complexType>
        <simpleType name="type_type">
            <restriction base="string">
                <enumeration value="value1"></enumeration>
                <enumeration value="value2"></enumeration>
            </restriction>
        </simpleType>
        <attribute name="type" type="tns:type_type"></attribute>
        <attribute name="id" type="string"></attribute>
    </schema>',
    TRUE, TRUE, FALSE, FALSE);
    end

  • How to read a very BIG XML-File

    Hello together,
    how can I read a XML file of e.g. 2 GByte in ABAP ( Release 4.6C ). The file should be read from the Application-Server ( not from the Front-End ).
    Problem: Too much MEMORY is needed by the upload of the complete file into an internal table. In order to produce a stream, the complete file must be uploaded. The parser works with parse_event ( Event-triggered ) and is not creating a DOM. The parsing itself is no problem.
    Possible solution: Feed the stream with parts of the file. But how !?
    Here is some coding of the program:
      data: l_xml_filename    type localfile,
            l_event_sub       type i,
            l_boolean         type c,
            l_subrc           type i.
    Datei mit Größe in XML-Tabelle einlesen:
    l_xml_filename = 'I:TEMPGeboIsp-PSGK37.xml'.
      perform read_file_in_xml_table using    p_xml_filename
                                     changing g_xml_table
                                              g_xml_size.
    XML-Dokument aus Tabelle bilden:     =========================
    iXML factory erzeugen:
      go_ixml = cl_ixml=>create( ).
    stream factory erzeugen:
      p_streamfactory = go_ixml->create_stream_factory( ).
    XML-Dokument erzeugen:
      p_xml_document = go_ixml->create_document( ).
    Input-Stream erzeugen:
      p_istream = p_streamfactory->create_istream_itable(
                          table = g_xml_table
                          size  = g_xml_size ).
      p_parser = go_ixml->create_parser( stream_factory = p_streamfactory
                                         istream        = p_istream
                                         document       = p_xml_document ).
      l_event_sub = if_ixml_event=>co_event_element_pre +
                    if_ixml_event=>co_event_element_post.
      call method p_parser->set_event_subscription( l_event_sub ).
      l_boolean = p_parser->set_dom_generating( ' ' ).
    The program works fine but needs too much memory because of the big internal table.
    I would be very happy if somebody could help me!
    Thanks in advance!
    Thomas
    P.s.: German answers are welcome!  
    Message was edited by:
            Thomas13 Scheuermann
    Message was edited by:
            Thomas13 Scheuermann

    Hello myself,
    nobody has answered my question, so now I answer myself!!  
    The wrong part is to read the file with "open dataset" and to create the inputstream with
    p_istream = p_streamfactory->create_istream_itable(
    table = g_xml_table
    size = g_xml_size ).
    Better ist to create the inputstream with
    p_istream = p_streamfactory->create_istream_uri(
    .......................PUBLIC_ID = ''
    .......................SYSTEM_ID = '
    applserver\I$\TEMP\Datei.XML' ).
    In this way no space is needed for the file.
    Best regards,
    Thomas
    Message was edited by:
            Thomas13 Scheuermann

  • Loading XML file for parsing

    Folks,
    I have placed one xxx.XML file under /WEB-INF/ of my application server and i have configured a servlet in web.xml file to execute during server startup like below
    <servlet>
    <servlet-name>MonitorInitServlet</servlet-name>
    <servlet-class>com.xxx.xxx..services.MonitorInitServlet</servlet-class>
    <init-param>
    <param-name>configFile</param-name>
    <param-value>/WEB-INF/portalMonitorConfig.xml</param-value>
    </init-param>
    <load-on-startup>1</load-on-startup>
    </servlet>
    And the above servlet loads portalMonitorConfig.xml file and i have below code
    InputStream inputStream = xxx.class.getResourceAsStream("/WEB-INF/portalMonitorConfirg.xml");
    private static Document parseXmlFile(InputStream xml) throws SAXException, IOException,
    ParserConfigurationException
    DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
    factory.setValidating(false);
    return factory.newDocumentBuilder().parse(xml);
    So when i do the above , i get
    java.lang.IllegalArgumentException :InputStream cannot be null,
    I know its sending a null InputStream, By the way i need to know what is the issue in the above code when i create InputStream object
    InputStream inputStream = xxx.class.getResourceAsStream("/WEB-INF/portalMonitorConfig.xml");
    Any help with the code is really appreciated

    when serializing the DOM, use this: transformer.setOutputProperty(OutputKeys.DOCTYPE_SYSTEM, scrReg.dtd");
    Merry Chrasmas too ;-)

  • Cannot read big xml file

    I am working on a swing application in which i have to populate data in a jcombobox (onchange event)
    from an .xml file .but the application gets hanged when the size of xml is big (350-400 records)

    Welcome to the Sun forums.
    >
    I am working on a swing application in which i have to populate data in a jcombobox (onchange event) from an .xml file .but the application gets hanged when the size of xml is big (350-400 records)>Just what are these 'records', great works of literature?!? I cannot believe that any Java based XML API (even a DOM based one) would suffer an OutOfMemoryError on an XML document that was any less than 350-400 thousand records, for any sensibly sized record.
    But as a general tip. Some XML APIs - notably DOM based - will need to create a much larger object in memory, than for example, using SAX to parse XML.
    If even a SAX based API is failing, it is possibly due to a memory leak in the code, rather than the actual size of the XML.
    Edit 1:
    Now that I reread BDLH's reply, I think it more closely addresses the actual problem (frozen GUI, not OOME). Even then, my query on the number of records stands, surely it cannot be less than 500 records that is 'freezing' the GUI?
    Edited by: AndrewThompson64 on Apr 29, 2009 3:33 PM

  • Create Big XML files ( extract ) from Relational Tables

    Experts: I need to create a big XML extract more than 5Gb , from relations tables using SQLX. I read the excellent FAQ given by MDrake in the following thread.
    https://forums.oracle.com/thread/418001
    Question
    1) Is it better to use XML schema, My XML output format is pretty much going to be static, so I can register an XML schema .
    2) Does Registering the XMLschema help with better memory management. I recall I used to get out of memory exception when I generated xml documents on oracle 10g using DBMS_XMLGEN.
    3) Can I generate this 5 Gb of XML file using oracle's default DOM parser?
    Thanks
    Kevin

    Hi Kevin,
    1) Is it better to use XML schema, My XML output format is pretty much going to be static, so I can register an XML schema .
    2) Does Registering the XMLschema help with better memory management. I recall I used to get out of memory exception when I generated xml documents on oracle 10g using DBMS_XMLGEN.
    No, an XML schema won't help for the generation.
    It is useful though if you're looking for the opposite task, i.e. loading an XML file into database tables.
    3) Can I generate this 5 Gb of XML file using oracle's default DOM parser?
    What is the default DOM parser ? Do you mean DBMS_XMLDOM APIs?
    Since you want to generate XML, there's not much to parse.
    Generally, using SQL/XML functions is the way to go.
    You may still hit some performance/memory issues while reaching such a size, especially with large XMLAgg aggregation context.
    If you do, you may switch to chunk generation instead. I've got some pretty good result with this approach and the parallel query feature.

  • Loading big XML files using JDBC gives errors

    Hi,
    I've created a XMLType table using binary storage, with the restriction that any document stored has a (any) schema:
    CREATE TABLE XMLBIN OF XMLTYPE
    XMLTYPE STORE AS BINARY XML
    ALLOW ANYSCHEMA;Then I use JDBC to store a relatively large document using the following code:
    Class.forName("oracle.jdbc.driver.OracleDriver").newInstance();
    String connectionString = "jdbc:oracle:thin:@host:1521:sid";
    File f = new File("c:\\temp\\big.xml");
    Connection conn = DriverManager.getConnection(connectionString, "username", "password");
    XMLType xml = XMLType.createXML(conn,new FileInputStream(f));
    String statementText = "INSERT INTO xmlbin VALUES (?)";
    OracleResultSet resultSet = null;
    OracleCallableStatement statement = (OracleCallableStatement)conn.prepareCall(statementText);
    statement.setObject(1,xml);
    statement.execute();
    statement.close();
    conn.commit();
    conn.close();Loading a file of 61Mb (real Mb, in non-IT Mb (where 1Mb seems to be 10^6) it is 63.9Mb) or less doesn't give any errors, loading a file bigger then that gives the following error:
    java.sql.SQLRecoverableException: Io exception: Software caused connection abort: socket write error
            at oracle.jdbc.driver.SQLStateMapping.newSQLException(SQLStateMapping.java:101)
            at oracle.jdbc.driver.DatabaseError.newSQLException(DatabaseError.java:112)
            at oracle.jdbc.driver.DatabaseError.throwSqlException(DatabaseError.java:173)
            at oracle.jdbc.driver.DatabaseError.throwSqlException(DatabaseError.java:229)
            at oracle.jdbc.driver.DatabaseError.throwSqlException(DatabaseError.java:458)
            at oracle.jdbc.driver.T4CCallableStatement.executeForRows(T4CCallableStatement.java:960)
            at oracle.jdbc.driver.OracleStatement.doExecuteWithTimeout(OracleStatement.java:1222)
            at oracle.jdbc.driver.OraclePreparedStatement.executeInternal(OraclePreparedStatement.java:3381)
            at oracle.jdbc.driver.OraclePreparedStatement.execute(OraclePreparedStatement.java:3482)
            at oracle.jdbc.driver.OracleCallableStatement.execute(OracleCallableStatement.java:3856)
            at oracle.jdbc.driver.OraclePreparedStatementWrapper.execute(OraclePreparedStatementWrapper.java:1373)
            at jdbctest.Tester.main(Tester.java:60)A succesful insert of a 63Mb file takes about 23 seconds to execute. The 70Mb file fails already after a few seconds, so I'm ruling out any time outs.
    I'm guessing there are some buffers that need to be enlarged, but don't have a clue which ones.
    Anyone any idea what might cause the problem and how to resolve?
    My server runs Oracle 11g Win32. The client is Windows running Sun Java 1.6, using ojdbc6.jar and Oracle 11g Client installed.
    Cheers,
    Harald

    Hi Mark,
    The trace log in the OEM shows me:
    Errors in file d:\oracle11g\app\helium\diag\rdbms\helium\helium\trace\helium_ora_6948.trc  (incident=7510): ORA-07445: exception encountered: core dump [__intel_new_memcpy()+613] [ACCESS_VIOLATION] [ADDR:0x0] [PC:0x6104B045] [UNABLE_TO_WRITE] []  If needed I can post the full contents (if I find out how, am still a novice :-))
    Cheers,
    Harald

  • How to indicate the Schema path for a XML file when parsing?

    I have to validate a XML file. At the header line of this file, I need to specificate only the name of the schema but not the full path:
    <DATAMODULES xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="Datamodules.xsd">
    When validating, there is any way to specify to the parser the correct location of the XSD file?
    I know I could copy the XSD file to the XML file directory or vice versa, but I cannot do that and I need a software solution.
    Could someone help me?

    An External Parsed Entity could be used to reference a schema.
    In your DTD:
    <!ENTITY datamodules SYSTEM "file:///c:/Datamodules/Datamodules.xsd">
    Refer the external entity in xml document:
    <datamodules>&datamodules;</datamodules>

  • Junk Characters in the output XML file after Parsing

    Hi
    I am using DOM parser to parse an XML file.
    After parsing the input XML file, i am fetching some contents
    from the same and also putting the same contents in the parsed output file which is also an XML file.
    The problem here is that the after putting the contents to the output XML file from the input XML file,
    some junk character appears at the end of each and every tag in the outputfile.
    The junk character is some thing like this : " ampersand hash thirteen and a semicolon "
    (*THE MESSAGE DID NOT ACCEPT THE SYMBOLS KINDLY TRANSLATE THE ABOVE WORDINGS*
    INTO SYMBOLS)
    This character gets appended at the end of each and every tag in the output file due to which the output file
    is not recognised as an XML file.
    Please let me know as to why is this character appearing and also please suggest some solution
    for the same.
    -Thanks in Advance.
    Edited by: itskarthik on Oct 10, 2008 7:16 AM
    Edited by: itskarthik on Oct 10, 2008 7:18 AM
    Edited by: itskarthik on Oct 10, 2008 7:19 AM
    Edited by: itskarthik on Oct 10, 2008 7:19 AM
    Edited by: itskarthik on Oct 10, 2008 7:23 AM
    Edited by: itskarthik on Oct 10, 2008 7:23 AM

    Wierd.
    Try this piece of code. (You can always change it to use input file and output file instead of string if you want)
    What is your output?
    import java.io.ByteArrayInputStream;
    import java.io.ByteArrayOutputStream;
    import java.io.IOException;
    import java.io.OutputStream;
    import javax.xml.parsers.DocumentBuilderFactory;
    import javax.xml.transform.Result;
    import javax.xml.transform.Source;
    import javax.xml.transform.Transformer;
    import javax.xml.transform.TransformerFactory;
    import javax.xml.transform.dom.DOMSource;
    import javax.xml.transform.stream.StreamResult;
    import org.w3c.dom.Document;
    public class XMLTest {
         public static void main(String[] args) throws Exception {
              String input = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\r\n" +
                                  "<test>\r\n" +
                                  "</test>";
              System.out.println("Input:");
              lookForCR(input);
              DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
              Document doc = factory.newDocumentBuilder().parse( new ByteArrayInputStream(input.getBytes()) );
              Source source = new DOMSource(doc);
              ByteArrayOutputStream bArrOut = new ByteArrayOutputStream();
              Result result = new StreamResult( new StripOutputStream( bArrOut ) );
              Transformer xformer = TransformerFactory.newInstance().newTransformer();
              xformer.transform(source, result);     
              System.out.println("\nResult:");
              lookForCR(bArrOut.toString());
              System.out.println("\nDone");
         public static void lookForCR(String input) throws Exception {
              char[] chars =input.toCharArray();
              for ( char chr : chars ) {
                   if (chr == 13 ) {
                        System.out.println("Has 0x0D character!!");
    class StripOutputStream extends OutputStream {
         OutputStream out = null;
         public StripOutputStream(OutputStream out) {
              this.out = out;
         @Override
         public void write(int b) throws IOException {
              if ( b != 13 )
                   out.write(b);
    }- Roy

  • Loading xml file and parsing error in web start

    Hello,
    I load a xml file from jar file, but i have a error at parsing see :
    ClassLoader cl= this.getClass().getClassLoader();
    File file = new File(cl.getResource("paradise/test/maquette/parser/areas.xml").getFile());
    parseur.parse(new InputSource(new FileInputStream(file)), this);
    the file opening but at parseur.parse() i have a path error with :
    http:// . . . . \Paradise_client\paradise.jar!\paradise\test\maquette\parser\areas.xml , bad name of directories .....
    Can you help me ? please :-(

    I need to do a similar thing but in my case I don't know the structure of the xml file. I have 2 questions for this mapping. For an xml file like this:
    <?xml version="1.0"?>
    <catalog>
    <book id="bk101">
    <author>Gambardella, Matthew</author>
    <title>XML Developer's Guide</title>
    <genre>Computer</genre>
    <price>44.95</price>
    <publish_date>2000-10-01</publish_date>
    <publisher>Dummy Publisher Co.</publisher>
    <publisher_address>
    <city>London</city>
    <street>Heart St.</street>
    <no>23/5</no>
    </publisher_address>
    <description>An in-depth look at creating applications
    with XML.
    </description>
    </book>
    </catalog>
    If I'm right, I need to create a database named catalog and a table named book. But the problem comes out here: How can i store publisher_address? In a table or what? Other problem is, is there a difference between storing id attribute of the book and genre element of the book? I think they are just columns of the book table. But if I'm wrong what is the correct solution?

Maybe you are looking for