Using dbms_xmldom to load large xml
Hi,
I am using dbms_xmldom to load an XML file of about 600 MB. It's taking around 7 hours to load the xml.
Please provide tips or tricks to tune the dbms_xmldom code. Quick help will be appreciated. Thanks!
55e6744f-71c3-4d9d-a1a9-772c14ab90f9 wrote:
Please help. It's urgent.
No it's not. And it's very rude of you to assume so.
Read: Re: 2. How do I ask a question on the forums?
In answer to your question, I agree with Odie, you don't want to be using XML DOM for this. See Mark Drake's (mdrake) answers on this thread over in the XMLDB forum...
Re: XML file processing into oracle
Similar Messages
-
Loading Large XML files using plsql
I have a process where there is a need to load large xml files (i.e. easily over 500k or more) into Oracle via an interface. Preference would be to use plsql or some plsql based utility if possible. I am looking for any suggestions on the best method to accomplish this. Currently running on 9.2.0.6. Thanks in advance.
-
Bulk Loader Program to load large xml document
I am looking for a bulk loader database program that will load a very large xml document. The simple bulk loader application available on the Oracle site will not load this document due to its size, which is approximately 20 MB. Please advise ASAP. Thank you.
From the above document:
Storing XML Data Across Tables
Question
Can the XML-SQL Utility store XML data across tables?
Answer
Currently the XML-SQL Utility (XSU) can only store to a single table. It maps a canonical representation of an XML document into any table/view. But of course there is a way to store XML with the XSU across tables. One can do this using XSLT to transform any document into multiple documents and insert them separately. Another way is to define views over multiple tables (object views if needed) and then do the inserts ... into the view. If the view is inherently non-updatable (because of complex joins, ...), then one can use INSTEAD-OF triggers over the views to do the inserts.
-- I've tried this, works fine. -
I'm running 10.2.0.3 on a Linux box and I'm having problems loading a large XML document (about 100 MB). In the past, I would simply load the XML file into a XMLType column like this:
INSERT INTO foo VALUES (XMLType(bfilename('XMLDIR', 'test.xml'), nls_charset_id('AL32UTF8')));
But when I try this with a large file, it runs for 10 minutes and then returns an ORA-03113. I'm assuming the file is just too large for this technique. I spoke to Mark Drake when I was at OpenWorld and he suggested I use Oracle XML DB, so I created and registered a schema and tried using sqlldr to load the doc, but it ran for 2 1/2 hours before returning:
Parse Error on row 1 in table FOO
OCI-31038: Invalid integer value: "129"
I tried simplifying both the XML file and schema to just the following:
<schedules>
<s s="2009-09-21T04:00:00" d="21600" p="335975" c="19672"/>
<s s="2009-09-21T04:00:00" d="21600" p="335975" c="15387"/>
<s s="2009-09-21T04:00:00" d="25200" p="335975" c="5256"/>
<s s="2009-09-21T04:00:00" d="86400" p="335975" c="26198">
<k id="5" v="2009-09-21 09:00:00.000"/>
<k id="6" v="2009-09-22 03:59:59.000"/>
<k id="26" v="0.00"/><k id="27" v="US"/>
</s>
<s s="2009-09-21T04:00:00" d="21600" p="335975" c="11678"/>
<s s="2009-09-21T04:00:00" d="21600" p="335975" c="26697"/>
<s s="2009-09-21T04:00:00" d="21600" p="335975" c="25343"/>
<s s="2009-09-21T04:00:00" d="21600" p="335975" c="25269"/>
<s s="2009-09-21T04:00:00" d="86400" p="335975" c="26200">
<k id="5" v="2009-09-21 09:00:00.000"/>
<k id="6" v="2009-09-22 03:59:59.000"/>
<k id="26" v="0.00"/><k id="27" v="US"/>
</s>
</schedules>
<?xml version="1.0" encoding="UTF-8"?>
<!--W3C Schema generated by XMLSpy v2008 sp1 (http://www.altova.com)-->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
xmlns:xdb="http://xmlns.oracle.com/xdb"
version="1.0"
xdb:storeVarrayAsTable="true">
<xs:element name="schedules" xdb:defaultTable="SCHEDULES">
<xs:complexType xdb:SQLType="SCHEDULES_T">
<xs:sequence>
<xs:element ref="s" maxOccurs="unbounded"/>
</xs:sequence>
</xs:complexType>
</xs:element>
<xs:element name="s">
<xs:complexType>
<xs:choice minOccurs="0">
<xs:element ref="f" maxOccurs="unbounded"/>
<xs:element ref="k" maxOccurs="unbounded"/>
</xs:choice>
<xs:attribute name="s" use="required" type="xs:dateTime"/>
<xs:attribute name="p" use="required" type="xs:int"/>
<xs:attribute name="d" use="required" type="xs:int"/>
<xs:attribute name="c" use="required" type="xs:short"/>
</xs:complexType>
</xs:element>
<xs:element name="k">
<xs:complexType>
<xs:attribute name="v" use="required" type="xs:string"/>
<xs:attribute name="id" use="required" type="xs:byte"/>
</xs:complexType>
</xs:element>
<xs:element name="f">
<xs:complexType>
<xs:attribute name="id" use="required" type="xs:byte"/>
</xs:complexType>
</xs:element>
</xs:schema>
Keep in mind both the actual XML file and corresponding XSD are much more complex, but this particular section is about 70 MB, so I wanted to see if I could just load that. I used the following sqlldr script:
LOAD DATA
INFILE *
INTO TABLE schedules_tmp TRUNCATE
XMLType(xml_doc)
(
lobfn FILLER CHAR TERMINATED by ',',
xml_doc LOBFILE(lobfn) TERMINATED BY EOF
)
BEGINDATA
/tmp/schedules.xml
This worked fine on a small doc - loaded correctly and I could query it fine - but when I tried using the 70 MB file it ran for a couple of hours before dying with a memory problem.
So what am I doing wrong? Is there a better way to load a large file?
Thanks for the help.
Pete
Edited by: mdrake on Nov 9, 2009 8:46 PM
Mark:
Answers to your questions:
Do you use direct load ? -> Yes. I tried again using UNRECOVERABLE LOAD DATA to try to speed up performance, but it still ran for a couple of hours before dying.
Which DB Release are you working with ? -> 10.2.0.3
Can you see if you can upload via FTP ? -> I added noNamespaceSchemaLocation to my XML file and ftp'd it to my XML directory, but it wasn't recognized. Is there something else I have to do?
The change for unsignedInt should have got rid of the issue with the value 129. Did it ? -> I didn't try it again on the whole XML file (I'm just working with the schedules section), so I haven't verified this.
I'm still stumped as to why sqlldr takes so long. I could write something to parse the XML file into a flat file and then use sqlldr to load it into a relational table, and the load would only take a few minutes. But then I wouldn't be using XML DB which I thought would be faster. What am I doing wrong?
Pete -
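Pete's fallback idea of parsing the XML into a flat file for sqlldr can be sketched with StAX, which streams the document without ever building a DOM. This is only an illustrative sketch: the class name is invented, and the attribute names follow the <s> sample above.

```java
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;
import java.io.StringReader;

public class ScheduleToCsv {
    // Stream the <s> elements of a schedules document into CSV rows,
    // without materialising a DOM tree: one event at a time.
    public static String toCsv(String xml) throws Exception {
        XMLStreamReader r = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(xml));
        StringBuilder csv = new StringBuilder();
        while (r.hasNext()) {
            if (r.next() == XMLStreamConstants.START_ELEMENT
                    && "s".equals(r.getLocalName())) {
                csv.append(r.getAttributeValue(null, "s")).append(',')
                   .append(r.getAttributeValue(null, "d")).append(',')
                   .append(r.getAttributeValue(null, "p")).append(',')
                   .append(r.getAttributeValue(null, "c")).append('\n');
            }
        }
        r.close();
        return csv.toString();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<schedules>"
            + "<s s=\"2009-09-21T04:00:00\" d=\"21600\" p=\"335975\" c=\"19672\"/>"
            + "<s s=\"2009-09-21T04:00:00\" d=\"25200\" p=\"335975\" c=\"5256\"/>"
            + "</schedules>";
        System.out.print(toCsv(xml));
    }
}
```

For a real 70 MB file you would read from a FileInputStream instead of a String and write rows straight to a file, keeping memory use flat.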
Loading Large XML into Oracle Database
Hi,
I am fairly new to XML DB. I have been successful in registering a schema to a table in the Database. Now, I have to load the appropriate XML into that table. I am using the Simple Bulk Loader program found on this oracle site, however, when I load my XML file I get the following error: ORA-21700: object does not exist or is marked for delete.
So, I figured maybe simple bulk loader cannot handle large files? So I reduced my XML file and loaded it with the program and it worked. However, does anyone know how I can load large files into my registered schema table.
Thanks,
Prerna :o)
Did you specify genTables true or false when registering the XML Schema ?
Does your XML schema contain a recursive definition ?
Is it possible that after reducing the size of the document you no longer have nodes that contain recursive structures... -
Hi everyone again :) Just sitting here trying to load a large XML document (it's ~13 MB). I know that's a massive XML document, but that's the way it is. The problem I am having is that when I try to load the document I get an out-of-memory exception.
Frankly, I'm not surprised, but is there a remedy? Any thoughts/ideas/solutions would be greatly appreciated. :)
Ben
Sounds like you are using DOM. The DOM parser you are using must be loading the whole document tree right away. DOM really eats up memory. There are two possible solutions:
1. Look into using a SAX parser. I don't know what you are doing with the xml, so I can't say whether or not that will work for you.
2. Configure the DOM parser to defer loading nodes until they are requested, or if that option is not available with your parser, get a parser that will defer node loading.
If option 2 sounds like what you need, then I suggest looking into the Apache Xerces parser. I am pretty sure it defers loading. You shouldn't have to change your code to work with the Xerces parser, you just have to make sure you set the proper system properties so that Java will automatically use the Xerces parser. -
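A minimal illustration of option 1: this SAX handler (class name invented) sees one event at a time and never holds the whole tree, so memory stays flat regardless of document size.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class CountingHandler extends DefaultHandler {
    int elementCount = 0;   // running total of elements seen

    @Override
    public void startElement(String uri, String localName,
                             String qName, Attributes attrs) {
        elementCount++;     // SAX hands us one start-tag event at a time
    }

    public static void main(String[] args) throws Exception {
        CountingHandler h = new CountingHandler();
        SAXParserFactory.newInstance().newSAXParser()
            .parse(new InputSource(new StringReader(
                "<root><a/><b><c/></b></root>")), h);
        System.out.println(h.elementCount + " elements");
    }
}
```

A real handler would also override characters and endElement to pull out the values it cares about, as described in the SAX discussion further down this page.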
Unable to use sqlloader to load large data into a varchar2 column..plz.help
Hi,
Currently the column in the database is a varchar2(4000) column that stores data in the form of XML and data that has many carriage returns.
Current I am trying to archive this column value and later truncate the data in this column.
I was able to create a .csv file but when using the sqlloder ,the utility fails as the data I am about to load has data stored both in the form of "xml" sometimes and sometimes as a list of attributes separated by newline character.
After several failed attempts... I would like to know if it can be achieved using sqlloader, or if sqlloader is a bad choice and I should go for an import/export instead?
Thanks in advance...
-Kevin
Currently the column in the database is a varchar2(4000) column that stores data in the form of xml and data that has many carriage returns.
Can I ask why your XML data has carriage returns in it? The nature of XML dictates that the structure is defined by the tags. Of course you can have CRs in your actual data between those tags, but if someone is hard-coding CRs into the XML to make it look pretty, this is not how XML was intended to be used.
I was able to create a .csv file, but when using sqlldr the utility fails, as the data I am about to load is stored sometimes in the form of "xml" and sometimes as a list of attributes separated by newline characters.
It probably can be done (can you provide a sample of data so we can see the structure?) but you would probably be best revisiting the code that generates the CSV and ensuring that it is output in a simpler format. -
Loading large XMLs to Berkeley XML DB
Hi all ,
I am trying to load a very big XML file into Berkeley DB XML from a Unix shell. I have created a container of type node storage with a node index. It is taking around 15 minutes to load the document. My system configuration is 64 GB RAM and 2 CPUs, each with 4 cores. It is very much required for me to use node storage with indexing for best XQuery performance. Is there a way to improve the loading performance?
Hi,
DB XML is an embeddable database, which means it is supposed to be integrated into your application, which in turn takes care of communication with clients. So if you want a client-server architecture, you would have to build it yourself - for example, a Java servlet that under the hood connects locally to DB XML and does the necessary work. If I remember correctly, there was one such servlet example among the DB XML examples, but I have never looked into it.
Vyacheslav -
ORA-04030: out of process memory Loading Large XML File
Experts: I am trying to load a 2.1 GB XML file into an Object Relational table. The xml schema document (xsd) is already registered successfully. It fails with the following error:
ORA-04030: out of process memory when trying to allocate 4032 bytes
(qmxtgCreateBuf,kghsseg: kolaslCreateCtx)
ORA-06512: at "SYS.XMLTYPE", line 296
ORA-06512: at line 1
I am able to load the document successfully into a SECUREFILE BINARY XML table, but that would not work, as I need to create a relational view on top of this table, which does not work with SECUREFILE BINARY XML storage.
Please suggest what may be the workaround here?
Thanks
Kevin
MDrake: I am trying to load like this:
insert into TEST_HUGE_XML
values(
xmltype(
bfilename('XMLDIR', 'huge_xmldoc.xml')
, nls_charset_id('AL32UTF8')
, 'huge_xmldoc.xsd'
)
);
db version: 11.2.0.3
I saw an example of loading using the createresource API here:
http://www.oracle-developer.net/display.php?id=416
SQL> DECLARE
2 v_return BOOLEAN;
3 BEGIN
4 v_return := DBMS_XDB.CREATERESOURCE(
5 abspath => '/public/demo/xml/db_objects.xml',
6 data => BFILENAME('XML_DIR', 'db_objects.xml')
7 );
8 COMMIT;
9 END;
10 /
PL/SQL procedure successfully completed.
How do I load the huge xml document in my custom object relational table, which was created like this:
CREATE TABLE HUGE_XML OF XMLTYPE
XMLTYPE STORE AS OBJECT RELATIONAL
XMLSCHEMA "huge_xmldoc.xsd"
ELEMENT "root_element"
Thanks
Edited by: Kevin_K on Feb 8, 2013 9:35 AM -
Loading, processing and transforming Large XML Files
Hi all,
I realize this may have been asked before, but searching the history of the forum isn't easy, considering it's not always a safe bet which words to use on the search.
Here's the situation. We're trying to load and manipulate large XML files of up to 100MB in size.
The difference between what we have in our hands and other related issues posted is that the XML isn't big because it has a largely branched tree of data, but rather because it includes large base64-encoded files in the XML itself. The size of the 'clean' XML is relatively small (a few hundred bytes to some kilobytes).
We had to deal with transferring the xml to our application using a webservice, loading the xml to memory in order to read values from it, and now we also need to transform the xml to a different format.
We solved the webservice issue using XFire.
We solved the loading of the xml using JAXB. Nevertheless, we use string manipulations to 'cut' the xml before we load it to memory - otherwise we get OutOfMemory errors. We don't need to load the whole XML to memory, but I really hate this solution because of the 'unorthodox' manipulation of the xml (i.e. the cutting of it).
Now we need to deal with the transformation of those XMLs, but obviously we can't cut them down this time. We have little experience writing XSL, and no experience using Java to apply XSL files. We're looking for suggestions on how to do it most efficiently.
The biggest problem we encounter is the OutOfMemory errors.
So I ask several questions in one post:
1. Is there a better way to transfer the large files using a webservice?
2. Is there a better way to load and manipulate the large XML files?
3. What's the best way for us to transform those large XMLs?
4. Are we missing something in terms of memory management? Is there a better way to control it? We really are struggling there.
I assume this is an important piece of information: We currently use JDK 1.4.2, and cannot upgrade to 1.5.
Thanks for the help.
I think there may be a way to do it.
First, for low RAM needs, nothing beats SAX as the first processor of the data. With SAX, you control the memory use since SAX only processes one "chunk" of the file at a time. You supply a class with methods named startElement, endElement, and characters. It calls the startElement method when it finds a new element. It calls the characters method when it wants to pass you some or all of the text between the start and end tags. It calls endElement to signal that passing characters is over, and to let you get ready for the next element. So, if your characters method did nothing with the base-64 data, you could see the XML go by with low memory needs.
Since we know in your case that the characters method will process large chunks of data, you can expect many calls as SAX calls your code. The only workable solution is to use a StringBuffer to accumulate the data. When endElement is called, you can decode the base-64 data and keep it somewhere. The most efficient way to do this is to have one StringBuffer for the class handling the SAX calls. Instantiate it with a big enough size to hold the largest of your binary data streams. In startElement, you can set the length of the StringBuffer to zero and reuse it over and over.
You did not say what you wanted to do with the XML data once you have processed it. SAX is nice from a memory perspective, but it makes you do all the work of storing the data. Unless you build a structured set of classes "on the fly" nothing is kept. There is a way to pass the output of one SAX pass into a DOM processor (without the binary data, in this case) and then you would wind up with a nice tree object with the rest of your data and a group of binary data objects. I've never done the SAX/DOM combo, but it is called a SAXFilter, and you should be able to google an example.
So, the bottom line is that it is very possible to do what you want, but it will take some careful design on your part.
Dave Patterson -
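The handler Dave describes might look like this sketch. The element name "data" is invented, and java.util.Base64 (Java 8+) is newer than the JDK 1.4.2 the poster is stuck on, where a different decoder would be needed:

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Base64;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class Base64Handler extends DefaultHandler {
    // One reusable buffer, sized up front, reset in startElement
    private final StringBuilder buf = new StringBuilder(1 << 20);
    final List<byte[]> blobs = new ArrayList<>();
    private boolean inData = false;

    @Override
    public void startElement(String uri, String local, String qName, Attributes a) {
        if ("data".equals(qName)) {   // hypothetical element holding base64
            inData = true;
            buf.setLength(0);         // reuse the buffer, as described above
        }
    }

    @Override
    public void characters(char[] ch, int start, int len) {
        if (inData) buf.append(ch, start, len);  // may arrive in many chunks
    }

    @Override
    public void endElement(String uri, String local, String qName) {
        if ("data".equals(qName)) {
            blobs.add(Base64.getDecoder().decode(buf.toString().trim()));
            inData = false;
        }
    }

    public static void main(String[] args) throws Exception {
        Base64Handler h = new Base64Handler();
        SAXParserFactory.newInstance().newSAXParser().parse(
            new InputSource(new StringReader(
                "<doc><data>aGVsbG8=</data></doc>")), h);
        System.out.println(h.blobs.get(0).length + " bytes decoded");
    }
}
```

Note that characters really can deliver one element's text across many calls, which is why the buffer is appended to rather than read directly.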
Using sqlldr to load XML files and the file name ? or access the file name
I can use sqlldr to load the XML files when I declare the target table of XMLTYPE, but I need to load the filename and a sequence also with the XML in a table so we can mark whether the file passed validation or not.
something like ....
Create table test1 (
lobfn varchar2(200),
load_number number,
XML_COL XMLTYPE
)
--------------- here is my sqlldr command
sqlldr xml_user/xml_user load_test1.ctl
LOAD DATA
INFILE *
INTO TABLE test1
append
xmltype(XML_COL)
(
lobfn FILLER char TERMINATED by ',',
XML_COL LOBFILE(lobfn) TERMINATED BY EOF
)
BEGINDATA
filename1.xml
filename2.xml
filename64.xml
sqlldr comes back and says "commit point reached - logical record count 64"
but when I select count(*) from test1; it returns 0 rows.
and when I
SELECT X.* FROM tst1 P2,
XMLTable ( '//XMLRoot//APPLICATION'
PASSING P2.XML_COL
COLUMNS
"LASTNAME" CHAR(31) PATH 'LASTNAME',
"FIRSTNAME" CHAR(31) PATH 'FIRSTNAME'
) AS X;
It tells me invalid identifier.
Do I need to use a function to get the XML_COL as a object_value ???
But when I create the table like
create table test1 of XMLTYPE;
and use sqlldr it works, but I don't have the file name, or don't know how to access it.
and I can use
SELECT X.* FROM tst1 P2,
XMLTable ( '//XMLRoot//APPLICATION'
PASSING P2.object_value
COLUMNS
"LASTNAME" CHAR(31) PATH 'LASTNAME',
"FIRSTNAME" CHAR(31) PATH 'FIRSTNAME'
) AS X;
BTW
Here's a trivial example of what you appear to be trying to do..
C:\xdb\otn\sqlLoader>sqlplus scott/tiger @createTable
SQL*Plus: Release 10.2.0.2.0 - Production on Wed Aug 2 22:08:10 2006
Copyright (c) 1982, 2005, Oracle. All Rights Reserved.
Connected to:
Oracle Database 10g Enterprise Edition Release 10.2.0.2.0 - Production
With the Partitioning, OLAP and Data Mining options
SQL> DROP TABLE TEST_TABLE
2 /
Table dropped.
SQL> CREATE TABLE TEST_TABLE
2 (
3 filename VARCHAR2(32),
4 file_content xmltype
5 )
6 /
Table created.
SQL> quit
Disconnected from Oracle Database 10g Enterprise Edition Release 10.2.0.2.0 - Production
With the Partitioning, OLAP and Data Mining options
C:\xdb\otn\sqlLoader>type sqlldr.ctl
LOAD DATA
INFILE 'filelist.txt'
INTO TABLE TEST_TABLE
FIELDS TERMINATED BY ','
FILENAME CHAR(32),
FILE_CONTENT LOBFILE(FILENAME) TERMINATED BY EOF
C:\xdb\otn\sqlLoader>type filelist.txt
testcase1.xml
testcase2.xml
C:\xdb\otn\sqlLoader>type testcase1.xml
<foo/>
C:\xdb\otn\sqlLoader>type testcase2.xml
<baa/>
C:\xdb\otn\sqlLoader>sqlldr userid=SCOTT/TIGER control=sqlldr.ctl log=sqlldr.log -direct
SQL*Loader: Release 10.2.0.2.0 - Production on Wed Aug 2 22:08:11 2006
Copyright (c) 1982, 2005, Oracle. All rights reserved.
Load completed - logical record count 2.
C:\xdb\otn\sqlLoader>sqlplus scott/tiger
SQL*Plus: Release 10.2.0.2.0 - Production on Wed Aug 2 22:08:18 2006
Copyright (c) 1982, 2005, Oracle. All Rights Reserved.
Connected to:
Oracle Database 10g Enterprise Edition Release 10.2.0.2.0 - Production
With the Partitioning, OLAP and Data Mining options
SQL> select * from TEST_TABLE
2 /
FILENAME
FILE_CONTENT
testcase1.xml
<foo/>
testcase2.xml
<baa/>
SQL>
If you are interested in the 11g beta program please contact me directly at
markDOTdrakeAToracleDOTcom -
PostMethod with large XML passed in setRequestEntity is truncated
Hi,
I use PostMethod to transfer large XML to a servlet.
CharArrayWriter chWriter = new CharArrayWriter();
post.marshal(chWriter);
//The chWriter length is 120KB
HttpClient httpClient = new HttpClient();
PostMethod postMethod = new PostMethod(urlStr);
postMethod.setRequestEntity(new StringRequestEntity(chWriter.toString(), null, null));
postMethod.setRequestHeader("Content-type", "text/xml");
int responseCode = httpClient.executeMethod(postMethod);
String responseBody = postMethod.getResponseBodyAsString();
postMethod.releaseConnection();
When I open the request in doPost method in sevlet:
Reader inpReader = request.getReader();
char[] chars = MiscUtils.ReaderToChars(inpReader);
inpReader = new CharArrayReader(chars);
//The data is truncated (in chars[]) ???
static public char[] ReaderToChars(Reader r) throws IOException, InterruptedException {
    //IlientConf.logger.debug("Start reading from stream");
    BufferedReader br = new BufferedReader(r);
    StringBuffer fileData = new StringBuffer("");
    char[] buf = new char[1024];
    int numRead = 0;
    while ((numRead = br.read(buf)) != -1) {
        String readData = String.valueOf(buf, 0, numRead);
        fileData.append(readData);
        buf = new char[1024];
    }
    return fileData.toString().toCharArray();
}
Any ideas what can be the problem?
Lior.
Hi,
I use the same code and have 2 tomcats running with the same servlet on each of them.
One running on Apache/2.0.52 (CentOS) Server and the second running on Apache/2.0.52 (windows XP) Server.
I managed to post large XML from (CentOS) Server to (windows XP) Server successfully and failed to post large XML
from (windows XP) Server to (CentOS) Server.
I saw somthing called mod_isapi that might be disabling posting large XML files.
Can anyone help me on going over that limitation?
Thanks,
Lior. -
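Lior's symptom (a truncated request body on one platform but not the other) is worth isolating before blaming mod_isapi. Since the original Commons HttpClient 3.x setup is hard to reproduce here, this JDK-only sketch (HttpURLConnection against the built-in com.sun.net.httpserver, both illustrative choices) checks that every byte the client declares via Content-Length actually arrives at the server:

```java
import com.sun.net.httpserver.HttpServer;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.atomic.AtomicInteger;

public class FixedLengthPost {

    /** POST a payload of roughly the given size to a local echo server
     *  and return {bytesSent, bytesReceivedByServer}. */
    static int[] roundTrip(int payloadSize) throws Exception {
        AtomicInteger received = new AtomicInteger();
        HttpServer server = HttpServer.create(new InetSocketAddress(0), 0);
        server.createContext("/echo", exchange -> {
            InputStream in = exchange.getRequestBody();
            byte[] buf = new byte[8192];
            int n, total = 0;
            while ((n = in.read(buf)) != -1) total += n;  // count request bytes
            received.set(total);
            exchange.sendResponseHeaders(200, -1);        // empty response body
            exchange.close();
        });
        server.start();
        try {
            StringBuilder sb = new StringBuilder("<xml>");
            for (int i = 0; i < payloadSize; i++) sb.append('x');
            byte[] body = sb.append("</xml>").toString()
                            .getBytes(StandardCharsets.UTF_8);

            HttpURLConnection c = (HttpURLConnection) new URL(
                "http://localhost:" + server.getAddress().getPort() + "/echo")
                .openConnection();
            c.setDoOutput(true);
            c.setRequestMethod("POST");
            c.setRequestProperty("Content-Type", "text/xml; charset=UTF-8");
            c.setFixedLengthStreamingMode(body.length);   // explicit Content-Length
            try (OutputStream out = c.getOutputStream()) {
                out.write(body);
            }
            c.getResponseCode();                          // complete the exchange
            return new int[] { body.length, received.get() };
        } finally {
            server.stop(0);
        }
    }

    public static void main(String[] args) throws Exception {
        int[] r = roundTrip(200_000);
        System.out.println("sent " + r[0] + ", server received " + r[1]);
    }
}
```

If the two counts match locally but not across the Apache front end, the truncation is happening in the proxy layer, which narrows the search considerably.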
Actually I am using an off-line form to load a very large XML file, and Firefox to load that form. But it's taking a long time to load, and sometimes the browser crashes. I parse this XML file into my form through DOM. Is there any option to increase the DOM handler size in Firefox?
Thank you for your suggestion. I have a question,
though. If I use a relational database and try to
access it for EACH and EVERY click the user makes,
wouldn't that take much time to populate the page with
data?
Isn't XML store more efficient here? Please reply.
You have the choice of reading a small number of records (10 children per element?) from a database, or parsing multiple megabytes. Reading 10 records from a database should take maybe 100 milliseconds (1/10 of a second). I have written a web application that reads several hundred records and returns them with acceptable response time, and I am no expert. To parse an XML file of many megabytes... you have already tried this, so you know how long it takes, right? If you haven't tried it then you should. It's possible to waste a lot of time considering alternatives -- the term is "analysis paralysis". Speculating on how fast something might be doesn't get you very far. -
I have a large XML file that I am converting to an
ArrayCollection to use as a dataprovider for a datagrid. It takes
some time to fully load. Is there any way to load a partial list while
the rest of the list is loading? Or does anyone know a way to speed
up this process?
Thanks
I'd try to modify the autoComplete component.
You could break this processing up into smaller chunks. For
it to work, you need some outside counter or indexer that keeps
track of where you are. Have the conversion function process say
nodes 0-500, then end. Then using callLater, call that function
again, to process the next batch of nodes.
This process will allow the UI to update between iteration
batches. If you need more responsiveness, you could try monitoring
mouse move, and stopping the conversion, until the mouse is
inactive again. That is just brainstorming. I have not tried it
(the mouse move part. I know the iterator method works to allow the
UI to update.)
Tracy -
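Tracy's batching idea is Flex-specific (callLater), but the underlying pattern (an outside counter plus fixed-size batches that yield control between runs) is language-neutral. Here is a minimal Java sketch, where the class name, batch size, and the stand-in "conversion" are all invented for illustration:

```java
import java.util.ArrayList;
import java.util.List;

public class BatchConverter {
    private int index = 0;                 // the "outside counter": where we are
    private final List<String> source;
    final List<String> result = new ArrayList<>();

    BatchConverter(List<String> source) { this.source = source; }

    /** Convert up to batchSize items; return true while work remains.
     *  In a UI, the caller would reschedule itself (callLater in Flex,
     *  invokeLater in Swing) between batches so the interface stays
     *  responsive instead of freezing for the whole conversion. */
    boolean processBatch(int batchSize) {
        int end = Math.min(index + batchSize, source.size());
        while (index < end) {
            result.add(source.get(index).toUpperCase());  // stand-in conversion
            index++;
        }
        return index < source.size();
    }

    public static void main(String[] args) {
        List<String> nodes = new ArrayList<>();
        for (int i = 0; i < 1200; i++) nodes.add("node" + i);
        BatchConverter c = new BatchConverter(nodes);
        int batches = 1;
        while (c.processBatch(500)) batches++;  // 500 + 500 + 200 items
        System.out.println(batches + " batches, " + c.result.size() + " items");
    }
}
```

The key design point is that all state lives outside the batch function, so each call can pick up exactly where the previous one stopped.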
Processing large xml file (500mb)? break into small part? load into jtree
hi,
i'm doing an assignment to process a large xml file (500 MB) and
load it into a JTree using Java.
Can someone advise me on the algorithm to do this?
How can I load a 500 MB xml into a JTree without the system hanging?
How do I break up my file and do the loading?
1. Is the file schema-based binary XML?
2. The limits are dependent on storage model and character set.
3. For all NON-XML content the current limit is 4 GBytes (where that is bytes, not characters). So for character content in an AL32UTF8 database the limit is 2 GB.
4. For XML content stored as CLOB the limit is the same as for character data (2 GB/4 GB), dependent on database character set.
5. For SB-based XML content stored in object-relational storage, the limit is determined by the complexity and structures defined in the XML Schema.
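The reply above covers storage limits but not the JTree half of the question. One memory-friendly approach is a SAX handler that builds the tree model directly, using a stack of nodes (a sketch with invented names; for a genuinely 500 MB file the resulting model must still fit in the heap, so you would also want to collapse or summarise repeated children):

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.Deque;
import javax.swing.tree.DefaultMutableTreeNode;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class XmlTreeBuilder extends DefaultHandler {
    private final Deque<DefaultMutableTreeNode> stack = new ArrayDeque<>();
    DefaultMutableTreeNode root;   // becomes the JTree's model root

    @Override
    public void startElement(String uri, String local, String qName, Attributes a) {
        DefaultMutableTreeNode node = new DefaultMutableTreeNode(qName);
        if (stack.isEmpty()) root = node;
        else stack.peek().add(node);   // attach to the current parent
        stack.push(node);
    }

    @Override
    public void endElement(String uri, String local, String qName) {
        stack.pop();                   // back up to the parent element
    }

    public static void main(String[] args) throws Exception {
        XmlTreeBuilder b = new XmlTreeBuilder();
        SAXParserFactory.newInstance().newSAXParser().parse(
            new InputSource(new StringReader(
                "<library><shelf><book/><book/></shelf></library>")), b);
        // new JTree(b.root) would display this model in a Swing UI
        DefaultMutableTreeNode shelf = (DefaultMutableTreeNode) b.root.getChildAt(0);
        System.out.println(b.root + " -> " + shelf
            + " (" + shelf.getChildCount() + " children)");
    }
}
```

Since SAX never holds the document itself in memory, the only heap cost is the tree model you choose to build, which is the part you can control.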