Store large XML files

Does anybody have experience with storing very large XML files (400 MB) as an XMLType in Oracle?
Is this feasible? Is it efficient (response time)? Would you recommend doing this?
Or is it better (for performance) to parse the files and store the data in our own DB schema?
Thanks,
-Bernhard

Here is an Oracle ACE with experience storing large XML in the DB:
HOWTO: Load Really Big XML Files (http://www.liberidu.com/blog/?p=473)
Marco's blog has more examples of different ways to load XML into the DB.
Yes, it is feasible. Efficiency depends on the complexity of the XML (not so much its size) and on how you store it in the DB (XMLType, XMLType associated with a schema, object-relational, hybrid, etc.).
Mark Drake from the {forum:id=34} forum has also seen good performance just using INSERT INTO ... VALUES ... (BFILENAME()) to load the XML. Some other suggestions are listed in the FAQ on that forum.
Any future questions you have on this topic would probably be best answered by posting in that forum.

Similar Messages

I want to load a large raw XML file in Firefox and parse it with the DOM. But for large XML files Firefox is very slow and sometimes crashes. Is there any option to increase the DOM handling memory in Firefox?

I am using an offline form to load a very large XML file, and Firefox to load that form, parsing the XML file into my form through the DOM. But it takes a long time to load and sometimes the browser crashes. Is there any option to increase the DOM handler size in Firefox?

Best technology to navigate through a very large XML file in a web page

Hi!
I have a very large XML file that needs to be displayed in my web page, maybe as a tree structure. Visitors should be able to go to nodes at any depth and access the child elements or text of those nodes.
I thought about using a DOM parser with Java but dropped that idea, as the DOM would be stored in memory and is therefore space consuming. SAX does not work for me either: every time a node is clicked, my SAX parser parses the whole document to find that node, which is time consuming.
Could anyone please tell me the best technology and the best parser to use for very large XML files?

Thank you for your suggestion. I have a question,
though. If I use a relational database and try to
access it for EACH and EVERY click the user makes,
wouldn't that take much time to populate the page with
data?
Isn't XML store more efficient here? Please reply me.
You have the choice of reading a small number of records (10 children per element?) from a database, or parsing multiple megabytes. Reading 10 records from a database should take maybe 100 milliseconds (1/10 of a second). I have written a web application that reads several hundred records and returns them with acceptable response time, and I am no expert. To parse an XML file of many megabytes... you have already tried this, so you know how long it takes, right? If you haven't tried it then you should. It's possible to waste a lot of time considering alternatives; the term is "analysis paralysis". Speculating on how fast something might be doesn't get you very far.

How to parse large xml file

I need to parse a large XML file which contains the following tags. The file size is 10 MB to 50 MB or more.
<departments>
  <department>
    <a_depart id="124">
      <b_depart id="Bss_253">
        <bss_depart id="253">
          <attributes>
            <name_one>abc</name_one>
          </attributes>
        </bss_depart>
      </b_depart>
    </a_depart>
  </department>
  <department>
    <a_depart id="124">
      <b_depart id="Bss_254">
        <mss_depart id="253">
          <attributes>
            <name_one>abc</name_one>
            <name_two>xyz</name_two>
          </attributes>
        </mss_depart>
      </b_depart>
    </a_depart>
  </department>
  <department>
    <a_depart id="124">
      <b_depart id="Bss_254">
        <mss_depart id="255">
          <attributes>
            <name_one>abc</name_one>
            <name_two>xyz</name_two>
          </attributes>
        </mss_depart>
      </b_depart>
    </a_depart>
  </department>
  <department>
    <a_depart id="125">
      <b_depart id="Bss_254">
        <mss_depart id="253">
          <attributes>
            <name_one>abc</name_one>
            <name_two>xyz</name_two>
          </attributes>
        </mss_depart>
      </b_depart>
    </a_depart>
  </department>
</departments>
I want to extract information from this XML file, for example mss_depart id=233. I am building an XPath expression dynamically for every id and evaluating it with dom4j, which is very slow.
Is there another way to read the data, using only a SAX parser?
I want to evaluate XPath expressions in the following way:
//a_depart/@id ------> all the ids of the a_depart tags; say it returns 3 values: 123, 124, 125
After that I want to execute
//a_depart[@id='123']/b_depart/@id and so on, to retrieve the values at all levels.
I am executing the following XPath queries for every unique id at every level:
     List l = doc.selectNodes(xPathForID);
     List l1 = doc.selectNodes(xPathForAttributes+attributes.get(j)+"/text()");
But it is very slow and takes a lot of time.
Is there any other way to solve this problem? If so, please mail me; it is urgent.
I am using jdk1.4 and jdk1.5.
Is there any support in jdk1.5 for executing XPath directly with a SAX parser, without using dom4j?
Thanks in advance....

I doubt you will find a preexisting solution to your problem.
SAX is usually recommended for processing big files (where "big" is undefined). It works on big files by avoiding the messy problem of storing the data; that is left as an exercise for you.
DOM (and its variants) works by building a Document object as the head of a tree of objects for the entire contents. With DOM you can then use XPath, because there is something already in memory to search. To use XPath, you seem to have two choices: build a DOM-ish tree, or find an XPath processor (I'm not sure one exists) that can process the XML file directly. The latter will be slow, since you are looking for "all" occurrences of an attribute, which means reading the entire file each time.
It might be worth exploring a hybrid approach: use SAX to get some information, and build your own objects to store the data, maybe with a HashMap as the main index. But that will keep you from using XPath, since you do not have the data structures it expects.
A third alternative would be to look at JAXB. It builds Java code from a schema of your data, and when you import the data it creates the necessary objects and fills in the values. But I don't think XPath will work there either.
Dave Patterson
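
The hybrid approach described above (a SAX pass that fills a HashMap index) could look roughly like the following sketch. The class name DepartmentIndex and the build method are made up for illustration; the element and attribute names come from the sample XML in the question. It assumes the XML is well-formed and indexes only the a_depart to b_depart level; deeper levels would need the same treatment.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical helper: one SAX pass that indexes b_depart ids by their enclosing a_depart id. */
final class DepartmentIndex extends DefaultHandler {
    // a_depart id -> ids of the b_depart elements found inside it, in document order
    private final Map<String, List<String>> childrenByParent =
            new LinkedHashMap<String, List<String>>();
    private String currentParent; // id of the a_depart we are currently inside, or null

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attrs) {
        if ("a_depart".equals(qName)) {
            currentParent = attrs.getValue("id");
            if (!childrenByParent.containsKey(currentParent)) {
                childrenByParent.put(currentParent, new ArrayList<String>());
            }
        } else if ("b_depart".equals(qName) && currentParent != null) {
            childrenByParent.get(currentParent).add(attrs.getValue("id"));
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if ("a_depart".equals(qName)) {
            currentParent = null; // left the a_depart element
        }
    }

    /** Parses the XML once and returns the index; parse failures are rethrown unchecked. */
    static Map<String, List<String>> build(String xml) {
        DepartmentIndex handler = new DepartmentIndex();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
        return handler.childrenByParent;
    }
}
```

After one pass, looking up "124" in the returned map answers what //a_depart[@id='124']/b_depart/@id would, without re-reading the file. For a real file, SAXParser.parse also accepts a File or an InputStream instead of the in-memory string used here.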

Validating large xml files

In our application we have a requirement to validate a large XML file (around 40 MB) against a simple schema file.
I have tried all the options (SAX and DOM) mentioned in the article below on the Sun site:
http://java.sun.com/developer/EJTechTips/2005/tt1025.html
but with all of these approaches the validation process just hangs for hours.
I am running my validation test case with 1 GB of memory allocated, on a recent processor (Core 2 Duo), but it still hangs and returns only after a long time.
If needed I can provide a sample test case, schema file and XML file.
Please help me find out whether this is a problem with how the JAXP 1.3 Schema Validation Framework (SVF) handles large files, or whether there is a more efficient way of validating a large XML file.
Regards
Rahul

You may want to split the XML into chunks of 10 MB each and parse them separately, or store it in a database and read the XML using XQuery.
I am sure there may be better ways to do this; let's see what the experts here think.
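
For reference, a minimal sketch of validating with the javax.xml.validation API from a stream, so that no DOM tree is built. StreamingValidation and firstError are hypothetical names, and the strings stand in for the real schema and XML files. Whether this avoids the hang reported above depends on the schema and the parser implementation; it is not a guaranteed fix.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

/** Hypothetical helper around the JAXP validation API. */
final class StreamingValidation {
    /** Returns null when the document is valid, otherwise the first validation error message. */
    static String firstError(String xsd, String xml) {
        try {
            SchemaFactory factory =
                    SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            Schema schema = factory.newSchema(new StreamSource(new StringReader(xsd)));
            Validator validator = schema.newValidator();
            // A StreamSource is read sequentially; the document is not loaded as a DOM.
            validator.validate(new StreamSource(new StringReader(xml)));
            return null;
        } catch (SAXException e) {
            // With no ErrorHandler set, the first error is thrown as a SAXException.
            return e.getMessage();
        } catch (IOException e) {
            throw new IllegalStateException("could not read input", e);
        }
    }
}
```

For files on disk, new StreamSource(new File(...)) can replace the StringReader-based sources.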

Import Large XML File to Table

I have a large (819MB) XML file I'm trying to import into a table in the format:
<ROW_SET>
<ROW>
<column_name>value</column_name>
</ROW>
<ROW>
<column_name>value</column_name>
</ROW>
</ROW_SET>
I've tried importing it with xmlsequence(...).extract(...) and ran into the number of nodes exceed maximum error.
I've tried importing it with XMLTable(... passing XMLTYPE(bfilename('DIR_OBJ','large_819mb_file.xml'), nls_charset_id('UTF8'))) and I gave up after it ran for 15+ hours ( COLLECTION ITERATOR PICKLER FETCH issue ).
I've tried importing it with:
insCtx := DBMS_XMLStore.newContext('schemaname.tablename');
DBMS_XMLStore.clearUpdateColumnList(insCtx);
DBMS_XMLStore.setUpdateColumn(insCtx,'column1name');
DBMS_XMLStore.setUpdateColumn(insCtx,'columnNname');
ROWS := DBMS_XMLStore.insertXML(insCtx, XMLTYPE(bfilename('DIR_OBJ','large_819mb_file.xml'), nls_charset_id('UTF8')));
and ran into ORA-04030: out of process memory when trying to allocate 1032 bytes (qmxlu subheap,qmemNextBuf:alloc).
All I need to do is read the XML file and move the data into a matching table in a reasonable time. Once I have the data in the database, I no longer need the XML file.
What would be the best way to import large XML files?
Oracle Database 11g Release 11.2.0.1.0 - 64bit Production
PL/SQL Release 11.2.0.1.0 - Production
"CORE     11.2.0.1.0     Production"
TNS for Linux: Version 11.2.0.1.0 - Production
NLSRTL Version 11.2.0.1.0 - Production

This (rough) approach should work for you.
CREATE TABLE HOLDS_XML
        (xml_col XMLTYPE)
      XMLTYPE xml_col STORE AS SECUREFILE BINARY XML;
INSERT INTO HOLDS_XML
VALUES (xmltype(bfilename('DIR_OBJ','large_819mb_file.xml'), nls_charset_id('UTF8')));
-- Should be using AL32UTF8 for DB character set with XML
SELECT ...
FROM HOLDS_XML HX,
       XMLTable(...
          PASSING HX.xml_col ...);
How it differs from your approach:
By using the HOLDS_XML table with SECUREFILE BINARY XML storage (which became the default in 11.2.0.2) we are providing a place for Oracle to store a parsed version of the XML. This allows the XML to be stored on disk instead of in memory. Oracle can then access the needed pieces of XML from disk by streaming them, instead of holding the whole XML in memory and parsing it repeatedly to find the information needed. That in-memory work is what COLLECTION ITERATOR PICKLER FETCH means: a lot of memory work. You can search on that term to learn more about it if needed.
The XMLTable approach then simply reads this XML from disk and should be able to parse the XML with no issue. You have the option of adding indexes to the XML, but since you are simply reading it all one time and tossing it, there is (most likely) no advantage to indexes.

Again: display large XML file?

Dear all,
Any ideas on how to display a large XML file in a JTree? This has been asked before, but there is still no solution.
I am looking for your kind help. Please show me some code, if you know how.
Many thanks!

Any (close) examples for it? i've not got any - sorry
btw, how large is large? does this mean you can't afford to store the whole XML in memory at the same time, or simply that the gui is unresponsive/resource hungry?
asjf

How to read the data from Excel file and Store in XML file using java

Hi All,
I have a problem with an Excel file.
My problem is how to read the data from an Excel file and store it in an XML file using the Java Excel API.
What steps do I need to follow to get the data from the Excel file correctly?
Can anybody send me the code (Java code and Excel sheet) to this mail id: [email protected]
Thanks & Regards,
Sreenu,
[email protected],
india,

If you want someone to do your work, please have the courtesy to provide payment.
http://www.rentacoder.com

Is there a way to import large XML files into HANA efficiently? Are there any data services provided to do this?

1. Is there a way to import large XML files into HANA efficiently?
2. Will it process it node by node or the entire file at a time?
3. Are there any data services provided to do this?
This is for a project use case. I also have a requirement to process bulk XML files; please suggest how to accomplish this task.

Hi Patrick,
I am addressing the similar issue. "Getting data from huge XMLs into Hana."
Using Odata services can we handle huge data (i.e create schema/load into Hana) On-the-fly ?
In my scenario,
I get a folder of different complex XML files which are to be loaded into Hana database.
Then I gotta transform & cleanse the data.
Can I use oData services to transform and cleanse the data ?
If so, how can I create oData services dynamically ?
Any help is highly appreciated.
Thank you.
Regards,
Alekhya

I want to store an xml file into database, and transport it to the XI.

Tell me in how many ways we can store an XML file in a database.
One I know is: create a table with the same fields as the XML and store the XML file's data in it,
and at the time of transferring data to XI, fetch the data from the table, create an XML file and transfer it...
Tell me if you know some more.

Dear Swethi,
You can move images to SAP using SE78. then u can use them where ever u you require them.
SE78->GRAPHICS->BMAP here give ur image name and click on save
Rgds,
Kiran
Edited by: Kiran on Jun 11, 2009 7:15 AM

How to set SAXParser at command-line interface to create a large XML file

Hi,
I am trying to create a large XML file (more than 50 MB) by selecting from an Oracle database, but it fails with an "out of memory" error. According to the "Oracle XML Developer Guide", we should use SAXParser for parsing a large XML file. But there is no example showing how to set SAXParser at the command line.
Following is what I use to get xml files. It works only when the file is small.
java OracleXML getXML -DateFormat -withDTD -rowsetTag PO_HDR -conn
"jdbc:oracle:oci8:@server_name" -user "ID/password" "select * from table_name"
When I set SAXParser at the way below,
java oracle.xml.parser.v2.SAXParser OracleXML getXML -DateFormat -withDTD -rowsetTag PO_HDR -conn
"jdbc:oracle:oci8:@server_name" -user "ID/password" "select * from table_name"
it failed with the error message: "In class oracle.xml.parser.v2.SAXParser: void main(String argv[]) is not defined"
Does anyone know how to solve the problem? I would appreciate your help very much.
Yi

Here are my ideas.
Register the XML schema.
Using xmldom, generate the desired XML output and return it as an xmltype.
Then you can use something like this to check it.
declare
   xmldoc xmltype;
begin
   -- populate xmldoc from your xmldom function
   -- validate against the registered XML schema;
   -- schema_url and root_element stand for your own values
   if xmldoc.isSchemaValid(schema_url, root_element) = 1 then
      null; -- valid against the schema
   else
      null; -- invalid
   end if;
end;
/

Performance Problem in parsing large XML file (15MB)

Hi,
I'm trying to parse a large XML file(15 MB) and facing a clear performance problem. A Simple XML Validation using the following code snippet:
DBMS_LOB.fileopen(targetFile, DBMS_LOB.file_readonly);
DBMS_LOB.loadClobfromFile(
tempCLOB,
targetFile,
DBMS_LOB.getLength(targetFile),
dest_offset,
src_offset,
nls_charset_id(CONSTANT_CHARSET),
lang_context,
conv_warning
);
DBMS_LOB.fileclose(targetFile);
p_xml_document := XMLType(tempCLOB, p_schema_url, 0, 0);
p_xml_document.schemaValidate();
is taking 30 mins on a HP-UX (4GB ram, 2 CPU) machine (Oracle version : 9.2.0.4).
Please explain what could be going wrong.
Thanks In Advance,
Vineet

Thanks Mark,
I'll open a TAR and also upload the schema and instance XML.
If i'm not changing the track too much :-) one more thing in continuation:
If i skip the Schema Validation step and directly insert the instance document into a Schema linked XMLType table, what does OracleXDB do in such a case?
i'm getting a severe performance hit here too... the same file as above takes almost 40 mins to Insert.
code snippet:
DBMS_LOB.fileopen(targetFile, DBMS_LOB.file_readonly);
DBMS_LOB.loadClobfromFile(
tempCLOB,
targetFile,
DBMS_LOB.getLength(targetFile),
dest_offset,
src_offset,
nls_charset_id(CONSTANT_CHARSET),
lang_context,
conv_warning
);
DBMS_LOB.fileclose(targetFile);
p_xml_document := XMLType(tempCLOB, p_schema_url, 0, 0);
-- p_xml_document.schemaValidate();
insert into INCOMING_XML values(p_xml_document);
Here table INCOMING_XML is :
TABLE of SYS.XMLTYPE(XMLSchema "http://INCOMING_XML.xsd" Element "MatchingResponse") STORAGE Object-relational TYPE "XDBTYPE_MATCHING_RESPONSE"
This table and type XDBTYPE_MATCHING_RESPONSE were created using the mapping provided in the registered XML Schema.
Thanks,
Vineet

OSB - Iterating over large XML files with content streaming

Hi all,
I have to iterate over all items in large XML files and insert them into an Oracle database.
The file is about 200 MB and contains around 500'000 items, and I am using OSB 10gR3.
The XML structure is something like this:
<allItems>
<item>.....</item>
<item>.....</item>
<item>.....</item>
<item>.....</item>
<item>.....</item>
</allItems>
Initially I thought about using a proxy service with content streaming enabled and a "for each" action for iterating
over all items. But for this the whole XML structure has to be materialized into a variable, otherwise it is not possible!
More about streaming large files can be found here:
[http://download.oracle.com/docs/cd/E13159_01/osb/docs10gr3/userguide/context.html#large_messages]
It says: "When you enable streaming for large message processing, you cannot use the ... for each...".
For accessing single items you should use an assign action with an XPath like "$body/allItems/item[1]";
this works fine, and the whole XML stream does not have to be materialized.
So my idea was to use the "for each" action and process all items sequentially with an XPath like:
$body/allItems/item[$counter]
But the "for each" action only allows iterating over a sequence of XML items by defining a selection XPath
and the variable that contains all items. I would like to have a "repeat until" construct that iterates as long as
$body/allItems/item[$counter] does not return null. Or can I use the "for each" action differently?
Does OSB provide any other iteration mechanism? I know there is the split-join construct, which supports
different looping techniques, but as far as I know it does not support content streaming. Is this correct?
Did I miss something?
Thanks a lot for helping!
Cheers
Dani
Edited by: user10095731 on 29.07.2009 06:41

Hi Dani,
Yes, in my view this would be the best approach. You can use content streaming to pass this large XML to an EJB, and once it has been passed successfully the EJB should operate on it. If you want a result back (for further routing), you can get it from the EJB.
An EJB gives you the power of Java to process this file, and from a Java perspective 150 MB is not very large. Ensure that you are using buffering. See this link for an explanation of Java I/O streams and, in particular, buffered streams:
http://java.sun.com/developer/technicalArticles/Streams/ProgIOStreams/
Try dom4j with the XPP (XML Pull Parser) parser in case you have a parsing requirement. We have worked with a 1.2 GB file using this technique.
Regards,
Anuj
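
As an illustration of the pull-parser idea in plain Java (outside OSB), here is a sketch using StAX (javax.xml.stream), the pull parser bundled with the JDK since Java 6, rather than dom4j with XPP. ItemStream and forEachItem are hypothetical names, and the sketch assumes each <item> contains only text; items with child elements would need their own handling.

```java
import java.io.Reader;
import java.util.function.Consumer;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

/** Hypothetical helper: visits each <item> in turn without materializing the document. */
final class ItemStream {
    /** Calls handler once per <item> element and returns how many items were seen. */
    static int forEachItem(Reader in, Consumer<String> handler) {
        int count = 0;
        try {
            XMLStreamReader reader = XMLInputFactory.newInstance().createXMLStreamReader(in);
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT
                        && "item".equals(reader.getLocalName())) {
                    // getElementText reads up to the matching end tag (text-only content assumed)
                    handler.accept(reader.getElementText());
                    count++;
                }
            }
            reader.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("malformed XML", e);
        }
        return count;
    }
}
```

In a real job the handler might add each item to a JDBC batch and flush every few thousand items; wrapping the file in a BufferedReader gives the buffering mentioned above.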

Query in a large xml file

Hello,
I'm trying to work with very large XML files which are created from CSV files. These files may be very large, up to 1 GB! Until now I have managed to do several validations on these big XML files, and the only thing that works for me is a SAX parser; DOM is out of the question because it fills up memory.
My next task is to run queries on these files, something like:
select field1,field2 from file.xml
where field3 = 'A'
and (field4>'B' or field1='C')
order by field2.
I searched the net to find out how to query XML files (since I have never done queries on XML before), but I couldn't find which "query language" is best for large files. If I use XPath (XSLT), won't that cause memory problems, because XSLT represents the file as an in-memory object?
My idea is to parse the file with SAX, check for every row whether it fits the where condition, and then write it immediately to a result XML file. But evaluating the where clause can be very complicated without some tool. The order by clause is another problematic issue.
Does anyone have some more intelligent ideas about how I can do this? Please help! :(
The xml file looks like this:
<doc>
<row id ="1">
<column id="1" name="column1">value</column>
<column id="N" name="columnN">value</column>
</row>
<row id ="M">
<column id="1" name="column1">value</column>
<column id="N" name="columnN">value</column>
</row>
</doc>

Hi all,
Thank you very much for your replies.
First, Saxon didn't work because it uses an in-memory parser, and that is what I was trying to avoid.
A different database is also out of the question, because the customer insists on XML, and also there are some files that can never be converted to a database table, because after some transformations they are changed and no longer match the standard CSV format.
I think that maybe http://exist.sourceforge.net is the right solution for me, but I will probably try it in the next version of my project.
For now I have managed to build the project with only SAXParser and a lot of back-end programming, and it works OK, although it was very hard to build and will be harder to maintain, so I will try to look at the eXist project.
Thanks everyone for the help.
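
The SAX-only idea from the question (check each row against the where condition while streaming) can be sketched with the JDK's StAX pull parser. This is not the poster's implementation: RowQuery and select are invented names, the where condition is passed in as a Java predicate, and the order by is done in memory on the matching rows only, which assumes the result set is much smaller than the file.

```java
import java.io.Reader;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

/** Hypothetical helper: streams <row> elements, keeps the matching ones, then sorts them. */
final class RowQuery {
    static List<Map<String, String>> select(Reader in,
                                            Predicate<Map<String, String>> where,
                                            String orderBy) {
        List<Map<String, String>> result = new ArrayList<Map<String, String>>();
        try {
            XMLStreamReader r = XMLInputFactory.newInstance().createXMLStreamReader(in);
            Map<String, String> row = null; // columns of the row being read, by name
            while (r.hasNext()) {
                int event = r.next();
                if (event == XMLStreamConstants.START_ELEMENT) {
                    if ("row".equals(r.getLocalName())) {
                        row = new HashMap<String, String>();
                    } else if ("column".equals(r.getLocalName()) && row != null) {
                        // read the attribute first: getElementText moves the cursor forward
                        String name = r.getAttributeValue(null, "name");
                        row.put(name, r.getElementText());
                    }
                } else if (event == XMLStreamConstants.END_ELEMENT
                        && "row".equals(r.getLocalName())) {
                    if (row != null && where.test(row)) {
                        result.add(row); // only matching rows are kept in memory
                    }
                    row = null;
                }
            }
            r.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("malformed XML", e);
        }
        // "order by": string comparison on one column, rows lacking the column first
        result.sort(Comparator.comparing(
                (Map<String, String> m) -> m.get(orderBy),
                Comparator.nullsFirst(Comparator.<String>naturalOrder())));
        return result;
    }
}
```

If the matching rows do not fit in memory either, the sort would have to move to disk (for example by writing sorted chunks and merging them), which is where a database or an XML database such as eXist starts to pay off.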

Large XML file Loading

I have a large XML file that I am converting to an
ArrayCollection to use as a dataprovider for a datagrid. It takes
some time to fully load. Is there any way to load a partial list while
the rest of the list is loading? Or does anyone know a way to speed
up this process?
Thanks

I'd try to modify the autoComplete component.
You could break this processing up into smaller chunks. For
it to work, you need some outside counter or indexer that keeps
track of where you are. Have the conversion function process say
nodes 0-500, then end. Then using callLater, call that function
again, to process the next batch of nodes.
This process will allow the UI to update between iteration
batches. If you need more responsiveness, you could try monitoring
mouse move, and stopping the conversion, until the mouse is
inactive again. That is just brainstorming. I have not tried it
(the mouse move part. I know the iterator method works to allow the
UI to update.)
Tracy
