Performance Problem in parsing large XML file (15MB)

Hi,
I'm trying to parse a large XML file (15 MB) and am facing a clear performance problem. A simple XML schema validation using the following code snippet:
DBMS_LOB.fileopen(targetFile, DBMS_LOB.file_readonly);
DBMS_LOB.loadClobfromFile(
    tempCLOB,
    targetFile,
    DBMS_LOB.getLength(targetFile),
    dest_offset,
    src_offset,
    nls_charset_id(CONSTANT_CHARSET),
    lang_context,
    conv_warning);
DBMS_LOB.fileclose(targetFile);
p_xml_document := XMLType(tempCLOB, p_schema_url, 0, 0);
p_xml_document.schemaValidate();
takes 30 minutes on an HP-UX machine (4 GB RAM, 2 CPUs) running Oracle 9.2.0.4.
Please explain what could be going wrong.
Thanks In Advance,
Vineet

Thanks Mark,
I'll open a TAR and also upload the schema and instance XML.
If I'm not changing the topic too much :-) one more thing in continuation:
If I skip the schema validation step and directly insert the instance document into a schema-linked XMLType table, what does Oracle XDB do in that case?
I'm getting a severe performance hit here too... the same file as above takes almost 40 minutes to insert.
code snippet:
DBMS_LOB.fileopen(targetFile, DBMS_LOB.file_readonly);
DBMS_LOB.loadClobfromFile(
    tempCLOB,
    targetFile,
    DBMS_LOB.getLength(targetFile),
    dest_offset,
    src_offset,
    nls_charset_id(CONSTANT_CHARSET),
    lang_context,
    conv_warning);
DBMS_LOB.fileclose(targetFile);
p_xml_document := XMLType(tempCLOB, p_schema_url, 0, 0);
-- p_xml_document.schemaValidate();
insert into INCOMING_XML values(p_xml_document);
Here the table INCOMING_XML is:
TABLE of SYS.XMLTYPE(XMLSchema "http://INCOMING_XML.xsd" Element "MatchingResponse") STORAGE Object-relational TYPE "XDBTYPE_MATCHING_RESPONSE"
This table and the type XDBTYPE_MATCHING_RESPONSE were created using the mapping provided in the registered XML schema.
Thanks,
Vineet

Similar Messages

  • Problem with parsing large XML files chunked over HTTP

    I'm trying to isolate a bug that was introduced when upgrading the JRE in use from Java 7u51 to 7u71 without changing any code. The problem appears to be very similar to: Bug ID: JDK-8027359 XML parser returns incorrect parsing results.
    Further investigation showed that it was also introduced in the same version (7u71) where that fix was applied. Unlike that bug, though, my XML is marked as version 1.0. It also appears to happen only with large XML files, on the order of 10 MB or so.
    The closest I've been able to narrow it down to is that the code is using JAXB to unmarshal a stream that the debugger tells me is an org.apache.http.conn.EofSensorInputStream / org.apache.http.impl.io.ChunkedInputStream. The exception I get is not consistent, but it typically appears to come from chunks being overwritten or shuffled, resulting in letters appearing in attributes that are actually numbers, or, as in the following, an attribute "testAttribute" getting partially overwritten by the end of a timestamp that was in a different section of the XML.
    javax.xml.bind.UnmarshalException
    - with linked exception:
    [javax.xml.stream.XMLStreamException: ParseError at [row,col]:[1,98748]
    Message: Attribute name "testAttribu00Z" associated with an element type "testElement" must be followed by the ' = ' character.]
      at com.sun.xml.internal.bind.v2.runtime.unmarshaller.UnmarshallerImpl.handleStreamException(UnmarshallerImpl.java:421)
      at com.sun.xml.internal.bind.v2.runtime.unmarshaller.UnmarshallerImpl.unmarshal0(UnmarshallerImpl.java:357)
      at com.sun.xml.internal.bind.v2.runtime.unmarshaller.UnmarshallerImpl.unmarshal(UnmarshallerImpl.java:334)
    Caused by: javax.xml.stream.XMLStreamException: ParseError at [row,col]:[1,98748]
    Message: Attribute name "testAttribu00Z" associated with an element type "testElement" must be followed by the ' = ' character.
      at com.sun.org.apache.xerces.internal.impl.XMLStreamReaderImpl.next(XMLStreamReaderImpl.java:598)
      at com.sun.xml.internal.bind.v2.runtime.unmarshaller.StAXStreamConnector.bridge(StAXStreamConnector.java:181)
      at com.sun.xml.internal.bind.v2.runtime.unmarshaller.UnmarshallerImpl.unmarshal0(UnmarshallerImpl.java:355)
      ... 6 more
    Here's some code that seems to reproduce it if you can connect to an XML server that returns a large chunked XML file:
      SchemeRegistry registry = new SchemeRegistry();
      registry.register(
                    new Scheme("http", 80, PlainSocketFactory.getSocketFactory()));
      HttpClient client = new DefaultHttpClient(new BasicClientConnectionManager(registry));
      String url = "http://someUrlReturningAlargeChunkedXML";
      HttpGet method = new HttpGet(url);
      HttpResponse response = client.execute(method);
      InputStream inputStream = response.getEntity().getContent();
      XMLInputFactory factory = XMLInputFactory.newInstance();
      XMLStreamReader responseReader = factory.createXMLStreamReader(inputStream);
      Unmarshaller unmarshaller = JAXBContext.newInstance(JaxBObjectOfResponse.class).createUnmarshaller();
      JAXBElement<JaxBObjectOfResponse> wot = unmarshaller.unmarshal(responseReader, JaxBObjectOfResponse.class);
    If you connect using URL.openStream() to the same service there is no error. If I read bytes directly and write to a file, there is no error. The error only happens when I try to unmarshal it, and it's large, and I'm using Java 7u71 (or later). It can be consistently repeated with the jsp webapp that I'm using, but didn't show the error when I used the same code with a Wikipedia dump XML file.
    How can I unmarshal in a different way to avoid this problem? Or, how can I better isolate the bug so it can be posted to the appropriate bug system?
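    Since reading the bytes directly and writing them to a file shows no corruption, one workaround worth trying (a sketch only, assuming the response fits comfortably in memory, and reusing the response, factory and unmarshaller variables from the snippet above) is to buffer the whole entity first and unmarshal from the buffered copy instead of the live chunked stream:
      // copy the chunked response fully into memory before any XML parsing happens
      java.io.ByteArrayOutputStream buffered = new java.io.ByteArrayOutputStream();
      java.io.InputStream raw = response.getEntity().getContent();
      byte[] chunk = new byte[8192];
      for (int n; (n = raw.read(chunk)) != -1; ) {
          buffered.write(chunk, 0, n);
      }
      XMLStreamReader bufferedReader = factory.createXMLStreamReader(
              new java.io.ByteArrayInputStream(buffered.toByteArray()));
      JAXBElement<JaxBObjectOfResponse> result = unmarshaller.unmarshal(bufferedReader, JaxBObjectOfResponse.class);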

    Apparently, adding the Woodstox XML libraries avoids the bug. Is there anyone who can reproduce this on another system? Were there any changes to the StAX implementation between 7u67 and 7u71 that may have introduced a bug like this?
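    If Woodstox is what makes the problem go away, one way to be sure it is really the StAX implementation in use (rather than the JDK's built-in one picked by XMLInputFactory.newInstance()) is to instantiate its factory explicitly. A sketch, assuming the Woodstox jars are on the classpath and reusing the variables from the snippet in the question:
      XMLInputFactory factory = new com.ctc.wstx.stax.WstxInputFactory();   // Woodstox instead of the JDK default
      XMLStreamReader responseReader = factory.createXMLStreamReader(inputStream);
      JAXBElement<JaxBObjectOfResponse> wot = unmarshaller.unmarshal(responseReader, JaxBObjectOfResponse.class);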
    Edit: When setting the logging level to DEBUG, I once saw the overwritten buffer being logged as if that was what was received (as in the testAttribu00Z example above). I can't repeat that anymore, though, and very rarely it does parse with no exception (though it may have still been corrupted). Now the error seems to land consistently on one of the buffer boundaries, as in:
    17:08:09,705 DEBUG wire:63 - << "2000[\r][\n]"
    17:08:09,705 DEBUG wire:77 - << "trend>....OTHER XML...<trend hours=""
    17:08:09,705 DEBUG wire:77 - << "634.0972777777778" datetime="2013-05-21T00:43:48.350Z" t"
    17:08:09,705 DEBUG wire:63 - << "[\r][\n]"
    17:08:09,705 DEBUG wire:63 - << "2000[\r][\n]"
    17:08:09,705 DEBUG wire:77 - << "rend-mode="0">
    Exception in thread "main" java.lang.NumberFormatException: t34.0972777777778
      at com.sun.xml.internal.bind.DatatypeConverterImpl._parseDouble(DatatypeConverterImpl.java:213)
      at mypackage.Trend_JaxbXducedAccessor_hours.parse(TransducedAccessor_field_Double.java:48)
      at com.sun.xml.internal.bind.v2.runtime.unmarshaller.StructureLoader.startElement(StructureLoader.java:194)
      at com.sun.xml.internal.bind.v2.runtime.unmarshaller.UnmarshallingContext._startElement(UnmarshallingContext.java:486)
      at com.sun.xml.internal.bind.v2.runtime.unmarshaller.UnmarshallingContext.startElement(UnmarshallingContext.java:465)
      at com.sun.xml.internal.bind.v2.runtime.unmarshaller.InterningXmlVisitor.startElement(InterningXmlVisitor.java:60)
      at com.sun.xml.internal.bind.v2.runtime.unmarshaller.StAXStreamConnector.handleStartElement(StAXStreamConnector.java:231)
      at com.sun.xml.internal.bind.v2.runtime.unmarshaller.StAXStreamConnector.bridge(StAXStreamConnector.java:165)
      at com.sun.xml.internal.bind.v2.runtime.unmarshaller.UnmarshallerImpl.unmarshal0(UnmarshallerImpl.java:355)
      at com.sun.xml.internal.bind.v2.runtime.unmarshaller.UnmarshallerImpl.unmarshal(UnmarshallerImpl.java:334)
    Or:
    17:19:12,563 DEBUG wire:63 - << "2000[\r][\n]"
    17:19:12,563 DEBUG wire:77 - << ...OTHER XML...<trend index="5"
    17:19:12,563 DEBUG wire:77 - << "" label="N"
    17:19:12,563 DEBUG wire:63 - << "[\r][\n]"
    Exception in thread "main" java.lang.NumberFormatException: Not a number: N
      at com.sun.xml.internal.bind.DatatypeConverterImpl._parseInt(DatatypeConverterImpl.java:106)
      at com.sun.xml.internal.bind.DatatypeConverterImpl._parseShort(DatatypeConverterImpl.java:118)

  • Problem with parsing large xml files

    Hello All,
    I am parsing a large XML file of 20 MB and I use DocumentBuilder.parse(File). This method works for smaller XML files, but the application hangs and doesn't throw any error message when parsing the 20 MB file. Please let me know what I have to do at this point.
    Thanks & Regards,
    Kumar.

    Well... I can't agree.
    If you have a structure like this:
    <task>
      <task/>
      <task>
         <task>
            <task/>
         </task>
         <task/>
      </task>
    </task>
    ...you may always keep a stack of tasks (push at startElement, pop at endElement), so at every leaf of the tree you will have all the parents of that leaf.
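    A minimal sketch of that stack idea (a hypothetical handler for the nested <task> structure above; at any point the stack holds the element currently being processed and all of its ancestors):
    import java.util.ArrayDeque;
    import java.util.Deque;
    import org.xml.sax.Attributes;
    import org.xml.sax.helpers.DefaultHandler;
    public class TaskStackHandler extends DefaultHandler {
        // push on startElement, pop on endElement
        private final Deque<String> stack = new ArrayDeque<String>();
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            if ("task".equals(qName)) {
                stack.push(qName);
                // here the stack contains the current <task> plus all of its parent <task> elements
            }
        }
        public void endElement(String uri, String localName, String qName) {
            if ("task".equals(qName)) {
                stack.pop();
            }
        }
    }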
    For a structure like this:
    <task id="1" parent="0"/>
    <task id="2" parent="1"/>
    <task id="3" parent="1"/>
    <task id="4" parent="2"/>
    <task id="5" parent="3"/>
    ...it will be much faster to go through the document with SAX several times to build the tree of tasks than to load the whole document into memory...

  • How to parse large xml file

    I need to parse a large XML file which contains the following tags. The size of the file is 10 MB-50 MB or more.
    <departments>
      <department>
        <a_depart id="124">
          <b_depart id="Bss_253">
            <bss_depart id="253">
              <attributes>
                <name_one>abc</name_one>
              </attributes>
            </bss_depart>
          </b_depart>
        </a_depart>
      </department>
      <department>
        <a_depart id="124">
          <b_depart id="Bss_254">
            <mss_depart id="253">
              <attributes>
                <name_one>abc</name_one>
                <name_two>xyz</name_two>
              </attributes>
            </mss_depart>
          </b_depart>
        </a_depart>
      </department>
      <department>
        <a_depart id="124">
          <b_depart id="Bss_254">
            <mss_depart id="255">
              <attributes>
                <name_one>abc</name_one>
                <name_two>xyz</name_two>
              </attributes>
            </mss_depart>
          </b_depart>
        </a_depart>
      </department>
      <department>
        <a_depart id="125">
          <b_depart id="Bss_254">
            <mss_depart id="253">
              <attributes>
                <name_one>abc</name_one>
                <name_two>xyz</name_two>
              </attributes>
            </mss_depart>
          </b_depart>
        </a_depart>
      </department>
    </departments>
    I want to get the information from that xml file, like mss_depart id=233, building an XPath dynamically for every id and loading
    it using dom4j, which is very, very slow.
    Is there any other solution for reading the data using a SAX parser only?
    I want to execute the XPath / retrieve the data in the following way:
    //a_depart/@id ------> all the ids of the a_depart tags; if it returns 3 values, say 123,124,125,
    after that I want to execute
    //a_depart[@id='123']/b_depart/@id and so on... to retrieve the values at all the levels...
         I am executing the following XPath for every unique id at all levels:
         List l = doc.selectNodes(xPathForID);
         List l1 = doc.selectNodes(xPathForAttributes+attributes.get(j)+"/text()");
    But it is very slow and takes a lot of time.
    Is there any other way to solve this problem? If so, please mail me; it is urgent.
    I am using jdk1.4 and jdk1.5.
    Is there any support in the SAX parser for executing XPath in jdk1.5 directly, without using dom4j?
    Thanks in advance....

    I doubt you will find a preexisting solution to your problem.
    SAX is usually recommended for processing big files (where "big" is undefined). It works on big files by avoiding the messy problem of storing the data -- that is left as an exercise for you.
    DOM (and its variants) works by building a Document object as the head of a tree of objects for the entire contents. With DOM you can then use XPath, because there is something to search that is already in memory. To use XPath you seem to have two choices: build a DOM-ish tree, or find an XPath processor (I'm not sure one exists) that can process the XML file directly -- but that will be slow, since you are looking for "all" occurrences of an attribute, which means you have to read the entire file each time.
    It might be worth exploring a hybrid approach -- use SAX to get some information, and build your own objects to store the data. Maybe a HashMap as the main index. But that will keep you from using XPath, since you do not have the data structures it expects.
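    For what it's worth, a rough sketch of that hybrid idea against the sample above (the element names a_depart/b_depart come from the posted file; what exactly gets stored, and any further nesting, is up to you):
    import java.util.*;
    import org.xml.sax.Attributes;
    import org.xml.sax.helpers.DefaultHandler;
    public class DepartIndexHandler extends DefaultHandler {
        // one SAX pass that indexes b_depart ids under their parent a_depart id,
        // instead of re-reading the whole file with one XPath per id
        private final Map<String, List<String>> bByA = new HashMap<String, List<String>>();
        private String currentA;
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            if ("a_depart".equals(qName)) {
                currentA = atts.getValue("id");
                if (!bByA.containsKey(currentA)) {
                    bByA.put(currentA, new ArrayList<String>());
                }
            } else if ("b_depart".equals(qName) && currentA != null) {
                bByA.get(currentA).add(atts.getValue("id"));
            }
        }
        public Map<String, List<String>> getIndex() {
            return bByA;
        }
    }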
    A third alternative would be to look at JAXB. It builds Java code from a schema of your data, and then when you import the data it creates the necessary objects and fills in the values. But I don't think XPath will work there either.
    Dave Patterson

  • Can someone help me with a problem of parsing an XML file?

    Hello,
    I'm having some problems parsing an xml file. I get a SAXNotSupportedException when setting a property value.
    Here is the piece of code where I have the problem:
    SAXParserFactory spf = SAXParserFactory.newInstance();
    spf.setNamespaceAware(true);
    SAXParser saxParser = spf.newSAXParser();
    XMLReader xmlReader = saxParser.getXMLReader();
    DefaultHandler defHandler = new DefaultHandler();
    xmlReader.setProperty("http://xml.org/sax/properties/lexical-handler", defHandler);
    and the log is:
    Problem with the parser org.xml.sax.SAXNotSupportedException: PAR012 For propertyID "http://xml.org/sax/properties/lexical-handler", the value "org.xml.sax.helpers.DefaultHandler@4ff4f74a" cannot be cast to LexicalHandler.
    http://xml.org/sax/properties/lexical-handler org.xml.sax.helpers.DefaultHandler@4ff4f74a LexicalHandler
    I've been working on this problem but I can't find the error.
    Does anyone have an idea of what to do to solve it?
    Thanx in advance,
    M@G
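    For reference, the exception states the cause directly: the http://xml.org/sax/properties/lexical-handler property must be given an object that implements org.xml.sax.ext.LexicalHandler, and the plain DefaultHandler does not. One minimal way around it (a sketch, not the only option) is to use DefaultHandler2, which does implement LexicalHandler:
    SAXParserFactory spf = SAXParserFactory.newInstance();
    spf.setNamespaceAware(true);
    SAXParser saxParser = spf.newSAXParser();
    XMLReader xmlReader = saxParser.getXMLReader();
    // org.xml.sax.ext.DefaultHandler2 implements LexicalHandler, so the parser's cast succeeds
    DefaultHandler2 handler = new DefaultHandler2();
    xmlReader.setContentHandler(handler);
    xmlReader.setProperty("http://xml.org/sax/properties/lexical-handler", handler);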

    Before deciding which XML technology to use, you should see if your application fits one of the categories below:
    Use SAX when:
    1. The XML file is rather large (30 or 40+ MB).
    2. I don't need the xml document in memory; I will parse the document and store the data in my own objects.
    Use DOM or JDOM when:
    1. The XML file is relatively small (less than 30 MB), or I can increase the runtime memory for larger xml files.
    2. I will need to walk up and down the xml document tree several times.
    3. My application is in Java and it's not going to be rewritten in C++, etc. (use JDOM).
    NOTE:
    JDOM is rather easier to use (for a Java developer), but it's not a W3C-standardized XML parser.
    Personally, I like JDOM for traversing the DOM.

  • Parsing Large XML File

    Is there a restriction on the XML file size that can be loaded into the parser?
    I am getting an out of memory exception reading in a large XML file (10 MB) using the following code:
    DOMParser parser = new DOMParser();
    URL url = createURL(argv[0]);
    parser.setErrorStream(System.err);
    parser.setValidationMode(true);
    parser.showWarnings(true);
    parser.parse(url);
    Win NT 4.0 Server
    Sun JDK 1.2.2
    ===================================
    Error output
    ===================================
    Exception in thread "main" java.lang.OutOfMemoryError
    at oracle.xml.parser.v2.ElementDecl.getAttrDecls(ElementDecl.java, Compiled Code)
    at java.util.Hashtable.<init>(Unknown Source)
    at oracle.xml.parser.v2.DTDDecl.<init>(DTDDecl.java, Compiled Code)
    at oracle.xml.parser.v2.ElementDecl.getAttrDecls(ElementDecl.java, Compiled Code)
    at oracle.xml.parser.v2.ValidatingParser.checkDefaultAttributes(ValidatingParser.java, Compiled Code)
    at oracle.xml.parser.v2.NonValidatingParser.parseAttributes(NonValidatingParser.java, Compiled Code)
    at oracle.xml.parser.v2.NonValidatingParser.parseElement(NonValidatingParser.java, Compiled Code)
    at oracle.xml.parser.v2.ValidatingParser.parseRootElement(ValidatingParser.java:97)
    at oracle.xml.parser.v2.NonValidatingParser.parseDocument(NonValidatingParser.java:199)
    at oracle.xml.parser.v2.XMLParser.parse(XMLParser.java:146)
    at TestLF.main(TestLF.java:40)
    null

    Hi,
    I think you can use StAX, a Java library.
    Since I don't have much more knowledge about this, you can find more about it on the Internet.
    Thanks
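    To make that concrete, a minimal StAX sketch (the file name is a placeholder; the StAX API is built into Java 6+ and available as a separate jar for older JDKs) that streams through the document without ever building a DOM tree:
    import java.io.FileInputStream;
    import javax.xml.stream.XMLInputFactory;
    import javax.xml.stream.XMLStreamConstants;
    import javax.xml.stream.XMLStreamReader;
    public class StaxScan {
        public static void main(String[] args) throws Exception {
            XMLInputFactory factory = XMLInputFactory.newInstance();
            XMLStreamReader reader = factory.createXMLStreamReader(new FileInputStream("large.xml"));
            int elements = 0;
            while (reader.hasNext()) {
                // events are pulled one at a time, so memory use stays flat regardless of file size
                if (reader.next() == XMLStreamConstants.START_ELEMENT) {
                    elements++;
                }
            }
            reader.close();
            System.out.println("elements: " + elements);
        }
    }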

  • Parsing large xml file and display using swing

    Hi all,
    I want to read a large xml file and display it graphically in Swing as a tree structure.
    I implemented it and it works fine for files up to about 5 MB after increasing the JVM heap size (-Xmx). If the file is larger than that, it throws an out of memory error. I'm creating a custom data structure from the xml and I'm using SAX parsing.
    After displaying the data structure, the user can do some operations on it, like search etc.
    Can any of you suggest a method to support larger files? What I'm looking for is to create the data structure in the file system rather than in memory.
    Any other tips for memory management would be greatly appreciated
    Thanks in Advance.
    Nisha

    Use a memory-mapped file?
    http://javaalmanac.com/egs/java.nio/CreateMemMap.html
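    For reference, the java.nio mapping shown in the linked example boils down to something like this (the path is a placeholder; the OS pages the file in on demand instead of it living on the JVM heap):
    import java.io.RandomAccessFile;
    import java.nio.MappedByteBuffer;
    import java.nio.channels.FileChannel;
    public class MapXmlFile {
        public static void main(String[] args) throws Exception {
            RandomAccessFile file = new RandomAccessFile("large.xml", "r");
            FileChannel channel = file.getChannel();
            // map the whole file read-only; the data is not copied onto the heap
            MappedByteBuffer buffer = channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
            System.out.println("mapped " + buffer.capacity() + " bytes");
            channel.close();
            file.close();
        }
    }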

  • Help : Parsing large XML files

    Hi,
    someone please help, I am trying to parse XML files of about 60 MB. I have to parse through 120 of them, search for a particular node and print it. I am using jdk1.3.x, with JDOM.
    On the sample files that are available, of 114 KB, I am able to run my code and get the result, but as soon as the large files are used I get the following error:
    OutOfMemoryError
    <<nostacktrace>>
    Exception in thread main
    thanks

    I guess you are using a DOM parser which builds a complete tree of the document. For what you are trying to do this is probably not necessary so a SAX parser may be better. If JDOM doesn't have one try using Xerces from Apache.

  • Does the parser work with large XML files?

    Is there a restriction on the XML file size that can be loaded into the parser?
    I am getting an out of memory exception reading in a large XML file (10 MB) using the following code:
    DOMParser parser = new DOMParser();
    URL url = createURL(argv[0]);
    parser.setErrorStream(System.err);
    parser.setValidationMode(true);
    parser.showWarnings(true);
    parser.parse(url);
    Win NT 4.0 Server
    Sun JDK 1.2.2
    ===================================
    Error output
    ===================================
    Exception in thread "main" java.lang.OutOfMemoryError
    at oracle.xml.parser.v2.ElementDecl.getAttrDecls(ElementDecl.java, Compiled Code)
    at java.util.Hashtable.<init>(Unknown Source)
    at oracle.xml.parser.v2.DTDDecl.<init>(DTDDecl.java, Compiled Code)
    at oracle.xml.parser.v2.ElementDecl.getAttrDecls(ElementDecl.java, Compiled Code)
    at oracle.xml.parser.v2.ValidatingParser.checkDefaultAttributes(ValidatingParser.java, Compiled Code)
    at oracle.xml.parser.v2.NonValidatingParser.parseAttributes(NonValidatingParser.java, Compiled Code)
    at oracle.xml.parser.v2.NonValidatingParser.parseElement(NonValidatingParser.java, Compiled Code)
    at oracle.xml.parser.v2.ValidatingParser.parseRootElement(ValidatingParser.java:97)
    at oracle.xml.parser.v2.NonValidatingParser.parseDocument(NonValidatingParser.java:199)
    at oracle.xml.parser.v2.XMLParser.parse(XMLParser.java:146)
    at TestLF.main(TestLF.java:40)
    null

    We have a number of test files that are that size and it works without a problem. However using the DOMParser does require significantly more memory than your doc size.
    What is the memory configuration of the JVM that you are running with? Have you tried increasing it? Are you using our latest version 2.0.2.6?
    Oracle XML Team

  • Problems with Large XML files

    I have tried increasing the memory pool using the -mx and -ms options. It doesn't work. I am using your latest XML parser for Java v2. Please let me know if there are specific options I should be using.
    Thanx,
    -Sameer
    We have a number of test files that are that size and it works without a problem. However using the DOMParser does require significantly more memory than your doc size.
    What is the memory configuration of the JVM that you are running with? Have you tried increasing it? Are you using our latest version 2.0.2.6?
    Oracle XML Team
    Is there a restriction on the XML file size that can be loaded into the parser?
    I am getting an out of memory exception reading in a large XML file (10 MB) using the following code:
    DOMParser parser = new DOMParser();
    URL url = createURL(argv[0]);
    parser.setErrorStream(System.err);
    parser.setValidationMode(true);
    parser.showWarnings(true);
    parser.parse(url);
    Win NT 4.0 Server
    Sun JDK 1.2.2
    ===================================
    Error output
    ===================================
    Exception in thread "main" java.lang.OutOfMemoryError
    at oracle.xml.parser.v2.ElementDecl.getAttrDecls(ElementDecl.java, Compiled Code)
    at java.util.Hashtable.<init>(Unknown Source)
    at oracle.xml.parser.v2.DTDDecl.<init>(DTDDecl.java, Compiled Code)
    at oracle.xml.parser.v2.ElementDecl.getAttrDecls(ElementDecl.java, Compiled Code)
    at oracle.xml.parser.v2.ValidatingParser.checkDefaultAttributes(ValidatingParser.java, Compiled Code)
    at oracle.xml.parser.v2.NonValidatingParser.parseAttributes(NonValidatingParser.java, Compiled Code)
    at oracle.xml.parser.v2.NonValidatingParser.parseElement(NonValidatingParser.java, Compiled Code)
    at oracle.xml.parser.v2.ValidatingParser.parseRootElement(ValidatingParser.java:97)
    at oracle.xml.parser.v2.NonValidatingParser.parseDocument(NonValidatingParser.java:199)
    at oracle.xml.parser.v2.XMLParser.parse(XMLParser.java:146)
    at TestLF.main(TestLF.java:40)
    null

    You might try using a different JDK/JRE - either a 1.1.6+ or 1.3 version, as 1.2 in our experience has the largest footprint. If this doesn't work, can you give us some details about your system configuration? Finally, you might try the SAX interface, as it does not need to load the entire DOM tree into memory.
    Oracle XML Team

  • I want to load a large raw XML file in Firefox and parse it by DOM. But for large XML files Firefox is very slow and sometimes crashes. Is there any option to increase the DOM handling memory in Firefox?

    Actually I am using an off-line form to load a very large XML file and using Firefox to load that form. But it's taking a long time to load and sometimes the browser crashes while DOM-parsing this XML file into my form. Is there any option to increase the DOM handler size in Firefox?

    Thank you for your suggestion. I have a question, though. If I use a relational database and try to access it for EACH and EVERY click the user makes, wouldn't that take much time to populate the page with data? Isn't an XML store more efficient here? Please reply.
    You have the choice of reading a small number of records (10 children per element?) from a database, or parsing multiple megabytes. Reading 10 records from a database should take maybe 100 milliseconds (1/10 of a second). I have written a web application that reads several hundred records and returns them with acceptable response time, and I am no expert. To parse an XML file of many megabytes... you have already tried this, so you know how long it takes, right? If you haven't tried it, then you should. It's possible to waste a lot of time considering alternatives -- the term is "analysis paralysis". Speculating on how fast something might be doesn't get you very far.

  • Parsing a XML file using Jdom-Problem.

    Hi all,
    I am reposting it as I was told to format the code and send it again.
    I am trying to parse an XML file using JDOM Java code. This code works fine if I remove the xmlns attribute from the root element (I get the expected result). If the xmlns attribute is put in the XML as it should be, then the parsing and retrieval return null. Please tell me how to fix this issue.
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    Xml
    <process name="CreateKBEntryService" targetNamespace="http://serena.com/CreateKBEntryService" suppressJoinFailure="yes" xmlns:tns="http://serena.com/CreateKBEntryService" xmlns="http://schemas.xmlsoap.org/ws/2003/03/business-process/" xmlns:bpelx="http://schemas.oracle.com/bpel/extension" xmlns:ora="http://schemas.oracle.com/xpath/extension" xmlns:nsxml0="http://localhost:8080/axis/services/CreateKBEntryService" xmlns:nsxml1="http://DefaultNamespace" xmlns:bpws="http://schemas.xmlsoap.org/ws/2003/03/business-process/">
    <partnerLinks>
    <partnerLink name="client" partnerLinkType="tns:CreateKBEntryService" myRole="CreateKBEntryServiceProvider"/>
    <partnerLink name="CreateKBEntryPartnerLink" partnerLinkType="nsxml0:CreateKBEntryLink" partnerRole="CreateKBEntryProvider"/>
    </partnerLinks>
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~`
    Java:
    import java.io.*;
    import java.util.*;
    import org.jdom.Document;
    import org.jdom.Element;
    import org.jdom.input.SAXBuilder;
    public class sample1 {
        public static void main(String[] args) throws Exception {
            // create an XML parser and read the XML file
            SAXBuilder oBuilder = new SAXBuilder();
            Document oDoc = oBuilder.build(new File("**xml file location**"));
            Element root = oDoc.getRootElement();
            System.out.println(root.getName());
            String tgtns = root.getAttributeValue("targetNamespace");
            System.out.println("tgt ns " + tgtns);
            List list = root.getChildren("partnerLinks");
            Iterator it1 = list.iterator();
            System.out.println("Iterator 1 - " + list.size());
            while (it1.hasNext()) {
                Element partnerlinks = (Element) it1.next();
                List list2 = partnerlinks.getChildren("partnerLink");
                System.out.println("iterator 2 - " + list2.size());
            }
        }
    }
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    Result:
    Without xmlns in the xml file (expected and correct output):
    process
    tgt ns http://serena.com/CreateKBEntryService
    Iterator 1 - 1 // expected and correct result, which comes when I remove the xmlns attribute from the xml
    iterator 2 - 2
    Result with xmlns:
    process
    tgt ns http://serena.com/CreateKBEntryService
    Iterator 1 - 0 // should return 1 instead of 0
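    For what it's worth, the behaviour with xmlns present is the classic default-namespace issue rather than a parser bug: because the root element declares xmlns="http://schemas.xmlsoap.org/ws/2003/03/business-process/", its children live in that namespace, and JDOM's getChildren("partnerLinks") only matches elements in no namespace. A sketch of the usual fix, reusing the root and partnerlinks variables from the code above:
    Namespace bpel = Namespace.getNamespace("http://schemas.xmlsoap.org/ws/2003/03/business-process/");   // org.jdom.Namespace
    List list = root.getChildren("partnerLinks", bpel);                 // now returns 1 element
    List list2 = partnerlinks.getChildren("partnerLink", bpel);         // and 2 elements inside the loop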

    LOL
    This is what you get for working 12 hours straight....
    I changed:
    xmlObject["mydoc"]["modelglue"]["event-handlers"]["event-handler"][i].xmlAttrib utes["name"]<br>
    to:
    #mydoc["modelglue"]["event-handlers"]["event-handler"][i].xmlAttrib utes["name"]#<br>
    xmlObject is the name of my xml object in memory, and then you reference from the root of the xml doc down the chain.
    Sorry for the inconvenience,
    Rich

  • Parsing an XML file to a table

    Hi,
    I am looking at a scenario where right now we have a daily data feed that is an XML file approximately 1 to 2 GB in size. Using Java code, this file is parsed and, one by one, the records are inserted into the DB (Oracle 10.2). But as the data feed grows larger, so does the file size -- which results in more time taken for this process to run.
    I do realize that the ideal scenario would be to stop using the file as the source of the feed. But this is beyond our control.
    I looked at using SQL*Loader, but as I understand it, it requires a CSV-based file. In my case, I would have to convert the XML file to CSV, which again might be a huge operation to carry out daily. (Edit: Am I wrong? Can we use SQL*Loader here?)
    Thus I am exploring whether there could be a way to write a PL/SQL procedure that takes this XML file as input, parses it, and loads the data into a table. Any suggestions/directions will be hugely appreciated.
    Thanks,
    Ak
    Edit:
    I also browsed through a few questions asked previously here and I came across this: Import Large XML File to Table
    My situation is similar, and as per the suggestion there by A_Non, SECUREFILE BINARY XML storage can be used. But the problem I have is that I am on Oracle 10.2, while this option is only available in a later version, if I am right.
    Edited by: Aditya Kumar on Nov 21, 2012 10:49 PM
    Edited by: Aditya Kumar on Nov 22, 2012 12:20 AM

    Thanks a lot for your reply, Odie.
    The exact Oracle version is Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - 64bit.
    A couple of doubts here:
    1. The file size would range from 1 to 2 GB (and can go beyond). Is SQL*Loader also a feasible option in this scenario?
    2. The link you mentioned has this line: "Storing the XML document in Oracle Database using Binary File (BFILE)" -- how is this different from Binary XML in 11g? I am confused.
    3. With respect to storage, should we be concerned about the space/memory that loading the file into this (temporary) table will take? Right now the XML feed file resides on the web server, where the Java code iterates through it and inserts each record into the table. This will change, and technically the file will have to go onto the DB box.
    We do not have the XSD for it, as right now we retrieve a few selected tags from this XML, parse them and store them in a table (columns mapped to tags). Below is a sample XML. This is one such record and there are thousands below it. I had to change things because of data confidentiality, but it mimics the original.
    <?xml version="1.0" encoding="windows-1250"?>
    <Involvements xmlns="com/xyz/us/abc/v0/ijkdatatypes" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
    <ijk:Involvement xmlns:ijk="com/xyz/us/abc/v0/ijkdatatypes">
    <ijk:PCenterData>
    <ijk:HPCenterData>
    <ijk:HPCenter/>
    <ijk:HPCName/>
    </ijk:HPCenterData>
    </ijk:PCenterData>
    <ijk:BillableWBSCode>00000000</ijk:BillableWBSCode>
    <ijk:EDescription/>
    <ijk:InvolvementType>0</ijk:InvolvementType>
    <ijk:ProductData>
    <ijk:ICode/>
    <ijk:Product/>
    <ijk:ProductName/>
    </ijk:ProductData>
    <ijk:PartnerData>
    <ijk:BPartnerData>
    <ijk:BPartnerNumber/>
    </ijk:BPartnerData>
    <ijk:PPartnerData>
    <ijk:PPartnerNumber/>
    </ijk:PPartnerData>
    <ijk:LeadPartnerData>
    <ijk:LeadCSPNumber/>
    </ijk:LeadPartnerData>
    </ijk:PartnerData>
    <ijk:ManagerData>
    <ijk:BManagerInformation>
    <ijk:BManagerNumber/>
    </ijk:BManagerInformation>
    <ijk:PerformanceManagerInformation>
    <ijk:PerformanceManagerNumber/>
    </ijk:PerformanceManagerInformation>
    </ijk:ManagerData>
    <ijk:InvolvementDateData>
    <ijk:InvolvementstartDate/>
    <ijk:InvolvementTerminationDate/>
    </ijk:InvolvementDateData>
    <ijk:HostRegion/>
    <ijk:WBSD>
    <ijk:WBSC>00000000</ijk:WBSC>
    <ijk:WBSDescription/>
    <ijk:ContractCode/>
    <ijk:ContractLine/>
    <ijk:ContractLineDescription/>
    </ijk:WBSD>
    <ijk:PDIndicator>false</ijk:PDIndicator>
    <ijk:CompanyCode/>
    <ijk:ClientCode/>
    <ijk:ContractCode/>
    </ijk:Involvement>
    </Involvements>

  • How to set SAXParser at command-line interface to create a large XML file

    Hi,
    I am trying to create a large XML file (more than 50 MB) by selecting from an Oracle database, but it failed because of an "out of memory" error. According to the "Oracle XML Developer Guide", we should use SAXParser for parsing a large XML file. But there is no example showing how to set SAXParser at the command line.
    The following is what I use to get XML files. It works only when the file is small.
    java OracleXML getXML -DateFormat -withDTD -rowsetTag PO_HDR -conn "jdbc:oracle:oci8:@server_name" -user "ID/password" "select * from table_name"
    When I set SAXParser in the way below,
    java oracle.xml.parser.v2.SAXParser OracleXML getXML -DateFormat -withDTD -rowsetTag PO_HDR -conn "jdbc:oracle:oci8:@server_name" -user "ID/password" "select * from table_name"
    it failed with the error message: "In class oracle.xml.parser.v2.SAXParser: void main(String argv[]) is not defined"
    Does anyone know how to solve the problem? I'd appreciate your help very much.
    Yi

    Here are my ideas:
    Register the XML schema.
    Using xmldom, generate the desired XML output and return it as an XMLType.
    Then you can use something like this to check it:
    declare
       xmldoc xmltype;
    begin
       -- populate xmldoc from your xmldom function
       -- validate against the registered XML schema
       if xmldoc.isSchemaValid(schema_url, root_element) = 1 then
          -- valid
       else
          -- invalid
       end if;
    end;

  • Query in a large xml file

    Hello,
    I'm trying to work with very large xml files which are created from csv files. These files may be very large -- up to 1 GB! Until now I have managed to do several validations on these big xml files, and the only thing that works for me is a SAX parser; DOM is out of the question because it fills up memory.
    My next task is to do queries on these files, something like:
    select field1,field2 from file.xml
    where field3 = 'A'
    and (field4 > 'B' or field1 = 'C')
    order by field2
    I searched the net to find out how to run queries on xml files (since I have never done queries on xml before), but I couldn't find which "query language" is best for large files. If I use XPath (XSLT), will that not cause me memory problems, since XSLT represents the file as an in-memory object?
    My idea is to parse the file with SAX, check every row to see if it fits the where condition, and then write it immediately to a result xml file (a rough sketch of this follows the sample below). But evaluating the where statement can be very complicated without using some tool. Also, the order by statement is another problematic issue.
    Does anyone have more intelligent ideas about how I can do this? Please help! :(
    The xml file looks like this:
    <doc>
    <row id ="1">
    <column id="1" name="column1">value</column>
    <column id="N" name="columnN">value</column>
    </row>
    <row id ="M">
    <column id="1" name="column1">value</column>
    <column id="N" name="columnN">value</column>
    </row>
    </doc>
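    Regarding the "check every row and write it out immediately" idea above, a rough sketch against the <row>/<column> layout shown (the condition is a stand-in for the real where clause; order by would still need a separate pass or an external sort):
    import java.util.HashMap;
    import java.util.Map;
    import org.xml.sax.Attributes;
    import org.xml.sax.helpers.DefaultHandler;
    public class RowFilterHandler extends DefaultHandler {
        // collects the columns of the current <row>, applies the filter when the row ends,
        // then forgets the row, so memory use stays flat
        private Map<String, String> row;
        private String currentColumn;
        private final StringBuilder text = new StringBuilder();
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            if ("row".equals(qName)) {
                row = new HashMap<String, String>();
            } else if ("column".equals(qName)) {
                currentColumn = atts.getValue("name");
                text.setLength(0);
            }
        }
        public void characters(char[] ch, int start, int length) {
            text.append(ch, start, length);
        }
        public void endElement(String uri, String localName, String qName) {
            if ("column".equals(qName) && row != null) {
                row.put(currentColumn, text.toString());
            } else if ("row".equals(qName)) {
                if ("A".equals(row.get("column3"))) {                                    // stand-in for: where field3 = 'A' ...
                    System.out.println(row.get("column1") + "," + row.get("column2"));   // or write to the result file
                }
                row = null;
            }
        }
    }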

    Hi all,
    Thank you very much for your replies.
    First, Saxon didn't work because it builds the document in memory, and that is what I was trying to avoid.
    A different database is also out of the question, because the customer insists on XML, and also there are some files that can never be converted to a database table, because eventually, with some transformations, they are changed and are no longer completely like the standard csv format.
    I think that maybe http://exist.sourceforge.net is the right solution for me, but I will probably try it in the next version of my project.
    For now I have managed to build the project with only SAXParser and a lot of back-end programming, and it works OK, although it was very hard to make and will be harder to maintain, so I will look at the eXist project.
    Thanks everyone for the help.
