Is JAXB suitable for large XML files?

Hi,
I have a very large XML file (~700 MB), with a schema available. I need to unmarshal it into Java objects and run some business validation rules on it. These business rules may involve validating data from content objects that correspond to different sections of this large XML file.
I am uncertain whether JAXB will help me here (I have only just started with it). Does JAXB build the entire content tree for the XML document during Unmarshaller.unmarshal? Is there any way of asking it to build content objects on demand, as opposed to building the whole content tree immediately?
All help/suggestions appreciated.

Forgot to add:
After carrying out validation, the data is put into some RDBMS tables.
One approach would be to convert the XML files into SQL Loader compatible flat files (using a tool), load these flat files into staging tables, perform business validations on the staging table data, and then finally move the data into the main tables. All the validating logic could be either in stored procedures or in Java code.
The above is very long-winded. It would be great if JAXB could handle very large XML files (without loading the whole XML file into memory) so that business validations can be done in Java, without any intermediate format conversion.
I hope the above is somewhat clear.
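One pattern that is often suggested for this kind of job is to walk the file with StAX (javax.xml.stream) and handle one record at a time, so that only the current record is held in memory. The following is a minimal sketch under assumptions, not a definitive implementation: the element names record and amount, the id attribute, and the rule "amount must not be negative" are all hypothetical stand-ins for the real schema and business rules. With JAXB on the classpath, the same loop could hand each record to Unmarshaller.unmarshal(XMLStreamReader, Class) instead of reading fields by hand.

```java
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class RecordValidator {

    /**
     * Streams the document and returns the ids of all records that break
     * the (hypothetical) rule "amount must not be negative".
     */
    public static List<String> findInvalidRecords(Reader xml) throws XMLStreamException {
        XMLStreamReader reader = XMLInputFactory.newInstance().createXMLStreamReader(xml);
        List<String> invalid = new ArrayList<>();
        String currentId = null;
        try {
            while (reader.hasNext()) {
                if (reader.next() != XMLStreamConstants.START_ELEMENT) {
                    continue;
                }
                String name = reader.getLocalName();
                if ("record".equals(name)) {
                    // Remember which record we are inside of.
                    currentId = reader.getAttributeValue(null, "id");
                } else if ("amount".equals(name)) {
                    // getElementText() consumes everything up to the matching end tag.
                    double amount = Double.parseDouble(reader.getElementText().trim());
                    if (amount < 0) {
                        invalid.add(currentId);
                    }
                }
            }
        } finally {
            reader.close();
        }
        return invalid;
    }

    public static void main(String[] args) throws XMLStreamException {
        String sample = "<records>"
                + "<record id=\"a\"><amount>10</amount></record>"
                + "<record id=\"b\"><amount>-5</amount></record>"
                + "</records>";
        System.out.println(findInvalidRecords(new StringReader(sample)));
    }
}
```

Rules that compare data from different sections of the file would still need some state kept between records (for example a small map of keys seen so far), but that state is usually far smaller than the full content tree.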

Similar Messages

I want to load a large raw XML file in Firefox and parse it with DOM. But for large XML files Firefox is very slow and sometimes crashes. Is there any option to increase the DOM handling memory in Firefox?

Actually, I am using an offline form to load a very large XML file, and I use Firefox to load that form, parsing the XML file into my form through DOM. It takes a long time to load and sometimes the browser crashes. Is there any option to increase the DOM handler size in Firefox?

Options for large XML file - downloading from the app hangs browser

I am using 5.6.3 in EBS.
My users want to create files that can be uploaded into another system. The system will take XML files so my original idea was to create a data definition and have the user simply save the xml output (using View XML option). A .rtf template is also required for a pdf output - a summary of what was sent in the extract. This idea worked fine until I got a large file (37+ MB) and the View XML crashed my browser. Other than moving the files to a directory on the server that my user has access to, is there any other way to get at that raw XML?
Updated:
I have been doing some more reading and I am wondering if both an etext template and a pdf template would have been the better solution. This way I could submit once with the etext template and republish same data with the pdf template. Has anyone tried this successfully? I am not sure though that this would eliminate my problem with the large files (100,000+ records) crashing the browser. When you use an etext template, how do you get at that file? Is it using the 'View Output' button on Request Form or do I have the option to send it somewhere directly (i.e. server or email)?
Any tips would be appreciated.
Edited by: Tam_11 on Jan 14, 2011 1:38 PM

Hi,
Using etext, the text output will still be in the usual directory at $APPLCSF/$APPLOUT.
The text output will be smaller than the XML output, so I'd be surprised if it still crashed your browser. You shouldn't need to get at the XML.
What format is needed for the target system?
Cheers
Kofi

Bouncy Castle Encryption for Large XML Files?

Hi,
I am trying to encrypt files > 8 KB using the KeyBasedLargeFileProcessor Utility class of Bouncy Castle.
It encrypts the file, but I am unable to decrypt that same encrypted file, so I assume it is encrypting incorrectly.
While trying to decrypt the encrypted file, it says "It was encrypted with a key (4E616D65) that does not exist"
Please suggest what change needs to be made in the 'encryptFile' method where it says -
"OutputStream cOut = cPk.open(out, new byte[1 << 16]);"
I tried changing the value, but it doesn't work. The maximum file size that we may encrypt is around 3 MB.
The files that are being encrypted / decrypted are XML files.
Any inputs are highly appreciated.
Thanks,
Tan

> While trying to decrypt the encrypted file, it says "It was encrypted with a key (4E616D65) that does not exist"
So use the same key you encrypted with.
> Please suggest what change needs to be made in the 'encryptFile' method where it says -
> "OutputStream cOut = cPk.open(out, new byte[1 << 16]);"
Do you have some evidence that that's where the problem is?

Efficient searching in a large XML file for specific elements

Hi
How can I search a large XML file for a specific element efficiently (fast and with little memory)? I have a large XML file (approximately 32 MB, with about 140,000 main elements) and I have to search through it for specific elements. What stable and production-ready open source tools are available for such tasks? I think PDOM is a solution, but I can't find any well-known and stable implementations on the web.
Thanks in advance,
Behrang Saeedzadeh.

The problem with DOM parsers is that the whole document needs to be parsed!
So with large documents this uses up a lot of memory.
I suggest you look at something like a pull parser (Piccolo or MXP1), which is a fast parser that is program driven and not event driven like SAX. This has the advantage of not needing to remember your state between events.
I have used Piccolo to extract events from large xml based log files.
Carl.
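The pull-parsing idea above can also be sketched with StAX (javax.xml.stream), the pull API that ships with the JDK; Piccolo and MXP1 are not used here. This is only an illustration under assumptions: the element name item and the attribute id are hypothetical, and the matched element is assumed to contain text only. The loop stops as soon as the first match is found, so the rest of the file is never read.

```java
import java.io.Reader;
import java.io.StringReader;
import java.util.Optional;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class ElementFinder {

    /**
     * Returns the text of the first element with the given name whose
     * attribute has the given value, or an empty Optional if none matches.
     */
    public static Optional<String> findText(Reader xml, String element,
                                            String attribute, String value)
            throws XMLStreamException {
        XMLStreamReader reader = XMLInputFactory.newInstance().createXMLStreamReader(xml);
        try {
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT
                        && element.equals(reader.getLocalName())
                        && value.equals(reader.getAttributeValue(null, attribute))) {
                    // Early exit: nothing after the match is parsed.
                    return Optional.of(reader.getElementText());
                }
            }
            return Optional.empty();
        } finally {
            reader.close();
        }
    }

    public static void main(String[] args) throws XMLStreamException {
        String sample = "<items><item id=\"1\">first</item><item id=\"2\">second</item></items>";
        System.out.println(findText(new StringReader(sample), "item", "id", "2"));
    }
}
```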

What are the best tools for opening very large XML files and examining the tree and confirming they are valid?

I am generating some very large XML files (600,000+ lines, 50+ MB). I finally have them all being valid XML and valid UTF-8.
But the files are so large that Safari and Chrome will often not open them. Firefox will, though.
Instead of these browsers, I was wondering if there are any other recommended apps for the Mac for opening and viewing the XML, getting an error message if the files are not valid for some reason, and examining the XML tree?
I opened the file in the default app for XML which is Xcode, but that is just like opening it in a plain text editor. You can't expand/collapse the XML tree like you can with a browser, and it doesn't report errors.
Thanks,
Doug

Hi Tom,
I had not seen that list. I'll look it over.
I'm also in touch with the developer of BBEdit (they are quite responsive) and they are willing to look at the file in question and see why it is not reporting UTF-8 errors while Chrome is.
For now I have all the invalid characters quashed and things are working. But it would be useful in the future.
By the by, some of those editors are quite pricey!
doug

Best method for encrypting/decrypting large XML files ( 100MB)

I need to encrypt large XML part files that can reach 100 MB or more.
I found some articles and code, but the only example I was successful in getting to work used XMLCipher, which takes a Document, parses it, and then encrypts it.
Obviously, 100 MB files do not cooperate well with DOM, so I want to find a better method for encryption/decryption of these files.
I found some articles using CipherInputStream and CipherOutputStream, but I am not clear whether this is the way to go and whether it will avoid memory errors.
import java.io.*;
import java.security.spec.AlgorithmParameterSpec;
import javax.crypto.*;
import javax.crypto.spec.IvParameterSpec;

public class DesEncrypter {
    Cipher ecipher;
    Cipher dcipher;

    // Buffer used to transport the bytes from one stream to another
    byte[] buf = new byte[1024];

    public DesEncrypter(SecretKey key) {
        // Create an 8-byte initialization vector
        byte[] iv = new byte[]{
            (byte)0x8E, 0x12, 0x39, (byte)0x9C,
            0x07, 0x72, 0x6F, 0x5A
        };
        AlgorithmParameterSpec paramSpec = new IvParameterSpec(iv);
        try {
            ecipher = Cipher.getInstance("DES/CBC/PKCS5Padding");
            dcipher = Cipher.getInstance("DES/CBC/PKCS5Padding");
            // CBC requires an initialization vector
            ecipher.init(Cipher.ENCRYPT_MODE, key, paramSpec);
            dcipher.init(Cipher.DECRYPT_MODE, key, paramSpec);
        } catch (java.security.InvalidAlgorithmParameterException e) {
        } catch (javax.crypto.NoSuchPaddingException e) {
        } catch (java.security.NoSuchAlgorithmException e) {
        } catch (java.security.InvalidKeyException e) {
        }
    }

    public void encrypt(InputStream in, OutputStream out) {
        try {
            // Bytes written to out will be encrypted
            out = new CipherOutputStream(out, ecipher);
            // Read in the cleartext bytes and write to out to encrypt
            int numRead = 0;
            while ((numRead = in.read(buf)) >= 0) {
                out.write(buf, 0, numRead);
            }
            out.close();
        } catch (java.io.IOException e) {
        }
    }

    public void decrypt(InputStream in, OutputStream out) {
        try {
            // Bytes read from in will be decrypted
            in = new CipherInputStream(in, dcipher);
            // Read in the decrypted bytes and write the cleartext to out
            int numRead = 0;
            while ((numRead = in.read(buf)) >= 0) {
                out.write(buf, 0, numRead);
            }
            out.close();
        } catch (java.io.IOException e) {
        }
    }
}
This looks like it might fit, but there is one more twist: I am using a persistence manager and XML encoding to accomplish that, so I am not sure how (or where) to implement this method without affecting persistence.
Any guidance on what would work best in this situation would be appreciated.
Regards,
vbplayr2000

I can give some general guidelines that might help, having done much similar work:
You have 2 different issues, at least from my reading of your problem:
1) How to deal with large XML docs that most parsers will not handle without memory issues
2) Where to hide or "black box" the encrypt/decrypt routines
#1: Check into XPP3/XMLPull. Yes, it's different from the other XML parsers you are used to using, and more work is involved, but it is blazing fast and can be used to parse a stream as it is being read. You can populate beans and process as needed, since there is really not much "inversion of control" involved compared to parsers that go on to finish the entire document or load it all into memory.
#2: Extend Serializable and write your own readObject/writeObject methods. Place the encrypt/decrypt in there as appropriate. That will "hide" the implementation and should be what any persistence manager can deal with.
Regards,
antarti
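On the memory question raised in this thread: copying through CipherOutputStream and CipherInputStream in fixed-size chunks keeps memory use independent of the file size. The sketch below illustrates that and makes a few assumptions that are not from the posts above: it swaps in AES/CBC/PKCS5Padding for DES, generates a random IV per file and writes it as the first 16 bytes of the output, and the class and method names are hypothetical. Key storage and authentication of the ciphertext are not addressed.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import javax.crypto.Cipher;
import javax.crypto.CipherInputStream;
import javax.crypto.CipherOutputStream;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.IvParameterSpec;

public class StreamCrypto {
    private static final String TRANSFORMATION = "AES/CBC/PKCS5Padding";
    private static final int IV_LENGTH = 16; // AES block size in bytes

    /** Writes a random IV followed by the encrypted content of in; closes out. */
    public static void encrypt(SecretKey key, InputStream in, OutputStream out)
            throws IOException, GeneralSecurityException {
        byte[] iv = new byte[IV_LENGTH];
        new SecureRandom().nextBytes(iv);
        Cipher cipher = Cipher.getInstance(TRANSFORMATION);
        cipher.init(Cipher.ENCRYPT_MODE, key, new IvParameterSpec(iv));
        out.write(iv);
        // Closing the CipherOutputStream writes the final padded block.
        try (CipherOutputStream cipherOut = new CipherOutputStream(out, cipher)) {
            copy(in, cipherOut);
        }
    }

    /** Reads the IV from the first 16 bytes of in, then decrypts the rest to out. */
    public static void decrypt(SecretKey key, InputStream in, OutputStream out)
            throws IOException, GeneralSecurityException {
        byte[] iv = new byte[IV_LENGTH];
        new DataInputStream(in).readFully(iv);
        Cipher cipher = Cipher.getInstance(TRANSFORMATION);
        cipher.init(Cipher.DECRYPT_MODE, key, new IvParameterSpec(iv));
        try (CipherInputStream cipherIn = new CipherInputStream(in, cipher)) {
            copy(cipherIn, out);
        }
        out.flush();
    }

    // Only this 8 KB buffer is held in memory, whatever the size of the input.
    private static void copy(InputStream in, OutputStream out) throws IOException {
        byte[] buf = new byte[8192];
        int n;
        while ((n = in.read(buf)) >= 0) {
            out.write(buf, 0, n);
        }
    }

    public static void main(String[] args) throws Exception {
        KeyGenerator generator = KeyGenerator.getInstance("AES");
        generator.init(128);
        SecretKey key = generator.generateKey();

        byte[] plain = "<doc><part>example</part></doc>".getBytes(StandardCharsets.UTF_8);
        ByteArrayOutputStream encrypted = new ByteArrayOutputStream();
        encrypt(key, new ByteArrayInputStream(plain), encrypted);

        ByteArrayOutputStream decrypted = new ByteArrayOutputStream();
        decrypt(key, new ByteArrayInputStream(encrypted.toByteArray()), decrypted);
        System.out.println(new String(decrypted.toByteArray(), StandardCharsets.UTF_8));
    }
}
```

In real use the byte array streams in main would be replaced by file streams; the encrypt and decrypt methods themselves would not need to change.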

How to parse large xml file

I need to parse a large XML file which contains the following tags. The size of the file is 10 MB to 50 MB or more.
<departments>
  <department>
    <a_depart id="124">
      <b_depart id="Bss_253">
        <bss_depart id="253">
          <attributes>
            <name_one>abc</name_one>
          </attributes>
        </bss_depart>
      </b_depart>
    </a_depart>
  </department>
  <department>
    <a_depart id="124">
      <b_depart id="Bss_254">
        <mss_depart id="253">
          <attributes>
            <name_one>abc</name_one>
            <name_two>xyz</name_two>
          </attributes>
        </mss_depart>
      </b_depart>
    </a_depart>
  </department>
  <department>
    <a_depart id="124">
      <b_depart id="Bss_254">
        <mss_depart id="255">
          <attributes>
            <name_one>abc</name_one>
            <name_two>xyz</name_two>
          </attributes>
        </mss_depart>
      </b_depart>
    </a_depart>
  </department>
  <department>
    <a_depart id="125">
      <b_depart id="Bss_254">
        <mss_depart id="253">
          <attributes>
            <name_one>abc</name_one>
            <name_two>xyz</name_two>
          </attributes>
        </mss_depart>
      </b_depart>
    </a_depart>
  </department>
</departments>
I want to get the information out of that XML file, for example mss_depart id=233. I am building an XPath dynamically for every id and evaluating it
using dom4j, which is very, very slow.
Is there any other solution that reads the data using only a SAX parser?
I want to execute the XPath queries in the following way:
//a_depart/@id ------> all the ids of the a_depart tags; say it returns 3 values: 123, 124, 125
After that I want to execute
//a_depart[@id='123']/b_depart/@id and so on, to retrieve the values at all the levels.
     I am executing the following XPath for every unique id at all levels:
     List l = doc.selectNodes(xPathForID);
     List l1 = doc.selectNodes(xPathForAttributes+attributes.get(j)+"/text()");
But it is very slow and takes a lot of time.
Is there any other way to solve this problem? If so, please mail me; it is urgent.
I am using jdk1.4 and jdk1.5.
Is there any support in jdk1.5 for executing XPath with a SAX parser directly, without using dom4j?
Thanks in advance....

I doubt you will find a preexisting solution to your problem.
SAX is usually recommended for processing big files (where "big" is undefined). It works on big files by avoiding the messy problem of storing the data -- that is left as an exercise to you.
DOM (and its variants) works by building a Document object as the head of the tree of objects for the entire contents. With DOM, you can then use XPath, because there is something to search that is already in memory. To use XPath, you seem to have two choices: build a DOM-ish tree, or find an XPath processor (I'm not sure if one exists) that can process the XML file directly. The latter will be slow, since you are looking for "all" occurrences of an attribute, and this means you have to read the entire file each time.
It might be worth exploring a hybrid approach -- use SAX to get some information, and build your own objects to store the data. Maybe a HashMap as the main index. But that will keep you from using XPath, since you do not have the data structures it expects.
A third alternative would be to look at JAXB. It builds Java code from a schema of your data, and then when you import the data, it creates the necessary objects and fills in the values. But I don't think XPath will work there either.
Dave Patterson
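The hybrid approach described above (one SAX pass that fills a map used as the index) might look roughly like the sketch below. It is an illustration only: it indexes just the two outer levels from the sample (a_depart id to the b_depart ids beneath it), the class and method names are made up, and it is written for a current JDK rather than the jdk1.4/1.5 mentioned in the question. The deeper levels would be handled the same way with further maps.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class DepartIndex extends DefaultHandler {
    // a_depart id -> ids of the b_depart elements found beneath it
    private final Map<String, Set<String>> bIdsByAId = new LinkedHashMap<>();
    private String currentAId;

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if ("a_depart".equals(qName)) {
            currentAId = atts.getValue("id");
            bIdsByAId.computeIfAbsent(currentAId, k -> new LinkedHashSet<>());
        } else if ("b_depart".equals(qName) && currentAId != null) {
            bIdsByAId.get(currentAId).add(atts.getValue("id"));
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if ("a_depart".equals(qName)) {
            currentAId = null;
        }
    }

    /** Parses the document once and returns the finished index. */
    public static Map<String, Set<String>> build(InputSource source)
            throws ParserConfigurationException, SAXException, IOException {
        DepartIndex handler = new DepartIndex();
        SAXParserFactory.newInstance().newSAXParser().parse(source, handler);
        return handler.bIdsByAId;
    }

    public static void main(String[] args) throws Exception {
        String sample = "<departments>"
                + "<department><a_depart id=\"124\"><b_depart id=\"Bss_253\"/></a_depart></department>"
                + "<department><a_depart id=\"124\"><b_depart id=\"Bss_254\"/></a_depart></department>"
                + "<department><a_depart id=\"125\"><b_depart id=\"Bss_254\"/></a_depart></department>"
                + "</departments>";
        Map<String, Set<String>> index = build(new InputSource(new StringReader(sample)));
        System.out.println(index.keySet());   // plays the role of //a_depart/@id
        System.out.println(index.get("124")); // plays the role of //a_depart[@id='124']/b_depart/@id
    }
}
```

The file is read once, and each lookup afterwards is a map access instead of another XPath evaluation over the whole tree.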

Loading, processing and transforming Large XML Files

Hi all,
I realize this may have been asked before, but searching the history of the forum isn't easy, considering it's not always a safe bet which words to use on the search.
Here's the situation. We're trying to load and manipulate large XML files of up to 100MB in size.
The difference between what we have on our hands and other related issues posted here is that the XML isn't big because it has a largely branched tree of data, but rather because it includes large base64-encoded files in the XML itself. The size of the 'clean' XML is relatively small (a few hundred bytes to some kilobytes).
We had to deal with transferring the xml to our application using a webservice, loading the xml to memory in order to read values from it, and now we also need to transform the xml to a different format.
We solved the webservice issue using XFire.
We solved the loading of the xml using JAXB. Nevertheless, we use string manipulations to 'cut' the xml before we load it to memory - otherwise we get OutOfMemory errors. We don't need to load the whole XML to memory, but I really hate this solution because of the 'unorthodox' manipulation of the xml (i.e. the cutting of it).
Now we need to deal with the transformation of those XMLs, but obviously we can't cut them down this time. We have a little experience writing XSL, but no experience of applying XSL files from Java. We're looking for suggestions on how to do it most efficiently.
The biggest problem we encounter is the OutOfMemory errors.
So I ask several questions in one post:
1. Is there a better way to transfer the large files using a webservice?
2. Is there a better way to load and manipulate the large XML files?
3. What's the best way for us to transform those large XMLs?
4. Are we missing something in terms of memory management? Is there a better way to control it? We really are struggling there.
I assume this is an important piece of information: We currently use JDK 1.4.2, and cannot upgrade to 1.5.
Thanks for the help.

I think there may be a way to do it.
First, for low RAM needs, nothing beats SAX as the first processor of the data. With SAX, you control the memory use, since SAX only processes one "chunk" of the file at a time. You supply a class with methods named startElement, endElement, and characters. It calls the startElement method when it finds a new element. It calls the characters method when it wants to pass you some or all of the text between the start and end tags. It calls endElement to signal that passing characters is over, and to let you get ready for the next element. So, if your characters method did nothing with the base-64 data, you could see the XML go by with low memory needs.
Since we know in your case that the characters method will process large chunks of data, you can expect many calls as SAX calls your code. The only workable solution is to use a StringBuffer to accumulate the data. When endElement is called, you can decode the base-64 data and keep it somewhere. The most efficient way to do this is to have one StringBuffer for the class handling the SAX calls. Instantiate it with a big enough size to hold the largest of your binary data streams. In startElement, you can set the length of the StringBuffer to zero and reuse it over and over.
You did not say what you wanted to do with the XML data once you have processed it. SAX is nice from a memory perspective, but it makes you do all the work of storing the data. Unless you build a structured set of classes "on the fly", nothing is kept. There is a way to pass the output of one SAX pass into a DOM processor (without the binary data, in this case), and then you would wind up with a nice tree object with the rest of your data and a group of binary data objects. I've never done the SAX/DOM combo, but it is called a SAXFilter, and you should be able to google an example.
So, the bottom line is that it is very possible to do what you want, but it will take some careful design on your part.
Dave Patterson
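The accumulate-then-decode steps described above can be sketched as follows. Several things here are assumptions rather than part of the advice: the element name payload is hypothetical, and the code uses StringBuilder and java.util.Base64, which need a much newer JDK than the 1.4.2 mentioned in the question (on 1.4.2 a StringBuffer and a third-party base-64 decoder would take their place). For brevity the decoded payloads are collected in a list; a real handler would more likely write each one to disk as soon as it is decoded.

```java
import java.io.IOException;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Base64;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class Base64Extractor extends DefaultHandler {
    // One buffer, reused for every payload element.
    private final StringBuilder text = new StringBuilder();
    private final List<byte[]> payloads = new ArrayList<>();
    private boolean inPayload;

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if ("payload".equals(qName)) {
            text.setLength(0); // reset the buffer instead of allocating a new one
            inPayload = true;
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // SAX may deliver the text of one element in many separate calls.
        if (inPayload) {
            text.append(ch, start, length);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if ("payload".equals(qName)) {
            // The MIME decoder skips line breaks inside the encoded data.
            payloads.add(Base64.getMimeDecoder().decode(text.toString()));
            inPayload = false;
        }
    }

    /** Parses the document and returns the decoded content of every payload element. */
    public static List<byte[]> extract(InputSource source)
            throws ParserConfigurationException, SAXException, IOException {
        Base64Extractor handler = new Base64Extractor();
        SAXParserFactory.newInstance().newSAXParser().parse(source, handler);
        return handler.payloads;
    }

    public static void main(String[] args) throws Exception {
        String sample = "<msg><name>report</name><payload>aGVsbG8=</payload></msg>";
        for (byte[] data : extract(new InputSource(new StringReader(sample)))) {
            System.out.println(new String(data, StandardCharsets.UTF_8));
        }
    }
}
```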

Is there a way to import large XML files into HANA efficiently? Are there any data services provided to do this?

1. Is there a way to import large XML files into HANA efficiently?
2. Will it process it node by node or the entire file at a time?
3. Are there any data services provided to do this?
This is for a project use case. I also have a requirement to process bulk XML files; please suggest how to accomplish this task.

Hi Patrick,
I am addressing a similar issue: getting data from huge XML files into HANA.
Using OData services, can we handle huge data (i.e. create the schema and load into HANA) on the fly?
In my scenario,
I get a folder of different complex XML files which are to be loaded into the HANA database.
Then I have to transform and cleanse the data.
Can I use OData services to transform and cleanse the data?
If so, how can I create oData services dynamically ?
Any help is highly appreciated.
Thank you.
Regards,
Alekhya

How to set SAXParser at command-line interface to create a large XML file

Hi,
I am trying to create a large XML file (more than 50 MB) by selecting from an Oracle database, but it fails with an "out of memory" error. According to the "Oracle XML Developer Guide", we should use SAXParser for parsing a large XML file, but there is no example showing how to set SAXParser at the command line.
The following is what I use to get XML files. It works only when the file is small.
java OracleXML getXML -DateFormat -withDTD -rowsetTag PO_HDR -conn
"jdbc:oracle:oci8:@server_name" -user "ID/password" "select * from table_name"
When I specify SAXParser as shown below,
java oracle.xml.parser.v2.SAXParser OracleXML getXML -DateFormat -withDTD -rowsetTag PO_HDR -conn
"jdbc:oracle:oci8:@server_name" -user "ID/password" "select * from table_name"
it failed with the error message: "In class oracle.xml.parser.v2.SAXParser: void main(String argv[]) is not defined"
Does anyone know how to solve the problem? I would appreciate your help very much.
Yi

Here are my ideas.
Register the XML schema.
Using xmldom, generate the desired XML output and return it as an xmltype.
Then you can use something like this to check it.
declare
   xmldoc xmltype;
begin
   -- populate xmldoc from your xmldom function
   -- validate against the registered XML schema
   -- (schema_url and root_element stand for your own values)
   if xmldoc.isSchemaValid(schema_url, root_element) = 1 then
      null; -- valid schema
   else
      null; -- invalid
   end if;
end;
/

OSB - Iterating over large XML files with content streaming

Hi @ll
I have to iterate over all items in large XML files and insert them into an Oracle database.
The file is about 200 MB and contains around 500'000 items, and I am using OSB 10gR3.
The XML structure is something like this:
<allItems>
<item>.....</item>
<item>.....</item>
<item>.....</item>
<item>.....</item>
<item>.....</item>
</allItems>
Actually I thought about using a proxy service with enabled content streaming and a "for each" action for iterating
over all items. But for this the whole XML structure has to be materialized into a variable otherwise it is not possible!
More about streaming large files can be found here:
[http://download.oracle.com/docs/cd/E13159_01/osb/docs10gr3/userguide/context.html#large_messages]
It says there: "When you enable streaming for large message processing, you cannot use the ... for each...".
And for accessing single items you should use an assign action with an XPath like "$body/allItems/item[1]";
this works fine, and the whole XML stream does not have to be materialized.
So my idea was to use the "for each" action and process all items sequentially with an XPath like:
$body/allItems/item[$counter]
But the "for each" action only allows iterating over a sequence of XML items by defining a selection XPath
and the variable that contains all items. I would like to have a "repeat until" construct that iterates as long as
$body/allItems/item[$counter] does not return null. Or can I use the "for each" action differently?
Does the OSB provide any other iterating mechanism? I know there is the split-join construct that supports
different looping techniques, but as far as I know it does not support content streaming. Is this correct?
Did I miss something?
Thanks a lot for helping!
Cheers
Dani
Edited by: user10095731 on 29.07.2009 06:41

Hi Dani,
Yes, in my opinion this would be the best approach. You can use content streaming to pass this large XML to an EJB, and once it has been passed successfully the EJB can operate on it. If you want any result back (for further routing), you can get it back from the EJB.
An EJB gives you the power of Java to process this file, and from a Java perspective 150 MB is not a very LARGE amount of data. Ensure that you are using buffering. Check out this link for an explanation of Java IO streams and, in particular, buffered streams -
http://java.sun.com/developer/technicalArticles/Streams/ProgIOStreams/
Try dom4J with xpp (XML Pull Parser) parser in case you have parsing requirement. We had worked with 1.2GB file using this technique.
Regards,
Anuj

Query in a large xml file

Hello,
I'm trying to work with very large XML files which are created from CSV files. These files may be very large - up to 1 GB! Until now I have managed to do several validations on these big XML files, and the only thing that works for me is a SAX parser. DOM is out of the question because it fills up memory.
My next task is to do queries on these files, something like:
select field1,field2 from file.xml
where field3 = 'A'
and (field4>'B' or field1='C')
order by field2.
I searched the net to find out how to make queries on XML files (since I have never done queries on XML before), but I couldn't find which "query language" is best for large files. If I use XPath (XSLT), will that not cause me memory problems, because XSLT represents the file as an in-memory object?
My idea is to parse the file with SAX, check for every row whether it fits the where condition, and then write it immediately to a result XML file. But evaluating the where clause can be very complicated without using some tool. The order by clause is another problematic issue.
Does anyone have some more intelligent ideas about how I can do this? Please help! :(
The xml file looks like this:
<doc>
<row id ="1">
<column id="1" name="column1">value</column>
<column id="N" name="columnN">value</column>
</row>
<row id ="M">
<column id="1" name="column1">value</column>
<column id="N" name="columnN">value</column>
</row>
</doc>

Hi all,
Thank you very much for your replies.
First, Saxon didn't work because it uses an in-memory parser, and that is what I was trying to avoid.
A different database is also out of the question, because the customer insists on XML, and also there are some files that can never be converted to a database table, because with some transformations they are eventually changed and are no longer completely like the standard CSV format.
I think that maybe http://exist.sourceforge.net is the right solution for me, but I will probably try it in the next version of my project.
For now I have managed to build the project with only SAXParser and a lot of back-end programming, and it works OK, although it was very hard to make and will be harder to maintain, so I will try to look at the eXist project.
Thanks everyone for the help.
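For readers facing the same task, the "check every row against the where condition while streaming" idea from this thread could be sketched as below. This is not the poster's code; it is a minimal illustration that assumes the row/column layout shown in the question, uses StAX (javax.xml.stream) instead of SAX, expresses the where clause as a java.util.function.Predicate, and assumes that the matching rows (not the whole file) fit in memory so they can be sorted for the order by. It also assumes every matching row contains the order-by column. The class and method names are hypothetical.

```java
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class RowQuery {

    /** Streams all rows, keeps those accepted by where, and sorts them by the orderBy column. */
    public static List<Map<String, String>> select(Reader xml,
                                                   Predicate<Map<String, String>> where,
                                                   String orderBy) throws XMLStreamException {
        XMLStreamReader reader = XMLInputFactory.newInstance().createXMLStreamReader(xml);
        List<Map<String, String>> result = new ArrayList<>();
        Map<String, String> row = null;
        try {
            while (reader.hasNext()) {
                int event = reader.next();
                if (event == XMLStreamConstants.START_ELEMENT) {
                    if ("row".equals(reader.getLocalName())) {
                        row = new HashMap<>();
                    } else if ("column".equals(reader.getLocalName()) && row != null) {
                        // Read the name attribute before getElementText() moves past the start tag.
                        String name = reader.getAttributeValue(null, "name");
                        row.put(name, reader.getElementText());
                    }
                } else if (event == XMLStreamConstants.END_ELEMENT
                        && "row".equals(reader.getLocalName())) {
                    // Only rows that pass the where condition are kept in memory.
                    if (where.test(row)) {
                        result.add(row);
                    }
                    row = null;
                }
            }
        } finally {
            reader.close();
        }
        result.sort(Comparator.comparing((Map<String, String> r) -> r.get(orderBy)));
        return result;
    }

    public static void main(String[] args) throws XMLStreamException {
        String sample = "<doc>"
                + "<row id=\"1\"><column id=\"1\" name=\"field2\">b</column>"
                + "<column id=\"2\" name=\"field3\">A</column></row>"
                + "<row id=\"2\"><column id=\"1\" name=\"field2\">a</column>"
                + "<column id=\"2\" name=\"field3\">A</column></row>"
                + "<row id=\"3\"><column id=\"1\" name=\"field2\">c</column>"
                + "<column id=\"2\" name=\"field3\">B</column></row>"
                + "</doc>";
        System.out.println(select(new StringReader(sample),
                r -> "A".equals(r.get("field3")), "field2"));
    }
}
```

If the matching rows themselves are too many to sort in memory, this approach stops working and an external sort or an XML database such as the eXist project mentioned above becomes the more realistic option.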

Large XML file Loading

I have a large XML file that I am converting to an
ArrayCollection to use as a dataprovider for a datagrid. It takes
some time to fully load. Is there any way to load a partial list while
the rest of the list is loading? Or does anyone know a way to speed
up this process?
Thanks

I'd try to modify the autoComplete component.
You could break this processing up into smaller chunks. For
it to work, you need some outside counter or indexer that keeps
track of where you are. Have the conversion function process say
nodes 0-500, then end. Then using callLater, call that function
again, to process the next batch of nodes.
This process will allow the UI to update between iteration
batches. If you need more responsiveness, you could try monitoring
mouse move, and stopping the conversion, until the mouse is
inactive again. That is just brainstorming. I have not tried it
(the mouse move part. I know the iterator method works to allow the
UI to update.)
Tracy

Best technology to navigate through a very large XML file in a web page

Hi!
I have a very large XML file that needs to be displayed in my web page, maybe as a tree structure. Visitors should be able to go to nodes at any depth and access the child elements or text of those nodes.
I thought about using a DOM parser with Java but dropped that idea, as the DOM would be stored in memory and hence consumes a lot of space. SAX does not work for me either: every time there is a click on any of the nodes, my SAX parser parses the whole document to find the node, which is time-consuming.
Could anyone please tell me the best technology and best parser to be used for very large XML files?

> Thank you for your suggestion. I have a question,
> though. If I use a relational database and try to
> access it for EACH and EVERY click the user makes,
> wouldn't that take much time to populate the page with
> data?
> Isn't XML store more efficient here? Please reply me.
You have the choice of reading a small number of records (10 children per element?) from a database, or parsing multiple megabytes. Reading 10 records from a database should take maybe 100 milliseconds (1/10 of a second). I have written a web application that reads several hundred records and returns them with acceptable response time, and I am no expert. To parse an XML file of many megabytes... you have already tried this, so you know how long it takes, right? If you haven't tried it then you should. It's possible to waste a lot of time considering alternatives -- the term is "analysis paralysis". Speculating on how fast something might be doesn't get you very far.
