Best technology to navigate through a very large XML file in a web page

Hi!
I have a very large XML file that needs to be displayed in my web page, perhaps as a tree structure. Visitors should be able to drill down to nodes at any depth and access the child elements or text content of those nodes.
I thought about using a DOM parser with Java but dropped that idea, as the DOM would be stored in memory and is hence space-consuming. SAX doesn't work for me either, because every time a node is clicked my SAX parser re-parses the whole document to find that node, which is time-consuming.
Could anyone please tell me the best technology and best parser to be used for very large XML files?

Thank you for your suggestion. I have a question, though. If I use a relational database and access it for EACH and EVERY click the user makes, wouldn't that take much time to populate the page with data? Isn't an XML store more efficient here? Please reply.

You have the choice of reading a small number of records (10 children per element?) from a database, or parsing multiple megabytes. Reading 10 records from a database should take maybe 100 milliseconds (1/10 of a second). I have written a web application that reads several hundred records and returns them with acceptable response time, and I am no expert. To parse an XML file of many megabytes... you have already tried this, so you know how long it takes, right? If you haven't tried it, then you should. It's possible to waste a lot of time considering alternatives -- the term is "analysis paralysis". Speculating on how fast something might be doesn't get you very far.
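
To make the reply's suggestion concrete, here is a minimal sketch of the per-click database lookup, assuming the XML has been shredded into an adjacency-list table. The node table, its columns, the connection URL, and the credentials are all illustrative assumptions, not an existing schema:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;

    // Fetches only the children of the node the user clicked, instead of
    // re-parsing the whole XML document on every request.
    // Assumed table: node(id BIGINT PRIMARY KEY, parent_id BIGINT,
    //                     name VARCHAR(100), content TEXT)
    public class NodeChildren {
        public static void main(String[] args) throws Exception {
            long clickedNodeId = Long.parseLong(args[0]);
            try (Connection conn = DriverManager.getConnection(
                    "jdbc:mysql://localhost/xmltree", "user", "password");
                 PreparedStatement ps = conn.prepareStatement(
                     "SELECT id, name, content FROM node WHERE parent_id = ?")) {
                ps.setLong(1, clickedNodeId);
                try (ResultSet rs = ps.executeQuery()) {
                    while (rs.next()) {
                        System.out.printf("%d: <%s> %s%n",
                            rs.getLong("id"), rs.getString("name"),
                            rs.getString("content"));
                    }
                }
            }
        }
    }

With an index on parent_id, each click touches only the handful of rows it returns, which is why the per-click latency stays in the tens of milliseconds.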

Similar Messages

  • What are the best tools for opening very large XML files and examining the tree and confirming they are valid?

    I am generating some very large XML files (600,000+ lines, 50 MB+ of characters). I finally have them all being valid XML and valid UTF-8.
    But the files are so large that Safari and Chrome will often not open them. Firefox will, though.
    Instead of these browsers, I was wondering if there are any other recommended apps for the Mac for opening and viewing the XML, getting an error message if a file is not valid for some reason, and examining the XML tree?
    I opened the file in the default app for XML, which is Xcode, but that is just like opening it in a plain text editor. You can't expand/collapse the XML tree like you can with a browser, and it doesn't report errors.
    Thanks,
    Doug

    Hi Tom,
    I had not seen that list. I'll look it over.
    I'm also in touch with the developer of BBEdit (they are quite responsive) and they are willing to look at the file in question and see why it is not reporting UTF-8 errors while Chrome is.
    For now I have all the invalid characters quashed and things are working. But it would be useful in the future.
    By the by, some of those editors are quite pricey!
    doug

  • Best data structure for dealing with very large CSV files

    Hi, I'm writing an object that stores data from a very large CSV file. The idea is that you initialize the object with the CSV file, and it then has lots of methods to make manipulating and working with the CSV file simpler: operations like copying a column, eliminating rows, performing some equation on all values in a certain column, etc. Also a method for printing back to a file.
    However, the CSV files will probably be in the 10 MB range, maybe larger, so simply loading them into an array isn't possible, as it produces an OutOfMemoryError.
    Does anyone have a data structure they could recommend that can store the large amount of data required and is easily writable? I've currently been using a RandomAccessFile, but it is awkward to write to, as well as needing an external file which would need to be cleaned up after the object is removed (something very hard to guarantee occurs).
    Any suggestions would be greatly appreciated.
    Message was edited by:
    ninjarob

    How much internal storage ("RAM") is in the computer where your program should run? I think I have 640 MB in mine, and I can't believe loading 10 MB of data would be prohibitive, not even if the size doubles when the data comes into Java variables.
    If the data size turns out to be prohibitive of loading into memory, how about a relational database?
    Another thing you may want to consider is more object-oriented (in the sense of domain-oriented) analysis and design. If the data is concerned with real-life things (persons, projects, monsters, whatever), row and column operations may be fine for now, but future requirements could easily make you prefer something else (for example, a requirement to sort projects by budget or monsters by proximity to the hero).
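
    For scale, here is a minimal sketch of the load-it-all approach the reply describes. The file path, the comma-only parsing, and the sum operation are simplifying assumptions; a real CSV parser would also handle quoted commas:

        import java.io.BufferedReader;
        import java.io.FileReader;
        import java.util.ArrayList;
        import java.util.List;

        // Loads the whole CSV into memory as a list of rows; at ~10 MB this
        // is usually fine with a modest heap.
        public class CsvTable {
            private final List<String[]> rows = new ArrayList<>();

            public CsvTable(String path) throws Exception {
                try (BufferedReader in = new BufferedReader(new FileReader(path))) {
                    String line;
                    while ((line = in.readLine()) != null) {
                        rows.add(line.split(","));   // naive split: no quoted-comma handling
                    }
                }
            }

            // Example column operation: sum the numeric values in one column.
            public double sumColumn(int col) {
                double sum = 0;
                for (String[] row : rows) {
                    sum += Double.parseDouble(row[col]);
                }
                return sum;
            }

            public static void main(String[] args) throws Exception {
                CsvTable table = new CsvTable(args[0]);
                System.out.println("Column 2 total: " + table.sumColumn(2));
            }
        }

    An OutOfMemoryError at 10 MB usually just means the default heap is small; raising it (e.g. java -Xmx256m) is often all that is needed.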

  • I want to load a large raw XML file in Firefox and parse it with DOM, but with a large XML file Firefox is very slow and sometimes crashes. Is there any option to increase the DOM handling memory in Firefox?

    Actually, I am using an off-line form to load a very large XML file, and using Firefox to load that form. It takes a long time to load, and sometimes the browser crashes while parsing this XML file into my form through the DOM. Is there any option to increase the DOM handler size in Firefox?

  • Query in a large xml file

    Hello,
    I'm trying to work with very large XML files which are created from CSV files. These files may be very large -- up to 1 GB! Until now I have managed to do several validations on these big XML files, and the only thing that works for me is a SAX parser; DOM is out of the question because it fills up memory.
    My next task is to do queries on these files, something like:
    select field1, field2 from file.xml
    where field3 = 'A'
    and (field4 > 'B' or field1 = 'C')
    order by field2
    I searched the net to find out how to run queries on XML files (since I have never done queries on XML before), but I couldn't find which "query language" is best for large files. If I use XPath (XSLT), won't that cause memory problems, since XSLT represents the file as an in-memory object?
    My idea is to parse the file with SAX, check every row against the where condition, and immediately write matching rows to a result XML file (a sketch of this follows the sample below). But evaluating the where condition can be very complicated without some tool support, and the order by clause is another problematic issue.
    Does anyone have more intelligent ideas about how I can do this? Please help! :(
    The XML file looks like this:
    <doc>
      <row id="1">
        <column id="1" name="column1">value</column>
        <column id="N" name="columnN">value</column>
      </row>
      <row id="M">
        <column id="1" name="column1">value</column>
        <column id="N" name="columnN">value</column>
      </row>
    </doc>
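
    Here is a sketch of the SAX idea described above, under some simplifying assumptions: columns are identified by their name attribute, values are compared as plain strings, and no XML escaping is done when writing the result:

        import java.io.FileReader;
        import java.io.FileWriter;
        import java.io.PrintWriter;
        import java.util.HashMap;
        import java.util.Map;
        import javax.xml.parsers.SAXParserFactory;
        import org.xml.sax.Attributes;
        import org.xml.sax.InputSource;
        import org.xml.sax.helpers.DefaultHandler;

        // One row is buffered at a time; matching rows are written out
        // immediately, so memory use is bounded by the largest single <row>,
        // not by the whole file.
        public class WhereFilter extends DefaultHandler {
            private final PrintWriter out;
            private final Map<String, String> row = new HashMap<>();
            private final StringBuilder text = new StringBuilder();
            private String columnName;

            WhereFilter(PrintWriter out) { this.out = out; }

            @Override
            public void startElement(String uri, String local, String qName, Attributes atts) {
                if ("row".equals(qName)) row.clear();
                else if ("column".equals(qName)) columnName = atts.getValue("name");
                text.setLength(0);
            }

            @Override
            public void characters(char[] ch, int start, int len) {
                text.append(ch, start, len);
            }

            @Override
            public void endElement(String uri, String local, String qName) {
                if ("column".equals(qName)) {
                    row.put(columnName, text.toString());
                } else if ("row".equals(qName) && matches(row)) {
                    // Write only the selected fields of a matching row.
                    out.printf("<row><field1>%s</field1><field2>%s</field2></row>%n",
                               row.get("field1"), row.get("field2"));
                }
            }

            // WHERE field3 = 'A' AND (field4 > 'B' OR field1 = 'C'),
            // evaluated as plain string comparisons.
            private boolean matches(Map<String, String> r) {
                return "A".equals(r.get("field3"))
                    && ("C".equals(r.get("field1"))
                        || (r.get("field4") != null && r.get("field4").compareTo("B") > 0));
            }

            public static void main(String[] args) throws Exception {
                try (PrintWriter out = new PrintWriter(new FileWriter("result.xml"))) {
                    out.println("<doc>");
                    SAXParserFactory.newInstance().newSAXParser()
                        .parse(new InputSource(new FileReader("file.xml")), new WhereFilter(out));
                    out.println("</doc>");
                }
            }
        }

    The order by clause is the one part a single streaming pass cannot do: since the matching rows are usually far smaller than the input, one option is to collect and sort just the results in memory, falling back to an external merge sort (or a database) if even the result set is too large.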

    Hi all,
    Thank you very much for your replies.
    First, Saxon didn't work, because it uses an in-memory parser, and that is what I was trying to avoid.
    A different database is also out of the question, because the customer insists on XML, and also some files can never be converted to a database table, because after some transformations they end up not completely matching the standard CSV format.
    I think that maybe http://exist.sourceforge.net is the right solution for me, but I will probably try it in the next version of my project.
    For now I have managed to build the project with only SAXParser and a lot of back-end programming, and it works OK, although it was very hard to build and will be harder to maintain, so I will look at the eXist project.
    Thanks everyone for the help.

  • Store large XML files

    Does anybody have experience with storing very large XML files (400 MB) as XMLType in Oracle?
    Is this feasible? Is it efficient (response time)? Can you recommend doing this?
    Or is it better (with better performance) to parse the files and store the data in a custom DB schema?
    Thanks,
    -Bernhard

    Here is an Oracle ACE with experience storing large XML in the DB:
    [HOWTO: Load Really Big XML Files|http://www.liberidu.com/blog/?p=473]
    There are more examples on Marco's blog as well of different ways to load XML into the DB.
    Yes, it is feasible. Efficiency depends on the complexity of the XML (not so much its size) and on how you store it in the DB (XMLType, XMLType associated with a schema, object-relational, hybrid, etc.).
    Mark Drake from the XML DB forum has seen good performance just using INSERT INTO ... VALUES ... (BFILENAME()) to load the XML as well. Some other suggestions are listed in the FAQ on that forum.
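    For illustration, the BFILENAME-style load mentioned above could be driven from Java like this. The directory object, the xml_docs table, and the connection details are assumptions; see Marco's blog for tested variants:

        import java.sql.Connection;
        import java.sql.DriverManager;
        import java.sql.Statement;

        public class LoadXmlDocs {
            public static void main(String[] args) throws Exception {
                // Assumed one-time setup (run by a suitably privileged user):
                //   CREATE DIRECTORY xml_dir AS '/data/xml';
                //   CREATE TABLE xml_docs OF XMLType;
                try (Connection conn = DriverManager.getConnection(
                        "jdbc:oracle:thin:@//dbhost:1521/ORCL", "scott", "tiger");
                     Statement stmt = conn.createStatement()) {
                    // The server reads the file itself via the BFILE locator,
                    // so the document never passes through the Java client.
                    stmt.executeUpdate(
                        "INSERT INTO xml_docs VALUES (" +
                        "XMLType(BFILENAME('XML_DIR', 'bigfile.xml'), " +
                        "nls_charset_id('AL32UTF8')))");
                }
            }
        }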
    Any future questions you have on this topic would probably be best answered by posting in that forum.

  • Large xml file

    I have a very large XML file (test.xml) which I need to import into our Oracle database.
    We have Oracle9i Release 9.2.0.6.0.
    Test.xml is contained in a directory on our server, i.e. "xmldirectory".
    I have looked on the web, forums (including this one), left, right and centre...also looked at this link http://download.oracle.com/docs/cd/B19306_01/appdev.102/b14259/xdb03usg.htm#sthref246 ...but was unable to get anything to work.
    Can you please provide examples?
    Thanks

    Well, you could do this:
    http://www.oracle-base.com/articles/9i/ParseXMLDocuments9i.php
    or this
    http://www.devx.com/xml/Article/32046
    or this
    http://www.oracle.com/technology/pub/articles/quinlan-xml.html
    or this
    XMLType view of Relational Content
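
    As a starting point alongside those articles, here is a minimal JDBC sketch that streams the file into an XMLType column. The xml_tab table, the paths, and the credentials are assumptions; for multi-megabyte files the BFILE/CLOB techniques in the links above are more robust:

        import java.io.File;
        import java.io.FileReader;
        import java.sql.Connection;
        import java.sql.DriverManager;
        import java.sql.PreparedStatement;

        public class ImportXml {
            public static void main(String[] args) throws Exception {
                File file = new File("/path/on/client/test.xml");
                try (Connection conn = DriverManager.getConnection(
                        "jdbc:oracle:thin:@//dbhost:1521/ORCL", "scott", "tiger");
                     PreparedStatement ps = conn.prepareStatement(
                         // Assumed table: CREATE TABLE xml_tab (doc XMLType)
                         "INSERT INTO xml_tab VALUES (XMLType(?))");
                     FileReader reader = new FileReader(file)) {
                    // Stream the document; the length argument counts characters,
                    // so this assumes a single-byte encoding.
                    ps.setCharacterStream(1, reader, (int) file.length());
                    ps.executeUpdate();
                }
            }
        }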

  • Is JAXB suitable for large XML files ?

    Hi,
    I have a very large XML file (~700 MB, schema available). I need to unmarshal this into Java objects and carry out some business validation rules on it. These business rules may involve validating data from content objects that correspond to different sections of this large XML file.
    I am uncertain whether JAXB will help me here (I've just started on it). Does JAXB build the entire content tree for the XML document during Unmarshaller.unmarshal? Is there any way of asking it to build content objects on demand, as opposed to building the whole content tree immediately?
    All help/suggestions appreciated.

    Forgot to add:
    After carrying out validation, the data is put into some RDBMS tables.
    One approach would be to convert the XML files into SQL*Loader-compatible flat files (using a tool), load these flat files into staging tables, perform the business validations on the staging-table data, and then finally move the data into the main tables. All the validating logic could be either in stored procedures or in Java code.
    The above is very long-winded. It would be great if JAXB could handle very large XML files (without loading the whole XML file into memory) so that business validations could be done in Java, without any intermediate format conversion.
    I hope the above is somewhat clear.
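
    JAXB alone does build the entire content tree during Unmarshaller.unmarshal. A commonly used workaround is to drive JAXB with a StAX cursor and unmarshal one repeating element at a time, so only a single fragment is ever in memory. A minimal sketch, assuming a repeating <record> element and a schema-derived Record class (both names are illustrative):

        import java.io.FileInputStream;
        import javax.xml.bind.JAXBContext;
        import javax.xml.bind.Unmarshaller;
        import javax.xml.bind.annotation.XmlRootElement;
        import javax.xml.stream.XMLInputFactory;
        import javax.xml.stream.XMLStreamConstants;
        import javax.xml.stream.XMLStreamReader;

        // Hypothetical JAXB class standing in for one generated from the schema by xjc.
        @XmlRootElement(name = "record")
        class Record {
            public String name;
        }

        public class FragmentReader {
            public static void main(String[] args) throws Exception {
                Unmarshaller um = JAXBContext.newInstance(Record.class).createUnmarshaller();
                XMLStreamReader xsr = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new FileInputStream("big.xml"));

                while (xsr.hasNext()) {
                    // Stop on the start tag of each repeating element and
                    // unmarshal just that fragment.
                    if (xsr.next() == XMLStreamConstants.START_ELEMENT
                            && "record".equals(xsr.getLocalName())) {
                        Record r = um.unmarshal(xsr, Record.class).getValue();
                        validate(r);   // run business rules, then let the fragment be GC'd
                    }
                }
                xsr.close();
            }

            static void validate(Record r) { /* business validation goes here */ }
        }

    The caveat is that business rules spanning different sections of the file then need explicit bookkeeping, since earlier fragments are discarded as you go.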

  • Best parser for handling very large XML documents

    Which is the best parser for reading and extracting information from a very large XML document?

    Any SAX parser, since DOM would use about six times as much primary memory as the file size.
    The Xerces SAX parser is, in my experience, the fastest.
    Gil
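
    For reference, the JAXP skeleton for such a SAX scan looks like this (the element-counting handler is just a placeholder for real extraction logic):

        import java.io.File;
        import javax.xml.parsers.SAXParser;
        import javax.xml.parsers.SAXParserFactory;
        import org.xml.sax.Attributes;
        import org.xml.sax.helpers.DefaultHandler;

        // Minimal SAX scan: counts elements without ever building a tree,
        // so memory use stays roughly constant regardless of file size.
        public class ElementCounter extends DefaultHandler {
            private long count;

            @Override
            public void startElement(String uri, String local, String qName, Attributes atts) {
                count++;
            }

            public static void main(String[] args) throws Exception {
                ElementCounter handler = new ElementCounter();
                SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
                parser.parse(new File(args[0]), handler);
                System.out.println("Elements: " + handler.count);
            }
        }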

  • Have a very large text file, and need to read lines in the middle.

    I have very large txt files (around several hundred megabytes), and I want to be able to skip ahead and read specific lines. More specifically, say the file looks like:
    scan 1
    scan 2
    scan 3
    ...
    scan 100,000
    I want to be able to move the file reader immediately to scan 50,000, rather than having to read through scans 1-49,999.
    Thanks for any help.

    If the lines are all different lengths (as in your example) then there is nothing you can do except to read and ignore the lines you want to skip over.
    If you are going to be doing this repeatedly, you should consider reformatting those text files into something that supports random access.
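
    One compromise that avoids reformatting is to pay the sequential read once, index the byte offset of every line, and seek directly from then on. A minimal sketch, assuming the file name and a single-byte encoding (RandomAccessFile.readLine decodes bytes as ISO-8859-1 and is fairly slow, so it suits a one-time indexing pass):

        import java.io.RandomAccessFile;
        import java.util.ArrayList;
        import java.util.List;

        public class LineIndex {
            public static void main(String[] args) throws Exception {
                List<Long> offsets = new ArrayList<>();
                try (RandomAccessFile raf = new RandomAccessFile("scans.txt", "r")) {
                    // Pass 1: record where every line starts
                    // (one extra entry lands at EOF; harmless).
                    offsets.add(0L);
                    while (raf.readLine() != null) {
                        offsets.add(raf.getFilePointer());
                    }
                    // Pass 2: jump straight to line 50,000 (index 49,999)
                    // with a single seek.
                    raf.seek(offsets.get(49_999));
                    System.out.println(raf.readLine());
                }
            }
        }

    The offset list can be kept in memory or written to a small index file next to the data, so repeated lookups never re-read the large file.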

  • Very large XML String parameters

    Hi!
    I'm using AXIS 1.x and WebSphere 5. The problem is: when I call the web service with an XML (String) parameter of 10 KB-400 KB, it works fine.
    But my application can generate very large XML, like 900 KB-1000 KB or even more. When this large XML is sent as a String parameter, no reply is received.
    Can somebody throw some light on what is going wrong, and which approach should be followed?
    Thanks a lot
    @mit

    Maybe this example on the XDB forum will be helpful...
    XMLType view of Relational Content
    XML type questions are best asked in that forum.
    ;)

  • I need to sort very large Excel files and perform other operations.  How much faster would this be on a MacPro rather than my MacBook Pro i7, 2.6, 15R?

    I am a scientist and run my own business.  Money is tight.  I have some very large Excel files (~200MB) that I need to sort and perform logic operations on.  I currently use a MacBookPro (i7 core, 2.6GHz, 16GB 1600 MHz DDR3) and I am thinking about buying a multicore MacPro.  Some of the operations take half an hour to perform.  How much faster should I expect these operations to happen on a new MacPro?  Is there a significant speed advantage in the 6 core vs 4 core?  Practically speaking, what are the features I should look at and what is the speed bump I should expect if I go to 32GB or 64GB?  Related to this, I am using a 32-bit version of Excel.  Is there a 64-bit spreadsheet that I can use on a Mac that has no limit on column and row size?

    Grant Bennet-Alder,
    It’s funny you mentioned using Activity Monitor.  I use it all the time to watch when a computation cycle is finished so I can avoid a crash.  I keep it up in the corner of my screen while I respond to email or work on a grant.  Typically the %CPU will hang at ~100% (sometimes even saying the application is not responding in red) but will almost always complete the cycle if I let it go for 30 minutes or so.  As long as I leave Excel alone while it is working it will not crash.  I had not thought of using the Activity Monitor as you suggested. Also I did not realize using a 32 bit application limited me to 4GB of memory for each application.  That is clearly a problem for this kind of work.  Is there any work around for this?   It seems like a 64-bit spreadsheet would help.  I would love to use the new 64 bit Numbers but the current version limits the number of rows and columns.  I tried it out on my MacBook Pro but my files don’t fit.
    The hatter,
    This may be the solution for me. I’m OK with assembling the unit you described (I’ve even etched my own boards) but feel very bad about needing to step away from Apple products.  When I started computing this was the sort of thing computers were designed to do.  Is there any native 64-bit spreadsheet that allows unlimited rows/columns, which will run on an Apple?  Excel is only 64-bit on their machines.
    Many thanks to both of you for your quick and on point answers!

  • Today, I randomly happened to have less than 1GB of hard drive space left. I found very large "frame" files, what are they?

    I found very large "frame" files, what are they & can I delete them? (See screenshot). I'm a (17 today)-year-old film-maker and can't edit in FCP X anymore because I "don't have enough space". Every time I try to delete one, another identical file creates itself...
    If that can help: I just upgraded to FCP 10.0.4 and every time I launch it it asks to convert my current projects (I know it would do it at least once) and I accept, but everytime I have to get it done AGAIN. My computer is slower than ever and I have a deadline this friday
    I also just upgraded to Mac OS X 10.7.4, and the problem hasn't been here for long, so it may be linked...
    Please help me!
    Alex

    The first thing you should do is to back up your personal data. It is possible that your hard drive is failing. If you are using Time Machine, that part is already done.
    Then, I think it would be easiest to reformat the drive and restore. If you ARE using Time Machine, you can start up from your Leopard installation disc. At the first Installer screen, go up to the menu bar, and from the Utilities menu, first select to run Disk Utility. Completely erase the internal drive using the Erase tab; make sure you have the internal DRIVE (not the volume) selected in the sidebar, and make sure you are NOT erasing your Time Machine drive by mistake. After erasing, quit Disk Utility, and select the command to restore from backup from the same Utilities menu. Using that Time Machine volume restore utility, you can restore it to a time and date immediately before you went on vacation, when things were working.
    If you are not using Time Machine, you can erase and reinstall the OS (after you have backed up your personal data). After restarting from the new installation and installing all the updates using Software Update, you can restore your personal data from the backup you just made.

  • How can NI FBUS Monitor display very large recorded files

    NI FBUS Monitor version 3.0.1 outputs an "Out of memory" error if I try to load a large recorded file of 272 MB. Is there any combination of operating system (possibly Vista 32-bit or 64-bit) and/or physical memory size where NI FBUS Monitor can display such large recordings? Are there any patches, workarounds or tools to display very large recorded files?

    Hi,
    NI-FBUS Monitor does not set a limit on the maximum record file size. The physical memory size of the system is one of the most important factors that affect the loading of a large record file. Monitor will try to load the entire file into memory during the file open operation.
    272 MB is a really large file. To open it, your system must have sufficient physical memory available; otherwise an "Out of memory" error will occur.
    I would recommend that you do not use Monitor to open a file larger than 100 MB. Loading too large a file will consume system memory quickly and decrease performance.
    Message Edited by Vince Shen on 11-30-2009 09:38 PM
    Feilian (Vince) Shen

  • HELP!! Very Large Spooling / File Size after Data Merge

    My question is: if the image is the same and only the text is different, why not use the same image over and over again?
    Here is what happens...
    Using CS3 and XP (P4 2.4 GHz, 1 GB RAM, 256 MB video card), I have taken a postcard PDF (the backside), placed it in a document, and then drawn a text box. Then I select a data source and put the fields I wish to print (name, address, zip, etc.) in the text box.
    Now, under the Create Merged Document menu I select Multiple Records and then use the Multiple Records Layout tab to adjust the placement of this postcard on the page. I use the preview multiple records option to lay out 4 postcards on my page. Then I merge the document (it has 426 records).
    Now that my merged document is created, with four postcards per page and the mailing data on each card, I go to print. When I print the file, it spools up huge! The PDF I originally placed in the document is 2.48 MB, but when it spools I can only print 25 pages at a time, and that still takes FOREVER. So again my question is: if the image is the same and only the text is different, why not use the same image over and over again?
    How can I prevent the gigantic spooling? I have tried putting the PDF on the master page and then using the document page to create the merged document, and still the same result. I have also tried creating a merged document with just the addresses and then adding the PDF on the master page afterward, but again, a huge file size while spooling. Am I missing something? Any help is appreciated :)

    The size of the EMF spool file may become very large when you print a document that contains lots of raster data
    Article ID : 919543
    Last Review : June 7, 2006
    Revision : 2.0
    SYMPTOMS
    When you print a document that contains lots of raster data, the size of the Enhanced Metafile (EMF) spool file may become very large. Files such as Adobe .pdf files or Microsoft Word .doc documents may contain lots of raster data. Adobe .pdf files and Word .doc documents that contain gradients are even more likely to contain lots of raster data.
    CAUSE
    This problem occurs because Graphics Device Interface (GDI) does not compress raster data when the GDI processes EMF spool files and generates EMF spool files.
    This problem is very prominent with printers that support higher resolutions. The size of the raster data increases by four times if the dots-per-inch (dpi) in the file increases by two times. For example, a .pdf file of 1 megabyte (MB) may generate an EMF spool file of 500 MB. Therefore, you may notice that the printing process decreases in performance.
    RESOLUTION
    To resolve this problem, bypass EMF spooling. To do this, follow these steps:
    1. Open the properties dialog box for the printer.
    2. Click the Advanced tab.
    3. Click the Print directly to the printer option.
    Note: This will disable all print-processor-based features, such as the following: N-up, watermark, booklet printing, driver collation, and scale-to-fit.
    STATUS
    Microsoft has confirmed that this is a problem in the Microsoft products that are listed in the "Applies to" section.
    MORE INFORMATION
    Steps to reproduce the problem
    1. Open the properties dialog box for any inbox printer.
    2. Click the Advanced tab.
    3. Make sure that the Print directly to the printer option is not selected.
    4. Click to select the Keep printed documents check box.
    5. Print an Adobe .pdf document that contains many groups of raster data.
    6. Check the size of the EMF spool file.
