Searching through very large vectors

I am working on a way to process two flat tab-delimited files into a tree, assign an x and y coordinate to each node in the tree, and output all the nodes (with their coordinates) to a new flat file.
I currently have a program that works pretty well. It roughly uses the following flow.
- Read both files into memory: open each file, read it line by line, and load the appropriate data from each line into a Vector, making sure no duplicates are entered by comparing the current line to the last line.
- Using the first vector (which contains the starting nodes), search through the second vector (which contains parent-child relationships between two nodes) to construct the tree. For this tree I use an XML DOM Document. In this logic I use a for loop to find all the children for the given node. I store the index of each found reference and, when all children are found, I loop through all the indexes and delete those records from the parent-child vector.
- After the tree is created I walk through the tree and assign each node an x and a y attribute.
- When this is done I create a NodeList and use a for loop to write each node (with x and y) to a StringBuffer, which is then written to a file. In this process, for each new node that is written I check (in the StringBuffer) whether the node (name) is already present. If not, I write the new node.
- For debugging purposes I write all the references from the second Vector to a file and output the XML DOM tree to an XML file.
This program works well. It handles files with 10,000 start nodes and 20,000 parent-child references (30,000 nodes in total) in under 2 minutes (of which 1:20 is spent generating the output file).
However, when the volume of these files increases it starts to struggle.
As the ultimate test I ran it with a file that contains 250,000 start nodes and 500,000 references. For it to run at all I need the -Xmx256m parameter to allocate extra memory, but after 2 hours I killed it because I didn't want to wait any longer.
What I would like to know is how I can approach this better. Right now I'm loading the data from the files into memory entirely; maybe this isn't the best approach.
Also, I'm looping through a Vector with 500,000 elements. How can this be done more efficiently, given that the reference vector isn't sorted in any way?

Hi,
That's no problem. Here's some sample code:
package tests;

import java.util.HashMap;
import java.util.Iterator;
import java.util.LinkedList;
import java.util.List;
import java.util.Map;

class Example {
    private List roots;    // names of the start nodes
    private Map elements;  // name -> Node, so lookups are O(1) instead of a Vector scan

    public Example() {
        roots = new LinkedList();
        elements = new HashMap();
    }

    public void initRoots(String[] rows) {
        for (int i = 0; i < rows.length; i++) {
            String[] parts = rows[i].split(" ");
            String name = parts[0];
            roots.add(name);
            elements.put(name, new Node(name));
        }
    }

    public void addChilds(String[] rows) {
        for (int i = 0; i < rows.length; i++) {
            String[] parts = rows[i].split(" ");
            String parentId = parts[1];
            String name = parts[2];
            addNode(parentId, name);
        }
    }

    private void addNode(String parentId, String name) {
        Node current = (Node) elements.get(name);
        if (current == null) {
            current = new Node(name);
            elements.put(name, current);
        }
        Node parent = (Node) elements.get(parentId);
        if (parent == null) {
            // Parent is missing, is that a problem? Create it now,
            // then fall through so the child still gets attached.
            parent = new Node(parentId);
            elements.put(parentId, parent);
        }
        parent.addChild(current);
    }

    public void printTree() {
        for (Iterator it = roots.iterator(); it.hasNext(); ) {
            String id = (String) it.next();
            printChildren(id, 1);
        }
    }

    private void printChildren(String id, int depth) {
        Node node = (Node) elements.get(id);
        System.out.println(node);  // Node.toString() renders the subtree recursively
    }

    private static final class Node {
        private String name;
        private List children;

        private Node(String name) {
            this.name = name;
            children = new LinkedList();
        }

        public void addChild(Node node) {
            children.add(node);
        }

        public String toString() {
            return name + " " + children;
        }
    }

    public static void main(String[] args) throws Exception {
        Example test = new Example();
        test.initRoots(new String[] {
            "SU_1 1 1 1 0 0 0 0",
            "SU_2 1 1 1 0 0 0 0",
            "SU_3 1 1 1 0 0 0 0"
        });
        test.addChilds(new String[] {
            "COM_1 SU_1 PR_1 0 0 0 0 0",
            "COM_1 PR_1 ST_1 0 0 0 0 0",
            "COM_2 SU_2 PR_2 0 0 0 0 0",
            "COM_2 PR_2 ST_2 0 0 0 0 0",
            "COM_3 SU_3 PR_3 0 0 0 0 0",
            "COM_3 PR_3 ST_3 0 0 0 0 0"
        });
        test.printTree();
    }
}
The execution prints:
SU_1 [PR_1 [ST_1 []]]
SU_2 [PR_2 [ST_2 []]]
SU_3 [PR_3 [ST_3 []]]
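Since your real inputs are files on disk rather than String arrays, you could also feed each reference line to the tree as you read it, instead of loading the whole file first. A minimal sketch of that idea (the file name is made up, the split assumes the space-separated sample layout above -- your real files are tab-delimited, so you'd split on "\t" -- and it assumes addNode is made accessible to the loader):

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

public class StreamingLoader {
    // Streams the parent-child file line by line, so memory use stays
    // proportional to the tree, not to the input file.
    public static void load(Example tree, String refFile) throws IOException {
        BufferedReader in = new BufferedReader(new FileReader(refFile));
        try {
            String line;
            String previous = null;
            while ((line = in.readLine()) != null) {
                if (line.equals(previous)) {
                    continue; // skip consecutive duplicates, as in your current flow
                }
                previous = line;
                String[] parts = line.split(" ");
                tree.addNode(parts[1], parts[2]); // parent id, child id
            }
        } finally {
            in.close();
        }
    }
}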
/Kaj

Similar Messages

  • Searching through a Large Database

    There's a specialty program that I use all the time and it's built for windows only. This program pretty much searches through a big database of articles that this organization has published and you can read them.
    I'm wondering, what should I do to redevelop the database? Is there a good program to use, and what language should I use?

    +I just got to track down who did it I guess.+
    If you have that as a resource I'd very much recommend doing that. No reason to re-reverse engineer the wheel.
    +Is there a good step by step how to for mysql?+
    There are many scattered around the web.
    Here is Apple's: http://developer.apple.com/internet/opensource/osdb.html
    Here is MySQL's: http://dev.mysql.com/doc/refman/5.0/en/mac-os-x-installation.html
    Also, some OS X installs seem to have shipped with OpenBase and some don't. I'm not sure what the process or license is on that, but it is friendlier and more GUI-oriented than MySQL. If you have it, it's worth looking into whether it'll work for you.
    Good Luck,
    =Tod

  • Very-large-scale searching in J2EE

    I'm looking to solve a very-large-scale searching problem. I am creating a site
    where users can search a table with five million records, filtering and sorting
    independently on ten different columns. For example, the table might be five million
    customers, and the user might choose "S*" for the last name, and sort ascending
    on street name.
    I have read up on a number of patterns to solve this problem, but anticipate some
    performance issues. I'll explain below:
    1) "Page-by-Page Iterator" or "Value List Handler"
    In this pattern, it appears that all records that match the search criteria are
    retrieved from the database and cached on the application server. The client (JSP)
    can then access small pieces of the cached results at a time. Issues with this
    include:
    - If the customer record is 1KB, then wide search criteria (i.e. last name =
    S*) will cause 1 GB transfer from the database server to app server, and then
    1GB being stored on the app server, cached, waiting for the user (each user!)
    to ask for the next 10 or 100 records. This is inefficient use of network and
    memory resources.
    - 99% of the data transferred from the database server will not be used ... most
    users flip through a couple of pages and then choose a record or start a new search.
    2) Requery the database each time and ask for a subset
    I haven't seen this formalized into a pattern yet, but the basic idea is this:
    If a client asks for records 1-100 first (i.e. page 1), only fetch that many
    records from the db. If the user asks for the next page, requery the database
    and use the JDBC API's ResultSet.absolute(int row) to start at record 101. Issue:
    The query is re-performed, causing the Oracle server to do another costly "execute"
    (bad on 5M records with sorting).
    To solve this, I've been trying to enhance the second strategy above by caching
    the ResultSet object in a stateful session bean. Unfortunately, this causes a
    "ResultSet already closed" SQLException, although I ensure that the Connection,
    PreparedStatement, and ResultSet are all stored in the EJB and not closed. I've
    seen this on newsgroups ... it appears that WebLogic is forcing the Connection
    closed. If this is how J2EE and pooled connections work, then that's fine ...
    there's nothing I can really do about it.
    Another idea is to use "explicit cursors" in Oracle. I haven't fully explored
    it yet, but it wouldn't be a great solution as it would be using Oracle-specific
    functionality (we are trying to be db-agnostic).
    More information:
    - BEA WebLogic Server 8.1
    - JDBC: Oracle's thin driver provided with WLS 8.1
    - Platform: Sun Solaris 5.8
    - Oracle 9i
    Any other ideas on how I can solve this issue?

    Michael McNeil wrote:
    I'm looking to solve a very-large-scale searching problem. ...
    Any other ideas on how I can solve this issue?

    Hi. Fancy SQL to the rescue! If the table has a unique key, you can simply send a
    query per page, with iterative SQL that selects the next N rows beyond what was
    selected last time. E.g.:
    Let variable X be the highest key value you've seen so far. Initially it would
    be the lowest possible value.
    select * from mytable M
    where ...  -- application-specific qualifications
    and M.key > X
    and 100 > (select count(*) from mytable MM
               where MM.key > X and MM.key < M.key and ...)
    In English, this says: select all the qualifying rows higher than what I last saw, but
    only those that have fewer than 100 qualifying rows between the last one I saw and them
    (i.e. the next 100).
    When processing this query, remember the highest key value you see, and use it for the
    next query.
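    The same "remember the highest key" idea also works without the counting subquery if you cap the fetch on the client side. A hedged JDBC sketch (table and column names are invented; the key column must be unique and indexed):

    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;
    import java.sql.SQLException;

    public class KeysetPager {
        // Fetches the next page of up to pageSize rows whose key is above lastKey,
        // and returns the highest key seen so it can be passed back in next time.
        public static long nextPage(Connection con, long lastKey, int pageSize)
                throws SQLException {
            PreparedStatement ps = con.prepareStatement(
                "select cust_key, last_name, street from customers " +
                "where cust_key > ? order by cust_key");
            ps.setLong(1, lastKey);
            ps.setMaxRows(pageSize);              // stop after one page
            ResultSet rs = ps.executeQuery();
            long highest = lastKey;
            while (rs.next()) {
                highest = rs.getLong("cust_key"); // remember for the next query
                // ... render the row ...
            }
            rs.close();
            ps.close();
            return highest;
        }
    }

    Each page is then an indexed range scan rather than a re-sort of all five million rows, and nothing needs to stay cached on the app server between requests.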
    Joe

  • Best technology to navigate through a very large XML file in a web page

    Hi!
    I have a very large XML file that needs to be displayed in my web page, maybe as a tree structure. Visitors should be able to go to nodes at any depth and access the children elements or text element of those nodes.
    I thought about using a DOM parser with Java but dropped that idea, as the DOM would be stored in memory and hence is space-consuming. SAX doesn't work for me either, as every time there is a click on any of the nodes my SAX parser parses the whole document for the node, which is time-consuming.
    Could anyone please tell me the best technology and best parser to be used for very large XML files?

    Thank you for your suggestion. I have a question,
    though. If I use a relational database and try to
    access it for EACH and EVERY click the user makes,
    wouldn't that take much time to populate the page with
    data?
    Isn't XML storage more efficient here? Please reply.

    You have the choice of reading a small number of records (10 children per element?) from a database, or parsing multiple megabytes. Reading 10 records from a database should take maybe 100 milliseconds (1/10 of a second). I have written a web application that reads several hundred records and returns them with acceptable response time, and I am no expert. To parse an XML file of many megabytes... you have already tried this, so you know how long it takes, right? If you haven't tried it then you should. It's possible to waste a lot of time considering alternatives -- the term is "analysis paralysis". Speculating on how fast something might be doesn't get you very far.
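    The database layout the reply has in mind is presumably the classic adjacency list: one row per node with a pointer to its parent, so expanding a node costs one indexed query. A minimal sketch (table and column names are invented):

    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;
    import java.sql.SQLException;

    public class TreeDao {
        // Assumed table: nodes(id integer primary key,
        //                      parent_id integer,   -- null for root elements
        //                      name varchar(255))
        // With an index on parent_id, each click is a single cheap lookup.
        public static void printChildren(Connection con, int clickedNodeId)
                throws SQLException {
            PreparedStatement ps = con.prepareStatement(
                "select id, name from nodes where parent_id = ?");
            ps.setInt(1, clickedNodeId);
            ResultSet rs = ps.executeQuery();
            while (rs.next()) {
                System.out.println(rs.getInt("id") + " " + rs.getString("name"));
            }
            rs.close();
            ps.close();
        }
    }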

  • Checking a Sub String in a String of a very large file ?

    Hi All,
    I have a 20 MB file and I am converting it to a string. Now I'd like to search whether the following substrings appear in the string:
    a) One
    b) Two
    c) Three
    I am just checking with String.indexOf() for each.
    Is there any more efficient way of searching than String.indexOf?
    Please give me some suggestions on how to perform searching in a very large string.
    Thanks,
    J.Kathir

    We have to read each line and check for the three strings, right?
    Is there any other way to check for the substrings in the whole string?
    Reading line by line and searching for the three strings, or reading the entire
    file as a string and searching for the substrings -- which is better?

    Depends on your definition of "better" - reading line by line saves memory, but is probably slower. Reading the entire file into memory then searching it is probably faster but takes more memory.
    As a third option, if you're using version 1.5, you can use the Scanner class to read the file and search for all 3 strings at the same time...something like this:
    Scanner scn = new Scanner(new File("myfile.txt"));
    String regExp = "One|Two|Three";
    String s;
    // findWithinHorizon needs a horizon argument; 0 means "search to end of input"
    while ((s = scn.findWithinHorizon(regExp, 0)) != null) {
        System.out.println("Found: " + s);
    }
    scn.close();

    I'm pretty sure this is more efficient as it looks for anything matching the regular expression as it goes through the String...but I haven't done any timings on each approach.

  • Is there a way to split a very large 1 page pdf into letter size multiple page pdf?

    I often have very large single page pdfs that need to be printed onto letter size paper.  Usually I don't have access to the printer where I'm working so I have to send the file to someone for printing. 
    I have AXI pro, they don't. 
    I want to make sure the job is printed as I specify, and most of the users are using Reader.  So I want to give someone the PDF ready to print, sized in legal.  This requires manipulation of the PDF that I don't seem to be able to figure out how to do.
    In older versions of Acrobat, I could print to a new pdf and designate the page size.  Acrobat would create the multipage pdf.  The newer versions don't allow this. 
    With OSX 10.8 & AXI you can't save, export, split a one page (68" x 16") document into multiple page letter size (16 pages) pdf.
    Perhaps this can be done by printing to eps and running through distiller again or something else, but I'm stumped at the moment.
    Any suggestions on how to attack this would be appreciated.
    Thanks.

    That's a tough one. Acrobat is not designed for tiling PDF files to create another PDF. That's really what you're asking. There is the option to PRINT to a PDF, and turn on the Poster feature. If were in Windows where there is a real Adobe PDF printer driver, you could probably use that feature. But for various reasons (too complicated to describe here), that was withdrawn on the Macintosh.
    If you have a copy of Adobe InDesign, and if you installed an Adobe PDF 9 PPD file (see description below), it could be done in a somewhat awkward way. InDesign allows you to place PDF files so you would need to make a page of the proper size and place your large PDF:
    Then after installing the Adobe PDF 9 PPD file, you could choose File > Print. Then choose to print a PostScript file to the Adobe PDF 9.0 PPD file. In the Setup panel, you'd choose a Letter size page. Then you'd choose the Tile option at the bottom and set the Overlap amount:
    Then you'd save the PostScript file and process through Distiller.
    My blog post below describes how to find and install the Adobe PDF 9.0 PPD file:
    http://indesignsecrets.com/creating-postscript-files-in-snow-leopard-for-older-print-workflows.php

  • Automatic Area of Search through finder?

    By default, every time I try to search through Spotlight in Finder, it automatically searches "This Mac". I was wondering if there was a way to have it search in whatever folder I am currently in, like it did in OS 10.3. Instead it comes up searching my hard drive and I have to manually click on whatever folder I am in. Also, does anyone know how to make it so that if you have something typed into Spotlight in Finder, it will stay in the search box instead of erasing whenever I move to another folder? It makes it very agitating to have to retype the file name for each folder I switch to. This is also another thing that changed from OS 10.3. If anyone has any way to help me on either of these problems, I would really appreciate it. Thank you.

    If you did this in AppleScript, you'd need to call Unix through do shell script anyway, so don't bother with AppleScript. Just use mdfind directly, e.g.:
    # finds files that have a metadata attribute called MyKeyword with value DesiredValue
    mdfind "MyKeyword == DesiredValue"
    # finds files that have a metadata attribute called MyKeyword whose values start with Desi
    mdfind "MyKeyword == 'Desi*'"
    # finds files that have some metadata attribute whose value contains redVal
    mdfind '*redVal*'
    To make a smart folder, just do a search in the Finder and then save it to the desktop (careful: it will default to saving it in the sidebar, which will put it in some funky folder down in your library). Once you've saved it you can open it in a text editor or plist editor to see the contained mdfind command. Then it's just a question of modifying the right keywords. I have an AppleScript somewhere that does it, but it's not all that useful - almost as easy to make the smart folder by hand in the Finder, and there's no way to extract the files from the smart folder once you've made it.

  • Unable to copy very large file to eSATA external HDD

    I am trying to copy a VMWare Fusion virtual machine, 57 GB, from my Macbook Pro's laptop hard drive to an external, eSATA hard drive, which is attached through an ExpressPort adapter. VMWare Fusion is not running and the external drive has lots of room. The disk utility finds no problems with either drive. I have excluded both the external disk and the folder on my laptop hard drive that contains my virtual machine from my Time Machine backups. At about the 42 GB mark, an error message appears:
    The Finder cannot complete the operation because some data in "Windows1-Snapshot6.vmem" could not be read or written. (Error code -36)
    After I press OK to remove the dialog, the copy does not continue, and I cannot cancel the copy. I have to force-quit the Finder to make the copy dialog go away before I can attempt the copy again. I've tried rebooting between attempts, still no luck. I have tried a total of 4 times now, exact same result at the exact same place, 42 GB / 57 GB.
    Any ideas?

    Still no breakthrough from Apple. They're telling me to terminate the VMWare processes before attempting the copy, but had they actually read my description of the problem first, they would have known that I already tried this. Hopefully they'll continue to investigate.
    From a correspondence with Tim, a support representative at Apple:
    Hi Tim,
    Thank you for getting back to me, I got your message. Although it is true that at the time I ran the Capture Data program there were some VMWare-related processes running (PID's 105, 106, 107 and 108), this was not the case when the issue occurred earlier. After initially experiencing the problem, this possibility had occurred to me so I took the time to terminate all VMWare processes using the activity monitor before again attempting to copy the files, including the processes mentioned by your engineering department. I documented this in my posting to apple's forum as follows: (quote is from my post of Feb 19, 2008, 1:28pm, to the thread "Unable to copy very large file to eSATA external HDD", relevant section in >bold print<)
    Thanks for the suggestions. I have since tried this operation with 3 different drives through two different interface types. Two of the drives are identical - 3.5" 7200 RPM 1TB Western Digital WD10EACS (WD Caviar SE16) in external hard drive enclosures, and the other is a smaller USB2 100GB Western Digital WD1200U0170-001 external drive. I tried the two 1TB drives through eSATA - ExpressPort and also over USB2. I have tried the 100GB drive only over USB2 since that is the only interface on the drive. In all cases the result is the same. All 3 drives are formatted Mac OS Extended (Journaled).
    I know the files work on my laptop's hard drive. They are a VMWare virtual machine that works just fine when I use it every day. >Before attempting the copy, I shut down VMWare and terminated all VMWare processes using the Activity Monitor for good measure.< I have tried the copy operation both through the finder and through the Unix command prompt using the drive's mount point of /Volumes/jfinney-ext-3.
    Any more ideas?
    Furthermore, to prove that there were no file locks present on the affected files, I moved them to a different location on my laptop's HDD and renamed them, which would not have been possible if there had been interference from vmware-related processes. So, that's not it.
    Your suggested workaround, to compress the files before copying them to the external drive, may serve as a temporary workaround but it is not a solution. This VM will grow over time to the point where even the compressed version is larger than the 42GB maximum, and compressing and uncompressing the files will take me a lot of time for files of this size. Could you please continue to pursue this issue and identify the underlying cause?
    Thank you,
    - Jeremy

  • I support a very large school district currently running Firefox 3.6. What will happen at end of life date? We're in the middle of online testing this week.

    I run the test center for a very large school district with over 120k students. We've got a currently deployed base of 54k client machines using Firefox 3.6. We haven't upgraded for multiple reasons, the most important of which is removing the possibility of the students using In Private Browsing, and dealing with plugin updates for the non digital natives (read: dumber than a bag of hammers) that make up the majority of the client base.
    We're testing ESR now, but just found out that end of life for 3.6 is tomorrow, 4/24. We are currently in the middle of statewide online testing. The question is, what will happen tomorrow when the browser goes end of life. The ESR wiki mentions that "an update to the current version of Desktop Firefox will be offered through the Application Update Service"
    So the main question is, are my students/teachers going to get a popup telling them they have to update the browser if we have the updates already turned off? If so, can I turn it off remotely using SCCM, because it will cause all kinds of havoc.
    Please advise asap, and thanks in advance.

    We had to do some serious gymnastics to remove at least most of the ability to use IPB. We removed it from the gui, but unfortunately, if they know the hotkey, they can still bring it up. Security has some serious headaches with this, as by law they have to be able to track where students go, and going with private browsing removes their ability to do forensic work they're required to be able to do. Not a very well thought out feature from Mozilla in my opinion, but it is what it is. Successive versions have made it even more difficult to remove even the gui portion.
    We do plan to release ESR due to the aforementioned security issues, but testing has been slow.
    But thanks for the reply. I think we can turn off the updates if it isn't already done.

  • Have a very large text file, and need to read lines in the middle.

    I have very large txt files (around several hundred megabytes), and I want to be able to skip and read specific lines. More specifically, say the file looks like:
    scan 1
    scan 2
    scan 3
    ...
    scan 100,000
    I want to be able to move the file reader immediately to scan 50,000, rather than having to read through scans 1-49,999.
    Thanks for any help.

    If the lines are all different lengths (as in your example) then there is nothing you can do except to read and ignore the lines you want to skip over.
    If you are going to be doing this repeatedly, you should consider reformatting those text files into something that supports random access.
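    If reformatting the files isn't an option, a middle ground is to scan each file once and record the byte offset where every line starts; after that, any line is one seek away. A minimal sketch (it assumes a single-byte encoding such as ASCII, since RandomAccessFile.readLine does not decode multi-byte characters):

    import java.io.IOException;
    import java.io.RandomAccessFile;
    import java.util.ArrayList;
    import java.util.List;

    public class LineIndex {
        private final List<Long> offsets = new ArrayList<Long>();
        private final RandomAccessFile file;

        // One sequential pass to record where each line begins.
        public LineIndex(String path) throws IOException {
            file = new RandomAccessFile(path, "r");
            offsets.add(0L);
            while (file.readLine() != null) {
                offsets.add(file.getFilePointer());
            }
        }

        // Jump straight to line n (0-based) without reading lines 0..n-1.
        public String line(int n) throws IOException {
            file.seek(offsets.get(n).longValue());
            return file.readLine();
        }
    }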

  • I inadvertently created a very large file. Not needing it I sent it to the Trash Can. However, the deletion process never completes. Any suggestions as to how to delete it?

    I inadvertently created a very large file on my hard drive. Not needing it, I sent it to the Trash Can. However, the deletion process never completes. Any suggestions as to how to delete it? I re-installed the OS but the file was still there, taking up needed space.

    Command-click all of the files you want to delete and don't actually move them to the Trash until you've gone through the entire folder.

  • Please help!! "Can't open the illustration. This artwork contains a very large image that can not...

    Hi all, I subscribe to Illustrator CS6 16.0.0 and use it on Windows 7.
    A few days ago I was working with one file. I saved it successfully, and now when I try to open it an error message occurs: "Can't open the illustration. This artwork contains a very large image that can not be read on this version of AI. Please try with 64-bit version." and ALMOST ALL of the objects (vector) are missing, as if they were deleted!
    It's kind of strange since I created the file with the same program and everything was working properly before.
    Please Please advice further steps for recovering my file.

    Thank you so much! The file is recovered (as well as my emotional state).
    The finding of the day - apparently I have two versions of AI on my PC!

  • Can iCloud be used to synchronize a very large Aperture library across machines effectively?

    Just purchased a new 27" iMac (3.5 GHz i7 with 8 GB and 3 TB fusion drive) for my home office to provide support.  Use a 15" MBPro (Retina) 90% of the time.  Have a number of Aperture libraries/files varying from 10 to 70 GB that are rapidly growing.  Have copied them to the iMac using a Thunderbolt cable starting the MBP in target mode. 
    While this works I can see problems keeping the files in sync.  Thought briefly of putting the files in DropBox but when I tried that with a small test file the load time was unacceptable so I can imagine it really wouldn't be practical when the files get north of 100 GB.  What about iCloud?  Doesn't appear a way to do this but wonder if that's an option.
    What are the rest of you doing when you need access to very large files across multiple machines?
    David Voran

    Hi David,
    dvoran wrote:
    Don't you have similar issues when the libraries exceed several thousand images? If not, what's your secret to image management?
    No, I don't.
    It's an open secret: database maintenance requires steady application of naming conventions, tagging, and backing-up.  With the digitization of records, losing records by mis-filing is no longer possible.  But proper, consistent labeling is all the more important, because every database functions as its own index -- and is only as useful as the index is uniform and holds content that is meaningful.
    I use one, single, personal Library.  It is my master index of every digital photo I've recorded.
    I import every shoot into its own Project.
    I name my Projects with a verbal identifier, a date, and a location.
    I apply a metadata pre-set to all the files I import.  This metadata includes my contact inf. and my copyright.
    I re-name all the files I import.  The file name includes the date, the Project's verbal identifier and location, and the original file name given by the camera that recorded the data.
    I assign a location to all the Images in each Project (easy, since "Project" = shoot; I just use the "Assign Location" button on the Project Inf. dialog).
    I _always_ apply a keyword specifying the genre of the picture.  The genres I use are "Still-life; Portrait; Family; Friends; People; Rural; Urban; Birds; Insects; Flowers; Flora (not Flowers); Fauna; Test Shots; and Misc."  I give myself ready access to these by assigning them to a Keyword Button Set, which shows in the Control Bar.
    That's the core part.  Should be "do-able".  (Search the forum for my naming conventions, if interested.)  Of course, there is much more, but the above should allow you to find most of your Images (you have assigned when, where, why, and what genre to every Image). The additional steps include using Color Labels, Project Descriptions, keywords, and a meaningful Folder structure.  NB: set up your Library to help YOU.  For example, I don't sell stock images, and so I have no need for anyone else's keyword list.  I created my own, and use the keywords that I think I will think of when I am searching for an Image.
    One thing I found very helpful was separating my "input and storage" structure from my "output" structure.  All digicam files get put in Projects by shoot, and stay there.  I use Folders and Albums to group my outputs.  This works for me because my outputs come from many inputs (my inputs and outputs have a many-to-many relationship).  What works for you will depend on what you do with the picture data you record with your cameras.  (Note that "Project" is a misleading term for the core storage group in Aperture.  In my system they are shoots, and all my Images are stored by shoot.  For each output project I have (small "p"), I create a Folder in Aperture, and put Albums, populated with the Images I need, in the Folder.  When these projects are done, I move the whole Folder into another Folder, called "Completed".)
    Sorry to be windy.  I don't have time right now for concision.
    HTH,
    --Kirby.

  • Efficient searching in a large XML file for specific elements

    Hi
    How can I search a large XML file for a specific element efficiently (fast and memory-savvy)? I have a large XML file (approximately 32 MB, with about 140,000 main elements) and I have to search through it for specific elements. What stable and production-ready open source tools are available for such tasks? I think PDOM is a solution, but I can't find any well-known and stable implementations on the web.
    Thanks in advance,
    Behrang Saeedzadeh.

    The problem with DOM parsers is that the whole document needs to be parsed!
    So with large documents this uses up a lot of memory.
    I suggest you look at something like a pull parser (Piccolo or MPX1), which is a fast parser that is program-driven and not event-driven like SAX. This has the advantage of not needing to remember your state between events.
    I have used Piccolo to extract events from large xml based log files.
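    For a standard-library route, the same pull style is available through StAX (javax.xml.stream, standard since Java 6): you ask the reader for the next event in a loop instead of receiving callbacks. A minimal sketch (the file name and the "record" element are made-up examples):

    import java.io.FileInputStream;
    import javax.xml.stream.XMLInputFactory;
    import javax.xml.stream.XMLStreamConstants;
    import javax.xml.stream.XMLStreamReader;

    public class PullSearch {
        public static void main(String[] args) throws Exception {
            XMLInputFactory factory = XMLInputFactory.newInstance();
            XMLStreamReader reader =
                factory.createXMLStreamReader(new FileInputStream("big.xml"));
            // Program-driven loop: we pull events, so no state needs to be
            // carried between callbacks as with SAX.
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT
                        && "record".equals(reader.getLocalName())) {
                    System.out.println("id = " + reader.getAttributeValue(null, "id"));
                }
            }
            reader.close();
        }
    }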
    Carl.

  • Safari crashes when opening a very large pdf

    I have a 1st generation iPad running 4.3.5.
    Every time I try to download a very large PDF in Safari (175+ megs), it gets about three quarters of the way through, then Safari crashes and I lose my progress. I will then restart Safari, and the process restarts and crashes again.
    I've set auto-lock to never, but that didn't help. Any ideas how to get Safari to download this file?
    PS: I've considered other methods of getting the PDF, but for this project I have to download it from a web site.
    Thanks for any help.

    Other apps can download PDFs - I don't know whether it can cope with a 175 meg download, but GoodReader can download files: http://www.goodreader.net/gr-man-tr-web.html
