Handling large ResultSets

I want to retrieve about 30 rows at a time from our DB2/AS400. The table contains over 4,000,000 rows. I would like to begin at the first row and drag 30 rows over the network, then get the next 30 if the user requests them. I know the answer is to use cursors, but I cannot use those statements within my code on the AS400. WebSphere Studio allows me to create JSPs using a <tsx:repeat> tag to iterate over the result set, but the instructions on using it are pretty vague.
Can anyone direct me to some informative sites with examples or recommend a way to go about this?

That would be fantastic and my ideal approach, but the manager of the department wants to keep the functionality of what they presently use, which he and his team wrote 20 years ago in RPG with lovely green screens. They scroll 10 rows at a time, jump to the top or the bottom of the table, and type in the first two, three or four letters of the search parameter to get results which can also be scrolled. Some tables are even worse; one contains over 10 million rows.

We have a lot of those green screen applications in our AS/400 systems too. So I can tell you (and your manager should be able to confirm this) that the "subfiles" that they scroll cannot contain more than 9,999 records. But in real life, even in our AS/400 environment, nobody ever starts at the beginning of our customer file (which does have more than 9,999 records) and scrolls through it looking for something. They put something into the search fields first.
So displaying the first 10 records of the file before allowing somebody to enter the search criteria is pointless. And jumping to the end of the table is pointless too -- unless the table is ordered by date and you want to get recent transactions, in which case you should be sorting it by descending date anyway. My point is that those AS/400 programs were written that way because it was easy to write them that way, not necessarily because people would ever use those features. When you have hundreds of tables (as we do), it's easier just to copy and paste an old program to produce a maintenance program for a new table than it is to start from scratch and ask the users what they really need. That's why all the programs look alike there. It's not because the requirements are all the same, it's because it's easier for the programmers to write them.
Here's another example: Google. When you send it a query it comes back with something like "Results 1 - 10 of about 2,780,000". But you can't patiently page through all of those 2,780,000 results: Google only saves the first 1000 for you to look at, and won't show you more than that.
So I agree, a program that's simply designed to let somebody page through millions of records needs to be redesigned. If you want to write a generic program that lets people page through small files (less than 1000 records, let's say) there's nothing wrong with that, but your users will curse you if you make them use it for large files.
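For the narrower case the question describes (fetching 30 rows at a time once the user has supplied a search value), one option is to let the database do the windowing so that only one page ever crosses the network. A minimal JDBC sketch, assuming a DB2 for i release that supports ROW_NUMBER(); the table and column names (CUSTOMERS, LASTNAME, FIRSTNAME) are hypothetical:

import java.sql.*;

public class PageFetcher {
    private static final int PAGE_SIZE = 30;

    /** Fetches one page of customers whose LASTNAME starts with the given prefix. */
    public static void fetchPage(Connection conn, String prefix, int pageNumber) throws SQLException {
        // Number the filtered rows in the database and keep only the requested window,
        // so at most PAGE_SIZE rows ever travel over the network.
        String sql =
            "SELECT LASTNAME, FIRSTNAME FROM (" +
            "  SELECT LASTNAME, FIRSTNAME, ROW_NUMBER() OVER (ORDER BY LASTNAME) AS RN" +
            "  FROM CUSTOMERS WHERE LASTNAME LIKE ?" +
            ") AS T WHERE RN BETWEEN ? AND ?";
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            ps.setString(1, prefix + "%");
            ps.setInt(2, pageNumber * PAGE_SIZE + 1);
            ps.setInt(3, (pageNumber + 1) * PAGE_SIZE);
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    System.out.println(rs.getString("LASTNAME") + ", " + rs.getString("FIRSTNAME"));
                }
            }
        }
    }
}

Each "next 30" request simply re-runs the windowed query with a new page number, so no cursor or session state has to stay open between the user's clicks.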

Similar Messages

  • How do I handle large resultsets in CRXI without a performance issue?

    Hello -
    Problem Definition
    I have a performance problem displaying a large/huge resultset of data on a Crystal report.  The report takes about 4 minutes or more depending on the resultset size.
    How do you handle large resultsets in Crystal Reports without a performance issue?
    Environment
    Crystal Reports XI
    Apache WebSvr 2.X, Jboss 4.2.3, Struts
    Java Reporting Component (JRC),Crystal Report Viewer (CRV)
    Firefox
    DETAILS
    I use the CRXI thick client to build my report (.rpt) and then use it in my webapplication (webapp) under Jboss.
    User specifies the filter criteria to generate a report (date range etc.) and submits the request to the webapp. The webapp queries the database and gets a "resultset".
    I initialize the JRC and CRV according to all the specifications and finally call the "processHttpRequest" method of Crystal Report Viewer to display the report on browser.
    So.....
    - Request received to generate a report with a filter criteria
    - Query DB to get resultset
    - Initialize JRC and CRV
    - finally display the report by calling
        reportViewer.processHttpRequest(request, response, request.getSession().getServletContext(), null);
    The performance problem is within the last step.  I put logs everywhere and noticed that the database query doesn't take too long to return the resultset.  Everything processes pretty quickly till I call the processHttpRequest of CRV.  This method just hangs for a long time before displaying the report on the browser.
    CRV runs pretty fast when the resultset is smaller, but for a large resultset it takes a long, long time.
    I do have subreports and use Crystal Reports formulas on the reports.  Some of them are used for grouping also.  But I don't think subreports are the real culprit here, because I have some other reports that don't have any subreports, and they too get really slow displaying large resultsets.
    Solutions?
    So obviously I need a good solution to this generic problem of "How do you handle large resultsets in Crystal Reports?"
    I have thought of some half-baked ideas.
    A) Use external pagination and fetch data only for the current page being displayed.  But for this, CRXI must allow me to create my own buttons (previous, next, last), so I can control the click event and fetch data accordingly.  I tried capturing events by registering the event handler "addToolbarCommandEventListener" of CRV, but my listener gets invoked "after" the processHttpRequest method completes, which doesn't help.
    Somehow I need to be able to control the UI by adding my own previous page, next page, and last page buttons and controlling their click events.
    B) Automagically have CRXI use JavaScript functionality to allow browser-side page navigation.  So maybe the first time it'll take 5 minutes to display the report, but once it's displayed, the user can go to any page without sending the request back to the server.
    C) Try using Crystal Reports 2008.  I'm open to using this version, but I couldn't figure out if it has any features that can help me do external pagination or anything else that can handle large resultsets.
    D) Will using the Crystal Reports Servers like cache server/application server etc help in any way?  I read a little on the Crystal Page Viewer, Interactive Viewer, Part Viewer etc....but I'm not sure if any of these things are going to solve the issue.
    I'd appreciate it if someone can point me in the right direction.

    Essentially, the answer is to use smaller resultsets or pull from the database directly instead of using resultsets.

  • Ways to handle large volume data (file size = 60MB) in PI 7.0 file to file

    Hi,
    In a file to file scenario (flat file to XML file), the flat file is picked up by FCC and then sent to XI. In XI it performs message mapping and then an XSL transformation in sequence.
    The scenario works fine for small files (up to 5 MB), but when the input flat file size is more than 60 MB, XI shows lots of problems, like (1) JCo call errors or (2) sometimes XI even stops and we have to start it manually again to function properly.
    Please suggest some way to handle large volumes (file size up to 60 MB) in a PI 7.0 file to file scenario.
    Best Regards,
    Madan Agrawal.

    Hi Madan,
    If every record of your source file is processed in the target system, maybe you could split your source file into several messages by setting the Recordsets per Message parameter.
    However, you just want to convert your .txt file into an .xml file. So, first try setting the
    EO_MSG_SIZE_LIMIT parameter in SXMB_ADM.
    This could solve the problem in the Integration Engine, but the problem will persist in the Adapter Engine, I mean the JCo call error...
    Take into account that the file is first processed in the Adapter Engine (File Content Conversion and so on)
    and then sent to the pipeline in the Integration Engine.
    Carlos

  • Best practices for handling large messages in JCAPS 5.1.3?

    Hi all,
    We have run into problems while processing large messages in JCAPS 5.1.3. Or, they are not that large really, only 10-20 MB.
    Our setup looks like this:
    We retrieve flat file messages from an FTP server. They are put onto a JMS queue and are then converted to and from different XML formats in several steps using a couple of jcds with JMS queues between them.
    It seems that we can handle one message at a time, but as soon as we get two of these messages simultaneously the logicalhost freezes and crashes in one of the conversion steps without any error message reported in the logicalhost log. We can't relate the crashes to a specific jcd, and it seems that the memory consumption increases A LOT for the logicalhost process while handling the messages. After a restart of the server the messages that are in the queues are usually converted OK. Sometimes, however, we have seen that some messages seem to disappear. Scary stuff!
    I have heard of two possible solutions to handle large messages in JCAPS so far: splitting them into smaller chunks or streaming them. These solutions are however not an option in our setup.
    We have manipulated the JVM memory settings without any improvements and we have discussed the issue with Sun's support but they have not been able to help us yet.
    My questions:
    * Any ideas how to handle large messages most efficiently?
    * Any ideas why the crashes occur without any error messages in the logs?
    * Any ideas why messages sometimes disappear?
    * Any other suggestions?
    Thanks
    /Alex

    * Any ideas how to handle large messages most efficiently? --
    Strictly speaking, if you want to send the entire file content in a JMS message then I don't have an answer for this question.
    Generally we use the following process:
    After reading the file from the FTP location, we just archive it in a local directory and send a JMS message to the queue
    which contains the file name and file location. In most places we never send file content in a JMS message.
    * Any ideas why the crashes occur without any error messages in the logs?
    Whenever the JMS IQ Manager memory usage is high, logicalhosts stop processing. I will not say it is down; they
    stop processing, or processing might take a lot of time.
    * Any ideas why messages sometimes disappear?
    Unless persistence is enabled, I believe there are high chances of losing a message when a logicalhost
    goes down. This is not always the case, but we have faced a similar issue when the IQ Manager was flooded with a lot
    of messages.
    * Any other suggestions?
    If the file size is large, then it is better to stream the file to a local directory from the FTP location and send only the file
    location in the JMS message.
    Hope it helps.
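    A minimal sketch of that reference-passing approach with plain javax.jms (JMS 1.1 style); the JNDI names, queue, and archive path are hypothetical:

    import javax.jms.*;
    import javax.naming.InitialContext;

    public class FileReferenceSender {

        /** Sends only the file's name and archived location on the queue, never its content. */
        public static void sendFileReference(String fileName, String archivedPath) throws Exception {
            InitialContext ctx = new InitialContext();
            ConnectionFactory factory = (ConnectionFactory) ctx.lookup("jms/ConnectionFactory"); // hypothetical JNDI names
            Queue queue = (Queue) ctx.lookup("jms/FileReferenceQueue");

            Connection connection = factory.createConnection();
            try {
                Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
                MessageProducer producer = session.createProducer(queue);

                // The payload is just a reference; the consumer reads the file from archivedPath itself.
                MapMessage message = session.createMapMessage();
                message.setString("fileName", fileName);
                message.setString("fileLocation", archivedPath);
                producer.send(message);
            } finally {
                connection.close();
            }
        }
    }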

  • Handling large messages with MQ JMS sender adapter

    Hi.
    I'm having trouble handling large messages with an MQ JMS sender adapter.
    The messages are around 35-40MB.
    Are there any settings I can adjust to make the communication channel work?
    Error message is:
    A channel error occurred. The detailed error (if any) : JMS error:MQJMS2002: failed to get message from MQ queue, Linked error:MQJE001: Completion Code 2, Reason 2010, Error Code:MQJMS2002
    The communication channel works fine with small messages!
    I'm on SAP PI 7.11; the MQ driver is version 6.
    Best Regards...
    Peter

    The problem solved itself when the MQ server crashed and restarted.
    I did find a note that might have been useful:
    Note 1258335 - Tuning the JMS service for large messages or many consumers
    A relevant post as well: http://forums.sdn.sap.com/thread.jspa?threadID=1550399

  • Returning a Large ResultSet

    At the moment we use our own queryTableModel to fetch data from the database. Although we use the traditional method (looping on ResultSet.next()) of loading the data into a vector, we find that large ResultSets (1000+ rows) take a considerable amount of time to load into the vector.
    Is there a more efficient way of storing the ResultSet other than using a vector? We believe the addElement method constantly expanding the vector is the cause of the slowdown.
    Any tips appreciated.

    One more thing:
    We believe the addElement method constantly expanding the vector is the cause of the slowdown.
    You are probably right, but this is easy to avoid: both Vector and ArrayList have a constructor in which you can specify the initial capacity, so you could save much of the time spent growing the list. In the Vector class, as in other collection classes such as HashSet, there is also a constructor in which you can specify the load factor in addition to the initial capacity.
    Abraham.
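    A minimal sketch of that pre-sizing idea in plain JDBC; the query, the column count, and the use of ArrayList are assumptions for illustration:

    import java.sql.*;
    import java.util.ArrayList;
    import java.util.List;

    public class ResultLoader {

        /** Loads rows into a list whose backing array is allocated once up front. */
        public static List<String[]> load(Connection conn, int expectedRows) throws SQLException {
            // Pre-sizing avoids the repeated array copies caused by growing an empty Vector/ArrayList.
            List<String[]> rows = new ArrayList<>(expectedRows);
            try (Statement stmt = conn.createStatement();
                 ResultSet rs = stmt.executeQuery("SELECT NAME, CITY FROM CUSTOMERS")) { // hypothetical query
                while (rs.next()) {
                    rows.add(new String[] { rs.getString(1), rs.getString(2) });
                }
            }
            return rows;
        }
    }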

  • PL/SQL Dev query session lost on large resultsets after db update

    We have a problem with our PL/SQL Developer tool (www.allroundautomations.nl) since updating our Database.
    So far we had Oracle DB 10.1.0.5 Patch 2 on W2k3 and XP clients using Instant Client 10.1.0.5 and PL/SQL Developer 5.16 or 6.05 to query our DB. This scenario worked well.
    Now we upgraded to Oracle 10g 10.1.0.5 Patch 25, and our PL/SQL Developer 5.16 or 6.05 (on IC 10.1.0.5) can log on to the DB and also query small tables. But as soon as the resultset reaches a certain size, the query on a table won't come to an end and always shows "Executing...". We can only press "BREAK", which results in "ORA-12152: TNS: unable to send break message" and "ORA-03114: not connected to ORACLE".
    If I narrow the resultset down on the same table it works like before.
    If I watch the sessions on small resultset queries, I see the corresponding session, but on large resultset queries the session seems to close immediately.
    To solve this issue I already tried installing the newest PL/SQL Developer 7.1.5 (trial) and/or a newer Instant Client version (10.2.0.4), neither of which solved the problem.
    Is there a new option in 10.1.0.5 Patch 25 (or before) which closes sessions if the resultset gets too large over a slower internet connection?
    BTW, using sqlplus in the Instant Client directory or even Excel over ODBC on the same client returns the full resultset without problems. Could this be some kind of timeout problem?
    Edit:
    Here is a snippet of the trace file on the client right after executing the select statement. Some data seems to be retrieved and then it ends with these lines:
    (1992) [20-AUG-2008 17:13:00:953] nsprecv: 2D 20 49 6E 74 72 61 6E |-.Intran|
    (1992) [20-AUG-2008 17:13:00:953] nsprecv: 65 74 2D 47 72 75 6E 64 |et-Grund|
    (1992) [20-AUG-2008 17:13:00:953] nsprecv: 6C 61 67 65 6E 02 C1 04 |lagen...|
    (1992) [20-AUG-2008 17:13:00:953] nsprecv: 02 C1 03 02 C1 0B 02 C1 |........|
    (1992) [20-AUG-2008 17:13:00:953] nsprecv: 51 00 02 C1 03 02 C1 2D |Q......-|
    (1992) [20-AUG-2008 17:13:00:953] nsprecv: 05 48 4B 4F 50 50 01 80 |.HKOPP..|
    (1992) [20-AUG-2008 17:13:00:953] nsprecv: 03 3E 64 66 01 80 07 78 |.>df...x|
    (1992) [20-AUG-2008 17:13:00:953] nsprecv: 65 0B 0F 01 01 01 07 76 |e......v|
    (1992) [20-AUG-2008 17:13:00:953] nsprecv: C7 01 01 09 01 01 07 76 |.......v|
    (1992) [20-AUG-2008 17:13:00:953] nsprecv: C7 01 01 18 01 01 07 78 |.......x|
    (1992) [20-AUG-2008 17:13:00:953] nsprecv: 65 0B 0F 01 01 01 07 76 |e......v|
    (1992) [20-AUG-2008 17:13:00:953] nsprecv: C7 01 01 09 01 01 07 76 |.......v|
    (1992) [20-AUG-2008 17:13:00:953] nsprecv: C7 01 01 18 01 01 02 C1 |........|
    (1992) [20-AUG-2008 17:13:00:953] nsprecv: 3B 02 C1 02 01 80 00 00 |;.......|
    (1992) [20-AUG-2008 17:13:00:953] nsprecv: 00 00 00 00 00 00 00 00 |........|
    (1992) [20-AUG-2008 17:13:00:953] nsprecv: 00 00 01 80 15 0C 00 |....... |
    (1992) [20-AUG-2008 17:13:00:953] nsprecv: normal exit
    (1992) [20-AUG-2008 17:13:00:953] nsrdr: got NSPTDA packet
    (1992) [20-AUG-2008 17:13:00:953] nsrdr: NSPTDA flags: 0x0
    (1992) [20-AUG-2008 17:13:00:953] nsrdr: normal exit
    (1992) [20-AUG-2008 17:13:00:953] snsbitts_ts: entry
    (1992) [20-AUG-2008 17:13:00:953] snsbitts_ts: acquired the bit
    (1992) [20-AUG-2008 17:13:00:953] snsbitts_ts: normal exit
    (1992) [20-AUG-2008 17:13:00:953] snsbitcl_ts: entry
    (1992) [20-AUG-2008 17:13:00:953] snsbitcl_ts: normal exit
    (1992) [20-AUG-2008 17:13:00:953] nsdo: what=1, bl=2001
    (1992) [20-AUG-2008 17:13:00:953] snsbitts_ts: entry
    (1992) [20-AUG-2008 17:13:00:953] snsbitts_ts: acquired the bit
    (1992) [20-AUG-2008 17:13:00:953] snsbitts_ts: normal exit
    (1992) [20-AUG-2008 17:13:00:953] nsdo: nsctxrnk=0
    (1992) [20-AUG-2008 17:13:00:953] snsbitcl_ts: entry
    (1992) [20-AUG-2008 17:13:00:953] snsbitcl_ts: normal exit
    (1992) [20-AUG-2008 17:13:00:953] nsdo: normal exit
    (1992) [20-AUG-2008 17:13:00:953] nioqrc: exit
    (1992) [20-AUG-2008 17:13:00:953] nioqrc: entry
    (1992) [20-AUG-2008 17:13:00:953] nsdo: entry
    (1992) [20-AUG-2008 17:13:00:953] nsdo: cid=0, opcode=85, bl=0, what=0, uflgs=0x0, cflgs=0x3
    (1992) [20-AUG-2008 17:13:00:953] snsbitts_ts: entry
    (1992) [20-AUG-2008 17:13:00:953] snsbitts_ts: acquired the bit
    (1992) [20-AUG-2008 17:13:00:953] snsbitts_ts: normal exit
    (1992) [20-AUG-2008 17:13:00:953] nsdo: rank=64, nsctxrnk=0
    (1992) [20-AUG-2008 17:13:00:953] snsbitcl_ts: entry
    (1992) [20-AUG-2008 17:13:00:953] snsbitcl_ts: normal exit
    (1992) [20-AUG-2008 17:13:00:953] nsdo: nsctx: state=8, flg=0x100400d, mvd=0
    (1992) [20-AUG-2008 17:13:00:953] nsdo: gtn=127, gtc=127, ptn=10, ptc=2011
    (1992) [20-AUG-2008 17:13:00:953] snsbitts_ts: entry
    (1992) [20-AUG-2008 17:13:00:953] snsbitts_ts: acquired the bit
    (1992) [20-AUG-2008 17:13:00:953] snsbitts_ts: normal exit
    (1992) [20-AUG-2008 17:13:00:953] snsbitcl_ts: entry
    (1992) [20-AUG-2008 17:13:00:953] snsbitcl_ts: normal exit
    (1992) [20-AUG-2008 17:13:00:953] nsdo: switching to application buffer
    (1992) [20-AUG-2008 17:13:00:953] nsrdr: entry
    (1992) [20-AUG-2008 17:13:00:953] nsrdr: recving a packet
    (1992) [20-AUG-2008 17:13:00:953] nsprecv: entry
    (1992) [20-AUG-2008 17:13:00:953] nsprecv: reading from transport...
    (1992) [20-AUG-2008 17:13:00:968] nttrd: entry

    Found nothing in the \bdump alert.log or \bdump trace files. I only have the DEFAULT profile and everything is set to UNLIMITED there.
    But the \udump generates a trace file the moment I execute the query:
    Dump file <path>\udump\<sid>ora4148.trc
    Fri Aug 22 09:12:18 2008
    ORACLE V10.1.0.5.0 - Production vsnsta=0
    vsnsql=13 vsnxtr=3
    Oracle Database 10g Release 10.1.0.5.0 - Production
    With the OLAP and Data Mining options
    Windows Server 2003 Version V5.2 Service Pack 2
    CPU : 2 - type 586, 1 Physical Cores
    Process Affinity : 0x00000000
    Memory (Avail/Total): Ph:898M/3071M, Ph+PgF:2675M/4967M, VA:812M/2047M
    Instance name: <SID>
    Redo thread mounted by this instance: 1
    Oracle process number: 33
    Windows thread id: 4148, image: ORACLE.EXE (SHAD)
    *** 2008-08-22 09:12:18.731
    *** ACTION NAME:(SQL Window - select * from stude) 2008-08-22 09:12:18.731
    *** MODULE NAME:(PL/SQL Developer) 2008-08-22 09:12:18.731
    *** SERVICE NAME:(<service-name>) 2008-08-22 09:12:18.731
    *** SESSION ID:(145.23131) 2008-08-22 09:12:18.731
    opitsk: network error occurred while two-task session server trying to send break; error code = 12152
    This trace is only generated when a query with an expected large resultset fails. If I narrow down the resultset, no trace is written, and the query then works, of course.

  • How to handle large result set of a SQL query

    Hi,
    I have a question about how to handle a large result set from a SQL query.
    My query returns more than a million records. However, the Query Template has a "row count" parameter. If I don't specify it, it returns only 100 records in the query result by default. If I specify it, then the result is limited to that specific number.
    Is there any way to get around this row count issue? I don't want any restriction on the number of records returned by a query.
    Thanks a lot!

    No human can manage that much data...in a grid, a chart, or a direct-connected link to the brain. 
    What you want to implement (much like other customers with similar requirements) is a drill-in and filtering model that helps the user identify and zoom in on data of relevance, not forcing them to scroll through thousands or millions of records.
    You can also use a time-based paging model so that you only deal with a time "slice" per request (e.g. an hour, a day, etc.) and provide a scrolling window. This is commonly how large datasets are dealt with in applications.
    I would suggest describing your application in more detail, and we can offer design recommendations and ideas.
    - Rick
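    A minimal sketch of that time-sliced model in plain JDBC; the table, columns, and one-hour slice width are hypothetical, and the same idea applies to a query template that takes start/end time parameters:

    import java.sql.*;
    import java.time.Duration;
    import java.time.Instant;

    public class TimeSlicedReader {

        /** Walks a large history table one hour-wide slice at a time instead of in one huge query. */
        public static void readInSlices(Connection conn, Instant from, Instant to) throws SQLException {
            String sql = "SELECT EVENT_TIME, VALUE FROM MEASUREMENTS WHERE EVENT_TIME >= ? AND EVENT_TIME < ?"; // hypothetical table
            Duration slice = Duration.ofHours(1);
            try (PreparedStatement ps = conn.prepareStatement(sql)) {
                for (Instant start = from; start.isBefore(to); start = start.plus(slice)) {
                    Instant end = start.plus(slice).isBefore(to) ? start.plus(slice) : to;
                    ps.setTimestamp(1, Timestamp.from(start));
                    ps.setTimestamp(2, Timestamp.from(end));
                    try (ResultSet rs = ps.executeQuery()) {
                        while (rs.next()) {
                            // Handle one slice's worth of rows here (aggregate, render a page, etc.).
                        }
                    }
                }
            }
        }
    }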

  • Best way to handle large amount of text

    hello everyone
    My project involves handling a large amount of text (from conferences and reports).
    Most of it is in MS Word. I can turn it into RTF format.
    I don't want to use scrolling. I prefer turning pages (next, previous, last, contents), which means I need to break the text into chunks.
    Currently the process is awkward and slow.
    I know there would be lots of people working on similar projects.
    Could anyone tell me an easy way to handle the text: bring it into the cast and break it up?
    Any ideas would be appreciated.
    Thanks
    ahmed

    Hacking up a document with Lingo will probably lose the RTF formatting information.
    Here's a bit of code to find the physical position of a given line of on-screen text (counting returns is not accurate with word-wrapped lines). This strategy uses charPosToLoc to get the actual position for the text member's current width and font size:
    maxHeight = 780 -- arbitrary display height limit
    T = member("sourceText").text
    repeat with i = 1 to T.line.count
      endChar = T.line[1..i].char.count
      lineEndlocV = charPosToLoc(member "sourceText", endChar).locV
      if lineEndlocV > maxHeight then -- found the "1 too many" line
        -- extract the identified lines from "sourceText"
        -- perhaps repeat the parse with the remaining part of "sourceText"
        singlePage = T.line[1..i - 1]
        member("sourceText").text = T.line[i..99999] -- put the remaining text back into the source text member
        exit repeat
      end if
    end repeat
    If you want to use one of the roundabout ways to display PDF in Director, there might be some batch PDF production tools that can create your pages in a pretty scalable PDF format.
    I think FlashPaper documents can be adapted to Director.

  • Handling large files in scope of WSRP portlets

    Hi there,
    just wanted to ask if there are any best practices in respect to handling large file upload/download when using WSRP portlets (apart from by-passing WebCenter all-together for these use-cases, that is). We continue to get OutOfMemoryErrors and TimeoutExceptions as soon as the file being transferred becomes larger than a few hundred megabytes. The portlet is happily streaming the file as part of its javax.portlet.ResourceServingPortlet.serveResource(ResourceRequest, ResourceResponse) implementation, so the problem must somehow lie within WebCenter itself.
    Thanks in advance,
    Chris

    Hi Yash,
    Check these blogs for the structure you are mentioning:
    /people/shabarish.vijayakumar/blog/2006/02/27/content-conversion-the-key-field-problem
    /people/shabarish.vijayakumar/blog/2005/08/17/nab-the-tab-file-adapter
    Regards,
    ---Satish

  • Can express vi handle large data

    Hello,
    I'm facing a problem handling large data using Express VIs. The input to each Express VI is a large 2M-sample waveform, and I am using 4 such Express VIs, each with 2M samples, connected in parallel. To process these data the Express VIs take too much time compared to other general VIs or subVIs. Can anybody give the reason why the processing takes so long? As per my understanding, displaying large data in LabVIEW is not efficient, and the Express VIs have an internal display in the form of the configure dialog box, so I feel most of the processing time is spent plotting the data on the graph of the configure dialog box. If this is correct, is there any solution to overcome this?
    waiting for reply
    Thanks in advance

    Hi sayaf,
    I don't understand your reasoning for not using the "Open Front Panel" option to convert the Express VI to a standard VI. When converting the Express VI to a VI, you can save it with a new name and still use the Express VI in the same VI.
    By the way, have you heard about the NI LabVIEW Express VI Development Toolkit? That is the choice if you want to be able to create your own Express VIs.
    NB: Not all Express VIs can be edited with the toolkit - you should mainly use the toolkit to develop your own Express VIs.
    Have fun!
    - Philip Courtois, Thinkbot Solutions

  • How to handle large heap requirement

    Hi,
    Our application requires a large amount of heap memory to load data into memory for further processing.
    The application is load balanced and we want to share the heap across all servers so one server can use the heap of another server.
    Server1 and Server2 have 8 GB of RAM and Server3 has 16 GB of RAM.
    If any request comes to Server1 and it requires some more heap memory to load data, can Server1 use Server3's heap memory in this scenario?
    Is there any mechanism/product which allows us to share heap across all the servers? Or is there any other way to handle the large heap requirement issue?
    Thanks,
    Atul

    user13640648 wrote:
    Is there any mechanism/product which allows us to share heap across all the servers? Or is there any other way to handle the large heap requirement issue?
    That isn't how you design it (based on your brief description).
    For any transaction A you need a set of data X.
    For another transaction B you need a set of data Y which might or might not overlap with X.
    The set of data (X or Y) is represented by discrete hunks of data (form is irrelevant) which must be loaded.
    One can preload the server with this data or do a load on demand.
    Once in memory it is cached.
    One can refine this further with alternative caching strategies that define when loaded data is unloaded and how it is unloaded.
    JEE servers normally support this in a variety of forms. But one can custom code it as well.
    JEE servers can also replicate cached data across server instances. Custom code can do this but it is more complicated than doing the custom caching.
    A load balanced system exists for performance and failover scenarios.
    Obviously in a failover situation a "shared heap" would fail completely (as asked about) because the other server would be gone.
    One might also need to support very large data sets. In that case something like Memcached (google for it) can be used. There are commercial solutions in this space as well. This allows for distributed caching solutions which can be scaled.
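    A minimal sketch of the load-on-demand cache with an eviction strategy described above, using a LinkedHashMap kept in access order; the capacity and key/value types are placeholders, and a JEE cache or a product like Memcached would replace this for the distributed case:

    import java.util.LinkedHashMap;
    import java.util.Map;
    import java.util.function.Function;

    /** Caches loaded data, evicting the least recently used entry when the cache is full. */
    public class DataCache<K, V> {
        private final Map<K, V> cache;

        public DataCache(final int maxEntries) {
            // accessOrder=true keeps the least recently used entry first, so it is the one evicted.
            this.cache = new LinkedHashMap<K, V>(16, 0.75f, true) {
                @Override
                protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                    return size() > maxEntries; // unload the oldest entry once over capacity
                }
            };
        }

        /** Returns the cached value, loading it on demand if it is absent. */
        public synchronized V get(K key, Function<K, V> loader) {
            V value = cache.get(key);      // get() refreshes the entry's position in access order
            if (value == null) {
                value = loader.apply(key); // load on demand
                cache.put(key, value);     // may trigger eviction of the eldest entry
            }
            return value;
        }
    }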

  • Processing large Resultsets quickly or parallely

    How do I process a large ResultSet that contains purchase entries for, say, 20K users? Each user may have one or more purchase entries. The resultset is ordered by userid, and the other fields are itemname, quantity, and price.
    Mine is a quad processor machine.
    Thanks.

    You're going to need to provide a lot more details. For instance, is the slow part reading the data from the database, or the processing that you are going to do on the data? If the former, then in order to do work in parallel, you probably need separate threads with their own resultsets. If the latter, then you could parallelize the work by having one thread read the resultset and push the data onto a shared work queue, from which multiple worker threads read. These are just a few of the possibilities.
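    A minimal sketch of the latter option (a single reader thread feeding a shared work queue that a pool of workers drains); the row representation and queue capacity are assumptions, and the column names are taken from the question above:

    import java.sql.ResultSet;
    import java.sql.SQLException;
    import java.util.concurrent.*;

    public class ParallelProcessor {
        // Sentinel that tells each worker there are no more rows.
        private static final String[] END = new String[0];

        public static void process(ResultSet rs, int workerCount) throws SQLException, InterruptedException {
            BlockingQueue<String[]> queue = new ArrayBlockingQueue<>(1000);
            ExecutorService workers = Executors.newFixedThreadPool(workerCount);

            for (int i = 0; i < workerCount; i++) {
                workers.submit(() -> {
                    try {
                        for (String[] row = queue.take(); row != END; row = queue.take()) {
                            // Do the per-row work here (e.g. accumulate totals per user).
                        }
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                });
            }

            // Single reader: a ResultSet is not thread-safe, so only this thread touches it.
            while (rs.next()) {
                queue.put(new String[] { rs.getString("userid"), rs.getString("itemname"),
                                         rs.getString("quantity"), rs.getString("price") });
            }
            for (int i = 0; i < workerCount; i++) {
                queue.put(END); // one sentinel per worker
            }
            workers.shutdown();
            workers.awaitTermination(1, TimeUnit.HOURS);
        }
    }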

  • Best way to return large resultsets

    Hi everyone,
    I have a servlet that searches a (large) database by complex queries and sends the results to an applet. Since the database is quite large and the queries can be quite general, it is entirely possible that a particular query can generate a million rows.
    My question is, how do I approach this problem from a design standpoint? For instance, should I send the query without limits and get all the results (possibly a million) back? Or should I get only a few rows, say 50,000, at a time by using the SQL limit construct or some other method? Or should I use some totally different approach?
    The reason I am asking this question is that I have never had to deal with such large results, and the expertise in this group will help me avoid some of the design pitfalls at the very outset. Of course, there is the question of whether the servlet should send so many results at once to the applet, but that's probably for another forum.
    Thanks in advance,
    Alan

    If you are using one of the premier databases (Oracle, SQL Server, Informix) I am fairly confident that it would be best to allow the database to manage both the efficiency of the query and the efficiency of the transport.
    QUERY EFFICIENCY
    Query efficiencies in all databases are optimized by the DBMS to a general algorithm. That means there are assumptions made by the DBMS as to the 'acceptable' number of rows to process, the number of tables to join, the number of rows that will be returned, etc. These general algorithms do an excellent job on 95+% of queries run against the database. However, if you fall outside the bounds of these general algorithms, you will run into escalating performance problems. Luckily, SQL syntax provides enormous flexibility in how to get your data from the database, and you can code the SQL to 'help' the database do a better job when SQL performance becomes a problem. On the extreme, it is possible that you will issue a query that overwhelms the database and the physical resources available to the database (memory, CPU, I/O channels, etc). Sometimes this can happen even when a ResultSet returns only a single row. In the case of a single row returned, it is the intermediate processing (table joins, sorts, etc) that overwhelms the resources. You can help manage the memory resource issue by purchasing more memory (obviously), or re-code the SQL to apply a more efficient algorithm (make the optimizer do a better job), or you may as a last resort have to break the SQL up into separate SQL statements, using a more granular approach (this is your "where id < 1000"). BTW: If you do have to use this approach, in most cases using BETWEEN is often more efficient.
    TRANSPORT
    Most if not all of the JDBC drivers return the ResultSet data in 'blocks' of rows that are delivered on an as-needed basis to your program. Some databases allow you to specify the size of these 'blocks' to aid in the optimization of your batch-style processes. Assuming that this is true for your JDBC driver, you cannot manage it better than the JDBC driver implementation, so you should not try. In all cases, you should allow the database to handle as much of the data manipulation and transport logic as possible. They have thousands of programmers working overtime to optimize that code. They just have you outnumbered, and while it's possible that you can code an efficiency, it's also possible that you will be unable to take advantage of future efficiencies within the database due to your proprietary efficiencies.
    You have some interesting and important decisions to make. I'm not sure how much control of the architecture is available, but you may want to consider alternatives to moving these large amounts of data around through the JDBC architecture. Is it possible to store this information on the server, and have it fetched using FTP or some other simple transport? Far less CPU usage, and more efficient use of your bandwidth.
    So in case it wasn't clear, no, I don't think you should break up the SQL initially. If it were me, I would probably spend the time putting out some metric-based information to allow you to better judge where you are having slowdowns when or if any should occur. With something like this, I have seen I.T. spend hours and hours tuning SQL just to find out that the network was the problem (or vice versa). I would also go ahead and run the expected queries outside of the application and determine what kind of problems there are before coding of the application is finished.
    Hey, this got a bit wordy, sorry. Hopefully there is something in here that can help you...Joel
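    A minimal sketch of the range-chunking fallback mentioned above, using BETWEEN on a numeric key in plain JDBC; the table, key column, and chunk size are hypothetical:

    import java.sql.*;

    public class RangeChunkedQuery {

        /** Processes rows in key ranges of chunkSize rather than as one enormous resultset. */
        public static void processInChunks(Connection conn, long minId, long maxId, long chunkSize)
                throws SQLException {
            String sql = "SELECT ID, PAYLOAD FROM ORDERS WHERE ID BETWEEN ? AND ?"; // hypothetical table
            try (PreparedStatement ps = conn.prepareStatement(sql)) {
                for (long lo = minId; lo <= maxId; lo += chunkSize) {
                    ps.setLong(1, lo);
                    ps.setLong(2, Math.min(lo + chunkSize - 1, maxId));
                    try (ResultSet rs = ps.executeQuery()) {
                        while (rs.next()) {
                            // Process or transmit this chunk before fetching the next range.
                        }
                    }
                }
            }
        }
    }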

  • DbKona and large resultsets

    Hi,
    I'm working on an application that will allow a user to run ad-hoc queries against a database. These queries can return huge resultsets (between 50,000 and 450,000 rows). Per our requirements, we can't limit (through the database anyway) the number of rows returned from a random query. In trying to deal with the large number of results, I've been looking at the dbKona classes. I can't find a solution with the regular JDBC stuff, and CachedRowSet is just not meant to handle that much data.
    While running a test (based on an example given in the dbKona manual), I keep running into out of memory issues. The code follows:
    QueryDataSet qds = new TableDataSet(resultSet);
    while (!qds.allRecordsRetrieved()) {
        DataSet currentData = qds.fetchRecords(1000);
        // Process this batch of records . . .
        currentData.clearRecords();
        currentData = null;
        qds.clearRecords();
    }
    I'm currently not doing any processing with the records returned for this trivial test. I just get them and clear them immediately to see if I can actually get them all. On a resultset of about 45,000 rows I get an out of memory error about halfway through the fetches. Are the records still being held in memory? Am I doing something incorrectly?
    Thanks for any help,
    K Lewis

    I think I found the problem. From an old test, the Statement object I made returned a scrollable ResultSet. I couldn't find a restriction on this immediately in the docs (or maybe it's just a problem with the Oracle driver?). As soon as I moved back to the default type of ResultSet (FORWARD_ONLY) I was able to process 150,000 records just fine. A sketch of that forward-only setup is shown below.
    "Sree Bodapati" <[email protected]> wrote:
    Hi
    Can you tell me what JDBC driver you are using?
    sree
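    A minimal sketch of that forward-only setup in plain JDBC (the ResultSet handed to dbKona in the post would come from a statement created this way); the fetch size and query are hypothetical:

    import java.sql.*;

    public class ForwardOnlyReader {

        /** Streams a large result without the client-side buffering a scrollable ResultSet needs. */
        public static void stream(Connection conn) throws SQLException {
            try (Statement stmt = conn.createStatement(ResultSet.TYPE_FORWARD_ONLY,
                                                       ResultSet.CONCUR_READ_ONLY)) {
                stmt.setFetchSize(1000); // hint to the driver to deliver rows in blocks of about 1000
                try (ResultSet rs = stmt.executeQuery("SELECT * FROM BIG_TABLE")) { // hypothetical table
                    while (rs.next()) {
                        // Process each row as it arrives; nothing needs to be retained between rows.
                    }
                }
            }
        }
    }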
