Best way to return large resultsets

Hi everyone,
I have a servlet that searches a (large) database by complex queries and sends the results to an applet. Since the database is quite large and the queries can be quite general, it is entirely possible that a particular query can generate a million rows.
My question is, how do I approach this problem from a design standpoint? For instance, should I send the query without limits and get all the results (possibly a million) back? Or should I get only a few rows, say 50,000, at a time by using the SQL limit construct or some other method? Or should I use some totally different approach?
The reason I am asking this question is that I have never had to deal with such large results, and the expertise on this group will help me avoid some design pitfalls at the very outset. Of course, there is the question of whether the servlet should send so many results at once to the applet, but that's probably for another forum.
Thanks in advance,
Alan

If you are using one of the premier databases (Oracle, SQL Server, Informix), I am fairly confident that it is best to allow the database to manage both the efficiency of the query and the efficiency of the transport.
QUERY EFFICIENCY
Query efficiency in all databases is optimized by the DBMS according to a general algorithm. That means the DBMS makes assumptions about the 'acceptable' number of rows to process, the number of tables to join, the number of rows that will be returned, etc. These general algorithms do an excellent job on 95+% of the queries run against a database. However, if you fall outside the bounds of these general algorithms, you will run into escalating performance problems. Luckily, SQL syntax provides enormous flexibility in how you get your data from the database, and you can code the SQL to 'help' the database do a better job when SQL performance becomes a problem. At the extreme, it is possible to issue a query that overwhelms the database and the physical resources available to it (memory, CPU, I/O channels, etc.). Sometimes this can happen even when a ResultSet returns only a single row; in that case, it is the intermediate processing (table joins, sorts, etc.) that overwhelms the resources. You can manage the memory issue by purchasing more memory (obviously), by re-coding the SQL to apply a more efficient algorithm (make the optimizer do a better job), or, as a last resort, by breaking the SQL up into separate statements using a more granular approach (this is your "where id < 1000"). BTW: if you do have to use this approach, in most cases BETWEEN is more efficient.
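(Purely to illustrate that granular fallback, here is a minimal JDBC sketch that walks a big table in key ranges with BETWEEN. The connection URL, table, and column names are invented for the example, and, as noted further down, you probably should not start with this approach.)

// Hedged sketch of the range-based fallback; ORDERS/ORDER_ID and the JDBC URL are placeholders.
import java.sql.*;

public class RangedFetch {
    private static final int RANGE = 50_000;

    public static void main(String[] args) throws SQLException {
        try (Connection con = DriverManager.getConnection("jdbc:yourdb://host/db", "user", "pw")) {
            long maxId;
            try (Statement st = con.createStatement();
                 ResultSet rs = st.executeQuery("SELECT MAX(ORDER_ID) FROM ORDERS")) {
                rs.next();
                maxId = rs.getLong(1);
            }
            String sql = "SELECT ORDER_ID, STATUS FROM ORDERS WHERE ORDER_ID BETWEEN ? AND ?";
            try (PreparedStatement ps = con.prepareStatement(sql)) {
                for (long low = 0; low <= maxId; low += RANGE) {
                    ps.setLong(1, low);
                    ps.setLong(2, low + RANGE - 1);
                    try (ResultSet rs = ps.executeQuery()) {
                        while (rs.next()) {
                            // hand each row to whatever streams it on to the applet
                        }
                    }
                }
            }
        }
    }
}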
TRANSPORT
Most if not all JDBC drivers return the ResultSet data in 'blocks' of rows that are delivered to your program on an as-needed basis. Some databases allow you to specify the size of these 'blocks' to aid in optimizing your batch-style processes. Assuming this is true for your JDBC driver, you cannot manage it better than the JDBC driver implementation does, so you should not try. In all cases, you should let the database handle as much of the data manipulation and transport logic as possible. The vendors have thousands of programmers working overtime to optimize that code; they simply have you outnumbered, and while you might hand-code an efficiency today, your proprietary optimizations may prevent you from taking advantage of future efficiencies within the database.
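(If your driver exposes that block size, it is usually reachable through the standard JDBC fetch-size hint. A minimal sketch, assuming a generic driver; the URL and BIG_TABLE are placeholders, and whether the hint is honoured at all is driver-specific.)

// Hedged sketch: ask the driver to stream rows in blocks rather than all at once.
import java.sql.*;

public class FetchSizeDemo {
    public static void main(String[] args) throws SQLException {
        try (Connection con = DriverManager.getConnection("jdbc:yourdb://host/db", "user", "pw");
             Statement stmt = con.createStatement()) {
            stmt.setFetchSize(1000);   // a hint only; some drivers ignore it
            try (ResultSet rs = stmt.executeQuery("SELECT * FROM BIG_TABLE")) {
                while (rs.next()) {
                    // rows arrive in driver-managed blocks; you still read them one at a time
                }
            }
        }
    }
}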
You have some interesting and important decisions to make. I'm not sure how much control of the architecture is available to you, but you may want to consider alternatives to moving these large amounts of data around through the JDBC architecture. Is it possible to store this information on the server and have it fetched using FTP or some other simple transport? Far less CPU usage, and more efficient use of your bandwidth.
So in case it wasn't clear: no, I don't think you should break up the SQL initially. If it were me, I would probably spend the time putting out some metric-based information so you can better judge where the slowdowns are, when or if any occur. With something like this, I have seen I.T. spend hours and hours tuning SQL just to find out that the network was the problem (or vice versa). I would also go ahead and run the expected queries outside of the application and determine what kinds of problems there are before the coding of the application is finished.
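(As a trivial sketch of that metric idea, bracketing the query and the fetch loop with timestamps separates query cost from transport/fetch cost; stmt, sql, and log() stand in for whatever the servlet already has.)

// Hedged sketch: rough timings to show where the time actually goes.
long t0 = System.currentTimeMillis();
ResultSet rs = stmt.executeQuery(sql);   // time to first row: mostly query/optimizer cost
long t1 = System.currentTimeMillis();
int rows = 0;
while (rs.next()) {
    rows++;                              // plus whatever per-row work the servlet does
}
long t2 = System.currentTimeMillis();
rs.close();
log("query=" + (t1 - t0) + "ms fetch=" + (t2 - t1) + "ms rows=" + rows);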
Hey, this got a bit wordy, sorry. Hopefully there is something in here that can help you...Joel

Similar Messages

  • What is the best way of returning group-by sql results in Toplink?

    I have a many-to-many relationship between Employee and Project; so,
    an Employee can have many Projects, and a Project can be owned by many Employees.
    I have three tables in the database:
    Employee(id int, name varchar(32)),
    Project(id int, name varchar(32)), and
    Employee_Project(employee_id int, project_id int), which is the join-table between Employee and Project.
    Now, I want to find out, for each employee, how many projects the employee has.
    The sql query that achieves what I want would look like this:
    select e.id, count(*) as numProjects
    from employee e, employee_project ep
    where e.id = ep.employee_id
    group by e.id
    Just for information, currently I am using a named ReadAllQuery and I write my own sql in
    the Workbench rather than using the ExpressionBuilder.
    Now, my two questions are :
    1. Since there is a "group by e.id" on the query, only e.id can appear in the select clause.
    This prevents me from returning the full Employee pojo using ReadAllQuery.
    I can change the query to a nested query like this
    select e.id, e.name, emp.cnt as numProjects
    from employee e,
    (select e_inner.id, count(*) as cnt
    from employee e_inner, employee_project ep_inner
    where e_inner.id = ep_inner.employee_id
    group by e_inner.id) emp
    where e.id = emp.id
    but I don't like the complication of the extra join introduced by the nested query. Is there a
    better way of doing something like this?
    2. The second question is what is the best way of returning the count(*) or the numProjects.
    What I do right now is that I have a ReadAllQuery that returns a List<Employee>; then for
    each returned Employee pojo, I call a method getNumProjects() to get the count(*) information.
    I added an extra column "numProjects" to the Employee table and to the Employee descriptor, and
    I set this attribute to be "ReadOnly" in the Workbench (the value for this dummy "numProjects"
    column in the database is always 0). So far this works OK. However, since numProjects is
    transient, I need to set the query to refreshIdentityMapResult(), as otherwise the Employee object
    in the cache could contain stale numProjects information. What I worry about is that refreshIdentityMapResult()
    will cause the query to always hit the database and defeat the purpose of having a cache. Also, if
    there are multiple concurrent queries to the database, I worry that there will be a race condition
    of updating this transient "numProjects" attribute. What is a better way of returning this kind
    of transient information, such as count(*)? Can I have the query return something like a tuple
    containing the Employee pojo and an int for the count(*), rather than just an Employee pojo with the
    transient int inside the pojo? Please advise.
    I greatly appreciate any help.
    Thanks,
    Frans

    No, I don't want to modify the set of attributes after TopLink returns them to me. But I don't
    quite understand why this matters.
    I understand that I can use ReportQuery to return all the Employee's attributes plus the int count(*),
    and then iterate through the list of ReportQueryResult to construct the Employee pojos myself
    (see the sketch at the end of this message). I was hesitant to do this because I think there will be
    a performance cost in not being able to use lazy fetching. For example, with a large result set where
    the client only needs a few of the rows, this approach requires iterating through all of them and
    wastefully creating all the Employee pojos. On the other hand, if we let TopLink directly return a
    list of Employee pojos, then we can tell TopLink to use a ScrollableCursor and fetch only the first several rows. Please advise.
    Thanks.
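    (For what it's worth, a rough sketch of the ReportQuery idea mentioned above, written from memory of the TopLink query framework; the anyOf("projects") mapping name, the method signatures, and the key used to read the count back are assumptions to verify against your TopLink version.)
    // Hedged sketch only: group by employee id and count the joined project rows.
    ExpressionBuilder emp = new ExpressionBuilder();
    ReportQuery query = new ReportQuery(Employee.class, emp);
    query.setSelectionCriteria(emp.anyOf("projects").notNull()); // forces the join to the project relation
    query.addAttribute("id");
    query.addCount();                                            // the count(*) per group
    query.addGrouping(emp.get("id"));
    List results = (List) session.executeQuery(query);           // a List of ReportQueryResult
    for (Object o : results) {
        ReportQueryResult row = (ReportQueryResult) o;
        Object id = row.get("id");
        Object numProjects = row.get("COUNT");                   // result key may differ by version
    }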

  • Accidentally I enlarge a window. Best way to return to its original size?

    Sometimes I accidentally enlarge a window. What is the best way to return it to its original size?

    Usually there is a spot in the window with three colored dots; the green dot can make the window larger, and clicking it again makes it smaller.
    Not sure if this helps...
    Good luck & happy computing!

  • Best way to return an array of values from c++?

    I have a group of structs made of a mix of primitive types in C++. I need to return these to Java, but I'm not sure of the best way to return them quickly.
    Should I create a class corresponding to the structure in Java then create the class in C++ and set its fields and return that? Or should I create arrays of different primitive types and return those? Can I even do that? Say, store a group of arrays of different types in a jobjectArray and return that? If I can, what would be the return type? Object[][]?

    Java side:
    package jni;
    import java.nio.ByteBuffer;
    import java.nio.ByteOrder;
    /** @author Ian Schneider */
    public class Struct {
        static {
            System.loadLibrary("struct");
        }
        public static void doit() {
            ByteBuffer bytes = ByteBuffer.allocateDirect(structSize());
            bytes.order(ByteOrder.nativeOrder());
            getData(bytes);
            System.out.println("int : " + bytes.getInt());
            System.out.println("double : " + bytes.getDouble());
        }
        private static native void getData(ByteBuffer bytes);
        private static native int structSize();
        public static void main(String[] args) throws Exception {
            doit();
        }
    }
    C side (I'm not a C programmer, so be nice):
    #include <stdlib.h>   /* malloc, free */
    #include <string.h>   /* memcpy */
    #include "jni_Struct.h"
    struct foo {
      int x;
      double y;
    } foo;
    JNIEXPORT void JNICALL Java_jni_Struct_getData
      (JNIEnv * env, jobject obj, jobject buf) {
      /* fill a native struct and copy it into the direct ByteBuffer's memory */
      struct foo * f = malloc(sizeof(foo));
      f->x = 123;
      f->y = 456.;
      void * data = (*env)->GetDirectBufferAddress(env, buf);
      memcpy(data, f, sizeof(foo));
      free(f);
    }
    JNIEXPORT jint JNICALL Java_jni_Struct_structSize
      (JNIEnv * env, jobject obj) {
      return sizeof(foo);
    }
    This is a bit simplistic as foo is static in size (no pointers), but hopefully you get the point.

  • What is the Best way to move large mailboxes between datacenters?

    What is the Best way to move large mailboxes between datacenters?

    Hi, 
     Are you asking with regards to on-premises Exchange? With Microsoft Online SaaS services (aka Exchange Online) there is no control and no need to control which data center a mailbox resides in.
     With regard to on-premises Exchange, you have two choices: you can move it over the WAN in which case you would either do a native mailbox move (assuming you have Exchange 2010 or later you can suspend the move after the copy so you can control the
    time of the cutover) or create a database copy in the second data center and once the database copies have synchronized change the active copy.
    The other choice is to move it out of band, which would usually involve an offline seed of the database (you could conceivably move it via a PST file, but that would disrupt access to the mailbox and is not really the 'best way').
    In general, Exchange on-premises questions are best asked on the Exchange forum: http://social.technet.microsoft.com/Forums/office/en-US/home?category=exchangeserver
    Thanks,
    Guy 

  • Best way to return more than 1 value in a function?

    Hi all,
    What's the best way to return more than one value from a function? Returning a cursor? A varray? Objects? I thought of a cursor first, but I was hesitant since I am not sure if the cursor will be automatically closed when you return it (an open cursor that is no longer used is bad). Example:
    BEGIN
    OPEN c_temp_cursor;
    RETURN c_temp_cursor;
    END;
    With the above example, c_temp_cursor remains open. Or is it automatically closed once it exits the function? I need some suggestions and expert advice.
    Thanks.
    Note: Function is to be used to return and not a procedure (This is a requirement. Can't explain the details on why).
    Edited by: dongzky on Jul 3, 2010 4:17 PM
    typo: "ir exists" to "it exits" (in bold)

    First create your pl/sql table type
    CREATE OR REPLACE TYPE pmc_tab AS TABLE OF NUMBER;
    Then a table:-
    CREATE TABLE v_stats_daily(start_date date, field1 number, field2 number, field3 number);
    Some inserts into the table so we've got test data...
    insert into v_stats_daily values('08-OCT-2003',10,20,30);
    insert into v_stats_daily values('08-OCT-2003',40,50,60);
    insert into v_stats_daily values('08-OCT-2003',70,80,90);
    Then create your function:-
    CREATE OR REPLACE FUNCTION PMC_STATS
    (pStatDate Date) RETURN pmc_tab IS
    MyArray pmc_tab;
    vstat1 NUMBER;
    vstat2 NUMBER;
    vstat3 NUMBER;
    BEGIN
    MyArray := pmc_tab();
    select sum(Field1), sum(field2),sum(field3)
    into vstat1, vstat2,vstat3
    from v_stats_daily
    where Start_date = pStatDate;
    MyArray.extend;
    MyArray(1) := vstat1;
    MyArray.extend;
    MyArray(2) := vstat2;
    MyArray.extend;
    MyArray(3) := vstat3;
    RETURN MyArray;
    END;
    In SQL*Plus:-
    SQL>set serverout on
    Then a lump of PL/SQL to run your function:-
    DECLARE
    MyDate DATE;
    MyArray pmc_tab;
    i NUMBER;
    numOut NUMBER;
    BEGIN
    MyArray := pmc_stats('08-OCT-2003');
    dbms_output.put_line('Table count: '||to_char(MyArray.count));
    for i in 1..MyArray.last LOOP
    numOut := MyArray(i);
    --if numOut is null then
    dbms_output.put_line('Value: '||to_char(numOut));
    --end if;
    END LOOP;
    END;
    Your output will look like:-
    Table count: 3
    Value: 120
    Value: 150
    Value: 180
    Hope this helps,

  • Best way to return structured data?procedure or function?

    Hi everyone.
    I cannot decide on the best way to return some data from a Java stored procedure.
    For example, which is the best way to write this method:
    public STRUCT function(input parameters) { STRUCT result     ...   return result; }
    or
    public void procedure(input parameters, STRUCT[] struct) { STRUCT result     ...   struct[0] = result; }
    In the latter case, the wrapper will obviously be: CREATE PROCEDURE name(... IN datatype, data OUT STRUCT)
    Which way is best in terms of efficiency?
    thank you ;)

    da`,
    By "efficiency", I assume you mean, best performance/least time.
    In my experience, the only way to discover this is to compare the two, since there are many factors that affect the situation.
    Good Luck,
    Avi.

  • What is the best way of dealing with large TIFF files in OS X Lion?

    I'm working with a large TIFF file (an engineering drawing), but Preview couldn't handle it (it becomes unresponsive).
    What is the best way of dealing with a large TIFF file in OS X Lion? (Viewing only or simple editing)
    Thx,
    54n9471

    Use an iPad and this app http://itunes.apple.com/WebObjects/MZStore.woa/wa/viewSoftware?id=400600005&mt=8

  • Best way to copy large folders to external drive?

    What is the best way to transfer large folders across to external drives?
    I'm trying to clean up my internal hard drive and want to copy a folder (including its sub folders) of approx 10GB of video files to an external firewire drive.
    I've tried drag and drop in finder (both copy and move) and it gets stuck on a particular file and then aborts the whole process.
    1. Surely it should copy everything it can, and simply not copy the problem file.
    2. Is there a way I can do copy and verify, a bit like when I burn a disk so I can be sure the video files have transferred safely before I delete them from my internal drive?
    Many thanks in advance for any advice.

    What you are trying to do makes perfect sense to me and I have done the same prior to getting myself a Time Machine system in place.
    1. Surely it should copy everything it can, and simply not copy the problem file.
    The fact that it is getting stuck on a particular file suggests that there is a problem with it. Try to identify which one it is and deal with that file on its own. It could be that there is a disk error where that file is stored.
    2. Is there a way I can do copy and verify....
    The copy process you are using does that implicitly as I understand it.
    Chris

  • Best way to pass large amounts of data to subroutines?

    I'm writing a program with a large amount of data, around 900 variables.  What is the best way for me to pass parts of this data to different subroutines?  I have a main loop on a PXI RT Controller that is controlling hydraulic cylinders and the loop needs to be 1ms or better.  How on earth should I pass 900 variables through a loop and keep it at 1ms?  One large cluster??  Several smaller clusters??  Local Variables?  Global Variables??  Help me please!!!

    My suggestion, similar to Altenbach and Matt above, is to use a Functional Global Variable (FGV) and use a 1D array of 900 values to store the data in the FGV. You can retrieve individual data items from the FGV by passing in the index of the desired variable and the FGV returns the value from the array. Instead of passing in an index you could also use a TypeDef'd Enum with all of your variables as element of the Enum, which will allow you to place the Enum constant on the diagram and make selecting variables, as well as reading the diagram, simpler.
    My group is developing a LabVIEW component/example code with this functionality that we plan to publish on DevZone in a month or two.
    The attached RTF file shows the core piece of this implementation. This VI, of course, is non-reentrant. The Init case could be changed to allocate the internal 1D array as part of this VI rather than passing it in from another VI.
    Message Edited by Christian L on 01-31-2007 12:00 PM
    Christian Loew, CLA
    Principal Systems Engineer, National Instruments
    Please tip your answer providers with kudos.
    Any attached Code is provided As Is. It has not been tested or validated as a product, for use in a deployed application or system,
    or for use in hazardous environments. You assume all risks for use of the Code and use of the Code is subject
    to the Sample Code License Terms which can be found at: http://ni.com/samplecodelicense
    Attachments:
    CVT_Double_MemBlock.rtf 309 KB

  • What is the best way to send large, HD video files from the iPad?

    Hello,
    My department is looking to purchase between 50 and 150 new iPads. We hold conferences where participants take short, 3-minute videos of themselves in situations they have been trained for. The idea was that after they have taken the video, they could simply email these videos to themselves, negating the need to hook the device up to a computer, transfer the video over to a flash drive, and ensure that drive gets back to the right person.
    Now that I have the first one, I'm finding any video longer than 45s or so can't be sent via email because it's too large.
    Are you kidding?
    The file I was just able to send was 7.5MB. That's the biggest video I can send from this thing? Why even give us the ability to take HD video if we can't *do* anything with it?
    But, I digress.
    What I'm looking for is the best way to get video off the iPads and into the hands of our participants. Are there apps on the App Store which might convert whatever video we take to an emailable size? I thought about setting everyone up with Apple IDs and having them use iMessage since there is no limit on file size, but of course you must own an Apple device in order to do that, and it still doesn't solve getting it on their computers.
    Any help here would be greatly appreciated.
    Thanks!
    --Rob

    You could try uploading them to a joint dropbox account that they could then download them from....but how fast it is will depend on the internet connection available.

  • Best approach to return Large data ( 4K) from Stored Proc

    We have a stored proc (Oracle 8i) that:
    1) receives some parameters,
    2) performs computations which create a large block of data, and
    3) returns this data to the caller.
    It must be compatible with both ASP (using MSDAORA.Oracle) and ColdFusion (using the Oracle ODBC driver). This procedure is critical in terms of performance.
    I have written this procedure as having an OUT param which is a REF CURSOR to a record containing a LONG. In order to make this work, at the end of the procedure I have to store the working buffer (an internal LONG variable) into a temp table, and then open the cursor as a SELECT from the temp table.
    I have tried to open the cursor as a SELECT of the working buffer (from dual) but I get the error "ORA-01460: unimplemented or unreasonable conversion requested"
    I suspect this is taking too much time; any tips about the best approach here? Is there a resource with REAL examples of returning large data?
    If I switch to CLOB, will it speed up the process, be compatible with the callers, etc.? All the references to CLOB I saw use trivial examples.
    Thanks for any help,
    Yoram Ayalon

  • Best way to delete large number of records but not interfere with tlog backups on a schedule

    I've inherited a system with multiple databases, and there are db and tlog backups that run on schedules.  There is a list of tables that need a lot of records purged from them.  What would be a good approach to use for deleting the old records?
    I've been digging through old posts, reading best practices, etc., but I'm still not sure of the best way to attack it.
    Approach #1
    A one-time delete that did everything.  Delete all the old records, in batches of say 50,000 at a time (see the sketch after the reply below).
    After each run through all the tables for that DB, execute a tlog backup.
    Approach #2
    Create a job that does a similar process as above, except don't loop.  Only do the batch once.  Have the job scheduled to start, say, on the half hour, assuming the tlog backups run every hour.
    Note:
    Some of these (well, most) are going to have relations on them.

    Hi shiftbit,
    According to your description, I have changed the type of this question to a discussion, so that more experts will notice the issue and can assist you. When deleting a large number of records from tables, use batched deletes so the transaction log does not keep growing and run out of disk space. If you can take the table offline for maintenance, a complete reorganization is always best because it performs the delete and places the table back into a pristine state.
    For more information about deleting a large number of records without affecting the transaction log, see:
    http://www.virtualobjectives.com.au/sqlserver/deleting_records_from_a_large_table.htm
    Hope it can help.
    Regards,
    Sofiya Li
    TechNet Community Support
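    (A minimal sketch of the batching in Approach #1, shown here through JDBC for illustration; the connection URL, table name, date column, cutoff value, and 50,000 batch size are all placeholders, and DELETE TOP (n) is SQL Server syntax. Child tables with foreign-key relations would need to be purged first.)
    // Hedged sketch: keep each delete small so the log stays manageable and scheduled
    // tlog backups can run between batches.
    String url = "jdbc:sqlserver://host;databaseName=db;user=u;password=p";        // placeholder
    java.sql.Timestamp cutoff = java.sql.Timestamp.valueOf("2014-01-01 00:00:00"); // placeholder cutoff
    String sql = "DELETE TOP (50000) FROM dbo.OldRows WHERE CreatedDate < ?";
    try (java.sql.Connection con = java.sql.DriverManager.getConnection(url);
         java.sql.PreparedStatement ps = con.prepareStatement(sql)) {
        ps.setTimestamp(1, cutoff);
        int deleted;
        do {
            deleted = ps.executeUpdate();   // one small batch per call
            // optionally pause here so the scheduled tlog backup can run between batches
        } while (deleted > 0);
    }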

  • (New to C# here) What is the best way to return false in a function if it cannot be executed successfully?

    In Javascript or PHP you can have a function that could return, for example, a string in case of success and false in case of failure.
    I've noticed (in the few days I've been learning C#) that you need to declare the type of value the function will return, so you have to return that type, and in case of failure you can't simply return false.
    What is the best way to achieve this behavior and is there an example I can see?
    Thank you in advance,
    Juan

    Juan, be aware that returning null won't work with value types, such as an int, which can't be null. You'd have to use a nullable value type. A nullable int would be declared with a "?", such as:
    int? someOtherFunction(int param)
    {
        if (something goes great)
            return param * 42;
        return null;
    }
    And you have to use it like this:
    int? result = someOtherFunction(666);
    if (result != null) // you can also use result.HasValue
    {
        // it worked, do something with the result
        // if you're doing math with the result, lifted operators work, but the result is still an int?
        int? x = result * 2;
        // but if you're assigning it to a plain int, you need to use this syntax
        int y = result.Value;
    }
    Before nullable value types came along, for a method that returned an int, I'd use a value of something that wouldn't normally be returned by that method to indicate failure, such as a -1.
    ~~Bonnie DeWitt [C# MVP]
    That's something very very important to keep in mind. Can save you from a lot of headaches!
    So if you have an int? function that might return null: if you are doing math with the return value you can use it directly, but if you're assigning it to a plain int you have to use .Value?
    Thanks

  • Best way to move large iTunes library?

    I have a large (nearly 1TB) iTunes library of music, videos and TV shows and have almost reached the capacity of the HDD they currently reside upon.  At the moment they are on a Western Digital My Book, attached by USB to my Time Capsule.  I have just ordered a larger La Cie drive to put them on and plan to attach this to my iMac directly, by FireWire (I don't have a Thunderbolt iMac).  The Time Capsule is attached to the iMac by Ethernet cable.
    I know the best way to move an iTunes library is to point the library at the new location and then 'Organise Library' from the File menu.  I suspect this will take a while, with so much data to transfer.  I am wondering if I would be better to attach both drives to my computer directly and transfer the files across, then point the iTunes library at the new location - would this work?  It is vital that whatever I do keeps all the metadata intact, as I spent ages tagging all my DVDs.  I don't care about play counts and my Playlists can easily be remade, so I don't mind losing them.
    Any advice is gratefully received,
    Steve

    I have a 270GB iTunes folder and a 260GB iPhoto library and used this technique to massively speed up the transfer to an external USB 3 drive.  It uses Terminal commands I found here:
    http://www.cnet.com/au/news/using-the-os-x-terminal-instead-of-the-finder-to-copy-files/
    I used this command successfully to achieve a 3 GB per minute transfer.
    First open the Terminal app (easy to find in Spotlight).
    Type the following, noting that everything in parentheses is an instruction for you to complete or a space between items: "cp (space) -av (space) (drag the source folder onto the open Terminal window) /* (space) (drag the destination folder onto the open Terminal window) /"
    Your transfer should begin.
    NOTE: If you are using this for your iPhoto or Aperture libraries, there are many index files containing the metadata of your photos.  It may appear that not much data is being transferred while these files are copied, before it gets to the .jpg or .RAW files.  Don't worry, be patient; once these are done the speed will pick up considerably.
    I hope this helps,
    Matthew
