Handling large datasets

Hi gang,
I have a query which returns a very large result set. My goal is to populate a scrollable JTable with this result set. The result set is so large that it cannot be held in memory, so I am looking for options to store the results of this query elsewhere.
I am thinking of writing the results of the query to a CSV file and then reading chunks of the CSV file into a Vector, which is then used to populate the JTable (paging the JTable).
Do any of you have experience working with large files, and specifically with the performance of reading a CSV file in chunks? Do you think there is a bottleneck I am ignoring?
I'll appreciate any suggestions.
Thanks
Connie

I understand you know how to handle paging with scrollable JTables. I furthermore assume you know that a JTable is backed by a TableModel which supplies the data you want to display.
You state that the result set is likely to exceed the memory of the client computer. A question may be allowed: is it reasonable to display the ENTIRE result set in a single table at all? Assuming that each row occupies one kB of RAM, 64,000 rows would consume 64 MB of RAM, which modern computers CAN handle. Do you really want to ask users to visually work through 64,000 table rows?
That said, the New I/O (NIO) introduced with JDK 1.4 might help. Write the entire result set into a file (CSV or a binary octet stream), and map portions of it into memory using FileChannel.map(mode, position, size), varying the position and size parameters depending on the portion to be displayed.
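
A minimal sketch of that approach, assuming fixed-width binary records of two long columns (the record layout, class name, and page size below are illustrative, not part of the original post):

import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import javax.swing.table.AbstractTableModel;

// Pages fixed-width binary records into a JTable one mapped window at a time.
public class PagedFileTableModel extends AbstractTableModel {
    private static final int RECORD_BYTES = 16; // two long columns per row (assumed layout)
    private static final int PAGE_ROWS = 1000;  // rows kept mapped at once

    private final FileChannel channel;
    private final long totalRows;
    private MappedByteBuffer page;   // current window into the file
    private long pageFirstRow = -1;

    public PagedFileTableModel(String path) throws IOException {
        channel = new RandomAccessFile(path, "r").getChannel();
        totalRows = channel.size() / RECORD_BYTES;
    }

    @Override public int getRowCount()    { return (int) Math.min(totalRows, Integer.MAX_VALUE); }
    @Override public int getColumnCount() { return 2; }

    @Override public Object getValueAt(int row, int col) {
        try {
            mapPageFor(row);
            int offset = (int) ((row - pageFirstRow) * RECORD_BYTES) + col * Long.BYTES;
            return page.getLong(offset);
        } catch (IOException e) {
            return "<I/O error>";
        }
    }

    // Re-maps the file window when the requested row falls outside the current page.
    private void mapPageFor(int row) throws IOException {
        if (page != null && row >= pageFirstRow && row < pageFirstRow + PAGE_ROWS) {
            return; // already mapped
        }
        pageFirstRow = (row / PAGE_ROWS) * (long) PAGE_ROWS;
        long start = pageFirstRow * RECORD_BYTES;
        long size  = Math.min(PAGE_ROWS * (long) RECORD_BYTES, channel.size() - start);
        page = channel.map(FileChannel.MapMode.READ_ONLY, start, size);
    }
}

Pass an instance of this model to new JTable(model) inside a JScrollPane; only the currently mapped window is resident in memory, the rest of the file stays on disk.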

Similar Messages

  • Service design for handling large datasets

    As an overnight process we need to invoke 2 services against every record in our database (over 1 million records). Specifically, the process flow should be as follows:
    - For each record in the database invoke service A.
    - For each record use the return value from service A as a parameter to invoke service B.
    If we were to process each record one at a time in a synchronous fashion, the time needed to process all records would be too great. Is there a better way to implement this? I have considered batching and making asynchronous calls
    using a duplex channel, but am unclear about which option would be superior.

    DataSets with DataTables, the "salad bowl", are too slow for Service Oriented Architecture.
    http://www.hanselman.com/blog/ReturningDataSetsFromWebServicesIsTheSpawnOfSatanAndRepresentsAllThatIsTrulyEvilInTheWorld.aspx
    DataTables use boxing and unboxing, which makes them slow.
    http://www.csharphelp.com/2010/02/c-best-practices-to-write-high-performance-code/
    You should be using DTOs and a List of DTOs (see the sketch after the links below).
    http://lauteikkehn.blogspot.com/2012/03/datatable-vs-list.html
    http://en.wikipedia.org/wiki/Data_transfer_object
    http://www.mindscapehq.com/documentation/lightspeed/Building-Distributed-Applications-/Building-WCF-Services-using-Data-Transfer-Objects
    On the other hand, if using SQL Server, you may want to look into MS SQL Server Service Broker too.
    https://technet.microsoft.com/en-us/library/ms166104(v=sql.105).aspx
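
    The advice above is about .NET/WCF, but the DTO pattern itself is language-agnostic. As a rough illustration (written in Java here, with made-up type and field names), the idea is to return a flat list of small, strongly typed objects rather than a heavyweight, self-describing dataset:

    import java.util.ArrayList;
    import java.util.List;

    // A small, flat data transfer object: only the fields the client actually needs.
    class OrderDto {
        final long id;          // illustrative fields; the real ones depend on the service contract
        final String customer;
        final double total;

        OrderDto(long id, String customer, double total) {
            this.id = id;
            this.customer = customer;
            this.total = total;
        }
    }

    class OrderService {
        // Returns a plain list of DTOs instead of a dataset carrying schema and change-tracking baggage.
        List<OrderDto> findOrders(String region) {
            List<OrderDto> result = new ArrayList<>();
            // ... populate from the data access layer, one lightweight object per row ...
            return result;
        }
    }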

  • How to handle large result set of a SQL query

    Hi,
    I have a question about how to handle a large result set from a SQL query.
    My query returns more than a million records. However, the Query Template has a "row count" parameter. If I don't specify it, by default only 100 records are returned in the query result. If I do specify it, the result is limited to that number.
    Is there any way to get around this row count limit? I don't want any restriction on the number of records returned by a query.
    Thanks a lot!

    No human can manage that much data...in a grid, a chart, or a direct-connected link to the brain. 
    What you want to implement (much like other customers with similar requirements) is a drill-in and filtering model that helps the user identify and zoom in on data of relevance, not forcing them to scroll through thousands or millions of records.
    You can also use a time-based paging model so that you only deal with one time "slice" per request (e.g. an hour, a day, etc.) and provide a scrolling window.  This is commonly how large datasets are dealt with in applications.
    I would suggest describing your application in more detail, and we can offer design recommendations and ideas.
    - Rick
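
    A rough sketch of that time-slice idea (JDBC here, with an assumed orders table and column names; the caller is responsible for closing the statement and result set):

    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;
    import java.sql.Timestamp;
    import java.time.LocalDateTime;

    // Fetches one time "slice" of rows per request instead of the whole result set.
    class TimeSlicePager {
        private final Connection conn;

        TimeSlicePager(Connection conn) { this.conn = conn; }

        ResultSet fetchSlice(LocalDateTime from, LocalDateTime to) throws Exception {
            // Table and column names are illustrative.
            PreparedStatement ps = conn.prepareStatement(
                    "SELECT order_id, order_date, amount FROM orders " +
                    "WHERE order_date >= ? AND order_date < ? ORDER BY order_date");
            ps.setTimestamp(1, Timestamp.valueOf(from));
            ps.setTimestamp(2, Timestamp.valueOf(to));
            ps.setFetchSize(500); // stream rows in modest batches rather than all at once
            return ps.executeQuery();
        }
    }

    Each request covers only a bounded window, and a "next slice"/"previous slice" control in the UI moves the window.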

  • Ways to handle large volume data (file size = 60MB) in PI 7.0 file to file

    Hi,
    In a file-to-file scenario (flat file to XML file), the flat file is picked up with FCC and then sent to XI. In XI it goes through message mapping and then an XSL transformation in sequence.
    The scenario works fine for small files (up to 5 MB), but when the input flat file is larger than 60 MB, XI shows various problems, such as (1) a JCo call error, or (2) sometimes XI even stops and we have to start it manually again for it to function properly.
    Please suggest a way to handle large volumes (file sizes up to 60 MB) in a PI 7.0 file-to-file scenario.
    Best Regards,
    Madan Agrawal.

    Hi Madan,
    If every record of your source file is processed in the target system, maybe you could split your source file into several messages by setting this up with the Recordsets per Message parameter.
    However, you just want to convert your .txt file into an .xml file, so first try setting up the
    EO_MSG_SIZE_LIMIT parameter in SXMB_ADM.
    This could solve the problem in the Integration Engine, but the problem will persist in the Adapter Engine, I mean the JCo call error...
    Take into account that the file is first processed in the Adapter Engine (File Content Conversion and so on)
    and only then sent to the pipeline in the Integration Engine.
    Carlos

  • How do I handle large resultsets in CRXI without a performance issue?

    Hello -
    Problem Definition
    I have a performance problem displaying a large/huge resultset of data on a Crystal report.  The report takes about 4 minutes or more depending on the resultset size.
    How do you handle large resultsets in Crystal Reports without a performance issue?
    Environment
    Crystal Reports XI
    Apache WebSvr 2.X, Jboss 4.2.3, Struts
    Java Reporting Component (JRC),Crystal Report Viewer (CRV)
    Firefox
    DETAILS
    I use the CRXI thick client to build my report (.rpt) and then use it in my web application (webapp) under JBoss.
    The user specifies the filter criteria to generate a report (date range etc.) and submits the request to the webapp.  The webapp queries the database and gets a "resultset".
    I initialize the JRC and CRV according to all the specifications and finally call the "processHttpRequest" method of Crystal Report Viewer to display the report on browser.
    So.....
    - Request received to generate a report with a filter criteria
    - Query DB to get resultset
    - Initialize JRC and CRV
    - finally display the report by calling
        reportViewer.processHttpRequest(request, response, request.getSession().getServletContext(), null);
    The performance problem is within the last step.  I put logs everywhere and noticed that the database query doesn't take too long to return the resultset.  Everything processes pretty quickly until I call the processHttpRequest of CRV.  This method just hangs for a long time before displaying the report on the browser.
    CRV runs pretty fast when the resultset is smaller, but for a large resultset it takes a very long time.
    I do have subreports and use Crystal Report formulas on the reports.  Some of them are used for grouping also.  But I don't think subreports are the real culprit here, because I have some other reports that don't have any subreports, and they too get really slow displaying large resultsets.
    Solutions?
    So obviously I need a good solution to this generic problem of "How do you handle large resultsets in Crystal Reports?"
    I have thought of some half-baked ideas.
    A) Use external pagination and fetch data only for the current page being displayed.  But for this, CRXI must allow me to create my own buttons (previous, next, last), so I can control the click event and fetch data accordingly.  I tried capturing events by registering the event handler "addToolbarCommandEventListener" of CRV, but my listener gets invoked "after" the processHttpRequest method completes, which doesn't help.
    Somehow I need to be able to control the UI by adding my own previous page, next page, and last page buttons and handling their click events.
    B) Automagically have CRXI use JavaScript functionality to allow browser-side page navigation.  So maybe the first time it'll take 5 mins to display the report, but once it's displayed, the user can go to any page without sending the request back to the server.
    C) Try using Crystal Reports 2008.  I'm open to using this version, but I couldn't figure out whether it has any features that can help me do external pagination or anything else that can handle large resultsets.
    D) Will using the Crystal Reports Servers like cache server/application server etc help in any way?  I read a little on the Crystal Page Viewer, Interactive Viewer, Part Viewer etc....but I'm not sure if any of these things are going to solve the issue.
    I'd appreciate it if someone can point me in the right direction.

    Essentially the answer is to use smaller resultsets, or to have the report pull from the database directly instead of using resultsets.

  • Best practices for handling large messages in JCAPS 5.1.3?

    Hi all,
    We have run into problems while processing large messages in JCAPS 5.1.3. Or rather, they are not that large really, only 10-20 MB.
    Our setup looks like this:
    We retrieve flat file messages from an FTP server. They are put onto a JMS queue and are then converted to and from different XML formats in several steps, using a couple of jcds with JMS queues between them.
    It seems that we can handle one message at a time, but as soon as we get two of these messages simultaneously the logicalhost freezes and crashes in one of the conversion steps without any error message reported in the logicalhost log. We can't relate the crashes to a specific jcd, and it seems that the memory consumption increases A LOT for the logicalhost process while handling the messages. After a restart of the server the messages that are in the queues are usually converted ok. Sometimes we have however seen that some messages seem to disappear. Scary stuff!
    I have heard of two possible solutions for handling large messages in JCAPS so far: splitting them into smaller chunks or streaming them. These solutions are however not an option in our setup.
    We have manipulated the JVM memory settings without any improvements and we have discussed the issue with Sun's support but they have not been able to help us yet.
    My questions:
    * Any ideas how to handle large messages most efficiently?
    * Any ideas why the crashes occur without error messages in the logs or anything?
    * Any ideas why messages sometimes disappear?
    * Any other suggestions?
    Thanks
    /Alex

    * Any ideas how to handle large messages most efficiently?
    Strictly speaking, if you want to send the entire file content in the JMS message, then I don't have an answer for this question.
    Generally we use the following process:
    after reading the file from the FTP location, we just archive it in a local directory and send a JMS message to the queue
    which contains the file name and file location. In most places we never send the file content in the JMS message (see the sketch after this reply).
    * Any ideas why the crashes occur without error messages in the logs or anything?
    Whenever the JMS IQ Manager's memory usage gets too high, logicalhosts stop processing. I will not say it is down; they
    stop processing, or processing might take a lot of time.
    * Any ideas why messages sometimes disappear?
    Unless persistence is enabled, I believe there is a high chance of losing a message when a logicalhost
    goes down. This is not always the case, but we have faced a similar issue when the IQ Manager was flooded with a lot
    of messages.
    * Any other suggestions?
    If the file size is large, it is better to stream the file from the FTP location to a local directory and send only the file
    location in the JMS message.
    Hope it helps.
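
    A minimal sketch of that claim-check style approach using the standard javax.jms API (the JNDI names, queue, and property names below are illustrative assumptions, not from the original post):

    import javax.jms.Connection;
    import javax.jms.ConnectionFactory;
    import javax.jms.MapMessage;
    import javax.jms.MessageProducer;
    import javax.jms.Queue;
    import javax.jms.Session;
    import javax.naming.InitialContext;

    // Sends only a reference to the archived file, never the file content itself.
    class FileReferenceSender {
        void notifyFileArrived(String fileName, String archiveDir) throws Exception {
            InitialContext ctx = new InitialContext();
            ConnectionFactory cf = (ConnectionFactory) ctx.lookup("jms/ConnectionFactory"); // assumed JNDI name
            Queue queue = (Queue) ctx.lookup("jms/FileEventQueue");                         // assumed JNDI name

            Connection conn = cf.createConnection();
            try {
                Session session = conn.createSession(false, Session.AUTO_ACKNOWLEDGE);
                MessageProducer producer = session.createProducer(queue);

                MapMessage msg = session.createMapMessage();
                msg.setString("fileName", fileName);       // the consumer reads the file itself
                msg.setString("fileLocation", archiveDir); // from this location when it is ready
                producer.send(msg);
            } finally {
                conn.close();
            }
        }
    }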

  • Handling large messages with MQ JMS sender adapter

    Hi.
    I'm having trouble handling large messages with an MQ JMS sender adapter.
    The messages are around 35-40 MB.
    Are there any settings I can adjust to make the communication channel work?
    Error message is:
    A channel error occurred. The detailed error (if any) : JMS error:MQJMS2002: failed to get message from MQ queue, Linked error:MQJE001: Completion Code 2, Reason 2010, Error Code:MQJMS2002
    The communication channel works fine with small messages!
    I'm on SAP PI 7.11; the MQ driver is version 6.
    Best Regards...
    Peter

    The problem solved itself when the MQ server crashed and restarted.
    I did find a note that might have been useful:
    Note 1258335 - Tuning the JMS service for large messages or many consumers
    A relevant post as well: http://forums.sdn.sap.com/thread.jspa?threadID=1550399

  • Using DataSet with large datasets

    I have a product, like a shirt, that comes in 800 colors.
    I've created an XML file with all the color IDs, names and RGB codes (5 attributes in all), and this XML file is 5,603 lines long. It takes a noticeably long time to load. I'm using the auto-suggest widget to then show subsets of this list based on ID or color name.
    Is there an example of a way to connect to a PHP-driven datasource, so I can query a database and return the matches to the auto-suggest widget?
    Thanks, Scott

    In my Googling I came across this Cold Fusion example:
    http://www.brucephillips.name/blog/index.cfm/2007/3/31/Use-Sprys-New-Auto-Suggest-Widget-To-Handle-Large-Numbers-of-Suggestions

  • Best way to handle large amount of text

    hello everyone
    My project involves handling a large amount of text (from conferences and reports).
    Most of them are in MS Word; I can turn them into RTF format.
    I don't want to use scrolling. I prefer turning pages (next, previous, last, contents), which means I need to break them into chunks.
    Currently the process is awkward and slow.
    I know there would be lots of people working on similar projects.
    Could anyone tell me an easy way to handle the text: bring it into the cast and break it up?
    Any ideas would be appreciated.
    Thanks,
    Ahmed

    Hacking up a document with Lingo will probably lose the RTF formatting information.
    Here's a bit of code to find the physical position of a given line of on-screen text (counting returns is not accurate with word-wrapped lines).
    This strategy uses charPosToLoc to get the actual position for the text member's current width and font size:
    maxHeight = 780 -- arbitrary display height limit
    T = member("sourceText").text
    repeat with i = 1 to T.line.count
      endChar = T.line[1..i].char.count
      lineEndlocV = charPosToLoc(member "sourceText", endChar).locV
      if lineEndlocV > maxHeight then -- found the "1 too many" line
        -- extract the identified lines from "sourceText"
        singlePage = T.line[1..i - 1]
        -- put the remaining text back into the source text member,
        -- and perhaps repeat the parse with the remaining part of "sourceText"
        member("sourceText").text = T.line[i..99999]
        exit repeat
      end if
    end repeat
    Alternatively, you could use one of the roundabout ways to display PDF in Director. There might be some batch PDF production tools that can create your pages in a nicely scalable PDF format.
    I think FlashPaper documents can also be adapted to Director.

  • Handling large files in scope of WSRP portlets

    Hi there,
    just wanted to ask if there are any best practices in respect to handling large file uploads/downloads when using WSRP portlets (apart from bypassing WebCenter altogether for these use cases, that is). We continue to get OutOfMemoryErrors and TimeoutExceptions as soon as the file being transferred becomes larger than a few hundred megabytes. The portlet is happily streaming the file as part of its javax.portlet.ResourceServingPortlet.serveResource(ResourceRequest, ResourceResponse) implementation, so the problem must somehow lie within WebCenter itself.
    Thanks in advance,
    Chris

    Hi Yash,
    Check these blogs for the structure you are mentioning:
    /people/shabarish.vijayakumar/blog/2006/02/27/content-conversion-the-key-field-problem
    /people/shabarish.vijayakumar/blog/2005/08/17/nab-the-tab-file-adapter
    Regards,
    ---Satish

  • Can express vi handle large data

    Hello,
    I'm facing a problem in handling large data using Express VIs. The input to each Express VI is a large dataset, a 2M-sample waveform, and I am using 4 such Express VIs, each with 2M samples, connected in parallel. To process these data the Express VIs take too much time compared to other general VIs or subVIs. Can anybody give the reason why processing takes so much time? As per my understanding, displaying large data in LabVIEW is not efficient, and since the Express VIs have an internal display in the form of the configure dialog box, I feel most of the processing time is taken plotting the data on the graph of the configure dialog box. If this is correct, is there any solution to overcome this?
    Waiting for a reply.
    Thanks in advance

    Hi sayaf,
    I don't understand your reasoning for not using the "Open Front Panel"
    option to convert the Express VI to a standard VI. When converting the
    Express VI to a VI, you can save it with a new name and still use the
    Express VI in the same VI.
    By the way, have you heard about the NI LabVIEW Express VI Development Toolkit? That is the choice if you want to be able to create your own Express VIs.
    NB: Not all Express VIs can be edited with the toolkit - you should mainly use the toolkit to develop your own Express VIs.
    Have fun!
    - Philip Courtois, Thinkbot Solutions

  • Is anyone working with large datasets ( 200M) in LabVIEW?

    I am working with external bioinformatics databases and find the datasets to be quite large (2 files easily come out at 50 MB or more). Is anyone working with large datasets like these? What is your experience with performance?

    Colby, it all depends on how much memory you have in your system. You could be okay doing all that with 1GB of memory, but you still have to take care to not make copies of your data in your program. That said, I would not be surprised if your code could be written so that it would work on a machine with much less ram by using efficient algorithms. I am not a statistician, but I know that the averages & standard deviations can be calculated using a few bytes (even on arbitrary length data sets). Can't the ANOVA be performed using the standard deviations and means (and other information like the degrees of freedom, etc.)? Potentially, you could calculate all the various bits that are necessary and do the F-test with that information, and not need to ever have the entire data set in memory at one time. The tricky part for your application may be getting the desired data at the necessary times from all those different sources. I am usually working with files on disk where I grab x samples at a time, perform the statistics, dump the samples and get the next set, repeat as necessary. I can calculate the average of an arbitrary length data set easily by only loading one sample at a time from disk (it's still more efficient to work in small batches because the disk I/O overhead builds up).
    Let me use the calculation of the mean as an example (hopefully the notation makes sense): see the jpg. What this means in plain English is that the mean can be calculated solely as a function of the current data point, the previous mean, and the sample number. For instance, given the data set [1 2 3 4 5], sum it, and divide by 5, you get 3. Or take it a point at a time: the average of [1]=1, [2+1*1]/2=1.5, [3+1.5*2]/3=2, [4+2*3]/4=2.5, [5+2.5*4]/5=3. This second method required far more multiplications and divisions, but it only ever required remembering the previous mean and the sample number, in addition to the new data point. Using this technique, I can find the average of gigs of data without ever needing more than three doubles and an int32 in memory. A similar derivation can be done for the variance, but it's easier to look it up (I can provide it if you have trouble finding it). Also, I think this functionality is built into the LabVIEW point-by-point statistics functions.
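
    A small sketch of that running-mean recurrence, mean_n = (x_n + mean_{n-1} * (n - 1)) / n, written in Java here rather than LabVIEW purely to illustrate the idea:

    // Computes the mean of an arbitrarily long stream one sample at a time.
    class RunningMean {
        private double mean = 0.0;
        private long n = 0;

        void add(double x) {
            n++;
            mean = (x + mean * (n - 1)) / n; // only the previous mean and the sample count are kept
        }

        double mean() { return mean; }
    }

    Feeding the samples [1, 2, 3, 4, 5] one at a time yields 1, 1.5, 2, 2.5, 3, matching the worked example above.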
    I think you can probably get the data you need from those db's through some carefully crafted queries, but it's hard to say more without knowing a lot more about your application.
    Hope this helps!
    Chris
    Attachments:
    Mean Derivation.JPG ‏20 KB

  • How to handle large heap requirement

    Hi,
    Our application requires a large amount of heap memory to load data into memory for further processing.
    Application is load balanced and we want to share the heap across all servers so one server can use heap of other server.
    Server1 and Server2 have 8GB of RAM and Server3 has 16 GB of RAM.
    If any request comes to server1 and it requires some more heap memory to load data, in this scenario can server1 use server3's heap memory?
    Is there any mechanism/product which allows us to share heap across all the servers? Or is there any other way to handle the large heap requirement?
    Thanks,
    Atul

    user13640648 wrote:
    Hi,
    Our Application requires large amount of heap memory to load data in memory for further processing.
    Application is load balanced and we want to share the heap across all servers so one server can use heap of other server.
    Server1 and Server2 have 8GB of RAM and Server3 has 16 GB of RAM.
    If any request comes to server1 and it requires some more heap memory to load data, in this scenario can server1 use server3's heap memory?
    Is there any mechanism/product which allows us to share heap across all the servers? Or is there any other way to handle the large heap requirement?
    That isn't how you design it (based on your brief description).
    For any transaction A you need a set of data X.
    For another transaction B you need a set of data Y which might or might not overlap with X.
    The set of data (X or Y) is represented by discrete hunks of data (form is irrelevant) which must be loaded.
    One can preload the server with this data or do a load on demand.
    Once in memory it is cached.
    One can refine this further with alternative caching strategies that define when loaded data is unloaded and how it is unloaded.
    JEE servers normally support this in a variety of forms. But one can custom code it as well.
    JEE servers can also replicate cached data across server instances. Custom code can do this but it is more complicated than doing the custom caching.
    A load balanced system exists for performance and failover scenarios.
    Obviously in a failover situation a "shared heap" would fail completely (as asked about) because the other server would be gone.
    One might also need to support very large data sets. In that case something like Memcached (google for it) can be used. There are commercial solutions in this space as well. This allows for distributed caching solutions which can be scaled.
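
    A minimal sketch of the load-on-demand caching idea in Java (the size bound and loader are illustrative; a JEE server's caching support or a product like Memcached would replace this in practice):

    import java.util.LinkedHashMap;
    import java.util.Map;
    import java.util.function.Function;

    // Loads data on demand and evicts the least recently used entries once a size bound is reached.
    class LoadOnDemandCache<K, V> {
        private final Map<K, V> cache;
        private final Function<K, V> loader;

        LoadOnDemandCache(int maxEntries, Function<K, V> loader) {
            this.loader = loader;
            // an access-ordered LinkedHashMap gives simple LRU eviction
            this.cache = new LinkedHashMap<K, V>(16, 0.75f, true) {
                @Override protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                    return size() > maxEntries;
                }
            };
        }

        synchronized V get(K key) {
            return cache.computeIfAbsent(key, loader); // load only when the data is first requested
        }
    }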

  • Issues when Downloading Large Datasets to Excel and CSV

    Hi,
    Hoping someone could lend a hand on the issues described below.
    I have a prompted dashboard that, depending upon the prompts selected, can return detail datasets. The intent of this dashboard is to AVOID giving end users Answers access, while still providing the ability to pull large amounts of detail data in an ad-hoc fashion. When large datasets are returned, end users will download the data to their local machines and use Excel for further analysis. I have tried two options:
    1) Download to CSV
    2) Download data to Excel
    For my test, I am using the dashboard prompts to return one year's (2009) worth of order data for North America, down to the day level of granularity. Yes, a lot of detail data...but this is what many "dataheads" at my organization are requesting (despite best efforts to evangelize the power of OBIEE to do the aggregation for them). I expect this report to return somewhere around 200k rows...
    Here are the results:
    1) Download to CSV
    Filesize: 78MB
    Opening the downloaded file is fairly quick...
    126k rows are present in the CSV file...but the dataset abruptly ends in Q3 (August) 2009. The following error appears at the end of the incomplete dataset:
    Odbc driver returned an error (SQLFetchScroll).
    Error Codes: OPR4ONWY:U9IM8TAC
    State: HY000. Code: 10058. [NQODBC] [SQL_STATE: HY000] [nQSError: 10058] A general error has occurred.
    [nQSError: 46073] Operation 'stat()' on file '/opt/apps/oracle/obiee/OracleBIData/tmp/nQS_31951_2986_15442940.TMP' failed with error: (75). (HY000)
    2) Download to Excel
    Filesize: 46MB
    Opening the Excel file is extremely painful...over 20 minutes to open the file...making Excel unusable during the opening process...definitely not acceptable for end users.
    When opened, the file contains only 65k rows...when there should be over 200k...
    Can you please help me understand the limitations of detail data output (downloading) from OBIEE...or provide workarounds for the circumstances above?
    Thanks so much in advance.
    Adam

    @chandrasekhar:
    Thanks for your response. I'll try the export button, but I would also like to know how to create a button on the toolbar. Clicking that button should bring up a popup box with two radio buttons asking whether to download the report in .xls or .csv format. I am looking for the subroutines for that.
    Thanks.

  • To update large dataset in columnar database (Sybase IQ)

    Hi,
    I want to update a column with random values in Sybase IQ. The number of rows is very large (approx. 2 crore, i.e. 20 million).
    I have created a procedure using a cursor.
    It works fine with a small dataset but has performance issues with a large dataset.
    Is there a workaround for this issue?
    regards,
    Neha Khetan

    Hi Eugene,
    Is it possible to implement this in BDB JE somehow?
    Yes, you can create a new separate database for storing the sets of integers. Each record in this database would be one partition (e.g., 1001-2000) for one record in the "main" database.
    The key to this database would be a two part key:
    - the key to the "main" database, followed by
    - the beginning partition value (e.g., 1001)
    For example:
    Main Database:
      Key     Data
       X      string/integer parameters for X
       Y      string/integer parameters for Y
    Integer Partition Database:
      Key     Data
      X,1     Set of integers in range 1-1000 for X
      X,1001  Set of integers in range 1001-2000 for X
      Y,1     Set of integers in range 1-1000 for Y
      Y,1001  Set of integers in range 1001-2000 for Y
       ...
    Two-part keys are easy to implement with a tuple binding. You simply read/write the two fields for the record key, one after another, in the same way that you read/write multiple fields in the record data (see the sketch below).
    Mark
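
    A rough sketch of that two-part key using BDB JE's tuple binding API (the key field names below are illustrative):

    import com.sleepycat.bind.tuple.TupleBinding;
    import com.sleepycat.bind.tuple.TupleInput;
    import com.sleepycat.bind.tuple.TupleOutput;

    // The two-part key: the main-database key plus the starting value of the partition.
    class PartitionKey {
        final String mainKey;      // e.g. "X"
        final int partitionStart;  // e.g. 1001

        PartitionKey(String mainKey, int partitionStart) {
            this.mainKey = mainKey;
            this.partitionStart = partitionStart;
        }
    }

    // Reads/writes the two key fields one after another, just like fields in record data.
    class PartitionKeyBinding extends TupleBinding<PartitionKey> {
        @Override
        public PartitionKey entryToObject(TupleInput in) {
            return new PartitionKey(in.readString(), in.readInt());
        }

        @Override
        public void objectToEntry(PartitionKey key, TupleOutput out) {
            out.writeString(key.mainKey);
            out.writeInt(key.partitionStart);
        }
    }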
