Working with Large Data Sets (Waveforms)

When collecting data at a high rate (30 kHz) for a long period (120 seconds), I'm unable to rearrange the data afterwards due to memory errors. Is there a more efficient method?
Attachments:
Convert2Dto1D.vi (36 KB)

Some suggestions (a rough text-language sketch of the first point follows the list):
Preallocate your final data arrays before you start your calculations.  The Build Array inside your loop will tend to fragment memory, giving you issues.
Use the In Place Element structure to get data to/from your waveforms.  You can use it to pull single waveforms from your 2D array and the Y data from a waveform.
Do not use Transpose and autoindexing; that adds a copy of the data.
Use the Array palette functions (e.g. Reshape Array) to change the sizes of existing data in place (where possible).
You may want to read Managing Large Data Sets in LabVIEW.
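LabVIEW itself is graphical, so purely as a text-language analogy (Python/NumPy assumed; a random block stands in for the acquired Y data), preallocating the final array and writing into it in place, instead of growing it with Build Array, looks like this:

    import numpy as np

    n_channels = 4                # assumption for illustration
    n_samples = 30_000 * 120      # 30 kS/s for 120 s = 3,600,000 samples per channel
    rng = np.random.default_rng()

    # Preallocate the final 2D result once, before the loop, instead of growing it.
    data = np.empty((n_channels, n_samples), dtype=np.float64)

    for ch in range(n_channels):
        block = rng.standard_normal(n_samples)  # stand-in for one channel's acquired Y data
        data[ch, :] = block                     # write into the preallocated buffer in place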
Your initial post is missing some information.  How many channels are you acquiring and what is the bit depth of each channel?  30kHz is a relatively slow acquisition rate for a single channel (NI sells instruments which acquire at 2GHz).  120s of data from said single channel is modestly large, but not huge.  If you have 100 channels, things change.  If you are acquiring them at 32-bit resolution, things change (although not as much).  Please post these parameters and we can help more.
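For a rough sense of scale (assuming double-precision samples, which is an assumption about the acquisition):

    samples_per_channel = 30_000 * 120            # 3,600,000 samples
    mb_per_channel = samples_per_channel * 8 / 2**20
    print(round(mb_per_channel, 1))               # ~27.5 MB per channel stored as DBL
    print(round(mb_per_channel * 100 / 1024, 1))  # ~2.7 GB if there were 100 such channels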
This account is no longer active. Contact ShadesOfGray for current posts and information.

Similar Messages

  • 64-bit LabVIEW - still major problems with large data sets

    Hi Folks -
    I have LabVIEW 2009 64-bit running on a Win7 64-bit OS with an Intel Xeon dual quad-core processor and 16 GB of RAM.  With the release of this 64-bit version of LabVIEW, I expected to easily be able to handle x-ray computed tomography data sets in the 2-3 GB range in RAM, since we now have access to all of the available RAM.  But I am having major problems - sluggish (and stalled) operation of the program, inability to perform certain operations, etc.
    Here is how I store the 3-D data, which consists of a series of images: I store each of my 2-D images in a cluster, and then have the entire image series as an array of these clusters.  I then enqueue this entire array of clusters and regularly access it using 'Preview Queue', operating on the image set, subsets of the images, or single images.
    I remember talking to LabVIEW R&D years ago; they said this was a good way to do things because it allowed non-contiguous access to memory (versus the contiguous access that would be required if I stored my image series as a 3-D array without the clusters).  (R&D - this is what I remember, please correct me if wrong.)
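    As a rough text-language analogy of that contiguity argument (Python/NumPy assumed, sizes purely illustrative), one monolithic 3-D array needs a single huge contiguous block, while an array of clustered 2-D images only needs each image to be contiguous:

    import numpy as np

    h, w, n_images = 1024, 1024, 300   # ~8 MB per image, ~2.5 GB total (illustrative sizes)

    # One 3-D array: the memory manager must find ~2.5 GB in a single contiguous block.
    # volume = np.empty((n_images, h, w), dtype=np.float64)

    # A list of 2-D arrays: only each ~8 MB image has to be contiguous,
    # which is a far easier request to satisfy in a fragmented address space.
    images = [np.empty((h, w), dtype=np.float64) for _ in range(n_images)]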
    Because I am experiencing tremendous slowness in the program after these large data sets are loaded (and, I think, disk access as well once memory use goes beyond 16 GB), I am wondering if I need a different storage strategy that will allow seamless program operation while still keeping the data in RAM (I do not want to have to recall images from disk).
    I have other CT imaging programs that are running very well with these large data sets.
    This is a critical issue for me as I move forward with LabVIEW in this application.  I would like to work with LabVIEW R&D to solve it.  I am wondering if I should be thinking about establishing, say, 10 queues instead of 1 to address this; it would mean a major program rewrite.
    Sincerely,
    Don

    First, I want to add that this strategy works reasonably well for data sets in the 600-700 MB range with 64-bit LabVIEW.
    With 32-bit LabVIEW, 100-200 MB sets were about the limit before I experienced problems, so I definitely noticed an improvement.
    I use the queuing strategy to move this large amount of data in RAM.  We could have used other means, such as LV2-style globals, but I believe that clustering the 2-D array (image) and then having a series of those clustered arrays in an array (the final structure I showed in my diagram), versus using a 3-D array, is what allowed me to get this far using RAM instead of recalling the images from disk.
    I am sure data copies are being made - yes, the memory is ballooning to 15 GB.  I probably need to have someone examine this code while I am explaining things to them live.  This is a very large application, and a significant amount of time would be required to simplify it, and that might not allow us to duplicate the problem.  In some of my applications, I use the In Place Element structure for indexing data out of arrays to minimize data copies; I expect I might have to consider that strategy here as well.  Just a thought.
    What I can do is send someone (in the US), via large file transfer, a 1.3 - 2.7 GB set of image data and see how they would best advise on storing and extracting the images using RAM, how best to optimize RAM usage, and how to avoid making data copies.  The operations that I apply to the images are irrelevant; it is the storage, movement, and extraction that are causing the problems.  I can also show screenshot(s) of how I extract the images (but I have major problems even before I get to that point).
    Can someone else comment on how data value references may help here, or how they have helped in one of their applications?  Would the use of this eliminate copies?   I currently have to wait for 64-bit version of the Advanced Signal Processing Toolkit for LabVIEW 2010 before I can move to LabVIEW 2010.
    Don

  • Just in case anyone needs an ObservableCollection that deals with large data sets, and supports FULL EDITING...

    the VirtualizingObservableCollection does the following:
    Implements the same interfaces and methods as ObservableCollection<T> so you can use it anywhere you’d use an ObservableCollection<T> – no need to change any of your existing controls.
    Supports true multi-user read/write without resets (maximizing performance for large-scale concurrency scenarios).
    Manages memory on its own so it never runs out of memory, no matter how large the data set is (especially important for mobile devices).
    Natively works asynchronously – great for slow network connections and occasionally-connected models.
    Works great out of the box, but is flexible and extendable enough to customize for your needs.
    Has a data access performance curve so good it’s just as fast as the regular ObservableCollection – the cost of using it is negligible.
    Works in any .NET project because it’s implemented in a Portable Code Library (PCL).
    The latest package can be found on NuGet: Install-Package VirtualizingObservableCollection. The source is on GitHub.

    Good job, thank you for sharing
    Best Regards,
    Please remember to mark the replies as answers if they help

  • Help Working With Variable Data Sets (PS CS3)

    I have two different projects for which I believe the Variable Data Sets would / could work.
    Project 1:
    I have created a student badge for our TV Media class and I would like to be able to use a Variable Data Set to automatically read a data file with First and Last names and then place them in the locations of my choosing. I would like to be able to import the list and have it automatically create the 30 ID cards I need and/or print them too.
    Project 2:
    I have created a really nice looking school dance ticket. Currently they are about 1.5" x 6". I would like to be able to use a Variable Data Set to automatically add a couple of different pieces of information: Date, Ticket Number (sequential).
    In each case, I have been able to successfully create a data set and replace a single variable, but I have not been able to modify more than one variable in a file.
    Each text selection is on its own layer. I have tried to create a text file that looks like this:
    FName, LName
    Joe, Smith
    Barney, Jones
    Thalia, Chamoix
    I have also tried to create two different files, such as for project two:
    Date
    January 04
    February 12
    March 19
    Number
    1001
    1002
    1003
    No matter what I have tried, I still cannot seem to get more than one variable to function at a time.
    Any assistance would be greatly appreciated.
    Thank you!

    I don't think Data Merge is sophisticated enough to skip a page if the entire record is null, but it will skip blank lines if there is a null field and nothing else on the line.
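    For reference, Photoshop's variable data sets read one text file per data set, with every variable as its own column and the header row matching the variable names defined under Image > Variables exactly.  A combined file for Project 2 would look something like this (the variable names here are assumptions and must match what you defined):

    Date,Number
    January 04,1001
    February 12,1002
    March 19,1003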

  • Loading Matrix Bound to UDO with Large Data Set

    Hello Experts,
    I have been looking on the forums for the best method out there to effectively load a Matrix that is bound to a User Defined Object (UDO).  In short, I will explain to you what I would like to do.  I have a form that has a matrix on it bound to a User Defined Object.  This matrix takes data stored in other UDO forms/tables and processes it to extract new information. 
    Unfortunately, the resulting dataset is quite large (up to 1000 rows).  I realize if this were just a "report" I could easily do this with a Grid.  I also realize if this were just a Matrix bound to a User Defined Table, I could bind it to a DataTable and perform the query that way.  However, since this is a Matrix bound to a DBDataSource (as I would like to have SAP handle any updates/finds) I believe my only options are to try and use a DBDataSource.Query method and try to work with Conditions. 
    The DBDataSource.Query method has not proven to be effective due to the complexity of the query and the multiple tables involved.  I have read from others on the forum that I could just load the matrix by temporarily databinding the matrix to a DataTable and then, after it is loaded, switch the databinding back to the DBDataSource but this does not work as it comes back with an error informing me (rightly so) that there are already rows in the matrix.
    One final option would be to use the User Interface (UI) to cycle through and update each cell of the matrix with the results of a recordset, but, as I said, this can be a large dataset and that could take hours (literally).
    In short, I was wondering if anyone out there can advise me on the most effective options I have:
    - Is there a way to quickly load a matrix bound to a DBDataSource?
    - Is there some way I can load the matrix by binding it to a DataTable and then quickly move this information over to the DBDataSource?  (I already attempted this, and the method I used was as slow as using the UI to update the Matrix.)
    - Are there effective ways to use the DBDataSource.Query method that I do not know much about (and cannot find many examples of how this functionality is truly used)?
    - Should I abandon the DBDataSource (though I believe this is the SAP-preferred method) and, if so, is there another technique to appropriately update the database?  Others have mentioned handling the updates to the database themselves, but I am not sure what this means (maybe using SQL UPDATE/INSERT?).
    - Is there a way to flush matrix information to a DBDataSource if the DBDataSource was not used in the loading and is not currently bound to the matrix?
    Sorry for the number of questions, but thanks for the advice.

            Dim oForm As SAPbouiCOM.Form
            Dim creationPackage As SAPbouiCOM.FormCreationParams
            creationPackage = sbo_application.CreateObject(SAPbouiCOM.BoCreatableObjectType.cot_FormCreationParams)
            creationPackage.UniqueID = "MyFormID"
            creationPackage.FormType = "MyFormID"
            creationPackage.ObjectType = "UDO_TEST"
            creationPackage.BorderStyle = SAPbouiCOM.BoFormBorderStyle.fbs_Fixed
            oForm = sbo_application.Forms.AddEx(creationPackage)
            oForm.Visible = True
            oForm.Width = 300
            oForm.Height = 400
            Dim oItem As SAPbouiCOM.Item
            oItem = oForm.Items.Add("1", BoFormItemTypes.it_BUTTON)
            oItem.Top = 336
            oItem.Left = 5
            oItem = oForm.Items.Add("2", BoFormItemTypes.it_BUTTON)
            oItem.Top = 336
            oItem.Left = 80
            ' Now add an edit box bound to DocEntry
            oItem = oForm.Items.Add("3", SAPbouiCOM.BoFormItemTypes.it_EDIT)
            oItem.Top = 5
            oItem.Left = 5
            oItem.Width = 100
            Dim oEditText As SAPbouiCOM.EditText = oItem.Specific
            oForm.DataSources.DataTables.Add("oMatrixDT")
            oItem = oForm.Items.Add("oMtrx1", SAPbouiCOM.BoFormItemTypes.it_MATRIX)
            oItem.Top = 20
            oItem.Left = 20
            oItem.Width = oForm.Width - 30
            oItem.Height = oForm.Height - 100
            Dim oMatrix As SAPbouiCOM.Matrix = oItem.Specific
            Dim oColumn As SAPbouiCOM.Column = oMatrix.Columns.Add("#", SAPbouiCOM.BoFormItemTypes.it_EDIT)
            oColumn.TitleObject.Caption = "#"
            oColumn = oMatrix.Columns.Add("oClmn0", SAPbouiCOM.BoFormItemTypes.it_LINKED_BUTTON)
            oColumn.TitleObject.Caption = "BP Code"
            Dim oLinkedButton As SAPbouiCOM.LinkedButton = oColumn.ExtendedObject
            oLinkedButton.LinkedObject = SAPbouiCOM.BoLinkedObject.lf_BusinessPartner
            ' Now bind Columns to UDO Objects in Add Mode
            oEditText.DataBind.SetBound(True, "@UDO_TEST", "DocEntry")
            oMatrix.Columns.Item("oClmn0").DataBind.SetBound(True, "@UDO_TEST1", "U_CARDCODE")
            oForm.DataBrowser.BrowseBy = "3"

  • How do I select one record when working with image data sets?

    David Powers had an example of creating Spry data sets and using the filenames in the database, linked to images in the local files, as data sources.  The page shows the images with the specified information requested; however, all of the images display with their content.  I want to pull an individual record with its image and content. HELP!
    <?php
    if (!function_exists("GetSQLValueString")) {
      function GetSQLValueString($theValue, $theType, $theDefinedValue = "", $theNotDefinedValue = "")
      {
        if (PHP_VERSION < 6) {
          $theValue = get_magic_quotes_gpc() ? stripslashes($theValue) : $theValue;
        }
        $theValue = function_exists("mysql_real_escape_string") ? mysql_real_escape_string($theValue) : mysql_escape_string($theValue);
        switch ($theType) {
          case "text":
            $theValue = ($theValue != "") ? "'" . $theValue . "'" : "NULL";
            break;
          case "long":
          case "int":
            $theValue = ($theValue != "") ? intval($theValue) : "NULL";
            break;
          case "double":
            $theValue = ($theValue != "") ? doubleval($theValue) : "NULL";
            break;
          case "date":
            $theValue = ($theValue != "") ? "'" . $theValue . "'" : "NULL";
            break;
          case "defined":
            $theValue = ($theValue != "") ? $theDefinedValue : $theNotDefinedValue;
            break;
        }
        return $theValue;
      }
    }

    $maxRows_rs_getPhoto = 10;
    $pageNum_rs_getPhoto = 0;
    if (isset($_GET['pageNum_rs_getPhoto'])) {
      $pageNum_rs_getPhoto = $_GET['pageNum_rs_getPhoto'];
    }
    $startRow_rs_getPhoto = $pageNum_rs_getPhoto * $maxRows_rs_getPhoto;

    mysql_select_db($database_gepps1_db, $gepps1_db);
    $query_rs_getPhoto = "SELECT last_name, first_name, personal_bio, file_name, width, height FROM mem_profile";
    $query_limit_rs_getPhoto = sprintf("%s LIMIT %d, %d", $query_rs_getPhoto, $startRow_rs_getPhoto, $maxRows_rs_getPhoto);
    $rs_getPhoto = mysql_query($query_limit_rs_getPhoto, $gepps1_db) or die(mysql_error());
    $row_rs_getPhoto = mysql_fetch_assoc($rs_getPhoto);

    if (isset($_GET['totalRows_rs_getPhoto'])) {
      $totalRows_rs_getPhoto = $_GET['totalRows_rs_getPhoto'];
    } else {
      $all_rs_getPhoto = mysql_query($query_rs_getPhoto);
      $totalRows_rs_getPhoto = mysql_num_rows($all_rs_getPhoto);
    }
    $totalPages_rs_getPhoto = ceil($totalRows_rs_getPhoto/$maxRows_rs_getPhoto)-1;
    ?>
    <table width="800" border=" ">
      <tr>
        <td>Image</td>
        <td>thumbnail</td>
        <td>firstname</td>
        <td>lastname</td>
        <td>personal bio</td>
      </tr>
      <?php do { ?>
        <tr>
          <td><img src="<?php echo $row_rs_getPhoto['file_name']; ?>" alt="" width="<?php echo $row_rs_getPhoto['width']; ?>" height="<?php echo $row_rs_getPhoto['height']; ?>"></td>
          <td><img src="<?php echo $row_rs_getPhoto['file_name']; ?>" alt="" width="50" height="35"></td>
          <td><?php echo $row_rs_getPhoto['first_name']; ?></td>
          <td><?php echo $row_rs_getPhoto['last_name']; ?></td>
          <td><?php echo $row_rs_getPhoto['personal_bio']; ?></td>
        </tr>
        <?php } while ($row_rs_getPhoto = mysql_fetch_assoc($rs_getPhoto)); ?>
    </table>
    <?php
    mysql_free_result($rs_getPhoto);
    ?>

    I tried pulling the record by using an entered value, but then I would need to create several recordsets.
    It's actually no problem doing that.  Can you explain more what you want the final result of this page to display? You are pulling a recordset of the entire group of photos.  Do you still want that comprehensive recordset on this page?  How many other images do you want?  Are you trying to make a master/detail pair where this page displays only the details for a single image?  See what I mean?

  • Working with large data and PHP

    Using a backend MySQL database, I'd like to interact with this data using PHP.  I found one tutorial:
    http://www.sephiroth.it/tutorials/flashPHP/pageable_recordset/
    which is exactly what I need: page the MySQL server every time you need to grab data, and also have a listener that will automatically update the client.  However, I can't seem to get it to work.  Does anyone know how to make this happen using a repeater and a panel?
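    Just to illustrate the paging idea itself (not the Flex/PHP code from that tutorial), fetching one window of rows at a time looks like this; a minimal sketch in Python using the standard sqlite3 module with a hypothetical records table standing in for the MySQL database:

    import sqlite3

    PAGE_SIZE = 20

    def fetch_page(conn, page):
        """Return one window of rows instead of the whole table."""
        cur = conn.execute(
            "SELECT id, name FROM records ORDER BY id LIMIT ? OFFSET ?",
            (PAGE_SIZE, page * PAGE_SIZE),
        )
        return cur.fetchall()

    # In-memory stand-in for the real table so the sketch runs on its own.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE records (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany("INSERT INTO records (name) VALUES (?)", [("row %d" % i,) for i in range(100)])

    print(fetch_page(conn, page=0))   # rows 1-20
    print(fetch_page(conn, page=1))   # rows 21-40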

    Read the Developer's Guide at
    http://www.adobe.com/support/documentation/en/flex/
    You'll find everything there!

  • Working with spry data sets

    Hi,
    I have a page that uses a Spry data set called 'dsSupport'; however, I do not want to use a table to select the item in the list.  I am instead using a Spry select box:
    <div spry:region="dsSupport">
      <h1>Step 1: Select your product:</h1>
      <form id="form1" name="form1" method="post" action="">
        <p><strong>Choose from a list:</strong><br />
          <span id="spryselect1">
            <label>
              <select name="prodlist" id="prodlist">
                <option spry:repeat="dsSupport" spry:setrow="dsSupport" value="{model}">{name}</option>
              </select>
            </label>
            <span class="selectRequiredMsg">Please select an item.</span>
          </span>
        </p>
      </form>
    </div>
    This is connected to a Spry detail region so I can pull up more details from the data set, but when I change the option in the select box nothing happens.  Is this possible?  If so, can anyone help?
    If I drop a Spry table into the div tag as well, I can select the items in there and the detail region does change, so I know it's linked OK and all the table items are showing in the select box; I just cannot get it to change when I select something different in the select box!

    fixed it using:
    <select spry:repeatchildren="dsSupport" spry:choose="choose" name="prodlist"
            onChange="dsSupport.setCurrentRow(this.selectedIndex);">
      <option spry:when="{ds_RowNumber} == {ds_CurrentRowNumber}" selected="selected">{name}</option>
      <option spry:default="default">{name}</option>
    </select>

  • Working with large data

    I have to create a database with a specific distribution of key size and data size.  Key size is a few bytes, and data size varies over a great range, from a few bytes to some megabytes, with an average size near 64K.  The overall size of the database filled by key/data pairs is some gigabytes.  One can imagine our key/data pairs forming a context index (inverted file) for a large set of Russian/English texts.
    Could you recommend the best way to configure such a data store?  I mean the best random read speed for key/data pairs in a given database and good enough write speed (for "context index" updating).

    Hi,
    The most important configuration item in your case will be the cache size.  The larger the cache, the better performance will be - especially for random, read-oriented usage.  Documentation about the cache configuration API is here:
    http://www.sleepycat.com/docs/api_c/db_set_cachesize.html
    An article about tuning cache size is available here:
    http://www.sleepycat.com/newsletters/0511/a31_Perf_Size.html
    Selecting the format for the database will also have an impact. Given your description I suggest that hash is likely the best solution - since the data access will be random. You should test with both hash and btree. An article describing the benefits/drawbacks of both is here:
    http://www.sleepycat.com/docs/ref/am_conf/select.html
    Then you might want to adjust the pagesize - given that your data items are generally large, a bigger page size will probably result in better performance. API here:
    http://www.sleepycat.com/docs/api_c/db_set_pagesize.html
    The db_stat utility is a very useful tool for tuning your database. Documentation can be found here:
    http://www.sleepycat.com/docs/utility/db_stat.html
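    If you happen to drive Berkeley DB from Python, a minimal sketch of those knobs might look like the following (this assumes the third-party bsddb3 bindings and an existing environment directory; the numbers are only starting points to tune with db_stat):

    from bsddb3 import db

    env = db.DBEnv()
    env.set_cachesize(1, 0, 1)        # 1 GB cache in a single region
    env.open("/path/to/env", db.DB_CREATE | db.DB_INIT_MPOOL)

    index = db.DB(env)
    index.set_pagesize(65536)         # 64 KB pages suit the large data items described above
    index.open("context_index.db", None, db.DB_HASH, db.DB_CREATE)  # hash access method

    index.put(b"term", b"posting-list bytes ...")
    print(index.get(b"term"))

    index.close()
    env.close()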
    If you have any specific questions I will be glad to help.
    Regards,
    Alex

  • Large data sets and key terms

    Hello, I'm looking for some guidance on how BI can help me. I am a business analyst in a health solutions firm, but not proficient in SQL. However, I have to work with large data sets that just exceed the capabilities of Excel.
    Basically, I'm having to use Excel to manually search for key terms and apply values to those results.  For instance, I have a medical claims file with Provider Names, Tax ID, Charges, etc.  It's 300,000 records long and 15-25 columns wide.  I need to search for key terms in the provider name like Ambulance, Fire Dept, Rescue, EMT, EMS, etc. - anything that resembles an ambulance service - and also include abbreviations such as AMB or FD, and variations like EMT, E M T, EMS, E M S, etc.  Each time I do a search, I have to filter and apply an "N/A" flag.
    That's just one key term.  I also have things like Dentists or DDS, Vision, Optometry, and a dozen other Provider Types that need to be flagged as "N/A".
    Is this something that can be handled using BI?  I have access to a BI group, but I need to understand more about the capabilities of what can be done.  As an analyst, I'm having to deal with poor data integrity, so just cleaning up the file can be extremely taxing and cumbersome.
    Some insight would be very helpful. Thanks.

    I am not sure if you are looking for an explanation of different BI products; if so, maybe this forum is not the place to get a straight answer.
    But the Information Discovery product suite might be useful in your case.  Regarding the "large data set" you mentioned, searching and analyzing 300,000 records may not be considered a large data set, at least by Endeca standards :).
    All your other requests could also be very easily implemented using Endeca's product suite.  Please reach out to Oracle's Endeca product team and they can guide you on how this product suite would help you.
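    Outside of a BI suite, that kind of term flagging can also be scripted directly; a minimal sketch in Python using the terms from the post (the exact pattern list and column handling are assumptions):

    import re

    # Ambulance-style provider terms taken from the post; extend per provider type.
    AMBULANCE_TERMS = [
        r"AMBULANCE", r"\bAMB\b", r"FIRE DEPT", r"\bFD\b", r"RESCUE",
        r"\bEMT\b", r"\bE M T\b", r"\bEMS\b", r"\bE M S\b",
    ]
    AMBULANCE_RE = re.compile("|".join(AMBULANCE_TERMS), re.IGNORECASE)

    def flag_provider(name):
        """Return 'N/A' for ambulance-style provider names, else an empty flag."""
        return "N/A" if AMBULANCE_RE.search(name) else ""

    print(flag_provider("SMITH COUNTY E M S"))    # -> N/A
    print(flag_provider("JONES FAMILY DENTAL"))   # -> '' (would need its own DDS/vision list)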

  • Running out of memory while using cursored stream with large data

    We are following the suggestions/recommendations for the cursored stream:
    CursoredStream cursor = null;
    try {
        Session session = getTransaction();
        int batchSize = 50;
        ReadAllQuery raq = getQuery();
        raq.useCursoredStream(batchSize, batchSize);
        int num = 0;
        ArrayList<Request> limitRequests = null;
        int totalLimitRequest = 0;
        cursor = (CursoredStream) session.executeQuery(raq);
        while (!cursor.atEnd()) {
            Request request = (Request) cursor.read();
            if (num == 0) {
                limitRequests = new ArrayList<Request>(batchSize);
            }
            limitRequests.add(request);
            totalLimitRequest++;
            num++;
            if (num >= batchSize) {
                log.warn("Migrating batch of " + batchSize + " Requests.");
                updateLimitRequestFillPriceForBatch(limitRequests);
                num = 0;
                cursor.releasePrevious();
            }
        }
        if (num > 0) {
            updateLimitRequestFillPriceForBatch(limitRequests);
        }
    } finally {
        // Braces and the catch/finally were stripped when the snippet was posted;
        // closing the cursor in a finally block is the safe reconstruction.
        if (cursor != null) {
            cursor.close();
        }
    }
    We are committing every 50 records in the unit of work.  If we set dontMaintainCache on the ReadAllQuery, we get PrimaryKeyExceptions intermittently, and we do not see much difference in the IdentityMap size.
    Any suggestions/ideas for dealing with large data sets? Thanks

    Hi,
    If I use read-only classes with CursoredStream and execute the query within UOW, should I be saving any memory?
    I had to use UOW because when I use Session to execute the query I get
    6115: ISOLATED_QUERY_EXECUTED_ON_SERVER_SESSION
    Cause: An isolated query was executed on a server session: queries on isolated classes, or queries set to use exclusive connections, must not be executed on a ServerSession or in CMP outside of a transaction.
    I assume marking the descriptor as read-only will avoid registering in UOW, but I want to make sure that this is the case while using CursoredStream.
    We are running in OC4J(OAS10.1.3.4) with BeanManagedTransaction.
    Please suggest.
    Thanks
    -Raam
    Edited by: Raam on Apr 2, 2009 1:45 PM

  • Working with Large List in sharepoint 2010

    Hi All
    I have a list with almost 10k records in SharePoint, and based on a business requirement I am binding almost 6k of those records to an ASP.NET GridView that is visible on the home page of the portal, which most users access.  Can someone please suggest the best way to reduce the performance cost of the program hitting the SP list every time the page loads?
    Thanks & Regards
    Rakesh Kumar

    Hi,
    If you are working with large data retrieval from the content database (SharePoint list), the points below are for your reference:
    1. Limit the number of returned items.
    SPQuery query = new SPQuery();
    query.RowLimit = 6000; // maximum number of items to return per page
    query.ListItemCollectionPosition = prevItems.ListItemCollectionPosition; // starting at a previous position
    SPListItemCollection items = SPContext.Current.List.GetItems(query);
    2. Limit the number of returned columns.
    SPQuery query = new SPQuery();
    query.ViewFields = "<FieldRef Name='Title' />"; // example field list; the original markup was stripped when posted
    3. Query specific items using CAML (Collaborative Application Markup Language).
    SPQuery query = new SPQuery();
    query.Query = "<Where><Eq><FieldRef Name='ID' /><Value Type='Counter'>15</Value></Eq></Where>"; // example CAML; the original markup was stripped when posted
    4. Use the ContentIterator class.
    https://spcounselor-public.sharepoint.com/Blog/Post/2/Querying-a--big-list--with-ContentIterator-and-multiple-filters
    5. Create a stored procedure in the database to get the specific data, create a web service to return it, and then create a web part to show the data on the home page.
    Best Regards
    Dennis Guo
    TechNet Community Support

  • Converting TDM to LVM / working with large amounts of data

    I use a PCI-6251 card for data acquisition, and LabVIEW 8, to log 5 channels at 100 kHz for approximately 4-5 million samples on each channel (the more the better).  I use the Express VIs for reading and writing data, which is stored in .tdm format (the .tdx file is around 150 MB).  I did not store it in .lvm format, to reduce the time taken to acquire data.
    1. How do I convert this binary file to a .mat file?
    2. In another approach, I converted the .tdm file into .lvm format.  This works as long as the file size is small (say 50 MB); bigger than that, LabVIEW's memory gets full and it will not save the new file.  What is an efficient method to write data (say, into .lvm format) for big files without causing LabVIEW's memory to overflow?  I tried saving to multiple files, saving one channel at a time, and increasing the computer's virtual memory (up to 4880 MB), but I still have problems with the 'LabVIEW memory full' error.
    3. Another problem I noticed with LabVIEW is that once it is used to acquire data, it occupies a lot of the computer's memory, even after the VI stops running.  Is there a way to refresh the memory, and is this mainly due to bad programming?
    Any suggestions?

    I assume from your first question that you are attempting to get your data into Matlab.  If that is the case, you have three options:
    1. Treat the .tdx file as a binary file and read it directly from Matlab.  Each channel is a contiguous block of the data type you stored it in (DBL, I32, etc.), with the channels in the order you stored them.  You probably know how many points are in each channel; if not, you can get this information from the XML in the .tdm file.  This is probably your best option, since you won't have to convert anything.
    2. Early versions of TDM storage (those shipping with LabVIEW 7.1 or earlier) automatically read the entire file into memory when you load it.  If you have LabVIEW 7.1, you can upgrade to a version which allows you to read portions of the file by downloading and installing the demo version of LabVIEW 8; this will upgrade the shared USI component.  You can then read a portion of your large data set into memory and stream it back out to LVM.
    3. Do option 2, but use NI-HWS (available on your driver CD under the Computer Based Instruments tab) instead of LVM.  HWS is a hierarchical binary format based on HDF5, so Matlab can read the files directly through its HDF5 interface.  You just need to know the file structure, which you can figure out using HDFView.  If you take this route and have questions, reply to this post and I will try to answer them.  Note that you may wish to use HWS for your future storage, since its performance is much better than TDM and you can read it from Matlab.  HWS/HDF5 also supports compression, and at your data rates you can probably pull this off while streaming to disk if you have a reasonably fast computer.
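    The direct-read idea in option 1 also works outside Matlab; purely as an illustration (Python/NumPy assumed, and the per-channel sample count and stored data type must be taken from the .tdm XML header):

    import numpy as np

    samples_per_channel = 4_500_000   # taken from the XML in the .tdm file (assumed here)
    channel_dtype = np.float64        # whatever type the channels were stored as

    def read_channel(tdx_path, channel_index):
        """Read one channel's contiguous block straight out of the .tdx file."""
        offset = channel_index * samples_per_channel * np.dtype(channel_dtype).itemsize
        return np.fromfile(tdx_path, dtype=channel_dtype,
                           count=samples_per_channel, offset=offset)

    # ch0 = read_channel("data.tdx", 0)   # hypothetical file name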
    Handling large data sets in LabVIEW is an art, like most programming languages.  Check out the tutorial Managing Large Data Sets in LabVIEW for some helpful pointers and code.
    LabVIEW does not release memory until a VI exits memory, even if the VI is not running.  This is an optimization to prevent a repeatedly called VI from requesting the same memory every time it is called.  You can reduce this problem considerably by writing empty arrays to all your front panel objects before you exit your top-level VI.  Graphs are a particularly common problem.
    This account is no longer active. Contact ShadesOfGray for current posts and information.

  • How to handle large data sets?

    Hello All,
    I am working on an editable form document.  It uses a flowing subform with a table.  The table may contain up to 50k rows, and the generated PDF may take up to 2-4 GB of memory; in some cases Adobe Reader fails and "gives up" opening these large data sets.
    Any suggestions? 

    On 25.04.2012 01:10, Alan McMorran wrote:
    > How large are you talking about? I've found QVTo scales pretty well as
    > the dataset size increases but we're using at most maybe 3-4 million
    > objects as the input and maybe 1-2 million on the output. They can be
    > pretty complex models though so we're seeing 8GB heap spaces in some
    > cases to accomodate the full transformation process.
    Ok, that is good to know. We will be working in roughly the same order
    of magnitude. The final application will run on a well equipped server,
    unfortunately my development machine is not as powerful so I can't
    really test that.
    > The big challenges we've had to overcome is that our model is
    > essentially flat with no containment in it so there are parts of the
    We have a very hierarchical model. I still wonder to what extent EMF and
    QVTo at least try to let go of objects which are not needed anymore and
    allow them to be garbage collected?
    > Is the GC overhead limit not tied to the heap space limits of the JVM?
    Apparently not, quoting
    http://www.oracle.com/technetwork/java/javase/gc-tuning-6-140523.html:
    "The concurrent collector will throw an OutOfMemoryError if too much
    time is being spent in garbage collection: if more than 98% of the total
    time is spent in garbage collection and less than 2% of the heap is
    recovered, an OutOfMemoryError will be thrown. This feature is designed
    to prevent applications from running for an extended period of time
    while making little or no progress because the heap is too small. If
    necessary, this feature can be disabled by adding the option
    -XX:-UseGCOverheadLimit to the command line."
    I will experiment a little bit with different GC's, namely the parallel GC.
    Regards
    Marius

  • Is anyone working with large datasets ( 200M) in LabVIEW?

    I am working with external bioinformatics databases and find the data sets to be quite large (two files easily come out at 50 MB or more).  Is anyone working with large data sets like these?  What is your experience with performance?

    Colby, it all depends on how much memory you have in your system. You could be okay doing all that with 1GB of memory, but you still have to take care to not make copies of your data in your program. That said, I would not be surprised if your code could be written so that it would work on a machine with much less ram by using efficient algorithms. I am not a statistician, but I know that the averages & standard deviations can be calculated using a few bytes (even on arbitrary length data sets). Can't the ANOVA be performed using the standard deviations and means (and other information like the degrees of freedom, etc.)? Potentially, you could calculate all the various bits that are necessary and do the F-test with that information, and not need to ever have the entire data set in memory at one time. The tricky part for your application may be getting the desired data at the necessary times from all those different sources. I am usually working with files on disk where I grab x samples at a time, perform the statistics, dump the samples and get the next set, repeat as necessary. I can calculate the average of an arbitrary length data set easily by only loading one sample at a time from disk (it's still more efficient to work in small batches because the disk I/O overhead builds up).
    Let me use the calculation of the mean as an example (hopefully the notation makes sense): see the attached JPG; in essence, mean_n = ((n-1)*mean_(n-1) + x_n) / n.  What this means in plain English is that the mean can be calculated solely as a function of the current data point, the previous mean, and the sample number.  For instance, given the data set [1 2 3 4 5], sum it and divide by 5, and you get 3.  Or take it a point at a time: the average of [1]=1, [2+1*1]/2=1.5, [3+1.5*2]/3=2, [4+2*3]/4=2.5, [5+2.5*4]/5=3.  This second method requires far more multiplications and divisions, but it only ever requires remembering the previous mean and the sample number, in addition to the new data point.  Using this technique, I can find the average of gigs of data without ever needing more than three doubles and an int32 in memory.  A similar derivation can be done for the variance, but it's easier to look it up (I can provide it if you have trouble finding it).  Also, I think this functionality is built into the LabVIEW point-by-point statistics functions.
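    A compact version of that running calculation (Python purely for illustration; the variance update shown is the standard Welford form the derivation alludes to):

    def running_stats(samples):
        """One-pass mean and sample variance, holding only a few scalars in memory."""
        n, mean, m2 = 0, 0.0, 0.0
        for x in samples:
            n += 1
            delta = x - mean
            mean += delta / n           # mean_n = mean_(n-1) + (x - mean_(n-1)) / n
            m2 += delta * (x - mean)    # Welford's update for the summed squared deviations
        variance = m2 / (n - 1) if n > 1 else 0.0
        return mean, variance

    print(running_stats([1, 2, 3, 4, 5]))   # -> (3.0, 2.5)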
    I think you can probably get the data you need from those db's through some carefully crafted queries, but it's hard to say more without knowing a lot more about your application.
    Hope this helps!
    Chris
    Attachments:
    Mean Derivation.JPG (20 KB)
