Parse large (2GB) text file

Hi all,
I would like your expert tips on efficient ways (speed and memory considerations) for parsing large text files (~2GB) in Java.
Specifically, the text files in hand contain mobility traces that I need to process. Traces have a predefined format, and each trace is given on a new line in the text file.
To obtain the attributes of each trace I use java.util.regex.Pattern and java.util.regex.Matcher.
Thanks in advance,
Nick

Memory-mapped files are faster when you need random access and you don't need to load all the data; here, however, they just add complexity you don't need, IMHO.
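For illustration, a minimal sketch of what the memory-mapped route looks like (my own example, not tested against your traces); note that a single MappedByteBuffer is capped at Integer.MAX_VALUE bytes, so a ~2 GB file already has to be mapped in windows, which is part of the complexity:
{code}
import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;

public class MappedScan {
    public static void main(String[] args) throws Exception {
        RandomAccessFile raf = new RandomAccessFile("/tmp/deleteme.txt", "r");
        FileChannel ch = raf.getChannel();
        long size = ch.size(), count = 0;
        // A single map() call is limited to Integer.MAX_VALUE bytes,
        // so a ~2 GB file has to be mapped in windows.
        for (long off = 0; off < size; off += Integer.MAX_VALUE) {
            long len = Math.min(Integer.MAX_VALUE, size - off);
            MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_ONLY, off, len);
            while (buf.hasRemaining())
                if (buf.get() == '\n') count++; // e.g. just count lines
        }
        System.out.println("lines: " + count);
        raf.close();
    }
}
{code}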
I suspect most of the time is taken by the parser, so if you customise your parser it could be faster. Here is a simple custom parser:
{code}
import java.io.*;
import java.util.Date;

public class TraceReader {
    public static void main(String... args) throws IOException {
        String template = "tr at %.1f \"$obj(1) pos 123.20 270.98 0.0 2.4\"%n";
        File file = new File("/tmp/deleteme.txt");
        if (!file.exists()) {
            System.out.println(new Date() + ": Writing to " + file);
            PrintWriter pw = new PrintWriter(file);
            for (int i = 0; i < Integer.MAX_VALUE / template.length(); i++)
                pw.printf(template, i / 10.0);
            pw.close();
            System.out.println(new Date() + ": ... finished writing to " + file
                    + " length= " + file.length() / 1024 / 1024 + " MB.");
        }

        long start = System.nanoTime();
        BufferedReader br = new BufferedReader(new FileReader(file), 64 * 1024);
        for (String line; (line = br.readLine()) != null; ) {
            // each line looks like: tr at 0.1 "$obj(1) pos 123.20 270.98 0.0 2.4"
            int pos = 6; // skip "tr at "
            int end = line.indexOf(' ', pos);
            double time = Double.parseDouble(line.substring(pos, end));
            pos = line.indexOf('s', end + 12) + 2; // jump just past "pos "
            end = line.indexOf(' ', pos + 1);
            double x = Double.parseDouble(line.substring(pos, end));
            pos = end + 1;
            end = line.indexOf(' ', pos + 1);
            double y = Double.parseDouble(line.substring(pos, end));
            pos = end + 1;
            end = line.indexOf(' ', pos + 1);
            double z = Double.parseDouble(line.substring(pos, end));
            pos = end + 1;
            end = line.indexOf('"', pos + 1);
            double velocity = Double.parseDouble(line.substring(pos, end));
        }
        br.close();
        long time = System.nanoTime() - start;
        System.out.printf(new Date() + ": Took %,f sec to read %s%n", time / 1e9, file.toString());
    }
}
{code}
prints
{code}
Sun May 08 09:38:02 BST 2011: Writing to /tmp/deleteme.txt
Sun May 08 09:42:15 BST 2011: ... finished writing to /tmp/deleteme.txt length= 2208 MB.
Sun May 08 09:43:21 BST 2011: Took 66.610883 sec to read /tmp/deleteme.txt
{code}

Similar Messages

  • Read a large size text file

How can I read a large text file in multiple parts without losing any data?
    Ben

    Why are you afraid of losing data? There's no reason that you would lose data if you are reading a large text file.
    You should use the various Reader and Writer classes in package java.io to read and write text files. Here's an example of how you can read a text file line by line:
BufferedReader r = new BufferedReader(new FileReader("myfile.txt"));
int lineno = 1;
String line;
while ((line = r.readLine()) != null) {
    System.out.printf("%d: %s%n", lineno, line);
    lineno++;
}
r.close();
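If by "multiple parts" you mean fixed-size chunks rather than lines, here is a minimal sketch (the 64 KB buffer size is an arbitrary choice); nothing is lost between chunks because each read tells you exactly how many chars it produced:
{code}
import java.io.*;

public class ChunkReader {
    public static void main(String[] args) throws IOException {
        Reader r = new FileReader("myfile.txt");
        char[] chunk = new char[64 * 1024]; // arbitrary chunk size
        for (int n; (n = r.read(chunk)) != -1; ) {
            // process exactly n chars; partial final chunks are handled too
            System.out.print(new String(chunk, 0, n));
        }
        r.close();
    }
}
{code}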

  • Create SSIS Import parsing based on text file...

I have a text file listing 500 field names and their lengths. (In a few months we will begin receiving monthly data files matching that schema.) I can create an SSIS 2008 project and manually parse the columns (inside SSIS) to match that schema, but I wonder if there is an easier way to do what I want (e.g. if I created a table using T-SQL commands, could SSIS somehow "pick up" the parsing from the table schema?)
    TIA,
    edm2

Not exactly. What I was hoping for is that instead of pointing SSIS to the data file and manually parsing all the fields, I could create an (empty) table with the right schema and have SSIS use that table schema to define how it should parse the input data. (Kind of backwards from the usual approach.)
    edm2
Sorry, you can't parse a text file for metadata like that, as metadata has to be fixed in SSIS.
However, one way you can implement this is as follows:
1. Create your table with the required schema.
2. In SSIS, have a data flow task with a flat file source which points to your file. Choose only a row delimiter and no column delimiter.
3. Put an OLEDB destination and point it to a staging table with an identity column.
4. Put an Execute SQL Task with a query as below:
SELECT CASE WHEN (SELECT Column FROM Staging WHERE IDCol=1) = STUFF((SELECT ',' + COLUMN_NAME
           FROM INFORMATION_SCHEMA.COLUMNS
           WHERE TABLE_NAME = 'Your schema table name'
           AND TABLE_SCHEMA = 'dbo'
           FOR XML PATH('')),1,1,'')
       THEN 0
       ELSE 1
       END AS SchemaDiff
FROM (SELECT 1 AS n) t
Then map the SchemaDiff column in the resultset to a boolean variable created in SSIS. Also make sure you set the resultset option to single row.
Then you can use this boolean variable to check whether the schema is the same: False means no difference in the schema, otherwise there's a difference.
Just a caution that this will only compare column names, without comparing their datatypes, lengths etc.
Visakh

  • Parse from a text file

Is there a way I can parse a method or conditional statement from a text file? Let's say I had this statement in Test.txt:
if (X >= Y) {
    // do something
}
X and Y are already defined in the java file; is there a way that I could parse it so that the java code invokes it, as if it were part of the java file?

    Here's a little Groovy demo:
import groovy.lang.GroovyShell;

/*
 * To run this demo,
 *  - download 'groovy-binary-1.5.5.zip': http://dist.groovy.codehaus.org/distributions/groovy-binary-1.5.5.zip
 *  - unzip, and add 'lib/groovy-all-1.5.5.jar' to your classpath
 */
public class GroovyDemo {
    public static void main(String[] args) throws Exception {
        String script =
            " if(X > Y) {  \n"+
            "   return X+X \n"+
            " } else {     \n"+
            "   return Y/4 \n"+
            " }            \n";
        System.out.println("script=\n"+script);
        GroovyShell shell = new GroovyShell();
        shell.setProperty("X", new Integer(10));
        shell.setProperty("Y", new Integer(9));
        Object value = shell.evaluate(script);
        System.out.println("value="+value+", "+value.getClass());
        shell.setProperty("X", new Integer(8));
        value = shell.evaluate(script);
        System.out.println("value="+value+", "+value.getClass());
    }
}
/* output:
    script=
     if(X > Y) {
       return X+X
     } else {
       return Y/4
     }
    value=20, class java.lang.Integer
    value=2.25, class java.math.BigDecimal
*/
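A plain-Java alternative that avoids the Groovy jar is the javax.script API bundled since Java 6; this sketch evaluates an expression with the built-in JavaScript engine (which engine backs "JavaScript" depends on the JRE version, and the numeric result type varies with it):
{code}
import javax.script.ScriptEngine;
import javax.script.ScriptEngineManager;

public class ScriptDemo {
    public static void main(String[] args) throws Exception {
        // "JavaScript" resolves to Rhino on Java 6/7, Nashorn on 8-14.
        ScriptEngine engine = new ScriptEngineManager().getEngineByName("JavaScript");
        engine.put("X", 10);
        engine.put("Y", 9);
        Object value = engine.eval("X > Y ? X + X : Y / 4");
        System.out.println("value=" + value); // prints 20 (numeric type depends on the engine)
    }
}
{code}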

  • How to parse a custom text file (with custom separators) to a list inside a table

    Hi,
I'm trying to parse +/-50 product detail HTML web pages into a combined PQ table showing each product and all sub-products inside their package:
    let
        Source = Folder.Files("N:\sample\Product_Details"),
        TransformedColumn = Table.TransformColumns(Source,{{"Content", Lines.FromBinary}}),
        RemovedOtherColumns = Table.SelectColumns(TransformedColumn,{"Content", "Name"}),
        DuplicatedColumn = Table.DuplicateColumn(RemovedOtherColumns, "Content", "Copy of Content"),
        #"Expand Content1" = Table.ExpandListColumn(DuplicatedColumn, "Content"),
        FilteredRows = Table.SelectRows(#"Expand Content1", each Text.Contains([Content], "/h1")),
        #"Expand Copy of Content" = Table.ExpandListColumn(FilteredRows, "Copy of Content"),
        FilteredRows1 = Table.SelectRows(#"Expand Copy of Content", each Text.Contains([Copy of Content],">• ")),
    ---> try outs :
        TransformedColumn2 = Table.TransformColumns(FilteredRows1,{{"Copy of Content",Lines.FromText}})
         TransformedColumn3 = Table.TransformColumns(TransformedColumn2,{{{"Copy of Content",">• "},Text.splitAny}}),
         #"Expand Copy of Content1" = Table.ExpandListColumn(TransformedColumn2,{"Copy of Content",">• "})
    in
        #"Expand Copy of Content1"
So the code here above...
_ lists all HTML files of the folder
_ creates, for each file of the table, 1 row of data per line from inside the related HTML page
_ filters the lines to retrieve the Product Package name of each HTML page
_ creates, for each "package entry" of the table, 1 row of data per line from inside the related HTML page
_ filters the lines to retrieve the sub-product Package details of each HTML page (one single line without carriage return)
---> stuck
So now, for each "Package" entry row, I have a text cell containing a list of sub-products separated by ">• " characters, and I would like to convert this text to a list split at each ">• " so I could afterwards expand it to 1 row
per sub-product (with package name as first cell of the row).
Normally Lines.FromText( should provide the option to define a custom separator, but when nested inside Table.TransformColumns(
I cannot find where to put this optional field!
I've searched for some explanations on http://office.microsoft.com/en-us/excel-help/power-query-formula-categories-HA104122363.aspx but the syntax transformation due to nesting isn't explained, and the samples don't cover many cases of usage; they only cover the
obvious usage with no options!
Can somebody help me on this?

    Oh I see. I missed the part about it being separate rows when I read your post the first time.
    How about this... first do the Table.SplitColumn operation and then use the Unpivot operation. Doing this through the UI is pretty simple, but you can go straight through the formula language if you want to. The formula is Table.UnpivotOtherColumns.
    Here's a simplified example:
    let
        Source = #table({"Column1","Column2"},{{1,"a,b,c,d"},{2,"e,f,g"},{3,"h"}}),
        SplitColumnDelimiter = Table.SplitColumn(Source,"Column2",Splitter.SplitTextByDelimiter(","),{"Column2.1", "Column2.2", "Column2.3", "Column2.4"}),
        Unpivot = Table.UnpivotOtherColumns(SplitColumnDelimiter,{"Column1"},"Attribute","Value"),
        RemovedColumns = Table.RemoveColumns(Unpivot,{"Attribute"})
    in
        RemovedColumns
    There's probably a way to do it with Lines.FromText, but I think this is a bit simpler. It can all be done with clicks in the UI.

  • VBScript for parsing multiple text files

    Hi,
I have around 175 text files that contain inventory information that I am trying to parse into an Excel file. We are upgrading our Office platform from 2003 to 2010 and my boss wants to know which machines will have trouble supporting it. I found a script that will parse a single text file based upon ":" as the delimiter, and I'm having trouble figuring out how to change it to open an entire folder of text files and write all of the data to a single Excel spreadsheet. Here is an example of the text file I'll be parsing. I'm interested in the "Memory and Processor Information" and "Disk Drive Information" sections mainly.
    ABEHRENS-XP Computer Inventory
    OS Information
    OS Details
    Caption: Microsoft Windows XP Professional
    Description:
    InstallDate: 20070404123855.000000-240
    Name: Microsoft Windows XP Professional|C:\WINDOWS|\Device\Harddisk0\Partition1
    Organization: Your Mom
    OSProductSuite:
    RegisteredUser: Bob
    SerialNumber: 55274-640-3763826-23029
    ServicePackMajorVersion: 3
    ServicePackMinorVersion: 0
    Version: 5.1.2600
    WindowsDirectory: C:\WINDOWS
    Memory and Processor Information
    504MB Total memory HOW CAN I PULL THIS WITHOUT ":" ALSO
    Computer Model: HP d330 uT(DG291A)
    Processor:               Intel(R) Pentium(R) 4 CPU 2.66GHz
    Disk Drive Information
    27712MB Free Disk Space ANY WAY TO PULL THIS WITHOUT ":"
    38162MB Total Disk Space
    Installed Software
    Here is the start of the script I have so far. . .
Const ForReading = 1
Set objDict = CreateObject("Scripting.Dictionary")
Set objFSO = CreateObject("Scripting.FileSystemObject")
Set objTextFile = objFSO.OpenTextFile("C:\Test\test.txt", ForReading) ' WANT THIS TO BE C:\Test
Do Until objTextFile.AtEndOfStream
    strLine = objTextFile.ReadLine
    If InStr(strLine, ":") Then
        arrSplit = Split(strLine, ":") ' IS ":" THE BEST DELIMITER TO USE?
        strField = arrSplit(0)
        strValue = arrSplit(1)
        If Not objDict.Exists(strField) Then
            objDict.Add strField, strValue
        Else
            objDict.Item(strField) = objDict.Item(strField) & "||" & strValue
        End If
    End If
Loop
objTextFile.Close
Set objExcel = CreateObject("Excel.Application")
objExcel.Visible = True
objExcel.Workbooks.Add
intColumn = 1
For Each strItem In objDict.Keys
    objExcel.Cells(1, intColumn) = strItem
    intColumn = intColumn + 1
Next
intColumn = 1
For Each strItem In objDict.Items
    arrValues = Split(strItem, "||")
    intRow = 1
    For Each strValue In arrValues
        intRow = intRow + 1
        objExcel.Cells(intRow, intColumn) = strValue
    Next
    intColumn = intColumn + 1
Next
Thank you for any help.
    Thank you for any help.

You are The Bomb.com! I had to play around with it to pull some additional data (model and processor) and then write a quick macro to remove the unwanted text. Finally, I wanted the data written in columns instead of rows, so this is what I ended up with:
    Option Explicit
    Dim objFSO, objFolder, strFolder, objFile
    Dim objReadFile, strLine, objExcel, objSheet
    Dim intCol, strExcelPath
    Const ForReading = 1
    strFolder = "c:\Test"
    strExcelPath = "c:\Test\Inventory.xlsx"
    Set objExcel = CreateObject("Excel.Application")
    objExcel.Workbooks.Add
    Set objSheet = objExcel.ActiveWorkbook.Worksheets(1)
    intCol = 0
    Set objFSO = CreateObject("Scripting.FileSystemObject")
    Set objFolder = objFSO.GetFolder(strFolder)
    For Each objFile In objFolder.Files
      intCol = intCol + 1
      Set objReadFile = objFSO.OpenTextFile(objFile.Path, ForReading)
      Do Until objReadFile.AtEndOfStream
        strLine = objReadFile.ReadLine
        If (InStr(strLine, "Computer Inventory") > 0) Then
          objSheet.Cells(intCol, 1).Value = Left(strLine, InStr(strLine, "Computer Inventory") - 2)
        End If
        If (InStr(strLine, "Total memory") > 0) Then
          objSheet.Cells(intCol, 2).Value = Left(strLine, InStr(strLine, "Total memory") - 2)
        End If
        If (InStr(strLine, "Computer Model:") > 0) Then
          objSheet.Cells(intCol, 3).Value = (strLine)
        End If
        If (InStr(strLine, "Processor:") > 0) Then
          objSheet.Cells(intCol, 4).Value = (strLine)
        End If
        If (InStr(strLine, "Total Disk Space") > 0) Then
          objSheet.Cells(intCol, 5).Value = Left(strLine, InStr(strLine, "Total Disk Space") - 2)
        End If
        If (InStr(strLine, "Free Disk Space") > 0) Then
          objSheet.Cells(intCol, 6).Value = Left(strLine, InStr(strLine, "Free Disk Space") - 2)
        End If
      Loop
    Next
    objExcel.ActiveWorkbook.SaveAs strExcelPath
    objExcel.ActiveWorkbook.Close
    objExcel.Quit
    Thanks again!
Hi,
I have very basic knowledge of VB scripting, but this code could be the perfect solution I am looking for. Could you guide me on exactly how to run and test it? I would be really thankful for your kind and generous support on this.
Thanks,
Veer

  • Parsing a Text file with nulls in records

    Hello All,
I am relatively new to Java programming and I have been given a task that requires me to parse a CSV text file. I have to group the records based on a particular column and then do some math (add columns that have the same keys).
My problem is that the input file can have nulls and non-numeric chars. I am confused how I can proceed in this situation, since I have to add these records, and when I do a parseDouble it might fail. OK, I can get around that by assigning a zero in case there is a NumberFormatException, but the result of my task is to render an output text file from the input file. Here comes the catch: the requirement has it that if the input file had a null or a non-numeric char, then while rendering the output I have to populate a code as a placeholder at the location where the null or non-numeric char was found.
I'd like to know if there is any trivial way of getting around this problem without using a Map to remember the locations where the nulls were found. Any help is greatly appreciated. Thanks all in advance.

    maybemedic wrote:
    Mogalr,
The non-numeric chars could be any random chars, like aabb, null strings etc. In the past I've made small methods that would just check to see if the string was all digits and a decimal point... and check formatting... check that it doesn't have 2 decimal points, and that after it's trimmed there aren't any spaces and the length is > 0.
The checking is slower than just calling something like Double.parseDouble(), unless you hit a format exception. So you have to decide what quality of data you have before committing to your game plan.
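To make the try/catch route concrete, here is a minimal Java sketch (the NaN sentinel and the "#ERR#" placeholder are my own illustrative choices, not the original requirement's codes); it avoids a separate Map by letting NaN mark the bad positions in-band:
{code}
public class SafeParse {
    static final String PLACEHOLDER = "#ERR#"; // illustrative output code

    /** Returns the parsed value, or NaN for nulls/non-numeric input. */
    static double parseOrNaN(String field) {
        if (field == null || field.trim().length() == 0) return Double.NaN;
        try {
            return Double.parseDouble(field.trim());
        } catch (NumberFormatException bad) {
            return Double.NaN;
        }
    }

    public static void main(String[] args) {
        String[] fields = { "12.5", null, "aabb", "3.0" };
        double sum = 0;
        StringBuilder out = new StringBuilder();
        for (String f : fields) {
            double v = parseOrNaN(f);
            if (Double.isNaN(v)) {
                out.append(PLACEHOLDER); // placeholder marks bad input in the output
            } else {
                sum += v;
                out.append(v);
            }
            out.append(',');
        }
        System.out.println(out + " sum=" + sum); // 12.5,#ERR#,#ERR#,3.0, sum=15.5
    }
}
{code}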

  • How to parse large xml file

I need to parse a large xml file which contains the following tags. The size of the file is 10MB-50MB or more.
<departments>
  <department>
    <a_depart id="124">
      <b_depart id="Bss_253">
        <bss_depart id="253">
          <attributes>
            <name_one>abc</name_one>
          </attributes>
        </bss_depart>
      </b_depart>
    </a_depart>
  </department>
  <department>
    <a_depart id="124">
      <b_depart id="Bss_254">
        <mss_depart id="253">
          <attributes>
            <name_one>abc</name_one>
            <name_two>xyz</name_two>
          </attributes>
        </mss_depart>
      </b_depart>
    </a_depart>
  </department>
  <department>
    <a_depart id="124">
      <b_depart id="Bss_254">
        <mss_depart id="255">
          <attributes>
            <name_one>abc</name_one>
            <name_two>xyz</name_two>
          </attributes>
        </mss_depart>
      </b_depart>
    </a_depart>
  </department>
  <department>
    <a_depart id="125">
      <b_depart id="Bss_254">
        <mss_depart id="253">
          <attributes>
            <name_one>abc</name_one>
            <name_two>xyz</name_two>
          </attributes>
        </mss_depart>
      </b_depart>
    </a_depart>
  </department>
</departments>
I want to get the information from that xml file, like mss_depart id=233, building an xpath dynamically for every id and loading
that using dom4j, which is very very slow.
Is there any other solution, to read the data using a sax parser only?
I want to execute the xpath on the data in the following way:
//a_depart/@id ------> all the ids of a_depart tags; say it returns 3 values: 123,124,125
after that I want to execute
//a_depart[@id='123']/b_depart/@id like this ... to retrieve the values at all the levels ...
     I am executing the following xpath for every unique id at all levels:
     List l = doc.selectNodes(xPathForID);
     List l1 = doc.selectNodes(xPathForAttributes+attributes.get(j)+"/text()");
But it is very slow and taking a lot of time.
Is there any other way to solve this problem? If so, please mail me; it is urgent.
I am using jdk1.4 and jdk1.5.
Is there any support for a sax parser to execute xpath in jdk1.5 directly, without using dom4j?
Thanks in advance....

    I doubt you will find a preexisting solution to your problem.
SAX is usually recommended for processing big files (where "big" is undefined). It works on big files by avoiding the messy problem of storing the data -- that is left as an exercise to you.
DOM (and its variants) works by building a Document object as the head of the tree of objects for the entire contents. With DOM, you can then use XPath, because there is something to search that is already in memory. To use XPath, you seem to have two choices: build a DOM-ish tree, or find an XPath processor (I'm not sure if one exists) that can process the XML file directly -- but that will be slow, since you are looking for "all" occurrences of an attribute, and this means you have to read the entire file each time.
It might be worth exploring a hybrid approach -- use SAX to get some information, and build your own objects to store the data. Maybe a HashMap as the main index. But that will keep you from using XPath, since you do not have the data structures it expects.
A third alternative would be to look at JAXB. It builds Java code from a Schema of your data, and then when you import the data, it creates the necessary objects and fills in values. But I don't think XPath will work there either.
    Dave Patterson
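To make the hybrid SAX approach concrete, here is a minimal sketch against the department file above (the class and field names are illustrative); it indexes a_depart ids and their b_depart children in one streaming pass, with no tree in memory:
{code}
import java.util.*;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

public class DepartIndexer extends DefaultHandler {
    // a_depart id -> set of b_depart ids seen under it
    private final Map<String, Set<String>> index = new LinkedHashMap<String, Set<String>>();
    private String currentADepart;

    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if ("a_depart".equals(qName)) {
            currentADepart = atts.getValue("id");
            if (!index.containsKey(currentADepart))
                index.put(currentADepart, new LinkedHashSet<String>());
        } else if ("b_depart".equals(qName) && currentADepart != null) {
            index.get(currentADepart).add(atts.getValue("id"));
        }
    }

    public static void main(String[] args) throws Exception {
        DepartIndexer handler = new DepartIndexer();
        SAXParserFactory.newInstance().newSAXParser().parse(new java.io.File(args[0]), handler);
        // e.g. {124=[Bss_253, Bss_254], 125=[Bss_254]}
        System.out.println(handler.index);
    }
}
{code}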

  • Arbitrary waveform generation from large text file

    Hello,
    I'm trying to use a PXI 6733 card hooked up to a BNC 2110 in a PXI 1031-DC chassis to output arbitrary waveforms at a sample rate of 100kS/s.  The types of waveforms I want to generate are generally going to be sine waves of frequencies less than 10 kHz, but they need to be very high quality signals, hence the high sample rate.  Eventually, we would like to go up to as high as 200 kS/s, but for right now we just want to get it to work at the lower rate. 
Someone in the department has already created for me large text files (> 1GB) with 9 columns of numbers representing the output voltages for the channels (there will be 6 channels outputting sine waves, and 3 other channels with a periodic DC voltage). The reason for the large file is that we want a continuous signal for around 30 minutes to allow for equipment testing and configuration while the signals are being generated.
I'm supposed to use this file to generate the output voltages on the 6733 card, but I keep getting numerous errors and I've been unable to get something that works. The code, as written, currently generates an error code 200290 immediately after the buffered data is output from the card. Nothing ever seems to get enqueued or dequeued, and although I've read the LabVIEW help on buffers, I'm still very confused about their operation, so I'm not even sure if the buffer is working properly. I was hoping some of you could look at my code and give me some suggestions (or sample code too!) for the best way to achieve this goal.
    Thanks a lot,
    Chris(new Labview user)

    Chris:
    For context, I've pasted in the "explain error" output from LabVIEW to refer to while we work on this. More after the code...
    Error -200290 occurred at an unidentified location
    Possible reason(s):
    The generation has stopped to prevent the regeneration of old samples. Your application was unable to write samples to the background buffer fast enough to prevent old samples from being regenerated.
    To avoid this error, you can do any of the following:
    1. Increase the size of the background buffer by configuring the buffer.
    2. Increase the number of samples you write each time you invoke a write operation.
    3. Write samples more often.
    4. Reduce the sample rate.
    5. Change the data transfer mechanism from interrupts to DMA if your device supports DMA.
    6. Reduce the number of applications your computer is executing concurrently.
    In addition, if you do not need to write every sample that is generated, you can configure the regeneration mode to allow regeneration, and then use the Position and Offset attributes to write the desired samples.
    By default, the analog output on the device does what is called regeneration. Basically, if we're outputting a repeating waveform, we can simply fill the buffer once and the DAQ device will reuse the samples, reducing load on the system. What appears to be happening is that the VI can't read samples out from the file fast enough to keep up with the DAQ card. The DAQ card is set to NOT allow regeneration, so once it empties the buffer, it stops the task since there aren't any new samples available yet.
    If we go through the options, we have a few things we can try:
    1. Increase background buffer size.
    I don't think this is the best option. Our issue is with filling the buffer, and this requires more advanced configuration.
    2. Increase the number of samples written.
    This may be a better option. If we increase how many samples we commit to the buffer, we can increase the minimum time between writes in the consumer loop.
    3. Write samples more often.
    This probably isn't as feasible. If anything, you should probably have a short "Wait" function in the consumer loop where the DAQmx write is occurring, just to regulate loop timing and give the CPU some breathing space.
    4. Reduce the sample rate.
    Definitely not a feasible option for your application, so we'll just skip that one.
    5. Use DMA instead of interrupts.
    I'm 99.99999999% sure you're already using DMA, so we'll skip this one also.
    6. Reduce the number of concurrent apps on the PC.
    This is to make sure that the CPU time required to maintain good loop rates isn't being taken by, say, an antivirus scanner or something. Generally, if you don't have anything major running other than LabVIEW, you should be fine.
    I think our best bet is to increase the "Samples to Write" quantity (to increase the minimum loop period), and possibly to delay the DAQmx Start Task and consumer loop until the producer loop has had a chance to build the queue up a little. That should reduce the chance that the DAQmx task will empty the system buffer and ensure that we can prime the queue with a large quantity of samples. The consumer loop will wait for elements to become available in the queue, so I have a feeling that the file read may be what is slowing the program down. Once the queue empties, we'll see the DAQmx error surface again. The only real solution is to load the file to memory farther ahead of time.
    Hope that helps!
    Caleb Harris
    National Instruments | Mechanical Engineer | http://www.ni.com/support

  • How to parse data from a text file with no convenient delimiters?

I need to read data from a text file. This file contains one line of data with the repeating pattern "time 00 ADVar2: ___ Height: ____ time 01 ADVar2: ___ Height: ___ ...". I need LabVIEW to parse out the "time" and "height" values, build an array with the values, and graph the correlation on an X-Y plot. Does LabVIEW have an automated way to read the input data file and parse out the correct values, even without convenient delimiters? Thank you.

    You actually do have a convenient delimiter: "time". Thus, you can make an array using that as the delimiter. Only caveat is that the first array element will be empty. Then you can conveniently use the Scan From String function in a for-loop. Something like this:
    Message Edited by smercurio_fc on 11-21-2008 03:13 PM
    Attachments:
    Example_VI.png ‏9 KB
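For comparison, doing the same parse in Java (this thread's original language) is a few lines with java.util.regex; the sample record values below are made up:
{code}
import java.util.*;
import java.util.regex.*;

public class TimeHeightParser {
    public static void main(String[] args) {
        String data = "time 00 ADVar2: 1.5 Height: 2.7 time 01 ADVar2: 1.6 Height: 3.1";
        // one match per record: capture the time index and the height value
        Pattern p = Pattern.compile("time\\s+(\\d+).*?Height:\\s+([-\\d.]+)");
        Matcher m = p.matcher(data);
        List<double[]> points = new ArrayList<double[]>();
        while (m.find())
            points.add(new double[]{ Double.parseDouble(m.group(1)),
                                     Double.parseDouble(m.group(2)) });
        for (double[] pt : points)
            System.out.printf("t=%.0f h=%.1f%n", pt[0], pt[1]);
    }
}
{code}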

  • Exit labview (executables) after using large text files

    Hello,
I am using LabVIEW 6.0 and its application builder / runtime engine. I wrote some VIs to convert large tab-delimited text files (up to 50 MB). When I am finished with a file, it somehow stays in memory and gets stacked up with other (text) files in such a way that the computer slows down.
When I want to exit the VI (program), it takes a very long time to get out of the program (resetting LabVIEW) and get my speed back.
How can I solve this problem for these large files?
Martin.

    OK, this may be a bit of a problem to track down, but let's start.
First, while your front panel looks great, your code is very hard to read. Overlapping elements, multiple nested structures and a liberal use of locals make this a problem. My first suggestion would be to start with a massive cleanup operation. Make more room, make wires straight, make sure things are going left-to-right, make subVIs, place some documentation and so on. You won't believe the difference this makes.
    After you did that, we can turn to find the problems. Some likely suspects are the local variables and the array functions. You use many local variables and perform resizing operations which are certain to generate copies. If you do this on arrays with dozens of MBs of data, this looks like the most likely source of the problem. Some suggestions to deal with this - if you have repeating code, make subVIs or move the code outside of the structures, so that it only has to be run once. Also, you seem to have some redundant code. For instance, you open  the file only to see if you get an error. You should be able to do this with the VIs in the advanced palette without opening it (and you won't need to close it, either). Another example - you check the exit conditions in many places in your code. If your loop runs fast enough, there is no need for it. Some more suggestions - use shift registers instead of locals and avoid setting the same properties over and over again in the loop.
    After you do these, it will probably be much easier to find the problem.
    To learn more about LabVIEW, I suggest you try searching this site and google for LabVIEW tutorials. Here and here are a couple you can start with. You can also contact your local NI office and join one of their courses.
    In addition, I suggest you read the LabVIEW style guide and the LabVIEW user manual (Help>>Search the LabVIEW Bookshelf).
    And one last thing - having the VI run automatically and then use the Quit VI at the end is not very nice. Since you are building it, it will run automatically on its own and you can use the Application>>Kind property to quit only if it's an executable.
    Try to take over the world!

  • Adding large arrays to a text file as new columns

    I am trying to merge several large text data files into a single file in Labview 8.0.  The files are too large to read in all at once (9-15 million lines each), so decided I need to read them in as smaller chunks, combine the arrays, and write them to a new file.
    The reason there are three separate data files was for speed and streaming purposes in the project, and the users wanted the raw, unadulterated data written to file before any kind of manipulation took place. 
    My VI:
    1.  Takes a header generated from another VI and writes it to the output file.
    2.  Creates a time column based on sample rate and the total number of data points
3.  Reads in 3 files that each have text data (each data point is 9 bytes wide; there are up to 15 million data points per file).
4.  Each iteration of the for loop writes a chunk of 10 to 100 thousand points (somewhere in there seems to be the fastest it will do), formatted with the time column on the left, then the three data columns, until it's done. I haven't quite figured out how to write the last iteration if there are fewer data points than the chunk size.
    Anyways, the main thing I was looking for was suggestions on how to do this faster.  It takes about a minute per million points on my laptop to do this operation, and though I recognize it is a lot of data to be moving around, this speed is painfully slow.  Any ideas?
    Attachments:
    Merge Fast Data.vi ‏67 KB

Thanks for the tip. I put the constants outside the array and noticed a little improvement in the speed. I know I could improve the speed by using the binary file VIs, but I need the files as tab-delimited text files to import them into MATLAB for another group to do analysis. I have not had any luck converting binary files into text files. Is there an easy way to do that? I don't know enough about binary file formats to use them. I looked at the high speed data logger examples but they seemed complicated and hard to adapt to what I need to do. Creating the binary header file seemed like a chore.
    I am up for more advice on the VI I posted, or suggestions on different ways to convert a binary file to a MATLAB readable text file.
    Thanks!
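For what it's worth, if the binary files are flat arrays of float64 samples, a small Java utility can do the binary-to-text conversion; this sketch assumes consecutive big-endian float64 triplets (big-endian is LabVIEW's default byte order), which you would need to confirm against your actual file layout:
{code}
import java.io.*;

public class BinToText {
    public static void main(String[] args) throws IOException {
        // Assumed layout: consecutive big-endian float64 triplets.
        // DataInputStream reads big-endian; adjust if your files differ.
        DataInputStream in = new DataInputStream(
                new BufferedInputStream(new FileInputStream(args[0]), 1 << 16));
        PrintWriter out = new PrintWriter(new BufferedWriter(new FileWriter(args[1])));
        try {
            while (true) // arguments evaluate left-to-right, preserving column order
                out.printf("%g\t%g\t%g%n", in.readDouble(), in.readDouble(), in.readDouble());
        } catch (EOFException done) {
            // readDouble throws EOFException at end of file
        }
        in.close();
        out.close();
    }
}
{code}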

  • Parse TEXT File??

    Hi,
    I am using CF 8
I have a text file (mylog.txt) that currently stores login information for our users. I need to parse and extract some basic information from the file and I am having a hard time. I need to know the following:
1. A list of all UNIQUE email addresses that appear in the text file.
2. A count of how many times each email address appears.
Each time a user logs on, a new line is entered in the text file. Here is some sample data:
[1/05/08 1:15:31 PM] User [email protected] logged in.
[1/11/08 3:57:30 PM] User [email protected] logged in.
[1/12/08 8:33:17 PM] User [email protected] logged in.
[2/17/08 6:37:07 AM] User [email protected] logged in.
[2/19/08 3:57:30 PM] User [email protected] logged in.
I would like to parse the text file and then output my results to the screen. The output should look like this (note John logged in twice):
Email Address | Count
[email protected] 1
[email protected] 2
[email protected] 1
[email protected] 1
The prior user here should have stored this in the database, but for now I still need to get at this data.
Any help appreciated.
Thanks,
-Westside

You can use cfhttp to convert your file variable to a query and use query of queries to count email addresses.
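For comparison with the Java discussion above, the same counting task in Java is a short program; the log format is taken from the post, the regex is my assumption:
{code}
import java.io.*;
import java.util.*;
import java.util.regex.*;

public class LoginCounter {
    public static void main(String[] args) throws IOException {
        Pattern email = Pattern.compile("User\\s+(\\S+@\\S+)\\s+logged in");
        Map<String, Integer> counts = new TreeMap<String, Integer>(); // sorted, unique keys
        BufferedReader r = new BufferedReader(new FileReader("mylog.txt"));
        for (String line; (line = r.readLine()) != null; ) {
            Matcher m = email.matcher(line);
            if (m.find()) {
                Integer n = counts.get(m.group(1));
                counts.put(m.group(1), n == null ? 1 : n + 1);
            }
        }
        r.close();
        System.out.println("Email Address | Count");
        for (Map.Entry<String, Integer> e : counts.entrySet())
            System.out.println(e.getKey() + " " + e.getValue());
    }
}
{code}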

  • Could not parse the file contents as a data set. There were too many variable names in the first line of the text file.

    Could not parse the file contents as a data set. There were too many variable names in the first line of the text file.

    What are the Variables settings, what is the text file’s content, …?

  • Have a very large text file, and need to read lines in the middle.

I have very large txt files (around several hundred megabytes), and I want to be able to skip to and read specific lines. More specifically, say the file looks like:
    scan 1
    scan 2
    scan 3
    scan 100,000
I want to be able to move the file reader immediately to scan 50,000, rather than having to read through scans 1-49,999.
    Thanks for any help.

    If the lines are all different lengths (as in your example) then there is nothing you can do except to read and ignore the lines you want to skip over.
    If you are going to be doing this repeatedly, you should consider reformatting those text files into something that supports random access.
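One way to get that random access in Java without reformatting the file is to pay the sequential scan once, remember where each line starts, and then seek directly on every later lookup; a minimal sketch (assumes '\n' line endings and a single-byte charset):
{code}
import java.io.*;
import java.util.*;

public class LineIndex {
    public static void main(String[] args) throws IOException {
        File f = new File(args[0]);
        // One pass to record the byte offset where each line starts.
        List<Long> offsets = new ArrayList<Long>();
        offsets.add(0L);
        InputStream in = new BufferedInputStream(new FileInputStream(f), 1 << 16);
        long pos = 0;
        for (int b; (b = in.read()) != -1; ) {
            pos++;
            if (b == '\n') offsets.add(pos);
        }
        in.close();
        // Now jumping to line 50,000 is a seek, not a scan
        // (assumes the file actually has that many lines).
        RandomAccessFile raf = new RandomAccessFile(f, "r");
        raf.seek(offsets.get(49999));
        System.out.println(raf.readLine());
        raf.close();
    }
}
{code}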
