Sorting of a large text file

I am rewriting a UNIX script in Java. The UNIX script has syncsort code contained in it. The sort is used to sort large text files. The files consist of very large records, up to 3000 characters. I have the file layouts for the fields within the records. Below are a couple of syncsort steps showing what I need to rewrite.
The first sort sorts the file by transaction type and keeps only certain transaction types.
$SYNCSORT << ____endsort1
/WARNING 1 /SKIPBLANK
/INFILE $XJOB/tran.ACHP
/OUTFILE $XJOB/tran.ACHP.careabout
/FIELDS tran 1 CHAR 3, seq 4 CHAR 2, check_field 9 CHAR 1
/CONDITION CARE_ABOUT ((tran="104" OR tran="154" OR tran="165" OR tran="301" OR tran="385") AND (seq="00" OR seq="01")) OR (tran="104" AND check_field !="$")
/COPY
/INCLUDE CARE_ABOUT
/END
____endsort1
The second sort sorts by plan number and participant number in ascending order and summarizes the amount. I can summarize the amount after the sort in Java, so that does not need to be done in the same sort step.
$SYNCSORT << ____endsort2
/WARNING 1 /SKIPBLANK
/INFILE $XJOB/extract.ACHP
/FIELDS plan 1 CHAR 6, amount 8 TS 15, partid 23 CHAR 9
/KEYS plan, partid ASCENDING
/SUMMARIZE TOTAL amount
/OUTFILE $XJOB/extract.ACHP.summary
/END
____endsort2
What is the best way to read in the file and store the data (e.g., in a Vector or an array), and how would I go about sorting a large file with very long records?
Many thanks in advance!!!

Try looking at the Comparable interface and the Collections.sort() methods; maybe they will save you a little work. As for arrays versus vectors, I would use ArrayLists/Vectors so I could use the Collections methods. As for the rest of it, it will depend more on your exact needs, and I guess you'll just have to go and build your own configuration system, etc. Shouldn't be too tough.
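Here is a minimal sketch of the second sort step along those lines, assuming the whole file fits in the heap. The file names and the 0-based substring offsets are taken from the /FIELDS line (plan 1 CHAR 6, partid 23 CHAR 9); adjust them to your real layout:

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.PrintWriter;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

public class ExtractSort {
    public static void main(String[] args) throws Exception {
        // Read every record into memory as a plain String; the fixed
        // positions below come from the /FIELDS layout (1-based columns
        // in syncsort, 0-based in substring).
        List<String> records = new ArrayList<String>();
        BufferedReader in = new BufferedReader(new FileReader("extract.ACHP"));
        for (String rec; (rec = in.readLine()) != null; ) {
            records.add(rec);
        }
        in.close();

        // Equivalent of /KEYS plan, partid ASCENDING.
        Collections.sort(records, new Comparator<String>() {
            public int compare(String a, String b) {
                int c = a.substring(0, 6).compareTo(b.substring(0, 6));       // plan
                return c != 0 ? c
                    : a.substring(22, 31).compareTo(b.substring(22, 31));     // partid
            }
        });

        PrintWriter out = new PrintWriter("extract.ACHP.summary");
        for (String rec : records) {
            out.println(rec);
        }
        out.close();
    }
}

The /SUMMARIZE TOTAL amount step can then be a second pass over the sorted list, summing the amount field (columns 8-22, i.e. substring(7, 22)) whenever plan and partid repeat. If the file is too big for the heap, the same comparator can be reused to sort chunks that are written to temporary files and then merged.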

Similar Messages

  • Arbitrary waveform generation from large text file

    Hello,
    I'm trying to use a PXI 6733 card hooked up to a BNC 2110 in a PXI 1031-DC chassis to output arbitrary waveforms at a sample rate of 100 kS/s. The types of waveforms I want to generate are generally going to be sine waves of frequencies less than 10 kHz, but they need to be very high quality signals, hence the high sample rate. Eventually, we would like to go as high as 200 kS/s, but for right now we just want to get it to work at the lower rate.
    Someone in the department has already created for me large text files (> 1 GB) with 9 columns of numbers representing the output voltages for the channels (there will be 6 channels outputting sine waves and 3 other channels with a periodic DC voltage). The reason for the large file is that we want a continuous signal for around 30 minutes to allow for equipment testing and configuration while the signals are being generated.
    I'm supposed to use this file to generate the output voltages on the 6733 card, but I keep getting numerous errors and I've been unable to get something that works. The code, as written, currently generates error code -200290 immediately after the buffered data is output from the card. Nothing ever seems to get enqueued or dequeued, and although I've read the LabVIEW help on buffers, I'm still very confused about their operation, so I'm not even sure the buffer is working properly. I was hoping some of you could look at my code and give me some suggestions (or sample code too!) for the best way to achieve this goal.
    Thanks a lot,
    Chris (new LabVIEW user)

    Chris:
    For context, I've pasted in the "explain error" output from LabVIEW to refer to while we work on this. More after the code...
    Error -200290 occurred at an unidentified location
    Possible reason(s):
    The generation has stopped to prevent the regeneration of old samples. Your application was unable to write samples to the background buffer fast enough to prevent old samples from being regenerated.
    To avoid this error, you can do any of the following:
    1. Increase the size of the background buffer by configuring the buffer.
    2. Increase the number of samples you write each time you invoke a write operation.
    3. Write samples more often.
    4. Reduce the sample rate.
    5. Change the data transfer mechanism from interrupts to DMA if your device supports DMA.
    6. Reduce the number of applications your computer is executing concurrently.
    In addition, if you do not need to write every sample that is generated, you can configure the regeneration mode to allow regeneration, and then use the Position and Offset attributes to write the desired samples.
    By default, the analog output on the device does what is called regeneration. Basically, if we're outputting a repeating waveform, we can simply fill the buffer once and the DAQ device will reuse the samples, reducing load on the system. What appears to be happening is that the VI can't read samples out from the file fast enough to keep up with the DAQ card. The DAQ card is set to NOT allow regeneration, so once it empties the buffer, it stops the task since there aren't any new samples available yet.
    If we go through the options, we have a few things we can try:
    1. Increase background buffer size.
    I don't think this is the best option. Our issue is with filling the buffer, and this requires more advanced configuration.
    2. Increase the number of samples written.
    This may be a better option. If we increase how many samples we commit to the buffer, we can increase the minimum time between writes in the consumer loop.
    3. Write samples more often.
    This probably isn't as feasible. If anything, you should probably have a short "Wait" function in the consumer loop where the DAQmx write is occurring, just to regulate loop timing and give the CPU some breathing space.
    4. Reduce the sample rate.
    Definitely not a feasible option for your application, so we'll just skip that one.
    5. Use DMA instead of interrupts.
    I'm 99.99999999% sure you're already using DMA, so we'll skip this one also.
    6. Reduce the number of concurrent apps on the PC.
    This is to make sure that the CPU time required to maintain good loop rates isn't being taken by, say, an antivirus scanner or something. Generally, if you don't have anything major running other than LabVIEW, you should be fine.
    I think our best bet is to increase the "Samples to Write" quantity (to increase the minimum loop period), and possibly to delay the DAQmx Start Task and consumer loop until the producer loop has had a chance to build the queue up a little. That should reduce the chance that the DAQmx task will empty the system buffer and ensure that we can prime the queue with a large quantity of samples. The consumer loop will wait for elements to become available in the queue, so I have a feeling that the file read may be what is slowing the program down. Once the queue empties, we'll see the DAQmx error surface again. The only real solution is to load the file to memory farther ahead of time.
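    Purely as an illustration (the real fix lives in the LabVIEW block diagram), here is the prime-the-queue idea sketched in Java, the language used elsewhere on this page. LabVIEW queues behave analogously; all names here are hypothetical stand-ins:

    import java.util.concurrent.ArrayBlockingQueue;
    import java.util.concurrent.BlockingQueue;

    public class PrimedQueueSketch {
        // Stand-ins for the file reader and for DAQmx Write (hypothetical).
        static double[] readNextBlockFromFile() { return new double[1000]; }
        static void writeToDevice(double[] block) { }

        public static void main(String[] args) throws InterruptedException {
            final BlockingQueue<double[]> queue = new ArrayBlockingQueue<double[]>(64);

            // Producer loop: reads blocks of samples from disk, blocking
            // automatically whenever the queue is full.
            Thread producer = new Thread(new Runnable() {
                public void run() {
                    try {
                        while (true) {
                            queue.put(readNextBlockFromFile());
                        }
                    } catch (InterruptedException stop) { }
                }
            });
            producer.start();

            // Prime the queue before "starting the task", so the device has
            // a backlog to draw on when the disk read momentarily falls behind.
            while (queue.size() < 32) {
                Thread.sleep(10);
            }

            // Consumer loop: large writes, less often (options 2 and 3 above).
            while (true) {
                writeToDevice(queue.take());
            }
        }
    }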
    Hope that helps!
    Caleb Harris
    National Instruments | Mechanical Engineer | http://www.ni.com/support

  • Editing and changing large text file

    hi,
    new to this, so bear with me.
    I've got a large text file (44 MB) and I need to change some values in it.
    example:
    TSX ;20030102;40302216;40300579;1980;1900;3762000
    i need to change the lines so that they read:
    TSX ;20030102;302216;300579;1980;1900;3762000
    thus removing the leading 40 in the middle cols.
    Thanks in advance
    john

    crap, small mistake
    1) use BufferedReader to read in the file line by line (BufferedReader.readLine())
    2a) for each line, split it on the semicolons (String.split())
    2b) change the middle values using String.substring()
    2c) construct a new line by appending all strings in the array returned by 2a) to each other
    2d) write this new line to a file using PrintStream (PrintStream.println())
    3) when done, close both the reader and the PrintStream.
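    A minimal sketch of those steps, assuming the lines always have the 7 semicolon-separated columns shown above and that only columns 3 and 4 need the leading "40" removed (file names are placeholders):

    import java.io.BufferedReader;
    import java.io.FileReader;
    import java.io.PrintStream;

    public class FixColumns {
        public static void main(String[] args) throws Exception {
            BufferedReader in = new BufferedReader(new FileReader("input.txt"));
            PrintStream out = new PrintStream("output.txt");
            String line;
            while ((line = in.readLine()) != null) {      // step 1
                String[] cols = line.split(";");          // step 2a
                cols[2] = cols[2].substring(2);           // step 2b: drop leading "40"
                cols[3] = cols[3].substring(2);
                StringBuilder sb = new StringBuilder();   // step 2c
                for (int i = 0; i < cols.length; i++) {
                    if (i > 0) sb.append(';');
                    sb.append(cols[i]);
                }
                out.println(sb.toString());               // step 2d
            }
            in.close();                                   // step 3
            out.close();
        }
    }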

  • Sorting Names in a text file using java program

    Deleting numbers in a text file, sorting the names and writing them to a new file.
    Sample data
    =================
    71234 RAJA
    89763 KING
    89877 QUEEN
    ==================
    Java Program
    import java.io.BufferedReader;
    import java.io.BufferedWriter;
    import java.io.File;
    import java.io.FileReader;
    import java.io.FileWriter;
    import java.util.ArrayList;
    import java.util.Collections;
    import java.util.List;
    class Sortnames {
        public static void main(String[] args) {
            String inputfile = "names_konda.txt";
            String tempfile = "newdatafile.txt";
            String outputfile = "sortedoutput.txt";
            try {
                //*********** Reading data from the input file, line by line
                List<String> storeWordList = new ArrayList<String>();
                BufferedReader br = new BufferedReader(new FileReader(inputfile));
                String strLine;
                while ((strLine = br.readLine()) != null) {
                    storeWordList.add(strLine);
                }
                br.close();
                //*********** Keeping only the letters from each line, written to a temp file
                BufferedWriter output = new BufferedWriter(new FileWriter(new File(tempfile)));
                for (String s : storeWordList) {
                    StringBuilder strBuff = new StringBuilder();
                    for (int i = 0; i < s.length(); i++) {
                        char c = s.charAt(i);
                        if (Character.isLetter(c)) {
                            strBuff.append(c);
                        }
                    }
                    output.write(strBuff.toString());
                    output.write("\n");
                }
                output.close();
                //=========== Reading the created file, sorting, and placing in a collection
                List<String> storeWordList2 = new ArrayList<String>();
                BufferedReader br2 = new BufferedReader(new FileReader(tempfile));
                String strLine2;
                while ((strLine2 = br2.readLine()) != null) {
                    storeWordList2.add(strLine2);
                }
                br2.close();
                Collections.sort(storeWordList2);
                //=========== Writing the sorted names, skipping blank lines
                BufferedWriter sortedoutput = new BufferedWriter(new FileWriter(new File(outputfile)));
                for (String name : storeWordList2) {
                    if (name.trim().length() > 0) {
                        sortedoutput.write(name);
                        sortedoutput.write("\n");
                    }
                }
                sortedoutput.close();
                System.out.println("Names sorted SUCCESSFULLY");
                //=========== Deleting the temp file
                if (new File(tempfile).delete()) {
                    System.out.println("deleted");
                } else {
                    System.out.println("failed deletion");
                }
            } catch (Exception e) { // Catch exception if any
                System.out.println("Error: " + e);
            }
        }
    }
    Thanks and regards,
    BalaNagaRaju.M
    [email protected]

    Do you have a question? Also see the sticky welcome post on how to post formatted code.

  • Exit labview (executables) after using large text files

    Hello,
    I am using LabVIEW 6.0 and its application builder / runtime engine. I wrote some VIs to convert large tab-delimited text files (up to 50 MB). When I am finished with a file it somehow stays in memory, and it stacks up with other (text) files in such a way that the computer slows down.
    When I want to exit the VI (program) it takes a very long time to quit the program (resetting LabVIEW) and get my speed back.
    How can I solve this problem for these large files?
    Martin.

    OK, this may be a bit of a problem to track down, but let's start.
    First, while your front panel looks great, your code is very hard to read. Overlapping elements, multiple nested structures and a liberal use of locals make this a problem. My first suggestion would be to start with a massive cleanup operation. Make more room, make wires straight, make sure things are going left-to-right, make subVIs, place some documentation and so on. You won't believe the difference this makes.
    After you did that, we can turn to find the problems. Some likely suspects are the local variables and the array functions. You use many local variables and perform resizing operations which are certain to generate copies. If you do this on arrays with dozens of MBs of data, this looks like the most likely source of the problem. Some suggestions to deal with this - if you have repeating code, make subVIs or move the code outside of the structures, so that it only has to be run once. Also, you seem to have some redundant code. For instance, you open  the file only to see if you get an error. You should be able to do this with the VIs in the advanced palette without opening it (and you won't need to close it, either). Another example - you check the exit conditions in many places in your code. If your loop runs fast enough, there is no need for it. Some more suggestions - use shift registers instead of locals and avoid setting the same properties over and over again in the loop.
    After you do these, it will probably be much easier to find the problem.
    To learn more about LabVIEW, I suggest you try searching this site and google for LabVIEW tutorials. Here and here are a couple you can start with. You can also contact your local NI office and join one of their courses.
    In addition, I suggest you read the LabVIEW style guide and the LabVIEW user manual (Help>>Search the LabVIEW Bookshelf).
    And one last thing - having the VI run automatically and then use the Quit VI at the end is not very nice. Since you are building it, it will run automatically on its own and you can use the Application>>Kind property to quit only if it's an executable.
    Try to take over the world!

  • Have a very large text file, and need to read lines in the middle.

    I have very large txt files (around several hundred megabytes), and I want to be able to skip to and read specific lines. More specifically, say the file looks like:
    scan 1
    scan 2
    scan 3
    ...
    scan 100,000
    I want to be able to move the filereader immediately to scan 50,000, rather than having to read through scans 1-49,999.
    Thanks for any help.

    If the lines are all different lengths (as in your example) then there is nothing you can do except to read and ignore the lines you want to skip over.
    If you are going to be doing this repeatedly, you should consider reformatting those text files into something that supports random access.
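    If the reads will be repeated, a middle ground is to scan the file once and record the byte offset where each line starts; later reads can then seek straight to any line. A minimal sketch ("scans.txt" and the target line number are placeholders; the index could also be saved to disk and reused):

    import java.io.IOException;
    import java.io.RandomAccessFile;
    import java.util.ArrayList;
    import java.util.List;

    public class LineIndex {
        public static void main(String[] args) throws IOException {
            RandomAccessFile raf = new RandomAccessFile("scans.txt", "r");
            // One (slow) sequential pass to record every line-start offset.
            List<Long> offsets = new ArrayList<Long>();
            offsets.add(0L);
            while (raf.readLine() != null) {
                offsets.add(raf.getFilePointer()); // start of the next line
            }
            // Every later read is a direct seek: line 50,000 (1-based)
            // starts at offsets.get(49999).
            raf.seek(offsets.get(49999));
            System.out.println(raf.readLine());
            raf.close();
        }
    }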

  • Creating large text files

    What is the fastest and least memory-intensive way to create large files?
    Current State
    I have an application where I read the database, process (format) the information, and use a third-party API to add the information and finally save it as a text file. The problem is that since it's a third-party API we have no control over it, and it takes a very long time to generate the files.
    Future state
    I want to build the file generator which will read from the database and then process/format the information one by one. Now I have following options.
    1) Add the processed information in a StringBuffer line by line and at the end of it create a file from the StringBuffer and save it.
    2) Create a custom object with different ArrayLists and keep on adding the processed lines into appropriate lists and at the end of it while saving it to a file read the custom object and save it as a file.
    3) Create a file at the start of it and then keep on adding and flushing the lines one by one and at the end of it close the file.
    For handling files I was thinking of using PrintWriter. I am talking about text files anywhere from 50 KB to 20 MB.
    I have performance concerns as well as memory issues. So I want a balanced solution so that I am able to handle both.

    Use a BufferedWriter to write each line/entry/record as you process it.
    Don't do any special flushing (unless you need special transactional properties, in which case you need a lot more than simple flush() calls).
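    That is option 3 from the post. A minimal sketch, where the loop body stands in for the database read and the formatting step:

    import java.io.BufferedWriter;
    import java.io.FileWriter;
    import java.io.PrintWriter;

    public class StreamedReport {
        public static void main(String[] args) throws Exception {
            // Open the file once, write each formatted line as it is produced,
            // close at the end. Only one line is held in memory at a time;
            // BufferedWriter batches the physical writes, so no manual
            // flush() calls are needed.
            PrintWriter out = new PrintWriter(new BufferedWriter(new FileWriter("report.txt")));
            for (int row = 0; row < 100000; row++) {   // stands in for the DB read loop
                out.println("record " + row);          // stands in for the formatting step
            }
            out.close();                               // close() flushes the remaining buffer
        }
    }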

  • Loading large text files in JTextArea

    JTextArea can't seem to load an 8MB text file, as doing so will throw an OutOfMemoryError. The problem is, 8MB really isn't that much data, and there are files in our system which are much bigger than that. And I'm at a loss to find why a mere 8MB is causing Java to run out of its 64MB of memory.
    So I'm starting to try another approach.
    Here's the idea:
        public void doStuff() {
            Reader reader = dataStore.retrieve(...);
            RandomAccessFile tempFile = new RandomAccessFile(...);
            FileChannel tempChannel = tempFile.getChannel();
            // CODE OMITTED: copying reader to tempChannel as Java chars.
            // Now, map the file into memory.
            CharBuffer charBuffer = tempChannel.map(FileChannel.MapMode.READ_WRITE, 0, fileLength).asCharBuffer();
            Document doc = new CharBufferDocument(charBuffer);
            textArea.setDocument(doc);
        }
    This should work in theory. The problem I'm having is figuring out how to make this custom document.
    Here's what I have so far:
    public class CharBufferDocument extends PlainDocument {
        private CharBufferDocument(CharBuffer charBuffer) {
            super(new CharBufferContent(charBuffer));
        }
    }

    class CharBufferContent implements AbstractDocument.Content {
        private CharBuffer charBuffer;
        private CharBufferContent(CharBuffer charBuffer) {
            this.charBuffer = charBuffer;
        }
        public Position createPosition(int offset) throws BadLocationException {
            return new FixedPosition(offset);
        }
        public int length() {
            return charBuffer.length();
        }
        public UndoableEdit insertString(int where, String str) throws BadLocationException {
            throw new UnsupportedOperationException("Editing not supported");
        }
        public UndoableEdit remove(int where, int nitems) throws BadLocationException {
            throw new UnsupportedOperationException("Editing not supported");
        }
        public String getString(int where, int len) throws BadLocationException {
            Segment segment = new Segment();
            getChars(where, len, segment);
            return segment.toString();
        }
        public void getChars(int where, int len, Segment txt) throws BadLocationException {
            char[] buf = new char[len];
            // Sync this, as the get method moves the cursor.
            synchronized (this) {
                charBuffer.get(buf, where, len);
                charBuffer.rewind();
            }
            txt.array = buf;
            txt.offset = where;
            txt.count = len;
        }
    }

    class FixedPosition implements Position {
        private int offset;
        private FixedPosition(int offset) {
            this.offset = offset;
        }
        public int getOffset() {
            return offset;
        }
    }
    When I run this, I get a text area which only shows one character. What's happening is that my getChars(int,int,Segment) method is being called from Swing's classes, and only being asked for one character!
    Does anyone have any idea how this is supposed to work? It seems that if Swing only ever asks for the first character, I'm never going to be able to display 8,000,000 characters. :-)

    "Not too sure though how to go about reading the last 5 lines, say." One solution would be to read an increasingly large block (estimate the typical line size * 5 + some bonus) of the file, starting at position (file size - block size). As long as the block doesn't contain 5 complete lines (count newline chars), increase it by a given size and try again. Should still be faster than scanning the whole file from start to end.
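    A minimal sketch of that approach, assuming a single-byte encoding ("log.txt" is a placeholder name):

    import java.io.IOException;
    import java.io.RandomAccessFile;

    public class TailLines {
        public static void main(String[] args) throws IOException {
            RandomAccessFile raf = new RandomAccessFile("log.txt", "r");
            long fileSize = raf.length();
            int block = 4096;
            String text;
            while (true) {
                // Read a block from the end of the file.
                long start = Math.max(0, fileSize - block);
                byte[] buf = new byte[(int) (fileSize - start)];
                raf.seek(start);
                raf.readFully(buf);
                text = new String(buf);
                // Grow the block until it holds at least 5 complete lines
                // (or we have reached the start of the file).
                int newlines = 0;
                for (byte b : buf) if (b == '\n') newlines++;
                if (newlines >= 5 || start == 0) break;
                block *= 2;
            }
            String[] lines = text.split("\n");
            for (int i = Math.max(0, lines.length - 5); i < lines.length; i++) {
                System.out.println(lines[i]);
            }
            raf.close();
        }
    }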

  • Loading large text files into java vectors and outof memmory

    Hi there,
    need your help for the following:
    I'm trying to load large amounts of data into a Vector in order to concatenate several text files and treat them, but I'm getting an OutOfMemoryError. I even tried using an XML structure and saving to a database, but the error is still the same. Can you help?
    thanks
    here's the code:
    public void Concatenate() {
        try {
            vEntries = new Vector();
            for (int i = 0; i < BopFiles.length; i++) {
                MainPanel.WriteLog("reading file " + BopFiles[i] + "...");
                // Read the current file, not the whole array.
                FileInputStream fis = new FileInputStream(BopFiles[i]);
                BufferedReader in = new BufferedReader(new InputStreamReader(new BufferedInputStream(fis)));
                Database db = new Database();
                Connection conn = db.open();
                String line;
                while ((line = in.readLine()) != null) {
                    DivideLine(BopFiles[i], line);
                }
                in.close();
                FreeMemory(db, conn);
            }
            MainPanel.WriteLog("Num of elements: " + root.getChildNodes().getLength());
            MainPanel.WriteLog("Done!");
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    public void DivideLine(String file, String line) {
        if (line.toLowerCase().startsWith("00694")) {
            Header hd = new Header();
            hd.headerFile = file;
            hd.headerLine = line;
            vHeaders.add(hd);
        } else if (line.toLowerCase().startsWith("10694")) {
            Line entry = new Line();
            Vector vString = new Vector();
            Vector vType = new Vector();
            Vector vValue = new Vector();
            entry.name = line.substring(45, 150).trim();
            entry.number = line.substring(30, 45).trim();
            entry.nif = line.substring(213, 222).trim();
            entry.index = BopIndex;
            entry.message = line;
            entry.file = file;
            String series = line.substring(252);
            StringTokenizer st = new StringTokenizer(series, "A");
            while (st.hasMoreTokens()) {
                String token = st.nextToken();
                if (!token.startsWith(" ")) {
                    vString.add(token);
                    vType.add(token.substring(2, 4));
                    vValue.add(token.substring(4));
                }
            }
            entry.strings = new String[vString.size()];
            vString.copyInto(entry.strings);
            entry.types = new String[vType.size()];
            vType.copyInto(entry.types);
            entry.values = new String[vValue.size()];
            vValue.copyInto(entry.values);
            vEntries.add(entry);
            MainPanel.SetCount(BopIndex);
            BopIndex++;
        }
    }

    public void FreeMemory(Database db, Connection conn) {
        try {
            //db.update("CREATE TABLE entries (message VARCHAR(1000))");
            db.update("DELETE FROM entries;");
            PreparedStatement ps = null;
            for (int i = 0; i < vEntries.size(); i++) {
                Line entry = (Line) vEntries.get(i);
                String value = "" + entry.message;
                if (!value.equals("")) {
                    try {
                        // Use a bind parameter instead of string concatenation.
                        ps = conn.prepareStatement("INSERT INTO entries (message) VALUES(?)");
                        ps.setString(1, value);
                        ps.execute();
                    } catch (Exception e) {
                        e.printStackTrace();
                        System.out.println("error in number->" + i);
                    }
                }
            }
            MainPanel.WriteLog("Releasing memory...");
            vEntries = new Vector();
            System.gc();
        } catch (Exception e1) {
            e1.printStackTrace();
        }
    }

    Well, I need to treat those contents and calculate values within those files, so just copying them with a FileInputStream won't do. For instance, I need to get line 5 from file 1, split it, grab a value according to its class (value also taken) and compare it with another line of another file, adding those values to a single file.
    That's why I need Vector capabilities, but since these files are more than 5 MB each, an out-of-memory error is thrown when loading those values into the Vector.
    A better explanation:
    Each file has lines like
    CLIENTNUM CLASS VALUE
    so if the client is the same within 2 files, I need to sum the lines into a single file.
    If the class is the same, then sum the values; if not, add it to the front.
    We could end up with a final line like
    CLIENTNUM CLASS1 VALUE1 CLASS2 VALUE2

  • Problem with large text-files, HOWTO?

    Hi!
    I'm making an application which shall search through a dir with 3000 HTML files and find all links in those files.
    I have a text file with the format:
    file1: linktofile:linktofile6:linktofile5
    file2: linktofile1:linktofile87:
    and so on.
    This file shall then be searched when I'm clicking hyperlinks in IExplorer. The problem is that this file is VERY long, both horizontally and vertically. Is there a clever way to shorten it?

    If you have to search the entire contents of all 3000 files every time, then I don't see how that could be shortened. But if you have to search those files only for instances of "linktofile1295", for example, then you could redesign your text files into a database where you could access those instances directly via an index.
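    Even without a full database, an in-memory inverted index gives the same direct access. A minimal sketch, assuming the colon-separated format shown above ("links.txt" is a placeholder name):

    import java.io.BufferedReader;
    import java.io.FileReader;
    import java.util.ArrayList;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    public class LinkIndex {
        public static void main(String[] args) throws Exception {
            // Map each link target to the list of files that contain it, so a
            // hyperlink click is a hash lookup instead of a full-file scan.
            Map<String, List<String>> index = new HashMap<String, List<String>>();
            BufferedReader in = new BufferedReader(new FileReader("links.txt"));
            for (String line; (line = in.readLine()) != null; ) {
                String[] parts = line.split(":");
                String file = parts[0].trim();          // e.g. "file1"
                for (int i = 1; i < parts.length; i++) {
                    String link = parts[i].trim();
                    if (link.length() == 0) continue;   // trailing ':' leaves an empty token
                    List<String> files = index.get(link);
                    if (files == null) {
                        index.put(link, files = new ArrayList<String>());
                    }
                    files.add(file);
                }
            }
            in.close();
            System.out.println(index.get("linktofile6")); // e.g. [file1]
        }
    }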

  • Best Way To Extract Large Text Files From Query

    Using CF7/SQL Server 2000.  I need to extract about 50,000 records from another server and save them to a .CSV file.  I'm not sure what the best method is for this.  I've got the query written, and it will pull all the records, but I don't want to display them on the screen, just store them in a .CSV file somewhere on the server.
    I've looked at CFFILE and CFDIRECTORY, and those don't seem to be the best tools for this.  What about using CFCONTENT with CFHEADER?  I use those to output to Excel files:
    <CFCONTENT type="application/vnd.ms-excel">
    <CFHEADER NAME="Content-Disposition" VALUE="attachment; filename=c:\mydir\myfile.xls">
    But when trying to use it, it doesn't save the file on my drive.  I would appreciate advice on the best tools/tags for extracting large volumes of data from other servers and saving it to a local file.
    Once working, ultimately, I'd like to set it up so the .CFM job runs daily, automatically, extracts the data, and FTPs it to another server.  I've seen the CFFTP tag, but until I can get a file saved on my drive, or another server's drive, there's nothing to FTP.  Thanks for any help/advice.
    Gary

    Thanks, but I can't find any good examples of using CFFILE (just the syntax of the tag, which doesn't make a lot of sense).  I've been writing CF code for 8 years, and feel I can make it sing and dance.  But I didn't understand how CFFILE worked from reading the online documentation and my Ben Forta books.  There are no good examples.  Do you still write the query with CFQUERY, then "substitute" CFFILE for CFOUTPUT?  Or enclose CFFILE inside CFOUTPUT?  I can't find any examples that explain this.
    I just need to see a basic example of CFFILE "in action."  Starting with a simple query, and a simple output that writes the query results to a file.
    Lastly, once you have the data written to a file, using CFFILE, is that when you can use CFFTP, to FTP that file, (or any file on the hard drive for that matter) to another server?
    Thanks for help, and any simple examples, just to get me started.  Thanks.
    Gary

  • Loading large text files to find duplicates

    Hi there,
    I have several files with 500-character lines, and from each line I need to get a number (at index 213-222) and make sure only one instance of it exists.
    I also need to take the last 250 characters of each line to manipulate them according to the number given.
    The problem is that I've tried to store those results either in a Vector or in a table (using hsqldb), but in both cases I get an out-of-memory error when processing more than 74,000 results.
    So what to do?
    Here's the code:
    public void Concatenate() {
        try {
            vClients = new Vector();
            GroupIdentical = Main.getProp("javabop.group.identical", "N");
            if (GroupIdentical.equalsIgnoreCase("s")) vNifs = new Vector();
            for (int i = 0; i < BopFiles.length; i++) {
                BoPPanel.WriteLogPane("Reading file " + BopFiles[i] + "...");
                // Read the current file, not the whole array.
                FileInputStream fis = new FileInputStream(BopFiles[i]);
                BufferedReader in = new BufferedReader(new InputStreamReader(new BufferedInputStream(fis)));
                String line;
                while ((line = in.readLine()) != null) {
                    if (line.toLowerCase().startsWith("10694")) {
                        GetEntry(BopFiles[i], line);
                    } else if (line.toLowerCase().startsWith("00694")) {
                        GetHeader(BopFiles[i], line);
                    }
                }
                in.close();
                System.gc();
            }
            BoPPanel.WriteLogPane("Number of elements read from the files: " + vClients.size());
            BoPPanel.WriteLogPane("Concatenation finished!");
            //if (GroupIdentical.equalsIgnoreCase("s")) FindDuplicated();
        } catch (Exception e) {
            e.printStackTrace();
            Main.WriteLogFile(e.getMessage());
        }
    }

    public Header GetHeader(String file, String line) {
        Header hd = new Header();
        hd.headerFile = file;
        hd.headerLine = line;
        vHeaders.add(hd);
        return hd;
    }

    public void Saveintable(int num, int nif, String file, int index, String series, String line) {
        try {
            Database db = new Database();
            Connection conn = db.open();
            //db.update("CREATE TABLE Save ( num INTEGER, nif INTEGER, file VARCHAR(100), index INTEGER, series VARCHAR(150), line VARCHAR(500))");
            String sqlInsert = "INSERT INTO Save (num, nif, file, index, series, line) "
                + " VALUES (?,?,?,?,?,?)";
            PreparedStatement prep = conn.prepareStatement(sqlInsert);
            prep.setInt(1, num);
            prep.setInt(2, nif);
            prep.setString(3, file);
            prep.setInt(4, index);
            prep.setString(5, series);
            prep.setString(6, line);
            prep.executeUpdate();
            // Close the statement and connection so resources are not leaked.
            prep.close();
            conn.close();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    public void GetEntry(String file, String line) {
        String series = line.substring(252).trim();
        String numberstr = line.substring(30, 45).trim();
        String nifstr = line.substring(213, 222).trim();
        int num = 0;
        if (!numberstr.equals("")) num = Integer.parseInt(numberstr);
        int nif = 0;
        if (!nifstr.equals("")) nif = Integer.parseInt(nifstr);
        if (GroupIdentical.equalsIgnoreCase("s") && !nifstr.equals("")) vNifs.add(nifstr);
        Saveintable(num, nif, file, BopIndex, series, line);
        BoPPanel.SetCount(BopIndex);
        BopIndex++;
    }

    here's an example of 2 lines:
    10694000000000200000000000000H000000001000504AAAAAAA AAAAAAAAAAAAAAAAAAAAAAAAAA                                                                                                                     195501231504PRT50YYYYYYYYY                     3 04000000000029000A                                                           
    10694000000000300000000000000H000000001000153BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB                                                                                                                       195110151105PRT501XXXXXXXXX                     2 04000000000079680A
    I need to take out the numbers YYYYYYYYY and XXXXXXXXX and see if they match. There are 4 files with a total of 840,000 lines, and I need to concatenate all the info from the files into a single one; when the numbers are the same, I need to sum the values ending with A (29000A and 79680A).
    Now imagine this for those 840,000 lines....
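    One low-memory approach for the dedupe/sum step is to keep only a map of number -> accumulated value instead of whole 500-char lines; 840,000 short keys fit comfortably in the heap even when the full records would not. A minimal sketch, where the file names are placeholders and the amount offsets (the 15 digits just before the trailing 'A') are guessed from the two sample lines; writing the merged output is left out:

    import java.io.BufferedReader;
    import java.io.FileReader;
    import java.util.HashMap;
    import java.util.Map;

    public class SumByNumber {
        public static void main(String[] args) throws Exception {
            String[] bopFiles = { "file1.txt", "file2.txt", "file3.txt", "file4.txt" };
            Map<String, Long> totals = new HashMap<String, Long>();
            for (int f = 0; f < bopFiles.length; f++) {
                BufferedReader in = new BufferedReader(new FileReader(bopFiles[f]));
                for (String line; (line = in.readLine()) != null; ) {
                    if (!line.toLowerCase().startsWith("10694")) continue;
                    String number = line.substring(213, 222).trim(); // YYYYYYYYY / XXXXXXXXX
                    // Guessed layout: 15 digits before the final 'A'
                    // ("...000000000029000A" parses to 29000).
                    String t = line.trim();
                    int a = t.lastIndexOf('A');
                    long value = Long.parseLong(t.substring(a - 15, a));
                    Long sum = totals.get(number);
                    totals.put(number, (sum == null ? 0L : sum.longValue()) + value);
                }
                in.close();
            }
            System.out.println(totals.size() + " distinct numbers");
        }
    }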

  • Parse large (2GB) text file

    Hi all,
    I would like your expert tips on efficient ways (speed and memory considerations) for parsing large text files (~2GB) in Java.
    Specifically, the text files in hand contain mobility traces that I need to process. Traces have a predefined format, and each trace is given on a new line in the text file.
    To obtain the attributes of each trace I use java.util.regex.Pattern and java.util.regex.Matcher.
    Thanks in advance,
    Nick

    Memory-mapped files are faster when you need random access and you don't need to load all the data; however, here it just adds complexity you don't need, IMHO.
    I suspect most of the time is taken by the parser, so if you customise your parser it could be faster. Here is a simple custom parser:
    public static void main(String... args) throws IOException {
        String template = "tr at %.1f \"$obj(1) pos 123.20 270.98 0.0 2.4\"%n";
        File file = new File("/tmp/deleteme.txt");
        if (!file.exists()) {
            System.out.println(new Date() + ": Writing to " + file);
            PrintWriter pw = new PrintWriter(file);
            for (int i = 0; i < Integer.MAX_VALUE / template.length(); i++)
                pw.printf(template, i / 10.0);
            pw.close();
            System.out.println(new Date() + ": ... finished writing to " + file + " length= " + file.length() / 1024 / 1024 + " MB.");
        }
        long start = System.nanoTime();
        final BufferedReader br = new BufferedReader(new FileReader(file), 64 * 1024);
        for (String line; (line = br.readLine()) != null;) {
            int pos = 6;
            int end = line.indexOf(' ', pos);
            double time = Double.parseDouble(line.substring(pos, end));
            pos = line.indexOf('s', end + 12) + 2;
            end = line.indexOf(' ', pos + 1);
            double x = Double.parseDouble(line.substring(pos, end));
            pos = end + 1;
            end = line.indexOf(' ', pos + 1);
            double y = Double.parseDouble(line.substring(pos, end));
            pos = end + 1;
            end = line.indexOf(' ', pos + 1);
            double z = Double.parseDouble(line.substring(pos, end));
            pos = end + 1;
            end = line.indexOf('"', pos + 1);
            double velocity = Double.parseDouble(line.substring(pos, end));
        }
        br.close();
        long elapsed = System.nanoTime() - start;
        System.out.printf(new Date() + ": Took %,f sec to read %s%n", elapsed / 1e9, file.toString());
    }
    prints
    Sun May 08 09:38:02 BST 2011: Writing to /tmp/deleteme.txt
    Sun May 08 09:42:15 BST 2011: ... finished writing to /tmp/deleteme.txt length= 2208 MB.
    Sun May 08 09:43:21 BST 2011: Took 66.610883 sec to read /tmp/deleteme.txt

  • Read a large size text file

    How can I read a large text file in multiple parts without losing any data?
    Ben

    Why are you afraid of losing data? There's no reason that you would lose data if you are reading a large text file.
    You should use the various Reader and Writer classes in package java.io to read and write text files. Here's an example of how you can read a text file line by line:
    BufferedReader r = new BufferedReader(new FileReader("myfile.txt"));
    int lineno = 1;
    String line;
    while ((line = r.readLine()) != null) {
      System.out.printf("%d: %s%n", lineno, line);
      lineno++;
    }
    r.close();

  • Importing text file

    Hello,
    I am having trouble importing a large text file. I have done it a number of times successfully for smaller files, but now it seems to run out of memory. This makes sense since the buffer can probably only hold so much data. But I have not been able to find a good example of how to import part of the file, then clear the buffer and import more data... if this is the way it should be done? The code I am using is below. Thanks so much for your help. Stressed out ................. !!!!
    FileReader fro = null;
    try {
        fro = new FileReader( "D:\\optegra2009\\OptegraFinder\\extract_input.txt" );
    } catch (FileNotFoundException e) {
        e.printStackTrace();
    }
    BufferedReader bro = new BufferedReader( fro );
    String stringFromFile;
    // Read one line at a time until the end of the file.
    while ((stringFromFile = bro.readLine()) != null) {
        String[] result = stringFromFile.split("\\|");
        for (int x = 0; x < result.length; x++) {
            // process each field here
        }
    }
    bro.close();

    Code for the output file:
    FileOutputStream fos = new FileOutputStream("D:\\optegra2009\\OptegraFinder\\extract_output.txt");
    DataOutputStream dos = new DataOutputStream(fos);
    // Write one pipe-delimited record built from the parsed input and the query results.
    dos.writeBytes(inputData.get("Doc"));
    dos.writeBytes("|");
    dos.writeBytes(inputData.get("Type"));
    // dos.writeBytes(stringFromFile);
    dos.writeBytes("|");
    dos.writeBytes(resultSeta.getString("dm_file_name"));
    dos.writeBytes("|");
    dos.writeBytes(resultSeta.getString("dm_file_char_rev"));
    dos.writeBytes("|");
    dos.writeBytes(resultSeta.getString("dm_file_pool_name"));

    I use paths a lot. Shape layers, vector masks, etc. In previous versions of Photoshop, I could simply drag vector masks from one layer to another, just like other layer masks. Now (with CS6) I have to do a convoluted copy an paste, switching back and