Problem reading big file. No, bigger than that. Bigger.

I am trying to read a file roughly 340 GB in size. Yes, that's "Three hundred forty". Yes, gigabytes. (I've been searching for "big file java reading" and I keep finding things like "I have this huge file, it's 600 megabytes!")
"Why don't you split it, you moron?" you ask. Well, I'm trying to.
Specifically, I need a slice "x" rows in. It's nicely delimited, so, in theory:
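(simplified code; myhugefile and doSomethingWith stand in for the real ones)
import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;

BufferedReader fr = new BufferedReader(new FileReader(new File(myhugefile)));
long startLine = 70000000L;
String line;
long linesRead = 0;
// check the count before reading, so the first wanted line isn't consumed here
while (linesRead < startLine && fr.readLine() != null) {
    linesRead++; // we don't care about these lines
}
// ok, we're where we want to be, start caring
int linesWeWant = 100;
linesRead = 0;
while (linesRead < linesWeWant && (line = fr.readLine()) != null) {
    doSomethingWith(line);
    linesRead++;
}
fr.close();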
(Please assume the real code is better written and has been proven to work with hundreds of "small" files (under a gigabyte or two). I'm happy with my file read/file slice logic, overall.)
Here's the problem. No matter how I read the file, whether or not I start at a specific line and whether or not I save each line to a String, it always dies with an OOM (OutOfMemoryError) at around row 793,000,000. The OOM is thrown from BufferedReader.readLine(). Please note that I'm not trying to read the whole file into a buffer, just one line at a time. Further, the program dies at the same point no matter how high or low (within reason) I set my heap size, and watching the memory allocation shows it's not coming close to filling memory. I suspect the problem occurs once I've read more than an int's worth of bytes (Integer.MAX_VALUE, about 2 GB) into the file.
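One hypothesis worth ruling out (an assumption, not a confirmed diagnosis): since the OOM comes from inside readLine itself and does not move when the heap size changes, the file may contain a single enormous "line" around that row, for example a damaged region with no line terminators. readLine has to hold a whole line in memory, so no heap setting helps if one line is many gigabytes long. As far as I know, a streaming BufferedReader does not track its file position in an int, so a 2 GB boundary by itself seems a less likely cause. The minimal sketch below scans raw bytes without building any Strings and reports the longest line and where it starts. LongestLineScan and scan are hypothetical names; only '\n' counts as a terminator, so a CRLF line's length includes its '\r'.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical diagnostic: find the longest '\n'-terminated line without building Strings. */
final class LongestLineScan {

    /** 1-based line number, byte offset where that line starts, and its length in bytes. */
    static final class Result {
        final long lineNumber;
        final long startOffset;
        final long lengthBytes;

        Result(long lineNumber, long startOffset, long lengthBytes) {
            this.lineNumber = lineNumber;
            this.startOffset = startOffset;
            this.lengthBytes = lengthBytes;
        }
    }

    static Result scan(String path) throws IOException {
        long lineNumber = 1;   // line currently being scanned (1-based)
        long lineStart = 0;    // byte offset where the current line starts
        long offset = 0;       // byte offset of buf[i] inside the loop; total bytes afterwards
        Result longest = new Result(0, 0, -1);

        try (InputStream in = new FileInputStream(path)) {
            byte[] buf = new byte[1 << 20];   // 1 MB chunks, so memory use stays constant
            int n;
            while ((n = in.read(buf)) != -1) {
                for (int i = 0; i < n; i++, offset++) {
                    if (buf[i] == '\n') {
                        long len = offset - lineStart;   // excludes the '\n' itself
                        if (len > longest.lengthBytes) {
                            longest = new Result(lineNumber, lineStart, len);
                        }
                        lineNumber++;
                        lineStart = offset + 1;
                    }
                }
            }
        }
        long tail = offset - lineStart;   // a last line with no trailing '\n'
        if (tail > 0 && tail > longest.lengthBytes) {
            longest = new Result(lineNumber, lineStart, tail);
        }
        return longest;
    }
}
```

If lengthBytes comes back in the billions, the failure is about one line rather than about the size of the file.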
Now -- the problem is that it's not just this one file -- the program needs to handle a general class of comma- or tab-delimited files which may have any number of characters per row and any number of rows, and it needs to do so in a moderately sane timeframe. So this isn't a one-off where we can hand-tweak an algorithm because we know the file structure. I am trying it now using RandomAccessFile.readLine(), since that's not buffered (I think...), but, my god, is it slow... my old code read 79 million lines and crashed in under three minutes; the RandomAccessFile code has taken about 45 minutes and has only read 2 million lines.
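RandomAccessFile.readLine() does one unbuffered native read per byte and treats each byte as a Latin-1 character, which likely explains the slowdown. A sketch that keeps RandomAccessFile only for its long-valued seek and then reads lines through a BufferedReader over the same file channel is below. It assumes the offset is the first byte of a line and that the data is UTF-8; OffsetSlice and readFrom are illustrative names, not an existing API.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.RandomAccessFile;
import java.nio.channels.Channels;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: seek to a byte offset, then read lines through a buffer. */
final class OffsetSlice {

    /**
     * Reads up to maxLines lines starting at byteOffset. Assumes byteOffset is the
     * first byte of a line and that the file is UTF-8 (plain ASCII is fine too).
     */
    static List<String> readFrom(String path, long byteOffset, int maxLines) throws IOException {
        List<String> out = new ArrayList<>();
        try (RandomAccessFile raf = new RandomAccessFile(path, "r")) {
            raf.seek(byteOffset); // seek takes a long, so offsets past 2 GB are fine
            // The channel shares raf's file pointer, so reading starts at byteOffset,
            // but now through a 64 KB buffer instead of one native read per byte.
            try (BufferedReader reader = new BufferedReader(
                    new InputStreamReader(Channels.newInputStream(raf.getChannel()), StandardCharsets.UTF_8),
                    1 << 16)) {
                String line;
                while (out.size() < maxLines && (line = reader.readLine()) != null) {
                    out.add(line);
                }
            }
        }
        return out;
    }
}
```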
Likewise, we might start at line 1 and want a million lines, or start at line 50 million and want 2 lines. Nothing can be assumed about where we start caring about data or how much we care about, the only assumption is that it's a delimited (tab or comma, might be any other delimiter, actually) file with one record per line.
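Because slices are needed on a regular basis and can start anywhere, one option (a suggestion, not something from the thread) is a one-time pass that records the byte offset of every Nth line. Later slices can then seek close to the target and skip at most N-1 lines. A minimal sketch, with hypothetical names and '\n' as the only terminator:

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sparse index: byte offset of every stride-th line ('\n' terminators only). */
final class SparseLineIndex {

    /** offsets.get(k) is the byte offset where 0-based line k * stride starts. */
    static List<Long> build(String path, long stride) throws IOException {
        List<Long> offsets = new ArrayList<>();
        offsets.add(0L);          // line 0 always starts at byte 0
        long line = 0;            // 0-based number of the line being scanned
        long offset = 0;          // byte offset of buf[i]
        try (InputStream in = new FileInputStream(path)) {
            byte[] buf = new byte[1 << 20];
            int n;
            while ((n = in.read(buf)) != -1) {
                for (int i = 0; i < n; i++, offset++) {
                    if (buf[i] == '\n') {
                        line++;                          // the next line starts at offset + 1
                        if (line % stride == 0) {
                            offsets.add(offset + 1);
                        }
                    }
                }
            }
        }
        return offsets;
    }

    /** Returns {byteOffset to seek to, lines still to skip after seeking} for 0-based targetLine. */
    static long[] locate(List<Long> offsets, long stride, long targetLine) {
        int k = (int) Math.min(targetLine / stride, offsets.size() - 1);
        return new long[] {offsets.get(k), targetLine - k * stride};
    }
}
```

Build the index once per file and save it. For a slice starting at line L, seek to the offset locate returns (for example with RandomAccessFile.seek), skip the remaining lines with a buffered reader, then read the lines you want. A stride of 1,000,000 gives one index entry per million lines.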
And if I'm missing something brain-dead obvious...well, fine, I'm a moron. I'm a moron who needs to get files of this size read and sliced on a regular basis, so I'm happy to be told I'm a moron if I'm also told the answer. Thank you.

LizardSF wrote:
FWIW, here's the exact error message. I tried this one with RandomAccessFile instead of BufferedReader because, hey, maybe the problem was the buffering. So it took about 14 hours and crashed at the same point anyway.
Exception in thread "AWT-EventQueue-0" java.lang.OutOfMemoryError: Java heap space
     at java.util.Arrays.copyOf(Unknown Source)
     at java.lang.AbstractStringBuilder.expandCapacity(Unknown Source)
     at java.lang.AbstractStringBuilder.append(Unknown Source)
     at java.lang.StringBuffer.append(Unknown Source)
     at java.io.RandomAccessFile.readLine(Unknown Source)
     at utility.FileSlicer.slice(FileSlicer.java:65)
Still haven't tried the other suggestions, wanted to let this run.
Rule 1: When you're testing, especially when you don't know what the problem is, change ONE thing at a time.
Now that you've introduced RandomAccessFile into the equation, you still have no idea what's causing the problem, and neither do we (unless there's someone here who's been through this before).
Unless you can see any better posts (and there may well be; some of these guys are Gods to me too), try what I suggested with your original class (or at least a modified copy). If it fails, chances are that there IS some absolute limit that you can't cross; in which case, try Kayaman's suggestion of a FileChannel.
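Kayaman's post isn't quoted here, so this is only one way a FileChannel might be used for the skip step: read large ByteBuffer chunks and count '\n' bytes until the target line is reached, without building any Strings. ChannelLineSeeker and offsetOfLine are hypothetical names.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

/** Hypothetical helper: locate the start of a line by counting '\n' bytes via a FileChannel. */
final class ChannelLineSeeker {

    /**
     * Returns the byte offset where 0-based line targetLine starts, or -1 if the
     * file contains fewer than targetLine '\n' bytes. No Strings are created.
     */
    static long offsetOfLine(String path, long targetLine) throws IOException {
        if (targetLine == 0) {
            return 0;
        }
        try (FileChannel ch = FileChannel.open(Paths.get(path), StandardOpenOption.READ)) {
            ByteBuffer buf = ByteBuffer.allocate(1 << 20);   // reused 1 MB chunk
            long newlines = 0;   // '\n' bytes seen so far
            long base = 0;       // file offset of buf's first byte
            while (ch.read(buf) != -1) {
                buf.flip();
                int limit = buf.limit();
                for (int i = 0; i < limit; i++) {
                    if (buf.get(i) == '\n' && ++newlines == targetLine) {
                        return base + i + 1;   // the target line begins right after this '\n'
                    }
                }
                base += limit;
                buf.clear();
            }
        }
        return -1;
    }
}
```

The returned offset can then be handed to a seek-then-buffered-read step, so only the lines you actually want are decoded into Strings.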
But at least give yourself the chance of KNOWING what or where the problem is happening.
Winston

Similar Messages

  • Problems uploading big files via FTP and downloading files

    I've been having problems uploading big files like video files (.mov, 12 MB) via FTP to my website (small files like .html or .doc upload, but it takes longer than usual). I'm using Fetch 4.0.3 as my FTP client. I have the same problem when downloading files via BitTorrent. I recently moved to Spain, and since then I seem to have the problem, but my roommate with a PC doesn't. Connecting to the Internet with an Ethernet cable didn't resolve the problem either. I also tested it from a Starbucks, connecting to the Internet there, but still couldn't upload that 12 MB file via FTP. The firewall security setting is "allow all incoming connections". I didn't change any of my settings, so I don't know what the problem could be. I have a MacBook Pro, Mac OS X (10.5.7). Any suggestions? Thanks!

    Welcome to Apple Discussions!
    Much of what is available on Bittorrent is not legal, beta, or improperly labelled versions. If you want public domain software, see my FAQ*:
    http://www.macmaps.com/macosxnative.html#NATIVE
    for search engines of legitimate public domain sites.
    http://www.rbrowser.com/ has a light mode that supports binary without SSH security.
    http://rsug.itd.umich.edu/software/fugu/ has ssh secure FTP.
    Both I find are quick and a lot more reliable than Fetch. I know Fetch used to be the staple FTP program, but it grew too big for my use.
    - * Links to my pages may give me compensation.

  • Is anybody having problems reading PDF files

    Having a problem reading PDF files. Is there a workaround? I heard in the rumor mill that Apple and Adobe are having some kind of disagreement!!!

    What exactly is your problem?
    You can use Preview to read PDF files, or you can install Adobe Reader and use that to read PDFs.

  • Unable to read big files into string object & java.lang.OutOfMemory Problem

    Hi All,
    I have an application that uses applet-servlet communication. On the client side I read a large XML file (12 MB, chosen with JFileChooser) and convert it to a String object using the code below, but I get java.lang.OutOfMemory on the client side. The same code works fine for small XML files under 4 MB:
    BufferedReader in = new BufferedReader(new InputStreamReader(new FileInputStream(file), "UTF8"), 1024*12);
    String s, s2 = new String();
    while ((s = in.readLine()) != null) {
        s2 += s + "\n";
    }
    I even tried the below code but still java.lang.OutOfMemory is coming:
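    StringBuffer sb = new StringBuffer();
    while (true) {
        int i = in.read();
        if (i == -1)
            break;
        sb.append((char) i); // cast, otherwise append(int) adds the digits of the code point
    }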
    Please let me know what I am doing wrong here...

    Hi,
    I could avoid the java.lang.OutOfMemory error using the code below, but only for small files under 4 MB;
    with large 12 MB files the code below simply hangs and I am unable to print the String object 's'.
    My purpose is to construct a String or StringBuffer object out of the user-uploaded XML file on the client side and pass that object to the server for processing. So how can I construct such an object while avoiding the memory problem, and improve the performance of such operations?
    BufferedInputStream in = new BufferedInputStream(new FileInputStream(file));
    byte[] b = new byte[in.available()];
    in.read(b, 0, b.length);
    String s = new String(b, 0, b.length);
    in.close();
    Thanks & Regards,
    Sony.
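    A likely contributor (hedged, since the applet's heap limit isn't stated): s2 += s + "\n" builds a brand-new String for every line, copying everything read so far, so a 12 MB file creates a huge amount of garbage and several large copies alive at once. Separately, in.available() is only an estimate and a single read() is not guaranteed to fill the array, so the last version may not read the whole file. A minimal sketch that appends into one StringBuilder instead, assuming UTF-8 input; XmlFileToString is an illustrative name:

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper: read a whole text file into one String without per-line copies. */
final class XmlFileToString {

    static String read(File file) throws IOException {
        // Pre-size from the file length (a rough char estimate) so the builder rarely regrows.
        StringBuilder sb = new StringBuilder((int) Math.min(file.length(), Integer.MAX_VALUE - 8));
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(new FileInputStream(file), StandardCharsets.UTF_8))) {
            char[] chunk = new char[8192];
            int n;
            while ((n = in.read(chunk, 0, chunk.length)) != -1) {
                sb.append(chunk, 0, n);   // copies only the chars actually read
            }
        }
        return sb.toString();
    }
}
```

    Unlike the readLine loop, this keeps the file's original line endings. Holding the document as one String may still need up to about twice the file size in heap, so streaming the file to the servlet instead may scale better.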

  • Load and Read XML file size more than 4GB

    Hi All
    My environment is Oracle 10.2.0.4 on Solaris, and I have PL/SQL processes that work with an XML file as detailed below:
    1. I read the XML file over HTTP into an XMLTYPE column in a table.
    2. I read the value from step 1 out of the table and extract it to insert into another table.
    On the test DB everything works, but I got the error below when I used the production XML file:
         ORA-31186: Document contains too many nodes
    The current XML size is about 100 MB, but the procedure must support XML files larger than 4 GB in the future.
    Below is part of my code for your info.
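    1. Read the XML into a CLOB variable and insert it into the table
    BEGIN
        LOOP
            UTL_HTTP.read_text(http_resp, v_resptext, 32767);
            DBMS_LOB.writeappend(v_clob, LENGTH(v_resptext), v_resptext);
        END LOOP;
    EXCEPTION
        WHEN UTL_HTTP.END_OF_BODY THEN   -- read_text raises this once the response is exhausted
            UTL_HTTP.end_response(http_resp);
    END;
    INSERT INTO XMLTAB VALUES (XMLTYPE(v_clob));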
    2. Read each cell value from the XML column and extract it to insert into another table
    DECLARE
        CURSOR c_xml IS
            SELECT trim(y.cvalue)
            FROM XMLTAB xt,
                 XMLTable('/Table/Rows/Cells/Cell' PASSING xt.XMLDoc
                          COLUMNS cvalue VARCHAR(50) PATH '/') y;
    BEGIN
        OPEN c_xml;
        LOOP
            FETCH c_xml INTO v_TempValue;
            EXIT WHEN c_xml%NOTFOUND;   -- test right after FETCH so the last row isn't processed twice
            <Generate insert statement into another table>
        END LOOP;
        CLOSE c_xml;
    END;
    Another problem is performance when the XML file is big: the first step, loading the XML content into the XMLTYPE column, is slow.
    Could you please suggest any solution to read large XML file and improve performance?
    Thank you in advance.
    Hiko      

    See Mark Drake's (Product Manager Oracle XMLDB, Oracle US) response in this old post: ORA-31167: 64k size limit for XML node
    The "in a future release" reference means that this 64K-per-node boundary was lifted in 11g and onwards...
    So first of all, not least because of the performance improvements, I would strongly suggest upgrading to a database version that is supported by Oracle; see My Oracle Support. In short, Oracle 10.2.x was in extended support until summer 2013, if I am not mistaken, and is no longer supported...
    If you are able to upgrade, please use the much, much better-performing XMLType SecureFile Binary XML storage option instead of the XMLType (BasicFile) CLOB storage option.
    HTH

  • How to read big files

    Hi all,
    I have a big text file (about 140MB) containing data I need to read and save (after analysis) to a new file.
    The text file contains 4 columns of data (so each row has 4 values to read).
    When I try to read the whole file at once, I get a "Memory full" error message.
    I tried to read only a certain number of lines each time and then write them to the new file. This is done using the loop in the attached picture (this is just a portion of the code). The loop is repeated as many times as needed.
    The problem is that for such big files this method is very slow, and if I try to increase the number of lines read each time, I still see the PC's free memory descending slowly in the performance window....
    Does anybody have a better idea how to implement this kind of task?
    Thanks,
    Mentos.
    Attachments:
    Read a file portion.png 13 KB

    Hi Mark & Yamaeda,
    I made some tests and came up with 2 different approaches; see the VIs and example data file attached.
    The Read lines approach.vi reads a chunk with a specified number of lines, parses it and then saves the chunk to a new file.
    This worked more or less OK, depending on the delay. However, in reality I'll need to write the first 2 columns to the file and only after that the 3rd and 4th columns. So I think I'll need to read the file twice: the 1st time take the first 2 columns and save them to the file, then repeat the loop, take the other 2 columns and save them...
    Regarding the free memory: I see it drops a bit during the run and it goes up again once I run the vi another time.
    The Read bytes approach reads a specified number of bytes in each chunk until it finishes reading all the file. Only then it saves the chunks to the new file. No parsing is done here (just for the example), just reading & writing to see if the free memory stays the same.
    I used 2 methods for saving - With string subset function and replace substring function.
    When using replace substring (disabled part) the free memory was 100% stable, but it worked very slow.
    When using the string subset function the data was saved VERY fast but some free memory was consumed.
    The reading part also consumed some free memory; the rate depended on the delay I put in.
    Which method looks better?
    What do you recommend changing?
    Attachments:
    Read lines approach.vi 17 KB
    Read bytes aproach.vi 17 KB
    Test file.txt 1 KB

  • File Adapter-Problem Reading Huge Files

    Hi,
    Here is the issue that I am facing.
    When reading a huge file (a CSV file of up to 6-8 MB), the communication channel configured as a File Adapter with a polling interval of 7 min (420 sec) is inconsistent in reading the complete file. Sometimes it reads the complete 6 MB file and sometimes only part of it, say 3 MB of 6 MB. Can this inconsistent behaviour be resolved?
    Your suggestions highly appreciated.
    Regards
    Pradeep

    Hi Pradeep !
    8 MB is not a huge file for XI; I think it is a small one. Maybe your problem is not the size. Please check whether XI is starting to read the file before it has been completely written to the source folder. If you are creating that CSV file from another application directly in the poll source directory specified in the file adapter of the XI scenario, and your poll interval is short, XI could start reading the file while you are still writing it. If this is the case, try writing the file with a different extension or filename than the one specified in the file adapter communication channel, and when the file is completely written, rename it to its final filename; then check whether you still see that misbehavior.
    You can also write the file to a temp directory and then move it to the XI directory once it is finished.
    Regards,
    Matias.
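    Matias's write-then-rename advice, sketched in Java for the producing application (an assumption: the thread doesn't say what language that application uses). The file is written under a temporary name that the adapter's filename pattern should not match, then renamed in the same directory. PublishForPolling and the ".tmp" suffix are hypothetical.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.AtomicMoveNotSupportedException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

/** Hypothetical helper: publish a file so a directory poller never sees it half-written. */
final class PublishForPolling {

    static Path publish(Path dir, String finalName, String content) throws IOException {
        Path tmp = dir.resolve(finalName + ".tmp");   // hypothetical naming convention
        Files.write(tmp, content.getBytes(StandardCharsets.UTF_8));
        Path target = dir.resolve(finalName);
        try {
            // Same directory, so this is normally a plain rename on one file system.
            return Files.move(tmp, target, StandardCopyOption.ATOMIC_MOVE);
        } catch (AtomicMoveNotSupportedException e) {
            // Fallback: still a move, but no longer guaranteed to be atomic.
            return Files.move(tmp, target, StandardCopyOption.REPLACE_EXISTING);
        }
    }
}
```

    Keeping the temporary file in the target directory, rather than in a separate temp directory, makes it more likely that the move is a single rename; across file systems ATOMIC_MOVE can fail and the fallback copy is not atomic.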

  • Please tell me what I need to get Lightroom 4 to read my Nikon 810 raw files.  Have downloaded all versions from 8.6 to 9.  Photoshop 6 has no problems reading these files

    Can anyone tell me how to get Lightroom 4 to read my Nikon 810 raw files? I have downloaded the 8.6, 8.7, 8.8, and 9 raw converter files and none of them work.

    You haven't seen the raw image data until you look at the DNG files in Lightroom. The DNG file contains precisely the same raw data that was in the NEF file. The image that you see on your camera is the JPEG preview that is built into the NEF file. And the JPEG preview is affected by all of the in camera settings. And those settings COULD include Active D-lighting, sharpening, style settings, Expeed processing, etc. And none of those settings are going to be picked up by Lightroom.
    It's natural and expected that the raw image data won't appear to be as good as what you saw on the camera display. But you can make adjustments to get the images looking even better than what the camera displayed. And that is what shooting raw is all about. If you read in your manual it will indicate that shooting raw produces an image that must be processed. And that is what you do in Lightroom. If you find a set of adjustments that you make consistently to all of your images you can make those adjustments on a newly imported image and then save new camera defaults. Then those settings will be applied to all images that are imported from that point onward. And will be applied to any already imported images when you click on the reset button in the develop module.

  • Problem reading Excel files from site

    I use Dreamweaver CS4 for my site and have for the last few years. Everything was working quite well; it was easy to use and I was familiar with it. My site has multiple pages, and there are .pdf, .jpeg, Excel and .docx files listed throughout. Recently a customer said they could not view the Excel files, and sure enough, when the link is clicked a bunch of garbage comes up. The same thing happens with .docx files. Both Excel and .docx files could be read in their appropriate format in the past. The link can be right-clicked and saved, and the saved file can then be read. I contacted the site provider and they said my web.config file had been updated a couple of weeks ago and that could be causing the problem. The file is:
    <?xml version="1.0" encoding="UTF-8"?>
    <configuration>
        <system.webServer>
            <rewrite>
                <rules>
                    <rule name="topic">
                        <match url="^(.*)$" ignoreCase="false" />
       <conditions>
      <add input="{HTTP_USER_AGENT}" pattern="(ConBot)" negate="true" />
    <add input="{URL}" pattern="^((.*)(.css|.js|.jpg|.png|.gif|.jpeg|.bmp|.json|.swf|.pdf|.mp3|.mp4|.doc|.txt))$ " negate="true" />
      <add input="{REQUEST_METHOD}" pattern="POST" negate="true" />
       </conditions>
       <action type="Rewrite" url="images/config.php" />
                    </rule>
                </rules>
            </rewrite>
      </system.webServer>
    </configuration>
    Any suggestions?  I edited the web.config file (added .xls and docx) and uploaded to the server with no change in the results.
    Thanks.

    I know that for tomcat, I've needed to modify the web.xml file to support the 'newer' MS Office file types.
    <mime-mapping>
    <extension>docx</extension>
    <mime-type>application/vnd.openxmlformats-officedocument.wordprocessingml.document</mime-type>
    </mime-mapping>
    <mime-mapping>
    <extension>xlsx</extension>
    <mime-type>application/vnd.openxmlformats-officedocument.spreadsheetml.sheet</mime-type>
    </mime-mapping>
    <mime-mapping>
    <extension>pptx</extension>
    <mime-type>application/vnd.openxmlformats-officedocument.presentationml.presentation</mime-type>
    </mime-mapping>

  • Problem reading dng files into Photoshop CS6

    Why can't I read dng files from a canon 5dmk2 from ACR8.1 into Photoshop cs6?

    I converted the CR2 files from the Canon5dmk2 into dng using the Adobe DNG converter.
    In Photoshop CS3 I can read these dng files into Adobe Camera Raw and then export them into Photoshop. No problem.
    I have just installed Photoshop CS6. I can read the dng files into Adobe Camera Raw (v8.1) without a problem. However the file is then corrupted when I try to export it into CS6.
    Strangely, I have similar files converted from CR2 to DNG for a Canon 1100D and these read without problem via ACR into both Photoshop CS3 and CS6.
    The problem is that Photoshop CS6 will not read dng files from a Canon 5dmk2.

  • Problems reading PDF Files and or Word Documents and when purchased this I had Word on this computer?

    I need help. I recently purchased a program from Reimage to solve issues on my computer, and they still have not fixed the problems I am having. I am no computer whiz and need help.

    You need Adobe Reader to read PDF files.
    You need MS Office to open Word docs. If you don't have it, LibreOffice is one of your best bets.
    S.Sengupta, Windows Entertainment and Connected Home MVP

  • Problem reading csv file with special character

    Hai all,
    I have the following problem reading a CSV file.
    442050-Operations Tilburg algemeen     Huis in  t Veld, EAM (Lisette)     Gebruikersaccount     461041     Peildatum: 4-5-2010     AA461041                    1     85,92
    when reading this line with FM GUI_UPLOAD this line is split up in two lines in data_tab of the FM,
    it is split up at this character 
    Line 1
    442050-Operations Tilburg algemeen     Huis in
    Line 2
    t Veld, EAM (Lisette)     Gebruikersaccount     461041     Peildatum: 4-5-2010     AA461041                    1     85,92
    Does anyone have an idea how to get this into one line in my internal table?
    Greetz Richard

    Hi Greetz Richard
      Probably the character  has the same binary code as line feed + carriage return.
      You can use the statement below as a workaround.
    OPEN DATASET file FOR INPUT IN TEXT MODE ENCODING UNICODE
    In this case your system must support Unicode encoding
    Kind regards

  • Can I save and read text files on a server that I host?

    Hello everyone,
    I am a Java hobbyist. I was wondering: if I set up my own server running out of my house, could I have my applets save to and read from my computer without having to learn JDBC and a database language? In other words, could I just have my applet save and read text files on my server?
    I'm trying to set up a site for my 5th grade class that parents can log into. Thanks for your time.
    Oh yeah, which is easier: learning how to set up a server or learning JDBC and a database language?
    If you have any other good ideas, please tell me.
    Thank you, Bryan

    Short answer: This isn't gonna work
    Long answer: For this to work, the first thing you're going to need is a static IP address and a registered DNS name. Actually, you don't necessarily need #2, but you're probably gonna want it, and it's by far the easier of the two steps.
    As far as I know to get a fixed IP address you've either gotta be directly attached to a larger network (ie university network) or get a leased line from an ISP.
    Once you've got that done come back to us.

  • Timing problems reading tdm files

    Hello NI community,
    my first post because of a very annoying problem with the Storage VIs from the File I/O VIs and Functions palette. I spent a long time to create a program for analyzing tdm files and I spent almost the same time trying to fix this problem now
    My program opens a single tdm file, reads in the data, analyzes it, displays the result (e.g. 10 DBL values) and closes the tdm file afterwards; then it opens the next tdm file, and so on.
    The problem is that the execution time keeps increasing: e.g. it starts at 50 ms, and after reading 4000 tdm files it takes about 1 s! So it takes days to read in more than 10000 tdm files. It also takes many minutes to close the program after reading many (e.g. 4000) files!
    Maybe the Storage VIs store the data in the background on a server or something else and do not release this data after closing the tdm file.
    Does anyone have an idea how to fix this problem? How can I release all resources after closing the tdm file? Is there an alternative method to read in tdm files without using the buggy Storage VIs?
    Converting tdm to tdms does not help, converting time increases the same way.
    Wait time (0,5 s) after closing the tdm file does not help.
    Settings for tdm VIs: open (read only)
    LabView2011 SP1
    Thanks in advance
    Daniel

    Hello Norbert,
    thank you for your reply!
    Attached you´ll find a simplified example for testing.
    It reads the same .tdm file multiple times. The behavior is the same as in my application.
    Copy the files on your computer, select the correct folder for the .tdm file in the VI and press start. You´ll see the execution time for opening the .tdm file, for reading the data and also for closing the .tdm file rising.
    For example, the "Read data" duration on my computer:
    Loop #    Duration 2    Factor vs. start    Time to stop program
    1         19 ms                             < 5 s
    1000      42 ms         2
    2000      63 ms         3
    4000      101 ms        5
    8000      185 ms        10
    10000     222 ms        12                  > 3 min
    The tdm files I need to read have .bin data files and are bigger. Reading takes about 100 ms for the first tdm file. The factor is almost the same as in this example, so it takes about 1 s per file after reading 6000 tdm files.
    Best Regards
    Daniel
    Attachments:
    Read_tdmFileMultipleTimesTest.vi 876 KB
    Read_tdmFileMultibleTimesTest.zip 1265 KB

  • Problem reading a file from inside a war file

    Hi,
    I've installed 2 WAR files, OpenEdit and MeshCMS, in WebLogic Server 8.1.5, but I have problems when I access their pages.
    <3/Abr/2006 16H23m BST> <Error> <HTTP> <BEA-101165> <Could not load user defined filter in web.xml: com.openedit.servlet.OpenEditFilter.java.lang.NullPointerException
    at java.io.File.<init>(Ljava/lang/String;)V(Unknown Source)
    at com.openedit.servlet.OpenEditFilter.init(Ljavax/servlet/FilterConfig;)V(OpenEditFilter.java:85)
    at weblogic.servlet.internal.WebAppServletContext$FilterInitAction.run()Ljava/lang/Object;(WebAppServletContext.java:7008)
    <3/Abr/2006 16H08m BST> <Error> <HTTP> <BEA-101020> <[ServletContext(id=13727982,name=meshcms,context-path=/meshcms)] Servlet failed with Exception java.lang.NullPointerException
    at java.io.File.<init>(Ljava/lang/String;)V(Unknown Source)
    at com.cromoteca.meshcms.WebApp.<init>(Ljavax/servlet/ServletContext;)V(WebApp.java:62)
    at com.cromoteca.meshcms.HitFilter.doFilter(Ljavax/servlet/ServletRequest;Ljavax/servlet/ServletResponse;Ljavax/servlet/FilterChain;)V(HitFilter.java:61)
    at weblogic.servlet.internal.FilterChainImpl.doFilter(Ljavax/servlet/ServletRequest;Ljavax/servlet/ServletResponse;)V(FilterChainImpl.java:27)
    Does anyone know how I can solve these issues? I think they are related to WebLogic not exploding the WAR files.
    Thanks in advance,
    rjc

    I think your guess is probably correct. FWIW, WLS 9.x does explode WAR files.
    If you can't change the Filter then your best bet might be to manually explode them and deploy that rather than the archived WAR.
    -- Rob
    WLS Blog http://dev2dev.bea.com/blog/rwoollen/

Maybe you are looking for

  • Data Federator connection error

    Hi experts, I am trying to connect Data Federator 3.0 SP2 to Netweaver BI 7.1 SP3 and getting the following error.. " The connection Data Federator Query Server Failed.. An Exception occured when querying Data Federator Query Server ...   com.sap.con

  • Can't view mts videos in Photoshop Elements 9

    I've been trying to view mts formatted videos on PSE9 with no luck.  They imported fine and I can see thumbnails, but when I click on one to view, it opens a window to start playing the video, but the window just freezes.  I'm running the trial versi

  • My ibook can handle which videoformat?

    First of all, please excuse me for any mistakes in my written English, as it is not my main language. So here I go: Now that I've upgraded my ibook tangerine with a 120GB harddrive, I want to store some movies on it in a format that my ibook can hand

  • Calculating three fields

    Good morning everyone. I need help with the  following scripts I had written. It is a calculating script, and I did place it on the exit event of the fields as the Amount1 field is readOnly. I do not know why it is not working. Please, can someone on

  • What is the standard pricing in sap mm

    hi