Memory mapping large files

Hi folks.
I am developing an application that has very large input files. During execution, each file is processed twice: first sequentially, to record the position of each piece of data in the file, and then directly, by seeking to a specific position to retrieve a specific piece of information.
My rationale for doing this is to avoid loading the entire content of the file into memory via some data structure. However, all of the seeking and reading seems to be quite a performance hit.
Is there a way to memory map a file and then read only a portion of the data based on its byte position? I've searched around for sample code, but I can only find examples of sequential access.
Any help would be greatly appreciated!
Thanks
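
For reference, FileChannel.map returns a MappedByteBuffer that supports absolute reads at any byte position, which is exactly the second-pass access pattern described above. A minimal sketch, assuming a file under 2 GB (the file name and offset are illustrative):

import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;

public class PositionalRead {
    public static void main(String[] args) throws Exception {
        FileChannel fc = new RandomAccessFile("input.dat", "r").getChannel();
        // map once; the OS pages data in lazily, nothing is read up front
        MappedByteBuffer buf = fc.map(FileChannel.MapMode.READ_ONLY, 0, fc.size());
        // pass 1: scan sequentially and record byte positions of interest
        // pass 2: jump straight to a recorded position with an absolute get
        int offset = 123456;          // illustrative position from pass 1
        byte b = buf.get(offset);     // no seek/read system call involved
        System.out.println(b);
        fc.close();
    }
}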

That's pretty simple. Thanks
Follow-up questions:
The code I have now reads:
FileChannel fc = seqDBRAF.getChannel();
ByteBuffer roBuf = fc.map(FileChannel.MapMode.READ_ONLY, 0, fc.size());
CharBuffer cb = Charset.forName("ISO-8859-15").newDecoder().decode(roBuf);
The decode line takes a long time to execute, not the "map" line. Why is this?
If/when I use the position method to "seek" to the right place, should I do this on the ByteBuffer and then decode? Or decode first and then just read from the position in the CharBuffer?
Thanks
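
A likely explanation: map() only establishes the mapping, so it returns almost immediately, while decode() touches every byte of the file and allocates a char array roughly as large as the file. If only parts of the file are needed, positioning the ByteBuffer first and decoding just that slice avoids most of the cost; with a single-byte charset such as ISO-8859-15, byte positions and character positions coincide, so nothing is lost by decoding late. A minimal sketch, assuming the record's byte offset and length are known from the first pass and the file is under 2 GB (buffer positions are int-limited):

import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.Charset;

class SliceDecode {
    // decode only the bytes in [pos, pos + len) instead of the whole file
    static CharBuffer decodeSlice(FileChannel fc, int pos, int len) throws Exception {
        ByteBuffer whole = fc.map(FileChannel.MapMode.READ_ONLY, 0, fc.size());
        whole.position(pos);
        whole.limit(pos + len);
        // slice() so the decoder only sees (and faults in) these bytes
        return Charset.forName("ISO-8859-15").newDecoder().decode(whole.slice());
    }
}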

Similar Messages

  • Nio ByteBuffer and memory-mapped file size limitation

    I have a question/issue regarding ByteBuffer and memory-mapped file size limitations. I recently started using NIO FileChannels and ByteBuffers to store and process buffers of binary data. Until now, the maximum individual ByteBuffer/memory-mapped file size I have needed to process was around 80MB.
    However, I now need to begin processing larger buffers of binary data from a new source. Initial testing with buffer sizes above 100MB results in IOExceptions (java.lang.OutOfMemoryError: Map failed).
    I am using 32-bit Windows XP; 2GB of memory (typically 1.3 to 1.5GB free); Java version 1.6.0_03; with -Xmx set to 1280m. Decreasing the Java maximum heap size to 768m does make it possible to memory map larger buffers, but never bigger than roughly 500MB. However, the application that uses this code contains other components that require -Xmx to be set to 1280.
    The following simple code segment executed by itself will produce the IOException for me when executed using -Xmx1280m. If I use -Xmx768m, I can increase the buffer size up to around 300MB, but never to a size that I would think I could map.
    try {
        String mapFile = "C:/temp/" + UUID.randomUUID().toString() + ".tmp";
        FileChannel rwChan = new RandomAccessFile( mapFile, "rw" ).getChannel();
        ByteBuffer byteBuffer = rwChan.map( FileChannel.MapMode.READ_WRITE,
            0, 100000000 );
        rwChan.close();
    } catch( Exception e ) {
        e.printStackTrace();
    }
    I am hoping that someone can shed some light on the factors that affect the amount of data that may be memory mapped to/in a file at one time. I have investigated this for some time now and based on my understanding of how memory mapped files are supposed to work, I would think that I could map ByteBuffers to files larger than 500MB. I believe that address space plays a role, but I admittedly am no OS address space expert.
    Thanks in advance for any input.
    Regards- KJ

    See the workaround in http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4724038
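
    The usual cause: on a 32-bit JVM, the heap set by -Xmx and every file mapping compete for the same roughly 2 GB of user address space, so a 1280m heap leaves little contiguous room to map into. One common workaround (a sketch, not from the bug report; the window size is an arbitrary choice) is to map the file as a series of smaller windows instead of one region, keeping in mind the caveat from the linked bug that a mapping is only released once its buffer is garbage collected:

    import java.io.RandomAccessFile;
    import java.nio.MappedByteBuffer;
    import java.nio.channels.FileChannel;

    public class WindowedMap {
        static final long WINDOW = 64L * 1024 * 1024;   // 64 MB: small enough to find room for

        public static void main(String[] args) throws Exception {
            FileChannel ch = new RandomAccessFile("C:/temp/big.tmp", "rw").getChannel();
            long size = ch.size();
            for (long pos = 0; pos < size; pos += WINDOW) {
                long len = Math.min(WINDOW, size - pos);
                MappedByteBuffer window = ch.map(FileChannel.MapMode.READ_WRITE, pos, len);
                // ... process this window, then drop the reference so it can be unmapped later ...
            }
            ch.close();
        }
    }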

  • CFS and memory mapped file

    I would like to know if it is possible to memory map (mmap) a file that is residing on a cluster file system (CFS or GFS).
    If I remember correctly, memory mapping a file residing on NFS has issues.
    Thanks,
    Harsh

    I'm using SC 3.1u4 on Solaris 9. I ran into a problem with memory mapped files on CFS.
    I have multiple processes (on one cluster node) sharing such a file, which was created using the following call:
    mmap ((caddr_t)0,SOME_SIZE,(PROT_READ | PROT_WRITE), (MAP_SHARED | MAP_NORESERVE),fd,0);
    Issuing msync with MS_INVALIDATE as the third argument is OK, but when some other process tries to read the memory, the node seems to hang.
    I can't examine the processes using pstack or truss, as both of them get hung too. The only way out of this mess is to reboot the node.
    I can't imagine this problem hasn't been seen before. Is there a patch for it?

  • Efficient I/O for regex analyses of large files

    I'm analysing a log file, one line at a time, using regular expressions.
    It's an HTTP access log, so obviously it can get quite big, and any efficiency increase would be great.
    Seeing as Pattern is able to match against any CharSequence, I thought I might be able to employ some New I/O and get the input from the FileInputStream ( / Channel) to the Pattern without having to create a String for each line.
    Alas, the Channel is not a CharSequence, only CharBuffer is a CharSequence.
    It's at this point that I get a bit lost.
    I know how to read input into a buffer, but what I really want to do is either:
    (a) read one line at a time; or
    (b) tell the Pattern to only match one line at a time.
    Though I expect I can achieve what I'm trying to do with some combination of reads and flips or marks and resets, I'm not savvy enough with buffers to know how to do what I want.
    Any help (even pointers to a good tutorial on buffers) would be much appreciated.
    Graham.

    What I'm doing is processing one line at a time, performing a regex on each line so I can pull out parts of the log information using the groups of the regex.
    So the code at the moment looks like this (pseudo):
    BufferedReader reader = ...
    while (String line = read a line) {
       matcher = pattern.matcher(line);
       // Do some stuff with the groups...
    }
    What I'd like to do is eliminate that intermediate string, so it looks more like:
    CharBuffer charBuffer = ...;
    matcher = pattern.matcher(charBuffer);
    while (matcher.find()) {
       // Do some stuff with the groups...
    }
    After doing a bit more research today, I guess the questions now are:
    Will memory mapping the file make this code more efficient?
    Will there be any problems memory-mapping a really, really big file?
    Thanks.
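
    For what it's worth, the two pieces do fit together: compiling the Pattern with Pattern.MULTILINE lets ^ anchor at every line start inside one big CharBuffer, so a single Matcher can walk the whole file without per-line Strings. A sketch, assuming a single-byte charset and a file under 2 GB (the log-line regex is illustrative, not from this thread):

    import java.io.FileInputStream;
    import java.nio.CharBuffer;
    import java.nio.MappedByteBuffer;
    import java.nio.channels.FileChannel;
    import java.nio.charset.Charset;
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    public class MappedLogScan {
        public static void main(String[] args) throws Exception {
            FileChannel fc = new FileInputStream(args[0]).getChannel();
            MappedByteBuffer bytes = fc.map(FileChannel.MapMode.READ_ONLY, 0, fc.size());
            // one decode for the whole file; this is the big allocation
            CharBuffer chars = Charset.forName("ISO-8859-1").newDecoder().decode(bytes);
            // MULTILINE makes ^ anchor at every line start inside the buffer
            Pattern p = Pattern.compile("^(\\S+) \\S+ \\S+ \\[([^\\]]+)\\]", Pattern.MULTILINE);
            Matcher m = p.matcher(chars);
            while (m.find()) {
                String host = m.group(1);   // groups still become Strings when extracted
                // ... do some stuff with the groups ...
            }
            fc.close();
        }
    }

    Mapping the file avoids copying the bytes onto the heap, but the decode step still allocates roughly one char per input byte, which is the practical limit for really big files.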

  • Large file copy to iSCSI drive fills all memory until server stalls.

    I am having the file copy issues that people have been having with various versions of Server now for years, as can be read in the forums. I am having this issue on Server 2012 Std., using Hyper-V.
    When a large file is copied to an iSCSI drive, the file is copied into memory faster than it can be sent over the network. It fills all available GB of memory until the server, which is a VM host, pretty much stalls, and all the VMs stall with it. This continues until the file copy is finished or stopped; then the memory is gradually released as the data is sent over the network.
    This issue was happening on both send and receive. I changed the registry setting for the large system cache to disable it, and now I can receive large files from the iSCSI. They now take an additional 1 GB of memory, which sits there until the file copy is finished.
    I have tried all the NIC and disk settings as can be found in the forums around the internet that people have posted in regard to this issue.
    To describe it in a little more detail: when receiving a file from iSCSI, the file copy window shows a speed of around 60-80 MB/sec, which is wire speed. When sending a file to iSCSI, the file copy window shows a speed of 150 MB/sec, which is actually the speed at which it is being written to memory. The NIC counter in Task Manager instead shows the actual network speed, which is about half of that. The difference is the rate at which memory fills until it is full.
    This also happens when using Window Server Backup. It freezes up the VM Host and Guests while the host backup is running because of this issue. It does cause some software issues.
    The problem does not happen inside the Guests. I can transfer files to a different LUN on the same iSCSI, which uses the same NIC as the Host with no issue.
    Does anyone know if the fix has been found for this? All forum posts I have found for this have closed with no definite resolution found.
    Thanks for your help.
    KTSaved

    Hi,
    Sorry if it causes confusion, but by "by design" I mean "by design it will use memory for copying files via the network".
    In Windows 2000/2003, the following keys could help control the memory usage:
    LargeSystemCache (0 or 1) in HKLM\SYSTEM\CurrentControlSet\Control\Session Manager\Memory Management
    Size (1, 2 or 3) in HKLM\SYSTEM\CurrentControlSet\Services\LanmanServer\Parameters
    I saw threads mentioning that it will not work in later systems such as Windows 2008 R2.
    For Windows 2008 and Windows 2008 R2, there is a service named Microsoft Windows Dynamic Cache Service which addresses this issue:
    https://www.microsoft.com/en-us/download/details.aspx?id=9258
    However, I searched and there is no updated version for Windows 2012 and 2012 R2.
    I also noticed that the following command could help control the memory usage. With value = 1, NTFS uses the default amount of paged-pool memory:
    fsutil behavior set memoryusage 1
    You need a reboot after changing the value. 

  • Streaming Large Files to Response OutputStream consumes plenty of memory

    When we write a large file's (> 100 MB) binary content to the servlet response OutputStream using the write method, it consumes plenty of heap memory on the server (WebLogic 7). The heap doesn't come down after writing to the OutputStream; it stays high for quite a long time, and in fact comes down only when we access the same page again. Is there a way to get this optimized?
    TIA.

    The servlet container will buffer your output, so that it knows how much of it there is, so that it can set the Content-Length header, so that the recipient knows where the payload ends.
    I don't know about Weblogic, but in Tomcat there are AFAIK two ways to get genuine streaming instead of buffering: set the Content-Length header before writing any output (if you know the content length in advance), or use chunked encoding (set the "Transfer-Encoding: chunked" header; google for it).
    I don't know if it works, but if those two don't do it for you, you could also try forcing the reply to be HTTP/1.0 instead of HTTP/1.1, or disabling keep-alive ("Connection: close"). HTTP/1.0 (which doesn't have keep-alive) and no keep-alive might not need the Content-Length header, because closing the socket gives an EOF to the recipient, so it knows where the payload ends.
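
    For the common case where the length is known up front, the first option looks roughly like this (a sketch against the standard servlet API; the path and buffer size are illustrative):

    import java.io.FileInputStream;
    import java.io.IOException;
    import java.io.InputStream;
    import java.io.OutputStream;
    import javax.servlet.http.HttpServlet;
    import javax.servlet.http.HttpServletRequest;
    import javax.servlet.http.HttpServletResponse;

    public class DownloadServlet extends HttpServlet {
        protected void doGet(HttpServletRequest req, HttpServletResponse resp) throws IOException {
            java.io.File f = new java.io.File("/data/large.bin");   // illustrative path
            resp.setContentType("application/octet-stream");
            resp.setContentLength((int) f.length());   // known length, so the container can stream
            InputStream in = new FileInputStream(f);
            OutputStream out = resp.getOutputStream();
            byte[] buf = new byte[8192];   // fixed-size buffer keeps heap usage flat
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
            in.close();
        }
    }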

  • Out of memory error when writing large file

    I have the piece of code below, which works fine for writing small files, but when it encounters much larger files (>80M), the JVM throws an out-of-memory error.
    I believe it has something to do with the stream classes. If I replace my PrintStream reference with the System.out object (commented out below), then it runs fine.
    Anyone else encountered this before?
    try {
        print = new PrintStream(new FileOutputStream(new File(a_persistDir, getCacheFilename()), false));
        // print = System.out;
        for (Iterator strings = m_lookupTable.keySet().iterator(); strings.hasNext(); ) {
            StringBuffer sb = new StringBuffer();
            String string = (String) strings.next();
            String id = string;
            sb.append(string).append(KEY_VALUE_SEPARATOR);
            Collection ids = (Collection) m_lookupTable.get(id);
            for (Iterator idsIter = ids.iterator(); idsIter.hasNext(); ) {
                IBlockingResult blockingResult = (IBlockingResult) idsIter.next();
                sb.append(blockingResult.getId()).append(VALUE_SEPARATOR);
            }
            print.println(sb.toString());
            print.flush();
        }
    } catch (IOException e) {
    } finally {
        if (print != null)
            print.close();
    }

    Yes, my first version of the code just printed the strings as I got them, but it was running out of memory then as well. I thought of constructing a StringBuffer first because I was afraid the PrintStream wasn't allocating memory correctly.
    I've also tried flushing the PrintStream after every line is written, but I still run into trouble.

  • Quicklook consumes a lot of memory on column view previewing large files

    After upgrading to Mountain Lion, every time I click on a large file (like a disk image, DMG, or ISO) in Finder's column view, it starts spinning the icon, and quicklookd consumes a lot of memory, even causing other apps to swap to disk. After the preview icon is done, the quicklookd memory is deallocated.
    This did not happen on Lion.

    Just found out that a plugin (QLStephen - http://whomwah.github.com/qlstephen/) was causing the problem.
    I removed it from /Library/Quicklook, restarted Quick Look (qlmanage -r), and the problem is gone.

  • Trying to download updates to CoPilot Live and CoPilot GPS with maps. File sizes are large and taking hours to download on a wireless connection. How can I download app updates and new maps while connected to a PC and iTunes through a hard-wired internet link?


    I'm on my iPad, so I don't know if this is the page with an actual download. I don't see a button, but assume that is because I am on an iPad. It is in the DL section of Apple downloads.
    http://support.apple.com/kb/DL1708

  • Out of memory when converting large files using a Web service call

    I'm running into an out-of-memory error on the LiveCycle server when converting a 50 meg Word document with a Web service call. I've already tried increasing the heap size, but I'm at the limit for the 32-bit JVM on Windows. I could upgrade to a 64-bit JVM, but it would be a pain and I'm trying to avoid it. I've tried converting the 50 meg document using the LiveCycle admin and it works fine; the issue only occurs when using a web service call. I have a test client, and the memory spikes when it's generating the web service call, taking over a gig of memory. I assume it takes a similar amount of memory on the receiving end, which is why LiveCycle is running out of memory. Does anyone have any insight into why passing a 50 meg file requires so much memory? Is there any way around this?
    -Kelly

    Hi,
    You are correct that a complete 64-bit environment would solve this. The problem is that you get the out-of-memory error when the file is written to memory on the server. You can solve this by creating an interface which stores large files on the server hard disk instead, which allows you to convert files as large as LC can handle without any memory issues.

  • Error code 1450 - memory mapped file

    Hello,
    in my application I am using memory mapped files. I have three of them; the maximum size of the biggest one is 5MB. I store 64 waveforms from a DAQ card in it.
    The application runs fine, but sometimes an error occurs when I try to access the MMF. The error code is 1450, "insufficient system resources".
    Is a size of 5MB too big? Should I rather create one MMF for each waveform?

    Hi mitulatbati,
    which development tools are you actually using?
    Which platform, libraries and so on...?
    Can you post example code?
    Marco Brauner NIG 

  • Memory-mapped file is possible?

    Hi everyone, I'm a new LabVIEW user and I want to start a new project that uses a memory mapped file.
    I have working C# code to read the $gtr2$ MMF, where I simply use
    MemoryMappedFile.OpenExisting("$gtr2$")
    to get data from it.
    How is it possible to read this kind of file in LabVIEW? I can't find anything useful on the web.
    I'm using the LabVIEW 2013 student edition.
    Thanks to everyone who wants to answer my question.
    Have a nice day.

    Hi,
    I too only have done the CLAD…
    You have to look for .NET examples; you will find them here in the forum…
    And usually it helps to read the documentation for that MMF class to recreate your C# code in LabVIEW!
    Best regards,
    GerdW
    CLAD, using 2009SP1 + LV2011SP1 + LV2014SP1 on WinXP+Win7+cRIO
    Kudos are welcome

  • Memory mapped files: are they still used?

    To System  programmers.
    In some of my old code, David used memory mapped files for handling huge sets of random points. The code reads in the whole file and then sets flags, similar to an async process. The file mapping handles memory instead of using mallocs; the data may be stored on the heap or in the global stack. I went back to Visual Studio 6 and tried to take the code out, as standard C++ handles a full file read as a char buffer via a void * structure. I found some valloc data types and then found the newer file mapping routines in VS2013, plus an explanation of the global stack and heap.
    Are software developers still using file mapping, or are they using, say, vectors to form STL structures?
    Cheers
    John Keays

    Here is some typical code in the old C. This is close to the code I used in Visual Studio 6. I need to put this in VS2013 under C++ or C++11/14. I have guessed the file handle open and size code.
    #include <stdio.h>
    #include <stdlib.h>

    typedef struct { double x, y, z; } Point;   /* guessed record layout */

    int readAllFile(char *name, void **addr) {
        FILE *fh = fopen(name, "rb");
        fseek(fh, 0, SEEK_END);
        int fsize = (int) ftell(fh);
        rewind(fh);
        *addr = malloc(fsize);
        fread(*addr, 1, fsize, fh);   /* whole file into one heap allocation */
        fclose(fh);
        return fsize;
    }

    int main(void) {
        int fsize, numRecords, i;
        Point *allPoints;
        fsize = readAllFile("points.bin", (void **) &allPoints);
        numRecords = fsize / (int) sizeof(Point);
        for (i = 0; i < numRecords; i++)
            printf("rec %d values x %.3f\n", i, allPoints[i].x);
        free(allPoints);
        return 0;
    }
    This is the boilerplate for the file reads. I even tried this with text files, parsing the text records. Instead of the mallocs you suggest vectors, and the scheme of the code remains the same.
    For a lidar file, the xyz records have grown from 10,000 points in the 1990s to 1,000,000 points in the mid 2000s. For a file of that size, 24 MB are allocated in one hit. The whole of the Gold Coast in terms of lidar points in 2003 was 110 million points; it could be more.
    Where is the data stored with malloc, a vector, or a memory mapped file? What is good and bad practice?
    Cheers
    John Keays

  • Memory mapped files

    Does anyone know if there is any way to use memory mapped files in Java? If so, what are the calls that emulate the C++ calls CreateFileMapping(), MapViewOfFile(), and OpenFileMapping()?

    http://java.sun.com/j2se/1.4.1/docs/api/java/nio/MappedByteBuffer.html
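
    Roughly, FileChannel.map() plays the role of CreateFileMapping() plus MapViewOfFile(); there is no direct equivalent of OpenFileMapping() (named shared memory between unrelated processes) in the standard API. A minimal sketch, with an illustrative file name:

    import java.io.RandomAccessFile;
    import java.nio.MappedByteBuffer;
    import java.nio.channels.FileChannel;

    public class MapDemo {
        public static void main(String[] args) throws Exception {
            FileChannel ch = new RandomAccessFile("data.bin", "rw").getChannel();
            // map 1 KB; the file is extended if it is smaller than the mapped region
            MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_WRITE, 0, 1024);
            buf.putInt(0, 42);        // write through the mapping
            int v = buf.getInt(0);    // absolute read by byte position
            System.out.println(v);
            ch.close();
        }
    }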

  • How to truncate a memory mapped file

    If one maps a file, the mapped size becomes the file size, so the size parameter passed to the map() method of FileChannel should be carefully calculated. However, what if one can't decide the size of the file beforehand?
    I tried to use truncate(), but that throws a runtime exception: truncate() can't be used on a file with a user-mapped section open.
    public class MapFileSizeTest extends TestCase {
      public void testMapFileSize() throws Exception {
        final File file = new File("testMapFileSize.data");
        FileChannel configChannel = new RandomAccessFile(file, "rw").getChannel();
        // this will result a file with size 2,097,152kb
        MappedByteBuffer configBuffer = configChannel.map(FileChannel.MapMode.READ_WRITE,
            0, 1000000000);
        configBuffer.flip();
        configBuffer.force();
        // truncate can't be used on a file with a user-mapped section open
        // configChannel.truncate(configBuffer.limit());
        configChannel.close();
      }
    }
    Could somebody please give some suggestions? Thank you very much.

    The region (position/size) that you pass to the map method should be contained in the file. The spec includes this statement: "The behavior of this method when the requested region is not completely contained within this channel's file is unspecified." In the Sun implementation we attempt to extend the file if the requested region is not completely contained, but this is not required by the specification. Once you map a region you should not attempt to truncate the file, as it can lead to unspecified exceptions (see the MappedByteBuffer specification). Windows prevents it; other systems allow it but cause accesses to the now-inaccessible region to SIGSEGV (which must be handled and converted into a runtime error).
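
    Given that constraint, one common pattern (a workaround sketch, not from the reply above; the chunk size is an arbitrary choice) is to grow the file by remapping in fixed chunks and accept that the file ends up padded to the last chunk boundary, since the mapping cannot be truncated away while it may still be live:

    import java.io.RandomAccessFile;
    import java.nio.MappedByteBuffer;
    import java.nio.channels.FileChannel;

    public class GrowableMap {
        private static final long CHUNK = 64L * 1024 * 1024;   // grow in 64 MB steps
        private final FileChannel ch;
        private MappedByteBuffer buf;
        private long mappedSize = 0;

        public GrowableMap(String path) throws Exception {
            ch = new RandomAccessFile(path, "rw").getChannel();
            grow();
        }

        private void grow() throws Exception {
            mappedSize += CHUNK;
            // remapping a larger region extends the file; the old buffer
            // stays valid (and mapped) until it is garbage collected
            buf = ch.map(FileChannel.MapMode.READ_WRITE, 0, mappedSize);
        }

        public void put(long pos, byte b) throws Exception {
            while (pos >= mappedSize) grow();
            buf.put((int) pos, b);
        }
    }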
