Determining the max allowable buffer size

I am working on a program which works with very large amounts of data. Each job that is processed requires the program to read through as many as several hundred files, each of which has tens of thousands of values. The result of the operation is to take the data values in these files, rearrange them, and output the new data to a single file. The problem is with the new order of the data. The first value in each of the input files (all several hundred of them) must be output first, then the 2nd value from each file, and so on. At first I tried doing this all in memory, but for large datasets, I get OutOfMemoryErrors. I then rewrote the program to first output to a temporary binary file in the correct order. However, this is a lot slower. As it processes one input file, it must skip all around in the output file, and do this several hundred times in a 60+ MB file.
I could tell it to increase the heap size, but there is a limit to how much it will let me give it. I'd like to design it so that I can allocate as big a memory buffer as possible, read through all the files and put as much data into the buffer as it will hold, output that block of data to the output file, and then run through everything again for the next block.
So my question is, how do I determine the biggest possible byte buffer I can allocate, yet still have some memory left over for the other small allocations that will need to be done?

The program doesn't append; it has to write data to the binary file out of order, because the order in which it reads data from the input files is not the order in which it must be output. The first value from all of the input files has to be output first, followed by the second value from every input file, and so on. Unless it is going to have several hundred input files open at once, the only way to output in the correct order is to jump around the output file. For example, say there are 300 input files with double values. The program opens the first input file, gets the first value, and outputs it at position 0. Then it reads the second value and must write it to position 300 (or byte position 8*300, considering the size of a double). The third value must go to position 600, and so on. That is the problem: it must output the values in a different order than it reads them.
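To make the write pattern concrete, here is a minimal sketch (not the original program's code; the class name, method names, and file name are made up for the example) of the out-of-order writes using RandomAccessFile:

```java
import java.io.IOException;
import java.io.RandomAccessFile;

public class TransposeWriter {
    // Writes the k-th value of input file f to output slot (k * numFiles + f).
    static void writeValue(RandomAccessFile out, int fileIndex, int valueIndex,
                           int numFiles, double value) throws IOException {
        long pos = ((long) valueIndex * numFiles + fileIndex) * Double.BYTES;
        out.seek(pos);          // jump to the slot for this value
        out.writeDouble(value); // 8 bytes per double
    }

    public static void main(String[] args) throws IOException {
        int numFiles = 300;
        try (RandomAccessFile out = new RandomAccessFile("out.bin", "rw")) {
            // The first two values of input file 0 land at byte 0 and byte 8*300.
            writeValue(out, 0, 0, numFiles, 1.0);
            writeValue(out, 0, 1, numFiles, 2.0);
        }
    }
}
```

Each value from one input file forces a seek of 8*300 bytes, which is exactly the scattered-write behavior that makes the temp-file approach slow.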
Without giving any VM parameters to increase the heap size, Java will let me create a byte buffer of 32MB in size, but not much more. If the data to be output is more than 32MB, I will need to work on the data in 32MB chunks. However, if the user passes VM parameters to increase the heap size, the program could create a larger buffer to work on bigger chunks and get better performance. So somehow I need to find out what a reasonable buffer size is to get the best performance.
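One way to pick the chunk size at runtime, rather than hard-coding 32 MB, is to ask the JVM how much heap it will ever get via Runtime.maxMemory() and reserve only a fraction of it for the buffer. A rough sketch; the 0.5 fraction and the 16 MB floor are illustrative guesses, not tuned values:

```java
public class BufferSizer {
    // Choose a byte-buffer size from the heap limit in effect (-Xmx or the default),
    // leaving headroom for the program's other, smaller allocations.
    static int chooseBufferSize() {
        Runtime rt = Runtime.getRuntime();
        long used = rt.totalMemory() - rt.freeMemory();
        long available = rt.maxMemory() - used;          // heap not yet claimed
        long target = (long) (available * 0.5);          // keep half as headroom (a guess)
        long capped = Math.min(target, Integer.MAX_VALUE); // a byte[] tops out near 2 GB
        return (int) Math.max(capped, 16L * 1024 * 1024);  // illustrative 16 MB floor
    }

    public static void main(String[] args) {
        System.out.println("buffer bytes: " + chooseBufferSize());
    }
}
```

With default heap settings this yields a modest buffer; if the user passes -Xmx to enlarge the heap, maxMemory() grows and the buffer scales up automatically.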

Similar Messages

  • How to determine the maximum allowable length of a filename for Windows?

    Hi all,
    Could I know how to determine the allowable file length (the length of the absolute path) for a file in a Windows environment?
    For some reason, I generated a zip file with a very long filename (> 170) and put it in a folder (the length of the folder path is around 90). The length of the absolute path is around 260.
    I used FileOutputStream with ZipOutputStream to write out the zip file. Everything worked fine while I was generating the zip file.
    However, when I try to extract some files from the zip file I just created, I encounter the error
    java.util.zip.ZipException The filename is too long.
    I am using the class ZipFile to extract the files from the zip file like the following
    String absPath = "A very long filepath which exceed 260";
    ZipFile zipF = new ZipFile(absPath);  // <-- here is the root cause
    Is it possible to pre-determine the maximum allowable filepath length before I generate the zip file? This is weird, since I got no error while creating the zip file, but have a problem extracting it ...
    Thanks

    Assuming you could determine the max, what would you do about it? I'd say you should just assume it will be successful, but accommodate (handle) the possible exception gracefully. Either way you're going to have to handle it as an "exception", whether you "catch" an actual "Exception" object and deal with that, or manually deal with the length exceeding the max.
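As a sketch of what the reply suggests (handle the failure gracefully rather than pre-compute a limit), one might wrap the ZipFile open in a try/catch; the helper name and path below are placeholders, not code from the thread:

```java
import java.io.IOException;
import java.util.zip.ZipException;
import java.util.zip.ZipFile;

public class ZipOpener {
    // Attempt to open the archive; treat an over-long path (or any other
    // open failure) as a recoverable condition instead of crashing.
    static boolean tryOpen(String absPath) {
        try (ZipFile zipF = new ZipFile(absPath)) {
            return true;
        } catch (ZipException e) {
            System.err.println("Cannot open archive (path too long?): " + e.getMessage());
            return false;
        } catch (IOException e) {
            System.err.println("I/O error opening archive: " + e.getMessage());
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(tryOpen("no-such-file.zip")); // prints false
    }
}
```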

  • What is the max mp4 file size?

    Does anyone know what the max mp4 file size that can be imported into iTunes for sync with ATV is?

    RichSolNuv wrote:
    I got HandBrake to work. What settings do you recommend for ATV? When I used the Apple TV setting it worked pretty quickly and I could view the file in iTunes, but it would not import into ATV. I got a message saying it was not playable on ATV. The "Normal" setting (h.264) seems to work well. I did a few minutes of a movie and it looks good and works with ATV, but it seems to take a lot longer (hours).
    With HandBrake 0.9.1, always select "Apple TV" under the "Presets" column on the right side of the window. When you say it is "not playable on ATV", do you mean by streaming or syncing or both?

  • The request exceeds the maximum allowed database size of 4 GB

    I have created a user in Oracle 10g with the following commands:
    CREATE USER USERNAME IDENTIFIED BY PASSWORD
    DEFAULT TABLESPACE users TEMPORARY TABLESPACE temp;
    Grant create session to USERNAME;
    Grant create table to USERNAME;
    Grant create view to USERNAME;
    Grant create trigger to USERNAME;
    Grant create procedure to USERNAME;
    Grant create sequence to USERNAME;
    grant create synonym to USERNAME;
    After that, when I try to create a table, I get the following error:
    SQL Error: ORA-00604: error occurred at recursive SQL level 1
    ORA-12952: The request exceeds the maximum allowed database size of 4 GB
    00604. 00000 - "error occurred at recursive SQL level %s"

    Error starting at line 1 in command:
    SELECT /* + RULE */ df.tablespace_name "Tablespace", df.bytes / (1024 * 1024) "Size (MB)", SUM(fs.bytes) / (1024 * 1024) "Free (MB)", Nvl(Round(SUM(fs.bytes) * 100 / df.bytes),1) "% Free", Round((df.bytes - SUM(fs.bytes)) * 100 / df.bytes) "% Used" FROM dba_free_space fs, (SELECT tablespace_name,SUM(bytes) bytes FROM dba_data_files GROUP BY tablespace_name) df WHERE fs.tablespace_name = df.tablespace_name GROUP BY df.tablespace_name,df.bytes
    UNION ALL
    SELECT /* + RULE */ df.tablespace_name tspace, fs.bytes / (1024 * 1024), SUM(df.bytes_free) / (1024 * 1024), Nvl(Round((SUM(fs.bytes) - df.bytes_used) * 100 / fs.bytes), 1), Round((SUM(fs.bytes) - df.bytes_free) * 100 / fs.bytes) FROM dba_temp_files fs, (SELECT tablespace_name,bytes_free,bytes_used FROM v$temp_space_header GROUP BY tablespace_name,bytes_free,bytes_used) df WHERE fs.tablespace_name = df.tablespace_name GROUP BY df.tablespace_name,fs.bytes,df.bytes_free,df.bytes_used ORDER BY 4 DESC
    Error at Command Line:1 Column:319
    Error report:
    SQL Error: ORA-00942: table or view does not exist
    00942. 00000 - "table or view does not exist"
    *Cause:
    *Action:

  • When making changes to the I/O buffer size (MOTU PCI-424), it takes an abnormal length of time for the changes to take effect. Any ideas?


    A fully loaded one (keep in mind I have had fully loaded projects in the past, and it did not take this long). This was noticed after installing LP8; also major latency, even at a buffer size of 64.

  • How do you determine ip and op buffer size on a 3550-12G

    I have a Cisco 3550-12G switch and I want to check to see if the input buffers and the output buffers for port gi0/12 are the same size. Is there a simple way to do this, I tried using the show buffers command but I couldn't seem to find what I was looking for. Help!

    Hi,
    "The 3550 switch uses central buffering. This means that there are no fixed buffer sizes per port. However, there is a fixed number of packets on a Gigabit port that can be queued. This fixed number is 4096. By default, each queue in a Gigabit port can have up to 1024 packets, regardless of the packet size."
    http://www.cisco.com/warp/public/473/187.html#topic7
    HTH,
    Bobby
    *Please rate helpful posts.

  • What is the Max import file size for Aperture

    I'll sometimes shoot film and get very large scans done as TIFFs or JPEGs, 500 MB+. What is the max that Aperture can manage, or is it only determined by the Mac configuration?

    I'm curious to know this as well before upgrading. Hopefully it has been improved in Aperture 3. I know Aperture 2 only supported up to 250MB which was near useless if you'd ever made any significant adjustments and edits in Ps and saved it as a .psd.
    All my medium format (digital and film scans) and large format scans give me the maroon "Unsupported Image Format" error while trying to view in Aperture 2.

  • In C4: What determines the max number of slides one can successfully publish?

    Using C4.
    I am required by policy to use an original PowerPoint file. The total number of slides equals 72 (file size 38,201 KB when completed). The internal customer wants a "kicky" presentation based on her slides, so I must use some of the Captivate tools (e.g. zoom area, highlight box -- about the limit of my skills). And there must be spoken narration, not the text-to-speech feature. Total time for the 72 slides is 22.5 minutes when all is done. When published, the narration is good, but the "kicky" part fails. Specifically, some highlight boxes do not appear on many slides.
    I deleted the table-of-contents feature. The most unusual tools that I'm using are the certificate widget, and a page turning widget.
    I tried breaking the presentation into two parts. Part 1 was 58 slides and I experienced the same behavior. I've not tried part two yet.
    In the past, I practiced, on my computer, with the aggregator. Is this the best route? We have no experience hosting an aggregator project on our server. Are there any special concerns about doing so? (I'm not the person who does this. He is out of the office at this time). Would daisy-chaining be a better route? (if so, what is the best source for instruction to do this?)

    Again, I did not see the elephant in the room. I neglected to notice a default setting. Once corrected, everything works. My bad.
    But I would still like to know what determines the maximum number of slides one can successfully publish without Aggregator or daisy-chaining.

  • What determines the client connection memory size?

    We are trying to scale up the number of connections on our db - (dedicated not shared) .. but quickly consume the box.
    Its 11gr1 - Linux .. 500G of memory ..
    The "only" parameter we have set is:
    *.memory_target=216522555392  (~200G)
    Processes is set to 6000, but we are only around 1800 at this point.
    We are seeing (via top) client connections with a reserved memory of 25-30g (usually the dbwr type processes) and client connections showing 5-10G in size.
    With clients taking this much memory,  we start to see swapping on the box. In our dev/qa environment the clients are in the MB range .. of course they are not seeing real world traffic so I presume that memory requirements are growing as the app runs.
    Can we set something to reduce the footprint of the client connections?
    Thanks for any tips..
    If we get one db bounce this year .. we want to be right .. cant guess here..
    Daryl

    DarylE. wrote:
    >Can we set something to reduce the footprint of the client connections?
    no
    Since *NIX maps SGA into every client's process, displayed RAM size is distorted.
    If you simply SUM every reported client size, it will greatly exceed total RAM (in most cases)
    The fact that any swap is used is in itself not a negative indicator.
    run vmstat like below
    all is OK when (si + so) is less than (bi + bo)
    [oracle@localhost dbs]$ vmstat 10 6
    procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
    r  b   swpd   free   buff     cache     si   so    bi    bo   in   cs us sy id wa st
    0  0     32  82684  20384 557096    0    0   120    32 1008 1057  3  7 88  1  0
    0  0     32  82560  20408 557124    0    0     0    20 1031 1225  2  4 93  0  0
    0  0     32  82560  20432 557116    0    0     0    16 1002 1183  2  6 92  0  0
    0  0     32  79212  20456 557144    0    0     0    74 1007 1185  4 12 84  0  0
    0  0     32  78592  20480 557148    0    0     0    21  999  998  2  5 92  0  0
    0  0     32  78592  20504 557140    0    0     0    20 1002  929  2  7 91  0  0
    [oracle@localhost dbs]$

  • What is the max hard drive size in an early 2008 MacBook?

    I've got an early 2008 black MacBook. I want to upgrade from a 250 GB hard drive to either a 750 GB 7200 rpm drive or a 1 TB 5400 rpm drive. Will either of these work in there? I don't know if there is a maximum hard drive size/rpm that I can put in there. Thank you for your help!

    I know these posts were a while ago, but I wanted to check in on your 2008 Macbook. I have the same model and want to do the hard drive update. Is your machine still up and running, or have you moved on?
    Thank you!

  • Doing Buffered Event count by using Count Buffered Edges.vi, what is the max buffer size allowed?

    I'm currently using Count Buffered Edges.vi to do a Buffered Event count with the following settings:
    Source: internal timebase, 100 kHz, 10 µs for each count
    Gate: using the function generator to send in a 50 Hz signal (for testing purposes only), period of 0.02 s
    The max internal buffer size that I can allocate is only about 100~300. Whenever I change both the internal buffer size and counts to read to a higher value, this VI doesn't seem to function well. I need to have a buffer size of at least 2000.
    1. Is it possible to have a buffer size of 2000? What is the problem causing the wrong counter value?
    2. Also note that the size of the max internal buffer varies with the frequency of the signal sent to the gate. Why is this so? E.g., the buffer size gets smaller as the frequency decreases.
    3. I get a funny response and counter value when the internal buffer size and counts to read are not set to the same value. Why is this so? Is it a must to set both values the same?
    thks and best regards
    lyn

    Hi,
    I have tried the same example, and used a 100Hz signal on the gate. I increased the buffer size to 2000 and I did not get any errors. The buffer size does not get smaller when increasing the frequency of the gate signal; simply, the number of counts gets smaller when the gate frequency becomes larger. The buffer size must be able to contain the number of counts you want to read, otherwise, the VI might not function correctly.
    Regards,
    RamziH.

  • Determine the current max shared memory size

    Hello,
    How do I determine the max shared memory value in effect on the box I have? I don't wish to trust /etc/system, as that value may not be the effective value unless the system has been rebooted after that setting.
    Any pointers would be appreciated.
    Thanks
    Mahesh

    You can get the current values using adb:
    # adb -k
    physmem 3e113
    shminfo$<shminfo
    shminfo:
    shminfo: shmmax shmmin shmmni
    10000000 c8 c8
    shminfo+0x14: shmseg
    c8
    The shminfo structure holds the values being used and the shminfo
    macro will print out the values in hex.
    Alan
    Sun Developer Technical Support
    http://www.sun.com/developers/support

  • Linux Serial NI-VISA - Can the buffer size be changed from 4096?

    I am communicating with a serial device on Linux, using LV 7.0 and NI-VISA. About a year and a half ago I asked customer support if it was possible to change the buffer size for serial communication. At that time I was using NI-VISA 3.0. In my program the VISA function for setting the buffer size would send back an error of 1073676424, and the buffer would always remain at 4096, no matter what value was input into the buffer size control. The answer to this problem was that the error code was just a warning, letting you know that you could not change the buffer size on a Linux machine, and 4096 bytes was the pre-set buffer size (unchangeable). According to the person who was helping me: "The reason that it doesn't work on those platforms (Linux, Solaris, Mac OS X) is that it is simply unavailable in the POSIX serial API that VISA uses on these operating systems."
    Now I have upgraded to NI-VISA 3.4 and I am asking the same question. I notice that an error code is no longer sent when I input different values for the buffer size. However, in my program, the bytes returned from the device max out at 4096, no matter what value I input into the buffer size control. So, has VISA changed, and it is now possible to change the buffer size, but I am setting it up wrong? Or, have the error codes changed, but it is still not possible to change the buffer size on a Linux machine with NI-VISA?
    Thanks,
    Sam

    The buffer size still can't be set, but it seems that we are no longer returning the warning. We'll see if we can get the warning back for the next version of VISA.
    Thanks,
    Josh

  • What is the max Captivate 5 project size?

    What is the max allowed PowerPoint presentation size (including audio files and animations), if one wants to import it into Captivate 5 and then publish it seamlessly?

    Hi Rick
    Thanks for the info. This is a big problem, especially when you import from PowerPoint (the one I have is 160 slides, complete with 450 MB worth of audio files!!).
    I also noticed that Captivate uses a lot of RAM during import (at least 2-3 GB).
    Any suggestions on how to publish this project, or do you know anyone who could help me with this? The alternative is to return Captivate and get my $900 back.
    Cheers
    Sal

  • How to define the limit of the max heap size?

    Hi All,
    I would like to know what should be the limit of the JVM max heap size.
    What will happen if we will not define it?
    What is the purpose of defining it from the technical point of view?
    Thanks
    Edited by: Anna78 on Jul 31, 2008 12:36 PM

    Defining a max heap space too large can have the following effect: if you create new objects, the VM may decide it is not worth getting rid of garbage-collectable ones, as there is still plenty of space between the current heap size and the max allowed. The result will be that the application will run faster and will consume more memory than it really needs.
    If the heap size is too small, but still sufficient, the application will do a lot of garbage collection and therefore run slower. On the other hand, it will stay inside the tight space it has been allowed to use.
    The speed difference may or may not be noticeable, while the difference between 256M and 512M may or may not matter on today's computers.
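To see the limit actually in effect for a given run (whether the JVM's default or a value set with -Xmx), one can ask the JVM itself; a small sketch using the standard MemoryMXBean:

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.MemoryUsage;

public class HeapReport {
    public static void main(String[] args) {
        MemoryMXBean mem = ManagementFactory.getMemoryMXBean();
        MemoryUsage heap = mem.getHeapMemoryUsage();
        long max = heap.getMax(); // -1 if the limit is undefined
        System.out.printf("used=%d MB committed=%d MB max=%d MB%n",
                heap.getUsed() >> 20, heap.getCommitted() >> 20, max >> 20);
    }
}
```

Running this with and without -Xmx512m shows how the max value, and with it the GC behavior described above, changes.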
