How to deal with a large amount of data

Hi folks,
I have a web page that needs three maps to serve it. Each map will hold about 150,000 entries, and together they will use about 100 MB. For some users with lots of data (e.g. 1,000,000 entries), it may use up to 1 GB of memory. If a few of these high-volume users log in at the same time, it can bring the server down. The data comes from files; I cannot read it on demand because that would be too slow. Loading the data into maps gives me very good performance, but it does not scale. I am thinking of serializing the maps and deserializing them when I need them. Is that my only option?
Thanks in advance!

JoachimSauer wrote:
I don't buy the "too slow" argument.
I'd put the data into a database. They are built to handle that amount of data.
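As a rough illustration of that suggestion, here is a minimal sketch using an embedded database (H2 is assumed purely as an example; any JDBC-accessible database would do, and the table/column names are made up). The idea is to load the files into an indexed table once, then look entries up on demand instead of holding three full maps per user in memory:

    import java.sql.*;

    public class EntryStore {
        private final Connection conn;

        public EntryStore(String jdbcUrl) throws SQLException {
            // e.g. "jdbc:h2:./entries" for a file-backed embedded H2 database (assumption)
            conn = DriverManager.getConnection(jdbcUrl);
            try (Statement st = conn.createStatement()) {
                st.execute("CREATE TABLE IF NOT EXISTS entry (" +
                        "map_id INT, entry_key VARCHAR(255), entry_value VARCHAR(4000), " +
                        "PRIMARY KEY (map_id, entry_key))");
            }
        }

        // One indexed lookup replaces map.get(key); only the requested row is loaded.
        public String lookup(int mapId, String key) throws SQLException {
            try (PreparedStatement ps = conn.prepareStatement(
                    "SELECT entry_value FROM entry WHERE map_id = ? AND entry_key = ?")) {
                ps.setInt(1, mapId);
                ps.setString(2, key);
                try (ResultSet rs = ps.executeQuery()) {
                    return rs.next() ? rs.getString(1) : null;
                }
            }
        }
    }

Memory use then stays roughly constant no matter how many high-volume users are logged in, at the cost of a disk read per lookup.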

Similar Messages

  • Dealing with large amounts of data

    Hi
    I am new to using Flex and BlazeDS. I can see in the FAQ that binary data transfer from server to Flex app is more efficient. My question is: is there a way to build a Flex databound control (e.g. a datagrid) which binds to a SQL query, web service or remoting call on the server side and then displays an effectively unlimited amount of data as the user pages or scrolls? Or does the developer have to write code from scratch to handle paging, or to handle an infinite scrollbar by asking the server for chunks of data at a time?

    You have to write your own paginating grid. It's easy to do: just make a canvas, throw a grid and some buttons on it, and when the user clicks to the next page, make a request to the server and, when you have the result, set it as the new data model for the grid.
    I would discourage returning an unbounded amount of data to the user; provide search functionality plus pagination (a rough server-side sketch follows below).
    Hope that helps.
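    For what it's worth, the server side of such a paginating grid usually boils down to fetching one page per request. A minimal Java/JDBC sketch follows; the table, columns and LIMIT/OFFSET syntax are illustrative assumptions, and a BlazeDS remoting destination could expose a method like this to the Flex client:

        import java.sql.*;
        import java.util.*;

        public class PageDao {
            // Returns one page of rows; the grid asks for the next page on a button click.
            public List<String[]> fetchPage(Connection conn, int pageNumber, int pageSize)
                    throws SQLException {
                String sql = "SELECT id, name, price FROM products ORDER BY id LIMIT ? OFFSET ?";
                try (PreparedStatement ps = conn.prepareStatement(sql)) {
                    ps.setInt(1, pageSize);
                    ps.setInt(2, pageNumber * pageSize);
                    try (ResultSet rs = ps.executeQuery()) {
                        List<String[]> page = new ArrayList<>();
                        while (rs.next()) {
                            page.add(new String[] { rs.getString("id"),
                                    rs.getString("name"), rs.getString("price") });
                        }
                        return page;
                    }
                }
            }
        }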

  • How do I pause an iCloud restore for app with large amounts of data?

    I am using an iPhone app which holds 10 GB of data (media files).
    Unfortunately, although all the data was backed up, my iPhone 4 was faulty and needed to be replaced with a new handset. On restore, the 10 GB of data takes a very long time to restore over Wi-Fi. If interrupted (I reached the halfway point during the night) to go to work or take the dog for a walk, I end up, of course, on 3G for a short period of time.
    The next time I am in a Wi-Fi zone, the app starts restoring again right from the beginning.
    How does anyone restore an app with large amounts of data, or pause a restore?

    You can use classifications but there is no auto feature to archive like that on web apps.
    In terms of the blog, like I have said to everyone who has posted about blog preview images:
    http://www.prettypollution.com.au/business-catalyst-blog
    Just one example of an image at the start of the blog post rendering out, not hard at all.

  • Dealing with large volumes of data

    Background:
    I recently "inherited" support for our company's "data mining" group, which amounts to a number of semi-technical people who have received introductory level training in writing SQL queries and been turned loose with SQL Server Management
    Studio to develop and run queries to "mine" several databases that have been created for their use.  The database design (if you can call it that) is absolutely horrible.  All of the data, which we receive at defined intervals from our
    clients, is typically dumped into a single table consisting of 200+ varchar(x) fields.  There are no indexes or primary keys on the tables in these databases, and the tables in each database contain several hundred million rows (for example one table
    contains 650 million rows of data and takes up a little over 1 TB of disk space, and we receive weekly feeds from our client which adds another 300,000 rows of data).
    Needless to say, query performance is terrible, since every query ends up being a table scan of 650 million rows of data.  I have been asked to "fix" the problems.
    My experience is primarily in applications development.  I know enough about SQL Server to perform some basic performance tuning and write reasonably efficient queries; however, I'm not accustomed to having to completely overhaul such a poor design
    with such a large volume of data.  We have already tried to add an identity column and set it up as a primary key, but the server ran out of disk space while trying to implement the change.
    I'm looking for any recommendations on how best to implement changes to the table(s) housing such a large volume of data.  In the short term, I'm going to need to be able to perform a certain amount of data analysis so I can determine the proper data
    types for fields (and whether any existing data would cause a problem when trying to convert the data to the new data type), so I'll need to know what can be done to make it possible to perform such analysis without the process consuming entire days to analyze
    the data in one or two fields.
    I'm looking for reference materials / information on how to deal with these issues, particularly when a large volume of data is involved.  I'm also looking for information on how to load large volumes of data into the database (current processing of a typical
    data file takes 10-12 hours to load 300,000 records).  Any guidance that can be provided is appreciated.  If more specific information is needed, I'll be happy to try to answer any questions you might have about my situation.

    I don't think you will find a single magic bullet to solve all the issues.  The main point is that there will be no shortcut for major schema and index changes.  You will need at least 120% free space to create a clustered index and facilitate
    major schema changes.
    I suggest an incremental approach to address your biggest pain points.  You mention it takes 10-12 hours to load 300,000 rows, which suggests there may be queries involved in the process that require full scans of the 650 million row table.  Perhaps
    some indexes targeted at improving that process are a good first step.
    What SQL Server version and edition are you using?  You'll have more options with Enterprise (partitioning, row/page compression). 
    Regarding the data types, I would take a best guess at the proper types and run a query with TRY_CONVERT (assuming SQL 2012) to determine counts of rows that conform or not for each column.  Then create a new table (using SELECT INTO) that has strongly
    typed columns for those columns that are not problematic, plus the others that cannot easily be converted, and then drop the old table and rename the new one. You can follow up later to address column data corrections and/or transformations.
    Dan Guzman, SQL Server MVP, http://www.dbdelta.com
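    As a concrete illustration of the TRY_CONVERT suggestion, here is a sketch run via JDBC; the table and column names are placeholders, and TRY_CONVERT requires SQL Server 2012 or later:

        import java.sql.*;

        public class TypeProfiler {
            // Counts how many non-NULL values in a varchar column would fail conversion
            // to a candidate type (int here). dbo.BigTable/SomeColumn are placeholders.
            public static long countNonConforming(Connection conn) throws SQLException {
                String sql = "SELECT COUNT(*) FROM dbo.BigTable " +
                        "WHERE SomeColumn IS NOT NULL " +
                        "AND TRY_CONVERT(int, SomeColumn) IS NULL";
                try (Statement st = conn.createStatement();
                     ResultSet rs = st.executeQuery(sql)) {
                    rs.next();
                    return rs.getLong(1);
                }
            }
        }

    A count of zero means the column can safely become an int; a non-zero count tells you how many rows need cleanup first.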

  • How can I copy large amount of data between two HD ?

    Hello !
    Which command could I use to copy a large amount of data between two hard disk drives?
    How does Lion identify the disk drives when you want to write a script? For example, in Windows I use
    Robocopy D:\folder source\Files E:\folder destination
    I just want to copy files, and if the files/folders already exist in the destination they should be overwritten.
    Help please, I bought my first Mac 4 days ago.
    Thanks !

    Select the files/folders on one HD and drag & drop onto the other HD. The copied ones will overwrite anything with the same names.
    Since you're a newcomer to the Mac, see these:
    Switching from Windows to Mac OS X,
    Basic Tutorials on using a Mac,
    Mac 101: Mac Essentials,
    Mac OS X keyboard shortcuts,
    Anatomy of a Mac,
    MacTips,
    Switching to Mac Superguide, and
    Switching to the Mac: The Missing Manual,
    Snow Leopard Edition.
    Additionally, *Texas Mac Man* recommends:
    Quick Assist,
    Welcome to the Switch To A Mac Guides,
    Take Control E-books, and
    A guide for switching to a Mac.

  • Error in Generating reports with large amount of data using OBIR

    Hi all,
    we have integrated OBIR (Oracle BI Reporting) with OIM (Oracle Identity Management) to generate custom reports. Some of the custom reports contain a large amount of data (approx. 80-90K rows with 7-8 columns), and the queries for these reports mainly use the audit tables and the resource form tables. Now when we try to generate the report, it works fine with HTML, where the report is generated directly on the console, but when we try to generate the same report and save it as PDF or Excel it fails with the following error.
    [120509_133712190][][STATEMENT] Generating page [1314]
    [120509_133712193][][STATEMENT] Phase2 time used: 3ms
    [120509_133712193][][STATEMENT] Total time used: 41269ms for processing XSL-FO
    [120509_133712846][oracle.apps.xdo.common.font.FontFactory][STATEMENT] type1.Helvetica closed.
    [120509_133712846][oracle.apps.xdo.common.font.FontFactory][STATEMENT] type1.Times-Roman closed.
    [120509_133712848][][PROCEDURE] FO+Gen time used: 41924 msecs
    [120509_133712848][oracle.apps.xdo.template.FOProcessor][STATEMENT] clearInputs(Object) is called.
    [120509_133712850][oracle.apps.xdo.template.FOProcessor][STATEMENT] clearInputs(Object) done. All inputs are cleared.
    [120509_133712850][oracle.apps.xdo.template.FOProcessor][STATEMENT] End Memory: max=496MB, total=496MB, free=121MB
    [120509_133818606][][EXCEPTION] java.net.SocketException: Socket closed
    at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:99)
    at java.net.SocketOutputStream.write(SocketOutputStream.java:136)
    at weblogic.servlet.internal.ChunkOutput.writeChunkTransfer(ChunkOutput.java:525)
    at weblogic.servlet.internal.ChunkOutput.writeChunks(ChunkOutput.java:504)
    at weblogic.servlet.internal.ChunkOutput.flush(ChunkOutput.java:382)
    at weblogic.servlet.internal.ChunkOutput.checkForFlush(ChunkOutput.java:469)
    at weblogic.servlet.internal.ChunkOutput.write(ChunkOutput.java:304)
    at weblogic.servlet.internal.ChunkOutputWrapper.write(ChunkOutputWrapper.java:139)
    at weblogic.servlet.internal.ServletOutputStreamImpl.write(ServletOutputStreamImpl.java:169)
    at java.io.BufferedOutputStream.write(BufferedOutputStream.java:105)
    at oracle.apps.xdo.servlet.util.IOUtil.readWrite(IOUtil.java:47)
    at oracle.apps.xdo.servlet.CoreProcessor.process(CoreProcessor.java:280)
    at oracle.apps.xdo.servlet.CoreProcessor.generateDocument(CoreProcessor.java:82)
    at oracle.apps.xdo.servlet.ReportImpl.renderBodyHTTP(ReportImpl.java:562)
    at oracle.apps.xdo.servlet.ReportImpl.renderReportBodyHTTP(ReportImpl.java:265)
    at oracle.apps.xdo.servlet.XDOServlet.writeReport(XDOServlet.java:270)
    at oracle.apps.xdo.servlet.XDOServlet.writeReport(XDOServlet.java:250)
    at oracle.apps.xdo.servlet.XDOServlet.doGet(XDOServlet.java:178)
    at oracle.apps.xdo.servlet.XDOServlet.doPost(XDOServlet.java:201)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:727)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
    at weblogic.servlet.internal.StubSecurityHelper$ServletServiceAction.run(StubSecurityHelper.java:227)
    at weblogic.servlet.internal.StubSecurityHelper.invokeServlet(StubSecurityHelper.java:125)
    at weblogic.servlet.internal.ServletStubImpl.execute(ServletStubImpl.java:292)
    at weblogic.servlet.internal.TailFilter.doFilter(TailFilter.java:26)
    at weblogic.servlet.internal.FilterChainImpl.doFilter(FilterChainImpl.java:42)
    at oracle.apps.xdo.servlet.security.SecurityFilter.doFilter(SecurityFilter.java:97)
    at weblogic.servlet.internal.FilterChainImpl.doFilter(FilterChainImpl.java:42)
    at weblogic.servlet.internal.WebAppServletContext$ServletInvocationAction.run(WebAppServletContext.java:3496)
    at weblogic.security.acl.internal.AuthenticatedSubject.doAs(AuthenticatedSubject.java:321)
    at weblogic.security.service.SecurityManager.runAs(Unknown Source)
    at weblogic.servlet.internal.WebAppServletContext.securedExecute(WebAppServletContext.java:2180)
    at weblogic.servlet.internal.WebAppServletContext.execute(WebAppServletContext.java:2086)
    at weblogic.servlet.internal.ServletRequestImpl.run(ServletRequestImpl.java:1406)
    at weblogic.work.ExecuteThread.execute(ExecuteThread.java:201)
    at weblogic.work.ExecuteThread.run(ExecuteThread.java:173)
    It seems we hit this issue when the query processing takes a long time. Do I need to perform any additional configuration to generate such reports?

    java.net.SocketException: Socket closed
         at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:99)
         at java.net.SocketOutputStream.write(SocketOutputStream.java:136)
         at weblogic.servlet.internal.ChunkOutput.writeChunkTransfer(ChunkOutput.java:525)
         at weblogic.servlet.internal.ChunkOutput.writeChunks(ChunkOutput.java:504)
         at weblogic.servlet.internal.ChunkOutput.flush(ChunkOutput.java:382)
         at weblogic.servlet.internal.CharsetChunkOutput.flush(CharsetChunkOutput.java:249)
         at weblogic.servlet.internal.ChunkOutput.checkForFlush(ChunkOutput.java:469)
         at weblogic.servlet.internal.CharsetChunkOutput.implWrite(CharsetChunkOutput.java:396)
         at weblogic.servlet.internal.CharsetChunkOutput.write(CharsetChunkOutput.java:198)
         at weblogic.servlet.internal.ChunkOutputWrapper.write(ChunkOutputWrapper.java:139)
         at weblogic.servlet.internal.ServletOutputStreamImpl.write(ServletOutputStreamImpl.java:169)
         at com.tej.systemi.util.AroundData.copyStream(AroundData.java:311)
         at com.tej.systemi.client.servlet.servant.Newdownloadsingle.producePageData(Newdownloadsingle.java:108)
         at com.tej.systemi.client.servlet.servant.BaseViewController.serve(BaseViewController.java:542)
         at com.tej.systemi.client.servlet.FrontController.doRequest(FrontController.java:226)
         at com.tej.systemi.client.servlet.FrontController.doPost(FrontController.java:128)
         at javax.servlet.http.HttpServlet.service(HttpServlet.java:727)
         at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
         at weblogic.servlet.internal.StubSecurityHelper$ServletServiceAction.run(StubSecurityHelper.java:227)
         at weblogic.servlet.internal.StubSecurityHelper.invokeServlet(StubSecurityHelper.java:125)
         at weblogic.servlet.internal.ServletStubImpl.execute(ServletStubImpl.java:292)
         at weblogic.servlet.internal.ServletStubImpl.execute(ServletStubImpl.java:175)
         at weblogic.servlet.internal.WebAppServletContext$ServletInvocationAction.run(WebAppServletContext.java:3498)
         at weblogic.security.acl.internal.AuthenticatedSubject.doAs(AuthenticatedSubject.java:321)
         at weblogic.security.service.SecurityManager.runAs(Unknown Source)
         at weblogic.servlet.internal.WebAppServletContext.securedExecute(WebAppServletContext.java:2180)
         at weblogic.servlet.internal.WebAppServletContext.execute(WebAppServletContext.java:2086)
         at weblogic.servlet.internal.ServletRequestImpl.run(ServletRequestImpl.java:1406)
         at weblogic.work.ExecuteThread.execute(ExecuteThread.java:201)
         at weblogic.work.ExecuteThread.run(ExecuteThread.java:17
    (Please help us find a solution to this issue; it is in production and we need it ASAP.)
    Thanks in Advance
    Edited by: 909601 on Jan 23, 2012 2:05 AM

  • How can I edit a large amount of data using Acrobat X Pro

    Hello all,
    I need to edit a catalog that contains a large amount of data - mainly product prices. Currently I can only export the document into an Excel file and then paste the new prices into the catalog using Acrobat X Pro one by one, which is extremely time-consuming. I am sure there's a better way to make this faster while keeping the data accurate. Thanks a lot in advance if anyone's able to help!

    Hi Chauhan,
    Yes, I am able to edit text/images via the toolbox, but the thing is the catalog contains more than 20,000 price entries, and all I can do is delete the original price info from the catalog and replace it with the revised data from Excel. Repeating this process over 20,000 times would be a waste of time and manpower... Not sure if I've made my situation clear enough? Pls just ask away, I really hope to sort it out. Thanks!

  • Working with large amount of data

    Hi! I am writing a peer-to-peer video player and I need to operate on huge amounts of data. The downloaded data should be stored in memory for sharing with other peers. Some video files can be 2 GB or more, so keeping all this data in RAM is not the best solution I think =)
    Since the Flash player does not have access to the file system, I cannot save this data in temporary files.
    Is there a solution to this problem?

    No ideas? very sad ((

  • NI Reports crashes with large amounts of data

    I'm using LabVIEW 6.02 and am using some of the NI Reports tools to generate a report. I create what can be a large data array, which is then sent to the "append table to report" VI and printed. The program works fine, although it is slow, as long as the data that I send to the "append table to report" VI is less than about 30 kB. If the amount of data is too large, LabVIEW terminates execution with no error code or message displayed. Has anyone else had a similar problem? Does anyone know what is going on, or better yet, how to fix it?

    Hello,
    I was able to print a 100x100 element array of 5-character strings (~50 kB of data) without receiving a crash or error message. However, it did take a LONG time...about 15 minutes for the VI to run, and another 10 minutes for the printer to start printing. This makes sense, because 100x100 elements is a gigantic amount of data to send into the NI-Reports ActiveX object that is used for printing. You may want to consider breaking up your data into smaller arrays and printing those individually, instead of trying to print the giant array at once.
    I hope these suggestions help you out. Good luck with your application, and have a pleasant day.
    Sincerely,
    Darren N.
    NI Applications Engineer
    Darren Nattinger, CLA
    LabVIEW Artisan and Nugget Penman

  • Putting different tables with large amounts of data together

    Hi,
    I need to put different kinds of tables together:
    Example 'table1' with columns: DATE, IP, TYPEN, X1, X2, X3
    To 'table0' with columns DATE, IP, TYPENUMBER.
    TYPEN in table1 needs to be inserted into TYPENUMBER in table0, but through a function which transforms it to some other value.
    There are several other tables like 'table1', but with slightly different columns, that need to be inserted into the same table ('table0').
    The amount of data in each table is quite huge, so the procedure should be done in small pieces and efficiently.
    Should/Could I use data pump for this?
    Thank you!

    user13036557 wrote:
    How should I continue with this then?
    Should I delete the columns I don't need and transform the data in the table first and then use data pump,
    or should I simply make a procedure going through every row (in smaller pieces) of 'table1' and inserting it into 'table0'?
    You have both options. Please test both of them, measure how long each takes to complete, and implement the better one (a rough JDBC sketch of the procedural option follows below).
    Regards
    Rajesh
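    For reference, a rough JDBC sketch of the procedural option, processing 'table1' in batches and committing periodically. The column names, the transform function and the batch size are placeholders; the real TYPEN-to-TYPENUMBER mapping goes where indicated:

        import java.sql.*;

        public class ChunkedCopy {
            static final int BATCH_SIZE = 10_000; // commit interval, tune to taste

            public static void copy(Connection conn) throws SQLException {
                conn.setAutoCommit(false);
                String select = "SELECT DATE_COL, IP, TYPEN FROM table1"; // column names assumed
                String insert = "INSERT INTO table0 (DATE_COL, IP, TYPENUMBER) VALUES (?, ?, ?)";
                try (Statement st = conn.createStatement();
                     ResultSet rs = st.executeQuery(select);
                     PreparedStatement ps = conn.prepareStatement(insert)) {
                    int pending = 0;
                    while (rs.next()) {
                        ps.setDate(1, rs.getDate(1));
                        ps.setString(2, rs.getString(2));
                        ps.setInt(3, transform(rs.getInt(3))); // TYPEN -> TYPENUMBER
                        ps.addBatch();
                        if (++pending % BATCH_SIZE == 0) {
                            ps.executeBatch();
                            conn.commit(); // commit in small pieces to keep undo manageable
                        }
                    }
                    ps.executeBatch();
                    conn.commit();
                }
            }

            static int transform(int typen) {
                return typen; // placeholder for the real mapping function
            }
        }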

  • XY graphing with large amount of data

    I was looking into graphing a fairly substantial amount of data.  It arrives in bursts over a serial connection.
    What I have is 30 values corresponding to remote data sensors.  The data for each comes across together, so I have no problem having the data grouped.  It is, effectively, in an array of size 30.  I've been wanting to place this data in a graph.
    The period between receptions varies between 1/5 sec and 2 minutes (it's wireless and mobile, so signal strength varies). This rules out a waveform graph, as the time step isn't constant and there's a lot of data (so no random NaN insertion). This leaves me with an XY graph.
    Primary interest is with the last 10 minutes or so, with a desire to see up to an hour or two back.
    The data is fairly similar, and I'm trying to split it into groups of 4 or 5 sets of ordered pairs per graph.
    Problems:
    1.  If data comes in slowly enough, everything is OK, but the time needed to synchronously update the graph(s) can often exceed the time it takes to fully receive the chunk of data containing those points. Updating asynchronously is useless, as the graphs need to be reasonably in tune with the most recent data received. I can't have exponential growth in the gap between the time represented on the graph and the time the last bit of data was received.
    2.  I could use some advice on making older data points more sparse, so that older data can still be viewed but with a sort of 'decay': I don't need that 1/5-second resolution for old data.
    I'm most concerned with solving problem 1, but random suggestions on 2 are most welcome.

    I didn't quite get the first question. Could you try to find out where
    exactly the time is consumed in your program and then refine your
    question.
    To the second question (which may also solve the first question). You
    can store all the data to a file as it arrives. Keep the most recent
    data in, let's say, a shift register of the update loop. Add a data point
    corresponding to the start of the measurement to the data and wire it to an
    XY graph. Make X Scrollbar visible. Handle the event for XY graph:X
    scale change. When the scale is changed i.e. the user scrolls the
    scroll bar, load data from the file and display it on the XY graph.
    This was a little simplified; in reality it's a bit more complicated.
    In my project, I am writing an XY graph X Control to handle a bit
    similar issue. I have huge data sets i.e. multichannel recordings 
    with millisecond resolution over many hours. One data set may be
    several gigabytes, so the data won't fit in the main memory of the
    computer at once. I wrote an X control which contains an XY graph. Each
    time the time scale is changed i.e. the scroll bar is scrolled, the X
    control generates a user event if it doesn't have enough data to
    display the time slice requested. The user event is handled outside the
    X Control by loading the appropriate set of data from the disk and
    displaying it on the X Control. The user event can be generated already
    a while before the X Control is out of data. This way the data is
    loaded a bit in advance, which allows seamless scrolling on the XY
    graph. One must note that front panel updates must be turned off
    in the X Control when the data is updated and back on after the update
    has finished. Otherwise the XY graph will flicker annoyingly.
    Tomi Maila

  • out.println() problems with a large amount of data in a JSP page

    I have this kind of code in my JSP page:
    out.clearBuffer();
    out.println(myText); // size of myText is about 300 kB
    The problem is that I manage to print the whole text only sometimes. Very often the receiving page gets only the first 40 kB and then the printing stops.
    I have run tests where I split myText into smaller parts and out.print() them one by one:
    Vector texts = splitTextToSmallerParts(myText);
    for (int i = 0; i < texts.size(); i++) {
      out.print(texts.get(i));
      out.flush();
    }
    This produces the same kind of result: sometimes all parts are printed, but mostly only the first parts.
    I have tried to increase the buffer size, but even that does not make the printing reliable. I have also tried autoFlush="false" so that I flush before the buffer overflows; again the same result, sometimes it works and sometimes it doesn't.
    Originally I use a setup where Visual Basic in Excel calls the JSP page. However, I don't think that matters, since the same problems occur if I use a browser.
    If anyone knows something about problems with large JSP pages, I would appreciate the help.

    Well, there are many ways you could do this, but it depends on what you are looking for.
    For instance, generating an Excel Spreadsheet could be quite easy:
    import javax.servlet.*;
    import javax.servlet.http.*;
    import java.io.*;
    public class TableTest extends HttpServlet{
         public void doGet(HttpServletRequest request, HttpServletResponse response) throws IOException, ServletException {
              response.setContentType("application/xls");
              PrintWriter out = new PrintWriter(response.getOutputStream());
                    out.println("Col1\tCol2\tCol3\tCol4");
                    out.println("1\t2\t3\t4");
                    out.println("3\t1\t5\t7");
                    out.println("2\t9\t3\t3");
              out.flush();
              out.close();
          }
    }
    Just try this simple code; it works just fine... I used the same approach to generate a report of 30,000 rows and 40 cols (more or less 5 MB), so it should do the job for you.
    Regards

  • Converting tdm to lvm/ working with large amount of data

    I use a PCI 6251 card for data acquisition, and LabVIEW version 8, to log 5 channels at 100 kHz for approximately 4-5 million samples per channel (the more the better). I use the Express VIs for reading and writing data, which is stored in .tdm format (the .tdx file is around 150 MB). I did not store it in .lvm format in order to reduce the time taken to acquire data.
    1. How do I convert this binary file to a .mat file?
    2. In another approach, I converted the .tdm file into .lvm format. This works as long as the file size is small (say 50 MB); any bigger than that and LabVIEW's memory gets full and it will not save the new file. What is an efficient method to write data (say, into .lvm format) for large files without causing LabVIEW's memory to overflow? I tried saving to multiple files, saving one channel at a time, and increasing the computer's virtual memory (up to 4880 MB), but I still get the 'LabVIEW memory full' error.
    3. Another problem I noticed with LabVIEW is that once it has been used to acquire data, it occupies a lot of the computer's memory, even after the VI stops running. Is there a way to free that memory, and is this mainly due to bad programming?
    any suggestions?

    I assume from your first question that you are attempting to get your data into Matlab.  If that is the case, you have three options:
    You can treat the tdx file as a binary file and read directly from Matlab.  Each channel is a contiguous block of the data type you stored it in (DBL, I32, etc.), with the channels in the order you stored them.  You probably know how many points are in each channel.  If not, you can get this information from the XML in the tdm file.  This is probably your best option, since you won't have to convert anything.
    Early versions of TDM storage (those shipping with LV7.1 or earlier) automatically read the entire file into memory when you load it.  If you have LV7.1, you can upgrade to a version which allows you to read portions of the file by downloading and installing the demo version of LV8.  This will upgrade the shared USI component.  You can then read a portion of your large data set into memory and stream it back out to LVM.
    Do option 2, but use NI-HWS (available on your driver CD under the computer based instruments tab) instead of LVM.  HWS is a hierarchical binary format based on HDF5, so Matlab can read the files directly through its HDF5 interface.  You just need to know the file structure.  You can figure that out using HDFView.  If you take this route and have questions, reply to this post and I will try to answer them.  Note that you may wish to use HWS for your future storage, since its performance is much better than TDM and you can read it from Matlab.  HWS/HDF5 also supports compression, and at your data rates, you can probably pull this off while streaming to disk, if you have a reasonably fast computer.
    Handling large data sets in LabVIEW is an art, like most programming languages.  Check out the tutorial Managing Large Data Sets in LabVIEW for some helpful pointers and code.
    LabVIEW does not release memory until a VI exits memory, even if the VI is not running.  This is an optimization to prevent a repeatedly called VI from requesting the same memory every time it is called.  You can reduce this problem considerably by writing empty arrays to all your front panel objects before you exit your top-level VI.  Graphs are a particularly common problem.
    This account is no longer active. Contact ShadesOfGray for current posts and information.

  • URLConnection with large amount of data causes OutOfMemoryError: Java heap

    I am setting up a system where my customers can send me their data files to my server. If I use a simple socket with a server permanently running on a chosen port, my customers are able to transfer files of any size without problems. However, if I adopt a different architecture, using a web server and a CGI program to receive their submissions, the client program that they run crashes with the exception OutOfMemoryError: Java heap if the file they are sending is any larger than about 30 MB.
    The code in the two architectures is almost identical:
    Socket con = new Socket( host, portno);
    //URL url = new URL("http://"+host+":"+portno+path);
    //URLConnection con = url.openConnection();
    //con.setDoOutput(true);
    File source_file = new File(file_name);
    FileInputStream source = new FileInputStream(source_file);
    out = new BufferedOutputStream(con.getOutputStream());
    // First, Send a submission header.
    data_out = submit_info.getBytes();
    len = data_out.length + 1;
    data_len[0] = (byte)len;
    out.write(data_len, 0, 1);
    out.write(data_out, 0, data_out.length);
    // Then send the file content.
    buff = new byte[(int)buffSize];
    content_length = source_file.length();
    long tot = 0;
    while ( tot < content_length ) {
        // Read data from the file.
        readSize = (int) Math.min( content_length - tot, buffSize );
        nRead = source.read(buff, 0, readSize);
        tot += nRead;
        if ( nRead == 0 ) break;
        // Send data.
        out.write(buff, 0, nRead);
    }
    "buffSize" is 4096. This code works fine, but if the first line is commented out and the next three are uncommented, the OutOfMemory exception is thrown within the loop when "tot" is around 30 million.
    I have tried calling the garbage collector within the loop, but it makes no difference. I am unable to anticipate the size of the files that my customers will submit, so I cannot set the heap size in advance to cope with whatever they send. Fortunately, using a simple Socket avoids the problem, so there seems to be something wrong with how URLConnection works.

    Set the URLConnection to use chunked mode. This saves it from having to buffer the entire contents before sending to ascertain its length.
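    Continuing the code fragment from the question, a minimal sketch of that change on the client side (reusing the host, portno and path variables; the 4096-byte chunk size is just an example):

        URL url = new URL("http://" + host + ":" + portno + path);
        HttpURLConnection con = (HttpURLConnection) url.openConnection();
        con.setDoOutput(true);
        con.setRequestMethod("POST");
        // Stream the request body in fixed-size chunks instead of buffering the
        // whole thing in memory to compute a Content-Length header.
        con.setChunkedStreamingMode(4096);
        out = new BufferedOutputStream(con.getOutputStream());
        // ... write the submission header and the file content exactly as before ...
        out.flush();
        out.close();
        con.getResponseCode(); // forces the request to complete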

  • Losing indexes when working with large amounts of data

    SCENARIO
    We are working on an Interface project with ORACLE 10g that works basically like this:
    We have some PARAMETER TABLES in which the key users can insert, update or delete parameters via a web UI.
    There is a download process that brings around 20 million records from our ERP system into what we call RFC TABLES. There are around 14 RFC TABLES.
    We developed several procedures that process all this data against the PARAMETER tables according to some business rules, and we end up with what we call XML TABLES because they are sent to another piece of software, completing the interface cycle. We also have INTERMEDIATE TABLES that are loaded in the middle of the process.
    The whole process takes around 2 hours to run.
    We had to create several indexes to get to this time. Without the indexes the process would take forever.
    Every night the RFC, INTERMEDIATE and XML tables need to be truncated and then loaded again.
    I know it might seem strange that we delete millions of records and then load them again. The reason is that the data the users insert in the PARAMETER TABLES needs to be processed against ALL the data that comes from the ERP and goes to the other software.
    PROBLEMS
    As I said we created several indexes in order to make the process run in less than 2 hours.
    We were able to run the whole process in that time a few times, but suddenly the process started to HANG forever and we realized some indexes were simply not being used anymore.
    When running EXPLAIN we found the indexes were having no effect and we were getting some full scans (ACCESS FULL). Curiously, when we took the HINTS off and put them back in, the indexes started working again.
    SOLUTION
    We tried things like
    DBMS_STATS.GATHER_SCHEMA_STATS(ownname => SYS_CONTEXT('USERENV', 'CURRENT_SCHEMA'), cascade=>TRUE);
    dbms_utility.analyze_schema
    Dropping all tables and recreating them every time before the process starts
    Nothing solved our problem so far.
    We need advice from someone that worked in a process like this. Where millions of records are deleted and inserted and where a lot of indexes are needed.
    THANKS!
    Jose

    skynyrd wrote:
    I don't know anything about
    BIND variables
    Stored Outlines
    bind peeking issue
    or plan stability in the docs
    but I will research about all of them
    we are currently running the process with a new change:
    We put this line:
    DBMS_STATS.GATHER_SCHEMA_STATS(ownname => SYS_CONTEXT('USERENV', 'CURRENT_SCHEMA'), cascade=>TRUE);
    after every big INSERT or UPDATE (more than 1 million records)
    It is running well so far (it's almost in the end of the process). But I don't know if this will be a definitive solution. I hope so.
    I will post here after I have an answer if it solved the problem or not.
    Thanks a lot for your help so far
    Well, you'd best get someone in there who knows what those things are; basic development, basic performance tuning and basic administration all presuppose an understanding of these basic concepts, and patching is necessary (unless you are on XE). I would recommend getting books by Tom Kyte; he clearly explains the concepts you need to know to make things work well. You should find some good explanations of bind peeking online if you google that term with +Kyte.
    You will be subject to this error at random times if you don't find the root cause and fix it.
    Here is some food for your thoughts:
    http://structureddata.org/2008/03/26/choosing-an-optimal-stats-gathering-strategy/ (one of those "what to expect from the 10g optimizer" links does work)
    http://kerryosborne.oracle-guy.com/2009/03/bind-variable-peeking-drives-me-nuts/
    http://pastebin.com/yTqnuRNN
    http://kerryosborne.oracle-guy.com/category/oracle/plan-stability/
    Getting stats on the entire schema as frequently as you do may be overkill and time-wasting, or even counterproductive if you have an issue with skewed stats. Note that you can figure out which statistics you need and lock them, or, if you have several scenarios, export them and import them as necessary. You need to know exactly what you are doing, and that is some amount of work. It's not magic, but it is math. Get Jonathan Lewis' optimizer book.
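    If gathering statistics for the whole schema after every large insert turns out to be too slow, one narrower variant is to gather statistics only for the table that was just loaded. A sketch in Java/JDBC (owner and table names are parameters you would supply; DBMS_STATS.GATHER_TABLE_STATS is standard Oracle):

        import java.sql.*;

        public class GatherStats {
            // Gather optimizer statistics for one freshly loaded table instead of the whole schema.
            public static void gatherTableStats(Connection conn, String owner, String table)
                    throws SQLException {
                try (CallableStatement cs = conn.prepareCall(
                        "BEGIN DBMS_STATS.GATHER_TABLE_STATS(ownname => ?, tabname => ?, cascade => TRUE); END;")) {
                    cs.setString(1, owner);
                    cs.setString(2, table);
                    cs.execute();
                }
            }
        }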
