Performance bottleneck with subreports

I have an SSRS performance bottleneck on my production server that we have diagnosed as being related to the use of subreports.
Background facts:
* Our Production and Development servers are identically configured
* We've tried the basic restart/reboot activities; they didn't change the performance.
* The Development server was "cloned" from the Production server about a month ago, so all application settings (memory usage, logging, etc.) are identical between the two
* For the bottlenecked report, the underlying stored procedure executes in 3 seconds, returning 901 rows, in both environments with the same parameters. The execution plan is identical between the two servers, and the underlying tables and indexing are identical. Stats are run regularly on both servers.
* In the development environment the report runs in 12 seconds, but on Production it takes well over a minute to return, ranging from 1:10 up to 1:40.
* If I point the Development SSRS report to the PROD datasource I get a return time of 14 seconds (the additional two seconds due to the transfer of data over the network).
* If I point the Production SSRS report to the DEV datasource I get a return time of well over a minute.
* I have tried deleting the Production report definition and uploading it as new to see if there was a corruption issue; this didn't change the runtimes.
* Out of the hundreds of Production SSRS reports that we have, the only two that exhibit dramatically different performance between Dev and Prod are the ones that contain subreports.
* Queries against the ReportServerTempDB also confirm that these two reports are the major contributors to TempDB utilization.
* We have verified that the ReportServerTempDB is being backed up and shrunk on a regular basis.
These factors tell me that the issue is not with the database or the SQL. The tests on the Development server also show that the reports and subreports are not an issue in themselves: it is possible to get acceptable performance from them in the Development environment, or when they are pointed from the Dev report server over to the Prod database.
Based on these details, what should we check on our Prod server to resolve the performance issue with subreports on this particular server?

Hi GottaLoveSQL,
According to your description, you want to improve the performance of a report with subreports, right?
In Reporting Services, using subreports impacts report performance because the report server processes each instance of a subreport as a separate report. The best approach is to avoid subreports by using the Lookup, MultiLookup, and LookupSet functions, which can bridge different datasets. In this scenario, we also suggest caching the report that contains the subreports. You can create a cache refresh plan for the report in Report Manager. Please refer to the link below:
http://technet.microsoft.com/en-us/library/ms155927.aspx
Reference:
Report Performance Optimization Tips (Subreports, Drilldown)
Performance, Snapshots, Caching (Reporting Services)
Performance Issue in SSRS 2008
If you have any questions, please feel free to ask.
Best Regards,
Simon Hou

Similar Messages

  • Performance bottleneck with 2.2.1 and 2008 R2 OS VMs

    Hi,
    I have a DL370 G6 with Oracle VM Server 2.2.1 installed:
    * 72 GB of memory and two quad-core processors
    * All VMs are installed on local disk (6 x 300 GB in RAID 5)
    * Two NICs connected to a switch for LAN traffic
    We have 10 VMs with the 2008 R2 OS on them.
    The overall performance of these VMs is really horrible. They are very, very slow.
    Installing a database takes 4 hours, even though each VM has 6 GB of RAM.
    Restarting a system takes around 20 minutes.
    Has anybody tried this many VMs on one server?
    Is there any tool or way I can see what the issue is, or whether there is a bottleneck on the server?
    2008 R2 is generally a resource-hungry OS, but even so the overall performance is really horrible.

    hi,
    hdparm -T /dev/cciss/c0d0 (the whole drive) gives:
    /dev/cciss/c0d0
    Timing cached reads: 31088 MB in 1.99 seconds = 15595.54 MB/sec
    HDIO_DRIVE_CMD(null) (wait for flush complete) failed: Inappropriate ioctl for device
    HDIO_DRIVE_CMD(null) (wait for flush complete) failed: Inappropriate ioctl for device
    hdparm -T /dev/cciss/c0d0p5 (which is /OVS) gives:
    /dev/cciss/c0d0p5
    Timing cached reads: 30636 MB in 1.99 seconds = 15364.10 MB/sec
    HDIO_DRIVE_CMD(null) (wait for flush complete) failed: Inappropriate ioctl for device
    HDIO_DRIVE_CMD(null) (wait for flush complete) failed: Inappropriate ioctl for device
    All my VM guests run Windows 2008 R2 64-bit, which is a very new Windows OS. Oracle came up with new PV drivers for it, and I suspect that could be a reason for this resource bottleneck.
    For I/O tests on the VM server I used:
    1) iostat
    avg-cpu:  %user  %nice  %system  %iowait  %steal  %idle
               0.11   0.00     0.03     0.24    0.00  99.61
    Device                 tps  Blk_read/s  Blk_wrtn/s
    cciss/c0d0   (disk)  41.90      186.32       26.05
    cciss/c0d0p1 (/boot)  0.00        0.00        0.00
    cciss/c0d0p2          0.00        0.00        0.00
    cciss/c0d0p3 (/)      1.90        1.83       43.06
    cciss/c0d0p4          0.00        0.00        0.00
    cciss/c0d0p5 (/OVS)  40.00      184.49      295.82
    2) vmstat
    swpd    free   buff  cache
      92  166632  94940  53444
    3) sar
    To display block I/O activity:
    sar -b 3 100
    Average: tps 41.18, rtps 7.17, wtps 34.01, bread/s 188.99, bwrtn/s 588.64
    To display block I/O activity for each block device:
    sar -d 3 100
    Does this look OK? Is there any way to improve overall performance, like enabling or disabling something? We are facing a very bad problem with all the VMs running the 2008 R2 OS.

  • Performance problem with subreports

    I have created a report with 5 subreports. It is a student transcript and has one subreport for each year of a student's time at the school, as well as a fifth one that calculates GPA. Everything is working except that it runs extremely slowly. If I run each of the subreports as a normal query specifying the student_id field, it takes less than 2 seconds per subreport. I have each of the subreports linking via the student_id. All I can guess is that each subreport is running on all data rather than using the linked field within the subreport query, and this is causing it to take ages. It takes over 15 minutes to run the report for just a dozen students.
    Is there anything I can do to improve performance significantly?
    Thanks!
    Calvin

    Candie,
    It depends. Look at it this way: if you were to take each of those subreports as a stand-alone report, what parameters would it need?
    You may have 2 subreports that need all 5, while another only needs 1 and others need 3 or 4.
    The key is that you want to get all of the necessary parameters for a given subreport into that subreport. This allows each of the subreports to work more efficiently by preventing them from pulling more data than they need.
    Relying on links to common fields alone to filter the subreport data hurts performance, because each of the subreports will query more data than it will use and then filter it out on the client side.
    Linking the parameters simply eliminates the "double prompting" that occurs when you have multiple subreports asking for the same information.
    Basically, add all of the parameters that would be necessary if you were using the subreport as a stand-alone report, but don't try to force in unnecessary parameters simply because they exist in the main report.
    FYI: You'll get faster responses, from more people, if you open your own new thread. Adding your questions to closed threads is a good way to get ignored.
    HTH,
    Jason

  • Setting Multi Variables with Multi SQL Queries: Performance Bottleneck?

    Hello ODI Experts,
    I have created a Logical & Physical Schema and a Source Data Store to pick data from a DB table. Now I am setting variables with a simple SELECT query for each variable (in its Refreshing tab > Select Query field).
    This is a poorly optimized approach, picking one column per query (per variable). Let's say I have to pick 35 columns from a table and put them in 35 variables: that would mean running 35 queries to fetch one record from the database table.
    Doesn't that seem inefficient (less optimized), and a little scary? Or does ODI handle it with some internal optimization?
    Is there anything better, or a different approach I can take to make variable setting more optimized?
    Please guide.
    Thanks & Regards,
    Ahsan Asghar


  • FDM Performance Tuning with Oracle

    Hi,
    We are using Oracle 10g with FDM. We have scheduled batch jobs, but importing the data takes a long time.
    Is there any performance tuning we can do on the Oracle end?
    Thanks.

    First you need to isolate where the performance bottleneck is occurring. If batch loads are significantly slower than interactive loads, then I'd look in the batch processes, as you've suggested yourself.
    If load performance is a problem in both batch and interactive modes, then something else you might consider is the number of records in a typical load. Have you configured the location with the optimal load type (SQL Insert or Bulk Insert) for your situation?

  • Changing database server on a report with subreports = formula error

    Good morning,
    I currently have several reports that print out and were developed against our development database. However, I need to be able to dynamically change the server that the report uses according to the server configured in our application. Each of these reports contains one or more subreports, which point to the same server and database as the main report. All reports, both the main reports and subreports, are based on manual SQL commands.
    I'm running into some significant issues. So significant, in fact, that we were forced to deploy our application with reports that had been switched to our production environment in the designer in order to get them functional. This is, obviously, not an acceptable long-term solution.
    I've gone round and round a couple of times, and I get different results with different methods of changing this information. I'll outline them below. First, my current code:
    ConnectionInfo connectionInfo = new ConnectionInfo();
    TableLogOnInfo logOnInfo = new TableLogOnInfo();
    Console.WriteLine("Report \"{0}\"", report.Name);
    foreach (Table table in report.Database.Tables)
    {
        logOnInfo = table.LogOnInfo;
        connectionInfo = new ConnectionInfo(logOnInfo.ConnectionInfo);
        connectionInfo.ServerName = "panthers-dev";
        connectionInfo.DatabaseName = "Prosys";
        logOnInfo.ConnectionInfo = connectionInfo;
        //table.Location = "Prosys.dbo." + table.Location.Substring(table.Location.LastIndexOf(".") + 1);
        table.ApplyLogOnInfo(logOnInfo);
        table.LogOnInfo.ConnectionInfo = connectionInfo;
        Console.WriteLine("\t\"{0}\": \"{1}\", \"{2}\", \"{3}\", {4}", table.Name, table.LogOnInfo.ConnectionInfo.ServerName, table.LogOnInfo.ConnectionInfo.DatabaseName, table.Location, table.TestConnectivity());
    }
    foreach (Section section in report.ReportDefinition.Sections)
    {
        foreach (ReportObject ro in section.ReportObjects)
        {
            if (ro.Kind == ReportObjectKind.SubreportObject)
            {
                SubreportObject sro = (SubreportObject)ro;
                ReportDocument subreport = report.OpenSubreport(sro.SubreportName);
                Console.WriteLine("\tSubreport \"{0}\"", subreport.Name);
                foreach (Table table in subreport.Database.Tables)
                {
                    logOnInfo = table.LogOnInfo;
                    connectionInfo = new ConnectionInfo(logOnInfo.ConnectionInfo);
                    connectionInfo.ServerName = "panthers-dev";
                    connectionInfo.DatabaseName = "Prosys";
                    logOnInfo.ConnectionInfo = connectionInfo;
                    //table.Location = "Prosys.dbo." + table.Location.Substring(table.Location.LastIndexOf(".") + 1);
                    table.ApplyLogOnInfo(logOnInfo);
                    table.LogOnInfo.ConnectionInfo = connectionInfo;
                    Console.WriteLine("\t\t\"{0}\": \"{1}\", \"{2}\", \"{3}\", {4}", table.Name, table.LogOnInfo.ConnectionInfo.ServerName, table.LogOnInfo.ConnectionInfo.DatabaseName, table.Location, table.TestConnectivity());
                }
            }
        }
    }
    Using this approach, my console output prints what I expect and want to see: the correct server and database information, and True for TestConnectivity for all reports and subreports. The two reports I have that have no subreports print out correctly, with data from the proper server. However, all of the reports with subreports fail with formula errors. If this procedure is not run, they work just fine on either server.
    I had to place the assignment of table.LogOnInfo.ConnectionInfo = connectionInfo after the call to ApplyLogOnInfo, as that function did not behave as expected. If I perform the assignment first (or not at all), then calling ApplyLogOnInfo on the outer report's table did NOT affect the values of its ConnectionInfo object, but it DID affect the values of the ConnectionInfo objects of its subreports!
    In any event, if anyone could post a code sample of changing database connection information on a report containing subreports, I would appreciate it.
    Any help is greatly appreciated and anxiously awaited!

    Hi Adam,
    Code for changing database connection information on a report containing subreports :
    private ReportDocument northwindCustomersReport;

    private void ConfigureCrystalReports()
    {
        northwindCustomersReport = new ReportDocument();
        string reportPath = Server.MapPath("NorthwindCustomers.rpt");
        northwindCustomersReport.Load(reportPath);

        ConnectionInfo connectionInfo = new ConnectionInfo();
        connectionInfo.ServerName = "localhost";
        connectionInfo.DatabaseName = "Northwind";
        connectionInfo.IntegratedSecurity = false;

        SetDBLogonForReport(connectionInfo, northwindCustomersReport);
        SetDBLogonForSubreports(connectionInfo, northwindCustomersReport);
        crystalReportViewer.ReportSource = northwindCustomersReport;
    }

    private void Page_Init(object sender, EventArgs e)
    {
        ConfigureCrystalReports();
    }

    private void SetDBLogonForReport(ConnectionInfo connectionInfo, ReportDocument reportDocument)
    {
        Tables tables = reportDocument.Database.Tables;
        foreach (CrystalDecisions.CrystalReports.Engine.Table table in tables)
        {
            TableLogOnInfo tableLogonInfo = table.LogOnInfo;
            tableLogonInfo.ConnectionInfo = connectionInfo;
            table.ApplyLogOnInfo(tableLogonInfo);
        }
    }

    private void SetDBLogonForSubreports(ConnectionInfo connectionInfo, ReportDocument reportDocument)
    {
        Sections sections = reportDocument.ReportDefinition.Sections;
        foreach (Section section in sections)
        {
            ReportObjects reportObjects = section.ReportObjects;
            foreach (ReportObject reportObject in reportObjects)
            {
                if (reportObject.Kind == ReportObjectKind.SubreportObject)
                {
                    SubreportObject subreportObject = (SubreportObject)reportObject;
                    ReportDocument subReportDocument = subreportObject.OpenSubreport(subreportObject.SubreportName);
                    SetDBLogonForReport(connectionInfo, subReportDocument);
                }
            }
        }
    }
    Hope this helps!!
    Regards,
    Shweta

  • OWB Performance Bottleneck

    Is there any session log produced by an OWB mapping execution, other than what you can see in the OWB Runtime Audit Browser?
    Suppose the mapping is doing a hash join which is consuming a large amount of time, and I would like to see which tables are being joined at that instant. This would help me identify the exact problem area in the mapping. Does OWB provide a session log, or any other source, where I can get information about the operation causing a performance bottleneck?
    regards
    -AP

    Thanks for all your suggestions. The mapping was using a join between some 4-5 tables, and I think this was where the mapping was getting stuck during execution in Set Based mode. Moreover, the mapping loads some 70 million records into the target table. Loading such a huge volume of data in Set Based mode, with a massive join at the beginning, the mapping was bound to get stuck somewhere.
    The solution that came up was to create a table from the join condition and use that table as input to the mapping. This got rid of the joiner at the very beginning and also let the mapping run in Row Based Target Only mode. The data (70 million records) got loaded in some 4 hours.
    regards
    -AP

  • Does BPM for a synchronous interface have a performance bottleneck?

    Hi All,
    Just have a small query.
    We have a scenario in which we need to receive PO details from a legacy system, create a sales order in ECC, and send back a response table to the legacy system.
    Our understanding is that this can be achieved using synchronous ABAP proxies, and that it also involves BPM and abstract mappings.
    I believe this should not pose any problems. My concern is that we are confused as to whether BPM would have performance bottlenecks. Is there any SAP document or article which mentions that for synchronous interfaces BPM is the only way to go, and that it would not have a significant impact on performance?
    Another approach would be to create an asynchronous inbound proxy, write ABAP code within it, and call a separate outbound asynchronous proxy within the inbound proxy method. This approach looks and sounds very clumsy.
    Kindly let me know your thoughts or any links which would be useful.
    Thanks & Regards,
    Mz

    Hi Aashish,
    Thanks for your quick reply; it was helpful, but I am not using RFCs. Correct me if I am wrong, but I have explained the scenarios in detail below.
    Scenario 1. Synchronous
    1) PI Picks file from a common folder.
    2) PI does a data mapping and sends the data to ECC.
    3) ECC contains an inbound interface which receives the data and in which abap proxy code is written.
    4) The abap proxy code executes a function module and sends the response as an internal table back to PI.
    5) PI receives the response and places it in a text/csv file and places it back to another folder.
    I assume that the above would be possible only using BPM. What I understand is that in order for an interface to both receive and send data, abstract mappings are to be used, and for this BPM is required. We do not have any conversions etc.; it's just a simple matter of receiving an internal table from ECC and creating a file to place in the folder.
    I also understand that BPM could have bottlenecks due to queue and cache issues; messages might be pending, or lost, etc.
    Scenario 2. Asynchronous
    1) PI Picks file from a common folder.
    2) PI does a data mapping and sends the data to ECC.
    3) ECC contains an inbound interface which receives the data and in which abap proxy code is written.
    4) The ABAP proxy code executes the same function module and calls a separate outbound interface, passing the values to it. This is used to send the response back.
    5) PI receives the response from the second interface, places it in a text/csv file, and puts it back in another folder.
    I would like to know which would be the better approach. Documentation/references to support your claims would be much appreciated.
    Cheers,
    Mz

  • Will RAC's performance bottleneck be the shared disk storage ?

    Hi All
    I'm studying RAC and I'm concerned about RAC's I/O performance bottleneck.
    If I have 10 nodes and they use the same storage disk to hold database, then
    they will do I/Os to the disk simultaneously.
    Maybe we got more latency ...
    Will that be a performance problem?
    How does RAC solve this kind of problem?
    Thanks.

    J.Laurence wrote:
    I see FC can solve the problem with bandwidth (throughput).
    There are a couple of layers in the I/O subsystem for RAC.
    There is Cache Fusion, as already mentioned. Why read a data block from disk when another node has it in its buffer cache and can provide it instead (over the Interconnect communication layer)?
    Then there are the actual pipes between the server nodes and the storage system. Fibre is slow and not what the latest RAC architecture (such as Exadata) uses.
    Traditionally, you pop an HBA card into the server, which provides you with 2 fibre channel pipes to the storage switch. These usually run at 2Gb/s, and the I/O driver can load balance and fail over. So in theory it can scale to 4Gb/s and provide redundancy should one pipe fail.
    Exadata and more "modern" RAC systems use HCA cards running InfiniBand (IB). This provides scalability of up to 40Gb/s. The cards are also dual port, which means that you have 2 cables running into the storage switch.
    IB supports a protocol called RDMA (Remote Direct Memory Access). This essentially allows memory to be "shared" across the IB fabric layer, and it is used to read data blocks from the storage array's buffer cache into the local Oracle RAC instance's buffer cache.
    Port-to-port latency for a properly configured IB layer running QDR (quad data rate) can be lower than 70ns.
    And it does not stop there. You can of course add a huge memory cache in the storage array (which is essentially a server with a bunch of disks). Current x86-64 motherboard technology supports up to 512GB of RAM.
    Exadata takes it even further, as special ASM software on the storage node reconstructs data blocks on the fly to supply the RAC instance with only relevant data. This reduces the data volume pushed from the storage node to the database node.
    So fibre channel in this sense is a bit dated. As is GigE.
    But what about the hard drives' read and write I/O? Not a problem, as the storage array deals with that. A RAC instance that writes a data block writes it into the storage buffer cache, where the storage array software manages that cache and does the physical write to disk.
    Of course, it will stripe heavily and will have 24+ disk controllers available to write that data block, so do not think of I/O latency in terms of the actual speed of a single disk.

  • Performance problem with synchronized singleton

    I'm using the singleton pattern to cache incoming JMS message data from a 3rd party. I'm seeing terrible performance though, and I think it's because I've misunderstood something.
    My singleton class stores incoming JMS messages in a HashMap, so that successive messages can be checked to see if they are a new piece of data or an update to an earlier one.
    I followed the traditional examples of a private constructor and a public getInstance method, and applied double-checked locking to the latter. However, a colleague then suggested that all the other methods in the class should also be synchronized. Is this the case, or am I creating an unnecessary performance bottleneck? Or have I unwittingly created that bottleneck elsewhere?
    package com.mycode;

    import java.util.HashMap;
    import java.util.Iterator;

    public class DataCache {

        private volatile static DataCache uniqueInstance;
        private HashMap<String, DataCacheElement> dataCache;

        private DataCache() {
            if (dataCache == null) {
                dataCache = new HashMap<String, DataCacheElement>();
            }
        }

        public static DataCache getInstance() {
            if (uniqueInstance == null) {
                synchronized (DataCache.class) {
                    if (uniqueInstance == null) {
                        uniqueInstance = new DataCache();
                    }
                }
            }
            return uniqueInstance;
        }

        public synchronized void put(String uniqueID, DataCacheElement dataCacheElement) {
            dataCache.put(uniqueID, dataCacheElement);
        }

        public synchronized DataCacheElement get(String uniqueID) {
            return dataCache.get(uniqueID);
        }

        public synchronized void remove(String uniqueID) {
            dataCache.remove(uniqueID);
        }

        public synchronized int getCacheSize() {
            return dataCache.keySet().size();
        }

        /**
         * Flushes all objects from the cache that are older than the
         * expiry time.
         * @param expiryTime (long milliseconds)
         */
        public synchronized void flush(long expiryTime) {
            long currentDate = System.currentTimeMillis();
            long compareDate = currentDate - expiryTime;
            Iterator<String> iterator = dataCache.keySet().iterator();
            while (iterator.hasNext()) {
                // Get element by unique key
                String uniqueID = iterator.next();
                DataCacheElement dataCacheElement = get(uniqueID);
                // get the last-updated time from the element
                long lastUpdatedDate = dataCacheElement.getUpdatedDate();
                // if the element is older than the expiry time, remove it from the cache
                // (note: calling remove() while iterating over keySet() risks a
                // ConcurrentModificationException; iterator.remove() would be safer)
                if (lastUpdatedDate < compareDate) {
                    remove(uniqueID);
                }
            }
        }

        public synchronized void empty() {
            dataCache.clear();
        }
    }

    m0thr4 wrote:
    SunFred wrote:
    m0thr4 wrote:
    I [...] applied the double-checked locking
    Which is broken. http://www.ibm.com/developerworks/java/library/j-dcl.html
    From the link:
    "The theory behind double-checked locking is perfect. Unfortunately, reality is entirely different. The problem with double-checked locking is that there is no guarantee it will work on single or multi-processor machines. The issue of the failure of double-checked locking is not due to implementation bugs in JVMs but to the current Java platform memory model. The memory model allows what is known as 'out-of-order writes' and is a prime reason why this idiom fails."
    I had a read of that article and have a couple of questions about it:
    1. The article was written way back in May 2002 - is the issue it describes still relevant to Java 6's memory model?

    DCL will work starting with Java 5, if you make the variable you're testing volatile. However, there's no reason to do it.
    Lazy instantiation is almost never appropriate, and for those rare times when it is, use a nested class to hold your instance reference (see the sketch below). I'd be willing to bet lazy instantiation is not appropriate in your case, so you don't need to muck with syncing or DCL or any of that nonsense.
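    For reference, the nested-class approach mentioned above is the initialization-on-demand holder idiom. A minimal sketch (the constructor body is a stand-in for the poster's own cache initialization, not code from this thread):

    public class DataCache {

        private DataCache() {
            // initialize the internal HashMap here
        }

        // The JVM guarantees that Holder is initialized lazily (on the first
        // call to getInstance) and safely with respect to other threads,
        // so no volatile field or synchronized block is needed.
        private static class Holder {
            static final DataCache INSTANCE = new DataCache();
        }

        public static DataCache getInstance() {
            return Holder.INSTANCE;
        }
    }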

  • Major performance bottleneck in JSF RI 1.0

    We've been doing some load testing this week, and have come up with what I believe is a major performance bottleneck in the reference implementation.
    Our test suite was run on two different application servers (JBoss and Oracle), and in both cases response time degraded dramatically at about 25-30 concurrent users.
    On analyzing a thread dump taken while the application server was in this state, we noticed that close to twenty threads were waiting on the same locked resource.
    The resource is the 'descriptors' static field in the javax.faces.component.UIComponentBase class. It is a WeakHashMap. The contention occurs in the getPropertyDescriptors method, which has a large synchronized block.

    Well, not the answer I was hoping for. But at least that's clear.
    Jayashri, I'm using the JSF RI for an application that will be delivered to testing in August. Can you give advice on whether I can expect an update for this bottleneck problem within that timeframe?
    Sincerely,
    Joost de Vries
    ps hi netbug. Saw you at theserverside! :-)

  • J2EE application performance bottlenecks

    For anyone interested in learning how to resolve J2EE application performance bottlenecks, I found a great resource:
    http://www.cyanea.com/email/throttle_form2.html
    Registering with them can have you win 1 of 3 iPod minis.

    I agree with yawmark's response #1 in one of your evil spams http://forum.java.sun.com/thread.jsp?thread=514026&forum=54&message=2446641

  • Array as Shared Memory - performance bottleneck

    Array as shared memory - performance bottleneck
    Hello,
    I'm currently working on a multi-threaded application where many threads work on shared memory.
    I'm wondering why the application doesn't become faster by using many threads (I have an i7 machine).
    Here is an example of initialization in a single thread:

              final int arrayLength = (int)1e7;
              final int threadNumber = Runtime.getRuntime().availableProcessors();
              long startTime;

              // init array in single thread
              Integer[] a1 = new Integer[arrayLength];
              startTime = System.currentTimeMillis();
              for (int i = 0; i < arrayLength; i++) {
                   a1[i] = i;
              }
              System.out.println("single thread=" + (System.currentTimeMillis() - startTime));

    and here is the initialization with many threads:
              // init array in many threads
              final int offset = arrayLength / threadNumber; // missing from this excerpt in the original post; declared as in the tests below
              final Integer[] a3 = new Integer[arrayLength];
              List<Thread> threadList = new ArrayList<Thread>();
              for (int i = 0; i < threadNumber; i++) {
                   final int iF = i;
                   Thread t = new Thread(new Runnable() {
                        @Override
                        public void run() {
                             int end = (iF + 1) * offset;
                             if (iF == (threadNumber - 1))
                                  end = a3.length;
                             for (int i = iF * offset; i < end; i++) {
                                  a3[i] = i;
                             }
                        }
                   });
                   threadList.add(t);
              }
              startTime = System.currentTimeMillis();
              for (Thread t : threadList)
                   t.start();
              for (Thread t : threadList)
                   t.join();

    After execution it looks like this:
    single thread=2372
    many threads List=3760

    I have an i7 with 4 GB RAM.
    System + parameters:
    64-bit JVM, JDK 1.6.0_14
    -Xmx3g
    Why is executing one thread faster than executing many threads?
    As you can see, I didn't use any synchronization.
    Maybe I have to configure the JVM in some way to gain the wished-for performance (I expected a performance gain of about 8x on the i7)?

    Hello,
    I'm from [happy-guys|http://www.happy-guys.com], and we developed a new sorting algorithm to sort an array on a multi-core machine.
    But after the algorithm was implemented, it was a little bit slower than the standard sorting algorithm from the JDK (Arrays.sort(...)). After searching for the reason, I created performance tests which show that arrays in Java don't allow many threads to access them at the same time.
    The bad news is: different threads slow each other down even if they use different array objects.
    I believe all array objects are natively managed by a global manager in the JVM, and this manager creates a global lock for all threads.
    Only one thread can access any array at the same time!
    I used:
    Software:
    1) Windows Vista 64-bit
    2) java version "1.6.0_14"
    Java(TM) SE Runtime Environment (build 1.6.0_14-b08)
    Java HotSpot(TM) 64-Bit Server VM (build 14.0-b16, mixed mode)
    Hardware:
    Intel(R) Core(TM) i7 CPU 920 @ 2.67 GHz (2.79 GHz), 6 GB RAM
    Test1: initialization of the array in a single thread
    Test2: array initialization in many threads on a single array
    Test3: array initialization in many threads on many arrays
    Results in ms:
    Test1 = 5588
    Test2 = 4976
    Test3 = 5429
    Test1:

    package org.happy.concurrent.sort.forum;

    /**
     * simulates the initialization of array in a single thread
     * @author Andreas Hollmann
     */
    public class ArraySingleThread {
         public static void main(String[] args) throws InterruptedException {
              final int arrayLength = (int)2e7;
              long startTime;

              // init array in single thread
              Integer[] a1 = new Integer[arrayLength];
              startTime = System.currentTimeMillis();
              for (int i = 0; i < arrayLength; i++) {
                   a1[i] = i;
              }
              System.out.println("single thread=" + (System.currentTimeMillis() - startTime));
         }
    }

    Test2:
    package org.happy.concurrent.sort.forum;

    import java.util.ArrayList;
    import java.util.List;

    /**
     * simulates the array initialization in many threads on the single array
     * @author Andreas Hollmann
     */
    public class ArrayManyThreads {
         public static void main(String[] args) throws InterruptedException {
              final int arrayLength = (int)2e7;
              final int threadNumber = Runtime.getRuntime().availableProcessors();
              long startTime;
              final int offset = arrayLength / threadNumber;

              // init array in many threads
              final Integer[] a = new Integer[arrayLength];
              List<Thread> threadList = new ArrayList<Thread>();
              for (int i = 0; i < threadNumber; i++) {
                   final int iF = i;
                   Thread t = new Thread(new Runnable() {
                        @Override
                        public void run() {
                             int end = (iF + 1) * offset;
                             if (iF == (threadNumber - 1))
                                  end = a.length;
                             for (int i = iF * offset; i < end; i++) {
                                  a[i] = i;
                             }
                        }
                   });
                   threadList.add(t);
              }
              startTime = System.currentTimeMillis();
              for (Thread t : threadList)
                   t.start();
              for (Thread t : threadList)
                   t.join();
              System.out.println("many threads List=" + (System.currentTimeMillis() - startTime));
         }
    }

    Test3:
    package org.happy.concurrent.sort.forum;

    import java.util.ArrayList;
    import java.util.List;

    /**
     * simulates the array initialization in many threads on many arrays
     * @author Andreas Hollmann
     */
    public class ArrayManyThreadsManyArrays {
         public static void main(String[] args) throws InterruptedException {
              final int arrayLength = (int)2e7;
              final int threadNumber = Runtime.getRuntime().availableProcessors();
              long startTime;
              final int offset = arrayLength / threadNumber;

              // init many arrays in many threads
              final ArrayList<Integer[]> list = new ArrayList<Integer[]>();
              for (int i = 0; i < threadNumber; i++) {
                   int size = offset;
                   if (i < (threadNumber - 1))
                        size = offset + arrayLength % threadNumber;
                   list.add(new Integer[size]);
              }
              List<Thread> threadList = new ArrayList<Thread>();
              for (int i = 0; i < threadNumber; i++) {
                   final int index = i;
                   Thread t = new Thread(new Runnable() {
                        @Override
                        public void run() {
                             Integer[] a = list.get(index);
                             int value = index * offset;
                             for (int i = 0; i < a.length; i++) {
                                  value++;
                                  a[i] = value;
                             }
                        }
                   });
                   threadList.add(t);
              }
              startTime = System.currentTimeMillis();
              for (Thread t : threadList)
                   t.start();
              for (Thread t : threadList)
                   t.join();
              System.out.println("many threads - many List=" + (System.currentTimeMillis() - startTime));
         }
    }
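    One thing worth ruling out with these benchmarks before blaming a global array lock: every a[i] = i on an Integer[] autoboxes, allocating a new Integer object inside the timed loop, so all threads hammer the allocator and garbage collector at once, which can easily hide any scaling from extra cores. A variant using a primitive int[] (a minimal sketch for comparison, not code from the original thread) removes that allocation pressure:

    import java.util.ArrayList;
    import java.util.List;

    public class ArrayManyThreadsPrimitive {
         public static void main(String[] args) throws InterruptedException {
              final int arrayLength = (int)2e7;
              final int threadNumber = Runtime.getRuntime().availableProcessors();
              final int offset = arrayLength / threadNumber;
              // primitive array: writing an int allocates no objects
              final int[] a = new int[arrayLength];
              List<Thread> threadList = new ArrayList<Thread>();
              for (int i = 0; i < threadNumber; i++) {
                   final int iF = i;
                   threadList.add(new Thread(new Runnable() {
                        public void run() {
                             int end = (iF == threadNumber - 1) ? a.length : (iF + 1) * offset;
                             for (int j = iF * offset; j < end; j++) {
                                  a[j] = j; // no autoboxing, no allocation
                             }
                        }
                   }));
              }
              long startTime = System.currentTimeMillis();
              for (Thread t : threadList)
                   t.start();
              for (Thread t : threadList)
                   t.join();
              System.out.println("primitive many threads=" + (System.currentTimeMillis() - startTime));
         }
    }

    If the primitive version scales with cores while the Integer[] version does not, the bottleneck is allocation and GC, not the arrays themselves.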

  • Identifying CPU Bottlenecks with vmstat kthr / r

    Hi,
    Can someone help me with the interpretation of vmstat on a machine with multiple cores and threads per CPU?
    In short, the server is experiencing a CPU bottleneck when "r" is greater than the number of CPUs on the server.
    But what count should I use in cases like these:
    Should 2 dual-core CPUs count as 4 CPUs?
    Should 2 dual-core CPUs with two threads per core count as 4 CPUs or 8 CPUs?
    # vmstat 5
         kthr   memory          page             disk      faults        cpu
         r b w swap  free re mf pi p fr de sr s0 s1 s2 s3  in  sy  cs us sy id
         12 0 0 11456 4120 1  41 19 1  3  0  2  0  4  0  0  48 112 130  4 14 82
         14 0 1 10132 4280 0   4 44 0  0  0  0  0 23  0  0 211 230 144  3 35 62
         15 0 1 10132 4616 0   0 20 0  0  0  0  0 19  0  0 150 172 146  3 33 64
         17 0 1 10132 5292 0   0  9 0  0  0  0  0 21  0  0 165 105 130  1 21 78

    Thanks!

    Look at LAT from prstat (generally I use prstat -amL). This will give you an idea of the delay due to waiting for CPU.
    Looking at your vmstat:
    # vmstat 5
         kthr   memory          page             disk      faults        cpu
         r b w swap  free re mf pi p fr de sr s0 s1 s2 s3  in  sy  cs us sy id
         12 0 0 11456 4120 1  41 19 1  3  0  2  0  4  0  0  48 112 130  4 14 82
         14 0 1 10132 4280 0   4 44 0  0  0  0  0 23  0  0 211 230 144  3 35 62
         15 0 1 10132 4616 0   0 20 0  0  0  0  0 19  0  0 150 172 146  3 33 64
         17 0 1 10132 5292 0   0  9 0  0  0  0  0 21  0  0 165 105 130  1 21 78

    Note that there is a run queue, but also much idle CPU - 62% to 82%.
    Use mpstat to see the load on each CPU.
    If you feel that there is a performance problem with the application(s) running on this server, I don't think this vmstat supports the need for more CPUs.
    What are the hardware and Solaris level involved?
    have a good day,
    Glen
    (PS I've been wrong before so don't rule out a need for more CPUs - but collect more data before reaching a conclusion.)

  • Is there a recommended limit on the number of custom sections and the cells per table so that there are no performance issues with the UI?

    Is there a recommended limit on the number of custom sections and the cells per table so that there are no performance issues with the UI?

    Thanks Kelly,
    The answers would be the following:
    1200 cells per custom section (NEW COUNT), and up to 30 custom sections per spec.
    Assuming all will be populated, and this would apply to all final material specs in the system which could be ~25% of all material specs.
    The cells will be numeric, free text, drop downs, and some calculated numeric.
    Are we reaching the limits for UI performance?
    Thanks
