OWB Performance Bottleneck

Is there any session log produced by OWB mapping execution, other than the results visible in the OWB Runtime Audit Browser?
Suppose the mapping is doing a hash join that is consuming too much time, and I would like to see which tables are being joined at that instant. This would help me identify the exact problem area in the mapping. Does OWB provide a session log that can give me that information, or is there any other place where I can find out which operation is causing the performance bottleneck?
regards
-AP
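
As an aside, one way to see which statement a running mapping is executing at that moment is to query the V$ views directly. A hedged sketch (the module filter is an assumption about how OWB tags its sessions):

    -- find the mapping's session and its current statement
    SELECT s.sid, s.serial#, s.sql_id
    FROM   v$session s
    WHERE  s.module LIKE '%OWB%';

    -- then pull the execution plan of that statement (10g)
    SELECT * FROM TABLE(dbms_xplan.display_cursor('&sql_id'));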

Thanks for all your suggestions. The mapping joined some 4-5 tables, and I think this is where it was getting stuck during execution in Set Based mode. Moreover, the mapping loads some 70 million records into the target table. Loading such a huge volume of data in Set Based mode, with a massive join right at the start, it is no surprise the mapping got stuck somewhere.
The solution that came up was to create a table from the join condition and use that table as input to the mapping. This gets rid of the joiner at the very beginning and also lets the mapping run in Row Based (Target Only) mode. The data (70 million rows) loaded in some 4 hours.
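Roughly, the staging step looked like this (a sketch only; the table and column names are invented for illustration):

    -- pre-compute the multi-table join once, outside the mapping
    CREATE TABLE stg_map_input NOLOGGING AS
    SELECT o.order_id, o.order_date, c.cust_name, p.prod_name
    FROM   orders o
    JOIN   customers c ON c.cust_id = o.cust_id
    JOIN   products  p ON p.prod_id = o.prod_id;

The mapping then reads stg_map_input directly, so no joiner operator is needed and Row Based (Target Only) mode becomes practical.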
regards
-AP

Similar Messages

  • Some Thoughts On An OWB Performance/Testing Framework

    Hi all,
    I've been giving some thought recently to how we could build a performance tuning and testing framework around Oracle Warehouse Builder. Specifically, I'm looking at ways in which we can use some of the performance tuning techniques described in Cary Millsap/Jeff Holt's book "Optimizing Oracle Performance" to profile and performance tune mappings and process flows, and to use some of the ideas put forward in Kent Graziano's Agile Methods in Data Warehousing paper http://www.rmoug.org/td2005pres/graziano.zip and Steven Feuerstein's utPLSQL project http://utplsql.sourceforge.net/ to provide an agile/test-driven way of developing mappings, process flows and modules. The aim of this is to ensure that the mappings we put together are as efficient as possible, work individually and together as expected, and are quick to develop and test.
    At the moment, most people's experience of performance tuning OWB mappings is firstly to see if it runs set-based rather than row-based, then perhaps to extract the main SQL statement and run an explain plan on it, then check to make sure indexes etc. are being used. This involves a lot of manual work, doesn't factor in the data available from the wait interface, doesn't store the execution plans anywhere, and doesn't really scale out to encompass entire batches of mappings (process flows).
    For some background reading on Cary Millsap/Jeff Holt's approach to profiling and performance tuning, take a look at http://www.rittman.net/archives/000961.html and http://www.rittman.net/work_stuff/extended_sql_trace_and_tkprof.htm. Basically, this approach traces the SQL that is generated by a batch file (read: mapping) and generates a file that can later be used to replay the SQL commands used, the explain plans that relate to the SQL, and details of what wait events occurred during execution, and provides at the end a profile listing that tells you where the majority of your time went during the batch. It's currently the "preferred" way of tuning applications as it focuses all the tuning effort on precisely the issues that are slowing your mappings down, rather than database-wide issues that might not be relevant to your mapping.
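    As a concrete illustration of the mechanics (standard Oracle syntax; the tracefile identifier and file names below are placeholders):

        -- enable extended SQL trace for this session; level 8 includes wait events
        ALTER SESSION SET tracefile_identifier = 'owb_map_trace';
        ALTER SESSION SET EVENTS '10046 trace name context forever, level 8';

        -- ... execute the mapping ...

        ALTER SESSION SET EVENTS '10046 trace name context off';

        -- then, on the database server, profile the raw trace file:
        -- tkprof ora_<spid>_owb_map_trace.trc map_profile.txt sys=no sort=exeela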
    For some background information on agile methods, take a look at Kent Graziano's paper, this one on test-driven development http://c2.com/cgi/wiki?TestDrivenDevelopment , this one http://martinfowler.com/articles/evodb.html on agile database development, and the sourceforge project for utPLSQL http://utplsql.sourceforge.net/. What this is all about is having a development methodology that builds in quality but is flexible and responsive to changes in customer requirements. The benefit of using utPLSQL (or any unit testing framework) is that you can automatically check your altered mappings to see that they still return logically correct data, meaning that you can make changes to your data model and mappings whilst being sure that everything still compiles and runs.
    Observations On The Current State of OWB Performance Tuning & Testing
    At present, when you build OWB mappings, there is no way (within the OWB GUI) to determine how "efficient" the mapping is. Often, when building the mapping against development data, the mapping executes quickly and yet when run against the full dataset, problems then occur. The mapping is built "in isolation" from its effect on the database and there is no handy tool for determining how efficient the SQL is.
    OWB doesn't come with any methodology or testing framework, and so apart from checking that the mapping has run, and that the number of rows inserted/updated/deleted looks correct, there is nothing really to tell you whether there are any "logical" errors. Also, there is no OWB methodology for integration testing, unit testing, or any other sort of testing, and we need to put one in place. Note - OWB does come with auditing, error reporting and so on, but there's no framework for guiding the user through a regime of unit testing, integration testing, system testing and so on, which I would imagine more complete developer GUIs come with. Certainly there's no built-in ability to use testing frameworks such as utPLSQL, or a part of the application that lets you record whether a mapping has been tested, and changes the test status of mappings when you make changes to ones that they are dependent on.
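    To make the utPLSQL idea concrete, here is a minimal sketch of what a test package for a mapping might look like, assuming the classic utPLSQL conventions (ut_ prefix, utAssert); the mapping and table names are invented:

        CREATE OR REPLACE PACKAGE ut_load_customers IS
          PROCEDURE ut_setup;
          PROCEDURE ut_teardown;
          PROCEDURE ut_row_counts_match;
        END ut_load_customers;
        /
        CREATE OR REPLACE PACKAGE BODY ut_load_customers IS
          PROCEDURE ut_setup IS
          BEGIN
            NULL;  -- seed known test rows into the staging table here
          END;
          PROCEDURE ut_teardown IS
          BEGIN
            NULL;  -- remove the test rows again
          END;
          -- after running the mapping, every staged row should appear in the target
          PROCEDURE ut_row_counts_match IS
            l_src PLS_INTEGER;
            l_tgt PLS_INTEGER;
          BEGIN
            SELECT COUNT(*) INTO l_src FROM stg_customers;
            SELECT COUNT(*) INTO l_tgt FROM dim_customers;
            utAssert.eq('all staged rows loaded', l_tgt, l_src);
          END;
        END ut_load_customers;
        /
        -- run with: EXEC utplsql.test('load_customers')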
    OWB is effectively a code generator, and this code runs against the Oracle database just like any other SQL or PL/SQL code. There is a whole world of information and techniques out there for tuning SQL and PL/SQL, and one particular methodology that we quite like is the Cary Millsap/Jeff Holt "Extended SQL Trace" approach that uses Oracle diagnostic events to find out exactly what went on during the running of a batch of SQL commands. We've been pretty successful using this approach to tune customer applications and batch jobs, and we'd like to use this, together with the "Method R" performance profiling methodology detailed in the book "Optimising Oracle Performance", as a way of tuning our generated mapping code.
    Whilst we want to build performance and quality into our code, we also don't want to overburden developers with an unwieldy development approach, because we know what will happen: after a short amount of time, it won't get used. Given that we want this framework to be used for all mappings, it's got to be easy to use, cause minimal overhead, and have results that are easy to interpret. If at all possible, we'd like to use some of the ideas from agile methodologies such as eXtreme Programming, SCRUM and so on to build in quality but minimise paperwork.
    We also recognise that there are quite a few settings that can be changed at a session and instance level, that can have an effect on the performance of a mapping. Some of these include initialisation parameters that can change the amount of memory assigned to the instance and the amount of memory subsequently assigned to caches, sort areas and the like, preferences that can be set so that indexes are preferred over table scans, and other such "tweaks" to the Oracle instance we're working with. For reference, the version of Oracle we're going to use to both run our code and store our data is Oracle 10g 10.1.0.3 Enterprise Edition, running on Sun Solaris 64-bit.
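    For illustration, a few session- and instance-level "tweaks" of the kind described above (10g syntax; the values are placeholders, not recommendations):

        ALTER SYSTEM  SET pga_aggregate_target = 1G;      -- memory available for sorts and hash joins
        ALTER SESSION SET workarea_size_policy = AUTO;    -- let Oracle size sort/hash work areas
        ALTER SESSION SET optimizer_index_cost_adj = 50;  -- bias the optimizer towards index access
        ALTER SESSION ENABLE PARALLEL DML;                -- allow parallel set-based loads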
    Some initial thoughts on how this could be accomplished
    - Put in place some method for automatically / easily generating explain plans for OWB mappings (issue - this is only relevant for mappings that are set based, and what about pre- and post- mapping triggers)
    - Put in place a method for starting and stopping an event 10046 extended SQL trace for a mapping
    - Put in place a way of detecting whether the explain plan / cost / timing for a mapping changes significantly (one starting point is sketched after this list)
    - Put in place a way of tracing a collection of mappings, i.e. a process flow
    - The way of enabling tracing should either be built in by default, or easily added by the OWB developer. Ideally it should be simple to switch it on or off (perhaps levels of event 10046 tracing?)
    - Perhaps store trace results in a repository? reporting? exception reporting?
    - at an instance level, come up with some stock recommendations for instance settings
    - identify the set of instance and session settings that are relevant for ETL jobs, and determine what effect changing them has on the ETL job
    - put in place a regime that records key instance indicators (STATSPACK / ASH) and allows reports to be run / exceptions to be reported
    - Incorporate any existing "performance best practices" for OWB development
    - define a lightweight regime for unit testing (as per agile methodologies) and a way of automating it (utPLSQL?) and of recording the results so we can check the status of dependent mappings easily
    - other ideas around testing?
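    On the plan-change detection point above, a hedged sketch of the kind of repository that could drive it (the plan_history table is invented):

        CREATE TABLE plan_history (
          captured_at DATE,
          sql_id      VARCHAR2(13),
          plan_hash   NUMBER
        );

        -- snapshot the current plan hash of a mapping's statement after each run
        INSERT INTO plan_history
        SELECT SYSDATE, sql_id, plan_hash_value
        FROM   v$sql
        WHERE  sql_id = '&mapping_sql_id';

        -- a plan change shows up as more than one distinct hash per statement
        SELECT   sql_id, COUNT(DISTINCT plan_hash) AS plans_seen
        FROM     plan_history
        GROUP BY sql_id
        HAVING   COUNT(DISTINCT plan_hash) > 1;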
    Suggested Approach
    - For mapping tracing and generation of explain plans, a pre- and post-mapping trigger that turns extended SQL trace on and off, places the trace file in a predetermined spot, formats the trace file and dumps the output to repository tables (sketched below).
    - For process flows, something that does the same at the start and end of the process. Issue - how might this conflict with mapping level tracing controls?
    - Within the mapping/process flow tracing repository, store the values of historic executions, have an exception report that tells you when a mapping execution time varies by a certain amount
    - get the standard set of preferred initialisation parameters for a DW, use these as the start point for the stock recommendations. Identify which ones have an effect on an ETL job.
    - identify the standard steps Oracle recommends for getting the best performance out of OWB (workstation RAM etc) - see OWB Performance Tips http://www.rittman.net/archives/001031.html and Optimizing Oracle Warehouse Builder Performance http://www.oracle.com/technology/products/warehouse/pdf/OWBPerformanceWP.pdf
    - Investigate what additional tuning options and advisers are available with 10g
    - Investigate the effect of system statistics & come up with recommendations.
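    A minimal sketch of the pre-/post-mapping trigger idea in the first bullet above - two procedures that pre- and post-mapping process operators could call (the trace-file formatting and repository-load steps are omitted):

        CREATE OR REPLACE PROCEDURE trace_on (p_map_name IN VARCHAR2) IS
        BEGIN
          EXECUTE IMMEDIATE
            'ALTER SESSION SET tracefile_identifier = ''' || p_map_name || '''';
          EXECUTE IMMEDIATE
            'ALTER SESSION SET EVENTS ''10046 trace name context forever, level 8''';
        END trace_on;
        /
        CREATE OR REPLACE PROCEDURE trace_off IS
        BEGIN
          EXECUTE IMMEDIATE
            'ALTER SESSION SET EVENTS ''10046 trace name context off''';
        END trace_off;
        /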
    Further reading / resources:
    - "Diagnosing Performance Problems Using Extended Trace", Cary Millsap
    http://otn.oracle.com/oramag/oracle/04-jan/o14tech_perf.html
    - "Performance Tuning With STATSPACK" Connie Dialeris and Graham Wood
    http://www.oracle.com/oramag/oracle/00-sep/index.html?o50tun.html
    - "Performance Tuning with Statspack, Part II" Connie Dialeris and Graham Wood
    http://otn.oracle.com/deploy/performance/pdf/statspack_tuning_otn_new.pdf
    - "Analyzing a Statspack Report: A Guide to the Detail Pages" Connie Dialeris and Graham Wood
    http://www.oracle.com/oramag/oracle/00-nov/index.html?o60tun_ol.html
    - "Why Isn't Oracle Using My Index?!" Jonathan Lewis
    http://www.dbazine.com/jlewis12.shtml
    - "Performance Tuning Enhancements in Oracle Database 10g" Oracle-Base.com
    http://www.oracle-base.com/articles/10g/PerformanceTuningEnhancements10g.php
    - Introduction to Method R and Hotsos Profiler (Cary Millsap, free reg. required)
    http://www.hotsos.com/downloads/registered/00000029.pdf
    - Exploring the Oracle Database 10g Wait Interface (Robin Schumacher)
    http://otn.oracle.com/pub/articles/schumacher_10gwait.html
    - Article referencing an OWB forum posting
    http://www.rittman.net/archives/001031.html
    - How do I inspect error logs in Warehouse Builder? - OWB Exchange tip
    http://www.oracle.com/technology/products/warehouse/pdf/Cases/case10.pdf
    - What is the fastest way to load data from files? - OWB exchange tip
    http://www.oracle.com/technology/products/warehouse/pdf/Cases/case1.pdf
    - Optimizing Oracle Warehouse Builder Performance - Oracle White Paper
    http://www.oracle.com/technology/products/warehouse/pdf/OWBPerformanceWP.pdf
    - OWB Advanced ETL topics - including sections on operating modes, partition exchange loading
    http://www.oracle.com/technology/products/warehouse/selfserv_edu/advanced_ETL.html
    - Niall Litchfield's Simple Profiler (a creative commons-licensed trace file profiler, based on Oracle Trace Analyzer, that displays the response time profile through HTMLDB. Perhaps could be used as the basis for the repository/reporting part of the project)
    http://www.niall.litchfield.dial.pipex.com/SimpleProfiler/SimpleProfiler.html
    - Welcome to the utPLSQL Project - a PL/SQL unit testing framework by Steven Feuernstein. Could be useful for automating the process of unit testing mappings.
    http://utplsql.sourceforge.net/
    Relevant postings from the OTN OWB Forum
    - Bulk Insert - Configuration Settings in OWB
    http://forums.oracle.com/forums/thread.jsp?forum=57&thread=291269&tstart=30&trange=15
    - Default Performance Parameters
    http://forums.oracle.com/forums/thread.jsp?forum=57&thread=213265&message=588419&q=706572666f726d616e6365#588419
    - Performance Improvements
    http://forums.oracle.com/forums/thread.jsp?forum=57&thread=270350&message=820365&q=706572666f726d616e6365#820365
    - Map Operator performance
    http://forums.oracle.com/forums/thread.jsp?forum=57&thread=238184&message=681817&q=706572666f726d616e6365#681817
    - Performance of mapping with FILTER
    http://forums.oracle.com/forums/thread.jsp?forum=57&thread=273221&message=830732&q=706572666f726d616e6365#830732
    - Poor mapping performance
    http://forums.oracle.com/forums/thread.jsp?forum=57&thread=275059&message=838812&q=706572666f726d616e6365#838812
    - Optimizing Mapping Performance With OWB
    http://forums.oracle.com/forums/thread.jsp?forum=57&thread=269552&message=815295&q=706572666f726d616e6365#815295
    - Performance of the OWB-Repository
    http://forums.oracle.com/forums/thread.jsp?forum=57&thread=66271&message=66271&q=706572666f726d616e6365#66271
    - One large JOIN or many small ones?
    http://forums.oracle.com/forums/thread.jsp?forum=57&thread=202784&message=553503&q=706572666f726d616e6365#553503
    - NATIVE PL SQL with OWB9i
    http://forums.oracle.com/forums/thread.jsp?forum=57&thread=270273&message=818390&q=706572666f726d616e6365#818390
    Next Steps
    Although this is something that I'll be progressing with anyway, I'd appreciate any comment from existing OWB users as to how they currently perform performance tuning and testing. Whilst these are perhaps two distinct subject areas, they can be thought of as the core of an "OWB Best Practices" framework, and I'd be prepared to write the results up as a freely downloadable whitepaper. With this in mind, does anyone have any existing best practices for tuning or testing? Has anyone tried using SQL trace and TKPROF to profile mappings and process flows, or used a unit testing framework such as utPLSQL to automatically test the set of mappings that make up your project?
    Any feedback, add it to this forum posting or send directly through to me at [email protected]. I'll report back on a proposed approach in due course.

    Hi Mark,
    interesting post, but I think you may be focusing on the trees, and losing sight of the forest.
    Coincidentally, I've been giving quite a lot of thought lately to some aspects of your post. They relate to some new stuff I'm doing. Maybe I'll be able to answer in more detail later, but I do have a few preliminary thoughts.
    1. 'How efficient is the generated code' is a perennial topic. There are still some people who believe that a code generator like OWB cannot be in the same league as hand-crafted SQL. I answered that question quite definitively: "We carefully timed execution of full-size runs of both the original code and the OWB versions. Take it from me, the code that OWB generates is every bit as fast as the very best hand-crafted and fully tuned code that an expert programmer can produce."
    The link is http://www.donnapkelly.pwp.blueyonder.co.uk/generated_code.htm
    That said, it still behooves the developer to have a solid understanding of what the generated code will actually do, such as how it will take advantage of indexes, and so on. If not, the developer can create such monstrosities as lookups into an un-indexed field (I've seen that).
    2. The real issue is not how fast any particular generated mapping runs, but whether or not the system as a whole is fit for purpose. Most often, that means: does it fit within its batch update window? My technique is to dump the process flow into Microsoft Project, and then to add the timings for each process. That creates a critical path, and then I can visually inspect it for any bottleneck processes. I usually find that there are no more than one or two dogs. I'll concentrate on those, fix them, and re-do the flow timings. I would add this: the dogs I have seen, I have invariably replaced. They were just garbage; they did not need tuning at all - just scrapping.
    Gee, but this whole thing is minimum effort and real fast! I generally figure that it takes maybe a day or two (max) to soup up system performance to the point where it whizzes.
    Fact is, I don't really care whether there are a lot of sub-optimal processes. All I really care about is the performance of the system as a whole. This technique seems to work for me. 'Course, it depends on architecting the thing properly in the first place. Otherwise, no amount of tuning is going to help worth a darn.
    Conversely (re. my note about replacing dogs) I do not think I have ever tuned a piece of OWB-generated code. Never found a need to. Not once. Not ever.
    That's not to say I do not recognise the value of playing with deployment configuration parameters. Obviously, I set auditing=none, and operating mode=set based, and sometimes I play with a couple of different target environments to fool around with partitioning, for example. Nonetheless, if it is not a switch or a knob inside OWB, I do not touch it. This is in line with my dictum that you shall use no other tool than OWB to develop data warehouses. (And that includes all documentation!) (OK, I'll accept MS Project.)
    Finally, you raise the concept of a 'testing framework'. This is a major part of what I am working on at the moment. This is a tough one. Clearly, the developer must unit test each mapping in a design-model-deploy-execute cycle, paying attention to both functionality and performance. When the developer is satisfied, that mapping will be marked as 'done' in the project workbook. Mappings will form part of a stream, executed as a process flow. Each process flow will usually terminate in a dimension, a fact, or an aggregate. Each process flow will be tested as an integrated whole. There will be test strategies devised, and test cases constructed. There will finally be system tests, to verify the validity of the system as a production-grade whole. (stuff like recovery/restart, late-arriving data, and so on)
    For me, I use EDM (TM). That's the methodology I created (and trademarked) twenty years ago: Evolutionary Development Methodology (TM). This is a spiral methodology based around prototyping cycles within Stage cycles within Release cycles. For OWB, a Stage would consist (say) of a Dimensional update. What I am trying to do now is graft this onto a traditional waterfall methodology, and I am having the same difficulties I had when I tried it before.
    All suggestions on how to do that grafting gratefully received!
    To sum up, I'm kinda at a loss as to why you want to go deep into OWB-generated code performance stuff. Jeepers, architect the thing right, and the code runs fast enough for anyone. I've worked on ultra-large OWB systems, including validating the largest data warehouse in the UK. I've never found any value in 'tuning' the code. What I'd like you to comment on is this: what will it buy you?
    Cheers,
    Donna
    http://www.donnapkelly.pwp.blueyonder.co.uk

  • OWB Performance Tuning

    Hi everybody,
    I searched for OWB performance tuning guidelines for OWB 11gR2.
    1) The posted link http://www.oracle.com/technology/products/warehouse/pdf/OWBPerformanceWP.pdf does not pull up the desired white paper; it points to the Oracle OWB resource page, where I did not find any links related to performance tuning. Any idea?
    2) I reviewed https://blogs.oracle.com/warehousebuilder/entry/performance_tuning_mappings
    Performance tuning mappings By David Allan
    The links in the blog - (a) "There are reports in the utility exchange (see here)" and (b) "There is a viewlet describing some of this here" - are not working. Could you post the working links?
    Regards
    Ram Iyer

    Hi Ram
    The blog links should be fixed now, let me know if not. The blog has been rehosted a zillion times and each time stuff is broken in the migration - sound familiar?
    Cheers
    David

  • Performance bottleneck with subreports

    I have an SSRS performance bottleneck on my production server that we have diagnosed as being related to the use of subreports.
    Background facts:
    * Our Production and Development servers are identically configured
    * We've tried the basic restart/reboot activities, which didn't change anything about the performance.
    * The Development server was "cloned" from the Production server about a month ago, so all application settings (memory usage, logging, etc.) are identical between the two
    * For the bottlenecked report, the underlying stored procedure executes in 3 seconds, returning 901 rows, in both environments with the same parameters. The execution plan is identical between the two servers, and the underlying tables and indexing are identical. Stats run regularly on both servers.
    * In the development environment the report runs in 12 seconds. But on Production the report takes well over a minute to return, ranging from 1:10 up to 1:40.
    * If I point the Development SSRS report to the PROD datasource I get a return time of 14 seconds (the additional two seconds due to the transfer of data over the network).
    * If I point the Production SSRS report to the DEV datasource I get a return time of well over a minute.
    * I have tried deleting the Production report definition and uploading it as new to see if there was a corruption issue, this didn't change the runtimes.
    * Out of the hundreds of Production SSRS reports that we have, the only two that exhibit dramatically different performance between Dev and Prod are the ones that contain subreports.
    * Queries against the ReportServerTempDB also confirm that these two reports are the major contributors to TempDB utilization.
    * We have verified that the ReportServerTempDB is being backed up and shrunk on a regular basis.
    These factors tell me that the issue is not with the database or the SQL. The tests on the Development server also prove that the reports and subreports are not an issue in themselves - it is possible to get acceptable performance from them in the Development environment, or when they are pointed from the Dev report server over to the Prod database.
    Based on these details, what should we check on our Prod server to resolve the performance issue with subreports on this particular server?

    Hi GottaLoveSQL,
    According to your description, you want to improve the performance of a report with subreports. Right?
    In Reporting Services, the use of subreports impacts report performance because the report server processes each instance of a subreport as a separate report. So the best approach is to avoid subreports by using the Lookup, Multilookup, or LookupSet functions, which can bridge different data sources. Alternatively, in this scenario we suggest you cache the report that contains the subreports; you can create a cache refresh plan for the report in Report Manager. Please refer to the link below:
    http://technet.microsoft.com/en-us/library/ms155927.aspx
    Reference:
    Report Performance Optimization Tips (Subreports, Drilldown)
    Performance, Snapshots, Caching (Reporting Services)
    Performance Issue in SSRS 2008
    If you have any question, please feel free to ask.
    Best Regards,
    Simon Hou

  • Will RAC's performance bottleneck be the shared disk storage ?

    Hi All
    I'm studying RAC and I'm concerned about RAC's I/O performance bottleneck.
    If I have 10 nodes and they all use the same disk storage to hold the database, they will be doing I/O against that storage simultaneously, and maybe we get more latency...
    Will that be a performance problem?
    How does RAC solve this kind of problem?
    Thanks.

    J.Laurence wrote:
    > I see FC can solve the problem with bandwidth (throughput).
    There are a couple of layers in the I/O subsystem for RAC.
    There is Cache Fusion, as already mentioned. Why read a data block from disk when another node has it in its buffer cache and can provide it instead (over the interconnect communication layer)?
    Then there are the actual pipes between the server nodes and the storage system. Fibre is slow and not what the latest RAC architecture (such as Exadata) uses.
    Traditionally, you pop an HBA card into the server that provides you with 2 fibre channel pipes to the storage switch. These usually run at 2Gb/s, and the I/O driver can load balance and fail over. So in theory it can scale to 4Gb/s and provide redundancy should one fail.
    Exadata and more "modern" RAC systems use HCA cards running InfiniBand (IB). This provides scalability of up to 40Gb/s. Also dual port, which means that you have 2 cables running into the storage switch.
    IB supports a protocol called RDMA (Remote Direct Memory Access). This essentially allows memory to be "shared" across the IB fabric layer - and is used to read data blocks from the storage array's buffer cache into the local Oracle RAC instance's buffer cache.
    Port-to-port latency for a properly configured IB layer running QDR (quad data rate) can be lower than 70ns.
    And it does not stop there. You can of course add a huge memory cache in the storage array (which is essentially a server with a bunch of disks). Current x86-64 motherboard technology supports up to 512GB RAM.
    Exadata takes it even further, as special ASM software on the storage node reconstructs data blocks on the fly to supply the RAC instance with only the relevant data. This reduces the data volume pushed from the storage node to the database node.
    So fibre channel in this sense is a bit dated. As is GigE.
    But what about the hard drives' read and write I/O? Not a problem, as the storage array deals with that. A RAC instance that writes a data block writes it into the storage buffer cache, where the storage array software manages that cache and will do the physical write to disk.
    Of course, it will stripe heavily and will have 24+ disk controllers available to write that data block, so do not think of I/O latency in terms of the actual speed of a single disk.

  • Major performance bottleneck in JSF RI 1.0

    We've been doing some load testing this week, and have come up with what I believe is a major performance bottleneck in the reference implementation.
    Our test suite was run on two different application servers (JBoss and Oracle), and we found that in both cases response time degraded dramatically when hitting about 25-30 concurrent users.
    On analyzing a thread dump taken while the application server was in this state, we noticed that close to twenty threads were waiting on the same locked resource.
    The resource is the 'descriptors' static field in the javax.faces.component.UIComponentBase class. It is a WeakHashMap. The contention occurs in the getPropertyDescriptors method, which has a large synchronized block.

    Well, not the answer I was hoping for. But at least that's clear.
    Jayashri, I'm using the JSF RI for an application that will be delivered to testing in August. Can you give advice on whether I can expect an update for this bottleneck problem within that timeframe?
    Sincerely,
    Joost de Vries
    ps hi netbug. Saw you at theserverside! :-)

  • J2EE application performance bottlenecks

    For anyone interested in learning how to resolve J2EE application performance bottlenecks, I found a great resource:
    http://www.cyanea.com/email/throttle_form2.html
    registering with them can win you 1 of 3 iPod minis

    I agree with yawmark's response #1 in one of your evil spams http://forum.java.sun.com/thread.jsp?thread=514026&forum=54&message=2446641

  • Array as Shared Memory - performance bottleneck

    Hello,
    currently I work on a multi-threaded application, where many threads work on shared memory.
    I am wondering why the application doesn't become faster by using many threads (I have an i7 machine).
    Here is an example of initialization in a single thread:
          final int arrayLength = (int)1e7;
          final int threadNumber = Runtime.getRuntime().availableProcessors();
          final int offset = arrayLength/threadNumber;
          long startTime;

          // init array in single thread
          Integer[] a1 = new Integer[arrayLength];
          startTime = System.currentTimeMillis();
          for(int i=0; i<arrayLength; i++){
               a1[i] = i;
          }
          System.out.println("single thread=" + (System.currentTimeMillis()-startTime));

    and here is the initialization with many threads:
          // init array in many threads
          final Integer[] a3 = new Integer[arrayLength];
          List<Thread> threadList = new ArrayList<Thread>();
          for(int i=0; i<threadNumber; i++){
               final int iF = i;
               Thread t = new Thread(new Runnable(){
                    @Override
                    public void run() {
                         int end = (iF+1)*offset;
                         if(iF==(threadNumber-1))
                              end = a3.length;
                         for(int i=iF*offset; i<end; i++){
                              a3[i] = i;
                         }
                    }
               });
               threadList.add(t);
          }
          startTime = System.currentTimeMillis();
          for(Thread t:threadList)
               t.start();
          for(Thread t:threadList)
               t.join();
          System.out.println("many threads List=" + (System.currentTimeMillis()-startTime));

    After execution it looks like this:
    single thread=2372
    many threads List=3760
    I have an i7 with 4GB RAM.
    System + Parameters:
    JVM-64bit JDK1.6.0_14
    -Xmx3g
    Why is execution in one thread faster than execution in many threads?
    As you can see, I didn't use any synchronization.
    Maybe I have to configure the JVM in some way to gain the desired performance (I expected a performance gain of about 8x on the i7)?

    Hello,
    I'm from [happy-guys|http://www.happy-guys.com], and we developed a new sorting algorithm to sort an array on a multi-core machine.
    But after the algorithm was implemented it was a little bit slower than the standard sorting algorithm from the JDK (Arrays.sort(...)). After searching for the reason, I created performance tests which show that arrays in Java don't allow access by many threads at the same time.
    The bad news is: different threads slow each other down even if they use different array objects.
    I believe all array objects are natively managed by a global manager in the JVM, and this manager creates a global lock for all threads.
    Only one thread can access any array at the same time!
    I used:
    Software:
    1)Windows Vista 64bit,
    2) java version "1.6.0_14"
    Java(TM) SE Runtime Environment (build 1.6.0_14-b08)
    Java HotSpot(TM) 64-Bit Server VM (build 14.0-b16, mixed mode)
    Hardware:
    Intel(R) Core(TM) i7 CPU 920 @ 2,67GHz 2,79 GHz, 6G RAM
    Test1: initialization of array in a single thread
    Test2: the array initialization in many threads on the single array
    Test3: array initialization in many threads on many arrays
    Results in ms:
    Test1 = 5588
    Test2 = 4976
    Test3 = 5429
    Test1:

    package org.happy.concurrent.sort.forum;

    /**
     * simulates the initialization of an array in a single thread
     * @author Andreas Hollmann
     */
    public class ArraySingleThread {
         public static void main(String[] args) throws InterruptedException {
              final int arrayLength = (int)2e7;
              long startTime;
              // init array in single thread
              Integer[] a1 = new Integer[arrayLength];
              startTime = System.currentTimeMillis();
              for(int i=0; i<arrayLength; i++){
                   a1[i] = i;
              }
              System.out.println("single thread=" + (System.currentTimeMillis()-startTime));
         }
    }

    Test2:
    package org.happy.concurrent.sort.forum;

    import java.util.ArrayList;
    import java.util.List;

    /**
     * simulates the array initialization in many threads on a single array
     * @author Andreas Hollmann
     */
    public class ArrayManyThreads {
         public static void main(String[] args) throws InterruptedException {
              final int arrayLength = (int)2e7;
              final int threadNumber = Runtime.getRuntime().availableProcessors();
              long startTime;
              final int offset = arrayLength/threadNumber;
              // init array in many threads
              final Integer[] a = new Integer[arrayLength];
              List<Thread> threadList = new ArrayList<Thread>();
              for(int i=0; i<threadNumber; i++){
                   final int iF = i;
                   Thread t = new Thread(new Runnable(){
                        @Override
                        public void run() {
                             int end = (iF+1)*offset;
                             if(iF==(threadNumber-1))
                                  end = a.length;
                             for(int i=iF*offset; i<end; i++){
                                  a[i] = i;
                             }
                        }
                   });
                   threadList.add(t);
              }
              startTime = System.currentTimeMillis();
              for(Thread t:threadList)
                   t.start();
              for(Thread t:threadList)
                   t.join();
              System.out.println("many threads List=" + (System.currentTimeMillis()-startTime));
         }
    }

    Test3:
    package org.happy.concurrent.sort.forum;

    import java.util.ArrayList;
    import java.util.List;

    /**
     * simulates the array initialization in many threads on many arrays
     * @author Andreas Hollmann
     */
    public class ArrayManyThreadsManyArrays {
         public static void main(String[] args) throws InterruptedException {
              final int arrayLength = (int)2e7;
              final int threadNumber = Runtime.getRuntime().availableProcessors();
              long startTime;
              final int offset = arrayLength/threadNumber;
              // init many arrays in many threads
              final ArrayList<Integer[]> list = new ArrayList<Integer[]>();
              for(int i=0; i<threadNumber; i++){
                   int size = offset;
                   if(i==(threadNumber-1))  // the last array takes the remainder
                        size = offset + arrayLength%threadNumber;
                   list.add(new Integer[size]);
              }
              List<Thread> threadList = new ArrayList<Thread>();
              for(int i=0; i<threadNumber; i++){
                   final int index = i;
                   Thread t = new Thread(new Runnable(){
                        @Override
                        public void run() {
                             Integer[] a = list.get(index);
                             int value = index*offset;
                             for(int i=0; i<a.length; i++){
                                  value++;
                                  a[i] = value;
                             }
                        }
                   });
                   threadList.add(t);
              }
              startTime = System.currentTimeMillis();
              for(Thread t:threadList)
                   t.start();
              for(Thread t:threadList)
                   t.join();
              System.out.println("many threads - many List=" + (System.currentTimeMillis()-startTime));
         }
    }

  • Performance bottleneck of hard drive: assets vs. cache vs. render-to drive?

    So I'm beefing up my old Mac Pro tower (5,1) and was wondering which combination of hard drives is fastest to use, if anyone has any firsthand or theoretical suggestions...
    if someone has all three of these hard drives:
    A) PCIe SSD (OWC Mercury Accelsior_E2 PCI Express SSD)
    B) internal drive bay SSD
    C) external SSD connected via 600MB/s eSATA port of the above linked card
    … which is best to use in combination for the following in After Effects CC/CC2014?
    1) storage of asset files used in the AE project (i.e. 1080/4K/RAW/etc. footage, PSD files)
    2) AE disk cache
    3) the drive that AE is rendering to
    … for example, is 1A + 2C + 3B the fastest for rendering? And is 1AC + 2B the fastest while working in AE?
    between assets, disk cache, and render location, which are more of a performance bottleneck?
    and does the optimal combination vary if someone had 16 GB vs 64GB vs 128GB of RAM?
    thanks in advance for any insight!

    Well, the long and short answer is: it won't matter. All your system buses only have so much overall transfer bandwidth, and ultimately they all end up being piped in some way through your PCI bus, which in addition is shared by your graphics card, audio devices and what have you. There are going to be wait states and data collisions, and whether or not you can make your machine fly to Mars is ultimately not relevant. There may be some tiny advantage in using a native PCI card SSD for the cache, but otherwise the overall combined data transfer rates will be way above and beyond what your system can handle, so it will throttle one way or the other.
    Mylenium

  • OWB Performance Whitepaper on OTN

    Some people were asking for OWB performance tips. Please check: http://www.oracle.com/technology/products/warehouse/pdf/OWBPerformanceWP.pdf
    Regards:
    Igor

    Hi there,
    thanks for reporting this glitch; will take care of this shortly, Peter

  • Performance bottleneck with 2.2.1 and 2008 R2 os VM's

    Hi,
    I have a DL370 G6 with Oracle VM Server 2.2.1 installed:
    72 GB of memory and 2 dual + quad core processors
    All VMs are installed on local disk (6 x 300 GB in RAID 5)
    I have 2 NICs connected to a switch for LAN traffic.
    We have 10 VMs with the 2008 R2 OS on them.
    The overall performance of these VMs is really horrible.
    They are very, very slow.
    Installing a database takes 4 hours, even though each VM has 6 GB of RAM.
    Restarting the system takes around 20 minutes.
    Has anybody tried this many VMs on one server?
    Is there any tool or any way I can see what the issue is, or whether there is any bottleneck on the server?
    2008 R2 is generally a resource-hungry OS, but still the overall performance is really horrible.

    hi,
    hdparm -T /dev/cciss/c0d0 (which is the drive) gives:
    /dev/cciss/c0d0
    Timing cached reads: 31088 MB in 1.99 seconds = 15595.54 MB/sec
    HDIO_DRIVE_CMD(null) (wait for flush complete) failed: Inappropriate ioctl for device
    HDIO_DRIVE_CMD(null) (wait for flush complete) failed: Inappropriate ioctl for device
    hdparm -T /dev/cciss/c0d0p5 (which is /OVS) gives:
    /dev/cciss/c0d0p5
    Timing cached reads: 30636 MB in 1.99 seconds = 15364.10 MB/sec
    HDIO_DRIVE_CMD(null) (wait for flush complete) failed: Inappropriate ioctl for device
    HDIO_DRIVE_CMD(null) (wait for flush complete) failed: Inappropriate ioctl for device
    All my VM guests are Windows 2008 R2 64-bit, which is a very new Windows OS. Oracle came up with new PV drivers for it, and I suspect that could be the reason for all this resource bottleneck?
    For I/O tests, on the VM server:
    1) iostat
    avg-cpu: %user %nice %system %iowait %steal %idle
    0.11 0.00 0.03 0.24 0.00 99.61
    Device tps BLK_read/s Blk_wrtn/s
    ciss/c0d0 - disk 41.90 186.32 26.05
    cciss/c0d0p1-/boot 0.00 0.00 0.00
    cciss/c0d0p2 - 0.00 0.00 0.00
    cciss/c0d0p3- / 1.90 1.83 43.06
    cciss/c0d0p4 0.00 0.00 0.00
    cciss/c0d0p5 - /OVS 40.00 184.49 295.82
    2) vmstat
    swpd free buff cache
    92 166632 94940 53444
    3) sar
    To display block I/O activity:
    sar -b 3 100
    Average : tps - 41.18 rtps - 7.17 wtps - 34.01 bread/s 188.99 bwrtn/s - 588.64
    To display block I/O activity for each block device:
    sar -d 3 100
    Does this look OK? Is there any way to improve overall performance, like enabling or disabling something? We are facing a very bad problem here with all the VMs running the 2008 R2 OS.

  • Structured approach to debugging performance bottlenecks for 3rd Party apps

    Hi All,
    I am facing a situation which I believe most app support personnel and DBAs in IT organizations do, but I haven't found a structured approach to solving the problem. I am hoping this thread can help filter and pull together the varied chunks of information out there in one place.
    Here is the situation. I am avoiding making it too specific, as the idea is to identify a good approach that is repeatable in other scenarios.
    We are in the process of implementing a solution using a third-party application (SAP's BPC), which sits on an Oracle database. The application implementation team has some control over how the application is used to design the solution, but no direct access to the underlying queries that the app generates. We are starting to find that as the underlying database grows (from a couple of million to tens of millions of records), the performance of certain operations is becoming very unpredictable. Sometimes an operation goes through relatively fast, while at other times it gets stuck for over an hour and then times out.
    In such situations there is a classic battle between the Oracle DBAs and the app implementation team, each trying to push the ball into the other's court rather than identify and "fix" the problem.
    What in your opinion would be a structured approach between the two teams to help solve the problem? For each step of the approach, please also try and see if you can point to links which further dive into specifics of executing that step.
    For example, one approach might be to ...
    1. DBA team to find a way to identify specific queries/DB operations that are taking too long. (add references here; one starting point is sketched below)
    2. App team to collaborate with the App manufacturer's support organization to see what design changes or parameters could alter the nature of queries being generated or affect the size of the underlying tables. (too specific for each 3rd party app)
    3. After exhausting (2), DBA team to analyze the remaining culprit queries and find ways to obtain better performance without changing the query or the size of the underlying tables, via indexes/DB parameters/etc. (add references here)
    4. After exhausting (3), DBA/Unix admin team to identify which specific hardware bottlenecks are being faced (CPUs/storage/memory) to see if hardware changes can help obtain better performance.
    Thoughts?
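    For step 1, a hedged sketch of the kind of query a DBA team might start from (requires access to V$SQL; 10g+):

        -- the ten statements with the most elapsed time still in the shared pool
        SELECT *
        FROM  (SELECT sql_id,
                      executions,
                      ROUND(elapsed_time/1e6, 1) AS elapsed_seconds,
                      buffer_gets
               FROM   v$sql
               ORDER  BY elapsed_time DESC)
        WHERE  ROWNUM <= 10;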

    >
    1. DBA team to find a way to identify specific queries/DB operations that are taking too long. (add references here)
    2. App team to collaborate with the App manufacturer's support organization to see what design changes or parameters could alter the nature of queries being generated or affect the size of the underlying tables. (too specific for each 3rd party app)
    3. After exhausting (2), DBA team to analyze the remaining culprit queries and find ways to obtain better performance without changing the query or the size of the database tables via indexes/DB parameters/etc. (add references here)
    4. After exhausting (3), DBA/Unix admin team to identify which specific hardware bottlenecks are being faced (CPUs/storage/memory) to see if hardware changes can help obtain better performance.
    >
    In general your approach is correct.
    However I'd put priorities different way.
    1. DBA team to find a way to identify specific Querios/DBOperations that are taking too long. (add references here)
    2. DBA team to analyze the culprit queries and find ways to obtain better performance without changing the query or the size of the database tables via indexes/DB parameters/etc.. (add references here)
    With collaboration with the App manufacturer's support if required.
    Indexes are transparent to application logic. They do not affect results data. Only performance.
    Note that indexes should be regular b-tree indexes, not unique or bitmap.
    Changes to queries can be allowed here, using Oracle query substitution techniques (Plan Stability, SQL Plan Management, ...).
    3. After exhausting (2), DBA/Unix admin team to identify which specific hardware bottlenecks are being faced (CPUs/storage/memory) to see if hardware changes can help obtain better performance.
    Not only because beefing up hardware is today a less expensive way to improve performance than software optimization (especially an application redesign), but mainly because, in the case of SAP, poor performance that can be improved by hardware suggests the system was sized incorrectly.
    SAP has a methodology for sizing your hardware depending on the volume of data, number of users and quantity of transactions.
    Sizing should be re-done if your data has grown beyond the volume used for the initial SAP sizing.
    4. After exhausting (3), App team to collaborate with the App manufacturer's support organization to see what design changes or parameters could alter the nature of queries being generated or affect the size of the underlying tables. (too specific for each 3rd party app)

  • OWB Performance

    Can someone give me pointers on how mapping execution performance can be improved? Settings at the O/S level, database, mappings, etc.
    Thanks.

    http://www.oracle.com/technology/products/warehouse/selfserv_edu/advanced_ETL.html has pointers to the specific places in the OWB documentation explaining Mapping Operating Modes, Configuration Settings, Commit settings, and Partition Exchange Loading. All of these are directly related to the performance of your mapping executions.
    Nikolai Rochnik

  • Does BPM - for a synchronous interface have a performance bottleneck

    Hi All,
    Just have a small query.
    We have a scenario in which we need to receive PO details from a legacy system, create a sales order in ECC, and send back a response table to the legacy system.
    Our understanding is that this can be achieved using synchronous ABAP proxies, and that it also involves BPM and abstract mappings.
    I believe that this should not pose any problems. My concern is that we are confused as to whether BPM would have performance bottlenecks. Do we have any SAP document or article which states that for synchronous interfaces BPM is the only way to go, and that this would not have a significant impact on performance?
    Another approach would be to create an asynchronous inbound proxy, write ABAP code within it, and call a separate outbound asynchronous proxy within the inbound proxy method. This approach looks and sounds very clumsy.
    Kindly let me know your thoughts or any links which would be useful.
    Thanks & Regards,
    Mz

    Hi Aashish,
    Thanks for your quick reply. It was helpful, but I am not using RFCs. Correct me if I am wrong, but I have explained the scenarios in detail below.
    Scenario 1. Synchronous
    1) PI Picks file from a common folder.
    2) PI does a data mapping and sends the data to ECC.
    3) ECC contains an inbound interface which receives the data and in which abap proxy code is written.
    4) The abap proxy code executes a function module and sends the response as an internal table back to PI.
    5) PI receives the response, places it in a text/csv file, and puts it back in another folder.
    I assume that the above would be possible only using BPM. What I understand is that in order for an interface to receive and send data, abstract mappings are to be used, and for this BPM is required. We do not have any conversions etc.; it's just a simple matter of receiving an internal table from ECC and creating a file to place in the folder.
    I also understand that BPM could have bottlenecks due to queue and cache issues; messages might be pending, lost, etc.
    Scenario 2. Asynchronous
    1) PI Picks file from a common folder.
    2) PI does a data mapping and sends the data to ECC.
    3) ECC contains an inbound interface which receives the data and in which abap proxy code is written.
    4) ABAP proxy code executes the same function module and calls a separate outbound interface and passes the values to it. This would be used in sending the response back.
    5) PI receives the response from the second interface, places it in a text/csv file, and puts it back in another folder.
    I would like to know which would be the better approach. Documentation/references to support your claims would be much appreciated.
    Cheers,
    Mz

  • OWB performance with repository browser

    Hi,
    I just want to know... is there some way in the Repository Browser or OWB to see how much time was spent executing each record in a query?
    For example, if 20 records are executed in a single mapping, I want to know how much time the mapping needs and the timing of each record.
    Anyone has suggestion?
    Thanks in advance,
    Davis

    I am also not quite sure how you could do this in a way that guarantees accurate measurement.
    One possible approach is to append a timestamp column to the source table, and populate the field with systimestamp (for the sub-second granularity) in the mapping. Then, after the load, you could sort all your records by this column, find a chunk together and compare load times.
    But even this would be like swatting a mosquito with a bat, and may not even be fully accurate itself under certain loading scenarios (not to mention the fact that this would actually make the mapping run slower than before, since you've added a whole new column populated by a function call!)
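    For what it's worth, a rough sketch of that idea (all names invented):

        -- add a load timestamp to the target table
        ALTER TABLE tgt_orders ADD (load_ts TIMESTAMP);

        -- in the mapping, feed load_ts from an expression operator returning SYSTIMESTAMP;
        -- afterwards, the spread of the values approximates per-row load timing:
        SELECT MIN(load_ts)                AS first_row,
               MAX(load_ts)                AS last_row,
               MAX(load_ts) - MIN(load_ts) AS load_duration
        FROM   tgt_orders;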
    -J
