Oracle RAC Wait events
Sun OS 10
Oracle 10.2.0.5
We are running a 2-node RAC and frequently see the following waits in the top 5 wait events:
cr request retry
gcs log flush sync
I couldn't locate these events in the Database Reference:
http://download.oracle.com/docs/cd/B19306_01/server.102/b14237/waitevents.htm
Thanks
Saravanan
gcs log flush sync is similar to log file sync in standalone:
from - http://orainternals.files.wordpress.com/2010/02/riyaj_advanced_rac_troubleshooting_rmoug_2010_ppt.pdf (you might have more luck opening this one)
Gcs log flush sync
- But, if the instances crash right after the block is transferred to other node, how does RAC maintain consistency?
- Actually, before sending a current mode block LMS process will request LGWR for a log flush.
- Until LGWR sends a signal back to LMS process, LMS process will wait on ‘gcs log flush’ event.
- CR block transfer might need log flush if the block was considered “busy”.
- One of the busy condition is that if the block was constructed by applying undo records.
In some cases, cr request retry means that a message was lost and re-requested. This is tied to the interconnect: UDP issues (such as truncated UDP packets or packets sent out of order), the session being lost on the other node, or the node restarting quickly. It could also mean that your NIC is flaky or that something is happening on the switch. If this is a big concern, you'll need to have someone look at the traffic flow on the interconnect, as this is specific to Cache Fusion.
Similar Messages
-
Hi ,
Could you please tell me the wait events in RAC
Please let me know the causes and the solution to fix this issue.
Thanks
Rangarajbk
Please read the documentation for the same:
http://docs.oracle.com/cd/E11882_01/rac.112/e16795/monitor.htm#CFAGAAGD
Aman.... -
Can anyone suggest the best Oracle book beyond the documentation?
Sure, we can understand all the concepts through the documentation; I have gone through Clusterware Administration and RAC Administration twice.
Can anyone tell me the best book for troubleshooting basic Oracle RAC issues and for tuning Oracle RAC (wait events, etc.)?
Julian Dyke's RAC books are considered to be among the best RAC books available:
Pro Oracle Database 10g Rac on Linux: Installation, Administration And Performance by Julian Dyke, S Shaw
Pro Oracle Database 11g Rac on Linux by Steve Shaw,Julian Dyke,Martin Bach
However, these books do not specialize in RAC troubleshooting. For RAC 10g, very good documentation of RAC wait events can be found in Oracle Wait Interface: A Practical Guide to Performance Diagnostics & Tuning by Richmond Shee, Kirtikumar Deshpande, and K. Gopalakrishnan.
Edited by: P. Forstmann on 14 Jan. 2011 08:57 -
Oracle RAC 9i LMD library cache lock top wait event
We are experiencing library cache lock as our top wait event. Even though the box is currently idle, the Global Enqueue Service Daemon (LMD) is taking up CPU cycles. The background process is also logging to trace: "skgxpdocon: warning outstanding accept handle count has reached new high water mark 245000".
Any help would be appreciated.
Thanks
There is a new patch for this: check out p4673610 on MetaLink. We have also experienced the problem in 9.2.0.8.
-
Hi: I'm analyzing this STATSPACK report. It is from a "volume test" on our UAT server, so most input comes from bind variables. Our shared pool is well utilized. The Oracle redo logs are not appropriately configured on this server, as 2 of the 'Top 5 wait events' are redo-related.
I need to know what other information can be dug out of the 'foreground wait events' and 'background wait events' sections and what, in combination with the 'Top 5 wait events', can help us better understand how the server/test went. The number of wait events could be overwhelming, so I'd appreciate any helpful diagnostics or analysis. The database is Oracle 11.2.0.4, upgraded from 11.2.0.3, on a 64-bit IBM AIX Power system, level 6.x.
STATSPACK report for
Database DB Id Instance Inst Num Startup Time Release RAC
~~~~~~~~ ----------- ------------ -------- --------------- ----------- ---
700000XXX XXX 1 22-Apr-15 12:12 11.2.0.4.0 NO
Host Name Platform CPUs Cores Sockets Memory (G)
~~~~ ---------------- ---------------------- ----- ----- ------- ------------
dXXXX_XXX AIX-Based Systems (64- 2 1 0 16.0
Snapshot Snap Id Snap Time Sessions Curs/Sess Comment
~~~~~~~~ ---------- ------------------ -------- --------- ------------------
Begin Snap: 5635 22-Apr-15 13:00:02 114 4.6
End Snap: 5636 22-Apr-15 14:00:01 128 8.8
Elapsed: 59.98 (mins) Av Act Sess: 0.6
DB time: 35.98 (mins) DB CPU: 19.43 (mins)
Cache Sizes Begin End
~~~~~~~~~~~ ---------- ----------
Buffer Cache: 2,064M Std Block Size: 8K
Shared Pool: 3,072M Log Buffer: 13,632K
Load Profile Per Second Per Transaction Per Exec Per Call
~~~~~~~~~~~~ ------------------ ----------------- ----------- -----------
DB time(s): 0.6 0.0 0.00 0.00
DB CPU(s): 0.3 0.0 0.00 0.00
Redo size: 458,720.6 8,755.7
Logical reads: 12,874.2 245.7
Block changes: 1,356.4 25.9
Physical reads: 6.6 0.1
Physical writes: 61.8 1.2
User calls: 2,033.7 38.8
Parses: 286.5 5.5
Hard parses: 0.5 0.0
W/A MB processed: 1.7 0.0
Logons: 1.2 0.0
Executes: 801.1 15.3
Rollbacks: 6.1 0.1
Transactions: 52.4
Instance Efficiency Indicators
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Buffer Nowait %: 100.00 Redo NoWait %: 100.00
Buffer Hit %: 99.98 Optimal W/A Exec %: 100.00
Library Hit %: 99.77 Soft Parse %: 99.82
Execute to Parse %: 64.24 Latch Hit %: 99.98
Parse CPU to Parse Elapsd %: 53.15 % Non-Parse CPU: 98.03
Shared Pool Statistics Begin End
Memory Usage %: 10.50 12.79
% SQL with executions>1: 69.98 78.37
% Memory for SQL w/exec>1: 70.22 81.96
Top 5 Timed Events                                         Avg  %Total
~~~~~~~~~~~~~~~~~~                                        wait    Call
Event                                Waits   Time (s)     (ms)    Time
CPU time                                          847             50.2
enq: TX - row lock contention        4,480        434       97    25.8
log file sync                      284,169        185        1    11.0
log file parallel write            299,537        164        1     9.7
log file sequential read               698         16       24     1.0
Host CPU (CPUs: 2 Cores: 1 Sockets: 0)
~~~~~~~~ Load Average
Begin End User System Idle WIO WCPU
1.16 1.84 19.28 14.51 66.21 1.20 82.01
Instance CPU
~~~~~~~~~~~~ % Time (seconds)
Host: Total time (s): 7,193.8
Host: Busy CPU time (s): 2,430.7
% of time Host is Busy: 33.8
Instance: Total CPU time (s): 1,203.1
% of Busy CPU used for Instance: 49.5
Instance: Total Database time (s): 2,426.4
%DB time waiting for CPU (Resource Mgr): 0.0
Memory Statistics Begin End
~~~~~~~~~~~~~~~~~ ------------ ------------
Host Mem (MB): 16,384.0 16,384.0
SGA use (MB): 7,136.0 7,136.0
PGA use (MB): 282.5 361.4
% Host Mem used for SGA+PGA: 45.3 45.8
Foreground Wait Events DB/Inst: XXXXXs Snaps: 5635-5636
-> Only events with Total Wait Time (s) >= .001 are shown
-> ordered by Total Wait Time desc, Waits desc (idle events last)
Avg %Total
%Tim Total Wait wait Waits Call
Event Waits out Time (s) (ms) /txn Time
enq: TX - row lock contentio 4,480 0 434 97 0.0 25.8
log file sync 284,167 0 185 1 1.5 11.0
Disk file operations I/O 8,741 0 4 0 0.0 .2
direct path write 13,247 0 3 0 0.1 .2
db file sequential read 6,058 0 1 0 0.0 .1
buffer busy waits 1,800 0 1 1 0.0 .1
SQL*Net more data to client 29,161 0 1 0 0.2 .1
direct path read 7,696 0 1 0 0.0 .0
db file scattered read 316 0 1 2 0.0 .0
latch: shared pool 144 0 0 2 0.0 .0
CSS initialization 30 0 0 3 0.0 .0
cursor: pin S 10 0 0 9 0.0 .0
row cache lock 41 0 0 2 0.0 .0
latch: row cache objects 19 0 0 3 0.0 .0
log file switch (private str 8 0 0 7 0.0 .0
library cache: mutex X 28 0 0 2 0.0 .0
latch: cache buffers chains 54 0 0 1 0.0 .0
latch free 290 0 0 0 0.0 .0
control file sequential read 1,568 0 0 0 0.0 .0
log file switch (checkpoint 4 0 0 6 0.0 .0
direct path sync 8 0 0 3 0.0 .0
latch: redo allocation 60 0 0 0 0.0 .0
SQL*Net break/reset to clien 34 0 0 1 0.0 .0
latch: enqueue hash chains 45 0 0 0 0.0 .0
latch: cache buffers lru cha 7 0 0 2 0.0 .0
latch: session allocation 5 0 0 1 0.0 .0
latch: object queue header o 6 0 0 1 0.0 .0
ASM file metadata operation 30 0 0 0 0.0 .0
latch: In memory undo latch 15 0 0 0 0.0 .0
latch: undo global data 8 0 0 0 0.0 .0
SQL*Net message from client 6,362,536 0 278,225 44 33.7
jobq slave wait 7,270 100 3,635 500 0.0
SQL*Net more data from clien 7,976 0 15 2 0.0
SQL*Net message to client 6,362,544 0 8 0 33.7
Background Wait Events DB/Inst: XXXXXs Snaps: 5635-5636
-> Only events with Total Wait Time (s) >= .001 are shown
-> ordered by Total Wait Time desc, Waits desc (idle events last)
Avg %Total
%Tim Total Wait wait Waits Call
Event Waits out Time (s) (ms) /txn Time
log file parallel write 299,537 0 164 1 1.6 9.7
log file sequential read 698 0 16 24 0.0 1.0
db file parallel write 9,556 0 13 1 0.1 .8
os thread startup 146 0 10 70 0.0 .6
control file parallel write 2,037 0 2 1 0.0 .1
Log archive I/O 35 0 1 30 0.0 .1
LGWR wait for redo copy 2,447 0 0 0 0.0 .0
db file async I/O submit 9,556 0 0 0 0.1 .0
db file sequential read 145 0 0 2 0.0 .0
Disk file operations I/O 349 0 0 0 0.0 .0
db file scattered read 30 0 0 4 0.0 .0
control file sequential read 5,837 0 0 0 0.0 .0
ADR block file read 19 0 0 4 0.0 .0
ADR block file write 5 0 0 15 0.0 .0
direct path write 14 0 0 2 0.0 .0
direct path read 3 0 0 7 0.0 .0
latch: shared pool 3 0 0 6 0.0 .0
log file single write 56 0 0 0 0.0 .0
latch: redo allocation 53 0 0 0 0.0 .0
latch: active service list 1 0 0 3 0.0 .0
latch free 11 0 0 0 0.0 .0
rdbms ipc message 314,523 5 57,189 182 1.7
Space Manager: slave idle wa 4,086 88 18,996 4649 0.0
DIAG idle wait 7,185 100 7,186 1000 0.0
Streams AQ: waiting for time 2 50 4,909 ###### 0.0
Streams AQ: qmn slave idle w 129 0 3,612 28002 0.0
Streams AQ: qmn coordinator 258 50 3,612 14001 0.0
smon timer 43 2 3,605 83839 0.0
pmon timer 1,199 99 3,596 2999 0.0
SQL*Net message from client 17,019 0 31 2 0.1
SQL*Net message to client 12,762 0 0 0 0.1
class slave wait 28 0 0 0 0.0
Thank you very much!
Hi: I just learned that a large number of concurrent transactions is designed into this "Volume Test" to simulate a large incoming transaction volume, so I guess the wait on enq: TX - row lock contention is expected.
The facts: (1) the redo logs on the UAT server are known not to be well tuned; (2) the volume test slowed by 5%, although the team keeps the amount of test data the same by importing production data each time. So why did it slow by 5% this year?
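As a rough cross-check of the Top 5 Timed Events figures in the report above, here is a minimal Python sketch. It only recomputes the average wait as total wait time divided by the number of waits. The numbers are copied from the report; the names `TOP_EVENTS` and `avg_wait_ms` are made up for this illustration, and because the Time (s) column is rounded to whole seconds the results will differ slightly from the report's Avg wait (ms) column.

```python
# Figures copied from the Top 5 Timed Events table:
# event name -> (number of waits, total wait time in seconds).
TOP_EVENTS = {
    "enq: TX - row lock contention": (4_480, 434),
    "log file sync": (284_169, 185),
    "log file parallel write": (299_537, 164),
    "log file sequential read": (698, 16),
}


def avg_wait_ms(waits: int, time_s: float) -> float:
    """Average wait in milliseconds = total wait time / number of waits."""
    return time_s * 1000.0 / waits


if __name__ == "__main__":
    for event, (waits, time_s) in TOP_EVENTS.items():
        print(f"{event:<32} {avg_wait_ms(waits, time_s):8.2f} ms")
```

On these figures the row-lock waits average about 97 ms each, while log file sync and log file parallel write both average under 1 ms per wait.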
The wait histogram is pasted below. Is anyone interested in taking a look? Any ideas?
Wait Event Histogram DB/Inst: XXXX/XXXX Snaps: 5635-5636
-> Total Waits - units: K is 1000, M is 1000000, G is 1000000000
-> % of Waits - column heading: <=1s is truly <1024ms, >1s is truly >=1024ms
-> % of Waits - value: .0 indicates value was <.05%, null is truly 0
-> Ordered by Event (idle events last)
Total ----------------- % of Waits ------------------
Event Waits <1ms <2ms <4ms <8ms <16ms <32ms <=1s >1s
ADR block file read 19 26.3 5.3 10.5 57.9
ADR block file write 5 40.0 60.0
ADR file lock 6 100.0
ARCH wait for archivelog l 14 100.0
ASM file metadata operatio 30 100.0
CSS initialization 30 100.0
Disk file operations I/O 9090 97.2 1.4 .6 .4 .2 .1 .1
LGWR wait for redo copy 2447 98.5 .5 .4 .2 .2 .2 .1
Log archive I/O 35 40.0 8.6 25.7 2.9 22.9
SQL*Net break/reset to cli 34 85.3 8.8 5.9
SQL*Net more data to clien 29K 99.9 .0 .0 .0 .0 .0
buffer busy waits 1800 96.8 .7 .7 .6 .3 .4 .5
control file parallel writ 2037 90.7 5.0 2.1 .8 1.0 .3 .1
control file sequential re 7405 100.0 .0
cursor: pin S 10 10.0 90.0
db file async I/O submit 9556 99.9 .0 .0 .0
db file parallel read 1 100.0
db file parallel write 9556 62.0 32.4 1.7 .8 1.5 1.3 .1
db file scattered read 345 72.8 3.8 2.3 11.6 9.0 .6
db file sequential read 6199 97.2 .2 .3 1.6 .7 .0 .0
direct path read 7699 99.1 .4 .2 .1 .1 .0
direct path sync 8 25.0 37.5 12.5 25.0
direct path write 13K 97.8 .9 .5 .4 .3 .1 .0
enq: TX - row lock content 4480 .4 .7 1.3 3.0 6.8 12.3 75.4 .1
latch free 301 98.3 .3 .7 .7
latch: In memory undo latc 15 93.3 6.7
latch: active service list 1 100.0
latch: cache buffers chain 55 94.5 3.6 1.8
latch: cache buffers lru c 9 88.9 11.1
latch: call allocation 6 100.0
latch: checkpoint queue la 3 100.0
latch: enqueue hash chains 45 97.8 2.2
latch: messages 4 100.0
latch: object queue header 7 85.7 14.3
latch: redo allocation 113 97.3 1.8 .9
latch: row cache objects 19 89.5 5.3 5.3
latch: session allocation 5 80.0 20.0
latch: shared pool 147 90.5 1.4 2.7 1.4 .7 1.4 2.0
latch: undo global data 8 100.0
library cache: mutex X 28 89.3 3.6 3.6 3.6
log file parallel write 299K 95.6 2.6 1.0 .4 .3 .2 .0
log file sequential read 698 29.5 .1 4.6 46.8 18.9
log file single write 56 100.0
log file switch (checkpoin 4 25.0 50.0 25.0
log file switch (private s 8 12.5 37.5 50.0
log file sync 284K 93.3 3.7 1.4 .7 .5 .3 .1
os thread startup 146 100.0
row cache lock 41 85.4 9.8 2.4 2.4
DIAG idle wait 7184 100.0
SQL*Net message from clien 6379K 86.6 5.1 2.9 1.3 .7 .3 2.8 .3
SQL*Net message to client 6375K 100.0 .0 .0 .0 .0 .0 .0
Wait Event Histogram DB/Inst: XXXX/xxxx Snaps: 5635-5636
-> Total Waits - units: K is 1000, M is 1000000, G is 1000000000
-> % of Waits - column heading: <=1s is truly <1024ms, >1s is truly >=1024ms
-> % of Waits - value: .0 indicates value was <.05%, null is truly 0
-> Ordered by Event (idle events last)
Total ----------------- % of Waits ------------------
Event Waits <1ms <2ms <4ms <8ms <16ms <32ms <=1s >1s
SQL*Net more data from cli 7976 99.7 .1 .1 .0 .1
Space Manager: slave idle 4086 .1 .2 .0 .0 .3 3.2 96.1
Streams AQ: qmn coordinato 258 49.2 .8 50.0
Streams AQ: qmn slave idle 129 100.0
Streams AQ: waiting for ti 2 50.0 50.0
class slave wait 28 92.9 3.6 3.6
jobq slave wait 7270 .0 100.0
pmon timer 1199 100.0
rdbms ipc message 314K 10.3 7.3 39.7 15.4 10.6 5.3 8.2 3.3
smon timer 43 100.0 -
RAC specific Wait events in AWR
DB version: 10.2.0.4
OS : Solaris x86
What are the most frequent RAC-specific wait events that appear in AWR reports?
Hi Pete,
This depends on your environment. You can identify them as follows:
Monitoring Oracle RAC Statistics and Wait Events
http://download.oracle.com/docs/cd/E11882_01/rac.112/e16795/monitor.htm#i1010220
ASH Report for Oracle RAC: Top Cluster Events
The ASH report Top Cluster Events section is part of the Top Events report that is specific to Oracle RAC. The Top Cluster Events report lists events that account for the highest percentage of session activity in the cluster wait class event along with the instance number of the affected instances. You can use this information to identify which events and instances caused a high percentage of cluster wait events.
http://download.oracle.com/docs/cd/E11882_01/rac.112/e16795/monitor.htm
Regards,
Levi Pereira
-
Hi,
Our application is running on Oracle RAC. At certain times of the day, the application responds very slowly. At these times, the row cache waits are observed to be very high. We have even tried altering the sys.AUDSES$ sequence, changing its cache size from the default of 20 to 10000, but this did not help.
Can anyone suggest a solution for this problem? And why does this problem occur?
Hi,
it looks like your problem is related to the fact that you do not cache sequences (this is a well-known RAC tuning topic).
Oracle introduced sequences (definitely the wrong name) to generate unique numbers, not to actually support a time sequence of events, to preserve an order, or to produce ascending sequences of numbers with no gaps.
Ordering a sequence of events is a serialization process that should not be implemented with a sequence.
Now, if you do not cache sequences, in RAC the lock (enqueue) on the sequence (which is required when you ask for the next set of values) is a global resource on which inter-instance contention occurs.
Furthermore, if the application has a high volume of inserts, having nocache sequences leads to inter-instance index block contention.
Oracle says that the default cache value of 20 for sequences is inappropriate in most RAC implementations, and caches of 1000 values or more are common. You need to test to find your ideal value.
Now it is up to you to decide between:
- keep things as they are and have a non scalable RAC installation
- find a way to cache sequences without harming the application assumptions.
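To see why the cache size matters, here is a deliberately simplified Python model. It is an illustration only, not Oracle's implementation: it assumes that every cache refill costs one globally coordinated request for the sequence, treats a cache size of 1 as a stand-in for NOCACHE, and ignores the fact that each RAC instance keeps its own cache. The function name `cache_refills` and the workload of one million NEXTVAL calls are made up for the example.

```python
def cache_refills(nextval_calls: int, cache_size: int) -> int:
    """Count how many cache refills are needed to serve the given calls.

    Simplified model: each refill hands out `cache_size` values, and a
    cache size of 1 (or less) stands for NOCACHE, i.e. one refill per call.
    """
    size = max(cache_size, 1)
    # Integer ceiling division: a partially used cache still costs a refill.
    return -(-nextval_calls // size)


if __name__ == "__main__":
    calls = 1_000_000  # hypothetical workload
    for size in (1, 20, 1000, 10000):
        print(f"cache size {size:>6}: {cache_refills(calls, size):>9,} refills")
```

Under these assumptions, one million calls need 1,000,000 refills with no cache, 50,000 with the default cache of 20, and 1,000 with a cache of 1000, which is the reduction in global coordination that the advice above is aiming for.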
Hope it helps,
Regards,
Corrado -
rac waits after restarting database.
buffer busy global CR 6,083,927 1,386,593 48.20
global cache cr request 40,081,544 960,533 33.39
buffer busy waits 5,081,718 428,610 14.90
PX Deq: Signal ACK 17,248 42,099 1.46
db file sequential read 3,176,639 20,033 .70
Please state where the problem is.
Hi,
GPFS is not free, and needs to be purchased.
ONLY GPFS 2.3 is certified with RAC as of today.
Certification for 3.1 is under way.
About 9iRAC and HACMP on AIX 5.3, ONLY HACMP 5.1,5.2 and 5.3 are supported, not yet HACMP 5.4. For info, 10gRAC is only supported with HACMP 5.1 and 5.2.
To find out how to setup HACMP for a 9iRAC installation, you can use the following document, available on http://www.oracleracsig.org :
CookBook V3.2 - Oracle 9iRAC on IBM@server pSeries running AIX5L
(Step-by-step installation guide for 9iRAC on AIX5L, with HACMP for the database on concurrent raw devices and IBM GPFS for the database on a shared cluster filesystem, including HACMP and GPFS setup.)
The document covers HACMP 5.1 or 5.2, not the 5.3 release, but it should be nearly the same apart from some screen changes.
One point that currently differs from the cookbook is the way to create the LV (logical volume): you should use the "-T O" option for it.
Hope it will help.
Regards.
Fred. -
Current wait events in oracle database
Hi guys, I need your help.
I have a database running very slowly, and I need to find out the current wait events in the Oracle database. Can I find out what each session is waiting for?
Use the @wait.sql script to find out the wait events:
select sid, event, seconds_in_wait secs_wait, state, p1, p2, p3, wait_time, p1text, p2text, p3text
from v$session_wait
where sid in
(select a.sid from v$session a, v$process b where a.paddr = b.addr
and a.status = 'ACTIVE' and a.username is not null)
order by 1
/
Edited by: Girish on Jun 9, 2011 4:06 AM -
Streams AQ wait event on Oracle 10g
Hello,
I have ECC 6.0 on W2k3 with Oracle. I have some wait event about Streams AQ :
Streams AQ: waiting for messages in the queue
Streams AQ: qmn slave idle wait
Streams AQ: qmn coordinator waiting for slave to start
What does it mean ? What can I do to fix that?
From what I read, it's seems to have something to do with parameter : aq_tm_processes
What should this parameter be set to? It seems to be set to 0 now.
Thank you for any help,
Nicholas
Hi,
What is the patch level of the Oracle 10g in use?
Please refer to Oracle MetaLink note 428441.1 for more information. It explains the reason and the possible alternatives for dealing with it. You can refer to SAP Note 758563 to get Oracle MetaLink access.
Unless you use Oracle Streams Advanced Queuing , there's no need to set this parameter.
If AQ_TM_PROCESSES is not specified or is set to 0, then the queue monitor is not created.
In 10gR2 parameter AQ_TM_PROCESSES shouldn't be set explicitly in pfile/spfile, because Oracle autotunes it.
Regards,
Bhavik G. Shroff -
Hello colleagues,
Could you please let me know where can I find detailed information with regards to Oracle wait events?
I'm interested in methods that allow me to know how to identify performance issues on the system that are related to DB wait events.
Thanks and best regards
Rod
Hi there,
in addition to the already mentioned SAP notes, why don't you just check the Oracle documentation?
Most of the wait events are fully explained in there, along with the approach how to use the wait event data.
If you want to grasp a better understanding of the technique of wait time analysis, I highly recommend that you check the website of Cary Millsap and Method R. Cary provides great introductions into the theory behind the wait events and does a perfect job with explaining it.
regards,
Lars -
Wait event PX Deq: reap credit in Oracle 9.2.0.8
Hi,
Can you please explain what the "PX Deq: reap credit" wait event means? My session is waiting on this event. Can you please suggest how to reduce this wait?
Thanks
Hi
oratst@ebsdevdb on /ebdbh/11g/data/cfgtoollogs/dbua/ebstest/upgrade1 # more Upgrade_Directive.log
Connected.
ODMA_DIRECTIVE:VERSION:9.2.0.8
ODMA_DIRECTIVE:MIGRATE_SID:
ODMA_DIRECTIVE:ORA:IGNORE:06512:
ODMA_DIRECTIVE:ORA:FATAL:00600:
ODMA_DIRECTIVE:ORA:FATAL:01012:
ODMA_DIRECTIVE:ORA:FATAL:01031:
ODMA_DIRECTIVE:ORA:FATAL:01034:
ODMA_DIRECTIVE:ORA:FATAL:01078:
ODMA_DIRECTIVE:ORA:FATAL:01092:
ODMA_DIRECTIVE:ORA:FATAL:01109:
ODMA_DIRECTIVE:ORA:FATAL:01119:
ODMA_DIRECTIVE:ORA:FATAL:01507:
ODMA_DIRECTIVE:ORA:FATAL:01722:
ODMA_DIRECTIVE:ORA:FATAL:03113:
ODMA_DIRECTIVE:ORA:FATAL:03114:
ODMA_DIRECTIVE:ORA:FATAL:07445:
ODMA_DIRECTIVE:ORA:FATAL:12560:
ODMA_DIRECTIVE:ORA:RECOVER_TBS:01650:
ODMA_DIRECTIVE:ORA:RECOVER_TBS:01651:
ODMA_DIRECTIVE:ORA:RECOVER_TBS:01652:
ODMA_DIRECTIVE:ORA:RECOVER_TBS:01653:
ODMA_DIRECTIVE:ORA:RECOVER_TBS:01654:
ODMA_DIRECTIVE:ORA:RECOVER_TBS:01655:
ODMA_DIRECTIVE:ORA:RECOVER_ROLL:01562:
ODMA_DIRECTIVE:ORA:RECOVER_INIT:04031:
ODMA_DIRECTIVE:SCRIPT:UPGRADE:rdbms/admin/catupgrd.sql:
ODMA_DIRECTIVE:BOUNCE_DATABASE:UPGRADE:
ODMA_DIRECTIVE:SCRIPT:UPGRADE:rdbms/admin/catuppst.sql:
ODMA_DIRECTIVE:SCRIPT:UPGRADE:sqlplus/admin/help/hlpbld.sql helpus.sql:
Thanks
With Regards
A-Z -
Active session Spike on Oracle RAC 11G R2 on HP UX
Dear Experts,
We need urgent help, please, as we are facing very poor performance in our production database.
We have Oracle 11g RAC in an HP-UX environment. The ADDM report follows. Kindly check it and please help me figure out the issue and resolve it as soon as possible.
---------Instance 1---------------
ADDM Report for Task 'TASK_36650'
Analysis Period
AWR snapshot range from 11634 to 11636.
Time period starts at 21-JUL-13 07.00.03 PM
Time period ends at 21-JUL-13 09.00.49 PM
Analysis Target
Database 'MCMSDRAC' with DB ID 2894940361.
Database version 11.2.0.1.0.
ADDM performed an analysis of instance mcmsdrac1, numbered 1 and hosted at
mcmsdbl1.
Activity During the Analysis Period
Total database time was 38466 seconds.
The average number of active sessions was 5.31.
Summary of Findings
Description Active Sessions Recommendations
Percent of Activity
1 CPU Usage 1.44 | 27.08 1
2 Interconnect Latency .07 | 1.33 1
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Findings and Recommendations
Finding 1: CPU Usage
Impact is 1.44 active sessions, 27.08% of total activity.
Host CPU was a bottleneck and the instance was consuming 99% of the host CPU.
All wait times will be inflated by wait for CPU.
Host CPU consumption was 99%.
Recommendation 1: Host Configuration
Estimated benefit is 1.44 active sessions, 27.08% of total activity.
Action
Consider adding more CPUs to the host or adding instances serving the
database on other hosts.
Action
Session CPU consumption was throttled by the Oracle Resource Manager.
Consider revising the resource plan that was active during the analysis
period.
Finding 2: Interconnect Latency
Impact is .07 active sessions, 1.33% of total activity.
Higher than expected latency of the cluster interconnect was responsible for
significant database time on this instance.
The instance was consuming 110 kilo bits per second of interconnect bandwidth.
20% of this interconnect bandwidth was used for global cache messaging, 21%
for parallel query messaging and 7% for database lock management.
The average latency for 8K interconnect messages was 42153 microseconds.
The instance is using the private interconnect device "lan2" with IP address
172.16.200.71 and source "Oracle Cluster Repository".
The device "lan2" was used for 100% of interconnect traffic and experienced 0
send or receive errors during the analysis period.
Recommendation 1: Host Configuration
Estimated benefit is .07 active sessions, 1.33% of total activity.
Action
Investigate cause of high network interconnect latency between database
instances. Oracle's recommended solution is to use a high speed
dedicated network.
Action
Check the configuration of the cluster interconnect. Check OS setup like
adapter setting, firmware and driver release. Check that the OS's socket
receive buffers are large enough to store an entire multiblock read. The
value of parameter "db_file_multiblock_read_count" may be decreased as a
workaround.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Additional Information
Miscellaneous Information
Wait class "Application" was not consuming significant database time.
Wait class "Cluster" was not consuming significant database time.
Wait class "Commit" was not consuming significant database time.
Wait class "Concurrency" was not consuming significant database time.
Wait class "Configuration" was not consuming significant database time.
Wait class "Network" was not consuming significant database time.
Wait class "User I/O" was not consuming significant database time.
Session connect and disconnect calls were not consuming significant database
time.
Hard parsing of SQL statements was not consuming significant database time.
The database's maintenance windows were active during 100% of the analysis
period.
----------------Instance 2 --------------------
ADDM Report for Task 'TASK_36652'
Analysis Period
AWR snapshot range from 11634 to 11636.
Time period starts at 21-JUL-13 07.00.03 PM
Time period ends at 21-JUL-13 09.00.49 PM
Analysis Target
Database 'MCMSDRAC' with DB ID 2894940361.
Database version 11.2.0.1.0.
ADDM performed an analysis of instance mcmsdrac2, numbered 2 and hosted at
mcmsdbl2.
Activity During the Analysis Period
Total database time was 2898 seconds.
The average number of active sessions was .4.
Summary of Findings
Description Active Sessions Recommendations
Percent of Activity
1 Top SQL Statements .11 | 27.65 5
2 Interconnect Latency .1 | 24.15 1
3 Shared Pool Latches .09 | 22.42 1
4 PL/SQL Execution .06 | 14.39 2
5 Unusual "Other" Wait Event .03 | 8.73 4
6 Unusual "Other" Wait Event .03 | 6.42 3
7 Unusual "Other" Wait Event .03 | 6.29 6
8 Hard Parse .02 | 5.5 0
9 Soft Parse .02 | 3.86 2
10 Unusual "Other" Wait Event .01 | 3.75 4
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Findings and Recommendations
Finding 1: Top SQL Statements
Impact is .11 active sessions, 27.65% of total activity.
SQL statements consuming significant database time were found. These
statements offer a good opportunity for performance improvement.
Recommendation 1: SQL Tuning
Estimated benefit is .05 active sessions, 12.88% of total activity.
Action
Investigate the PL/SQL statement with SQL_ID "d1s02myktu19h" for
possible performance improvements. You can supplement the information
given here with an ASH report for this SQL_ID.
Related Object
SQL statement with SQL_ID d1s02myktu19h.
begin dbms_utility.validate(:1,:2,:3,:4); end;
Rationale
The SQL Tuning Advisor cannot operate on PL/SQL statements.
Rationale
Database time for this SQL was divided as follows: 13% for SQL
execution, 2% for parsing, 85% for PL/SQL execution and 0% for Java
execution.
Rationale
SQL statement with SQL_ID "d1s02myktu19h" was executed 48 times and had
an average elapsed time of 7 seconds.
Rationale
Waiting for event "library cache pin" in wait class "Concurrency"
accounted for 70% of the database time spent in processing the SQL
statement with SQL_ID "d1s02myktu19h".
Rationale
Top level calls to execute the PL/SQL statement with SQL_ID
"63wt8yna5umd6" are responsible for 100% of the database time spent on
the PL/SQL statement with SQL_ID "d1s02myktu19h".
Related Object
SQL statement with SQL_ID 63wt8yna5umd6.
begin DBMS_UTILITY.COMPILE_SCHEMA( 'TPAUSER', FALSE ); end;
Recommendation 2: SQL Tuning
Estimated benefit is .02 active sessions, 4.55% of total activity.
Action
Run SQL Tuning Advisor on the SELECT statement with SQL_ID
"fk3bh3t41101x".
Related Object
SQL statement with SQL_ID fk3bh3t41101x.
SELECT MEM.MEMBER_CODE ,MEM.E_NAME,Pol.Policy_no
,pol.date_from,pol.date_to,POL.E_NAME,MEM.SEX,(SYSDATE-MEM.BIRTH_DATE
) AGE,POL.SCHEME_NO FROM TPAUSER.MEMBERS MEM,TPAUSER.POLICY POL WHERE
POL.QUOTATION_NO=MEM.QUOTATION_NO AND POL.BRANCH_CODE=MEM.BRANCH_CODE
and endt_no=(select max(endt_no) from tpauser.members mm where
mm.member_code=mem.member_code AND mm.QUOTATION_NO=MEM.QUOTATION_NO)
and member_code like '%' || nvl(:1,null) ||'%' ORDER BY MEMBER_CODE
Rationale
The SQL spent 92% of its database time on CPU, I/O and Cluster waits.
This part of database time may be improved by the SQL Tuning Advisor.
Rationale
Database time for this SQL was divided as follows: 100% for SQL
execution, 0% for parsing, 0% for PL/SQL execution and 0% for Java
execution.
Rationale
SQL statement with SQL_ID "fk3bh3t41101x" was executed 14 times and had
an average elapsed time of 4.9 seconds.
Rationale
At least one execution of the statement ran in parallel.
Recommendation 3: SQL Tuning
Estimated benefit is .02 active sessions, 3.79% of total activity.
Action
Run SQL Tuning Advisor on the SELECT statement with SQL_ID
"7mhjbjg9ntqf5".
Related Object
SQL statement with SQL_ID 7mhjbjg9ntqf5.
SELECT SUM(CNT) FROM (SELECT COUNT(PROC_CODE) CNT FROM
TPAUSER.TORBINY_PROCEDURE WHERE BRANCH_CODE = :B6 AND QUOTATION_NO =
:B5 AND CLASS_NO = :B4 AND OPTION_NO = :B3 AND PR_EFFECTIVE_DATE<=
:B2 AND PROC_CODE = :B1 UNION SELECT COUNT(MED_CODE) CNT FROM
TPAUSER.TORBINY_MEDICINE WHERE BRANCH_CODE = :B6 AND QUOTATION_NO =
:B5 AND CLASS_NO = :B4 AND OPTION_NO = :B3 AND M_EFFECTIVE_DATE<= :B2
AND MED_CODE = :B1 UNION SELECT COUNT(LAB_CODE) CNT FROM
TPAUSER.TORBINY_LAB WHERE BRANCH_CODE = :B6 AND QUOTATION_NO = :B5
AND CLASS_NO = :B4 AND OPTION_NO = :B3 AND L_EFFECTIVE_DATE<= :B2 AND
LAB_CODE = :B1 )
Rationale
The SQL spent 100% of its database time on CPU, I/O and Cluster waits.
This part of database time may be improved by the SQL Tuning Advisor.
Rationale
Database time for this SQL was divided as follows: 0% for SQL execution,
0% for parsing, 100% for PL/SQL execution and 0% for Java execution.
Rationale
SQL statement with SQL_ID "7mhjbjg9ntqf5" was executed 31 times and had
an average elapsed time of 3.4 seconds.
Rationale
Top level calls to execute the SELECT statement with SQL_ID
"a11nzdnd91gsg" are responsible for 100% of the database time spent on
the SELECT statement with SQL_ID "7mhjbjg9ntqf5".
Related Object
SQL statement with SQL_ID a11nzdnd91gsg.
SELECT POLICY_NO,SCHEME_NO FROM TPAUSER.POLICY WHERE QUOTATION_NO
=:B1
Recommendation 4: SQL Tuning
Estimated benefit is .01 active sessions, 3.03% of total activity.
Action
Investigate the SELECT statement with SQL_ID "4uqs4jt7aca5s" for
possible performance improvements. You can supplement the information
given here with an ASH report for this SQL_ID.
Related Object
SQL statement with SQL_ID 4uqs4jt7aca5s.
SELECT DISTINCT USER_ID FROM GV$SESSION, USERS WHERE UPPER (USERNAME)
= UPPER (USER_ID) AND USERS.APPROVAL_CLAIM='VC' AND USER_ID=:B1
Rationale
The SQL spent only 0% of its database time on CPU, I/O and Cluster
waits. Therefore, the SQL Tuning Advisor is not applicable in this case.
Look at performance data for the SQL to find potential improvements.
Rationale
Database time for this SQL was divided as follows: 100% for SQL
execution, 0% for parsing, 0% for PL/SQL execution and 0% for Java
execution.
Rationale
SQL statement with SQL_ID "4uqs4jt7aca5s" was executed 261 times and had
an average elapsed time of 0.35 seconds.
Rationale
At least one execution of the statement ran in parallel.
Rationale
Top level calls to execute the PL/SQL statement with SQL_ID
"91vt043t78460" are responsible for 100% of the database time spent on
the SELECT statement with SQL_ID "4uqs4jt7aca5s".
Related Object
SQL statement with SQL_ID 91vt043t78460.
begin TPAUSER.RECEIVE_NEW_FAX_APRROVAL(:V00001,:V00002,:V00003,:V0000
4); end;
Recommendation 5: SQL Tuning
Estimated benefit is .01 active sessions, 3.03% of total activity.
Action
Run SQL Tuning Advisor on the SELECT statement with SQL_ID
"7kt28fkc0yn5f".
Related Object
SQL statement with SQL_ID 7kt28fkc0yn5f.
SELECT COUNT(*) FROM TPAUSER.APPROVAL_MASTER WHERE APPROVAL_STATUS IS
NULL AND (UPPER(CODED) = UPPER(:B1 ) OR UPPER(PROCESSED_BY) =
UPPER(:B1 ))
Rationale
The SQL spent 100% of its database time on CPU, I/O and Cluster waits.
This part of database time may be improved by the SQL Tuning Advisor.
Rationale
Database time for this SQL was divided as follows: 100% for SQL
execution, 0% for parsing, 0% for PL/SQL execution and 0% for Java
execution.
Rationale
SQL statement with SQL_ID "7kt28fkc0yn5f" was executed 1034 times and
had an average elapsed time of 0.063 seconds.
Rationale
Top level calls to execute the PL/SQL statement with SQL_ID
"91vt043t78460" are responsible for 100% of the database time spent on
the SELECT statement with SQL_ID "7kt28fkc0yn5f".
Related Object
SQL statement with SQL_ID 91vt043t78460.
begin TPAUSER.RECEIVE_NEW_FAX_APRROVAL(:V00001,:V00002,:V00003,:V0000
4); end;
Finding 2: Interconnect Latency
Impact is .1 active sessions, 24.15% of total activity.
Higher than expected latency of the cluster interconnect was responsible for
significant database time on this instance.
The instance was consuming 128 kilo bits per second of interconnect bandwidth.
17% of this interconnect bandwidth was used for global cache messaging, 6% for
parallel query messaging and 8% for database lock management.
The average latency for 8K interconnect messages was 41863 microseconds.
The instance is using the private interconnect device "lan2" with IP address
172.16.200.72 and source "Oracle Cluster Repository".
The device "lan2" was used for 100% of interconnect traffic and experienced 0
send or receive errors during the analysis period.
Recommendation 1: Host Configuration
Estimated benefit is .1 active sessions, 24.15% of total activity.
Action
Investigate cause of high network interconnect latency between database
instances. Oracle's recommended solution is to use a high speed
dedicated network.
Action
Check the configuration of the cluster interconnect. Check OS setup like
adapter setting, firmware and driver release. Check that the OS's socket
receive buffers are large enough to store an entire multiblock read. The
value of parameter "db_file_multiblock_read_count" may be decreased as a
workaround.
Symptoms That Led to the Finding:
Inter-instance messaging was consuming significant database time on this
instance.
Impact is .06 active sessions, 14.23% of total activity.
Wait class "Cluster" was consuming significant database time.
Impact is .06 active sessions, 14.23% of total activity.
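A quick way to sanity-check the latency figure in Finding 2 is to put it in milliseconds and compare it with the average CR block receive time derived from the cumulative statistics "gc cr block receive time" (reported in centiseconds) and "gc cr blocks received". The sketch below is Python rather than SQL so that it runs standalone. The function names and the sample counter values are made up for illustration; only the two statistic names and the 41863 microsecond figure come from Oracle and from the report above.

```python
def us_to_ms(microseconds: float) -> float:
    """Convert microseconds to milliseconds."""
    return microseconds / 1000.0


def avg_cr_block_receive_ms(receive_time_cs: float, blocks_received: int) -> float:
    """Average receive time per CR block, in milliseconds.

    receive_time_cs: value of 'gc cr block receive time' (centiseconds)
    blocks_received: value of 'gc cr blocks received'
    """
    if blocks_received <= 0:
        raise ValueError("blocks_received must be positive")
    # 1 centisecond = 10 milliseconds
    return receive_time_cs * 10.0 / blocks_received


if __name__ == "__main__":
    # Latency quoted by ADDM for 8K interconnect messages
    print(us_to_ms(41863))
    # Hypothetical counter values, for illustration only
    print(avg_cr_block_receive_ms(500, 10000))
```

The ADDM figure works out to roughly 42 ms per 8K message. Taking the two counters as deltas between snapshots rather than as totals since startup gives a more useful average for the problem period.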
Finding 3: Shared Pool Latches
Impact is .09 active sessions, 22.42% of total activity.
Contention for latches related to the shared pool was consuming significant
database time.
Waits for "library cache lock" amounted to 5% of database time.
Waits for "library cache pin" amounted to 17% of database time.
Recommendation 1: Application Analysis
Estimated benefit is .09 active sessions, 22.42% of total activity.
Action
Investigate the cause for latch contention using the given blocking
sessions or modules.
Rationale
The session with ID 17 and serial number 15595 in instance number 1 was
the blocking session responsible for 34% of this recommendation's
benefit.
Symptoms That Led to the Finding:
Wait class "Concurrency" was consuming significant database time.
Impact is .1 active sessions, 24.96% of total activity.
Finding 4: PL/SQL Execution
Impact is .06 active sessions, 14.39% of total activity.
PL/SQL execution consumed significant database time.
Recommendation 1: SQL Tuning
Estimated benefit is .05 active sessions, 12.5% of total activity.
Action
Tune the entry point PL/SQL "SYS.DBMS_UTILITY.COMPILE_SCHEMA" of type
"PACKAGE" and ID 6019. Refer to the PL/SQL documentation for addition
information.
Rationale
318 seconds spent in executing PL/SQL "SYS.DBMS_UTILITY.VALIDATE#2" of
type "PACKAGE" and ID 6019.
Recommendation 2: SQL Tuning
Estimated benefit is .01 active sessions, 1.89% of total activity.
Action
Tune the entry point PL/SQL
"SYSMAN.EMD_MAINTENANCE.EXECUTE_EM_DBMS_JOB_PROCS" of type "PACKAGE" and
ID 68654. Refer to the PL/SQL documentation for addition information.
Finding 5: Unusual "Other" Wait Event
Impact is .03 active sessions, 8.73% of total activity.
Wait event "DFS lock handle" in wait class "Other" was consuming significant
database time.
Recommendation 1: Application Analysis
Estimated benefit is .03 active sessions, 8.73% of total activity.
Action
Investigate the cause for high "DFS lock handle" waits. Refer to
Oracle's "Database Reference" for the description of this wait event.
Recommendation 2: Application Analysis
Estimated benefit is .03 active sessions, 8.27% of total activity.
Action
Investigate the cause for high "DFS lock handle" waits in Service
"mcmsdrac".
Recommendation 3: Application Analysis
Estimated benefit is .02 active sessions, 5.05% of total activity.
Action
Investigate the cause for high "DFS lock handle" waits in Module "TOAD
9.7.2.5".
Recommendation 4: Application Analysis
Estimated benefit is .01 active sessions, 3.21% of total activity.
Action
Investigate the cause for high "DFS lock handle" waits in Module
"toad.exe".
Symptoms That Led to the Finding:
Wait class "Other" was consuming significant database time.
Impact is .15 active sessions, 38.29% of total activity.
Finding 6: Unusual "Other" Wait Event
Impact is .03 active sessions, 6.42% of total activity.
Wait event "reliable message" in wait class "Other" was consuming significant
database time.
Recommendation 1: Application Analysis
Estimated benefit is .03 active sessions, 6.42% of total activity.
Action
Investigate the cause for high "reliable message" waits. Refer to
Oracle's "Database Reference" for the description of this wait event.
Recommendation 2: Application Analysis
Estimated benefit is .03 active sessions, 6.42% of total activity.
Action
Investigate the cause for high "reliable message" waits in Service
"mcmsdrac".
Recommendation 3: Application Analysis
Estimated benefit is .02 active sessions, 4.13% of total activity.
Action
Investigate the cause for high "reliable message" waits in Module "TOAD
9.7.2.5".
Symptoms That Led to the Finding:
Wait class "Other" was consuming significant database time.
Impact is .15 active sessions, 38.29% of total activity.
Finding 7: Unusual "Other" Wait Event
Impact is .03 active sessions, 6.29% of total activity.
Wait event "enq: PS - contention" in wait class "Other" was consuming
significant database time.
Recommendation 1: Application Analysis
Estimated benefit is .03 active sessions, 6.29% of total activity.
Action
Investigate the cause for high "enq: PS - contention" waits. Refer to
Oracle's "Database Reference" for the description of this wait event.
Recommendation 2: Application Analysis
Estimated benefit is .02 active sessions, 6.02% of total activity.
Action
Investigate the cause for high "enq: PS - contention" waits in Service
"mcmsdrac".
Recommendation 3: Application Analysis
Estimated benefit is .02 active sessions, 4.93% of total activity.
Action
Investigate the cause for high "enq: PS - contention" waits with
P1,P2,P3 ("name|mode, instance, slave ID") values "1347616774", "1" and
"3599" respectively.
Recommendation 4: Application Analysis
Estimated benefit is .01 active sessions, 2.74% of total activity.
Action
Investigate the cause for high "enq: PS - contention" waits in Module
"Inbox Reader_92.exe".
Recommendation 5: Application Analysis
Estimated benefit is .01 active sessions, 2.74% of total activity.
Action
Investigate the cause for high "enq: PS - contention" waits in Module
"TOAD 9.7.2.5".
Recommendation 6: Application Analysis
Estimated benefit is .01 active sessions, 1.37% of total activity.
Action
Investigate the cause for high "enq: PS - contention" waits with
P1,P2,P3 ("name|mode, instance, slave ID") values "1347616774", "1" and
"3598" respectively.
Symptoms That Led to the Finding:
Wait class "Other" was consuming significant database time.
Impact is .15 active sessions, 38.29% of total activity.
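The P1 value "1347616774" quoted in Finding 7 is the usual packed "name|mode" form for enqueue waits: the two high-order bytes are the ASCII lock type and the low-order 16 bits are the requested mode. A minimal sketch of the decoding, in Python so it can be run anywhere (the function name is hypothetical, and the mode table is the standard Oracle lock-mode numbering):

```python
from typing import Tuple

# Standard Oracle lock mode numbers
LOCK_MODES = {
    0: "none",
    1: "null",
    2: "row share (SS)",
    3: "row exclusive (SX)",
    4: "share (S)",
    5: "share row exclusive (SSX)",
    6: "exclusive (X)",
}


def decode_enqueue_p1(p1: int) -> Tuple[str, int]:
    """Split an enqueue wait's P1 ("name|mode") into lock type and mode."""
    # Top two bytes hold the two-character enqueue name
    name = chr((p1 >> 24) & 0xFF) + chr((p1 >> 16) & 0xFF)
    # Low 16 bits hold the requested lock mode
    mode = p1 & 0xFFFF
    return name, mode


if __name__ == "__main__":
    name, mode = decode_enqueue_p1(1347616774)
    print(name, mode, LOCK_MODES.get(mode, "unknown"))
```

For the value in this report the result is lock type PS requested in mode 6 (exclusive), which matches the "enq: PS - contention" event name. P2 and P3 then identify the instance and the parallel slave ID.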
Finding 8: Hard Parse
Impact is .02 active sessions, 5.5% of total activity.
Hard parsing of SQL statements was consuming significant database time.
Hard parses due to cursor environment mismatch were not consuming significant
database time.
Hard parsing SQL statements that encountered parse errors was not consuming
significant database time.
Hard parses due to literal usage and cursor invalidation were not consuming
significant database time.
The Oracle instance memory (SGA and PGA) was adequately sized.
No recommendations are available.
Symptoms That Led to the Finding:
Contention for latches related to the shared pool was consuming
significant database time.
Impact is .09 active sessions, 22.42% of total activity.
Wait class "Concurrency" was consuming significant database time.
Impact is .1 active sessions, 24.96% of total activity.
Finding 9: Soft Parse
Impact is .02 active sessions, 3.86% of total activity.
Soft parsing of SQL statements was consuming significant database time.
Recommendation 1: Application Analysis
Estimated benefit is .02 active sessions, 3.86% of total activity.
Action
Investigate application logic to keep open the frequently used cursors.
Note that cursors are closed by both cursor close calls and session
disconnects.
Recommendation 2: Database Configuration
Estimated benefit is .02 active sessions, 3.86% of total activity.
Action
Consider increasing the session cursor cache size by increasing the
value of parameter "session_cached_cursors".
Rationale
The value of parameter "session_cached_cursors" was "100" during the
analysis period.
Symptoms That Led to the Finding:
Contention for latches related to the shared pool was consuming
significant database time.
Impact is .09 active sessions, 22.42% of total activity.
Wait class "Concurrency" was consuming significant database time.
Impact is .1 active sessions, 24.96% of total activity.
Finding 10: Unusual "Other" Wait Event
Impact is .01 active sessions, 3.75% of total activity.
Wait event "IPC send completion sync" in wait class "Other" was consuming
significant database time.
Recommendation 1: Application Analysis
Estimated benefit is .01 active sessions, 3.75% of total activity.
Action
Investigate the cause for high "IPC send completion sync" waits. Refer
to Oracle's "Database Reference" for the description of this wait event.
Recommendation 2: Application Analysis
Estimated benefit is .01 active sessions, 3.75% of total activity.
Action
Investigate the cause for high "IPC send completion sync" waits with P1
("send count") value "1".
Recommendation 3: Application Analysis
Estimated benefit is .01 active sessions, 2.59% of total activity.
Action
Investigate the cause for high "IPC send completion sync" waits in
Service "mcmsdrac".
Recommendation 4: Application Analysis
Estimated benefit is .01 active sessions, 1.73% of total activity.
Action
Investigate the cause for high "IPC send completion sync" waits in
Module "TOAD 9.7.2.5".
Symptoms That Led to the Finding:
Wait class "Other" was consuming significant database time.
Impact is .15 active sessions, 38.29% of total activity.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Additional Information
Miscellaneous Information
Wait class "Application" was not consuming significant database time.
Wait class "Commit" was not consuming significant database time.
Wait class "Configuration" was not consuming significant database time.
CPU was not a bottleneck for the instance.
Wait class "Network" was not consuming significant database time.
Wait class "User I/O" was not consuming significant database time.
Session connect and disconnect calls were not consuming significant database
time.
The database's maintenance windows were active during 100% of the analysis
period.
Please help.
Hello experts...
Please do the needful... It's really very urgent.
Thanks,
Syed -
RCA for Oracle RAC Performance Issue
Hi DBAs,
I have set up a 2-node Oracle RAC 10.2.0.3 on Linux 4.5 (64-bit) with 16 GB memory and 4 dual-core CPUs on each node. The database is serving a web application, but unfortunately the system is on its knees. The performance is terrible. The storage is an EMC SAN, but ASM is not implemented, for fear of degrading performance further or complicating the system.
I am seeking expert advice from the gurus on this forum to formulate an action plan for root cause analysis of the system and database. Please advise me what tools I can use to gather information about the root cause. The AWR report is not very helpful. The system stats from top, vmstat and iostat only show high resource usage, and it is difficult to find the reason. OEM has been configured and very frequently reports all kinds of high wait events.
How can I effectively find network bottlenecks (for example with the netstat command, whose output I need help understanding)?
How can I read the system I/O figures (iostat) to get useful information? I don't understand what the baseline or optimal values should be for comparing I/O activity.
I am seeking help and advice to diagnose the issue. I also want to present this issue as a case study.
Thanks
-Samar-
First of all, RAC is mainly suited for OLTP applications.
Secondly, if your application is unscalable (it doesn't use bind variables and no SQL statements have been tuned and/or it has been ported from Sukkelserver 200<whatever>) running it against RAC will make things worse.
Thirdly: RAC uses a chatty Interconnect. If you didn't configure the Interconnect properly, and/or are using slow network cards (1 Gb is mandatory), and/or you are not using a 9k MTU on your 1 Gb NIC, this again will make things worse.
You can't install RAC 'out of the box'. It won't perform! PERIOD.
Fourthly: you might suffer from your 'application' connecting and disconnecting for every individual SQL statement and/or commit every individual INSERT or UPDATE.
You need to address this.
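The two application habits mentioned above (literal SQL instead of bind variables, and a commit for every single row) can be illustrated with a small sketch. It uses Python's built-in sqlite3 module purely as a stand-in for Oracle so that it runs anywhere; the table and function names are hypothetical, and the "?" placeholders play the role that :B1-style bind variables play in the Oracle statements earlier on this page.

```python
import sqlite3


def load_rows(conn, rows):
    """Insert all rows with one parameterised statement and a single commit."""
    # One statement text for every row, so it is parsed once and reused
    conn.executemany("INSERT INTO approvals (id, status) VALUES (?, ?)", rows)
    # Commit once per batch instead of once per row
    conn.commit()


def count_by_status(conn, status):
    """Count rows using a placeholder rather than a literal in the SQL text."""
    cur = conn.execute("SELECT COUNT(*) FROM approvals WHERE status = ?", (status,))
    return cur.fetchone()[0]


if __name__ == "__main__":
    # One connection reused for all the work, not one per statement
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE approvals (id INTEGER PRIMARY KEY, status TEXT)")
    load_rows(conn, [(i, "VC" if i % 2 else "NEW") for i in range(1, 101)])
    print(count_by_status(conn, "VC"))
    conn.close()
```

The same pattern applies with an Oracle driver: keep the connection open (or pooled), keep the statement text constant, and commit per business transaction rather than per row.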
Using ADDM and/or AWR is compulsory for analysing the problem, and having read Cary Millsap's book Optimizing Oracle Performance is compulsory too.
You won't get anywhere without AWR, and OS statistics alone will not provide any clue.
Because, paraphrasing William Jefferson Clinton, former president of the US of A:
It's the application, stupid.
99 out of 100 cases. Trust me. All developers I know currently are 100 percent clueless.
That said, if you can't be bothered to post the top 5 AWR events, and you aren't up to using AWR reports, maybe you should hire a consultant who can.
Regards,
Sybrand Bakker
Senior Oracle DBA -
What is "KJC: Wait for msg .." wait event in 10g??
Hi, all.
The database is 2 node RAC database (10.2.0.2.0)
on 32-bit windows 2003 EE SP1.
I found "KJC: Wait for msg sends to complete" wait event in
"Top 5 Timed Event" Section from AWR report.
What is "KJC: Wait for msg sends to complete" wait event??
The following is from UDUMP.
Dump file d:\oracle\product\10.2.0\admin\rac\udump\rac2_ora_5656.trc
Mon Sep 24 00:04:40 2007
ORACLE V10.2.0.2.0 - Production vsnsta=0
vsnsql=14 vsnxtr=3
Oracle Database 10g Enterprise Edition Release 10.2.0.2.0 - Production
With the Real Application Clusters, OLAP and Data Mining options
Windows Server 2003 Version V5.2 Service Pack 1
CPU : 4 - type 586, 2 Physical Cores
Process Affinity : 0x00000000
Memory (Avail/Total): Ph:5278M/8190M, Ph+PgF:6596M/10041M, VA:316M/2047M
Instance name: rac2
Redo thread mounted by this instance: 2
Oracle process number: 64
Windows thread id: 5656, image: ORACLE.EXE (SHAD)
*** 2007-09-24 00:04:40.156
*** ACTION NAME:() 2007-09-24 00:04:40.156
*** MODULE NAME:(OEM.SystemPool) 2007-09-24 00:04:40.156
*** SERVICE NAME:(RAC.world) 2007-09-24 00:04:40.156
*** CLIENT ID:() 2007-09-24 00:04:40.156
*** SESSION ID:(486.53) 2007-09-24 00:04:40.156
IPCSendMsg: could not initiate send on conn 0x5b0d3e98 to node [rac1 : 696 : 3996 : 359937], err 10054
IPCGetRequestInfo: failed a request rqh(0x5b060db8), type(6), status(2), bytes(0)
Thanks and Regards.
Message was edited by:
user507290
This might have something to do with bug 5075434 - Small performance overhead in RAC (waits for "KJC: Wait for msg sends to complete").
Check metalink for further details.