Deadlock on RAC
Hello All,
I am using Oracle 10g RAC.
I am facing some deadlocks in the database.
I checked the alert log for both instances and did not find any ORA-00060 error.
Is there another ORA error I should search for in the alert log to detect deadlocks in a RAC environment? Is there any specific ORA error for deadlocks related to RAC?
Regards,
NB wrote:
Hello All,
I am using Oracle 10g RAC.
I am facing some deadlocks in the database.
I checked the alert log for both instances and did not find any ORA-00060 error.
Is there another ORA error I should search for in the alert log to detect deadlocks in a RAC environment? Is there any specific ORA error for deadlocks related to RAC?
It looks like you have sessions waiting on locks rather than a true deadlock.
Check the AWR report for that period, or use the OEM console, to look at the wait events.
In the AWR report, find the top wait events and diagnose the sessions causing them.
Regards
Rajesh
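For anyone who wants to automate the alert log check discussed above, here is a minimal Java sketch. It searches for two markers: ORA-00060, and the "Global Enqueue Services Deadlock detected" text that appears in the 11gR2 RAC alert log quoted in a later thread on this page. Whether a 10g alert log prints exactly the same text is an assumption. The class and method names are hypothetical and the log path is passed on the command line.

```java
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

public class AlertLogDeadlockScan {
    // Markers to search for. The second one is taken from the 11gR2 RAC
    // alert log excerpt on this page; treat it as an assumption for 10g.
    static final String[] MARKERS = {
        "ORA-00060",
        "Global Enqueue Services Deadlock detected"
    };

    // Returns every line that contains at least one marker, in file order.
    static List<String> findDeadlockLines(List<String> alertLogLines) {
        List<String> hits = new ArrayList<>();
        for (String line : alertLogLines) {
            for (String marker : MARKERS) {
                if (line.contains(marker)) {
                    hits.add(line);
                    break; // count each line once
                }
            }
        }
        return hits;
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 1) {
            System.err.println("usage: java AlertLogDeadlockScan <alert log file>");
            return;
        }
        // Reads the whole file into memory; fine for a sketch, not for huge logs.
        List<String> lines = Files.readAllLines(Paths.get(args[0]));
        for (String hit : findDeadlockLines(lines)) {
            System.out.println(hit);
        }
    }
}
```

Run it once per instance, since each RAC instance writes its own alert log.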
Similar Messages
-
@oracle team; forms server help desperately needed
hi
my company is trying to migrate client machines to linux. our setup should be the following:
database (v9.2) <-> forms server (v9i) <-> linux client with applet viewer (debian 3.0, j2sdk1.3)
i have been working on this setup for some months(!) now but i cannot get the clients to communicate reliably with the rest of the setup.
problem: quite frequently the appletviewer will just hang and cause the forms server to create a
f90webm_dump_`$PID` file
in its working directory containing the following information:
[Mon Jan 20 21:58:29 2003 CET]::Client Status [ConnId=0, PID=10033]
>> Network connection to client failed: timeout on connection
the people working on the client machines are quite quickly inputting data into our databases. the appletviewer will hang more often on machines < 500 MHz but it also hangs on machines with more than 1.5 GHz and 512 MB DDR
i have literally tried everything:
i tried these java versions:
1.1, 1.3_x, 1.4.1 from sun, blackdown, ibm and even the one that came with the forms server on the clients
i tried to tune every setting on the forms server and in the jvm.cfg. i even tried to install the forms server on the clients so that i would have one tcp connection less to definitely exclude network failure as a possible reason.
i tried to use the PREEMPT_CLOSE setting, i even patched the kernels (low latency patch), analysed megabytes of debug logging, tried to strace the appletviewer to find WHY it keeps hanging!!!!
i tried to use different linux versions for best compatibility on the clients as well as on the forms server:
debian potato, woody, sarge, sid
suse 7.1, 7.2, 7.3
redhat 7.2, 7.3
i have read almost ALL documentation you could possibly read to find the reason of this problem.
dear oracle team
PLEASE just tell me that you have even tested this product and that you know how to make it work in a stable way. we can even pay you money to give us the solution. we have been using oracle forms for 7 years now and we want to stick with it. but our clients HAVE to be able to run the linux OS.
i have posted this problem 2 or 3 times already but somehow either no one seems to read this or no one has ever tried to use a three tier setup with linux on the middle and the client tier.
as i said i have wasted MONTHS already trying to get this to work and i am out of ideas what could cause this.
thanks for ANY reaction
armin walland
sorry for not responding for a few days, i didnt have the possibility to do so.
> if you access and run the application from a Windows client using JInitiator, what happens.
i have tried that of course and it seemed to be stable (no hang in about 2 and a half days). i dont think it is a problem on the forms server itself...maybe just some setting that doesnt cooperate well with the appletviewer.
> Since we don't certify Forms9i with appletviewer, we didn't test it there.
thats what i thought to be honest....since you say "we"; do you work for oracle?
> So first thing to do is to check if the problem is a client issue or a server side problem.
seems more like a client problem to me though....somehow it looks like some deadlock or race condition, but im not a programmer so this could be a completely wrong impression.
> If you can, can you try a test on a Windows client?
as i said b4, we tried that and it seemed to work fine, but it was to test the server and ensure it isnt a complete misconfiguration issue :)
rgds, armin -
Exception (or crash) when executing an XmlModify with multiple remove exprs
Hi all,
The following code snippet resembles my execution path for updating some nodes in a document.
XmlManager manager = DbController::getInstance()->getManager();
XmlContainer container = DbController::getInstance()->getContainer();
XmlTransaction mainTransaction = manager.createTransaction(DB_TXN_NOWAIT);
XmlQueryContext queryContext = manager.createQueryContext();
XmlTransaction childTransaction = mainTrContext->getXmlTransaction().createChild();
XmlDocument document = container.getDocument(childTransaction, "mydocdir/docname", 0);
childTransaction.commit();
XmlValue nodeXmlValue;
XmlQueryExpression docNodeExpression = manager.prepare(mainTransaction, "/acquisitionProtocol", queryContext);
XmlResults resultList = docNodeExpression.execute(mainTransaction, mainTrContext->getDocument()->getXmlValue(), queryContext);
if (resultList.size() > 0)
resultList.next(nodeXmlValue);
// XmlValue nodeXmlValue(mainTrContext->getDocument()->getXmlDocument());
XmlUpdateContext updateContext = manager.createUpdateContext();
XmlModify modifier = manager.createModify();
XmlQueryExpression nodeQueryExpression = manager.prepare(mainTransaction, "./location", queryContext);
modifier.addRemoveStep(nodeQueryExpression);
nodeQueryExpression = manager.prepare(mainTransaction, "./id", queryContext); // reuse the variable declared above; redeclaring it would not compile
modifier.addRemoveStep(nodeQueryExpression);
nodeQueryExpression = manager.prepare(mainTransaction, ".", queryContext);
modifier.addAppendStep(nodeQueryExpression, XmlModify::Element, "location", "yves/test", 0);
modifier.execute(mainTransaction, nodeXmlValue, queryContext, updateContext);
mainTransaction.commit();
When I execute the modifier (second-to-last line), it throws the following exception or crashes:
DBcursor->get: DB_READ_COMMITTED, DB_READ_UNCOMMITTED and DB_RMW require locking
Exception code: 5
Error text:Error: Invalid argument File: NsEventReader.cpp Line: 828
DbErrno: 22
The problem occurs when adding more than one remove expression to the modify object. If I use only one remove expression, the code works fine.
However, I can add multiple append expression without problems.
What am I doing wrong here???
Thanks in advance
Yves
Edited by: ywillems on Dec 10, 2008 7:55 AM
Edited by: ywillems on Dec 11, 2008 5:37 AM
Edited by: ywillems on Dec 11, 2008 7:05 AM
Yves,
Oddly enough I believe this is an optimizer bug that we've found and patched (but not yet officially released the patch). Try applying this patch to 2.4.16:
diff -ru dbxml-2.4.16-original/dbxml/src/dbxml/query/DecisionPointQP.cpp dbxml-2.4.16/dbxml/src/dbxml/query/DecisionPointQP.cpp
--- dbxml-2.4.16-original/dbxml/src/dbxml/query/DecisionPointQP.cpp
+++ dbxml-2.4.16/dbxml/src/dbxml/query/DecisionPointQP.cpp
@@ -1,14 +1,13 @@
// See the file LICENSE for redistribution information.
// Copyright (c) 2002,2008 Oracle. All rights reserved.
-// $Id$
#include "../DbXmlInternal.hpp"
#include "DecisionPointQP.hpp"
#include "QueryPlanHolder.hpp"
#include "../QueryContext.hpp"
#include "../Manager.hpp"
#include "../Container.hpp"
@@ -269,17 +268,17 @@ protected:
XPath2MemoryManager *mm_;
DecisionPointQP::ListItem *DecisionPointQP::justInTimeOptimize(int contID, DynamicContext *context)
// **** IMPORTANT - This algorithm is very carefully arranged to avoid
// **** deadlocks and race-conditions. Don't rearrange things unless you
// **** know what you are doing!
+
// Get the runtime configuration
DbXmlConfiguration *conf = GET_CONFIGURATION(context);
// Lookup the container
ScopedContainer scont((Manager&)conf->getManager(), contID, /*mustExist*/true);
// Just-in-time optimise the QueryPlan, using a temporary memory manager for thread safety
XPath2MemoryManagerImpl tmpMM;
@@ -349,17 +348,18 @@ void DecisionPointQP::justInTimeOptimize
qp->staticTypingLite(context);
OptimizationContext opt(OptimizationContext::REARRANGE, context, 0, container);
qp = qp->optimize(opt);
qp->logQP(opt.getLog(), "OQP", qp, opt.getPhase());
OptimizationContext opt(OptimizationContext::ALTERNATIVES, context, 0, container);
- qp = qp->chooseAlternative(opt, "decision point", container->getContainerID() == 0);
+ opt.setCheckForSS(container->getContainerID() == 0);
+ qp = qp->chooseAlternative(opt, "decision point");
qp->logQP(opt.getLog(), "OQP", qp, opt.getPhase());
OptimizationContext opt(OptimizationContext::ADD_STEPS, context, 0, container);
qp = qp->optimize(opt);
qp->logQP(opt.getLog(), "OQP", qp, opt.getPhase());
@@ -390,26 +390,37 @@ DecisionPointQP::DecisionPointQP(const D
removed_(false),
qpList_(0),
qpListDone_(o->qpListDone_),
compileTimeMinder_(o->compileTimeMinder_),
compileTimeContext_(o->compileTimeContext_)
if(arg_ != 0)
_src.add(arg_->getStaticAnalysis());
+
+ bool checkForSS = opt.checkForSS();
+
+ try {
+ ListItem **li = &qpList_;
+ for(ListItem *oli = o->qpList_; oli != 0; oli = oli->next) {
+ opt.setCheckForSS(oli->container->getContainerID() == 0);
+
+ *li = new (mm) ListItem(oli->container, 0);
+ (*li)->qp = oli->qp->chooseAlternative(opt, "decision point");
- ListItem **li = &qpList_;
- for(ListItem *oli = o->qpList_; oli != 0; oli = oli->next) {
- *li = new (mm) ListItem(oli->container, 0);
- (*li)->qp = oli->qp->chooseAlternative(opt, "decision point", oli->container->getContainerID() == 0);
+ _src.add((*li)->qp->getStaticAnalysis());
- _src.add((*li)->qp->getStaticAnalysis());
- li = &(*li)->next;
+ li = &(*li)->next;
+ }
+ catch(...) {
+ opt.setCheckForSS(checkForSS);
+ throw;
+ }
+ opt.setCheckForSS(checkForSS);
DecisionPointQP::DecisionPointQP(const DecisionPointQP *o, XPath2MemoryManager *mm)
: QueryPlan(DECISION_POINT, o->getFlags(), mm),
dps_(o->dps_ ? o->dps_->copy(mm) : 0),
arg_(o->arg_ ? o->arg_->copy(mm) : 0),
removed_(false),
qpList_(0),
diff -ru dbxml-2.4.16-original/dbxml/src/dbxml/query/QueryPlan.cpp dbxml-2.4.16/dbxml/src/dbxml/query/QueryPlan.cpp
--- dbxml-2.4.16-original/dbxml/src/dbxml/query/QueryPlan.cpp
+++ dbxml-2.4.16/dbxml/src/dbxml/query/QueryPlan.cpp
@@ -1,14 +1,13 @@
// See the file LICENSE for redistribution information.
// Copyright (c) 2002,2008 Oracle. All rights reserved.
-// $Id$
#include "../DbXmlInternal.hpp"
#include <assert.h>
#include <string.h>
#include <sstream>
#include <set>
#include <algorithm>
@@ -128,23 +127,54 @@ void QueryPlan::createAlternatives(unsig
createCombinations(maxAlternatives, opt, combinations);
// Generate the alternatives by applying conversion rules to the combinations
for(QueryPlans::iterator it = combinations.begin(); it != combinations.end(); ++it) {
(*it)->applyConversionRules(maxAlternatives, opt, alternatives);
-CostSortItem::CostSortItem(QueryPlan *qp, OperationContext &oc, QueryExecutionContext &qec)
- : qp_(qp), cost_(qp->cost(oc, qec))
+class ContainsSequentialScan : public NodeVisitingOptimizer
+public:
+ bool run(QueryPlan *qp)
+ {
+ found = false;
+ optimizeQP(qp);
+ return found;
+ }
+
+private:
+ virtual void resetInternal() {}
+
+ virtual ASTNode *optimize(ASTNode *item)
+ {
+ // Don't look inside ASTNode objects
+ return item;
+ }
+ virtual QueryPlan *optimizeSequentialScan(SequentialScanQP *item)
+ {
+ found = true;
+ return item;
+ }
+
+ bool found;
+};
+
+CostSortItem::CostSortItem(QueryPlan *qp, OperationContext &oc, QueryExecutionContext &qec, bool checkForSS)
+ : qp_(qp), cost_(qp->cost(oc, qec)),
+ hasSS_(false)
+{
+ if(checkForSS) hasSS_ = ContainsSequentialScan().run(qp);
bool CostSortItem::operator<(const CostSortItem &o) const
+ if(hasSS_ != o.hasSS_) return !hasSS_;
+
if(cost_.totalPages() < o.cost_.totalPages()) return true;
if(cost_.totalPages() > o.cost_.totalPages()) return false;
if(cost_.pagesOverhead < o.cost_.pagesOverhead) return true;
if(cost_.pagesOverhead > o.cost_.pagesOverhead) return false;
return qp_ < o.qp_;
@@ -189,22 +219,22 @@ void QueryPlan::createReducedAlternative
if(i != costSortSet.end()) {
(*it)->release();
continue;
++alternativesCount;
- costSortSet.insert(CostSortItem(*it, oc, qec));
+ costSortSet.insert(CostSortItem(*it, oc, qec, opt.checkForSS()));
if(costSortSet.size() > ALTERNATIVES_THRESHOLD) {
// Trim all QueryPlans outside of a factor of the cost of the lowest cost QueryPlan
// TBD Make the specific factor configurable - jpcs
- set<CostSortItem>::iterator cutPoint = costSortSet.lower_bound(costSortSet.begin()->cost_.totalPages() * cutOffFactor);
+ set<CostSortItem>::iterator cutPoint = costSortSet.lower_bound(CostSortItem(costSortSet.begin()->cost_.totalPages() * cutOffFactor, false));
if(cutPoint != costSortSet.begin() && cutPoint != costSortSet.end()) {
for(i = cutPoint; i != costSortSet.end(); ++i) {
if(Log::isLogEnabled(Log::C_OPTIMIZER, Log::L_DEBUG)) {
ostringstream oss;
oss << "Rejected Alternative (outside cut off factor: ";
oss << (costSortSet.begin()->cost_.totalPages() * cutOffFactor);
oss << ")";
log(qec, oss.str());
@@ -247,54 +277,27 @@ void QueryPlan::createReducedAlternative
} else {
for(set<CostSortItem>::iterator i = costSortSet.begin(); i != costSortSet.end(); ++i) {
alternatives.push_back(i->qp_);
-class ContainsSequentialScan : public NodeVisitingOptimizer
+static bool betterAlternativeCost(const Cost &costA, bool ssA, const Cost &costB, bool ssB, bool checkForSS)
-public:
- bool run(QueryPlan *qp)
- found = false;
- optimizeQP(qp);
- return found;
-private:
- virtual void resetInternal() {}
- virtual ASTNode *optimize(ASTNode *item)
- // Don't look inside ASTNode objects
- return item;
- virtual QueryPlan *optimizeSequentialScan(SequentialScanQP *item)
- found = true;
- return item;
- bool found;
-static bool betterAlternativeCost(const Cost &costA, bool ssA, const Cost &costB, bool ssB, bool noSequentialScan)
- if(ssA != ssB && noSequentialScan) return ssB;
+ if(ssA != ssB && checkForSS) return ssB;
if(costA.totalPages() < costB.totalPages()) return true;
if(costA.totalPages() > costB.totalPages()) return false;
return costA.pagesOverhead < costB.pagesOverhead;
-QueryPlan *QueryPlan::chooseAlternative(OptimizationContext &opt, const char *name, bool noSequentialScan) const
+QueryPlan *QueryPlan::chooseAlternative(OptimizationContext &opt, const char *name) const
QueryPlans combinations;
createCombinations(MAX_ALTERNATIVES, opt, combinations);
// TBD remove the need for QueryExecutionContext here - jpcs
QueryExecutionContext qec(GET_CONFIGURATION(opt.getContext())->getQueryContext(),
/*debugging*/false);
qec.setContainerBase(opt.getContainerBase());
@@ -313,17 +316,17 @@ QueryPlan *QueryPlan::chooseAlternative(
for(QueryPlans::iterator it = myAlts.begin(); it != myAlts.end(); ++it) {
++alternativesCount;
QueryPlan *qp = (*it);
Cost itCost = qp->cost(opt.getOperationContext(), qec);
bool itSS = ContainsSequentialScan().run(qp);
- if(bestQP == 0 || betterAlternativeCost(itCost, itSS, bestCost, bestSS, noSequentialScan)) {
+ if(bestQP == 0 || betterAlternativeCost(itCost, itSS, bestCost, bestSS, opt.checkForSS())) {
if(bestQP != 0) {
log(qec, "Rejected Alternative (not best)");
bestQP->logCost(qec, bestCost, 0);
bestQP->release();
bestQP = qp;
bestCost = itCost;
bestSS = itSS;
diff -ru dbxml-2.4.16-original/dbxml/src/dbxml/query/QueryPlan.hpp dbxml-2.4.16/dbxml/src/dbxml/query/QueryPlan.hpp
--- dbxml-2.4.16-original/dbxml/src/dbxml/query/QueryPlan.hpp
+++ dbxml-2.4.16/dbxml/src/dbxml/query/QueryPlan.hpp
@@ -1,14 +1,13 @@
// See the file LICENSE for redistribution information.
// Copyright (c) 2002,2008 Oracle. All rights reserved.
-// $Id$
#ifndef __QUERYPLAN_HPP
#define __QUERYPLAN_HPP
#include <vector>
#include <set>
#include <string>
@@ -60,40 +59,44 @@ public:
REARRANGE = 2,
ALTERNATIVES = 3,
ADD_STEPS = 4,
MAKE_PREDICATES = 5,
REMOVE_REDUNDENTS = 6
OptimizationContext(Phase ph, DynamicContext *cn, QueryPlanOptimizer *qpo, ContainerBase *c = 0)
- : phase_(ph), context_(cn), qpo_(qpo), container_(c), isFetched_(false) {}
+ : phase_(ph), context_(cn), qpo_(qpo), container_(c), isFetched_(false), checkForSS_(false) {}
Phase getPhase() const { return phase_; }
DynamicContext *getContext() const { return context_; }
XPath2MemoryManager *getMemoryManager() const;
QueryPlanOptimizer *getQueryPlanOptimizer() const { return qpo_; }
void setQueryPlanOptimizer(QueryPlanOptimizer *qpo) { qpo_ = qpo; }
ContainerBase *getContainerBase() const { return container_; }
Transaction *getTransaction() const;
OperationContext &getOperationContext() const;
const IndexSpecification &getIndexSpecification() const;
const Log &getLog() const;
+ bool checkForSS() const { return checkForSS_; }
+ void setCheckForSS(bool val) { checkForSS_ = val; }
+
private:
Phase phase_;
DynamicContext *context_;
QueryPlanOptimizer *qpo_;
ContainerBase *container_;
mutable IndexSpecification is_;
mutable bool isFetched_;
+ bool checkForSS_;
class QueryPlan : public LocationInfo
public:
virtual ~QueryPlan() {}
typedef enum {
// Index lookups
@@ -176,17 +179,17 @@ public:
virtual const StaticAnalysis &getStaticAnalysis() const { return _src; }
virtual QueryPlan *optimize(OptimizationContext &opt) = 0;
virtual void createCombinations(unsigned int maxAlternatives, OptimizationContext &opt, QueryPlans &combinations) const;
virtual void applyConversionRules(unsigned int maxAlternatives, OptimizationContext &opt, QueryPlans &alternatives);
void createAlternatives(unsigned int maxAlternatives, OptimizationContext &opt, QueryPlans &alternatives) const;
void createReducedAlternatives(double cutOffFactor, unsigned int maxAlternatives, OptimizationContext &opt, QueryPlans &alternatives) const;
- QueryPlan *chooseAlternative(OptimizationContext &opt, const char *name, bool noSequentialScan = false) const;
+ QueryPlan *chooseAlternative(OptimizationContext &opt, const char *name) const;
virtual NodeIterator *createNodeIterator(DynamicContext *context) const = 0;
virtual Cost cost(OperationContext &context, QueryExecutionContext &qec) const = 0;
/** Returns the QueryPlanRoot objects from the PathsQP in this QueryPlan */
virtual void findQueryPlanRoots(QPRSet &qprset) const = 0;
/// Returns true if it's sure. Returns false if it doesn't know
virtual bool isSubsetOf(const QueryPlan *o) const = 0;
@@ -536,20 +539,21 @@ protected:
ImpliedSchemaNode *isn2_;
QPValue value2_;
DbWrapper::Operation operation2_;
struct CostSortItem {
- CostSortItem(double cost) : qp_(0), cost_(0, cost) {}
- CostSortItem(QueryPlan *qp, OperationContext &oc, QueryExecutionContext &qec);
+ CostSortItem(double cost, bool hasSS) : qp_(0), cost_(0, cost), hasSS_(hasSS) {}
+ CostSortItem(QueryPlan *qp, OperationContext &oc, QueryExecutionContext &qec, bool checkForSS);
bool operator<(const CostSortItem &o) const;
QueryPlan *qp_;
Cost cost_;
+ bool hasSS_;
#endif
diff -ru dbxml-2.4.16-original/dbxml/src/dbxml/query/SequentialScanQP.cpp dbxml-2.4.16/dbxml/src/dbxml/query/SequentialScanQP.cpp
--- dbxml-2.4.16-original/dbxml/src/dbxml/query/SequentialScanQP.cpp
+++ dbxml-2.4.16/dbxml/src/dbxml/query/SequentialScanQP.cpp
@@ -1,14 +1,13 @@
// See the file LICENSE for redistribution information.
// Copyright (c) 2002,2008 Oracle. All rights reserved.
-// $Id$
#include "../DbXmlInternal.hpp"
#include "SequentialScanQP.hpp"
#include "StepQP.hpp"
#include "QueryExecutionContext.hpp"
#include "../ContainerBase.hpp"
#include "../Document.hpp"
@@ -134,16 +133,17 @@ QueryPlan *SequentialScanQP::optimize(Op
return this;
NodeIterator *SequentialScanQP::createNodeIterator(DynamicContext *context) const
+ DBXML_ASSERT(container_->getContainerID() != 0);
if(nodeType_ == ImpliedSchemaNode::METADATA) {
return container_->createDocumentIterator(context, this);
} else {
NamedNodeIterator *result;
if(nodeType_ == ImpliedSchemaNode::ATTRIBUTE) {
result = container_->createAttributeIterator(context,
this,
nsUriID_); -
Modifier and Accessor Synchronization
Hi,
This is not what you may think: it is not about getters and setters. Allow me to elaborate. I have a data structure that receives a few modifications to the underlying data and many accesses (M threads add/remove, N threads access).
To achieve concurrency, I am adding two minimalistic locks to synchronize between the two thread types: one for when the data structure is in a modification phase (modifyingLock) and one for the count of threads currently accessing the data structure (threadCount).
The premise is to allow modifying threads to change the data only when no thread is accessing (reading) it. A modifying thread must therefore wait until all the accessing threads are done, by checking whether threadCount is zero. Accessing threads can be in one of two states: either in the process of accessing the data or just about to begin searching. In the former case, the accessing thread should be allowed to finish its search and decrement threadCount at the end. If a thread of type N has not begun accessing the data and sees modifyingLock set, it should wait until the modifications are performed and be notified upon completion. The process should work as follows:
Modifying threads (M),
- Before making any modification, M must check whether modifyingLock has been set, which indicates that another M thread is making (or about to make) a modification to the data structure, and also check whether any thread of type N is accessing it.
- If either check fails, the M thread must wait until the other M thread finishes its modification and/or the accessing N threads are flushed and done with their search (threadCount should be zero).
Accessing threads (N),
- If modifyingLock is not set, they can begin accessing the data structure safely, knowing that a thread of type M will not begin its modification until all the accessing threads are done.
- Any further new access is blocked and has to wait until the M threads are done with their task.
This sounds logical, but when it comes to the actual implementation I can clearly see that there are many ways the threads can lock up, and I cannot figure out how to remedy them:
// Partial code of add() method
// modifyingLock is a byte[] of size one
// threadAccessCount is an int[] of size one
synchronized (modifyingLock) {
    // note: the description above says to wait if EITHER condition holds,
    // which would be || rather than &&
    while (modifyingLock[0] != 0 &&
           threadAccessCount[0] != 0) {
        try {
            modifyingLock.wait();
        } catch (InterruptedException ie) {
        }
    } // end of while loop
    modifyingLock[0] = (byte) 1;
    // Actual modification to the data structure
    modifyingLock[0] = (byte) 0;
    modifyingLock.notifyAll();
} // end of synchronized block
// Partial implementation of accessing method
synchronized (modifyingLock) {
    while (modifyingLock[0] != 0) {
        try {
            modifyingLock.wait();
        } catch (InterruptedException ie) {
        }
    } // end of while loop
} // end of synchronized block
synchronized (threadAccessCount) {
    threadAccessCount[0]++;
}
// Actual search or read task
synchronized (threadAccessCount) {
    threadAccessCount[0]--;
    if (threadAccessCount[0] == 0) {
        synchronized (modifyingLock) {
            modifyingLock.notifyAll();
        }
    }
}
Now my question: can you come up with any scenario where a deadlock or race condition can occur?
Thank You
Maxim_Karvonen,
Thank you for the keen eyes; I really didn't expect anyone to read through the whole thing. I could swear I had made changes in my last reply that address many of the issues you pointed out; I don't know why my last post does not reflect them. In any case, let me recap the requirements of my problem, as that will help make the situation clearer.
We have writer and reader threads which both work on the same data structure. Since they share the underlying data, I was hoping to come up with a mechanism to flush out the "reader" threads as soon as a "writer" thread appears to modify the data structure. By flush, I do not mean terminating a search in the middle of its work; rather, the "reader" threads already running are allowed to finish, but any newly arriving "reader" AND "writer" thread must block and wait for the first "writer" thread to perform its duty.
The reason reader and writer threads may go head to head is that the modification methods (i.e. add() and remove()) both use the accessing method (i.e. contain()) to check whether an entry/node is present in the data structure before proceeding to the actual job. So, as you can see, both reader and writer threads call contain() at some point. Therefore, in the design, I have to make sure that all of its content is available only to a single "writer" thread.
Since my last reply, I have (hopefully) managed to accomplish this to some extent. But soon after, I realized that my design would keep handing the locks to "writer" threads for as long as there is a writer thread present in the queue.
For instance, if I have 50 reader threads in the middle of contain() and 5 writer threads waiting to be processed, all 50 reader threads must finish their task and exit. Meanwhile, any incoming reader and/or writer thread must be blocked. Once all 50 reader threads have left, one of the 5 writer threads (plus any writer threads that arrived in the meantime) is selected and allowed to perform its task.
Now we can have two scenarios. The first is to continue blocking all the incoming reader threads (which can add up to dozens by the time a writer thread finishes) because we still have other writer threads waiting. The second is to allow contention among reader and writer threads and see who wins. BUT... due to my design, if a reader wins, then all other readers must be processed and their counter decremented to zero before the writer threads can be given the green light. And since reader threads pretty much dominate throughout the life cycle of the application, it will be a long time until any of the writer threads gets to run.
So to combat this, I introduced a reader/writer ratio. Every time a writer performs its task, right before it exits, it checks whether a threshold has been reached. If not, it gives priority to another writer thread, and writers continue executing until the number of reader threads relative to writer threads passes the ratio. At that point, the ball is in the reader threads' court. This is what I have come up with so far:
private byte[] modifyingLock; // An indication that a writer thread is "wanting" to do some work
private int[] accessThreadCount; // Number of "reader" threads
private short modificationThreadCount; // Number of "writer" threads
private double accessToModifyingThreadRatio; // The ratio of reader to writer (usually 10 to 1)
// add() method
synchronized (modifyingLock) {
    modificationThreadCount++;
    // An indication to all "reader" threads that a "writer" or "writers"
    // thread(s) has joined the party and everyone needs to clear the way
    modifyingLock[0] = 1; /* (1.1) */
    while (accessThreadCount[0] != 0) {
        try {
            modifyingLock.wait();
        } catch (InterruptedException ie) {
        }
    } // end of while loop
    // We have to reset the lock's value because add() needs to call on
    // contain() and if we leave the lock's value on line "1.1", then
    // a writer thread that calls contain() would also go on a block
    // and create a deadlock
    modifyingLock[0] = 0; /* (1.2) */
    // If the item is already in the data structure, just return. But before
    // doing so, a writer thread needs to do some cleaning and synchronization
    if (/* contain() */) {
        // As we discussed, if the "current" writing thread leaves this lock's value
        // unset, this would give the "reader" threads a way to execute without
        // giving a chance to "writer" threads.
        // This is due to the fact that the only way reader threads "know" whether
        // a writer thread is about to do some work is to have this lock's value
        // set, which lets the incoming reader threads know that they should block
        // ... for now.
        // If the ratio of reader threads to writer threads exceeds a certain number,
        // then the lock's value is left unset so now, "reader" threads can execute
        // their task. Otherwise, we let the next possible "writer" thread take over.
        // This is more like a writer thread vouching for another writer thread
        // comrade.
        if (/* threshold is NOT reached */) {
            modifyingLock[0] = 1;
        }
        modificationThreadCount--;
        modifyingLock.notifyAll();
        return;
    }
    /****** Do the actual insertion *******/
    // This is the end of the line for the writer thread so it has to perform
    // the same set of procedures as I just described above. I could always
    // modularize it to avoid code duplication but we leave it like that for now.
    if (/* threshold is NOT reached */) {
        modifyingLock[0] = 1;
    }
    modificationThreadCount--;
    modifyingLock.notifyAll();
} // end of synchronized (modifyingLock)
// contain() method
synchronized (accessThreadCount) {
    accessThreadCount[0]++;
}
// If the modifyingLock is set, this is an indication that a writer
// thread is about to make a modification and all the incoming reader
// threads (those which have not begun accessing the underlying data
// structure) would have to block
// A writer thread that gets to this point has already acquired a lock
// so it follows through and also has the modifyingLock reset at line
// 1.2.
synchronized (modifyingLock) { /* (2.1) */
    while (modifyingLock[0] == 1) {
        try {
            modifyingLock.wait(); /* (2.x) */
        } catch (InterruptedException ie) {
        }
    } // end of while loop
} // end of synchronized block
/******* Do the actual search *******/
// If this is a reader thread, try to acquire a lock on the object,
// decrement the "reader" thread counter. If the counter has reached
// zero, automatically call upon any waiting thread on the lock.
// The notification is only good under two conditions:
// - Only "reader" threads have been performing and a writer(s) is on
// a waiting list.
// - Other reader threads that have been blocked on line 2.1
synchronized (modifyingLock) {
    synchronized (accessThreadCount) {
        accessThreadCount[0]--;
        if (accessThreadCount[0] == 0) { /* (2.4) */
            modifyingLock.notifyAll();
        }
    } // end of synchronized (accessThreadCount)
} // end of synchronized (modifyingLock)
return /* based on the outcome of search method above (2.2) */
///////////////////////////////////////////////////////////////////////////
OK, so this might not be the most crisp design for such requirements, but I think it can hold its own. One concern I have is that during the workflow of a "writer" thread, when it is about to exit contain(), it "might" call notifyAll() once there and another time at the bottom of add(). Would that be a problem? Note that the lock on modifyingLock is still held by the writer thread up to the end of add().
Another small problem I can think of is when the contain() method returns at line (2.2). This is a clear case of an unsafe "compound" action: a reader can get all the way down to line (2.2) and then a context switch occurs. A writer thread then begins working (and ultimately exits). In that case, a return value of true may well be stale ("false!") because of the change made to the data structure. This could leave a client of the reader thread very unhappy. Although this scenario is unlikely, it could happen, and I do not know how to resolve it.
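For comparison, the same requirements (many readers, few writers, writers not starved, add() calling contain() internally) can be sketched with java.util.concurrent.locks.ReentrantReadWriteLock. This is a swapped-in standard technique and not the design posted above. The class name GuardedSet and the backing HashSet are hypothetical, chosen only for illustration. The lock is created in fair mode, in which a newly arriving reader blocks if a writer is already waiting, so no hand-written reader/writer ratio is needed.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class GuardedSet<T> {
    // fair = true: a new reader blocks while a writer is waiting,
    // so a steady stream of readers cannot starve the writers.
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock(true);
    private final Set<T> items = new HashSet<>();

    // Many threads may hold the read lock at the same time.
    public boolean contains(T item) {
        lock.readLock().lock();
        try {
            return items.contains(item);
        } finally {
            lock.readLock().unlock();
        }
    }

    // Check-then-insert runs under one write lock, so no other thread
    // can change the set between the check and the insertion.
    public boolean add(T item) {
        lock.writeLock().lock();
        try {
            // The lock is reentrant: a thread holding the write lock
            // may also take the read lock inside contains().
            if (contains(item)) {
                return false;
            }
            items.add(item);
            return true;
        } finally {
            lock.writeLock().unlock();
        }
    }

    public boolean remove(T item) {
        lock.writeLock().lock();
        try {
            return items.remove(item);
        } finally {
            lock.writeLock().unlock();
        }
    }
}
```

The value returned by contains() is still only a snapshot: it can be out of date by the time the caller acts on it. A caller that needs check-then-act behaviour has to perform both steps under a single lock, as add() does here.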
I still have to think about a few points. Maxim_Karvonen, I would like to extend my gratitude for your participation in this discussion. If it weren't for your reply, I wouldn't have bothered to post this latest response. I know this might not be the end, so I will keep digging.
Thank You -
Performance Issue: Deadlocks RAC
Hello,
Our team has recently been running performance tests in our 3-node RAC 11gR2 lab on Red Hat Linux 5.
During the tests, deadlocks were detected in the alert log file:
Tue Nov 15 11:22:54 2011
Global Enqueue Services Deadlock detected. More info in file
/opt/app/oracle/diag/rdbms/db/db1/trace/db1_lmd0_12563.trc.
Global Enqueue Services Deadlock detected. More info in file
/opt/app/oracle/diag/rdbms/db/db1/trace/db1_lmd0_12563.trc.
Tue Nov 15 11:23:10 2011
Global Enqueue Services Deadlock detected. More info in file
/opt/app/oracle/diag/rdbms/db/db1/trace/db1_lmd0_12563.trc.
Tue Nov 15 11:23:10 2011
Dumping diagnostic data in directory=[cdmp_20111115113104], requested by (instance=3, osid=13645), summary=[abnormal process termination].
Tue Nov 15 11:23:37 2011
Global Enqueue Services Deadlock detected. More info in file
/opt/app/oracle/diag/rdbms/db/db1/trace/db1_lmd0_12563.trc.
....
About 50% of the queries are failing with this error. Looking into the trace file, I have:
/opt/app/oracle/diag/rdbms/db/db1/trace/db1_lmd0_12563.trc
user session for deadlock lock 0x3d2438198
sid: 200 ser: 9277 audsid: 1824543 user: 68/SNEADMIN
flags: (0x45) USR/- flags_idl: (0x1) BSY/-/-/-/-/-
flags2: (0x40009) -/-/INC
pid: 65 O/S info: user: grid, term: UNKNOWN, ospid: 5258
image: oracle@db01
client details:
O/S info: user: cceadmin, term: unknown, ospid: 1234
machine: ca.gency.com program: JDBC Thin Client
application name: JDBC Thin Client, hash value=2546894660
current SQL:
update CCE_ORDER_ITEM_DETAILS set ORDER_ID=:1 , CLUSTER_ID=:2 where ORDER_LINE_ITEM_ID=:3 and CLUSTER_ID=:4
DUMP LOCAL BLOCKER: initiate state dump for DEADLOCK
possible owner[65.5258] on resource TM-0001DF87-00000000
*** 2011-11-15 05:34:48.194
Submitting asynchronized dump request [28]. summary=[ges process stack dump (kjdglblkrdm1)].
Global blockers dump end:-----------------------------------
Global Wait-For-Graph(WFG) at ddTS[0.5] :
BLOCKED 0x3d2438198 4 wq 2 cvtops x1 TM 0x1df87.0x0(ext 0x0,0x0)[41000-0001-000004A5] inst 1
BLOCKER 0x3d7846108 4 wq 2 cvtops x1 TM 0x1df87.0x0(ext 0x0,0x0)[38000-0002-00002C68] inst 2
BLOCKED 0x3d7846108 4 wq 2 cvtops x1 TM 0x1df87.0x0(ext 0x0,0x0)[38000-0002-00002C68] inst 2
BLOCKER 0x3d75a4fd8 4 wq 2 cvtops x1 TM 0x1df87.0x0(ext 0x0,0x0)[43000-0003-000009C1] inst 3
BLOCKED 0x3d75a4fd8 4 wq 2 cvtops x1 TM 0x1df87.0x0(ext 0x0,0x0)[43000-0003-000009C1] inst 3
BLOCKER 0x3d2438198 4 wq 2 cvtops x1 TM 0x1df87.0x0(ext 0x0,0x0)[41000-0001-000004A5] inst 1
.....
Any idea, please? I can provide more information if needed.
Thanks
The deadlocks posted seem to be concerned with mode 4 TM locks.
From the following links:
http://jonathanlewis.wordpress.com/2010/02/15/lock-horror/
http://www.rachelp.nl/index_kb.php?menu=articles&actie=show&id=15
and this information related to the third column in the global wait-for-graph:
#define KJUSERNL 0 /* no permissions */ (Null)
#define KJUSERCR 1 /* concurrent read */ (Row-S (SS))
#define KJUSERCW 2 /* concurrent write */ (Row-X (SX))
#define KJUSERPR 3 /* protected read */ (Share)
#define KJUSERPW 4 /* protected write */ (S/Row-X (SSX))
#define KJUSEREX 5 /* exclusive access */ (Exclusive)
It seems I may have misinterpreted the "4" as meaning mode 4 when it should be mode "5".
If it is mode 5, that might then support the unindexed self-referential foreign key, as I hope the demo below will illustrate.
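To make the reading of the wait-for-graph concrete, here is a minimal Java sketch that pairs each BLOCKED line with the BLOCKER line that follows it, walks the chain to confirm it closes into a cycle, and decodes the third column using the KJUSER* list above. The class WaitForGraph and its methods are hypothetical helpers, and the line layout is assumed from the excerpt posted in this thread only; other versions may format the graph differently.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for reading a "Global Wait-For-Graph" excerpt.
class WaitForGraph {
    // Names from the KJUSER* list above, indexed by the value in the trace.
    static final String[] KJUSER_NAMES = {
        "Null", "Row-S (SS)", "Row-X (SX)", "Share", "S/Row-X (SSX)", "Exclusive"
    };

    // Per the reading above, trace value n corresponds to v$lock mode n + 1.
    static int toVLockMode(int kjuserValue) {
        return kjuserValue + 1;
    }

    // The third whitespace-separated field is the mode value discussed above.
    static int modeOf(String line) {
        return Integer.parseInt(line.trim().split("\\s+")[2]);
    }

    // Maps each BLOCKED lock address to the address of the BLOCKER after it.
    static Map<String, String> parseEdges(List<String> lines) {
        Map<String, String> edges = new LinkedHashMap<>();
        String blocked = null;
        for (String line : lines) {
            String[] f = line.trim().split("\\s+");
            if (f.length < 2) continue;
            if (f[0].equals("BLOCKED")) {
                blocked = f[1];
            } else if (f[0].equals("BLOCKER") && blocked != null) {
                edges.put(blocked, f[1]);
                blocked = null;
            }
        }
        return edges;
    }

    // Follows blocker links from 'start'. Returns the visited addresses if
    // the chain leads back to 'start', otherwise an empty list.
    static List<String> findCycle(Map<String, String> edges, String start) {
        List<String> path = new ArrayList<>();
        String current = start;
        while (current != null && !path.contains(current)) {
            path.add(current);
            current = edges.get(current);
        }
        return start.equals(current) ? path : new ArrayList<>();
    }
}
```

Applied to the six lines posted above, the chain starting at 0x3d2438198 passes through the locks on instances 2 and 3 and returns to instance 1, and every line carries the value 4, which the list above names S/Row-X (SSX), that is, mode 5 in v$lock terms.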
Let me create a table with the relevant fk (I used a 2 column fk just because it was similar to your situation, but it's not relevant that it's 2 columns).
SQL> create table t1
2 (col1 number not null
3 ,col2 number not null
4 ,col3 varchar2(10) not null
5 ,related_col1 number
6 ,related_col2 number);
Table created.
SQL>
SQL> alter table t1 add constraint pk_t1 primary key (col1,col2);
Table altered.
SQL>
SQL> alter table t1 add constraint fk_t1 foreign key
2 (related_col1,related_col2) references t1 (col1,col2);
Table altered.
SQL>
SQL> insert into t1 values (1,1,'X',null,null);
1 row created.
SQL> insert into t1 values (2,2,'X',null,null);
1 row created.
SQL> commit;
Now, show that if you don't update one of the columns involved in the pk, there's no issue with blocking locks:
Session 1:
Session1>update t1
2 set col3 = 'Y'
3 where col1 = 1
4 and col2 = 1;
1 row updated.
Session1>
Session 2:
Session2>update t1
2 set col3 = 'Y'
3 where col1 = 2
4 and col2 = 2;
1 row updated.
Session2>
But if the update statement updates one of the columns in the primary key which is referenced in the foreign key on the same table:
Session 1:
Session1>update t1
2 set col2 = 1
3 , col3 = 'Y'
4 where col1 = 1
5 and col2 = 1;
1 row updated.
Session1>
Session 2 hangs:
Session2>update t1
2 set col1 = 2
3 , col3 = 'Y'
4 where col1 = 2
5 and col2 = 2;
hangs
And you can see here that session 2 is waiting on a mode 5 TM lock:
Session1>l
1 select process,
2 l.sid,
3 type,
4 lmode,
5 request,
6 do.object_name
7 from v$locked_object lo,
8 dba_objects do,
9 v$lock l
10 where lo.object_id = do.object_id
11 AND l.sid = lo.session_id
12* AND do.object_name = 'T1'
Session1>/
PROCESS SID TY LMODE REQUEST OBJECT_NAM
4724:6028 1514 TM 0 5 T1
4724:6028 1514 AE 4 0 T1
4408:2384 380 TX 6 0 T1
4408:2384 380 TM 3 0 T1
4408:2384 380 AE 4 0 T1
Session1>
However, if I create an index on that fk, the update no longer has the blocking issue:
Session1>create index i_t1 on t1 (related_col1, related_col2);
Index created.
Session1>update t1
2 set col2 = 1
3 , col3 = 'Y'
4 where col1 = 1
5 and col2 = 1;
1 row updated.
Session1>
And session 2 no longer blocks:
Session2>update t1
2 set col1 = 2
3 , col3 = 'Y'
4 where col1 = 2
5 and col2 = 2;
1 row updated.
Session2>
What of course I've missed is the added complexity which makes this a deadlock scenario involving these mode 5 locks.
We can do that artificially using the two sets of statements to make sure the two sessions are also competing for the same rows.
.... in session 1
Session1>drop index i_t1
2 /
Index dropped.
Session1>
Session1>update t1
2 set col3 = 'Y'
3 where col1 = 2
4 and col2 = 2;
1 row updated.
Session1>
.... over to session 2
Session2>update t1
2 set col3 = 'Y'
3 where col1 = 1
4 and col2 = 1;
1 row updated.
Session2>
.... back to session 1
Session1>update t1
2 set col2 = 1
3 , col3 = 'Y'
4 where col1 = 1
5 and col2 = 1;
.... back to Session 2
Session2>update t1
2 set col1 = 2
3 , col3 = 'Y'
4 where col1 = 2
5 and col2 = 2;
.... wait a few secs ....
update t1
ERROR at line 1:
ORA-00060: deadlock detected while waiting for resource
Session2>rollback;
And the locks involved in the deadlock look like this:
SQL> /
PROCESS SID TY LMODE REQUEST OBJECT_NAM
4724:6028 1514 TX 6 0 T1
4724:6028 1514 TM 3 5 T1
4724:6028 1514 AE 4 0 T1
4408:2384 380 TX 6 0 T1
4408:2384 380 TM 3 5 T1
4408:2384 380 AE 4 0 T1
6 rows selected.
SQL>
So your situation might be slightly more complicated, and you always have to bear in mind that the deadlock trace files do not tell you everything you need to know about all the statements involved in the deadlock situation. But it certainly points to this being an issue.
Edited by: Dom Brooks on Nov 16, 2011 2:39 PM
Edited by: Dom Brooks on Nov 16, 2011 2:46 PM
Added deadlock demo
-
Hello All,
I am using Oracle RAC 11.2.0.3 and I am facing deadlocks on my database.
I captured the lmd trace files. Is there any way to tell from the trace files which row is involved in the deadlock?
I have info about the program and the query, but how can I know the row involved?
Regards,
Hi,
I've faced this kind of problem and solved it with the note below.
This note provides some common causes and solutions for the message "Global Enqueue Services Deadlock detected" reported in the alert log.
*Troubleshooting "Global Enqueue Services Deadlock detected" [ID 1443482.1]*
-
Different Deadlock trace files
Hello,
In our application we sometimes hit deadlocks, and I need to analyze the resulting trace files. Sometimes the trace file shows the current session and the waiting session, along with the modules and queries they are executing, in the top section, so there is no need to read the rest of the file. But sometimes the trace files are different: all the update or select for update queries are spread across the file, and it is very difficult to understand which session was locking what. Does the deadlock trace file have a different structure in a RAC or 11g environment?
One more question regarding deadlocks: many times we found that the current query is updating table A and the waiting query is updating table B. Is it possible to have a deadlock when the queries are working on different tables? Or maybe it happens only if the tables are related, like parent and child?
hi,
Are you referring to the .trm extension trace files, which you are unable to read?
Here is a good explanation of reading deadlock trace files:
ORA-00060 Deadlock trace files.. how to read?
Thanks,
Ajay More
http://www.moreajays.com
-
How unhealthy is this RAC?
Here are the contents of v$system_event.
Is this
EVENT TOTAL_WAITS TIME_WAITED AVERAGE_WAIT
enq: TX - index contention 40564851 214701526 5.29
enq: TX - row lock contention 188846 12454614 65.95
enq: SQ - contention 141971 70568 0.5
cause for concern?
EVENT TOTAL_WAITS TIME_WAITED AVERAGE_WAIT
SQL*Net message to client 6015051449 607254 0
SQL*Net message from client 6015048542 178177969892 29.62
gcs remote message 2948555287 2633481757 0.89
CGS wait for IPC msg 1517805027 634397 0
db file sequential read 1500615188 816364485 0.54
ges remote message 1247679701 1407300224 1.13
gc cr multi block request 778432813 9913464 0.01
gc current block 2-way 747852637 38030616 0.05
db file scattered read 709428365 460939295 0.65
rdbms ipc message 708473316 37650068633 53.14
gc buffer busy acquire 671285134 1033621285 1.54
PX Deq: reap credit 667784615 484449 0
gcs log flush sync 592376026 171712257 0.29
gc cr block 2-way 530861847 19607062 0.04
library cache pin 437937120 15126237 0.03
log file sync 379523272 797193932 2.1
DIAG idle wait 359607166 2822108755 7.85
log file parallel write 351225436 259263769 0.74
LNS ASYNC end of log 350170653 1398410516 3.99
LNS wait on SENDREQ 321652621 3209301 0.01
PX qref latch 297396661 94308 0
read by other session 289140108 148440270 0.51
buffer deadlock 163505781 983055 0.01
gc current block busy 119223825 467716658 3.92
PX Deq: Table Q Normal 117332841 23574867 0.2
ksxr poll remote instances 110480324 90333 0
buffer busy waits 106938153 19933900 0.19
direct path read 93429599 108427028 1.16
SQL*Net more data from client 86471785 23026529 0.27
gc current grant busy 84978157 28215346 0.33
control file sequential read 82646297 23694583 0.29
PX Deq Credit: send blkd 78641669 9569299 0.12
latch: cache buffers chains 74218671 690277 0.01
gc current grant 2-way 72557796 1920419 0.03
library cache: mutex X 71106697 75993 0
DFS lock handle 70722498 2716407 0.04
gc cr grant 2-way 64558237 1633004 0.03
PX Deq: Execution Msg 61706261 314222076 5.09
gc cr block busy 61469863 119850802 1.95
library cache lock 52428649 3773354 0.07
PX Deq: Slave Session Stats 48040224 1886805 0.04
db file parallel read 46415188 118467902 2.55
IPC send completion sync 46250594 965101 0.02
enq: TX - index contention 40564851 214701526 5.29
PX Deq: Execute Reply 39689685 17243721 0.43
gc buffer busy release 36976909 242714774 6.56
SQL*Net more data to client 36627952 44167 0
PX Deq: Msg Fragment 30501244 343397 0.01
rdbms ipc reply 29725302 1352370 0.05
RMAN backup & recovery I/O 28824547 37722662 1.31
reliable message 27892263 3082134 0.11
PX Idle Wait 27356097 4651277341 170.03
ASM file metadata operation 25098749 8850323 0.35
gc object scan 22705857 7485 0
db file parallel write 19896252 52152606 2.62
latch: ges resource hash list 19336183 427451 0.02
enq: PS - contention 19143961 707455 0.04
PX Deq: Parse Reply 19093356 895799 0.05
gc cr disk read 17816846 448909 0.03
ASM background timer 16101806 1383957874 85.95
PX Deq: Slave Join Frag 16044789 233149 0.01
wait for unread message on broadcast channel 15056320 1413552546 93.88
cursor: mutex X 13435193 24140 0
KJC: Wait for msg sends to complete 13268497 11397 0
PX Deq: Signal ACK RSG 13214824 101941 0.01
KSV master wait 13206286 4235645 0.32
direct path read temp 12617694 5487608 0.43
PX Deq Credit: need buffer 11675868 879967 0.08
row cache lock 11536185 398216 0.03
PX Deq Credit: Session Stats 9480862 78910 0.01
SQL*Net message to dblink 9312894 1538 0
SQL*Net message from dblink 9312894 6279631 0.67
control file parallel write 7760982 11854435 1.53
pmon timer 7558889 1412576090 186.88
PX Deq: Join ACK 7548816 498931 0.07
gc current multi block request 6035173 155898 0.03
PING 5706961 1413230267 247.63
enq: XR - database force logging 4662671 198813 0.04
class slave wait 4561877 7097429006 1555.81
Streams AQ: waiting for messages in the queue 4495828 1543411682 343.3
SQL*Net more data from dblink 3696582 444575 0.12
LGWR wait for redo copy 3655353 17840 0
log file sequential read 3387305 6610414 1.95
Log archive I/O 2990486 276772 0.09
SQL*Net break/reset to client 2971976 2385935 0.8
direct path write temp 2839390 2522114 0.89
Space Manager: slave idle wait 2827526 1412987186 499.73
latch: shared pool 2808517 298150 0.11
latch: gc element 2421717 24688 0.01
SGA: MMAN sleep for component shrink 2336447 2458094 1.05
latch: enqueue hash chains 2279645 15435 0.01
latch free 2089418 78732 0.04
gc current split 2044784 1864009 0.91
PX Deq: Signal ACK EXT 1976164 19263 0.01
enq: FB - contention 1473469 61036 0.04
cursor: pin S wait on X 1313129 1464789 1.12
SQL*Net more data to dblink 1232891 986 0
Streams AQ: RAC qmn coordinator idle wait 1211300 788 0
enq: HW - contention 1175390 2077008 1.77
latch: session allocation 1167768 21883 0.02
Streams AQ: qmn coordinator idle wait 1144699 1412546634 1233.99
Streams AQ: qmn slave idle wait 1031585 2227183681 2158.99
lock deadlock retry 962937 4698 0
enq: CF - contention 956154 609647 0.64
latch: cache buffers lru chain 902764 37552 0.04
latch: object queue header operation 817911 27717 0.03
global enqueue expand wait 768633 654105 0.85
Data file init write 756191 329758 0.44
latch: gcs resource hash 647021 4147 0.01
local write wait 603007 286191 0.47
latch: row cache objects 599358 6453 0.01
ges lmd/lmses to freeze in rcfg - mrcvr 481759 156345 0.32
shared server idle wait 471190 1413238589 2999.3
enq: RF - DG Broker Current File ID 469833 23209 0.05
smon timer 432383 1411851085 3265.28
SGA: allocation forcing component growth 363333 379008 1.04
gc current retry 341104 1121252 3.29
enq: RF - synch: DG Broker metadata 319143 588290 1.84
enq: PG - contention 313659 14830 0.05
enq: TT - contention 260134 11207172 43.08
enq: KO - fast object checkpoint 236745 820808 3.47
dispatcher timer 236637 1413242481 5972.2
direct path write 231382 191008 0.83
cursor: pin S 229011 394 0
Streams AQ: waiting for time management or cleanup tasks 199981 1413148548 7066.41
enq: TX - row lock contention 188846 12454614 65.95
enq: TX - allocate ITL entry 153703 54252 0.35
enq: SQ - contention 141971 70568 0.5
ksdxexeother 141885 56 0
latch: redo allocation 138912 1858 0.01
recovery area: computing applied logs 126415 45925 0.36
gc current block congested 126318 21768 0.17
resmgr:cpu quantum 123074 151384 1.23
jobq slave wait 120678 35574221 294.79
Datapump dump file I/O 90431 9127 0.1
ges inquiry response 89402 4041 0.05
os thread startup 83809 222586 2.66
cr request retry 80062 71896 0.9
PX Deq: Table Q Sample 79665 133402 1.67
gc cr block congested 79026 14792 0.19
gc cr failure 77521 25019 0.32
enq: WF - contention 73983 825388 11.16
enq: TQ - TM contention 72871 3319 0.05
lock escalate retry 65714 1574 0.02
buffer exterminate 59775 64919 1.09
fbar timer 47136 1413183353 29980.98
log file switch completion 46911 452097 9.64
recovery area: computing obsolete files 45699 8547 0.19
enq: US - contention 40401 8805 0.22
enq: TM - contention 39149 5435032 138.83
library cache load lock 36311 382575 10.54
kjbdrmcvtq lmon drm quiesce: ping completion 31668 47443 1.5
enq: TD - KTF dump entries 31468 1424 0.05
enq: RO - fast object reuse 28422 31772 1.12
parallel recovery slave wait for change 27558 3163 0.11
name-service call wait 23694 181533 7.66
control file single write 22375 1624 0.07
kksfbc child completion 21239 106926 5.03
PX Deq: Table Q qref 19325 245 0.01
enq: TX - contention 18805 113253 6.02
latch: messages 17203 181 0.01
enq: RS - prevent file delete 16913 1013 0.06
enq: RS - prevent aging list update 15682 642 0.04
PX Deq: Table Q Get Keys 14322 42935 3
gc current grant congested 14292 2192 0.15
cursor: mutex S 13285 8 0
log file single write 13164 5371 0.41
latch: undo global data 12649 178 0.01
kksfbc research 11894 12680 1.07
parallel recovery slave idle wait 11193 5872 0.52
wait list latch free 11026 11794 1.07
enq: CT - state 11001 417 0.04
latch: checkpoint queue latch 10526 132 0.01
enq: PE - contention 10506 1139 0.11
ARCH wait on SENDREQ 9957 216480 21.74
gc cr grant congested 9465 1584 0.17
wait for scn ack 9377 3155 0.34
enq: TA - contention 8856 324 0.04
log buffer space 8777 89323 10.18
enq: TK - Auto Task Serialization 8542 343 0.04
enq: DR - contention 7842 323 0.04
process diagnostic dump 7707 2072 0.27
JOX Jit Process Sleep 7612 11286431 1482.72
enq: TC - contention 7357 340817 46.33
ges global resource directory to be frozen 7140 12299 1.72
enq: CO - master slave det 6850 312 0.05
enq: JS - job run lock - synchronize 6704 397 0.06
gcs drm freeze in enter server mode 6542 40742 6.23
enq: TS - contention 5959 89332 14.99
ARCH wait for archivelog lock 5600 36 0.01
PX Nsq: PQ load info query 5377 104798 19.49
db file single write 5373 3452 0.64
gc remaster 5315 50625 9.52
latch: parallel query alloc buffer 4939 1906 0.39
enq: TO - contention 4799 143 0.03
enq: AF - task serialization 4395 161 0.04
enq: PI - contention 4251 163 0.04
ges2 LMON to wake up LMD - mrcvr 4210 28 0.01
enq: DL - contention 3889 239 0.06
kjctssqmg: quick message send wait 3408 22 0.01
LNS wait on DETACH 3275 741 0.23
ksfd: async disk IO 3274 1 0
LNS wait on ATTACH 3273 51940 15.87
ARCH wait on DETACH 3231 714 0.22
ARCH wait on ATTACH 3226 43238 13.4
enq: BR - file shrink 2787 116 0.04
write complete waits 2631 1070 0.41
enq: MD - contention 2596 67 0.03
enq: WL - contention 2198 266518 121.25
single-task message 2098 25896 12.34
enq: OD - Serializing DDLs 2054 66 0.03
resmgr:internal state change 2001 14735 7.36
ARCH wait on c/f tx acquire 2 1751 175230 100.07
enq: WR - contention 1636 69 0.04
latch: cache buffer handles 1610 29 0.02
statement suspended, wait error to be cleared 1497 22626 15.11
Streams AQ: qmn coordinator waiting for slave to start 1214 678966 559.28
enq: PD - contention 1182 33 0.03
JS kgl get object wait 1096 4922 4.49
undo segment extension 1070 10065 9.41
PL/SQL lock timer 949 8739819 9209.5
enq: AE - lock 937 28 0.03
LGWR-LNS wait on channel 832 913 1.1
ges DFS hang analysis phase 2 acks 816 495 0.61
latch: redo writing 729 9 0.01
gc quiesce 665 564 0.85
enq: JS - queue lock 482 2111 4.38
PX Deq: Test for credit 442 13 0.03
enq: SS - contention 386 274 0.71
recovery area: computing dropped files 328 1400 4.27
recovery area: computing backed up files 328 496 1.51
ksdxexeotherwait 279 10592 37.97
log switch/archive 250 137570 550.28
gc domain validation 223 39964 179.21
auto-sqltune: wait graph update 195 96514 494.95
wait for a undo record 170 1214 7.14
parallel recovery coord send blocked 168 4 0.02
enq: JS - wdw op 168 3741 22.27
enq: KT - contention 165 5 0.03
switch logfile command 156 6290 40.32
gcs resource directory to be unfrozen 149 12839 86.17
Data Guard Broker Wait 139 10906 78.46
enq: SK - contention 129 4 0.03
enq: JS - job recov lock 128 4 0.03
gc cr block lost 125 6772 54.17
virtual circuit wait 122 3 0.03
ges LMON to get to FTDONE 100 187 1.87
enq: CU - contention 80 242 3.02
enq: JQ - contention 78 7 0.09
cursor: pin X 73 83 1.14
parallel recovery coord wait for reply 70 510 7.29
PX Deq: Txn Recovery Start 67 3436 51.29
SQL*Net break/reset to dblink 60 0 0
gc current block lost 57 2869 50.33
ges LMD suspend for testing event 51 709 13.89
inactive session 46 4550 98.91
recovery read 45 5 0.11
JS kill job wait 41 3548 86.53
enq: AS - service activation 40 1 0.03
enq: TL - contention 35 2 0.05
enq: UL - contention 34 524 15.42
gcs enter server mode 33 1559 47.24
wait for stopper event to be increased 30 218 7.27
enq: TQ - DDL contention 24 300 12.52
enq: MR - contention 21 1 0.03
ges reconfiguration to start 20 54 2.72
ges enter server mode 20 502 25.08
buffer latch 18 1337 74.26
enq: SR - contention 18 1 0.05
Streams: RAC waiting for inter instance ack 18 3748 208.21
enq: PR - contention 17 46 2.72
kupp process wait 16 166 10.39
checkpoint completed 15 3678 245.19
PX Deque wait 14 68 4.87
enq: BF - allocation contention 14 1 0.08
enq: XL - fault extent map 14 51 3.66
enq: FU - contention 14 17 1.18
enq: TH - metric threshold evaluation 13 114 8.78
enq: MW - contention 12 0 0.04
enq: DD - contention 10 0 0.04
process terminate 8 41 5.08
ges cgs registration 8 151 18.9
buffer resize 7 0 0
ktm: instance recovery 7 698 99.66
LNS wait on LGWR 6 0 0
ASM background starting 6 381 63.43
gc cr block 3-way 5 0 0.08
enq: PV - syncstart 5 9 1.74
Global transaction acquire instance locks 4 4 1.09
enq: RS - read alert level 4 0 0.04
LGWR wait on LNS 3 0 0
gc recovery 3 540 179.85
Streams AQ: enqueue blocked on low memory 3 544 181.2
DBWR range invalidation sync 3 17 5.83
enq: DM - contention 3 0 0.03
enq: RF - FSFO Observer Heartbeat 3 0 0.03
enq: JS - q mem clnup lck 3 0 0
DG Broker configuration file I/O 2 0 0
enq: RC - Result Cache: Contention 2 493 246.6
enq: KM - contention 2 0 0.03
enq: RT - contention 2 0 0.04
instance state change 2 0 0.12
kkdlgon 2 10 5.11
enq: TQ - INI contention 2 292 146.07
enq: JS - contention 2 0 0
ARCH wait for netserver start 1 400 400.02
log file switch (checkpoint incomplete) 1 3 3.44
JS coord start wait 1 50 50.09
ges lmd and pmon to attach 1 1 1.26
wait for tmc2 to complete 1 3 3.03
control file heartbeat 1 400 400.02
enq: SW - contention 1 0 0.04
enq: PW - perwarm status in dbw0 1 0 0.09
enq: FS - contention 1 0 0.04
enq: XR - quiesce database 1 0 0.04
enq: RS - write alert level 1 0 0.02
enq: CN - race with init 1 0 0.03
enq: FE - contention 1 4 3.77
Wait for shrink lock2 1 10 10.03
enq: IA - contention 1 0 0.02
enq: RF - atomicity 1 0 0.05
enq: RF - synchronization: aifo master 1 0 0.02
enq: RF - RF - Database Automatic Disable 1 0 0.06
enq: WP - contention 1 0 0.02
enq: TB - SQL Tuning Base Cache Load 1 0 0.05
enq: JS - evt notify 1 0 0.02
Edited by: steffi on Mar 20, 2011 12:21 AM
Edited by: steffi on Mar 20, 2011 8:18 AM
Edited by: steffi on Mar 20, 2011 8:19 AM
Text can be formatted by tagging the beginning and end of the block of text with the code tag:
Formatted text goes here.
Example:
This is formatted.
When cutting and pasting text such as execution plans, excerpts from AWR reports, etc., it will maintain spacing and formatting, and make for much easier reading.
As to the content, well, dumping the contents of v$system_event is pretty close to useless.
As to the first three events you listed, 'enq: TX - index contention', 'enq: TX - row lock contention', 'enq: SQ - contention', well, all of those are easily tunable.
First, for 'enq: SQ - contention', check your sequences. Do you have any NOCACHE sequences? Or sequences with small caches?
As for 'enq: TX - row lock contention', well that's fairly self-explanatory. You have multiple sessions trying to lock the same row in the same table at the same time.
Last, 'enq: TX - index contention', that's non-row level contention on an index. For example, if you have a unique index, insert a row w/ column value 1, but don't commit, then try to insert that same value from another session.
But, before you do any of that, I think you need to clearly understand where the bottlenecks are. Try taking some AWR snapshots, about 5 minutes apart, when you're having performance problems. Look at the AWR report for that 5 minute period. In particular, look at your Top 5 timed events.
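The difference between a since-startup dump and an interval report can be sketched in a few lines of Java. The example below takes two snapshots of TIME_WAITED per event (in hundredths of a second, the unit v$system_event uses), subtracts them, drops events the caller considers idle, and ranks what is left. The class WaitSnapshots, its method names, and the idea of passing the idle-event list in by hand are all assumptions made for illustration; this is not how AWR itself is implemented.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: rank wait events by time accumulated in an interval.
class WaitSnapshots {
    // Returns up to n non-idle event names, ordered by the increase in
    // TIME_WAITED between the two snapshots, largest increase first.
    static List<String> topByDelta(Map<String, Long> begin, Map<String, Long> end,
                                   Set<String> idleEvents, int n) {
        List<String> names = new ArrayList<>();
        for (String name : end.keySet()) {
            if (!idleEvents.contains(name)) {
                names.add(name);
            }
        }
        names.sort((a, b) -> Long.compare(delta(begin, end, b), delta(begin, end, a)));
        return new ArrayList<>(names.subList(0, Math.min(n, names.size())));
    }

    // Time accumulated during the interval; events absent from the first
    // snapshot are treated as starting from zero.
    static long delta(Map<String, Long> begin, Map<String, Long> end, String name) {
        return end.get(name) - begin.getOrDefault(name, 0L);
    }

    // AVERAGE_WAIT as shown in the listing: TIME_WAITED / TOTAL_WAITS.
    static double averageWait(long timeWaited, long totalWaits) {
        return totalWaits == 0 ? 0.0 : (double) timeWaited / totalWaits;
    }
}
```

As a check against the listing above, averageWait(214701526L, 40564851L) comes to about 5.29, which matches the AVERAGE_WAIT posted for 'enq: TX - index contention'. That figure is still an average over the whole life of the instance, which is why an interval view is more useful for diagnosing a current problem.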
Hope that helps,
-Mark
-
Deadlock detected but lmd log not found
We have an Oracle 10g RAC database, version 10.2.0.3.0. In one application we sometimes get a deadlock error. When we check the alert log we see:
Global Enqueue Services Deadlock detected. More info in file
/oracle/product/admin/WEBCREAL/bdump/webcreal1_lmd0_27191.trc.
But when we check the folder /oracle/product/admin/WEBCREAL/bdump/ we do not see any file whose name contains the string "lmd0". We've also checked the adump and udump folders, but there is no such file. I even ran the command below to find all files whose names contain "lmd", but I found only some old log files:
find / -name "*lmd*"
I think there is a problem with lmd0 trace file generation, but I cannot find the root cause.
Any suggestions about this problem?
Your find statement is wrong anyway, because you should use wildcards, like here:
wcsprd@vsv1h181ps:/opt/wcsprd/ora/diag/rdbms/wcsprd/WCSPRD2 $ find . -name "lmd"
-- Nothing found here
wcsprd@vsv1h181ps:/opt/wcsprd/ora/diag/rdbms/wcsprd/WCSPRD2 $ find . -name "*lmd*"
./trace/WCSPRD2_lmd0_1527948.trc
./trace/WCSPRD2_lmd0_1634460.trm
./trace/WCSPRD2_lmd0_1527948.trm
./trace/WCSPRD2_lmd0_467134.trc
./trace/WCSPRD2_lmd0_467134.trm
./trace/WCSPRD2_lmd0_462908.trc
./trace/WCSPRD2_lmd0_462908.trm
./trace/WCSPRD2_lmd0_1478740.trc
./trace/WCSPRD2_lmd0_1478740.trm
./trace/WCSPRD2_lmd0_1634460.trc
./trace/WCSPRD2_lmd0_626694.trc
./trace/WCSPRD2_lmd0_626694.trm
./trace/WCSPRD2_lmd0_860250.trc
./trace/WCSPRD2_lmd0_860250.trm
Could there be a cleanup process running on your system that cleans out the log/trace directories in order to save disk space?
Otherwise the file must be there.
-
Database adapter causes deadlock
We use database adapter on 3 standalone tables with no FK. At the beginning of the process, we do a read
<jca:operation
ActivationSpec="oracle.tip.adapter.db.DBActivationSpec" ...>
At the end we do an update with
<jca:binding />
<operation name="merge">
<jca:operation
InteractionSpec="oracle.tip.adapter.db.DBWriteInteractionSpec"...>
That is, two db adapters at two ends.
We have got deadlocks with <lock-mode>1</lock-mode> or <lock-mode>0</lock-mode>.
We have two BPEL nodes (10.1.2) in a cluster running on 10g RAC.
Any suggestions for avoiding deadlocks?
If you are using SOA Suite 11g you can use the query by example option to achieve part of that.
It supports only equality checks, but on a dynamic number of fields.
Adam
-
Hi, I have 11.1.0.7. RAC Linux database. I can see 'Global Enqueue Services Deadlock detected. More info in file [file_name]' in alert file.
Is there any tool or set of steps to interpret the content of this trace file? This trace file has a different format compared to 10g, and it is difficult to understand what caused the deadlock.
Any suggestions greatly appreciated.
M.
Hi,
Session 33 is blocking session 9.
Object id = 1132910 - Object which is having blocking issue
select object_name from all_objects where object_id =1132910
TX is structured <rbs><slot><wrap>
TX-00060007-00049749 25 *33* X 23 9 X - blocking session
TX-0007000d-00037829 23 *9* X 25 33 X - blocked session
*25* - process *33* - blocker X - holds wait
*23* - process *9* - waiter X- same as above
As it does not involve any row, it might be a constraint issue at the column level, such as a unique constraint.
Check that.
HTH
- Pavan Kumar N
-
I read in the manual that deadlock (ORA-00060) is treated separately, as are the
block corrupted and archive hung alerts.
But I cannot find anything to manage the deadlock message. These messages appear in the alert log file but EM 10g does not capture them.
Thanks
Looking in alertlog.pl I see it excludes
257, 16038 for archive hung
1157, 1578, 27048 for data block corruption
603 for session terminated
No mention of deadlock being excluded.
Perhaps it's just done via the generic alert monitoring config?
Go to the main database page, click on the link beside "Alert Log"
Then at the bottom click "Generic Alert Log Error Monitoring Configuration"
I use the following (esp for 10g SE &/or RAC)
Filter: .*ORA-0*(2097|0439|1555|0230|1587|27091|27072)\D.*
Critical: Undefined
Warning: ORA-.* -
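A quick way to see what the Filter and Warning expressions quoted in the reply above would do with a deadlock line is to run them against sample text. The sketch below uses Java's regex engine, which treats these particular constructs the same way as Perl does. It assumes that a line matching Filter is ignored and that any remaining line containing an ORA- error raises a warning; that reading of the two settings, the class name AlertLogFilter, and the sample alert lines are all assumptions for illustration.

```java
import java.util.regex.Pattern;

// Hypothetical sketch of the Filter / Warning settings quoted above.
class AlertLogFilter {
    // Filter expression exactly as posted.
    static final Pattern FILTER =
        Pattern.compile(".*ORA-0*(2097|0439|1555|0230|1587|27091|27072)\\D.*");

    // Warning expression exactly as posted.
    static final Pattern WARNING = Pattern.compile("ORA-.*");

    // Assumption: a line that matches the filter is ignored, and any other
    // line that contains an ORA- error raises a warning.
    static boolean raisesWarning(String alertLine) {
        if (FILTER.matcher(alertLine).matches()) {
            return false;
        }
        return WARNING.matcher(alertLine).find();
    }
}
```

Under that assumption, a line such as "ORA-00060: Deadlock detected." is not caught by the filter, because 60 is not among the listed error numbers, so it falls through to the Warning pattern. A line beginning "ORA-01555: " is filtered out.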
Hi,
Recently we experienced a Deadlock which seemed to be related to JMS processing. I did a Thread Dump whilst the server was in this state which showed a number of "Blocked Lock Chains", which boiled down to 2 "Open Lock Chains".
The first was weblogic.socket.Muxer thread that seemed to be waiting in an "epollWait" loop, and held a fat lock on a java/lang/String object (that a number of other threads were waiting on).
The second was an ExecuteThread that had been initiated as a part of Tx completion processing, for which we have a Tx afterCompletion() synchronisation listener registered. The Tx Synchronisation listener sends one or more messages to JMS Topics - or at least in this case it attempts to. It gets stuck with the following message:
-- Waiting for notification on: weblogic/jms/backend/BEProducerSendRequest@0x2ac20bb500[fat lock]
and includes a number of the following messages in the stack trace:
-- Lock released while waiting: weblogic/jms/backend/BEProducerSendRequest@0x2ac20bb500[fat lock]
These problems would seem to be a consequence of a number of the following errors that had occurred in the hours prior to the deadlock:
<BEA-000503> <Incoming message header or abbreviation processing failed - Bad abbreviation value: '244' <- a number of these occurred, some with different values
<BEA-000503> <Incoming message header or abbreviation processing failed - ClassCastException: weblogic.rjvm.ClassTableEntry
Could the deadlock be the result of erroneous failure recovery from the above message processing failures?
Or could it be related to my other suspicion: some kind of race condition arising from the fact that we are trying to send a JMS Topic message as part of the Tx afterCompletion processing, potentially triggered by the receipt of a JMS message itself, whilst a number of other JMS messages are also being processed?
We are using Weblogic 9.2 MP2 under JRockit 3.0.3 64 bit on Redhat Linux AS4 Update 4.
We use a number of JMS Queues, some of which are controlled via the UnitOfOrder feature to ensure in order message processing. The JMS Topics mentioned above will have a number of Durable Subscribers attached.
Any thoughts/suggestions would be much appreciated.
Greg, did you get to a solution on this?
I am seeing the same type of messages:
Open lock chains
================
Chain 1:
"ExecuteThread: '33' for queue: 'weblogic.kernel.Default'" id=62 idx=0x80 tid=19223 waiting for java/lang/String@0xbd00b10 held by:
"ExecuteThread: '3' for queue: 'weblogic.kernel.Default'" id=32 idx=0x44 tid=19192 (waiting on notification)
Chain 2:
"ExecuteThread: '45' for queue: 'weblogic.kernel.Default'" id=74 idx=0x98 tid=19235 waiting for java/lang/String@0x1d1083a8 held by:
"ExecuteThread: '11' for queue: 'weblogic.kernel.Default'" id=40 idx=0x54 tid=19200 (waiting on notification)
Chain 3:
"ExecuteThread: '0' for queue: 'weblogic.socket.Muxer'" id=100 idx=0xc8 tid=19297 waiting for java/lang/String@0x9c1fb10 held by:
"ExecuteThread: '2' for queue: 'weblogic.socket.Muxer'" id=102 idx=0xcc tid=19299 (active)
Can you let me know if you have had any success with this issue?
Thanks for all your help.
Regards
RV
-
Hi,
We are facing RAC Interconnect performance problems.
Oracle Version: Oracle 9i RAC (9.2.0.7)
Operating system: SunOS 5.8
SQL> SELECT b1.inst_id, b2.value "RECEIVED",
b1.value "RECEIVE TIME",
((b1.value / b2.value) * 10) "AVG RECEIVE TIME (ms)"
FROM gv$sysstat b1, gv$sysstat b2
WHERE b1.name = 'global cache cr block receive time'
AND b2.name = 'global cache cr blocks received'
AND b1.inst_id = b2.inst_id;
INST_ID RECEIVED RECEIVE TIME AVG RECEIVE TIME (ms)
1 323849 172359 5.32220263
2 675806 94537 1.39887778
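The arithmetic behind the last column is worth spelling out. The receive-time statistic is kept in hundredths of a second, so dividing it by the number of blocks received gives centiseconds per block, and the factor of 10 in the query converts that to milliseconds. A minimal Java sketch, with the hypothetical names InterconnectStats and avgReceiveMs:

```java
// Hypothetical sketch of the calculation done by the query above.
class InterconnectStats {
    // receiveTimeCs: 'global cache cr block receive time' (hundredths of a second)
    // blocksReceived: 'global cache cr blocks received'
    // Returns the average receive time per block in milliseconds.
    static double avgReceiveMs(long receiveTimeCs, long blocksReceived) {
        if (blocksReceived == 0) {
            return 0.0;
        }
        return (double) receiveTimeCs / blocksReceived * 10.0;
    }
}
```

With the posted figures, avgReceiveMs(172359, 323849) reproduces the 5.32 ms shown for instance 1 and avgReceiveMs(94537, 675806) the 1.40 ms shown for instance 2. Both inputs are cumulative since instance startup, so comparing the difference between two samples taken some minutes apart would show the current behaviour more clearly than the lifetime average.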
After a database restart, the average time increases for instance 1, while instance 2 remains similar.
Application performance degrades, and restarting the database solves the issue. This is a critical application and cannot have frequent downtime for restarts.
What specific points should I check to improve interconnect performance?
Thanks
Dilip Patel.
Hi,
Configurations:
Node: 1
Hardware Model: Sun-Fire-V890
OS: SunOS 5.8
Release: Generic_117350-53
CPU: 16 sparcv9 cpu(s) running at 1200 MHz
Memory: 40.0GB
Node: 2
Hardware Model: Sun-Fire-V890
OS: SunOS 5.8
Release: Generic_117350-53
CPU: 16 sparcv9 cpu(s) running at 1200 MHz
Memory: 40.0GB
CPU utilization on Node 1 has never exceeded 40%.
CPU utilization on Node 2 is between 20% and 30%.
Application load is higher on Node 1 than on Node 2.
I can observe the wait event "global cache cr request" in the top 5 wait events in most of the Statspack reports. Application performance degrades a few days after a database restart. No major changes have been made to the application recently.
Statspack report for Node 1:
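DB Name         DB Id    Instance     Inst Num Release     Cluster Host
------------ ----------- ------------ -------- ----------- ------- ------------
XXXX          2753907139 xxxx1               1 9.2.0.7.0   YES     xxxxx
            Snap Id     Snap Time      Sessions Curs/Sess Comment
            ------- ------------------ -------- --------- -------
Begin Snap:   61688 17-Feb-09 09:10:06      253     299.4
  End Snap:   61698 17-Feb-09 10:10:06      285     271.6
   Elapsed:               60.00 (mins)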
Cache Sizes (end)
~~~~~~~~~~~~~~~~~
Buffer Cache: 2,048M Std Block Size: 8K
Shared Pool Size: 384M Log Buffer: 2,048K
Load Profile
~~~~~~~~~~~~ Per Second Per Transaction
Redo size: 102,034.92 4,824.60
Logical reads: 60,920.35 2,880.55
Block changes: 986.07 46.63
Physical reads: 1,981.12 93.67
Physical writes: 28.30 1.34
User calls: 2,651.63 125.38
Parses: 500.89 23.68
Hard parses: 21.44 1.01
Sorts: 66.91 3.16
Logons: 3.69 0.17
Executes: 553.34 26.16
Transactions: 21.15
% Blocks changed per Read: 1.62 Recursive Call %: 22.21
Rollback per transaction %: 2.90 Rows per Sort: 7.44
Instance Efficiency Percentages (Target 100%)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Buffer Nowait %: 99.99 Redo NoWait %: 100.00
Buffer Hit %: 96.75 In-memory Sort %: 100.00
Library Hit %: 98.30 Soft Parse %: 95.72
Execute to Parse %: 9.48 Latch Hit %: 99.37
Parse CPU to Parse Elapsd %: 90.03 % Non-Parse CPU: 92.97
Shared Pool Statistics Begin End
Memory Usage %: 94.23 94.93
% SQL with executions>1: 74.96 74.66
% Memory for SQL w/exec>1: 82.93 72.26
Top 5 Timed Events
~~~~~~~~~~~~~~~~~~                                                     % Total
Event                                               Waits    Time (s) Ela Time
-------------------------------------------- ------------ ----------- --------
db file sequential read                         1,080,532      13,191    40.94
CPU time                                                       10,183    31.60
db file scattered read                            456,075       3,977    12.34
wait for unread message on broadcast channel        4,195       2,770     8.60
global cache cr request                         1,633,056         873     2.71
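To put the Top 5 list in perspective, here is a rough sketch that derives the average wait per event from the Waits and Time (s) columns above. The times in the report are rounded to whole seconds, so the results are approximate; the variable names are illustrative only.

```python
# (event, waits, total wait time in seconds), copied from the Top 5 Timed Events
top_events = [
    ("db file sequential read", 1_080_532, 13_191),
    ("db file scattered read", 456_075, 3_977),
    ("wait for unread message on broadcast channel", 4_195, 2_770),
    ("global cache cr request", 1_633_056, 873),
]

# Average wait in milliseconds = total seconds * 1000 / number of waits
avg_wait_ms = {name: seconds * 1000 / waits for name, waits, seconds in top_events}

for name, value in avg_wait_ms.items():
    print(f"{name:<46} {value:8.1f} ms")
```

In this one-hour snapshot a "global cache cr request" wait averages roughly half a millisecond and the event accounts for 2.71% of elapsed time, while single-block and multi-block disk reads together account for more than half. If the slowdown was already visible during this interval, the I/O waits may deserve as much attention as the interconnect.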
Cluster Statistics for DB: EPIP Instance: epip1 Snaps: 61688 -61698
Global Cache Service - Workload Characteristics
Ave global cache get time (ms): 0.8
Ave global cache convert time (ms): 1.1
Ave build time for CR block (ms): 0.1
Ave flush time for CR block (ms): 0.2
Ave send time for CR block (ms): 0.3
Ave time to process CR block request (ms): 0.6
Ave receive time for CR block (ms): 4.4
Ave pin time for current block (ms): 0.2
Ave flush time for current block (ms): 0.0
Ave send time for current block (ms): 0.3
Ave time to process current block request (ms): 0.5
Ave receive time for current block (ms): 2.6
Global cache hit ratio: 3.9
Ratio of current block defers: 0.0
% of messages sent for buffer gets: 3.7
% of remote buffer gets: 0.3
Ratio of I/O for coherence: 1.1
Ratio of local vs remote work: 10.9
Ratio of fusion vs physical writes: 0.0
Global Enqueue Service Statistics
Ave global lock get time (ms): 0.1
Ave global lock convert time (ms): 0.0
Ratio of global lock gets vs global lock releases: 1.0
GCS and GES Messaging statistics
Ave message sent queue time (ms): 0.4
Ave message sent queue time on ksxp (ms): 1.8
Ave message received queue time (ms): 0.2
Ave GCS message process time (ms): 0.1
Ave GES message process time (ms): 0.0
% of direct sent messages: 8.0
% of indirect sent messages: 49.4
% of flow controlled messages: 42.6
GES Statistics for DB: EPIP Instance: epip1 Snaps: 61688 -61698
Statistic Total per Second per Trans
dynamically allocated gcs resourc 0 0.0 0.0
dynamically allocated gcs shadows 0 0.0 0.0
flow control messages received 0 0.0 0.0
flow control messages sent 0 0.0 0.0
gcs ast xid 0 0.0 0.0
gcs blocked converts 2,830 0.8 0.0
gcs blocked cr converts 7,677 2.1 0.1
gcs compatible basts 5 0.0 0.0
gcs compatible cr basts (global) 142 0.0 0.0
gcs compatible cr basts (local) 142,678 39.6 1.9
gcs cr basts to PIs 0 0.0 0.0
gcs cr serve without current lock 0 0.0 0.0
gcs error msgs 0 0.0 0.0
gcs flush pi msgs 798 0.2 0.0
gcs forward cr to pinged instance 0 0.0 0.0
gcs immediate (compatible) conver 9,296 2.6 0.1
gcs immediate (null) converts 52,460 14.6 0.7
gcs immediate cr (compatible) con 752,507 209.0 9.9
gcs immediate cr (null) converts 4,047,959 1,124.4 53.2
gcs msgs process time(ms) 153,618 42.7 2.0
gcs msgs received 2,287,640 635.5 30.0
gcs out-of-order msgs 0 0.0 0.0
gcs pings refused 70,099 19.5 0.9
gcs queued converts 0 0.0 0.0
gcs recovery claim msgs 0 0.0 0.0
gcs refuse xid 1 0.0 0.0
gcs retry convert request 0 0.0 0.0
gcs side channel msgs actual 40,400 11.2 0.5
gcs side channel msgs logical 4,039,700 1,122.1 53.1
gcs write notification msgs 46 0.0 0.0
gcs write request msgs 972 0.3 0.0
gcs writes refused 4 0.0 0.0
ges msgs process time(ms) 2,713 0.8 0.0
ges msgs received 73,687 20.5 1.0
global posts dropped 0 0.0 0.0
global posts queue time 0 0.0 0.0
global posts queued 0 0.0 0.0
global posts requested 0 0.0 0.0
global posts sent 0 0.0 0.0
implicit batch messages received 288,801 80.2 3.8
implicit batch messages sent 622,610 172.9 8.2
lmd msg send time(ms) 2,148 0.6 0.0
lms(s) msg send time(ms) 1 0.0 0.0
messages flow controlled 3,473,393 964.8 45.6
messages received actual 765,292 212.6 10.1
messages received logical 2,360,972 655.8 31.0
messages sent directly 654,760 181.9 8.6
messages sent indirectly 4,027,924 1,118.9 52.9
msgs causing lmd to send msgs 33,481 9.3 0.4
msgs causing lms(s) to send msgs 13,220 3.7 0.2
msgs received queue time (ms) 379,304 105.4 5.0
msgs received queued 2,359,723 655.5 31.0
msgs sent queue time (ms) 1,514,305 420.6 19.9
msgs sent queue time on ksxp (ms) 4,349,174 1,208.1 57.1
msgs sent queued 4,032,426 1,120.1 53.0
msgs sent queued on ksxp 2,415,381 670.9 31.7
process batch messages received 278,174 77.3 3.7
process batch messages sent 913,611 253.8 12.0
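The summary figures under "GCS and GES Messaging statistics" can be reproduced from the raw GES counters above. The sketch below does that for the sent-message percentages and the queue times. Using the sum of the direct, indirect, and flow-controlled counters as the denominator is an assumption on my part that happens to match the reported values; it is not taken from the Statspack source.

```python
# Raw counters copied from the GES Statistics table
ges = {
    "messages sent directly": 654_760,
    "messages sent indirectly": 4_027_924,
    "messages flow controlled": 3_473_393,
    "msgs sent queue time (ms)": 1_514_305,
    "msgs sent queued": 4_032_426,
    "msgs sent queue time on ksxp (ms)": 4_349_174,
    "msgs sent queued on ksxp": 2_415_381,
}

# Assumed denominator: every sent message falls into one of these three buckets
total_sent = (
    ges["messages sent directly"]
    + ges["messages sent indirectly"]
    + ges["messages flow controlled"]
)

percentages = {
    "direct": round(100 * ges["messages sent directly"] / total_sent, 1),
    "indirect": round(100 * ges["messages sent indirectly"] / total_sent, 1),
    "flow controlled": round(100 * ges["messages flow controlled"] / total_sent, 1),
}

# Average queue time per queued message, in milliseconds
avg_sent_queue_ms = round(ges["msgs sent queue time (ms)"] / ges["msgs sent queued"], 1)
avg_ksxp_queue_ms = round(
    ges["msgs sent queue time on ksxp (ms)"] / ges["msgs sent queued on ksxp"], 1
)

print(percentages)
print(avg_sent_queue_ms, avg_ksxp_queue_ms)
```

The computed values agree with the report (8.0 / 49.4 / 42.6 percent, and 0.4 ms and 1.8 ms queue times). Re-running the same arithmetic on a snapshot taken shortly after a restart would make it easy to see whether the flow-controlled share grows as performance degrades.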
Wait Events for DB: EPIP Instance: epip1 Snaps: 61688 -61698
-> s - second
-> cs - centisecond - 100th of a second
-> ms - millisecond - 1000th of a second
-> us - microsecond - 1000000th of a second
-> ordered by wait time desc, waits desc (idle events last)
Avg
Total Wait wait Waits
Event Waits Timeouts Time (s) (ms) /txn
db file sequential read 1,080,532 0 13,191 12 14.2
db file scattered read 456,075 0 3,977 9 6.0
wait for unread message on b 4,195 1,838 2,770 660 0.1
global cache cr request 1,633,056 8,417 873 1 21.4
db file parallel write 8,243 0 260 32 0.1
buffer busy waits 16,811 0 168 10 0.2
log file parallel write 187,783 0 158 1 2.5
log file sync 75,143 0 147 2 1.0
buffer busy global CR 9,713 0 102 10 0.1
global cache open x 31,157 1,230 50 2 0.4
enqueue 58,261 14 45 1 0.8
latch free 33,398 7,610 44 1 0.4
direct path read (lob) 9,925 0 36 4 0.1
library cache pin 8,777 1 34 4 0.1
SQL*Net break/reset to clien 82,982 0 32 0 1.1
log file sequential read 409 0 31 75 0.0
log switch/archive 3 3 29 9770 0.0
SQL*Net more data to client 201,538 0 16 0 2.6
global cache open s 8,585 342 14 2 0.1
global cache s to x 11,098 148 11 1 0.1
control file sequential read 6,845 0 8 1 0.1
db file parallel read 1,569 0 7 4 0.0
log file switch completion 35 0 7 194 0.0
row cache lock 15,780 0 6 0 0.2
process startup 69 0 6 82 0.0
global cache null to x 1,759 48 6 3 0.0
direct path write (lob) 685 0 5 7 0.0
DFS lock handle 8,713 0 3 0 0.1
control file parallel write 1,350 0 2 2 0.0
wait for master scn 1,194 0 1 1 0.0
CGS wait for IPC msg 30,830 30,715 1 0 0.4
global cache busy 14 1 1 75 0.0
ksxr poll remote instances 30,997 12,692 1 0 0.4
direct path read 752 0 0 1 0.0
switch logfile command 3 0 0 148 0.0
log file single write 24 0 0 13 0.0
library cache lock 668 0 0 0 0.0
KJC: Wait for msg sends to c 1,161 0 0 0 0.0
buffer busy global cache 26 0 0 6 0.0
IPC send completion sync 261 260 0 0 0.0
PX Deq: reap credit 3,477 3,440 0 0 0.0
LGWR wait for redo copy 1,751 0 0 0 0.0
async disk IO 1,059 0 0 0 0.0
direct path write 298 0 0 0 0.0
slave TJ process wait 1 1 0 18 0.0
PX Deq: Execute Reply 3 1 0 3 0.0
PX Deq: Join ACK 8 4 0 1 0.0
global cache null to s 8 0 0 1 0.0
ges inquiry response 16 0 0 0 0.0
PX Deq: Parse Reply 6 2 0 1 0.0
PX Deq Credit: send blkd 2 1 0 0 0.0
PX Deq: Signal ACK 3 1 0 0 0.0
library cache load lock 1 0 0 0 0.0
buffer deadlock 6 6 0 0 0.0
lock escalate retry 4 4 0 0 0.0
SQL*Net message from client 9,470,867 0 643,285 68 124.4
queue messages 42,829 41,144 42,888 1001 0.6
wakeup time manager 601 600 16,751 27872 0.0
gcs remote message 795,414 120,163 13,606 17 10.4
jobq slave wait 2,546 2,462 7,375 2897 0.0
PX Idle Wait 2,895 2,841 7,021 2425 0.0
virtual circuit status 120 120 3,513 29273 0.0
ges remote message 142,306 69,912 3,504 25 1.9
SQL*Net more data from clien 206,559 0 19 0 2.7
SQL*Net message to client 9,470,903 0 14 0 124.4
PX Deq: Execution Msg 313 103 2 7 0.0
Background Wait Events for DB: EPIP Instance: epip1 Snaps: 61688 -61698
-> ordered by wait time desc, waits desc (idle events last)
Avg
Total Wait wait Waits
Event Waits Timeouts Time (s) (ms) /txn
db file parallel write 8,243 0 260 32 0.1
log file parallel write 187,797 0 158 1 2.5
log file sequential read 316 0 22 70 0.0
enqueue 56,204 0 15 0 0.7
control file sequential read 5,694 0 6 1 0.1
DFS lock handle 8,682 0 3 0 0.1
db file sequential read 276 0 2 8 0.0
control file parallel write 1,334 0 2 2 0.0
wait for master scn 1,194 0 1 1 0.0
CGS wait for IPC msg 30,830 30,714 1 0 0.4
ksxr poll remote instances 30,972 12,681 1 0 0.4
latch free 356 54 1 2 0.0
direct path read 752 0 0 1 0.0
log file single write 24 0 0 13 0.0
LGWR wait for redo copy 1,751 0 0 0 0.0
async disk IO 812 0 0 0 0.0
global cache cr request 69 0 0 1 0.0
row cache lock 45 0 0 1 0.0
direct path write 298 0 0 0 0.0
library cache pin 29 0 0 1 0.0
rdbms ipc reply 29 0 0 0 0.0
buffer busy waits 10 0 0 0 0.0
library cache lock 2 0 0 0 0.0
global cache open x 2 0 0 0 0.0
rdbms ipc message 179,764 36,258 29,215 163 2.4
gcs remote message 795,409 120,169 13,605 17 10.4
pmon timer 1,388 1,388 3,508 2527 0.0
ges remote message 142,295 69,912 3,504 25 1.9
smon timer 414 0 3,463 8366 0.0
------------------------------------------------------------- -
Hi All,
We have a 2-node RAC database. The client runs a load on the PROD database every day.
Every day they hit a deadlock, because of which they have to rerun the job.
The load takes longer than expected.
When I checked the alert log, I found the error below.
Global Enqueue Services Deadlock detected. More info in file
/u02/admin/EDWXPRD/bdump/edwxprd1_lmd0_10662.trc.
Tue Nov 2 07:02:54 2010
Global Enqueue Services Deadlock detected. More info in file
/u02/admin/EDWXPRD/bdump/regprd1_lmd0_10662.trc.
Trace file log:
DRM(2870) ignoring dissolve of 558909
* kjdrchkdrm: found an RM request in the request queue
Dissolve pkey 558910
DRM(2870) ignoring dissolve of 558910
*** 2010-11-02 07:00:36.256
stale cvak fr 1:0xade80f58([0x286b8][0x0],[AF])[h=KJUSERNL,n=KJUSEREX,b=KJUSERNL,ls=KJUSERSTAT_NOVALUE]:0x2 < 0x0
stale cvak fr 1:0xade80f58([0x286b8][0x0],[AF])[h=KJUSERNL,n=KJUSEREX,b=KJUSEREX,ls=KJUSERSTAT_NOVALUE]:0x3 < 0x0
*** 2010-11-02 07:01:25.070
user session for deadlock lock 0xbd8a20e0
pid=52 serial=49701 audsid=726380511 user: 58/ICOM_VW
O/S info: user: svc-ch-bo-sso, term: SOBSREP00, ospid: 8012:1884, machine: CORP\SHBAREP00
program: wireportserver.exe
application name: wireportserver.exe, hash value=1663395875
Current SQL Statement:
SELECT
V_TARGET_LST_D.LST_NM,
V_RUST_TARGET_D.ACTL_RNK,
DECODE(V_RUST_TARGET_D.TARGET_IND,:"SYS_B_00",:"SYS_B_01",:"SYS_B_02",:"SYS_B_03"),
count(V_RUST_D.SHIRE_RUST_ID),
V_TARGET_SLSFRC_DH.SLS_ORG_NM,
V_TARGET_SLSFRC_DH.PAR_SLSFRC_NM,
V_TARGET_SLSFRC_DH.SLSFRC_NM,
V_RUST_D.SHIRE_RUST_ID,
V_RUST_D.RUST_NM,
application name: SQL*Plus, hash value=3669949024
Current SQL Statement:
ALTER TABLE RUST_TARGET_D ENABLE CONSTRAINT R_303
user session for deadlock lock 0xbd7dd210
pid=50 serial=62712 audsid=726380510 user: 57/ICOM_DM
O/S info: user: svc-etl-icbi, term: , ospid: 19433, machine: shbaetl00.corp.shire.com
program: [email protected] (TNS V1-V3)
application name: SQL*Plus, hash value=3669949024
Current SQL Statement:
ALTER TABLE RUST_TARGET_D ENABLE CONSTRAINT R_303
user session for deadlock lock 0xbd8a13a8
pid=52 serial=49701 audsid=726380511 user: 58/ICOM_VW
O/S info: user: svc-ch-bo-sso, term: SHBAREP00, ospid: 8012:1884, machine: CORP\SHBAREP00
program: wireportserver.exe
application name: wireportserver.exe, hash value=1663395875
Current SQL Statement:
SELECT
V_TARGET_LST_D.LST_NM,
V_RUST_TARGET_D.ACTL_RNK,
DECODE(V_RUST_TARGET_D.TARGET_IND,:"SYS_B_00",:"SYS_B_01",:"SYS_B_02",:"SYS_B_03"),
count(V_RUST_D.SHIRE_RUST_ID),
V_TARGET_SLSFRC_DH.SLS_ORG_NM,
V_TARGET_SLSFRC_DH.PAR_SLSFRC_NM,
V_TARGET_SLSFRC_DH.SLSFRC_NM,
V_RUST_D.SHIRE_RUST_ID,
V_RUST_D.RUST_NM,
V_RUST_CON_D.FST_NM,
V_RUST_CON_D.LAST_NM,
V_RUST_TARGET_D.EXT_RNK
FROM
V_RUST_D,
The error they are getting on the application side is:
ORA-06502: PL/SQL: numeric or value error: character to number conversion error
ORA-06512: at line 30
ORA-04020: deadlock detected while trying to lock object BCOM_DM.DATE_D
They complained that they get the deadlock when they try to enable the constraint, as you can see in the trace log.
Hi,
GLOBAL ENQUEUE SERVICES DEADLOCK DETECTED [ID 973178.1]
http://www.dba-oracle.com/t_ora_04020_deadlock_detected_while_trying_to_lock_object_string.htm
Thanks
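When reading an LMD trace like the one quoted above, it can help to list the sessions involved side by side. Here is a minimal sketch that does so, assuming the "user session for deadlock lock" layout shown in this thread; the function name `deadlock_sessions` and the dictionary keys are hypothetical, and real trace files may differ in detail.

```python
import re

LOCK = re.compile(r"^user session for deadlock lock (0x[0-9a-f]+)$")
SESSION = re.compile(r"^pid=(\d+) serial=(\d+) audsid=(\d+) user: \d+/(\S+)$")


def deadlock_sessions(trace_text):
    """Collect one dict per 'user session for deadlock lock' block."""
    sessions = []
    current = None
    lines = iter(trace_text.splitlines())
    for raw in lines:
        line = raw.strip()
        lock = LOCK.match(line)
        if lock:
            current = {"lock": lock.group(1)}
            sessions.append(current)
            continue
        if current is None:
            continue  # ignore anything before the first session block
        session = SESSION.match(line)
        if session:
            current.update(
                pid=int(session.group(1)),
                serial=int(session.group(2)),
                user=session.group(4),
            )
        elif line.startswith("program:"):
            current["program"] = line.split(":", 1)[1].strip()
        elif line == "Current SQL Statement:":
            # keep only the first line of the statement as a short label
            current["sql_starts_with"] = next(lines, "").strip()
    return sessions


SAMPLE = """user session for deadlock lock 0xbd7dd210
pid=50 serial=62712 audsid=726380510 user: 57/ICOM_DM
O/S info: user: svc-etl-icbi, term: , ospid: 19433, machine: shbaetl00.corp.shire.com
program: [email protected] (TNS V1-V3)
application name: SQL*Plus, hash value=3669949024
Current SQL Statement:
ALTER TABLE RUST_TARGET_D ENABLE CONSTRAINT R_303
user session for deadlock lock 0xbd8a13a8
pid=52 serial=49701 audsid=726380511 user: 58/ICOM_VW
O/S info: user: svc-ch-bo-sso, term: SHBAREP00, ospid: 8012:1884, machine: CORP\\SHBAREP00
program: wireportserver.exe
application name: wireportserver.exe, hash value=1663395875
Current SQL Statement:
SELECT
"""

sessions = deadlock_sessions(SAMPLE)
for s in sessions:
    print(s["user"], s["pid"], s["program"], "->", s["sql_starts_with"])
```

For the excerpt in this thread, the output shows the ETL session (ICOM_DM) running the ALTER TABLE ... ENABLE CONSTRAINT statement and a reporting session (ICOM_VW) running a SELECT at the same time, which is consistent with what the client described.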