Cache deadlock
We have had a few occurrences of a deadlock on one of our caches.
A thread seems to be stuck in the com.tangosol.net.internal.StorageVersion.waitForPendingUpdates method. Any clues?
The stack traces for the worker threads on the offending node (two are configured) are:
Thread[WorkflowEntityDistributedSchemeWorker:0,5,WorkflowEntityDistributedScheme]
INFO | 2012/01/24 08:40:32 | jvm 1 | java.lang.Object.wait(Native Method)
INFO | 2012/01/24 08:40:32 | jvm 1 | com.tangosol.util.SegmentedConcurrentMap$LockableEntry.waitForNotify(SegmentedConcurrentMap.java:939)
INFO | 2012/01/24 08:40:32 | jvm 1 | com.tangosol.util.SegmentedConcurrentMap.lock(SegmentedConcurrentMap.java:370)
INFO | 2012/01/24 08:40:32 | jvm 1 | com.tangosol.coherence.component.util.daemon.queueProcessor.service.grid.partitionedService.PartitionedCache$ResourceCoordinator.lock(PartitionedCache.CDB:4)
INFO | 2012/01/24 08:40:32 | jvm 1 | com.tangosol.coherence.component.util.daemon.queueProcessor.service.grid.partitionedService.PartitionedCache.lockKey(PartitionedCache.CDB:7)
INFO | 2012/01/24 08:40:32 | jvm 1 | com.tangosol.coherence.component.util.daemon.queueProcessor.service.grid.partitionedService.PartitionedCache$InvocationContext.lockEntry(PartitionedCache.CDB:19)
INFO | 2012/01/24 08:40:32 | jvm 1 | com.tangosol.coherence.component.util.daemon.queueProcessor.service.grid.partitionedService.PartitionedCache$Storage.createQueryResult(PartitionedCache.CDB:59)
INFO | 2012/01/24 08:40:32 | jvm 1 | com.tangosol.coherence.component.util.daemon.queueProcessor.service.grid.partitionedService.PartitionedCache$Storage.query(PartitionedCache.CDB:72)
INFO | 2012/01/24 08:40:32 | jvm 1 | com.tangosol.coherence.component.util.daemon.queueProcessor.service.grid.partitionedService.PartitionedCache.onInvokeFilterRequest(PartitionedCache.CDB:55)
INFO | 2012/01/24 08:40:32 | jvm 1 | com.tangosol.coherence.component.util.daemon.queueProcessor.service.grid.partitionedService.PartitionedCache$InvokeFilterRequest.run(PartitionedCache.CDB:1)
INFO | 2012/01/24 08:40:32 | jvm 1 | com.tangosol.coherence.component.util.DaemonPool$WrapperTask.run(DaemonPool.CDB:1)
INFO | 2012/01/24 08:40:32 | jvm 1 | com.tangosol.coherence.component.util.DaemonPool$WrapperTask.run(DaemonPool.CDB:32)
INFO | 2012/01/24 08:40:32 | jvm 1 | com.tangosol.coherence.component.util.DaemonPool$Daemon.onNotify(DaemonPool.CDB:63)
INFO | 2012/01/24 08:40:32 | jvm 1 | com.tangosol.coherence.component.util.Daemon.run(Daemon.CDB:42)
INFO | 2012/01/24 08:40:32 | jvm 1 | java.lang.Thread.run(Thread.java:662)
INFO | 2012/01/24 08:40:32 | jvm 1 |
INFO | 2012/01/24 08:40:32 | jvm 1 | Thread[WorkflowEntityDistributedSchemeWorker:1,5,WorkflowEntityDistributedScheme]
INFO | 2012/01/24 08:40:32 | jvm 1 | sun.misc.Unsafe.park(Native Method)
INFO | 2012/01/24 08:40:32 | jvm 1 | java.util.concurrent.locks.LockSupport.park(LockSupport.java:283)
INFO | 2012/01/24 08:40:32 | jvm 1 | com.tangosol.net.internal.StorageVersion.waitForPendingUpdates(StorageVersion.java:200)
INFO | 2012/01/24 08:40:32 | jvm 1 | com.tangosol.coherence.component.util.daemon.queueProcessor.service.grid.partitionedService.PartitionedCache$Storage.reevaluateQueryResults(PartitionedCache.CDB:39)
INFO | 2012/01/24 08:40:32 | jvm 1 | com.tangosol.coherence.component.util.daemon.queueProcessor.service.grid.partitionedService.PartitionedCache$Storage.checkIndexConsistency(PartitionedCache.CDB:52)
INFO | 2012/01/24 08:40:32 | jvm 1 | com.tangosol.coherence.component.util.daemon.queueProcessor.service.grid.partitionedService.PartitionedCache$Storage.createQueryResult(PartitionedCache.CDB:94)
INFO | 2012/01/24 08:40:32 | jvm 1 | com.tangosol.coherence.component.util.daemon.queueProcessor.service.grid.partitionedService.PartitionedCache$Storage.query(PartitionedCache.CDB:72)
INFO | 2012/01/24 08:40:32 | jvm 1 | com.tangosol.coherence.component.util.daemon.queueProcessor.service.grid.partitionedService.PartitionedCache.onInvokeFilterRequest(PartitionedCache.CDB:55)
INFO | 2012/01/24 08:40:32 | jvm 1 | com.tangosol.coherence.component.util.daemon.queueProcessor.service.grid.partitionedService.PartitionedCache$InvokeFilterRequest.run(PartitionedCache.CDB:1)
INFO | 2012/01/24 08:40:32 | jvm 1 | com.tangosol.coherence.component.util.DaemonPool$WrapperTask.run(DaemonPool.CDB:1)
INFO | 2012/01/24 08:40:32 | jvm 1 | com.tangosol.coherence.component.util.DaemonPool$WrapperTask.run(DaemonPool.CDB:32)
INFO | 2012/01/24 08:40:32 | jvm 1 | com.tangosol.coherence.component.util.DaemonPool$Daemon.onNotify(DaemonPool.CDB:63)
INFO | 2012/01/24 08:40:32 | jvm 1 | com.tangosol.coherence.component.util.Daemon.run(Daemon.CDB:42)
INFO | 2012/01/24 08:40:32 | jvm 1 | java.lang.Thread.run(Thread.java:662)
We have the service guardian disabled at the moment, but we were able to resolve the issue by killing the offending node.
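With the service guardian disabled, nothing in the JVM flags a worker that stays in the same frame. A small in-process check can at least report which threads are currently inside a suspect method, such as the StorageVersion.waitForPendingUpdates frame in the stack above. This is a generic JDK sketch built on Thread.getAllStackTraces(), not a Coherence API; the class name is hypothetical, and one sample only shows where threads are right now, so you would want to sample repeatedly before treating a thread as stuck.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Lists live threads in this JVM whose stack contains a given "SimpleClass.method" frame.
class StuckThreadFinder {

    static List<String> threadsInMethod(String classAndMethod) {
        List<String> matches = new ArrayList<>();
        for (Map.Entry<Thread, StackTraceElement[]> e : Thread.getAllStackTraces().entrySet()) {
            for (StackTraceElement frame : e.getValue()) {
                String cls = frame.getClassName();
                // Drop the package so "com.tangosol.net.internal.StorageVersion" becomes "StorageVersion".
                String simple = cls.substring(cls.lastIndexOf('.') + 1);
                if ((simple + "." + frame.getMethodName()).equals(classAndMethod)) {
                    matches.add(e.getKey().getName());
                    break; // one matching frame per thread is enough
                }
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        String target = args.length > 0 ? args[0] : "StorageVersion.waitForPendingUpdates";
        System.out.println(target + " -> " + threadsInMethod(target));
    }
}
```

Scheduling the check and deciding what to do with its result (log, alert, restart) is left open here.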
I am facing a similar deadlock, where all threads are in the following state:
"pool-3-thread-5" - Thread t@97
java.lang.Thread.State: TIMED_WAITING on com.tangosol.util.SegmentedConcurrentMap$LockableEntry@2b8f4fb4
at java.lang.Object.wait(Native Method)
at com.tangosol.util.SegmentedConcurrentMap$LockableEntry.waitForNotify(SegmentedConcurrentMap.java:939)
at com.tangosol.util.SegmentedConcurrentMap.lock(SegmentedConcurrentMap.java:370)
at com.tangosol.net.cache.CachingMap.get(CachingMap.java:462)
at com.nima.app.generic.CacheRepositoryImpl.getSingleByFilter(CacheRepositoryImpl.java:333)
at com.nima.app.bdm.configurationdata.businessdate.BusinessDateRepositoryImpl.getBusinessDateByCalendarDate(BusinessDateRepositoryImpl.java:108)
at com.nima.app.bpm.process.parameter.DefaultProcessVariableEnricher.getBusinessDate(DefaultProcessVariableEnricher.java:82)
at com.nima.app.bpm.process.parameter.DefaultProcessVariableEnricher.isMonthEnd(DefaultProcessVariableEnricher.java:73)
at com.nima.app.bpm.process.parameter.DefaultProcessVariableEnricher.enrichBusinessDate(DefaultProcessVariableEnricher.java:63)
at com.nima.app.bpm.process.parameter.DefaultProcessVariableEnricher.enrichProcessVariables(DefaultProcessVariableEnricher.java:53)
at com.nima.app.bpm.process.JBpmProcessExecutor.executeProcess(JBpmProcessExecutor.java:211)
at com.nima.app.bpm.message.XmlMessageProcessor.processMessage(XmlMessageProcessor.java:35)
at com.nima.app.bpm.message.JmsTextMessageListener.onMessage(JmsTextMessageListener.java:39)
at org.springframework.jms.listener.AbstractMessageListenerContainer.doInvokeListener(AbstractMessageListenerContainer.java:562)
at org.springframework.jms.listener.AbstractMessageListenerContainer.invokeListener(AbstractMessageListenerContainer.java:500)
at org.springframework.jms.listener.AbstractMessageListenerContainer.doExecuteListener(AbstractMessageListenerContainer.java:468)
at org.springframework.jms.listener.AbstractMessageListenerContainer.executeListener(AbstractMessageListenerContainer.java:440)
at org.springframework.jms.listener.SimpleMessageListenerContainer.processMessage(SimpleMessageListenerContainer.java:340)
at org.springframework.jms.listener.SimpleMessageListenerContainer$1$1.run(SimpleMessageListenerContainer.java:307)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)
Locked ownable synchronizers:
- locked java.util.concurrent.locks.ReentrantLock$NonfairSync@550d9e8e
"pool-3-thread-4" - Thread t@96
java.lang.Thread.State: TIMED_WAITING on com.tangosol.util.SegmentedConcurrentMap$LockableEntry@2b8f4fb4
at java.lang.Object.wait(Native Method)
at com.tangosol.util.SegmentedConcurrentMap$LockableEntry.waitForNotify(SegmentedConcurrentMap.java:939)
at com.tangosol.util.SegmentedConcurrentMap.lock(SegmentedConcurrentMap.java:370)
at com.tangosol.net.cache.CachingMap.get(CachingMap.java:462)
at com.nima.app.generic.CacheRepositoryImpl.getSingleByFilter(CacheRepositoryImpl.java:333)
at com.nima.app.bdm.configurationdata.businessdate.BusinessDateRepositoryImpl.getBusinessDateByCalendarDate(BusinessDateRepositoryImpl.java:108)
at com.nima.app.bpm.process.parameter.DefaultProcessVariableEnricher.getBusinessDate(DefaultProcessVariableEnricher.java:82)
at com.nima.app.bpm.process.parameter.DefaultProcessVariableEnricher.isMonthEnd(DefaultProcessVariableEnricher.java:73)
at com.nima.app.bpm.process.parameter.DefaultProcessVariableEnricher.enrichBusinessDate(DefaultProcessVariableEnricher.java:63)
at com.nima.app.bpm.process.parameter.DefaultProcessVariableEnricher.enrichProcessVariables(DefaultProcessVariableEnricher.java:53)
at com.nima.app.bpm.process.JBpmProcessExecutor.executeProcess(JBpmProcessExecutor.java:211)
at com.nima.app.bpm.message.XmlMessageProcessor.processMessage(XmlMessageProcessor.java:35)
at com.nima.app.bpm.message.JmsTextMessageListener.onMessage(JmsTextMessageListener.java:39)
at org.springframework.jms.listener.AbstractMessageListenerContainer.doInvokeListener(AbstractMessageListenerContainer.java:562)
at org.springframework.jms.listener.AbstractMessageListenerContainer.invokeListener(AbstractMessageListenerContainer.java:500)
at org.springframework.jms.listener.AbstractMessageListenerContainer.doExecuteListener(AbstractMessageListenerContainer.java:468)
at org.springframework.jms.listener.AbstractMessageListenerContainer.executeListener(AbstractMessageListenerContainer.java:440)
at org.springframework.jms.listener.SimpleMessageListenerContainer.processMessage(SimpleMessageListenerContainer.java:340)
at org.springframework.jms.listener.SimpleMessageListenerContainer$1$1.run(SimpleMessageListenerContainer.java:307)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)
Locked ownable synchronizers:
- locked java.util.concurrent.locks.ReentrantLock$NonfairSync@7124a841
Any ideas?
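Both dumped threads are TIMED_WAITING on the same com.tangosol.util.SegmentedConcurrentMap$LockableEntry@2b8f4fb4 inside CachingMap.get, which suggests they are queued behind a single entry lock in the near cache. In a dump with many threads, grouping threads by the object they wait on makes such a hot lock stand out. The sketch below uses the standard java.lang.management API (the class name is hypothetical); ThreadInfo.getLockName() returns the same Class@hash form seen in the dump. Because these threads are in Object.wait() rather than blocked on monitor entry, ThreadMXBean.findDeadlocks() is unlikely to report this situation, and the grouping shows only who is waiting, not who holds the entry lock.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Groups live threads by the object they are blocked or waiting on.
class LockWaitGrouper {

    static Map<String, List<String>> waitersByLock() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        Map<String, List<String>> byLock = new TreeMap<>();
        // false, false: skip owned-monitor and owned-synchronizer details; only the wait target is needed.
        for (ThreadInfo info : mx.dumpAllThreads(false, false)) {
            if (info == null) continue;
            String lock = info.getLockName(); // "Class@hexHash", or null if not blocked/waiting on an object
            if (lock != null) {
                byLock.computeIfAbsent(lock, k -> new ArrayList<>()).add(info.getThreadName());
            }
        }
        return byLock;
    }

    public static void main(String[] args) {
        waitersByLock().forEach((lock, threads) -> {
            if (threads.size() > 1) {
                System.out.println(threads.size() + " threads waiting on " + lock + ": " + threads);
            }
        });
    }
}
```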
-
Determine blocking sessions and blocked sessions in 9iR2
Hi,
Running 9.2.0.7 on Solaris 2.
We are trying to develop a query that shows us the blocked sessions and the session causing them. I have one working for 11g, but for 9i it's a little trickier. So far I am running these two:
select s1.username || '@' || s1.machine || ' ( SID=' || s1.sid ||
' ) is blocking ' || s2.username || '@' || s2.machine || ' ( SID=' ||
s2.sid || ' ) ' AS blocking_status
from gv$lock l1, gv$session s1, gv$lock l2, gv$session s2
where s1.sid = l1.sid
and s1.inst_id = l1.inst_id
and s2.sid = l2.sid
and s2.inst_id = l2.inst_id
and l1.BLOCK = 1
and l2.request > 0
and l1.id1 = l2.id1
and l1.id2 = l2.id2;
select do.object_name,
row_wait_obj#,
row_wait_file#,
row_wait_block#,
row_wait_row#,
dbms_rowid.rowid_create(1,
do.DATA_OBJECT_ID,
ROW_WAIT_FILE#,
ROW_WAIT_BLOCK#,
ROW_WAIT_ROW#)
from gv$session s, dba_objects do
where sid = 543
and s.ROW_WAIT_OBJ# = do.OBJECT_ID;
The reason I need this is that lately we have been getting a lot of DEADLOCKS, and we want to determine why this is happening so often now. We want to start with who is involved and which objects are causing it. Any suggestions?
mbobak wrote:
There are a few critical pieces to interpreting a deadlock trace file. First, to be clear, you're getting ORA-00060, not ORA-04020 (which is a library cache deadlock), correct?
If so, the trace file will contain a deadlock graph. This will show the type of enqueue involved (TM or TX are the likely candidates) and the modes in which the locks are held and requested.
Then, there's the SQL which encountered the deadlock, and finally, the other SQL involved in the deadlock.
All the above information is in the deadlock trace file.
Using it, you ought to be able to determine the root cause of the deadlock.
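A Global Wait-For-Graph can be read by pairing each BLOCKED line with the BLOCKER line for the same resource, then following the chain of owners until it returns to its starting point. The Java sketch below automates that walk. The line layout it parses (role, lock address, mode, resource name, enqueue type, bracketed owner id, trailing number) is an assumption inferred from the LMD0 trace posted later in this thread, not a documented format; the class name is hypothetical, and the sample lines in main are copied from that trace.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Builds a waits-for graph from "Global Wait-For-Graph" lines and looks for a cycle.
class WaitForGraph {

    // Assumed layout: ROLE lockAddress mode [id1][id2],[TYPE] [owner] trailingNumber
    private static final Pattern LINE = Pattern.compile(
        "(BLOCKED|BLOCKER)\\s+(\\S+)\\s+(\\d+)\\s+(\\[[^\\]]+\\]\\[[^\\]]+\\]),\\[(\\w+)\\]\\s+(\\[[^\\]]+\\])\\s+(\\d+)");

    /** Returns owner -> set of owners it is waiting for. */
    static Map<String, Set<String>> edges(List<String> lines) {
        Map<String, Set<String>> waiters = new LinkedHashMap<>(); // resource -> owners waiting on it
        Map<String, Set<String>> holders = new LinkedHashMap<>(); // resource -> owners holding it
        for (String line : lines) {
            Matcher m = LINE.matcher(line.trim());
            if (!m.matches()) continue; // skip headers and other trace text
            String resource = m.group(4) + "," + m.group(5);
            Map<String, Set<String>> side = m.group(1).equals("BLOCKED") ? waiters : holders;
            side.computeIfAbsent(resource, k -> new LinkedHashSet<>()).add(m.group(6));
        }
        Map<String, Set<String>> graph = new LinkedHashMap<>();
        for (Map.Entry<String, Set<String>> e : waiters.entrySet()) {
            Set<String> blockers = holders.getOrDefault(e.getKey(), Collections.<String>emptySet());
            for (String waiter : e.getValue()) {
                graph.computeIfAbsent(waiter, k -> new LinkedHashSet<>()).addAll(blockers);
            }
        }
        return graph;
    }

    /** Returns one cycle (start owner repeated at the end), or an empty list if there is none. */
    static List<String> findCycle(Map<String, Set<String>> graph) {
        for (String start : graph.keySet()) {
            List<String> path = new ArrayList<>();
            if (walk(start, start, graph, path, new HashSet<>())) return path;
        }
        return Collections.emptyList();
    }

    // Depth-first search from node, succeeding when an edge leads back to start.
    private static boolean walk(String node, String start, Map<String, Set<String>> graph,
                                List<String> path, Set<String> visited) {
        path.add(node);
        visited.add(node);
        for (String next : graph.getOrDefault(node, Collections.<String>emptySet())) {
            if (next.equals(start)) {
                path.add(start);
                return true;
            }
            if (!visited.contains(next) && walk(next, start, graph, path, visited)) return true;
        }
        path.remove(path.size() - 1); // backtrack
        return false;
    }

    public static void main(String[] args) {
        List<String> wfg = Arrays.asList(
            "BLOCKED 40972c570 5 [0x90014][0x19bb82],[TX] [131094,2] 1",
            "BLOCKER 40972bb98 5 [0x90014][0x19bb82],[TX] [65586,6177] 0",
            "BLOCKED 40972c9c0 5 [0x110014][0x12ec40],[TX] [65586,6177] 0",
            "BLOCKER 40972ba18 5 [0x110014][0x12ec40],[TX] [131094,2] 1");
        System.out.println("Cycle: " + findCycle(edges(wfg)));
    }
}
```

Once the two owners in the cycle are known, the "user session for deadlock lock" blocks in the same trace show which session and SQL statement belonged to the waiting side.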
If you need help understanding it, post here. If you post the deadlock graph, make sure you use code tags, or it will be unreadable.
Yes, we are getting ORA-00060. This is exactly what the app team gets from the application:
Available exception message: iims.ge.common.exception.IIMSTechnicalException : ORA-00060: deadlock detected while waiting for resource
From our latest deadlock occurrence, an LMD trace file was generated. We can see the deadlock graph and its SQL, and we can see the TX enqueue and its modes. Basically, everything you asked for is in the trace file. What we want to find out is what, or who, is causing it so we can fix it. Maybe I am not reading the trace file correctly, so I would appreciate your help interpreting it. As requested, here is the trace file.
Dump file /var/local/oracle/logs/ora_prod_can1_lmd0_4432.trc
Oracle9i Enterprise Edition Release 9.2.0.8.0 - 64bit Production
With the Partitioning and Real Application Clusters options
JServer Release 9.2.0.8.0 - Production
ORACLE_HOME = /opt/oracle/9.2.0
System name: SunOS
Node name: can-clust01
Release: 5.9
Version: Generic_118558-36
Machine: sun4u
Instance name: ORA_PROD_CAN1
Redo thread mounted by this instance: 0 <none>
Oracle process number: 5
Unix process pid: 4432, image: oracle@can-clust01 (LMD0)
*** SESSION ID:(4.1) 2010-08-15 08:07:02.736
open lock on RM 0 0
*** 2010-08-15 08:07:31.353
open lock on RM 0 0
*** 2010-08-16 11:17:21.469
user session for deadlock lock 40972c9c0
pid=50 serial=6956 audsid=189500961 user: 61/IIMS_UWR
O/S info: user: weblogic, term: unknown, ospid: , machine: can-prod03
program: JDBC Thin Client
application name: JDBC Thin Client, hash value=0
Current SQL Statement:
UPDATE T_POLICY_PROPERTY POP SET POP.PRP_EFFECTIVE_END_DATE = :B3 , POP.PRP_LAST_UPDATED_DATE = SYSDATE WHERE POP.PRP_POL_POLICY_ID = :B2 AND POP.PRP_PROPERTY_SEQ_NUM = 1 AND POP.PRP_EFFECTIVE_END_DATE = TO_DATE(:B1 , DATE_FORMAT)
Global Wait-For-Graph(WFG) at ddTS[0.1] :
BLOCKED 40972c570 5 [0x90014][0x19bb82],[TX] [131094,2] 1
BLOCKER 40972bb98 5 [0x90014][0x19bb82],[TX] [65586,6177] 0
BLOCKED 40972c9c0 5 [0x110014][0x12ec40],[TX] [65586,6177] 0
BLOCKER 40972ba18 5 [0x110014][0x12ec40],[TX] [131094,2] 1
user session for deadlock lock 40972c9c0
pid=50 serial=6956 audsid=189500961 user: 61/IIMS_UWR
O/S info: user: weblogic, term: unknown, ospid: , machine: can-prod03
program: JDBC Thin Client
application name: JDBC Thin Client, hash value=0
Current SQL Statement:
UPDATE T_POLICY_PROPERTY POP SET POP.PRP_EFFECTIVE_END_DATE = :B3 , POP.PRP_LAST_UPDATED_DATE = SYSDATE WHERE POP.PRP_POL_POLICY_ID = :B2 AND POP.PRP_PROPERTY_SEQ_NUM = 1 AND POP.PRP_EFFECTIVE_END_DATE = TO_DATE(:B1 , DATE_FORMAT)
Global Wait-For-Graph(WFG) at ddTS[0.2] :
BLOCKED 40972c9c0 5 [0x110014][0x12ec40],[TX] [65586,6177] 0
BLOCKER 40972ba18 5 [0x110014][0x12ec40],[TX] [131094,2] 1
BLOCKED 40972c570 5 [0x90014][0x19bb82],[TX] [131094,2] 1
BLOCKER 40972bb98 5 [0x90014][0x19bb82],[TX] [65586,6177] 0
*** 2010-08-16 11:17:42.495
user session for deadlock lock 4098bcd08
pid=59 serial=981 audsid=189501588 user: 61/IIMS_UWR
O/S info: user: weblogic, term: unknown, ospid: , machine: can-prod03
program: JDBC Thin Client
application name: JDBC Thin Client, hash value=0
Current SQL Statement:
UPDATE T_POLICY_PROPERTY POP SET POP.PRP_EFFECTIVE_END_DATE = :B3 , POP.PRP_LAST_UPDATED_DATE = SYSDATE WHERE POP.PRP_POL_POLICY_ID = :B2 AND POP.PRP_PROPERTY_SEQ_NUM = 1 AND POP.PRP_EFFECTIVE_END_DATE = TO_DATE(:B1 , DATE_FORMAT)
Global Wait-For-Graph(WFG) at ddTS[0.3] :
BLOCKED 41228b128 5 [0x70001][0x178a52],[TX] [131100,2] 1
BLOCKER 4098bade8 5 [0x70001][0x178a52],[TX] [65595,583] 0
BLOCKED 4098bcd08 5 [0x130025][0x1475c9],[TX] [65595,583] 0
BLOCKER 412275b78 5 [0x130025][0x1475c9],[TX] [131100,2] 1
user session for deadlock lock 4098bcd08
pid=59 serial=981 audsid=189501588 user: 61/IIMS_UWR
O/S info: user: weblogic, term: unknown, ospid: , machine: can-prod03
program: JDBC Thin Client
application name: JDBC Thin Client, hash value=0
Current SQL Statement:
UPDATE T_POLICY_PROPERTY POP SET POP.PRP_EFFECTIVE_END_DATE = :B3 , POP.PRP_LAST_UPDATED_DATE = SYSDATE WHERE POP.PRP_POL_POLICY_ID = :B2 AND POP.PRP_PROPERTY_SEQ_NUM = 1 AND POP.PRP_EFFECTIVE_END_DATE = TO_DATE(:B1 , DATE_FORMAT)
Global Wait-For-Graph(WFG) at ddTS[0.4] :
BLOCKED 4098bcd08 5 [0x130025][0x1475c9],[TX] [65595,583] 0
BLOCKER 412275b78 5 [0x130025][0x1475c9],[TX] [131100,2] 1
BLOCKED 41228b128 5 [0x70001][0x178a52],[TX] [131100,2] 1
BLOCKER 4098bade8 5 [0x70001][0x178a52],[TX] [65595,583] 0
Let's see what we can get out of this now :) -
Hi all,
DB info:
9.2.0.8.0, System name: AIX
During the nightly batch run, the database raised ORA-04020:
ORA-04020: deadlock detected while trying to lock object UCT.TMP_HGR_PU_O_201003_T17
object           waiting          waiting                 blocking         blocking
handle           session          lock             mode   session          lock             mode
7000001221ecce8  70000011b28ba60  70000012edaed80  X      70000011c2f28c8  70000014a0f8560  S
7000001471248a0  70000011c2f28c8  700000121a36150  S      70000011b28ba60  700000121a35848  X
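Read row by row, the graph above describes a two-session cycle: session 70000011b28ba60 wants X on handle 7000001221ecce8, which session 70000011c2f28c8 holds in S, while 70000011c2f28c8 wants S on handle 7000001471248a0, which 70000011b28ba60 holds in X. The Java sketch below makes that reading mechanical. It assumes the "session" columns are session addresses that can be matched against V$SESSION.SADDR (the liblock.sql script in this thread joins library cache lock users to SADDR in the same way); addresses are normalized in case your query output pads them with leading zeros or uses upper case. Verify the assumption on your own system; the class name is hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Reads rows of an ORA-04020 deadlock graph and states who waits for whom.
class LibraryCacheDeadlock {

    /** Lower-cases and strips leading zeros so "070000011B28BA60" equals "70000011b28ba60". */
    static String normalizeAddr(String hex) {
        String s = hex.trim().toLowerCase();
        int i = 0;
        while (i < s.length() - 1 && s.charAt(i) == '0') i++;
        return s.substring(i);
    }

    /**
     * Each data row is: handle, waiting session, waiting lock, requested mode,
     * blocking session, blocking lock, held mode. Header rows are skipped.
     */
    static List<String> describe(List<String> rows) {
        List<String> out = new ArrayList<>();
        for (String row : rows) {
            String[] c = row.trim().split("\\s+");
            if (c.length != 7 || !c[0].matches("[0-9a-fA-F]+")) continue;
            out.add("session " + normalizeAddr(c[1]) + " wants " + c[3]
                    + " on handle " + normalizeAddr(c[0])
                    + ", held " + c[6] + " by session " + normalizeAddr(c[4]));
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> graph = Arrays.asList(
            "7000001221ecce8 70000011b28ba60 70000012edaed80 X 70000011c2f28c8 70000014a0f8560 S",
            "7000001471248a0 70000011c2f28c8 700000121a36150 S 70000011b28ba60 700000121a35848 X");
        describe(graph).forEach(System.out::println);
    }
}
```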
I would like to find the session info (waiter/holder), etc. Can I translate 70000011b28ba60 into session details? Is this a hex address from which the session information can be derived?
Hi Branislav,
As Jonathan mentioned, an ORA-4020 is a library cache deadlock. Before you have a library cache deadlock, your system will begin running into blocking library cache locks. If you can catch these in action, this script may help you track down what's going on and who is blocking whom.
liblock.sql:
select decode(lob.kglobtyp, 0, 'NEXT OBJECT', 1, 'INDEX', 2, 'TABLE', 3, 'CLUSTER',
4, 'VIEW', 5, 'SYNONYM', 6, 'SEQUENCE',
7, 'PROCEDURE', 8, 'FUNCTION', 9, 'PACKAGE',
11, 'PACKAGE BODY', 12, 'TRIGGER',
13, 'TYPE', 14, 'TYPE BODY',
19, 'TABLE PARTITION', 20, 'INDEX PARTITION', 21, 'LOB',
22, 'LIBRARY', 23, 'DIRECTORY', 24, 'QUEUE',
28, 'JAVA SOURCE', 29, 'JAVA CLASS', 30, 'JAVA RESOURCE',
32, 'INDEXTYPE', 33, 'OPERATOR',
34, 'TABLE SUBPARTITION', 35, 'INDEX SUBPARTITION',
40, 'LOB PARTITION', 41, 'LOB SUBPARTITION',
42, 'MATERIALIZED VIEW',
43, 'DIMENSION',
44, 'CONTEXT', 46, 'RULE SET', 47, 'RESOURCE PLAN',
48, 'CONSUMER GROUP',
51, 'SUBSCRIPTION', 52, 'LOCATION',
55, 'XML SCHEMA', 56, 'JAVA DATA',
57, 'SECURITY PROFILE', 59, 'RULE',
62, 'EVALUATION CONTEXT',
'UNDEFINED') object_type,
lob.KGLNAOBJ object_name,
lk.KGLLKMOD lock_mode_held,
lk.KGLLKREQ lock_mode_requested,
ses.sid,
ses.serial#,
ses.username
FROM
x$kgllk lk,
v$session ses,
x$kglob lob,
v$session_wait vsw
WHERE
lk.KGLLKUSE = ses.saddr and
lk.KGLLKHDL = lob.KGLHDADR
and lob.kglhdadr = vsw.p1raw
and vsw.event = 'library cache lock'
order by lock_mode_held desc
/
Hope that helps,
-Mark -
Coherence 3.3/387
.Net API
I have a C++ GUI client with a grid that shows all updates from Coherence.
I also have a 'server' application, which inserts objects into Coherence (400-500 per second).
I use a 'near-scheme' with the front set to a local-scheme and the back set to a remote-cache-scheme.
When I scroll the grid while it is updating, I get a 'busy' dialog, followed by a deadlock with these call stacks:
mscorlib.dll!System.Threading.Monitor.Wait(object obj, int millisecondsTimeout, bool exitContext) + 0x14 bytes
mscorlib.dll!System.Threading.Monitor.Wait(object obj, int millisecondsTimeout) + 0x7 bytes
Coherence.dll!Tangosol.Util.ThreadGate.DoWait(long millis = -1) Line 634 + 0xb bytes C#
Coherence.dll!Tangosol.Util.ThreadGate.Close(long millis = -1) Line 282 + 0x11 bytes C#
Coherence.dll!Tangosol.Net.Cache.SynchronizedCache.Insert(object key = 100227898.0, object value = {Tangosol.Net.Cache.LocalCache.CacheLock}, long millis = 0) Line 177 + 0xc bytes C#
Coherence.dll!Tangosol.Net.Cache.SynchronizedCache.Insert(object key = 100227898.0, object value = {Tangosol.Net.Cache.LocalCache.CacheLock}) Line 131 + 0x11 bytes C#
Coherence.dll!Tangosol.Net.Cache.LocalCache.Lock(object key = 100227898.0, long waitTimeMillis = 0) Line 2194 + 0xe bytes C#
Coherence.dll!Tangosol.Net.Cache.CompositeCache.GetAll(System.Collections.ICollection keys = {Dimensions:[1]}) Line 896 + 0x16 bytes C#
> CoherenceWrap.dll!CoherenceWrap.CacheWrap.GetDealRecs(double[] adDatabaseIDs = {Dimensions:[1]}) Line 333 + 0xf bytes C#
CoherenceWrap.dll!CoherenceWrap.CacheWrap.GetDealRec(double adDatabaseID = 100227898.0) Line 319 + 0xb bytes C#
[Native to Managed Transition]
RMSSpotBlotter.exe!CoherenceWrap::ICacheRO::GetDealRec(double adDatabaseID=100227898.00000000) Line 980 + 0x1d bytes C++
mscorlib.dll!System.Threading.Monitor.Wait(object obj, int millisecondsTimeout, bool exitContext) + 0x14 bytes
mscorlib.dll!System.Threading.Monitor.Wait(object obj, int millisecondsTimeout) + 0x7 bytes
Coherence.dll!Tangosol.Util.ThreadGate.DoWait(long millis = -1) Line 634 + 0xb bytes C#
Coherence.dll!Tangosol.Util.ThreadGate.Enter(long millis = -1) Line 416 + 0x11 bytes C#
Coherence.dll!Tangosol.Net.Cache.SynchronizedCache.this[object].get(object key = 100227899.0) Line 483 + 0xc bytes C#
Coherence.dll!Tangosol.Net.Cache.LocalCache.Unlock(object key = 100227899.0) Line 2295 + 0xb bytes C#
Coherence.dll!Tangosol.Net.Cache.CompositeCache.GetAll(System.Collections.ICollection keys = {Dimensions:[1]}) Line 1000 + 0xf bytes C#
CoherenceWrap.dll!CoherenceWrap.CacheWrap.GetDealRecs(double[] adDatabaseIDs = {Dimensions:[1]}) Line 333 + 0xf bytes C#
CoherenceWrap.dll!CoherenceWrap.CacheWrap.GetDealRec(double adDatabaseID = 100227899.0) Line 319 + 0xb bytes C#
[Native to Managed Transition]
RMSSpotBlotter.exe!CoherenceWrap::ICacheRO::GetDealRec(double adDatabaseID=100227899.00000000) Line 980 + 0x1d bytes C++
If I use the remote scheme on its own, everything is fine. Does anybody know why this happens?
Hi Dmitry,
It appears that you may have hit a known bug in Coherence for .NET 3.3 that was fixed in 3.3.1:
COHNET-78: The ThreadGate.Close() implementation appears to miss notifications
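The bug note describes COHNET-78 only as ThreadGate.Close() missing notifications. The sketch below is not Coherence's implementation; it is a generic gate, written in Java for consistency with the rest of this page, that shows the two rules such a class must follow to avoid that failure mode: every waiter re-checks its condition in a loop, and every state change a waiter may be waiting for calls notifyAll(). Breaking either rule can leave a thread in wait() indefinitely even though its condition already holds, which would be consistent with the two Monitor.Wait frames in the stacks above. The same rules apply to Monitor.Wait/Monitor.PulseAll in .NET. The class and method names are hypothetical, and timeouts are used throughout so a missed wake-up degrades into a failed call instead of a hang.

```java
// Hypothetical gate, not Coherence's ThreadGate: many threads may be inside at once,
// and one thread may close it, which blocks new entries and waits for the inside count to drain.
class SimpleGate {
    private int active;      // number of threads currently inside
    private Thread closer;   // thread holding the gate closed, or null if open

    /** Enters, waiting up to timeoutMillis while another thread holds the gate closed. */
    synchronized boolean enter(long timeoutMillis) {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        while (closer != null && closer != Thread.currentThread()) {
            if (!awaitOnce(deadline)) return false;
        }
        active++;
        return true;
    }

    /** Leaves the gate and wakes any closer waiting for the count to reach zero. */
    synchronized void exit() {
        active--;
        notifyAll();
    }

    /** Closes the gate within timeoutMillis; a thread that is itself inside simply times out. */
    synchronized boolean close(long timeoutMillis) {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        while (closer != null) {
            if (!awaitOnce(deadline)) return false;
        }
        closer = Thread.currentThread(); // from here on, enter() blocks for other threads
        while (active > 0) {
            if (!awaitOnce(deadline)) {
                closer = null; // give up: reopen and
                notifyAll();   // release threads blocked in enter()
                return false;
            }
        }
        return true;
    }

    /** Reopens the gate if the calling thread closed it. */
    synchronized void open() {
        if (closer == Thread.currentThread()) {
            closer = null;
            notifyAll();
        }
    }

    // Waits once; callers hold the monitor and re-check their condition in a loop.
    // Returns false on timeout or interrupt (the interrupt flag is preserved).
    private boolean awaitOnce(long deadline) {
        long remaining = deadline - System.currentTimeMillis();
        if (remaining <= 0) return false;
        try {
            wait(remaining);
            return true;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) throws InterruptedException {
        SimpleGate gate = new SimpleGate();
        Thread reader = new Thread(() -> {
            if (gate.enter(1000)) {
                try { Thread.sleep(100); } catch (InterruptedException ignored) { }
                gate.exit();
            }
        });
        reader.start();
        Thread.sleep(20); // give the reader a chance to get inside first
        System.out.println("closed: " + gate.close(1000)); // waits for the reader to exit
        gate.open();
        reader.join();
    }
}
```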
Can you please try upgrading both your client and cluster to 3.3.1 and see if this resolves the problem?
Regards,
Jason -
Cache tag exceptions and deadlocks
WebLogic 6.1 SP2 on W2K and Solaris
If an uncaught Exception is thrown from code within a cache tag, the cache for that page becomes locked until the server is restarted.
This can be tested by throwing an Exception and then displaying (using a separate page) all objects in the application: the cache with the Exception is displayed as xxx.lock.
This can be avoided by catching all Exceptions.
Gareth
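The behaviour described above matches a common bug pattern: a lock (or a lock marker such as the xxx.lock entry) is taken when the cached block starts and is released only when the block completes normally. How WebLogic's cache tag works internally is not shown in this thread; the following Java sketch, with hypothetical names, just illustrates the pattern and why releasing the lock in a finally block avoids it.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.Supplier;

// Hypothetical page-fragment cache, not WebLogic's implementation: the fragment is
// computed while holding a per-name lock so concurrent requests do not compute it twice.
class FragmentCache {
    private final Map<String, String> values = new ConcurrentHashMap<>();
    private final Map<String, ReentrantLock> locks = new ConcurrentHashMap<>();

    String get(String name, Supplier<String> body) {
        ReentrantLock lock = locks.computeIfAbsent(name, k -> new ReentrantLock());
        lock.lock();
        try {
            String cached = values.get(name);
            if (cached != null) return cached;
            String computed = body.get(); // the tag body: may throw
            values.put(name, computed);
            return computed;
        } finally {
            // Without this finally block, an exception from body.get() would leave the
            // lock held and every later request for this name would block.
            lock.unlock();
        }
    }

    boolean isLocked(String name) {
        ReentrantLock lock = locks.get(name);
        return lock != null && lock.isLocked();
    }

    public static void main(String[] args) {
        FragmentCache cache = new FragmentCache();
        try {
            cache.get("page", () -> { throw new IllegalStateException("tag body failed"); });
        } catch (IllegalStateException e) {
            System.out.println("body failed, lock still held? " + cache.isLocked("page"));
        }
        System.out.println(cache.get("page", () -> "rendered fragment"));
    }
}
```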
"Christian Corcino" wrote in message:
> I do not quite understand the purpose of the keys in the cache tag,
> what are they used for.
> These are the different keys :
> parameter.key | page.key | request.key | application.key | session.key
>
> Can any one give me an example of when and how to use these keys,
These are logical sub-keys you can use to distinguish two tags that would
otherwise resolve to the same cached value. It helps prevent "overloading"
the cache name with disjoint information needed to distinguish cache
entries.
For instance, our application supports multiple languages and uses a page
variable called "language". Assume we want to cache the computed contents
of a select element according to the user's language:
<wl:cache name="selectValues_foobar" key="page.language" timeout="30m">
[code to compute select values omitted]
</wl:cache>
This lets us use one name to describe the particular dataset while still
allowing for independently cached values for different values of
"language".
Felix Hack
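The effect of key="page.language" can be modeled as a lookup whose key combines the cache name with the current value of the sub-key, so one name yields one cached copy per language. The Java sketch below is only a model of that idea, with hypothetical names; it omits the timeout, assumes "|" never appears in cache names, and is not how WebLogic actually stores entries.

```java
import java.util.Map;
import java.util.Objects;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

// Hypothetical model of name + key: the tag's name picks the dataset, the key's current
// value (here the page's "language") picks which cached copy of that dataset to use.
class KeyedFragmentCache {
    private final Map<String, String> entries = new ConcurrentHashMap<>();

    // "|" is assumed never to appear in cache names; pick another separator if it can.
    static String entryKey(String name, String keyValue) {
        return name + "|" + Objects.toString(keyValue, "");
    }

    String get(String name, String keyValue, Supplier<String> compute) {
        return entries.computeIfAbsent(entryKey(name, keyValue), k -> compute.get());
    }

    public static void main(String[] args) {
        KeyedFragmentCache cache = new KeyedFragmentCache();
        System.out.println(cache.get("selectValues_foobar", "en", () -> "<option>Yes</option>"));
        System.out.println(cache.get("selectValues_foobar", "fr", () -> "<option>Oui</option>"));
        // Same name and language as the first call: served from the cache, lambda not run.
        System.out.println(cache.get("selectValues_foobar", "en", () -> "recomputed"));
    }
}
```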
-
Deadlock when calling cache.containsKey()
Hi,
Under heavy load, we sometimes experience a timeout when putting data in a distributed cache. In the example below we made a call to the containsKey() method of the "EVENTS" cache at 17:04:59 which never returned. When our execution timed out 1 minute later we received the following stack trace:
20 Feb 2009 17:05:57,667 [gridgain-#70%null%:grid-job-worker] ERROR server.common.aspect.ExceptionHandlingAspect - Handling Throwable for joinPoint execution(Serializable com.traficon.tmsng.server.common.message.ProcessMessageJob.execute())
(Wrapped) java.lang.InterruptedException
at java.lang.Object.wait(Native Method)
at java.lang.Object.wait(Object.java:485)
at com.tangosol.coherence.component.util.daemon.queueProcessor.service.Grid.poll(Grid.CDB:31)
at com.tangosol.coherence.component.util.daemon.queueProcessor.service.Grid.poll(Grid.CDB:11)
at com.tangosol.coherence.component.util.daemon.queueProcessor.service.grid.DistributedCache$BinaryMap.containsKey(DistributedCache.CDB:24)
at com.tangosol.util.ConverterCollections$ConverterMap.containsKey(ConverterCollections.java:1494)
at com.tangosol.coherence.component.util.daemon.queueProcessor.service.grid.DistributedCache$ViewMap.containsKey(DistributedCache.CDB:1)
at com.tangosol.coherence.component.util.SafeNamedCache.containsKey(SafeNamedCache.CDB:1)
at com.tangosol.net.cache.CachingMap.containsKey(CachingMap.java:400)
at com.traficon.tmsng.server.common.cache.impl.coherence.CoherenceMessageCacheFacade.hasDuplicate(CoherenceMessageCacheFacade.java:445)
at com.traficon.tmsng.server.common.cache.impl.coherence.CoherenceMessageCacheFacade.addEventMessage(CoherenceMessageCacheFacade.java:131)
at com.traficon.tmsng.server.common.process.impl.MessageProcessorImpl.process(MessageProcessorImpl.java:115)
at com.traficon.tmsng.server.common.message.ProcessMessageJob.execute(ProcessMessageJob.java:57)
at org.gridgain.grid.kernal.processors.job.GridJobWorker.body(GridJobWorker.java:406)
at org.gridgain.grid.util.runnable.GridRunnable$1.run(GridRunnable.java:142)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at org.gridgain.grid.util.runnable.GridRunnable.run(GridRunnable.java:194)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:619)
Any idea why this is? The call is not executed within a Coherence service thread, and I think I made sure that none of my custom filters, entryProcessors, valueUpdaters, etc. make any re-entrant calls.
Below you can find the coherence-config.xml:
<cache-config>
<caching-scheme-mapping>
<cache-mapping>
<cache-name>DATA_STORE</cache-name>
<scheme-name>outputNear</scheme-name>
</cache-mapping>
<cache-mapping>
<cache-name>EVENTS</cache-name>
<scheme-name>processNear</scheme-name>
</cache-mapping>
<cache-mapping>
<cache-name>REDUNDANT_EXECUTOR</cache-name>
<scheme-name>processNear</scheme-name>
</cache-mapping>
<cache-mapping>
<cache-name>NODE_MONITOR</cache-name>
<scheme-name>processNear</scheme-name>
</cache-mapping>
<cache-mapping>
<cache-name>MEMORY_MONITOR</cache-name>
<scheme-name>processNear</scheme-name>
</cache-mapping>
<cache-mapping>
<cache-name>DATABASE_MONITOR</cache-name>
<scheme-name>processNear</scheme-name>
</cache-mapping>
<cache-mapping>
<cache-name>DETECTOR_COMMUNICATOR_MANAGER_LISTENER</cache-name>
<scheme-name>inputNear</scheme-name>
</cache-mapping>
<cache-mapping>
<cache-name>REMOTE_FILTER_MONITOR</cache-name>
<scheme-name>processNear</scheme-name>
</cache-mapping>
<cache-mapping>
<cache-name>LOCK_MONITOR</cache-name>
<scheme-name>processNear</scheme-name>
</cache-mapping>
<cache-mapping>
<cache-name>EVENT_NUMBER</cache-name>
<scheme-name>processNear</scheme-name>
</cache-mapping>
<cache-mapping>
<cache-name>VERIFY_CONFIGURATION</cache-name>
<scheme-name>processNear</scheme-name>
</cache-mapping>
<cache-mapping>
<cache-name>OPEN_UNKNOWN_CONFIGURATION</cache-name>
<scheme-name>processNear</scheme-name>
</cache-mapping>
</caching-scheme-mapping>
<caching-schemes>
<near-scheme>
<scheme-name>inputNear</scheme-name>
<front-scheme>
<local-scheme>
<service-name>InputFrontService</service-name>
<eviction-policy>LRU</eviction-policy>
<high-units>10000</high-units>
<expiry-delay>10m</expiry-delay>
<flush-delay>5m</flush-delay>
</local-scheme>
</front-scheme>
<back-scheme>
<distributed-scheme>
<service-name>InputBackService</service-name>
<backing-map-scheme>
<local-scheme>
<service-name>InputBackLocalService</service-name>
</local-scheme>
</backing-map-scheme>
<local-storage system-property="tangosol.coherence.inputNode">false</local-storage>
<autostart system-property="tangosol.coherence.inputNode">false</autostart>
</distributed-scheme>
</back-scheme>
<invalidation-strategy>present</invalidation-strategy>
<autostart system-property="tangosol.coherence.inputNode">false</autostart>
</near-scheme>
<near-scheme>
<scheme-name>processNear</scheme-name>
<front-scheme>
<local-scheme>
<service-name>ProcessFrontService</service-name>
<eviction-policy>LRU</eviction-policy>
<high-units>10000</high-units>
<expiry-delay>10m</expiry-delay>
<flush-delay>5m</flush-delay>
</local-scheme>
</front-scheme>
<back-scheme>
<distributed-scheme>
<service-name>ProcessBackService</service-name>
<backing-map-scheme>
<local-scheme>
<service-name>ProcessBackLocalService</service-name>
</local-scheme>
</backing-map-scheme>
<local-storage system-property="tangosol.coherence.processNode">false</local-storage>
<autostart system-property="tangosol.coherence.processNode">false</autostart>
</distributed-scheme>
</back-scheme>
<invalidation-strategy>present</invalidation-strategy>
<autostart system-property="tangosol.coherence.processNode">false</autostart>
</near-scheme>
<near-scheme>
<scheme-name>outputNear</scheme-name>
<front-scheme>
<local-scheme>
<service-name>OutputFrontService</service-name>
<eviction-policy>LRU</eviction-policy>
<high-units>100</high-units>
<expiry-delay>5m</expiry-delay>
<flush-delay>1m</flush-delay>
</local-scheme>
</front-scheme>
<back-scheme>
<distributed-scheme>
<service-name>OutputBackService</service-name>
<backing-map-scheme>
<read-write-backing-map-scheme>
<internal-cache-scheme>
<local-scheme>
<service-name>OutputBackLocalService</service-name>
<eviction-policy>LRU</eviction-policy>
<high-units>1000</high-units>
<expiry-delay>10m</expiry-delay>
<flush-delay>1m</flush-delay>
</local-scheme>
</internal-cache-scheme>
<cachestore-scheme>
<class-scheme>
<scheme-name>outputBackStore</scheme-name>
<class-name>spring-bean:dataStore</class-name>
</class-scheme>
</cachestore-scheme>
<write-delay>20</write-delay>
<write-batch-factor>1</write-batch-factor>
</read-write-backing-map-scheme>
</backing-map-scheme>
<local-storage system-property="tangosol.coherence.outputNode">false</local-storage>
<autostart system-property="tangosol.coherence.outputNode">false</autostart>
</distributed-scheme>
</back-scheme>
<invalidation-strategy>auto</invalidation-strategy>
<autostart system-property="tangosol.coherence.outputNode">false</autostart>
</near-scheme>
</caching-schemes>
</cache-config>
Best regards
Jan
Hi Jan,
you should not inject Coherence caches into Spring beans and Spring beans into Coherence-managed objects from the same bean-factory/application-context.
If you want to do so, then you should use a parent application context to be used by Coherence-managed objects, and another (child or one that is initialized later) application context to define beans that have Coherence caches injected into them.
You might want to look at Spring classes containing SingletonBeanFactoryLocator in their names for easily defining multiple application contexts in a hierarchy.
Best regards,
Robert -
ORA-00060: Deadlock detected
I am getting the error "ORA-00060: Deadlock detected. More info in file /d01/oracle/VIS/db/tech_st/10.2.0/.................". Following are my observations on the occurrence of this error.
The deadlock is first encountered when trying to log in to the applications. I have an R12 Vision instance on Linux.
Following is the content of the alert_VIS.log file:
Mon Jun 15 04:41:41 2009
Starting ORACLE instance (normal)
LICENSE_MAX_SESSION = 0
LICENSE_SESSIONS_WARNING = 0
Picked latch-free SCN scheme 3
Using LOG_ARCHIVE_DEST_1 parameter default value as /d01/oracle/VIS/db/tech_st/10.2.0/dbs/arch
Autotune of undo retention is turned on.
IMODE=BR
ILAT =44
LICENSE_MAX_USERS = 0
SYS auditing is disabled
ksdpec: called for event 13740 prior to event group initialization
Starting up ORACLE RDBMS Version: 10.2.0.3.0.
System parameters with non-default values:
tracefiles_public = TRUE
processes = 200
sessions = 400
timed_statistics = TRUE
shared_pool_size = 419430400
shared_pool_reserved_size= 41943040
nls_language = american
nls_territory = america
nls_sort = binary
nls_date_format = DD-MON-RR
nls_numeric_characters = .,
nls_comp = binary
nls_length_semantics = BYTE
sga_target = 1073741824
control_files = /d01/oracle/VIS/db/apps_st/data/cntrl01.dbf, /d01/oracle/VIS/db/apps_st/data/cntrl02.dbf, /d01/oracle/VIS/db/apps_st/data/cntrl03.dbf
db_block_checksum = TRUE
db_block_size = 8192
compatible = 10.2.0
log_buffer = 14251008
log_checkpoint_interval = 100000
log_checkpoint_timeout = 1200
db_files = 512
log_checkpoints_to_alert = TRUE
dml_locks = 10000
undo_management = AUTO
undo_tablespace = APPS_UNDOTS1
db_block_checking = FALSE
O7_DICTIONARY_ACCESSIBILITY= FALSE
session_cached_cursors = 500
utl_file_dir = /usr/tmp, /usr/tmp, /d01/oracle/VIS/db/tech_st/10.2.0/appsutil/outbound/VIS_oracleebsr12, /usr/tmp
plsql_native_library_dir = /d01/oracle/VIS/db/tech_st/10.2.0/plsql/nativelib
plsql_native_library_subdir_count= 149
plsql_code_type = native
plsql_optimize_level = 2
job_queue_processes = 2
systemtrig_enabled = TRUE
cursor_sharing = EXACT
parallel_min_servers = 0
parallel_max_servers = 8
background_dump_dest = /d01/oracle/VIS/db/tech_st/10.2.0/admin/VIS_oracleebsr12/bdump
user_dump_dest = /d01/oracle/VIS/db/tech_st/10.2.0/admin/VIS_oracleebsr12/udump
max_dump_file_size = 20480
core_dump_dest = /d01/oracle/VIS/db/tech_st/10.2.0/admin/VIS_oracleebsr12/cdump
db_name = VIS
open_cursors = 600
ifile = /d01/oracle/VIS/db/tech_st/10.2.0/dbs/VIS_oracleebsr12_ifile.ora
sortelimination_cost_ratio= 5
btree_bitmap_plans = FALSE
fastfull_scan_enabled = FALSE
sqlexecprogression_cost= 2147483647
likewith_bind_as_equality= TRUE
pga_aggregate_target = 1073741824
workarea_size_policy = AUTO
optimizer_secure_view_merging= FALSE
aq_tm_processes = 1
olap_page_pool_size = 4194304
Mon Jun 15 04:42:05 2009
WARNING:Oracle instance running on a system with low open file descriptor
limit. Tune your system to increase this limit to avoid
severe performance degradation.
PSP0 started with pid=3, OS id=6824
PMON started with pid=2, OS id=6822
MMAN started with pid=4, OS id=6826
DBW0 started with pid=5, OS id=6828
CKPT started with pid=7, OS id=6832
SMON started with pid=8, OS id=6834
RECO started with pid=9, OS id=6836
CJQ0 started with pid=10, OS id=6838
LGWR started with pid=6, OS id=6830
MMON started with pid=11, OS id=6840
MMNL started with pid=12, OS id=6842
Mon Jun 15 04:42:19 2009
ALTER DATABASE MOUNT
Mon Jun 15 04:42:25 2009
Setting recovery target incarnation to 2
Mon Jun 15 04:42:27 2009
Successful mount of redo thread 1, with mount id 243370348
Mon Jun 15 04:42:27 2009
Database mounted in Exclusive Mode
Completed: ALTER DATABASE MOUNT
Mon Jun 15 04:42:28 2009
ALTER DATABASE OPEN
Mon Jun 15 04:42:48 2009
Thread 1 opened at log sequence 16
Current log# 3 seq# 16 mem# 0: /d01/oracle/VIS/db/apps_st/data/log3.dbf
Successful open of redo thread 1
Mon Jun 15 04:42:48 2009
MTTR advisory is disabled because FAST_START_MTTR_TARGET is not set
Mon Jun 15 04:42:48 2009
SMON: enabling cache recovery
Mon Jun 15 04:42:48 2009
Incremental checkpoint up to RBA [0x10.a779.0], current log tail at RBA [0x10.a779.0]
Mon Jun 15 04:43:01 2009
Successfully onlined Undo Tablespace 18.
Mon Jun 15 04:43:01 2009
SMON: enabling tx recovery
Mon Jun 15 04:43:04 2009
Database Characterset is UTF8
Mon Jun 15 04:43:18 2009
replication_dependency_tracking turned off (no async multimaster replication found)
Mon Jun 15 04:43:44 2009
Starting background process QMNC
QMNC started with pid=14, OS id=6884
Mon Jun 15 04:46:48 2009
Completed: ALTER DATABASE OPEN
Mon Jun 15 05:03:23 2009
Incremental checkpoint up to RBA [0x10.b1bd.0], current log tail at RBA [0x10.b1f3.0]
Mon Jun 15 05:23:33 2009
Incremental checkpoint up to RBA [0x10.b5b3.0], current log tail at RBA [0x10.b5c2.0]
Mon Jun 15 05:45:12 2009
Incremental checkpoint up to RBA [0x10.b7b0.0], current log tail at RBA [0x10.fbce.0]
This is up to the point where all DB and application services have been started.
When I try to log in to the applications, the following content is appended to the log file:
Mon Jun 15 05:53:39 2009
ORA-00060: Deadlock detected. More info in file /d01/oracle/VIS/db/tech_st/10.2.0/admin/VIS_oracleebsr12/udump/vis_ora_8149.trc.
Mon Jun 15 05:53:51 2009
ORA-00060: Deadlock detected. More info in file /d01/oracle/VIS/db/tech_st/10.2.0/admin/VIS_oracleebsr12/udump/vis_ora_8149.trc.
Mon Jun 15 05:54:02 2009
ORA-00060: Deadlock detected. More info in file /d01/oracle/VIS/db/tech_st/10.2.0/admin/VIS_oracleebsr12/udump/vis_ora_8149.trc.
Mon Jun 15 05:54:12 2009
ORA-00060: Deadlock detected. More info in file /d01/oracle/VIS/db/tech_st/10.2.0/admin/VIS_oracleebsr12/udump/vis_ora_8149.trc.
Mon Jun 15 05:54:22 2009
ORA-00060: Deadlock detected. More info in file /d01/oracle/VIS/db/tech_st/10.2.0/admin/VIS_oracleebsr12/udump/vis_ora_8149.trc.
Mon Jun 15 05:54:28 2009
ORA-00060: Deadlock detected. More info in file /d01/oracle/VIS/db/tech_st/10.2.0/admin/VIS_oracleebsr12/udump/vis_ora_8149.trc.
Mon Jun 15 05:54:35 2009
ORA-00060: Deadlock detected. More info in file /d01/oracle/VIS/db/tech_st/10.2.0/admin/VIS_oracleebsr12/udump/vis_ora_8149.trc.
Mon Jun 15 05:54:42 2009
ORA-00060: Deadlock detected. More info in file /d01/oracle/VIS/db/tech_st/10.2.0/admin/VIS_oracleebsr12/udump/vis_ora_8149.trc.
Mon Jun 15 05:59:06 2009
Process J000 died, see its trace file
Mon Jun 15 05:59:11 2009
kkjcre1p: unable to spawn jobq slave process
Mon Jun 15 05:59:11 2009
Errors in file /d01/oracle/VIS/db/tech_st/10.2.0/admin/VIS_oracleebsr12/bdump/vis_cjq0_6838.trc:
Mon Jun 15 05:59:23 2009
Process J000 died, see its trace file
Mon Jun 15 05:59:24 2009
kkjcre1p: unable to spawn jobq slave process
Mon Jun 15 05:59:24 2009
Errors in file /d01/oracle/VIS/db/tech_st/10.2.0/admin/VIS_oracleebsr12/bdump/vis_cjq0_6838.trc:
Mon Jun 15 05:59:50 2009
Process J000 died, see its trace file
Mon Jun 15 05:59:50 2009
kkjcre1p: unable to spawn jobq slave process
Mon Jun 15 05:59:50 2009
Errors in file /d01/oracle/VIS/db/tech_st/10.2.0/admin/VIS_oracleebsr12/bdump/vis_cjq0_6838.trc:
I ran TKPROF on the .trc files, but the TKPROF output does not show any details beyond something like the following (I don't know if I am missing anything when running TKPROF as: $ tkprof filename.trc filename.txt explain=apps/apps):
TKPROF: Release 10.2.0.3.0 - Production on Mon Jun 15 06:07:14 2009
Copyright (c) 1982, 2005, Oracle. All rights reserved.
Trace file: /d01/oracle/VIS/db/tech_st/10.2.0/admin/VIS_oracleebsr12/bdump/vis_cjq0_6838.trc
Sort options: default
count = number of times OCI procedure was executed
cpu = cpu time in seconds executing
elapsed = elapsed time in seconds executing
disk = number of physical reads of buffers from disk
query = number of buffers gotten for consistent read
current = number of buffers gotten in current mode (usually for update)
rows = number of rows processed by the fetch or execute call
0 statements EXPLAINed in this session.
Trace file: /d01/oracle/VIS/db/tech_st/10.2.0/admin/VIS_oracleebsr12/bdump/vis_cjq0_6838.trc
Trace file compatibility: 10.01.00
Sort options: default
1 session in tracefile.
0 user SQL statements in trace file.
0 internal SQL statements in trace file.
0 SQL statements in trace file.
0 unique SQL statements in trace file.
22 lines in trace file.
0 elapsed seconds in trace file.
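TKPROF only formats SQL execution statistics from a trace, so it will not show deadlock details. For ORA-00060, Oracle writes a deadlock graph as plain text into the trace file named in the alert log. Here that file is the udump trace vis_ora_8149.trc, whereas the TKPROF run above was against the bdump cjq0 trace. Reading that raw trace directly is usually more useful.

As a small aid, the sketch below lists every ORA-00060 entry in an alert log together with its timestamp and trace path. The class name `AlertLogDeadlocks` is made up, and the regular expressions are fitted to the lines shown above rather than to any documented alert-log format.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch: list every ORA-00060 entry in an alert log with its timestamp and trace file.
public class AlertLogDeadlocks {

    static final class Hit {
        final String timestamp;
        final String traceFile;

        Hit(String timestamp, String traceFile) {
            this.timestamp = timestamp;
            this.traceFile = traceFile;
        }
    }

    // Timestamp lines in this alert log look like "Mon Jun 15 05:53:39 2009".
    private static final Pattern TIMESTAMP = Pattern.compile(
            "^[A-Z][a-z]{2} [A-Z][a-z]{2} +\\d{1,2} \\d{2}:\\d{2}:\\d{2} \\d{4}$");
    // Captures the trace path, dropping the sentence-ending period after ".trc".
    private static final Pattern DEADLOCK = Pattern.compile(
            "ORA-00060: Deadlock detected\\. More info in file (\\S+?)\\.?$");

    static List<Hit> scan(List<String> lines) {
        List<Hit> hits = new ArrayList<>();
        String lastTimestamp = "(unknown)";
        for (String raw : lines) {
            String line = raw.trim();
            if (TIMESTAMP.matcher(line).matches()) {
                lastTimestamp = line; // remember the most recent timestamp line
                continue;
            }
            Matcher m = DEADLOCK.matcher(line);
            if (m.find()) {
                hits.add(new Hit(lastTimestamp, m.group(1)));
            }
        }
        return hits;
    }

    public static void main(String[] args) throws IOException {
        // ISO-8859-1 accepts any byte sequence, so odd bytes in the log cannot abort the read.
        List<String> lines = Files.readAllLines(Paths.get(args[0]), StandardCharsets.ISO_8859_1);
        for (Hit hit : scan(lines)) {
            System.out.println(hit.timestamp + " -> " + hit.traceFile);
        }
    }
}
```

In the alert log above, all eight ORA-00060 entries point to the same file, vis_ora_8149.trc, so that one trace is the place to start.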
Yesterday, after multiple attempts, I logged in to the applications and submitted a concurrent request for a standard report (after resolving the data block corruption issue), and got the same ORA-00060 error.
I have a fresh Vision R12 (12.0.4) installation without any customizations. It looks quite unstable: it takes 2-3 attempts to log in to the applications successfully.
Can you give me any clues about this and how to overcome the problem?
Thanks,
Amit
I have run cmclean.sql as described in: Re: R12 Vision install - Unable to submit concurrent request
This is the only change made; no new patches etc. Before running cmclean.sql, I believe the instance was working fine.
Now every time I start the application services, it causes the ORA-00060: Deadlock detected error. There are no issues with just the DB services up and running.
And once the apps services are up, trying to log in to the apps just hangs, with errors as follows:
Tue Jun 23 02:04:55 2009
Process J001 died, see its trace file
Tue Jun 23 02:04:55 2009
kkjcre1p: unable to spawn jobq slave process
Tue Jun 23 02:04:55 2009
Errors in file /d01/oracle/VIS/db/tech_st/10.2.0/admin/VIS_oracleebsr12/bdump/vis_cjq0_6747.trc:
Tue Jun 23 02:05:04 2009
Process q002 died, see its trace file
Tue Jun 23 02:05:04 2009
ksvcreate: Process(q002) creation failed
Tue Jun 23 02:05:55 2009
Process J000 died, see its trace file
Tue Jun 23 02:05:55 2009
kkjcre1p: unable to spawn jobq slave process
Tue Jun 23 02:05:55 2009
Errors in file /d01/oracle/VIS/db/tech_st/10.2.0/admin/VIS_oracleebsr12/bdump/vis_cjq0_6747.trc:
Tue Jun 23 02:06:11 2009
Process J000 died, see its trace file
Tue Jun 23 02:06:11 2009
kkjcre1p: unable to spawn jobq slave process
Tue Jun 23 02:06:11 2009
Errors in file /d01/oracle/VIS/db/tech_st/10.2.0/admin/VIS_oracleebsr12/bdump/vis_cjq0_6747.trc:
Tue Jun 23 02:08:51 2009
Process J000 died, see its trace file
Tue Jun 23 02:08:52 2009
kkjcre1p: unable to spawn jobq slave process
Tue Jun 23 02:08:52 2009
Errors in file /d01/oracle/VIS/db/tech_st/10.2.0/admin/VIS_oracleebsr12/bdump/vis_cjq0_6747.trc:
The OS (Linux) also hangs, and I have to exit abnormally every time, which is frustrating.
I am not sure of the reason. I have gone through the Metalink notes you pointed to, which say to install the health check engine.
Do you have any clues, based on the information above, as to what might be causing this problem?
I have 4 GB of RAM installed on my Windows host, and 2 GB has been allocated to Linux on VMware.
Please let me know if I need to upgrade the memory.
Any pointers would be really helpful.
Thanks,
Amit
Hi All,
We are facing problems with our smartform printing programs: some of them sometimes get stuck. In SM50 their processing time is endless, and the detailed trace log repeats a message like the following once every minute:
I WARNING: MtxLock 0x70000000636197c rrol0046 owner=33 deadlock ?
I've searched the forum, Google and SAP Notes but cannot find anything that matches my error, so could anyone please help analyse the exact cause of this problem? Any suggestions are highly appreciated.
The smartform program itself is simple and correct: it has one header and several items and uses the usual SSF_FUNCTION_MODULE_NAME and CALL FNAME to do the printing. Most of the time it works correctly, but sometimes it has problems and the deadlock warning appears. My guess is that a user tried to print faster than the printer could handle and then cancelled the job, and that in some situations the subsequent print jobs get blocked and stuck. But it's just a guess; I have no proof.
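The warning quoted above repeats every minute with the same mutex address, mutex name and owner. It is therefore worth confirming from the full dev_w38 trace that it is always the same holder, and over what time window. The sketch below groups the warnings and records the first and last timestamps seen for each. The class name is hypothetical. Reading owner=33 as the number of the work process that holds the mutex (something to check in SM50) is an assumption; the trace does not state it.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch: summarize repeated "MtxLock ... deadlock ?" warnings in an SAP dev_w* trace file.
public class MtxLockSummary {

    static final class Summary {
        String firstSeen;
        String lastSeen;
        int count;
    }

    // Mutex address, mutex name and owner, e.g. "0x70000000636197c rrol0046 owner=33".
    private static final Pattern WARNING = Pattern.compile(
            "WARNING: MtxLock (0x[0-9a-fA-F]+) (\\S+) owner=(\\d+) deadlock \\?");
    // Timestamp lines such as "I Thu Dec 9 14:05:00 2010", optionally with leftover "<br/>".
    private static final Pattern STAMP = Pattern.compile(
            "^(?:<br/>)?[A-Z] +([A-Z][a-z]{2} [A-Z][a-z]{2} +\\d{1,2} \\d{2}:\\d{2}:\\d{2} \\d{4})$");

    static Map<String, Summary> summarize(List<String> lines) {
        Map<String, Summary> byLock = new LinkedHashMap<>();
        String lastStamp = "(no timestamp)";
        for (String raw : lines) {
            String line = raw.trim();
            Matcher s = STAMP.matcher(line);
            if (s.matches()) {
                lastStamp = s.group(1); // remember the most recent trace timestamp
                continue;
            }
            Matcher w = WARNING.matcher(line);
            if (w.find()) {
                String key = w.group(1) + " " + w.group(2) + " owner=" + w.group(3);
                Summary sum = byLock.computeIfAbsent(key, k -> new Summary());
                if (sum.count == 0) {
                    sum.firstSeen = lastStamp;
                }
                sum.lastSeen = lastStamp;
                sum.count++;
            }
        }
        return byLock;
    }

    public static void main(String[] args) throws IOException {
        List<String> lines = Files.readAllLines(Paths.get(args[0]), StandardCharsets.ISO_8859_1);
        for (Map.Entry<String, Summary> e : summarize(lines).entrySet()) {
            Summary s = e.getValue();
            System.out.println(e.getKey() + ": " + s.count + " warnings, "
                    + s.firstSeen + " .. " + s.lastSeen);
        }
    }
}
```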
I have attached the full trace log below; I hope someone can give me an idea. Thank you very much!
----
trc file: "dev_w38", trc level: 1, release: "701"
----
M Thu Dec 9 14:03:55 2010
M db_connect o.k.
M ICT: exclude compression: .zip,.cs,.rar,.arj,.z,.gz,.tar,.lzh,.cab,.hqx,.ace,.jar,.ear,.war,.css,.pdf,.js,.gzip
I MtxInit: 38 0 0
M SHM_PRES_BUF (addr: 0x700001050000000, size: 44000000)
M SHM_ROLL_AREA (addr: 0x700001060000000, size: 536870912)
M SHM_PAGING_AREA (addr: 0x700001080000000, size: 536870912)
M SHM_ROLL_ADM (addr: 0x700000006221000, size: 5506336)
M SHM_PAGING_ADM (addr: 0x7000010a0000000, size: 1311776)
M ThCreateNoBuffer allocated 544152 bytes for 1000 entries at 0x7000010b0002000
M ThCreateNoBuffer index size: 3000 elems
M ThCreateVBAdm allocated 11776 bytes (50 server) at 0x7000010d0000000
X EmInit: MmSetImplementation( 2 ).
X MM global diagnostic options set: 0
X EM/TOTAL_SIZE_MB = 262144
X mm.dump: set maximum dump mem to 96 MB
M Deactivate statistics hyper index locking
I *** INFO Shm 44 in Pool 40 18928 KB estimated 14500 KB real ( -4427 KB -24 %)
I *** INFO Shm 45 in Pool 40 12928 KB estimated 8500 KB real ( -4427 KB -35 %)
B dbntab: NTAB buffers attached
B dbntab: Buffer FTAB(hash header) (addr: 0x7000010b0088088, size: 584)
B dbntab: Buffer FTAB(anchor array) (addr: 0x7000010b00882d0, size: 1280008)
B dbntab: Buffer FTAB(item array) (addr: 0x7000010b01c0ad8, size: 5120000)
B dbntab: Buffer FTAB(data area) (addr: 0x7000010b06a2ad8, size: 122880000)
B dbntab: Buffer IREC(hash header) (addr: 0x7000010b7bd4088, size: 584)
B dbntab: Buffer IREC(anchor array) (addr: 0x7000010b7bd42d0, size: 1280008)
B dbntab: Buffer IREC(item array) (addr: 0x7000010b7d0cad8, size: 1280000)
B dbntab: Buffer IREC(data area) (addr: 0x7000010b7e452d8, size: 12288000)
B dbntab: Buffer STAB(hash header) (addr: 0x7000010b89ff088, size: 584)
B dbntab: Buffer STAB(anchor array) (addr: 0x7000010b89ff2d0, size: 1280008)
B dbntab: Buffer STAB(item array) (addr: 0x7000010b8b37ad8, size: 1280000)
B dbntab: Buffer STAB(data area) (addr: 0x7000010b8c702d8, size: 6144000)
B dbntab: Buffer TTAB(hash header) (addr: 0x7000010b924e088, size: 6720)
B dbntab: Buffer TTAB(anchor array) (addr: 0x7000010b924fac8, size: 1280008)
B dbntab: Buffer TTAB(item array) (addr: 0x7000010b93882d0, size: 3200000)
B dbntab: Buffer TTAB(data area) (addr: 0x7000010b96956d0, size: 23360000)
B db_con_shm_ini: WP_ID = 38, WP_CNT = 59, CON_ID = -1
B dbstat: TABSTAT buffer attached (addr: 0x7000010f002d2d0)
B dbtbxbuf: Buffer TABL (addr: 0x700001100000100, size: 180000000, end: 0x70000110aba9600)
B dbtbxbuf: Buffer TABLP (addr: 0x700000006763100, size: 20480000, end: 0x700000007aeb100)
B dbexpbuf: Buffer EIBUF (addr: 0x700000007aec108, size: 67108864, end: 0x70000000baec108)
B dbexpbuf: Buffer ESM (addr: 0x700001110000108, size: 4194304, end: 0x700001110400108)
B dbexpbuf: Buffer CUA (addr: 0x7000010bace2108, size: 18432000, end: 0x7000010bbe76108)
B dbexpbuf: Buffer OTR (addr: 0x700001120000108, size: 4194304, end: 0x700001120400108)
B dbcalbuf: Buffer CALE (addr: 0x70000000baee000, size: 500000, end: 0x70000000bb68120)
M CCMS: AlInitGlobals : alert/use_sema_lock = TRUE.
S *** init spool environment
S TSPEVJOB updates inside critical section: event_update_nocsec = 0
S initialize debug system
T Stack direction is downwards.
T debug control: prepare exclude for printer trace
T new memory block 0x114388060
S spool kernel/ddic check: Ok
S using table TSP02FX for frontend printing
S 1 spool work process(es) found
S frontend print via spool service enabled
S printer list size is 150
S printer type list size is 50
S queue size (profile) = 300
S hostspool list size = 3000
S option list size is 30
I *** INFO Shm 49 in Pool 40 2898 KB estimated 1632 KB real ( -1266 KB -44 %)
S found processing queue enabled
S found spool memory service RSPO-RCLOCKS at 0x7000010bbe77070
S doing lock recovery
S setting server cache root
S found spool memory service RSPO-SERVERCACHE at 0x7000010bbe78160
S using messages for server info
S size of spec char cache entry: 297032 bytes (timeout 100 sec)
S size of open spool request entry: 2512 bytes
S immediate print option for implicitely closed spool requests is disabled
A **GENER Trace switched on ***
A
A -PXA--
A PXA INITIALIZATION
A PXA: Locked PXA-Semaphore.
A System page size: 4kb, total admin_size: 237304kb, dir_size: 58960kb.
A Attached to PXA (address 0x700001130000000, size 3000000K, 4 fragments of 690676K )
A
A Thu Dec 9 14:03:59 2010
A abap/pxa = shared unprotect gen_remote
A PXA INITIALIZATION FINISHED
A -PXA--
A
A ABAP ShmAdm attached (addr=0x700000f4046c000 leng=20955136 end=0x700000f41868000)
A >> Shm MMADM area (addr=0x700000f40915418 leng=247168 end=0x700000f40951998)
A >> Shm MMDAT area (addr=0x700000f40952000 leng=15818752 end=0x700000f41868000)
A RFC rfc/signon_error_log = -1
A RFC rfc/dump_connection_info = 0
A RFC rfc/dump_client_info = 0
A RFC rfc/cp_convert/ignore_error = 1
A RFC rfc/cp_convert/conversion_char = 23
A RFC rfc/wan_compress/threshold = 251
A RFC rfc/recorder_pcs not set, use defaule value: 2
A RFC rfc/delta_trc_level not set, use default value: 0
A RFC rfc/no_uuid_check not set, use default value: 0
A RFC rfc/bc_ignore_thcmaccp_retcode not set, use default value: 0
A RFC Method> initialize RemObjDriver for ABAP Objects
M ThrCreateShObjects allocated 122630 bytes at 0x70000000c124000
N SsfSapSecin: putenv(SECUDIR=/usr/sap/PRD/DVEBMGS00/sec): ok
N
N =================================================
N === SSF INITIALIZATION:
N ===...SSF Security Toolkit name SAPSECULIB .
N ===...SSF library is /usr/sap/PRD/DVEBMGS00/exe/libsapcrypto.o .
N ===...SSF default hash algorithm is SHA1 .
N ===...SSF default symmetric encryption algorithm is DES-CBC .
N ===...SECUDIR="/usr/sap/PRD/DVEBMGS00/sec"
N ===...loading of Security Toolkit successfully completed.
N === SAPCRYPTOLIB 5.5.5C pl29 (Jan 30 2010) MT-safe
N =================================================
N MskiInitLogonTicketCacheHandle: Logon Ticket cache pointer retrieved from shared memory.
N MskiInitLogonTicketCacheHandle: Workprocess runs with Logon Ticket cache.
M JrfcVmcRegisterNativesDriver o.k.
W =================================================
W === ipl_Init() called
B dbtran INFO (init_connection '<DEFAULT>' [ORACLE:700.08]):
B max_blocking_factor = 5, max_in_blocking_factor = 5,
B min_blocking_factor = 5, min_in_blocking_factor = 5,
B prefer_union_all = 0, prefer_join = 0,
B prefer_fix_blocking = 0, prefer_in_itab_opt = 1,
B convert AVG = 0, alias table FUPD = 0,
B escape_as_literal = 1, opt GE LE to BETWEEN = 0,
B select * =0x0f, character encoding = STD / <none>:-,
B use_hints = abap->1, dbif->0x1, upto->2147483647, rule_in->0,
B rule_fae->0, concat_fae->0, concat_fae_or->0
W ITS Plugin: Path dw_gui
W ITS Plugin: Description ITS Plugin - ITS rendering DLL
W ITS Plugin: sizeof(SAP_UC) 2
W ITS Plugin: Release: 701, [7010.0.97.20020600]
W ITS Plugin: Int.version, [33]
W ITS Plugin: Feature set: [22]
W ===... Calling itsp_Init in external dll ===>
W PpioRecoverLocks, table: 0x700000f418f2778
W PpioRecoverLocks, number of file locks 256
W PpioRecoverLocks: file lock set to: (nil)
W PpioRecoverLocks: directory lock set to: (nil)
W PpioRecoverLocks: global lock set to: (nil)
W PpioRecoverLocks() done
W PprcRecoverLocks, table: 0x700000f418f27e8
W PprcRecoverLocks: directory lock set to: (nil)
W PprcRecoverLocks() done
W === ipl_Init() returns 0, ITSPE_OK: OK
W =================================================
N VSI: WP init in ABAP VM completed with rc=0
E Profile-Parameter: enque/deque_wait_answer = FALSE
E Profile-Parameter: enque/sync_dequeall = 0
E EnqId_SuppressIpc: local EnqId initialization o.k.
E EnqCcInitialize: local enqueue client init o.k.
M ThCheckPrevUser: previous user was T78/M0, clean counter 0
M ThCheckPrevUser: clean previous user T78/U26013/M0/I2/V-1
M
M Modeinfo for User T78/M0
M
M tm state = 4
M uid = 26013
M term type = 0x4
M display = 0x8
M cpic_no = 0
M cpic_idx = -1
M usr = >8000199 <
M terminal = >ceegsap20 <
M client = >800<
M conversation_ID = > <
M appc_tm_conv_idx = -1
M its_plugin = NO
M allowCreateMode = YES
M wp_ca block = -1
M appc_ca block = -1
M blockSoftCanel = NO
M session_id = >4CFF77CE4A6A0068E10080000A04C87E<
M ext_session_id = >4CFF77CE4A6A0068E10080000A04C87E<
M imode = 2
M mode state = 0x1a
M mode clean_state = 2
M task_type = ZTTADIA
M lastThFc = THFCTERM
M lastAction = TH_IACT_NO_ACTION
M th_errno = 0
M rollout_reason = 1
M last_rollout_level = 7
M async_receives = 0
M cpic_receive = 0
M em handle = 67
M roll state = 3
M abap state = 3
M em state = 2
M eg state = 1
M spa state = 3
M enq state = 0
M softcancel = 1
M cancelInitiator = DISPATCHER
M clean_state = DP_SOFTCANCEL
M next hook = T-1/U-1/M255
M master hook = T-1/U-1/M255
M slave hook = T-1/U-1/M255
M debug_tid = 255
M debug_mode = 0
M mode type = 0x1
M debug = 0
M msg_count = 6
M tcode = >ZPP015 <
M last_wp = 38
M client conversation_ID = > <
M server conversation_ID = > <
M lock = 0
M max enq infos = 0
M act enq infos = 0
M em_hyper_hdl = 0x700000f41d918e8
M plugin_info = NULL
M act_plugin_hdl = -1
M act_plugin_no = 0
M max_plugin_no = 0
M
M ThCheckPrevUser: reset spa state for user T78/U26013/M0
M ThSetDoSafeCleanup: th_do_safe_cleanup = FALSE (wanted FALSE)
M LOCK WP ca_blk 44
M ThAtWpBlk: set zttatiln to zero
M ThAtWpBlk: set zttatoln to zero
M DpVmcGetVmByTmAdm: no VM found for T78/M0/I2
M LOCK APPC ca_blk 640
M set task type ZTTADIA
M ThCleanPrevUser: clean U26013 T78 M0 I2 no VM clean state DP_SOFTCANCEL clean counter 1
M ThCleanPrevUser: saved MODE_REC = 10
M PfStatDisconnect: disconnect statistics
M ThCleanPrevUser: found soft cancel flag
M ThSoftCancel: set clean state of T78/M0 to DP_DEFAULT_CLEANING
M ThSoftCancel session in state TM_DISCONNECTED, delete mode
M ThIAMDel: delete tid/mode 78/0 (th_errno 47, release 1)
M ThIDeleteMode (78, 0, 3, ><, 0, 255, TRUE)
M ThIDeleteMode: no modes found ..
M no sub modes
M ThCheckMemoryState (0, 0, 1)
M ThRollIn: roll in T78/U26013/M0/I2 (level=7, abap_level=1, attach_em=1)
M ThCheckEmState: check ATTACH for em hdl 67
M ThCheckEmState: call EmContextAttach (em_hdl=67)
I Thu Dec 9 14:05:00 2010
I WARNING: MtxLock 0x70000000636197c rrol0046 owner=33 deadlock ?
I Thu Dec 9 14:06:00 2010
I WARNING: MtxLock 0x70000000636197c rrol0046 owner=33 deadlock ?
I Thu Dec 9 14:07:00 2010
I WARNING: MtxLock 0x70000000636197c rrol0046 owner=33 deadlock ?
I Thu Dec 9 14:08:00 2010
I WARNING: MtxLock 0x70000000636197c rrol0046 owner=33 deadlock ?
Hi Sitarama,
Thanks very much for your quick reply. Yes, the smartforms are custom-built. Besides the deadlock warning, there are also errors in SP01 saying "Could not pass request to host spool system", even though the request previews fine in SP01. The detailed SP01 error log is as follows:
Print request processing log
Errors occurred processing this print request
Error during print request output. l_rc = 99
There may be no printout
Most important attributes of spool request
Request number 2828
Request name SMART LOCA 8000199
Client 800
Owner 8000199
Request attributes
Time created 2010120906014800
Remaining life +00007235800
Dispo 1 (Go/Hold) G
Dispo 2 (Keep/Delete) D
Dispo 3 (Indirect/Direct) D
Default output device LOCA
Default no. copies 1
Format ZTEST
Main print request characteristics
Spool request number 2828
Print request number 1
Print request attributes
Time created 2010120906020500
Output device LOCA
Format ZTEST
What do you mean by parallel processing at the table level? The smartforms read data from different SAP tables and pass it to the smartform interface, which then places it on the form pages. Do you think this results in parallel processing against the tables?
Thank you, and I hope to hear more from you.
Best Regards,
Jeff
Cache and/or Connection problems under load
I have a Kodo web app that's been running just fine in
production for many months now. However, recently the web
traffic has shot up by a huge amount, literally overnight.
But unfortunately, it's caused the app to fail very ungracefully
under the strain.
It's been a crazy few days, and I haven't been able to do
very much analysis because of higher priorities. But from
what I have been able to glean, it now looks like Kodo is
the most likely culprit. From what I've read in other messages
here, it appears others may have been experiencing similar
problems.
My environment: Redhat Linux 8, Postgres 7.3.4 with the
included JDBC3 driver, Apache 1.3.x, Tomcat 4.1.x and the
webapp connector. Similar behavior was seen with Apache 2.x,
Tomcat 4.1.x and the JK2 connector (that was on the new machine
I set up to handle the new traffic, which, of course, died the
night before).
As I mentioned, this app has been running reliably for
months with no problems. But when placed under heavy load,
it appears to get into some sort of pathological state where
it slows down dramatically (asymptotically?) to the point where
it's effectively locked up. In one case, where the app was
left running for several hours in this state, requests were
taking 90 minutes to complete (normal is 1-5 seconds).
From what I can deduce, there seem to be four things
going on, three of which have been mentioned in recent threads
here:
1) Excessive memory consumption. When the app is
operating normally, I see fairly flat memory usage for
the JVM process. Under load, the JVM steadily expands
until it hits its heap limit. I've gotten OutOfMemory
exceptions with a heap size of 350MB, which should be plenty.
2) Level 2 cache locking issues. I've seen dozens of
threads waiting on a lock in the DataCache code. Not sure
if there's a deadlock happening here or just that the
threads are waiting on a lock that's being held for a long time.
3) Database Connection leaks or contention. I see threads
spinning in the DataSource code trying to get a connection.
I also see dozens of connections from the Postgres side which
seem to be sitting idle, but in the middle of a transaction.
When things get bad, I also see exceptions being thrown because
of timeouts waiting for a connection to become available. It's
a web app; PMs should not be tied up for more than a few seconds.
4) CPU usage pegged or nearly so for the JVM. I suspect
this is related to #3. Something very bad is going on here.
If I stop all inbound requests to the JVM when it's in this
bad state, it will continue to burn CPU at 90%+ for a very
long time. I think it will eventually finish what it's doing,
but I haven't had the luxury of waiting for it. It's definitely
not a linear slowdown proportional to the load.
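Point 2 asks whether the DataCache waits are a true deadlock or just a long-held lock. On a current JVM, `ThreadMXBean.findDeadlockedThreads()` answers that for cycles through object monitors and `java.util.concurrent` ownable synchronizers. (`java.lang.management` arrived in Java 5 and this method in Java 6, so it is not available on the JVMs of this era.) It does not report cycles through locks built on `Object.wait`/`notify`, such as the oswego `WriterPreferenceReadWriteLock` in the stack trace further down. Those threads simply appear as WAITING, and comparing several successive thread dumps is the better test there. The sketch below, with a hypothetical class name, deliberately deadlocks two threads so the check has something to find.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.locks.ReentrantLock;

// Sketch: ask the JVM whether blocked threads form a real deadlock cycle.
public class DeadlockCheck {

    // One line per thread in a detected cycle; empty if the JVM finds no cycle.
    static List<String> describeDeadlocks() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        long[] ids = mx.findDeadlockedThreads(); // null when there is no cycle
        List<String> out = new ArrayList<>();
        if (ids == null) {
            return out;
        }
        for (ThreadInfo t : mx.getThreadInfo(ids)) {
            if (t != null) { // a thread may have exited between the two calls
                out.add(t.getThreadName() + " waits for " + t.getLockName()
                        + " held by " + t.getLockOwnerName());
            }
        }
        return out;
    }

    // Demo only: two daemon threads take two locks in opposite order and deadlock.
    static void startDemoDeadlock() {
        ReentrantLock a = new ReentrantLock();
        ReentrantLock b = new ReentrantLock();
        CountDownLatch bothHoldFirst = new CountDownLatch(2);
        Thread t1 = new Thread(() -> lockBoth(a, b, bothHoldFirst), "worker-1");
        Thread t2 = new Thread(() -> lockBoth(b, a, bothHoldFirst), "worker-2");
        t1.setDaemon(true); // daemon threads let the JVM exit despite the deadlock
        t2.setDaemon(true);
        t1.start();
        t2.start();
    }

    private static void lockBoth(ReentrantLock first, ReentrantLock second, CountDownLatch latch) {
        first.lock();
        latch.countDown();
        try {
            latch.await(); // make sure the other thread already holds its first lock
            second.lock(); // blocks forever: the other thread holds it and waits for ours
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        startDemoDeadlock();
        List<String> found = describeDeadlocks();
        for (int i = 0; i < 50 && found.isEmpty(); i++) {
            Thread.sleep(100); // give the threads time to block on their second lock
            found = describeDeadlocks();
        }
        found.forEach(System.out::println);
    }
}
```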
Attached are my kodo.properties file and some thread stack
traces along with some comments. Any advice would be greatly
appreciated. This is not a complicated app nor am I doing
anything unusual. It doesn't seem logical that Kodo could
break down so dramatically under load, so I'm hoping it's some
sort of interaction thing that I can work around.
Thanks.
Ron Hitchens {mailto:[email protected]} RonSoft Technologies
(510) 494-9597 (Home Office) http://www.ronsoft.com
(707) 924-3878 (fax) Bit Twiddling At Its Finest
"Born with a broken heart" -Kenny Wayne ShepardPlease read prior posts regarding level 2 cache. It is unusable under stress
as far I am concerned. Basically entire cache gets locked on any database
read. Makes it very unscalable
Are you using 2.5.3? It will request a connection from a pool every time it
resolves reference to a PC even if it is cached in PM and therefore Kodo
does not need to read any. As result if you iterate over 100 objects in your
query and for each object resolve reference to another object (always the
same) kodo will request 100 database connections from the pool (and note
they issue rollback on every time they return a connection to the pool so
getting connection might be fairly expensive)
In conjunction with level 2 cache contention this causes application to go
into a stupor.
Try to go back to 2.5.2 (or may be 2.5.4 they promised in the near future
with a workaround) or use "persistent-manager" connection retention if you
discard PM after each HTTP invocation - it will take care of connection
pooling issue. As far as L2 cache I was unable to find any work around so
far - see if you might be better of without cache. You might if your object
graph is not very complex
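The arithmetic in the reply (100 references resolved, 100 pool checkouts) can be modelled in a few lines. This does not use Kodo's API. `CountingPool` and `Resolver` are hypothetical stand-ins that only count checkouts. The model contrasts a resolver that asks the pool on every reference with one that keeps a single connection for the life of the PM, which is the effect the reply attributes to connection retention.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical model: counts how often a "persistence manager" checks out a pooled connection.
public class ConnectionRetentionDemo {

    // Stand-in for a JDBC pool; it only counts checkouts.
    static final class CountingPool {
        final AtomicInteger checkouts = new AtomicInteger();

        Object getConnection() {
            checkouts.incrementAndGet();
            return new Object(); // placeholder for a real java.sql.Connection
        }
    }

    interface Resolver {
        void resolveReference(); // resolve a reference to an object already cached in the PM
    }

    // Asks the pool for a connection on every resolution, as the reply describes for 2.5.3.
    static Resolver onDemand(CountingPool pool) {
        return () -> pool.getConnection();
    }

    // Checks out one connection on first use and keeps it for the PM's lifetime.
    static Resolver retained(CountingPool pool) {
        Object[] held = new Object[1];
        return () -> {
            if (held[0] == null) {
                held[0] = pool.getConnection();
            }
        };
    }

    static int checkoutsFor(boolean retain, int references) {
        CountingPool pool = new CountingPool();
        Resolver resolver = retain ? retained(pool) : onDemand(pool);
        for (int i = 0; i < references; i++) {
            resolver.resolveReference();
        }
        return pool.checkouts.get();
    }

    public static void main(String[] args) {
        System.out.println("on demand: " + checkoutsFor(false, 100)); // on demand: 100
        System.out.println("retained:  " + checkoutsFor(true, 100));  // retained:  1
    }
}
```

Per the reply, each real checkout also costs a rollback when the connection goes back to the pool. In the on-demand case that overhead is therefore paid once per resolved reference rather than once per PM.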
"Ron Hitchens" <[email protected]> wrote in message
news:[email protected]...
With cache enabled, 2.5.3
Here the app had recently slowed down and then effectively locked up.
There were many outstanding web requests that were not receiving output.
At this point most threads seemed to be waiting at the same location.
There were a large number of active database connections and most of
them had open transactions (according to pg_stat_activity). The app
was not responding to any web requests.
It would seem that db transactions had been started, then the thread
got stuck for a long time on a synchronization lock in the cache lookup.
Below are two randomly chosen thread stack dumps.
Thread-72[1] where
[1] java.lang.Object.wait (native method)
[2] java.lang.Object.wait (Object.java:429)
[3] oswego.cs.dl.util.concurrent.WriterPreferenceReadWriteLock$ReaderLock.acquire (WriterPreferenceReadWriteLock.java:169)
[4] com.solarmetric.kodo.runtime.datacache.AbstractCacheImpl.acquireReadLock (AbstractCacheImpl.java:384)
[5] com.solarmetric.kodo.runtime.datacache.TimedDataCache.acquireReadLock (TimedDataCache.java:256)
[6] com.solarmetric.kodo.runtime.datacache.DataCacheStoreManager.load(DataCacheStoreManager.java:595)
[7] com.solarmetric.kodo.runtime.StateManagerImpl.loadFields(StateManagerImpl.java:2,330)
[8] com.solarmetric.kodo.runtime.StateManagerImpl.isLoaded(StateManagerImpl.java:897)
[9] com.europeasap.data.City.jdoGetname (null)
[10] com.europeasap.data.City.getName (City.java:39)
[11] com.europeasap.form.CustomerBookingForm.populateDepartureCityInfo(CustomerBookingForm.java:922)
[12] com.europeasap.form.CustomerBookingForm.onetimeInit(CustomerBookingForm.java:871)
[13] com.europeasap.form.CustomerBookingForm.populatePackageInfo(CustomerBookingForm.java:880)
[14] com.europeasap.action.CustomizeTrip.perform (CustomizeTrip.java:66)
[15] org.apache.struts.action.ActionServlet.processActionPerform(ActionServlet.java:1,787)
[16] org.apache.struts.action.ActionServlet.process(ActionServlet.java:1,586)
[17] org.apache.struts.action.ActionServlet.doGet (ActionServlet.java:492)
[18] javax.servlet.http.HttpServlet.service (HttpServlet.java:740)
[19] javax.servlet.http.HttpServlet.service (HttpServlet.java:853)
[20] org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:247)
[21] org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:193)
[22] org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:260)
[23] org.apache.catalina.core.StandardPipeline$StandardPipelineValveContext.invokeNext (StandardPipeline.java:643)
[24] org.apache.catalina.core.StandardPipeline.invoke(StandardPipeline.java:480)
[25] org.apache.catalina.core.ContainerBase.invoke(ContainerBase.java:995)
[26] org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
[27] org.apache.catalina.core.StandardPipeline$StandardPipelineValveContext.invokeNext (StandardPipeline.java:643)
[28] org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:493)
[29] org.apache.catalina.core.StandardPipeline$StandardPipelineValveContext.invokeNext (StandardPipeline.java:641)
[30] org.apache.catalina.core.StandardPipeline.invoke(StandardPipeline.java:480)
[31] org.apache.catalina.core.ContainerBase.invoke(ContainerBase.java:995)
[32] org.apache.catalina.core.StandardContext.invoke(StandardContext.java:2,415)
[33] org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:180)
[34] org.apache.catalina.core.StandardPipeline$StandardPipelineValveContext.invokeNext (StandardPipeline.java:643)
[35] org.apache.catalina.valves.ErrorDispatcherValve.invoke(ErrorDispatcherValve.java:170)
[36] org.apache.catalina.core.StandardPipeline$StandardPipelineValveContext.invokeNext (StandardPipeline.java:641)
[37] org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:172)
[38] org.apache.catalina.core.StandardPipeline$StandardPipelineValveContext.invokeNext (StandardPipeline.java:641)
[39] org.apache.catalina.core.StandardPipeline.invoke(StandardPipeline.java:480)
[40] org.apache.catalina.core.ContainerBase.invoke(ContainerBase.java:995)
[41] org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:174)
[42] org.apache.catalina.core.StandardPipeline$StandardPipelineValveContext.invokeNext (StandardPipeline.java:643)
[43] org.apache.catalina.core.StandardPipeline.invoke(StandardPipeline.java:480)
[44] org.apache.catalina.core.ContainerBase.invoke(ContainerBase.java:995)
[45] org.apache.catalina.connector.warp.WarpRequestHandler.handle (null)
[46] org.apache.catalina.connector.warp.WarpConnection.run (null)
[47] java.lang.Thread.run (Thread.java:534)
Thread-64[1] where
[1] java.lang.Object.wait (native method)
[2] java.lang.Object.wait (Object.java:429)
[3] oswego.cs.dl.util.concurrent.WriterPreferenceReadWriteLock$ReaderLock.acquire (WriterPreferenceReadWriteLock.java:169)
[4] com.solarmetric.kodo.runtime.datacache.AbstractCacheImpl.acquireReadLock (AbstractCacheImpl.java:384)
[5] com.solarmetric.kodo.runtime.datacache.TimedDataCache.acquireReadLock (TimedDataCache.java:256)
[6] com.solarmetric.kodo.runtime.datacache.DataCacheStoreManager.load(DataCacheStoreManager.java:595)
[7] com.solarmetric.kodo.runtime.StateManagerImpl.loadField(StateManagerImpl.java:2,248)
[8] com.solarmetric.kodo.runtime.StateManagerImpl.isLoaded(StateManagerImpl.java:899)
[9] com.europeasap.data.HotelPrices.jdoGetseasonalPrices (null)
[10] com.europeasap.data.HotelPrices.normalizeIndex(HotelPrices.java:113)
[11] com.europeasap.data.HotelPrices.getCost (HotelPrices.java:45)
[12] com.europeasap.logic.CostHelper.findLowestHotel(CostHelper.java:181)
[13] com.europeasap.logic.CostHelper.computeBasePackageCost(CostHelper.java:297)
[14] com.europeasap.logic.CostHelper.computeFinalPackageCost(CostHelper.java:246)
[15] com.europeasap.form.CustomerBookingForm.updateDisplayCosts(CustomerBookingForm.java:1,440)
[16] com.europeasap.form.CustomerBookingForm.updateCustomizeDisplayInfo(CustomerBookingForm.java:1,407)
[17] com.europeasap.action.CustomizeTrip.perform (CustomizeTrip.java:68)
[18] org.apache.struts.action.ActionServlet.processActionPerform(ActionServlet.java:1,787)
[19] org.apache.struts.action.ActionServlet.process(ActionServlet.java:1,586)
[20] org.apache.struts.action.ActionServlet.doGet (ActionServlet.java:492)
[21] javax.servlet.http.HttpServlet.service (HttpServlet.java:740)
[22] javax.servlet.http.HttpServlet.service (HttpServlet.java:853)
[23] org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:247)
[24] org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:193)
[25] org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:260)
[26] org.apache.catalina.core.StandardPipeline$StandardPipelineValveContext.invokeNext (StandardPipeline.java:643)
[27] org.apache.catalina.core.StandardPipeline.invoke(StandardPipeline.java:480)
[28] org.apache.catalina.core.ContainerBase.invoke(ContainerBase.java:995)
[29] org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
[30] org.apache.catalina.core.StandardPipeline$StandardPipelineValveContext.invokeNext (StandardPipeline.java:643)
[31] org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:493)
[32] org.apache.catalina.core.StandardPipeline$StandardPipelineValveContext.invokeNext (StandardPipeline.java:641)
[33] org.apache.catalina.core.StandardPipeline.invoke(StandardPipeline.java:480)
[34] org.apache.catalina.core.ContainerBase.invoke(ContainerBase.java:995)
[35] org.apache.catalina.core.StandardContext.invoke(StandardContext.java:2,415)
[36] org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:180)
[37] org.apache.catalina.core.StandardPipeline$StandardPipelineValveContext.invokeNext (StandardPipeline.java:643)
[38] org.apache.catalina.valves.ErrorDispatcherValve.invoke(ErrorDispatcherValve.java:170)
[39] org.apache.catalina.core.StandardPipeline$StandardPipelineValveContext.invokeNext (StandardPipeline.java:641)
[40] org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:172)
[41] org.apache.catalina.core.StandardPipeline$StandardPipelineValveContext.invokeNext (StandardPipeline.java:641)
[42] org.apache.catalina.core.StandardPipeline.invoke(StandardPipeline.java:480)
[43] org.apache.catalina.core.ContainerBase.invoke(ContainerBase.java:995)
[44] org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:174)
[45] org.apache.catalina.core.StandardPipeline$StandardPipelineValveContext.invokeNext (StandardPipeline.java:643)
[46] org.apache.catalina.core.StandardPipeline.invoke(StandardPipeline.java:480)
[47] org.apache.catalina.core.ContainerBase.invoke(ContainerBase.java:995)
[48] org.apache.catalina.connector.warp.WarpRequestHandler.handle (null)
[49] org.apache.catalina.connector.warp.WarpConnection.run (null)
[50] java.lang.Thread.run (Thread.java:534)
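Both dumps are parked in WriterPreferenceReadWriteLock$ReaderLock.acquire. In Doug Lea's WriterPreferenceReadWriteLock, new readers are held back whenever a writer is active or waiting, so a single writer that holds (or waits for) the cache lock for a long time can stall every reader; here each stalled reader had already opened a database transaction. Kodo's cache internals are not shown in this thread, so the sketch below does not patch Kodo. It only illustrates, with the standard ReentrantReadWriteLock, the general idea of bounding such a wait so a request gives up instead of holding its transaction indefinitely. BoundedCacheRead and readWithTimeout are hypothetical names.

```java
import java.util.Optional;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;
import java.util.function.Supplier;

final class BoundedCacheRead {
    // Hypothetical guard for a cache; 'true' requests fair (roughly arrival-order) acquisition.
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock(true);

    /**
     * Runs reader under the read lock, waiting at most 'timeout' for it.
     * Returns Optional.empty() on timeout (a null result is also reported as empty),
     * so the caller can skip the cache or fail fast instead of blocking indefinitely.
     */
    <T> Optional<T> readWithTimeout(Supplier<T> reader, long timeout, TimeUnit unit)
            throws InterruptedException {
        if (!lock.readLock().tryLock(timeout, unit)) {
            return Optional.empty();
        }
        try {
            return Optional.ofNullable(reader.get());
        } finally {
            lock.readLock().unlock();
        }
    }

    /** Writers use the same lock; exposed so the example can be exercised. */
    ReentrantReadWriteLock lock() {
        return lock;
    }
}
```

The trade-off is an occasional cache miss (or failed request) in exchange for never waiting without bound; whether that is acceptable depends on the application.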
while running slow, 2.5.3
At this point, the app had been running several hours normally, then
apparently slowed down and locked up while I was away. When looking
at the app threads and database activity, everything appeared idle.
No transactions seemed to be open in the db. But the app was not
behaving normally. Web requests that did not make use of JDO worked
fine (though slowly), but requests that hit the db either blocked or were
very slow to respond.
Looking back at the log, there had been a large number of requests
that threw exceptions because they could not get a connection within
five seconds.
Most threads were idle, waiting on read, but some were in the state
shown by the following two stack dumps. Unlike the cache threads above,
they did not seem to be waiting for a lock to be granted, they seemed
to be spinning in the connection management code, apparently trying
to get a connection. I suspended and resumed the same thread repeatedly
and it always seemed to be doing the same thing. Single stepping was
very difficult because the debugger was slow to respond, apparently
because other threads were also busy spinning.
Postgres indicated that there were lots of connections open and
that they were all idle, so there should not have been a shortage
of connections in the pool. There are two mysteries here: 1) why
can't this thread get a connection? and 2) Why is it busy spinning?
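On the second question, a pool does not need to spin: waiting callers can block on a counting semaphore with a timeout and wake only when a slot is freed. The sketch below is a generic pool of arbitrary objects, not Kodo's DataSourceImpl (whose code is not shown here); BlockingPool, borrow and release are hypothetical names, and a real connection pool would also validate and close connections. It also shows why a connection that is borrowed and never released drains the pool regardless of whether the database reports that session as idle.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Supplier;

final class BlockingPool<T> {
    private final Semaphore permits;              // one permit per pool slot
    private final Deque<T> idle = new ArrayDeque<>();
    private final Supplier<T> factory;

    BlockingPool(int maxSize, Supplier<T> factory) {
        this.permits = new Semaphore(maxSize, true); // fair: waiters served in arrival order
        this.factory = factory;
    }

    /** Blocks (without spinning) until a slot frees up, or fails after the timeout. */
    T borrow(long timeout, TimeUnit unit) throws InterruptedException, TimeoutException {
        if (!permits.tryAcquire(timeout, unit)) {
            throw new TimeoutException("no pooled resource within " + timeout + " " + unit);
        }
        synchronized (idle) {
            T existing = idle.pollFirst();
            if (existing != null) {
                return existing;
            }
        }
        try {
            return factory.get();
        } catch (RuntimeException e) {
            permits.release(); // do not leak the slot if creation fails
            throw e;
        }
    }

    /** Must be called exactly once per successful borrow, normally in a finally block. */
    void release(T item) {
        synchronized (idle) {
            idle.addFirst(item);
        }
        permits.release();
    }

    int available() {
        return permits.availablePermits();
    }
}
```

A caller would use it as `T c = pool.borrow(5, TimeUnit.SECONDS); try { ... } finally { pool.release(c); }`, which mirrors the five-second limit mentioned above.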
Thread-56[1] where
[1] com.solarmetric.datasource.PreparedStatementCache$CacheAwareConnection.prepareStatement (PreparedStatementCache.java:184)
[2] com.solarmetric.datasource.PreparedStatementCache$CacheAwareConnection.prepareStatement (PreparedStatementCache.java:169)
[3] com.solarmetric.datasource.ConnectionWrapper.prepareStatement(ConnectionWrapper.java:199)
[4] com.solarmetric.kodo.impl.jdbc.schema.dict.AbstractDictionary.isClosed (AbstractDictionary.java:1,912)
[5] com.solarmetric.kodo.impl.jdbc.SQLExecutionManagerImpl.getConnectionFromFactory (SQLExecutionManagerImpl.java:186)
[6] com.solarmetric.kodo.impl.jdbc.SQLExecutionManagerImpl.getConnection(SQLExecutionManagerImpl.java:147)
[7] com.solarmetric.kodo.impl.jdbc.runtime.JDBCStoreManager.newSQLExecutionManager (JDBCStoreManager.java:828)
[8] com.solarmetric.kodo.impl.jdbc.runtime.JDBCStoreManager.getSQLExecutionManager (JDBCStoreManager.java:714)
[9] com.solarmetric.kodo.impl.jdbc.runtime.JDBCStoreManager.getDatastoreConnection (JDBCStoreManager.java:287)
[10] com.solarmetric.kodo.runtime.datacache.DataCacheStoreManager.getDatastoreConnection (DataCacheStoreManager.java:465)
[11] com.solarmetric.kodo.runtime.datacache.DataCacheStoreManager.load(DataCacheStoreManager.java:591)
[12] com.solarmetric.kodo.runtime.StateManagerImpl.loadFields(StateManagerImpl.java:2,330)
[13] com.solarmetric.kodo.runtime.StateManagerImpl.isLoaded(StateManagerImpl.java:897)
[14] com.europeasap.data.City.jdoGetname (null)
[15] com.europeasap.data.City.getName (City.java:39)
[16] com.europeasap.form.CustomerBookingForm.populateDepartureCityInfo(CustomerBookingForm.java:922)
[17] com.europeasap.form.CustomerBookingForm.onetimeInit(CustomerBookingForm.java:871)
[18] com.europeasap.form.CustomerBookingForm.populatePackageInfo(CustomerBookingForm.java:880)
[19] com.europeasap.action.CustomizeTrip.perform (CustomizeTrip.java:66)
[20] org.apache.struts.action.ActionServlet.processActionPerform(ActionServlet.java:1,787)
[21] org.apache.struts.action.ActionServlet.process(ActionServlet.java:1,586)
[22] org.apache.struts.action.ActionServlet.doGet (ActionServlet.java:492)
[23] javax.servlet.http.HttpServlet.service (HttpServlet.java:740)
[24] javax.servlet.http.HttpServlet.service (HttpServlet.java:853)
[25] org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:247)
[26] org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:193)
[27] org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:260)
[28] org.apache.catalina.core.StandardPipeline$StandardPipelineValveContext.invokeNext (StandardPipeline.java:643)
[29] org.apache.catalina.core.StandardPipeline.invoke(StandardPipeline.java:480)
[30] org.apache.catalina.core.ContainerBase.invoke(ContainerBase.java:995)
[31] org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
[32] org.apache.catalina.core.StandardPipeline$StandardPipelineValveContext.invokeNext (StandardPipeline.java:643)
[33] org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:493)
[34] org.apache.catalina.core.StandardPipeline$StandardPipelineValveContext.invokeNext (StandardPipeline.java:641)
[35] org.apache.catalina.core.StandardPipeline.invoke(StandardPipeline.java:480)
[36] org.apache.catalina.core.ContainerBase.invoke(ContainerBase.java:995)
[37] org.apache.catalina.core.StandardContext.invoke(StandardContext.java:2,415)
[38] org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:180)
[39] org.apache.catalina.core.StandardPipeline$StandardPipelineValveContext.invokeNext (StandardPipeline.java:643)
[40] org.apache.catalina.valves.ErrorDispatcherValve.invoke(ErrorDispatcherValve.java:170)
[41] org.apache.catalina.core.StandardPipeline$StandardPipelineValveContext.invokeNext (StandardPipeline.java:641)
[42] org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:172)
[43] org.apache.catalina.core.StandardPipeline$StandardPipelineValveContext.invokeNext (StandardPipeline.java:641)
[44] org.apache.catalina.core.StandardPipeline.invoke(StandardPipeline.java:480)
[45] org.apache.catalina.core.ContainerBase.invoke(ContainerBase.java:995)
[46] org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:174)
[47] org.apache.catalina.core.StandardPipeline$StandardPipelineValveContext.invokeNext (StandardPipeline.java:643)
[48] org.apache.catalina.core.StandardPipeline.invoke(StandardPipeline.java:480)
[49] org.apache.catalina.core.ContainerBase.invoke(ContainerBase.java:995)
[50] org.apache.catalina.connector.warp.WarpRequestHandler.handle (null)
[51] org.apache.catalina.connector.warp.WarpConnection.run (null)
[52] java.lang.Thread.run (Thread.java:534)
Thread-56[1] where
[1] com.solarmetric.datasource.DataSourceImpl$AbstractPool.findConnection (DataSourceImpl.java:826)
[2] com.solarmetric.datasource.DataSourceImpl$AbstractPool.getConnection(DataSourceImpl.java:605)
[3] com.solarmetric.datasource.DataSourceImpl.getConnection(DataSourceImpl.java:363)
[4] com.solarmetric.datasource.DataSourceImpl.getConnection(DataSourceImpl.java:356)
[5] com.solarmetric.kodo.impl.jdbc.runtime.DataSourceConnector.getConnection (DataSourceConnector.java:63)
[6] com.solarmetric.kodo.impl.jdbc.SQLExecutionManagerImpl.getConnectionFromFactory (SQLExecutionManagerImpl.java:185)
[7] com.solarmetric.kodo.impl.jdbc.SQLExecutionManagerImpl.getConnection(SQLExecutionManagerImpl.java:147)
[8] com.solarmetric.kodo.impl.jdbc.runtime.JDBCStoreManager.newSQLExecutionManager (JDBCStoreManager.java:828)
[9] com.solarmetric.kodo.impl.jdbc.runtime.JDBCStoreManager.getSQLExecutionManager (JDBCStoreManager.java:714)
[10] com.solarmetric.kodo.impl.jdbc.runtime.JDBCStoreManager.getDatastoreConnection (JDBCStoreManager.java:287)
[11] com.solarmetric.kodo.runtime.datacache.DataCacheStoreManager.getDatastoreConnection (DataCacheStoreManager.java:465)
[12] com.solarmetric.kodo.runtime.datacache.DataCacheStoreManager.initialize (DataCacheStoreManager.java:519)
[13] com.solarmetric.kodo.runtime.StateManagerImpl.loadInitialState(StateManagerImpl.java:215)
[14] com.solarmetric.kodo.runtime.PersistenceManagerImpl.getObjectByIdFilter (PersistenceManagerImpl.java:1,278)
[15] com.solarmetric.kodo.runtime.PersistenceManagerImpl.getObjectById(PersistenceManagerImpl.java:1,179)
[16] com.solarmetric.kodo.runtime.datacache.query.CacheAwareQuery$CachedResultList.get (CacheAwareQuery.java:432)
[17] java.util.AbstractList$Itr.next (AbstractList.java:421)
[18] com.europeasap.form.CustomerBookingForm.populateDepartureCityInfo(CustomerBookingForm.java:919)
[19] com.europeasap.form.CustomerBookingForm.onetimeInit(CustomerBookingForm.java:871)
[20] com.europeasap.form.CustomerBookingForm.populatePackageInfo(CustomerBookingForm.java:880)
[21] com.europeasap.action.CustomizeTrip.perform (CustomizeTrip.java:66)
[22] org.apache.struts.action.ActionServlet.processActionPerform(ActionServlet.java:1,787)
[23] org.apache.struts.action.ActionServlet.process(ActionServlet.java:1,586)
[24] org.apache.struts.action.ActionServlet.doGet (ActionServlet.java:492)
[25] javax.servlet.http.HttpServlet.service (HttpServlet.java:740)
[26] javax.servlet.http.HttpServlet.service (HttpServlet.java:853)
[27] org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:247)
[28] org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:193)
[29] org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:260)
[30] org.apache.catalina.core.StandardPipeline$StandardPipelineValveContext.invokeNext (StandardPipeline.java:643)
[31] org.apache.catalina.core.StandardPipeline.invoke(StandardPipeline.java:480)
[32] org.apache.catalina.core.ContainerBase.invoke(ContainerBase.java:995)
[33] org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
[34] org.apache.catalina.core.StandardPipeline$StandardPipelineValveContext.invokeNext (StandardPipeline.java:643)
[35] org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:493)
[36] org.apache.catalina.core.StandardPipeline$StandardPipelineValveContext.invokeNext (StandardPipeline.java:641)
[37] org.apache.catalina.core.StandardPipeline.invoke(StandardPipeline.java:480)
[38] org.apache.catalina.core.ContainerBase.invoke(ContainerBase.java:995)
[39] org.apache.catalina.core.StandardContext.invoke(StandardContext.java:2,415)
[40] org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:180)
[41] org.apache.catalina.core.StandardPipeline$StandardPipelineValveContext.invokeNext (StandardPipeline.java:643)
[42] org.apache.catalina.valves.ErrorDispatcherValve.invoke(ErrorDispatcherValve.java:170)
[43] org.apache.catalina.core.StandardPipeline$StandardPipelineValveContext.invokeNext (StandardPipeline.java:641)
[44] org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:172)
[45] org.apache.catalina.core.StandardPipeline$StandardPipelineValveContext.invokeNext (StandardPipeline.java:641)
[46] org.apache.catalina.core.StandardPipeline.invoke(StandardPipeline.java:480)
[47] org.apache.catalina.core.ContainerBase.invoke(ContainerBase.java:995)
[48] org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:174)
[49] org.apache.catalina.core.StandardPipeline$StandardPipelineValveContext.invokeNext (StandardPipeline.java:643)
[50] org.apache.catalina.core.StandardPipeline.invoke(StandardPipeline.java:480)
[51] org.apache.catalina.core.ContainerBase.invoke(ContainerBase.java:995)
[52] org.apache.catalina.connector.warp.WarpRequestHandler.handle (null)
[53] org.apache.catalina.connector.warp.WarpConnection.run (null)
[54] java.lang.Thread.run (Thread.java:534)
With cache disabled 2.4.3
This run was an accident. I inadvertently ran the app with the older
2.4.3 version of Kodo, with the cache disabled. This one got into trouble
almost immediately. It also seemed to lock up with lots of open transactions
in the db. It's also interesting that these two threads also seem to be
hanging around the same method as in 2.5.3.
Thread-63[1] where 0x9f9
[1] com.solarmetric.datasource.PreparedStatementCache$CacheAwareConnection.prepareStatement (PreparedStatementCache.java:184)
[2] com.solarmetric.datasource.ConnectionWrapper.prepareStatement(ConnectionWrapper.java:377)
[3] com.solarmetric.kodo.impl.jdbc.SQLExecutionManagerImpl.prepareStatementInternal (SQLExecutionManagerImpl.java:807)
[4] com.solarmetric.kodo.impl.jdbc.SQLExecutionManagerImpl.executePreparedQueryInternal (SQLExecutionManagerImpl.java:761)
[5] com.solarmetric.kodo.impl.jdbc.SQLExecutionManagerImpl.executeQueryInternal (SQLExecutionManagerImpl.java:691)
[6] com.solarmetric.kodo.impl.jdbc.SQLExecutionManagerImpl.executeQuery(SQLExecutionManagerImpl.java:372)
[7] com.solarmetric.kodo.impl.jdbc.SQLExecutionManagerImpl.executeQuery(SQLExecutionManagerImpl.java:356)
[8] com.solarmetric.kodo.impl.jdbc.ormapping.ClassMapping.loadByPK(ClassMapping.java:950)
[9] com.solarmetric.kodo.impl.jdbc.runtime.JDBCStoreManager.initialize(JDBCStoreManager.java:263)
[10] com.solarmetric.kodo.runtime.StateManagerImpl.loadInitialState(StateManagerImpl.java:174)
[11] com.solarmetric.kodo.runtime.PersistenceManagerImpl.getObjectByIdFilter (PersistenceManagerImpl.java:1,023)
[12] com.solarmetric.kodo.runtime.PersistenceManagerImpl.getObjectById(PersistenceManagerImpl.java:942)
[13] com.solarmetric.kodo.impl.jdbc.ormapping.OneToOneMapping.load(OneToOneMapping.java:147)
[14] com.solarmetric.kodo.impl.jdbc.runtime.JDBCStoreManager.load(JDBCStoreManager.java:375)
[15] com.solarmetric.kodo.runtime.StateManagerImpl.loadField(StateManagerImpl.java:2,035)
[16] com.solarmetric.kodo.runtime.StateManagerImpl.isLoaded(StateManagerImpl.java:720)
[17] com.europeasap.data.CityMarkup.jdoGetcity (null)
[18] com.europeasap.data.CityMarkup.getCity (CityMarkup.java:30)
[19] com.europeasap.logic.CostHelper.getCityMarkup (CostHelper.java:81)
[20] com.europeasap.logic.CostHelper.computeBasePackageCost(CostHelper.java:289)
[21] com.europeasap.logic.CostHelper.computeFinalPackageCost(CostHelper.java:246)
[22] com.europeasap.form.CustomerBookingForm.updateDisplayCosts(CustomerBookingForm.java:1,440)
[23] com.europeasap.form.CustomerBookingForm.updateCustomizeDisplayInfo(CustomerBookingForm.java:1,407)
[24] com.europeasap.action.CustomizeTrip.perform (CustomizeTrip.java:68)
[25] org.apache.struts.action.ActionServlet.processActionPerform(ActionServlet.java:1,787)
[26] org.apache.struts.action.ActionServlet.process(ActionServlet.java:1,586)
[27] org.apache.struts.action.ActionServlet.doPost (ActionServlet.java:510)
[28] javax.servlet.http.HttpServlet.service (HttpServlet.java:760)
[29] javax.servlet.http.HttpServlet.service (HttpServlet.java:853)
[30] org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:247)
[31] org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:193)
[32] org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:260)
[33] org.apache.catalina.core.StandardPipeline$StandardPipelineValveContext.invokeNext (StandardPipeline.java:643)
[34] org.apache.catalina.core.StandardPipeline.invoke(StandardPipeline.java:480)
[35] org.apache.catalina.core.ContainerBase.invoke(ContainerBase.java:995)
[36] org.apache.catal -
What are the different ways to handle deadlocks?
Hi,
May I know what are the ways to solve a deadlock problem?
Currently, I have the following code to catch the exception:
catch (XmlException ex) {
    try {
        ex.printStackTrace();
        txn.abort();
    } catch (DatabaseException DbEx) {
        System.err.println("txn abort failed.");
    }
}
and the resulting error is:
com.sleepycat.dbxml.XmlException: Error: DB_LOCK_DEADLOCK: Locker killed to resolve a deadlock, errcode = DATABASE_ERROR
Is there a more efficient way to handle deadlocks?
Or a better way to prevent deadlocks from happening?
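The error text itself ("Locker killed to resolve a deadlock") points at the usual Berkeley DB recovery: the deadlock detector picks one transaction as the victim, and that transaction is expected to be aborted and then retried from the beginning while the others proceed. Catching the exception and aborting, as above, is the first half. The sketch below adds a bounded retry loop with a short random back-off. It is deliberately independent of the BDB XML API: the deadlock check is passed in as a predicate, because how an XmlException exposes DB_LOCK_DEADLOCK should be confirmed in the BDB XML Java documentation rather than assumed here. DeadlockRetry and runWithRetry are hypothetical names.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ThreadLocalRandom;
import java.util.function.Predicate;

final class DeadlockRetry {
    private DeadlockRetry() {}

    /**
     * Runs 'attempt' up to maxAttempts times. Each attempt is expected to create its own
     * transaction, commit it on success, and abort it before throwing. Only exceptions
     * for which isDeadlock returns true are retried; anything else is rethrown at once.
     */
    static <T> T runWithRetry(Callable<T> attempt, Predicate<Exception> isDeadlock,
                              int maxAttempts) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        for (int i = 1; ; i++) {
            try {
                return attempt.call();
            } catch (Exception e) {
                if (!isDeadlock.test(e) || i >= maxAttempts) {
                    throw e;
                }
                // Short randomized back-off so the competing transactions are less likely to collide again.
                Thread.sleep(ThreadLocalRandom.current().nextLong(1, 10L * i + 1));
            }
        }
    }
}
```

In addXMLDocument, the Callable would create a fresh XmlTransaction, call putDocument and commit, and abort the transaction before rethrowing; the attempt limit is a tuning choice left to the caller.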
I am using this environment config
EnvironmentConfig envConf = new EnvironmentConfig();
envConf.setAllowCreate(true);        // create the environment if it does not exist
envConf.setInitializeCache(true);    // turn on the shared memory region
// envConf.setCacheSize(25 * 1024 * 1024); // 25MB cache
envConf.setInitializeLocking(true);  // turn on the locking subsystem
envConf.setInitializeLogging(true);  // turn on the logging subsystem
envConf.setTransactional(true);      // turn on the transactional subsystem
// envConf.setRunRecovery(true);     // turn on run recovery
// envConf.setTxnNoSync(true);       // do not synchronously force log data to disk on commit
envConf.setLogInMemory(true);        // in-memory logging
envConf.setLogBufferSize(60 * 1024 * 1024); // logging buffer size
// envConf.setTxnWriteNoSync(true);  // on commit, write log data synchronously to the OS's file system buffers
// envConf.setThreaded(true);        // threaded = true is the default in Java
// envConf.setMultiversion(true);
envConf.setLockDetectMode(LockDetectMode.DEFAULT); // on deadlock, reject a random lock request
Thanks in advance for any help! :)
Hi Vyacheslav,
here is the code:
package ag;
import com.sleepycat.db.DatabaseException;
import com.sleepycat.db.Environment;
import com.sleepycat.db.EnvironmentConfig;
import com.sleepycat.db.LockDetectMode;
import com.sleepycat.dbxml.XmlContainerConfig;
import com.sleepycat.dbxml.XmlDocumentConfig;
import com.sleepycat.dbxml.XmlException;
import com.sleepycat.dbxml.XmlManager;
import com.sleepycat.dbxml.XmlContainer;
import com.sleepycat.dbxml.XmlDocument;
import com.sleepycat.dbxml.XmlManagerConfig;
import com.sleepycat.dbxml.XmlTransaction;
import com.sleepycat.dbxml.XmlUpdateContext;
import inter.DBInterface;
import java.io.*;
import java.util.Properties;
import cp.CheckPointer;
public class SaveMessageinDB implements DBInterface {
    Environment myEnv;
    XmlManager myManager;
    XmlContainer myContainer;
    XmlContainerConfig cconfig;
    Properties properties;
    // CheckPointer cp;
    int Counter;

    public SaveMessageinDB() {
        try {
            properties = new Properties();
            properties.load(ClassLoader
                    .getSystemResourceAsStream("Aggregator.properties"));
            setXmlEnvrionment();
            setXmlManager();
            setXmlContainer();
            // cp = new CheckPointer(myEnv);
            // cp.start();
            // System.out.println("Checkpointer started....");
            Counter = 0;
        } catch (Exception ex) {
            ex.printStackTrace();
        }
    }

    public void saveMessage(String docName, String content) throws Exception {
        addXMLDocument(docName, content);
    }

    public void setXmlEnvrionment() {
        try {
            File envHome = new File(properties.getProperty("DATABASE_LOCATION"));
            EnvironmentConfig envConf = new EnvironmentConfig();
            envConf.setAllowCreate(true);            // create the environment if it does not exist
            envConf.setInitializeCache(true);        // turn on the shared memory region
            envConf.setCacheSize(100 * 1024 * 1024); // 100MB cache
            envConf.setInitializeLocking(true);      // turn on the locking subsystem
            envConf.setInitializeLogging(true);      // turn on the logging subsystem
            envConf.setTransactional(true);          // turn on the transactional subsystem
            // envConf.setRunRecovery(true);         // turn on run recovery
            // envConf.setTxnNoSync(true);           // do not synchronously force log data to disk on commit
            envConf.setLogInMemory(true);            // in-memory logging
            envConf.setLogBufferSize(60 * 1024 * 1024); // logging buffer size
            // envConf.setTxnWriteNoSync(true);      // on commit, write log data synchronously to the OS's file system buffers
            envConf.setMultiversion(true);           // turn on snapshot isolation
            envConf.setLockDetectMode(LockDetectMode.DEFAULT); // on deadlock, reject a random lock request
            // myEnv = new Environment(envHome, null); // to adopt an Environment already set up by others
            myEnv = new Environment(envHome, envConf);
            System.out.println("Environment created...");
        } catch (Exception ex) {
            ex.printStackTrace();
        }
    }

    // All BDB XML programs require an XmlManager instance.
    // Create it from the DB Environment and let it adopt the Environment:
    // with setAdoptEnvironment(true), closing the manager also closes myEnv.
    public void setXmlManager() {
        try {
            XmlManagerConfig mconfig = new XmlManagerConfig();
            mconfig.setAllowAutoOpen(true);
            mconfig.setAdoptEnvironment(true);
            mconfig.setAllowExternalAccess(true);
            myManager = new XmlManager(myEnv, mconfig);
            // myManager = new XmlManager(mconfig);
            System.out.println("Manager created...");
        } catch (Exception ex) {
            ex.printStackTrace();
        }
    }

    public void setXmlContainer() {
        try {
            cconfig = new XmlContainerConfig();
            cconfig.setNodeContainer(true);
            cconfig.setIndexNodes(true);
            cconfig.setTransactional(true); // a transactional container needs a transactional environment
            // cconfig.setAllowValidation(false);
            // cconfig.setReadUncommitted(true); // allow uncommitted (dirty) reads
            // cconfig.setMultiversion(true);
            myContainer = myManager.openContainer(properties
                    .getProperty("DATABASE_LOCATION")
                    + properties.getProperty("CONTAINER_NAME"), cconfig);
            System.out.println("Container Opened...");
        } catch (XmlException XmlE) {
            // Opening failed (for example, the container does not exist yet), so create it.
            try {
                myContainer = myManager.createContainer(properties
                        .getProperty("DATABASE_LOCATION")
                        + properties.getProperty("CONTAINER_NAME"), cconfig);
                System.out.println("Container Created...");
            } catch (Exception e) {
                e.printStackTrace();
            }
        } catch (Exception ex) {
            ex.printStackTrace();
        }
    }

    public void addXMLDocument(String docName, String content) {
        // One local transaction per call, so concurrent callers cannot overwrite each other's handle.
        XmlTransaction txn = null;
        try {
            txn = myManager.createTransaction(); // explicit transaction used by putDocument and commit
            XmlDocumentConfig docConfig = new XmlDocumentConfig();
            docConfig.setGenerateName(true);
            docConfig.setWellFormedOnly(true);
            myContainer.putDocument(txn, docName, content, docConfig);
            txn.commit(); // commit the transaction
            System.out.println("documents added.....");
            Counter++;
            System.out.println("Document no: " + Counter);
            txn.delete();
        } catch (XmlException ex) {
            System.out.println("Occurring in addXMLDocument");
            ex.printStackTrace();
            if (txn != null) {
                try {
                    txn.abort(); // a deadlock victim must be aborted before the work can be retried
                    txn.delete();
                } catch (DatabaseException DbEx) {
                    System.err.println("txn abort failed.");
                }
            }
        }
    }

    public void cleanup() {
        try {
            if (myContainer != null) {
                myContainer.close();
            }
            if (myManager != null) {
                // The manager adopted myEnv (see setXmlManager), so this also closes the environment.
                myManager.close();
            } else if (myEnv != null) {
                myEnv.close();
            }
            System.out.println("All cleaned up done..in sm");
        } catch (Exception e) {
            // ignore exceptions in cleanup
        }
    }
}
Thanks! -
I am observing deadlocks on an Oracle 8i server
I have identified that the problem is due to application code.
DB cache hit ratio is >99%.
There are no buffer busy waits, and I think there is no problem on the server side. Can anybody tell me what to do at the server level to resolve this problem?
Deadlocks are the result of bugs in applications. There is nothing you can do on the server side to resolve them. You'll have to go into the application and change the code to prevent deadlocks, which generally means ensuring that each thread locks resources in the same order.
Justin -
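Justin's advice, that every thread acquire shared resources in the same order, can be shown with plain Java locks: if every transfer locks the account with the lower id first, two opposite transfers can never each hold the lock the other one needs. The same reasoning carries over to Oracle row locks (for example, updating rows in a consistent primary-key order), which is not shown here. Account and Transfers are hypothetical illustration classes, and account ids are assumed to be unique.

```java
import java.util.concurrent.locks.ReentrantLock;

final class Account {
    final long id;          // assumed unique; it defines the global lock order
    long balance;
    final ReentrantLock lock = new ReentrantLock();

    Account(long id, long balance) {
        this.id = id;
        this.balance = balance;
    }
}

final class Transfers {
    private Transfers() {}

    /** Locks the lower-id account first, so opposite transfers cannot wait on each other. */
    static void transfer(Account from, Account to, long amount) {
        Account first = from.id < to.id ? from : to;
        Account second = (first == from) ? to : from;
        first.lock.lock();
        try {
            second.lock.lock(); // reentrant, so from == to is also safe
            try {
                from.balance -= amount;
                to.balance += amount;
            } finally {
                second.lock.unlock();
            }
        } finally {
            first.lock.unlock();
        }
    }
}
```

Without the ordering (locking `from` then `to`), two threads running transfer(a, b) and transfer(b, a) at the same time can each take their first lock and wait forever for the second.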
DEADLOCK DETECTED ( ORA-00060 )
Hi,
We are getting deadlocks when the System (External-Portal-Production)
Cache Upload process runs. It looks like an application problem.
We have been getting this deadlock problem whenever the CACHE job gets
executed.
Basically it's a portal SRM Business Package system.
We have scheduled the CACHE job on the portal at 1 AM MSTAZ time; it
uses JCo to connect to the R3 and SRM systems to bring in the data.
We observe this deadlock problem only while the cache job runs.
The cache job itself completes successfully, but we can see the deadlock
in the logs.
At the beginning of the trace file (location:
/oracle/RPE/saptrace/usertrace/) we found the following information:
more rpe_ora_1835342.trc
Dump file /oracle/RPE/saptrace/usertrace/rpe_ora_1835342.trc
Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - 64bit Production
ORACLE_HOME = /oracle/RPE/102_64
System name: AIX
Node name: fsp55a08
Release: 3
Version: 5
Machine: 000788ABD600
Instance name: RPE
Redo thread mounted by this instance: 1
Oracle process number: 17
Unix process pid: 1835342, image: oracle@fsp55a08
2009-05-31 01:00:49.485
ACTION NAME:() 2009-05-31 01:00:49.485
MODULE NAME:(JDBC Thin Client) 2009-05-31 01:00:49.485
SERVICE NAME:(SYS$USERS) 2009-05-31 01:00:49.485
SESSION ID:(275.1229) 2009-05-31 01:00:49.485
DEADLOCK DETECTED ( ORA-00060 )
[Transaction Deadlock]
The following deadlock is not an ORACLE error. It is a
deadlock due to user error in the design of an application
or from issuing incorrect ad-hoc SQL. The following
information may aid in determining the deadlock:
Deadlock graph:
---------Blocker(s)-------- ---------Waiter(s)---------
Resource Name process session holds waits process session holds waits
TX-00050010-00003981 17 275 X 18 284 X
TX-00010000-000038a3 18 284 X 17 275 X
session 275: DID 0001-0011-00000139 session 284: DID 0001-0012-0000004F
session 284: DID 0001-0012-0000004F session 275: DID 0001-0011-00000139
Rows waited on:
Session 284: obj - rowid = 0000561D - AAAFYdAAJAAAHUlAAU
(dictionary objn - 22045, file - 9, block - 29989, slot - 20)
Session 275: obj - rowid = 00005617 - AAAFYXAAJAAAG1aAAc
(dictionary objn - 22039, file - 9, block - 27994, slot - 28)
Information on the OTHER waiting sessions:
Session 284:
pid=18 serial=1236 audsid=672749 user: 21/SAPRPEDB
O/S info: user: rpeadm, term: unknown, ospid: 1234, machine: fsp65003
program: JDBC Thin Client
Please help me with this issue.
Please let me know if you need any more information.
Thank you in advance.
Regards,
A.Naresh
SAP-BASIS
Hi,
Check SAP Note 84348 - Oracle deadlocks, ORA-00060
Thanks
Sunny
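For readers trying to interpret the "Deadlock graph" section of these traces: each row names a resource, then the blocker (process, session, mode held) and the waiter (process, session, mode requested). Drawing an edge from each waiting session to the session blocking it turns the graph into a wait-for graph, and the deadlock is a cycle in that graph. The helper below is a hypothetical sketch, not an Oracle tool; it assumes the collapsed one-line row layout seen in the traces here (the empty holds/waits columns dropped) and keeps only one blocker per waiter.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DeadlockGraphReader {
    // One graph row: resource, blocker process/session/held mode,
    // waiter process/session/requested mode (layout assumed from these traces).
    private static final Pattern ROW = Pattern.compile(
        "^([A-Z]{2}-[0-9a-fA-F]{8}-[0-9a-fA-F]{8})\\s+(\\d+)\\s+(\\d+)\\s+(\\S+)"
        + "\\s+(\\d+)\\s+(\\d+)\\s+(\\S+)\\s*$");

    /** Builds a wait-for graph: waiting session -> blocking session. Non-row lines are ignored. */
    static Map<Integer, Integer> waitFor(List<String> lines) {
        Map<Integer, Integer> edges = new LinkedHashMap<>();
        for (String line : lines) {
            Matcher m = ROW.matcher(line.trim());
            if (m.matches()) {
                int blocker = Integer.parseInt(m.group(3));
                int waiter = Integer.parseInt(m.group(6));
                edges.put(waiter, blocker);
            }
        }
        return edges;
    }

    /** Follows wait-for edges from each session and returns the first cycle found, or an empty list. */
    static List<Integer> findCycle(Map<Integer, Integer> edges) {
        for (Integer start : edges.keySet()) {
            List<Integer> path = new ArrayList<>();
            Integer cur = start;
            while (cur != null && !path.contains(cur)) {
                path.add(cur);
                cur = edges.get(cur);
            }
            if (cur != null) {
                // cur was already on the path, so the cycle starts there
                return path.subList(path.indexOf(cur), path.size());
            }
        }
        return Collections.emptyList();
    }

    public static void main(String[] args) {
        List<String> graph = List.of(
            "TX-00050010-00003981 17 275 X 18 284 X",
            "TX-00010000-000038a3 18 284 X 17 275 X");
        System.out.println("Sessions in cycle: " + findCycle(waitFor(graph))); // prints "Sessions in cycle: [284, 275]"
    }
}
```

Read this way, the first trace shows sessions 275 and 284 each holding a TX (row) lock in exclusive mode while waiting for the other's, which is the pattern of two transactions updating the same rows in opposite order. The rows waited on (objects 22045 and 22039) point to the tables involved.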
ORA-00060 DEADLOCK DETECTED - Need Help
Hi Gurus,
I have a question on how to read the trace log to determine where the deadlock happened.
Please help. I have no other hint on how to resolve the error.
*** ACTION NAME:() 2008-08-06 03:34:21.740
*** MODULE NAME:(OEM.SystemPool) 2008-08-06 03:34:21.740
*** SERVICE NAME:(celcomdb) 2008-08-06 03:34:21.740
*** CLIENT ID:() 2008-08-06 03:34:21.740
*** SESSION ID:(113.3188) 2008-08-06 03:34:21.740
DEADLOCK DETECTED ( ORA-00060 )
[Transaction Deadlock]
The following deadlock is not an ORACLE error. It is a
deadlock due to user error in the design of an application
or from issuing incorrect ad-hoc SQL. The following
information may aid in determining the deadlock:
Deadlock graph:
---------Blocker(s)-------- ---------Waiter(s)---------
Resource Name process session holds waits process session holds waits
TM-0000c45a-00000000 119 113 X 27 60 SX
TX-0004002c-000269bc 27 60 X 119 113 X
session 113: DID 0001-0077-00000028 session 60: DID 0001-001B-00000278
session 60: DID 0001-001B-00000278 session 113: DID 0001-0077-00000028
Rows waited on:
Session 60: no row
Session 113: obj - rowid = 0000C384 - AAAMOEAADAAAF99AAA
(dictionary objn - 50052, file - 3, block - 24445, slot - 0)
Information on the OTHER waiting sessions:
Session 60:
pid=27 serial=1313 audsid=0 user: 51/SYSMAN
O/S info: user: oracle10, term: UNKNOWN, ospid: 14610456, machine: S63KLJ01
program: oracle@S63KLJ01 (J000)
application name: EM_PING, hash value=2147830874
action name: AGENT_STATUS_MARKER, hash value=2850782869
Current SQL information unavailable
End of information on OTHER waiting sessions.
Current SQL statement for this session:
UPDATE MGMT_OMS_PARAMETERS SET VALUE=TO_CHAR(SYSDATE, 'DD-Mon-YYYY HH24:MI:SS') WHERE HOST_URL=:B1 AND NAME='TIMESTAMP'
----- PL/SQL Call Stack -----
object line object
handle number name
70000006e5ab778 39 package body SYSMAN.MGMT_FAILOVER
70000002e8aaf98 1 anonymous block
===================================================
PROCESS STATE
Process global information:
process: 70000006f4b1598, call: 70000006a9c9320, xact: 70000006dedcf98, curses: 70000006f5a5a50, usrses: 70000006f5a5a50
SO: 70000006f4b1598, type: 2, owner: 0, flag: INIT/-/-/0x00
(process) Oracle pid=119, calls cur/top: 70000006a9c9320/70000006abfcaf8, flag: (0) -
int error: 0, call error: 0, sess error: 0, txn error 0
(post info) last post received: 0 0 9
last post received-location: ksqrcl
last process to post me: 70000006f484118 141 0
last post sent: 0 0 0
last post sent-location: No post
last process posted by me: none
(latch info) wait_event=0 bits=0
Process Group: DEFAULT, pseudo proc: 70000006f50bcd0
O/S info: user: oracle10, term: UNKNOWN, ospid: 11538508
OSD pid info: Unix process pid: 11538508, image: oraclecelcomdb@S63KLJ01
Dump of memory from 0x070000006F45FCD0 to 0x070000006F45FED8
70000006F45FCD0 00000004 00000000 07000000 6A9ABD00 [............j...]
70000006F45FCE0 00000010 0003139D 07000000 6ABFCAF8 [............j...]
70000006F45FCF0 00000003 0003139D 07000000 6F8FD600 [............o...]
70000006F45FD00 0000000B 0003139D 07000000 6F5A5A50 [............oZZP]
70000006F45FD10 00000004 00031291 00000000 00000000 [................]
70000006F45FD20 00000000 00000000 00000000 00000000 [................]
Repeat 26 times
70000006F45FED0 00000000 00000000 [........]
SO: 70000006f5a5a50, type: 4, owner: 70000006f4b1598, flag: INIT/-/-/0x00
(session) sid: 113 trans: 70000006dedcf98, creator: 70000006f4b1598, flag: (41) USR/- BSY/-/-/-/-/-
DID: 0001-0077-00000028, short-term DID: 0000-0000-00000000
txn branch: 0
oct: 6, prv: 0, sql: 70000006e5a9140, psql: 70000006e5a9410, user: 51/SYSMAN
O/S info: user: oracle10, term: unknown, ospid: 1234, machine: S63KLJ01
program: OMS
client info: S63KLJ01_Management_Service
application name: OEM.SystemPool, hash value=2960518376
last wait for 'enq: TX - row lock contention' blocking sess=0x70000006f5611e0 seq=322 wait_time=2929700 seconds since wait started=4
name|mode=54580006, usn<<16 | slot=4002c, sequence=269bc
Dumping Session Wait History
for 'enq: TX - row lock contention' count=1 wait_time=2929700
name|mode=54580006, usn<<16 | slot=4002c, sequence=269bc
for 'enq: TM - contention' count=1 wait_time=224489
name|mode=544d0006, object #=c45a, table/partition=0
for 'enq: TM - contention' count=1 wait_time=2929708
name|mode=544d0006, object #=c45a, table/partition=0
for 'SQL*Net message from client' count=1 wait_time=35033
driver id=28444553, #bytes=1, =0
for 'SQL*Net message to client' count=1 wait_time=1
driver id=28444553, #bytes=1, =0
for 'SQL*Net message from client' count=1 wait_time=227
driver id=28444553, #bytes=1, =0
for 'SQL*Net message to client' count=1 wait_time=1
driver id=28444553, #bytes=1, =0
for 'latch: library cache' count=1 wait_time=96826
address=70000006cf2f298, number=d6, tries=0
for 'latch: library cache' count=1 wait_time=36929
address=70000006cf2f0b8, number=d6, tries=0
for 'SQL*Net message from client' count=1 wait_time=131974
driver id=28444553, #bytes=1, =0
temporary object counter: 0
Virtual Thread:
kgskvt: 70000006e82cd98, sess: 70000006f5a5a50, vc: 0, proc: 70000006f4b1598
consumer group cur: OTHER_GROUPS (upd? 0), mapped: DEFAULT_CONSUMER_GROUP, orig:
vt_state: 0x200, vt_flags: 0x30, blkrun: 0
is_assigned: 1, in_sched: 0 (0)
vt_active: 0 (pending: 1)
used quanta: 0 (cg: 0)
cpu start time: 0, quantum status: 0x0
quantum checks to skip: 0, check thresh: 0
idle time: 0, active time: 0 (cg: 0)
cpu yields: 0 (cg: 0), waits: 0 (cg: 0), wait time: 0 (cg: 0)
queued time outs: 0, time: 0 (cur 0, cg 0)
calls aborted: 0, num est exec limit hit: 0
undo current: 0k max: 0k
UOL used : 0 locks(used=2, free=0)
KGX Atomic Operation Log 70000002eb54978
Mutex 0(0, 0) idn 0 oper NONE
Cursor Parent uid 113 efd 15 whr 22 slp 0
oper=NONE pt1=0 pt2=0 pt3=0
pt4=0 u41=0 stt=0
KGX Atomic Operation Log 70000002eb549c0
Mutex 0(0, 0) idn 0 oper NONE
Library Cache uid 113 efd 0 whr 0 slp 0
KGX Atomic Operation Log 70000002eb54a08
Mutex 0(0, 0) idn 0 oper NONE
Library Cache uid 113 efd 0 whr 0 slp 0
SO: 70000006ae53108, type: 53, owner: 70000006f5a5a50, flag: INIT/-/-/0x00
LIBRARY OBJECT LOCK: lock=70000006ae53108 handle=70000002e8e8150 mode=N
call pin=0 session pin=0 hpc=0000 hlc=0000
htl=70000006ae53188[70000006a753478,70000006abaf090] htb=70000006abaf090 ssga=70000006abae018
user=70000006f5a5a50 session=70000006f5a5a50 count=1 flags=CBK[0020] savepoint=0x0
LIBRARY OBJECT HANDLE: handle=70000002e8e8150 mtx=70000002e8e8280(0) cdp=0
namespace=CRSR flags=RON/KGHP/PN0/EXP/[10010100]
kkkk-dddd-llll=0000-0001-0001 lock=N pin=S latch#=4 hpc=0000 hlc=0000
lwt=70000002e8e81f8[70000002e8e81f8,70000002e8e81f8] ltm=70000002e8e8208[70000002e8e8208,70000002e8e8208]
pwt=70000002e8e81c0[70000002e8e81c0,70000002e8e81c0] ptm=70000002e8e81d0[70000002e8e81d0,70000002e8e81d0]
ref=70000002e8e8228[70000004ad30028,70000004ad30028] lnd=70000002e8e8240[70000002e8e8240,70000002e8e8240]
LIBRARY OBJECT: object=70000004ad2f990
type=CRSR flags=EXS[0001] pflags=[0000] status=VALD load=0
DEPENDENCIES: count=1 size=16
AUTHORIZATIONS: count=1 size=16 minimum entrysize=16
ACCESSES: count=1 size=16
TRANSLATIONS: count=1 size=16
DATA BLOCKS:
data# heap pointer status pins change whr
0 70000006eada130 70000004ad2faa8 I/P/A/-/- 0 NONE 00
6 70000004ad2fec8 700000040f57a78 I/P/A/-/E 0 NONE 00
----------------------------------------
Oracle handles the deadlock on its own (it detects it and rolls back one of the statements involved, raising ORA-00060), but to resolve the underlying issue you can try the following:
1. Take an explain plan for UPDATE MGMT_OMS_PARAMETERS SET VALUE=TO_CHAR(SYSDATE, 'DD-Mon-YYYY HH24:MI:SS') WHERE HOST_URL=:B1 AND NAME='TIMESTAMP';
2. Analyze the explain plan above and try to tune the statement.
I think it's because of missing indexes.
Bug in Oracle JDBC Pooling Classes - Deadlock
We are using Oracle's connection caching (10.2.0.1 drivers) and have found a deadlock situation. I reviewed the code for the 10.2.0.3 drivers and I see the same problem could happen there.
I searched and have not found this problem reported anywhere. Is this something I should report to Oracle in some way (e.g. Metalink), or is there a better forum to get this resolved?
We are using an OCI driver with the following setup in server.xml:
<ResourceParams name="cmf_toolbox">
<parameter>
<name>factory</name>
<value>oracle.jdbc.pool.OracleDataSourceFactory</value>
</parameter>
<parameter>
<name>driverClassName</name>
<value>oracle.jdbc.driver.OracleDriver</value>
</parameter>
<parameter>
<name>user</name>
<value>hidden</value>
</parameter>
<parameter>
<name>password</name>
<value>hidden</value>
</parameter>
<parameter>
<name>url</name>
<value>jdbc:oracle:oci:@PTB2</value>
</parameter>
<parameter>
<name>connectionCachingEnabled</name>
<value>true</value>
</parameter>
<parameter>
<name>connectionCacheProperties</name>
<value>(InitialLimit=5,MinLimit=15,MaxLimit=75,ConnectionWaitTimeout=30,InactivityTimeout=300,AbandonedConnectionTimeout=300,ValidateConnection=false)</value>
</parameter>
</ResourceParams>
We get a deadlock between two threads, and the exact steps are these:
1) thread1 - The OracleImplicitConnectionCacheThread class is executing the runAbandonedTimeout method, which locks the OracleImplicitConnectionCache instance in a synchronized block. It then goes through additional steps and finally tries to call the LogicalConnection.close method, whose lock is already held by thread2.
2) thread2 - This thread is doing a standard .close() on the LogicalConnection, and when it does this it obtains the lock on the LogicalConnection. This thread then goes through additional steps until it reaches the point in the OracleImplicitConnectionCache class where it executes the reusePooledConnection method. This method is synchronized.
Actual steps that cause the deadlock:
1) thread1 locks the OracleImplicitConnectionCache in the runAbandonedTimeout method
2) thread2 locks the LogicalConnection in the close method
3) thread1 tries to lock the LogicalConnection, is unable to, and waits for the lock
4) thread2 tries to lock the OracleImplicitConnectionCache and waits for the lock
***DEADLOCK***
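The inversion in steps 1-4 can be reproduced in isolation to confirm that the two lock orders alone are enough for a deadlock. This is a minimal sketch, not driver code: two hypothetical plain objects stand in for the OracleImplicitConnectionCache monitor (0x30267fe8) and the LogicalConnection monitor (0x509190d8), a latch forces the interleaving from the dumps, and the JVM's own monitor-deadlock detection reports the result.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.CountDownLatch;

public class LockInversionDemo {
    /** Starts two threads with opposite lock order and returns how many monitor-deadlocked threads the JVM reports. */
    static int reproduce() throws InterruptedException {
        final Object cache = new Object();       // stands in for OracleImplicitConnectionCache
        final Object connection = new Object();  // stands in for LogicalConnection
        CountDownLatch bothHoldFirstLock = new CountDownLatch(2);

        Thread timeoutThread = new Thread(() -> {
            synchronized (cache) {               // step 1: runAbandonedTimeout locks the cache
                bothHoldFirstLock.countDown();
                awaitQuietly(bothHoldFirstLock);
                synchronized (connection) { }    // step 3: blocks forever
            }
        }, "abandoned-timeout");
        Thread closingThread = new Thread(() -> {
            synchronized (connection) {          // step 2: close() locks the connection
                bothHoldFirstLock.countDown();
                awaitQuietly(bothHoldFirstLock);
                synchronized (cache) { }         // step 4: blocks forever
            }
        }, "close");
        timeoutThread.setDaemon(true);           // let the JVM exit despite the deadlock
        closingThread.setDaemon(true);
        timeoutThread.start();
        closingThread.start();

        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        for (int i = 0; i < 50; i++) {           // poll for up to about 5 seconds
            long[] ids = mx.findMonitorDeadlockedThreads();
            if (ids != null) {
                return ids.length;
            }
            Thread.sleep(100);
        }
        return 0;
    }

    private static void awaitQuietly(CountDownLatch latch) {
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("deadlocked threads: " + reproduce()); // prints "deadlocked threads: 2"
    }
}
```

The sketch does not say where a fix belongs (driver or configuration); it only shows that one thread taking cache-then-connection while another takes connection-then-cache is sufficient, which matches the owned/waiting monitors in the two dumps.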
Thread Dumps from two threads listed above
thread1
Thread Name : Thread-1
State : Deadlock/Waiting on monitor
Owns Monitor Lock on 0x30267fe8
Waiting for Monitor Lock on 0x509190d8
Java Stack
at oracle.jdbc.driver.LogicalConnection.close(LogicalConnection.java:214)
- waiting to lock <0x509190d8> (a oracle.jdbc.driver.LogicalConnection)
at oracle.jdbc.pool.OracleImplicitConnectionCache.closeCheckedOutConnection(OracleImplicitConnectionCache.java:1330)
at oracle.jdbc.pool.OracleImplicitConnectionCacheThread.runAbandonedTimeout(OracleImplicitConnectionCacheThread.java:261)
- locked <0x30267fe8> (a oracle.jdbc.pool.OracleImplicitConnectionCache)
at oracle.jdbc.pool.OracleImplicitConnectionCacheThread.run(OracleImplicitConnectionCacheThread.java:81)
thread2
Thread Name : http-7320-Processor83
State : Deadlock/Waiting on monitor
Owns Monitor Lock on 0x509190d8
Waiting for Monitor Lock on 0x30267fe8
Java Stack
at oracle.jdbc.pool.OracleImplicitConnectionCache.reusePooledConnection(OracleImplicitConnectionCache.java:1608)
- waiting to lock <0x30267fe8> (a oracle.jdbc.pool.OracleImplicitConnectionCache)
at oracle.jdbc.pool.OracleConnectionCacheEventListener.connectionClosed(OracleConnectionCacheEventListener.java:71)
- locked <0x34d514f8> (a oracle.jdbc.pool.OracleConnectionCacheEventListener)
at oracle.jdbc.pool.OraclePooledConnection.callImplicitCacheListener(OraclePooledConnection.java:544)
at oracle.jdbc.pool.OraclePooledConnection.logicalCloseForImplicitConnectionCache(OraclePooledConnection.java:459)
at oracle.jdbc.pool.OraclePooledConnection.logicalClose(OraclePooledConnection.java:475)
at oracle.jdbc.driver.LogicalConnection.closeInternal(LogicalConnection.java:243)
at oracle.jdbc.driver.LogicalConnection.close(LogicalConnection.java:214)
- locked <0x509190d8> (a oracle.jdbc.driver.LogicalConnection)
at com.schoolspecialty.cmf.yantra.OrderDB.updateOrder(OrderDB.java:2022)
at com.schoolspecialty.cmf.yantra.OrderFactoryImpl.saveOrder(OrderFactoryImpl.java:119)
at com.schoolspecialty.cmf.yantra.OrderFactoryImpl.saveOrder(OrderFactoryImpl.java:67)
at com.schoolspecialty.ecommerce.beans.ECommerceUtil.saveOrder(Unknown Source)
at com.schoolspecialty.ecommerce.beans.ECommerceUtil.saveOrder(Unknown Source)
at com.schoolspecialty.ecommerce.beans.UpdateCartAction.perform(Unknown Source)
at com.schoolspecialty.mvc2.ActionServlet.doPost(ActionServlet.java:112)
at com.schoolspecialty.ecommerce.servlets.ECServlet.doPostOrGet(Unknown Source)
at com.schoolspecialty.ecommerce.servlets.ECServlet.doPost(Unknown Source)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:709)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:802)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:237)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:157)
at com.schoolspecialty.ecommerce.servlets.filters.EcommerceURLFilter.doFilter(Unknown Source)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:186)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:157)
at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:214)
at org.apache.catalina.core.StandardValveContext.invokeNext(StandardValveContext.java:104)
at org.apache.catalina.core.StandardPipeline.invoke(StandardPipeline.java:520)
at org.apache.catalina.core.StandardContextValve.invokeInternal(StandardContextValve.java:198)
at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:152)
at org.apache.catalina.core.StandardValveContext.invokeNext(StandardValveContext.java:104)
at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:462)
at org.apache.catalina.core.StandardValveContext.invokeNext(StandardValveContext.java:102)
at org.apache.catalina.core.StandardPipeline.invoke(StandardPipeline.java:520)
at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:137)
at org.apache.catalina.core.StandardValveContext.invokeNext(StandardValveContext.java:104)
at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:118)
at org.apache.catalina.core.StandardValveContext.invokeNext(StandardValveContext.java:102)
at org.apache.catalina.core.StandardPipeline.invoke(StandardPipeline.java:520)
at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
at org.apache.catalina.core.StandardValveContext.invokeNext(StandardValveContext.java:104)
at org.apache.catalina.core.StandardPipeline.invoke(StandardPipeline.java:520)
at org.apache.catalina.core.ContainerBase.invoke(ContainerBase.java:929)
at org.apache.coyote.tomcat5.CoyoteAdapter.service(CoyoteAdapter.java:160)
at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:799)
at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.processConnection(Http11Protocol.java:705)
at org.apache.tomcat.util.net.TcpWorkerThread.runIt(PoolTcpEndpoint.java:577)
at org.apache.tomcat.util.threads.ThreadPool$ControlRunnable.run(ThreadPool.java:683)
at java.lang.Thread.run(Thread.java:534)
We used a documented option to abandon connections in the case of an unforeseen error. The consequence of using this option was not a graceful degradation in performance but a complete lockup of the application. The scenario in which we created a moderate number of abandoned connections was a rare error scenario, but a valid test.
How could this not be a bug in the Oracle driver? Is deadlock a desirable outcome of using an option? Is deadlock ever an acceptable consequence of using a feature as documented?
It turns out that other Oracle options for recovering from an unexpected error, such as TimeToLiveTimeout, also incur a similar deadlock.
I did a code review of the decompiled drivers and it clearly shows the issue, confirming the original report. Perhaps you have evidence to the contrary, or better evidence to support your statement "not a bug in Oracle"?
Perhaps you are one of the very few people who have not experienced problems with Oracle drivers? I've been using Oracle since 7.3.4 and it seems that I have always been working around Oracle JDBC driver problems.
We are using Tomcat with the OracleDataSourceFactory.
Deadlock in TopLink when using JMS listener on WebLogic
I am experiencing a deadlock in TopLink 10.1.3 on WebLogic 9 in code that previously worked on TopLink 9.0.4 with WebLogic 8.1. As such, I'm not sure whether it's due to the TopLink change, the WebLogic change, or both. We have a JMS listener (note: NOT a MessageDrivenBean) that updates an existing TopLink-cached domain object. The JMS listener thread gets stuck when attempting to commit the transaction. The thread dump shows another thread blocked in the ConcurrencyManager, waiting to obtain the lock on an object that is being updated by the listener thread. It appears to me that the root cause is that the Synchronization.afterCompletion() listener is running on a different thread from the one that owns the locks obtained in beforeCompletion().
See stack traces.
First, the message listener thread which is waiting for participants in the transaction to commit:
"[ACTIVE] ExecuteThread: '0' for queue: 'weblogic.kernel.Default (self-tuning)'" daemon prio=9 tid=0x3a4a4728 nid=0xa48 in Object.wait() [0x3a0cf000..0x3a0cfbec]
at java.lang.Object.wait(Native Method)
- waiting on <0x0c7a0908> (a weblogic.transaction.internal.ServerTransactionImpl)
at weblogic.transaction.internal.ServerTransactionImpl.globalRetryCommit(ServerTransactionImpl.java:2665)
- locked <0x0c7a0908> (a weblogic.transaction.internal.ServerTransactionImpl)
at weblogic.transaction.internal.ServerTransactionImpl.globalCommit(ServerTransactionImpl.java:2570)
at weblogic.transaction.internal.ServerTransactionImpl.internalCommit(ServerTransactionImpl.java:277)
at weblogic.transaction.internal.ServerTransactionImpl.commit(ServerTransactionImpl.java:226)
at weblogic.ejb.container.internal.BaseEJBObject.postInvoke1(BaseEJBObject.java:539)
at weblogic.ejb.container.internal.StatelessEJBObject.postInvoke1(StatelessEJBObject.java:72)
at weblogic.ejb.container.internal.BaseEJBObject.postInvokeTxRetry(BaseEJBObject.java:374)
at com.avinamart.BusinessLogic.Bean.JobService.JobService_u1ylwo_EOImpl.submitJobAndRun(JobService_u1ylwo_EOImpl.java:1388)
at com.avinamart.Framework.Event.Task.OptimizationTaskListener._submitAsAJob(OptimizationTaskListener.java:253)
at com.avinamart.Framework.Event.Task.OptimizationTaskListener._submitAsAJob(OptimizationTaskListener.java:217)
at com.avinamart.Framework.Event.Task.OptimizationTaskListener.processMessage(OptimizationTaskListener.java:344)
at com.emptoris.base.event.EPASSMessageBaseListener.onMessage(EPASSMessageBaseListener.java:722)
at weblogic.jms.client.JMSSession.onMessage(JMSSession.java:3824)
at weblogic.jms.client.JMSSession.execute(JMSSession.java:3738)
at weblogic.jms.client.JMSSession.pushMessage(JMSSession.java:3253)
at weblogic.jms.client.JMSSession.invoke(JMSSession.java:4195)
at weblogic.messaging.dispatcher.Request.wrappedFiniteStateMachine(Request.java:674)
at weblogic.messaging.dispatcher.DispatcherServerRef.invoke(DispatcherServerRef.java:262)
at weblogic.messaging.dispatcher.DispatcherServerRef.handleRequest(DispatcherServerRef.java:134)
at weblogic.messaging.dispatcher.DispatcherServerRef.access$000(DispatcherServerRef.java:36)
at weblogic.messaging.dispatcher.DispatcherServerRef$1.run(DispatcherServerRef.java:105)
at weblogic.work.ExecuteThread.execute(ExecuteThread.java:207)
at weblogic.work.ExecuteThread.run(ExecuteThread.java:179)
Next, the other thread which is participating in the transaction which is stuck:
"[ACTIVE] ExecuteThread: '2' for queue: 'weblogic.kernel.Default (self-tuning)'" daemon prio=5 tid=0x3adb80a0 nid=0xb30 in Object.wait() [0x3c7af000..0x3c7afd6c]
at java.lang.Object.wait(Native Method)
- waiting on <0x0c7a0000> (a oracle.toplink.internal.helper.ConcurrencyManager)
at java.lang.Object.wait(Object.java:474)
at oracle.toplink.internal.helper.ConcurrencyManager.acquire(ConcurrencyManager.java:76)
- locked <0x0c7a0000> (a oracle.toplink.internal.helper.ConcurrencyManager)
at oracle.toplink.internal.identitymaps.CacheKey.acquire(CacheKey.java:80)
at oracle.toplink.internal.identitymaps.FullIdentityMap.remove(FullIdentityMap.java:164)
at oracle.toplink.internal.identitymaps.HardCacheWeakIdentityMap.remove(HardCacheWeakIdentityMap.java:82)
at oracle.toplink.internal.helper.WriteLockManager.releaseAllAcquiredLocks(WriteLockManager.java:363)
at oracle.toplink.publicinterface.UnitOfWork.afterTransaction(UnitOfWork.java:2123)
at oracle.toplink.transaction.AbstractSynchronizationListener.afterCompletion(AbstractSynchronizationListener.java:135)
at oracle.toplink.transaction.JTASynchronizationListener.afterCompletion(JTASynchronizationListener.java:66)
at weblogic.transaction.internal.ServerSCInfo.callAfterCompletions(ServerSCInfo.java:862)
at weblogic.transaction.internal.ServerTransactionImpl.callAfterCompletions(ServerTransactionImpl.java:2913)
at weblogic.transaction.internal.ServerTransactionImpl.afterCommittedStateHousekeeping(ServerTransactionImpl.java:2806)
at weblogic.transaction.internal.ServerTransactionImpl.setCommittedUnsync(ServerTransactionImpl.java:2857)
at weblogic.transaction.internal.ServerTransactionImpl.ackCommit(ServerTransactionImpl.java:1097)
- locked <0x0c7a0908> (a weblogic.transaction.internal.ServerTransactionImpl)
at weblogic.transaction.internal.CoordinatorImpl.ackCommit(CoordinatorImpl.java:211)
at weblogic.transaction.internal.CoordinatorImpl_WLSkel.invoke(Unknown Source)
at weblogic.rmi.internal.BasicServerRef.invoke(BasicServerRef.java:517)
at weblogic.rmi.internal.BasicServerRef$1.run(BasicServerRef.java:407)
at weblogic.security.acl.internal.AuthenticatedSubject.doAs(AuthenticatedSubject.java:363)
at weblogic.security.service.SecurityManager.runAs(SecurityManager.java:147)
at weblogic.rmi.internal.BasicServerRef.handleRequest(BasicServerRef.java:403)
at weblogic.rmi.internal.BasicServerRef.access$300(BasicServerRef.java:56)
at weblogic.rmi.internal.BasicServerRef$BasicExecuteRequest.run(BasicServerRef.java:934)
at weblogic.work.ExecuteThread.execute(ExecuteThread.java:207)
at weblogic.work.ExecuteThread.run(ExecuteThread.java:179)
Is this the same concurrency bug that was fixed in 10.1.3.1? As I write this, I am attempting to build the application with the updated TopLink jar to test it myself. Has anyone else seen this scenario with WebLogic? I should also point out that the problem only occurs when the listener runs on a different server from the one hosting the JMS queue it reads from. It may be that when the listener runs on the same server, the transaction does not use multiple threads.
Any ideas are greatly appreciated.
- Bruno
We've got the same kind of issue with TopLink 10.1.3.0.0 and BEA WebLogic 8.1 SP5.
I've not tried with 10.1.3.1.0; did you?
Do you have a new status on this issue?
Chris
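Bruno's hypothesis (locks taken by the committing thread, released by a callback that runs on a different thread) can be modelled with a thread-owned lock. This is a minimal sketch under that assumption, not TopLink's implementation: `cacheKeyLock` is a hypothetical stand-in for the ConcurrencyManager in the dump, and `tryLock` with a timeout replaces a blocking acquire so the demo terminates instead of hanging.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

public class CrossThreadCallbackDemo {
    // Hypothetical stand-in for a thread-owned cache-key lock.
    static final ReentrantLock cacheKeyLock = new ReentrantLock();

    /** Simulated afterCompletion callback that needs the cache-key lock; returns whether it got it. */
    static boolean afterCompletion() throws InterruptedException {
        // The owning thread re-enters immediately; any other thread has to wait.
        boolean acquired = cacheKeyLock.tryLock(500, TimeUnit.MILLISECONDS);
        if (acquired) {
            cacheKeyLock.unlock();
        }
        return acquired;
    }

    /** Takes the lock as "beforeCompletion" would, then runs the callback on the same or another thread. */
    static boolean commit(boolean callbackOnSameThread) throws Exception {
        cacheKeyLock.lock();
        try {
            if (callbackOnSameThread) {
                return afterCompletion();            // reentrant acquire succeeds
            }
            final boolean[] result = new boolean[1];
            Thread coordinator = new Thread(() -> {
                try {
                    result[0] = afterCompletion();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            coordinator.start();
            coordinator.join();                      // committing thread waits for the callback, as in the dump
            return result[0];
        } finally {
            cacheKeyLock.unlock();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("same thread: " + commit(true));       // prints "same thread: true"
        System.out.println("different thread: " + commit(false)); // prints "different thread: false"
    }
}
```

With a plain blocking `lock()` in the callback, the different-thread case would never return: the coordinator waits for the lock while the lock owner waits in `join()` for the coordinator, which is the same shape as ExecuteThread '0' waiting on the transaction while ExecuteThread '2' waits on the ConcurrencyManager.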