Two-system cluster panics due to SCSI reservation

We have two E2900s configured as a failover cluster. They use a Sun StorEdge JBOD as both the storage device and the quorum device (running ZFS).
Sometimes, when we reboot the offline server (using init 6, init 5, reboot, or shutdown), the online server panics due to loss of quorum. After some digging, we found that the SCSI reservation points to the offline server instead of the online server. When this happens and the offline server is rebooted, the online server panics.
Is this normal? Is there any way for the online server to take possession of the quorum device when the resource group is brought online?
Thanks
-Ali
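For background, the panic described above follows from how Sun Cluster counts quorum votes. The sketch below is a simplified model that assumes the default rules: each node has one vote, a quorum device has one vote fewer than the number of nodes connected to it, and a partition survives only with a strict majority of all configured votes. It does not model the actual SCSI-2/SCSI-3 fencing protocol. In a two-node cluster there are three votes in total, so the surviving node needs the quorum device's vote, and it only gets that vote if it holds the reservation. The node names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QuorumDevice:
    name: str
    connected_nodes: frozenset

    @property
    def votes(self) -> int:
        # Simplified rule: a quorum device contributes (connected nodes - 1) votes.
        return len(self.connected_nodes) - 1


def partition_has_quorum(live_nodes, all_nodes, devices, reservation_holder):
    """Return True if live_nodes hold a strict majority of all configured votes.

    reservation_holder maps a quorum device name to the node holding its SCSI
    reservation; a device's votes count only for the partition containing
    that node.
    """
    total = len(all_nodes) + sum(d.votes for d in devices)  # one vote per node
    held = len(live_nodes)
    for d in devices:
        if reservation_holder.get(d.name) in live_nodes:
            held += d.votes
    return 2 * held > total


if __name__ == "__main__":
    nodes = {"node-a", "node-b"}  # hypothetical node names
    qd = QuorumDevice("d1", frozenset(nodes))
    # node-b is rebooting; node-a survives only if it holds the reservation.
    print(partition_has_quorum({"node-a"}, nodes, [qd], {"d1": "node-a"}))  # True
    print(partition_has_quorum({"node-a"}, nodes, [qd], {"d1": "node-b"}))  # False
```

If the reservation stays with the rebooting node, the survivor holds one vote out of three. That matches the loss-of-quorum panic described in the question.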

Hi,
it is a bit surprising to me that you are already deep in the system, analyzing reservations on LUNs with undocumented commands. There are several possible reasons why things go wrong:
a) There is a bug. The best thing to do is to update and install patches. Sorry to say it, but that often helps.
b) Someone is fiddling with the reservations. It makes me nervous that this discussion centers on reservations on disks, rather than on the generic problem of "loss of quorum" and how to analyze it.
As I said earlier, I would start with a fresh quorum device (if you have a JBOD, I am sure it offers many more LUNs to your cluster nodes, and you could just use one of them), remove the old one, and then keep your and everyone else's fingers off the commands in /usr/cluster/lib/sc.
As a last question: what does clq status (or, on older systems, scstat -q) tell us?
And now I'll be quiet!
Hartmut

Similar Messages

Reservation Conflict Cause Cluster Panic

Solaris 10 u4/Sun Cluster 3.2 u1/IBM storage DS8000
In my two-node cluster, rebooting node A causes node B to reboot too. The panic messages from /var/adm/messages follow.
But if I change "auto-failback" from "enable" to "disable" in /kernel/drv/scsi_vhci.conf, the problem disappears.
Does anybody know the cause?
Thanks in advance.
dress 202500a0b8269bb0,5 is now ONLINE because of an externally initiated failover
Jun 26 01:16:03 arcsun42kd0629 scsi: [ID 243001 kern.info] /scsi_vhci (scsi_vhci0):
Jun 26 01:16:03 arcsun42kd0629 /scsi_vhci/disk@g600a0b800029ae980000f23d48323963 (sd4): path /pci@7b,0/pci1022,7458@10/pci10df,fd00@1/fp@0,0 (fp0) target address 202400a0b8269bb0,5 is now STANDBY because of an externally initiated failover
Jun 26 01:16:03 arcsun42kd0629 scsi: [ID 243001 kern.info] /scsi_vhci (scsi_vhci0):
Jun 26 01:16:03 arcsun42kd0629 /scsi_vhci/disk@g600a0b800029ae980000f23d48323963 (sd4): path /pci@7b,0/pci1022,7458@10/pci10df,fd00@1,1/fp@0,0 (fp1) target address 202500a0b8269bb0,5 is now ONLINE because of an externally initiated failover
Jun 26 01:16:06 arcsun42kd0629 cl_dlpitrans: [ID 624622 kern.notice] Notifying cluster that this node is panicking
Jun 26 01:16:06 arcsun42kd0629 unix: [ID 836849 kern.notice]
Jun 26 01:16:06 arcsun42kd0629 ^Mpanic[cpu0]/thread=fffffe80009c9c80:
Jun 26 01:16:06 arcsun42kd0629 genunix: [ID 747640 kern.notice] Reservation Conflict
Jun 26 01:16:06 arcsun42kd0629 unix: [ID 100000 kern.notice]
Jun 26 01:16:06 arcsun42kd0629 genunix: [ID 802836 kern.notice] fffffe80009c99f0 fffffffffbbd5135 ()
Jun 26 01:16:06 arcsun42kd0629 genunix: [ID 655072 kern.notice] fffffe80009c9a40 scsi:scsi_watch_request_intr+73 ()
Jun 26 01:16:06 arcsun42kd0629 genunix: [ID 655072 kern.notice] fffffe80009c9ae0 scsi_vhci:vhci_intr+3da ()
Jun 26 01:16:06 arcsun42kd0629 genunix: [ID 655072 kern.notice] fffffe80009c9b00 fcp:ssfcp_post_callback+4a ()
Jun 26 01:16:06 arcsun42kd0629 genunix: [ID 655072 kern.notice] fffffe80009c9b30 fcp:ssfcp_cmd_callback+4c ()
Jun 26 01:16:06 arcsun42kd0629 genunix: [ID 655072 kern.notice] fffffe80009c9b90 emlxs:emlxs_iodone+c5 ()
Jun 26 01:16:06 arcsun42kd0629 genunix: [ID 655072 kern.notice] fffffe80009c9c00 emlxs:emlxs_iodone_server+171 ()
Jun 26 01:16:06 arcsun42kd0629 genunix: [ID 655072 kern.notice] fffffe80009c9c60 emlxs:emlxs_thread+172 ()
Jun 26 01:16:06 arcsun42kd0629 genunix: [ID 655072 kern.notice] fffffe80009c9c70 unix:thread_start+8 ()
Jun 26 01:16:06 arcsun42kd0629 unix: [ID 100000 kern.notice]
Jun 26 01:16:06 arcsun42kd0629 genunix: [ID 672855 kern.notice] syncing file systems...
Jun 26 01:16:06 arcsun42kd0629 genunix: [ID 904073 kern.notice] done
Jun 26 01:16:07 arcsun42kd0629 genunix: [ID 111219 kern.notice] dumping to /dev/dsk/c4t0d0s1, offset 1087373312, content: kernel
Jun 26 01:16:15 arcsun42kd0629 genunix: [ID 409368 kern.notice] ^M100% done: 156023 pages dumped, compression ratio 5.31,
Jun 26 01:16:15 arcsun42kd0629 genunix: [ID 851671 kern.notice] dump succeeded
Edited by: minsun on Jun 26, 2008 1:20 AM
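To line up the multipath failovers with the panic, a small filter over /var/adm/messages may help. This sketch assumes the syslog line layout shown in the excerpt (month, day, time, host, then the message). It matches only the three message texts visible there. The event labels and the classify_file wrapper are made up for illustration.

```python
import re

# Month, day, time and host at the start of a /var/adm/messages line, e.g.
# "Jun 26 01:16:06 arcsun42kd0629 genunix: [ID 747640 kern.notice] ..."
PREFIX_RE = re.compile(r"^(\w{3}\s+\d{1,2}\s+\d{2}:\d{2}:\d{2})\s+(\S+)\s+(.*)$")

# Illustrative labels for the message texts seen in this thread.
PATTERNS = {
    "reservation_conflict": re.compile(r"Reservation Conflict"),
    "panic": re.compile(r"panic\[cpu\d+\]"),
    "path_change": re.compile(r"is now (ONLINE|STANDBY)"),
}


def classify(lines):
    """Return (timestamp, host, label) for each line matching one pattern."""
    events = []
    for line in lines:
        m = PREFIX_RE.match(line)
        if not m:
            continue  # continuation or unrelated line
        ts, host, rest = m.groups()
        for label, pattern in PATTERNS.items():
            if pattern.search(rest):
                events.append((ts, host, label))
                break
    return events


def classify_file(path="/var/adm/messages"):
    """Hypothetical convenience wrapper: classify every line of a log file."""
    with open(path, errors="replace") as f:
        return classify(f)
```

In the excerpt above, the path state changes and the panic occur within about three seconds of each other. That timing fits the observation that changing the scsi_vhci auto-failback setting makes the problem go away, but it does not prove a cause.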

Hi,
Sorry we missed that question.
I would urge you to open a call with Sun Support on this issue. This normally indicates a lost quorum device, but not if you can fix the issue just by changing a driver setting.
Detlef


After installing Mavericks, my system seems to kernel panic and then just sits there. I tried waiting over two nights. I tried reinstalling twice. I also booted into the recovery partition and repaired permissions and verified/repaired the drive, with no problems found. I tried resetting the PRAM/NVRAM (or whatever it's called). The recovery partition now has Mavericks as its OS. I'm running a 27-inch iMac i3 from around 2011. Any ideas? I can still boot into Windows through Boot Camp.
Thank you

bhadotia wrote: Anyway, the file downloaded from Dell to update the partition for the Studio 1555 is corrupted (the checksums don't match). My partition still doesn't boot. I'm working to fix this and will update my post when I'm done.
The file seems to create the CD/DVD image and USB media just fine. So I used it only to create a CD image, which I then wrote to a blank CD, and that seems to work fine. I also played around a bit and had some partial success booting the partition. I've updated my original opening post with the new findings.
Whew! What a waste of time! I never want to do all of this again.
Last edited by bhadotia (2012-03-03 00:05:22)

Two node cluster - disk not responding to selection

I'm building a two-node cluster (Solaris 10/SC 3.2) on Dell 1950/PERC6i servers, with the quorum server running as a virtual server. I still need to add the quorum server to the cluster, so my cluster nodes are still in install mode.
I have tried to add the quorum device using scsetup or clsetup, but I always get the same message:
root@node01:~# scsetup
Failed to get node zone list
Failed to get node zone list
    This program has detected that the cluster "installmode" attribute is
    still enabled. As such, certain initial cluster setup steps will be
    performed at this time. This includes adding any necessary quorum
    devices, then resetting both the quorum vote counts and the
    "installmode" property.
    Please do not proceed if any additional nodes have yet to join the
    cluster.
    Is it okay to continue (yes/no) [yes]? yes
Unable to establish the list of cluster nodes.
Press Enter to continue:
The most important issue is that I started getting the following messages immediately after the first node restarted during the scinstall procedure:
Feb 4 17:33:20 node01 scsi: [ID 107833 kern.warning] WARNING: /pci@0,0/pci8086,25e3@3/pci1028,1f0c@0/sd@0,0 (sd3):
Feb 4 17:33:20 node01      disk not responding to selection
Feb 4 17:33:21 node01 scsi: [ID 107833 kern.warning] WARNING: /pci@0,0/pci8086,25e3@3/pci1028,1f0c@0/sd@0,0 (sd3):
Feb 4 17:33:21 node01      disk not responding to selection
Feb 4 17:26:46 node02 scsi: [ID 107833 kern.warning] WARNING: /pci@0,0/pci8086,25e3@3/pci1028,1f0c@0/sd@1,0 (sd4):
Feb 4 17:26:46 node02      disk not responding to selection
Feb 4 17:26:46 node02 scsi: [ID 107833 kern.warning] WARNING: /pci@0,0/pci8086,25e3@3/pci1028,1f0c@0/sd@0,0 (sd3):
Feb 4 17:26:46 node02      disk not responding to selection
Both nodes are extremely slow. Many services hit long timeouts, so I could only log in using telnet.
Here is the output from cfgadm -al:
root@node01:~# cfgadm -al
Ap_Id                          Type         Receptacle   Occupant     Condition
c0                             scsi-bus     connected    configured   unknown
c0::dsk/c0t0d0                 disk         connected    configured   unknown
c0::dsk/c0t1d0                 disk         connected    configured   unknown
c4                             fc           connected    unconfigured unknown
c5                             fc           connected    unconfigured unknown
usb0/1                         unknown      empty        unconfigured ok
usb0/2                         unknown      empty        unconfigured ok
usb1/1                         unknown      empty        unconfigured ok
usb1/2                         unknown      empty        unconfigured ok
usb2/1                         unknown      empty        unconfigured ok
usb2/2                         unknown      empty        unconfigured ok
usb3/1                         unknown      empty        unconfigured ok
usb3/2                         unknown      empty        unconfigured ok
usb4/1                         usb-hub      connected    configured   ok
usb4/1.1                       usb-device   connected    configured   ok
usb4/1.2                       usb-device   connected    configured   ok
usb4/2                         unknown      empty        unconfigured ok
usb4/3                         unknown      empty        unconfigured ok
usb4/4                         unknown      empty        unconfigured ok
usb4/5                         usb-hub      connected    configured   ok
usb4/5.1                       unknown      empty        unconfigured ok
usb4/5.2                       unknown      empty        unconfigured ok
usb4/5.3                       unknown      empty        unconfigured ok
usb4/5.4                       unknown      empty        unconfigured ok
usb4/6                         unknown      empty        unconfigured ok
usb4/7                         unknown      empty        unconfigured ok
usb4/8                         unknown      empty        unconfigured ok
How can I solve these two problems: the SCSI issue and the problem with the node list?
Best regards,
Vladimir
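To compare the two nodes, it may help to reduce a cfgadm -al listing to the attachment points that are physically connected but not configured. In the listing above, those are the fc controllers c4 and c5. The sketch below assumes the five whitespace-separated columns shown (Ap_Id, Type, Receptacle, Occupant, Condition). The thread does not establish whether those controllers have anything to do with the sd3/sd4 warnings.

```python
def parse_cfgadm(text):
    """Split `cfgadm -al` output into one dict per attachment point."""
    lines = [line for line in text.splitlines() if line.strip()]
    if not lines:
        return []
    header = lines[0].split()  # Ap_Id Type Receptacle Occupant Condition
    rows = []
    for line in lines[1:]:
        fields = line.split()
        if len(fields) == len(header):  # skip anything that does not fit the table
            rows.append(dict(zip(header, fields)))
    return rows


def connected_but_unconfigured(rows, types=("fc", "scsi-bus", "disk")):
    """Ap_Ids of the given types that are connected but not configured."""
    return [
        r["Ap_Id"]
        for r in rows
        if r["Receptacle"] == "connected"
        and r["Occupant"] == "unconfigured"
        and r["Type"] in types
    ]
```

The USB ports are excluded by default because their empty receptacles are expected.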

Last weekend I reinstalled both nodes and rebuilt the virtual disks (LUNs) on the PERC 6i controller, choosing RAID0 instead of RAID1 as before. I'm still not sure whether RAID0 helped or just rebuilding the disks did. The MegaCli tool from LSI could not tell me what was wrong with the disks/partitions I used on the first try; MegaCli reported the controller status as error-free.
Probably something was wrong with the partitioning. As on the first try, I used Solaris 10 with the latest patches (01/2009) and Sun Cluster 3.2, also with the latest security patch.
After reinstalling, everything is almost fine :-). Our other production clusters have different Solaris and Sun Cluster patch levels from this test cluster. The only difference from them is that the cacao common agent container is offline:
offline        Feb_07   svc:/application/management/common-agent-container-1:default
Because of that, the following service is disabled:
disabled       Feb_07   svc:/system/cluster/rgm:default
Does anyone know how serious this is? And how can I now enable svc:/application/management/common-agent-container-1:default?
Here is the svcs -xv output:
svc:/application/print/server:default (LP print server)
State: disabled since Sat Feb 07 10:42:05 2009
Reason: Disabled by an administrator.
   See: http://sun.com/msg/SMF-8000-05
   See: man -M /usr/share/man -s 1M lpsched
Impact: 2 dependent services are not running:
        svc:/application/print/rfc1179:default
        svc:/application/print/ipp-listener:default
svc:/system/cluster/rgm:default (Resource Group Manager Daemon)
State: disabled since Sat Feb 07 10:42:06 2009
Reason: Disabled by an administrator.
   See: http://sun.com/msg/SMF-8000-05
Impact: 1 dependent service is not running:
        svc:/application/management/common-agent-container-1:default
svc:/system/cluster/scsymon-srv:default (Sun Cluster SyMON Server Daemon)
State: offline since Sat Feb 07 10:42:07 2009
Reason: Dependency svc:/application/management/sunmcagent:default is absent.
   See: http://sun.com/msg/SMF-8000-E2
Impact: This service is not running.
Regards,
Vladimir
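The svcs -xv output above can be read as a small dependency report. The minimal parser below assumes the layout shown: a header line with the FMRI and a description in parentheses, then State:, Reason:, See: and Impact: lines, with dependent FMRIs listed under Impact:. It shows which reported service has dependents that are not running. In this listing, rgm is "Disabled by an administrator" and has the cacao container listed as its dependent. The function names are illustrative.

```python
import re

HEADER_RE = re.compile(r"^(svc:/\S+)\s+\((.*)\)$")


def parse_svcs_xv(text):
    """Return {fmri: {"description", "state", "reason", "impacted"}}."""
    services = {}
    current = None
    in_impact = False
    for raw in text.splitlines():
        line = raw.strip()
        header = HEADER_RE.match(line)
        if header:
            current = {"description": header.group(2), "state": None,
                       "reason": None, "impacted": []}
            services[header.group(1)] = current
            in_impact = False
        elif current is None or not line:
            continue
        elif line.startswith("State:"):
            current["state"] = line[len("State:"):].strip()
        elif line.startswith("Reason:"):
            current["reason"] = line[len("Reason:"):].strip()
        elif line.startswith("Impact:"):
            in_impact = True
        elif in_impact and line.startswith("svc:/"):
            # Dependent services are listed, one per line, under "Impact:".
            current["impacted"].append(line)
    return services


def blocking_services(services):
    """Reported services that list dependents which are not running."""
    return {fmri: info["impacted"] for fmri, info in services.items() if info["impacted"]}
```

Running it over the full svcs -xv output of a healthy production node and of this test node gives two dictionaries that are easy to compare.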

Testing HA-NFS in a two-node cluster (cannot statvfs /global/nfs: I/O error)

Hi all,
I am testing HA-NFS (failover) on a two-node cluster. I have a Sun Fire V240, an E250, and Netra st A1000/D1000 storage. I have installed Solaris 10 update 6 and the cluster packages on both nodes.
I have created one global file system (/dev/did/dsk/d4s7) and mounted it as /global/nfs. This file system is accessible from both nodes. I configured HA-NFS from the command line, following the Sun Cluster Data Service for NFS Guide for Solaris.
The logical host responds to ping from the NFS client, and I mounted the share there using the logical hostname. For testing purposes I brought one machine down. After this step the file system gives an I/O error (on both server and client), and the df command shows:
df: cannot statvfs /global/nfs: I/O error.
I configured it with the following commands:
# clnode status
# mkdir -p /global/nfs
# clresourcegroup create -n test1,test2 -p Pathprefix=/global/nfs rg-nfs
I added the logical hostname and IP address to /etc/hosts.
I commented the hosts and rpc lines in /etc/nsswitch.conf.
# clreslogicalhostname create -g rg-nfs -h ha-host-1 -N sc_ipmp0@test1,sc_ipmp0@test2 ha-host-1
# mkdir /global/nfs/SUNW.nfs
I created a file called dfstab.user-home in /global/nfs/SUNW.nfs that contains the following line:
share -F nfs -o rw /global/nfs
# clresourcetype register SUNW.nfs
# clresource create -g rg-nfs -t SUNW.nfs ; user-home
# clresourcegroup online -M rg-nfs
Where did I go wrong? Can anyone point me to documentation on this?
Any help?
Thanks in advance.

test1# tail -20 /var/adm/messages
Feb 28 22:28:54 testlab5 Cluster.SMF.DR: [ID 344672 daemon.error] Unable to open door descriptor /var/run/rgmd_receptionist_door
Feb 28 22:28:54 testlab5 Cluster.SMF.DR: [ID 801855 daemon.error]
Feb 28 22:28:54 testlab5 Error in scha_cluster_get
Feb 28 22:28:54 testlab5 Cluster.scdpmd: [ID 489913 daemon.notice] The state of the path to device: /dev/did/rdsk/d5s0 has changed to OK
Feb 28 22:28:54 testlab5 Cluster.scdpmd: [ID 489913 daemon.notice] The state of the path to device: /dev/did/rdsk/d6s0 has changed to OK
Feb 28 22:28:58 testlab5 svc.startd[8]: [ID 652011 daemon.warning] svc:/system/cluster/scsymon-srv:default: Method "/usr/cluster/lib/svc/method/svc_scsymon_srv start" failed with exit status 96.
Feb 28 22:28:58 testlab5 svc.startd[8]: [ID 748625 daemon.error] system/cluster/scsymon-srv:default misconfigured: transitioned to maintenance (see 'svcs -xv' for details)
Feb 28 22:29:23 testlab5 Cluster.RGM.rgmd: [ID 537175 daemon.notice] CMM: Node e250 (nodeid: 1, incarnation #: 1235752006) has become reachable.
Feb 28 22:29:23 testlab5 Cluster.RGM.rgmd: [ID 525628 daemon.notice] CMM: Cluster has reached quorum.
Feb 28 22:29:23 testlab5 Cluster.RGM.rgmd: [ID 377347 daemon.notice] CMM: Node e250 (nodeid = 1) is up; new incarnation number = 1235752006.
Feb 28 22:29:23 testlab5 Cluster.RGM.rgmd: [ID 377347 daemon.notice] CMM: Node testlab5 (nodeid = 2) is up; new incarnation number = 1235840337.
Feb 28 22:37:15 testlab5 Cluster.CCR: [ID 499775 daemon.notice] resource group rg-nfs added.
Feb 28 22:39:05 testlab5 Cluster.RGM.rgmd: [ID 375444 daemon.notice] 8 fe_rpc_command: cmd_type(enum):<5>:cmd=<null>:tag=<>: Calling security_clnt_connect(..., host=<testlab5>, sec_type {0:WEAK, 1:STRONG, 2:DES} =<1>, ...)
Feb 28 22:39:05 testlab5 Cluster.CCR: [ID 491081 daemon.notice] resource ha-host-1 removed.
Feb 28 22:39:17 testlab5 Cluster.RGM.rgmd: [ID 375444 daemon.notice] 8 fe_rpc_command: cmd_type(enum):<5>:cmd=<null>:tag=<>: Calling security_clnt_connect(..., host=<testlab5>, sec_type {0:WEAK, 1:STRONG, 2:DES} =<1>, ...)
Feb 28 22:39:17 testlab5 Cluster.CCR: [ID 254131 daemon.notice] resource group nfs-rg removed.
Feb 28 22:39:30 testlab5 Cluster.RGM.rgmd: [ID 224900 daemon.notice] launching method <hafoip_validate> for resource <ha-host-1>, resource group <rg-nfs>, node <testlab5>, timeout <300> seconds
Feb 28 22:39:30 testlab5 Cluster.RGM.rgmd: [ID 375444 daemon.notice] 8 fe_rpc_command: cmd_type(enum):<1>:cmd=</usr/cluster/lib/rgm/rt/hafoip/hafoip_validate>:tag=<rg-nfs.ha-host-1.2>: Calling security_clnt_connect(..., host=<testlab5>, sec_type {0:WEAK, 1:STRONG, 2:DES} =<1>, ...)
Feb 28 22:39:30 testlab5 Cluster.RGM.rgmd: [ID 515159 daemon.notice] method <hafoip_validate> completed successfully for resource <ha-host-1>, resource group <rg-nfs>, node <testlab5>, time used: 0% of timeout <300 seconds>
Feb 28 22:39:30 testlab5 Cluster.CCR: [ID 973933 daemon.notice] resource ha-host-1 added.
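One cheap thing to rule out in a setup like this is a typographic dash in dfstab.user-home or in pasted commands. Text copied from web pages or word processors can turn an ASCII "-" into an en dash (U+2013) or a similar character, and share(1M) will then not recognize the option flag. The minimal checker below assumes the file can be read as UTF-8. It only reports and normalizes such dashes and says nothing about the rest of the HA-NFS configuration. The check_file helper is hypothetical.

```python
import unicodedata

# Non-ASCII characters that often stand in for '-' after copy and paste.
SUSPECT_DASHES = {"\u2010", "\u2011", "\u2012", "\u2013", "\u2014", "\u2212"}


def find_suspect_dashes(line):
    """Return (column, Unicode name) for each suspicious dash in line."""
    return [(col, unicodedata.name(ch)) for col, ch in enumerate(line) if ch in SUSPECT_DASHES]


def normalize_dashes(line):
    """Replace every suspicious dash with an ASCII hyphen-minus."""
    return "".join("-" if ch in SUSPECT_DASHES else ch for ch in line)


def check_file(path):
    """Print path:line:column and the character name for each suspicious dash."""
    with open(path, encoding="utf-8", errors="replace") as f:
        for lineno, line in enumerate(f, start=1):
            for col, name in find_suspect_dashes(line):
                print(f"{path}:{lineno}:{col + 1}: {name}")
```

For example, check_file("/global/nfs/SUNW.nfs/dfstab.user-home") prints nothing if the file contains only ASCII hyphens.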

Creation of a diskset in a two-node cluster

Hi all,
I have created a diskset with SVM on Solaris 9 in a two-node cluster.
After creating the diskset, I mounted it on the primary node at mount point /test. However, the diskset is being mounted on both nodes.
I created this diskset for failover purposes: if one node goes down, the other node should take over.
My idea is to create a failover resource (diskset resource) in the two-node cluster.
Below are the steps I used to create the diskset:
root@host2# /usr/cluster/bin/scdidadm -L d8
8        host1:/dev/rdsk/c1t9d0   /dev/did/rdsk/d8
8        host2:/dev/rdsk/c1t9d0   /dev/did/rdsk/d8
metaset -s diskset -a -h host2 host1
metaset -s diskset -a -m host2 host1
metaset -s diskset -a /dev/did/rdsk/d8
metainit -s diskset d40 1 1 /dev/did/dsk/d8s0
newfs /dev/md/diskset/rdsk/d40
mount /dev/md/diskset/dsk/d40 /test
root@host2# metaset -s diskset
Set name = diskset, Set number = 1
Host                Owner
host2                  Yes
host1
Mediator Host(s)    Aliases
host2
host1
Driv Dbase
d8   Yes
Please let me know how to mount the diskset on only one node.
If I am wrong, please correct me.
Regards,
R. Rajesh Kannan.

The file system will only mount on both (all) nodes if you mount it globally, i.e. with the global flag, or if there is an entry in /etc/vfstab that has the global option.
Given your output above, I would guess you have a global mount for /test defined in /etc/vfstab.
Regards,
Tim
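Tim's check can be done mechanically. /etc/vfstab has seven whitespace-separated fields: device to mount, device to fsck, mount point, FS type, fsck pass, mount at boot, and mount options. A cluster-wide mount shows up as global in the comma-separated options field. The sketch below reports such entries. The sample entries used to exercise it are hypothetical, not taken from the poster's system.

```python
VFSTAB_FIELDS = ("device_to_mount", "device_to_fsck", "mount_point",
                 "fs_type", "fsck_pass", "mount_at_boot", "mount_options")


def parse_vfstab(text):
    """Return one dict per well-formed, non-comment vfstab line."""
    entries = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) != len(VFSTAB_FIELDS):
            continue  # skip malformed lines rather than guess
        entries.append(dict(zip(VFSTAB_FIELDS, fields)))
    return entries


def is_global_mount(entry):
    """True if 'global' appears among the comma-separated mount options."""
    opts = entry["mount_options"]  # "-" means no options
    return opts != "-" and "global" in opts.split(",")


def global_mounts(text, mount_point=None):
    """Mount points with the global option, optionally limited to one path."""
    return [
        e["mount_point"]
        for e in parse_vfstab(text)
        if is_global_mount(e) and (mount_point is None or e["mount_point"] == mount_point)
    ]
```

A hypothetical entry such as "/dev/md/diskset/dsk/d40 /dev/md/diskset/rdsk/d40 /test ufs 2 yes global,logging" would be reported. Removing global from that field is what Tim's answer points to.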
---

Error: Halting this cluster node due to unrecoverable service failure

Our cluster has experienced some sort of fault that has only become apparent today. The origin appears to have been nearly a month ago, yet the symptoms have only just manifested.
The node in question is a standalone instance running a DistributedCache service with local storage. It output the following to stdout on Jan-22:
Coherence <Error>: Halting this cluster node due to unrecoverable service failure
It finally failed today with OutOfMemoryError: Java heap space.
We're running coherence-3.5.2.jar.
Q1: It looks like this node failed on Jan-22 yet we did not notice. What is the best way to monitor node health?
Q2: What might the root cause be for such a fault?
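On Q1, one stopgap is to scan the node's log for guardian and fatal messages. That way, a halt like the Jan-22 one raises an alert instead of going unnoticed. The sketch below only matches message texts visible in this thread: "Attempting recovery (due to soft timeout)", "Terminating guarded execution (due to hard timeout)", "Halting this cluster node", and OutOfMemoryError. The alert labels are made up. Coherence also exposes cluster and service state through JMX, which is the more usual route for ongoing health monitoring.

```python
import re

GUARD_RE = re.compile(
    r"(Attempting recovery|Terminating guarded execution) "
    r"\(due to (soft|hard) timeout\) of Guard\{Daemon=([^}]+)\}"
)

# Plain-text markers and the (illustrative) alert label each one maps to.
MARKERS = {
    "Halting this cluster node": "halt",
    "OutOfMemoryError": "oom",
}


def scan(lines):
    """Yield (label, detail) for each line that looks like a guardian or fatal event."""
    for line in lines:
        m = GUARD_RE.search(line)
        if m:
            # detail is the guarded daemon, e.g. "DistributedCache"
            yield (m.group(2) + "_timeout", m.group(3))
            continue
        for marker, label in MARKERS.items():
            if marker in line:
                yield (label, line.strip())
                break
```

Feeding it the tail of the node's log from a cron job, and alerting on any result, would have flagged the soft and hard timeouts shown below on the day they happened.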
I found the following in the logs:
2011-01-22 01:18:58,296 Coherence Logger@9216774 3.5.2/463 ERROR 2011-01-22 01:18:58.296/9910749.462 Oracle Coherence EE 3.5.2/463 <Error> (thread=Cluster, member=33): Attempting recovery (due to soft timeout) of Guard{Daemon=DistributedCache}
2011-01-22 01:18:58,296 Coherence Logger@9216774 3.5.2/463 ERROR 2011-01-22 01:18:58.296/9910749.462 Oracle Coherence EE 3.5.2/463 <Error> (thread=Cluster, member=33): Attempting recovery (due to soft timeout) of Guard{Daemon=DistributedCache}
2011-01-22 01:19:04,772 Coherence Logger@9216774 3.5.2/463 ERROR 2011-01-22 01:19:04.772/9910755.938 Oracle Coherence EE 3.5.2/463 <Error> (thread=Cluster, member=33): Terminating guarded execution (due to hard timeout) of Guard{Daemon=DistributedCache}
2011-01-22 01:19:04,772 Coherence Logger@9216774 3.5.2/463 ERROR 2011-01-22 01:19:04.772/9910755.938 Oracle Coherence EE 3.5.2/463 <Error> (thread=Cluster, member=33): Terminating guarded execution (due to hard timeout) of Guard{Daemon=DistributedCache}
2011-01-22 01:19:05,785 Coherence Logger@9216774 3.5.2/463 ERROR 2011-01-22 01:19:05.785/9910756.951 Oracle Coherence EE 3.5.2/463 <Error> (thread=Termination Thread, member=33): Full Thread Dump
Thread[Reference Handler,10,system]
java.lang.Object.wait(Native Method)
java.lang.Object.wait(Object.java:485)
java.lang.ref.Reference$ReferenceHandler.run(Reference.java:116)
Thread[DistributedCache,5,Cluster]
java.nio.Bits.copyToByteArray(Native Method)
java.nio.DirectByteBuffer.get(DirectByteBuffer.java:224)
com.tangosol.io.nio.ByteBufferInputStream.read(ByteBufferInputStream.java:123)
java.io.DataInputStream.readFully(DataInputStream.java:178)
java.io.DataInputStream.readFully(DataInputStream.java:152)
com.tangosol.util.Binary.readExternal(Binary.java:1066)
com.tangosol.util.Binary.<init>(Binary.java:183)
com.tangosol.io.nio.BinaryMap$Block.readValue(BinaryMap.java:4304)
com.tangosol.io.nio.BinaryMap$Block.getValue(BinaryMap.java:4130)
com.tangosol.io.nio.BinaryMap.get(BinaryMap.java:377)
com.tangosol.io.nio.BinaryMapStore.load(BinaryMapStore.java:64)
com.tangosol.net.cache.SerializationPagedCache$WrapperBinaryStore.load(SerializationPagedCache.java:1547)
com.tangosol.net.cache.SerializationPagedCache$PagedBinaryStore.load(SerializationPagedCache.java:1097)
com.tangosol.net.cache.SerializationMap.get(SerializationMap.java:121)
com.tangosol.net.cache.SerializationPagedCache.get(SerializationPagedCache.java:247)
com.tangosol.net.cache.AbstractSerializationCache$1.getOldValue(AbstractSerializationCache.java:315)
com.tangosol.net.cache.OverflowMap$Status.registerBackEvent(OverflowMap.java:4210)
com.tangosol.net.cache.OverflowMap.onBackEvent(OverflowMap.java:2316)
com.tangosol.net.cache.OverflowMap$BackMapListener.onMapEvent(OverflowMap.java:4544)
com.tangosol.util.MultiplexingMapListener.entryDeleted(MultiplexingMapListener.java:49)
com.tangosol.util.MapEvent.dispatch(MapEvent.java:214)
com.tangosol.util.MapEvent.dispatch(MapEvent.java:166)
com.tangosol.util.MapListenerSupport.fireEvent(MapListenerSupport.java:556)
com.tangosol.net.cache.AbstractSerializationCache.dispatchEvent(AbstractSerializationCache.java:338)
com.tangosol.net.cache.AbstractSerializationCache.dispatchPendingEvent(AbstractSerializationCache.java:321)
com.tangosol.net.cache.AbstractSerializationCache.removeBlind(AbstractSerializationCache.java:155)
com.tangosol.net.cache.SerializationPagedCache.removeBlind(SerializationPagedCache.java:348)
com.tangosol.util.AbstractKeyBasedMap$KeySet.remove(AbstractKeyBasedMap.java:556)
com.tangosol.net.cache.OverflowMap.removeInternal(OverflowMap.java:1299)
com.tangosol.net.cache.OverflowMap.remove(OverflowMap.java:380)
com.tangosol.coherence.component.util.daemon.queueProcessor.service.grid.DistributedCache$Storage.clear(DistributedCache.CDB:24)
com.tangosol.coherence.component.util.daemon.queueProcessor.service.grid.DistributedCache.onClearRequest(DistributedCache.CDB:32)
com.tangosol.coherence.component.util.daemon.queueProcessor.service.grid.DistributedCache$ClearRequest.run(DistributedCache.CDB:1)
com.tangosol.coherence.component.net.message.requestMessage.DistributedCacheRequest.onReceived(DistributedCacheRequest.CDB:12)
com.tangosol.coherence.component.util.daemon.queueProcessor.service.Grid.onMessage(Grid.CDB:9)
com.tangosol.coherence.component.util.daemon.queueProcessor.service.Grid.onNotify(Grid.CDB:136)
com.tangosol.coherence.component.util.daemon.queueProcessor.service.grid.DistributedCache.onNotify(DistributedCache.CDB:3)
com.tangosol.coherence.component.util.Daemon.run(Daemon.CDB:42)
java.lang.Thread.run(Thread.java:619)
Thread[Finalizer,8,system]
java.lang.Object.wait(Native Method)
java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:118)
java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:134)
java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:159)
Thread[PacketReceiver,7,Cluster]
java.lang.Object.wait(Native Method)
com.tangosol.coherence.component.util.Daemon.onWait(Daemon.CDB:18)
com.tangosol.coherence.component.util.daemon.queueProcessor.packetProcessor.PacketReceiver.onWait(PacketReceiver.CDB:2)
com.tangosol.coherence.component.util.Daemon.run(Daemon.CDB:39)
java.lang.Thread.run(Thread.java:619)
Thread[RMI TCP Accept-0,5,system]
java.net.PlainSocketImpl.socketAccept(Native Method)
java.net.PlainSocketImpl.accept(PlainSocketImpl.java:390)
java.net.ServerSocket.implAccept(ServerSocket.java:453)
java.net.ServerSocket.accept(ServerSocket.java:421)
sun.rmi.transport.tcp.TCPTransport$AcceptLoop.executeAcceptLoop(TCPTransport.java:369)
sun.rmi.transport.tcp.TCPTransport$AcceptLoop.run(TCPTransport.java:341)
java.lang.Thread.run(Thread.java:619)
Thread[PacketSpeaker,8,Cluster]
java.lang.Object.wait(Native Method)
com.tangosol.coherence.component.util.queue.ConcurrentQueue.waitForEntry(ConcurrentQueue.CDB:16)
com.tangosol.coherence.component.util.queue.ConcurrentQueue.remove(ConcurrentQueue.CDB:7)
com.tangosol.coherence.component.util.Queue.remove(Queue.CDB:1)
com.tangosol.coherence.component.util.daemon.queueProcessor.packetProcessor.PacketSpeaker.onNotify(PacketSpeaker.CDB:62)
com.tangosol.coherence.component.util.Daemon.run(Daemon.CDB:42)
java.lang.Thread.run(Thread.java:619)
Thread[Logger@9216774 3.5.2/463,3,main]
java.lang.Object.wait(Native Method)
com.tangosol.coherence.component.util.Daemon.onWait(Daemon.CDB:18)
com.tangosol.coherence.component.util.Daemon.run(Daemon.CDB:39)
java.lang.Thread.run(Thread.java:619)
Thread[PacketListener1,8,Cluster]
java.net.PlainDatagramSocketImpl.receive0(Native Method)
java.net.PlainDatagramSocketImpl.receive(PlainDatagramSocketImpl.java:136)
java.net.DatagramSocket.receive(DatagramSocket.java:712)
com.tangosol.coherence.component.net.socket.UdpSocket.receive(UdpSocket.CDB:20)
com.tangosol.coherence.component.net.UdpPacket.receive(UdpPacket.CDB:4)
com.tangosol.coherence.component.util.daemon.queueProcessor.packetProcessor.PacketListener.onNotify(PacketListener.CDB:19)
com.tangosol.coherence.component.util.Daemon.run(Daemon.CDB:42)
java.lang.Thread.run(Thread.java:619)
Thread[main,5,main]
java.lang.Object.wait(Native Method)
com.tangosol.net.DefaultCacheServer.main(DefaultCacheServer.java:79)
com.networkfleet.cacheserver.Launcher.main(Launcher.java:122)
Thread[Signal Dispatcher,9,system]
Thread[RMI TCP Accept-41006,5,system]
java.net.PlainSocketImpl.socketAccept(Native Method)
java.net.PlainSocketImpl.accept(PlainSocketImpl.java:390)
java.net.ServerSocket.implAccept(ServerSocket.java:453)
java.net.ServerSocket.accept(ServerSocket.java:421)
sun.rmi.transport.tcp.TCPTransport$AcceptLoop.executeAcceptLoop(TCPTransport.java:369)
sun.rmi.transport.tcp.TCPTransport$AcceptLoop.run(TCPTransport.java:341)
java.lang.Thread.run(Thread.java:619)
ThreadCluster
java.lang.Object.wait(Native Method)
com.tangosol.coherence.component.util.Daemon.onWait(Daemon.CDB:18)
com.tangosol.coherence.component.util.daemon.queueProcessor.service.Grid.onWait(Grid.CDB:9)
com.tangosol.coherence.component.util.Daemon.run(Daemon.CDB:39)
java.lang.Thread.run(Thread.java:619)
Thread[TcpRingListener,6,Cluster]
java.net.PlainSocketImpl.socketAccept(Native Method)
java.net.PlainSocketImpl.accept(PlainSocketImpl.java:390)
java.net.ServerSocket.implAccept(ServerSocket.java:453)
java.net.ServerSocket.accept(ServerSocket.java:421)
com.tangosol.coherence.component.net.socket.TcpSocketAccepter.accept(TcpSocketAccepter.CDB:18)
com.tangosol.coherence.component.util.daemon.TcpRingListener.acceptConnection(TcpRingListener.CDB:10)
com.tangosol.coherence.component.util.daemon.TcpRingListener.onNotify(TcpRingListener.CDB:9)
com.tangosol.coherence.component.util.Daemon.run(Daemon.CDB:42)
java.lang.Thread.run(Thread.java:619)
Thread[PacketPublisher,6,Cluster]
java.lang.Object.wait(Native Method)
com.tangosol.coherence.component.util.Daemon.onWait(Daemon.CDB:18)
com.tangosol.coherence.component.util.daemon.queueProcessor.packetProcessor.PacketPublisher.onWait(PacketPublisher.CDB:2)
com.tangosol.coherence.component.util.Daemon.run(Daemon.CDB:39)
java.lang.Thread.run(Thread.java:619)
Thread[RMI TCP Accept-0,5,system]
java.net.PlainSocketImpl.socketAccept(Native Method)
java.net.PlainSocketImpl.accept(PlainSocketImpl.java:390)
java.net.ServerSocket.implAccept(ServerSocket.java:453)
java.net.ServerSocket.accept(ServerSocket.java:421)
sun.management.jmxremote.LocalRMIServerSocketFactory$1.accept(LocalRMIServerSocketFactory.java:34)
sun.rmi.transport.tcp.TCPTransport$AcceptLoop.executeAcceptLoop(TCPTransport.java:369)
sun.rmi.transport.tcp.TCPTransport$AcceptLoop.run(TCPTransport.java:341)
java.lang.Thread.run(Thread.java:619)
Thread[PacketListenerN,8,Cluster]
java.net.PlainDatagramSocketImpl.receive0(Native Method)
java.net.PlainDatagramSocketImpl.receive(PlainDatagramSocketImpl.java:136)
java.net.DatagramSocket.receive(DatagramSocket.java:712)
com.tangosol.coherence.component.net.socket.UdpSocket.receive(UdpSocket.CDB:20)
com.tangosol.coherence.component.net.UdpPacket.receive(UdpPacket.CDB:4)
com.tangosol.coherence.component.util.daemon.queueProcessor.packetProcessor.PacketListener.onNotify(PacketListener.CDB:19)
com.tangosol.coherence.component.util.Daemon.run(Daemon.CDB:42)
java.lang.Thread.run(Thread.java:619)
Thread[Invocation:Management,5,Cluster]
java.lang.Object.wait(Native Method)
com.tangosol.coherence.component.util.Daemon.onWait(Daemon.CDB:18)
com.tangosol.coherence.component.util.daemon.queueProcessor.service.Grid.onWait(Grid.CDB:9)
com.tangosol.coherence.component.util.Daemon.run(Daemon.CDB:39)
java.lang.Thread.run(Thread.java:619)
Thread[DistributedCache:PofDistributedCache,5,Cluster]
java.lang.Object.wait(Native Method)
com.tangosol.coherence.component.util.Daemon.onWait(Daemon.CDB:18)
com.tangosol.coherence.component.util.daemon.queueProcessor.service.Grid.onWait(Grid.CDB:9)
com.tangosol.coherence.component.util.Daemon.run(Daemon.CDB:39)
java.lang.Thread.run(Thread.java:619)
Thread[Invocation:Management:EventDispatcher,5,Cluster]
java.lang.Object.wait(Native Method)
com.tangosol.coherence.component.util.Daemon.onWait(Daemon.CDB:18)
com.tangosol.coherence.component.util.daemon.queueProcessor.Service$EventDispatcher.onWait(Service.CDB:7)
com.tangosol.coherence.component.util.Daemon.run(Daemon.CDB:39)
java.lang.Thread.run(Thread.java:619)
Thread[Termination Thread,5,Cluster]
java.lang.Thread.dumpThreads(Native Method)
java.lang.Thread.getAllStackTraces(Thread.java:1487)
sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
java.lang.reflect.Method.invoke(Method.java:597)
com.tangosol.net.GuardSupport.logStackTraces(GuardSupport.java:791)
com.tangosol.coherence.component.net.Cluster.onServiceFailed(Cluster.CDB:5)
com.tangosol.coherence.component.util.daemon.queueProcessor.service.Grid$Guard.terminate(Grid.CDB:17)
com.tangosol.net.GuardSupport$2.run(GuardSupport.java:652)
java.lang.Thread.run(Thread.java:619)
2011-01-22 01:19:05,785 Coherence Logger@9216774 3.5.2/463 ERROR 2011-01-22 01:19:05.785/9910756.951 Oracle Coherence EE 3.5.2/463 <Error> (thread=Termination Thread, member=33): Full Thread Dump
Thread[Reference Handler,10,system]
java.lang.Object.wait(Native Method)
java.lang.Object.wait(Object.java:485)
java.lang.ref.Reference$ReferenceHandler.run(Reference.java:116)
Thread[DistributedCache,5,Cluster]
java.nio.Bits.copyToByteArray(Native Method)
java.nio.DirectByteBuffer.get(DirectByteBuffer.java:224)
com.tangosol.io.nio.ByteBufferInputStream.read(ByteBufferInputStream.java:123)
java.io.DataInputStream.readFully(DataInputStream.java:178)
java.io.DataInputStream.readFully(DataInputStream.java:152)
com.tangosol.util.Binary.readExternal(Binary.java:1066)
com.tangosol.util.Binary.<init>(Binary.java:183)
com.tangosol.io.nio.BinaryMap$Block.readValue(BinaryMap.java:4304)
com.tangosol.io.nio.BinaryMap$Block.getValue(BinaryMap.java:4130)
com.tangosol.io.nio.BinaryMap.get(BinaryMap.java:377)
com.tangosol.io.nio.BinaryMapStore.load(BinaryMapStore.java:64)
com.tangosol.net.cache.SerializationPagedCache$WrapperBinaryStore.load(SerializationPagedCache.java:1547)
com.tangosol.net.cache.SerializationPagedCache$PagedBinaryStore.load(SerializationPagedCache.java:1097)
com.tangosol.net.cache.SerializationMap.get(SerializationMap.java:121)
com.tangosol.net.cache.SerializationPagedCache.get(SerializationPagedCache.java:247)
com.tangosol.net.cache.AbstractSerializationCache$1.getOldValue(AbstractSerializationCache.java:315)
com.tangosol.net.cache.OverflowMap$Status.registerBackEvent(OverflowMap.java:4210)
com.tangosol.net.cache.OverflowMap.onBackEvent(OverflowMap.java:2316)
com.tangosol.net.cache.OverflowMap$BackMapListener.onMapEvent(OverflowMap.java:4544)
com.tangosol.util.MultiplexingMapListener.entryDeleted(MultiplexingMapListener.java:49)
com.tangosol.util.MapEvent.dispatch(MapEvent.java:214)
com.tangosol.util.MapEvent.dispatch(MapEvent.java:166)
com.tangosol.util.MapListenerSupport.fireEvent(MapListenerSupport.java:556)
com.tangosol.net.cache.AbstractSerializationCache.dispatchEvent(AbstractSerializationCache.java:338)
com.tangosol.net.cache.AbstractSerializationCache.dispatchPendingEvent(AbstractSerializationCache.java:321)
com.tangosol.net.cache.AbstractSerializationCache.removeBlind(AbstractSerializationCache.java:155)
com.tangosol.net.cache.SerializationPagedCache.removeBlind(SerializationPagedCache.java:348)
com.tangosol.util.AbstractKeyBasedMap$KeySet.remove(AbstractKeyBasedMap.java:556)
com.tangosol.net.cache.OverflowMap.removeInternal(OverflowMap.java:1299)
com.tangosol.net.cache.OverflowMap.remove(OverflowMap.java:380)
com.tangosol.coherence.component.util.daemon.queueProcessor.service.grid.DistributedCache$Storage.clear(DistributedCache.CDB:24)
com.tangosol.coherence.component.util.daemon.queueProcessor.service.grid.DistributedCache.onClearRequest(DistributedCache.CDB:32)
com.tangosol.coherence.component.util.daemon.queueProcessor.service.grid.DistributedCache$ClearRequest.run(DistributedCache.CDB:1)
com.tangosol.coherence.component.net.message.requestMessage.DistributedCacheRequest.onReceived(DistributedCacheRequest.CDB:12)
com.tangosol.coherence.component.util.daemon.queueProcessor.service.Grid.onMessage(Grid.CDB:9)
com.tangosol.coherence.component.util.daemon.queueProcessor.service.Grid.onNotify(Grid.CDB:136)
com.tangosol.coherence.component.util.daemon.queueProcessor.service.grid.DistributedCache.onNotify(DistributedCache.CDB:3)
com.tangosol.coherence.component.util.Daemon.run(Daemon.CDB:42)
java.lang.Thread.run(Thread.java:619)
Thread[Finalizer,8,system]
java.lang.Object.wait(Native Method)
java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:118)
java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:134)
java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:159)
Thread[PacketReceiver,7,Cluster]
java.lang.Object.wait(Native Method)
com.tangosol.coherence.component.util.Daemon.onWait(Daemon.CDB:18)
com.tangosol.coherence.component.util.daemon.queueProcessor.packetProcessor.PacketReceiver.onWait(PacketReceiver.CDB:2)
com.tangosol.coherence.component.util.Daemon.run(Daemon.CDB:39)
java.lang.Thread.run(Thread.java:619)
Thread[RMI TCP Accept-0,5,system]
java.net.PlainSocketImpl.socketAccept(Native Method)
java.net.PlainSocketImpl.accept(PlainSocketImpl.java:390)
java.net.ServerSocket.implAccept(ServerSocket.java:453)
java.net.ServerSocket.accept(ServerSocket.java:421)
sun.rmi.transport.tcp.TCPTransport$AcceptLoop.executeAcceptLoop(TCPTransport.java:369)
sun.rmi.transport.tcp.TCPTransport$AcceptLoop.run(TCPTransport.java:341)
java.lang.Thread.run(Thread.java:619)
Thread[PacketSpeaker,8,Cluster]
java.lang.Object.wait(Native Method)
com.tangosol.coherence.component.util.queue.ConcurrentQueue.waitForEntry(ConcurrentQueue.CDB:16)
com.tangosol.coherence.component.util.queue.ConcurrentQueue.remove(ConcurrentQueue.CDB:7)
com.tangosol.coherence.component.util.Queue.remove(Queue.CDB:1)
com.tangosol.coherence.component.util.daemon.queueProcessor.packetProcessor.PacketSpeaker.onNotify(PacketSpeaker.CDB:62)
com.tangosol.coherence.component.util.Daemon.run(Daemon.CDB:42)
java.lang.Thread.run(Thread.java:619)
Thread[Logger@9216774 3.5.2/463,3,main]
java.lang.Object.wait(Native Method)
com.tangosol.coherence.component.util.Daemon.onWait(Daemon.CDB:18)
com.tangosol.coherence.component.util.Daemon.run(Daemon.CDB:39)
java.lang.Thread.run(Thread.java:619)
Thread[PacketListener1,8,Cluster]
java.net.PlainDatagramSocketImpl.receive0(Native Method)
java.net.PlainDatagramSocketImpl.receive(PlainDatagramSocketImpl.java:136)
java.net.DatagramSocket.receive(DatagramSocket.java:712)
com.tangosol.coherence.component.net.socket.UdpSocket.receive(UdpSocket.CDB:20)
com.tangosol.coherence.component.net.UdpPacket.receive(UdpPacket.CDB:4)
com.tangosol.coherence.component.util.daemon.queueProcessor.packetProcessor.PacketListener.onNotify(PacketListener.CDB:19)
com.tangosol.coherence.component.util.Daemon.run(Daemon.CDB:42)
java.lang.Thread.run(Thread.java:619)
Thread[main,5,main]
java.lang.Object.wait(Native Method)
com.tangosol.net.DefaultCacheServer.main(DefaultCacheServer.java:79)
com.networkfleet.cacheserver.Launcher.main(Launcher.java:122)
Thread[Signal Dispatcher,9,system]
Thread[RMI TCP Accept-41006,5,system]
java.net.PlainSocketImpl.socketAccept(Native Method)
java.net.PlainSocketImpl.accept(PlainSocketImpl.java:390)
java.net.ServerSocket.implAccept(ServerSocket.java:453)
java.net.ServerSocket.accept(ServerSocket.java:421)
sun.rmi.transport.tcp.TCPTransport$AcceptLoop.executeAcceptLoop(TCPTransport.java:369)
sun.rmi.transport.tcp.TCPTransport$AcceptLoop.run(TCPTransport.java:341)
java.lang.Thread.run(Thread.java:619)
ThreadCluster
java.lang.Object.wait(Native Method)
com.tangosol.coherence.component.util.Daemon.onWait(Daemon.CDB:18)
com.tangosol.coherence.component.util.daemon.queueProcessor.service.Grid.onWait(Grid.CDB:9)
com.tangosol.coherence.component.util.Daemon.run(Daemon.CDB:39)
java.lang.Thread.run(Thread.java:619)
Thread[TcpRingListener,6,Cluster]
java.net.PlainSocketImpl.socketAccept(Native Method)
java.net.PlainSocketImpl.accept(PlainSocketImpl.java:390)
java.net.ServerSocket.implAccept(ServerSocket.java:453)
java.net.ServerSocket.accept(ServerSocket.java:421)
com.tangosol.coherence.component.net.socket.TcpSocketAccepter.accept(TcpSocketAccepter.CDB:18)
com.tangosol.coherence.component.util.daemon.TcpRingListener.acceptConnection(TcpRingListener.CDB:10)
com.tangosol.coherence.component.util.daemon.TcpRingListener.onNotify(TcpRingListener.CDB:9)
com.tangosol.coherence.component.util.Daemon.run(Daemon.CDB:42)
java.lang.Thread.run(Thread.java:619)
Thread[PacketPublisher,6,Cluster]
java.lang.Object.wait(Native Method)
com.tangosol.coherence.component.util.Daemon.onWait(Daemon.CDB:18)
com.tangosol.coherence.component.util.daemon.queueProcessor.packetProcessor.PacketPublisher.onWait(PacketPublisher.CDB:2)
com.tangosol.coherence.component.util.Daemon.run(Daemon.CDB:39)
java.lang.Thread.run(Thread.java:619)
Thread[RMI TCP Accept-0,5,system]
java.net.PlainSocketImpl.socketAccept(Native Method)
java.net.PlainSocketImpl.accept(PlainSocketImpl.java:390)
java.net.ServerSocket.implAccept(ServerSocket.java:453)
java.net.ServerSocket.accept(ServerSocket.java:421)
sun.management.jmxremote.LocalRMIServerSocketFactory$1.accept(LocalRMIServerSocketFactory.java:34)
sun.rmi.transport.tcp.TCPTransport$AcceptLoop.executeAcceptLoop(TCPTransport.java:369)
sun.rmi.transport.tcp.TCPTransport$AcceptLoop.run(TCPTransport.java:341)
java.lang.Thread.run(Thread.java:619)
Thread[PacketListenerN,8,Cluster]
java.net.PlainDatagramSocketImpl.receive0(Native Method)
java.net.PlainDatagramSocketImpl.receive(PlainDatagramSocketImpl.java:136)
java.net.DatagramSocket.receive(DatagramSocket.java:712)
com.tangosol.coherence.component.net.socket.UdpSocket.receive(UdpSocket.CDB:20)
com.tangosol.coherence.component.net.UdpPacket.receive(UdpPacket.CDB:4)
com.tangosol.coherence.component.util.daemon.queueProcessor.packetProcessor.PacketListener.onNotify(PacketListener.CDB:19)
com.tangosol.coherence.component.util.Daemon.run(Daemon.CDB:42)
java.lang.Thread.run(Thread.java:619)
Thread[Invocation:Management,5,Cluster]
java.lang.Object.wait(Native Method)
com.tangosol.coherence.component.util.Daemon.onWait(Daemon.CDB:18)
com.tangosol.coherence.component.util.daemon.queueProcessor.service.Grid.onWait(Grid.CDB:9)
com.tangosol.coherence.component.util.Daemon.run(Daemon.CDB:39)
java.lang.Thread.run(Thread.java:619)
Thread[DistributedCache:PofDistributedCache,5,Cluster]
java.lang.Object.wait(Native Method)
com.tangosol.coherence.component.util.Daemon.onWait(Daemon.CDB:18)
com.tangosol.coherence.component.util.daemon.queueProcessor.service.Grid.onWait(Grid.CDB:9)
com.tangosol.coherence.component.util.Daemon.run(Daemon.CDB:39)
java.lang.Thread.run(Thread.java:619)
Thread[Invocation:Management:EventDispatcher,5,Cluster]
java.lang.Object.wait(Native Method)
com.tangosol.coherence.component.util.Daemon.onWait(Daemon.CDB:18)
com.tangosol.coherence.component.util.daemon.queueProcessor.Service$EventDispatcher.onWait(Service.CDB:7)
com.tangosol.coherence.component.util.Daemon.run(Daemon.CDB:39)
java.lang.Thread.run(Thread.java:619)
Thread[Termination Thread,5,Cluster]
java.lang.Thread.dumpThreads(Native Method)
java.lang.Thread.getAllStackTraces(Thread.java:1487)
sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
java.lang.reflect.Method.invoke(Method.java:597)
com.tangosol.net.GuardSupport.logStackTraces(GuardSupport.java:791)
com.tangosol.coherence.component.net.Cluster.onServiceFailed(Cluster.CDB:5)
com.tangosol.coherence.component.util.daemon.queueProcessor.service.Grid$Guard.terminate(Grid.CDB:17)
com.tangosol.net.GuardSupport$2.run(GuardSupport.java:652)
java.lang.Thread.run(Thread.java:619)
2011-01-22 01:19:06,738 Coherence Logger@9216774 3.5.2/463 INFO 2011-01-22 01:19:06.738/9910757.904 Oracle Coherence EE 3.5.2/463 <Info> (thread=main, member=33): Restarting Service: DistributedCache
2011-01-22 01:19:06,738 Coherence Logger@9216774 3.5.2/463 INFO 2011-01-22 01:19:06.738/9910757.904 Oracle Coherence EE 3.5.2/463 <Info> (thread=main, member=33): Restarting Service: DistributedCache
2011-01-22 01:19:06,738 Coherence Logger@9216774 3.5.2/463 ERROR 2011-01-22 01:19:06.738/9910757.904 Oracle Coherence EE 3.5.2/463 <Error> (thread=main, member=33): Failed to restart services: java.lang.IllegalStateException: Failed to unregister: DistributedCache{Name=DistributedCache, State=(SERVICE_STARTED), LocalStorage=enabled, PartitionCount=257, BackupCount=1, AssignedPartitions=16, BackupPartitions=16}

Hi
It seems like the problem in this case is the call to clear(), which will try to load all entries stored in the overflow scheme in order to emit potential cache events to listeners. This probably requires much more memory than the available Java heap, hence the OOM.
Our recommendation in this case is to call destroy(), since this will bypass the event firing.
/Charlie
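To illustrate the mechanism Charlie describes, here is a toy model in plain Java. It is not the Coherence API: the hypothetical `ToyOverflowCache` keeps its values serialized as byte arrays, roughly like an overflow/paged back map. Its `clear()` removes entries one by one and deserializes each old value so listeners can receive it (compare the `getOldValue()` / `load()` / `Binary.readExternal()` frames in the stack trace above), while its `destroy()` drops the storage without loading anything. All class and method names here are assumptions made for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiConsumer;

/** Hypothetical toy model (not Coherence): values are stored serialized, like an overflow back map. */
class ToyOverflowCache {
    private final Map<String, byte[]> store = new HashMap<>();
    private final List<BiConsumer<String, String>> deleteListeners = new ArrayList<>();
    private int deserializations = 0; // how many stored values had to be materialized on the heap

    public void put(String key, String value) {
        store.put(key, value.getBytes(StandardCharsets.UTF_8));
    }

    public void addDeleteListener(BiConsumer<String, String> listener) {
        deleteListeners.add(listener);
    }

    /** Deserializes one stored value; each call allocates a new object on the heap. */
    private String load(byte[] bytes) {
        deserializations++;
        return new String(bytes, StandardCharsets.UTF_8);
    }

    /** Entry-by-entry removal: when listeners exist, every old value is loaded to build the event. */
    public void clear() {
        for (Map.Entry<String, byte[]> e : store.entrySet()) {
            if (!deleteListeners.isEmpty()) {
                String oldValue = load(e.getValue());
                for (BiConsumer<String, String> l : deleteListeners) {
                    l.accept(e.getKey(), oldValue);
                }
            }
        }
        store.clear();
    }

    /** Discards the whole store at once, without loading values or firing events. */
    public void destroy() {
        store.clear();
    }

    public int size() { return store.size(); }

    public int deserializations() { return deserializations; }

    public static void main(String[] args) {
        ToyOverflowCache viaClear = new ToyOverflowCache();
        ToyOverflowCache viaDestroy = new ToyOverflowCache();
        for (int i = 0; i < 1000; i++) {
            viaClear.put("k" + i, "value-" + i);
            viaDestroy.put("k" + i, "value-" + i);
        }
        viaClear.addDeleteListener((k, v) -> { });
        viaDestroy.addDeleteListener((k, v) -> { });
        viaClear.clear();
        viaDestroy.destroy();
        System.out.println("clear():   " + viaClear.deserializations() + " values loaded");
        System.out.println("destroy(): " + viaDestroy.deserializations() + " values loaded");
    }
}
```

In this model `clear()` loads all 1000 values while `destroy()` loads none; with a large overflow store, that per-entry loading is what can exhaust the heap.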

Creating a Two Node Cluster

Good afternoon,
I'm looking to build an inexpensive two-node cluster. I have a SLES11SP1 server running XEN as a virtual hosting server, and I run about five servers in that virtual environment. I have three USB drives set up to host my guest servers. What I would like to do is purchase another USB drive, 2TB in size, to use as an iSCSI/SAN storage location, and then build my new servers as cluster-enabled so that they can all "see" that SAN/iSCSI storage location.
Does anyone have any further suggestions?
Thanks
-DS

Originally Posted by gleach1
While I wouldn't recommend running servers off USB drives (unless this is for testing), I don't see any issue in doing it, but the performance may not be fantastic.
I really appreciate you responding and helping with this testing. This cluster is for testing only: I have a small server farm in my basement, and I have a customer who is using clustering, so I would like to at least have something to test with.
I've actually run a test cluster off a USB drive using VMware Workstation before, and it seemed to run fine as a test system.
If you set up your XEN host as an iSCSI server, use the USB disk as the storage you present to the guests, and set the guests up with iSCSI initiators, it should work like any other iSCSI SAN, obviously a touch slower...
Is there some documentation that explains how to do the iSCSI server setup you described? I have "Configuring Novell Cluster Services in a XEN Virtualization Environment", but I don't see anything about the iSCSI initiator setup. I was going to add the USB drive as a /storage volume on the XEN host and then point the cluster to that. I will also be adding a third network card to handle the cluster network.

Don't know where to start with replicating ZFS in a two node cluster

I've got two systems in a lab that I'm trying to use to build a ZFS-based SAN for a VMware cluster. I have experience with Linux and Gluster, where Active/Active high availability was easy, but for the benefits of ZFS I'm OK with Active/Passive. The issue I can't get around is finding the solution in Solaris 11.2 that will mirror the data from the primary node to the secondary node and allow the secondary node to serve and receive data when the primary is down. This is a lab, so I'm not prepared or able to purchase a solution like EMC until I have some proof of concept. Any ideas will be greatly appreciated.

Hi user8777368,
To me this sounds like you want to fail over the ZFS file system from the primary to the secondary when the primary goes down. In that case you can simply use the SUNW.HAStoragePlus resource with the "Zpools" property to fail over the zpool from one node to the other. The disks holding the ZFS pool should be on the SAN and accessible from all nodes in the Solaris Cluster; there is no need to replicate data in such a scenario. An example is available in:
Solaris Cluster How to use ZFS and Zpools with HAStoragePlus and to Get it Mounted Correctly (Doc ID 1019912.1)
Does this help?
Juergen

Is a one-server, two-system-ID configuration valid when using MSCS?

As the last step in my upgrade from BW30b to BI70 I need to introduce Java.
My newly upgraded ABAP BI70 system is running on a Microsoft Cluster, and it was my intention to add Java as a separate system (SID). When doing this, it appeared I would be able to add the two systems to the same cluster group by selecting "Support of multiple SAP systems in one MSCS cluster", but this resulted in errors, and I have found note 967123, which tells me not to choose this option when using MSCS.
It appears my options are a separate server all together or an ABAP+Java install which limits our upgrade options on each stack.
If anyone has found a work around for this I would be very interested.
Thank you

Hi Helmut,
Not sure how the D-Link works, but from the specs it looks like it also has Wireless 802, so the Ethernet and wireless interfaces would each have an IP and a different MAC address.
I always thought one MAC address can have only one IP address.
Nope, you can prove this to yourself on your Mac: in Network > Show: > Network Port Configurations, highlight, say, Ethernet, click Copy, and give the copy another IP manually if you wish...

% DB increase differs for two systems from the same landscape

Hi all ,
I have done an upgrade and then a Unicode conversion for two systems, sandbox and development, from 4.6C to ECC 6.0 with Oracle 10g and AIX 5.3.
On sandbox, the % DB increase (used DB size in DB02) is 60 to 80, which is in line with SAP's figures after a Unicode conversion; however, on dev it is almost 150%.
Has anybody faced the same problem before, or can you suggest something for this tremendous data growth?
I have also observed that there are 7 tablespaces on sandbox and 8 on dev, with "PSAPSR370" as the additional one on dev (PSAPSR3700 is different).
Please help me with this.
-Ganesh

Hi Ganesh,
the database growth depends heavily on the content. Which languages do you have installed in your system and which modules are used by your company?
I think normal growth should be in the range of zero to 30 percent. The lower end comes from the reorganization effect of the migration, and the upper limit of about 30 percent reflects the fact that most data is normally numeric, and numbers take the same space with or without Unicode. In rare situations, though, much higher values are possible. The reason is that Oracle, and also DB2, use UTF-8 encoding if you run your system as a Unicode system. In UTF-8, a character that occupies two bytes in UCS-2/UTF-16 can be encoded in up to three bytes. That means if your system contains lots of double-byte characters it could grow by much more than 30%, but that seems unusual to me.
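A small sketch of the encoding effect described above, using only the JDK's `String.getBytes(StandardCharsets.UTF_8)`. The sample strings are made up for illustration; real growth depends on what is actually stored in your tables.

```java
import java.nio.charset.StandardCharsets;

class Utf8Growth {
    /** Number of bytes the string occupies when encoded as UTF-8. */
    static int utf8Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        // Hypothetical sample values: numeric, ASCII, Latin with umlaut, Japanese.
        String[] samples = {
            "0000123456",            // digits: 1 byte each
            "Invoice",               // ASCII letters: 1 byte each
            "M\u00fcller",           // u-umlaut takes 2 bytes, the other letters 1 byte each
            "\u65e5\u672c\u8a9e"     // three CJK characters: 3 bytes each
        };
        for (String s : samples) {
            int chars = s.codePointCount(0, s.length());
            System.out.printf("%d chars -> %d UTF-8 bytes%n", chars, utf8Bytes(s));
        }
    }
}
```

It prints 10 -> 10, 7 -> 7, 6 -> 7 and 3 -> 9: numeric and plain ASCII data does not grow at all, while text that used two bytes per character in a legacy double-byte code page needs three bytes per character in UTF-8, i.e. 50% more.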
Are you sure that nothing was wrong with your migration?
Best regards
Ralph Ganszky

Simple two node Cluster Install - Hung after reboot of first node

Hello,
Over the past couple of days I have tried to install a simple two node cluster using two identical SunFire X4200s, firstly following the recipe in: http://www.sun.com/software/solaris/howtoguides/twonodecluster.jsp
and when that failed referring to http://docs.sun.com/app/docs/doc/819-0912 and http://docs.sun.com/app/docs/doc/819-2970.
I am trying to keep the install process as simple as possible, no switch, just back to back connections for the internal networking (node1 e1000g0 <--> node2 e1000g0, node1 e1000g1 <--> node2 e1000g1)
I ran the installer on both X4200s with default answers. This went through smoothly without problems.
I ran scinstall on node1, the first time through choosing "typical" as suggested in the how-to guide. Everything goes OK (no errors): node2 reboots, but node1 just sits there waiting for node2. No errors, nothing....
I also tried rerunning scinstall choosing "Custom", and then selecting the no switch option. Same thing happened.
I must be doing something stupid, it's such a simple setup! Any ideas??
Here's the final screen from node1 (dcmds0) in both cases:
Cluster Creation
Log file - /var/cluster/logs/install/scinstall.log.940
Checking installation status ... done
The Sun Cluster software is installed on "dcmds0".
The Sun Cluster software is installed on "dcmds1".
Started sccheck on "dcmds0".
Started sccheck on "dcmds1".
sccheck completed with no errors or warnings for "dcmds0".
sccheck completed with no errors or warnings for "dcmds1".
Configuring "dcmds1" ... done
Rebooting "dcmds1" ...
Output from scconf on node2 (dcmds1):
bash-3.00# scconf -p
Cluster name: dcmdscluster
Cluster ID: 0x47538959
Cluster install mode: enabled
Cluster private net: 172.16.0.0
Cluster private netmask: 255.255.248.0
Cluster maximum nodes: 64
Cluster maximum private networks: 10
Cluster new node authentication: unix
Cluster authorized-node list: dcmds0 dcmds1
Cluster transport heart beat timeout: 10000
Cluster transport heart beat quantum: 1000
Round Robin Load Balancing UDP session timeout: 480
Cluster nodes: dcmds1
Cluster node name: dcmds1
Node ID: 1
Node enabled: yes
Node private hostname: clusternode1-priv
Node quorum vote count: 1
Node reservation key: 0x4753895900000001
Node zones: <NULL>
CPU shares for global zone: 1
Minimum CPU requested for global zone: 1
Node transport adapters: e1000g0 e1000g1
Node transport adapter: e1000g0
Adapter enabled: no
Adapter transport type: dlpi
Adapter property: device_name=e1000g
Adapter property: device_instance=0
Adapter property: lazy_free=1
Adapter property: dlpi_heartbeat_timeout=10000
Adapter property: dlpi_heartbeat_quantum=1000
Adapter property: nw_bandwidth=80
Adapter property: bandwidth=70
Adapter port names: <NULL>
Node transport adapter: e1000g1
Adapter enabled: no
Adapter transport type: dlpi
Adapter property: device_name=e1000g
Adapter property: device_instance=1
Adapter property: lazy_free=1
Adapter property: dlpi_heartbeat_timeout=10000
Adapter property: dlpi_heartbeat_quantum=1000
Adapter property: nw_bandwidth=80
Adapter property: bandwidth=70
Adapter port names: <NULL>
Cluster transport switches: <NULL>
Cluster transport cables
Endpoint Endpoint State
Quorum devices: <NULL>
Rob.

I have found out why the install hung. This needs to be added to the install guide(s) at once!! It's VERY frustrating when an install guide is incomplete!
The solution is posted in the HA-Cluster OpenSolaris forums at:
http://opensolaris.org/os/community/ha-clusters/ohac/Documentation/SCXdocs/relnotes/#bugs
In particular, my problem was that I chose to make my Solaris install secure (a good idea, I thought!). Unfortunately, this stops Sun Cluster from working. To fix the problem, you need to perform the following steps on each secured node:
Problem Summary: During Solaris installation, setting a restricted network profile disables external access to network services that Sun Cluster functionality uses, i.e., the RPC communication service, which is required for cluster communication.
Workaround: Restore external access to RPC communication.
Perform the following commands to restore external access to RPC communication.
# svccfg
svc:> select network/rpc/bind
svc:/network/rpc/bind> setprop config/local_only=false
svc:/network/rpc/bind> quit
# svcadm refresh network/rpc/bind:default
# svcprop network/rpc/bind:default | grep local_only
Once I applied these commands, the install process continued ... AT LAST!!!
Rob.

We are contemplating a Mac for a family Christmas gift; we currently use a Windows-based laptop. Do the two systems interface? If a child starts his homework on the PC, can he finish it on the Mac? Can they both be hooked up to the same printer?

We are Mac beginners, considering a Mac for a family Christmas gift. We currently share one Windows-based PC. Does anyone use a Mac AND a PC in their household? Would the two systems interface? (Can you start your homework on one system but finish it on the other?) Can they both be hooked up to the same printer? What advice would you give us as we consider this big purchase?

You may find this useful:
Switching from Windows to Mac:
http://support.apple.com/kb/HT2514?viewlocale=en_US
and
http://support.apple.com/kb/HT2518?viewlocale=en_US
and possibly even the 'propaganda bits':
Macs are cheaper to own than PCs: http://techpatio.com/2010/apple/mac/it-admins-total-cost-ownership-mac-less-pc
and: http://www.zdnet.com/blog/apple/tco-new-research-finds-macs-in-the-enterprise-easier-cheaper-to-manage-than-windows-pcs/6294
Why will you love Mac? http://www.apple.com/why-mac/
- Better Hardware http://www.apple.com/why-mac/better-hardware/
- Better software http://www.apple.com/why-mac/better-software/
- Better OS http://www.apple.com/why-mac/better-os/
- Better Support http://www.apple.com/why-mac/better-support/
- It's Compatible http://www.apple.com/why-mac/its-compatible/

Creation of trusted RFCs between two systems

Hi,
Can anyone help me create a trusted RFC between two systems, i.e. Solution Manager and production, in order to use the Service Desk facility?

Dear Mohan,
You can create the RFCs in the Solution Manager system using transaction SMSY. There you will find all your installed systems. Select the system on the left-hand side under Landscape Components and click on Application Server ABAP. On the right-hand side you will find a Client tab; on this screen, select the client for which you want to create the trusted RFC. Switch to change mode, click the Generate RFC with Assistance button, and follow the wizard to create the RFCs...

How to check whether a transport path exists between two systems in SLD?

Hi,
I have two systems, 'A' and 'B', and created business systems for both of them. Then I created a transport path between the two systems. How can I check in SLD whether what I have done is right?

WRT CMS:
I am not sure about this, but you can try:
1. Start CMS: http://<host>:<J2EE Engine http port>/webdynpro/dispatcher/sap.com/tcSLCMS~WebUI/Cms.
2. Go to the Landscape Configurator and check there.
Message was edited by:
Prabhu S
