"cluster.ClusteredHandlerFactory: Failed to initialize Coherence Cluster"

Hi,
I have a cluster with the Admin server, Proxy and MS1 on one server, and MS2 and MS3 on another server.
Below is the content of my tangosol-coherence.xml file, modified for my environment. (The file was taken from Middleware1036/coherence_3.7/lib/coherence.jar, and I kept only the content required for the unicast configuration.)
<cluster-config>
<member-identity>

<cluster-name system-property="tangosol.coherence.cluster">ThirdCluster
</cluster-name>
</member-identity>
<unicast-listener>

<well-known-addresses>
<socket-address id="1">
<address>host1.example.com</address>
<port>31171</port>
</socket-address>
<socket-address id="2">
<address>host2.example.com</address>
<port>31172</port>
</socket-address>
<socket-address id="3">
<address>host2.example.com</address>
<port>31173</port>
</socket-address>
</well-known-addresses>
</unicast-listener>
</cluster-config>
After configuring the cluster domain with Coherence, I started all the managed servers along with the Admin server and proxy, and installed my application (Oracle Communications Order and Service Management) into the cluster.
As recommended, I need to restart all the servers to see my application osm.ear in the active state.
The Admin server and proxy restarted, but the managed servers fail during restart with the error message below.
Please help me understand why it says "Failed to initialize Coherence cluster".
####<Aug 6, 2012 6:20:04 AM PDT> <Error> <oms> <blr2230328> <ms1> <[STANDBY] ExecuteThread: '3' for queue: 'weblogic.kernel.Default (self-tuning)'> <oms-internal> <> <0000JZw4SMKB1FHpMs8Dye1G7wBy000001> <1344259204521> <BEA-000000> <cluster.ClusteredHandlerFactory: Failed to initialize Coherence cluster
com.tangosol.net.RequestTimeoutException: Timeout during service start: ServiceInfo(Id=0, Name=Cluster, Type=Cluster
MemberSet=MasterMemberSet(
ThisMember=null
OldestMember=null
ActualMemberSet=MemberSet(Size=0
MemberId|ServiceVersion|ServiceJoined|MemberState
RecycleMillis=1200000
RecycleSet=MemberSet(Size=0
at com.tangosol.coherence.component.util.daemon.queueProcessor.service.Grid.onStartupTimeout(Grid.CDB:3)
at com.tangosol.coherence.component.util.daemon.queueProcessor.Service.start(Service.CDB:28)
at com.tangosol.coherence.component.util.daemon.queueProcessor.service.Grid.start(Grid.CDB:6)
at com.tangosol.coherence.component.net.Cluster.onStart(Cluster.CDB:56)
at com.tangosol.coherence.component.net.Cluster.start(Cluster.CDB:11)
at com.tangosol.coherence.component.util.SafeCluster.startCluster(SafeCluster.CDB:3)
at com.tangosol.coherence.component.util.SafeCluster.restartCluster(SafeCluster.CDB:10)
at com.tangosol.coherence.component.util.SafeCluster.ensureRunningCluster(SafeCluster.CDB:26)
at com.tangosol.coherence.component.util.SafeCluster.start(SafeCluster.CDB:2)
at com.tangosol.net.CacheFactory.ensureCluster(CacheFactory.java:427)
at com.mslv.oms.handler.cluster.g.refresh(Unknown Source)
at oracle.communications.ordermanagement.listener.impl.a.a(Unknown Source)
at com.mslv.oms.handler.cluster.ClusteredHandlerFactory.<clinit>(Unknown Source)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
at java.lang.Class.newInstance0(Class.java:355)
at java.lang.Class.newInstance(Class.java:308)
at com.mslv.oms.security.HandlerFactory.b(Unknown Source)
at com.mslv.oms.security.HandlerFactory.startup(Unknown Source)
at com.mslv.oms.j2ee.LifecycleListener.postStart(Unknown Source)
at weblogic.application.internal.flow.BaseLifecycleFlow$PostStartAction.run(BaseLifecycleFlow.java:297)
at weblogic.security.acl.internal.AuthenticatedSubject.doAs(AuthenticatedSubject.java:321)
at weblogic.security.service.SecurityManager.runAs(SecurityManager.java:120)
at weblogic.application.internal.flow.BaseLifecycleFlow$LifecycleListenerAction.invoke(BaseLifecycleFlow.java:199)
at weblogic.application.internal.flow.BaseLifecycleFlow.postStart(BaseLifecycleFlow.java:71)
at weblogic.application.internal.flow.TailLifecycleFlow.activate(TailLifecycleFlow.java:33)
at weblogic.application.internal.BaseDeployment$2.next(BaseDeployment.java:671)
at weblogic.application.utils.StateMachineDriver.nextState(StateMachineDriver.java:52)
at weblogic.application.internal.BaseDeployment.activate(BaseDeployment.java:212)
at weblogic.application.internal.EarDeployment.activate(EarDeployment.java:59)
at weblogic.application.internal.DeploymentStateChecker.activate(DeploymentStateChecker.java:161)
at weblogic.deploy.internal.targetserver.AppContainerInvoker.activate(AppContainerInvoker.java:79)
at weblogic.deploy.internal.targetserver.BasicDeployment.activate(BasicDeployment.java:184)
at weblogic.deploy.internal.targetserver.BasicDeployment.activateFromServerLifecycle(BasicDeployment.java:361)
at weblogic.management.deploy.internal.DeploymentAdapter$1.doActivate(DeploymentAdapter.java:51)
at weblogic.management.deploy.internal.DeploymentAdapter.activate(DeploymentAdapter.java:200)
at weblogic.management.deploy.internal.AppTransition$2.transitionApp(AppTransition.java:30)
at weblogic.management.deploy.internal.ConfiguredDeployments.transitionApps(ConfiguredDeployments.java:240)
at weblogic.management.deploy.internal.ConfiguredDeployments.activate(ConfiguredDeployments.java:169)
at weblogic.management.deploy.internal.ConfiguredDeployments.deploy(ConfiguredDeployments.java:123)
at weblogic.management.deploy.internal.DeploymentServerService.resume(DeploymentServerService.java:180)
at weblogic.management.deploy.internal.DeploymentServerService.start(DeploymentServerService.java:96)
at weblogic.t3.srvr.SubsystemRequest.run(SubsystemRequest.java:64)
at weblogic.work.SelfTuningWorkManagerImpl$WorkAdapterImpl.run(SelfTuningWorkManagerImpl.java:545)
at weblogic.work.ExecuteThread.execute(ExecuteThread.java:256)
at weblogic.work.ExecuteThread.run(ExecuteThread.java:221)

It looks like the OSM application is unable to join the cluster. Modify the WKA definition as follows:
<well-known-addresses>
<socket-address id="1">
<address system-property="tangosol.coherence.wka1"></address>
<port system-property="tangosol.coherence.wka1.port"></port>
</socket-address>
</well-known-addresses>
Now, in the ServerStart arguments of the managed servers running the OSM application, add the following: -Dtangosol.coherence.wka1=<> -Dtangosol.coherence.wka1.port=<>
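To confirm that the two -D values actually reach the JVM and are usable, a small standalone check can help. The following is only a sketch: the class WkaPropertyCheck and its problem() method are hypothetical names made up for illustration and are not part of Coherence or WebLogic. It uses only the JDK, and the only names taken from the configuration above are the two property names.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

/** Hypothetical helper (not a Coherence class): sanity-checks the two WKA system properties. */
class WkaPropertyCheck {
    static final String HOST_PROP = "tangosol.coherence.wka1";
    static final String PORT_PROP = "tangosol.coherence.wka1.port";

    /** Returns null when the values look usable, otherwise a short problem description. */
    static String problem(String host, String port) {
        if (host == null || host.trim().isEmpty()) {
            return HOST_PROP + " is not set";
        }
        if (port == null || port.trim().isEmpty()) {
            return PORT_PROP + " is not set";
        }
        int p;
        try {
            p = Integer.parseInt(port.trim());
        } catch (NumberFormatException e) {
            return PORT_PROP + " is not a number: " + port;
        }
        if (p < 1 || p > 65535) {
            return PORT_PROP + " is out of range: " + p;
        }
        try {
            // Resolves host names; literal IP addresses are parsed without a lookup.
            InetAddress.getByName(host.trim());
        } catch (UnknownHostException e) {
            return HOST_PROP + " does not resolve: " + host;
        }
        return null;
    }

    public static void main(String[] args) {
        String result = problem(System.getProperty(HOST_PROP), System.getProperty(PORT_PROP));
        System.out.println(result == null ? "WKA properties look usable" : result);
    }
}
```

Run it with the same -D arguments you put in ServerStart. A clean result only shows that the values are set, numeric and resolvable; it does not prove that another cluster member is listening on that address and port.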
Hope this helps!
Cheers,
NJ

Similar Messages

Cluster disk failed - 2 node multisite cluster

I am setting up a 2-node, 2-site cluster with no shared disks. Each node has a C, D and E drive.
A file share witness is set up at a third site as a share.
When I go to Failover Cluster Administrator, I see that Cluster Disk 1 and 2 are failed. Any idea why?

Hi,
From your description ("2 site cluster with no shared disks. on each node I have a C, D and E drive."), it seems you didn't set up the CSV disk. In a failover cluster on Server 2008 or later, the CSV is a necessary condition, so you must set up the CSV disk first.
If you want to create a failover cluster across geographically separated sites, please refer to the following related article first.
SQL Server 2012 AlwaysOn – Part 4 – SAP configuration in Geo-Cluster configuration
http://blogs.msdn.com/b/saponsqlserver/archive/2012/02/29/sql-server-2012-alwayson-part-4-sap-configuration-in-geo-cluster-configuration.aspx
Hope this helps.

SLD : Failed to initialize cluster notification during SLD startup

Hello,
We are using WAS 7.0.
While starting the SLD, the server gives the following error:
Failed to send start signal for complete cluster, SLD may still be stopped on other cluster nodes.
The SLD log contains the following entries:
FATAL com.sap.lcr.cimsrv.CIMOMServlet: SLD initialization failure, can not set up cluster-wide event notification. Please check your JMS provider for errors.
3428858 11/22/2007 13:42:11.515 [SAPEngine_Application_Thread[impl:3]_70] FATAL com.sap.lcr.cimsrv.CIMOMServlet: SLD initialization failure, can not set up cluster-wide event notification. Please check your JMS provider for errors.
Thrown:
com.sap.sld.api.wbem.exception.CIMException: CIM_ERR_FAILED: Failed to initialize cluster notification. Please check your JNDI service and JMS provider for errors.
at com.sap.lcr.cimsrv.ClusterNotificationListener.<init>(ClusterNotificationListener.java:154)
at com.sap.lcr.cimsrv.ClusterNotificationListener.start(ClusterNotificationListener.java:69)
at com.sap.lcr.cimsrv.CIMOMServlet.init(CIMOMServlet.java:109)
at javax.servlet.GenericServlet.init(GenericServlet.java:258)
at com.sap.engine.services.servlets_jsp.server.runtime.context.WebComponents.getServlet(WebComponents.java:339)
at com.sap.engine.services.servlets_jsp.server.HttpHandlerImpl.runServlet(HttpHandlerImpl.java:354)
at com.sap.engine.services.servlets_jsp.server.HttpHandlerImpl.handleRequest(HttpHandlerImpl.java:266)
at com.sap.engine.services.httpserver.server.RequestAnalizer.startServlet(RequestAnalizer.java:387)
at com.sap.engine.services.httpserver.server.RequestAnalizer.startServlet(RequestAnalizer.java:365)
at com.sap.engine.services.httpserver.server.RequestAnalizer.invokeWebContainer(RequestAnalizer.java:944)
at com.sap.engine.services.httpserver.server.RequestAnalizer.handle(RequestAnalizer.java:266)
at com.sap.engine.services.httpserver.server.Client.handle(Client.java:95)
at com.sap.engine.services.httpserver.server.Processor.request(Processor.java:175)
at com.sap.engine.core.service630.context.cluster.session.ApplicationSessionMessageListener.process(ApplicationSessionMessageListener.java:33)
at com.sap.engine.core.cluster.impl6.session.MessageRunner.run(MessageRunner.java:41)
at com.sap.engine.core.thread.impl3.ActionObject.run(ActionObject.java:37)
at java.security.AccessController.doPrivileged(Native Method)
at com.sap.engine.core.thread.impl3.SingleThread.execute(SingleThread.java:100)
at com.sap.engine.core.thread.impl3.SingleThread.run(SingleThread.java:170)
Caused by: javax.jms.JMSException:
at com.sap.jms.protocol.notification.ServerExceptionResponse.getException(ServerExceptionResponse.java:271)
at com.sap.jms.client.session.Session.checkReceivedPacket(Session.java:2614)
at com.sap.jms.client.session.Session.createConsumer(Session.java:2173)
at com.sap.jms.client.session.TopicSession.createSubscriber(TopicSession.java:39)
at com.sap.lcr.cimsrv.ClusterNotificationListener.<init>(ClusterNotificationListener.java:142)
... 18 more
caused by:
javax.jms.JMSException:
at com.sap.jms.protocol.notification.ServerExceptionResponse.getException(ServerExceptionResponse.java:271)
at com.sap.jms.client.session.Session.checkReceivedPacket(Session.java:2614)
at com.sap.jms.client.session.Session.createConsumer(Session.java:2173)
at com.sap.jms.client.session.TopicSession.createSubscriber(TopicSession.java:39)
at com.sap.lcr.cimsrv.ClusterNotificationListener.<init>(ClusterNotificationListener.java:142)
at com.sap.lcr.cimsrv.ClusterNotificationListener.start(ClusterNotificationListener.java:69)
at com.sap.lcr.cimsrv.CIMOMServlet.init(CIMOMServlet.java:109)
at javax.servlet.GenericServlet.init(GenericServlet.java:258)
at com.sap.engine.services.servlets_jsp.server.runtime.context.WebComponents.getServlet(WebComponents.java:339)
at com.sap.engine.services.servlets_jsp.server.HttpHandlerImpl.runServlet(HttpHandlerImpl.java:354)
at com.sap.engine.services.servlets_jsp.server.HttpHandlerImpl.handleRequest(HttpHandlerImpl.java:266)
at com.sap.engine.services.httpserver.server.RequestAnalizer.startServlet(RequestAnalizer.java:387)
at com.sap.engine.services.httpserver.server.RequestAnalizer.startServlet(RequestAnalizer.java:365)
at com.sap.engine.services.httpserver.server.RequestAnalizer.invokeWebContainer(RequestAnalizer.java:944)
at com.sap.engine.services.httpserver.server.RequestAnalizer.handle(RequestAnalizer.java:266)
at com.sap.engine.services.httpserver.server.Client.handle(Client.java:95)
at com.sap.engine.services.httpserver.server.Processor.request(Processor.java:175)
at com.sap.engine.core.service630.context.cluster.session.ApplicationSessionMessageListener.process(ApplicationSessionMessageListener.java:33)
at com.sap.engine.core.cluster.impl6.session.MessageRunner.run(MessageRunner.java:41)
at com.sap.engine.core.thread.impl3.ActionObject.run(ActionObject.java:37)
at java.security.AccessController.doPrivileged(Native Method)
at com.sap.engine.core.thread.impl3.SingleThread.execute(SingleThread.java:100)
at com.sap.engine.core.thread.impl3.SingleThread.run(SingleThread.java:170)
Please help.
Thanks,
Manoj

Hi Pankaj,
The ABAP RFC is working fine but the JCo RFC is not there.
But in the create JCo connect button in Content Administration -->webdynpro.
Please suggest.
Thanks
Manoj

Urgent! Node keeps disconnecting from Coherence Cluster

The system consists of 4 standalone cache servers with local storage set to true, and 14 other embedded nodes started with different web apps on Tomcat with local storage set to false.
When the servers are started after a new deployment, sometimes it just works, but most of the time a random Tomcat server gets stuck in the following pattern.
First it successfully starts the cluster service and joins an existing cluster.
Oracle Coherence Version 3.5.1/461
Grid Edition: Development mode
Copyright (c) 2000, 2009, Oracle and/or its affiliates. All rights reserved.
2012-07-18 12:24:33.335/31.845 Oracle Coherence GE 3.5.1/461 <D5> (thread=Cluster, member=n/a): Service Cluster joined the cluster with senior service member n/a
2012-07-18 12:24:33.550/32.060 Oracle Coherence GE 3.5.1/461 <Info> (thread=Cluster, member=n/a): This Member(Id=8, Timestamp=2012-07-18 12:24:33.347, Address=10.34.32.107:8089, MachineId=2155, Location=machine:dev1sxapp2,process:20831, Role=ApacheCatalinaStartupBootstrap, Edition=Grid Edition, Mode=Development, CpuCount=24, SocketCount=24) joined cluster "DEV1" with senior Member(Id=10, Timestamp=2012-07-18 09:39:44.861, Address=10.34.32.101:8090, MachineId=2149, Location=machine:dev1ssapp3,process:27796, Role=ApacheCatalinaStartupBootstrap, Edition=Grid Edition, Mode=Development, CpuCount=64, SocketCount=64)
2012-07-18 12:24:33.555/32.065 Oracle Coherence GE 3.5.1/461 <D5> (thread=Cluster, member=n/a): Member(Id=1, Timestamp=2012-07-18 12:22:14.231, Address=10.34.32.107:8090, MachineId=2155, Location=machine:dev1sxapp2,process:1278, Role=ApacheCatalinaStartupBootstrap) joined Cluster with senior member 10
2012-07-18 12:24:33.555/32.065 Oracle Coherence GE 3.5.1/461 <D5> (thread=Cluster, member=n/a): Member(Id=2, Timestamp=2012-07-18 12:22:14.331, Address=10.34.32.106:8089, MachineId=2154, Location=machine:dev1sxapp1,process:6549, Role=ApacheCatalinaStartupBootstrap) joined Cluster with senior member 10
2012-07-18 12:24:33.555/32.065 Oracle Coherence GE 3.5.1/461 <D5> (thread=Cluster, member=n/a): Member(Id=3, Timestamp=2012-07-18 12:22:55.086, Address=10.34.32.106:8088, MachineId=2154, Location=machine:dev1sxapp1,process:23083, Role=ApacheCatalinaStartupBootstrap) joined Cluster with senior member 10
2012-07-18 12:24:33.555/32.065 Oracle Coherence GE 3.5.1/461 <D5> (thread=Cluster, member=n/a): Member(Id=4, Timestamp=2012-07-18 12:22:56.799, Address=10.34.32.107:8088, MachineId=2155, Location=machine:dev1sxapp2,process:19624, Role=ApacheCatalinaStartupBootstrap) joined Cluster with senior member 10
2012-07-18 12:24:33.555/32.065 Oracle Coherence GE 3.5.1/461 <D5> (thread=Cluster, member=n/a): Member(Id=5, Timestamp=2012-07-18 12:24:31.869, Address=10.34.32.106:8090, MachineId=2154, Location=machine:dev1sxapp1,process:24411, Role=ApacheCatalinaStartupBootstrap) joined Cluster with senior member 10
2012-07-18 12:24:33.555/32.065 Oracle Coherence GE 3.5.1/461 <D5> (thread=Cluster, member=n/a): Member(Id=6, Timestamp=2012-07-18 12:24:33.084, Address=10.34.32.101:8088, MachineId=2149, Location=machine:dev1ssapp3,process:28932, Role=ApacheCatalinaStartupBootstrap) joined Cluster with senior member 10
2012-07-18 12:24:33.555/32.065 Oracle Coherence GE 3.5.1/461 <D5> (thread=Cluster, member=n/a): Member(Id=14, Timestamp=2012-07-18 09:40:50.645, Address=10.34.32.104:8090, MachineId=2152, Location=machine:dev1ssapp4,process:17697, Role=ApacheCatalinaStartupBootstrap) joined Cluster with senior member 10
2012-07-18 12:24:33.556/32.066 Oracle Coherence GE 3.5.1/461 <D5> (thread=Cluster, member=n/a): Member(Id=17, Timestamp=2012-07-18 10:35:16.722, Address=10.34.32.104:8093, MachineId=2152, Location=machine:dev1ssapp4,process:19365, Role=ApacheCatalinaStartupBootstrap) joined Cluster with senior member 10
2012-07-18 12:24:33.556/32.066 Oracle Coherence GE 3.5.1/461 <D5> (thread=Cluster, member=n/a): Member(Id=18, Timestamp=2012-07-18 10:38:47.714, Address=10.34.32.101:8093, MachineId=2149, Location=machine:dev1ssapp3,process:29887, Role=ApacheCatalinaStartupBootstrap) joined Cluster with senior member 10
2012-07-18 12:24:33.563/32.073 Oracle Coherence GE 3.5.1/461 <D5> (thread=Cluster, member=n/a): Member 10 joined Service Management with senior member 10
2012-07-18 12:24:33.566/32.076 Oracle Coherence GE 3.5.1/461 <D5> (thread=Cluster, member=n/a): Member 1 joined Service Management with senior member 10
2012-07-18 12:24:33.566/32.076 Oracle Coherence GE 3.5.1/461 <D5> (thread=Cluster, member=n/a): Member 2 joined Service Management with senior member 10
2012-07-18 12:24:33.566/32.076 Oracle Coherence GE 3.5.1/461 <D5> (thread=Cluster, member=n/a): Member 4 joined Service Management with senior member 10
2012-07-18 12:24:33.566/32.076 Oracle Coherence GE 3.5.1/461 <D5> (thread=Cluster, member=n/a): Member 17 joined Service Management with senior member 10
2012-07-18 12:24:33.566/32.076 Oracle Coherence GE 3.5.1/461 <D5> (thread=Cluster, member=n/a): Member 17 joined Service PFExpiryDistributedCache with senior member 17
2012-07-18 12:24:33.566/32.076 Oracle Coherence GE 3.5.1/461 <D5> (thread=Cluster, member=n/a): Member 17 joined Service SsoRuleEntryDistributedCache with senior member 17
2012-07-18 12:24:33.567/32.077 Oracle Coherence GE 3.5.1/461 <D5> (thread=Cluster, member=n/a): Member 3 joined Service Management with senior member 10
2012-07-18 12:24:33.567/32.077 Oracle Coherence GE 3.5.1/461 <D5> (thread=Cluster, member=n/a): Member 18 joined Service Management with senior member 10
2012-07-18 12:24:33.567/32.077 Oracle Coherence GE 3.5.1/461 <D5> (thread=Cluster, member=n/a): Member 18 joined Service PFExpiryDistributedCache with senior member 17
2012-07-18 12:24:33.567/32.077 Oracle Coherence GE 3.5.1/461 <D5> (thread=Cluster, member=n/a): Member 18 joined Service SsoRuleEntryDistributedCache with senior member 17
2012-07-18 12:24:33.568/32.078 Oracle Coherence GE 3.5.1/461 <D5> (thread=Cluster, member=n/a): Member 14 joined Service Management with senior member 10
2012-07-18 12:24:33.568/32.078 Oracle Coherence GE 3.5.1/461 <D5> (thread=Cluster, member=n/a): Member 5 joined Service Management with senior member 10
2012-07-18 12:24:33.579/32.089 Oracle Coherence GE 3.5.1/461 <D5> (thread=Cluster, member=n/a): Member 6 joined Service Management with senior member 10
Then it started getting heartbeat-overdue messages and the cluster stopped:
2012-07-18 12:37:20.717/799.227 Oracle Coherence GE 3.5.1/461 <Info> (thread=PacketListenerN, member=8): Scheduled senior member heartbeat is overdue; rejoining multicast group.
2012-07-18 12:37:29.916/808.426 Oracle Coherence GE 3.5.1/461 <Info> (thread=PacketListenerN, member=8): Scheduled senior member heartbeat is overdue; rejoining multicast group.
2012-07-18 12:37:59.291/837.801 Oracle Coherence GE 3.5.1/461 <Error> (thread=PacketListenerN, member=8): Stopping cluster due to unhandled exception: com.tangosol.net.messaging.ConnectionException: Unable to refresh sockets: [UnicastUdpSocket{State=STATE_OPEN, address:port=10.34.32.107:8089}, MulticastUdpSocket{State=STATE_OPEN, address:port=237.0.0.1:40109, InterfaceAddress=10.34.32.107, TimeToLive=4}, TcpSocketAccepter{State=STATE_OPEN, ServerSocket=10.34.32.107:8089}]; last failed socket: MulticastUdpSocket{State=STATE_OPEN, address:port=237.0.0.1:40109, InterfaceAddress=10.34.32.107, TimeToLive=4}
at com.tangosol.coherence.component.net.Cluster$SocketManager.refreshSockets(Cluster.CDB:91)
at com.tangosol.coherence.component.net.Cluster$SocketManager$MulticastUdpSocket.onInterruptedIOException(Cluster.CDB:9)
at com.tangosol.coherence.component.net.socket.UdpSocket.receive(UdpSocket.CDB:33)
at com.tangosol.coherence.component.net.UdpPacket.receive(UdpPacket.CDB:4)
at com.tangosol.coherence.component.util.daemon.queueProcessor.packetProcessor.PacketListener.onNotify(PacketListener.CDB:19)
at com.tangosol.coherence.component.util.Daemon.run(Daemon.CDB:42)
at java.lang.Thread.run(Thread.java:662)
Caused by: java.net.SocketTimeoutException: Receive timed out
at java.net.PlainDatagramSocketImpl.receive0(Native Method)
at java.net.PlainDatagramSocketImpl.receive(PlainDatagramSocketImpl.java:145)
at java.net.DatagramSocket.receive(DatagramSocket.java:725)
at com.tangosol.coherence.component.net.socket.UdpSocket.receive(UdpSocket.CDB:20)
at com.tangosol.coherence.component.net.UdpPacket.receive(UdpPacket.CDB:4)
at com.tangosol.coherence.component.util.daemon.queueProcessor.packetProcessor.PacketListener.onNotify(PacketListener.CDB:19)
at com.tangosol.coherence.component.util.Daemon.run(Daemon.CDB:42)
at java.lang.Thread.run(Thread.java:662)
2012-07-18 12:37:59.291/837.801 Oracle Coherence GE 3.5.1/461 <D5> (thread=Cluster, member=8): Service Cluster left the cluster
2012-07-18 12:37:59.293/837.803 Oracle Coherence GE 3.5.1/461 <D5> (thread=Invocation:Management, member=8): Service Management left the cluster
2012-07-18 12:37:59.293/837.803 Oracle Coherence GE 3.5.1/461 <D5> (thread=ReplicatedCache:HibernateReplicatedCache, member=8): Service HibernateReplicatedCache left the cluster
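The exception above is a receive timeout on the multicast socket (237.0.0.1:40109, TTL 4), so one thing worth ruling out is whether multicast traffic is delivered on the affected host at all. Coherence ships its own multicast test utility (com.tangosol.net.MulticastTest), which is the better tool for a cross-host check. The JDK-only sketch below is just a rough local probe: the class MulticastProbe and its methods are hypothetical names for illustration, and the port 40110 in main is an arbitrary choice made here to avoid sending stray packets to the live cluster port.

```java
import java.io.IOException;
import java.net.DatagramPacket;
import java.net.InetAddress;
import java.net.MulticastSocket;
import java.net.SocketTimeoutException;
import java.net.UnknownHostException;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper (not a Coherence class): rough local multicast probe. */
class MulticastProbe {
    /** True if the given literal address lies in a multicast range. */
    static boolean isMulticast(String address) {
        try {
            return InetAddress.getByName(address).isMulticastAddress();
        } catch (UnknownHostException e) {
            return false;
        }
    }

    /** Joins the group, sends one packet, and waits up to timeoutMillis for any packet to arrive. */
    static boolean probe(String address, int port, int ttl, int timeoutMillis) throws IOException {
        InetAddress group = InetAddress.getByName(address);
        MulticastSocket socket = new MulticastSocket(port);
        try {
            socket.setTimeToLive(ttl);
            socket.setSoTimeout(timeoutMillis);
            socket.joinGroup(group);
            byte[] payload = "probe".getBytes(StandardCharsets.UTF_8);
            socket.send(new DatagramPacket(payload, payload.length, group, port));
            byte[] buffer = new byte[1500];
            try {
                socket.receive(new DatagramPacket(buffer, buffer.length));
                return true;   // something arrived, possibly our own looped-back packet
            } catch (SocketTimeoutException e) {
                return false;  // nothing arrived within the timeout
            }
        } finally {
            socket.close();
        }
    }

    public static void main(String[] args) throws IOException {
        String address = "237.0.0.1"; // group address taken from the log above
        int port = 40110;             // assumption: a spare port, not the cluster's 40109
        System.out.println("multicast address: " + isMulticast(address));
        System.out.println("packet received:   " + probe(address, port, 4, 3000));
    }
}
```

A true result on a single host usually only means the local network stack loops the packet back. To learn anything about the network between hosts, run a sender on one machine and a receiver on another, or use the Coherence utility.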
Then it started getting messages from various nodes about the existing cluster:
2012-07-18 12:40:02.862/961.372 Oracle Coherence GE 3.5.1/461 <Info> (thread=queue://authenticationService.logonEvent.consumer-2, member=n/a): Restarting cluster
2012-07-18 12:40:02.891/961.401 Oracle Coherence GE 3.5.1/461 <D5> (thread=Cluster, member=n/a): Service Cluster joined the cluster with senior service member n/a
2012-07-18 12:40:20.167/978.677 Oracle Coherence GE 3.5.1/461 <Warning> (thread=Cluster, member=n/a): This Member(Id=0, Timestamp=2012-07-18 12:40:02.867, Address=10.34.32.107:8089, MachineId=2155, Location=machine:dev1sxapp2,process:20831, Role=ApacheCatalinaStartupBootstrap) has been attempting to join the cluster at address 237.0.0.1:40109 with TTL 4 for 17 seconds without success; this could indicate a mis-configured TTL value, or it may simply be the result of a busy cluster or active failover.
2012-07-18 12:40:20.168/978.678 Oracle Coherence GE 3.5.1/461 <Warning> (thread=Cluster, member=n/a): Received a discovery message that indicates the presence of an existing cluster:
Message "NewMemberAnnounceWait"
FromMember=Member(Id=4, Timestamp=2012-07-18 12:22:56.799, Address=10.34.32.107:8088, MachineId=2155, Location=machine:dev1sxapp2,process:19624, Role=ApacheCatalinaStartupBootstrap)
FromMessageId=0
Internal=false
MessagePartCount=1
PendingCount=0
MessageType=9
ToPollId=0
Poll=null
Packets
[000]=Broadcast{PacketType=0x0DDF00D2, ToId=0, FromId=4, Direction=Incoming, ReceivedMillis=12:40:20.167, MessageType=9, MessagePartCount=1, MessagePartIndex=0, Body=0x00000001389AE63E1F0A22206B00000000000000000000000040001
F980000086B000405011818044445563140400A64657631737861707032053139363234401E417061636865436174616C696E6153746172747570426F6F7473747261700001000001389AF5E6330A22206B00000000000000000000000040001F990000086B000005011818044445563140
400A64657631737861707032053230383331401E417061636865436174616C696E6153746172747570426F6F74737472617000000001389AE6376E0A22206A00000000000000000000000040001F980000086A000305011818044445563140400A646576317378617070310532333038334
01E417061636865436174616C696E6153746172747570426F6F74737472617000, Body.length=287}
Service=ClusterService{Name=Cluster, State=(SERVICE_STARTED, STATE_ANNOUNCE), Id=0, Version=3.5}
ToMemberSet=null
NotifySent=false
ToMember=Member(Id=0, Timestamp=2012-07-18 12:40:02.867, Address=10.34.32.107:8089, MachineId=2155, Location=machine:dev1sxapp2,process:20831, Role=ApacheCatalinaStartupBootstrap)
SeniorMember=Member(Id=3, Timestamp=2012-07-18 12:22:55.086, Address=10.34.32.106:8088, MachineId=2154, Location=machine:dev1sxapp1,process:23083, Role=ApacheCatalinaStartupBootstrap)
Then it failed to connect to the cluster:
2012-07-18 12:40:33.187/991.697 Oracle Coherence GE 3.5.1/461 <D5> (thread=Cluster, member=n/a): Service Cluster left the cluster
2012-07-18 12:40:33.190/991.700 Oracle Coherence GE 3.5.1/461 <Error> (thread=queue://authenticationService.logonEvent.consumer-2, member=n/a): Error while starting cluster: com.tangosol.net.RequestTimeoutException:
Timeout during service start: ServiceInfo(Id=0, Name=Cluster, Type=Cluster
MemberSet=ServiceMemberSet(
OldestMember=n/a
ActualMemberSet=MemberSet(Size=0, BitSetCount=0
MemberId/ServiceVersion/ServiceJoined/ServiceLeaving
at com.tangosol.coherence.component.util.daemon.queueProcessor.service.Grid.onStartupTimeout(Grid.CDB:6)
at com.tangosol.coherence.component.util.daemon.queueProcessor.Service.start(Service.CDB:28)
at com.tangosol.coherence.component.util.daemon.queueProcessor.service.Grid.start(Grid.CDB:38)
at com.tangosol.coherence.component.net.Cluster.onStart(Cluster.CDB:366)
at com.tangosol.coherence.component.net.Cluster.start(Cluster.CDB:11)
at com.tangosol.coherence.component.util.SafeCluster.startCluster(SafeCluster.CDB:3)
at com.tangosol.coherence.component.util.SafeCluster.restartCluster(SafeCluster.CDB:7)
at com.tangosol.coherence.component.util.SafeCluster.ensureRunningCluster(SafeCluster.CDB:27)
at com.tangosol.coherence.component.util.SafeCluster.start(SafeCluster.CDB:2)
at com.tangosol.net.CacheFactory.ensureCluster(CacheFactory.java:1011)
at com.tangosol.coherence.hibernate.CoherenceCacheProvider.nextTimestamp(CoherenceCacheProvider.java:58)
at org.hibernate.cache.impl.bridge.RegionFactoryCacheProviderBridge.nextTimestamp(RegionFactoryCacheProviderBridge.java:93)
at org.hibernate.impl.SessionFactoryImpl.openSession(SessionFactoryImpl.java:652)
at org.hibernate.ejb.EntityManagerImpl.getRawSession(EntityManagerImpl.java:111)
at org.hibernate.ejb.EntityManagerImpl.getSession(EntityManagerImpl.java:91)
at org.hibernate.ejb.AbstractEntityManagerImpl.setDefaultProperties(AbstractEntityManagerImpl.java:250)
at org.hibernate.ejb.AbstractEntityManagerImpl.postInit(AbstractEntityManagerImpl.java:162)
at org.hibernate.ejb.EntityManagerImpl.<init>(EntityManagerImpl.java:84)
at org.hibernate.ejb.EntityManagerFactoryImpl.createEntityManager(EntityManagerFactoryImpl.java:112)
at org.hibernate.ejb.EntityManagerFactoryImpl.createEntityManager(EntityManagerFactoryImpl.java:107)
at org.springframework.orm.jpa.JpaTransactionManager.createEntityManagerForTransaction(JpaTransactionManager.java:399)
at org.springframework.orm.jpa.JpaTransactionManager.doBegin(JpaTransactionManager.java:321)
at org.springframework.transaction.support.AbstractPlatformTransactionManager.getTransaction(AbstractPlatformTransactionManager.java:371)
at org.springframework.transaction.interceptor.TransactionAspectSupport.createTransactionIfNecessary(TransactionAspectSupport.java:335)
at org.springframework.transaction.interceptor.TransactionInterceptor.invoke(TransactionInterceptor.java:105)
at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:172)
at org.springframework.aop.framework.Cglib2AopProxy$DynamicAdvisedInterceptor.intercept(Cglib2AopProxy.java:621)
at org.springframework.jms.listener.AbstractMessageListenerContainer.doInvokeListener(AbstractMessageListenerContainer.java:560)
at org.springframework.jms.listener.AbstractMessageListenerContainer.invokeListener(AbstractMessageListenerContainer.java:498)
at org.springframework.jms.listener.AbstractMessageListenerContainer.doExecuteListener(AbstractMessageListenerContainer.java:467)
at org.springframework.jms.listener.AbstractPollingMessageListenerContainer.doReceiveAndExecute(AbstractPollingMessageListenerContainer.java:325)
at org.springframework.jms.listener.AbstractPollingMessageListenerContainer.receiveAndExecute(AbstractPollingMessageListenerContainer.java:263)
at org.springframework.jms.listener.DefaultMessageListenerContainer$AsyncMessageListenerInvoker.invokeListener(DefaultMessageListenerContainer.java:1058)
at org.springframework.jms.listener.DefaultMessageListenerContainer$AsyncMessageListenerInvoker.executeOngoingLoop(DefaultMessageListenerContainer.java:1050)
at org.springframework.jms.listener.DefaultMessageListenerContainer$AsyncMessageListenerInvoker.run(DefaultMessageListenerContainer.java:947)
at java.lang.Thread.run(Thread.java:662)
2012-07-18 12:40:33.216/991.726 Oracle Coherence GE 3.5.1/461 <D5> (thread=Cluster, member=n/a): Service Cluster joined the cluster with senior service member n/a
2012-07-18 12:40:50.398/1008.908 Oracle Coherence GE 3.5.1/461 <Warning> (thread=Cluster, member=n/a): This Member(Id=0, Timestamp=2012-07-18 12:40:33.194, Address=10.34.32.107:8089, MachineId=2155, Location=machine:dev1sxapp2,process:20831, Role=ApacheCatalinaStartupBootstrap) has been attempting to join the cluster at address 237.0.0.1:40109 with TTL 4 for 17 seconds without success; this could indicate a mis-configured TTL value, or it may simply be the result of a busy cluster or active failover.
This particular JVM then goes into a loop: it receives many messages from other nodes about the existing cluster but fails to join.
I have run the MulticastTest and the Datagram test, and neither revealed any obvious network issue. What should I do next?
JVM is 1.6.0_31
Thanks a lot in advance, any help will be greatly appreciated.

I correlated the logs across all servers and found that the issue might be caused by a member it was connected to being restarted at the same time.
Server 1:
- It starts as member 23, discovers the existing cluster and joins it. Server 1 then logs many messages about other members joining the cluster with different member IDs.
- Then it finds that a member has failed to respond:
2012-07-30 22:00:25.371/34.325 Oracle Coherence GE 3.5.1/461 <D6> (thread=PacketPublisher, member=n/a): Member(Id=5, Timestamp=2012-07-30 15:35:09.735, Address=10.34.32.107:8089, MachineId=2155, Location=machine:dev1sxapp2,process:21324, Role=ApacheCatalinaStartupBootstrap) has failed to respond to 17 packets; declaring this member as paused.
- Then it requests departure confirmation for member 5:
2012-07-30 22:00:52.042/60.996 Oracle Coherence GE 3.5.1/461 <Warning> (thread=PacketPublisher, member=n/a): Timeout while delivering a packet Directed{PacketType=0x0DDF00D5, ToId=0, FromId=23, Direction=Outgoing, SentCount=145, SentMillis=22:00:51.832, ToMemberSet=[5(1)], ServiceId=0, MessageType=16, FromMessageId=6, ToMessageId=0, MessagePartCount=1, MessagePartIndex=0, NackInProgress=false, ResendScheduled=22:00:52.32, Timeout=22:00:51.849, PendingResendSkips=0, DeliveryState=unsent, Body=0x0000000200000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000..., Body.length=1398}; requesting the departure confirmation for Member(Id=5, Timestamp=2012-07-30 15:35:09.735, Address=10.34.32.107:8089, MachineId=2155, Location=machine:dev1sxapp2,process:21324, Role=ApacheCatalinaStartupBootstrap)
by MemberSet(Size=2, BitSetCount=2
Member(Id=1, Timestamp=2012-07-27 10:46:51.616, Address=10.34.32.101:8088, MachineId=2149, Location=machine:dev1ssapp3,process:1, Role=CoherenceServer)
Member(Id=12, Timestamp=2012-07-30 22:00:43.132, Address=10.34.32.101:8091, MachineId=2149, Location=machine:dev1ssapp3,process:7803, Role=ApacheCatalinaStartupBootstrap)
- Then the member set confirms the departure; however, at the same time the Cluster service also leaves the cluster:
2012-07-30 22:00:52.046/61.000 Oracle Coherence GE 3.5.1/461 <Info> (thread=Cluster, member=n/a): Member departure confirmed by MemberSet(Size=1, BitSetCount=2
Member(Id=12, Timestamp=2012-07-30 22:00:43.132, Address=10.34.32.101:8091, MachineId=2149, Location=machine:dev1ssapp3,process:7803, Role=ApacheCatalinaStartupBootstrap)
); removing Member(Id=5, Timestamp=2012-07-30 15:35:09.735, Address=10.34.32.107:8089, MachineId=2155, Location=machine:dev1sxapp2,process:21324, Role=ApacheCatalinaStartupBootstrap)
2012-07-30 22:00:52.046/61.000 Oracle Coherence GE 3.5.1/461 <D5> (thread=Cluster, member=n/a): Member(Id=5, Timestamp=2012-07-30 22:00:52.046, Address=10.34.32.107:8089, MachineId=2155, Location=machine:dev1sxapp2,process:21324, Role=ApacheCatalinaStartupBootstrap) left Cluster with senior member 1
2012-07-30 22:00:52.049/61.003 Oracle Coherence GE 3.5.1/461 <D5> (thread=Cluster, member=n/a): Service Cluster left the cluster
- Then the service start times out, so the application fails to start:
2012-07-30 22:00:52.051/61.005 Oracle Coherence GE 3.5.1/461 <Error> (thread=main, member=n/a): Error while starting cluster: com.tangosol.net.RequestTimeoutException: Timeout during service start: ServiceInfo(Id=0, Name=Cluster, Type=Cluster
MemberSet=ServiceMemberSet(
OldestMember=n/a
ActualMemberSet=MemberSet(Size=15, BitSetCount=2
Member(Id=1, Timestamp=2012-07-27 10:46:51.616, Address=10.34.32.101:8088, MachineId=2149, Location=machine:dev1ssapp3,process:1, Role=CoherenceServer)
Member(Id=2, Timestamp=2012-07-27 10:47:12.122, Address=10.34.32.101:8089, MachineId=2149, Location=machine:dev1ssapp3,process:2, Role=CoherenceServer)
Member(Id=3, Timestamp=2012-07-27 10:48:02.603, Address=10.34.32.104:8088, MachineId=2152, Location=machine:dev1ssapp4,process:1, Role=CoherenceServer)
Member(Id=4, Timestamp=2012-07-27 10:48:04.76, Address=10.34.32.104:8089, MachineId=2152, Location=machine:dev1ssapp4,process:2, Role=CoherenceServer)
Member(Id=8, Timestamp=2012-07-30 14:27:07.382, Address=10.34.32.101:8090, MachineId=2149, Location=machine:dev1ssapp3,process:23727, Role=ApacheCatalinaStartupBootstrap)
Member(Id=9, Timestamp=2012-07-30 22:00:28.596, Address=10.34.32.101:8092, MachineId=2149, Location=machine:dev1ssapp3,process:7619, Role=ApacheCatalinaStartupBootstrap)
Member(Id=10, Timestamp=2012-07-30 14:34:27.573, Address=10.34.32.104:8090, MachineId=2152, Location=machine:dev1ssapp4,process:25219, Role=ApacheCatalinaStartupBootstrap)
Member(Id=11, Timestamp=2012-07-30 22:00:41.609, Address=10.34.32.107:8088, MachineId=2155, Location=machine:dev1sxapp2,process:17632, Role=ApacheCatalinaStartupBootstrap)
Member(Id=12, Timestamp=2012-07-30 22:00:43.132, Address=10.34.32.101:8091, MachineId=2149, Location=machine:dev1ssapp3,process:7803, Role=ApacheCatalinaStartupBootstrap)
Member(Id=14, Timestamp=2012-07-30 15:35:09.811, Address=10.34.32.106:8088, MachineId=2154, Location=machine:dev1sxapp1,process:5186, Role=ApacheCatalinaStartupBootstrap)
Member(Id=15, Timestamp=2012-07-30 16:02:34.096, Address=10.34.32.106:8091, MachineId=2154, Location=machine:dev1sxapp1,process:2691, Role=ApacheCatalinaStartupBootstrap)
Member(Id=16, Timestamp=2012-07-30 16:08:41.885, Address=10.34.32.107:8091, MachineId=2155, Location=machine:dev1sxapp2,process:15992, Role=ApacheCatalinaStartupBootstrap)
Member(Id=21, Timestamp=2012-07-30 21:58:56.669, Address=10.34.32.106:8089, MachineId=2154, Location=machine:dev1sxapp1,process:28689, Role=ApacheCatalinaStartupBootstrap)
Member(Id=22, Timestamp=2012-07-30 21:58:58.29, Address=10.34.32.107:8090, MachineId=2155, Location=machine:dev1sxapp2,process:15491, Role=ApacheCatalinaStartupBootstrap)
Member(Id=23, Timestamp=2012-07-30 22:00:21.648, Address=10.34.32.106:8090, MachineId=2154, Location=machine:dev1sxapp1,process:556, Role=ApacheCatalinaStartupBootstrap)
MemberId/ServiceVersion/ServiceJoined/ServiceLeaving
1/3.5/Fri Jul 27 10:46:51 EDT 2012/false,
2/3.5/Fri Jul 27 10:47:12 EDT 2012/false,
3/3.5/Fri Jul 27 10:48:02 EDT 2012/false,
4/3.5/Fri Jul 27 10:48:04 EDT 2012/false,
8/3.5/Mon Jul 30 14:27:07 EDT 2012/false,
9/3.5/Mon Jul 30 22:00:28 EDT 2012/false,
10/3.5/Mon Jul 30 14:34:27 EDT 2012/false,
11/3.5/Mon Jul 30 22:00:41 EDT 2012/false,
12/3.5/Mon Jul 30 22:00:43 EDT 2012/false,
14/3.5/Mon Jul 30 15:35:09 EDT 2012/false,
15/3.5/Mon Jul 30 16:02:34 EDT 2012/false,
16/3.5/Mon Jul 30 16:08:41 EDT 2012/false,
21/3.5/Mon Jul 30 21:58:56 EDT 2012/false,
22/3.5/Mon Jul 30 21:58:58 EDT 2012/false,
23/3.5/Mon Jul 30 22:00:21 EDT 2012/false
     at com.tangosol.coherence.component.util.daemon.queueProcessor.service.Grid.onStartupTimeout(Grid.CDB:6)
     at com.tangosol.coherence.component.util.daemon.queueProcessor.Service.start(Service.CDB:28)
     at com.tangosol.coherence.component.util.daemon.queueProcessor.service.Grid.start(Grid.CDB:38)
     at com.tangosol.coherence.component.net.Cluster.onStart(Cluster.CDB:366)
     at com.tangosol.coherence.component.net.Cluster.start(Cluster.CDB:11)
     at com.tangosol.coherence.component.util.SafeCluster.startCluster(SafeCluster.CDB:3)
     at com.tangosol.coherence.component.util.SafeCluster.restartCluster(SafeCluster.CDB:7)
     at com.tangosol.coherence.component.util.SafeCluster.ensureRunningCluster(SafeCluster.CDB:27)
     at com.tangosol.coherence.component.util.SafeCluster.start(SafeCluster.CDB:2)
     at com.tangosol.net.CacheFactory.ensureCluster(CacheFactory.java:1011)
     at com.tangosol.coherence.hibernate.CoherenceCacheProvider.start(CoherenceCacheProvider.java:73)
     at org.hibernate.cache.impl.bridge.RegionFactoryCacheProviderBridge.start(RegionFactoryCacheProviderBridge.java:72)
Looking at member 5's log, I found it was being bounced at that time, but somehow it failed to stop the Coherence thread and did not send a departure event to the cluster until other members requested it.
SEVERE: The web application [riding-services] appears to have started a thread named Cluster but has failed to stop it. This is very likely to create a memory leak.
Questions:
1. It seems this issue only happens when one server starts while another is shutting down at around the same time, and the two happen to be connected to each other for distributed caching. How can I modify the script to retry startup after the first attempt times out? Or should I change the configuration to use a longer timeout value?
2. Is it possible to detect an unavailable member more quickly? Right now it seems to take 30 seconds or more.
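For the retry part of question 1, one possible approach is sketched below. It assumes the application can wrap the call that joins the cluster (for example CacheFactory.ensureCluster(), which appears in the stack trace above) in its own code. In this setup that call is made from the Hibernate cache provider, so whether wrapping it is practical here is an assumption. StartupRetry, retry, the attempt count and the delay are hypothetical names and values, and the anonymous Callable in main only simulates a join that times out twice; it does not touch Coherence.

```java
import java.util.concurrent.Callable;

/** Hypothetical helper: retries a startup action a bounded number of times. */
final class StartupRetry {

    /**
     * Runs the action until it succeeds or maxAttempts is exhausted,
     * sleeping delayMillis between attempts. Rethrows the last failure.
     */
    static <T> T retry(Callable<T> action, int maxAttempts, long delayMillis) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return action.call();
            } catch (InterruptedException e) {
                // Do not retry if the thread was asked to stop.
                Thread.currentThread().interrupt();
                throw e;
            } catch (Exception e) {
                last = e;
                if (attempt < maxAttempts) {
                    Thread.sleep(delayMillis);
                }
            }
        }
        throw last;
    }

    public static void main(String[] args) throws Exception {
        // Stand-in for the real join call (assumption): fails twice, then succeeds.
        final int[] calls = {0};
        String result = retry(new Callable<String>() {
            public String call() {
                if (++calls[0] < 3) {
                    throw new IllegalStateException("simulated join timeout");
                }
                return "joined";
            }
        }, 5, 10L);
        System.out.println(result + " after " + calls[0] + " attempts");
    }
}
```

A bounded attempt count is used so that a genuinely misconfigured node still fails startup instead of looping forever.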
Thanks in advance,

Coherence Cluster Errors- Need your help to solve

Hi,
We hit this error recently in QA. These are not new servers; they had been running for some time in good condition.
The error below occurred suddenly and caused a server outage for some time.
After we restarted all the servers, the issue went away.
We are trying to understand the root cause so we can avoid this in future, and we need the expertise of this forum for that.
Brief summary of the issue:
1. We ran multicast testing on the Coherence cluster IP and port, and all communication was good.
2. The issue started with an "Unable to refresh sockets" error:
    Stopping cluster due to unhandled exception: com.tangosol.net.messaging.ConnectionException: Unable to refresh sockets: [UnicastUdpSocket{State=STATE_OPEN, address:port=1.1.1.85:8088}, MulticastUdpSocket{State=STATE_OPEN, address:port=239.3.1.17:35122, InterfaceAddress=10.137.3.85, TimeToLive=1}, TcpSocketAccepter{State=STATE_OPEN, ServerSocket=1.1.1.85:8088}]; last failed socket: MulticastUdpSocket{State=STATE_OPEN, address:port=239.3.1.17:35122, InterfaceAddress=10.137.3.85, TimeToLive=1}
        at com.tangosol.coherence.component.net.Cluster$SocketManager.refreshSockets(Cluster.CDB:91)
        at com.tangosol.coherence.component.net.Cluster$SocketManager$MulticastUdpSocket.onInterruptedIOException(Cluster.CDB:9)
        at com.tangosol.coherence.component.net.socket.UdpSocket.receive(UdpSocket.CDB:33)
        at com.tangosol.coherence.component.net.UdpPacket.receive(UdpPacket.CDB:4)
        at com.tangosol.coherence.component.util.daemon.queueProcessor.packetProcessor.PacketListener.onNotify(PacketListener.CDB:19)
        at com.tangosol.coherence.component.util.Daemon.run(Daemon.CDB:42)
        at java.lang.Thread.run(Thread.java:662)
    Caused by: java.net.SocketTimeoutException: Receive timed out
3. After that, I noticed a couple of errors like:
                                   Restarting Service: DistributedCache   validatePolls: This service timed-out due to unanswered handshake request. Manual intervention is required to stop the members that have not responded to this Poll
4. Errors like this were logged continuously: Rejecting connection to member 3 using TcpSocket{State=STATE_OPEN, Socket=Socket[addr=/
5. After that, I noticed:
                         Service DistributedCache: received ServiceConfigSync containing 272 entries
                         2013-10-26 08:26:43,241 -0700 level=ERROR class="STDERR"              2013-10-26 08:26:43.241/76.243 Oracle Coherence GE 3.5.1/461 <Error> (thread=main, member=1): Error while starting service "DistributedCache":                          com.tangosol.net.RequestTimeoutException: Timeout during service start: ServiceInfo(Id=2, Name=DistributedCache, Type=DistributedCache
                           MemberSet=ServiceMemberSet(
                             OldestMember=Member(Id=3, Timestamp=2013-10-26 05:16:47.128, Address=10.137.3.49:8088, MachineId=32817, Location=site:test.test.net,machine:test30b,process:3870)
                                       ActualMemberSet=MemberSet(Size=3, BitSetCount=2
                                    Member(Id=1, Timestamp=2013-10-26 08:26:12.289, Address=1.1.1.85:8088, MachineId=32853, Location=site:test.test.net,machine:test304,process:6207, Role=JavaLangThread)
                                    Member(Id=3, Timestamp=2013-10-26 05:16:47.128, Address=1.1.1.49:8088, MachineId=32817, Location=site:test.test.net,machine:test30b,process:3870)
                                    Member(Id=5, Timestamp=2013-10-26 08:26:29.871, Address=1.1.1.86:8088, MachineId=32854, Location=site:test.test.net,machine:test305,process:3988)
                        MemberId/ServiceVersion/ServiceJoined/ServiceLeaving
                          1/3.5/Sat Oct 26 08:26:13 PDT 2013/false,
                          3/3.5/Sat Oct 26 05:16:47 PDT 2013/false,
                          5/3.5/Sat Oct 26 08:26:30 PDT 2013/false
    at com.tangosol.coherence.component.util.daemon.queueProcessor.service.Grid.onStartupTimeout(Grid.CDB:6)
    at com.tangosol.coherence.component.util.daemon.queueProcessor.Service.start(Service.CDB:28)
Your help is highly appreciated!
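The detailed log that follows contains many "communication delay" warnings. As a rough aid for reading it, the sketch below tallies the largest reported delay per member ID. It assumes the message format seen in this log ("Experienced a N ms communication delay ... with Member(Id=M,"); DelaySummary and maxDelayByMember are hypothetical names, and the sample lines in main are shortened copies of warnings from this log.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: summarizes Coherence "communication delay" warnings. */
final class DelaySummary {

    // Captures the delay in ms and the remote member id from one warning line.
    private static final Pattern DELAY = Pattern.compile(
        "Experienced a (\\d+) ms communication delay .*? with Member\\(Id=(\\d+),");

    /** Returns the largest delay (ms) reported for each member id, in first-seen order. */
    static Map<Integer, Long> maxDelayByMember(Iterable<String> lines) {
        Map<Integer, Long> max = new LinkedHashMap<Integer, Long>();
        for (String line : lines) {
            Matcher m = DELAY.matcher(line);
            if (m.find()) {
                long delay = Long.parseLong(m.group(1));
                int member = Integer.parseInt(m.group(2));
                Long prev = max.get(member);
                if (prev == null || delay > prev) {
                    max.put(member, delay);
                }
            }
        }
        return max;
    }

    public static void main(String[] args) {
        // Shortened sample lines; a real run would read the server log file instead.
        List<String> sample = Arrays.asList(
            "<Warning> Experienced a 2642 ms communication delay (probable remote GC) with Member(Id=3, ...)",
            "<Warning> Experienced a 4875 ms communication delay (probable remote GC) with Member(Id=1, ...)",
            "<Warning> Experienced a 13069 ms communication delay (probable remote GC) with Member(Id=3, ...)");
        System.out.println(maxDelayByMember(sample));
    }
}
```

Lines that do not match the pattern are ignored, so the whole log can be passed in without filtering first.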
Detailed Server Error Log:
2013-10-26 00:15:13,280 -0700 level=ERROR class="STDERR"
2013-10-26 00:15:13.279/2079180.072 Oracle Coherence GE 3.5.1/461 <Warning> (thread=PacketPublisher, member=4): Experienced a 2642 ms communication delay (probable remote GC) with Member(Id=3, Timestamp=2013-10-01 22:43:27.913, Address=1.1.1..49:8088, MachineId=32817, Location=site:test.test.net,machine:testabc30b,process:3870, Role=JavaLangThread); 34 packets rescheduled, PauseRate=0.0010, Threshold=222
2013-10-26 00:15:15,508 -0700 level=ERROR class="STDERR"
2013-10-26 00:15:15.508/2079182.301 Oracle Coherence GE 3.5.1/461 <Warning> (thread=PacketPublisher, member=4): Experienced a 4875 ms communication delay (probable remote GC) with Member(Id=1, Timestamp=2013-10-08 22:00:17.258, Address=1.1.1..86:8088, MachineId=32854, Location=site:test.test.net,machine:testabc305,process:3988, Role=JavaLangThread); 47 packets rescheduled, PauseRate=3.0E-4, Threshold=1438
2013-10-26 01:15:29,028 -0700 level=ERROR class="STDERR"
2013-10-26 01:15:29.018/2082795.811 Oracle Coherence GE 3.5.1/461 <Info> (thread=PacketListenerN, member=4): Scheduled senior member heartbeat is overdue; rejoining multicast group.
2013-10-26 01:15:29,036 -0700 level=ERROR class="STDERR"
2013-10-26 01:15:29.036/2082795.829 Oracle Coherence GE 3.5.1/461 <Warning> (thread=PacketPublisher, member=4): Experienced a 13068 ms communication delay (probable remote GC) with Member(Id=1, Timestamp=2013-10-08 22:00:17.258, Address=1.1.1..86:8088, MachineId=32854, Location=site:test.test.net,machine:testabc305,process:3988, Role=JavaLangThread); 86 packets rescheduled, PauseRate=4.0E-4, Threshold=1438
2013-10-26 01:15:29,037 -0700 level=ERROR class="STDERR"
2013-10-26 01:15:29.036/2082795.829 Oracle Coherence GE 3.5.1/461 <Warning> (thread=PacketPublisher, member=4): Experienced a 13069 ms communication delay (probable remote GC) with Member(Id=3, Timestamp=2013-10-01 22:43:27.913, Address=1.1.1..49:8088, MachineId=32817, Location=site:test.test.net,machine:testabc30b,process:3870, Role=JavaLangThread); 84 packets rescheduled, PauseRate=0.0010, Threshold=269
2013-10-26 01:31:44,494 -0700 level=INFO class="STDOUT"
WARN   getResponseBody, Going to buffer response body of large or unknown size. Using getResponseBodyAsStream instead is recommended.
2013-10-26 02:15:34,907 -0700 level=ERROR class="STDERR"
2013-10-26 02:15:34.906/2086401.699 Oracle Coherence GE 3.5.1/461 <Warning> (thread=PacketPublisher, member=4): Experienced a 6476 ms communication delay (probable remote GC) with Member(Id=3, Timestamp=2013-10-01 22:43:27.913, Address=1.1.1..49:8088, MachineId=32817, Location=site:test.test.net,machine:testabc30b,process:3870, Role=JavaLangThread); 24 packets rescheduled, PauseRate=0.0011, Threshold=313
2013-10-26 02:43:52,199 -0700 level=INFO class="STDOUT"
WARN   getResponseBody, Going to buffer response body of large or unknown size. Using getResponseBodyAsStream instead is recommended.
2013-10-26 03:00:55,493 -0700 level=INFO class="STDOUT"
WARN   getResponseBody, Going to buffer response body of large or unknown size. Using getResponseBodyAsStream instead is recommended.
2013-10-26 03:15:41,144 -0700 level=ERROR class="STDERR"
2013-10-26 03:15:41.144/2090007.937 Oracle Coherence GE 3.5.1/461 <D5> (thread=PacketPublisher, member=4): Experienced a 202 ms communication delay (probable remote GC) with Member(Id=1, Timestamp=2013-10-08 22:00:17.258, Address=1.1.1..86:8088, MachineId=32854, Location=site:test.test.net,machine:testabc305,process:3988, Role=JavaLangThread); 25 packets rescheduled, PauseRate=4.0E-4, Threshold=1509
2013-10-26 03:15:41,592 -0700 level=ERROR class="STDERR"
2013-10-26 03:15:41.592/2090008.385 Oracle Coherence GE 3.5.1/461 <D5> (thread=PacketPublisher, member=4): Experienced a 371 ms communication delay (probable remote GC) with Member(Id=3, Timestamp=2013-10-01 22:43:27.913, Address=1.1.1..49:8088, MachineId=32817, Location=site:test.test.net,machine:testabc30b,process:3870, Role=JavaLangThread); 41 packets rescheduled, PauseRate=0.0010, Threshold=290
2013-10-26 03:31:38,099 -0700 level=INFO class="STDOUT"
WARN   getResponseBody, Going to buffer response body of large or unknown size. Using getResponseBodyAsStream instead is recommended.
2013-10-26 04:15:47,869 -0700 level=ERROR class="STDERR"
2013-10-26 04:15:47.869/2093614.662 Oracle Coherence GE 3.5.1/461 <D5> (thread=PacketPublisher, member=4): Experienced a 850 ms communication delay (probable remote GC) with Member(Id=1, Timestamp=2013-10-08 22:00:17.258, Address=1.1.1..86:8088, MachineId=32854, Location=site:test.test.net,machine:testabc305,process:3988, Role=JavaLangThread); 52 packets rescheduled, PauseRate=4.0E-4, Threshold=1509
2013-10-26 04:16:00,192 -0700 level=ERROR class="STDERR"
2013-10-26 04:16:00.182/2093626.975 Oracle Coherence GE 3.5.1/461 <Info> (thread=PacketListenerN, member=4): Scheduled senior member heartbeat is overdue; rejoining multicast group.
2013-10-26 04:16:00,199 -0700 level=ERROR class="STDERR"
2013-10-26 04:16:00.199/2093626.992 Oracle Coherence GE 3.5.1/461 <Warning> (thread=PacketPublisher, member=4): Experienced a 13180 ms communication delay (probable remote GC) with Member(Id=3, Timestamp=2013-10-01 22:43:27.913, Address=1.1.1..49:8088, MachineId=32817, Location=site:test.test.net,machine:testabc30b,process:3870, Role=JavaLangThread); 126 packets rescheduled, PauseRate=0.0011, Threshold=424
2013-10-26 04:16:01,897 -0700 level=ERROR class="STDERR"
2013-10-26 04:16:01.897/2093628.690 Oracle Coherence GE 3.5.1/461 <Warning> (thread=PacketPublisher, member=4): Experienced a 1503 ms communication delay (probable remote GC) with Member(Id=1, Timestamp=2013-10-08 22:00:17.258, Address=1.1.1..86:8088, MachineId=32854, Location=site:test.test.net,machine:testabc305,process:3988, Role=JavaLangThread); 173 packets rescheduled, PauseRate=4.0E-4, Threshold=1509
2013-10-26 04:26:54,424 -0700 level=INFO class="STDOUT"
WARN   getResponseBody, Going to buffer response body of large or unknown size. Using getResponseBodyAsStream instead is recommended.
2013-10-26 04:51:52,096 -0700 level=INFO class="STDOUT"
WARN   getResponseBody, Going to buffer response body of large or unknown size. Using getResponseBodyAsStream instead is recommended.
2013-10-26 05:02:52,292 -0700 level=INFO class="STDOUT"
WARN   getResponseBody, Going to buffer response body of large or unknown size. Using getResponseBodyAsStream instead is recommended.
2013-10-26 05:16:06,076 -0700 level=ERROR class="STDERR"
2013-10-26 05:16:06.075/2097232.868 Oracle Coherence GE 3.5.1/461 <Error> (thread=PacketListenerN, member=4):
Stopping cluster due to unhandled exception: com.tangosol.net.messaging.ConnectionException: Unable to refresh sockets: [UnicastUdpSocket{State=STATE_OPEN, address:port=1.1.1..85:8088}, MulticastUdpSocket{State=STATE_OPEN, address:port=239.3.1.17:35122, InterfaceAddress=1.1.1..85, TimeToLive=1}, TcpSocketAccepter{State=STATE_OPEN, ServerSocket=1.1.1..85:8088}]; last failed socket: MulticastUdpSocket{State=STATE_OPEN, address:port=239.3.1.17:35122, InterfaceAddress=1.1.1..85, TimeToLive=1}
    at com.tangosol.coherence.component.net.Cluster$SocketManager.refreshSockets(Cluster.CDB:91)
    at com.tangosol.coherence.component.net.Cluster$SocketManager$MulticastUdpSocket.onInterruptedIOException(Cluster.CDB:9)
    at com.tangosol.coherence.component.net.socket.UdpSocket.receive(UdpSocket.CDB:33)
    at com.tangosol.coherence.component.net.UdpPacket.receive(UdpPacket.CDB:4)
    at com.tangosol.coherence.component.util.daemon.queueProcessor.packetProcessor.PacketListener.onNotify(PacketListener.CDB:19)
    at com.tangosol.coherence.component.util.Daemon.run(Daemon.CDB:42)
    at java.lang.Thread.run(Thread.java:662)
Caused by: java.net.SocketTimeoutException: Receive timed out
    at java.net.PlainDatagramSocketImpl.receive0(Native Method)
    at java.net.PlainDatagramSocketImpl.receive(PlainDatagramSocketImpl.java:145)
    at java.net.DatagramSocket.receive(DatagramSocket.java:725)
    at com.tangosol.coherence.component.net.socket.UdpSocket.receive(UdpSocket.CDB:20)
    at com.tangosol.coherence.component.net.UdpPacket.receive(UdpPacket.CDB:4)
    at com.tangosol.coherence.component.util.daemon.queueProcessor.packetProcessor.PacketListener.onNotify(PacketListener.CDB:19)
    at com.tangosol.coherence.component.util.Daemon.run(Daemon.CDB:42)
    at java.lang.Thread.run(Thread.java:662)
2013-10-26 05:16:06,080 -0700 level=ERROR class="STDERR"
2013-10-26 05:16:06.080/2097232.873 Oracle Coherence GE 3.5.1/461 <D5> (thread=Cluster, member=4): Service Cluster left the cluster
2013-10-26 05:16:06,105 -0700 level=ERROR class="STDERR"
2013-10-26 05:16:06.105/2097232.898 Oracle Coherence GE 3.5.1/461 <D5> (thread=Invocation:Management, member=4): Service Management left the cluster
2013-10-26 05:16:06,105 -0700 level=ERROR class="STDERR"
2013-10-26 05:16:06.105/2097232.898 Oracle Coherence GE 3.5.1/461 <Info> (thread=http-0.0.0.0-8080-180, member=4): Restarting NamedCache: test234aaaapeu-cache
2013-10-26 05:16:06,105 -0700 level=ERROR class="STDERR"
2013-10-26 05:16:06.105/2097232.898 Oracle Coherence GE 3.5.1/461 <Info> (thread=http-0.0.0.0-8080-180, member=4): Restarting Service: DistributedCache
2013-10-26 05:16:06,110 -0700 level=ERROR class="STDERR"
2013-10-26 05:16:06.106/2097232.899 Oracle Coherence GE 3.5.1/461 <Error> (thread=DistributedCache, member=4):
validatePolls: This service timed-out due to unanswered handshake request. Manual intervention is required to stop the members that have not responded to this Poll
PollId=24209529, active
InitTimeMillis=1382789736843
Service=DistributedCache (2)
RespondedMemberSet=[]
LeftMemberSet=[]
RemainingMemberSet=[3]
Request=Message "LockRequest"
{test.test.net
FromMember=Member(Id=4, Timestamp=2013-10-24 15:16:09.067, Address=1.1.1..85:8088, MachineId=32853, Location=site:test.test.net,machine:testabc304,process:4000)
FromMessageId=38338332
Internal=false
MessagePartCount=1
PendingCount=0
MessageType=12
ToPollId=0
Poll=null
Packets
Service=DistributedCache{Name=DistributedCache, State=(SERVICE_STOPPED), Not initialized}
ToMemberSet=MemberSet(Size=1, BitSetCount=1
Member(Id=3, Timestamp=2013-10-01 22:43:27.913, Address=1.1.1..49:8088, MachineId=32817, Location=site:test.test.net,machine:testabc30b,process:3870, Role=JavaLangThread)
NotifySent=false
null
WaitTimeout=1382789776739, LeaseExpiration=9223372036854775807
2013-10-26 05:16:06,110 -0700 level=ERROR class="STDERR"
2013-10-26 05:16:06.109/2097232.902 Oracle Coherence GE 3.5.1/461 <D5> (thread=DistributedCache, member=4): Service DistributedCache left the cluster
2013-10-26 05:16:06,117 -0700 level=ERROR class="STDERR"
2013-10-26 05:16:06.117/2097232.910 Oracle Coherence GE 3.5.1/461 <Info> (thread=http-0.0.0.0-8080-180, member=n/a): Restarting cluster
2013-10-26 05:16:06,198 -0700 level=ERROR class="STDERR"
2013-10-26 05:16:06.198/2097232.991 Oracle Coherence GE 3.5.1/461 <D5> (thread=Cluster, member=n/a): Service Cluster joined the cluster with senior service member n/a
2013-10-26 05:16:07,410 -0700 level=ERROR class="STDERR"
2013-10-26 05:16:07.410/2097234.203 Oracle Coherence GE 3.5.1/461 <Info> (thread=Cluster, member=n/a): Created a new cluster "cluster:0x27CB" with Member(Id=1, Timestamp=2013-10-26 05:16:06.128, Address=1.1.1..85:8088, MachineId=32853, Location=site:test.test.net,machine:testabc304,process:4000, Edition=Grid Edition, Mode=Development, CpuCount=4, SocketCount=4) UID=0x0A89035500000141F4B15BF080551F98
2013-10-26 05:16:07,436 -0700 level=ERROR class="STDERR"
2013-10-26 05:16:07.436/2097234.229 Oracle Coherence GE 3.5.1/461 <Info> (thread=http-0.0.0.0-8080-180, member=1): Restarting Service: Management
2013-10-26 05:16:07,450 -0700 level=ERROR class="STDERR"
2013-10-26 05:16:07.450/2097234.243 Oracle Coherence GE 3.5.1/461 <D5> (thread=Invocation:Management, member=1): Service Management joined the cluster with senior service member 1
2013-10-26 05:16:07,474 -0700 level=ERROR class="STDERR"
2013-10-26 05:16:07.474/2097234.267 Oracle Coherence GE 3.5.1/461 <D5> (thread=DistributedCache, member=1): Service DistributedCache joined the cluster with senior service member 1
2013-10-26 05:16:07,491 -0700 level=ERROR class="STDERR"
2013-10-26 05:16:07.491/2097234.284 Oracle Coherence GE 3.5.1/461 <Info> (thread=http-0.0.0.0-8080-183, member=1): Restarting NamedCache: test234aaaaficustomer-cache
2013-10-26 05:16:07,514 -0700 level=ERROR class="STDERR"
2013-10-26 05:16:07.514/2097234.307 Oracle Coherence GE 3.5.1/461 <Info> (thread=http-0.0.0.0-8080-38, member=1): Restarting NamedCache: test234aaaaaccount-no-export-cache
2013-10-26 05:16:07,529 -0700 level=ERROR class="STDERR"
2013-10-26 05:16:07.529/2097234.322 Oracle Coherence GE 3.5.1/461 <Info> (thread=http-0.0.0.0-8080-136, member=1): Restarting NamedCache: test234aaaausrsum-cache
2013-10-26 05:16:07,546 -0700 level=ERROR class="STDERR"
2013-10-26 05:16:07.545/2097234.338 Oracle Coherence GE 3.5.1/461 <Info> (thread=http-0.0.0.0-8080-9, member=1): Restarting NamedCache: test234aaaafi-v2-cache
2013-10-26 05:16:07,569 -0700 level=ERROR class="STDERR"
2013-10-26 05:16:07.567/2097234.360 Oracle Coherence GE 3.5.1/461 <Info> (thread=http-0.0.0.0-8080-59, member=1): Restarting NamedCache: test234aaaaaccount-v2-cache
2013-10-26 05:16:07,748 -0700 level=ERROR class="STDERR"
2013-10-26 05:16:07.748/2097234.541 Oracle Coherence GE 3.5.1/461 <Info> (thread=http-0.0.0.0-8080-28, member=1): Restarting NamedCache: test234aaaafi-cache
2013-10-26 05:16:07,816 -0700 level=ERROR class="STDERR"
2013-10-26 05:16:07.816/2097234.609 Oracle Coherence GE 3.5.1/461 <Info> (thread=http-0.0.0.0-8080-133, member=1): Restarting NamedCache: test234aaaahistory-v2-cache
2013-10-26 05:16:09,154 -0700 level=ERROR class="STDERR"
2013-10-26 05:16:09.154/2097235.947 Oracle Coherence GE 3.5.1/461 <Info> (thread=http-0.0.0.0-8080-134, member=1): Restarting NamedCache: test234aaaaaccount-cache
2013-10-26 05:16:09,169 -0700 level=ERROR class="STDERR"
2013-10-26 05:16:09.169/2097235.962 Oracle Coherence GE 3.5.1/461 <Info> (thread=http-0.0.0.0-8080-134, member=1): Restarting NamedCache: test234aaaahistory-cache
2013-10-26 05:16:09,444 -0700 level=ERROR class="STDERR"
2013-10-26 05:16:09.444/2097236.237 Oracle Coherence GE 3.5.1/461 <D5> (thread=Cluster, member=1): Member(Id=2, Timestamp=2013-10-26 05:16:09.259, Address=1.1.1..86:8088, MachineId=32854, Location=site:test.test.net,machine:testabc305,process:3988) joined Cluster with senior member 1
2013-10-26 05:16:09,539 -0700 level=ERROR class="STDERR"
2013-10-26 05:16:09.539/2097236.332 Oracle Coherence GE 3.5.1/461 <D5> (thread=Cluster, member=1): Member 2 joined Service Management with senior member 1
2013-10-26 05:16:09,580 -0700 level=ERROR class="STDERR"
2013-10-26 05:16:09.579/2097236.372 Oracle Coherence GE 3.5.1/461 <D5> (thread=Cluster, member=1): Member 2 joined Service DistributedCache with senior member 1
2013-10-26 05:16:09,599 -0700 level=ERROR class="STDERR"
2013-10-26 05:16:09.599/2097236.392 Oracle Coherence GE 3.5.1/461 <D5> (thread=DistributedCache, member=1): Service DistributedCache: sending ServiceConfigSync containing 268 entries to Member 2
2013-10-26 05:16:09,681 -0700 level=ERROR class="STDERR"
2013-10-26 05:16:09.681/2097236.474 Oracle Coherence GE 3.5.1/461 <D5> (thread=DistributedCache, member=1): 1> Transferring 128 out of 257 vulnerable partitions to member 2 requesting 128
2013-10-26 05:16:09,892 -0700 level=ERROR class="STDERR"
2013-10-26 05:16:09.881/2097236.674 Oracle Coherence GE 3.5.1/461 <D4> (thread=DistributedCache, member=1): 1> Transferring 129 out of 129 partitions to a machine-safe backup 1 at member 2 (under 129)
2013-10-26 05:16:09,901 -0700 level=ERROR class="STDERR"
2013-10-26 05:16:09.901/2097236.694 Oracle Coherence GE 3.5.1/461 <D5> (thread=DistributedCache, member=1): Transferring 388KB of backup[1] for PartitionSet{128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160, 161, 162, 163, 164, 165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 175, 176, 177, 178, 179, 180, 181, 182, 183, 184, 185, 186, 187, 188, 189, 190, 191, 192, 193, 194, 195, 196, 197, 198, 199, 200, 201, 202, 203, 204, 205, 206, 207, 208, 209, 210, 211, 212, 213, 214, 215, 216, 217, 218, 219, 220, 221, 222, 223, 224, 225, 226, 227, 228, 229, 230, 231, 232, 233, 234, 235, 236, 237, 238, 239, 240, 241, 242, 243, 244, 245, 246, 247, 248, 249, 250, 251, 252, 253, 254, 255, 256} to member 2
2013-10-26 05:16:10,415 -0700 level=ERROR class="STDERR"
2013-10-26 05:16:10.415/2097237.208 Oracle Coherence GE 3.5.1/461 <D5> (thread=Cluster, member=1): TcpRing: connecting to member 2 using TcpSocket{State=STATE_OPEN, Socket=Socket[addr=/1.1.1..86,port=8088,localport=37005]}
2013-10-26 05:16:10,657 -0700 level=ERROR class="STDERR"
2013-10-26 05:16:10.657/2097237.450 Oracle Coherence GE 3.5.1/461 <Warning> (thread=Cluster, member=1): Received panic from junior member Member(Id=2, Timestamp=2013-10-26 05:16:09.259, Address=1.1.1..86:8088, MachineId=32854, Location=site:test.test.net,machine:testabc305,process:3988) caused by Member(Id=3, Timestamp=2013-10-01 22:43:27.913, Address=1.1.1..49:8088, MachineId=32817, Location=site:test.test.net,machine:testabc30b,process:3870, Role=JavaLangThread)
2013-10-26 05:16:11,592 -0700 level=ERROR class="STDERR"
2013-10-26 05:16:11.592/2097238.385 Oracle Coherence GE 3.5.1/461 <D4> (thread=TcpRingListener, member=1): Rejecting connection to member 3 using TcpSocket{State=STATE_OPEN, Socket=Socket[addr=/1.1.1..49,port=32822,localport=8088]}
2013-10-26 05:16:13,568 -0700 level=ERROR class="STDERR"
2013-10-26 05:16:13.568/2097240.361 Oracle Coherence GE 3.5.1/461 <Info> (thread=http-0.0.0.0-8080-52, member=1): Restarting NamedCache: test234aaaauserData-cache
2013-10-26 05:16:13,596 -0700 level=ERROR class="STDERR"
2013-10-26 05:16:13.596/2097240.389 Oracle Coherence GE 3.5.1/461 <D4> (thread=TcpRingListener, member=1): Rejecting connection to member 3 using TcpSocket{State=STATE_OPEN, Socket=Socket[addr=/1.1.1..49,port=32823,localport=8088]}
2013-10-26 05:16:14,937 -0700 level=ERROR class="STDERR"
2013-10-26 05:16:14.937/2097241.730 Oracle Coherence GE 3.5.1/461 <Info> (thread=http-0.0.0.0-8080-52, member=1): Restarting NamedCache: test234aaaacheckimage-cache
2013-10-26 05:16:15,600 -0700 level=ERROR class="STDERR"
2013-10-26 05:16:15.600/2097242.393 Oracle Coherence GE 3.5.1/461 <D4> (thread=TcpRingListener, member=1): Rejecting connection to member 3 using TcpSocket{State=STATE_OPEN, Socket=Socket[addr=/1.1.1..49,port=32824,localport=8088]}
2013-10-26 05:16:17,602 -0700 level=ERROR class="STDERR"
2013-10-26 05:16:17.602/2097244.395 Oracle Coherence GE 3.5.1/461 <D4> (thread=TcpRingListener, member=1): Rejecting connection to member 3 using TcpSocket{State=STATE_OPEN, Socket=Socket[addr=/1.1.1..49,port=32825,localport=8088]}
2013-10-26 05:16:19,605 -0700 level=ERROR class="STDERR"
2013-10-26 05:16:19.605/2097246.398 Oracle Coherence GE 3.5.1/461 <D4> (thread=TcpRingListener, member=1): Rejecting connection to member 3 using TcpSocket{State=STATE_OPEN, Socket=Socket[addr=/1.1.1..49,port=32828,localport=8088]}
2013-10-26 05:16:21,609 -0700 level=ERROR class="STDERR"
2013-10-26 05:16:21.609/2097248.402 Oracle Coherence GE 3.5.1/461 <D4> (thread=TcpRingListener, member=1): Rejecting connection to member 3 using TcpSocket{State=STATE_OPEN, Socket=Socket[addr=/1.1.1..49,port=32829,localport=8088]}
2013-10-26 05:16:23,611 -0700 level=ERROR class="STDERR"
2013-10-26 05:16:23.611/2097250.404 Oracle Coherence GE 3.5.1/461 <D4> (thread=TcpRingListener, member=1): Rejecting connection to member 3 using TcpSocket{State=STATE_OPEN, Socket=Socket[addr=/1.1.1..49,port=32830,localport=8088]}
2013-10-26 05:16:25,616 -0700 level=ERROR class="STDERR"
2013-10-26 05:16:25.616/2097252.409 Oracle Coherence GE 3.5.1/461 <D4> (thread=TcpRingListener, member=1): Rejecting connection to member 3 using TcpSocket{State=STATE_OPEN, Socket=Socket[addr=/1.1.1..49,port=32831,localport=8088]}
2013-10-26 05:16:27,619 -0700 level=ERROR class="STDERR"
2013-10-26 05:16:27.619/2097254.412 Oracle Coherence GE 3.5.1/461 <D4> (thread=TcpRingListener, member=1): Rejecting connection to member 3 using TcpSocket{State=STATE_OPEN, Socket=Socket[addr=/1.1.1..49,port=32832,localport=8088]}
2013-10-26 05:16:29,621 -0700 level=ERROR class="STDERR"
2013-10-26 05:16:29.621/2097256.414 Oracle Coherence GE 3.5.1/461 <D4> (thread=TcpRingListener, member=1): Rejecting connection to member 3 using TcpSocket{State=STATE_OPEN, Socket=Socket[addr=/1.1.1..49,port=32833,localport=8088]}
2013-10-26 05:16:31,626 -0700 level=ERROR class="STDERR"
2013-10-26 05:16:31.626/2097258.419 Oracle Coherence GE 3.5.1/461 <D4> (thread=TcpRingListener, member=1): Rejecting connection to member 3 using TcpSocket{State=STATE_OPEN, Socket=Socket[addr=/1.1.1..49,port=32834,localport=8088]}
2013-10-26 05:16:33,631 -0700 level=ERROR class="STDERR"
2013-10-26 05:16:33.631/2097260.424 Oracle Coherence GE 3.5.1/461 <D4> (thread=TcpRingListener, member=1): Rejecting connection to member 3 using TcpSocket{State=STATE_OPEN, Socket=Socket[addr=/1.1.1..49,port=32835,localport=8088]}
2013-10-26 05:16:35,632 -0700 level=ERROR class="STDERR"
2013-10-26 05:16:35.632/2097262.425 Oracle Coherence GE 3.5.1/461 <D4> (thread=TcpRingListener, member=1): Rejecting connection to member 3 using TcpSocket{State=STATE_OPEN, Socket=Socket[addr=/1.1.1..49,port=32836,localport=8088]}
2013-10-26 05:16:37,636 -0700 level=ERROR class="STDERR"
2013-10-26 05:16:37.635/2097264.428 Oracle Coherence GE 3.5.1/461 <D4> (thread=TcpRingListener, member=1): Rejecting connection to member 3 using TcpSocket{State=STATE_OPEN, Socket=Socket[addr=/1.1.1..49,port=32837,localport=8088]}
2013-10-26 05:16:39,641 -0700 level=ERROR class="STDERR"
2013-10-26 05:16:39.640/2097266.433 Oracle Coherence GE 3.5.1/461 <D4> (thread=TcpRingListener, member=1): Rejecting connection to member 3 using TcpSocket{State=STATE_OPEN, Socket=Socket[addr=/1.1.1..49,port=32838,localport=8088]}
2013-10-26 05:16:41,643 -0700 level=ERROR class="STDERR"
2013-10-26 05:16:41.643/2097268.436 Oracle Coherence GE 3.5.1/461 <D4> (thread=TcpRingListener, member=1): Rejecting connection to member 3 using TcpSocket{State=STATE_OPEN, Socket=Socket[addr=/1.1.1..49,port=32841,localport=8088]}
2013-10-26 05:16:47,329 -0700 level=ERROR class="STDERR"
2013-10-26 05:16:47.329/2097274.122 Oracle Coherence GE 3.5.1/461 <D5> (thread=Cluster, member=1): Member(Id=3, Timestamp=2013-10-26 05:16:47.128, Address=1.1.1..49:8088, MachineId=32817, Location=site:test.test.net,machine:testabc30b,process:3870) joined Cluster with senior member 1
2013-10-26 05:16:47,425 -0700 level=ERROR class="STDERR"
2013-10-26 05:16:47.425/2097274.218 Oracle Coherence GE 3.5.1/461 <D5> (thread=Cluster, member=1): Member 3 joined Service Management with senior member 1
2013-10-26 05:16:47,477 -0700 level=ERROR class="STDERR"
2013-10-26 05:16:47.476/2097274.269 Oracle Coherence GE 3.5.1/461 <D5> (thread=Cluster, member=1): Member 3 joined Service DistributedCache with senior member 1
2013-10-26 05:16:47,501 -0700 level=ERROR class="STDERR"
2013-10-26 05:16:47.500/2097274.294 Oracle Coherence GE 3.5.1/461 <D5> (thread=DistributedCache, member=1): Service DistributedCache: sending ServiceConfigSync containing 270 entries to Member 3
2013-10-26 05:16:47,548 -0700 level=ERROR class="STDERR"
2013-10-26 05:16:47.548/2097274.341 Oracle Coherence GE 3.5.1/461 <D5> (thread=TcpRingListener, member=1): TcpRing: connecting to member 3 using TcpSocket{State=STATE_OPEN, Socket=Socket[addr=/1.1.1..49,port=32846,localport=8088]}
2013-10-26 05:16:48,454 -0700 level=ERROR class="STDERR"
2013-10-26 05:16:48.453/2097275.246 Oracle Coherence GE 3.5.1/461 <D5> (thread=DistributedCache, member=1): 2> Transferring 43 out of 129 primary partitions to member 3 requesting 43
2013-10-26 05:16:48,709 -0700 level=ERROR class="STDERR"
2013-10-26 05:16:48.709/2097275.502 Oracle Coherence GE 3.5.1/461 <D5> (thread=DistributedCache, member=1): 2> Transferring 39 out of 125 primary partitions to member 3 requesting 39
2013-10-26 05:16:48,885 -0700 level=ERROR class="STDERR"
2013-10-26 05:16:48.884/2097275.677 Oracle Coherence GE 3.5.1/461 <D5> (thread=http-0.0.0.0-8080-210, member=1): Repeating QueryRequest due to the re-distribution of PartitionSet{132, 133, 134, 135, 136, 137, 138, 139, 140, 141}
2013-10-26 05:16:50,850 -0700 level=ERROR class="STDERR"
2013-10-26 05:16:50.848/2097277.641 Oracle Coherence GE 3.5.1/461 <D5> (thread=DistributedCache, member=1): 2> Transferring 29 out of 115 primary partitions to member 3 requesting 29
2013-10-26 05:16:50,968 -0700 level=ERROR class="STDERR"
2013-10-26 05:16:50.968/2097277.761 Oracle Coherence GE 3.5.1/461 <D5> (thread=DistributedCache, member=1): 2> Transferring 21 out of 107 primary partitions to member 3 requesting 21
2013-10-26 05:16:51,097 -0700 level=ERROR class="STDERR"
2013-10-26 05:16:51.097/2097277.890 Oracle Coherence GE 3.5.1/461 <D5> (thread=DistributedCache, member=1): 2> Transferring 14 out of 100 primary partitions to member 3 requesting 14
2013-10-26 05:16:51,218 -0700 level=ERROR class="STDERR"
2013-10-26 05:16:51.218/2097278.011 Oracle Coherence GE 3.5.1/461 <D5> (thread=DistributedCache, member=1): 2> Transferring 6 out of 92 primary partitions to member 3 requesting 6
2013-10-26 05:16:51,340 -0700 level=ERROR class="STDERR"
2013-10-26 05:16:51.340/2097278.133 Oracle Coherence GE 3.5.1/461 <D5> (thread=DistributedCache, member=1): 2> Transferring 1 out of 87 primary partitions to member 3 requesting 1
2013-10-26 05:16:51,352 -0700 level=ERROR class="STDERR"
2013-10-26 05:16:51.352/2097278.145 Oracle Coherence GE 3.5.1/461 <D5> (thread=DistributedCache, member=1): Transferring 540KB of backup[1] for PartitionSet{171, 172, 173, 174, 175, 176, 177} to member 3
2013-10-26 05:16:51,465 -0700 level=ERROR class="STDERR"
2013-10-26 05:16:51.464/2097278.257 Oracle Coherence GE 3.5.1/461 <D5> (thread=DistributedCache, member=1): Transferring 575KB of backup[1] for PartitionSet{178, 179, 180, 181, 182, 183} to member 3
2013-10-26 05:16:51,569 -0700 level=ERROR class="STDERR"
2013-10-26 05:16:51.569/2097278.362 Oracle Coherence GE 3.5.1/461 <D5> (thread=DistributedCache, member=1): Transferring 537KB of backup[1] for PartitionSet{184, 185, 186, 187} to member 3
2013-10-26 05:16:51,688 -0700 level=ERROR class="STDERR"
2013-10-26 05:16:51.688/2097278.481 Oracle Coherence GE 3.5.1/461 <D5> (thread=DistributedCache, member=1): Transferring 553KB of backup[1] for PartitionSet{188, 189, 190, 191, 192, 193, 194, 195, 196} to member 3
2013-10-26 05:16:51,817 -0700 level=ERROR class="STDERR"
2013-10-26 05:16:51.817/2097278.610 Oracle Coherence GE 3.5.1/461 <D5> (thread=DistributedCache, member=1): Transferring 526KB of backup[1] for PartitionSet{197, 198, 199, 200, 201, 202} to member 3
2013-10-26 05:16:51,928 -0700 level=ERROR class="STDERR"
2013-10-26 05:16:51.928/2097278.721 Oracle Coherence GE 3.5.1/461 <D5> (thread=DistributedCache, member=1): Transferring 768KB of backup[1] for PartitionSet{203, 204, 205, 206, 207, 208, 209} to member 3
2013-10-26 05:16:52,040 -0700 level=ERROR class="STDERR"
2013-10-26 05:16:52.039/2097278.832 Oracle Coherence GE 3.5.1/461 <D5> (thread=DistributedCache, member=1): Transferring 198KB of backup[1] for PartitionSet{210, 211, 212, 213} to member 3
2013-10-26 05:19:06,157 -0700 level=ERROR class="STDERR"
2013-10-26 05:19:06.157/2097412.950 Oracle Coherence GE 3.5.1/461 <Info> (thread=http-0.0.0.0-8080-63, member=1): Restarting NamedCache: throttleData-cache
2013-10-26 05:22:15,094 -0700 level=ERROR class="STDERR"
2013-10-26 05:22:15.094/2097601.887 Oracle Coherence GE 3.5.1/461 <Info> (thread=http-0.0.0.0-8080-136, member=1): Restarting NamedCache: test234aaaadepositslipimage-cache
2013-10-26 05:22:17,183 -0700 level=INFO class="STDOUT"
WARN   getResponseBody, Going to buffer response body of large or unknown size. Using getResponseBodyAsStream instead is recommended.
2013-10-26 05:28:49,617 -0700 level=INFO class="STDOUT"
WARN   getResponseBody, Going to buffer response body of large or unknown size. Using getResponseBodyAsStream instead is recommended.
2013-10-26 05:29:39,729 -0700 level=INFO class="STDOUT"
WARN   getResponseBody, Going to buffer response body of large or unknown size. Using getResponseBodyAsStream instead is recommended.
2013-10-26 05:33:37,607 -0700 level=INFO class="STDOUT"
WARN   getResponseBody, Going to buffer response body of large or unknown size. Using getResponseBodyAsStream instead is recommended.
2013-10-26 05:39:33,872 -0700 level=INFO class="STDOUT"
WARN   getResponseBody, Going to buffer response body of large or unknown size. Using getResponseBodyAsStream instead is recommended.
2013-10-26 06:49:30,617 -0700 level=ERROR class="STDERR"
2013-10-26 06:49:30.617/2102837.410 Oracle Coherence GE 3.5.1/461 <Warning> (thread=PacketPublisher, member=1): Experienced a 6378 ms communication delay (probable remote GC) with Member(Id=2, Timestamp=2013-10-26 05:16:09.259, Address=1.1.1..86:8088, MachineId=32854, Location=site:test.test.net,machine:testabc305,process:3988); 56 packets rescheduled, PauseRate=0.0011, Threshold=1976
2013-10-26 07:39:18,855 -0700 level=ERROR class="STDERR"
2013-10-26 07:39:18.854/2105825.647 Oracle Coherence GE 3.5.1/461 <Warning> (thread=PacketPublisher, member=1): Experienced a 7318 ms communication delay (probable remote GC) with Member(Id=3, Timestamp=2013-10-26 05:16:47.128, Address=1.1.1..49:8088, MachineId=32817, Location=site:test.test.net,machine:testabc30b,process:3870); 68 packets rescheduled, PauseRate=8.0E-4, Threshold=497
2013-10-26 07:49:37,510 -0700 level=ERROR class="STDERR"
2013-10-26 07:49:37.510/2106444.303 Oracle Coherence GE 3.5.1/461 <Warning> (thread=PacketPublisher, member=1): Experienced a 6653 ms communication delay (probable remote GC) with Member(Id=2, Timestamp=2013-10-26 05:16:09.259, Address=1.1.1..86:8088, MachineId=32854, Location=site:test.test.net,machine:testabc305,process:3988); 69 packets rescheduled, PauseRate=0.0014, Threshold=1785
Copyright (c) 2000, 2009, Oracle and/or its affiliates. All rights reserved.
2013-10-26 08:26:11,291 -0700 level=ERROR class="STDERR"
2013-10-26 08:26:11.291/44.293 Oracle Coherence GE 3.5.1/461 <Info> (thread=main, member=n/a): Loaded cache configuration from "file:/usr/local/whp-jboss-web-5/server/default/env/test234aaaacoherence-cache-config.xml"
2013-10-26 08:26:12,263 -0700 level=ERROR class="STDERR"
2013-10-26 08:26:12.263/45.265 Oracle Coherence GE 3.5.1/461 <D5> (thread=Cluster, member=n/a): Service Cluster joined the cluster with senior service member n/a
2013-10-26 08:26:12,477 -0700 level=ERROR class="STDERR"
2013-10-26 08:26:12.477/45.479 Oracle Coherence GE 3.5.1/461 <Info> (thread=Cluster, member=n/a): This Member(Id=1, Timestamp=2013-10-26 08:26:12.289, Address=1.1.1..85:8088, MachineId=32853, Location=site:test.test.net,machine:testabc304,process:6207, Role=JavaLangThread, Edition=Grid Edition, Mode=Development, CpuCount=4, SocketCount=4) joined cluster "cluster:0x27CB" with senior Member(Id=2, Timestamp=2013-10-26 05:16:09.259, Address=1.1.1..86:8088, MachineId=32854, Location=site:test.test.net,machine:testabc305,process:3988, Edition=Grid Edition, Mode=Development, CpuCount=4, SocketCount=4)
2013-10-26 08:26:12,501 -0700 level=ERROR class="STDERR"
2013-10-26 08:26:12.501/45.503 Oracle Coherence GE 3.5.1/461 <D5> (thread=Cluster, member=n/a): Member(Id=3, Timestamp=2013-10-26 05:16:47.128, Address=1.1.1..49:8088, MachineId=32817, Location=site:test.test.net,machine:testabc30b,process:3870) joined Cluster with senior member 2
2013-10-26 08:26:12,507 -0700 level=ERROR class="STDERR"
2013-10-26 08:26:12.506/45.508 Oracle Coherence GE 3.5.1/461 <D5> (thread=Cluster, member=n/a): Member 2 joined Service Management with senior member 2
2013-10-26 08:26:12,507 -0700 level=ERROR class="STDERR"
2013-10-26 08:26:12.507/45.509 Oracle Coherence GE 3.5.1/461 <D5> (thread=Cluster, member=n/a): Member 2 joined Service DistributedCache with senior member 2
2013-10-26 08:26:12,520 -0700 level=ERROR class="STDERR"
2013-10-26 08:26:12.520/45.522 Oracle Coherence GE 3.5.1/461 <D5> (thread=Cluster, member=n/a): Member 3 joined Service Management with senior member 2
2013-10-26 08:26:12,520 -0700 level=ERROR class="STDERR"
2013-10-26 08:26:12.520/45.522 Oracle Coherence GE 3.5.1/461 <D5> (thread=Cluster, member=n/a): Member 3 joined Service DistributedCache with senior member 2
2013-10-26 08:26:12,639 -0700 level=ERROR class="STDERR"
2013-10-26 08:26:12.639/45.641 Oracle Coherence GE 3.5.1/461 <D5> (thread=Invocation:Management, member=1): Service Management joined the cluster with senior service member 2
2013-10-26 08:26:12,700 -0700 level=ERROR class="STDERR"
2013-10-26 08:26:12.700/45.702 Oracle Coherence GE 3.5.1/461 <D5> (thread=Cluster, member=1): TcpRing: connecting to member 3 using TcpSocket{State=STATE_OPEN, Socket=Socket[addr=/1.1.1..49,port=8088,localport=52891]}
2013-10-26 08:26:13,191 -0700 level=ERROR class="STDERR"
2013-10-26 08:26:13.190/46.193 Oracle Coherence GE 3.5.1/461 <D5> (thread=DistributedCache, member=1): Service DistributedCache joined the cluster with senior service member 2
2013-10-26 08:26:14,538 -0700 level=ERROR class="STDERR"
2013-10-26 08:26:14.538/47.540 Oracle Coherence GE 3.5.1/461 <D5> (thread=TcpRingListener, member=1): TcpRing: connecting to member 2 using TcpSocket{State=STATE_OPEN, Socket=Socket[addr=/1.1.1..86,port=40281,localport=8088]}
2013-10-26 08:26:29,695 -0700 level=ERROR class="STDERR"
2013-10-26 08:26:29.694/62.696 Oracle Coherence GE 3.5.1/461 <D5> (thread=Cluster, member=1): TcpRing: disconnected from member 2 due to a kill request
2013-10-26 08:26:29,695 -0700 level=ERROR class="STDERR"
2013-10-26 08:26:29.694/62.696 Oracle Coherence GE 3.5.1/461 <D5> (thread=Cluster, member=1): Member 2 left service Management with senior member 3
2013-10-26 08:26:29,695 -0700 level=ERROR class="STDERR"
2013-10-26 08:26:29.694/62.696 Oracle Coherence GE 3.5.1/461 <D5> (thread=Cluster, member=1): Member 2 left service DistributedCache with senior member 3
2013-10-26 08:26:29,696 -0700 level=ERROR class="STDERR"
2013-10-26 08:26:29.696/62.698 Oracle Coherence GE 3.5.1/461 <D5> (thread=Cluster, member=1): Member(Id=2, Timestamp=2013-10-26 08:26:29.694, Address=1.1.1..86:8088, MachineId=32854, Location=site:test.test.net,machine:testabc305,process:3988) left Cluster with senior member 3
2013-10-26 08:26:30,069 -0700 level=ERROR class="STDERR"
2013-10-26 08:26:30.069/63.071 Oracle Coherence GE 3.5.1/461 <D5> (thread=Cluster, member=1): Member(Id=5, Timestamp=2013-10-26 08:26:29.871, Address=1.1.1..86:8088, MachineId=32854, Location=site:test.test.net,machine:testabc305,process:3988) joined Cluster with senior member 3
2013-10-26 08:26:30,271 -0700 level=ERROR class="STDERR"
2013-10-26 08:26:30.271/63.273 Oracle Coherence GE 3.5.1/461 <D5> (thread=TcpRingListener, member=1): TcpRing: connecting to member 5 using TcpSocket{State=STATE_OPEN, Socket=Socket[addr=/1.1.1..86,port=40285,localport=8088]}
2013-10-26 08:26:30,272 -0700 level=ERROR class="STDERR"
2013-10-26 08:26:30.272/63.274 Oracle Coherence GE 3.5.1/461 <D5> (thread=Cluster, member=1): Member 5 joined Service Management with senior member 3
2013-10-26 08:26:30,443 -0700 level=ERROR class="STDERR"
2013-10-26 08:26:30.443/63.445 Oracle Coherence GE 3.5.1/461 <D5> (thread=Cluster, member=1): Member 5 joined Service DistributedCache with senior member 3
2013-10-26 08:26:38,739 -0700 level=ERROR class="STDERR"
2013-10-26 08:26:38.738/71.740 Oracle Coherence GE 3.5.1/461 <D5> (thread=DistributedCache, member=1): Service DistributedCache: received ServiceConfigSync containing 272 entries
2013-10-26 08:26:43,241 -0700 level=ERROR class="STDERR"
2013-10-26 08:26:43.241/76.243 Oracle Coherence GE 3.5.1/461 <Error> (thread=main, member=1): Error while starting service "DistributedCache": com.tangosol.net.RequestTimeoutException: Timeout during service start: ServiceInfo(Id=2, Name=DistributedCache, Type=DistributedCache
MemberSet=ServiceMemberSet(
OldestMember=Member(Id=3, Timestamp=2013-10-26 05:16:47.128, Address=1.1.1..49:8088, MachineId=32817, Location=site:test.test.net,machine:testabc30b,process:3870)
ActualMemberSet=MemberSet(Size=3, BitSetCount=2
Member(Id=1, Timestamp=2013-10-26 08:26:12.289, Address=1.1.1..85:8088, MachineId=32853, Location=site:test.test.net,machine:testabc304,process:6207, Role=JavaLangThread)
Member(Id=3, Timestamp=2013-10-26 05:16:47.128, Address=1.1.1..49:8088, MachineId=32817, Location=site:test.test.net,machine:testabc30b,process:3870)
Member(Id=5, Timestamp=2013-10-26 08:26:29.871, Address=1.1.1..86:8088, MachineId=32854, Location=site:test.test.net,machine:testabc305,process:3988)
MemberId/ServiceVersion/ServiceJoined/ServiceLeaving
1/3.5/Sat Oct 26 08:26:13 PDT 2013/false,
3/3.5/Sat Oct 26 05:16:47 PDT 2013/false,
5/3.5/Sat Oct 26 08:26:30 PDT 2013/false
    at com.tangosol.coherence.component.util.daemon.queueProcessor.service.Grid.onStartupTimeout(Grid.CDB:6)
    at com.tangosol.coherence.component.util.daemon.queueProcessor.Service.start(Service.CDB:28)
    at com.tangosol.coherence.component.util.daemon.queueProcessor.service.Grid.start(Grid.CDB:38)
    at com.tangosol.coherence.component.util.SafeService.startService(SafeService.CDB:28)
    at com.tangosol.coherence.component.util.safeService.SafeCacheService.startService(SafeCacheService.CDB:5)
    at com.tangosol.coherence.component.util.SafeService.ensureRunningService(SafeService.CDB:27)
    at com.tangosol.coherence.component.util.SafeService.start(SafeService.CDB:14)
    at com.tangosol.net.DefaultConfigurableCacheFactory.ensureService(DefaultConfigurableCacheFactory.java:973)
    at com.tangosol.net.DefaultConfigurableCacheFactory.ensureCache(DefaultConfigurableCacheFactory.java:842)
    at com.tangosol.net.DefaultConfigurableCacheFactory.configureCache(DefaultConfigurableCacheFactory.java:1053)
    at com.tangosol.net.DefaultConfigurableCacheFactory.ensureCache(DefaultConfigurableCacheFactory.java:290)
    at com.tangosol.net.CacheFactory.getCache(CacheFactory.java:747)
    at com.tangosol.net.CacheFactory.getCache(CacheFactory.java:724

Hi,
The common causes of communication delays and packet timeouts are excessive GC pauses, high CPU usage, and swapping.
Any of these can stall the Coherence packet-processing threads, which prevents the node from processing and acknowledging packets from other cluster members.
1. Check GC performance: look at process memory consumption and the GC logs.
2. Check CPU usage with the vmstat and top commands.
3. Check swapping with the vmstat command.
See Oracle Support Doc ID 1110544.1.
Communication delays and packet timeouts can also be caused by network-related issues.
To check network performance, see:
Performing a Datagram Test for Network Performance - Coherence 3.5 User Guide - Oracle Coherence Knowledge Base
regards,
Leo_TA
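
To see how often and how badly a node is affected before digging into GC, CPU, and swap, it can help to pull the "communication delay" warnings out of the log first. The following is a minimal sketch, assuming the warning text has the same shape as in the log above; the class name DelayWarningScan and its methods are made up for illustration and are not part of Coherence.

```java
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Coherence: extracts communication-delay warnings from log lines.
class DelayWarningScan {

    // One parsed warning: the reported delay in milliseconds and the remote member id.
    static final class Delay {
        final long millis;
        final int memberId;

        Delay(long millis, int memberId) {
            this.millis = millis;
            this.memberId = memberId;
        }
    }

    // Matches e.g. "Experienced a 6378 ms communication delay (probable remote GC) with Member(Id=2, ..."
    private static final Pattern DELAY = Pattern.compile(
            "Experienced a (\\d+) ms communication delay.*?Member\\(Id=(\\d+)");

    // Returns one Delay per line that contains a communication-delay warning.
    static List<Delay> scan(Iterable<String> lines) {
        List<Delay> found = new ArrayList<>();
        for (String line : lines) {
            Matcher m = DELAY.matcher(line);
            if (m.find()) {
                found.add(new Delay(Long.parseLong(m.group(1)), Integer.parseInt(m.group(2))));
            }
        }
        return found;
    }

    public static void main(String[] args) throws Exception {
        // The log file path is passed as the first argument.
        List<String> lines = Files.readAllLines(Paths.get(args[0]));
        for (Delay d : scan(lines)) {
            System.out.println("member " + d.memberId + ": " + d.millis + " ms");
        }
    }
}
```

If the delays cluster around one member, that member's GC logs and vmstat output are probably the first place to look; if they are spread evenly across members, the network test may be the better starting point.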

Testing Coherence Cluster and Servers after WebLogic Console Creation

Hello,
I have created WLST scripts that extend a domain with Coherence clusters and servers using unicast configurations. I can start and run the Coherence servers from the WebLogic Admin Console without errors or warnings. The WebLogic version is 10.3.6.
I would like to test the configuration with something like coherence.sh and query.sh, but I cannot find instructions on how to use these tools with unicast and connect to the caches.
Is there a command-line interface that can connect to a Coherence server cache created from the WebLogic Admin Console using unicast? Do I need to override any XML configuration to make this work?
Examples would be helpful.
While testing, I found the following.
I changed coherence.sh and enabled storage. In addition, I set:
JAVA_OPTS="-Xms$MEMORY -Xmx$MEMORY -Dtangosol.coherence.distributed.localstorage=$STORAGE_ENABLED $JMXPROPERTIES -Dtangosol.coherence.clusterport=7777 -Dtangosol.coherence.clusteraddress=231.1.1.1"
I changed the Coherence cluster configuration to match the multicast port and address above.
With these settings, everything worked.
However, when I changed JAVA_OPTS to use unicast:
JAVA_OPTS="-Xms$MEMORY -Xmx$MEMORY -Dtangosol.coherence.distributed.localstorage=$STORAGE_ENABLED $JMXPROPERTIES -Dtangosol.coherence.localport=8088 -Dtangosol.coherence.localhost=192.168.2.69"
it failed to connect to the Coherence server in the cluster.

Hi there,
1. How did you configure HTTPS in WebLogic, and for which server: the Admin Server or a Managed Server?
2. Which Java keystore are you using? Can you see the successful entries in the <server>.out log file, which records the startup and shutdown of the WebLogic server?
Thanks
Laksh

Accessing Coherence*Extend Proxy Deployed on WebLogic Coherence Cluster from Java Client

Hi,
I am trying to access an Extend proxy from a thick Java client.
I followed the steps in the link below and deployed a GAR on 3 servers (2 storage-enabled Coherence cluster members and 1 storage-disabled member with the Extend proxy enabled). I can see ExtendProxyService through JMX, and the port is listening on the system.
Ref:
Setting Up Coherence*Extend - 12c (12.1.2)
http://docs.oracle.com/middleware/1212/coherence/COHAG/deploy_options.htm#CHDJBJDI
Issue:
When I run the Java client to connect to the proxy server, it connects to the port and then disconnects with a ConnectionException, as shown below.
Note the lines below that show the socket connected to port 9099, which is the Extend proxy port.
Error message:
2013-11-08 14:55:55.114/1.202 Oracle Coherence GE 12.1.2.0.0 <D5> (thread=TcpClientRemoteService:TcpInitiator, member=n/a): Started: TcpInitiator{Name=TcpClientRemoteService:TcpInitiator, State=(SERVICE_STARTED), ThreadCount=0, Codec=Codec(Format=POF), Serializer=com.tangosol.io.DefaultSerializer, PingInterval=0, PingTimeout=30000, RequestTimeout=30000, ConnectTimeout=10000, SocketProvider=[email protected], RemoteAddresses=WrapperSocketAddressProvider{Providers=[[DTC37446E9C6CBD/127.0.0.0:9099]]}, SocketOptions{LingerTimeout=0, KeepAliveEnabled=true, TcpDelayEnabled=false}}
2013-11-08 14:55:55.146/1.234 Oracle Coherence GE 12.1.2.0.0 <D5> (thread=main, member=n/a): Connecting Socket to 127.0.0.0:9099
2013-11-08 14:55:55.146/1.234 Oracle Coherence GE 12.1.2.0.0 <Info> (thread=main, member=n/a): Connected Socket to 127.0.0.0:9099
2013-11-08 14:55:55.161/1.249 Oracle Coherence GE 12.1.2.0.0 <Info> (thread=main, member=n/a): Error establishing a connection with 127.0.0.0:9099: com.tangosol.net.messaging.ConnectionException: TcpConnection(Id=null, Open=true, LocalAddress=0.0.0.0:54384, RemoteAddress=127.0.0.0:9099)
2013-11-08 14:55:55.161/1.249 Oracle Coherence GE 12.1.2.0.0 <Error> (thread=main, member=n/a): Error while starting service "TcpClientRemoteService": com.tangosol.net.messaging.ConnectionException: could not establish a connection to one of the following addresses: [127.0.0.0:9099]; make sure the "remote-addresses" configuration element contains an address and port of a running TcpAcceptor
    at com.tangosol.coherence.component.util.daemon.queueProcessor.service.peer.initiator.TcpInitiator.openConnection(TcpInitiator.CDB:121)
    at com.tangosol.coherence.component.util.daemon.queueProcessor.service.peer.Initiator.ensureConnection(Initiator.CDB:11)
    at com.tangosol.coherence.component.net.extend.remoteService.RemoteCacheService.openChannel(RemoteCacheService.CDB:7)
    at com.tangosol.coherence.component.net.extend.RemoteService.doStart(RemoteService.CDB:11)
    at com.tangosol.coherence.component.net.extend.RemoteService.start(RemoteService.CDB:5)
    at com.tangosol.coherence.component.util.SafeService.startService(SafeService.CDB:53)
    at com.tangosol.coherence.component.util.safeService.SafeCacheService.startService(SafeCacheService.CDB:5)
    at com.tangosol.coherence.component.util.SafeService.ensureRunningService(SafeService.CDB:27)
    at com.tangosol.coherence.component.util.SafeService.start(SafeService.CDB:14)
    at com.tangosol.net.ExtensibleConfigurableCacheFactory.startService(ExtensibleConfigurableCacheFactory.java:681)
    at com.tangosol.net.ExtensibleConfigurableCacheFactory.ensureService(ExtensibleConfigurableCacheFactory.java:599)
    at com.tangosol.coherence.config.scheme.AbstractCachingScheme.realizeCache(AbstractCachingScheme.java:50)
    at com.tangosol.coherence.config.scheme.AbstractBundlingScheme.realizeCache(AbstractBundlingScheme.java:31)
    at com.tangosol.net.ExtensibleConfigurableCacheFactory.ensureCache(ExtensibleConfigurableCacheFactory.java:254)
    at com.tangosol.net.CacheFactory.getCache(CacheFactory.java:205)
    at com.tangosol.net.CacheFactory.getCache(CacheFactory.java:182)

If this proxy design (not starting up because of an invalid entry in "authorized-hosts") is intentional on the part of the Coherence engineers, then it should be revisited.
I think the proxy server should just log a message about the invalid DNS entry for the authorized host and continue with the startup. Failing to start completely doesn't make sense, since one cannot rely on DNS being entirely correct before a server starts.
Of course you can work around this by writing your own custom filter, but the issue that comes up, as with any custom filter, is maintaining it down the road (through all minor and major Coherence upgrades).
Also, this "authorized-hosts" concept should be carefully analyzed, particularly for the following issues:
(1) If the client IP is changed in the DNS server, will the proxy server allow the new client connection without any issues? When will the proxy server flush its client DNS entries, or what is the TTL for a client cached through authorized-hosts by the proxy server?
(2) Suppose we have a client in "authorized-hosts" that makes a valid connection to the proxy and puts some data into the server cache through the proxy. If the IP address of the client is then changed (the DNS name staying the same), can the client get the data it just put into the server cache without any errors?
(3) How often do we need to restart proxies? Do we need to restart them often because of the DNS issues (if any) mentioned above?
The limited documentation and examples for Coherence*Extend, particularly for .NET and C++ clients and Extend proxies, are a point of concern.
vk
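
On the TTL question in (1): the thread does not say whether the proxy resolves authorized-hosts names once at startup or again on each connection, so the following only covers the JVM layer underneath. In a standard JVM, cached host-name lookups are governed by the networkaddress.cache.ttl security property. This is a minimal sketch, assuming the proxy resolves names through the normal InetAddress machinery; the class name DnsCacheTtl and the value 60 are arbitrary illustrations.

```java
import java.net.InetAddress;
import java.security.Security;

// Illustrative only: shows the JVM-wide DNS cache setting, not a Coherence API.
class DnsCacheTtl {

    // Returns the configured TTL (in seconds) for successful lookups,
    // or null when the property is unset and the JVM default applies.
    static String configuredTtl() {
        return Security.getProperty("networkaddress.cache.ttl");
    }

    // Sets the TTL; this should run before the first host-name lookup in the JVM.
    static void setTtlSeconds(int seconds) {
        Security.setProperty("networkaddress.cache.ttl", Integer.toString(seconds));
    }

    public static void main(String[] args) throws Exception {
        setTtlSeconds(60); // 60 is an arbitrary example value
        System.out.println("networkaddress.cache.ttl = " + configuredTtl());
        // "localhost" stands in for a client host name listed in authorized-hosts.
        System.out.println(InetAddress.getByName("localhost"));
    }
}
```

If the proxy turns out to resolve the names only once at startup, this setting would not help and a restart would still be needed after a client IP change; that behaviour would have to be confirmed against the Coherence documentation or by testing.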

Node fails to join the cluster

We are observing a problem where a node, after getting restarted, fails to join the cluster.
We run two Coherence clusters across three boxes. Each box runs 8 Java processes: 4 for one cluster and 4 for the other. They all run as Windows NT services. Sometimes a node goes down and gets restarted, but then it fails to join the cluster with the following exception:
"com.tangosol.net.RequestTimeoutException: Timeout during service start: ServiceInfo(Id=8, Name=DistributedFIIndicativeCacheWithPublishingCacheStore, Type=DistributedCache"
Has anyone experienced and addressed such a problem? If required, I can provide exact details of the cluster setup.
-Bharat

Hi Bharat,
This may be caused by a stuck or slow DistributedService thread on one of your nodes. Please log into http://support.oracle.com and take a look at [Note 845363.1|https://support.oracle.com/CSP/main/article?cmd=show&type=NOT&id=845363.1] for more details. Additionally, consider upgrading to Coherence 3.5 as it includes the [Service Guardian for deadlock detection/resolution|http://blackbeanbag.net/wp/2009/07/20/coherence-3-5-service-guardian-deadlock-detection/].
Thanks,
Patrick
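
As a rough illustration of checking for a stuck thread from inside a JVM, the sketch below uses the standard java.lang.management API. It is generic Java, not the Coherence Service Guardian, and the class name StuckThreadReport is made up for this example. It only detects true deadlocks; a thread that is merely slow would need a thread dump (for example with jstack) to spot.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical diagnostic helper; not part of Coherence. */
final class StuckThreadReport {

    /** Returns one line per deadlocked thread, or an empty list when none is found. */
    static List<String> deadlockedThreads() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        long[] ids = mx.findDeadlockedThreads(); // null when no deadlock is detected
        List<String> report = new ArrayList<>();
        if (ids == null) {
            return report;
        }
        for (ThreadInfo info : mx.getThreadInfo(ids, Integer.MAX_VALUE)) {
            if (info == null) {
                continue; // the thread ended between the two calls
            }
            report.add(info.getThreadName()
                    + " is waiting for " + info.getLockName()
                    + " held by " + info.getLockOwnerName());
        }
        return report;
    }

    public static void main(String[] args) {
        List<String> report = deadlockedThreads();
        if (report.isEmpty()) {
            System.out.println("No deadlocked threads detected.");
        } else {
            report.forEach(System.out::println);
        }
    }
}
```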

Cluster network name resource 'Cluster Name' failed registration of one or more associated DNS name(s) for the following reason: The handle is invalid.

I'm stuck here trying to figure this error out.
The domain is 2003; the cluster is Hyper-V 2012 Core with 3 nodes. (I have two of these Hyper-V groups: hvclust2012 is the problem group, hvclust2008 is okay.)
In Failover Cluster Manager I see these errors: "Cluster network name resource 'Cluster Name' failed registration of one or more associated DNS name(s) for the following reason: The handle is invalid."
I restarted the host node that was listed as having the error, and then another node started showing the errors.
I tried to follow this site: http://blog.subvertallmedia.com/2012/12/06/repairing-a-failover-cluster-in-windows-server-2012-live-migration-fails-dns-cluster-name-errors/
Then this error shows up when doing the repair: there was an error repairing the active directory object for 'Cluster Name'
I looked at our domain controller and noticed I don't have access to local users and groups. I can access our other hvclust2008 (both clusters are same version 2012).
<image here>
I came upon this thread: http://social.technet.microsoft.com/Forums/en-US/85fc2ad5-b0c0-41f0-900e-df1db8625445/windows-2012-cluster-resource-name-fails-dns-registration-evt-1196?forum=winserverClustering
Now I'm stuck on adding a managed service account (MSA). I'm not sure if I'm way off track in trying to fix this. Any advice? Thanks in advance!
<image here>

Thanks Elton,
I restarted the 3 hosts after applying the hotfix. Then I followed the steps below and got stuck on step 5. That is when I get the error (image above): "There was an error repairing the active directory object for 'Cluster Name'. For more data, see 'Information Details'."
To reset the password on the affected name resource, perform the following steps:
1. From Failover Cluster Manager, locate the name resource.
2. Right-click on the resource, and click Properties.
3. On the Policies tab, select If resource fails, do not restart, and then click OK.
4. Right-click on the resource, click More Actions, and then click Simulate Failure.
5. When the name resource shows "Failed," right-click on the resource, click More Actions, and then click Repair.
6. After the name resource is online, right-click on the resource, and then click Properties.
7. On the Policies tab, select If resource fails, attempt restart on current node, and then click OK.
Thanks

Creating sub-cluster within a Coherence cluster

Hi all,
Does Coherence support the creation of 'sub-clusters' within a larger Coherence cluster, such that certain caches can be configured to run only on these sub-clusters while other caches run on the entire Coherence cluster as usual?
E.g., suppose my application consists of 3 WebSphere clusters (under the same cell), and each cluster consists of 2 WebSphere server instances. Each WebSphere cluster has a specific functional responsibility (e.g., one cluster handles the UI, one handles core processing functionality and the third handles links with external legacy systems). Since the functionality itself is 'partitioned', it's possible that certain data managed by a particular WAS cluster should only be managed within that cluster and not across all 6 WAS instances.
So, in this case, suppose I have an 'outer' Coherence cluster of all 6 WAS instances (and some caches are configured to be accessible to all 6 WAS instances, since the data managed in these caches is needed by all of them). Can I configure a smaller Coherence cluster that is available only on, say, 2 of the WebSphere instances (say the WAS cluster which handles legacy links), and configure certain caches which are available only on this smaller sub-cluster?
regards,
Sanjeev.

I am quite confused about the purpose of the service-name. How would you tie down a cache to a particular service? In the context of the above example, the requirement seems to be:
CacheA should be spread over the UI cluster.
CacheB should be spread over the legacy cluster.
CacheC should be spread over the global cluster.
Are you suggesting something like the following:

Cache config file on a UI node:
<cache-config>
   <caching-scheme-mapping>
      <cache-mapping>
         <cache-name>CacheA</cache-name>
         <scheme-name>ui</scheme-name>
      </cache-mapping>
      <cache-mapping>
         <cache-name>CacheC</cache-name>
         <scheme-name>global</scheme-name>
      </cache-mapping>
   </caching-scheme-mapping>
   <caching-schemes>
      <distributed-scheme>
         <scheme-name>ui</scheme-name>
         <service-name>ui</service-name>
      </distributed-scheme>
      <distributed-scheme>
         <scheme-name>global</scheme-name>
         <service-name>global</service-name>
      </distributed-scheme>
   </caching-schemes>
</cache-config>

Cache config file on a legacy node:
<cache-config>
   <caching-scheme-mapping>
      <cache-mapping>
         <cache-name>CacheB</cache-name>
         <scheme-name>legacy</scheme-name>
      </cache-mapping>
      <cache-mapping>
         <cache-name>CacheC</cache-name>
         <scheme-name>global</scheme-name>
      </cache-mapping>
   </caching-scheme-mapping>
   <caching-schemes>
      <distributed-scheme>
         <scheme-name>legacy</scheme-name>
         <service-name>legacy</service-name>
      </distributed-scheme>
      <distributed-scheme>
         <scheme-name>global</scheme-name>
         <service-name>global</service-name>
      </distributed-scheme>
   </caching-schemes>
</cache-config>

The basic question seems to be: how do you control the nodes over which a cache is spread, purely from the cache name?
Also, the 3.2 <role-name> feature seems to address this requirement. How does that compare with a service-name?
My requirement is similar (I need to control the nodes over which different caches are spread), but I do not quite understand how the service-name would be used to satisfy this example. Could you please explain with cache configurations for this example?
Thanks
Ghanshyam

Starting cluster ocfs2: Failed,Checking cluster ocfs2 is offline

The status check reports that cluster ocfs2 is offline, and when I try to bring ocfs2 online I get the following error messages. Can anyone tell me what's wrong with ocfs2? Thanks!
# /etc/init.d/o2cb status
Module "configfs": Loaded
Filesystem "configfs": Mounted
Module "ocfs2_nodemanager": Loaded
Module "ocfs2_dlm": Loaded
Module "ocfs2_dlmfs": Loaded
Filesystem "ocfs2_dlmfs": Mounted
Checking cluster ocfs2: Offline
# /etc/init.d/o2cb force-reload
Unmounting ocfs2_dlmfs filesystem: OK
Unloading module "ocfs2_dlmfs": OK
Unmounting configfs filesystem: OK
Unloading module "configfs": OK
Loading module "configfs": OK
Mounting configfs filesystem at /config: OK
Loading module "ocfs2_nodemanager": OK
Loading module "ocfs2_dlm": OK
Loading module "ocfs2_dlmfs": OK
Mounting ocfs2_dlmfs filesystem at /dlm: OK
Starting cluster ocfs2: Failed
Cluster ocfs2 created
o2cb_ctl: Configuration error discovered while populating cluster ocfs2. None of its nodes were considered local. A node is considered local when its node name in the configuration maches this machine's host name.
Stopping cluster ocfs2: OK
# /etc/init.d/o2cb online ocfs2
Starting cluster ocfs2: Failed
Cluster ocfs2 created
o2cb_ctl: Configuration error discovered while populating cluster ocfs2. None of its nodes were considered local. A node is considered local when its node name in the configuration maches this machine's host name.
Stopping cluster ocfs2: OK
ocfs2_hb_ctl: Unable to access cluster service while starting heartbeat
mount.ocfs2: Error when attempting to run /sbin/ocfs2_hb_ctl: "Operation not permitted"
: Could not mount /dev/sdb1
o2cb_ctl: Unable to access cluster service while creating node

http://unirac.in/qtoa/122/o2cb_ctl-unable-access-cluster-service-while-creating-node
TODO:
- Quit ocfs2console
- Stop the service
- Remove file /etc/ocfs2/cluster.conf
- Restart ocfs2console
- Configure the nodes again
[root@rac1 ~]# /etc/init.d/ocfs2 stop
Stopping Oracle Cluster File System (OCFS2) [ OK ]
[root@rac1 ~]# /etc/init.d/o2cb offline ocfs2
[root@rac1 ~]# /etc/init.d/o2cb unload
Unmounting ocfs2_dlmfs filesystem: OK
Unloading module "ocfs2_dlmfs": OK
Unmounting configfs filesystem: OK
Unloading module "configfs": OK
[root@rac1 ~]# rm -f /etc/ocfs2/cluster.conf
[root@rac1 ~]# /usr/sbin/ocfs2console &
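
The o2cb_ctl message in the question says a node is considered local only when its node name in the configuration matches the machine's host name. Before deleting cluster.conf it may be worth checking that directly. Below is a small sketch of such a check, under some assumptions: the file is taken to consist of "node:" and "cluster:" stanzas with indented "key = value" lines, the comparison is treated as an exact string match, and the node names rac1 and rac2 in the sample are only illustrative. The class O2cbLocalNodeCheck is hypothetical and not part of ocfs2-tools.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical checker; not part of ocfs2-tools. */
final class O2cbLocalNodeCheck {

    /** Collects the "name" value of every "node:" stanza in the given file content. */
    static List<String> nodeNames(String clusterConf) {
        List<String> names = new ArrayList<>();
        boolean inNode = false;
        for (String raw : clusterConf.split("\\R")) {
            String line = raw.trim();
            if (line.isEmpty()) {
                continue;
            }
            if (line.endsWith(":")) {
                // Stanza header such as "node:" or "cluster:".
                inNode = line.equals("node:");
                continue;
            }
            int eq = line.indexOf('=');
            if (inNode && eq > 0 && line.substring(0, eq).trim().equals("name")) {
                names.add(line.substring(eq + 1).trim());
            }
        }
        return names;
    }

    /** True when some configured node name equals the given host name exactly. */
    static boolean hasLocalNode(String clusterConf, String hostName) {
        return nodeNames(clusterConf).contains(hostName);
    }

    public static void main(String[] args) {
        String conf = String.join("\n",
                "node:",
                "\tip_port = 7777",
                "\tip_address = 192.168.1.1",
                "\tnumber = 0",
                "\tname = rac1",
                "\tcluster = ocfs2",
                "",
                "node:",
                "\tip_port = 7777",
                "\tip_address = 192.168.1.2",
                "\tnumber = 1",
                "\tname = rac2",
                "\tcluster = ocfs2",
                "",
                "cluster:",
                "\tnode_count = 2",
                "\tname = ocfs2");
        System.out.println(nodeNames(conf));                 // prints [rac1, rac2]
        System.out.println(hasLocalNode(conf, "raclinux1")); // prints false
    }
}
```

In practice you would pass in the content of /etc/ocfs2/cluster.conf and the output of the hostname command. A result of false corresponds to the "None of its nodes were considered local" condition.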

Checking cluster ocfs2 is offline,Starting cluster ocfs2: Failed

Hi
I'm getting exactly the same errors when trying to implement OCFS2 on a VMware solution.
[root@raclinux1 init.d]# ./o2cb status
Module "configfs": Loaded
Filesystem "configfs": Mounted
Module "ocfs2_nodemanager": Loaded
Module "ocfs2_dlm": Loaded
Module "ocfs2_dlmfs": Loaded
Filesystem "ocfs2_dlmfs": Mounted
Checking cluster ocfs2: Offline
[root@raclinux1 init.d]# ./o2cb online ocfs2
Starting cluster ocfs2: Failed
Cluster ocfs2 created
o2cb_ctl: Configuration error discovered while populating cluster ocfs2. None of its nodes were considered local. A node is considered local when its node name in the configuration maches this machine's host name.
Stopping cluster ocfs2: OK
And from ocfs2console we get the following errors when trying to add our node to the configuration:
o2cb_ctl: Unable to access cluster service while creating node
Could not add node raclinux1
Has anyone got it to work on VMware?
PMcM

What's the maximum size of data a coherence cluster can hold?

What's the maximum size of data a coherence cluster can hold before it starts noticing a degradation in performance?
Assume a partitioned topology is used with only one backup for each partition.

Hi,
The Coherence partitioned cache is designed for linear scalability, and it does that quite well. I don't see any reason for performance degradation as the data size increases, provided you have enough cores and memory to process the requests and manage the data.
Cheers,
_NJ
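
To put a rough number on "enough memory": with one backup per partition, every entry is held twice across the cluster, so the primary data that fits is about half of the usable heap. The sketch below is just that arithmetic. The class name PartitionedCapacity is made up, and the usable heap fraction is a hypothetical planning parameter, not a Coherence recommendation. Index and per-entry overhead are ignored.

```java
/** Hypothetical back-of-the-envelope calculator; not part of Coherence. */
final class PartitionedCapacity {

    /**
     * Estimates how many bytes of primary data fit in the cluster.
     *
     * @param storageNodes     number of storage-enabled JVMs
     * @param heapBytesPerNode heap size of each JVM in bytes
     * @param usableFraction   share of the heap set aside for cache data (assumption)
     * @param backupCount      number of backup copies kept for each partition
     */
    static long primaryCapacityBytes(int storageNodes, long heapBytesPerNode,
                                     double usableFraction, int backupCount) {
        if (storageNodes < 1 || heapBytesPerNode < 0 || backupCount < 0
                || usableFraction < 0.0 || usableFraction > 1.0) {
            throw new IllegalArgumentException("invalid sizing input");
        }
        double usable = storageNodes * (double) heapBytesPerNode * usableFraction;
        // Each entry is stored once as primary plus once per backup copy.
        return (long) (usable / (1 + backupCount));
    }

    public static void main(String[] args) {
        long gib = 1L << 30;
        // 12 nodes x 4 GiB heap, half of each heap used for data, one backup.
        long bytes = primaryCapacityBytes(12, 4 * gib, 0.5, 1);
        System.out.println(bytes / gib + " GiB of primary data"); // prints 12 GiB of primary data
    }
}
```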

10.2.0.1 RHEL 4.0 - root.sh CRS - Failed to upgrade Oracle Cluster Registry

I am trying to install 10.2.0.1 CRS on a 2 node cluster on RHEL 4.0 Nahant Update 7. I get the following errors when root.sh is run
Failed to upgrade Oracle Cluster Registry configuration.
Because of this, none of the clusterware background processes start, after which the Configuration Assistant screen also reports a failure (Oracle Notification Server Configuration Assistant).
I also tried applying the 10.2.0.4 patchset, but the issue is still the same. Has anyone seen this issue?
1) We use LINUX DEVICE MAPPER for multipathing and udev
2) We configured the multipathing device partitions and bound them to raw devices
So /dev/mapper/oradisk1 bound to /dev/raw/raw1 ( OCR disk)
and /dev/mapper/oradisk2 bound to /dev/raw/raw2 (VD)
I am able to perform the dd command successfully.
Any ideas ?
-Srinivas

BTW, our OS is Linux x86-64 with RHEL 4.0 Nahant Update 7. I also applied patch 4679769.
I did a complete uninstall and reinstall, but the issue persists.
-Srinivas

Installation of oracle fail safe on windows cluster - plz very urgent

hi everybody,
this is the first time i am going to install oracle failsafe on windows clusters.
i am going to install oracle 9i release 2 software.
so i have to install the patches.
Please tell me, with the Oracle commands, how to do the following.
1. After installing Oracle 9i Release 2, how do I install the patches on it in a Windows environment?
2. How do I create an Oracle database on one node using a shared disk for storage?
3. How do I install Oracle Fail Safe on each cluster?
4. How do I configure the cluster using Fail Safe Manager?
Please provide a solution as soon as possible; it is very urgent.
And don't forget to give me the solution with commands and step by step.
plz plz plz
thanks in advance
suresh

For 9i , this is better:
http://download-uk.oracle.com/docs/cd/B10501_01/rac.920/a96600/toc.htm
