Restarting Coherence

Dear Coherents,
We have several application-level tasks running inside a single JVM. Each task is run in its own thread but tasks are run serially, so that only one task is running at any point in time (actually managed by DataSynapse). Each of the tasks may use a different Coherence configuration and therefore begins by calling:
1: CacheFactory.shutdown();
2: System.setProperty("tangosol.coherence.override", /* config file for current function */);
3: System.setProperty("tangosol.coherence.cacheconfig", /* config file for current function */);
4: CacheFactory.ensureCluster();
These calls are made in a single thread for each task, but occasionally Coherence throws an assertion error on line 4 with the message "run() invoked on a different thread".
I expect this is because CacheFactory.shutdown() returns before daemon Coherence threads from the previous task are completely stopped, and so for a moment there are two sets of Coherence threads running simultaneously; or at least previous static data has not been fully cleared.
Putting a 10-second sleep between lines 1 and 2 appears to resolve the issue, but any timeout-based synchronization is essentially bad practice in my book.
I would like to know firstly whether this assertion error signifies a problem, although the cluster appears to operate fine afterwards. Secondly, what is your recommendation to guarantee that this does not occur?
On a bigger note, is there any way to get visibility on:
- whether the JVM is currently a member of Coherence (apart from 'ensuring' that it is);
- what Coherence daemon threads are running;
- how long Coherence daemon threads have been running.
Many thanks,
James
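
One way to replace the fixed sleep with a bounded wait, and to see which threads are still alive, is sketched below using only the JDK. This is a minimal sketch, not a confirmed fix for the assertion error: the class name ThreadDrain and the thread-name filter are hypothetical, and which names the Coherence daemon threads actually carry depends on the Coherence version and configuration, so the filter has to be adapted after inspecting a thread dump.

```java
import java.time.Duration;
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

// Hypothetical helper: lists live threads by name and waits for them to exit.
final class ThreadDrain {

    /** Returns the live threads whose names satisfy the given predicate. */
    static List<Thread> liveThreads(Predicate<String> nameMatches) {
        return Thread.getAllStackTraces().keySet().stream()
                .filter(Thread::isAlive)
                .filter(t -> nameMatches.test(t.getName()))
                .collect(Collectors.toList());
    }

    /**
     * Joins every matching thread until it exits or the overall timeout
     * elapses. Returns true only if no matching thread is alive afterwards.
     */
    static boolean awaitTermination(Predicate<String> nameMatches, Duration timeout)
            throws InterruptedException {
        long deadline = System.nanoTime() + timeout.toNanos();
        for (Thread t : liveThreads(nameMatches)) {
            long remainingMs = (deadline - System.nanoTime()) / 1_000_000;
            if (remainingMs <= 0) {
                break; // out of time; report whatever is still alive
            }
            t.join(remainingMs);
        }
        return liveThreads(nameMatches).isEmpty();
    }
}
```

Calling ThreadDrain.awaitTermination(...) between lines 1 and 2 returns as soon as the matching threads have exited instead of always waiting the full 10 seconds, and a false result signals that something from the previous task is still running.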

James,
Please send the full logs to our support team at Oracle Metalink: (https://metalink.oracle.com)
Regards,
Gene

Similar Messages

GAR redeployment feature support in 12.1.2?

Since Coherence 3.7.1, we have had to restart all Coherence and WebLogic servers whenever the cache-config or POF classes changed.
Now we have 12.1.2.
Coherence servers can be managed with the WebLogic console.
A *.gar module can be deployed to a Coherence server (a storage-enabled managed server) and to a WebLogic managed server.
Can I redeploy a *.gar module without restarting Coherence or WebLogic in a production environment?

Yes, it is possible to redeploy a Coherence package (GAR file) without restarting the servers. The one catch to pay attention to is when the GAR target is a cluster. You can update the GAR without restarting the servers, but data will be lost because the update is applied all at once across the managed servers. To ensure that no data is lost, you need to use the rolling redeploy feature. You can find more details in the documentation: Deploying Coherence Applications in WebLogic Server - 12c (12.1.2)
Cheers,
Ricardo Ferreira

Application deployment questions

I am looking for help establishing a proper deployment procedure. Right now it is set up as follows:
There is an exploded EAR, ROOT.ear, on NFS. The domain has an admin server and two managed servers in a cluster, on different servers.
For new code deployment we:
stop managed servers,
create new ear file/folder with date stamp,
link ROOT.ear to new folder,
restart coherence (it needs jars from deployment) and do some other necessary stuff,
start managed servers.
We do not explicitly ask to redeploy the application; the WebLogic servers pick up the latest code. I do not deploy the app to the Admin Server.
The application is deployed on the servers in nostage mode, so they read data from the exploded EAR on NFS.
While it works most of the time, developers sometimes complain that there seems to be stale code in a new deployment. I am not sure how that is possible.
Is this a good way to deploy applications at all? If not, how can it be improved?
I also noticed that the managed servers do not clean up their tmp folders. Should I do that manually while a server is down?
Thanks,
Oleg

You can check the values set for "page-check-seconds", "resource-reload-check-secs", and "servlet-reload-check-secs" in the weblogic.xml of your application.
Link : [http://e-docs.bea.com/wls/docs92/webapp/weblogic_xml.html]
I hope you are not using symbolic links and refreshing them dynamically. AFAIK, WebLogic may not pick up changes to the targets of symbolic links.
Edited by: mchellap on May 12, 2009 11:32 PM

Temporarily disable caching

Hello,
I would like to be able to update an application to a newer version without any interruption of service by sequentially updating every machine in the cluster while the others are running.
There will be in consequence a small period of time during the update in which different application versions will be running on different servers. Different domain object versions may clash inside the Coherence cache.
In order to tackle this problem I am considering temporarily reducing the size of our caches to 0 (i.e. temporarily "disabling" cache). When all the servers are updated to the newer version, cache would be reactivated.
My question is: how can I do that at runtime, without changing configuration files and restarting Coherence, using just the Coherence console?
Otherwise, does anybody have any experience to share on hot updating with a distributed cache?
I am using Coherence 3.0.
Thank you,
Adrian Dimulescu

Adrian,
First, I'd like to underscore that the Coherence command line console is a tool for developers, not for operations staff. It requires a deep understanding of your caching layer topology, and a small mistake or typo can result in an application failure. The majority of commercial deployments I've seen contain a dedicated application control and management tier, developed by application developers for operational use at deployment time.
Secondly, I don't know whether your application uses a read-through or a read-aside caching approach, but in either case let's assume that immediately after you clear the cache, a user thread checks the cache for some data and immediately turns around and puts new data into the cache. You would end up with an "old" version of the data in the cache, which is what you want to avoid. To prevent this, you need some synchronization logic or flags indicating that the application is currently in transition. That can only be part of the application logic I was talking about.
All said, I may not see all the complexity around your specific upgrade process, so the decision is yours to make...
Regards,
Gene
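
The "in transition" flag described above could be sketched roughly as follows. This is a minimal single-JVM illustration under stated assumptions: the class TransitionGate is hypothetical, a ConcurrentHashMap stands in for the Coherence cache, and in a real cluster the flag would have to be visible to every member. A read-write lock is used so that a reader that has already passed the flag check finishes its put before the cache is cleared, which closes the race described in the reply.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantReadWriteLock;
import java.util.function.Function;

// Hypothetical gate that bypasses the cache while an upgrade is in progress.
final class TransitionGate<K, V> {
    private final Map<K, V> cache = new ConcurrentHashMap<>(); // stand-in for the real cache
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private final Function<K, V> loader; // loads a value from the backing store
    private boolean inTransition;        // guarded by lock

    TransitionGate(Function<K, V> loader) {
        this.loader = loader;
    }

    /** Marks the start of the upgrade and clears the cache atomically. */
    void beginTransition() {
        lock.writeLock().lock();
        try {
            inTransition = true;
            cache.clear();
        } finally {
            lock.writeLock().unlock();
        }
    }

    /** Re-enables caching once every server runs the new version. */
    void endTransition() {
        lock.writeLock().lock();
        try {
            inTransition = false;
        } finally {
            lock.writeLock().unlock();
        }
    }

    /** Reads through the cache, or straight from the loader during a transition. */
    V get(K key) {
        lock.readLock().lock();
        try {
            if (!inTransition) {
                return cache.computeIfAbsent(key, loader);
            }
        } finally {
            lock.readLock().unlock();
        }
        return loader.apply(key); // caching is bypassed while in transition
    }

    int cachedEntries() {
        return cache.size();
    }
}
```

While the gate is in transition nothing is written to the cache, so objects of mixed versions cannot accumulate there; the cost is that every read goes to the backing store for the duration of the upgrade.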

How to reduce the pain of a rolling restart of a Coherence cluster?

One thing that I see as a big pain point in some of the groups I work with that are using Coherence is the difficulty with doing a rolling restart of all the nodes in a coherence cluster.
Is there anything that can be done to help manage or automate this process to reduce the pain and time required for this?

Do you use (WebLogic) managed Coherence servers or standalone ones?

Anyone ever written a script (Groovy?) to do a Coherence cluster rolling restart?

I read the instructions for doing a Coherence cluster rolling restart. This seems like something that would be straightforward to do in a script, perhaps Groovy. It would have to poll the "StatusHA" attribute on the mbean, but that seems reasonable. This seems so obvious that I'd be surprised if someone hasn't written, and perhaps published, a script to do this. Is anyone aware of anything like this?
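
The polling step mentioned above could look roughly like the following sketch, written in Java against plain JMX rather than Groovy. Several things here are assumptions to verify against your own cluster: the class name StatusHaPoller is hypothetical, the ObjectName pattern passed in (for example "Coherence:type=Service,name=DistributedCache,*") depends on your service names, and the check simply treats any StatusHA value other than ENDANGERED as safe, which may be too lenient if you require MACHINE-SAFE or better.

```java
import java.util.Set;
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;

// Hypothetical helper for a rolling-restart script: wait until StatusHA is safe.
final class StatusHaPoller {

    /** True if at least one MBean matches and none reports ENDANGERED. */
    static boolean allServicesSafe(MBeanServerConnection conn, ObjectName pattern)
            throws Exception {
        Set<ObjectName> names = conn.queryNames(pattern, null);
        if (names.isEmpty()) {
            return false; // nothing registered yet; do not treat that as safe
        }
        for (ObjectName name : names) {
            Object status = conn.getAttribute(name, "StatusHA");
            if (status == null || "ENDANGERED".equals(status.toString())) {
                return false;
            }
        }
        return true;
    }

    /** Polls until all matching services are safe or the timeout elapses. */
    static boolean awaitSafe(MBeanServerConnection conn, ObjectName pattern,
                             long timeoutMs, long pollMs) throws Exception {
        long deadline = System.currentTimeMillis() + timeoutMs;
        while (true) {
            if (allServicesSafe(conn, pattern)) {
                return true;
            }
            if (System.currentTimeMillis() >= deadline) {
                return false;
            }
            Thread.sleep(pollMs);
        }
    }
}
```

A restart script would stop one node, start it again, call awaitSafe(...) and move on to the next node only when it returns true.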

I wouldn't recommend using a modified version of the provided gar_common.py. As you can see in the source comments, garRedeploy() is meant to receive a Weblogic cluster, not a Coherence cluster. I've tried the modified version which receives a Coherence cluster, and I've ended up with extra deployments in servers outside the Coherence cluster, specifically the Admin Server.
If your Coherence cluster contains several Weblogic clusters, just call garRedeploy() for each Weblogic cluster.

Why does Coherence node restart due to presumed inactivity

I noticed this message in one of the nodes in a 2 node Coherence cluster. The service restarted the particular node immediately. To me it did not look like the node was inactive. What could be other reasons why this node might have been restarted?
2006-09-01 11:11:07.705 Tangosol Coherence 3.1.1/341 <Error> (thread=Cluster, member=1): This senior Member(Id=1, Timestamp=Thu Aug 31 18:19:17 PDT 2006, Address=10.24.0.87, Port=8088, MachineId=4695) appears to have been disconnected from other nodes due to a long period of inactivity and the seniority has been assumed by the Member(Id=2, Timestamp=Thu Aug 31 18:19:47 PDT 2006, Address=10.24.0.87, Port=8089, MachineId=4695); stopping cluster service.
Thanks
Ramdas

Mark,
I increased the buffer size to 2MB, and the datagram test now goes through fine. I then restarted my application and the distributed Coherence cache, but ran into the same problem, with one of the nodes restarting roughly 20 minutes after starting the cache server. The application is doing gets and puts via multiple threads, roughly 2/sec, into 2 Coherence caches.
I have not been able to attach files when posting to this forum, so I have cut and pasted portions of the logs from node1 and node2 around the time that I noticed the restart.
++++++++++++++node 1++++++++++++++++++++
2006-09-01 14:29:55.282 Tangosol Coherence 3.1.1/341 <D5> (thread=Cluster, member=2): Member 3 joined Service DistributedCache with senior member 2
2006-09-01 14:29:55.289 Tangosol Coherence 3.1.1/341 <D5> (thread=DistributedCache, member=2): Service DistributedCache: sending ServiceConfigSync containing 262 entries to Member 3
2006-09-01 14:49:41.522 Tangosol Coherence 3.1.1/341 <Error> (thread=Cluster, member=2): This senior Member(Id=2, Timestamp=Fri Sep 01 14:24:55 PDT 2006, Address=10.24.0.87, Port=8088, MachineId=4695) appears to have been disconnected from other nodes due to a long period of inactivity and the seniority has been assumed by the Member(Id=1, Timestamp=Fri Sep 01 14:27:45 PDT 2006, Address=10.24.0.87, Port=8089, MachineId=4695); stopping cluster service.
2006-09-01 14:49:41.522 Tangosol Coherence 3.1.1/341 <D5> (thread=Cluster, member=2): Service Cluster left the cluster
2006-09-01 14:49:41.524 Tangosol Coherence 3.1.1/341 <D5> (thread=DistributedCache, member=2): Service DistributedCache left the cluster
2006-09-01 14:49:41.525 Tangosol Coherence 3.1.1/341 <D5> (thread=Invocation:Management, member=2): Service Management left the cluster
2006-09-01 14:49:42.753 Tangosol Coherence 3.1.1/341 <Info> (thread=main, member=n/a): Restarting cluster
2006-09-01 14:49:42.760 Tangosol Coherence 3.1.1/341 <D5> (thread=Cluster, member=n/a): Service Cluster joined the cluster with senior service member n/a
2006-09-01 14:49:42.968 Tangosol Coherence 3.1.1/
++++++++++++++++++++++++++++node 2 ++++++++++++++++++++++
2006-09-01 14:29:54.376 Tangosol Coherence 3.1.1/341 <D5> (thread=Cluster, member=7): Member 3 joined Service Management with senior member 2
2006-09-01 14:29:54.550 Tangosol Coherence 3.1.1/341 <D5> (thread=Cluster, member=7): Member 3 joined Service ReplicatedCache with senior member 6
2006-09-01 14:29:55.016 Tangosol Coherence 3.1.1/341 <D5> (thread=Cluster, member=7): TcpRing: disconnected from member 1
2006-09-01 14:29:55.251 Tangosol Coherence 3.1.1/341 <D5> (thread=Cluster, member=7): Member 3 joined Service DistributedCache with senior member 2
2006-09-01 14:49:41.419 Tangosol Coherence 3.1.1/341 <D5> (thread=Cluster, member=7): Member 2 left service Management with senior member 1
2006-09-01 14:49:41.419 Tangosol Coherence 3.1.1/341 <D5> (thread=Cluster, member=7): Member 2 left service DistributedCache with senior member 6
2006-09-01 14:49:41.421 Tangosol Coherence 3.1.1/341 <D5> (thread=Cluster, member=7): Member 2 left Cluster with senior member 1
2006-09-01 14:49:41.490 Tangosol Coherence 3.1.1/341 <Info> (thread=DistributedCache, member=7): Restored from backup 129 buckets
2006-09-01 14:49:41.490 Tangosol Coherence 3.1.1/341 <D4> (thread=DistributedCache, member=7): 128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160, 161, 162, 163, 164, 165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 175, 176, 177, 178, 179, 180, 181, 182, 183, 184, 185, 186, 187, 188, 189, 190, 191, 192, 193, 194, 195, 196, 197, 198, 199, 200, 201, 202, 203, 204, 205, 206, 207, 208, 209, 210, 211, 212, 213, 214, 215, 216, 217, 218, 219, 220, 221, 222, 223, 224, 225, 226, 227, 228, 229, 230, 231, 232, 233, 234, 235, 236, 237, 238, 239, 240, 241, 242, 243, 244, 245, 246, 247, 248, 249, 250, 251, 252, 253, 254, 255, 256,
2006-09-01 14:49:41.529 Tangosol Coherence 3.1.1/341 <D5> (thread=DistributedCache, member=7): Service DistributedCache: received ServiceConfigSync containing 262 entries

Coherence*Web on GlassFish Server Issues

Hi!
We are using Coherence 3.7.1.8 in our application in GlassFish 3.1.2 both as application cache and for storing session data (Coherence*Web) so it can be shared between multiple EARs and App server instances. Session data sharing between EARs in the same container works fine, but when we try to share session data between application server instances the server stops responding and has to be restarted. The only thing we get on the log is this exception:
#|SEVERE|oracle-glassfish3.1.2|com.tangosol.coherence.servlet.ParallelReapTask|_ThreadID=169;_ThreadName=Thread-2;|An exception was thrown while reaping a session.
com.tangosol.coherence.servlet.commonj.WorkException: Work Failed.
at com.tangosol.coherence.servlet.commonj.impl.WorkItemImpl.run(WorkItemImpl.java:167)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:722)
Caused by: java.lang.ClassCastException: com.tangosol.coherence.servlet.SplittableHolder cannot be cast to com.tangosol.coherence.servlet.AttributeHolder
at com.tangosol.coherence.servlet.AbstractHttpSessionModel.readAttributes(AbstractHttpSessionModel.java:1815)
at com.tangosol.coherence.servlet.AbstractHttpSessionModel.readExternal(AbstractHttpSessionModel.java:1735)
at com.tangosol.util.ExternalizableHelper.readExternalizableLite(ExternalizableHelper.java:2042)
at com.tangosol.util.ExternalizableHelper.readObjectInternal(ExternalizableHelper.java:2346)
at com.tangosol.util.ExternalizableHelper.deserializeInternal(ExternalizableHelper.java:2747)
at com.tangosol.util.ExternalizableHelper.fromBinary(ExternalizableHelper.java:263)
at com.tangosol.coherence.component.util.daemon.queueProcessor.service.grid.partitionedService.PartitionedCache$ConverterFromBinary.convert(PartitionedCache.CDB:4)
at com.tangosol.util.ConverterCollections$ConverterMap.get(ConverterCollections.java:1656)
at com.tangosol.coherence.component.util.daemon.queueProcessor.service.grid.partitionedService.PartitionedCache$ViewMap.get(PartitionedCache.CDB:1)
at com.tangosol.coherence.component.util.SafeNamedCache.get(SafeNamedCache.CDB:1)
at com.tangosol.net.cache.CachingMap.get(CachingMap.java:491)
at com.tangosol.coherence.servlet.DefaultCacheDelegator.getModel(DefaultCacheDelegator.java:122)
at com.tangosol.coherence.servlet.AbstractHttpSessionCollection.getModel(AbstractHttpSessionCollection.java:2288)
at com.tangosol.coherence.servlet.AbstractReapTask.checkAndInvalidate(AbstractReapTask.java:140)
at com.tangosol.coherence.servlet.ParallelReapTask$ReapWork.run(ParallelReapTask.java:89)
at com.tangosol.coherence.servlet.commonj.impl.WorkItemImpl.run(WorkItemImpl.java:164)
... 3 more
We tried to restrict the shared session data by implementing a custom SessionDistributionController, but according to the documentation, this feature requires coherence-sticky-sessions optimization to be enabled and this last one requires coherence-session-member-locking to be enabled. This led us to the following error:
#|SEVERE|oracle-glassfish3.1.2|org.apache.catalina.connector.CoyoteAdapter|_ThreadID=202;_ThreadName=Thread-2;|PWC3989: An exception or error occurred in the container during the request processing
java.lang.IllegalStateException: attempt to exit session VhSnfqkcwAza when it was not owned
    at com.tangosol.coherence.servlet.AbstractHttpSessionCollection.exit(AbstractHttpSessionCollection.java:799)
    at com.tangosol.coherence.servlet.AbstractHttpSessionCollection.exit(AbstractHttpSessionCollection.java:696)
    at com.tangosol.coherence.servlet.glassfish31.CoherenceWebSessionManager.exit(CoherenceWebSessionManager.java:536)
    at com.tangosol.coherence.servlet.glassfish31.CoherenceWebSession.unlockForeground(CoherenceWebSession.java:451)
    at org.apache.catalina.connector.Request.unlockSession(Request.java:4222)
    at org.apache.catalina.connector.CoyoteAdapter.doService(CoyoteAdapter.java:342)
    at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:231)
    at com.sun.enterprise.v3.services.impl.ContainerMapper$AdapterCallable.call(ContainerMapper.java:317)
    at com.sun.enterprise.v3.services.impl.ContainerMapper.service(ContainerMapper.java:195)
    at com.sun.grizzly.http.ProcessorTask.invokeAdapter(ProcessorTask.java:860)
    at com.sun.grizzly.http.ProcessorTask.doProcess(ProcessorTask.java:757)
    at com.sun.grizzly.http.ProcessorTask.process(ProcessorTask.java:1056)
    at com.sun.grizzly.http.DefaultProtocolFilter.execute(DefaultProtocolFilter.java:229)
    at com.sun.grizzly.DefaultProtocolChain.executeProtocolFilter(DefaultProtocolChain.java:137)
    at com.sun.grizzly.DefaultProtocolChain.execute(DefaultProtocolChain.java:104)
    at com.sun.grizzly.DefaultProtocolChain.execute(DefaultProtocolChain.java:90)
    at com.sun.grizzly.http.HttpProtocolChain.execute(HttpProtocolChain.java:79)
    at com.sun.grizzly.ProtocolChainContextTask.doCall(ProtocolChainContextTask.java:54)
    at com.sun.grizzly.SelectionKeyContextTask.call(SelectionKeyContextTask.java:59)
    at com.sun.grizzly.ContextTask.run(ContextTask.java:71)
    at com.sun.grizzly.util.AbstractThreadPool$Worker.doWork(AbstractThreadPool.java:532)
    at com.sun.grizzly.util.AbstractThreadPool$Worker.run(AbstractThreadPool.java:513)
    at java.lang.Thread.run(Thread.java:722)
We want to know if we are missing something in our configuration. We are using com.tangosol.coherence.servlet.AbstractHttpSessionCollection$GlobalScopeController. The session-cache-config.xml file has only this item added:
        <replicated-scheme>
            <scheme-name>default-replicated</scheme-name>
            <service-name>ReplicatedCache</service-name>
            <backing-map-scheme>
                <class-scheme>
                    <scheme-ref>default-backing-map</scheme-ref>
                </class-scheme>
            </backing-map-scheme>
        </replicated-scheme>
        <class-scheme>
            <scheme-name>default-backing-map</scheme-name>
            <class-name>com.tangosol.util.SafeHashMap</class-name>
        </class-scheme>
Any help or light would be greatly appreciated. Thanks in advance.

Depending on the version of WebLogic (which unfortunately I cannot remember off of the top of my head), you don't have to use the installer, because WebLogic has added built-in Coherence*Web support. Have you checked the WebLogic documentation for using Coherence*Web?
Peace,
Cameron Purdy | Oracle Coherence
http://coherence.oracle.com/

Weblogic Admin server restart issue

While restarting the Oracle WebLogic Admin Server, it does not come up; the log stops after the lines below.
INFO: Instantiated an instance of org.hibernate.validator.engine.resolver.JPATraversableResolver.
Mar 12, 2015 7:33:07 PM oracle.security.jps.internal.idstore.util.LibOvdUtil pushLdapNamesToLibOvd
INFO: Pushed ldap name and types info to libOvd. Ldaps : DefaultAuthenticator:idstore.ldap.provideridstore.ldap.

Hi,
These are informational messages. Can you please post the complete error stack?
Check the threads below, which report similar issues:
Integrated JDeveloper Weblogic 12C issues
Weblogic server not starting due to OutOfMemory
SOA Suite 12c Integrated Weblogic / Coherence Issue
Unable to run my application
Hope it helps

Error while starting coherence server

Hi, I am working with Coherence and using POF to build portable objects. I configured the cache and the Coherence server, and imported the cache and POF config paths into these two servers. When I try to start the Coherence server, it shows the following error, which suggests the POF config file was not loaded. Is there a problem with my cache-config file? Can you please tell me how to approach this?
Exception in thread "main" (Wrapped) (Wrapped: error configuring class "com.tangosol.io.pof.ConfigurablePofContext") java.lang.IllegalStateException: Duplicate included POF configuration (Config=Manager-pof-config.xml)
     at com.tangosol.coherence.component.util.Daemon.start(Daemon.CDB:52)
     at com.tangosol.coherence.component.util.daemon.queueProcessor.Service.start(Service.CDB:7)
     at com.tangosol.coherence.component.util.daemon.queueProcessor.service.Grid.start(Grid.CDB:6)
     at com.tangosol.coherence.component.util.SafeService.startService(SafeService.CDB:28)
     at com.tangosol.coherence.component.util.safeService.SafeCacheService.startService(SafeCacheService.CDB:5)
     at com.tangosol.coherence.component.util.SafeService.ensureRunningService(SafeService.CDB:27)
     at com.tangosol.coherence.component.util.SafeService.start(SafeService.CDB:14)
     at com.tangosol.net.DefaultConfigurableCacheFactory.ensureServiceInternal(DefaultConfigurableCacheFactory.java:1057)
     at com.tangosol.net.DefaultConfigurableCacheFactory.ensureService(DefaultConfigurableCacheFactory.java:892)
     at com.tangosol.net.DefaultConfigurableCacheFactory.ensureCache(DefaultConfigurableCacheFactory.java:874)
     at com.tangosol.net.DefaultConfigurableCacheFactory.configureCache(DefaultConfigurableCacheFactory.java:1231)
     at com.tangosol.net.DefaultConfigurableCacheFactory.ensureCache(DefaultConfigurableCacheFactory.java:290)
     at com.tangosol.net.CacheFactory.getCache(CacheFactory.java:735)
     at com.tangosol.net.CacheFactory.getCache(CacheFactory.java:712)
     at com.Manager.Main.main(Main.java:19)
Caused by: (Wrapped: error configuring class "com.tangosol.io.pof.ConfigurablePofContext") java.lang.IllegalStateException: Duplicate included POF configuration (Config=Manager-pof-config.xml)
     at com.tangosol.coherence.component.util.daemon.queueProcessor.Service.instantiateSerializer(Service.CDB:17)
     at com.tangosol.coherence.component.util.daemon.queueProcessor.Service.ensureSerializer(Service.CDB:31)
     at com.tangosol.coherence.component.util.daemon.queueProcessor.Service.ensureSerializer(Service.CDB:4)
     at com.tangosol.coherence.component.util.daemon.queueProcessor.service.Grid.onEnter(Grid.CDB:26)
     at com.tangosol.coherence.component.util.daemon.queueProcessor.service.grid.PartitionedService.onEnter(PartitionedService.CDB:19)
     at com.tangosol.coherence.component.util.Daemon.run(Daemon.CDB:14)
     at java.lang.Thread.run(Unknown Source)
Caused by: java.lang.IllegalStateException: Duplicate included POF configuration (Config=Manager-pof-config.xml)
     at com.tangosol.io.pof.ConfigurablePofContext.report(ConfigurablePofContext.java:1260)
     at com.tangosol.io.pof.ConfigurablePofContext.createPofConfig(ConfigurablePofContext.java:848)
     at com.tangosol.io.pof.ConfigurablePofContext.initialize(ConfigurablePofContext.java:775)
     at com.tangosol.io.pof.ConfigurablePofContext.setContextClassLoader(ConfigurablePofContext.java:319)
     at com.tangosol.coherence.component.util.daemon.queueProcessor.Service.instantiateSerializer(Service.CDB:13)
     ... 6 more
2012-06-15 15:03:38.953/2.043 Oracle Coherence GE 3.6.0.4 <Error> (thread=main, member=29): Error while starting service "DistributedCacheService": (Wrapped) (Wrapped: error configuring class "com.tangosol.io.pof.ConfigurablePofContext") java.lang.IllegalStateException: Duplicate included POF configuration (Config=Manager-pof-config.xml)
     at com.tangosol.coherence.component.util.Daemon.start(Daemon.CDB:52)
     at com.tangosol.coherence.component.util.daemon.queueProcessor.Service.start(Service.CDB:7)
     at com.tangosol.coherence.component.util.daemon.queueProcessor.service.Grid.start(Grid.CDB:6)
     at com.tangosol.coherence.component.util.SafeService.startService(SafeService.CDB:28)
     at com.tangosol.coherence.component.util.safeService.SafeCacheService.startService(SafeCacheService.CDB:5)
     at com.tangosol.coherence.component.util.SafeService.ensureRunningService(SafeService.CDB:27)
     at com.tangosol.coherence.component.util.SafeService.start(SafeService.CDB:14)
     at com.tangosol.net.DefaultConfigurableCacheFactory.ensureServiceInternal(DefaultConfigurableCacheFactory.java:1057)
     at com.tangosol.net.DefaultConfigurableCacheFactory.ensureService(DefaultConfigurableCacheFactory.java:892)
     at com.tangosol.net.DefaultConfigurableCacheFactory.ensureCache(DefaultConfigurableCacheFactory.java:874)
     at com.tangosol.net.DefaultConfigurableCacheFactory.configureCache(DefaultConfigurableCacheFactory.java:1231)
     at com.tangosol.net.DefaultConfigurableCacheFactory.ensureCache(DefaultConfigurableCacheFactory.java:290)
     at com.tangosol.net.CacheFactory.getCache(CacheFactory.java:735)
     at com.tangosol.net.CacheFactory.getCache(CacheFactory.java:712)
     at com.Manager.Main.main(Main.java:19)
Caused by: (Wrapped: error configuring class "com.tangosol.io.pof.ConfigurablePofContext") java.lang.IllegalStateException: Duplicate included POF configuration (Config=Manager-pof-config.xml)
     at com.tangosol.coherence.component.util.daemon.queueProcessor.Service.instantiateSerializer(Service.CDB:17)
     at com.tangosol.coherence.component.util.daemon.queueProcessor.Service.ensureSerializer(Service.CDB:31)
     at com.tangosol.coherence.component.util.daemon.queueProcessor.Service.ensureSerializer(Service.CDB:4)
     at com.tangosol.coherence.component.util.daemon.queueProcessor.service.Grid.onEnter(Grid.CDB:26)
     at com.tangosol.coherence.component.util.daemon.queueProcessor.service.grid.PartitionedService.onEnter(PartitionedService.CDB:19)
     at com.tangosol.coherence.component.util.Daemon.run(Daemon.CDB:14)
     at java.lang.Thread.run(Unknown Source)
Caused by: java.lang.IllegalStateException: Duplicate included POF configuration (Config=Manager-pof-config.xml)
     at com.tangosol.io.pof.ConfigurablePofContext.report(ConfigurablePofContext.java:1260)
     at com.tangosol.io.pof.ConfigurablePofContext.createPofConfig(ConfigurablePofContext.java:848)
     at com.tangosol.io.pof.ConfigurablePofContext.initialize(ConfigurablePofContext.java:775)
     at com.tangosol.io.pof.ConfigurablePofContext.setContextClassLoader(ConfigurablePofContext.java:319)
     at com.tangosol.coherence.component.util.daemon.queueProcessor.Service.instantiateSerializer(Service.CDB:13)
     ... 6 more
My cache-config file looks like this:
<!DOCTYPE cache-config SYSTEM "cache-config.dtd">
<cache-config>
    <caching-scheme-mapping>
        <cache-mapping>
            <cache-name>hello</cache-name>
            <scheme-name>default-distributed</scheme-name>
        </cache-mapping>
    </caching-scheme-mapping>
    <caching-schemes>
        <distributed-scheme>
            <scheme-name>default-distributed</scheme-name>
            <service-name>DistributedCacheService</service-name>
            <serializer>
                <instance>
                    <class-name>com.tangosol.io.pof.ConfigurablePofContext</class-name>
                    <init-params>
                        <init-param>
                            <param-type>String</param-type>
                            <param-value system-property="Manager-pof-config.xml">/C:/Users/lakshmana/JPACoherenceWorkspace/Application/appClientModule/Manager-pof-config.xml</param-value>
                        </init-param>
                    </init-params>
                </instance>
            </serializer>
            <backing-map-scheme>
                <local-scheme/>
            </backing-map-scheme>
            <autostart>true</autostart>
        </distributed-scheme>
        <class-scheme>
            <scheme-name>default-backing-map</scheme-name>
            <class-name>com.tangosol.util.SafeHashMap</class-name>
        </class-scheme>
    </caching-schemes>
</cache-config>
Thanks in advance

Hi Jon, I removed the include option in the pof-config.
When I try to deploy the main method, it shows the following error:
2012-06-15 17:38:24.803/87.644 Oracle Coherence GE 3.6.0.4 <Info> (thread=main, member=3): Restarting Service: DistributedCacheService
2012-06-15 17:38:24.803/87.644 Oracle Coherence GE 3.6.0.4 <D5> (thread=DistributedCache:DistributedCacheService, member=3): Service DistributedCacheService left the cluster
2012-06-15 17:38:24.809/87.650 Oracle Coherence GE 3.6.0.4 <D5> (thread=DistributedCache:DistributedCacheService, member=3): Service DistributedCacheService joined the cluster with senior service member 3
2012-06-15 17:38:24.810/87.651 Oracle Coherence GE 3.6.0.4 <Error> (thread=DistributedCache:DistributedCacheService, member=3): Terminating PartitionedCache due to unhandled exception: com.tangosol.util.WrapperException
2012-06-15 17:38:24.810/87.651 Oracle Coherence GE 3.6.0.4 <Error> (thread=DistributedCache:DistributedCacheService, member=3):
(Wrapped) java.io.IOException: unknown user type: com.tangosol.run.xml.SimpleElement
     at com.tangosol.util.Base.ensureRuntimeException(Base.java:293)
     at com.tangosol.util.Base.ensureRuntimeException(Base.java:269)
     at com.tangosol.coherence.component.util.daemon.queueProcessor.packetProcessor.PacketPublisher.serializeMessage(PacketPublisher.CDB:34)
     at com.tangosol.coherence.component.util.daemon.queueProcessor.packetProcessor.PacketPublisher$InQueue.add(PacketPublisher.CDB:8)
     at com.tangosol.coherence.component.util.daemon.queueProcessor.service.Grid.dispatchMessage(Grid.CDB:62)
     at com.tangosol.coherence.component.util.daemon.queueProcessor.service.Grid.post(Grid.CDB:53)
     at com.tangosol.coherence.component.util.daemon.queueProcessor.service.Grid.send(Grid.CDB:1)
     at com.tangosol.coherence.component.util.daemon.queueProcessor.service.Grid$ConfigRequest.onReceived(Grid.CDB:83)
     at com.tangosol.coherence.component.util.daemon.queueProcessor.service.Grid.onMessage(Grid.CDB:11)
     at com.tangosol.coherence.component.util.daemon.queueProcessor.service.Grid.onNotify(Grid.CDB:33)
     at com.tangosol.coherence.component.util.daemon.queueProcessor.service.grid.PartitionedService.onNotify(PartitionedService.CDB:3)
     at com.tangosol.coherence.component.util.daemon.queueProcessor.service.grid.partitionedService.PartitionedCache.onNotify(PartitionedCache.CDB:3)
     at com.tangosol.coherence.component.util.Daemon.run(Daemon.CDB:42)
     at java.lang.Thread.run(Unknown Source)
Caused by: java.io.IOException: unknown user type: com.tangosol.run.xml.SimpleElement
     at com.tangosol.io.pof.ConfigurablePofContext.serialize(ConfigurablePofContext.java:341)
     at com.tangosol.coherence.component.util.daemon.queueProcessor.Service.writeObject(Service.CDB:4)
     at com.tangosol.coherence.component.util.ServiceConfig.writeObject(ServiceConfig.CDB:1)
     at com.tangosol.coherence.component.util.ServiceConfig$Map.writeObject(ServiceConfig.CDB:1)
     at com.tangosol.coherence.component.util.daemon.queueProcessor.service.Grid$ConfigUpdate.write(Grid.CDB:17)
     at com.tangosol.coherence.component.util.daemon.queueProcessor.packetProcessor.PacketPublisher.serializeMessage(PacketPublisher.CDB:28)
     ... 11 more
Caused by: java.lang.IllegalArgumentException: unknown user type: com.tangosol.run.xml.SimpleElement
     at com.tangosol.io.pof.ConfigurablePofContext.getUserTypeIdentifier(ConfigurablePofContext.java:420)
     at com.tangosol.io.pof.ConfigurablePofContext.getUserTypeIdentifier(ConfigurablePofContext.java:409)
     at com.tangosol.io.pof.PofBufferWriter.writeUserType(PofBufferWriter.java:1660)
     at com.tangosol.io.pof.PofBufferWriter.writeObject(PofBufferWriter.java:1622)
     at com.tangosol.io.pof.ConfigurablePofContext.serialize(ConfigurablePofContext.java:335)
     ... 16 more
2012-06-15 17:38:24.811/87.652 Oracle Coherence GE 3.6.0.4 <D5> (thread=DistributedCache:DistributedCacheService, member=3): Service DistributedCacheService left the cluster
2012-06-15 17:38:24.811/87.652 Oracle Coherence GE 3.6.0.4 <Error> (thread=main, member=3): Error while starting service "DistributedCacheService": java.lang.RuntimeException: Failed to start Service "DistributedCacheService" (ServiceState=SERVICE_STOPPED)
     at com.tangosol.coherence.component.util.daemon.queueProcessor.Service.start(Service.CDB:38)
     at com.tangosol.coherence.component.util.daemon.queueProcessor.service.Grid.start(Grid.CDB:6)
     at com.tangosol.coherence.component.util.SafeService.startService(SafeService.CDB:28)
     at com.tangosol.coherence.component.util.safeService.SafeCacheService.startService(SafeCacheService.CDB:5)
     at com.tangosol.coherence.component.util.SafeService.restartService(SafeService.CDB:25)
     at com.tangosol.coherence.component.util.SafeService.ensureRunningService(SafeService.CDB:39)
     at com.tangosol.coherence.component.util.SafeService.getRunningService(SafeService.CDB:1)
     at com.tangosol.coherence.component.util.SafeService.getInfo(SafeService.CDB:1)
     at com.tangosol.coherence.component.net.management.Gateway.instantiateServiceModel(Gateway.CDB:6)
     at com.tangosol.coherence.component.net.management.Gateway.instantiateLocalModel(Gateway.CDB:45)
     at com.tangosol.coherence.component.net.management.Gateway.register(Gateway.CDB:6)
     at com.tangosol.coherence.component.util.SafeService.register(SafeService.CDB:14)
     at com.tangosol.coherence.component.util.SafeService.ensureRunningService(SafeService.CDB:56)
     at com.tangosol.coherence.component.util.SafeService.start(SafeService.CDB:14)
     at com.tangosol.net.DefaultConfigurableCacheFactory.ensureServiceInternal(DefaultConfigurableCacheFactory.java:1057)
     at com.tangosol.net.DefaultConfigurableCacheFactory.ensureService(DefaultConfigurableCacheFactory.java:892)
     at com.tangosol.net.DefaultCacheServer.startServices(DefaultCacheServer.java:81)
     at com.tangosol.net.DefaultCacheServer.monitorServices(DefaultCacheServer.java:285)
     at com.tangosol.net.DefaultCacheServer.startAndMonitor(DefaultCacheServer.java:56)
     at com.tangosol.net.DefaultCacheServer.main(DefaultCacheServer.java:197)
2012-06-15 17:38:24.811/87.652 Oracle Coherence GE 3.6.0.4 <Error> (thread=main, member=3): Failed to restart services: java.lang.RuntimeException: Failed to start Service "DistributedCacheService" (ServiceState=SERVICE_STOPPED)
2012-06-15 17:38:29.811/92.652 Oracle Coherence GE 3.6.0.4 <Info> (thread=main, member=3): Restarting Service: DistributedCacheService
2012-06-15 17:38:29.816/92.657 Oracle Coherence GE 3.6.0.4 <D5> (thread=DistributedCache:DistributedCacheService, member=3): Service DistributedCacheService joined the cluster with senior service member 3
2012-06-15 17:38:29.817/92.658 Oracle Coherence GE 3.6.0.4 <Error> (thread=DistributedCache:DistributedCacheService, member=3): Terminating PartitionedCache due to unhandled exception: com.tangosol.util.WrapperException
2012-06-15 17:38:29.817/92.658 Oracle Coherence GE 3.6.0.4 <Error> (thread=DistributedCache:DistributedCacheService, member=3):
(Wrapped) java.io.IOException: unknown user type: com.tangosol.run.xml.SimpleElement
     at com.tangosol.util.Base.ensureRuntimeException(Base.java:293)
     at com.tangosol.util.Base.ensureRuntimeException(Base.java:269)
     at com.tangosol.coherence.component.util.daemon.queueProcessor.packetProcessor.PacketPublisher.serializeMessage(PacketPublisher.CDB:34)
     at com.tangosol.coherence.component.util.daemon.queueProcessor.packetProcessor.PacketPublisher$InQueue.add(PacketPublisher.CDB:8)
     at com.tangosol.coherence.component.util.daemon.queueProcessor.service.Grid.dispatchMessage(Grid.CDB:62)
     at com.tangosol.coherence.component.util.daemon.queueProcessor.service.Grid.post(Grid.CDB:53)
     at com.tangosol.coherence.component.util.daemon.queueProcessor.service.Grid.send(Grid.CDB:1)
     at com.tangosol.coherence.component.util.daemon.queueProcessor.service.Grid$ConfigRequest.onReceived(Grid.CDB:83)
     at com.tangosol.coherence.component.util.daemon.queueProcessor.service.Grid.onMessage(Grid.CDB:11)
     at com.tangosol.coherence.component.util.daemon.queueProcessor.service.Grid.onNotify(Grid.CDB:33)
     at com.tangosol.coherence.component.util.daemon.queueProcessor.service.grid.PartitionedService.onNotify(PartitionedService.CDB:3)
     at com.tangosol.coherence.component.util.daemon.queueProcessor.service.grid.partitionedService.PartitionedCache.onNotify(PartitionedCache.CDB:3)
     at com.tangosol.coherence.component.util.Daemon.run(Daemon.CDB:42)
     at java.lang.Thread.run(Unknown Source)

EM for Coherence - Cluster upgrade

Hi Gurus,
I noticed that EM12c has a "Coherence Node Provisioning" process, which says: "You can also update selected nodes by copying configuration files and restarting the nodes." Does EM internally check the service HA status before it updates (stops and restarts) each node? "NODE-SAFE" should be the minimum HA status required to ensure there is no data loss.
Thanks in advance
Hysun
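The check described above can be sketched in Java. This is only an illustration of the "at least NODE-SAFE before restarting a node" rule, not a statement of what EM does internally. The class and method names are hypothetical. The JMX object name pattern and the StatusHA attribute are assumed to match the Coherence Service MBean of the version in use, and the list of levels is an assumption that should be checked against that version's documentation.

```java
import java.util.Arrays;
import java.util.List;
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;

/** Hypothetical helper: decides whether a node may be restarted based on a service's StatusHA. */
final class HaStatusGate {
    // Ordered from weakest to strongest. Assumed values of the Service MBean's StatusHA attribute.
    private static final List<String> LEVELS = Arrays.asList(
            "ENDANGERED", "NODE-SAFE", "MACHINE-SAFE", "RACK-SAFE", "SITE-SAFE");

    /** Returns true only if statusHa is a known level at or above the required minimum. */
    static boolean meetsMinimum(String statusHa, String minimum) {
        int required = LEVELS.indexOf(minimum);
        if (required < 0) {
            throw new IllegalArgumentException("Unknown minimum level: " + minimum);
        }
        // An unrecognized status yields index -1 and is treated as unsafe.
        return LEVELS.indexOf(statusHa) >= required;
    }

    /** Reads StatusHA over JMX; the object name pattern is an assumption about the MBean layout. */
    static String readStatusHa(MBeanServerConnection conn, String service, int nodeId)
            throws Exception {
        ObjectName name = new ObjectName(
                "Coherence:type=Service,name=" + service + ",nodeId=" + nodeId);
        return String.valueOf(conn.getAttribute(name, "StatusHA"));
    }
}
```

A rolling-restart script could call readStatusHa for each partitioned service and move on to the next node only once meetsMinimum(status, "NODE-SAFE") returns true.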

Thanks for your hints, but that didn't work either, perhaps because the metaset uses the disks' DID names, which are not available when the node is not booted as part of the cluster.
What I hope will work is this:
- deactivate the zone's resource group
- make a backup of the non-global zone's root
- restore the backup to a temporary filesystem on the node's boot disk
- mount the temporary filesystem as the zone's root (via vfstab)
- upgrade this node including the zone
- reboot as part of the cluster (the zone should not start because of autoboot=false and the RG being deactivated)
- acquire access to the zone's shared disk resource
- copy the content of the zone's root back to its original place
- activate the zone's resource group
- upgrade the other node
- and of course backups, backups and even more backups at the right moments :-)
I will test this scenario as soon as I can find the time for it. If I am successful I will post again.
Regards, Paul

Urgent! Node keeps disconnecting from Coherence Cluster

The system consists of 4 standalone cache servers with local storage set to true, plus 14 embedded nodes started with different web apps on Tomcat with local storage set to false.
When the servers are started after a new deployment, sometimes everything just works, but most of the time a random Tomcat server gets stuck in the following pattern.
First, it successfully starts the cluster service and joins an existing cluster:
Oracle Coherence Version 3.5.1/461
Grid Edition: Development mode
Copyright (c) 2000, 2009, Oracle and/or its affiliates. All rights reserved.
2012-07-18 12:24:33.335/31.845 Oracle Coherence GE 3.5.1/461 <D5> (thread=Cluster, member=n/a): Service Cluster joined the cluster with senior service member n/a
2012-07-18 12:24:33.550/32.060 Oracle Coherence GE 3.5.1/461 <Info> (thread=Cluster, member=n/a): This Member(Id=8, Timestamp=2012-07-18 12:24:33.347, Address=10.34.32.107:8089, MachineId=2155, Location=machine:dev1sxapp2,process:20831, Role=ApacheCatalinaStartupBootstrap, Edition=Grid Edition, Mode=Development, CpuCount=24, SocketCount=24) joined cluster "DEV1" with senior Member(Id=10, Timestamp=2012-07-18 09:39:44.861, Address=10.34.32.101:8090, MachineId=2149, Location=machine:dev1ssapp3,process:27796, Role=ApacheCatalinaStartupBootstrap, Edition=Grid Edition, Mode=Development, CpuCount=64, SocketCount=64)
2012-07-18 12:24:33.555/32.065 Oracle Coherence GE 3.5.1/461 <D5> (thread=Cluster, member=n/a): Member(Id=1, Timestamp=2012-07-18 12:22:14.231, Address=10.34.32.107:8090, MachineId=2155, Location=machine:dev1sxapp2,process:1278, Role=ApacheCatalinaStartupBootstrap) joined Cluster with senior member 10
2012-07-18 12:24:33.555/32.065 Oracle Coherence GE 3.5.1/461 <D5> (thread=Cluster, member=n/a): Member(Id=2, Timestamp=2012-07-18 12:22:14.331, Address=10.34.32.106:8089, MachineId=2154, Location=machine:dev1sxapp1,process:6549, Role=ApacheCatalinaStartupBootstrap) joined Cluster with senior member 10
2012-07-18 12:24:33.555/32.065 Oracle Coherence GE 3.5.1/461 <D5> (thread=Cluster, member=n/a): Member(Id=3, Timestamp=2012-07-18 12:22:55.086, Address=10.34.32.106:8088, MachineId=2154, Location=machine:dev1sxapp1,process:23083, Role=ApacheCatalinaStartupBootstrap) joined Cluster with senior member 10
2012-07-18 12:24:33.555/32.065 Oracle Coherence GE 3.5.1/461 <D5> (thread=Cluster, member=n/a): Member(Id=4, Timestamp=2012-07-18 12:22:56.799, Address=10.34.32.107:8088, MachineId=2155, Location=machine:dev1sxapp2,process:19624, Role=ApacheCatalinaStartupBootstrap) joined Cluster with senior member 10
2012-07-18 12:24:33.555/32.065 Oracle Coherence GE 3.5.1/461 <D5> (thread=Cluster, member=n/a): Member(Id=5, Timestamp=2012-07-18 12:24:31.869, Address=10.34.32.106:8090, MachineId=2154, Location=machine:dev1sxapp1,process:24411, Role=ApacheCatalinaStartupBootstrap) joined Cluster with senior member 10
2012-07-18 12:24:33.555/32.065 Oracle Coherence GE 3.5.1/461 <D5> (thread=Cluster, member=n/a): Member(Id=6, Timestamp=2012-07-18 12:24:33.084, Address=10.34.32.101:8088, MachineId=2149, Location=machine:dev1ssapp3,process:28932, Role=ApacheCatalinaStartupBootstrap) joined Cluster with senior member 10
2012-07-18 12:24:33.555/32.065 Oracle Coherence GE 3.5.1/461 <D5> (thread=Cluster, member=n/a): Member(Id=14, Timestamp=2012-07-18 09:40:50.645, Address=10.34.32.104:8090, MachineId=2152, Location=machine:dev1ssapp4,process:17697, Role=ApacheCatalinaStartupBootstrap) joined Cluster with senior member 10
2012-07-18 12:24:33.556/32.066 Oracle Coherence GE 3.5.1/461 <D5> (thread=Cluster, member=n/a): Member(Id=17, Timestamp=2012-07-18 10:35:16.722, Address=10.34.32.104:8093, MachineId=2152, Location=machine:dev1ssapp4,process:19365, Role=ApacheCatalinaStartupBootstrap) joined Cluster with senior member 10
2012-07-18 12:24:33.556/32.066 Oracle Coherence GE 3.5.1/461 <D5> (thread=Cluster, member=n/a): Member(Id=18, Timestamp=2012-07-18 10:38:47.714, Address=10.34.32.101:8093, MachineId=2149, Location=machine:dev1ssapp3,process:29887, Role=ApacheCatalinaStartupBootstrap) joined Cluster with senior member 10
2012-07-18 12:24:33.563/32.073 Oracle Coherence GE 3.5.1/461 <D5> (thread=Cluster, member=n/a): Member 10 joined Service Management with senior member 10
2012-07-18 12:24:33.566/32.076 Oracle Coherence GE 3.5.1/461 <D5> (thread=Cluster, member=n/a): Member 1 joined Service Management with senior member 10
2012-07-18 12:24:33.566/32.076 Oracle Coherence GE 3.5.1/461 <D5> (thread=Cluster, member=n/a): Member 2 joined Service Management with senior member 10
2012-07-18 12:24:33.566/32.076 Oracle Coherence GE 3.5.1/461 <D5> (thread=Cluster, member=n/a): Member 4 joined Service Management with senior member 10
2012-07-18 12:24:33.566/32.076 Oracle Coherence GE 3.5.1/461 <D5> (thread=Cluster, member=n/a): Member 17 joined Service Management with senior member 10
2012-07-18 12:24:33.566/32.076 Oracle Coherence GE 3.5.1/461 <D5> (thread=Cluster, member=n/a): Member 17 joined Service PFExpiryDistributedCache with senior member 17
2012-07-18 12:24:33.566/32.076 Oracle Coherence GE 3.5.1/461 <D5> (thread=Cluster, member=n/a): Member 17 joined Service SsoRuleEntryDistributedCache with senior member 17
2012-07-18 12:24:33.567/32.077 Oracle Coherence GE 3.5.1/461 <D5> (thread=Cluster, member=n/a): Member 3 joined Service Management with senior member 10
2012-07-18 12:24:33.567/32.077 Oracle Coherence GE 3.5.1/461 <D5> (thread=Cluster, member=n/a): Member 18 joined Service Management with senior member 10
2012-07-18 12:24:33.567/32.077 Oracle Coherence GE 3.5.1/461 <D5> (thread=Cluster, member=n/a): Member 18 joined Service PFExpiryDistributedCache with senior member 17
2012-07-18 12:24:33.567/32.077 Oracle Coherence GE 3.5.1/461 <D5> (thread=Cluster, member=n/a): Member 18 joined Service SsoRuleEntryDistributedCache with senior member 17
2012-07-18 12:24:33.568/32.078 Oracle Coherence GE 3.5.1/461 <D5> (thread=Cluster, member=n/a): Member 14 joined Service Management with senior member 10
2012-07-18 12:24:33.568/32.078 Oracle Coherence GE 3.5.1/461 <D5> (thread=Cluster, member=n/a): Member 5 joined Service Management with senior member 10
2012-07-18 12:24:33.579/32.089 Oracle Coherence GE 3.5.1/461 <D5> (thread=Cluster, member=n/a): Member 6 joined Service Management with senior member 10
Then it started getting heartbeat-overdue messages and the cluster service stopped:
2012-07-18 12:37:20.717/799.227 Oracle Coherence GE 3.5.1/461 <Info> (thread=PacketListenerN, member=8): Scheduled senior member heartbeat is overdue; rejoining multicast group.
2012-07-18 12:37:29.916/808.426 Oracle Coherence GE 3.5.1/461 <Info> (thread=PacketListenerN, member=8): Scheduled senior member heartbeat is overdue; rejoining multicast group.
2012-07-18 12:37:59.291/837.801 Oracle Coherence GE 3.5.1/461 <Error> (thread=PacketListenerN, member=8): Stopping cluster due to unhandled exception: com.tangosol.net.messaging.ConnectionException: Unable to refresh sockets: [UnicastUdpSocket{State=STATE_OPEN, address:port=10.34.32.107:8089}, MulticastUdpSocket{State=STATE_OPEN, address:port=237.0.0.1:40109, InterfaceAddress=10.34.32.107, TimeToLive=4}, TcpSocketAccepter{State=STATE_OPEN, ServerSocket=10.34.32.107:8089}]; last failed socket: MulticastUdpSocket{State=STATE_OPEN, address:port=237.0.0.1:40109, InterfaceAddress=10.34.32.107, TimeToLive=4}
at com.tangosol.coherence.component.net.Cluster$SocketManager.refreshSockets(Cluster.CDB:91)
at com.tangosol.coherence.component.net.Cluster$SocketManager$MulticastUdpSocket.onInterruptedIOException(Cluster.CDB:9)
at com.tangosol.coherence.component.net.socket.UdpSocket.receive(UdpSocket.CDB:33)
at com.tangosol.coherence.component.net.UdpPacket.receive(UdpPacket.CDB:4)
at com.tangosol.coherence.component.util.daemon.queueProcessor.packetProcessor.PacketListener.onNotify(PacketListener.CDB:19)
at com.tangosol.coherence.component.util.Daemon.run(Daemon.CDB:42)
at java.lang.Thread.run(Thread.java:662)
Caused by: java.net.SocketTimeoutException: Receive timed out
at java.net.PlainDatagramSocketImpl.receive0(Native Method)
at java.net.PlainDatagramSocketImpl.receive(PlainDatagramSocketImpl.java:145)
at java.net.DatagramSocket.receive(DatagramSocket.java:725)
at com.tangosol.coherence.component.net.socket.UdpSocket.receive(UdpSocket.CDB:20)
at com.tangosol.coherence.component.net.UdpPacket.receive(UdpPacket.CDB:4)
at com.tangosol.coherence.component.util.daemon.queueProcessor.packetProcessor.PacketListener.onNotify(PacketListener.CDB:19)
at com.tangosol.coherence.component.util.Daemon.run(Daemon.CDB:42)
at java.lang.Thread.run(Thread.java:662)
2012-07-18 12:37:59.291/837.801 Oracle Coherence GE 3.5.1/461 <D5> (thread=Cluster, member=8): Service Cluster left the cluster
2012-07-18 12:37:59.293/837.803 Oracle Coherence GE 3.5.1/461 <D5> (thread=Invocation:Management, member=8): Service Management left the cluster
2012-07-18 12:37:59.293/837.803 Oracle Coherence GE 3.5.1/461 <D5> (thread=ReplicatedCache:HibernateReplicatedCache, member=8): Service HibernateReplicatedCache left the cluster
Then it started getting messages from various nodes about the existing cluster:
2012-07-18 12:40:02.862/961.372 Oracle Coherence GE 3.5.1/461 <Info> (thread=queue://authenticationService.logonEvent.consumer-2, member=n/a): Restarting cluster
2012-07-18 12:40:02.891/961.401 Oracle Coherence GE 3.5.1/461 <D5> (thread=Cluster, member=n/a): Service Cluster joined the cluster with senior service member n/a
2012-07-18 12:40:20.167/978.677 Oracle Coherence GE 3.5.1/461 <Warning> (thread=Cluster, member=n/a): This Member(Id=0, Timestamp=2012-07-18 12:40:02.867, Address=10.34.32.107:8089, MachineId=2155, Location=machine:dev1sxapp2,process:20831, Role=ApacheCatalinaStartupBootstrap) has been attempting to join the cluster at address 237.0.0.1:40109 with TTL 4 for 17 seconds without success; this could indicate a mis-configured TTL value, or it may simply be the result of a busy cluster or active failover.
2012-07-18 12:40:20.168/978.678 Oracle Coherence GE 3.5.1/461 <Warning> (thread=Cluster, member=n/a): Received a discovery message that indicates the presence of an existing cluster:
Message "NewMemberAnnounceWait"
FromMember=Member(Id=4, Timestamp=2012-07-18 12:22:56.799, Address=10.34.32.107:8088, MachineId=2155, Location=machine:dev1sxapp2,process:19624, Role=ApacheCatalinaStartupBootstrap)
FromMessageId=0
Internal=false
MessagePartCount=1
PendingCount=0
MessageType=9
ToPollId=0
Poll=null
Packets
[000]=Broadcast{PacketType=0x0DDF00D2, ToId=0, FromId=4, Direction=Incoming, ReceivedMillis=12:40:20.167, MessageType=9, MessagePartCount=1, MessagePartIndex=0, Body=0x00000001389AE63E1F0A22206B00000000000000000000000040001
F980000086B000405011818044445563140400A64657631737861707032053139363234401E417061636865436174616C696E6153746172747570426F6F7473747261700001000001389AF5E6330A22206B00000000000000000000000040001F990000086B000005011818044445563140
400A64657631737861707032053230383331401E417061636865436174616C696E6153746172747570426F6F74737472617000000001389AE6376E0A22206A00000000000000000000000040001F980000086A000305011818044445563140400A646576317378617070310532333038334
01E417061636865436174616C696E6153746172747570426F6F74737472617000, Body.length=287}
Service=ClusterService{Name=Cluster, State=(SERVICE_STARTED, STATE_ANNOUNCE), Id=0, Version=3.5}
ToMemberSet=null
NotifySent=false
ToMember=Member(Id=0, Timestamp=2012-07-18 12:40:02.867, Address=10.34.32.107:8089, MachineId=2155, Location=machine:dev1sxapp2,process:20831, Role=ApacheCatalinaStartupBootstrap)
SeniorMember=Member(Id=3, Timestamp=2012-07-18 12:22:55.086, Address=10.34.32.106:8088, MachineId=2154, Location=machine:dev1sxapp1,process:23083, Role=ApacheCatalinaStartupBootstrap)
Then it failed to connect to the cluster:
2012-07-18 12:40:33.187/991.697 Oracle Coherence GE 3.5.1/461 <D5> (thread=Cluster, member=n/a): Service Cluster left the cluster
2012-07-18 12:40:33.190/991.700 Oracle Coherence GE 3.5.1/461 <Error> (thread=queue://authenticationService.logonEvent.consumer-2, member=n/a): Error while starting cluster: com.tangosol.net.RequestTimeoutException:
Timeout during service start: ServiceInfo(Id=0, Name=Cluster, Type=Cluster
MemberSet=ServiceMemberSet(
OldestMember=n/a
ActualMemberSet=MemberSet(Size=0, BitSetCount=0
MemberId/ServiceVersion/ServiceJoined/ServiceLeaving
at com.tangosol.coherence.component.util.daemon.queueProcessor.service.Grid.onStartupTimeout(Grid.CDB:6)
at com.tangosol.coherence.component.util.daemon.queueProcessor.Service.start(Service.CDB:28)
at com.tangosol.coherence.component.util.daemon.queueProcessor.service.Grid.start(Grid.CDB:38)
at com.tangosol.coherence.component.net.Cluster.onStart(Cluster.CDB:366)
at com.tangosol.coherence.component.net.Cluster.start(Cluster.CDB:11)
at com.tangosol.coherence.component.util.SafeCluster.startCluster(SafeCluster.CDB:3)
at com.tangosol.coherence.component.util.SafeCluster.restartCluster(SafeCluster.CDB:7)
at com.tangosol.coherence.component.util.SafeCluster.ensureRunningCluster(SafeCluster.CDB:27)
at com.tangosol.coherence.component.util.SafeCluster.start(SafeCluster.CDB:2)
at com.tangosol.net.CacheFactory.ensureCluster(CacheFactory.java:1011)
at com.tangosol.coherence.hibernate.CoherenceCacheProvider.nextTimestamp(CoherenceCacheProvider.java:58)
at org.hibernate.cache.impl.bridge.RegionFactoryCacheProviderBridge.nextTimestamp(RegionFactoryCacheProviderBridge.java:93)
at org.hibernate.impl.SessionFactoryImpl.openSession(SessionFactoryImpl.java:652)
at org.hibernate.ejb.EntityManagerImpl.getRawSession(EntityManagerImpl.java:111)
at org.hibernate.ejb.EntityManagerImpl.getSession(EntityManagerImpl.java:91)
at org.hibernate.ejb.AbstractEntityManagerImpl.setDefaultProperties(AbstractEntityManagerImpl.java:250)
at org.hibernate.ejb.AbstractEntityManagerImpl.postInit(AbstractEntityManagerImpl.java:162)
at org.hibernate.ejb.EntityManagerImpl.<init>(EntityManagerImpl.java:84)
at org.hibernate.ejb.EntityManagerFactoryImpl.createEntityManager(EntityManagerFactoryImpl.java:112)
at org.hibernate.ejb.EntityManagerFactoryImpl.createEntityManager(EntityManagerFactoryImpl.java:107)
at org.springframework.orm.jpa.JpaTransactionManager.createEntityManagerForTransaction(JpaTransactionManager.java:399)
at org.springframework.orm.jpa.JpaTransactionManager.doBegin(JpaTransactionManager.java:321)
at org.springframework.transaction.support.AbstractPlatformTransactionManager.getTransaction(AbstractPlatformTransactionManager.java:371)
at org.springframework.transaction.interceptor.TransactionAspectSupport.createTransactionIfNecessary(TransactionAspectSupport.java:335)
at org.springframework.transaction.interceptor.TransactionInterceptor.invoke(TransactionInterceptor.java:105)
at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:172)
at org.springframework.aop.framework.Cglib2AopProxy$DynamicAdvisedInterceptor.intercept(Cglib2AopProxy.java:621)
at org.springframework.jms.listener.AbstractMessageListenerContainer.doInvokeListener(AbstractMessageListenerContainer.java:560)
at org.springframework.jms.listener.AbstractMessageListenerContainer.invokeListener(AbstractMessageListenerContainer.java:498)
at org.springframework.jms.listener.AbstractMessageListenerContainer.doExecuteListener(AbstractMessageListenerContainer.java:467)
at org.springframework.jms.listener.AbstractPollingMessageListenerContainer.doReceiveAndExecute(AbstractPollingMessageListenerContainer.java:325)
at org.springframework.jms.listener.AbstractPollingMessageListenerContainer.receiveAndExecute(AbstractPollingMessageListenerContainer.java:263)
at org.springframework.jms.listener.DefaultMessageListenerContainer$AsyncMessageListenerInvoker.invokeListener(DefaultMessageListenerContainer.java:1058)
at org.springframework.jms.listener.DefaultMessageListenerContainer$AsyncMessageListenerInvoker.executeOngoingLoop(DefaultMessageListenerContainer.java:1050)
at org.springframework.jms.listener.DefaultMessageListenerContainer$AsyncMessageListenerInvoker.run(DefaultMessageListenerContainer.java:947)
at java.lang.Thread.run(Thread.java:662)
2012-07-18 12:40:33.216/991.726 Oracle Coherence GE 3.5.1/461 <D5> (thread=Cluster, member=n/a): Service Cluster joined the cluster with senior service member n/a
2012-07-18 12:40:50.398/1008.908 Oracle Coherence GE 3.5.1/461 <Warning> (thread=Cluster, member=n/a): This Member(Id=0, Timestamp=2012-07-18 12:40:33.194, Address=10.34.32.107:8089, MachineId=2155, Location=machine:dev1sxapp2,process:20831, Role=ApacheCatalinaStartupBootstrap) has been attempting to join the cluster at address 237.0.0.1:40109 with TTL 4 for 17 seconds without success; this could indicate a mis-configured TTL value, or it may simply be the result of a busy cluster or active failover.
This particular JVM then goes into a loop: it receives many messages from other nodes about the existing cluster but fails to join.
I have run the MulticastTest and the Datagram test, which didn't reveal any obvious network issue. What should I do next?
JVM is 1.6.0_31
Thanks a lot in advance, any help will be greatly appreciated.
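One possible next step is to line up the logs of several nodes by timestamp, level and member. The following is a minimal sketch of a parser for the log line format shown in the excerpts above. The class name and the regular expression are my own and assume the default log format seen in this post. Stack-trace lines and multi-line message bodies do not match and return null.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical parser for log header lines in the format shown in this thread. */
final class CoherenceLogLine {
    // Groups: timestamp, uptime in seconds, level, thread name, member id, message.
    private static final Pattern LINE = Pattern.compile(
            "^(\\d{4}-\\d{2}-\\d{2} \\d{2}:\\d{2}:\\d{2}\\.\\d{3})/(\\d+\\.\\d+) "
            + "Oracle Coherence \\S+ \\S+ <(\\w+)> "
            + "\\(thread=(.*?), member=([^)]*)\\):\\s?(.*)$");

    final String timestamp;
    final String level;
    final String thread;
    final String member;
    final String message;

    private CoherenceLogLine(Matcher m) {
        this.timestamp = m.group(1);
        this.level = m.group(3);
        this.thread = m.group(4);
        this.member = m.group(5);
        this.message = m.group(6);
    }

    /** Returns the parsed line, or null if the line is not a log header line. */
    static CoherenceLogLine parse(String line) {
        Matcher m = LINE.matcher(line);
        return m.matches() ? new CoherenceLogLine(m) : null;
    }
}
```

Filtering the parsed lines to the Warning and Error levels and sorting them by timestamp across all servers would show which member left or restarted just before a node started failing to join.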

I correlated the logs from all servers and found that the issue might be caused by a member that this node was connected to being restarted at the time.
Server 1:
- It starts as member 23, discovers the existing cluster, and joins it. Server 1 then logs many messages about different members joining the cluster with different member IDs.
- Then it finds that a member has failed to respond:
2012-07-30 22:00:25.371/34.325 Oracle Coherence GE 3.5.1/461 <D6> (thread=PacketPublisher, member=n/a): Member(Id=5, Timestamp=2012-07-30 15:35:09.735, Address=10.34.32.107:8089, MachineId=2155, Location=machine:dev1sxapp2,process:21324, Role=ApacheCatalinaStartupBootstrap) has failed to respond to 17 packets; declaring this member as paused.
- Then it requests departure confirmation for member 5:
2012-07-30 22:00:52.042/60.996 Oracle Coherence GE 3.5.1/461 <Warning> (thread=PacketPublisher, member=n/a): Timeout while delivering a packet Directed{PacketType=0x0DDF00D5, ToId=0, FromId=23, Direction=Outgoing, SentCount=145, SentMillis=22:00:51.832, ToMemberSet=[5(1)], ServiceId=0, MessageType=16, FromMessageId=6, ToMessageId=0, MessagePartCount=1, MessagePartIndex=0, NackInProgress=false, ResendScheduled=22:00:52.32, Timeout=22:00:51.849, PendingResendSkips=0, DeliveryState=unsent, Body=0x0000000200000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000..., Body.length=1398}; requesting the departure confirmation for Member(Id=5, Timestamp=2012-07-30 15:35:09.735, Address=10.34.32.107:8089, MachineId=2155, Location=machine:dev1sxapp2,process:21324, Role=ApacheCatalinaStartupBootstrap)
by MemberSet(Size=2, BitSetCount=2
Member(Id=1, Timestamp=2012-07-27 10:46:51.616, Address=10.34.32.101:8088, MachineId=2149, Location=machine:dev1ssapp3,process:1, Role=CoherenceServer)
Member(Id=12, Timestamp=2012-07-30 22:00:43.132, Address=10.34.32.101:8091, MachineId=2149, Location=machine:dev1ssapp3,process:7803, Role=ApacheCatalinaStartupBootstrap)
- Then the member set confirmed the departure; however, at the same time the Cluster service also left:
2012-07-30 22:00:52.046/61.000 Oracle Coherence GE 3.5.1/461 <Info> (thread=Cluster, member=n/a): Member departure confirmed by MemberSet(Size=1, BitSetCount=2
Member(Id=12, Timestamp=2012-07-30 22:00:43.132, Address=10.34.32.101:8091, MachineId=2149, Location=machine:dev1ssapp3,process:7803, Role=ApacheCatalinaStartupBootstrap)
); removing Member(Id=5, Timestamp=2012-07-30 15:35:09.735, Address=10.34.32.107:8089, MachineId=2155, Location=machine:dev1sxapp2,process:21324, Role=ApacheCatalinaStartupBootstrap)
2012-07-30 22:00:52.046/61.000 Oracle Coherence GE 3.5.1/461 <D5> (thread=Cluster, member=n/a): Member(Id=5, Timestamp=2012-07-30 22:00:52.046, Address=10.34.32.107:8089, MachineId=2155, Location=machine:dev1sxapp2,process:21324, Role=ApacheCatalinaStartupBootstrap) left Cluster with senior member 1
2012-07-30 22:00:52.049/61.003 Oracle Coherence GE 3.5.1/461 <D5> (thread=Cluster, member=n/a): Service Cluster left the cluster
- Then the service start times out, so the application fails to start:
2012-07-30 22:00:52.051/61.005 Oracle Coherence GE 3.5.1/461 <Error> (thread=main, member=n/a): Error while starting cluster: com.tangosol.net.RequestTimeoutException: Timeout during service start: ServiceInfo(Id=0, Name=Cluster, Type=Cluster
MemberSet=ServiceMemberSet(
OldestMember=n/a
ActualMemberSet=MemberSet(Size=15, BitSetCount=2
Member(Id=1, Timestamp=2012-07-27 10:46:51.616, Address=10.34.32.101:8088, MachineId=2149, Location=machine:dev1ssapp3,process:1, Role=CoherenceServer)
Member(Id=2, Timestamp=2012-07-27 10:47:12.122, Address=10.34.32.101:8089, MachineId=2149, Location=machine:dev1ssapp3,process:2, Role=CoherenceServer)
Member(Id=3, Timestamp=2012-07-27 10:48:02.603, Address=10.34.32.104:8088, MachineId=2152, Location=machine:dev1ssapp4,process:1, Role=CoherenceServer)
Member(Id=4, Timestamp=2012-07-27 10:48:04.76, Address=10.34.32.104:8089, MachineId=2152, Location=machine:dev1ssapp4,process:2, Role=CoherenceServer)
Member(Id=8, Timestamp=2012-07-30 14:27:07.382, Address=10.34.32.101:8090, MachineId=2149, Location=machine:dev1ssapp3,process:23727, Role=ApacheCatalinaStartupBootstrap)
Member(Id=9, Timestamp=2012-07-30 22:00:28.596, Address=10.34.32.101:8092, MachineId=2149, Location=machine:dev1ssapp3,process:7619, Role=ApacheCatalinaStartupBootstrap)
Member(Id=10, Timestamp=2012-07-30 14:34:27.573, Address=10.34.32.104:8090, MachineId=2152, Location=machine:dev1ssapp4,process:25219, Role=ApacheCatalinaStartupBootstrap)
Member(Id=11, Timestamp=2012-07-30 22:00:41.609, Address=10.34.32.107:8088, MachineId=2155, Location=machine:dev1sxapp2,process:17632, Role=ApacheCatalinaStartupBootstrap)
Member(Id=12, Timestamp=2012-07-30 22:00:43.132, Address=10.34.32.101:8091, MachineId=2149, Location=machine:dev1ssapp3,process:7803, Role=ApacheCatalinaStartupBootstrap)
Member(Id=14, Timestamp=2012-07-30 15:35:09.811, Address=10.34.32.106:8088, MachineId=2154, Location=machine:dev1sxapp1,process:5186, Role=ApacheCatalinaStartupBootstrap)
Member(Id=15, Timestamp=2012-07-30 16:02:34.096, Address=10.34.32.106:8091, MachineId=2154, Location=machine:dev1sxapp1,process:2691, Role=ApacheCatalinaStartupBootstrap)
Member(Id=16, Timestamp=2012-07-30 16:08:41.885, Address=10.34.32.107:8091, MachineId=2155, Location=machine:dev1sxapp2,process:15992, Role=ApacheCatalinaStartupBootstrap)
Member(Id=21, Timestamp=2012-07-30 21:58:56.669, Address=10.34.32.106:8089, MachineId=2154, Location=machine:dev1sxapp1,process:28689, Role=ApacheCatalinaStartupBootstrap)
Member(Id=22, Timestamp=2012-07-30 21:58:58.29, Address=10.34.32.107:8090, MachineId=2155, Location=machine:dev1sxapp2,process:15491, Role=ApacheCatalinaStartupBootstrap)
Member(Id=23, Timestamp=2012-07-30 22:00:21.648, Address=10.34.32.106:8090, MachineId=2154, Location=machine:dev1sxapp1,process:556, Role=ApacheCatalinaStartupBootstrap)
MemberId/ServiceVersion/ServiceJoined/ServiceLeaving
1/3.5/Fri Jul 27 10:46:51 EDT 2012/false,
2/3.5/Fri Jul 27 10:47:12 EDT 2012/false,
3/3.5/Fri Jul 27 10:48:02 EDT 2012/false,
4/3.5/Fri Jul 27 10:48:04 EDT 2012/false,
8/3.5/Mon Jul 30 14:27:07 EDT 2012/false,
9/3.5/Mon Jul 30 22:00:28 EDT 2012/false,
10/3.5/Mon Jul 30 14:34:27 EDT 2012/false,
11/3.5/Mon Jul 30 22:00:41 EDT 2012/false,
12/3.5/Mon Jul 30 22:00:43 EDT 2012/false,
14/3.5/Mon Jul 30 15:35:09 EDT 2012/false,
15/3.5/Mon Jul 30 16:02:34 EDT 2012/false,
16/3.5/Mon Jul 30 16:08:41 EDT 2012/false,
21/3.5/Mon Jul 30 21:58:56 EDT 2012/false,
22/3.5/Mon Jul 30 21:58:58 EDT 2012/false,
23/3.5/Mon Jul 30 22:00:21 EDT 2012/false
     at com.tangosol.coherence.component.util.daemon.queueProcessor.service.Grid.onStartupTimeout(Grid.CDB:6)
     at com.tangosol.coherence.component.util.daemon.queueProcessor.Service.start(Service.CDB:28)
     at com.tangosol.coherence.component.util.daemon.queueProcessor.service.Grid.start(Grid.CDB:38)
     at com.tangosol.coherence.component.net.Cluster.onStart(Cluster.CDB:366)
     at com.tangosol.coherence.component.net.Cluster.start(Cluster.CDB:11)
     at com.tangosol.coherence.component.util.SafeCluster.startCluster(SafeCluster.CDB:3)
     at com.tangosol.coherence.component.util.SafeCluster.restartCluster(SafeCluster.CDB:7)
     at com.tangosol.coherence.component.util.SafeCluster.ensureRunningCluster(SafeCluster.CDB:27)
     at com.tangosol.coherence.component.util.SafeCluster.start(SafeCluster.CDB:2)
     at com.tangosol.net.CacheFactory.ensureCluster(CacheFactory.java:1011)
     at com.tangosol.coherence.hibernate.CoherenceCacheProvider.start(CoherenceCacheProvider.java:73)
     at org.hibernate.cache.impl.bridge.RegionFactoryCacheProviderBridge.start(RegionFactoryCacheProviderBridge.java:72)
Looking at member 5's log, I found that it was being bounced at that time, but somehow it failed to stop the Coherence thread and did not send a departure event to the cluster until other members requested it.
SEVERE: The web application [riding-services] appears to have started a thread named Cluster but has failed to stop it. This is very likely to create a memory leak.
Questions:
1. This issue seems to happen only when one server starts while another server is shutting down at around the same time, and the two happen to be connected to each other for distributed caching. How can I modify the script to retry startup after the first attempt times out? Or should I change the configuration to use a longer timeout value?
2. Is it possible to detect the unavailability of a member more quickly? It currently seems to take 30 seconds or more.
Thanks in advance,
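For question 1, retrying in application code is one option. The following is a minimal sketch of a generic retry helper with exponential backoff. StartupRetry and its parameters are hypothetical names, not part of Coherence. It assumes the application can call CacheFactory.ensureCluster() itself before Hibernate's cache provider does, and that a failed join surfaces as an exception, as the RequestTimeoutException in the log above does.

```java
import java.util.concurrent.Callable;

/** Hypothetical helper: retries a startup action with exponential backoff. */
final class StartupRetry {
    /**
     * Calls the action up to maxAttempts times, sleeping between attempts.
     * The delay starts at initialDelayMillis and doubles after each failure.
     * Rethrows the last exception if every attempt fails.
     */
    static <T> T retry(Callable<T> action, int maxAttempts, long initialDelayMillis)
            throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        long delay = initialDelayMillis;
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return action.call();
            } catch (Exception e) {
                last = e;
                if (attempt < maxAttempts) {
                    Thread.sleep(delay); // wait before the next attempt
                    delay *= 2;
                }
            }
        }
        throw last;
    }
}
```

On the 1.6 JVM used here, the action would be an anonymous Callable whose call() returns CacheFactory.ensureCluster(); for example, 3 attempts with an initial delay of 5000 ms. The numbers are placeholders. Retrying does not address the underlying cause, which appears to be the member that failed to leave the cluster cleanly.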

Questions about a localstorage=false client restarting and rejoining

Hi, we are using Coherence 3.4.2 and I have noticed a strange problem in our application related to Coherence, so I am posting here to ask for help. We are using JDK 1.6.0_16 on 64-bit Linux.
To simplify the problem, let me just say that we have 3 JVMs running for now. One, which we call the data server, is started with the following JVM settings:
java -Xms1G -Xmx1G -server -Dlog4j.configuration=log4j.xml
-Dtangosol.coherence.cacheconfig=/xxx/coherence_config.xml
-Dtangosol.coherence.ttl=0
-Dtangosol.coherence.log=log4j
-Dtangosol.coherence.clusterport=5511
-Dtangosol.coherence.distributed.localstorage=true
The other 2 JVMs will use the same JVM settings as above, except -Dtangosol.coherence.distributed.localstorage=false.
When I start all the servers in this order: data server -> one client JVM -> the other client JVM, I can see that they all join one cluster, and our application works fine.
Now, if I shut down client1 and client2 and restart them, the problem appears:
1) From the Coherence log, I can see that client1 and client2 rejoined the cluster.
2) But when client1 tries to get the cache using the following API:
NamedCache cache = CacheFactory.getCache("cacheName");
Object o = (Object)cache.get(id);
The cache object is there, but without any elements in it, so Object o is always null. But I know that in this case there is one element in the cache, which I put in just before restarting client1 and client2.
So in this case I get a cache object back, but with no elements in it. This was not the case before the JVMs were restarted.
3) In the above case, the cache is defined as a replicated cache as follows:
          <cache-mapping>
               <cache-name>cacheName</cache-name>
               <scheme-name>ReplicatedScheme</scheme-name>
          </cache-mapping>
          <replicated-scheme>
               <scheme-name>ReplicatedScheme</scheme-name>
               <backing-map-scheme>
                    <local-scheme>
                         <scheme-ref>unlimited-backing-map</scheme-ref>
                    </local-scheme>
               </backing-map-scheme>
          </replicated-scheme>
          
          <local-scheme>
               <scheme-name>unlimited-backing-map</scheme-name>
          </local-scheme>
4) I tried to use the SimpleCacheExplorer example that comes with Coherence to see if I could reproduce this case, but I could not. I used the same coherence.xml as our application and the same cache name in SimpleCacheExplorer as in our application. I started 3 JVMs, one with localstorage=true and the other 2 with localstorage=false. Every time I restarted the other 2 JVMs, they rejoined the cluster and got the value for the original key. So I am not sure which part of our application breaks this.
5) I list the Coherence log from our application below. I added 2-3 comment lines just to note what happened at each point.
Any ideas or hints about why this is happening? Thanks for your help.

Here is the log:
############ start the data server
2009-11-19 14:55:07,389 Coherence Logger@1398577124 3.4.2/411 INFO   2009-11-19 14:55:07.132/65.613 Oracle Coherence 3.4.2/411 <Info> (thread=http-8080-1, member=n/a): Loaded operational configuration from resource "jar:file:/datacloud/trunk/install/apache-tomcat-6.0.18/webapps/datacloud/WEB-INF/lib/coherence.jar!/tangosol-coherence.xml"
2009-11-19 14:55:07,389 Coherence Logger@1398577124 3.4.2/411 INFO   2009-11-19 14:55:07.187/65.668 Oracle Coherence 3.4.2/411 <Info> (thread=http-8080-1, member=n/a): Loaded operational overrides from resource "jar:file:/datacloud/trunk/install/apache-tomcat-6.0.18/webapps/datacloud/WEB-INF/lib/coherence.jar!/tangosol-coherence-override-dev.xml"
2009-11-19 14:55:07,389 Coherence Logger@1398577124 3.4.2/411 DEBUG 2009-11-19 14:55:07.187/65.668 Oracle Coherence 3.4.2/411 <D5> (thread=http-8080-1, member=n/a): Optional configuration override "/tangosol-coherence-override.xml" is not specified
2009-11-19 14:55:07,389 Coherence Logger@1398577124 3.4.2/411 DEBUG 2009-11-19 14:55:07.190/65.671 Oracle Coherence 3.4.2/411 <D5> (thread=http-8080-1, member=n/a): Optional configuration override "/custom-mbeans.xml" is not specified
2009-11-19 14:55:07,389 Coherence Logger@1398577124 3.4.2/411 DEBUG
Oracle Coherence Version 3.4.2/411
Grid Edition: Development mode
Copyright (c) 2000-2009 Oracle. All rights reserved.
2009-11-19 14:55:08,356 Coherence Logger@9218328 3.4.2/411 DEBUG 2009-11-19 14:55:08.351/863.022 Oracle Coherence GE 3.4.2/411 <D5> (thread=Cluster, member=1): Member(Id=2, Timestamp=2009-11-19 14:55:08.15, Address=10.241.59.246:8089, MachineId=49398, Location=site:americas.nokia.com,machine:daec-linuxvpn059246,process:30265, Role=ApacheCatalinaStartupBootstrap) joined Cluster with senior member 1
2009-11-19 14:55:08,360 Coherence Logger@9230760 3.4.2/411 INFO   2009-11-19 14:55:08.360/66.841 Oracle Coherence GE 3.4.2/411 <Info> (thread=Cluster, member=n/a): This Member(Id=2, Timestamp=2009-11-19 14:55:08.15, Address=10.241.59.246:8089, MachineId=49398, Location=site:americas.nokia.com,machine:daec-linuxvpn059246,process:30265, Role=ApacheCatalinaStartupBootstrap, Edition=Grid Edition, Mode=Development, CpuCount=2, SocketCount=1) joined cluster "cluster:0xC3E1" with senior Member(Id=1, Timestamp=2009-11-19 14:40:47.04, Address=10.241.59.246:8088, MachineId=49398, Location=site:americas.nokia.com,machine:daec-linuxvpn059246,process:29459, Role=NokiaDcServerDataServer, Edition=Grid Edition, Mode=Development, CpuCount=2, SocketCount=1)
2009-11-19 14:55:08,386 Coherence Logger@9230760 3.4.2/411 DEBUG 2009-11-19 14:55:08.385/66.866 Oracle Coherence GE 3.4.2/411 <D5> (thread=Cluster, member=n/a): Member 1 joined Service ExpiryService with senior member 1
2009-11-19 14:55:08,386 Coherence Logger@9230760 3.4.2/411 DEBUG 2009-11-19 14:55:08.385/66.866 Oracle Coherence GE 3.4.2/411 <D5> (thread=Cluster, member=n/a): Member 1 joined Service DistSearchService with senior member 1
2009-11-19 14:55:08,386 Coherence Logger@9230760 3.4.2/411 DEBUG 2009-11-19 14:55:08.385/66.866 Oracle Coherence GE 3.4.2/411 <D5> (thread=Cluster, member=n/a): Member 1 joined Service DistributedCache with senior member 1
2009-11-19 14:55:08,386 Coherence Logger@9230760 3.4.2/411 DEBUG 2009-11-19 14:55:08.385/66.866 Oracle Coherence GE 3.4.2/411 <D5> (thread=Cluster, member=n/a): Member 1 joined Service DistSmallLRUService with senior member 1
2009-11-19 14:55:08,386 Coherence Logger@9230760 3.4.2/411 DEBUG 2009-11-19 14:55:08.385/66.866 Oracle Coherence GE 3.4.2/411 <D5> (thread=Cluster, member=n/a): Member 1 joined Service DistSmallLFUService with senior member 1
2009-11-19 14:55:08,386 Coherence Logger@9230760 3.4.2/411 DEBUG 2009-11-19 14:55:08.385/66.866 Oracle Coherence GE 3.4.2/411 <D5> (thread=Cluster, member=n/a): Member 1 joined Service DistSmallHYBRIDService with senior member 1
2009-11-19 14:55:08,386 Coherence Logger@9230760 3.4.2/411 DEBUG 2009-11-19 14:55:08.385/66.866 Oracle Coherence GE 3.4.2/411 <D5> (thread=Cluster, member=n/a): Member 1 joined Service DistMediumLRUService with senior member 1
2009-11-19 14:55:08,386 Coherence Logger@9230760 3.4.2/411 DEBUG 2009-11-19 14:55:08.385/66.866 Oracle Coherence GE 3.4.2/411 <D5> (thread=Cluster, member=n/a): Member 1 joined Service DistMediumLFUService with senior member 1
2009-11-19 14:55:08,386 Coherence Logger@9230760 3.4.2/411 DEBUG 2009-11-19 14:55:08.385/66.866 Oracle Coherence GE 3.4.2/411 <D5> (thread=Cluster, member=n/a): Member 1 joined Service DistMediumHYBRIDService with senior member 1
2009-11-19 14:55:08,386 Coherence Logger@9230760 3.4.2/411 DEBUG 2009-11-19 14:55:08.385/66.866 Oracle Coherence GE 3.4.2/411 <D5> (thread=Cluster, member=n/a): Member 1 joined Service DistLargeLRUService with senior member 1
2009-11-19 14:55:08,386 Coherence Logger@9230760 3.4.2/411 DEBUG 2009-11-19 14:55:08.385/66.866 Oracle Coherence GE 3.4.2/411 <D5> (thread=Cluster, member=n/a): Member 1 joined Service DistLargeLFUService with senior member 1
2009-11-19 14:55:08,387 Coherence Logger@9230760 3.4.2/411 DEBUG 2009-11-19 14:55:08.385/66.866 Oracle Coherence GE 3.4.2/411 <D5> (thread=Cluster, member=n/a): Member 1 joined Service DistLargeHYBRIDService with senior member 1
2009-11-19 14:55:08,682 Coherence Logger@9218328 3.4.2/411 DEBUG 2009-11-19 14:55:08.681/863.352 Oracle Coherence GE 3.4.2/411 <D5> (thread=Cluster, member=1): Member 2 joined Service DistributedCache with senior member 1
2009-11-19 14:55:08,686 Coherence Logger@9230760 3.4.2/411 DEBUG 2009-11-19 14:55:08.686/67.167 Oracle Coherence GE 3.4.2/411 <D5> (thread=DistributedCache, member=2): Service DistributedCache joined the cluster with senior service member 1
2009-11-19 14:55:08,690 Coherence Logger@9218328 3.4.2/411 DEBUG 2009-11-19 14:55:08.686/863.357 Oracle Coherence GE 3.4.2/411 <D5> (thread=DistributedCache, member=1): Service DistributedCache: sending ServiceConfigSync containing 258 entries to Member 2
2009-11-19 14:55:08,693 Coherence Logger@9230760 3.4.2/411 DEBUG 2009-11-19 14:55:08.693/67.174 Oracle Coherence GE 3.4.2/411 <D5> (thread=DistributedCache, member=2): Service DistributedCache: received ServiceConfigSync containing 258 entries
2009-11-19 14:55:09,593 Coherence Logger@9218328 3.4.2/411 DEBUG 2009-11-19 14:55:09.593/864.264 Oracle Coherence GE 3.4.2/411 <D5> (thread=Cluster, member=1): TcpRing: connecting to member 2 using TcpSocket{State=STATE_OPEN, Socket=Socket[addr=/10.241.59.246,port=8089,localport=37475]}
2009-11-19 14:55:09,597 Coherence Logger@9230760 3.4.2/411 DEBUG 2009-11-19 14:55:09.596/68.077 Oracle Coherence GE 3.4.2/411 <D5> (thread=TcpRingListener, member=2): TcpRing: connecting to member 1 using TcpSocket{State=STATE_OPEN, Socket=Socket[addr=/10.241.59.246,port=37475,localport=8089]}
2009-11-19 14:55:12,673 Coherence Logger@1144704103 3.4.2/411 INFO   2009-11-19 14:55:12.508/172.386 Oracle Coherence 3.4.2/411 <Info> (thread=pool-2-thread-1, member=n/a): Loaded operational configuration from resource "jar:file:/datacloud/trunk/install/lib/coherence.jar!/tangosol-coherence.xml"
2009-11-19 14:55:12,673 Coherence Logger@1144704103 3.4.2/411 INFO   2009-11-19 14:55:12.514/172.392 Oracle Coherence 3.4.2/411 <Info> (thread=pool-2-thread-1, member=n/a): Loaded operational overrides from resource "jar:file:/datacloud/trunk/install/lib/coherence.jar!/tangosol-coherence-override-dev.xml"
2009-11-19 14:55:12,673 Coherence Logger@1144704103 3.4.2/411 DEBUG 2009-11-19 14:55:12.514/172.392 Oracle Coherence 3.4.2/411 <D5> (thread=pool-2-thread-1, member=n/a): Optional configuration override "/tangosol-coherence-override.xml" is not specified
2009-11-19 14:55:12,673 Coherence Logger@1144704103 3.4.2/411 DEBUG 2009-11-19 14:55:12.517/172.395 Oracle Coherence 3.4.2/411 <D5> (thread=pool-2-thread-1, member=n/a): Optional configuration override "/custom-mbeans.xml" is not specified
2009-11-19 14:55:12,673 Coherence Logger@1144704103 3.4.2/411 DEBUG
Oracle Coherence Version 3.4.2/411
Grid Edition: Development mode
Copyright (c) 2000-2009 Oracle. All rights reserved.
2009-11-19 15:04:23,811 Coherence Logger@9249408 3.4.2/411 DEBUG 2009-11-19 15:04:23.811/62.250 Oracle Coherence GE 3.4.2/411 <D5> (thread=Cluster, member=n/a): Member 1 joined Service DistSmallLRUService with senior member 1
2009-11-19 15:04:23,811 Coherence Logger@9249408 3.4.2/411 DEBUG 2009-11-19 15:04:23.811/62.250 Oracle Coherence GE 3.4.2/411 <D5> (thread=Cluster, member=n/a): Member 1 joined Service DistSmallLFUService with senior member 1
2009-11-19 15:04:23,812 Coherence Logger@9249408 3.4.2/411 DEBUG 2009-11-19 15:04:23.811/62.250 Oracle Coherence GE 3.4.2/411 <D5> (thread=Cluster, member=n/a): Member 1 joined Service DistSmallHYBRIDService with senior member 1
2009-11-19 15:04:23,812 Coherence Logger@9249408 3.4.2/411 DEBUG 2009-11-19 15:04:23.812/62.251 Oracle Coherence GE 3.4.2/411 <D5> (thread=Cluster, member=n/a): Member 1 joined Service DistMediumLRUService with senior member 1
2009-11-19 15:04:23,812 Coherence Logger@9249408 3.4.2/411 DEBUG 2009-11-19 15:04:23.812/62.251 Oracle Coherence GE 3.4.2/411 <D5> (thread=Cluster, member=n/a): Member 1 joined Service DistMediumLFUService with senior member 1
2009-11-19 15:04:23,812 Coherence Logger@9249408 3.4.2/411 DEBUG 2009-11-19 15:04:23.812/62.251 Oracle Coherence GE 3.4.2/411 <D5> (thread=Cluster, member=n/a): Member 1 joined Service DistMediumHYBRIDService with senior member 1
2009-11-19 15:04:23,812 Coherence Logger@9249408 3.4.2/411 DEBUG 2009-11-19 15:04:23.812/62.251 Oracle Coherence GE 3.4.2/411 <D5> (thread=Cluster, member=n/a): Member 1 joined Service DistLargeLRUService with senior member 1
2009-11-19 15:04:23,812 Coherence Logger@9249408 3.4.2/411 DEBUG 2009-11-19 15:04:23.812/62.251 Oracle Coherence GE 3.4.2/411 <D5> (thread=Cluster, member=n/a): Member 1 joined Service DistLargeLFUService with senior member 1
2009-11-19 15:04:23,812 Coherence Logger@9249408 3.4.2/411 DEBUG 2009-11-19 15:04:23.812/62.251 Oracle Coherence GE 3.4.2/411 <D5> (thread=Cluster, member=n/a): Member 1 joined Service DistLargeHYBRIDService with senior member 1
2009-11-19 15:04:23,946 Coherence Logger@9218328 3.4.2/411 DEBUG 2009-11-19 15:04:23.946/1418.617 Oracle Coherence GE 3.4.2/411 <D5> (thread=Cluster, member=1): Member 2 joined Service DistributedCache with senior member 1
2009-11-19 15:04:23,953 Coherence Logger@9249408 3.4.2/411 DEBUG 2009-11-19 15:04:23.952/62.391 Oracle Coherence GE 3.4.2/411 <D5> (thread=DistributedCache, member=2): Service DistributedCache joined the cluster with senior service member 1
2009-11-19 15:04:23,955 Coherence Logger@9218328 3.4.2/411 DEBUG 2009-11-19 15:04:23.953/1418.624 Oracle Coherence GE 3.4.2/411 <D5> (thread=DistributedCache, member=1): Service DistributedCache: sending ServiceConfigSync containing 259 entries to Member 2
2009-11-19 15:04:23,959 Coherence Logger@9249408 3.4.2/411 DEBUG 2009-11-19 15:04:23.959/62.398 Oracle Coherence GE 3.4.2/411 <D5> (thread=DistributedCache, member=2): Service DistributedCache: received ServiceConfigSync containing 259 entries
2009-11-19 15:04:24,513 Coherence Logger@879081272 3.4.2/411 INFO   2009-11-19 15:04:24.351/181.083 Oracle Coherence 3.4.2/411 <Info> (thread=RMI TCP Connection(4)-127.0.0.2, member=n/a): Loaded operational configuration from resource "jar:file:/datacloud/trunk/install/lib/coherence.jar!/tangosol-coherence.xml"
2009-11-19 15:04:24,514 Coherence Logger@879081272 3.4.2/411 INFO   2009-11-19 15:04:24.355/181.087 Oracle Coherence 3.4.2/411 <Info> (thread=RMI TCP Connection(4)-127.0.0.2, member=n/a): Loaded operational overrides from resource "jar:file:/datacloud/trunk/install/lib/coherence.jar!/tangosol-coherence-override-dev.xml"
2009-11-19 15:04:24,514 Coherence Logger@879081272 3.4.2/411 DEBUG 2009-11-19 15:04:24.355/181.087 Oracle Coherence 3.4.2/411 <D5> (thread=RMI TCP Connection(4)-127.0.0.2, member=n/a): Optional configuration override "/tangosol-coherence-override.xml" is not specified
2009-11-19 15:04:24,514 Coherence Logger@879081272 3.4.2/411 DEBUG 2009-11-19 15:04:24.358/181.090 Oracle Coherence 3.4.2/411 <D5> (thread=RMI TCP Connection(4)-127.0.0.2, member=n/a): Optional configuration override "/custom-mbeans.xml" is not specified
2009-11-19 15:04:24,514 Coherence Logger@879081272 3.4.2/411 DEBUG
Oracle Coherence Version 3.4.2/411
Grid Edition: Development mode
Copyright (c) 2000-2009 Oracle. All rights reserved.
2009-11-19 15:04:24,546 Coherence Logger@879081272 3.4.2/411 INFO   2009-11-19 15:04:24.546/181.278 Oracle Coherence GE 3.4.2/411 <Info> (thread=RMI TCP Connection(4)-127.0.0.2, member=n/a): Loaded cache configuration from file "/datacloud/trunk/install/conf/coherence_config.xml"
2009-11-19 15:04:24,760 Coherence Logger@879081272 3.4.2/411 WARN   2009-11-19 15:04:24.760/181.492 Oracle Coherence GE 3.4.2/411 <Warning> (thread=RMI TCP Connection(4)-127.0.0.2, member=n/a): UnicastUdpSocket failed to set receive buffer size to 1428 packets (2096304 bytes); actual size is 89 packets (131071 bytes). Consult your OS documentation regarding increasing the maximum socket buffer size. Proceeding with the actual value may cause sub-optimal performance.
2009-11-19 15:04:24,786 Coherence Logger@9218328 3.4.2/411 DEBUG 2009-11-19 15:04:24.785/1419.456 Oracle Coherence GE 3.4.2/411 <D5> (thread=Cluster, member=1): TcpRing: connecting to member 2 using TcpSocket{State=STATE_OPEN, Socket=Socket[addr=/10.241.59.246,port=8089,localport=48995]}
2009-11-19 15:04:24,787 Coherence Logger@9249408 3.4.2/411 DEBUG 2009-11-19 15:04:24.787/63.226 Oracle Coherence GE 3.4.2/411 <D5> (thread=TcpRingListener, member=2): TcpRing: connecting to member 1 using TcpSocket{State=STATE_OPEN, Socket=Socket[addr=/10.241.59.246,port=48995,localport=8089]}
2009-11-19 15:04:24,882 Coherence Logger@879081272 3.4.2/411 DEBUG 2009-11-19 15:04:24.882/181.614 Oracle Coherence GE 3.4.2/411 <D5> (thread=Cluster, member=n/a): Service Cluster joined the cluster with senior service member n/a
2009-11-19 15:04:25,118 Coherence Logger@9218328 3.4.2/411 DEBUG 2009-11-19 15:04:25.112/1419.783 Oracle Coherence GE 3.4.2/411 <D5> (thread=Cluster, member=1): Member(Id=3, Timestamp=2009-11-19 15:04:24.917, Address=10.241.59.246:8090, MachineId=49398, Location=site:americas.nokia.com,machine:daec-linuxvpn059246,process:30657, Role=NokiaDcServerBuildBuildServer) joined Cluster with senior member 1
2009-11-19 15:04:25,118 Coherence Logger@9249408 3.4.2/411 DEBUG 2009-11-19 15:04:25.115/63.554 Oracle Coherence GE 3.4.2/411 <D5> (thread=Cluster, member=2): Member(Id=3, Timestamp=2009-11-19 15:04:24.917, Address=10.241.59.246:8090, MachineId=49398, Location=site:americas.nokia.com,machine:daec-linuxvpn059246,process:30657, Role=NokiaDcServerBuildBuildServer) joined Cluster with senior member 1
2009-11-19 15:04:25,120 Coherence Logger@9242415 3.4.2/411 INFO   2009-11-19 15:04:25.120/181.852 Oracle Coherence GE 3.4.2/411 <Info> (thread=Cluster, member=n/a): This Member(Id=3, Timestamp=2009-11-19 15:04:24.917, Address=10.241.59.246:8090, MachineId=49398, Location=site:americas.nokia.com,machine:daec-linuxvpn059246,process:30657, Role=NokiaDcServerBuildBuildServer, Edition=Grid Edition, Mode=Development, CpuCount=2, SocketCount=1) joined cluster "cluster:0xC3E1" with senior Member(Id=1, Timestamp=2009-11-19 14:40:47.04, Address=10.241.59.246:8088, MachineId=49398, Location=site:americas.nokia.com,machine:daec-linuxvpn059246,process:29459, Role=NokiaDcServerDataServer, Edition=Grid Edition, Mode=Development, CpuCount=2, SocketCount=1)
2009-11-19 15:04:25,127 Coherence Logger@9242415 3.4.2/411 DEBUG 2009-11-19 15:04:25.127/181.859 Oracle Coherence GE 3.4.2/411 <D5> (thread=Cluster, member=n/a): Member(Id=2, Timestamp=2009-11-19 15:04:23.541, Address=10.241.59.246:8089, MachineId=49398, Location=site:americas.nokia.com,machine:daec-linuxvpn059246,process:30753, Role=ApacheCatalinaStartupBootstrap) joined Cluster with senior member 1
2009-11-19 15:04:25,131 Coherence Logger@9242415 3.4.2/411 DEBUG 2009-11-19 15:04:25.131/181.863 Oracle Coherence GE 3.4.2/411 <D5> (thread=Cluster, member=n/a): Member 1 joined Service ExpiryService with senior member 1
2009-11-19 15:04:25,131 Coherence Logger@9242415 3.4.2/411 DEBUG 2009-11-19 15:04:25.131/181.863 Oracle Coherence GE 3.4.2/411 <D5> (thread=Cluster, member=n/a): Member 1 joined Service DistSearchService with senior member 1
2009-11-19 15:04:25,132 Coherence Logger@9242415 3.4.2/411 DEBUG 2009-11-19 15:04:25.132/181.864 Oracle Coherence GE 3.4.2/411 <D5> (thread=Cluster, member=n/a): Member 1 joined Service DistributedCache with senior member 1
2009-11-19 15:04:25,132 Coherence Logger@9242415 3.4.2/411 DEBUG 2009-11-19 15:04:25.132/181.864 Oracle Coherence GE 3.4.2/411 <D5> (thread=Cluster, member=n/a): Member 1 joined Service DistSmallLRUService with senior member 1
2009-11-19 15:04:25,132 Coherence Logger@9242415 3.4.2/411 DEBUG 2009-11-19 15:04:25.132/181.864 Oracle Coherence GE 3.4.2/411 <D5> (thread=Cluster, member=n/a): Member 1 joined Service DistSmallLFUService with senior member 1
2009-11-19 15:04:25,133 Coherence Logger@9242415 3.4.2/411 DEBUG 2009-11-19 15:04:25.132/181.864 Oracle Coherence GE 3.4.2/411 <D5> (thread=Cluster, member=n/a): Member 1 joined Service DistSmallHYBRIDService with senior member 1
2009-11-19 15:04:25,133 Coherence Logger@9242415 3.4.2/411 DEBUG 2009-11-19 15:04:25.133/181.865 Oracle Coherence GE 3.4.2/411 <D5> (thread=Cluster, member=n/a): Member 1 joined Service DistMediumLRUService with senior member 1
2009-11-19 15:04:25,133 Coherence Logger@9242415 3.4.2/411 DEBUG 2009-11-19 15:04:25.133/181.865 Oracle Coherence GE 3.4.2/411 <D5> (thread=Cluster, member=n/a): Member 1 joined Service DistMediumLFUService with senior member 1
2009-11-19 15:04:25,133 Coherence Logger@9242415 3.4.2/411 DEBUG 2009-11-19 15:04:25.133/181.865 Oracle Coherence GE 3.4.2/411 <D5> (thread=Cluster, member=n/a): Member 1 joined Service DistMediumHYBRIDService with senior member 1
2009-11-19 15:04:25,133 Coherence Logger@9242415 3.4.2/411 DEBUG 2009-11-19 15:04:25.133/181.865 Oracle Coherence GE 3.4.2/411 <D5> (thread=Cluster, member=n/a): Member 1 joined Service DistLargeLRUService with senior member 1
2009-11-19 15:04:25,133 Coherence Logger@9242415 3.4.2/411 DEBUG 2009-11-19 15:04:25.133/181.865 Oracle Coherence GE 3.4.2/411 <D5> (thread=Cluster, member=n/a): Member 1 joined Service DistLargeLFUService with senior member 1
2009-11-19 15:04:25,133 Coherence Logger@9242415 3.4.2/411 DEBUG 2009-11-19 15:04:25.133/181.865 Oracle Coherence GE 3.4.2/411 <D5> (thread=Cluster, member=n/a): Member 1 joined Service DistLargeHYBRIDService with senior member 1
2009-11-19 15:04:25,134 Coherence Logger@9242415 3.4.2/411 DEBUG 2009-11-19 15:04:25.134/181.866 Oracle Coherence GE 3.4.2/411 <D5> (thread=Cluster, member=n/a): Member 2 joined Service DistributedCache with senior member 1
2009-11-19 15:04:25,219 Coherence Logger@9218328 3.4.2/411 DEBUG 2009-11-19 15:04:25.219/1419.890 Oracle Coherence GE 3.4.2/411 <D5> (thread=Cluster, member=1): Member 3 joined Service ReplicatedCache with senior member 3
2009-11-19 15:04:25,220 Coherence Logger@9249408 3.4.2/411 DEBUG 2009-11-19 15:04:25.220/63.659 Oracle Coherence GE 3.4.2/411 <D5> (thread=Cluster, member=2): Member 3 joined Service ReplicatedCache with senior member 3
2009-11-19 15:04:25,223 Coherence Logger@9242415 3.4.2/411 DEBUG 2009-11-19 15:04:25.223/181.955 Oracle Coherence GE 3.4.2/411 <D5> (thread=ReplicatedCache, member=3): Service ReplicatedCache joined the cluster with senior service member 3
2009-11-19 15:04:26,310 Coherence Logger@9242415 3.4.2/411 DEBUG 2009-11-19 15:04:26.309/183.041 Oracle Coherence GE 3.4.2/411 <D5> (thread=Cluster, member=3): TcpRing: connecting to member 1 using TcpSocket{State=STATE_OPEN, Socket=Socket[addr=/10.241.59.246,port=8088,localport=58713]}
2009-11-19 15:04:26,312 Coherence Logger@9218328 3.4.2/411 DEBUG 2009-11-19 15:04:26.312/1420.983 Oracle Coherence GE 3.4.2/411 <D5> (thread=TcpRingListener, member=1): TcpRing: connecting to member 3 using TcpSocket{State=STATE_OPEN, Socket=Socket[addr=/10.241.59.246,port=58713,localport=8088]}
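
One detail in the log above can be checked mechanically: every Dist* service is reported with senior member 1 (the data server), whereas ReplicatedCache is reported with senior member 3, the restarted client itself, and no line shows member 1 joining ReplicatedCache. That would be consistent with the client being the only member of the ReplicatedCache service at that moment, in which case there is nothing to replicate existing entries from. This is a reading of the log, not a confirmed diagnosis. The sketch below extracts the senior member per service from such log lines; `ServiceSeniorScan` and `seniorByService` are hypothetical names, and the regular expression assumes the message wording shown in this log.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical log helper: maps each service name to its reported senior member. */
final class ServiceSeniorScan {
    // Matches e.g. "Member 3 joined Service ReplicatedCache with senior member 3".
    private static final Pattern JOINED = Pattern.compile(
            "Member (\\d+) joined Service (\\S+) with senior member (\\d+)");

    private ServiceSeniorScan() {}

    /** Returns service name -> senior member id, keeping the last value seen per service. */
    static Map<String, Integer> seniorByService(List<String> logLines) {
        Map<String, Integer> seniors = new LinkedHashMap<>();
        for (String line : logLines) {
            Matcher m = JOINED.matcher(line);
            if (m.find()) {
                seniors.put(m.group(2), Integer.parseInt(m.group(3)));
            }
        }
        return seniors;
    }
}
```

Feeding it the lines of the client log (for example via `java.nio.file.Files.readAllLines`) and comparing the entry for ReplicatedCache against the data server's member id shows at a glance whether the two JVMs are in the same service.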

Coherence Cluster Errors - Need Your Help to Solve

Hi,
We had this error recently in QA. These servers are not new; they had been running for some time and were in good condition.
The error below happened suddenly and caused a server outage for some time.
After we restarted all the servers, the issue went away.
We are trying to understand the root cause so we can avoid this issue in the future, and we need the expertise of this forum for that.
Brief summary of the issue:
1. We performed multicast testing on the Coherence cluster IP and port, and all communication was good.
2. The issue started with an "Unable to refresh sockets" error:
     Stopping cluster due to unhandled exception: com.tangosol.net.messaging.ConnectionException: Unable to refresh sockets: [UnicastUdpSocket{State=STATE_OPEN, address:port=1.1.1.85:8088}, MulticastUdpSocket{State=STATE_OPEN, address:port=239.3.1.17:35122, InterfaceAddress=10.137.3.85, TimeToLive=1}, TcpSocketAccepter{State=STATE_OPEN, ServerSocket=1.1.1.85:8088}]; last failed socket: MulticastUdpSocket{State=STATE_OPEN, address:port=239.3.1.17:35122, InterfaceAddress=10.137.3.85, TimeToLive=1}
         at com.tangosol.coherence.component.net.Cluster$SocketManager.refreshSockets(Cluster.CDB:91)
         at com.tangosol.coherence.component.net.Cluster$SocketManager$MulticastUdpSocket.onInterruptedIOException(Cluster.CDB:9)
         at com.tangosol.coherence.component.net.socket.UdpSocket.receive(UdpSocket.CDB:33)
         at com.tangosol.coherence.component.net.UdpPacket.receive(UdpPacket.CDB:4)
         at com.tangosol.coherence.component.util.daemon.queueProcessor.packetProcessor.PacketListener.onNotify(PacketListener.CDB:19)
         at com.tangosol.coherence.component.util.Daemon.run(Daemon.CDB:42)
         at java.lang.Thread.run(Thread.java:662)
     Caused by: java.net.SocketTimeoutException: Receive timed out
3. After that, I noticed a couple of errors like:
                                   Restarting Service: DistributedCache   validatePolls: This service timed-out due to unanswered handshake request. Manual intervention is required to stop the members that have not responded to this Poll
4. Errors like this were logged continuously: Rejecting connection to member 3 using TcpSocket{State=STATE_OPEN, Socket=Socket[addr=/
5. After that, I noticed:
                         Service DistributedCache: received ServiceConfigSync containing 272 entries
                         2013-10-26 08:26:43,241 -0700 level=ERROR class="STDERR"              2013-10-26 08:26:43.241/76.243 Oracle Coherence GE 3.5.1/461 <Error> (thread=main, member=1): Error while starting service "DistributedCache":                          com.tangosol.net.RequestTimeoutException: Timeout during service start: ServiceInfo(Id=2, Name=DistributedCache, Type=DistributedCache
                           MemberSet=ServiceMemberSet(
                             OldestMember=Member(Id=3, Timestamp=2013-10-26 05:16:47.128, Address=10.137.3.49:8088, MachineId=32817, Location=site:test.test.net,machine:test30b,process:3870)
                                       ActualMemberSet=MemberSet(Size=3, BitSetCount=2
                                    Member(Id=1, Timestamp=2013-10-26 08:26:12.289, Address=1.1.1.85:8088, MachineId=32853, Location=site:test.test.net,machine:test304,process:6207, Role=JavaLangThread)
                                    Member(Id=3, Timestamp=2013-10-26 05:16:47.128, Address=1.1.1.49:8088, MachineId=32817, Location=site:test.test.net,machine:test30b,process:3870)
                                    Member(Id=5, Timestamp=2013-10-26 08:26:29.871, Address=1.1.1.86:8088, MachineId=32854, Location=site:test.test.net,machine:test305,process:3988)
                        MemberId/ServiceVersion/ServiceJoined/ServiceLeaving
                          1/3.5/Sat Oct 26 08:26:13 PDT 2013/false,
                          3/3.5/Sat Oct 26 05:16:47 PDT 2013/false,
                          5/3.5/Sat Oct 26 08:26:30 PDT 2013/false
    at com.tangosol.coherence.component.util.daemon.queueProcessor.service.Grid.onStartupTimeout(Grid.CDB:6)
    at com.tangosol.coherence.component.util.daemon.queueProcessor.Service.start(Service.CDB:28)
Your help is highly appreciated!
Detailed Server Error Log:
2013-10-26 00:15:13,280 -0700 level=ERROR class="STDERR"
2013-10-26 00:15:13.279/2079180.072 Oracle Coherence GE 3.5.1/461 <Warning> (thread=PacketPublisher, member=4): Experienced a 2642 ms communication delay (probable remote GC) with Member(Id=3, Timestamp=2013-10-01 22:43:27.913, Address=1.1.1..49:8088, MachineId=32817, Location=site:test.test.net,machine:testabc30b,process:3870, Role=JavaLangThread); 34 packets rescheduled, PauseRate=0.0010, Threshold=222
2013-10-26 00:15:15,508 -0700 level=ERROR class="STDERR"
2013-10-26 00:15:15.508/2079182.301 Oracle Coherence GE 3.5.1/461 <Warning> (thread=PacketPublisher, member=4): Experienced a 4875 ms communication delay (probable remote GC) with Member(Id=1, Timestamp=2013-10-08 22:00:17.258, Address=1.1.1..86:8088, MachineId=32854, Location=site:test.test.net,machine:testabc305,process:3988, Role=JavaLangThread); 47 packets rescheduled, PauseRate=3.0E-4, Threshold=1438
2013-10-26 01:15:29,028 -0700 level=ERROR class="STDERR"
2013-10-26 01:15:29.018/2082795.811 Oracle Coherence GE 3.5.1/461 <Info> (thread=PacketListenerN, member=4): Scheduled senior member heartbeat is overdue; rejoining multicast group.
2013-10-26 01:15:29,036 -0700 level=ERROR class="STDERR"
2013-10-26 01:15:29.036/2082795.829 Oracle Coherence GE 3.5.1/461 <Warning> (thread=PacketPublisher, member=4): Experienced a 13068 ms communication delay (probable remote GC) with Member(Id=1, Timestamp=2013-10-08 22:00:17.258, Address=1.1.1..86:8088, MachineId=32854, Location=site:test.test.net,machine:testabc305,process:3988, Role=JavaLangThread); 86 packets rescheduled, PauseRate=4.0E-4, Threshold=1438
2013-10-26 01:15:29,037 -0700 level=ERROR class="STDERR"
2013-10-26 01:15:29.036/2082795.829 Oracle Coherence GE 3.5.1/461 <Warning> (thread=PacketPublisher, member=4): Experienced a 13069 ms communication delay (probable remote GC) with Member(Id=3, Timestamp=2013-10-01 22:43:27.913, Address=1.1.1..49:8088, MachineId=32817, Location=site:test.test.net,machine:testabc30b,process:3870, Role=JavaLangThread); 84 packets rescheduled, PauseRate=0.0010, Threshold=269
2013-10-26 01:31:44,494 -0700 level=INFO class="STDOUT"
WARN   getResponseBody, Going to buffer response body of large or unknown size. Using getResponseBodyAsStream instead is recommended.
2013-10-26 02:15:34,907 -0700 level=ERROR class="STDERR"
2013-10-26 02:15:34.906/2086401.699 Oracle Coherence GE 3.5.1/461 <Warning> (thread=PacketPublisher, member=4): Experienced a 6476 ms communication delay (probable remote GC) with Member(Id=3, Timestamp=2013-10-01 22:43:27.913, Address=1.1.1..49:8088, MachineId=32817, Location=site:test.test.net,machine:testabc30b,process:3870, Role=JavaLangThread); 24 packets rescheduled, PauseRate=0.0011, Threshold=313
2013-10-26 02:43:52,199 -0700 level=INFO class="STDOUT"
WARN   getResponseBody, Going to buffer response body of large or unknown size. Using getResponseBodyAsStream instead is recommended.
2013-10-26 03:00:55,493 -0700 level=INFO class="STDOUT"
WARN   getResponseBody, Going to buffer response body of large or unknown size. Using getResponseBodyAsStream instead is recommended.
2013-10-26 03:15:41,144 -0700 level=ERROR class="STDERR"
2013-10-26 03:15:41.144/2090007.937 Oracle Coherence GE 3.5.1/461 <D5> (thread=PacketPublisher, member=4): Experienced a 202 ms communication delay (probable remote GC) with Member(Id=1, Timestamp=2013-10-08 22:00:17.258, Address=1.1.1..86:8088, MachineId=32854, Location=site:test.test.net,machine:testabc305,process:3988, Role=JavaLangThread); 25 packets rescheduled, PauseRate=4.0E-4, Threshold=1509
2013-10-26 03:15:41,592 -0700 level=ERROR class="STDERR"
2013-10-26 03:15:41.592/2090008.385 Oracle Coherence GE 3.5.1/461 <D5> (thread=PacketPublisher, member=4): Experienced a 371 ms communication delay (probable remote GC) with Member(Id=3, Timestamp=2013-10-01 22:43:27.913, Address=1.1.1..49:8088, MachineId=32817, Location=site:test.test.net,machine:testabc30b,process:3870, Role=JavaLangThread); 41 packets rescheduled, PauseRate=0.0010, Threshold=290
2013-10-26 03:31:38,099 -0700 level=INFO class="STDOUT"
WARN   getResponseBody, Going to buffer response body of large or unknown size. Using getResponseBodyAsStream instead is recommended.
2013-10-26 04:15:47,869 -0700 level=ERROR class="STDERR"
2013-10-26 04:15:47.869/2093614.662 Oracle Coherence GE 3.5.1/461 <D5> (thread=PacketPublisher, member=4): Experienced a 850 ms communication delay (probable remote GC) with Member(Id=1, Timestamp=2013-10-08 22:00:17.258, Address=1.1.1..86:8088, MachineId=32854, Location=site:test.test.net,machine:testabc305,process:3988, Role=JavaLangThread); 52 packets rescheduled, PauseRate=4.0E-4, Threshold=1509
2013-10-26 04:16:00,192 -0700 level=ERROR class="STDERR"
2013-10-26 04:16:00.182/2093626.975 Oracle Coherence GE 3.5.1/461 <Info> (thread=PacketListenerN, member=4): Scheduled senior member heartbeat is overdue; rejoining multicast group.
2013-10-26 04:16:00,199 -0700 level=ERROR class="STDERR"
2013-10-26 04:16:00.199/2093626.992 Oracle Coherence GE 3.5.1/461 <Warning> (thread=PacketPublisher, member=4): Experienced a 13180 ms communication delay (probable remote GC) with Member(Id=3, Timestamp=2013-10-01 22:43:27.913, Address=1.1.1..49:8088, MachineId=32817, Location=site:test.test.net,machine:testabc30b,process:3870, Role=JavaLangThread); 126 packets rescheduled, PauseRate=0.0011, Threshold=424
2013-10-26 04:16:01,897 -0700 level=ERROR class="STDERR"
2013-10-26 04:16:01.897/2093628.690 Oracle Coherence GE 3.5.1/461 <Warning> (thread=PacketPublisher, member=4): Experienced a 1503 ms communication delay (probable remote GC) with Member(Id=1, Timestamp=2013-10-08 22:00:17.258, Address=1.1.1..86:8088, MachineId=32854, Location=site:test.test.net,machine:testabc305,process:3988, Role=JavaLangThread); 173 packets rescheduled, PauseRate=4.0E-4, Threshold=1509
2013-10-26 04:26:54,424 -0700 level=INFO class="STDOUT"
WARN   getResponseBody, Going to buffer response body of large or unknown size. Using getResponseBodyAsStream instead is recommended.
2013-10-26 04:51:52,096 -0700 level=INFO class="STDOUT"
WARN   getResponseBody, Going to buffer response body of large or unknown size. Using getResponseBodyAsStream instead is recommended.
2013-10-26 05:02:52,292 -0700 level=INFO class="STDOUT"
WARN   getResponseBody, Going to buffer response body of large or unknown size. Using getResponseBodyAsStream instead is recommended.
2013-10-26 05:16:06,076 -0700 level=ERROR class="STDERR"
2013-10-26 05:16:06.075/2097232.868 Oracle Coherence GE 3.5.1/461 <Error> (thread=PacketListenerN, member=4):
Stopping cluster due to unhandled exception: com.tangosol.net.messaging.ConnectionException: Unable to refresh sockets: [UnicastUdpSocket{State=STATE_OPEN, address:port=1.1.1..85:8088}, MulticastUdpSocket{State=STATE_OPEN, address:port=239.3.1.17:35122, InterfaceAddress=1.1.1..85, TimeToLive=1}, TcpSocketAccepter{State=STATE_OPEN, ServerSocket=1.1.1..85:8088}]; last failed socket: MulticastUdpSocket{State=STATE_OPEN, address:port=239.3.1.17:35122, InterfaceAddress=1.1.1..85, TimeToLive=1}
    at com.tangosol.coherence.component.net.Cluster$SocketManager.refreshSockets(Cluster.CDB:91)
    at com.tangosol.coherence.component.net.Cluster$SocketManager$MulticastUdpSocket.onInterruptedIOException(Cluster.CDB:9)
    at com.tangosol.coherence.component.net.socket.UdpSocket.receive(UdpSocket.CDB:33)
    at com.tangosol.coherence.component.net.UdpPacket.receive(UdpPacket.CDB:4)
    at com.tangosol.coherence.component.util.daemon.queueProcessor.packetProcessor.PacketListener.onNotify(PacketListener.CDB:19)
    at com.tangosol.coherence.component.util.Daemon.run(Daemon.CDB:42)
    at java.lang.Thread.run(Thread.java:662)
Caused by: java.net.SocketTimeoutException: Receive timed out
    at java.net.PlainDatagramSocketImpl.receive0(Native Method)
    at java.net.PlainDatagramSocketImpl.receive(PlainDatagramSocketImpl.java:145)
    at java.net.DatagramSocket.receive(DatagramSocket.java:725)
    at com.tangosol.coherence.component.net.socket.UdpSocket.receive(UdpSocket.CDB:20)
    at com.tangosol.coherence.component.net.UdpPacket.receive(UdpPacket.CDB:4)
    at com.tangosol.coherence.component.util.daemon.queueProcessor.packetProcessor.PacketListener.onNotify(PacketListener.CDB:19)
    at com.tangosol.coherence.component.util.Daemon.run(Daemon.CDB:42)
    at java.lang.Thread.run(Thread.java:662)
2013-10-26 05:16:06,080 -0700 level=ERROR class="STDERR"
2013-10-26 05:16:06.080/2097232.873 Oracle Coherence GE 3.5.1/461 <D5> (thread=Cluster, member=4): Service Cluster left the cluster
2013-10-26 05:16:06,105 -0700 level=ERROR class="STDERR"
2013-10-26 05:16:06.105/2097232.898 Oracle Coherence GE 3.5.1/461 <D5> (thread=Invocation:Management, member=4): Service Management left the cluster
2013-10-26 05:16:06,105 -0700 level=ERROR class="STDERR"
2013-10-26 05:16:06.105/2097232.898 Oracle Coherence GE 3.5.1/461 <Info> (thread=http-0.0.0.0-8080-180, member=4): Restarting NamedCache: test234aaaapeu-cache
2013-10-26 05:16:06,105 -0700 level=ERROR class="STDERR"
2013-10-26 05:16:06.105/2097232.898 Oracle Coherence GE 3.5.1/461 <Info> (thread=http-0.0.0.0-8080-180, member=4): Restarting Service: DistributedCache
2013-10-26 05:16:06,110 -0700 level=ERROR class="STDERR"
2013-10-26 05:16:06.106/2097232.899 Oracle Coherence GE 3.5.1/461 <Error> (thread=DistributedCache, member=4):
validatePolls: This service timed-out due to unanswered handshake request. Manual intervention is required to stop the members that have not responded to this Poll
PollId=24209529, active
InitTimeMillis=1382789736843
Service=DistributedCache (2)
RespondedMemberSet=[]
LeftMemberSet=[]
RemainingMemberSet=[3]
Request=Message "LockRequest"
{test.test.net
FromMember=Member(Id=4, Timestamp=2013-10-24 15:16:09.067, Address=1.1.1..85:8088, MachineId=32853, Location=site:test.test.net,machine:testabc304,process:4000)
FromMessageId=38338332
Internal=false
MessagePartCount=1
PendingCount=0
MessageType=12
ToPollId=0
Poll=null
Packets
Service=DistributedCache{Name=DistributedCache, State=(SERVICE_STOPPED), Not initialized}
ToMemberSet=MemberSet(Size=1, BitSetCount=1
Member(Id=3, Timestamp=2013-10-01 22:43:27.913, Address=1.1.1..49:8088, MachineId=32817, Location=site:test.test.net,machine:testabc30b,process:3870, Role=JavaLangThread)
NotifySent=false
null
WaitTimeout=1382789776739, LeaseExpiration=9223372036854775807
2013-10-26 05:16:06,110 -0700 level=ERROR class="STDERR"
2013-10-26 05:16:06.109/2097232.902 Oracle Coherence GE 3.5.1/461 <D5> (thread=DistributedCache, member=4): Service DistributedCache left the cluster
2013-10-26 05:16:06,117 -0700 level=ERROR class="STDERR"
2013-10-26 05:16:06.117/2097232.910 Oracle Coherence GE 3.5.1/461 <Info> (thread=http-0.0.0.0-8080-180, member=n/a): Restarting cluster
2013-10-26 05:16:06,198 -0700 level=ERROR class="STDERR"
2013-10-26 05:16:06.198/2097232.991 Oracle Coherence GE 3.5.1/461 <D5> (thread=Cluster, member=n/a): Service Cluster joined the cluster with senior service member n/a
2013-10-26 05:16:07,410 -0700 level=ERROR class="STDERR"
2013-10-26 05:16:07.410/2097234.203 Oracle Coherence GE 3.5.1/461 <Info> (thread=Cluster, member=n/a): Created a new cluster "cluster:0x27CB" with Member(Id=1, Timestamp=2013-10-26 05:16:06.128, Address=1.1.1..85:8088, MachineId=32853, Location=site:test.test.net,machine:testabc304,process:4000, Edition=Grid Edition, Mode=Development, CpuCount=4, SocketCount=4) UID=0x0A89035500000141F4B15BF080551F98
2013-10-26 05:16:07,436 -0700 level=ERROR class="STDERR"
2013-10-26 05:16:07.436/2097234.229 Oracle Coherence GE 3.5.1/461 <Info> (thread=http-0.0.0.0-8080-180, member=1): Restarting Service: Management
2013-10-26 05:16:07,450 -0700 level=ERROR class="STDERR"
2013-10-26 05:16:07.450/2097234.243 Oracle Coherence GE 3.5.1/461 <D5> (thread=Invocation:Management, member=1): Service Management joined the cluster with senior service member 1
2013-10-26 05:16:07,474 -0700 level=ERROR class="STDERR"
2013-10-26 05:16:07.474/2097234.267 Oracle Coherence GE 3.5.1/461 <D5> (thread=DistributedCache, member=1): Service DistributedCache joined the cluster with senior service member 1
2013-10-26 05:16:07,491 -0700 level=ERROR class="STDERR"
2013-10-26 05:16:07.491/2097234.284 Oracle Coherence GE 3.5.1/461 <Info> (thread=http-0.0.0.0-8080-183, member=1): Restarting NamedCache: test234aaaaficustomer-cache
2013-10-26 05:16:07,514 -0700 level=ERROR class="STDERR"
2013-10-26 05:16:07.514/2097234.307 Oracle Coherence GE 3.5.1/461 <Info> (thread=http-0.0.0.0-8080-38, member=1): Restarting NamedCache: test234aaaaaccount-no-export-cache
2013-10-26 05:16:07,529 -0700 level=ERROR class="STDERR"
2013-10-26 05:16:07.529/2097234.322 Oracle Coherence GE 3.5.1/461 <Info> (thread=http-0.0.0.0-8080-136, member=1): Restarting NamedCache: test234aaaausrsum-cache
2013-10-26 05:16:07,546 -0700 level=ERROR class="STDERR"
2013-10-26 05:16:07.545/2097234.338 Oracle Coherence GE 3.5.1/461 <Info> (thread=http-0.0.0.0-8080-9, member=1): Restarting NamedCache: test234aaaafi-v2-cache
2013-10-26 05:16:07,569 -0700 level=ERROR class="STDERR"
2013-10-26 05:16:07.567/2097234.360 Oracle Coherence GE 3.5.1/461 <Info> (thread=http-0.0.0.0-8080-59, member=1): Restarting NamedCache: test234aaaaaccount-v2-cache
2013-10-26 05:16:07,748 -0700 level=ERROR class="STDERR"
2013-10-26 05:16:07.748/2097234.541 Oracle Coherence GE 3.5.1/461 <Info> (thread=http-0.0.0.0-8080-28, member=1): Restarting NamedCache: test234aaaafi-cache
2013-10-26 05:16:07,816 -0700 level=ERROR class="STDERR"
2013-10-26 05:16:07.816/2097234.609 Oracle Coherence GE 3.5.1/461 <Info> (thread=http-0.0.0.0-8080-133, member=1): Restarting NamedCache: test234aaaahistory-v2-cache
2013-10-26 05:16:09,154 -0700 level=ERROR class="STDERR"
2013-10-26 05:16:09.154/2097235.947 Oracle Coherence GE 3.5.1/461 <Info> (thread=http-0.0.0.0-8080-134, member=1): Restarting NamedCache: test234aaaaaccount-cache
2013-10-26 05:16:09,169 -0700 level=ERROR class="STDERR"
2013-10-26 05:16:09.169/2097235.962 Oracle Coherence GE 3.5.1/461 <Info> (thread=http-0.0.0.0-8080-134, member=1): Restarting NamedCache: test234aaaahistory-cache
2013-10-26 05:16:09,444 -0700 level=ERROR class="STDERR"
2013-10-26 05:16:09.444/2097236.237 Oracle Coherence GE 3.5.1/461 <D5> (thread=Cluster, member=1): Member(Id=2, Timestamp=2013-10-26 05:16:09.259, Address=1.1.1..86:8088, MachineId=32854, Location=site:test.test.net,machine:testabc305,process:3988) joined Cluster with senior member 1
2013-10-26 05:16:09,539 -0700 level=ERROR class="STDERR"
2013-10-26 05:16:09.539/2097236.332 Oracle Coherence GE 3.5.1/461 <D5> (thread=Cluster, member=1): Member 2 joined Service Management with senior member 1
2013-10-26 05:16:09,580 -0700 level=ERROR class="STDERR"
2013-10-26 05:16:09.579/2097236.372 Oracle Coherence GE 3.5.1/461 <D5> (thread=Cluster, member=1): Member 2 joined Service DistributedCache with senior member 1
2013-10-26 05:16:09,599 -0700 level=ERROR class="STDERR"
2013-10-26 05:16:09.599/2097236.392 Oracle Coherence GE 3.5.1/461 <D5> (thread=DistributedCache, member=1): Service DistributedCache: sending ServiceConfigSync containing 268 entries to Member 2
2013-10-26 05:16:09,681 -0700 level=ERROR class="STDERR"
2013-10-26 05:16:09.681/2097236.474 Oracle Coherence GE 3.5.1/461 <D5> (thread=DistributedCache, member=1): 1> Transferring 128 out of 257 vulnerable partitions to member 2 requesting 128
2013-10-26 05:16:09,892 -0700 level=ERROR class="STDERR"
2013-10-26 05:16:09.881/2097236.674 Oracle Coherence GE 3.5.1/461 <D4> (thread=DistributedCache, member=1): 1> Transferring 129 out of 129 partitions to a machine-safe backup 1 at member 2 (under 129)
2013-10-26 05:16:09,901 -0700 level=ERROR class="STDERR"
2013-10-26 05:16:09.901/2097236.694 Oracle Coherence GE 3.5.1/461 <D5> (thread=DistributedCache, member=1): Transferring 388KB of backup[1] for PartitionSet{128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160, 161, 162, 163, 164, 165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 175, 176, 177, 178, 179, 180, 181, 182, 183, 184, 185, 186, 187, 188, 189, 190, 191, 192, 193, 194, 195, 196, 197, 198, 199, 200, 201, 202, 203, 204, 205, 206, 207, 208, 209, 210, 211, 212, 213, 214, 215, 216, 217, 218, 219, 220, 221, 222, 223, 224, 225, 226, 227, 228, 229, 230, 231, 232, 233, 234, 235, 236, 237, 238, 239, 240, 241, 242, 243, 244, 245, 246, 247, 248, 249, 250, 251, 252, 253, 254, 255, 256} to member 2
2013-10-26 05:16:10,415 -0700 level=ERROR class="STDERR"
2013-10-26 05:16:10.415/2097237.208 Oracle Coherence GE 3.5.1/461 <D5> (thread=Cluster, member=1): TcpRing: connecting to member 2 using TcpSocket{State=STATE_OPEN, Socket=Socket[addr=/1.1.1..86,port=8088,localport=37005]}
2013-10-26 05:16:10,657 -0700 level=ERROR class="STDERR"
2013-10-26 05:16:10.657/2097237.450 Oracle Coherence GE 3.5.1/461 <Warning> (thread=Cluster, member=1): Received panic from junior member Member(Id=2, Timestamp=2013-10-26 05:16:09.259, Address=1.1.1..86:8088, MachineId=32854, Location=site:test.test.net,machine:testabc305,process:3988) caused by Member(Id=3, Timestamp=2013-10-01 22:43:27.913, Address=1.1.1..49:8088, MachineId=32817, Location=site:test.test.net,machine:testabc30b,process:3870, Role=JavaLangThread)
2013-10-26 05:16:11,592 -0700 level=ERROR class="STDERR"
2013-10-26 05:16:11.592/2097238.385 Oracle Coherence GE 3.5.1/461 <D4> (thread=TcpRingListener, member=1): Rejecting connection to member 3 using TcpSocket{State=STATE_OPEN, Socket=Socket[addr=/1.1.1..49,port=32822,localport=8088]}
2013-10-26 05:16:13,568 -0700 level=ERROR class="STDERR"
2013-10-26 05:16:13.568/2097240.361 Oracle Coherence GE 3.5.1/461 <Info> (thread=http-0.0.0.0-8080-52, member=1): Restarting NamedCache: test234aaaauserData-cache
2013-10-26 05:16:13,596 -0700 level=ERROR class="STDERR"
2013-10-26 05:16:13.596/2097240.389 Oracle Coherence GE 3.5.1/461 <D4> (thread=TcpRingListener, member=1): Rejecting connection to member 3 using TcpSocket{State=STATE_OPEN, Socket=Socket[addr=/1.1.1..49,port=32823,localport=8088]}
2013-10-26 05:16:14,937 -0700 level=ERROR class="STDERR"
2013-10-26 05:16:14.937/2097241.730 Oracle Coherence GE 3.5.1/461 <Info> (thread=http-0.0.0.0-8080-52, member=1): Restarting NamedCache: test234aaaacheckimage-cache
2013-10-26 05:16:15,600 -0700 level=ERROR class="STDERR"
2013-10-26 05:16:15.600/2097242.393 Oracle Coherence GE 3.5.1/461 <D4> (thread=TcpRingListener, member=1): Rejecting connection to member 3 using TcpSocket{State=STATE_OPEN, Socket=Socket[addr=/1.1.1..49,port=32824,localport=8088]}
2013-10-26 05:16:17,602 -0700 level=ERROR class="STDERR"
2013-10-26 05:16:17.602/2097244.395 Oracle Coherence GE 3.5.1/461 <D4> (thread=TcpRingListener, member=1): Rejecting connection to member 3 using TcpSocket{State=STATE_OPEN, Socket=Socket[addr=/1.1.1..49,port=32825,localport=8088]}
2013-10-26 05:16:19,605 -0700 level=ERROR class="STDERR"
2013-10-26 05:16:19.605/2097246.398 Oracle Coherence GE 3.5.1/461 <D4> (thread=TcpRingListener, member=1): Rejecting connection to member 3 using TcpSocket{State=STATE_OPEN, Socket=Socket[addr=/1.1.1..49,port=32828,localport=8088]}
2013-10-26 05:16:21,609 -0700 level=ERROR class="STDERR"
2013-10-26 05:16:21.609/2097248.402 Oracle Coherence GE 3.5.1/461 <D4> (thread=TcpRingListener, member=1): Rejecting connection to member 3 using TcpSocket{State=STATE_OPEN, Socket=Socket[addr=/1.1.1..49,port=32829,localport=8088]}
2013-10-26 05:16:23,611 -0700 level=ERROR class="STDERR"
2013-10-26 05:16:23.611/2097250.404 Oracle Coherence GE 3.5.1/461 <D4> (thread=TcpRingListener, member=1): Rejecting connection to member 3 using TcpSocket{State=STATE_OPEN, Socket=Socket[addr=/1.1.1..49,port=32830,localport=8088]}
2013-10-26 05:16:25,616 -0700 level=ERROR class="STDERR"
2013-10-26 05:16:25.616/2097252.409 Oracle Coherence GE 3.5.1/461 <D4> (thread=TcpRingListener, member=1): Rejecting connection to member 3 using TcpSocket{State=STATE_OPEN, Socket=Socket[addr=/1.1.1..49,port=32831,localport=8088]}
2013-10-26 05:16:27,619 -0700 level=ERROR class="STDERR"
2013-10-26 05:16:27.619/2097254.412 Oracle Coherence GE 3.5.1/461 <D4> (thread=TcpRingListener, member=1): Rejecting connection to member 3 using TcpSocket{State=STATE_OPEN, Socket=Socket[addr=/1.1.1..49,port=32832,localport=8088]}
2013-10-26 05:16:29,621 -0700 level=ERROR class="STDERR"
2013-10-26 05:16:29.621/2097256.414 Oracle Coherence GE 3.5.1/461 <D4> (thread=TcpRingListener, member=1): Rejecting connection to member 3 using TcpSocket{State=STATE_OPEN, Socket=Socket[addr=/1.1.1..49,port=32833,localport=8088]}
2013-10-26 05:16:31,626 -0700 level=ERROR class="STDERR"
2013-10-26 05:16:31.626/2097258.419 Oracle Coherence GE 3.5.1/461 <D4> (thread=TcpRingListener, member=1): Rejecting connection to member 3 using TcpSocket{State=STATE_OPEN, Socket=Socket[addr=/1.1.1..49,port=32834,localport=8088]}
2013-10-26 05:16:33,631 -0700 level=ERROR class="STDERR"
2013-10-26 05:16:33.631/2097260.424 Oracle Coherence GE 3.5.1/461 <D4> (thread=TcpRingListener, member=1): Rejecting connection to member 3 using TcpSocket{State=STATE_OPEN, Socket=Socket[addr=/1.1.1..49,port=32835,localport=8088]}
2013-10-26 05:16:35,632 -0700 level=ERROR class="STDERR"
2013-10-26 05:16:35.632/2097262.425 Oracle Coherence GE 3.5.1/461 <D4> (thread=TcpRingListener, member=1): Rejecting connection to member 3 using TcpSocket{State=STATE_OPEN, Socket=Socket[addr=/1.1.1..49,port=32836,localport=8088]}
2013-10-26 05:16:37,636 -0700 level=ERROR class="STDERR"
2013-10-26 05:16:37.635/2097264.428 Oracle Coherence GE 3.5.1/461 <D4> (thread=TcpRingListener, member=1): Rejecting connection to member 3 using TcpSocket{State=STATE_OPEN, Socket=Socket[addr=/1.1.1..49,port=32837,localport=8088]}
2013-10-26 05:16:39,641 -0700 level=ERROR class="STDERR"
2013-10-26 05:16:39.640/2097266.433 Oracle Coherence GE 3.5.1/461 <D4> (thread=TcpRingListener, member=1): Rejecting connection to member 3 using TcpSocket{State=STATE_OPEN, Socket=Socket[addr=/1.1.1..49,port=32838,localport=8088]}
2013-10-26 05:16:41,643 -0700 level=ERROR class="STDERR"
2013-10-26 05:16:41.643/2097268.436 Oracle Coherence GE 3.5.1/461 <D4> (thread=TcpRingListener, member=1): Rejecting connection to member 3 using TcpSocket{State=STATE_OPEN, Socket=Socket[addr=/1.1.1..49,port=32841,localport=8088]}
2013-10-26 05:16:47,329 -0700 level=ERROR class="STDERR"
2013-10-26 05:16:47.329/2097274.122 Oracle Coherence GE 3.5.1/461 <D5> (thread=Cluster, member=1): Member(Id=3, Timestamp=2013-10-26 05:16:47.128, Address=1.1.1..49:8088, MachineId=32817, Location=site:test.test.net,machine:testabc30b,process:3870) joined Cluster with senior member 1
2013-10-26 05:16:47,425 -0700 level=ERROR class="STDERR"
2013-10-26 05:16:47.425/2097274.218 Oracle Coherence GE 3.5.1/461 <D5> (thread=Cluster, member=1): Member 3 joined Service Management with senior member 1
2013-10-26 05:16:47,477 -0700 level=ERROR class="STDERR"
2013-10-26 05:16:47.476/2097274.269 Oracle Coherence GE 3.5.1/461 <D5> (thread=Cluster, member=1): Member 3 joined Service DistributedCache with senior member 1
2013-10-26 05:16:47,501 -0700 level=ERROR class="STDERR"
2013-10-26 05:16:47.500/2097274.294 Oracle Coherence GE 3.5.1/461 <D5> (thread=DistributedCache, member=1): Service DistributedCache: sending ServiceConfigSync containing 270 entries to Member 3
2013-10-26 05:16:47,548 -0700 level=ERROR class="STDERR"
2013-10-26 05:16:47.548/2097274.341 Oracle Coherence GE 3.5.1/461 <D5> (thread=TcpRingListener, member=1): TcpRing: connecting to member 3 using TcpSocket{State=STATE_OPEN, Socket=Socket[addr=/1.1.1..49,port=32846,localport=8088]}
2013-10-26 05:16:48,454 -0700 level=ERROR class="STDERR"
2013-10-26 05:16:48.453/2097275.246 Oracle Coherence GE 3.5.1/461 <D5> (thread=DistributedCache, member=1): 2> Transferring 43 out of 129 primary partitions to member 3 requesting 43
2013-10-26 05:16:48,709 -0700 level=ERROR class="STDERR"
2013-10-26 05:16:48.709/2097275.502 Oracle Coherence GE 3.5.1/461 <D5> (thread=DistributedCache, member=1): 2> Transferring 39 out of 125 primary partitions to member 3 requesting 39
2013-10-26 05:16:48,885 -0700 level=ERROR class="STDERR"
2013-10-26 05:16:48.884/2097275.677 Oracle Coherence GE 3.5.1/461 <D5> (thread=http-0.0.0.0-8080-210, member=1): Repeating QueryRequest due to the re-distribution of PartitionSet{132, 133, 134, 135, 136, 137, 138, 139, 140, 141}
2013-10-26 05:16:50,850 -0700 level=ERROR class="STDERR"
2013-10-26 05:16:50.848/2097277.641 Oracle Coherence GE 3.5.1/461 <D5> (thread=DistributedCache, member=1): 2> Transferring 29 out of 115 primary partitions to member 3 requesting 29
2013-10-26 05:16:50,968 -0700 level=ERROR class="STDERR"
2013-10-26 05:16:50.968/2097277.761 Oracle Coherence GE 3.5.1/461 <D5> (thread=DistributedCache, member=1): 2> Transferring 21 out of 107 primary partitions to member 3 requesting 21
2013-10-26 05:16:51,097 -0700 level=ERROR class="STDERR"
2013-10-26 05:16:51.097/2097277.890 Oracle Coherence GE 3.5.1/461 <D5> (thread=DistributedCache, member=1): 2> Transferring 14 out of 100 primary partitions to member 3 requesting 14
2013-10-26 05:16:51,218 -0700 level=ERROR class="STDERR"
2013-10-26 05:16:51.218/2097278.011 Oracle Coherence GE 3.5.1/461 <D5> (thread=DistributedCache, member=1): 2> Transferring 6 out of 92 primary partitions to member 3 requesting 6
2013-10-26 05:16:51,340 -0700 level=ERROR class="STDERR"
2013-10-26 05:16:51.340/2097278.133 Oracle Coherence GE 3.5.1/461 <D5> (thread=DistributedCache, member=1): 2> Transferring 1 out of 87 primary partitions to member 3 requesting 1
2013-10-26 05:16:51,352 -0700 level=ERROR class="STDERR"
2013-10-26 05:16:51.352/2097278.145 Oracle Coherence GE 3.5.1/461 <D5> (thread=DistributedCache, member=1): Transferring 540KB of backup[1] for PartitionSet{171, 172, 173, 174, 175, 176, 177} to member 3
2013-10-26 05:16:51,465 -0700 level=ERROR class="STDERR"
2013-10-26 05:16:51.464/2097278.257 Oracle Coherence GE 3.5.1/461 <D5> (thread=DistributedCache, member=1): Transferring 575KB of backup[1] for PartitionSet{178, 179, 180, 181, 182, 183} to member 3
2013-10-26 05:16:51,569 -0700 level=ERROR class="STDERR"
2013-10-26 05:16:51.569/2097278.362 Oracle Coherence GE 3.5.1/461 <D5> (thread=DistributedCache, member=1): Transferring 537KB of backup[1] for PartitionSet{184, 185, 186, 187} to member 3
2013-10-26 05:16:51,688 -0700 level=ERROR class="STDERR"
2013-10-26 05:16:51.688/2097278.481 Oracle Coherence GE 3.5.1/461 <D5> (thread=DistributedCache, member=1): Transferring 553KB of backup[1] for PartitionSet{188, 189, 190, 191, 192, 193, 194, 195, 196} to member 3
2013-10-26 05:16:51,817 -0700 level=ERROR class="STDERR"
2013-10-26 05:16:51.817/2097278.610 Oracle Coherence GE 3.5.1/461 <D5> (thread=DistributedCache, member=1): Transferring 526KB of backup[1] for PartitionSet{197, 198, 199, 200, 201, 202} to member 3
2013-10-26 05:16:51,928 -0700 level=ERROR class="STDERR"
2013-10-26 05:16:51.928/2097278.721 Oracle Coherence GE 3.5.1/461 <D5> (thread=DistributedCache, member=1): Transferring 768KB of backup[1] for PartitionSet{203, 204, 205, 206, 207, 208, 209} to member 3
2013-10-26 05:16:52,040 -0700 level=ERROR class="STDERR"
2013-10-26 05:16:52.039/2097278.832 Oracle Coherence GE 3.5.1/461 <D5> (thread=DistributedCache, member=1): Transferring 198KB of backup[1] for PartitionSet{210, 211, 212, 213} to member 3
2013-10-26 05:19:06,157 -0700 level=ERROR class="STDERR"
2013-10-26 05:19:06.157/2097412.950 Oracle Coherence GE 3.5.1/461 <Info> (thread=http-0.0.0.0-8080-63, member=1): Restarting NamedCache: throttleData-cache
2013-10-26 05:22:15,094 -0700 level=ERROR class="STDERR"
2013-10-26 05:22:15.094/2097601.887 Oracle Coherence GE 3.5.1/461 <Info> (thread=http-0.0.0.0-8080-136, member=1): Restarting NamedCache: test234aaaadepositslipimage-cache
2013-10-26 05:22:17,183 -0700 level=INFO class="STDOUT"
WARN   getResponseBody, Going to buffer response body of large or unknown size. Using getResponseBodyAsStream instead is recommended.
2013-10-26 05:28:49,617 -0700 level=INFO class="STDOUT"
WARN   getResponseBody, Going to buffer response body of large or unknown size. Using getResponseBodyAsStream instead is recommended.
2013-10-26 05:29:39,729 -0700 level=INFO class="STDOUT"
WARN   getResponseBody, Going to buffer response body of large or unknown size. Using getResponseBodyAsStream instead is recommended.
2013-10-26 05:33:37,607 -0700 level=INFO class="STDOUT"
WARN   getResponseBody, Going to buffer response body of large or unknown size. Using getResponseBodyAsStream instead is recommended.
2013-10-26 05:39:33,872 -0700 level=INFO class="STDOUT"
WARN   getResponseBody, Going to buffer response body of large or unknown size. Using getResponseBodyAsStream instead is recommended.
2013-10-26 06:49:30,617 -0700 level=ERROR class="STDERR"
2013-10-26 06:49:30.617/2102837.410 Oracle Coherence GE 3.5.1/461 <Warning> (thread=PacketPublisher, member=1): Experienced a 6378 ms communication delay (probable remote GC) with Member(Id=2, Timestamp=2013-10-26 05:16:09.259, Address=1.1.1..86:8088, MachineId=32854, Location=site:test.test.net,machine:testabc305,process:3988); 56 packets rescheduled, PauseRate=0.0011, Threshold=1976
2013-10-26 07:39:18,855 -0700 level=ERROR class="STDERR"
2013-10-26 07:39:18.854/2105825.647 Oracle Coherence GE 3.5.1/461 <Warning> (thread=PacketPublisher, member=1): Experienced a 7318 ms communication delay (probable remote GC) with Member(Id=3, Timestamp=2013-10-26 05:16:47.128, Address=1.1.1..49:8088, MachineId=32817, Location=site:test.test.net,machine:testabc30b,process:3870); 68 packets rescheduled, PauseRate=8.0E-4, Threshold=497
2013-10-26 07:49:37,510 -0700 level=ERROR class="STDERR"
2013-10-26 07:49:37.510/2106444.303 Oracle Coherence GE 3.5.1/461 <Warning> (thread=PacketPublisher, member=1): Experienced a 6653 ms communication delay (probable remote GC) with Member(Id=2, Timestamp=2013-10-26 05:16:09.259, Address=1.1.1..86:8088, MachineId=32854, Location=site:test.test.net,machine:testabc305,process:3988); 69 packets rescheduled, PauseRate=0.0014, Threshold=1785
Copyright (c) 2000, 2009, Oracle and/or its affiliates. All rights reserved.
2013-10-26 08:26:11,291 -0700 level=ERROR class="STDERR"
2013-10-26 08:26:11.291/44.293 Oracle Coherence GE 3.5.1/461 <Info> (thread=main, member=n/a): Loaded cache configuration from "file:/usr/local/whp-jboss-web-5/server/default/env/test234aaaacoherence-cache-config.xml"
2013-10-26 08:26:12,263 -0700 level=ERROR class="STDERR"
2013-10-26 08:26:12.263/45.265 Oracle Coherence GE 3.5.1/461 <D5> (thread=Cluster, member=n/a): Service Cluster joined the cluster with senior service member n/a
2013-10-26 08:26:12,477 -0700 level=ERROR class="STDERR"
2013-10-26 08:26:12.477/45.479 Oracle Coherence GE 3.5.1/461 <Info> (thread=Cluster, member=n/a): This Member(Id=1, Timestamp=2013-10-26 08:26:12.289, Address=1.1.1..85:8088, MachineId=32853, Location=site:test.test.net,machine:testabc304,process:6207, Role=JavaLangThread, Edition=Grid Edition, Mode=Development, CpuCount=4, SocketCount=4) joined cluster "cluster:0x27CB" with senior Member(Id=2, Timestamp=2013-10-26 05:16:09.259, Address=1.1.1..86:8088, MachineId=32854, Location=site:test.test.net,machine:testabc305,process:3988, Edition=Grid Edition, Mode=Development, CpuCount=4, SocketCount=4)
2013-10-26 08:26:12,501 -0700 level=ERROR class="STDERR"
2013-10-26 08:26:12.501/45.503 Oracle Coherence GE 3.5.1/461 <D5> (thread=Cluster, member=n/a): Member(Id=3, Timestamp=2013-10-26 05:16:47.128, Address=1.1.1..49:8088, MachineId=32817, Location=site:test.test.net,machine:testabc30b,process:3870) joined Cluster with senior member 2
2013-10-26 08:26:12,507 -0700 level=ERROR class="STDERR"
2013-10-26 08:26:12.506/45.508 Oracle Coherence GE 3.5.1/461 <D5> (thread=Cluster, member=n/a): Member 2 joined Service Management with senior member 2
2013-10-26 08:26:12,507 -0700 level=ERROR class="STDERR"
2013-10-26 08:26:12.507/45.509 Oracle Coherence GE 3.5.1/461 <D5> (thread=Cluster, member=n/a): Member 2 joined Service DistributedCache with senior member 2
2013-10-26 08:26:12,520 -0700 level=ERROR class="STDERR"
2013-10-26 08:26:12.520/45.522 Oracle Coherence GE 3.5.1/461 <D5> (thread=Cluster, member=n/a): Member 3 joined Service Management with senior member 2
2013-10-26 08:26:12,520 -0700 level=ERROR class="STDERR"
2013-10-26 08:26:12.520/45.522 Oracle Coherence GE 3.5.1/461 <D5> (thread=Cluster, member=n/a): Member 3 joined Service DistributedCache with senior member 2
2013-10-26 08:26:12,639 -0700 level=ERROR class="STDERR"
2013-10-26 08:26:12.639/45.641 Oracle Coherence GE 3.5.1/461 <D5> (thread=Invocation:Management, member=1): Service Management joined the cluster with senior service member 2
2013-10-26 08:26:12,700 -0700 level=ERROR class="STDERR"
2013-10-26 08:26:12.700/45.702 Oracle Coherence GE 3.5.1/461 <D5> (thread=Cluster, member=1): TcpRing: connecting to member 3 using TcpSocket{State=STATE_OPEN, Socket=Socket[addr=/1.1.1..49,port=8088,localport=52891]}
2013-10-26 08:26:13,191 -0700 level=ERROR class="STDERR"
2013-10-26 08:26:13.190/46.193 Oracle Coherence GE 3.5.1/461 <D5> (thread=DistributedCache, member=1): Service DistributedCache joined the cluster with senior service member 2
2013-10-26 08:26:14,538 -0700 level=ERROR class="STDERR"
2013-10-26 08:26:14.538/47.540 Oracle Coherence GE 3.5.1/461 <D5> (thread=TcpRingListener, member=1): TcpRing: connecting to member 2 using TcpSocket{State=STATE_OPEN, Socket=Socket[addr=/1.1.1..86,port=40281,localport=8088]}
2013-10-26 08:26:29,695 -0700 level=ERROR class="STDERR"
2013-10-26 08:26:29.694/62.696 Oracle Coherence GE 3.5.1/461 <D5> (thread=Cluster, member=1): TcpRing: disconnected from member 2 due to a kill request
2013-10-26 08:26:29,695 -0700 level=ERROR class="STDERR"
2013-10-26 08:26:29.694/62.696 Oracle Coherence GE 3.5.1/461 <D5> (thread=Cluster, member=1): Member 2 left service Management with senior member 3
2013-10-26 08:26:29,695 -0700 level=ERROR class="STDERR"
2013-10-26 08:26:29.694/62.696 Oracle Coherence GE 3.5.1/461 <D5> (thread=Cluster, member=1): Member 2 left service DistributedCache with senior member 3
2013-10-26 08:26:29,696 -0700 level=ERROR class="STDERR"
2013-10-26 08:26:29.696/62.698 Oracle Coherence GE 3.5.1/461 <D5> (thread=Cluster, member=1): Member(Id=2, Timestamp=2013-10-26 08:26:29.694, Address=1.1.1..86:8088, MachineId=32854, Location=site:test.test.net,machine:testabc305,process:3988) left Cluster with senior member 3
2013-10-26 08:26:30,069 -0700 level=ERROR class="STDERR"
2013-10-26 08:26:30.069/63.071 Oracle Coherence GE 3.5.1/461 <D5> (thread=Cluster, member=1): Member(Id=5, Timestamp=2013-10-26 08:26:29.871, Address=1.1.1..86:8088, MachineId=32854, Location=site:test.test.net,machine:testabc305,process:3988) joined Cluster with senior member 3
2013-10-26 08:26:30,271 -0700 level=ERROR class="STDERR"
2013-10-26 08:26:30.271/63.273 Oracle Coherence GE 3.5.1/461 <D5> (thread=TcpRingListener, member=1): TcpRing: connecting to member 5 using TcpSocket{State=STATE_OPEN, Socket=Socket[addr=/1.1.1..86,port=40285,localport=8088]}
2013-10-26 08:26:30,272 -0700 level=ERROR class="STDERR"
2013-10-26 08:26:30.272/63.274 Oracle Coherence GE 3.5.1/461 <D5> (thread=Cluster, member=1): Member 5 joined Service Management with senior member 3
2013-10-26 08:26:30,443 -0700 level=ERROR class="STDERR"
2013-10-26 08:26:30.443/63.445 Oracle Coherence GE 3.5.1/461 <D5> (thread=Cluster, member=1): Member 5 joined Service DistributedCache with senior member 3
2013-10-26 08:26:38,739 -0700 level=ERROR class="STDERR"
2013-10-26 08:26:38.738/71.740 Oracle Coherence GE 3.5.1/461 <D5> (thread=DistributedCache, member=1): Service DistributedCache: received ServiceConfigSync containing 272 entries
2013-10-26 08:26:43,241 -0700 level=ERROR class="STDERR"
2013-10-26 08:26:43.241/76.243 Oracle Coherence GE 3.5.1/461 <Error> (thread=main, member=1): Error while starting service "DistributedCache": com.tangosol.net.RequestTimeoutException: Timeout during service start: ServiceInfo(Id=2, Name=DistributedCache, Type=DistributedCache
MemberSet=ServiceMemberSet(
OldestMember=Member(Id=3, Timestamp=2013-10-26 05:16:47.128, Address=1.1.1..49:8088, MachineId=32817, Location=site:test.test.net,machine:testabc30b,process:3870)
ActualMemberSet=MemberSet(Size=3, BitSetCount=2
Member(Id=1, Timestamp=2013-10-26 08:26:12.289, Address=1.1.1..85:8088, MachineId=32853, Location=site:test.test.net,machine:testabc304,process:6207, Role=JavaLangThread)
Member(Id=3, Timestamp=2013-10-26 05:16:47.128, Address=1.1.1..49:8088, MachineId=32817, Location=site:test.test.net,machine:testabc30b,process:3870)
Member(Id=5, Timestamp=2013-10-26 08:26:29.871, Address=1.1.1..86:8088, MachineId=32854, Location=site:test.test.net,machine:testabc305,process:3988)
MemberId/ServiceVersion/ServiceJoined/ServiceLeaving
1/3.5/Sat Oct 26 08:26:13 PDT 2013/false,
3/3.5/Sat Oct 26 05:16:47 PDT 2013/false,
5/3.5/Sat Oct 26 08:26:30 PDT 2013/false
    at com.tangosol.coherence.component.util.daemon.queueProcessor.service.Grid.onStartupTimeout(Grid.CDB:6)
    at com.tangosol.coherence.component.util.daemon.queueProcessor.Service.start(Service.CDB:28)
    at com.tangosol.coherence.component.util.daemon.queueProcessor.service.Grid.start(Grid.CDB:38)
    at com.tangosol.coherence.component.util.SafeService.startService(SafeService.CDB:28)
    at com.tangosol.coherence.component.util.safeService.SafeCacheService.startService(SafeCacheService.CDB:5)
    at com.tangosol.coherence.component.util.SafeService.ensureRunningService(SafeService.CDB:27)
    at com.tangosol.coherence.component.util.SafeService.start(SafeService.CDB:14)
    at com.tangosol.net.DefaultConfigurableCacheFactory.ensureService(DefaultConfigurableCacheFactory.java:973)
    at com.tangosol.net.DefaultConfigurableCacheFactory.ensureCache(DefaultConfigurableCacheFactory.java:842)
    at com.tangosol.net.DefaultConfigurableCacheFactory.configureCache(DefaultConfigurableCacheFactory.java:1053)
    at com.tangosol.net.DefaultConfigurableCacheFactory.ensureCache(DefaultConfigurableCacheFactory.java:290)
    at com.tangosol.net.CacheFactory.getCache(CacheFactory.java:747)
    at com.tangosol.net.CacheFactory.getCache(CacheFactory.java:724

Hi
The common causes of communication delays and packet timeouts are excessive GC pauses, high CPU usage, and swapping.
Each of these occurrences may disrupt the Coherence packet processing threads, thus preventing the processing and acknowledgment of packets from other cluster members.
1. Check GC performance: look at the process memory consumption and the GC logs.
2. Check CPU usage with vmstat or top.
3. Check swapping with vmstat.
See Oracle Support Doc ID 1110544.1.
Communication delays and packet timeouts can also be caused by network-related issues.
To check network performance, see:
Performing a Datagram Test for Network Performance - Coherence 3.5 User Guide - Oracle Coherence Knowledge Base
regards,
Leo_TA
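As a quick first check for the GC point above, the JVM's standard management beans can report how much time has gone to garbage collection. The following is only a minimal sketch: the class name GcPauseReport is made up for illustration, and getCollectionTime() returns accumulated collection time, which for concurrent collectors is not the same thing as stop-the-world pause time. GC logs remain the more reliable source.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper (not part of Coherence): summarises GC activity of the current JVM.
final class GcPauseReport {

    // Sums the accumulated collection time (ms) over all collectors.
    // A collector reports -1 when the value is undefined, so those are skipped.
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long millis = gc.getCollectionTime();
            if (millis > 0) {
                total += millis;
            }
        }
        return total;
    }

    // One line per collector: name, number of collections, accumulated time.
    static String describe() {
        StringBuilder sb = new StringBuilder();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            sb.append(gc.getName())
              .append(": count=").append(gc.getCollectionCount())
              .append(", timeMs=").append(gc.getCollectionTime())
              .append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(describe());
        System.out.println("total GC ms: " + totalGcMillis());
    }
}
```

Calling totalGcMillis() just before and just after the getCache() call that times out gives a rough idea of whether a large share of the startup window was spent in GC.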

Coherence objects not released when undeploying application?

Hi
Part of our continuous integration build deploys our application to a WebLogic server and runs some Selenium test cases against it. We usually have about 15 builds running each day.
Our problem is that we seem to be experiencing a memory leak in that process; each new build takes more and more memory until WebLogic starts throwing out-of-memory errors (as a workaround we're restarting WebLogic every night).
After spending some time with a profiler (we're using YourKit Java Profiler) it appears that the calls to the Coherence libraries prevent references to our objects from being released, so the GC cannot get rid of them. If the calls to the Coherence libraries are removed, the GC correctly releases all our application objects.
To confirm this I have created this simple test scenario:
1. Created an empty web application (war) that has only one class in it:
package com.test;
import javax.servlet.ServletContextEvent;
import javax.servlet.ServletContextListener;
import com.tangosol.net.CacheFactory;
public class TestServletContextListener implements ServletContextListener {
     // Shut Coherence down when the web application is stopped or undeployed.
     public void contextDestroyed(ServletContextEvent arg0) {
          CacheFactory.shutdown();
     }
     // Nothing to do on startup.
     public void contextInitialized(ServletContextEvent arg0) {}
}
2. Modified the web.xml to register the ContextListener:
<listener>
<listener-class>com.test.TestServletContextListener</listener-class>
</listener>
3. Created an empty EAR that has only two libraries in it: coherence.jar & tangosol.jar (version 3.3.1)
4. Included the test web application in the EAR (no other classes, projects or libraries are included; no other configuration settings are adjusted from defaults)
5. With the profiler attached, I repeatedly deployed and undeployed the EAR against the WebLogic server. With each new deployment a new set of com.tangosol* classes is created. Those classes are not released even when GC is forced from the profiler.
I'm not able to attach a screenshot from the profiler to this post, but with each redeployment I can see the following objects created (in descending order of the number of objects created):
com.tangosol.util.Base$CommonMonitor: 1024 new objects
com.tangosol.run.xml.XmlToken: 16 new objects
com.tangosol.util.ListMap: 11 new objects
com.tangosol.util.RecyclingLinkedList$Node: 5 new objects
etc.
Am I doing something wrong, or is there really a problem with object references not being properly released in Coherence?
thank you
s.

Hello Robert
I have double checked that the two coherence libs don’t exist anywhere on the server class path. I have also tried calling shutdown from preStop instead of postStop but it made no difference.
(Please note that I'm not really looking for the right place to shut down Coherence; instead I'm trying to demonstrate that calls to the Coherence libraries cause object references not to be released properly. I chose the shutdown method only because it's easy to see when it is invoked, and because it is something our application calls during shutdown.)
To make sure that I'm not overlooking something with the Weblogic setup I have tried it with Tomcat:
1. Downloaded and installed Tomcat 6.0.14
2. Created a new WAR project and put coherence.jar & tangosol.jar in WEB-INF/lib
3. Created one class:
package com.test;
import javax.servlet.ServletContextEvent;
import javax.servlet.ServletContextListener;
import com.tangosol.net.CacheFactory;
public class TestServletContextListener implements ServletContextListener {
     // Shut Coherence down when the web application is stopped.
     public void contextDestroyed(ServletContextEvent arg0) {
          CacheFactory.shutdown();
     }
     // Nothing to do on startup.
     public void contextInitialized(ServletContextEvent arg0) {}
}
4. Modified web.xml:
<listener>
     <listener-class>com.test.TestServletContextListener</listener-class>
</listener>
5. Deployed the WAR to the Tomcat server with the profiler connected to it
After that I used the Tomcat Web Application Manager (http://localhost:8088/manager/html/list) to stop and start the application repeatedly.
The behavior is the same as on WebLogic. Each stop of the application leaves behind another set of tangosol objects that do not get released by GC.
Am I correct in saying that there is some problem with the Coherence libraries that causes memory leaks by not allowing GC to release old objects?
Thank you
s.
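One way to narrow down a redeployment leak like the one described above is to check whether any threads are still tied to the web application's class loader after shutdown, since a live thread keeps its context class loader, and every class that loader loaded, reachable. This is a common cause of such leaks in general, not a confirmed diagnosis for Coherence 3.3.1. The sketch below uses only JDK calls; the class name LoaderThreadAudit is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical diagnostic helper (not part of Coherence or any container).
final class LoaderThreadAudit {

    // Returns the names of live threads whose context class loader is the
    // given loader or one of its descendants.
    static List<String> threadsUsing(ClassLoader loader) {
        List<String> names = new ArrayList<>();
        for (Thread t : Thread.getAllStackTraces().keySet()) {
            // Walk from the thread's context class loader up through its parents.
            for (ClassLoader cl = t.getContextClassLoader(); cl != null; cl = cl.getParent()) {
                if (cl == loader) {
                    names.add(t.getName());
                    break;
                }
            }
        }
        return names;
    }
}
```

Assuming the container sets the web application's class loader as the context class loader during contextDestroyed, logging threadsUsing(Thread.currentThread().getContextClassLoader()) right after CacheFactory.shutdown() would show which threads, other than the calling one, are still holding on to that loader.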
