Cluster heartbeat message in Coherence 3.6

Hi,
We recently upgraded to Coherence 3.6 in our production environment. Occasionally I see the following in the cluster logs:
2010-08-18 18:27:48.953/28.828 Oracle Coherence GE 3.6.0.0 <Error> (thread=Cluster, member=13): Received cluster heartbeat from the senior Member(Id=1, Timestamp=2010-08-18 18:03:43.927, Address=10.31.151.246:9000, MachineId=33526, Location=site:xxx.com,machine:machine1,process:2665,member:coherence_cache_server-0, Role=cache-server) that does not contain this Member(Id=13, Timestamp=2010-08-18 18:25:20.158, Address=10.30.71.60:8092, MachineId=21308, Location=site:xxx.com,machine:machine2,process:3540,member:CoherenceCommandLineTool, Role=cache-client); stopping cluster service.
Whenever a node (for example, the Coherence command-line tool) tries to join the cluster, it is kicked out almost immediately. Nodes keep leaving the cluster and rejoining, and this happens constantly. Here is another log snippet:
2010-08-18 14:23:22.458/-13596.00-214 Oracle Coherence GE 3.6.0.0 <D5> (thread=Cluster, member=7): Member 5 joined Service Management with senior member 1
2010-08-18 14:23:36.110/-13582.00-562 Oracle Coherence GE 3.6.0.0 <D5> (thread=Cluster, member=7): Member 5 joined Service DistributedCache with senior member 1
2010-08-18 14:23:37.811/-13580.00-861 Oracle Coherence GE 3.6.0.0 <D5> (thread=Cluster, member=7): MemberLeft notification for Member(Id=5, Timestamp=2010-08-18 14:23:25.924, Address=10.30.71.60:8092, MachineId=21308, Location=site:xxx.com,machine:machine2,process:5936,member:CoherenceCommandLineTool, Role=cache-client, PublisherSuccessRate=0.9166, ReceiverSuccessRate=1.0, PauseRate=0.0, Threshold=1976, Paused=false, Deferring=false, OutstandingPackets=0, DeferredPackets=0, ReadyPackets=0, LastIn=845ms, LastOut=854ms, LastSlow=n/a) received from Member(Id=2, Timestamp=2010-08-18 12:34:15.068, Address=10.31.151.246:9001, MachineId=33526, Location=site:xxx.com,machine:machine1,process:2667,member:coherence_cache_server-1, Role=cache-server, PublisherSuccessRate=0.8568, ReceiverSuccessRate=0.5934, PauseRate=0.0021, Threshold=1878, Paused=false, Deferring=false, OutstandingPackets=0, DeferredPackets=0, ReadyPackets=0, LastIn=24744ms, LastOut=24753ms, LastSlow=n/a)
2010-08-18 14:23:37.811/-13580.00-861 Oracle Coherence GE 3.6.0.0 <D5> (thread=Cluster, member=7): Member 5 left service Management with senior member 1
2010-08-18 14:23:37.811/-13580.00-860 Oracle Coherence GE 3.6.0.0 <D5> (thread=Cluster, member=7): Member 5 left service DistributedCache with senior member 1
For background on our environment: we have 3 Linux hosts configured to form the cluster. Two of them run 2 cache server nodes each, for a total of 4 cache servers in the cluster, plus about 7 storage-disabled client nodes.
Any clues as to why this is happening? Do we need to configure anything on the cluster? All the ports on which the nodes communicate are open for both UDP and TCP, inbound and outbound.
Appreciate all help on this.
-Chandini

Here is the log snippet from member 2 around the time member 5 was removed:
2010-08-18 14:23:26.269/6588.113 Oracle Coherence GE 3.6.0.0 <D5> (thread=Cluster, member=2): Member 5 joined Service Management with senior member 1
2010-08-18 14:23:39.922/6601.766 Oracle Coherence GE 3.6.0.0 <D5> (thread=Cluster, member=2): Member 5 joined Service DistributedCache with senior member 1
2010-08-18 14:23:41.607/6603.451 Oracle Coherence GE 3.6.0.0 <Warning> (thread=Cluster, member=2): Failed to reach address /10.30.71.60 within the IpMonitor timeout. Members [Member(Id=5, Timestamp=2010-08-18 14:23:25.924, Address=10.30.71.60:8092, MachineId=21308, Location=site:xxx.com,machine:machine2,process:5936,member:CoherenceCommandLineTool, Role=cache-client)] are suspect.
2010-08-18 14:23:41.608/6603.452 Oracle Coherence GE 3.6.0.0 <Warning> (thread=Cluster, member=2): Timed-out members MemberSet(Size=1, BitSetCount=2
Member(Id=5, Timestamp=2010-08-18 14:23:25.924, Address=10.30.71.60:8092, MachineId=21308, Location=site:xxx.com,machine:machine2,process:5936,member:CoherenceCommandLineTool, Role=cache-client)
) will be removed.
2010-08-18 14:23:41.608/6603.452 Oracle Coherence GE 3.6.0.0 <D5> (thread=Cluster, member=2): Member 5 left service Management with senior member 1
2010-08-18 14:23:41.609/6603.453 Oracle Coherence GE 3.6.0.0 <D5> (thread=Cluster, member=2): Member 5 left service DistributedCache with senior member 1
I will work on getting the logs from the other members for the first message and post them here.
Thanks.
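
The "Failed to reach address /10.30.71.60 within the IpMonitor timeout" warning suggests checking plain network reachability from the cache server host to the client host. A minimal sketch follows, assuming the check behaves roughly like a standard Java reachability probe (that is an assumption; the logs above do not show how IpMonitor works internally). The class `ReachabilityProbe` and its `probe` method are hypothetical names, not part of Coherence.

```java
import java.io.IOException;
import java.net.InetAddress;

/** Hypothetical helper, not part of Coherence: counts successful reachability probes. */
class ReachabilityProbe {

    /**
     * Probes the address up to `attempts` times and returns how many probes succeeded.
     * InetAddress.isReachable typically uses ICMP echo when the JVM has the privilege,
     * and otherwise tries a TCP connection to port 7 (echo) on the target.
     */
    static int probe(InetAddress address, int timeoutMillis, int attempts) {
        int successes = 0;
        for (int i = 0; i < attempts; i++) {
            try {
                if (address.isReachable(timeoutMillis)) {
                    successes++;
                }
            } catch (IOException e) {
                // A network error counts as a failed probe.
            }
        }
        return successes;
    }

    public static void main(String[] args) throws Exception {
        // Defaults to loopback; pass the suspect host (for example 10.30.71.60) as the first argument.
        InetAddress target = args.length > 0
                ? InetAddress.getByName(args[0])
                : InetAddress.getLoopbackAddress();
        int attempts = 3;
        int ok = probe(target, 2000, attempts);
        System.out.println(target.getHostAddress() + ": " + ok + "/" + attempts + " probes succeeded");
    }
}
```

Running this on the host of member 2 against 10.30.71.60 (and in the other direction) would show whether probes between the two machines are being dropped, for instance by a firewall that blocks ICMP and port 7.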

Similar Messages

  • Unicast cluster - heartbeat message failure messages

    I am using unicast messaging mode and I see the following messages:
    ####<Jul 9, 2010 12:46:56 AM PDT> <Info> <Cluster> <anaeur30> <WL10MP2-ServiceSTServer6> <[ACTIVE] ExecuteThread: '45' for queue: 'weblogic.kernel.Default (self-tuning)'> <<WLS Kernel>> <> <> <1278661616559> <BEA-000112> <Removing WL10MP2-ServiceSTServer1 jvmid:6806396782256322086S:anaeur10:[7033,7033,-1,-1,-1,-1,-1]:anaeur10:7033,anaeur10:7035,anaeur20:7033,anaeur20:7035,anaeur30:7033,anaeur30:7035,anaeur50:7033,anaeur50:7035:WL10MP2-ServiceTier:WL10MP2-ServiceSTServer1 from cluster view due to timeout.>
    ####<Jul 9, 2010 12:55:36 AM PDT> <Info> <Cluster> <anaeur30> <WL10MP2-ServiceSTServer6> <[ACTIVE] ExecuteThread: '34' for queue: 'weblogic.kernel.Default (self-tuning)'> <<WLS Kernel>> <> <> <1278662136552> <BEA-000112> <Removing WL10MP2-ServiceSTServer2 jvmid:-2694311272134716565S:anaeur10:[7035,7035,-1,-1,-1,-1,-1]:anaeur10:7033,anaeur10:7035,anaeur20:7033,anaeur20:7035,anaeur30:7033,anaeur30:7035,anaeur50:7033,anaeur50:7035:WL10MP2-ServiceTier:WL10MP2-ServiceSTServer2 from cluster view due to timeout.>
    During the same time frame, I see lost multicast messages on all the instances for about 20 minutes. What could be the problem? Why am I seeing multicast messages when using unicast? My config.xml has multicast-related entries for each server, but do they have any effect in unicast mode? Is that an issue? We see servers dropping out of the cluster frequently.
    000115> <Lost 1 multicast message(s).>
    ####<Jul 9, 2010 12:46:42 AM PDT> <Info> <Cluster> <anaeur10> <WL10MP2-ServiceSTServer2> <weblogic.cluster.MessageReceiver> <<WLS Kernel>> <> <> <1278661602751> <BEA-000115> <Lost 1 multicast message(s).>
    ####<Jul 9, 2010 12:46:46 AM PDT> <Info> <Cluster> <anaeur10> <WL10MP2-ServiceSTServer2> <weblogic.cluster.MessageReceiver> <<WLS Kernel>> <> <> <1278661606548> <BEA-000115> <Lost 2 multicast message(s).>
    ####<Jul 9, 2010 12:47:04 AM PDT> <Info> <Cluster> <anaeur10> <WL10MP2-ServiceSTServer2> <weblogic.cluster.MessageReceiver> <<WLS Kernel>> <> <> <1278661624185> <BEA-000115> <Lost 2 multicast message(s).>
    ####<Jul 9, 2010 12:48:40 AM PDT> <Info> <Cluster> <anaeur10> <WL10MP2-ServiceSTServer2> <weblogic.cluster.MessageReceiver> <<WLS Kernel>> <> <> <1278661720809> <BEA-000115> <Lost 2 multicast message(s).>
    ####<Jul 9, 2010 12:54:14 AM PDT> <Info> <Cluster> <anaeur10> <WL10MP2-ServiceSTServer2> <weblogic.cluster.MessageReceiver> <<WLS Kernel>> <> <> <1278662054823> <BEA-000115> <Lost 2 multicast message(s).>
    ####<Jul 9, 2010 12:54:14 AM PDT> <Info> <Cluster> <anaeur10> <WL10MP2-ServiceSTServer2> <weblogic.cluster.MessageReceiver> <<WLS Kernel>> <> <> <1278662054827> <BEA-000115> <Lost 1 multicast message(s).>
    ####<Jul 9, 2010 12:54:14 AM PDT> <Info> <Cluster> <anaeur10> <WL10MP2-ServiceSTServer2> <weblogic.cluster.MessageReceiver> <<WLS Kernel>> <> <> <1278662054827> <BEA-000115> <Lost 2 multicast message(s).>
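
To see whether the BEA-000115 entries line up with the BEA-000112 removals, it can help to total the lost-message counts per server from the logs. The following is a rough sketch only; it assumes the `####<...> <...>` record layout shown above, with the server name as the fifth field. `LostMulticastTally` is a hypothetical helper, not a WebLogic class.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical log helper, not part of WebLogic. */
class LostMulticastTally {

    // Field 5 of the "####<...> <...>" record is the server name; BEA-000115 carries the lost count.
    private static final Pattern LOST = Pattern.compile(
            "^####<[^>]*> <[^>]*> <[^>]*> <[^>]*> <([^>]*)> .*<BEA-000115> <Lost (\\d+) multicast message\\(s\\)\\.>");

    /** Sums the lost-message counts per server name; lines that do not match are ignored. */
    static Map<String, Integer> tally(List<String> lines) {
        Map<String, Integer> lostByServer = new TreeMap<>();
        for (String line : lines) {
            Matcher m = LOST.matcher(line.trim());
            if (m.find()) {
                lostByServer.merge(m.group(1), Integer.parseInt(m.group(2)), Integer::sum);
            }
        }
        return lostByServer;
    }

    public static void main(String[] args) {
        // Two of the records quoted above, used as sample input.
        List<String> sample = Arrays.asList(
            "####<Jul 9, 2010 12:46:42 AM PDT> <Info> <Cluster> <anaeur10> <WL10MP2-ServiceSTServer2> <weblogic.cluster.MessageReceiver> <<WLS Kernel>> <> <> <1278661602751> <BEA-000115> <Lost 1 multicast message(s).>",
            "####<Jul 9, 2010 12:46:46 AM PDT> <Info> <Cluster> <anaeur10> <WL10MP2-ServiceSTServer2> <weblogic.cluster.MessageReceiver> <<WLS Kernel>> <> <> <1278661606548> <BEA-000115> <Lost 2 multicast message(s).>");
        System.out.println(tally(sample));
    }
}
```

Feeding each managed server's log through `tally` and comparing the totals with the timestamps of the "Removing ... from cluster view due to timeout" entries would show whether the losses cluster around the dropouts.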

    SJ,
    Thanks, that's the perfect explanation I was looking for. We always create clusters from the console, and it could be that we used MULTICAST messaging mode in the past, hence the entries in config.xml. What made me ask whether UNICAST or MULTICAST will be used is that whenever a server drops out of the cluster, I see the following message written to each managed server log. Ideally, the following should only be logged if multicast messaging mode is in operation, right?
    <weblogic.cluster.MessageReceiver> <<WLS Kernel>> <> <> <1276490260768> <BEA-000115> <Lost 2 multicast message(s).>
    <weblogic.cluster.MessageReceiver> <<WLS Kernel>> <> <> <1276490260768> <BEA-000115> <Lost 2 multicast message(s).>
    <weblogic.cluster.MessageReceiver> <<WLS Kernel>> <> <> <1276490261355> <BEA-000115> <Lost 2 multicast message(s).>
    <weblogic.cluster.MessageReceiver> <<WLS Kernel>> <> <> <1276490261355> <BEA-000115> <Lost 2 multicast message(s).>
    The above message is not written all the time, but only when a server is removed from the cluster group. Please note that I have enabled unicast debug mode. Will unicast also write messages like the above when a heartbeat message is lost?
    To trace our issue further, I have to manually remove the multicast references from config.xml and monitor for some time. It is still a mystery why the cluster members are dropping out. Sometimes, soon after cluster instances drop out, I can see the drop-out frequency as "Rarely", and after a week or so the members are regrouped with a different group leader. Are you aware of any issue with unicast messaging mode in WL10 MP2?
    Is it a good idea to test multicast?
    Thanks a lot for your time.
    -RR

  • Unexpected cluster heartbeat

    I'm seeing the following errors when starting several nodes of a cluster using scripts. This only happens occasionally; most of the time it works. Also, there is no problem when the nodes are started manually one by one.
    The cluster consists of 2 hosts, each running multiple programs (JVMs), as indicated in the log.
    Could someone explain what happened and how to fix it?
    Thanks!
    2010-09-23 19:29:48,161 14496 [Logger@559022270 3.5.3/465] DEBUG Coherence - 2010-09-23 19:29:48.161/14.743 Oracle Coherence GE 3.5.3/465 <D5> (thread=Cluster,member=n/a): Service Cluster joined the cluster with senior service member n/a
    2010-09-23 19:29:48,382 14717 [Logger@9250185 3.5.3/465] INFO Coherence - 2010-09-23 19:29:48.382/14.964 Oracle Coherence GE 3.5.3/465 <Info> (thread=Cluster,member=n/a): This Member(Id=7, Timestamp=2010-09-23 19:29:48.207, Address=10.253.97.133:16001, MachineId=38533, Location=site:mytest.com,machine:host1,process:18482, Role=Program1, Edition=Grid Edition, Mode=Development, CpuCount=2, SocketCount=2) joined cluster "test1" with senior Member(Id=12, Timestamp=2010-09-23 19:15:01.264, Address=10.253.97.133:16002, MachineId=38533, Location=site:mytest.com,machine:host1,process:15142, Role=Program2, Edition=Grid Edition, Mode=Development, CpuCount=2, SocketCount=2)
    2010-09-23 19:29:49,157 15492 [Logger@9250185 3.5.3/465] WARN Coherence - 2010-09-23 19:29:49.157/15.739 Oracle Coherence GE 3.5.3/465 <Warning> (thread=Cluster, member=n/a): Notifying the senior Member(Id=12, Timestamp=2010-09-23 19:15:01.264, Address=10.253.97.133:16002, MachineId=38533, Location=site:mytest.com,machine:host1,process:15142, Role=Program2) of an unexpected cluster heartbeat from Member(Id=13, Timestamp=2010-09-23 19:15:03.524, Address=10.253.97.134:16001, MachineId=38534, Location=site:mytest.com,machine:host2,process:24439, Role=Program2)
    2010-09-23 19:29:57,729 24064 [Logger@9250185 3.5.3/465] WARN Coherence - 2010-09-23 19:29:57.729/24.311 Oracle Coherence GE 3.5.3/465 <Warning> (thread=Cluster, member=n/a): Notifying the senior Member(Id=12, Timestamp=2010-09-23 19:15:01.264, Address=10.253.97.133:16002, MachineId=38533, Location=site:mytest.com,machine:host1,process:15142, Role=Program2) of an unexpected cluster heartbeat from Member(Id=14, Timestamp=2010-09-23 19:15:16.618, Address=10.253.97.133:16003, MachineId=38533, Location=site:mytest.com,machine:host1,process:15734, Role=Program3)
    2010-09-23 19:30:13,808 40143 [Logger@9250185 3.5.3/465] WARN Coherence - 2010-09-23 19:30:13.807/40.390 Oracle Coherence GE 3.5.3/465 <Warning> (thread=Cluster, member=n/a): Notifying the senior Member(Id=12, Timestamp=2010-09-23 19:15:01.264, Address=10.253.97.133:16002, MachineId=38533, Location=site:mytest.com,machine:host1,process:15142, Role=Program2) of an unexpected cluster heartbeat from Member(Id=15, Timestamp=2010-09-23 19:15:26.421, Address=10.253.97.134:16002, MachineId=38534, Location=site:mytest.com,machine:host2,process:24991, Role=Program3)
    2010-09-23 19:30:18,453 44788 [Logger@9250185 3.5.3/465] ERROR Coherence - 2010-09-23 19:30:18.453/45.035 Oracle Coherence GE 3.5.3/465 <Error> (thread=main, member=n/a): Error while starting cluster: com.tangosol.net.RequestTimeoutException: Timeout during service start: ServiceInfo(Id=0, Name=Cluster, Type=Cluster
    MemberSet=ServiceMemberSet(
    OldestMember=n/a
    ActualMemberSet=MemberSet(Size=2, BitSetCount=2
    Member(Id=7, Timestamp=2010-09-23 19:29:48.207, Address=10.253.97.133:16001, MachineId=38533, Location=site:mytest.com,machine:host1,process:18482, Role=Program1)
    Member(Id=12, Timestamp=2010-09-23 19:15:01.264, Address=10.253.97.133:16002, MachineId=38533, Location=site:mytest.com,machine:host1,process:15142, Role=Program2)
    MemberId/ServiceVersion/ServiceJoined/ServiceLeaving
    7/3.5/Thu Sep 23 19:29:48 UTC 2010/false,
    12/3.5/Thu Sep 23 19:15:01 UTC 2010/false
    at com.tangosol.coherence.component.util.daemon.queueProcessor.service.Grid.onStartupTimeout(Grid.CDB:6)
    at com.tangosol.coherence.component.util.daemon.queueProcessor.Service.start(Service.CDB:28)
    at com.tangosol.coherence.component.util.daemon.queueProcessor.service.Grid.start(Grid.CDB:38)
    at com.tangosol.coherence.component.net.Cluster.onStart(Cluster.CDB:366)
    at com.tangosol.coherence.component.net.Cluster.start(Cluster.CDB:11)
    at com.tangosol.coherence.component.util.SafeCluster.startCluster(SafeCluster.CDB:3)
    at com.tangosol.coherence.component.util.SafeCluster.restartCluster(SafeCluster.CDB:7)
    at com.tangosol.coherence.component.util.SafeCluster.ensureRunningCluster(SafeCluster.CDB:27)
    at com.tangosol.coherence.component.util.SafeCluster.start(SafeCluster.CDB:2)
    at com.tangosol.net.CacheFactory.ensureCluster(CacheFactory.java:998)
    at com.tangosol.net.DefaultConfigurableCacheFactory.ensureService(DefaultConfigurableCacheFactory.java:915)
    at com.oracle.coherence.environment.extensible.ExtensibleEnvironment.ensureService(ExtensibleEnvironment.java:374)
    at com.tangosol.net.DefaultConfigurableCacheFactory.ensureCache(DefaultConfigurableCacheFactory.java:877)
    at com.tangosol.net.DefaultConfigurableCacheFactory.configureCache(DefaultConfigurableCacheFactory.java:1088)
    at com.tangosol.net.DefaultConfigurableCacheFactory.ensureCache(DefaultConfigurableCacheFactory.java:304)
    at com.tangosol.net.CacheFactory.getCache(CacheFactory.java:735)
    at com.tangosol.net.CacheFactory.getCache(CacheFactory.java:712)
    2010-09-23 19:30:18,456 44791 [Logger@9250185 3.5.3/465] ERROR Coherence - 2010-09-23 19:30:18.456/45.038 Oracle Coherence GE 3.5.3/465 <Error> (thread=Cluster, member=n/a): validatePolls: This service timed-out due to unanswered handshake request. Manual intervention is required to stop the members that have not responded to this Poll
    PollId=1, active
    InitTimeMillis=1285270188378
    Service=Cluster (0)
    RespondedMemberSet=[]
    LeftMemberSet=[]
    RemainingMemberSet=[12]
    Edited by: user10049765 on Oct 7, 2010 2:23 PM

    Any suggestion?

  • Sending heartbeat messages

    Can anyone post sample code to send heartbeat messages from client to server? I need it urgently.
    thanks in advance

    Most of the code is here. There are parts before and after the snippets, but this should be enough to get the point across. I have not included the class that is getting serialized. However, it could be whatever you want it to be (the contents of the class/object don't matter).
    source system:
    try {
        // NON-SSL connection
        if (!sslConnection) {
            mySocket = new Socket(serverName, communicationPort);
            oout = new ObjectOutputStream(mySocket.getOutputStream());
            oin = new ObjectInputStream(mySocket.getInputStream());
        }
        // SSL connection
        else {
            sslFact = (SSLSocketFactory) SSLSocketFactory.getDefault();
            mySSLSocket = (SSLSocket) sslFact.createSocket(serverName, communicationPort);
            oout = new ObjectOutputStream(mySSLSocket.getOutputStream());
            oin = new ObjectInputStream(mySSLSocket.getInputStream());
        }
        heartBeatMessage = new DmiHeartBeatMessage(destinationName, serverName);
        myMessage = new CommunicationMessage(heartBeatMessage);
        oout.writeObject(myMessage);
        oout.flush();
        // Keep reading; the loop ends when readObject() throws once the agent closes the connection.
        while (true) {
            incomingMessage = (CommunicationMessage)oin.readObject();
            if (incomingMessage.getMessageText().equals("Request Complete"))
                connectionClosed = true;
        }
    }
    catch (Exception e) {
        if (connectionClosed)
            executerLogger.writeToLog("Remote agent available.", true, true);
        else
            executerLogger.writeToLog("Remote agent not available.", true, true);
    }
    target:
    // Non-SSL connection
    if (communicationSocket != null) {
        agentInfo.getAgentLogger().writeToLog("Received a NON-SSL connection request...", true, true);
        socketInput = communicationSocket.getInputStream();
        socketOutput = communicationSocket.getOutputStream();
    }
    // SSL connection
    else {
        agentInfo.getAgentLogger().writeToLog("Received a SSL connection request...", true, true);
        socketInput = sslCommunicationSocket.getInputStream();
        socketOutput = sslCommunicationSocket.getOutputStream();
    }
    oout = new ObjectOutputStream(socketOutput);
    oin = new ObjectInputStream(socketInput);
    // Continue to read messages until the client disconnects.
    incomingMessage = (CommunicationMessage)oin.readObject();
    // A heartbeat message has arrived: log it.
    if (incomingMessage.getObject() instanceof DmiHeartBeatMessage)
        agentInfo.getAgentLogger().writeToLog("Heartbeat received from Deployment Executer: " +
            ((DmiHeartBeatMessage)incomingMessage.getObject()).getDestinationName() + ", " +
            ((DmiHeartBeatMessage)incomingMessage.getObject()).getServerName() + ".", true, true);
    sendMessageToClient("Request Complete", false);
    ...
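
Since the snippets above depend on classes that were not posted (`DmiHeartBeatMessage`, `CommunicationMessage`, the loggers), here is a self-contained sketch of the same request/acknowledge idea that can be run as-is. It is an illustration under simplifying assumptions, not the poster's code: the heartbeat is a plain `String`, there is no SSL branch, and the names `HeartbeatSketch`, `serveOnce`, and `ping` are made up for this example.

```java
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;

/** Hypothetical, simplified heartbeat exchange over object streams. */
class HeartbeatSketch {

    static final String HEARTBEAT = "HEARTBEAT";
    static final String ACK = "Request Complete";

    /** Agent side: accept one connection, acknowledge one heartbeat, then close. */
    static void serveOnce(ServerSocket server) throws IOException, ClassNotFoundException {
        try (Socket socket = server.accept()) {
            ObjectOutputStream out = new ObjectOutputStream(socket.getOutputStream());
            out.flush(); // send the stream header so the client's ObjectInputStream can open
            ObjectInputStream in = new ObjectInputStream(socket.getInputStream());
            Object message = in.readObject();
            if (HEARTBEAT.equals(message)) {
                out.writeObject(ACK);
                out.flush();
            }
        }
    }

    /** Client side: returns true only if the agent acknowledges within the timeout. */
    static boolean ping(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            socket.setSoTimeout(timeoutMillis); // bound the wait for the acknowledgement
            ObjectOutputStream out = new ObjectOutputStream(socket.getOutputStream());
            out.writeObject(HEARTBEAT);
            out.flush();
            ObjectInputStream in = new ObjectInputStream(socket.getInputStream());
            return ACK.equals(in.readObject());
        } catch (IOException | ClassNotFoundException e) {
            return false; // connection refused, timeout, or unexpected reply
        }
    }

    public static void main(String[] args) throws Exception {
        try (ServerSocket server = new ServerSocket(0)) { // port 0 picks any free port
            Thread agent = new Thread(() -> {
                try {
                    serveOnce(server);
                } catch (Exception e) {
                    e.printStackTrace();
                }
            });
            agent.start();
            boolean alive = ping("127.0.0.1", server.getLocalPort(), 2000);
            agent.join();
            System.out.println(alive ? "Remote agent available." : "Remote agent not available.");
        }
    }
}
```

Unlike the original, this version uses a socket timeout and an explicit return value instead of waiting for an exception to end a `while (true)` loop, which makes the "not available" case easier to detect.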

  • Compressor Cluster - Error message when attaching .scc caption files

    Hello,
    We have a 3 XServer cluster controlled by a 4th XServer (our FCServer machine). My workflow is:
    Source Video: 1920X1080 ProRes Video (28:30min)
    Resized to 640X360 ProRes LT (also deinterlaced, with some black restore and sharpening applied here)
    Encoded 640X360 to H.264 at 750 Kb/sec, with the .scc files defined in the "Additional Information" tab in Compressor at this point.
    This job is submitted to the cluster. My submitting machine as well as all cluster machines are all connected to the same fiber network. All files are on the same XSAN.
    I am getting the following error message. I get it after it has tried to encode the video:
    Status: Failed - 5x HOST [fcsqm2.local] error: Failed to add CC to movie: -50
    note: fcsqm2 is one of the encoding machines in the cluster.
    I can't seem to find any answers via google. Anyone got any suggestions where I can look? Any ideas?
    Thanks a lot!
    Nathan

    {Ctrl + Shft + J} - any messages in the Error Console, relating to that?

  • O-Cluster Error Messages

    Hello,
    Our team is running the O-Cluster algorithm in ODM (10G R2). During training, we are getting the following error message. Our models train fine with the K-Means algorithm, and we can train small, trivial models with O-Cluster, but I assume there is something about our data that it doesn't like:
    ORA-40101: Data Mining System Error ODM_OC_CLUSTERING_MODEL-BUILD_OC.build_ocluster--20010
    ORA-06512: at "SYS.DBMS_SYS_ERROR", line 105
    ORA-06512: at "DMSYS.ODM_OC_CLUSTERING_MODEL", line 122
    ORA-06512: at "DMSYS.ODM_OC_CLUSTERING_MODEL", line 2408
    ORA-40101: Data Mining System Error ODM_OC_CLUSTERING_MODEL-BUILD_OC.ocluster--20010
    ORA-06512: at "SYS.DBMS_SYS_ERROR", line 105
    ORA-06512: at "DMSYS.ODM_OC_CLUSTERING_MODEL", line 122
    ORA-06512: at "DMSYS.ODM_OC_CLUSTERING_MODEL", line 2312
    ORA-06500: PL/SQL:
    Any ideas?
    Thanks,
    Chad

    Hi Chad,
    Sorry but there is not enough to go on with the error message.
    Are you running ODM 10.2.0.3?
    Did you invoke model build using ODMr?
    We might need to have a test case to run to understand why the failure is taking place.
    Have you ever worked with Oracle Support to file a problem report?
    They provide a means for development to access data from a client.
    Thanks, Mark

  • Cluster error message ????

    I am running 2 WL servers in a cluster on two separate SUN Solaris machines, not using a shared file system and using the NES plugin. The properties files are exactly the same. The cluster comes up OK. I am using a simple counter servlet to test the clustering. The first time, the primary server updates the secondary just fine.
    I take the primary down and reload the servlet a few times. Things work fine, with the message
    <RepMan> updateSecondary called on unpaired primary
    Then I bring back the primary server that I had killed and try to reload the servlet. I get the following messages
    "<RepMan> getRepMan unable to obtain ReplicationManager (for id+ipaddress - where id is WL generated appended to IP of the machine). .[7001,7001,7002,7002,-1] " in the new primary server.
    "Unable to to create secondary for (id - WL generated id for the server)"
    Has anyone come across these messages in their logs when running a WL cluster (http session) on separate boxes w/o a shared file system???
    I have tested multicast and it seems to be OK for both machines. I have added a 3rd machine to the cluster and I still get the same results.
              

    Prasad,
              I did not see any comments on in-memory replication related issues in SP7.
              Do you have any information on when should we expect a service pack dealing
              with these issues.
              Thanks
              Vlad
              Prasad Peddada wrote:
              > This has been identified as a bug. We will fix this in the next service
              > pack.
              >
              > -- Prasad
              >
              > Junaid Hossain wrote:
              >
              > > I am running 2 WL servers in a cluster on two separate SUN Solaris
              > > machines not using a shared file system and using NES plugin. The
              > > properties file are exactly the same. The cluster comes up ok. I am
              > > using a simple counter servlet to test the clustering. First time the
              > > primary server updates seconday just fine.
              > > I take the primary down and reload the servlet a few times. Things work
              > > fine with the message
              > > <RepMan> updateSecondary called on unpaired primary
              > >
              > > Then I bring back the primary server that I had killed. Try to reload
              > > the servlet. I get the following messages
              > > "<RepMan> getRepMan unable to obtain ReplicationManager (for
              > > id+ipaddress - where id is WL generated appended to IP of the machine).
              > > .[7001,7001,7002,7002,-1] " in the new primary server.
              > >
              > > "Unable to to create secondary for (id - WL generated id for the
              > > server)"
              > >
              > > Has anyone come across these messages in their logs when running a WL
              > > cluster (http session) on separate boxes w/o a shared file system???
              > > I have tested multicast and seems to be ok for both machines. I have
              > > added 3rd machine in cluster and I still get the same results.
              [vlad.vcf]
              

  • Compressor 3 "no cluster found" message - EASY FIX

    Greetings,
    I have had to install Compressor 3 times before I finally found this easy fix. In my case Compressor would stop working when a video got "stuck" in the batch and just went on forever. From then on the batch monitor would always say "no cluster found", and if I tried to submit a batch it would say something like "cluster not found".
    So how did I fix it without the dreaded delete everything and reinstall? I saw a post that said to check that your sharing name matches your cluster name in the Qmaster system preferences. They did not exactly match. The name in sharing was "My Mac" and the name in Qmaster was "My Mac Cluster". I changed it to match the sharing name, "My Mac", and hit the start sharing button. I restarted my Mac (not sure if that's needed) and then opened the batch monitor. Instead of "no cluster found" it now said "My Mac", and when I submitted a batch in Compressor, "This Computer" showed up again and it started working again!
    Not sure if this works in all cases. But I hope this post helps someone else from having to reinstall everything, AND maybe apple can read this to help figure out what the problem is with their software.

    i had the usual "no clusters found" in batch monitor, and "this computer" did not appear in batch monitor. so, when trying to submit a batch from compressor, i got the "unable to submit batch / restart your computer" error message.
    so for 2 days i tried really EVERY METHOD i found in all blogs, threads, posts, discussion forums to remove / reinstall / make work compressor.
    here what i tried:
    fcs remover.app / completely reinstall fcp from scratch
    (http://www.digitalrebellion.com/fcs_remover.htm)
    partially delete / reinstall compressor / qmaster only via standard install
    (http://docs.info.apple.com/article.html?artnum=302845)
    partially delete / reinstall compressor / qmaster only via pacifist install
    http://www.scottsimmons.tv/blog/2008/01/11/compressor-hatred-resolved/
    http://www.charlessoft.com/
    NOTHING WORKED. before completely re-installing my entire OS, i found this post here AND IT WORKS - at least on my machine
    i hope this will save some of you sleepless nights
    regards
    ivan

  • WLS Cluster with Message Driven Beans and MQSeries on more than one Host

              With the Examples of http://developer.bea.com/jmsproviders.jsp and http://developer.bea.com/jmsmdb.jsp
              a MDB can be
              configured to work with MQSeries with one WLS Server. This works only, if a Queuemanager
              is started at the same Host that runs the WLS Server too.
              And the QueueConnectionFactory (QCF) is configured to TRANSPORT(BIND).
              In my configuration should be two WLS Servers and one JMS Queue (MQS) with the
              Queuemanager.
              A Message Driven Bean is deployed on both WLS Servers, which should get the Messages
              of this Queue.
              If one of the two WLS Servers fails the other WLS Server with the corresponding
              MDB should get the Messages of the
              MQSeries Queue.
              If the QCF is configured to TRANSPORT(Client) the Message Driven Bean can't start
              and the following Exception is thrown:
              <Jul 18, 2001 3:52:49 PM CEST> <Error> <J2EE> <Error deploying EJB Component :
              mdb_deployed
              weblogic.ejb20.EJBDeploymentException: Error deploying Message-Driven EJB:; nested
              exception is:
              javax.jms.JMSException: MQJMS2005: failed to create MQQueueManager for
              'btsun1a:TEST'
              javax.jms.JMSException: MQJMS2005: failed to create MQQueueManager for 'btsun1a:TEST'
              at com.ibm.mq.jms.services.ConfigEnvironment.newException(ConfigEnvironment.java:434)
              This puzzles me, because there is an MQQueueManager on btsun1a; all Servers throw
              the same Exception when the MDB is deployed.
              The configuration of JMSadmin on both Hosts is the following:
              dis qcf(myQCF2)
              HOSTNAME(btsun1a)
              CCSID(819)
              TRANSPORT(CLIENT)
              PORT(1414)
              TEMPMODEL(SYSTEM.DEFAULT.MODEL.QUEUE)
              QMANAGER(TEST)
              CHANNEL(JAVA.CHANNEL)
              VERSION(1)
              dis q(myQueue)
              CCSID(819)
              PERSISTENCE(APP)
              TARGCLIENT(JMS)
              QUEUE(MYQUEUE)
              EXPIRY(APP)
              QMANAGER(TEST)
              ENCODING(NATIVE)
              VERSION(1)
              PRIORITY(APP)
              I think only TRANSPORT(CLIENT) can be used when I don't want to install a Queue
              and a QueueManager on each WLS Server.
              Does anybody know of a problem with WLS 6.0 SP2 coping with TRANSPORT(CLIENT)?
              


  • The panic protocol and Coherence 3.5

    All,
    We just upgraded from 3.3.1 to 3.5 but I'm having trouble forming a cluster in multi-server environments. Our config files were developed against older versions of Coherence and I had a lot of trouble with them at first, some of which is detailed here: Config file problem with new Coherence 3.5
    The problem now is that we have 2 standalone nodes and 2 application nodes (WebLogic) spread across 2 physical servers (1 standalone and 1 application on each box). Previously (Coherence 3.3.1), they all formed one happy cluster of 4 members. Now (Coherence 3.5), they form separate clusters: each physical machine makes a cluster of 2 members. At startup, I can see the 2-node clusters form. Some time later (not immediately) I see the "unexpected cluster heartbeat" warning about getting a heartbeat from the other physical server. Clearly the members on the different servers can communicate to some degree if they get these unexpected heartbeats. But why don't they form a cluster in the first place?
    If I understand the config correctly, we're using a ttl of 4, the default. I ran the multicast test and a ttl of 1 worked also. I think the join timeout is 30000.
    When the standalone node starts, it outputs a ttl of 4 and the expected cluster address and port to the log.
    One wrinkle in the config is that there are 2 applications deployed to the same weblogic jvm that both use Coherence. They are in separate classloaders and use unique cluster ports. This hasn't been a problem in the past. Now, however, my app is Coherence 3.5 and the other one is still 3.3.1. The Coherence jars are not shared and the startup params apply to both applications.
    In the past I've seen errors where 2 nodes weren't using the same coherence version, same cluster name, etc. but I don't see anything like that now.
    thanks
    john

    Hi John,
    The clustering technologies did not change between 3.3 and 3.5. The fact that you could establish a multicast-based cluster in 3.3 and not in 3.5 is therefore quite odd. My initial guess would be that your network may be blocking certain multicast address/port ranges. Are you using the same multicast address and port that you used successfully in 3.3? Also, please use this address and port when running the multicast test, to make it as close as possible to the medium on which Coherence is trying to operate.
    If none of these suggestions resolves the issue, can you please post the following:
    - multicast test output from all nodes running the test concurrently
    - coherence logs from all nodes, including startup, and panic
    - coherence operational configuration
    Regarding the mix of Coherence 3.3 and 3.5 in the same JVM: so long as they are classloader-isolated and running on a different multicast address/port you should be fine. Note that I'm suggesting that both the address and the port be different. Some OSs (e.g. Linux) have issues related to not taking the port into consideration during multicast packet delivery. It wouldn't hurt to try starting 3.5 without the 3.3 app running, just to ensure that it isn't causing your troubles in some unforeseen way.
    thanks,
    Mark
    Oracle Coherence
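
    The advice above (a real multicast group, and both address and port different between the two applications) can be sanity-checked before startup. The following is only a minimal sketch: MulticastEndpointCheck and the example addresses and ports are hypothetical and are not part of Coherence.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

/** Hypothetical pre-flight check; not part of the Coherence API. */
final class MulticastEndpointCheck {

    /** Parses a literal IP address, failing loudly on bad input. */
    private static InetAddress parse(String address) {
        try {
            return InetAddress.getByName(address);
        } catch (UnknownHostException e) {
            throw new IllegalArgumentException("Not a usable address: " + address, e);
        }
    }

    /** True if the address is a multicast group address (224.0.0.0 to 239.255.255.255 for IPv4). */
    static boolean isMulticast(String address) {
        return parse(address).isMulticastAddress();
    }

    /** True only if BOTH the address and the port differ between the two clusters. */
    static boolean fullyIsolated(String addressA, int portA, String addressB, int portB) {
        return !parse(addressA).equals(parse(addressB)) && portA != portB;
    }

    public static void main(String[] args) {
        // Example values only; substitute the address/port of each application.
        String app35Address = "224.3.5.1";
        int app35Port = 35000;
        String app33Address = "224.3.3.1";
        int app33Port = 33000;

        System.out.println("3.5 address is multicast: " + isMulticast(app35Address));
        System.out.println("3.3 address is multicast: " + isMulticast(app33Address));
        System.out.println("address and port both differ: "
                + fullyIsolated(app35Address, app35Port, app33Address, app33Port));
    }
}
```

    A check like this only catches configuration overlap; it says nothing about whether the network actually delivers the multicast traffic, which is what the multicast test is for.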

  • Issue to setup local Coherence cluster with WKA (well-known-address)

    Hello - I have started a local Coherence cluster using WKA with a single node, but when I start CacheFactory (coherence.cmd) with the same configuration it throws the following error message.
    Any help is appreciated.
    JVM startup argument
    -Dtangosol.coherence.override=cluster.xml
    cluster.xml
    <?xml version="1.0" encoding="UTF-8"?>
    <coherence xmlns="http://xmlns.oracle.com/coherence/coherence-operational-config" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://xmlns.oracle.com/coherence/coherence-operational-config http://xmlns.oracle.com/coherence/coherence-operational-config/1.1/coherence-operational-config.xsd">
    <cluster-config>
      <unicast-listener>
       <well-known-addresses>
        <socket-address id="1">
         <address>171.193.103.25</address>
         <port>8088</port>
        </socket-address>
       </well-known-addresses>
      </unicast-listener>
    </cluster-config>
    <logging-config>
      <destination>stdout</destination>
      <severity-level>9</severity-level>
    </logging-config>
    </coherence>
    Cluster startup Message
    WellKnownAddressList(Size=1,
      WKA{Address=171.193.103.25, Port=8088}
    MasterMemberSet(
      ThisMember=Member(Id=1, Timestamp=2013-10-24 11:07:18.603, Address=171.193.103.25:8088, MachineId=9041, Location=site:,machine:FD4C9EF534D5D,process:16704, Role=CoherenceServer)
      OldestMember=Member(Id=1, Timestamp=2013-10-24 11:07:18.603, Address=171.193.103.25:8088, MachineId=9041, Location=site:,machine:FD4C9EF534D5D,process:16704, Role=CoherenceServer)
      ActualMemberSet=MemberSet(Size=1
        Member(Id=1, Timestamp=2013-10-24 11:07:18.603, Address=171.193.103.25:8088, MachineId=9041, Location=site:,machine:FD4C9EF534D5D,process:16704, Role=CoherenceServer)
      MemberId|ServiceVersion|ServiceJoined|MemberState
        1|3.7.1|2013-10-24 11:07:48.843|JOINED
      RecycleMillis=1200000
      RecycleSet=MemberSet(Size=0
    TcpRing{Connections=[]}
    IpMonitor{AddressListSize=0}
    2013-10-24 11:07:48.869/31.794 Oracle Coherence GE 3.7.1.0 <D5> (thread=Invocation:Management, member=1): Service Management joined the cluster with senior service member 1
    2013-10-24 11:07:49.058/31.983 Oracle Coherence GE 3.7.1.0 <D5> (thread=DistributedCache, member=1): Service DistributedCache joined the cluster with senior service member 1
    2013-10-24 11:07:49.077/32.002 Oracle Coherence GE 3.7.1.0 <D6> (thread=DistributedCache, member=1): Service DistributedCache: sending PartitionConfig ConfigSync to all
    2013-10-24 11:07:49.121/32.046 Oracle Coherence GE 3.7.1.0 <D5> (thread=ReplicatedCache, member=1): Service ReplicatedCache joined the cluster with senior service member 1
    2013-10-24 11:07:49.128/32.053 Oracle Coherence GE 3.7.1.0 <D5> (thread=OptimisticCache, member=1): Service OptimisticCache joined the cluster with senior service member 1
    2013-10-24 11:07:49.131/32.056 Oracle Coherence GE 3.7.1.0 <D5> (thread=Invocation:InvocationService, member=1): Service InvocationService joined the cluster with senior service member 1
    2013-10-24 11:07:49.132/32.057 Oracle Coherence GE 3.7.1.0 <Info> (thread=main, member=1):
    Services
      ClusterService{Name=Cluster, State=(SERVICE_STARTED, STATE_JOINED), Id=0, Version=3.7.1, OldestMemberId=1}
      InvocationService{Name=Management, State=(SERVICE_STARTED), Id=1, Version=3.1, OldestMemberId=1}
      PartitionedCache{Name=DistributedCache, State=(SERVICE_STARTED), LocalStorage=enabled, PartitionCount=257, BackupCount=1, AssignedPartitions=257, BackupPartitions=0}
      ReplicatedCache{Name=ReplicatedCache, State=(SERVICE_STARTED), Id=3, Version=3.0, OldestMemberId=1}
      Optimistic{Name=OptimisticCache, State=(SERVICE_STARTED), Id=4, Version=3.0, OldestMemberId=1}
      InvocationService{Name=InvocationService, State=(SERVICE_STARTED), Id=5, Version=3.1, OldestMemberId=1}
    Started DefaultCacheServer...
    Error Message from CacheFactory
    C:\Users\Zk5rjg8>C:\coherence37\bin\coherence.cmd
    ** Starting storage disabled console **
    java version "1.6.0_51"
    Java(TM) SE Runtime Environment (build 1.6.0_51-b11)
    Java HotSpot(TM) 64-Bit Server VM (build 20.51-b01, mixed mode)
    2013-10-24 11:13:22.851/0.392 Oracle Coherence 3.7.1.0 <Info> (thread=main, member=n/a): Loaded operational configuration from "jar:file:/C:/coherence37/lib/coherence.jar!/tangosol-coherence.xml"
    2013-10-24 11:13:22.920/0.462 Oracle Coherence 3.7.1.0 <Info> (thread=main, member=n/a): Loaded operational overrides from "file:/C:/coherence37/cluster.xml"
    2013-10-24 11:13:22.924/0.465 Oracle Coherence 3.7.1.0 <D5> (thread=main, member=n/a): Optional configuration override "/custom-mbeans.xml" is not specified
    2013-10-24 11:13:22.924/0.465 Oracle Coherence 3.7.1.0 <D6> (thread=main, member=n/a): Loaded edition data from "jar:file:/C:/coherence37/lib/coherence.jar!/coherence-grid.xml"
    Oracle Coherence Version 3.7.1.0 Build 27797
    Grid Edition: Development mode
    Copyright (c) 2000, 2011, Oracle and/or its affiliates. All rights reserved.
    2013-10-24 11:13:23.722/1.263 Oracle Coherence GE 3.7.1.0 <D4> (thread=main, member=n/a): TCMP bound to /171.193.103.25:8090 using SystemSocketProvider
    2013-10-24 11:13:54.001/31.542 Oracle Coherence GE 3.7.1.0 <Warning> (thread=Cluster, member=n/a): This Member(Id=0, Timestamp=2013-10-24 11:13:23.762, Address=171.193.103.25:8090, MachineId=9041, Location=site:,machine:FD4C9EF534D5D,process:17192, Role=CoherenceConsole) has been attempting to joi
    2013-10-24 11:13:54.001/31.542 Oracle Coherence GE 3.7.1.0 <Warning> (thread=Cluster, member=n/a): Delaying formation of a new cluster; waiting for well-known nodes to respond
    2013-10-24 11:14:24.402/61.943 Oracle Coherence GE 3.7.1.0 <Warning> (thread=Cluster, member=n/a): Delaying formation of a new cluster; waiting for well-known nodes to respond
    2013-10-24 11:14:54.805/92.346 Oracle Coherence GE 3.7.1.0 <Warning> (thread=Cluster, member=n/a): Delaying formation of a new cluster; waiting for well-known nodes to respond
    2013-10-24 11:15:25.207/122.748 Oracle Coherence GE 3.7.1.0 <Warning> (thread=Cluster, member=n/a): Delaying formation of a new cluster; waiting for well-known nodes to respond
    2013-10-24 11:15:55.610/153.151 Oracle Coherence GE 3.7.1.0 <Warning> (thread=Cluster, member=n/a): Delaying formation of a new cluster; waiting for well-known nodes to respond
    2013-10-24 11:16:26.012/183.553 Oracle Coherence GE 3.7.1.0 <Warning> (thread=Cluster, member=n/a): Delaying formation of a new cluster; waiting for well-known nodes to respond
    2013-10-24 11:16:56.414/213.955 Oracle Coherence GE 3.7.1.0 <Warning> (thread=Cluster, member=n/a): Delaying formation of a new cluster; waiting for well-known nodes to respond
    2013-10-24 11:17:26.817/244.358 Oracle Coherence GE 3.7.1.0 <Warning> (thread=Cluster, member=n/a): Delaying formation of a new cluster; waiting for well-known nodes to respond
    2013-10-24 11:17:57.219/274.760 Oracle Coherence GE 3.7.1.0 <Warning> (thread=Cluster, member=n/a): Delaying formation of a new cluster; waiting for well-known nodes to respond
    2013-10-24 11:17:58.271/275.812 Oracle Coherence GE 3.7.1.0 <Error> (thread=Cluster, member=n/a): Detected soft timeout) of {WrapperGuardable Guard{Daemon=IpMonitor} Service=ClusterService{Name=Cluster, State=(SERVICE_STARTED, STATE_ANNOUNCE), Id=0, Version=3.7.1}}
    2013-10-24 11:17:58.273/275.814 Oracle Coherence GE 3.7.1.0 <Error> (thread=Recovery Thread, member=n/a): Full Thread Dump
    Thread[PacketListener1,8,Cluster]
            java.net.PlainDatagramSocketImpl.receive0(Native Method)
            java.net.PlainDatagramSocketImpl.receive(Unknown Source)
            java.net.DatagramSocket.receive(Unknown Source)
            com.tangosol.coherence.component.net.socket.UdpSocket.receive(UdpSocket.CDB:22)
            com.tangosol.coherence.component.net.UdpPacket.receive(UdpPacket.CDB:1)
            com.tangosol.coherence.component.util.daemon.queueProcessor.packetProcessor.PacketListener.onNotify(PacketListener.CDB:20)
            com.tangosol.coherence.component.util.Daemon.run(Daemon.CDB:42)
            java.lang.Thread.run(Unknown Source)
    Thread[PacketReceiver,7,Cluster]
            java.lang.Object.wait(Native Method)
            com.tangosol.coherence.component.util.Daemon.onWait(Daemon.CDB:18)
            com.tangosol.coherence.component.util.daemon.queueProcessor.packetProcessor.PacketReceiver.onWait(PacketReceiver.CDB:2)
            com.tangosol.coherence.component.util.Daemon.run(Daemon.CDB:39)
            java.lang.Thread.run(Unknown Source)
    Thread[Attach Listener,5,system]
    Thread[PacketPublisher,6,Cluster]
            java.lang.Object.wait(Native Method)
            com.tangosol.coherence.component.util.Daemon.onWait(Daemon.CDB:18)
            com.tangosol.coherence.component.util.daemon.queueProcessor.packetProcessor.PacketPublisher.onWait(PacketPublisher.CDB:2)
            com.tangosol.coherence.component.util.Daemon.run(Daemon.CDB:39)
            java.lang.Thread.run(Unknown Source)
    Thread[Cluster|STATE_ANNOUNCE|Member(Id=0, Timestamp=2013-10-24 11:13:23.762, Address=171.193.103.25:8090, MachineId=9041, Location=site:,machine:FD4C9EF534D5D,process:17192, Role=CoherenceConsole),5,Cluster]
            sun.nio.ch.WindowsSelectorImpl$SubSelector.poll0(Native Method)
            sun.nio.ch.WindowsSelectorImpl$SubSelector.poll(Unknown Source)
            sun.nio.ch.WindowsSelectorImpl$SubSelector.access$400(Unknown Source)
            sun.nio.ch.WindowsSelectorImpl.doSelect(Unknown Source)
            sun.nio.ch.SelectorImpl.lockAndDoSelect(Unknown Source)
            sun.nio.ch.SelectorImpl.select(Unknown Source)
            com.tangosol.coherence.component.net.TcpRing.select(TcpRing.CDB:11)
            com.tangosol.coherence.component.util.daemon.queueProcessor.service.grid.ClusterService.onWait(ClusterService.CDB:6)
            com.tangosol.coherence.component.util.Daemon.run(Daemon.CDB:39)
            java.lang.Thread.run(Unknown Source)
    Thread[Reference Handler,10,system]
            java.lang.Object.wait(Native Method)
            java.lang.Object.wait(Object.java:485)
            java.lang.ref.Reference$ReferenceHandler.run(Unknown Source)
    Thread[Finalizer,8,system]
            java.lang.Object.wait(Native Method)
            java.lang.ref.ReferenceQueue.remove(Unknown Source)
            java.lang.ref.ReferenceQueue.remove(Unknown Source)
            java.lang.ref.Finalizer$FinalizerThread.run(Unknown Source)
    Thread[Signal Dispatcher,9,system]
    Thread[PacketSpeaker,8,Cluster]
            java.lang.Object.wait(Native Method)
            com.tangosol.coherence.component.util.queue.ConcurrentQueue.waitForEntry(ConcurrentQueue.CDB:16)
            com.tangosol.coherence.component.util.queue.ConcurrentQueue.remove(ConcurrentQueue.CDB:7)
            com.tangosol.coherence.component.util.Queue.remove(Queue.CDB:1)
            com.tangosol.coherence.component.util.daemon.queueProcessor.packetProcessor.PacketSpeaker.onNotify(PacketSpeaker.CDB:21)
            com.tangosol.coherence.component.util.Daemon.run(Daemon.CDB:42)
            java.lang.Thread.run(Unknown Source)
    Thread[Logger@1457155060 3.7.1.0,3,main]
            java.lang.Object.wait(Native Method)
            com.tangosol.coherence.component.util.Daemon.onWait(Daemon.CDB:18)
            com.tangosol.coherence.component.util.Daemon.run(Daemon.CDB:39)
            java.lang.Thread.run(Unknown Source)
    Thread[PacketListener1P,8,Cluster]
            java.net.PlainDatagramSocketImpl.receive0(Native Method)
            java.net.PlainDatagramSocketImpl.receive(Unknown Source)
            java.net.DatagramSocket.receive(Unknown Source)
            com.tangosol.coherence.component.net.socket.UdpSocket.receive(UdpSocket.CDB:22)
            com.tangosol.coherence.component.net.UdpPacket.receive(UdpPacket.CDB:1)
            com.tangosol.coherence.component.util.daemon.queueProcessor.packetProcessor.PacketListener.onNotify(PacketListener.CDB:20)
            com.tangosol.coherence.component.util.Daemon.run(Daemon.CDB:42)
            java.lang.Thread.run(Unknown Source)
    Thread[main,5,main]
            java.lang.Object.wait(Native Method)
            com.tangosol.coherence.component.util.daemon.queueProcessor.Service.start(Service.CDB:18)
            com.tangosol.coherence.component.util.daemon.queueProcessor.service.Grid.start(Grid.CDB:6)
            com.tangosol.coherence.component.net.Cluster.onStart(Cluster.CDB:56)
            com.tangosol.coherence.component.net.Cluster.start(Cluster.CDB:11)
            com.tangosol.coherence.component.util.SafeCluster.startCluster(SafeCluster.CDB:3)
            com.tangosol.coherence.component.util.SafeCluster.restartCluster(SafeCluster.CDB:10)
            com.tangosol.coherence.component.util.SafeCluster.ensureRunningCluster(SafeCluster.CDB:26)
            com.tangosol.coherence.component.util.SafeCluster.start(SafeCluster.CDB:2)
            com.tangosol.net.CacheFactory.ensureCluster(CacheFactory.java:427)
            com.tangosol.coherence.component.application.console.Coherence.run(Coherence.CDB:25)
            com.tangosol.coherence.component.application.console.Coherence.main(Coherence.CDB:3)
            sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
            sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
            sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
            java.lang.reflect.Method.invoke(Unknown Source)
            com.tangosol.net.CacheFactory.main(CacheFactory.java:827)
    Thread[Recovery Thread,5,Cluster]
            java.lang.Thread.dumpThreads(Native Method)
            java.lang.Thread.getAllStackTraces(Unknown Source)
            com.tangosol.net.GuardSupport.logStackTraces(GuardSupport.java:810)
            com.tangosol.internal.net.cluster.DefaultServiceFailurePolicy.onGuardableRecovery(DefaultServiceFailurePolicy.java:44)
            com.tangosol.coherence.component.util.daemon.queueProcessor.service.Grid$WrapperGuardable.recover(Grid.CDB:1)
            com.tangosol.net.GuardSupport$Context$1.run(GuardSupport.java:653)
            java.lang.Thread.run(Unknown Source)
    2013-10-24 11:17:58.273/275.814 Oracle Coherence GE 3.7.1.0 <Warning> (thread=Recovery Thread, member=n/a): Attempting recovery of Guard{Daemon=IpMonitor}
    Exception in thread "main" 2013-10-24 11:18:24.025/301.566 Oracle Coherence GE 3.7.1.0 <Error> (thread=main, member=n/a): Error while starting cluster: com.tangosol.net.RequestTimeoutException: Timeout during service start: ServiceInfo(Id=0, Name=Cluster, Type=Cluster
      MemberSet=MasterMemberSet(
        ThisMember=null
        OldestMember=null
        ActualMemberSet=MemberSet(Size=0
        MemberId|ServiceVersion|ServiceJoined|MemberState
        RecycleMillis=1200000
        RecycleSet=MemberSet(Size=0
            at com.tangosol.coherence.component.util.daemon.queueProcessor.service.Grid.onStartupTimeout(Grid.CDB:3)
            at com.tangosol.coherence.component.util.daemon.queueProcessor.Service.start(Service.CDB:28)
            at com.tangosol.coherence.component.util.daemon.queueProcessor.service.Grid.start(Grid.CDB:6)
            at com.tangosol.coherence.component.net.Cluster.onStart(Cluster.CDB:56)
            at com.tangosol.coherence.component.net.Cluster.start(Cluster.CDB:11)
            at com.tangosol.coherence.component.util.SafeCluster.startCluster(SafeCluster.CDB:3)
            at com.tangosol.coherence.component.util.SafeCluster.restartCluster(SafeCluster.CDB:10)
            at com.tangosol.coherence.component.util.SafeCluster.ensureRunningCluster(SafeCluster.CDB:26)
            at com.tangosol.coherence.component.util.SafeCluster.start(SafeCluster.CDB:2)
            at com.tangosol.net.CacheFactory.ensureCluster(CacheFactory.java:427)
            at com.tangosol.coherence.component.application.console.Coherence.run(Coherence.CDB:25)
            at com.tangosol.coherence.component.application.console.Coherence.main(Coherence.CDB:3)
            at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
            at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
            at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
            at java.lang.reflect.Method.invoke(Unknown Source)
            at com.tangosol.net.CacheFactory.main(CacheFactory.java:827)
    java.lang.reflect.InvocationTargetException
    2013-10-24 11:18:24.025/301.566 Oracle Coherence GE 3.7.1.0 <D5> (thread=Cluster, member=n/a): Service Cluster left the cluster at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
            at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
            at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
            at java.lang.reflect.Method.invoke(Unknown Source)
            at com.tangosol.net.CacheFactory.main(CacheFactory.java:827)
    Caused by: com.tangosol.net.RequestTimeoutException: Timeout during service start: ServiceInfo(Id=0, Name=Cluster, Type=Cluster
      MemberSet=MasterMemberSet(
        ThisMember=null
        OldestMember=null
        ActualMemberSet=MemberSet(Size=0
        MemberId|ServiceVersion|ServiceJoined|MemberState
        RecycleMillis=1200000
        RecycleSet=MemberSet(Size=0
            at com.tangosol.coherence.component.util.daemon.queueProcessor.service.Grid.onStartupTimeout(Grid.CDB:3)
            at com.tangosol.coherence.component.util.daemon.queueProcessor.Service.start(Service.CDB:28)
            at com.tangosol.coherence.component.util.daemon.queueProcessor.service.Grid.start(Grid.CDB:6)
            at com.tangosol.coherence.component.net.Cluster.onStart(Cluster.CDB:56)
            at com.tangosol.coherence.component.net.Cluster.start(Cluster.CDB:11)
            at com.tangosol.coherence.component.util.SafeCluster.startCluster(SafeCluster.CDB:3)
            at com.tangosol.coherence.component.util.SafeCluster.restartCluster(SafeCluster.CDB:10)
            at com.tangosol.coherence.component.util.SafeCluster.ensureRunningCluster(SafeCluster.CDB:26)
            at com.tangosol.coherence.component.util.SafeCluster.start(SafeCluster.CDB:2)
            at com.tangosol.net.CacheFactory.ensureCluster(CacheFactory.java:427)
            at com.tangosol.coherence.component.application.console.Coherence.run(Coherence.CDB:25)
            at com.tangosol.coherence.component.application.console.Coherence.main(Coherence.CDB:3)
            ... 5 more
    C:\Users\Zk5rjg8>

    Hi SajeevPynadath,
    1. First start the server process with "cache-server.cmd".
    2. After that you can start another server or client process. The "coherence.cmd" script starts a client process that joins the cluster.
    3. You now have 2 processes, and the well-known-addresses section of your cluster.xml will look like this:
    <socket-address id="serverprocess">
      <address>171.193.103.25</address>
      <port>8088</port>
    </socket-address>
    <socket-address id="clientprocess">
      <address>171.193.103.25</address>
      <port>8089</port>
    </socket-address>
    4. Before starting each process, remember to put the following on the java command line:
    for server
    -Dtangosol.coherence.localhost=171.193.103.25 -Dtangosol.coherence.localport=8088
    for client
    -Dtangosol.coherence.localhost=171.193.103.25 -Dtangosol.coherence.localport=8089
    regards,
    Leo_TA
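
    To keep the per-process settings from step 4 consistent, the JVM arguments can be generated rather than typed by hand. This is only a sketch: WkaLaunchArgs is a hypothetical helper, and the only Coherence-specific parts are the three system properties already quoted in this thread.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper that assembles the JVM arguments quoted in the thread. */
final class WkaLaunchArgs {

    /** Builds the -D arguments for one process bound to the given host and port. */
    static List<String> jvmArgs(String overrideFile, String host, int port) {
        if (port < 1 || port > 65535) {
            throw new IllegalArgumentException("Port out of range: " + port);
        }
        List<String> args = new ArrayList<>();
        args.add("-Dtangosol.coherence.override=" + overrideFile);
        args.add("-Dtangosol.coherence.localhost=" + host);
        args.add("-Dtangosol.coherence.localport=" + port);
        return args;
    }

    public static void main(String[] ignored) {
        // Host and ports are the ones used in the posts above.
        String host = "171.193.103.25";
        System.out.println("server: " + String.join(" ", jvmArgs("cluster.xml", host, 8088)));
        System.out.println("client: " + String.join(" ", jvmArgs("cluster.xml", host, 8089)));
    }
}
```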

  • Error in coherence-- stopping cluster service.

    I found the following error in one of my Coherence server log files. Can someone explain what it means?
    Coherence Logger@9272718 3.4.2/411 ERROR 2009-06-01 16:08:31.396/1217.130 Oracle Coherence GE 3.4.2/411 <Error> (thread=Cluster, member=3): Received cluster heartbeat from the senior Member(Id=7, Timestamp=2009-04-24 12:29:25.802, Address=xx.xxx.xx.xxx:8093, MachineId=55400, Location=machine:server72,process:11324, Role=WeblogicServer) that does not contain this Member(Id=3, Timestamp=2009-06-01 15:48:09.18, Address=xx.xxx.xxx.xx:8091, MachineId=47428, Location=site:ops.company.org,machine:cohserverbox1,process:14401, Role=CoherenceServer); stopping cluster service.
    Thanks Much

    Hi,
    This error essentially means what it says: the process received a cluster heartbeat that did not include the process as a member of the cluster. The process therefore stops its cluster service and will attempt to join the cluster again when appropriate. There are a few reasons why the senior member may not have included the process in its heartbeat. Based on the timestamps and roles, I would first want to confirm the intent to cluster these processes. If the intent is not to cluster these processes, I would adjust their configurations appropriately (e.g. use a distinct port) to form separate clusters. If the intent is to cluster these processes and the error (with the timestamp spread) reproduces, I would want to examine the network topology and look for reasons the members are being dropped from the cluster.
    Regards,
    Harv
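
    The rule described above can be pictured with a toy model. This is an illustration of the decision only, under the assumption that a heartbeat carries the set of member ids the senior currently knows about; HeartbeatModel is hypothetical and is not Coherence's implementation.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

/** Toy model of the heartbeat check; not Coherence source code. */
final class HeartbeatModel {

    enum Action { STAY_IN_CLUSTER, STOP_CLUSTER_SERVICE }

    /**
     * Decides what a member does when it receives the senior member's heartbeat.
     * If the senior's view of the cluster omits this member, the member stops
     * its cluster service (and may try to rejoin later).
     */
    static Action onSeniorHeartbeat(Set<Integer> memberIdsInHeartbeat, int thisMemberId) {
        return memberIdsInHeartbeat.contains(thisMemberId)
                ? Action.STAY_IN_CLUSTER
                : Action.STOP_CLUSTER_SERVICE;
    }

    public static void main(String[] args) {
        // Illustrative member set: the senior (Id=7) knows about 7 and 5 but not 3.
        Set<Integer> seniorView = new HashSet<>(Arrays.asList(7, 5));
        System.out.println("member 5: " + onSeniorHeartbeat(seniorView, 5));
        System.out.println("member 3: " + onSeniorHeartbeat(seniorView, 3));
    }
}
```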

  • Is there a way of getting Coherence to run without starting up a cluster?

    It's painful to run unit tests against Coherence because I can't find a way of configuring it with a simple local in-memory scheme while still being able to test features that require custom POF types, etc.
    I have implemented the advice here: http://coherence.oracle.com/display/COH35UG/Setting+Single+Server+Mode
    But it still seems far too slow when you are used to unit tests taking milliseconds, and things seem to have got slower with Coherence 3.5.
    Are there any plans to allow disabling of the TCMP/clustering layer? This would greatly improve the product in my opinion.

    You can set "<join-timeout-milliseconds>" (on <multicast-listener>, documented at http://coherence.oracle.com/display/COH35UG/multicast-listener) to the lowest possible number. Remember to set it back to the recommended value when you want to start testing in a cluster again.
    Rob
    :Coherence Team:
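
    For a test setup, one way to apply this is to generate a small operational override and point the tangosol.coherence.override property (used elsewhere on this page) at it before Coherence is first touched. The sketch below rests on assumptions: TestJoinTimeoutOverride is a hypothetical helper, the value 100 is only an example, the property is assumed to accept a file path, and the element nesting follows the <cluster-config> layout shown in the WKA thread above.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical test helper; writes an override file with a short join timeout. */
final class TestJoinTimeoutOverride {

    /** Returns the override XML with the given join timeout in milliseconds. */
    static String overrideXml(int joinTimeoutMillis) {
        if (joinTimeoutMillis <= 0) {
            throw new IllegalArgumentException("Timeout must be positive: " + joinTimeoutMillis);
        }
        return String.join("\n",
                "<?xml version=\"1.0\"?>",
                "<coherence>",
                "  <cluster-config>",
                "    <multicast-listener>",
                "      <join-timeout-milliseconds>" + joinTimeoutMillis + "</join-timeout-milliseconds>",
                "    </multicast-listener>",
                "  </cluster-config>",
                "</coherence>",
                "");
    }

    /** Writes the override to a temporary file and returns its path. */
    static Path writeOverride(int joinTimeoutMillis) {
        try {
            Path file = Files.createTempFile("test-coherence-override", ".xml");
            Files.write(file, overrideXml(joinTimeoutMillis).getBytes(StandardCharsets.UTF_8));
            return file;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path file = writeOverride(100); // example value only
        // Must be set before the first CacheFactory call in the test JVM.
        System.setProperty("tangosol.coherence.override", file.toString());
        System.out.println("Override written to " + file);
    }
}
```

    Keeping the short timeout in a test-only override, rather than editing the main configuration, makes it harder to carry the low value into a real cluster by accident.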

  • Coherence::net::messaging::ConnectionException: could not establish a connection to one of the following addresses: {10.242.152.242/10.242.152.242:8088}; make sure the "remote-addresses" configuration element contains an address and port of a running TcpA

    Hi
    I have installed the Coherence server "fmw_12.1.3.0.0_coherence_Disk1_1of1.zip" along with the examples on a Windows machine, and the C++ client coherence-cpp-12.1.3.0.0b51709-windows-x86-vs2012.zip on the same machine.
    I have built the "contacts" C++ example successfully, but when I execute "contacts" using run I get a TcpAcceptor error.
    On my Coherence server the TcpAcceptor is listening on port 8088, so I have modified the extend-cache-config.xml file with the IP address of my Windows machine and port "8088".
    Every time, I get the error below:
    coherence::net::messaging::ConnectionException: could not establish a connection to one of the following addresses: {10.242.152.242/10.242.152.242:8088}; make sure the "remote-addresses" configuration element contains an address and port of a running TcpAcceptor
        at class coherence::lang::TypedHandle<class coherence::component::net::extend::PofConnection> __thiscall coherence::component::util::TcpInitiator::openConnection(void)(TcpInitiator.cpp:307)
        at coherence::component::util::TcpInitiator::openConnection
        at coherence::component::util::Initiator::ensureConnection
        at coherence::component::net::extend::RemoteCacheService::openChannel
        at coherence::component::net::extend::RemoteService::doStart
        at coherence::component::net::extend::RemoteService::start
        at coherence::component::util::SafeService::startService
        at coherence::component::util::SafeService::restartService
        at coherence::component::util::SafeService::ensureRunningServiceInternal
        at coherence::component::util::SafeService::start
        at coherence::net::DefaultConfigurableCacheFactory::configureService
        at coherence::net::DefaultConfigurableCacheFactory::ensureService
        at coherence::net::DefaultConfigurableCacheFactory::ensureRemoteCache
        at coherence::net::DefaultConfigurableCacheFactory::configureCache
        at coherence::net::DefaultConfigurableCacheFactory::ensureCache
        at coherence::net::CacheFactory::getCache
        at unsigned __int64 coherence::lang::class_spec<class coherence::lang::Managed<class ContactId>,class coherence::lang::extends<class coherence::lang::Object,class coherence::lang::Void<class coherence::lang::Object> >,class coherence::lang::implements<void,void,void,void,void,void,void,void,void,void,void,void,void,void,void,void> >::sizeOf(bool)
        at _onexit
        at class coherence::util::Hashtable * coherence::lang::factory<class coherence::util::Hashtable>::create(void)
        at class coherence::util::Hashtable * coherence::lang::factory<class coherence::util::Hashtable>::create(void)
        at BaseThreadInitThunk
        at RtlInitializeExceptionChain
        at RtlInitializeExceptionChain
        on thread "main"
    Caused by: coherence::net::messaging::ConnectionException: coherence::component::util::TcpInitiator::TcpConnection@029EAD78{Id=NULL, Open=1, LocalAddress=NULL, RemoteAddress=10.242.152.242/10.242.152.242:8088}: socket disconnect
        at class coherence::lang::TypedHandle<class coherence::net::messaging::Response> __thiscall coherence::component::net::extend::AbstractPofRequest::Status::getResponse(void)(AbstractPofRequest.cpp:203)
        at coherence::component::net::extend::AbstractPofRequest::Status::getResponse
        at coherence::component::net::extend::AbstractPofRequest::Status::waitForResponse
        at coherence::component::util::Initiator::openConnection
        at coherence::component::net::extend::PofConnection::open
        at coherence::component::util::TcpInitiator::openConnection
        at coherence::component::util::Initiator::ensureConnection
        at coherence::component::net::extend::RemoteCacheService::openChannel
        at coherence::component::net::extend::RemoteService::doStart
        at coherence::component::net::extend::RemoteService::start
        at coherence::component::util::SafeService::startService
        at coherence::component::util::SafeService::restartService
        at coherence::component::util::SafeService::ensureRunningServiceInternal
        at coherence::component::util::SafeService::start
        at coherence::net::DefaultConfigurableCacheFactory::configureService
        at coherence::net::DefaultConfigurableCacheFactory::ensureService
        at coherence::net::DefaultConfigurableCacheFactory::ensureRemoteCache
        at coherence::net::DefaultConfigurableCacheFactory::configureCache
        at coherence::net::DefaultConfigurableCacheFactory::ensureCache
        at coherence::net::CacheFactory::getCache
        at unsigned __int64 coherence::lang::class_spec<class coherence::lang::Managed<class ContactId>,class coherence::lang::extends<class coherence::lang::Object,class coherence::lang::Void<class coherence::lang::Object> >,class coherence::lang::implements<void,void,void,void,void,void,void,void,void,void,void,void,void,void,void,void> >::sizeOf(bool)
        at _onexit
        at class coherence::util::Hashtable * coherence::lang::factory<class coherence::util::Hashtable>::create(void)
        at class coherence::util::Hashtable * coherence::lang::factory<class coherence::util::Hashtable>::create(void)
        at BaseThreadInitThunk
        at RtlInitializeExceptionChain
        at RtlInitializeExceptionChain
        on thread "main"
    Caused by: coherence::io::IOException: socket disconnect
        at unsigned int __thiscall coherence::net::Socket::readInternal(unsigned char *,unsigned int)(Socket.cpp:333)
        at coherence::net::Socket::readInternal
        at coherence::net::Socket::SocketInput::read
        at coherence::io::BufferedInputStream::fillBuffer
        at coherence::io::BufferedInputStream::read
        at coherence::component::util::TcpInitiator::readMessageLength
        at coherence::component::util::TcpInitiator::TcpConnection::TcpReader::onNotify
        at coherence::component::util::Daemon::run
        at coherence::lang::Thread::run
        on thread "ExtendTcpCacheService:coherence::component::util::TcpInitiator:coherence::component::util::TcpInitiator::TcpConnection::TcpReader"
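
    The "socket disconnect" in the trace suggests that something accepted the TCP connection and then closed it, which is different from nothing listening at all. A plain TCP probe can tell the two cases apart. This is a generic sketch with a hypothetical TcpProbe class; it does not speak the Extend protocol. Since 8088 also appears on this page as a cluster member's unicast port, it may be worth confirming that the configured address really belongs to a proxy service's TcpAcceptor, though that is an assumption about this setup rather than something the trace proves.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

/** Generic TCP reachability probe; knows nothing about Coherence*Extend. */
final class TcpProbe {

    /** Returns true if a TCP connection to host:port is accepted within the timeout. */
    static boolean canConnect(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // Refused, unreachable, or timed out.
            return false;
        }
    }

    public static void main(String[] args) {
        // Defaults are the address and port from the error message above.
        String host = args.length > 0 ? args[0] : "10.242.152.242";
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 8088;
        System.out.println(host + ":" + port + " accepts TCP connections: "
                + canConnect(host, port, 2000));
    }
}
```

    If the probe succeeds but the client still reports a disconnect, the listener on that port is probably not the expected acceptor, or it is rejecting the client for some other reason.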

    We are facing the same issue. Could you please provide us with any working .NET sample code for version 12.1.2.0?
    <ssl>
      <protocol>Tls</protocol>
      <local-certificates>
        <certificate>
          <url>c:\Cert\</url>
          <password>password</password>
          <flags>DefaultKeySet</flags>
        </certificate>
      </local-certificates>
    </ssl>
    thanks
    Bala

  • Is there a workaround for coherence::net::messaging::ConnectionException in C++ client?

    Hi,
    When debugging our C++ app we often set Visual Studio 2012 to break on exceptions, which we use to handle... well... exceptional circumstances. However, when connecting, Coherence always throws a few (handled) exceptions such as those below. Does anyone know a workaround?
    Thank you!
    Michal
    First-chance exception at 0x000007FEFD73940D in cmd.exe: Microsoft C++ exception: coherence::lang::throwable_spec<coherence::net::messaging::ConnectionException,coherence::lang::extends<coherence::io::pof::PortableException,std::runtime_error>,coherence::lang::implements<void,void,void,void,void,void,void,void,void,void,void,void,void,void,void,void>,coherence::lang::throwable_spec<coherence::io::pof::PortableException,coherence::lang::extends<coherence::lang::RuntimeException,std::runtime_error>,coherence::lang::implements<coherence::io::pof::PortableObject,void,void,void,void,void,void,void,void,void,void,void,void,void,void,void>,coherence::lang::throwable_spec<coherence::lang::RuntimeException,coherence::lang::extends<coherence::lang::Exception,std::runtime_error>,coherence::lang::implements<void,void,void,void,void,void,void,void,void,void,void,void,void,void,void,void>,coherence::lang::throwable_spec<coherence::lang::Exception,coherence::lang::extends<coherence::lang::Object,std::exception>,coherence::lang::implements<void,void,void,void,void,void,void,void,void,void,void,void,void,void,void,void>,coherence::lang::TypedHandle<coherence::lang::Object const > >::hierarchy>::hierarchy>::hierarchy>::bridge at memory location 0x0000000000156108.
    First-chance exception at 0x000007FEFD73940D in cmd.exe: Microsoft C++ exception: coherence::lang::throwable_spec<coherence::io::InterruptedIOException,coherence::lang::extends<coherence::io::IOException,std::ios_base::failure>,coherence::lang::implements<void,void,void,void,void,void,void,void,void,void,void,void,void,void,void,void>,coherence::lang::throwable_spec<coherence::io::IOException,coherence::lang::extends<coherence::lang::Exception,std::ios_base::failure>,coherence::lang::implements<void,void,void,void,void,void,void,void,void,void,void,void,void,void,void,void>,coherence::lang::throwable_spec<coherence::lang::Exception,coherence::lang::extends<coherence::lang::Object,std::exception>,coherence::lang::implements<void,void,void,void,void,void,void,void,void,void,void,void,void,void,void,void>,coherence::lang::TypedHandle<coherence::lang::Object const > >::hierarchy>::hierarchy>::bridge at memory location 0x00000000062CED88.

    JM140 wrote:
    Hello,
    When I delete an email, it is deleted from all of my devices and none of hers (I have an iPad, MacBook, and iPhone).
    When she deletes an email, it is deleted from all of hers and none of mine (she has an iPhone and MacBook).
    So it's kind of like a POP3 server splitting into two separate IMAP servers: my wife's and mine.
    Any ideas, everyone?
    Nope, it's a POP server acting like a POP server, where the settings on each device leave email on the server and delete it only from the device.
    This means all devices can download the email from the server, but it is never really removed from the server, only from the device on which it was deleted, unless one of the devices is set to delete it from the server when it is deleted from the device.
    Basically, what you want is not possible, as you want IMAP functionality on only specific devices. IMAP is designed to sync actions across all devices using the account, because actions take place on the server and then trickle down to the devices.
    On POP accounts it's the opposite: actions take place on the devices and then, if so configured, may move up to the server.
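
    The difference described above can be sketched as a toy model in Python. This is only an illustration of the deletion semantics, not real mail client code: the class names (Server, PopDevice, ImapDevice) and the delete_from_server flag are hypothetical and chosen for this example.

```python
class Server:
    """Hypothetical mail server holding a set of message ids."""
    def __init__(self, messages):
        self.messages = set(messages)


class PopDevice:
    """POP-style device: downloads copies; deletions happen on the device."""
    def __init__(self, server, delete_from_server=False):
        self.server = server
        # Assumed per-device setting: also remove the message from the server.
        self.delete_from_server = delete_from_server
        self.local = set()

    def fetch(self):
        # Copy whatever is currently on the server into local storage.
        self.local |= self.server.messages

    def delete(self, msg):
        self.local.discard(msg)
        if self.delete_from_server:
            self.server.messages.discard(msg)


class ImapDevice:
    """IMAP-style device: a view of the server mailbox; actions happen there."""
    def __init__(self, server):
        self.server = server

    @property
    def inbox(self):
        return set(self.server.messages)

    def delete(self, msg):
        self.server.messages.discard(msg)


server = Server({"msg1", "msg2"})

# POP: deleting on the phone leaves the laptop's copy and the server untouched.
phone, laptop = PopDevice(server), PopDevice(server)
phone.fetch()
laptop.fetch()
phone.delete("msg1")
print("msg1" in laptop.local, "msg1" in server.messages)  # True True

# IMAP: deleting on one device removes the message for every device.
tablet, desktop = ImapDevice(server), ImapDevice(server)
tablet.delete("msg2")
print("msg2" in desktop.inbox)  # False
```

    In this model, a POP device created with delete_from_server=True would also remove the message from the server, which corresponds to the "unless one of the devices is set to..." case in the answer.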
