OSB Domain Cluster Members

Dear all,
In our production environment, we have a domain with 6 managed servers (MS) spanning 3 physical machines. With the existing managed servers, it is becoming very tough to handle a transaction load of 500 TPS, so we are planning to add 6 more. Which of the options below is the best practice (that is, which should yield better performance)?
1. Create a new domain with 6 MS.
2. Add 6 MS to the existing domain. (Can we add them dynamically like this? Are there any issues if we do so? Please suggest how to do this.)
Eagerly awaiting a reply.
Thanks
PR

Hi PR,
According to the reference topology, option 2 would be preferable.
Have a look here for the reference topology...
Introduction to the Enterprise Deployment Reference Topology - 11g Release 1 (11.1.1)
And here on how to scale-up or scale-out the topology...
Managing the Topology for an Enterprise Deployment - 11g Release 1 (11.1.1)
Cheers,
Vlad
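
As a rough back-of-the-envelope check on the numbers in the question, here is a small sketch. It assumes the 500 TPS load is spread perfectly evenly across the managed servers, which real load balancing rarely achieves, and it says nothing about shared bottlenecks such as a database or a backend service, which extra servers would not relieve.

```python
def per_server_tps(total_tps: float, servers: int) -> float:
    """Average load per managed server, assuming a perfectly even spread."""
    if servers <= 0:
        raise ValueError("servers must be a positive number")
    return total_tps / servers


TOTAL_TPS = 500   # load quoted in the question
MACHINES = 3      # physical machines quoted in the question

current = per_server_tps(TOTAL_TPS, 6)     # existing 6 managed servers
expanded = per_server_tps(TOTAL_TPS, 12)   # after adding 6 more
servers_per_machine = 12 // MACHINES       # only if no machines are added

print(f"6 servers:  {current:.1f} TPS each")
print(f"12 servers: {expanded:.1f} TPS each")
print(f"managed servers per machine after scaling up: {servers_per_machine}")
```

Note that with the same 3 machines, each host would run 4 managed servers instead of 2, so CPU and memory headroom on the hosts matters as much as the per-server figure.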

Similar Messages

  • Create a new OSB domain and data source problems

    Hello,
    I noticed a problem while creating a new OSB domain with Oracle Service Bus 10gR3 on Solaris 10 (Intel-based).
    This domain is composed of two managed servers deployed on a cluster.
    I have configured the JMS reporting data sources to use an Oracle 10g (XE) database (driver Oracle Thin (non XA)) installed on a remote server.
    While trying to start my managed servers, the startup process of these managed servers failed due to a data source problem on "wlsbjmsrpDataSource" or "cgDataSource-nonXA".
    The workaround I found is to delete and recreate (through the WLS console) the data sources "wlsbjmsrpDataSource", "cgDataSource-nonXA" and "cgDataSource".
    In this situation, my managed servers can be started properly.
    Is this a known problem with the Configuration Wizard?
    Thanks for your help.

    Hi,
    Thanks for the answer.
    I fully understand that those data sources are the default ones and are mainly related to JMS reporting.
    But my question was probably not well expressed.
    The problem is that once the domain is created, the managed servers won't start because of problems related to these data sources.
    The workaround I found is to delete and recreate those data sources from the WLS console.
    After that, the managed servers can be started.
    I want to know whether this is a known problem or limitation of OSB 10gR3 with Solaris 10 and Oracle 10g.
    Thanks for your help.

  • IPlanet Plug-in does not recognize all cluster members

    We are running 7 WLS 6.1 SP2 server instances on two Solaris 2.8 machines (3 on machine A, 4 on machine B). The admin server is also running on machine A.
    All the cluster members appear to join the cluster correctly on startup, at least as far as the WLS console shows. No errors are logged.
    We are using iPlanet 6.0 SP4 with the WebLogic NSAPI plug-in to load-balance across the cluster.
    However, during any one lifetime of the iPlanet process, the plug-in recognizes only the server instances on machine A or only those on machine B. We can never get the plug-in to recognize all instances at one time. We can see this from the results of __WebLogicBridgeConfig.
    With the same WLS configuration, but with A and B running as domains on the same physical machine, the plug-in uses both the A and B servers.
    We suspect a multicast problem: MulticastTest run between machines A and B fails to receive messages.
    My questions are:
    A) Why would we be seeing all server instances as having successfully joined the cluster if multicast is failing?
    B) How would multicast have any effect on how the NSAPI plug-in sees the cluster members?
    C) Why would the cluster instances seen by the NSAPI plug-in alternate between only machine A and only machine B after each shutdown and start-up?
    WLS 6.1 SP2 on Solaris 2.8, iPlanet 6.0 SP4.

    I am seeing a similar problem. I have three machines, each with four managed servers running on it. All 12 servers in the cluster are specified in the obj.conf for NSAPI. Only eight of the servers get used, however. The weird part is that machine 1 contains managed servers 1, 4, 7, and 10; machine 2 contains managed servers 2, 5, 8, and 11; and machine 3 contains managed servers 3, 6, 9, and 12. In the obj.conf, we list the servers in order from 1 to 12 for the cluster. For some reason the NSAPI plug-in is skipping managed servers 3, 6, 9, and 12. I have checked connectivity for both the IP and the port. The firewall seems fine.
    Did you ever get a response or figure out what was causing this problem?
    Thanks,
    Mark

  • Possible to have multiple cluster members in the same JVM (for unit tests)?

    Hi,
    I'm wondering if it's possible to simulate multiple cluster members inside of a single JVM. I'm looking to unit test my code and it would be great to write cases for various boundary conditions.
    This could certainly be done with multiple JVMs but would be more difficult to run in something like cruise control etc.
    TIA,
    Danny

    Hi Danny,
    I do not know how to run several Coherence nodes in a single JVM as you are asking, but I have written a JUnit base class that starts a configurable number of nodes (each in its own JVM) and provides some other useful methods for unit/integration testing with Coherence, such as methods to clear all caches or to load test data from a CSV file.
    It is still a work in progress and you might need to tweak things a bit to make it work in your environment, but it will at least give you a head start.
    Shoot me an email if you are interested and I'll send it to you (you can find my email in my profile).
    Regards,
    Aleks

  • How to reset OSB domain from EM point of view?

    Hi folks,
    Since I set up an OSB domain in EM on a client host for the very first time, the administrative server has changed (it was actually moved inside the firewall and hence changed its name; consequently I re-installed OSB on it from scratch). Now EM on the client host cannot locate the administrative server, and thus thinks the domain is down. When I try to edit the hosts list using EM on the client host (Setup>Hosts), the link points to the client host itself (like https://client001.example.com/?tab=2&mode=3). As the client host obviously runs no administrative part, the page does not exist.
    On the other hand, when asked about OSB devices (Setup>Devices), EM correctly connects to the new administrative server name, and shows the correct answer. Same, obtool running on the client host can connect to the new administrative server and fetch settings from it (e.g. lshosts correctly displays the hosts in the domain).
    Seems that the old administrative server name exists only within EM settings. I believe the best thing to do is to re-create OSB domain (from the EM point of view) with the new administrative server in it. How can I do that?
    Thanks already for your answers.
    Dmitry.

    " emca -config dbcontrol db -repos recreate " does the job. Of course, as the repository is reset -- all other history records are lost.

  • What is behind the scenes of "Automatic Discovery of Cluster Members?"

    Do any of you know the basic mechanism (or concept) behind automatic discovery of cluster members?
    And how can we be confident that an application deployed anywhere on the network (LAN, WAN, or the Internet) can join a cluster?
    Sorry for this clumsy question, but I really need to know this.
    Thanks in advance!
    Scott

    Hi Scott,
    Coherence uses a protocol called TCMP which is described in the following Wiki article:
    http://wiki.tangosol.com/display/COH32UG/Network+Protocols
    By default, a new member uses multicast to discover whether a cluster is already running that it can join.
    In order to test multicast in your environment, you need to run the multicast test:
    http://wiki.tangosol.com/display/COH32UG/Multicast+Test
    Some environments don't allow multicast or some switches don't handle multicast very well, in which case you'd need to run using WKA (Well Known Address):
    http://wiki.tangosol.com/display/COH32UG/well-known-addresses
    Regards,
    Jon Hall.
    p.s. once you've confirmed that clustering is taking place (either using the default of multicast or using WKA), then you should run the datagram test to test your network's performance:
    http://wiki.tangosol.com/display/COH32UG/Datagram+Test
    Also, check out the production checklist as this has lots of really good information in it that will be useful going forward:
    http://wiki.tangosol.com/display/COH32UG/Production+Checklist

  • Determining Cluster Members Using a NamedCache

    Is it possible to get the cluster members using a particular NamedCache?
    Is it possible to add a listener which listens for members getting/returning (ensure/release) a particular NamedCache?
    Is there a concept of groups within a cluster membership? Some way of both indicating what a particular node will be doing (say if you wanted to have a "nodata" member which held no objects in its partition and a "data" member which did hold the objects and could have the same configuration file, only somehow give it the parameter to indicate which an instance would be) as well as identify it from the cluster info (get all cluster members of the "data" type)?
    --Tim

    > 1) It's possible to get all cluster members that are
    > using (or providing storage for) a given cache
    > service:
    Excellent. Just what I was looking for.
    > More generally, starting with Coherence 3.2, there
    > are number of ...
    This is what we need. I used storage enabled as an example, but we want to have roles for cluster members, so this is the better solution. We're currently on 3.0, but I guess this is a good impetus to upgrade.
    --Tim

  • Any way to communicate a message to all cluster members?

    Is there any way I can communicate a message to all cluster members?
    In my case I do some data caching within each cluster member. I'm trying to implement a mechanism that will enable me to have every server flush its cache. I was hoping that I could send the request to a servlet (on any machine) that in turn would send a 'multicast' message to all app servers requesting a cache refresh.
    Any ideas?
    Thanks.
    Marko.

    JMS topics are great for multi-server synchronization.
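
    To illustrate the publish/subscribe idea behind that suggestion, here is a minimal in-memory sketch. It is not the JMS API: the Topic and CacheNode classes and the "FLUSH" message are hypothetical stand-ins, and a real deployment would have each server subscribe to an actual JMS topic instead.

```python
from typing import Callable, Dict, List


class Topic:
    """In-memory stand-in for a publish/subscribe topic (not the JMS API)."""

    def __init__(self) -> None:
        self._subscribers: List[Callable[[str], None]] = []

    def subscribe(self, callback: Callable[[str], None]) -> None:
        self._subscribers.append(callback)

    def publish(self, message: str) -> None:
        # Every subscriber receives every message, unlike a queue,
        # where each message goes to a single consumer.
        for callback in list(self._subscribers):
            callback(message)


class CacheNode:
    """Hypothetical cluster member holding a local cache."""

    def __init__(self, name: str, topic: Topic) -> None:
        self.name = name
        self.cache: Dict[str, str] = {}
        topic.subscribe(self.on_message)

    def on_message(self, message: str) -> None:
        if message == "FLUSH":
            self.cache.clear()


topic = Topic()
nodes = [CacheNode(f"server{i}", topic) for i in range(1, 4)]
for node in nodes:
    node.cache["key"] = "stale value"

# Any one member (for example the servlet that received the request)
# publishes once, and all members flush.
topic.publish("FLUSH")
print([len(node.cache) for node in nodes])
```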

  • Configure weight for cluster members ?

    Hi everyone,
    I have installed WebLogic 10.3.4 and configured a cluster with two members.
    Now I want to set up load balancing and failover for my cluster, and I also want to set a weight for each cluster member.
    Please help me with this.
    Regards,
    JagadesH L.

    General cluster documentation can be found here: http://download.oracle.com/docs/cd/E17904_01/web.1111/e13709/toc.htm
    The information concerning load balancing (and the description of the various algorithms) can be found here: http://download.oracle.com/docs/cd/E17904_01/web.1111/e13709/load_balancing.htm#CHDGFIBD
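
    To make the idea of per-member weights concrete, here is a small sketch of proportional distribution. It is an illustration only, not WebLogic's implementation: the server names and weights are hypothetical, and the linked load-balancing chapter remains the reference for which kinds of requests a weight-based algorithm actually applies to.

```python
from collections import Counter
from itertools import cycle, islice
from typing import Dict, List


def weighted_rotation(weights: Dict[str, int]) -> List[str]:
    """Build one rotation in which each server appears `weight` times."""
    rotation: List[str] = []
    for server, weight in weights.items():
        if weight < 1:
            raise ValueError(f"weight for {server} must be at least 1")
        rotation.extend([server] * weight)
    return rotation


# Hypothetical weights: ms1 should receive twice the load of ms2.
weights = {"ms1": 100, "ms2": 50}
rotation = weighted_rotation(weights)

# Dispatch 300 requests by cycling through the rotation.
counts = Counter(islice(cycle(rotation), 300))
print(counts)
```

    This naive rotation sends requests in long bursts to one server at a time; it is only meant to show that the weights are relative shares, so 100 and 50 behave the same as 2 and 1.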

  • Db schema for osb domain - 1 db schema per domain???

    Hi,
    I have recently started looking into OSB 10.3.
    Domain 1:
    I created an OSB domain and assigned one schema to the 'wlsbjmsrpDataSource' data source.
    I started the domain and it worked fine, then I shut the domain down.
    Domain 2:
    I created another domain and assigned it the same schema previously assigned to domain 1.
    I started the domain and it worked fine.
    Now when I try to start domain 1 again, the domain startup fails. I am wondering whether we need one schema per domain, or whether I am missing something while creating the data source for the domain.
    Please help, as I can't really believe we need one schema per OSB domain.
    Thanks in advance!
    salman

    Hi Manoj,
    Some more information below!
    CR231843
    An ALSB domain cannot boot and generates weblogic.transaction.loggingresource.LoggingResourceException if the domain is a new domain using the same database, schema, and LLR table as an existing domain.
    When you move a domain template to a different machine and use the template to create the new domain, the new domain is not able to boot and weblogic.transaction.loggingresource.LoggingResourceException is thrown. The following details outline the scenario:
    1. Create the original domain.
    2. Start the server for the original domain. At this point, the domain is now "used": a domain is considered used once you have started the server for a domain after you have created it.
    3. Create the domain template. You can create it in several different ways: use the Domain Template Builder tool and the Configuration Wizard, the pack/unpack command, or the WebLogic Scripting Tool in offline mode.
    4. Move the domain template to a different machine.
    5. Create a new domain using the template. Again, you can create it in several different ways: use the Domain Template Builder tool and Configuration Wizard, the pack/unpack command, or the WebLogic Scripting Tool in offline mode.
    6. Start the server for the domain. If the new domain does not have the same name as the initial domain, the new domain cannot be started.
    This is because the JMS Reporting Provider provided with ALSB uses the Logging Last Resource (LLR) option. The new domain is attempting to use the same database, schema, and LLR table name to store LLR transaction records. LLR does not allow this, in order to prevent different domains from corrupting each other's tables. To learn more about the LLR feature, see Understanding the Logging Last Resource Transaction Option in Configuring and Managing WebLogic JDBC.
    Note: You can access the Domain Template Builder, Configuration Wizard, and the WebLogic Scripting Tool from the BEA Products > Tools menu on your machine. The tools and the pack and unpack commands are located in the BEA_HOME/weblogic9xx/common/bin directory.
    Based on this, I have tested:
    domain 1 and domain 2 pointing at the same schema, ignoring the Logging Last Resource option in the reporting data source.
    Now the only question is how much we need the Logging Last Resource option in an OSB domain, and for what purpose. Any comments?
    thanks
    salman

  • Problem Adding Managed Servers in Weblogic - OSB Domain

    The project had 4 OSB nodes in a cluster, and 2 more needed to be added. These 2 were added to the same cluster and started with no problem. However, it was found that the OSB proxy services were not properly deployed on the new nodes.
    When trying to make any change to any proxy service from the sbconsole, the session could not be activated because of the following error:
    "SB Console Cannot Create Resource Outside Of A Session"
    In Oracle Metalink the error was found under the ID:
    OSB - SB Console Cannot Create Resource Outside Of A Session [ID 1466651.1]
    The reason is supposed to be a bug in WebLogic version 11.1.1.3:
    @ BUG:9821093 - JMS PROXY IS NOT FOUND AFTER ADDING NEW MANAGED SERVER TO OSB CLUSTER
    As far as we know, this problem did not occur when the previous nodes were added because no JMS proxy was deployed then, and now there was one.
    That is why the JMS proxy was deployed in another domain and deleted from the original one, in the hope that this would work around the problem. But when the new nodes were started and a small change was made to a proxy service, the error
    "SB Console Cannot Create Resource Outside Of A Session"
    appeared again.
    We are asking for feedback on this issue. First of all, we want to know whether there is any workaround or whether anyone has faced this issue before; and, if the bug exists, whether the patch suggested in Metalink is effective and trustworthy.
    In the domain we still have some business services (BS) with JMS transport. However, the bug only involves proxy services (PS), not business services; moreover, removing the business services with JMS transport is not an option.
    Thank you in advance.

    Instead of modifying the current proxy service, which includes four nodes, delete it and re-create it with all 6 nodes included.
    The bug only occurs when trying to modify the proxy service, not when creating it again.

  • OSB 11gR1 Cluster Environment Issues

    Hi,
    I installed Oracle Service Bus 11gR1 on a clustered environment with two nodes and extended the domain for OSB. (First Node: osb_server1, Second Node: osb_server2)
    I am facing the problems below on this environment:
    1. When I login on SB Console on second node, on "Operations" menu a warning message (Unable to obtain metrics data from the server. ) is displayed and the message below is logged on osb_server2:
    <Aug 14, 2012 3:21:00 PM GMT+03:00> <Error> <ALSB Statistics Manager>
    <BEA-473003> <Aggregation Server Not Available. Failed to get remote aggregator
    java.lang.IllegalArgumentException: Server 'null' not found
    at com.bea.alsb.platform.weblogic.WlsDomainConfigurationImpl.getServer(WlsDomainConfigurationImpl.java:98)
    at com.bea.alsb.platform.weblogic.WlsDomainConfigurationImpl.getAggregationServer(WlsDomainConfigurationImpl.java:119)
    at com.bea.wli.monitoring.statistics.ALSBStatisticsManager.getRemoteAggregator(ALSBStatisticsManager.java:291)
    at com.bea.wli.monitoring.statistics.ALSBStatisticsManager.access$000(ALSBStatisticsManager.java:38)
    at com.bea.wli.monitoring.statistics.ALSBStatisticsManager$RemoteAggregatorProxy.send(ALSBStatisticsManager.java:55)
    Truncated. see log file for complete stacktrace
    >
    2. In SB Console on both nodes, when the "Security" menu is selected, an error message ("The server encountered an unexpected condition which prevented it from fulfilling the request.") is displayed and the following messages are logged:
    <Aug 14, 2012 3:16:03 PM EEST> <Error> <User-Management-WLI-OAM> <BEA-000000>
    <[BEA-WLI-Security-UserManagement:482200]The operation searchUsers is not supported by the provider null for user/group .>
    <Aug 14, 2012 3:16:03 PM EEST> <Error> <User-Management-WLI-OAM> <BEA-000000>
    <[BEA-WLI-Security-UserManagement:482200]The operation searchUsers is not supported by the provider null for user/group .>
    <Aug 14, 2012 3:16:03 PM EEST> <Error> <ALSB Console> <BEA-494002> <Internal error occured in OSBConsole : null
    com.bea.p13n.security.management.OperationNotSupportedException
    at com.bea.p13n.security.management.authentication.AtnManagerProxy.getUserNames(AtnManagerProxy.java:579)
    at com.bea.alsb.console.oam.usermanagement.UserManagementHelper.getUserNames(UserManagementHelper.java:857)
    at com.bea.alsb.console.oam.usermanagement.UserManagementHelper.searchUsers(UserManagementHelper.java:138)
    at com.bea.alsb.console.usermanagement.user.UserManagement.viewUsersByName(UserManagement.java:244)
    at com.bea.alsb.console.usermanagement.user.UserManagement.viewUsers(UserManagement.java:71)
    Truncated. see log file for complete stacktrace
    >
    How can I solve these two issues?

    For the first error - Refer to this http://tim.blackamber.org.uk/?p=975
    For the second error - It looks like there is an issue loading the default LDAP configuration. Please do the following:
    1. Take a backup of the cache, data and tmp folders on each of the servers.
    2. Then delete the contents of those folders on each server.
    3. Restart all the servers in the cluster. This should reinitialize all the required files at startup.
    Hope this helps.
    Thanks,
    Patrick
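
    The backup-then-clear steps above could be scripted roughly as follows. This is a sketch under assumptions: it expects the three folders to sit directly under a per-server directory (passed in as server_dir), it should only be run while that server is stopped, and the demo at the bottom runs against a throwaway temporary directory rather than a real domain.

```python
import shutil
import tempfile
from pathlib import Path
from typing import Iterable, List


def backup_and_clear(server_dir: Path, backup_root: Path,
                     names: Iterable[str] = ("cache", "data", "tmp")) -> List[Path]:
    """Copy each named folder to backup_root, then empty the original."""
    backed_up: List[Path] = []
    for name in names:
        source = server_dir / name
        if not source.is_dir():
            continue  # folder not present on this server
        target = backup_root / server_dir.name / name
        # copytree fails if the target exists, so an old backup is never overwritten.
        shutil.copytree(source, target)
        for child in source.iterdir():
            if child.is_dir() and not child.is_symlink():
                shutil.rmtree(child)
            else:
                child.unlink()
        backed_up.append(target)
    return backed_up


# Demo on a temporary directory (hypothetical layout, not a real domain).
demo_root = Path(tempfile.mkdtemp())
server = demo_root / "servers" / "osb_server1"
(server / "tmp").mkdir(parents=True)
(server / "tmp" / "stale.dat").write_text("x")

saved = backup_and_clear(server, demo_root / "backup")
print(saved)
```

    The folder itself is kept and only its contents are removed, which matches step 2 above.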

  • Resource plan not migrating to all targeted cluster members

    Hi,
    We got an environment setup like:
    Physical machine 1: AdminServer, OSB1, WLS1
    Physical machine 2: OSB2, WLS2
    Where we have put OSB[1,2] into a cluster and WLS[1,2] into a cluster. The servers are running WLS 10.3.3.4.
    The problem is that: When creating a new connectionFactory for the DBAdapter, the plan.xml is only updated on 1 machine (not both of them). In my setup, it is always physical machine 1 that gets the updated plan.xml
    I looked at the targets for the deployment, and it is targeted to AdminServer and OSBCluster (meaning OSB1 and OSB2). When going into "Monitoring -> Outbound Connection Pools", I can also see that only the AdminServer and OSB1 have an outbound connection pool.
    To fix this, I've had to SCP the plan.xml from OSB1 to OSB2 and then redeploy the DBAdapter. After that, I see an outbound connection from OSB2.
    Have I misunderstood something? Shouldn't the configuration (plan.xml) be replicated across all targeted members, in case one server fails?
    Best Regards,
    Mathias

    Hello,
    And what about ServerAffinity (set at the ConnectionFactory level)? It should be "false". Another explanation is that your cluster is out of sync (check whether multicast has been set up correctly).
    load balancing and server affinity for dd
    Regards,
    makiey

  • Are there any WLST 11g Scripts to create an OSB domain?

    All the ones I can find (forums, blogs, etc.) are for 10g. Things have changed since then, it seems. For example, most scripts contain something like:
    addTemplate(workshop_home + '/common/templates/applications/workshop_wl.jar')
    addTemplate(wl_home + '/common/templates/applications/wls_webservice.jar')
    addTemplate(osb_home + '/common/templates/applications/wlsb.jar')
    Since Workshop no longer exists, this fails.
    11g doco is sadly lacking...
    Thanks for any help.
    Alph

    I've also been attempting to create and configure an OSB 11gR1 domain using WLST and am also hitting some issues when adding the extension templates.
    I think I've identified the required templates as the following:-
    ${middleware.home}/oracle_common/common/templates/applications/jrf_template_11.1.1.jar
    ${osb.home}/common/templates/applications/oracle.soa.common.adapters_template_11.1.1.jar
    ${wls.home}/common/templates/applications/wls_webservice.jar
    ${osb.home}/common/templates/applications/wlsb_base.jar
    ${osb.home}/common/templates/applications/wlsb.jar
    The correct resources appear to be added, but many of the deployments are wrongly targeted when using an admin + cluster topology.
    I've found that the templates contain dependency references in the template-info.xml and dependencies are implicitly included when you use WLST addTemplate(). The wlsb.jar template is dependent on the other 4 templates so you can just add wlsb.jar and the others get pulled in for you or explicitly add them all (starting with the children otherwise you will see a template already added error).
    I’ve tried various combinations of the 5 templates and essentially what I see is that the same apps and resources are always included but with seemingly random (and wrong!) targeting.
    One thing I have noticed is that the domain template framework seems to have been changed between 10gR3 and 11g. The most relevant difference (for this issue) appears to be the addition of “config groups” configuration files. I can’t find any docs on these, but they appear to group applications, libraries and other resources according to usage and/or deployment requirements, e.g. OSB-ADMIN-APPS, OSB-ADMIN-AND-CLUSTER-APPS, WLS-WEBSERVICE-MAIN-APPS etc., and I imagine that they are used during domain configuration to target apps at the correct servers and clusters. Perhaps some additional WLST function calls are required to ensure these are applied correctly?
    If I use the domain configuration wizard then I can successfully create the domain so this is an option, but we'd rather have a fully scripted and repeatable domain creation process if possible. I expect that the config wizard must be using the same templates so there must be a way to script this?
    Thanks in advance
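
    The "children first" ordering described above can be worked out mechanically. The sketch below is plain Python 3 (3.9 or later, for graphlib), not WLST, and it encodes only the one dependency stated in the post: wlsb.jar depends on the other four templates. Any dependencies among those four are unknown here and are left out.

```python
from graphlib import TopologicalSorter

# Maps each template to the templates it depends on.
# Only the relationship stated in the post is encoded.
dependencies = {
    "wlsb.jar": {
        "jrf_template_11.1.1.jar",
        "oracle.soa.common.adapters_template_11.1.1.jar",
        "wls_webservice.jar",
        "wlsb_base.jar",
    },
}

# static_order() yields dependencies before the templates that need them,
# so this is a safe order for explicit addTemplate() calls.
order = list(TopologicalSorter(dependencies).static_order())
for template in order:
    print(template)
```

    This only addresses the "template already added" ordering error; it does not explain or fix the targeting problem described in the post.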

  • Multiple Senior Cluster Members?

    Hi Guys,
    We've had a few nodes kicked out of one of our production clusters all with messages similar to this:
    ERROR 2008-04-21 18:17:05.753 Oracle Coherence GE 3.3.1/389p1 <Error> (thread=Cluster, member=29): Received cluster heartbeat from the senior Member(Id=2, Timestamp=2008-04-18 11:07:21.948, Address=172.21.205.151:8089, MachineId=29847, Location=process:17367@trulxfw0006,member:trulxfw0006-2) that does not contain this Member(Id=29, Timestamp=2008-04-18 11:07:34.491, Address=172.21.205.148:8092, MachineId=29844, Location=process:17098@trulxfw0003,member:trulxfw0003-5); stopping cluster service.
    DEBUG 2008-04-21 18:17:05.753 Oracle Coherence GE 3.3.1/389p1 <D5> (thread=Cluster, member=29): Service Cluster left the cluster
    The logs on the senior member (2) are interesting though:
    DEBUG 2008-04-21 18:17:05.601 Oracle Coherence GE 3.3.1/389p1 <D5> (thread=Cluster, member=2): Member 29 left service Management with senior member 2
    DEBUG 2008-04-21 18:17:05.602 Oracle Coherence GE 3.3.1/389p1 <D5> (thread=Cluster, member=2): Member 29 left service WriteQueueSync with senior member 3
    DEBUG 2008-04-21 18:17:05.602 Oracle Coherence GE 3.3.1/389p1 <D5> (thread=Cluster, member=2): Member 29 left service WriteQueueAsync with senior member 3
    DEBUG 2008-04-21 18:17:05.603 Oracle Coherence GE 3.3.1/389p1 <D5> (thread=Cluster, member=2): Member 29 left service DistributedCache with senior member 3
    DEBUG 2008-04-21 18:17:05.603 Oracle Coherence GE 3.3.1/389p1 <D5> (thread=Cluster, member=2): Member 29 left service ServiceControl with senior member 3
    DEBUG 2008-04-21 18:17:05.604 Oracle Coherence GE 3.3.1/389p1 <D5> (thread=Cluster, member=2): Member 29 left service InvocationService with senior member 2
    DEBUG 2008-04-21 18:17:05.607 Oracle Coherence GE 3.3.1/389p1 <D5> (thread=Cluster, member=2): Member 29 left Cluster with senior member 2
    I wasn't aware that there could be multiple senior members. Is this indicative of something bad going on?
    Thanks, Paul
    PS Metalink is not behaving so I can't raise it there.

    Hi Jon,
    I've done some more digging into the logs for the whole cluster.
    Of the five nodes that left the cluster, three have the same reason:
    trulxfw0002/180-primary-0.log.1:ERROR 2008-04-21 17:05:19.763 Oracle Coherence GE 3.3.1/389p1 <Error> (thread=Cluster, member=34): This node appears to have partially lost the connectivity: it receives responses from MemberSet(Size=2, BitSetCount=2, ids=[8, 32]) which communicate with Member(Id=22, Timestamp=2008-04-18 11:07:29.911, Address=172.21.205.149:8092, MachineId=29845, Location=process:11858@trulxfw0004,member:trulxfw0004-5), but is not responding directly to this member; that could mean that either requests are not coming out or responses are not coming in; stopping cluster service.
    trulxfw0003/180-primary-7.log.3:ERROR 2008-04-21 00:40:02.153 Oracle Coherence GE 3.3.1/389p1 <Error> (thread=Cluster, member=28): This node appears to have partially lost the connectivity: it receives responses from MemberSet(Size=2, BitSetCount=3, ids=[35, 38]) which communicate with Member(Id=5, Timestamp=2008-04-18 11:07:21.992, Address=172.21.205.151:8090, MachineId=29847, Location=process:17351@trulxfw0006,member:trulxfw0006-1), but is not responding directly to this member; that could mean that either requests are not coming out or responses are not coming in; stopping cluster service.
    trulxfw0006/180-primary-0.log.6:ERROR 2008-04-18 23:13:33.896 Oracle Coherence GE 3.3.1/389p1 <Error> (thread=Cluster, member=1): This node appears to have partially lost the connectivity: it receives responses from MemberSet(Size=2, BitSetCount=3, ids=[14, 42]) which communicate with Member(Id=28, Timestamp=2008-04-18 11:07:34.381, Address=172.21.205.148:8091, MachineId=29844, Location=process:17152@trulxfw0003,member:trulxfw0003-7), but is not responding directly to this member; that could mean that either requests are not coming out or responses are not coming in; stopping cluster service.
    Member 29 had been marked as paused during its lifetime by various other nodes; however, these declarations are not frequent around the time of eviction:
    grep -R "member:trulxfw0003-5" * | grep "failed to respond" | grep "18:17"
    grep -R "member:trulxfw0003-5" * | grep "failed to respond" | grep "18:16"
    trulxfw0004/180-primary-6.log.8:DEBUG 2008-04-18 18:16:26.337 Oracle Coherence GE 3.3.1/389p1 <D6> (thread=PacketPublisher, member=23): Member(Id=29, Timestamp=2008-04-18 11:07:34.491, Address=172.21.205.148:8092, MachineId=29844, Location=process:17098@trulxfw0003,member:trulxfw0003-5) has failed to respond to 17 packets; declaring this member as paused.
    trulxfw0005/180-primary-1.log.7:DEBUG 2008-04-21 18:16:37.315 Oracle Coherence GE 3.3.1/389p1 <D6> (thread=PacketPublisher, member=10): Member(Id=29, Timestamp=2008-04-18 11:07:34.491, Address=172.21.205.148:8092, MachineId=29844, Location=process:17098@trulxfw0003,member:trulxfw0003-5) has failed to respond to 17 packets; declaring this member as paused.
    [xflow@lonrs00342 machines]$ grep -R "member:trulxfw0003-5" * | grep "failed to respond" | grep "18:15"
    trulxfw0002/180-primary-7.log.8:DEBUG 2008-04-18 18:15:44.161 Oracle Coherence GE 3.3.1/389p1 <D6> (thread=PacketPublisher, member=37): Member(Id=29, Timestamp=2008-04-18 11:07:34.491, Address=172.21.205.148:8092, MachineId=29844, Location=process:17098@trulxfw0003,member:trulxfw0003-5) has failed to respond to 17 packets; declaring this member as paused.
    trulxfw0006/180-primary-7.log.9:DEBUG 2008-04-18 18:15:51.477 Oracle Coherence GE 3.3.1/389p1 <D6> (thread=PacketPublisher, member=4): Member(Id=29, Timestamp=2008-04-18 11:07:34.491, Address=172.21.205.148:8092, MachineId=29844, Location=process:17098@trulxfw0003,member:trulxfw0003-5) has failed to respond to 17 packets; declaring this member as paused.
    grep -R "member:trulxfw0003-5" * | grep "failed to respond" | grep "18:14"
    trulxfw0002/180-primary-0.log.41:DEBUG 2008-04-18 22:18:14.220 Oracle Coherence GE 3.3.1/389p1 <D6> (thread=PacketPublisher, member=34): Member(Id=29, Timestamp=2008-04-18 11:07:34.491, Address=172.21.205.148:8092, MachineId=29844, Location=process:17098@trulxfw0003,member:trulxfw0003-5) has failed to respond to 17 packets; declaring this member as paused.
    [xflow@lonrs00342 machines]$
    grep -R "member:trulxfw0003-5" * | grep "failed to respond" | grep "18:13"
    trulxfw0002/180-primary-0.log.43:DEBUG 2008-04-18 18:13:41.083 Oracle Coherence GE 3.3.1/389p1 <D6> (thread=PacketPublisher, member=34): Member(Id=29, Timestamp=2008-04-18 11:07:34.491, Address=172.21.205.148:8092, MachineId=29844, Location=process:17098@trulxfw0003,member:trulxfw0003-5) has failed to respond to 17 packets; declaring this member as paused.
    How closely does a paused declaration correlate with a vote for eviction?
    The last node which was kicked out had a similar pattern to trulxfw0003-5 above, however it had significantly more paused declaration messages in the minutes preceding eviction.
    We have a listener on the Cluster service which detects the local node leaving the cluster and kills itself. Is this a good idea, or is it safer to let Coherence sort itself out?
    Thanks, Paul
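
    The grep pipeline above matches on the time string alone, which is why the search for "18:14" also returns a line logged at 22:18:14. A small sketch that instead keys on the full date and minute of the log timestamp is shown below. The sample lines are abbreviated versions of the "paused" messages quoted in the post, and the function and variable names are made up for illustration.

```python
import re
from collections import Counter
from typing import Iterable

# Captures "YYYY-MM-DD HH:MM" from the log timestamp that follows the level,
# ignoring the member's own "Timestamp=" field later in the line.
LOG_TIME = re.compile(r"\b(?:DEBUG|ERROR) (\d{4}-\d{2}-\d{2} \d{2}:\d{2}):\d{2}\.\d{3} ")
PAUSED = "declaring this member as paused"


def paused_per_minute(lines: Iterable[str], member_location: str) -> Counter:
    """Count 'paused' declarations about one member, grouped by date and minute."""
    counts: Counter = Counter()
    for line in lines:
        if member_location not in line or PAUSED not in line:
            continue
        match = LOG_TIME.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


def sample_line(log_time: str) -> str:
    # Abbreviated form of the log lines quoted in the post.
    return (
        f"trulxfw0002/180-primary-0.log.41:DEBUG {log_time} "
        "Oracle Coherence GE 3.3.1/389p1 <D6> (thread=PacketPublisher, member=34): "
        "Member(Id=29, Timestamp=2008-04-18 11:07:34.491, "
        "Location=process:17098@trulxfw0003,member:trulxfw0003-5) "
        "has failed to respond to 17 packets; declaring this member as paused."
    )


sample = [
    sample_line("2008-04-18 18:15:44.161"),
    sample_line("2008-04-18 18:15:51.477"),
    sample_line("2008-04-18 22:18:14.220"),
]
counts = paused_per_minute(sample, "member:trulxfw0003-5")
for minute, count in sorted(counts.items()):
    print(minute, count)
```

    Grouping by date as well as minute also keeps the 2008-04-18 entries apart from those logged on 2008-04-21, the day of the eviction.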
