Cluster member failure

Hi,
I have a problem with the configuration of a new Cisco 3550 switch as a cluster member. The cluster was initially a star configuration, and I managed to add the new Cisco 3550 to the cluster through a GBIC port connected to another member switch. This GBIC port is configured as a trunk port with the same VLAN family, unlike the other core switch and members. It works fine, but when the core/command switch is turned off and on, the new Cisco 3550 goes into a down state in the cluster.
My duplex and speed values are consistent with the connected trunk port.
Can anybody please suggest what the problem is?
Thanks in advance

The recovery procedures in this section require that you have physical access to the switch. Recovery procedures include the following topics:
Recovering from corrupted software
Recovering from a lost or forgotten password
Recovering from a command-switch failure
http://www.cisco.com/en/US/products/hw/switches/ps637/products_configuration_guide_chapter09186a008007d204.html

Similar Messages

  • Pb : Tangosol cluster member has lost its storage/Questions about warnings

    Hello
    We have the following configuration :
    -JVM : JRockit 1.4.2_8
    -App Server : Weblogic 8.1
    We have a cluster with 4 members in 2 physical machines :
    - Weblogic server 1 and tangosol server 1 on machine 1
    - Weblogic server 2 and tangosol server 2 on machine 2
    The cache mode is Distributed.
    The 2 Weblogic servers have no local storage defined (localstorage=false).
    The 2 Tangosol servers have local storage defined (localstorage=true).
    We are performing some tests in this environment and continuously display the cluster members information.
    Today, the Distributed cache on Tangosol server 2 left the cluster at 16:19 and data seems to have been transferred to Tangosol server 1.
    The main problem is that Tangosol server 2 is still running but no longer has storage enabled (according to the information we display).
    This information is also available in the server logs :
    2007-05-15 16:19:02.076 Tangosol Coherence AE 3.2/367 <D5> (thread=DistributedCache, member=n/a): Service DistributedCache left the cluster
    Is there an explanation for this behavior? Here are more logs :
    2007-05-15 16:18:52.092 Tangosol Coherence AE 3.2/367 <Error> (thread=Cluster, member=1): This senior Member(Id=1, Timestamp=2007-05-14 17:07:06.956, Address=...:..., MachineId=38679) appears to have been disconnected from other nodes due to a long period of inactivity and the seniority has been assumed by the Member(Id=2, Timestamp=2007-05-14 17:07:19.439, Address=...:..., MachineId=38678); stopping cluster service.
    2007-05-15 16:18:53.389 Tangosol Coherence AE 3.2/367 <D6> (thread=PacketPublisher, member=1): Member(Id=2, Timestamp=2007-05-14 17:07:19.439, Address=...:..., MachineId=38678) has failed to respond to 17 packets; declaring this member as paused.
    2007-05-15 16:18:56.185 Tangosol Coherence AE 3.2/367 <Info> (thread=Main Thread, member=n/a): Restarting cluster
    2007-05-15 16:18:57.404 Tangosol Coherence AE 3.2/367 <D5> (thread=Cluster, member=n/a): Service Cluster left the cluster
    2007-05-15 16:19:00.966 Tangosol Coherence AE 3.2/367 <D5> (thread=Invocation:InvocationService, member=n/a): Service InvocationService left the cluster
    2007-05-15 16:19:01.263 Tangosol Coherence AE 3.2/367 <D5> (thread=OptimisticCache, member=n/a): Service OptimisticCache left the cluster
    2007-05-15 16:19:01.263 Tangosol Coherence AE 3.2/367 <D5> (thread=ReplicatedCache, member=n/a): Service ReplicatedCache left the cluster
    2007-05-15 16:19:02.076 Tangosol Coherence AE 3.2/367 <D5> (thread=DistributedCache, member=n/a): Service DistributedCache left the cluster
    2007-05-15 16:19:15.122 Tangosol Coherence AE 3.2/367 <D5> (thread=Cluster, member=n/a): Service Cluster joined the cluster with senior service member n/a
    2007-05-15 16:19:15.621 Tangosol Coherence AE 3.2/367 <Info> (thread=Cluster, member=n/a): This Member(Id=3, Timestamp=2007-05-15 16:19:16.947, Address=...:8096, MachineId=38679, Edition=Application Edition, Mode=Production, CpuCount=2, SocketCount=2) joined cluster with senior Member(Id=2, Timestamp=2007-05-14 17:07:19.439, Address=...:..., MachineId=38678, Edition=Application Edition, Mode=Production, CpuCount=2, SocketCount=2)
    2007-05-15 16:19:15.700 Tangosol Coherence AE 3.2/367 <Warning> (thread=PacketReceiver, member=n/a): Compensated for JIT compilation error (optimized out I2S); Expected: -1, received: 65535
    2007-05-15 16:19:15.715 Tangosol Coherence AE 3.2/367 <Warning> (thread=PacketReceiver, member=n/a): Compensated for JIT compilation error (optimized out I2S); Expected: -2, received: 65534
    2007-05-15 16:19:15.809 Tangosol Coherence AE 3.2/367 <D5> (thread=Cluster, member=n/a): Member(Id=5, Timestamp=2007-05-14 18:44:59.606, Address=...:..., MachineId=38678) joined Cluster with senior member 2
    2007-05-15 16:19:15.840 Tangosol Coherence AE 3.2/367 <Warning> (thread=PacketReceiver, member=n/a): Compensated for JIT compilation error (optimized out I2S); Expected: -1, received: 65535
    2007-05-15 16:19:15.918 Tangosol Coherence AE 3.2/367 <D5> (thread=Cluster, member=n/a): Member(Id=6, Timestamp=2007-05-14 18:45:04.621, Address=...:..., MachineId=38679) joined Cluster with senior member 2
    2007-05-15 16:19:15.965 Tangosol Coherence AE 3.2/367 <D5> (thread=Cluster, member=n/a): Member 2 joined Service ReplicatedCache with senior member 2
    2007-05-15 16:19:15.965 Tangosol Coherence AE 3.2/367 <D5> (thread=Cluster, member=n/a): Member 2 joined Service DistributedCache with senior member 2
    2007-05-15 16:19:15.965 Tangosol Coherence AE 3.2/367 <D5> (thread=Cluster, member=n/a): Member 2 joined Service OptimisticCache with senior member 2
    2007-05-15 16:19:15.965 Tangosol Coherence AE 3.2/367 <D5> (thread=Cluster, member=n/a): Member 2 joined Service InvocationService with senior member 2
    2007-05-15 16:19:15.965 Tangosol Coherence AE 3.2/367 <Warning> (thread=PacketReceiver, member=n/a): Compensated for JIT compilation error (optimized out I2S); Expected: -2, received: 65534
    2007-05-15 16:19:15.965 Tangosol Coherence AE 3.2/367 <D5> (thread=Cluster, member=n/a): Member 5 joined Service DistributedCache with senior member 2
    2007-05-15 16:19:15.996 Tangosol Coherence AE 3.2/367 <Warning> (thread=PacketReceiver, member=n/a): Compensated for JIT compilation error (optimized out I2S); Expected: -1, received: 65535
    2007-05-15 16:19:16.887 Tangosol Coherence AE 3.2/367 <Warning> (thread=PacketReceiver, member=n/a): Compensated for JIT compilation error (optimized out I2S); Expected: -2, received: 65534
    2007-05-15 16:19:17.918 Tangosol Coherence AE 3.2/367 <D5> (thread=Cluster, member=n/a): Member 6 joined Service DistributedCache with senior member 2
    2007-05-15 16:19:17.949 Tangosol Coherence AE 3.2/367 <Info> (thread=Main Thread, member=3): Restarting Service: ReplicatedCache
    2007-05-15 16:19:18.981 Tangosol Coherence AE 3.2/367 <D5> (thread=Cluster, member=3): TcpRing: connecting to member 2 using TcpSocket{State=STATE_OPEN, Socket=Socket[addr=/...,port=...,localport=...]}
    2007-05-15 16:19:18.996 Tangosol Coherence AE 3.2/367 <Warning> (thread=PacketReceiver, member=3): Compensated for JIT compilation error (optimized out I2S); Expected: -3, received: 65533
    2007-05-15 16:19:18.996 Tangosol Coherence AE 3.2/367 <Warning> (thread=PacketReceiver, member=3): Compensated for JIT compilation error (optimized out I2S); Expected: -3, received: 65533
    2007-05-15 16:19:19.934 Tangosol Coherence AE 3.2/367 <Warning> (thread=PacketReceiver, member=3): Compensated for JIT compilation error (optimized out I2S); Expected: -3, received: 65533
    2007-05-15 16:19:19.934 Tangosol Coherence AE 3.2/367 <Warning> (thread=PacketReceiver, member=3): Compensated for JIT compilation error (optimized out I2S); Expected: -3, received: 65533
    2007-05-15 16:19:20.137 Tangosol Coherence AE 3.2/367 <Warning> (thread=PacketReceiver, member=3): Compensated for JIT compilation error (optimized out I2S); Expected: -3, received: 65533
    2007-05-15 16:19:20.137 Tangosol Coherence AE 3.2/367 <Warning> (thread=PacketReceiver, member=3): Compensated for JIT compilation error (optimized out I2S); Expected: -3, received: 65533
    2007-05-15 16:19:20.762 Tangosol Coherence AE 3.2/367 <D5> (thread=ReplicatedCache, member=3): Service ReplicatedCache joined the cluster with senior service member 2
    2007-05-15 16:19:20.949 Tangosol Coherence AE 3.2/367 <Warning> (thread=PacketReceiver, member=3): Compensated for JIT compilation error (optimized out I2S); Expected: -17, received: 65519
    2007-05-15 16:19:20.949 Tangosol Coherence AE 3.2/367 <Warning> (thread=PacketReceiver, member=3): Compensated for JIT compilation error (optimized out I2S); Expected: -2, received: 65534
    2007-05-15 16:19:21.840 Tangosol Coherence AE 3.2/367 <Info> (thread=Main Thread, member=3): Restarting Service: DistributedCache
    2007-05-15 16:19:25.465 Tangosol Coherence AE 3.2/367 <D5> (thread=DistributedCache, member=3): Service DistributedCache joined the cluster with senior service member 2
    2007-05-15 16:19:25.496 Tangosol Coherence AE 3.2/367 <Warning> (thread=PacketReceiver, member=3): Compensated for JIT compilation error (optimized out I2S); Expected: -2, received: 65534
    2007-05-15 16:19:25.511 Tangosol Coherence AE 3.2/367 <Warning> (thread=PacketReceiver, member=3): Compensated for JIT compilation error (optimized out I2S); Expected: -17, received: 65519
    2007-05-15 16:19:25.558 Tangosol Coherence AE 3.2/367 <Warning> (thread=PacketReceiver, member=3): Compensated for JIT compilation error (optimized out I2S); Expected: -2, received: 65534
    2007-05-15 16:19:26.386 Tangosol Coherence AE 3.2/367 <Warning> (thread=PacketReceiver, member=3): Compensated for JIT compilation error (optimized out I2S); Expected: -2, received: 65534
    2007-05-15 16:20:27.820 Tangosol Coherence AE 3.2/367 <D6> (thread=PacketPublisher, member=3): Member(Id=6, Timestamp=2007-05-14 18:45:04.621, Address=...:..., MachineId=38679) has failed to respond to 17 packets; declaring this member as paused.
    2007-05-15 16:20:29.382 Tangosol Coherence AE 3.2/367 <Warning> (thread=PacketPublisher, member=3): Member(Id=6, Timestamp=2007-05-14 18:45:04.621, Address=...:..., MachineId=38679) was unresponsive for 1562 ms, 24 packets have timed-out, PauseRate=0.0212, Paused=false, Deferring=false, OutstandingPackets=0, DeferredPackets=0, Threshold=1785
    2007-05-15 16:20:41.881 Tangosol Coherence AE 3.2/367 <D6> (thread=PacketPublisher, member=3): Member(Id=6, Timestamp=2007-05-14 18:45:04.621, Address=...:..., MachineId=38679) has failed to respond to 17 packets; declaring this member as paused.
    In addition, can someone explain the following warning messages we have in our Tangosol servers log files :
    2007-05-15 16:19:15.700 Tangosol Coherence AE 3.2/367 <Warning> (thread=PacketReceiver, member=n/a): Compensated for JIT compilation error (optimized out I2S); Expected: -1, received: 65535
    2007-05-15 16:20:29.382 Tangosol Coherence AE 3.2/367 <Warning> (thread=PacketPublisher, member=3): Member(Id=6, Timestamp=2007-05-14 18:45:04.621, Address=...:..., MachineId=38679) was unresponsive for 1562 ms, 24 packets have timed-out, PauseRate=0.0212, Paused=false, Deferring=false, OutstandingPackets=0, DeferredPackets=0, Threshold=1785
    We are somewhat blocked because we plan to deploy this environment to production.
    Regards.
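On the "Expected: -1, received: 65535" warnings: one plausible reading, assuming "I2S" refers to the JVM int-to-short conversion (the i2s instruction), is that a 16-bit value was read without being narrowed back to a signed short. The sketch below only checks the arithmetic on the numbers from the log; the class and method names are hypothetical, and it does not reproduce anything Coherence does internally.

```java
// Hypothetical demo class: shows that the "received" values in the warnings
// are the unsigned 16-bit forms of the "Expected" values.
final class I2sDemo {
    /** Narrows an int to a signed 16-bit value, which is what an int-to-short cast does. */
    static short toShort(int value) {
        return (short) value;
    }

    public static void main(String[] args) {
        // "received" values copied from the warning lines in the log above
        int[] received = {65535, 65534, 65533, 65519};
        for (int r : received) {
            System.out.println(r + " -> " + toShort(r));
        }
    }
}
```

Each pair in the warnings matches this conversion (65535 and -1, 65534 and -2, 65533 and -3, 65519 and -17), which is consistent with the message saying the discrepancy was compensated for.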

    Hi
    We have 2 methods that check our cluster behavior :
    import com.tangosol.net.CacheFactory;
    import com.tangosol.net.DistributedCacheService;
    import com.tangosol.net.NamedCache;

    /**
     * This method returns the number of nodes in the Tangosol cluster.
     * @return String
     * @throws Throwable if an error occurs
     */
    public String tangosolCount() throws Throwable {
         int tangosolNodesCount = 0;
         try {
              NamedCache cache = CacheFactory.getCache("cacheName");
              DistributedCacheService service = (DistributedCacheService) cache.getCacheService();
              // Counts every cluster member, storage-enabled or not
              tangosolNodesCount = service.getCluster().getMemberSet().size();
         } catch (Throwable t) {
              throw new RuntimeException("An error occurs when trying to access to Tangosol cache.", t);
         }
         return String.valueOf(tangosolNodesCount);
    }

    /**
     * This method returns the number of nodes on which storage is defined, in the Tangosol cluster.
     * @return String
     * @throws Throwable if an error occurs
     */
    public String tangosolStorageCount() throws Throwable {
         int tangosolStorageCount = 0;
         try {
              NamedCache cache = CacheFactory.getCache("cacheName");
              DistributedCacheService service = (DistributedCacheService) cache.getCacheService();
              // Counts only the members of this service that have local storage enabled
              tangosolStorageCount = service.getStorageEnabledMembers().size();
         } catch (Throwable t) {
              throw new RuntimeException("An error occurs when trying to access to Tangosol cache.", t);
         }
         return String.valueOf(tangosolStorageCount);
    }
    This information is always displayed in a web page.
    The Tangosol log files are attached.
    Since yesterday at 16:19 Paris time (I have not yet restarted the node that no longer has storage), we have tangosolCount=4 and tangosolStorageCount=1.
    The Tangosol servers are started (com.tangosol.net.DefaultCacheServer) with the parameter -Dtangosol.coherence.distributed.localstorage=true.
    I confirm that yesterday at 16:19, there was a high GC time. Heap sizes are set to 128 MB.
    Regards.
    Attachment: Tangosol_1.log (to use this attachment, rename 533.bin to Tangosol_1.log after the download is complete)
    Attachment: Tangosol_2.log (to use this attachment, rename 534.bin to Tangosol_2.log after the download is complete)
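When reading logs like the ones quoted earlier in this thread, it can help to list which local services logged "left the cluster" without a later "joined the cluster". The following is a minimal sketch under that assumption; the class name, method name and regular expression are hypothetical and are based only on the line format visible in the quoted excerpt, not on any Coherence API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: scans log lines for local service leave/join events.
final class ServiceLeaveScan {
    // Matches lines such as "...): Service DistributedCache left the cluster"
    // and "...): Service DistributedCache joined the cluster with senior service member 2".
    private static final Pattern EVENT =
            Pattern.compile("\\): Service (\\S+) (left|joined) the cluster");

    /** Returns the services whose most recent event in the given lines is "left". */
    static List<String> servicesStillOut(List<String> lines) {
        Set<String> out = new LinkedHashSet<>();
        for (String line : lines) {
            Matcher m = EVENT.matcher(line);
            if (!m.find()) {
                continue; // not a local service leave/join line
            }
            if (m.group(2).equals("left")) {
                out.add(m.group(1));
            } else {
                out.remove(m.group(1));
            }
        }
        return new ArrayList<>(out);
    }

    public static void main(String[] args) {
        // A few lines copied from the excerpt quoted in this thread
        List<String> sample = Arrays.asList(
            "2007-05-15 16:18:57.404 Tangosol Coherence AE 3.2/367 <D5> (thread=Cluster, member=n/a): Service Cluster left the cluster",
            "2007-05-15 16:19:00.966 Tangosol Coherence AE 3.2/367 <D5> (thread=Invocation:InvocationService, member=n/a): Service InvocationService left the cluster",
            "2007-05-15 16:19:02.076 Tangosol Coherence AE 3.2/367 <D5> (thread=DistributedCache, member=n/a): Service DistributedCache left the cluster",
            "2007-05-15 16:19:15.122 Tangosol Coherence AE 3.2/367 <D5> (thread=Cluster, member=n/a): Service Cluster joined the cluster with senior service member n/a",
            "2007-05-15 16:19:25.465 Tangosol Coherence AE 3.2/367 <D5> (thread=DistributedCache, member=3): Service DistributedCache joined the cluster with senior service member 2");
        System.out.println(servicesStillOut(sample));
    }
}
```

A service reported by this scan may simply have rejoined outside the excerpt, so the result is a starting point for reading the full attached logs rather than a diagnosis.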

  • InvocationService.query() doesn't work for restarted cluster member.

    Hi,
    Thank you for the product; I'm very pleased with Coherence cache.
    I will try to describe the following problem with Coherence.
    Our product is integrated with Coherence cache, and we have a discovery/communication mechanism based on:
    - MemberListener - to get events about cluster members;
    - InvocationService.query() - to communicate with remote members.
    When a node is disconnected (wire break), MemberListener works fine. After the connection is repaired, the cluster services on the node are sometimes stopped or restarted.
    I tested with 2 cluster islands by unplugging and replugging a cable on my network router (2 Coherence members on one side and 2 members on the other), then repairing the connection to see how the caches merged.
    If a node's cluster services are restarted, I see that Member.getUid() changes for the local member, and MemberListener works fine for all nodes. But remote nodes can't send anything with InvocationService.query() to that restarted member!
    InvocationService.query() always returns an empty result when I try to send my AbstractInvocable to these restarted members.
    Could you help?
    I think there is a bug...
    Best,
    Alex

    Hi Alex,
    could you please post your cache configuration file? Also, is the invocation service marked to be autostarted in the config file?
    Best regards,
    Robert

  • RMI Objects running on single cluster memb. only avail on that

    Hi,
    My setup:
    An RMI object shall be bound to a single cluster member. Only one cluster member therefore has the startup entry, so the RMI object is instantiated only in this one cluster member.
    I expected the weblogic.rmi.myclass JNDI entry to be replicated into every cluster member's JNDI context (thus giving a simple way to implement a singleton; does this work?).
    All clients that are load-balanced to a cluster member that does not have the startup entry fail to look up the RMI object's entry, with the exception shown below.
    Why is the entry not replicated into the other machines' naming contexts?
    javax.naming.NameNotFoundException: 'mybindname'; remaining name 'mybindname'
    at weblogic.rmi.extensions.BasicRequest.sendReceive(BasicRequest.java:44)
    at weblogic.jndi.WLContext_WLStub.lookup(WLContext_WLStub.java:192)
    at weblogic.jndi.toolkit.WLContextStub.lookup(WLContextStub.java:545)
    at mypackage.MyClass.lookup(...);
    Thank you for any help on this & happy new year
    Toby

    We too are looking to achieve the same thing. Does anyone know the answer to Tobias's question?
    david
    David Michaels <[email protected]>
    Director of Technology
    ShockMarket Corporation (650) 330-4665

  • Cluster member caches a copy of DefaultWebApp from admin server

    Hi,
    It seems that when a cluster member starts up, it fetches a copy of the DefaultWebApp web application from the admin server and keeps it under the directory $BEA_HOME/wlserver6.1/config/mydomain/applications/.wlnotdelete_$SERVER_NAME. It does not look at its local copy of DefaultWebApp, even when that copy has been modified.
    How does this happen? How can it be disabled?
    Thanks in advance,
    William

    Hi,
    This probably occurs due to a change in WLS 6.1 that was made to ensure the server functions properly even when the customer deletes the DefaultWebApp. If this is the case, it cannot be switched off.
    Do you observe other symptoms that make you think you have a problem with your server?
    Regards,
    Christian Buchegger
    Developer Relations Engineer
    BEA Support

  • Transfer ID of a cluster member

    Hello,
    I cannot find any documentation about using Transfer ID when the host is a cluster member.
    I would like to use Transfer ID to replace a host (same old SAN). There is no service in miggui to transfer the cluster settings, so the new server has no idea that it is a cluster member.
    The situation is the same for iPrint and GroupWise. It isn't necessary to migrate the data from iPrint or GroupWise because it is located on shared storage, but I think that there are some settings on the local drive.
    Could someone help me?
    Chris

    Transfer ID can't be performed with MigGUI on a node participating in the cluster. Instead, it is recommended to perform a rolling cluster upgrade: add the OES11 nodes to the NetWare cluster and migrate the cluster services from the old node to the new node.
    -Ramesh

  • Cluster Member Identification question

    If a particular node is listed in the failover cluster manager, is it definitely a member? I know this may seem like an easy question, but I am just learning this and I am trying to make sure I am getting accurate info. We have 4 physical servers that hold multiple VMs ... I was told that although the VMs show up as part of the cluster that does not mean that they are. Can someone help me with this? How can I tell if they are or aren't?
    Thanks

    Hi Charles3479,
    A cluster node has to be added to the cluster manually in Failover Cluster Manager. When you say "I was told that although the VMs show up as part of the cluster that does not mean that they are", do you mean using a VM as a cluster node? Please correct me if I misunderstand. As far as I know, there is no official document indicating that this is supported, but in my personal experience it works properly. I hope this gives you some ideas.
    More information:
    Failover Cluster Step-by-Step Guide: Configuring a Two-Node File Server Failover Cluster
    https://technet.microsoft.com/en-us/library/cc731844(v=ws.10).aspx
    I'm glad to be of help to you!

  • Webserver on same machine as cluster member

    I understand that every member node of a cluster must have its own fixed, unique IP address, so if we have a 2-tier architecture with two servlet/EJB servers, we would have one machine for each of those.
    Can the webserver be on the same machine as the servlet/EJB server?

    Yes, just use a different port number.
    There's nothing that prevents you from using the same WLS as BOTH a web server and an EJB server.
    Mike

  • Cluster Node Failure

    Hi,
    I am a Columbia University engineering graduate student doing research on Sun Clusters. I have two quick questions that I was unable to find answers for on the Sun Cluster documentation website. If anyone can help me answer them, I would really appreciate it. The questions are as follows.
    1. If a node of a cluster fails, how are resources such as file locks or client sessions recovered, rebuilt, if at all? I understand that if a node fails while talking to a client, it gets restarted on the same node if it is healthy or a backup node if it is not. But I am not clear on what happens to resources such as file locks that were originally owned by the node.
    2. If a node of a cluster fails, how are internal data of the node recovered, rebuilt, if at all? I mean things like caches or internal data structures.
    Please let me know.
    Thanks,
    Larry Chen

    I assume by 3.0 you actually mean 3.x because 3.0 is very old technology now. We're currently on 3.2, which is the 7th or 8th release of the software. Furthermore, I don't think these are particularly simple things to give answers for, so I may have to refer you to other material for longer answers.
    You may find most of the answers you want in the book I co-wrote with Richard Elling entitled "Designing Enterprise Solutions with Sun Cluster 3.0". Chapter 3 covers some of the stuff you're asking about.
    From your question I'm assuming you're asking about NFS? If so, the NFS protocol together with statd and lockd co-ordinate client recovery after an NFS fail-over (although to the client, it just looks like the same server came back up quickly).
    For objects that do have state in the kernel, e.g. writes to PxFS, global devices, etc, these are all handled by a highly available services framework in the kernel. They effectively use a two phase commit-like protocol to ensure that these operations can continue after failures.
    Hope that helps (somewhat),
    Tim
    ---

  • OC4J 9.0.3 cluster member on W2K

    Hello,
    We have troubles with configuring our application server cluster. Our cluster consists of two computers, first is running W2K and second Linux. Our network is multicast enabled. We choose folowing settings :
    rmi.xml on both servers, admin user and password are the same
    <cluster host="230.0.0.1" port="9000" id="1" username="admin" password="orion" />
    server.xml on Linux
    <cluster id="11" />
    server.xml on W2K
    <cluster id="12" />
    The problem is, that command "ping -t 230.0.0.1" reports "timed out" when runnig the W2K server first, so it seems that deamon on W2K that listens on multicast address doesn't start.
    When we start Linux server first, everything is ok, ping reports time in milliseconds.
    In fact it means that cluster does'n work, servers in cluster doesn't discover themselves.
    Please help.
    Thanks in advance.
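Whether a host answers a ping sent to a multicast group can depend on its own settings, so a small listener may be a more direct check that traffic for 230.0.0.1:9000 actually reaches a machine. The sketch below is an assumption-laden illustration, not part of OC4J: the class and method names are hypothetical, and it uses only the standard java.net multicast classes, joining on the default network interface.

```java
import java.io.IOException;
import java.net.DatagramPacket;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.MulticastSocket;
import java.net.SocketTimeoutException;
import java.net.UnknownHostException;

// Hypothetical helper for checking multicast reachability from plain Java.
final class MulticastCheck {
    /** True if the given literal address lies in a multicast range. */
    static boolean isMulticast(String address) {
        try {
            return InetAddress.getByName(address).isMulticastAddress();
        } catch (UnknownHostException e) {
            throw new IllegalArgumentException("Not a valid address: " + address, e);
        }
    }

    /** Joins the group and waits up to timeoutMillis for one datagram. */
    static boolean receivesTraffic(String group, int port, int timeoutMillis) throws IOException {
        InetSocketAddress groupAddress = new InetSocketAddress(InetAddress.getByName(group), port);
        try (MulticastSocket socket = new MulticastSocket(port)) {
            socket.setSoTimeout(timeoutMillis);
            socket.joinGroup(groupAddress, null); // null = default interface
            try {
                socket.receive(new DatagramPacket(new byte[1500], 1500));
                return true;
            } catch (SocketTimeoutException e) {
                return false; // nothing arrived within the timeout
            } finally {
                socket.leaveGroup(groupAddress, null);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Address and port taken from the rmi.xml line quoted above
        System.out.println("230.0.0.1 is multicast: " + isMulticast("230.0.0.1"));
        if (args.length > 0 && args[0].equals("listen")) {
            System.out.println("traffic seen: " + receivesTraffic("230.0.0.1", 9000, 5000));
        }
    }
}
```

Running it with the `listen` argument on each machine while the other server is up would show whether cluster datagrams arrive in both directions; if the server already has the port bound on the same host, the listener may fail to bind and would need to run while that server is stopped.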

    Radim -- I believe that you have run into a previously reported issue with the 9.0.3 Developer Preview. There is
    currently no known workaround for the problem that you encountered. We intended the Developer Preview to show
    some specific J2EE 1.3 functionality and therefore did not go to the normal lengths we do for our production releases.
    Also, we are planning on using this mechanism in the future to show new functionality to the community and to
    get feedback on it. I know that we will release another Developer Preview but not for the next few weeks. This
    functionality will be available in our 2.0 production release which should be available before the next Developer Preview.
    Thanks -- Jeff

  • SQL SERVER Failover Cluster switch failure because the passive node automatically reassign drive letter

    I switched the SQL Server resource group to the standby node. When the disk resource was about to be brought online on the passive node, an exception occurred: the original dependency disk resource has the drive letter 'K:', but when the disk was brought online it was automatically reassigned a new drive letter, 'H:', so the SQL Server resource could not be brought online. After manually changing the drive letter back to 'K:' on the passive node, it works. So my question is: why does it not use the original drive letter, and why does it assign a new one? What could cause this? A mount point? Some log entries follow:
    00001cbc.000004e0::2015/03/12-14:41:11.377 WARN  [RES] Physical Disk <FltLowestPrice_K>: OnlineThread: Failed to set volguid \??\Volume{e32c13d5-02e6-4924-a2d9-59a6fae1a1be}. Error: 183.
    00001cbc.000004e0::2015/03/12-14:41:11.377 INFO  [RES] Physical Disk <FltLowestPrice_K>: Found 2 mount points for device \Device\Harddisk8\Partition2
    00001cbc.00001cdc::2015/03/12-14:41:11.377 INFO  [RES] Physical Disk: PNP: Update volume exit, status 1168
    00001cbc.00001cdc::2015/03/12-14:41:11.377 INFO  [RES] Physical Disk: PNP: Updating volume \\?\STORAGE#Volume#{1a8ddb8e-fe43-11e2-b7c5-6c3be5a5cdca}#0000000008100000#{53f5630d-b6bf-11d0-94f2-00a0c91efb8b}
    00001cbc.00001cdc::2015/03/12-14:41:11.377 INFO  [RES] Physical Disk: PNP: Update volume exit, status 5023
    00001cbc.000004e0::2015/03/12-14:41:11.377 ERR   [RES] Physical Disk: Failed to get volname for drive H:\, status 2
    00001cbc.000004e0::2015/03/12-14:41:11.377 INFO  [RES] Physical Disk <FltLowestPrice_K>: VolumeIsNtfs: Volume \\?\GLOBALROOT\Device\Harddisk8\Partition2\ has FS type NTFS
    00001cbc.000004e0::2015/03/12-14:41:11.377 INFO  [RES] Physical Disk: Volume \\?\GLOBALROOT\Device\Harddisk8\Partition2\ has FS type NTFS
    00001cbc.000004e0::2015/03/12-14:41:11.377 INFO  [RES] Physical Disk: MountPoint H:\ points to volume \\?\Volume{e32c13d5-02e6-4924-a2d9-59a6fae1a1be}\

    Sounds like you have a cluster hive that is out of date or bad, or some registry settings that are incorrect. You'll want to have this question transferred to the Windows forum, as that's really what you're asking about.
    -Sean
    The views, opinions, and posts do not reflect those of my company and are solely my own. No warranty, service, or results are expressed or implied.

  • Install BizTalk 2013 R2 + SQL Server on cluster - MSDTC Failure

    Hi, Folks.
    I'm trying to install BTS 2013 R2 using SQL Server on a cluster. I've successfully configured SSO (on the same BTS server), BRE, and Group, but when I try to install the Runtime I get an error:
    The Microsoft Distributed Transaction Coordinator (MSDTC) may not be configured correctly. Ensure that the MSDTC service is running and DTC network access is allowed on the BizTalk, SQL and SSO Master servers. For more information, see "MSDTC Configuration settings required for BizTalk Server" in the BizTalk Server Help.
    Internal error: "New transaction cannot enlist in the specified transaction coordinator. "
    I downloaded DTCPing and ran a test between my BTS server and the MSDTC cluster server, which ran fine. There is no firewall between those servers. I then checked the DTC settings on both sides. They are configured properly, according to MS:
    MSDTC Cluster settings
    BTS Server settings
    Next, I looked at Event Viewer and found a warning message from SSO every 30 seconds while BTS Config is trying to install the BTS Runtime:
    Could not access the SSO database. If this condition persists, the SSO service will go offline.
     Timeout expired.  The timeout period elapsed prior to completion of the operation or the server is not responding..
     SQL Error code: 0xFFFFFFFE
    I "googled" it and found that this issue is generally related to the BizTalk user's permissions on the database server, but my user has high privileges on all the DB servers that make up my cluster.
    All servers (3 in the DB cluster and 1 for BTS) run Windows 2012 R2 64-bit, my SQL Server version is 2014, and the BTS user and related groups belong to my domain. I really don't understand what's going on.

    Hi, Shankycheil.
    My 3 SQL cluster nodes shared the same CID, so I reconfigured two nodes, rebooted each server, and installed MSDTC again. After that, DTCPing stopped showing the CID warning.
    The MSDTC cluster resides on the same servers as the SQL Server cluster.
    Then, following Ashwin Prabhu's suggestion, I unconfigured and reconfigured all the BTS items, but at BTS Group I got the same error.
    Looking at my BTS MSDTC trace file, I see a timeout error while the BTS Group configuration is running:
    pid=4888 ;tid=3424 ;time=03/02/2015-15:18:16.560 ;seq=11 ;eventid=TRACING_STARTED ;;"TM Identifier='(null)'" ;"MSDTC is resuming the tracing of long - lived transactions"
    pid=4888 ;tid=3424 ;time=03/02/2015-15:18:17.145 ;seq=12 ;eventid=TRANSACTION_BEGUN ;tx_guid=56b93685-6ada-47bc-8a80-c30ff7ad66ae ;"TM Identifier='(null)'" ;"transaction has begun, description :'<NULL>'"
    pid=4888 ;tid=3424 ;time=03/02/2015-15:18:17.145 ;seq=13 ;eventid=TRANSACTION_PROPOGATED_TO_CHILD_NODE ;tx_guid=56b93685-6ada-47bc-8a80-c30ff7ad66ae ;"TM Identifier='(null)'" ;"transaction propagated to 'xxxxx' as transaction child node #1"
    pid=4888 ;tid=4964 ;time=03/02/2015-15:23:40.801 ;seq=14 ;eventid=ABORT_DUE_TO_TRANSACTION_TIMER_EXPIRED ;tx_guid=56b93685-6ada-47bc-8a80-c30ff7ad66ae ;"TM Identifier='(null)'" ;"transaction timeout expired"
    pid=4888 ;tid=4964 ;time=03/02/2015-15:23:40.801 ;seq=15 ;eventid=TRANSACTION_ABORTING ;tx_guid=56b93685-6ada-47bc-8a80-c30ff7ad66ae ;"TM Identifier='(null)'" ;"transaction is aborting"
    pid=4888 ;tid=4964 ;time=03/02/2015-15:23:40.801 ;seq=16 ;eventid=CHILD_NODE_ISSUED_ABORT ;tx_guid=56b93685-6ada-47bc-8a80-c30ff7ad66ae ;"TM Identifier='(null)'" ;"abort request issued to transaction child node #1 'xxxxx'"
    pid=4888 ;tid=4308 ;time=03/02/2015-15:23:40.801 ;seq=17 ;eventid=CHILD_NODE_ACKNOWLEDGED_ABORT ;tx_guid=56b93685-6ada-47bc-8a80-c30ff7ad66ae ;"TM Identifier='(null)'" ;"received acknowledgement of abort request from transaction child node #1 'xxxxx'"
    pid=4888 ;tid=4308 ;time=03/02/2015-15:23:40.801 ;seq=18 ;eventid=TRANSACTION_ABORTED ;tx_guid=56b93685-6ada-47bc-8a80-c30ff7ad66ae ;"TM Identifier='(null)'" ;"transaction has been aborted"
    pid=4888 ;tid=4308 ;time=03/02/2015-15:23:41.674 ;seq=19 ;eventid=TRANSACTION_PROPAGATION_FAILED_TRANSACTION_NOT_FOUND ;tx_guid=56b93685-6ada-47bc-8a80-c30ff7ad66ae ;"TM Identifier='(null)'" ;"failed to propagate transaction to child node 'xxxxx' because the transaction could not be found. Some possible reasons include, client might have already called commit or transaction might have got aborted due to timeout."
    pid=4888 ;tid=4308 ;time=03/02/2015-15:23:41.675 ;seq=20 ;eventid=TRANSACTION_PROPAGATION_FAILED_TRANSACTION_NOT_FOUND ;tx_guid=56b93685-6ada-47bc-8a80-c30ff7ad66ae ;"TM Identifier='(null)'" ;"failed to propagate transaction to child node 'xxxxx' because the transaction could not be found. Some possible reasons include, client might have already called commit or transaction might have got aborted due to timeout."
    My remote MSDTC server has plenty of capacity, and I don't see any resource-hungry process at that time that would explain this timeout error.
    Finally, I tried running "msdtc -tmMappingSet"
    and "msdtc.exe -tmMappingView"
    on the BTS server, but I got an error message from msdtc.exe:
    Error occurred while trying to perform the above operation. Check the trace file for more information
    I don't see any error in the trace, only an error event in Event Viewer: "Invalid command line arguments.". Should this configuration be done on the BTS server or on my MSDTC cluster?
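
    To see how long the transaction actually ran before MSDTC gave up, the trace records can be measured directly. The following is a rough sketch, not part of any MSDTC tooling: the `parse_record` and `seconds_until_timeout` helpers are hypothetical names, the two sample records are trimmed copies of seq=12 and seq=14 from the trace above, and the date is assumed to be in month/day/year order (which does not affect a same-day difference).

```python
import re
from datetime import datetime

# Two records copied from the trace above (seq=12 and seq=14), with padding
# removed and only the fields the parser needs.
TRACE = """\
pid=4888 ;tid=3424 ;time=03/02/2015-15:18:17.145 ;seq=12 ;eventid=TRANSACTION_BEGUN ;tx_guid=56b93685-6ada-47bc-8a80-c30ff7ad66ae
pid=4888 ;tid=4964 ;time=03/02/2015-15:23:40.801 ;seq=14 ;eventid=ABORT_DUE_TO_TRANSACTION_TIMER_EXPIRED ;tx_guid=56b93685-6ada-47bc-8a80-c30ff7ad66ae
"""

# key=value pairs; a value ends at whitespace or ';'.
FIELD = re.compile(r"(\w+)=([^;\s]+)")

# ASSUMPTION: month/day/year ordering in the trace timestamps.
TIME_FORMAT = "%m/%d/%Y-%H:%M:%S.%f"


def parse_record(line):
    """Turn one trace record into a dict of its key=value fields."""
    return dict(FIELD.findall(line))


def seconds_until_timeout(trace_text):
    """Map tx_guid -> seconds between TRANSACTION_BEGUN and the timer-expired abort."""
    begun = {}
    elapsed = {}
    for line in trace_text.splitlines():
        record = parse_record(line)
        if not {"time", "eventid", "tx_guid"} <= record.keys():
            continue  # e.g. TRACING_STARTED has no tx_guid
        when = datetime.strptime(record["time"], TIME_FORMAT)
        guid = record["tx_guid"]
        if record["eventid"] == "TRANSACTION_BEGUN":
            begun[guid] = when
        elif record["eventid"] == "ABORT_DUE_TO_TRANSACTION_TIMER_EXPIRED" and guid in begun:
            elapsed[guid] = (when - begun[guid]).total_seconds()
    return elapsed


for guid, seconds in seconds_until_timeout(TRACE).items():
    print(f"{guid}: aborted after {seconds:.3f} s")
```

    On the records above this comes to a little under five and a half minutes between the transaction starting and the timer expiring, during which the trace shows no other events for that transaction.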

  • SQL 2008 R2 cluster installation failure - Failed to find shared disks

    Hi,
    The validation tests in the SQL 2008 R2 cluster installation (running on Windows 2008 R2) fail with the following error. The cluster has one root mount point with multiple mount points:
    "The cluster on this computer does not have a shared disk available. To continue, at least one shared disk must be available."
    The "Detail.txt" log has a lot of "access is denied" errors; here is just a sample. Any ideas what might be causing this issue?
    2010-09-29 12:54:08 Slp: Initializing rule      : Cluster shared disk available check
    2010-09-29 12:54:08 Slp: Rule applied features  : ALL
    2010-09-29 12:54:08 Slp: Rule is will be executed  : True
    2010-09-29 12:54:08 Slp: Init rule target object: Microsoft.SqlServer.Configuration.Cluster.Rules.ClusterSharedDiskFacet
    2010-09-29 12:54:09 Slp: The disk resource 'QUORUM' cannot be used as a shared disk because it's a cluster quorum drive.
    2010-09-29 12:54:09 Slp: Mount point status for disk 'QUORUM' could not be determined.  Reason: 'The disk resource 'QUORUM' cannot be used because it is a cluster quorum drive.'
    2010-09-29 12:54:09 Slp: System Error: 5 trying to find mount points at path \\?\Volume{e1f5ca48-c798-11df-9401-0026b975df1a}\
    2010-09-29 12:54:09 Slp:     Access is denied.
    2010-09-29 12:54:09 Slp: Mount point status for disk 'SQL01_BAK01' could not be determined.  Reason: 'The search for mount points failed.  Error: Access is denied.'
    2010-09-29 12:54:10 Slp: System Error: 5 trying to find mount points at path \\?\Volume{e1f5ca4f-c798-11df-9401-0026b975df1a}\
    2010-09-29 12:54:10 Slp:     Access is denied.
    2010-09-29 12:54:10 Slp: Mount point status for disk 'SQL01_DAT01' could not be determined.  Reason: 'The search for mount points failed.  Error: Access is denied.'
    2010-09-29 12:54:10 Slp: System Error: 5 trying to find mount points at path \\?\Volume{e1f5ca56-c798-11df-9401-0026b975df1a}\
    2010-09-29 12:54:10 Slp:     Access is denied.
    Thanks,
    PK
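
    Since Detail.txt is long, a quick way to see which disks the rule rejected, and why, is to filter for the "Mount point status" lines. This is only a sketch: the three sample lines are copied from the excerpt above, and `rejected_disks` and `access_denied` are hypothetical helper names, not part of SQL Server Setup.

```python
import re

# Lines copied from the Detail.txt excerpt above.
DETAIL_LOG = """\
2010-09-29 12:54:09 Slp: Mount point status for disk 'QUORUM' could not be determined.  Reason: 'The disk resource 'QUORUM' cannot be used because it is a cluster quorum drive.'
2010-09-29 12:54:09 Slp: Mount point status for disk 'SQL01_BAK01' could not be determined.  Reason: 'The search for mount points failed.  Error: Access is denied.'
2010-09-29 12:54:10 Slp: Mount point status for disk 'SQL01_DAT01' could not be determined.  Reason: 'The search for mount points failed.  Error: Access is denied.'
"""

# Captures the disk resource name and the quoted reason at the end of the line.
STATUS = re.compile(
    r"Mount point status for disk '([^']+)' could not be determined\.\s+Reason: '(.*)'\s*$"
)


def rejected_disks(log_text):
    """Map disk resource name -> reason the shared-disk rule could not use it."""
    reasons = {}
    for line in log_text.splitlines():
        match = STATUS.search(line)
        if match:
            reasons[match.group(1)] = match.group(2)
    return reasons


def access_denied(reasons):
    """Names of disks whose mount point search failed with 'Access is denied'."""
    return sorted(name for name, reason in reasons.items() if "Access is denied" in reason)


print(access_denied(rejected_disks(DETAIL_LOG)))
```

    On the sample this separates the quorum disk, which is skipped by design, from the two data disks that failed with "Access is denied".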

    Hi,
    We were asked by the PSS engineer to give the following privileges to the account used to install SQL Server. I am referring to the domain user account, as opposed to the SQL service account. These privileges had already been applied to the SQL service account prior to the SQL installation. Assigning them to the user account resolved the issue.
      Act as Part of the Operating System = SeTcbPrivilege
      Bypass Traverse Checking = SeChangeNotifyPrivilege
      Lock Pages In Memory = SeLockMemoryPrivilege
      Log on as a Batch Job = SeBatchLogonRight
      Log on as a Service = SeServiceLogonRight
      Replace a Process Level Token = SeAssignPrimaryTokenPrivilege
    Thanks for everyone's assistance.
    Cheers,
    PK

  • LDOM SUN Cluster Interconnect failure

    I am building a test Sun Cluster on Solaris 10 in LDOM 1.3.
    In my environment I have a T5120. I set up two guest OSes with some configuration and installed the Sun Cluster software, but when I ran scinstall, it failed.
    Node 2 comes up, but node 1 throws the following messages:
    Boot device: /virtual-devices@100/channel-devices@200/disk@0:a File and args:
    SunOS Release 5.10 Version Generic_139555-08 64-bit
    Copyright 1983-2009 Sun Microsystems, Inc. All rights reserved.
    Use is subject to license terms.
    Hostname: test1
    Configuring devices.
    Loading smf(5) service descriptions: 37/37
    /usr/cluster/bin/scdidadm: Could not load DID instance list.
    /usr/cluster/bin/scdidadm: Cannot open /etc/cluster/ccr/did_instances.
    Booting as part of a cluster
    NOTICE: CMM: Node test2 (nodeid = 1) with votecount = 1 added.
    NOTICE: CMM: Node test1 (nodeid = 2) with votecount = 0 added.
    NOTICE: clcomm: Adapter vnet2 constructed
    NOTICE: clcomm: Adapter vnet1 constructed
    NOTICE: CMM: Node test1: attempting to join cluster.
    NOTICE: CMM: Cluster doesn't have operational quorum yet; waiting for quorum.
    NOTICE: clcomm: Path test1:vnet1 - test2:vnet1 errors during initiation
    NOTICE: clcomm: Path test1:vnet2 - test2:vnet2 errors during initiation
    WARNING: Path test1:vnet1 - test2:vnet1 initiation encountered errors, errno = 62. Remote node may be down or unreachable through this path.
    WARNING: Path test1:vnet2 - test2:vnet2 initiation encountered errors, errno = 62. Remote node may be down or unreachable through this path.
    clcomm: Path test1:vnet2 - test2:vnet2 errors during initiation
    I created the virtual switches and vnets on the primary domain like this:
    ldm add-vsw mode=sc cluster-vsw0 primary
    ldm add-vsw mode=sc cluster-vsw1 primary
    ldm add-vnet vnet2 cluster-vsw0 test1
    ldm add-vnet vnet3 cluster-vsw1 test1
    ldm add-vnet vnet2 cluster-vsw0 test2
    ldm add-vnet vnet3 cluster-vsw1 test2
    Primary domain:
    bash-3.00# dladm show-dev
    vsw0 link: up speed: 1000 Mbps duplex: full
    vsw1 link: up speed: 0 Mbps duplex: unknown
    vsw2 link: up speed: 0 Mbps duplex: unknown
    e1000g0 link: up speed: 1000 Mbps duplex: full
    e1000g1 link: down speed: 0 Mbps duplex: half
    e1000g2 link: down speed: 0 Mbps duplex: half
    e1000g3 link: up speed: 1000 Mbps duplex: full
    bash-3.00# dladm show-link
    vsw0 type: non-vlan mtu: 1500 device: vsw0
    vsw1 type: non-vlan mtu: 1500 device: vsw1
    vsw2 type: non-vlan mtu: 1500 device: vsw2
    e1000g0 type: non-vlan mtu: 1500 device: e1000g0
    e1000g1 type: non-vlan mtu: 1500 device: e1000g1
    e1000g2 type: non-vlan mtu: 1500 device: e1000g2
    e1000g3 type: non-vlan mtu: 1500 device: e1000g3
    bash-3.00#
    Node 1:
    -bash-3.00# dladm show-link
    vnet0 type: non-vlan mtu: 1500 device: vnet0
    vnet1 type: non-vlan mtu: 1500 device: vnet1
    vnet2 type: non-vlan mtu: 1500 device: vnet2
    -bash-3.00# dladm show-dev
    vnet0 link: unknown speed: 0 Mbps duplex: unknown
    vnet1 link: unknown speed: 0 Mbps duplex: unknown
    vnet2 link: unknown speed: 0 Mbps duplex: unknown
    -bash-3.00#
    Node 2:
    -bash-3.00# dladm show-link
    vnet0 type: non-vlan mtu: 1500 device: vnet0
    vnet1 type: non-vlan mtu: 1500 device: vnet1
    vnet2 type: non-vlan mtu: 1500 device: vnet2
    -bash-3.00#
    -bash-3.00#
    -bash-3.00# dladm show-dev
    vnet0 link: unknown speed: 0 Mbps duplex: unknown
    vnet1 link: unknown speed: 0 Mbps duplex: unknown
    vnet2 link: unknown speed: 0 Mbps duplex: unknown
    -bash-3.00#
    And this is the configuration I gave while running scinstall:
    >>> Cluster Transport Adapters and Cables <<<
    You must identify the two cluster transport adapters which attach
    this node to the private cluster interconnect.
    For node "test1",
    What is the name of the first cluster transport adapter [vnet1]?
    Will this be a dedicated cluster transport adapter (yes/no) [yes]?
    All transport adapters support the "dlpi" transport type. Ethernet
    and Infiniband adapters are supported only with the "dlpi" transport;
    however, other adapter types may support other types of transport.
    For node "test1",
    Is "vnet1" an Ethernet adapter (yes/no) [yes]?
    Is "vnet1" an Infiniband adapter (yes/no) [yes]? no
    For node "test1",
    What is the name of the second cluster transport adapter [vnet3]? vnet2
    Will this be a dedicated cluster transport adapter (yes/no) [yes]?
    For node "test1",
    Name of the switch to which "vnet2" is connected [switch2]?
    For node "test1",
    Use the default port name for the "vnet2" connection (yes/no) [yes]?
    For node "test2",
    What is the name of the first cluster transport adapter [vnet1]?
    Will this be a dedicated cluster transport adapter (yes/no) [yes]?
    For node "test2",
    Name of the switch to which "vnet1" is connected [switch1]?
    For node "test2",
    Use the default port name for the "vnet1" connection (yes/no) [yes]?
    For node "test2",
    What is the name of the second cluster transport adapter [vnet2]?
    Will this be a dedicated cluster transport adapter (yes/no) [yes]?
    For node "test2",
    Name of the switch to which "vnet2" is connected [switch2]?
    For node "test2",
    Use the default port name for the "vnet2" connection (yes/no) [yes]?
    I have set up the configuration as follows.
    ldm list -l nodename
    Node 1:
    NETWORK
        NAME   SERVICE               ID  DEVICE     MAC                MODE  PVID  VID  MTU   LINKPROP
        vnet1  primary-vsw0@primary  0   network@0  00:14:4f:f9:61:63        1          1500
        vnet2  cluster-vsw0@primary  1   network@1  00:14:4f:f8:87:27        1          1500
        vnet3  cluster-vsw1@primary  2   network@2  00:14:4f:f8:f0:db        1          1500
    ldm list -l nodename
    Node 2:
    NETWORK
        NAME   SERVICE               ID  DEVICE     MAC                MODE  PVID  VID  MTU   LINKPROP
        vnet1  primary-vsw0@primary  0   network@0  00:14:4f:f9:a1:68        1          1500
        vnet2  cluster-vsw0@primary  1   network@1  00:14:4f:f9:3e:3d        1          1500
        vnet3  cluster-vsw1@primary  2   network@2  00:14:4f:fb:03:83        1          1500
    ldm list-services
    VSW
        NAME          LDOM     MAC                NET-DEV  ID  DEVICE    LINKPROP  DEFAULT-VLAN-ID  PVID  VID  MTU   MODE  INTER-VNET-LINK
        primary-vsw0  primary  00:14:4f:f9:25:5e  e1000g0  0   switch@0            1                1          1500        on
        cluster-vsw0  primary  00:14:4f:fb:db:cb           1   switch@1            1                1          1500  sc    on
        cluster-vsw1  primary  00:14:4f:fa:c1:58           2   switch@2            1                1          1500  sc    on
    ldm list-bindings primary
    VSW
        NAME          MAC                NET-DEV  ID  DEVICE    LINKPROP  DEFAULT-VLAN-ID  PVID  VID  MTU   MODE  INTER-VNET-LINK
        primary-vsw0  00:14:4f:f9:25:5e  e1000g0  0   switch@0            1                1          1500        on
            PEER             MAC                PVID  VID  MTU   LINKPROP  INTERVNETLINK
            vnet1@gitserver  00:14:4f:f8:c0:5f  1          1500
            vnet1@racc2      00:14:4f:f8:2e:37  1          1500
            vnet1@test1      00:14:4f:f9:61:63  1          1500
            vnet1@test2      00:14:4f:f9:a1:68  1          1500
        NAME          MAC                NET-DEV  ID  DEVICE    LINKPROP  DEFAULT-VLAN-ID  PVID  VID  MTU   MODE  INTER-VNET-LINK
        cluster-vsw0  00:14:4f:fb:db:cb           1   switch@1            1                1          1500  sc    on
            PEER             MAC                PVID  VID  MTU   LINKPROP  INTERVNETLINK
            vnet2@test1      00:14:4f:f8:87:27  1          1500
            vnet2@test2      00:14:4f:f9:3e:3d  1          1500
        NAME          MAC                NET-DEV  ID  DEVICE    LINKPROP  DEFAULT-VLAN-ID  PVID  VID  MTU   MODE  INTER-VNET-LINK
        cluster-vsw1  00:14:4f:fa:c1:58           2   switch@2            1                1          1500  sc    on
            PEER             MAC                PVID  VID  MTU   LINKPROP  INTERVNETLINK
            vnet3@test1      00:14:4f:f8:f0:db  1          1500
            vnet3@test2      00:14:4f:fb:03:83  1          1500
    Any ideas, team? I believe the cluster interconnect adapters were not set up successfully.
    I need any guidance or clue on how to correct the private interconnect for clustering in two guest LDOMs.
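
    One thing that is easy to mix up in the output above is naming: the ldm commands call the cluster vnets "vnet2" and "vnet3", while the guests show devices vnet0 to vnet2 and scinstall was given "vnet1" and "vnet2". The sketch below cross-checks the two `ldm list -l` NETWORK listings. It rests on an assumption the post does not confirm: that inside a guest the vnet with ID n appears as device "vnet<n>". The `cluster_links` and `guest_adapter_names` helpers are hypothetical names used for illustration only.

```python
# Rows copied from the `ldm list -l` NETWORK sections quoted above
# (columns: NAME, SERVICE, ID, DEVICE, MAC, PVID, MTU).
NETWORK_ROWS = {
    "test1": [
        "vnet1 primary-vsw0@primary 0 network@0 00:14:4f:f9:61:63 1 1500",
        "vnet2 cluster-vsw0@primary 1 network@1 00:14:4f:f8:87:27 1 1500",
        "vnet3 cluster-vsw1@primary 2 network@2 00:14:4f:f8:f0:db 1 1500",
    ],
    "test2": [
        "vnet1 primary-vsw0@primary 0 network@0 00:14:4f:f9:a1:68 1 1500",
        "vnet2 cluster-vsw0@primary 1 network@1 00:14:4f:f9:3e:3d 1 1500",
        "vnet3 cluster-vsw1@primary 2 network@2 00:14:4f:fb:03:83 1 1500",
    ],
}


def cluster_links(rows, prefix="cluster-"):
    """Map device ID -> virtual switch for vnets attached to a cluster switch."""
    links = {}
    for row in rows:
        _name, service, dev_id = row.split()[:3]
        switch = service.split("@")[0]  # "cluster-vsw0@primary" -> "cluster-vsw0"
        if switch.startswith(prefix):
            links[int(dev_id)] = switch
    return links


def guest_adapter_names(links):
    """Guest-side device names for the cluster vnets.

    ASSUMPTION (not confirmed by the post): the vnet with ID n shows up
    inside the guest as device "vnet<n>".
    """
    return [f"vnet{dev_id}" for dev_id in sorted(links)]


links = {node: cluster_links(rows) for node, rows in NETWORK_ROWS.items()}
same_wiring = links["test1"] == links["test2"]

for node, node_links in links.items():
    print(node, node_links, guest_adapter_names(node_links))
print("both nodes wired to the same cluster switches:", same_wiring)
```

    If that assumption holds, the adapters entered in scinstall (vnet1 and vnet2) would be the ones attached to cluster-vsw0 and cluster-vsw1 on both nodes, so the naming alone would not explain the errno = 62 path errors.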

    You don't have to stick to the default IPs or subnet. You can change to whatever IPs and whatever subnet mask you need, and even change the private names.
    You can do all this during the install or even after the install.
    Read the cluster install doc at docs.sun.com

  • Cluster resource failure

    Cluster resource 'FileServer-(HFS)(Cluster Disk 7)' in clustered service or application '***' failed

    Hi,
    Did you check whether the failed disk is available in the cluster manager console?
