Cluster node fails after testing removing both interconnects in a two-node cluster

Hi,
A cluster node panics and fails to rejoin the cluster after testing removal of both interconnects in a two-node cluster. The cluster is up on one node, but the panicked node fails to rejoin the cluster, saying there is no sufficient quorum yet and that both cluster interconnects failed (even after reconnecting the interconnects). The quorum device used is a shared disk.
Is this a bug?
Any workaround or solution?
Cluster is 3.2 SPARC
Thanking you
Ushas Symon

Sounds like a networking problem to me. If the failed node genuinely can't communicate with the remaining node then it will not be allowed to join the cluster, hence the quorum message. I would suspect either:
* Misconnected cables
* A switch that has blocked or disabled the port
* A failed auto-negotiation
This is of course without knowing anything about what your network infrastructure actually is!
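If it helps, the transport and quorum state can be checked from the node that is still in the cluster before trying to boot the failed node back in. A minimal sketch, assuming the standard Sun Cluster 3.2 command set (adapt names to your configuration):
# scstat -W
# clinterconnect status
# clquorum status
Both private-network paths should show as online again before the second node will be able to rejoin; if they stay faulted even with the cables reconnected, that points back at the cabling, switch ports or auto-negotiation.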
Tim
---

Similar Messages

  • How can I get rid of "easy inline"-type hyperlinks even after having removed both easy-inline and yontoo?

    I'm using Firefox 18.0.
    I searched the internet for a solution to my problem, and the closest result was Mozilla Support item titled "The latest update to Firefox (14.0.1) has added something called "easy inline" which is very, very annoying. Please, anyone, how do I TURN IT OFF???", from July 18, 2012.
    I followed all recommendations but "random" phrases still become hyperlinks that display "balloon" ads on mouse-over. As for Yontoo, I do recall downloading some software over the summer that brought Yontoo along with it, but I've since uninstalled both. When I came across the support article, above, I followed the link to uninstall Yontoo, and performed it again.
    Currently, neither "Yontoo", nor "Easy inline" appear as Firefox extensions or add-ons, and there are no folders or files by (or containing) those names anywhere on my hard drive.
    Does anyone have a suggestion as to what this could still be, and how to get it turned off, once and for all?
    Thank You

    The Reset Firefox feature can fix many issues by restoring Firefox to its factory default state while saving your essential information.
    Note: ''This will cause you to lose any Extensions, Open websites, and some Preferences.''
    To Reset Firefox do the following:
    #Go to Firefox > Help > Troubleshooting Information.
    #Click the "Reset Firefox" button.
    #Firefox will close and reset. After Firefox is done, it will show a window with the information that is imported. Click Finish.
    #Firefox will open with all factory defaults applied.
    Further information can be found in the [[Reset Firefox – easily fix most problems]] article.
    Did this fix your problems? Please report back to us!

  • Oracle RAC performance Suddenly terminates on one of the two node cluster

    I have a strange problem that happens from time to time: my M400 machine, which is part of a two-node RAC cluster, suddenly goes down.
    I have tried many times to understand the cause, but when I read the logs there are many messages related to Oracle RAC, which I don't have any experience or knowledge of. I hope I can find someone here who can explain these log messages to me; they are always the same.
    Jun 18 08:30:00 kfc-rac1 sendmail[17709]: [ID 702911 mail.crit] My unqualified host name (kfc-rac1) unknown; sleeping for retry
    Jun 18 08:31:00 kfc-rac1 sendmail[17709]: [ID 702911 mail.alert] unable to qualify my own domain name (kfc-rac1) -- using short name
    Jun 18 11:44:15 kfc-rac1 iscsi: [ID 454097 kern.notice] NOTICE: unrecognized ioctl 0x403
    Jun 18 11:44:15 kfc-rac1 scsi: [ID 243001 kern.warning] WARNING: /pseudo/fcp@0 (fcp0):
    Jun 18 11:44:15 kfc-rac1 Invalid ioctl opcode = 0x403
    Jun 18 17:09:41 kfc-rac1 Cluster.RGM.global.rgmd: [ID 224900 daemon.notice] launching method <bin/rac_udlm_monitor_stop> for resource <rac-udlm-rs>, resource group <rac-fw-rg>, node <kfc-rac1>, timeout <300> seconds
    Jun 18 17:09:41 kfc-rac1 Cluster.RGM.global.rgmd: [ID 224900 daemon.notice] launching method <bin/rac_framework_monitor_stop> for resource <rac-fw-rs>, resource group <rac-fw-rg>, node <kfc-rac1>, timeout <3600> seconds
    Jun 18 17:09:41 kfc-rac1 Cluster.RGM.global.rgmd: [ID 224900 daemon.notice] launching method <bin/rac_svm_monitor_stop> for resource <rac-svm-rs>, resource group <rac-fw-rg>, node <kfc-rac1>, timeout <300> seconds
    Jun 18 17:09:41 kfc-rac1 Cluster.RGM.global.rgmd: [ID 224900 daemon.notice] launching method <scal_dg_monitor_stop> for resource <scal-racdg-rs>, resource group <scal-racdg-rg>, node <kfc-rac1>, timeout <300> seconds
    Jun 18 17:09:41 kfc-rac1 Cluster.RGM.global.rgmd: [ID 224900 daemon.notice] launching method <scal_mountpoint_monitor_stop> for resource <racfs-mntpnt-rs>, resource group <racfs-mntpnt-rg>, node <kfc-rac1>, timeout <300> seconds
    Jun 18 17:09:41 kfc-rac1 Cluster.RGM.global.rgmd: [ID 515159 daemon.notice] method <bin/rac_framework_monitor_stop> completed successfully for resource <rac-fw-rs>, resource group <rac-fw-rg>, node <kfc-rac1>, time used: 0% of timeout <3600 seconds>
    Jun 18 17:09:41 kfc-rac1 Cluster.RGM.global.rgmd: [ID 515159 daemon.notice] method <bin/rac_udlm_monitor_stop> completed successfully for resource <rac-udlm-rs>, resource group <rac-fw-rg>, node <kfc-rac1>, time used: 0% of timeout <300 seconds>
    Jun 18 17:09:41 kfc-rac1 Cluster.RGM.global.rgmd: [ID 515159 daemon.notice] method <bin/rac_svm_monitor_stop> completed successfully for resource <rac-svm-rs>, resource group <rac-fw-rg>, node <kfc-rac1>, time used: 0% of timeout <300 seconds>
    Jun 18 17:09:41 kfc-rac1 Cluster.RGM.global.rgmd: [ID 224900 daemon.notice] launching method <bin/rac_udlm_stop> for resource <rac-udlm-rs>, resource group <rac-fw-rg>, node <kfc-rac1>, timeout <300> seconds
    Jun 18 17:09:41 kfc-rac1 Cluster.RGM.global.rgmd: [ID 515159 daemon.notice] method <scal_dg_monitor_stop> completed successfully for resource <scal-racdg-rs>, resource group <scal-racdg-rg>, node <kfc-rac1>, time used: 0% of timeout <300 seconds>
    Jun 18 17:09:41 kfc-rac1 Cluster.RGM.global.rgmd: [ID 515159 daemon.notice] method <scal_mountpoint_monitor_stop> completed successfully for resource <racfs-mntpnt-rs>, resource group <racfs-mntpnt-rg>, node <kfc-rac1>, time used: 0% of timeout <300 seconds>
    Jun 18 17:09:41 kfc-rac1 Cluster.RGM.global.rgmd: [ID 224900 daemon.notice] launching method <scal_mountpoint_postnet_stop> for resource <racfs-mntpnt-rs>, resource group <racfs-mntpnt-rg>, node <kfc-rac1>, timeout <300> seconds
    Jun 18 17:09:41 kfc-rac1 SC[SUNW.rac_udlm.rac_udlm_stop]: [ID 854390 daemon.notice] Resource state of rac-udlm-rs is changed to offline. Note that RAC framework will not be stopped by STOP method.
    Jun 18 17:09:41 kfc-rac1 Cluster.RGM.global.rgmd: [ID 515159 daemon.notice] method <bin/rac_udlm_stop> completed successfully for resource <rac-udlm-rs>, resource group <rac-fw-rg>, node <kfc-rac1>, time used: 0% of timeout <300 seconds>
    Jun 18 17:09:42 kfc-rac1 samfs: [ID 320134 kern.notice] NOTICE: SAM-QFS: racfs: Initiated unmount filesystem: vers 2
    Jun 18 17:09:43 kfc-rac1 samfs: [ID 522083 kern.notice] NOTICE: SAM-QFS: racfs: Completed unmount filesystem: vers 2
    Jun 18 17:09:43 kfc-rac1 Cluster.RGM.global.rgmd: [ID 515159 daemon.notice] method <scal_mountpoint_postnet_stop> completed successfully for resource <racfs-mntpnt-rs>, resource group <racfs-mntpnt-rg>, node <kfc-rac1>, time used: 0% of timeout <300 seconds>
    Jun 18 17:09:43 kfc-rac1 Cluster.RGM.global.rgmd: [ID 224900 daemon.notice] launching method <scal_dg_postnet_stop> for resource <scal-racdg-rs>, resource group <scal-racdg-rg>, node <kfc-rac1>, timeout <300> seconds
    Jun 18 17:09:43 kfc-rac1 Cluster.RGM.global.rgmd: [ID 515159 daemon.notice] method <scal_dg_postnet_stop> completed successfully for resource <scal-racdg-rs>, resource group <scal-racdg-rg>, node <kfc-rac1>, time used: 0% of timeout <300 seconds>
    Jun 18 17:09:43 kfc-rac1 Cluster.RGM.global.rgmd: [ID 224900 daemon.notice] launching method <bin/rac_svm_stop> for resource <rac-svm-rs>, resource group <rac-fw-rg>, node <kfc-rac1>, timeout <300> seconds
    Jun 18 17:09:43 kfc-rac1 SC[SUNW.rac_svm.rac_svm_stop]: [ID 854390 daemon.notice] Resource state of rac-svm-rs is changed to offline. Note that RAC framework will not be stopped by STOP method.
    Jun 18 17:09:43 kfc-rac1 Cluster.RGM.global.rgmd: [ID 515159 daemon.notice] method <bin/rac_svm_stop> completed successfully for resource <rac-svm-rs>, resource group <rac-fw-rg>, node <kfc-rac1>, time used: 0% of timeout <300 seconds>
    Jun 18 17:09:43 kfc-rac1 Cluster.RGM.global.rgmd: [ID 224900 daemon.notice] launching method <bin/rac_framework_stop> for resource <rac-fw-rs>, resource group <rac-fw-rg>, node <kfc-rac1>, timeout <300> seconds
    Jun 18 17:09:43 kfc-rac1 SC[SUNW.rac_framework.rac_framework_stop]: [ID 854390 daemon.notice] Resource state of rac-fw-rs is changed to offline. Note that RAC framework will not be stopped by STOP method.
    Jun 18 17:09:43 kfc-rac1 Cluster.RGM.global.rgmd: [ID 515159 daemon.notice] method <bin/rac_framework_stop> completed successfully for resource <rac-fw-rs>, resource group <rac-fw-rg>, node <kfc-rac1>, time used: 0% of timeout <300 seconds>
    Jun 18 17:09:44 kfc-rac1 root: [ID 702911 user.error] Oracle CRSD 3932 set to stop
    Jun 18 17:09:44 kfc-rac1 root: [ID 702911 user.error] Oracle CRSD 3932 shutdown completed
    Jun 18 17:09:44 kfc-rac1 root: [ID 702911 user.error] Oracle EVMD set to stop
    Jun 18 17:09:44 kfc-rac1 root: [ID 702911 user.error] Oracle CSSD being stopped
    Jun 18 17:09:45 kfc-rac1 xntpd[980]: [ID 866926 daemon.notice] xntpd exiting on signal 15
    Jun 18 17:09:45 kfc-rac1 ip: [ID 646971 kern.notice] ip_create_dl: hw addr length = 0
    Jun 18 17:09:45 kfc-rac1 pppd[516]: [ID 702911 daemon.notice] Connection terminated.
    Jun 18 17:09:47 kfc-rac1 pppd[9462]: [ID 860527 daemon.notice] pppd 2.4.0b1 (Sun Microsystems, Inc.) started by root, uid 0
    Jun 18 17:09:47 kfc-rac1 pppd[9462]: [ID 702911 daemon.notice] Connect: sppp0 <--> /dev/dm2s0
    Jun 18 17:09:47 kfc-rac1 rpc.metamedd: [ID 702911 daemon.error] Terminated
    Jun 18 17:09:48 kfc-rac1 inetd[482]: [ID 702911 daemon.warning] inetd_offline method for instance svc:/network/rpc/scrcmd:default is unspecified. Taking default action: kill.
    Jun 18 17:09:48 kfc-rac1 inetd[482]: [ID 702911 daemon.warning] inetd_offline method for instance svc:/network/rpc/metacld:default is unspecified. Taking default action: kill.
    Jun 18 17:09:49 kfc-rac1 inetd[482]: [ID 702911 daemon.warning] inetd_offline method for instance svc:/network/rpc/scadmd:default is unspecified. Taking default action: kill.
    Jun 18 17:09:50 kfc-rac1 pppd[9462]: [ID 702911 daemon.notice] local IP address 192.168.224.2
    Jun 18 17:09:50 kfc-rac1 pppd[9462]: [ID 702911 daemon.notice] remote IP address 192.168.224.1
    Jun 18 17:09:50 kfc-rac1 cl_eventlogd[1554]: [ID 247336 daemon.error] Going down on signal 15.
    Jun 18 17:09:52 kfc-rac1 ip: [ID 372019 kern.error] ipsec_check_inbound_policy: Policy Failure for the incoming packet (not secure); Source 192.168.224.001, Destination 192.168.224.002.
    Jun 18 17:09:56 kfc-rac1 ip: [ID 646971 kern.notice] ip_create_dl: hw addr length = 0
    Jun 18 17:09:56 kfc-rac1 pppd[9462]: [ID 702911 daemon.notice] Connection terminated.
    Jun 18 17:09:56 kfc-rac1 Cluster.PNM: [ID 226280 daemon.notice] PNM daemon exiting.
    Jun 18 17:09:57 kfc-rac1 pseudo: [ID 129642 kern.info] pseudo-device: tod0
    Jun 18 17:09:57 kfc-rac1 genunix: [ID 936769 kern.info] tod0 is /pseudo/tod@0
    Jun 18 17:09:57 kfc-rac1 pseudo: [ID 129642 kern.info] pseudo-device: pm0
    Jun 18 17:09:57 kfc-rac1 genunix: [ID 936769 kern.info] pm0 is /pseudo/pm@0
    Jun 18 17:09:57 kfc-rac1 rpc.metad: [ID 702911 daemon.error] Terminated
    Jun 18 17:10:01 kfc-rac1 syslogd: going down on signal 15
    Jun 18 17:10:07 kfc-rac1 rpcbind: [ID 564983 daemon.error] rpcbind terminating on signal.
    Jun 18 17:10:32 kfc-rac1 Cluster.RGM.fed: [ID 831843 daemon.notice] SCSLM thread WARNING pools facility is disabled
    Jun 18 17:10:40 kfc-rac1 genunix: [ID 672855 kern.notice] syncing file systems...
    Jun 18 17:10:40 kfc-rac1 genunix: [ID 904073 kern.notice] done
    Jun 19 14:20:12 kfc-rac1 genunix: [ID 540533 kern.notice] ^MSunOS Release 5.10 Version Generic_141444-09 64-bit
    Jun 19 14:20:12 kfc-rac1 genunix: [ID 943908 kern.notice] Copyright 1983-2009 Sun Microsystems, Inc. All rights reserved.
    Jun 19 14:20:12 kfc-rac1 Use is subject to license terms.
    Jun 19 14:20:12 kfc-rac1 genunix: [ID 678236 kern.info] Ethernet address = 0:21:28:2:21:b2
    Thanks in advance to all of you; your response is highly appreciated.

    Hi, I have checked the interconnect between the two nodes and it is as follows:
    ifconfig -a
    lo0: flags=2001000849<UP,LOOPBACK,RUNNING,MULTICAST,IPv4,VIRTUAL> mtu 8232 index 1
    inet 127.0.0.1 netmask ff000000
    bge0: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 index 2
    inet 10.1.100.126 netmask ffffff00 broadcast 10.1.100.255
    groupname sc_ipmp0
    ether 0:14:4f:3a:6c:19
    bge0:1: flags=9040843<UP,BROADCAST,RUNNING,MULTICAST,DEPRECATED,IPv4,NOFAILOVER> mtu 1500 index 2
    inet 10.1.100.127 netmask ffffff00 broadcast 10.1.100.255
    bge0:2: flags=1040843<UP,BROADCAST,RUNNING,MULTICAST,DEPRECATED,IPv4> mtu 1500 index 2
    inet 10.1.100.140 netmask ffffff00 broadcast 10.1.100.255
    bge1: flags=1008843<UP,BROADCAST,RUNNING,MULTICAST,PRIVATE,IPv4> mtu 1500 index 6
    inet 172.16.0.129 netmask ffffff80 broadcast 172.16.0.255
    ether 0:14:4f:3a:6c:1a
    nxge0: flags=9040843<UP,BROADCAST,RUNNING,MULTICAST,DEPRECATED,IPv4,NOFAILOVER> mtu 1500 index 3
    inet 10.1.100.128 netmask ffffff00 broadcast 10.1.100.255
    groupname sc_ipmp0
    ether 0:21:28:d:c9:8e
    nxge1: flags=1008843<UP,BROADCAST,RUNNING,MULTICAST,PRIVATE,IPv4> mtu 1500 index 5
    inet 172.16.1.1 netmask ffffff80 broadcast 172.16.1.127
    ether 0:21:28:d:c9:8f
    e1000g1: flags=1008843<UP,BROADCAST,RUNNING,MULTICAST,PRIVATE,IPv4> mtu 1500 index 4
    inet 172.16.1.129 netmask ffffff80 broadcast 172.16.1.255
    ether 0:15:17:81:15:c3
    clprivnet0: flags=1009843<UP,BROADCAST,RUNNING,MULTICAST,MULTI_BCAST,PRIVATE,IPv4> mtu 1500 index 7
    inet 172.16.4.1 netmask fffffe00 broadcast 172.16.5.255
    ether 0:0:0:0:0:1
    sppp0: flags=10010008d1<UP,POINTOPOINT,RUNNING,NOARP,MULTICAST,IPv4,FIXEDMTU> mtu 1500 index 8
    inet 192.168.224.2 --> 192.168.224.1 netmask ffffff00
    ether 0:0:0:0:0:0
    root@kfc-rac1 #
    The interconnects are direct-attached back to back between the interfaces of both nodes.
    As for the status of the HBA cards, here it is as well:
    fcinfo hba-port -l
    HBA Port WWN: 2100001b3284c042
    OS Device Name: /dev/cfg/c1
    Manufacturer: QLogic Corp.
    Model: 375-3355-02
    Firmware Version: 05.01.00
    FCode/BIOS Version:  BIOS: 1.24; fcode: 1.24; EFI: 1.8;
    Serial Number: 0402R00-0844647023
    Driver Name: qlc
    Driver Version: 20090519-2.31
    Type: N-port
    State: online
    Supported Speeds: 1Gb 2Gb 4Gb
    Current Speed: 4Gb
    Node WWN: 2000001b3284c042
    Link Error Statistics:
    Link Failure Count: 0
    Loss of Sync Count: 0
    Loss of Signal Count: 0
    Primitive Seq Protocol Error Count: 0
    Invalid Tx Word Count: 0
    Invalid CRC Count: 0
    HBA Port WWN: 2100001b321c462b
    OS Device Name: /dev/cfg/c2
    Manufacturer: QLogic Corp.
    Model: 375-3355-02
    Firmware Version: 05.01.00
    FCode/BIOS Version:  BIOS: 1.24; fcode: 1.24; EFI: 1.8;
    Serial Number: 0402R00-0844646557
    Driver Name: qlc
    Driver Version: 20090519-2.31
    Type: N-port
    State: online
    Supported Speeds: 1Gb 2Gb 4Gb
    Current Speed: 4Gb
    Node WWN: 2000001b321c462b
    Link Error Statistics:
    Link Failure Count: 0
    Loss of Sync Count: 0
    Loss of Signal Count: 0
    Primitive Seq Protocol Error Count: 0
    Invalid Tx Word Count: 0
    Invalid CRC Count: 0
    HBA Port WWN: 2100001b32934b3c
    OS Device Name: /dev/cfg/c3
    Manufacturer: QLogic Corp.
    Model: 375-3294-01
    Firmware Version: 05.01.00
    FCode/BIOS Version:  BIOS: 2.2; fcode: 2.1; EFI: 2.0;
    Serial Number: 0402R00-0947745866
    Driver Name: qlc
    Driver Version: 20090519-2.31
    Type: N-port
    State: online
    Supported Speeds: 1Gb 2Gb 4Gb
    Current Speed: 4Gb
    Node WWN: 2000001b32934b3c
    Link Error Statistics:
    Link Failure Count: 0
    Loss of Sync Count: 0
    Loss of Signal Count: 0
    Primitive Seq Protocol Error Count: 0
    Invalid Tx Word Count: 0
    Invalid CRC Count: 0
    HBA Port WWN: 2101001b32b34b3c
    OS Device Name: /dev/cfg/c4
    Manufacturer: QLogic Corp.
    Model: 375-3294-01
    Firmware Version: 05.01.00
    FCode/BIOS Version:  BIOS: 2.2; fcode: 2.1; EFI: 2.0;
    Serial Number: 0402R00-0947745866
    Driver Name: qlc
    Driver Version: 20090519-2.31
    Type: unknown
    State: offline
    Supported Speeds: 1Gb 2Gb 4Gb
    Current Speed: not established
    Node WWN: 2001001b32b34b3c
    Link Error Statistics:
    Link Failure Count: 0
    Loss of Sync Count: 0
    Loss of Signal Count: 0
    Primitive Seq Protocol Error Count: 0
    Invalid Tx Word Count: 0
    Invalid CRC Count: 0
    root@kfc-rac1 #
    In addition, here is the ocssd log file as well:
    http://www.4shared.com/file/Txl9DqLW/log_25155156.html?
    Look at the lines for the dates on which this issue happened: 2012-06-09, 2012-06-18 and 2012-06-21.
    You'll see something related to the voting disk: it suddenly becomes unavailable, which causes the problem.
    Thanks a lot for your help; I'm waiting for your recommendation and hope these logs give a better view of the problem.
    Thanks in advance :)
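    If it would help to confirm the voting disk state from the OS side the next time this happens, a couple of standard Oracle Clusterware 10g commands (run as root from the CRS home) can be used; this is only a hedged suggestion, not something taken from your logs:
    # crsctl check css
    # crsctl query css votedisk
    If the votedisk query hangs or reports the disk as unavailable at the same time the SAM-QFS/iSCSI messages appear, that would support the storage-path explanation.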

  • Testing ha-nfs in two node cluster (cannot statvfs /global/nfs: I/O error )

    Hi all,
    I am testing HA-NFS (failover) on a two-node cluster. I have a Sun Fire V240, an E250 and Netra st A1000/D1000 storage. I have installed Solaris 10 update 6 and the cluster packages on both nodes.
    I have created one global file system (/dev/did/dsk/d4s7) and mounted it as /global/nfs. This file system is accessible from both nodes. I have configured HA-NFS according to the document Sun Cluster Data Service for NFS Guide for Solaris, using the command line interface.
    The logical host is pingable from the NFS client, and I have mounted the share there using the logical hostname. For testing purposes I took one machine down. After this step the file system gives an I/O error (on both server and client), and when I run the df command it shows
    df: cannot statvfs /global/nfs: I/O error.
    I have configured with following commands.
    #clnode status
    # mkdir -p /global/nfs
    # clresourcegroup create -n test1,test2 -p Pathprefix=/global/nfs rg-nfs
    I have added logical hostname,ip address in /etc/hosts
    I have commented hosts and rpc lines in /etc/nsswitch.conf
    # clreslogicalhostname create -g rg-nfs -h ha-host-1 -N
    sc_ipmp0@test1, sc_ipmp0@test2 ha-host-1
    # mkdir /global/nfs/SUNW.nfs
    Created one file called dfstab.user-home in /global/nfs/SUNW.nfs and that file contains the following line
    share -F nfs -o rw /global/nfs
    # clresourcetype register SUNW.nfs
    # clresource create -g rg-nfs -t SUNW.nfs ; user-home
    # clresourcegroup online -M rg-nfs
    Where I went wrong? Can any one provide document on this?
    Any help..?
    Thanks in advance.
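    For comparison, the documented HA-NFS setup usually also places the NFS file system under an HAStoragePlus resource and makes the SUNW.nfs resource depend on it. A hedged sketch using the names from this post (the nfs-hasp-rs resource name is made up here for illustration):
    # clresourcetype register SUNW.HAStoragePlus SUNW.nfs
    # clresource create -g rg-nfs -t SUNW.HAStoragePlus -p FilesystemMountPoints=/global/nfs nfs-hasp-rs
    # clresource create -g rg-nfs -t SUNW.nfs -p Resource_dependencies=nfs-hasp-rs user-home
    # clresourcegroup online -M rg-nfs
    With that dependency in place, the NFS resource is only started where the file system is known to be healthy, which also makes the statvfs I/O error easier to interpret after a node is taken down.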

    test1#  tail -20 /var/adm/messages
    Feb 28 22:28:54 testlab5 Cluster.SMF.DR: [ID 344672 daemon.error] Unable to open door descriptor /var/run/rgmd_receptionist_door
    Feb 28 22:28:54 testlab5 Cluster.SMF.DR: [ID 801855 daemon.error]
    Feb 28 22:28:54 testlab5 Error in scha_cluster_get
    Feb 28 22:28:54 testlab5 Cluster.scdpmd: [ID 489913 daemon.notice] The state of the path to device: /dev/did/rdsk/d5s0 has changed to OK
    Feb 28 22:28:54 testlab5 Cluster.scdpmd: [ID 489913 daemon.notice] The state of the path to device: /dev/did/rdsk/d6s0 has changed to OK
    Feb 28 22:28:58 testlab5 svc.startd[8]: [ID 652011 daemon.warning] svc:/system/cluster/scsymon-srv:default: Method "/usr/cluster/lib/svc/method/svc_scsymon_srv start" failed with exit status 96.
    Feb 28 22:28:58 testlab5 svc.startd[8]: [ID 748625 daemon.error] system/cluster/scsymon-srv:default misconfigured: transitioned to maintenance (see 'svcs -xv' for details)
    Feb 28 22:29:23 testlab5 Cluster.RGM.rgmd: [ID 537175 daemon.notice] CMM: Node e250 (nodeid: 1, incarnation #: 1235752006) has become reachable.
    Feb 28 22:29:23 testlab5 Cluster.RGM.rgmd: [ID 525628 daemon.notice] CMM: Cluster has reached quorum.
    Feb 28 22:29:23 testlab5 Cluster.RGM.rgmd: [ID 377347 daemon.notice] CMM: Node e250 (nodeid = 1) is up; new incarnation number = 1235752006.
    Feb 28 22:29:23 testlab5 Cluster.RGM.rgmd: [ID 377347 daemon.notice] CMM: Node testlab5 (nodeid = 2) is up; new incarnation number = 1235840337.
    Feb 28 22:37:15 testlab5 Cluster.CCR: [ID 499775 daemon.notice] resource group rg-nfs added.
    Feb 28 22:39:05 testlab5 Cluster.RGM.rgmd: [ID 375444 daemon.notice] 8 fe_rpc_command: cmd_type(enum):<5>:cmd=<null>:tag=<>: Calling security_clnt_connect(..., host=<testlab5>, sec_type {0:WEAK, 1:STRONG, 2:DES} =<1>, ...)
    Feb 28 22:39:05 testlab5 Cluster.CCR: [ID 491081 daemon.notice] resource ha-host-1 removed.
    Feb 28 22:39:17 testlab5 Cluster.RGM.rgmd: [ID 375444 daemon.notice] 8 fe_rpc_command: cmd_type(enum):<5>:cmd=<null>:tag=<>: Calling security_clnt_connect(..., host=<testlab5>, sec_type {0:WEAK, 1:STRONG, 2:DES} =<1>, ...)
    Feb 28 22:39:17 testlab5 Cluster.CCR: [ID 254131 daemon.notice] resource group nfs-rg removed.
    Feb 28 22:39:30 testlab5 Cluster.RGM.rgmd: [ID 224900 daemon.notice] launching method <hafoip_validate> for resource <ha-host-1>, resource group <rg-nfs>, node <testlab5>, timeout <300> seconds
    Feb 28 22:39:30 testlab5 Cluster.RGM.rgmd: [ID 375444 daemon.notice] 8 fe_rpc_command: cmd_type(enum):<1>:cmd=</usr/cluster/lib/rgm/rt/hafoip/hafoip_validate>:tag=<rg-nfs.ha-host-1.2>: Calling security_clnt_connect(..., host=<testlab5>, sec_type {0:WEAK, 1:STRONG, 2:DES} =<1>, ...)
    Feb 28 22:39:30 testlab5 Cluster.RGM.rgmd: [ID 515159 daemon.notice] method <hafoip_validate> completed successfully for resource <ha-host-1>, resource group <rg-nfs>, node <testlab5>, time used: 0% of timeout <300 seconds>
    Feb 28 22:39:30 testlab5 Cluster.CCR: [ID 973933 daemon.notice] resource ha-host-1 added.

  • Windows Server 2008 R2 SP1 2-Node cluster - Replace failed node

    Hi - I have a two-node Windows Server 2008 R2 SP1 failover cluster (DHCP, File, Print) where one of the nodes has failed beyond recovery. What I would like to do is to evict the failed cluster node, install a new machine with Windows Server 2008 R2 SP1, reuse the same name and IP address, and then join this machine as a node in the cluster.
    Are there any recommended steps to do this? I'm mostly thinking about the part of reusing the same name and IP address for the new node (e.g. is there any cleanup needed beyond evicting the node?).
    Enfo Zipper
    Christoffer Andersson – Principal Advisor
    http://blogs.chrisse.se - Directory Services Blog

    Hi,
    I agree with Noah Sparks: you can evict the corrupt node, and once you have reinstalled the new server you can simply join it to the cluster; there shouldn't be any error.
    If the evicted node hits the "The cluster node is already a member of the cluster" error when it tries to rejoin the cluster, it may need the KB2549472 hotfix.
    The related KB:
    How to Evict a Node from a Windows Server 2008 Failover Cluster
    http://technet.microsoft.com/en-us/library/bb676524(v=exchg.80).aspx
    Cluster node cannot rejoin the cluster after the node is restarted or removed from the cluster in Windows Server 2008 R2
    http://support.microsoft.com/kb/2549472
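    If you prefer to script the evict and rejoin, the same steps can be driven from PowerShell on one of the surviving 2008 R2 nodes (cmdlets from the FailoverClusters module; the node name below is a placeholder):
    Import-Module FailoverClusters
    Remove-ClusterNode -Name FailedNode -Force
    # after the replacement server is rebuilt with the same name and IP:
    Add-ClusterNode -Name FailedNode
    The evict removes the old node from the cluster configuration, so reusing the name and IP on the rebuilt machine should then work without extra cleanup beyond the usual AD and DNS checks.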
    Hope this helps.

  • Sap service fails after switching to the second node in HA system.

    Hello all,
    We have installed the HA MSCS cluster installation for the ECC 6.0 (mySAP ERP 2005) system. The ASCS+SCS instance and the DB instance are the failover resources.
    When the ASCS+SCS (SAP cluster group) fails over to the next node, the services sap_<sid>_00 (ASCS) and sap_<sid>_01 (SCS) fail to start automatically, and we have to manually start the instance from the services console by giving the SAPService<SID> password.
    When we try to start the service without giving the password, the system says it is unable to start due to a logon failure; after giving the password it shows that the user has been granted the right to log on as a service.
    Please help resolve this issue.
    thx
    satyajit

    A change to the zpool_import() management of the zpool.cache file, as delivered by Solaris 10 kernel patches 137137-09 (for SPARC) or 137138-09 (for x86), might cause systems that have their shared ZFS (zfs(1M)) storage pools under the control of HAStoragePlus to be simultaneously imported on multiple cluster nodes. Importing a ZFS storage pool on multiple cluster nodes will result in pool corruption, which might cause data integrity issues or cause a cluster node to panic.
    To avoid this problem, install Solaris 10 patch 139579-02 (for SPARC) or 139580-02 (for x86) immediately after you install 137137-09 or 137138-09 but before you reboot the cluster nodes.
    Alternatively, only on the Solaris 10 5/08 OS, remove the affected patch before any ZFS pools are simultaneously imported to multiple cluster nodes. You cannot remove patch 137137-09 or 137138-09 from the Solaris 10 10/08 OS, because these patches are preinstalled on that release.
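    To check whether the patches mentioned above are already present on a node before rebooting, the standard Solaris 10 patch listing can be used (the patch IDs are the ones from this note):
    # showrev -p | egrep '137137-09|137138-09|139579-02|139580-02'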

  • Simple two node Cluster Install - Hung after reboot of first node

    Hello,
    Over the past couple of days I have tried to install a simple two node cluster using two identical SunFire X4200s, firstly following the recipe in: http://www.sun.com/software/solaris/howtoguides/twonodecluster.jsp
    and when that failed referring to http://docs.sun.com/app/docs/doc/819-0912 and http://docs.sun.com/app/docs/doc/819-2970.
    I am trying to keep the install process as simple as possible, no switch, just back to back connections for the internal networking (node1 e1000g0 <--> node2 e1000g0, node1 e1000g1 <--> node2 e1000g1)
    I ran the installer on both X4200s with default answers. This went through smoothly without problems.
    I ran scinstall on node1, first time through, choosing "typical" as suggested in the how to guide. Everything goes OK (no errors) node2 reboots, but node1 just sits there waiting for node2, no errors, nothing....
    I also tried rerunning scinstall choosing "Custom", and then selecting the no switch option. Same thing happened.
    I must be doing something stupid, it's such a simple setup! Any ideas??
    Here's the final screen from node1 (dcmds0) in both cases:
    Cluster Creation
    Log file - /var/cluster/logs/install/scinstall.log.940
    Checking installation status ... done
    The Sun Cluster software is installed on "dcmds0".
    The Sun Cluster software is installed on "dcmds1".
    Started sccheck on "dcmds0".
    Started sccheck on "dcmds1".
    sccheck completed with no errors or warnings for "dcmds0".
    sccheck completed with no errors or warnings for "dcmds1".
    Configuring "dcmds1" ... done
    Rebooting "dcmds1" ...
    Output from scconf on node2 (dcmds1):
    bash-3.00# scconf -p
    Cluster name: dcmdscluster
    Cluster ID: 0x47538959
    Cluster install mode: enabled
    Cluster private net: 172.16.0.0
    Cluster private netmask: 255.255.248.0
    Cluster maximum nodes: 64
    Cluster maximum private networks: 10
    Cluster new node authentication: unix
    Cluster authorized-node list: dcmds0 dcmds1
    Cluster transport heart beat timeout: 10000
    Cluster transport heart beat quantum: 1000
    Round Robin Load Balancing UDP session timeout: 480
    Cluster nodes: dcmds1
    Cluster node name: dcmds1
    Node ID: 1
    Node enabled: yes
    Node private hostname: clusternode1-priv
    Node quorum vote count: 1
    Node reservation key: 0x4753895900000001
    Node zones: <NULL>
    CPU shares for global zone: 1
    Minimum CPU requested for global zone: 1
    Node transport adapters: e1000g0 e1000g1
    Node transport adapter: e1000g0
    Adapter enabled: no
    Adapter transport type: dlpi
    Adapter property: device_name=e1000g
    Adapter property: device_instance=0
    Adapter property: lazy_free=1
    Adapter property: dlpi_heartbeat_timeout=10000
    Adapter property: dlpi_heartbeat_quantum=1000
    Adapter property: nw_bandwidth=80
    Adapter property: bandwidth=70
    Adapter port names: <NULL>
    Node transport adapter: e1000g1
    Adapter enabled: no
    Adapter transport type: dlpi
    Adapter property: device_name=e1000g
    Adapter property: device_instance=1
    Adapter property: lazy_free=1
    Adapter property: dlpi_heartbeat_timeout=10000
    Adapter property: dlpi_heartbeat_quantum=1000
    Adapter property: nw_bandwidth=80
    Adapter property: bandwidth=70
    Adapter port names: <NULL>
    Cluster transport switches: <NULL>
    Cluster transport cables
    Endpoint Endpoint State
    Quorum devices: <NULL>
    Rob.

    I have found out why the install hung - this needs to be added into the install guide(s) at once!! - It's VERY frustrating when an install guide is incomplete!
    The solution is posted in the HA-Cluster OpenSolaris forums at:
    http://opensolaris.org/os/community/ha-clusters/ohac/Documentation/SCXdocs/relnotes/#bugs
    In particular, my problem was that I selected to make my Solaris install secure (A good idea, I thought!). Unfortunately, this stops Sun Cluster from working. To fix the problem you need to perform the following steps on each secured node:
    Problem Summary: During Solaris installation, the setting of a restricted network profile disables external access to network services that Sun Cluster functionality uses, i.e. the RPC communication service, which is required for cluster communication.
    Workaround: Restore external access to RPC communication.
    Perform the following commands to restore external access to RPC communication.
    # svccfg
    svc:> select network/rpc/bind
    svc:/network/rpc/bind> setprop config/local_only=false
    svc:/network/rpc/bind> quit
    # svcadm refresh network/rpc/bind:default
    # svcprop network/rpc/bind:default | grep local_only
    Once I applied these commands, the install process continued ... AT LAST!!!
    Rob.

  • Unit test fails after upgrading to Kodo 4.0.0 from 4.0.0-EA4

    I have a group of 6 unit tests failing after upgrading to the new Kodo
    4.0.0 (with BEA) from Kodo-4.0.0-EA4 (with Solarmetric). I'm getting
    exceptions like the one at the bottom of this email. It seems to be an
    interaction with the PostgreSQL driver, though I can't be sure. I
    haven't changed my JDO configuration or the related classes in months
    since I've been focusing on using the objects that have already been
    defined. The .jdo, .jdoquery, and .java code are below the exception,
    just in case there's something wrong in there. Does anyone have advice
    as to how I might debug this?
    Thanks,
    Mark
    Testsuite: edu.ucsc.whisper.test.integration.UserManagerQueryIntegrationTest
    Tests run: 15, Failures: 0, Errors: 6, Time elapsed: 23.308 sec
    Testcase: testGetAllUsersWithFirstName(edu.ucsc.whisper.test.integration.UserManagerQueryIntegrationTest): Caused an ERROR
    The column index is out of range: 2, number of columns: 1.
    <2|false|4.0.0> kodo.jdo.DataStoreException: The column index is out of range: 2, number of columns: 1.
    at kodo.jdbc.sql.DBDictionary.newStoreException(DBDictionary.java:4092)
    at kodo.jdbc.sql.SQLExceptions.getStore(SQLExceptions.java:82)
    at kodo.jdbc.sql.SQLExceptions.getStore(SQLExceptions.java:66)
    at kodo.jdbc.sql.SQLExceptions.getStore(SQLExceptions.java:46)
    at kodo.jdbc.kernel.SelectResultObjectProvider.handleCheckedException(SelectResultObjectProvider.java:176)
    at kodo.kernel.QueryImpl$PackingResultObjectProvider.handleCheckedException(QueryImpl.java:2460)
    at com.solarmetric.rop.EagerResultList.<init>(EagerResultList.java:32)
    at kodo.kernel.QueryImpl.toResult(QueryImpl.java:1445)
    at kodo.kernel.QueryImpl.execute(QueryImpl.java:1136)
    at kodo.kernel.QueryImpl.execute(QueryImpl.java:901)
    at kodo.kernel.QueryImpl.execute(QueryImpl.java:865)
    at kodo.kernel.DelegatingQuery.execute(DelegatingQuery.java:787)
    at kodo.jdo.QueryImpl.executeWithArray(QueryImpl.java:210)
    at kodo.jdo.QueryImpl.execute(QueryImpl.java:137)
    at edu.ucsc.whisper.core.dao.JdoUserDao.findAllUsersWithFirstName(JdoUserDao.java:232)
    at edu.ucsc.whisper.core.manager.DefaultUserManager.getAllUsersWithFirstName(DefaultUserManager.java:252)
    NestedThrowablesStackTrace:
    org.postgresql.util.PSQLException: The column index is out of range: 2, number of columns: 1.
    at org.postgresql.core.v3.SimpleParameterList.bind(SimpleParameterList.java:57)
    at org.postgresql.core.v3.SimpleParameterList.setLiteralParameter(SimpleParameterList.java:101)
    at org.postgresql.jdbc2.AbstractJdbc2Statement.bindLiteral(AbstractJdbc2Statement.java:2085)
    at org.postgresql.jdbc2.AbstractJdbc2Statement.setInt(AbstractJdbc2Statement.java:1133)
    at com.solarmetric.jdbc.DelegatingPreparedStatement.setInt(DelegatingPreparedStatement.java:390)
    at com.solarmetric.jdbc.PoolConnection$PoolPreparedStatement.setInt(PoolConnection.java:440)
    at com.solarmetric.jdbc.DelegatingPreparedStatement.setInt(DelegatingPreparedStatement.java:390)
    at com.solarmetric.jdbc.DelegatingPreparedStatement.setInt(DelegatingPreparedStatement.java:390)
    at com.solarmetric.jdbc.DelegatingPreparedStatement.setInt(DelegatingPreparedStatement.java:390)
    at com.solarmetric.jdbc.LoggingConnectionDecorator$LoggingConnection$LoggingPreparedStatement.setInt(LoggingConnectionDecorator.java:1257)
    at com.solarmetric.jdbc.DelegatingPreparedStatement.setInt(DelegatingPreparedStatement.java:390)
    at com.solarmetric.jdbc.DelegatingPreparedStatement.setInt(DelegatingPreparedStatement.java:390)
    at kodo.jdbc.sql.DBDictionary.setInt(DBDictionary.java:980)
    at kodo.jdbc.sql.DBDictionary.setUnknown(DBDictionary.java:1299)
    at kodo.jdbc.sql.SQLBuffer.setParameters(SQLBuffer.java:638)
    at kodo.jdbc.sql.SQLBuffer.prepareStatement(SQLBuffer.java:539)
    at kodo.jdbc.sql.SQLBuffer.prepareStatement(SQLBuffer.java:512)
    at kodo.jdbc.sql.SelectImpl.execute(SelectImpl.java:332)
    at kodo.jdbc.sql.SelectImpl.execute(SelectImpl.java:301)
    at kodo.jdbc.sql.Union$UnionSelect.execute(Union.java:642)
    at kodo.jdbc.sql.Union.execute(Union.java:326)
    at kodo.jdbc.sql.Union.execute(Union.java:313)
    at kodo.jdbc.kernel.SelectResultObjectProvider.open(SelectResultObjectProvider.java:98)
    at kodo.kernel.QueryImpl$PackingResultObjectProvider.open(QueryImpl.java:2405)
    at com.solarmetric.rop.EagerResultList.<init>(EagerResultList.java:22)
    at kodo.kernel.QueryImpl.toResult(QueryImpl.java:1445)
    at kodo.kernel.QueryImpl.execute(QueryImpl.java:1136)
    at kodo.kernel.QueryImpl.execute(QueryImpl.java:901)
    at kodo.kernel.QueryImpl.execute(QueryImpl.java:865)
    at kodo.kernel.DelegatingQuery.execute(DelegatingQuery.java:787)
    at kodo.jdo.QueryImpl.executeWithArray(QueryImpl.java:210)
    at kodo.jdo.QueryImpl.execute(QueryImpl.java:137)
    at edu.ucsc.whisper.core.dao.JdoUserDao.findAllUsersWithFirstName(JdoUserDao.java:232)
    --- DefaultUser.java -------------------------------------------------
    public class DefaultUser
    implements User {
        /** The account username. */
        private String username;
        /** The account password. */
        private String password;
        /** A flag indicating whether or not the account is enabled. */
        private boolean enabled;
        /** The authorities granted to this account. */
        private Set<Authority> authorities;
        /** Information about the user, including their name and text that describes them. */
        private UserInfo userInfo;
        /** The set of organizations where this user works. */
        private Set<Organization> organizations;
    }
    --- DefaultUser.jdo --------------------------------------------------
    <?xml version="1.0"?>
    <!DOCTYPE jdo PUBLIC
    "-//Sun Microsystems, Inc.//DTD Java Data Objects Metadata 2.0//EN"
    "http://java.sun.com/dtd/jdo_2_0.dtd">
    <jdo>
    <package name="edu.ucsc.whisper.core">
    <sequence name="user_id_seq"
    factory-class="native(Sequence=user_id_seq)"/>
    <class name="DefaultUser" detachable="true"
    table="whisper_user" identity-type="datastore">
    <datastore-identity sequence="user_id_seq" column="userId"/>
    <field name="username">
    <column name="username" length="80" jdbc-type="VARCHAR" />
    </field>
    <field name="password">
    <column name="password" length="40" jdbc-type="CHAR" />
    </field>
    <field name="enabled">
    <column name="enabled" />
    </field>
    <field name="userInfo" persistence-modifier="persistent"
    default-fetch-group="true" dependent="true">
    <extension vendor-name="jpox"
    key="implementation-classes"
    value="edu.ucsc.whisper.core.DefaultUserInfo" />
    <extension vendor-name="kodo"
    key="type"
    value="edu.ucsc.whisper.core.DefaultUserInfo" />
    </field>
    <field name="authorities" persistence-modifier="persistent"
    table="user_authorities"
    default-fetch-group="true">
    <collection
    element-type="edu.ucsc.whisper.core.DefaultAuthority" />
    <join column="userId" delete-action="cascade"/>
    <element column="authorityId" delete-action="cascade"/>
    </field>
    <field name="organizations" persistence-modifier="persistent"
    table="user_organizations" mapped-by="user"
    default-fetch-group="true" dependent="true">
    <collection
    element-type="edu.ucsc.whisper.core.DefaultOrganization"
    dependent-element="true"/>
    <join column="userId"/>
    <!--<element column="organizationId"/>-->
    </field>
    </class>
    </package>
    </jdo>
    --- DefaultUser.jdoquery ---------------------------------------------
    <?xml version="1.0"?>
    <!DOCTYPE jdo PUBLIC
    "-//Sun Microsystems, Inc.//DTD Java Data Objects Metadata 2.0//EN"
    "http://java.sun.com/dtd/jdo_2_0.dtd">
    <jdo>
    <package name="edu.ucsc.whisper.core">
    <class name="DefaultUser">
    <query name="UserByUsername"
    language="javax.jdo.query.JDOQL"><![CDATA[
    SELECT UNIQUE FROM edu.ucsc.whisper.core.DefaultUser
    WHERE username==searchName
    PARAMETERS java.lang.String searchName
    ]]></query>
    <query name="DisabledUsers"
    language="javax.jdo.query.JDOQL"><![CDATA[
    SELECT FROM edu.ucsc.whisper.core.DefaultUser WHERE
    enabled==false
    ]]></query>
    <query name="EnabledUsers"
    language="javax.jdo.query.JDOQL"><![CDATA[
    SELECT FROM edu.ucsc.whisper.core.DefaultUser WHERE
    enabled==true
    ]]></query>
    <query name="CountUsers"
    language="javax.jdo.query.JDOQL"><![CDATA[
    SELECT count( this ) FROM edu.ucsc.whisper.core.DefaultUser
    ]]></query>
    </class>
    </package>
    </jdo>

    I'm sorry, I have no idea. I suggest sending a test case that
    reproduces the problem to support.
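    One thing that sometimes helps narrow this kind of mismatch down is turning on SQL/JDBC tracing so you can see the exact statement and the parameters Kodo binds for the failing query. A hedged example for kodo.properties (channel names as used by Kodo 4; adjust the levels to taste):
    kodo.Log: DefaultLevel=WARN, Runtime=INFO, Tool=INFO, SQL=TRACE, JDBC=TRACE
    Comparing the traced SQL between 4.0.0-EA4 and 4.0.0 for the findAllUsersWithFirstName query should show whether the new version generates a statement with fewer parameter placeholders than it tries to bind.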

  • Hyper-V Guest Cluster Node Failing Regularly

    Hi,
    We currently have a 4-node Server 2012 R2 cluster which hosts, among other things, a 3-node guest cluster running a single clustered file service.
    Around once a week, the guest cluster node that is currently hosting the clustered file service will fail. It's as if the VM is blue screening. That in itself is fairly annoying, and I'll be doing all the updates and checking the event log for clues as to the cause.
    The problem then is that whichever physical cluster node is hosting the VM when it fails will not unlock some of the VM's files. The Virtual machine configuration lists as Online Pending. This means that the failed VM cannot be restarted on any other cluster node. The only fix is to drain the physical host it failed on, and reboot.
    Looking for suggestions on how to fix the following.
    1. Crashing guest file cluster node
    2. Failed VM with shared VHDX requiring physical host reboot.
    Event messages for the physical host that was hosting the failed vm in order that they occured.
    Hyper-V-Worker: Event ID 18590 - 'FS-03' has encountered a fatal error.  The guest operating system reported that it failed with the following error codes: ErrorCode0: 0x9E, ErrorCode1: 0x6C2A17C0, ErrorCode2: 0x3C, ErrorCode3: 0xA, ErrorCode4:
    0x0.  If the problem persists, contact Product Support for the guest operating system.  (Virtual machine ID 36166B47-D003-4E51-AFB5-7B967A3EFD2D)
    FailoverClustering: Event ID 1069 - Cluster resource 'Virtual Machine FS-03' of type 'Virtual Machine' in clustered role 'FS-03' failed.
    Hyper-V-High-Availability: Event ID 21128 - 'Virtual Machine FS-03' failed to shutdown the virtual machine during the resource termination. The virtual machine will be forcefully stopped.
    Hyper-V-High-Availability: Event ID 21110 - 'Virtual Machine FS-03' failed to terminate.
    Hyper-V-VMMS: Event ID 20108 - The Virtual Machine Management Service failed to start the virtual machine '36166B47-D003-4E51-AFB5-7B967A3EFD2D': The group or resource is not in the correct state to perform the requested operation. (0x8007139F).
    Hyper-V-High-Availability: Event ID 21107 - 'Virtual Machine FS-03' failed to start.
    FailoverClustering: Event ID 1205 - The Cluster service failed to bring clustered role 'FS-03' completely online or offline. One or more resources may be in a failed state. This may impact the availability of the clustered role.

    Hi,
    I haven't found a similar issue. Does your cluster pass cluster validation? Are all your Hyper-V hosts compatible with Server 2012 R2? Have you tried disabling all your AV software and firewall? Please rerun Storage validation on the cluster in non-production hours; the cluster validation report will quickly locate the issue.
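    If you want to run that validation from a script during a quiet window, a hedged PowerShell example on one of the 2012 R2 hosts (host names are placeholders):
    Test-Cluster -Node Host1,Host2,Host3,Host4 -Include "Storage","Inventory","Network","System Configuration"
    The generated report will call out shared-VHDX and storage issues, which can line up with the 0x9E (USER_MODE_HEALTH_MONITOR) bugcheck reported by the guest.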
    More information:
    Cluster
    http://technet.microsoft.com/en-us/library/dd581778(v=ws.10).aspx
    Hope this helps.

  • Node failed to join the cluster because it could not send and receive failure detection network messages

    One of my customers has a Windows Server 2008 R2 cluster for an Exchange 2010 Mailbox Database Availability Group.  Lately, they've been having problems with one of their nodes (the one node that is on a different subnet in a different datacenter) where
    their Exchange databases aren't replicating.  While looking into this issue it seems that the problem is the Network Manager isn't started because the cluster service is failing.  Since the issue seems to be with the cluster service, and not Exchange,
    I'm asking here. 
    When the cluster service starts, it appears to start working, but within a few minutes the following is logged in the system event log.
    FailoverClustering
    1572
    Critical
    Cluster Virtual Adapter
    Node 'nodename' failed to join the cluster because it could not send and receive failure detection network messages with other cluster nodes. ...
    It seems that the problem is with the 169.254 address on the cluster virtual adapter.  An entry in the cluster.log file says: Aborting connection because NetFT route to node nodename on virtual IP 169.254.1.44:~3343~ has failed to come up. 
    In my experience, you never have to mess with the cluster virtual adapter.  I'm not sure what happened here, but I doubt it has been modified.  I need the cluster to communicate with its other nodes on our routed 10. network.  I've never experienced
    this before and found little in my searches on the subject.  Any idea how I can fix this?
    Thanks,
    Joe
    Joseph M. Durnal MCM: Exchange 2010 MCITP: Enterprise Messaging Administrator, Exchange 2010 MCITP: Enterprise Messaging Administrator, MCITP: Enterprise Administrator

    Hi,
    I suspect an issue with communication on UDP port 3343. Please confirm that the firewall rules for port 3343 are set on all the nodes and that connections are allowed for all firewall profiles on all the nodes, or otherwise confirm the connectivity of every node.
    Use ipconfig /flushdns to refresh the DNS registration on every node, then confirm that the entry for each node in your DNS server is correct.
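    A quick way to check the rule on each node from an elevated prompt (standard Windows firewall commands; the rule name below is only an example if you need to add one):
    netsh advfirewall firewall show rule name=all verbose | findstr /i 3343
    netsh advfirewall firewall add rule name="Failover Cluster (UDP-In 3343)" dir=in action=allow protocol=UDP localport=3343
    Since the failing node is on a different subnet, it is also worth confirming that UDP 3343 is allowed across the routers and firewalls between the two datacenters, not just on the hosts.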
    The similar issue article:
    Exchange 2010 DAG - NetworkManager has not yet been initialized
    https://blogs.technet.com/b/dblanch/archive/2012/03/05/exchange-2010-dag-networkmanager-has-not-yet-been-initialized.aspx?Redirected=true
    Hope this helps.

  • [svn:bz-trunk] 7681: Fix bunch of failing config tests on BlazeDS/ trunk by removing javax.servlet. UnavailableException from the expected error string.

    Revision: 7681
    Author:   [email protected]
    Date:     2009-06-09 11:44:36 -0700 (Tue, 09 Jun 2009)
    Log Message:
    Fix bunch of failing config tests on BlazeDS/trunk by removing javax.servlet.UnavailableException from the expected error string.
    Modified Paths:
        blazeds/trunk/qa/apps/qa-regress/testsuites/config/tests/destination/IncorrectRootElementTest/error.txt
        blazeds/trunk/qa/apps/qa-regress/testsuites/config/tests/messagingService/AdaptiveServerToClient/AdaptiveFrequencyTest/error.txt
        blazeds/trunk/qa/apps/qa-regress/testsuites/config/tests/messagingService/AdaptiveServerToClient/FrequencyStepSizeTest/error.txt
        blazeds/trunk/qa/apps/qa-regress/testsuites/config/tests/messagingService/AdaptiveServerToClient/MaxQueueSizeTest/error.txt
        blazeds/trunk/qa/apps/qa-regress/testsuites/config/tests/messagingService/DestinationWithNoChannelTest/error.txt
        blazeds/trunk/qa/apps/qa-regress/testsuites/config/tests/messagingService/DestinationWithNoIDTest/error.txt
        blazeds/trunk/qa/apps/qa-regress/testsuites/config/tests/messagingService/jms/InvalidAcknowledgeModeTest/error.txt
        blazeds/trunk/qa/apps/qa-regress/testsuites/config/tests/messagingService/jms/InvalidDeliveryModeTest/error.txt
        blazeds/trunk/qa/apps/qa-regress/testsuites/config/tests/messagingService/jms/InvalidDestinationTypeTest/error.txt
        blazeds/trunk/qa/apps/qa-regress/testsuites/config/tests/messagingService/jms/InvalidMessageTypeTest/error.txt
        blazeds/trunk/qa/apps/qa-regress/testsuites/config/tests/messagingService/jms/NoConnectionFactoryTest/error.txt
        blazeds/trunk/qa/apps/qa-regress/testsuites/config/tests/messagingService/jms/NoJNDINameTest/error.txt
        blazeds/trunk/qa/apps/qa-regress/testsuites/config/tests/messagingService/throttle/throttleInbound/InvalidBufferPolicyTest/error.txt
        blazeds/trunk/qa/apps/qa-regress/testsuites/config/tests/messagingService/throttle/throttleInbound/InvalidConflatePolicyTest/error.txt
        blazeds/trunk/qa/apps/qa-regress/testsuites/config/tests/messagingService/throttle/throttleInbound/UnknownInboundPolicyTest/error.txt
        blazeds/trunk/qa/apps/qa-regress/testsuites/config/tests/messagingService/throttle/throttleInbound/frequencies/McfGreaterthanMfTest/error.txt
        blazeds/trunk/qa/apps/qa-regress/testsuites/config/tests/messagingService/throttle/throttleOutbound/InvalidBufferPolicyTest/error.txt
        blazeds/trunk/qa/apps/qa-regress/testsuites/config/tests/messagingService/throttle/throttleOutbound/InvalidConflatePolicyTest/error.txt
        blazeds/trunk/qa/apps/qa-regress/testsuites/config/tests/messagingService/throttle/throttleOutbound/InvalidErrorPolicyTest/error.txt
        blazeds/trunk/qa/apps/qa-regress/testsuites/config/tests/messagingService/throttle/throttleOutbound/UnknownOutboundPolicyTest/error.txt
        blazeds/trunk/qa/apps/qa-regress/testsuites/config/tests/messagingService/throttle/throttleOutbound/frequencies/McfGreaterthanMfTest/error.txt
        blazeds/trunk/qa/apps/qa-regress/testsuites/config/tests/messagingService/validation/nonExistingValidatorTest/error.txt
        blazeds/trunk/qa/apps/qa-regress/testsuites/config/tests/messagingService/validation/sameExplicitTypeValidatorTest/error.txt
        blazeds/trunk/qa/apps/qa-regress/testsuites/config/tests/messagingService/validation/sameTypeValidatorTest/error.txt
        blazeds/trunk/qa/apps/qa-regress/testsuites/config/tests/messagingService/validation/wrongTypeValidatorTest/error.txt

  • Cluster installation failed:OCR files are not shared across all nodes

    Hi gurus,
    I'm trying to install RAC 10g using VMware, following Hunter's document on installing Oracle 10g RAC.
    I'm using CentOS 4.6. I'm now stuck on installing the clusterware, which shows the error "OCR files are not shared across all nodes".
    Please advise.

    Review the release notes; there is a bug and a fix, since CRS fails on the last node.
    The bug is filed for EL 5, but it pretty much matches what I've found in the log files.
    I'm also on EL 4.6 (x64 in my case) and haven't tried the fix due to lack of time.
    Provide the log files from CRS_HOME/logs.

  • Help : Cluster Fail over Test - Could not establish a connection

    Hi All
              I'm trying to do a cluster failover test with two WebLogic 8.1 SP2 instances in a cluster.
              During that testing, I restart the instance that is handling my request, to make sure the session is replicated smoothly to the other instance so that I can continue accessing my application without any interruption. But when I restart the instance, I'm getting the following exception
              Error 500--Internal Server Error
              java.rmi.ConnectException: Could not establish a connection with 8909815174098071019S:dappsn03:[8201,8201,-1,-1,8201,-1,-1,0,0]:dappsn03-04:TNL:tnl1_81dappsn03, java.rmi.ConnectException: Destination unreachable; nested exception is:
                   java.net.ConnectException: Connection refused; No available router to destination
                   at weblogic.rjvm.RJVMImpl.getOutputStream(RJVMImpl.java:316)
                   at weblogic.rjvm.RJVMImpl.getRequestStream(RJVMImpl.java:488)
                   at weblogic.rjvm.RJVMImpl.getOutboundRequest(RJVMImpl.java:584)
                   at weblogic.rmi.internal.BasicRemoteRef.getOutboundRequest(BasicRemoteRef.java:91)
                   at weblogic.rmi.internal.activation.ActivatableRemoteRef.invoke(ActivatableRemoteRef.java:69)
                   at com.sns.pfk.ejb.PfkSessionBean_mz6mqm_EOImpl_812_WLStub.getPortalRecord(Unknown Source)
                   at com.sns.pfk.servlet.PfkMainServlet.getInfofromSB(Unknown Source)
                   at com.sns.pfk.servlet.PfkMainServlet.doActionDisplay(Unknown Source)
                   at com.sns.pfk.servlet.PfkMainServlet.doGet(Unknown Source)
                   at com.sns.pfk.servlet.PfkMainServlet.doPost(Unknown Source)
                   at javax.servlet.http.HttpServlet.service(HttpServlet.java:760)
                   at javax.servlet.http.HttpServlet.service(HttpServlet.java:853)
                   at weblogic.servlet.internal.ServletStubImpl$ServletInvocationAction.run(ServletStubImpl.java:971)
                   at weblogic.servlet.internal.ServletStubImpl.invokeServlet(ServletStubImpl.java:402)
                   at weblogic.servlet.internal.ServletStubImpl.invokeServlet(ServletStubImpl.java:305)
                   at weblogic.servlet.internal.RequestDispatcherImpl.include(RequestDispatcherImpl.java:607)
                   at weblogic.servlet.internal.RequestDispatcherImpl.include(RequestDispatcherImpl.java:400)
                   at com.sns.ana.ui.servlet.AuthorisationBaseServlet.service(AuthorisationBaseServlet.java:109)
                   at javax.servlet.http.HttpServlet.service(HttpServlet.java:853)
                   at weblogic.servlet.internal.ServletStubImpl$ServletInvocationAction.run(ServletStubImpl.java:971)
                   at weblogic.servlet.internal.ServletStubImpl.invokeServlet(ServletStubImpl.java:402)
                   at weblogic.servlet.internal.ServletStubImpl.invokeServlet(ServletStubImpl.java:305)
                   at weblogic.servlet.internal.WebAppServletContext$ServletInvocationAction.run(WebAppServletContext.java:6350)
                   at weblogic.security.acl.internal.AuthenticatedSubject.doAs(AuthenticatedSubject.java:317)
                   at weblogic.security.service.SecurityManager.runAs(SecurityManager.java:118)
                   at weblogic.servlet.internal.WebAppServletContext.invokeServlet(WebAppServletContext.java:3635)
                   at weblogic.servlet.internal.ServletRequestImpl.execute(ServletRequestImpl.java:2585)
                   at weblogic.kernel.ExecuteThread.execute(ExecuteThread.java:197)
                   at weblogic.kernel.ExecuteThread.run(ExecuteThread.java:170)
              Buddies, has anyone hit this issue before? Please shed some light to help escape this hiccup.
              With Regs
              -SHAN

    Hi,
              Thanks to everyone who spent time reading this thread. This problem was due to some missing entries in weblogic-ejb.xml. It got fixed once we got support from BEA.
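              For reference, in case others hit the same thing: the exact missing entries weren't posted, but in-memory replication for a stateful session bean in WebLogic 8.1 is normally enabled with descriptor entries along these lines in weblogic-ejb.xml (the bean name is taken from the stack trace and is only an example):
              <weblogic-enterprise-bean>
                <ejb-name>PfkSessionBean</ejb-name>
                <stateful-session-descriptor>
                  <stateful-session-clustering>
                    <home-is-clusterable>true</home-is-clusterable>
                    <replication-type>InMemory</replication-type>
                  </stateful-session-clustering>
                </stateful-session-descriptor>
              </weblogic-enterprise-bean>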
              With Regs
              -SHAN

  • Cluster node reboots after network failure

    hi all,
    The Sun Cluster 3.1 8/05 installation with 2 nodes (E2900) was working fine without any errors in sccheck.
    Yesterday one node rebooted due to a network failure; the errors in the messages file are
    Jan 17 08:00:36 PRD in.mpathd[221]: [ID 594170 daemon.error] NIC failure detected on ce0 of group sc_ipmp0
    Jan 17 08:00:36 PRD Cluster.PNM: [ID 890413 daemon.notice] sc_ipmp0: state transition from OK to DOWN.
    Jan 17 08:00:47 PRD Cluster.RGM.rgmd: [ID 784560 daemon.notice] resource PROD status on node PRD change to R_FM_DEGRADED
    Jan 17 08:00:47 PRD Cluster.RGM.rgmd: [ID 922363 daemon.notice] resource PROD status msg on node PRD change to <IPMP Failure.>
    Jan 17 08:00:50 PRD Cluster.RGM.rgmd: [ID 529407 daemon.notice] resource group CFS state on node PRD change to RG_PENDING_OFFLINE
    Jan 17 08:00:50 PRD Cluster.RGM.rgmd: [ID 443746 daemon.notice] resource PROD state on node PRD change to R_MON_STOPPING
    Jan 17 08:00:50 PRD Cluster.RGM.rgmd: [ID 707948 daemon.notice] launching method <hafoip_monitor_stop> for resource <PROD>, resource group <CFS>, timeout <300> seconds
    Jan 17 08:00:50 PRD Cluster.RGM.rgmd: [ID 736390 daemon.notice] method <hafoip_monitor_stop> completed successfully for resource <PROD>, resource group <CFS>, time used: 0% of timeout <300 seconds>
    Jan 17 08:00:50 PRD Cluster.RGM.rgmd: [ID 443746 daemon.notice] resource PROD state on node PRD change to R_ONLINE_UNMON
    Jan 17 08:00:50 PRD Cluster.RGM.rgmd: [ID 443746 daemon.notice] resource PROD state on node PRD change to R_STOPPING
    Jan 17 08:00:50 PRD Cluster.RGM.rgmd: [ID 707948 daemon.notice] launching method <hafoip_stop> for resource <PROD>, resource group <CFS>, timeout <300> seconds
    Jan 17 08:00:50 PRD Cluster.RGM.rgmd: [ID 784560 daemon.notice] resource PROD status on node PRD change to R_FM_UNKNOWN
    Jan 17 08:00:50 PRD Cluster.RGM.rgmd: [ID 922363 daemon.notice] resource PROD status msg on node PRD change to <Stopping>
    Jan 17 08:00:51 PRD ip: [ID 678092 kern.notice] TCP_IOC_ABORT_CONN: local = 172.016.005.025:0, remote = 000.000.000.000:0, start = -2, end = 6
    Jan 17 08:00:51 PRD ip: [ID 302654 kern.notice] TCP_IOC_ABORT_CONN: aborted 53 connections
    What can be the reason for rebooting?
    Is there any way to avoid this, with only a failover?
    rgds
    Message was edited by:
    suj

    What is in that resource group? The cause is probably something with Failover_mode=HARD set. Check the manual reference section for this. The option would be to set the Failover_mode=SOFT.
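    A hedged way to check and change that from the Sun Cluster 3.1 command line (the resource name PROD is taken from the messages above; verify the current setting before changing it):
    # scrgadm -pvv | grep -i failover_mode
    # scrgadm -c -j PROD -y Failover_mode=SOFT
    With Failover_mode=SOFT, a failed STOP of the resource should result in a failover of the resource group rather than a reboot of the node.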
    Tim
    ---

  • Cluster network name resource 'Cluster Name' failed registration of one or more associated DNS name(s) for the following reason: The handle is invalid.

    I'm stuck here trying to figure this error out.  
    2003 domain, 2012 hyper v core 3 nodes.  (I have two of these hyper V groups, hvclust2012 is the problem group, hvclust2008 is okay)
    In Failover Cluster Manager I see these errors, "Cluster network name resource 'Cluster Name' failed registration of one or more associated DNS name(s) for the following reason:  The handle is invalid."
    I restarted the host node that was listed as having the error, then another node started showing the errors.
    I tried to follow this site:  http://blog.subvertallmedia.com/2012/12/06/repairing-a-failover-cluster-in-windows-server-2012-live-migration-fails-dns-cluster-name-errors/
    Then this error shows up when doing the repair:  there was an error repairing the active directory object for 'Cluster Name'
    I looked at our domain controller and noticed I don't have access to local users and groups.  I can access our other hvclust2008 (both clusters are same version 2012).
    <image here>
    I came upon this thread:  http://social.technet.microsoft.com/Forums/en-US/85fc2ad5-b0c0-41f0-900e-df1db8625445/windows-2012-cluster-resource-name-fails-dns-registration-evt-1196?forum=winserverClustering
    Now, I'm stuck on adding a managed service account (MSA). I'm not sure if I'm way off track trying to fix this. Any advice? Thanks in advance!
    <image here>

    Thanks Elton,
    I restarted 3 hosts after applying the hotfix.  Then I did the steps below and got stuck on step 5.  That is when I get the error (image above).  There
    was an error repairing the active directory object for 'Cluster Name'.  For more data, see 'Information Details'.
    To reset the password on the affected name resource, perform the following steps:
    From Failover Cluster Manager, locate the name resource.
    Right-click on the resource, and click Properties.
    On the Policies tab, select If resource fails, do not restart, and then click OK.
    Right-click on the resource, click More Actions, and then click Simulate Failure.
    When the name resource shows "Failed," right-click on the resource, click More Actions, and then click Repair.
    After the name resource is online, right-click on the resource, and then click Properties.
    On the Policies tab, select If resource fails, attempt restart on current node, and then click OK.
    Thanks
