Cluster node fails after testing removing both interconnects in a two-node cluster

Hi,
A cluster node panics and fails to rejoin the cluster after testing removal of both interconnects in a two-node cluster. The cluster is up on one node, but the panicked node fails to rejoin the cluster, saying there is no sufficient quorum yet and that both cluster interconnects failed (even after reconnecting the interconnects). The quorum device used is a shared disk.
Is this a bug?
Any workaround or solution?
Cluster is 3.2 SPARC
Thanking you
Ushas Symon

Sounds like a networking problem to me. If the failed node genuinely can't communicate with the remaining node then it will not be allowed to join the cluster, hence the quorum message. I would suspect either:
* Misconnected cables
* A switch that has blocked or disabled the port
* A failed auto-negotiation
This is of course without knowing anything about what your network infrastructure actually is!
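If it helps, the transport and quorum state can be checked from the node that is still in the cluster before trying to boot the failed node back in. A minimal sketch, assuming the standard Sun Cluster 3.2 command set (adapt names to your configuration):
# scstat -W
# clinterconnect status
# clquorum status
Both private-network paths should show as online again before the second node will be able to rejoin; if they stay faulted even with the cables reconnected, that points back at the cabling, switch ports or auto-negotiation.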
Tim
---

Similar Messages

  • How can I get rid of "easy inline"-type hyperlinks even after having removed both easy-inline and yontoo?

    I'm using Firefox 18.0.
    I searched the internet for a solution to my problem, and the closest result was Mozilla Support item titled "The latest update to Firefox (14.0.1) has added something called "easy inline" which is very, very annoying. Please, anyone, how do I TURN IT OFF???", from July 18, 2012.
    I followed all recommendations but "random" phrases still become hyperlinks that display "balloon" ads on mouse-over. As for Yontoo, I do recall downloading some software over the summer that brought Yontoo along with it, but I've since uninstalled both. When I came across the support article, above, I followed the link to uninstall Yontoo, and performed it again.
    Currently, neither "Yontoo", nor "Easy inline" appear as Firefox extensions or add-ons, and there are no folders or files by (or containing) those names anywhere on my hard drive.
    Does anyone have a suggestion as to what this could still be, and how to get it turned off, once and for all?
    Thank You

    The Reset Firefox feature can fix many issues by restoring Firefox to its factory default state while saving your essential information.
    Note: ''This will cause you to lose any Extensions, Open websites, and some Preferences.''
    To Reset Firefox do the following:
    #Go to Firefox > Help > Troubleshooting Information.
    #Click the "Reset Firefox" button.
    #Firefox will close and reset. After Firefox is done, it will show a window with the information that is imported. Click Finish.
    #Firefox will open with all factory defaults applied.
    Further information can be found in the [[Reset Firefox – easily fix most problems]] article.
    Did this fix your problems? Please report back to us!

  • Oracle RAC performance Suddenly terminates on one of the two node cluster

    I have a strange problem that happens from time to time: my M400 machine, which is part of a two-node RAC cluster, suddenly goes down.
    I have tried many times to understand the cause, but when I read the logs there are many messages related to Oracle RAC, which I don't have any experience or knowledge of. I hope I can find someone here who can explain these log messages to me; they are always the same.
    Jun 18 08:30:00 kfc-rac1 sendmail[17709]: [ID 702911 mail.crit] My unqualified host name (kfc-rac1) unknown; sleeping for retry
    Jun 18 08:31:00 kfc-rac1 sendmail[17709]: [ID 702911 mail.alert] unable to qualify my own domain name (kfc-rac1) -- using short name
    Jun 18 11:44:15 kfc-rac1 iscsi: [ID 454097 kern.notice] NOTICE: unrecognized ioctl 0x403
    Jun 18 11:44:15 kfc-rac1 scsi: [ID 243001 kern.warning] WARNING: /pseudo/fcp@0 (fcp0):
    Jun 18 11:44:15 kfc-rac1 Invalid ioctl opcode = 0x403
    Jun 18 17:09:41 kfc-rac1 Cluster.RGM.global.rgmd: [ID 224900 daemon.notice] launching method <bin/rac_udlm_monitor_stop> for resource <rac-udlm-rs>, resource group <rac-fw-rg>, node <kfc-rac1>, timeout <300> seconds
    Jun 18 17:09:41 kfc-rac1 Cluster.RGM.global.rgmd: [ID 224900 daemon.notice] launching method <bin/rac_framework_monitor_stop> for resource <rac-fw-rs>, resource group <rac-fw-rg>, node <kfc-rac1>, timeout <3600> seconds
    Jun 18 17:09:41 kfc-rac1 Cluster.RGM.global.rgmd: [ID 224900 daemon.notice] launching method <bin/rac_svm_monitor_stop> for resource <rac-svm-rs>, resource group <rac-fw-rg>, node <kfc-rac1>, timeout <300> seconds
    Jun 18 17:09:41 kfc-rac1 Cluster.RGM.global.rgmd: [ID 224900 daemon.notice] launching method <scal_dg_monitor_stop> for resource <scal-racdg-rs>, resource group <scal-racdg-rg>, node <kfc-rac1>, timeout <300> seconds
    Jun 18 17:09:41 kfc-rac1 Cluster.RGM.global.rgmd: [ID 224900 daemon.notice] launching method <scal_mountpoint_monitor_stop> for resource <racfs-mntpnt-rs>, resource group <racfs-mntpnt-rg>, node <kfc-rac1>, timeout <300> seconds
    Jun 18 17:09:41 kfc-rac1 Cluster.RGM.global.rgmd: [ID 515159 daemon.notice] method <bin/rac_framework_monitor_stop> completed successfully for resource <rac-fw-rs>, resource group <rac-fw-rg>, node <kfc-rac1>, time used: 0% of timeout <3600 seconds>
    Jun 18 17:09:41 kfc-rac1 Cluster.RGM.global.rgmd: [ID 515159 daemon.notice] method <bin/rac_udlm_monitor_stop> completed successfully for resource <rac-udlm-rs>, resource group <rac-fw-rg>, node <kfc-rac1>, time used: 0% of timeout <300 seconds>
    Jun 18 17:09:41 kfc-rac1 Cluster.RGM.global.rgmd: [ID 515159 daemon.notice] method <bin/rac_svm_monitor_stop> completed successfully for resource <rac-svm-rs>, resource group <rac-fw-rg>, node <kfc-rac1>, time used: 0% of timeout <300 seconds>
    Jun 18 17:09:41 kfc-rac1 Cluster.RGM.global.rgmd: [ID 224900 daemon.notice] launching method <bin/rac_udlm_stop> for resource <rac-udlm-rs>, resource group <rac-fw-rg>, node <kfc-rac1>, timeout <300> seconds
    Jun 18 17:09:41 kfc-rac1 Cluster.RGM.global.rgmd: [ID 515159 daemon.notice] method <scal_dg_monitor_stop> completed successfully for resource <scal-racdg-rs>, resource group <scal-racdg-rg>, node <kfc-rac1>, time used: 0% of timeout <300 seconds>
    Jun 18 17:09:41 kfc-rac1 Cluster.RGM.global.rgmd: [ID 515159 daemon.notice] method <scal_mountpoint_monitor_stop> completed successfully for resource <racfs-mntpnt-rs>, resource group <racfs-mntpnt-rg>, node <kfc-rac1>, time used: 0% of timeout <300 seconds>
    Jun 18 17:09:41 kfc-rac1 Cluster.RGM.global.rgmd: [ID 224900 daemon.notice] launching method <scal_mountpoint_postnet_stop> for resource <racfs-mntpnt-rs>, resource group <racfs-mntpnt-rg>, node <kfc-rac1>, timeout <300> seconds
    Jun 18 17:09:41 kfc-rac1 SC[SUNW.rac_udlm.rac_udlm_stop]: [ID 854390 daemon.notice] Resource state of rac-udlm-rs is changed to offline. Note that RAC framework will not be stopped by STOP method.
    Jun 18 17:09:41 kfc-rac1 Cluster.RGM.global.rgmd: [ID 515159 daemon.notice] method <bin/rac_udlm_stop> completed successfully for resource <rac-udlm-rs>, resource group <rac-fw-rg>, node <kfc-rac1>, time used: 0% of timeout <300 seconds>
    Jun 18 17:09:42 kfc-rac1 samfs: [ID 320134 kern.notice] NOTICE: SAM-QFS: racfs: Initiated unmount filesystem: vers 2
    Jun 18 17:09:43 kfc-rac1 samfs: [ID 522083 kern.notice] NOTICE: SAM-QFS: racfs: Completed unmount filesystem: vers 2
    Jun 18 17:09:43 kfc-rac1 Cluster.RGM.global.rgmd: [ID 515159 daemon.notice] method <scal_mountpoint_postnet_stop> completed successfully for resource <racfs-mntpnt-rs>, resource group <racfs-mntpnt-rg>, node <kfc-rac1>, time used: 0% of timeout <300 seconds>
    Jun 18 17:09:43 kfc-rac1 Cluster.RGM.global.rgmd: [ID 224900 daemon.notice] launching method <scal_dg_postnet_stop> for resource <scal-racdg-rs>, resource group <scal-racdg-rg>, node <kfc-rac1>, timeout <300> seconds
    Jun 18 17:09:43 kfc-rac1 Cluster.RGM.global.rgmd: [ID 515159 daemon.notice] method <scal_dg_postnet_stop> completed successfully for resource <scal-racdg-rs>, resource group <scal-racdg-rg>, node <kfc-rac1>, time used: 0% of timeout <300 seconds>
    Jun 18 17:09:43 kfc-rac1 Cluster.RGM.global.rgmd: [ID 224900 daemon.notice] launching method <bin/rac_svm_stop> for resource <rac-svm-rs>, resource group <rac-fw-rg>, node <kfc-rac1>, timeout <300> seconds
    Jun 18 17:09:43 kfc-rac1 SC[SUNW.rac_svm.rac_svm_stop]: [ID 854390 daemon.notice] Resource state of rac-svm-rs is changed to offline. Note that RAC framework will not be stopped by STOP method.
    Jun 18 17:09:43 kfc-rac1 Cluster.RGM.global.rgmd: [ID 515159 daemon.notice] method <bin/rac_svm_stop> completed successfully for resource <rac-svm-rs>, resource group <rac-fw-rg>, node <kfc-rac1>, time used: 0% of timeout <300 seconds>
    Jun 18 17:09:43 kfc-rac1 Cluster.RGM.global.rgmd: [ID 224900 daemon.notice] launching method <bin/rac_framework_stop> for resource <rac-fw-rs>, resource group <rac-fw-rg>, node <kfc-rac1>, timeout <300> seconds
    Jun 18 17:09:43 kfc-rac1 SC[SUNW.rac_framework.rac_framework_stop]: [ID 854390 daemon.notice] Resource state of rac-fw-rs is changed to offline. Note that RAC framework will not be stopped by STOP method.
    Jun 18 17:09:43 kfc-rac1 Cluster.RGM.global.rgmd: [ID 515159 daemon.notice] method <bin/rac_framework_stop> completed successfully for resource <rac-fw-rs>, resource group <rac-fw-rg>, node <kfc-rac1>, time used: 0% of timeout <300 seconds>
    Jun 18 17:09:44 kfc-rac1 root: [ID 702911 user.error] Oracle CRSD 3932 set to stop
    Jun 18 17:09:44 kfc-rac1 root: [ID 702911 user.error] Oracle CRSD 3932 shutdown completed
    Jun 18 17:09:44 kfc-rac1 root: [ID 702911 user.error] Oracle EVMD set to stop
    Jun 18 17:09:44 kfc-rac1 root: [ID 702911 user.error] Oracle CSSD being stopped
    Jun 18 17:09:45 kfc-rac1 xntpd[980]: [ID 866926 daemon.notice] xntpd exiting on signal 15
    Jun 18 17:09:45 kfc-rac1 ip: [ID 646971 kern.notice] ip_create_dl: hw addr length = 0
    Jun 18 17:09:45 kfc-rac1 pppd[516]: [ID 702911 daemon.notice] Connection terminated.
    Jun 18 17:09:47 kfc-rac1 pppd[9462]: [ID 860527 daemon.notice] pppd 2.4.0b1 (Sun Microsystems, Inc.) started by root, uid 0
    Jun 18 17:09:47 kfc-rac1 pppd[9462]: [ID 702911 daemon.notice] Connect: sppp0 <--> /dev/dm2s0
    Jun 18 17:09:47 kfc-rac1 rpc.metamedd: [ID 702911 daemon.error] Terminated
    Jun 18 17:09:48 kfc-rac1 inetd[482]: [ID 702911 daemon.warning] inetd_offline method for instance svc:/network/rpc/scrcmd:default is unspecified. Taking default action: kill.
    Jun 18 17:09:48 kfc-rac1 inetd[482]: [ID 702911 daemon.warning] inetd_offline method for instance svc:/network/rpc/metacld:default is unspecified. Taking default action: kill.
    Jun 18 17:09:49 kfc-rac1 inetd[482]: [ID 702911 daemon.warning] inetd_offline method for instance svc:/network/rpc/scadmd:default is unspecified. Taking default action: kill.
    Jun 18 17:09:50 kfc-rac1 pppd[9462]: [ID 702911 daemon.notice] local IP address 192.168.224.2
    Jun 18 17:09:50 kfc-rac1 pppd[9462]: [ID 702911 daemon.notice] remote IP address 192.168.224.1
    Jun 18 17:09:50 kfc-rac1 cl_eventlogd[1554]: [ID 247336 daemon.error] Going down on signal 15.
    Jun 18 17:09:52 kfc-rac1 ip: [ID 372019 kern.error] ipsec_check_inbound_policy: Policy Failure for the incoming packet (not secure); Source 192.168.224.001, Destination 192.168.224.002.
    Jun 18 17:09:56 kfc-rac1 ip: [ID 646971 kern.notice] ip_create_dl: hw addr length = 0
    Jun 18 17:09:56 kfc-rac1 pppd[9462]: [ID 702911 daemon.notice] Connection terminated.
    Jun 18 17:09:56 kfc-rac1 Cluster.PNM: [ID 226280 daemon.notice] PNM daemon exiting.
    Jun 18 17:09:57 kfc-rac1 pseudo: [ID 129642 kern.info] pseudo-device: tod0
    Jun 18 17:09:57 kfc-rac1 genunix: [ID 936769 kern.info] tod0 is /pseudo/tod@0
    Jun 18 17:09:57 kfc-rac1 pseudo: [ID 129642 kern.info] pseudo-device: pm0
    Jun 18 17:09:57 kfc-rac1 genunix: [ID 936769 kern.info] pm0 is /pseudo/pm@0
    Jun 18 17:09:57 kfc-rac1 rpc.metad: [ID 702911 daemon.error] Terminated
    Jun 18 17:10:01 kfc-rac1 syslogd: going down on signal 15
    Jun 18 17:10:07 kfc-rac1 rpcbind: [ID 564983 daemon.error] rpcbind terminating on signal.
    Jun 18 17:10:32 kfc-rac1 Cluster.RGM.fed: [ID 831843 daemon.notice] SCSLM thread WARNING pools facility is disabled
    Jun 18 17:10:40 kfc-rac1 genunix: [ID 672855 kern.notice] syncing file systems...
    Jun 18 17:10:40 kfc-rac1 genunix: [ID 904073 kern.notice] done
    Jun 19 14:20:12 kfc-rac1 genunix: [ID 540533 kern.notice] ^MSunOS Release 5.10 Version Generic_141444-09 64-bit
    Jun 19 14:20:12 kfc-rac1 genunix: [ID 943908 kern.notice] Copyright 1983-2009 Sun Microsystems, Inc. All rights reserved.
    Jun 19 14:20:12 kfc-rac1 Use is subject to license terms.
    Jun 19 14:20:12 kfc-rac1 genunix: [ID 678236 kern.info] Ethernet address = 0:21:28:2:21:b2
    Thanks in advance to all of you; your response is highly appreciated.

    Hi, I have checked the interconnect between the two nodes and it is as follows:
    ifconfig -a
    lo0: flags=2001000849<UP,LOOPBACK,RUNNING,MULTICAST,IPv4,VIRTUAL> mtu 8232 index 1
    inet 127.0.0.1 netmask ff000000
    bge0: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 index 2
    inet 10.1.100.126 netmask ffffff00 broadcast 10.1.100.255
    groupname sc_ipmp0
    ether 0:14:4f:3a:6c:19
    bge0:1: flags=9040843<UP,BROADCAST,RUNNING,MULTICAST,DEPRECATED,IPv4,NOFAILOVER> mtu 1500 index 2
    inet 10.1.100.127 netmask ffffff00 broadcast 10.1.100.255
    bge0:2: flags=1040843<UP,BROADCAST,RUNNING,MULTICAST,DEPRECATED,IPv4> mtu 1500 index 2
    inet 10.1.100.140 netmask ffffff00 broadcast 10.1.100.255
    bge1: flags=1008843<UP,BROADCAST,RUNNING,MULTICAST,PRIVATE,IPv4> mtu 1500 index 6
    inet 172.16.0.129 netmask ffffff80 broadcast 172.16.0.255
    ether 0:14:4f:3a:6c:1a
    nxge0: flags=9040843<UP,BROADCAST,RUNNING,MULTICAST,DEPRECATED,IPv4,NOFAILOVER> mtu 1500 index 3
    inet 10.1.100.128 netmask ffffff00 broadcast 10.1.100.255
    groupname sc_ipmp0
    ether 0:21:28:d:c9:8e
    nxge1: flags=1008843<UP,BROADCAST,RUNNING,MULTICAST,PRIVATE,IPv4> mtu 1500 index 5
    inet 172.16.1.1 netmask ffffff80 broadcast 172.16.1.127
    ether 0:21:28:d:c9:8f
    e1000g1: flags=1008843<UP,BROADCAST,RUNNING,MULTICAST,PRIVATE,IPv4> mtu 1500 index 4
    inet 172.16.1.129 netmask ffffff80 broadcast 172.16.1.255
    ether 0:15:17:81:15:c3
    clprivnet0: flags=1009843<UP,BROADCAST,RUNNING,MULTICAST,MULTI_BCAST,PRIVATE,IPv4> mtu 1500 index 7
    inet 172.16.4.1 netmask fffffe00 broadcast 172.16.5.255
    ether 0:0:0:0:0:1
    sppp0: flags=10010008d1<UP,POINTOPOINT,RUNNING,NOARP,MULTICAST,IPv4,FIXEDMTU> mtu 1500 index 8
    inet 192.168.224.2 --> 192.168.224.1 netmask ffffff00
    ether 0:0:0:0:0:0
    root@kfc-rac1 #
    The interconnects are direct-attached back to back between the interfaces of both nodes.
    As for the status of the HBA cards, here it is as well:
    fcinfo hba-port -l
    HBA Port WWN: 2100001b3284c042
    OS Device Name: /dev/cfg/c1
    Manufacturer: QLogic Corp.
    Model: 375-3355-02
    Firmware Version: 05.01.00
    FCode/BIOS Version:  BIOS: 1.24; fcode: 1.24; EFI: 1.8;
    Serial Number: 0402R00-0844647023
    Driver Name: qlc
    Driver Version: 20090519-2.31
    Type: N-port
    State: online
    Supported Speeds: 1Gb 2Gb 4Gb
    Current Speed: 4Gb
    Node WWN: 2000001b3284c042
    Link Error Statistics:
    Link Failure Count: 0
    Loss of Sync Count: 0
    Loss of Signal Count: 0
    Primitive Seq Protocol Error Count: 0
    Invalid Tx Word Count: 0
    Invalid CRC Count: 0
    HBA Port WWN: 2100001b321c462b
    OS Device Name: /dev/cfg/c2
    Manufacturer: QLogic Corp.
    Model: 375-3355-02
    Firmware Version: 05.01.00
    FCode/BIOS Version:  BIOS: 1.24; fcode: 1.24; EFI: 1.8;
    Serial Number: 0402R00-0844646557
    Driver Name: qlc
    Driver Version: 20090519-2.31
    Type: N-port
    State: online
    Supported Speeds: 1Gb 2Gb 4Gb
    Current Speed: 4Gb
    Node WWN: 2000001b321c462b
    Link Error Statistics:
    Link Failure Count: 0
    Loss of Sync Count: 0
    Loss of Signal Count: 0
    Primitive Seq Protocol Error Count: 0
    Invalid Tx Word Count: 0
    Invalid CRC Count: 0
    HBA Port WWN: 2100001b32934b3c
    OS Device Name: /dev/cfg/c3
    Manufacturer: QLogic Corp.
    Model: 375-3294-01
    Firmware Version: 05.01.00
    FCode/BIOS Version:  BIOS: 2.2; fcode: 2.1; EFI: 2.0;
    Serial Number: 0402R00-0947745866
    Driver Name: qlc
    Driver Version: 20090519-2.31
    Type: N-port
    State: online
    Supported Speeds: 1Gb 2Gb 4Gb
    Current Speed: 4Gb
    Node WWN: 2000001b32934b3c
    Link Error Statistics:
    Link Failure Count: 0
    Loss of Sync Count: 0
    Loss of Signal Count: 0
    Primitive Seq Protocol Error Count: 0
    Invalid Tx Word Count: 0
    Invalid CRC Count: 0
    HBA Port WWN: 2101001b32b34b3c
    OS Device Name: /dev/cfg/c4
    Manufacturer: QLogic Corp.
    Model: 375-3294-01
    Firmware Version: 05.01.00
    FCode/BIOS Version:  BIOS: 2.2; fcode: 2.1; EFI: 2.0;
    Serial Number: 0402R00-0947745866
    Driver Name: qlc
    Driver Version: 20090519-2.31
    Type: unknown
    State: offline
    Supported Speeds: 1Gb 2Gb 4Gb
    Current Speed: not established
    Node WWN: 2001001b32b34b3c
    Link Error Statistics:
    Link Failure Count: 0
    Loss of Sync Count: 0
    Loss of Signal Count: 0
    Primitive Seq Protocol Error Count: 0
    Invalid Tx Word Count: 0
    Invalid CRC Count: 0
    root@kfc-rac1 #
    In addition, here is the ocssd log file as well:
    http://www.4shared.com/file/Txl9DqLW/log_25155156.html?
    Look at the lines for the dates on which this issue happened: 2012-06-09, 2012-06-18 and 2012-06-21.
    You'll see something related to the voting disk: it suddenly becomes unavailable, which causes the problem.
    Thanks a lot for your help; I'm waiting for your recommendation and hope these logs give a better view of the problem.
    Thanks in advance :)
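    If it would help to confirm the voting disk state from the OS side the next time this happens, a couple of standard Oracle Clusterware 10g commands (run as root from the CRS home) can be used; this is only a hedged suggestion, not something taken from your logs:
    # crsctl check css
    # crsctl query css votedisk
    If the votedisk query hangs or reports the disk as unavailable at the same time the SAM-QFS/iSCSI messages appear, that would support the storage-path explanation.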

  • Testing ha-nfs in two node cluster (cannot statvfs /global/nfs: I/O error )

    Hi all,
    I am testing HA-NFS (failover) on a two-node cluster. I have a Sun Fire V240, an E250 and Netra st A1000/D1000 storage. I have installed Solaris 10 update 6 and the cluster packages on both nodes.
    I have created one global file system (/dev/did/dsk/d4s7) and mounted it as /global/nfs. This file system is accessible from both nodes. I have configured HA-NFS according to the document Sun Cluster Data Service for NFS Guide for Solaris, using the command line interface.
    The logical host is pingable from the NFS client, and I have mounted the share there using the logical hostname. For testing purposes I took one machine down. After this step the file system gives an I/O error (on both server and client), and when I run the df command it shows
    df: cannot statvfs /global/nfs: I/O error.
    I have configured with following commands.
    #clnode status
    # mkdir -p /global/nfs
    # clresourcegroup create -n test1,test2 -p Pathprefix=/global/nfs rg-nfs
    I have added logical hostname,ip address in /etc/hosts
    I have commented hosts and rpc lines in /etc/nsswitch.conf
    # clreslogicalhostname create -g rg-nfs -h ha-host-1 -N
    sc_ipmp0@test1, sc_ipmp0@test2 ha-host-1
    # mkdir /global/nfs/SUNW.nfs
    Created one file called dfstab.user-home in /global/nfs/SUNW.nfs and that file contains the following line
    share -F nfs -o rw /global/nfs
    # clresourcetype register SUNW.nfs
    # clresource create -g rg-nfs -t SUNW.nfs ; user-home
    # clresourcegroup online -M rg-nfs
    Where I went wrong? Can any one provide document on this?
    Any help..?
    Thanks in advance.
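    For comparison, the documented HA-NFS setup usually also places the NFS file system under an HAStoragePlus resource and makes the SUNW.nfs resource depend on it. A hedged sketch using the names from this post (the nfs-hasp-rs resource name is made up here for illustration):
    # clresourcetype register SUNW.HAStoragePlus SUNW.nfs
    # clresource create -g rg-nfs -t SUNW.HAStoragePlus -p FilesystemMountPoints=/global/nfs nfs-hasp-rs
    # clresource create -g rg-nfs -t SUNW.nfs -p Resource_dependencies=nfs-hasp-rs user-home
    # clresourcegroup online -M rg-nfs
    With that dependency in place, the NFS resource is only started where the file system is known to be healthy, which also makes the statvfs I/O error easier to interpret after a node is taken down.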

    test1#  tail -20 /var/adm/messages
    Feb 28 22:28:54 testlab5 Cluster.SMF.DR: [ID 344672 daemon.error] Unable to open door descriptor /var/run/rgmd_receptionist_door
    Feb 28 22:28:54 testlab5 Cluster.SMF.DR: [ID 801855 daemon.error]
    Feb 28 22:28:54 testlab5 Error in scha_cluster_get
    Feb 28 22:28:54 testlab5 Cluster.scdpmd: [ID 489913 daemon.notice] The state of the path to device: /dev/did/rdsk/d5s0 has changed to OK
    Feb 28 22:28:54 testlab5 Cluster.scdpmd: [ID 489913 daemon.notice] The state of the path to device: /dev/did/rdsk/d6s0 has changed to OK
    Feb 28 22:28:58 testlab5 svc.startd[8]: [ID 652011 daemon.warning] svc:/system/cluster/scsymon-srv:default: Method "/usr/cluster/lib/svc/method/svc_scsymon_srv start" failed with exit status 96.
    Feb 28 22:28:58 testlab5 svc.startd[8]: [ID 748625 daemon.error] system/cluster/scsymon-srv:default misconfigured: transitioned to maintenance (see 'svcs -xv' for details)
    Feb 28 22:29:23 testlab5 Cluster.RGM.rgmd: [ID 537175 daemon.notice] CMM: Node e250 (nodeid: 1, incarnation #: 1235752006) has become reachable.
    Feb 28 22:29:23 testlab5 Cluster.RGM.rgmd: [ID 525628 daemon.notice] CMM: Cluster has reached quorum.
    Feb 28 22:29:23 testlab5 Cluster.RGM.rgmd: [ID 377347 daemon.notice] CMM: Node e250 (nodeid = 1) is up; new incarnation number = 1235752006.
    Feb 28 22:29:23 testlab5 Cluster.RGM.rgmd: [ID 377347 daemon.notice] CMM: Node testlab5 (nodeid = 2) is up; new incarnation number = 1235840337.
    Feb 28 22:37:15 testlab5 Cluster.CCR: [ID 499775 daemon.notice] resource group rg-nfs added.
    Feb 28 22:39:05 testlab5 Cluster.RGM.rgmd: [ID 375444 daemon.notice] 8 fe_rpc_command: cmd_type(enum):<5>:cmd=<null>:tag=<>: Calling security_clnt_connect(..., host=<testlab5>, sec_type {0:WEAK, 1:STRONG, 2:DES} =<1>, ...)
    Feb 28 22:39:05 testlab5 Cluster.CCR: [ID 491081 daemon.notice] resource ha-host-1 removed.
    Feb 28 22:39:17 testlab5 Cluster.RGM.rgmd: [ID 375444 daemon.notice] 8 fe_rpc_command: cmd_type(enum):<5>:cmd=<null>:tag=<>: Calling security_clnt_connect(..., host=<testlab5>, sec_type {0:WEAK, 1:STRONG, 2:DES} =<1>, ...)
    Feb 28 22:39:17 testlab5 Cluster.CCR: [ID 254131 daemon.notice] resource group nfs-rg removed.
    Feb 28 22:39:30 testlab5 Cluster.RGM.rgmd: [ID 224900 daemon.notice] launching method <hafoip_validate> for resource <ha-host-1>, resource group <rg-nfs>, node <testlab5>, timeout <300> seconds
    Feb 28 22:39:30 testlab5 Cluster.RGM.rgmd: [ID 375444 daemon.notice] 8 fe_rpc_command: cmd_type(enum):<1>:cmd=</usr/cluster/lib/rgm/rt/hafoip/hafoip_validate>:tag=<rg-nfs.ha-host-1.2>: Calling security_clnt_connect(..., host=<testlab5>, sec_type {0:WEAK, 1:STRONG, 2:DES} =<1>, ...)
    Feb 28 22:39:30 testlab5 Cluster.RGM.rgmd: [ID 515159 daemon.notice] method <hafoip_validate> completed successfully for resource <ha-host-1>, resource group <rg-nfs>, node <testlab5>, time used: 0% of timeout <300 seconds>
    Feb 28 22:39:30 testlab5 Cluster.CCR: [ID 973933 daemon.notice] resource ha-host-1 added.

  • Windows Server 2008 R2 SP1 2-Node cluster - Replace failed node

    Hi - I have a two-node Windows Server 2008 R2 SP1 failover cluster (DHCP, File, Print) where one of the nodes has failed beyond recovery. What I would like to do is to evict the failed cluster node, install a new machine with Windows Server 2008 R2 SP1, reuse the same name and IP address, and then join this machine as a node in the cluster.
    Are there any recommended steps to do this? I'm mostly thinking about the part of reusing the same name and IP address for the new node (e.g. is there any cleanup needed beyond evicting the node?).
    Enfo Zipper
    Christoffer Andersson – Principal Advisor
    http://blogs.chrisse.se - Directory Services Blog

    Hi,
    I agree with Noah Sparks: you can evict the corrupt node, and once you have reinstalled the new server you can simply join it to the cluster; there shouldn't be any error.
    If the evicted node hits the "The cluster node is already a member of the cluster" error when it tries to rejoin the cluster, it may need the KB2549472 hotfix.
    The related KB:
    How to Evict a Node from a Windows Server 2008 Failover Cluster
    http://technet.microsoft.com/en-us/library/bb676524(v=exchg.80).aspx
    Cluster node cannot rejoin the cluster after the node is restarted or removed from the cluster in Windows Server 2008 R2
    http://support.microsoft.com/kb/2549472
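    If you prefer to script the evict and rejoin, the same steps can be driven from PowerShell on one of the surviving 2008 R2 nodes (cmdlets from the FailoverClusters module; the node name below is a placeholder):
    Import-Module FailoverClusters
    Remove-ClusterNode -Name FailedNode -Force
    # after the replacement server is rebuilt with the same name and IP:
    Add-ClusterNode -Name FailedNode
    The evict removes the old node from the cluster configuration, so reusing the name and IP on the rebuilt machine should then work without extra cleanup beyond the usual AD and DNS checks.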
    Hope this helps.

  • Sap service fails after switching to the second node in HA system.

    Hello all,
    We have installed the HA MSCS cluster installation for the ECC 6.0 (mySAP ERP 2005) system. The ASCS+SCS instance and the DB instance are the failover resources.
    When the ASCS+SCS (SAP cluster group) fails over to the next node, the services sap_<sid>_00 (ASCS) and sap_<sid>_01 (SCS) fail to start automatically, and we have to manually start the instance from the services console by giving the SAPService<SID> password.
    When we try to start the service without giving the password, the system says it is unable to start due to a logon failure; after giving the password it shows that the user has been granted the right to log on as a service.
    Please help resolve this issue.
    thx
    satyajit

    A change to the zpool_import() management of the zpool.cache file, as delivered by Solaris 10 kernel patches 137137-09 (for SPARC) or 137138-09 (for x86), might cause systems that have their shared ZFS (zfs(1M)) storage pools under the control of HAStoragePlus to be simultaneously imported on multiple cluster nodes. Importing a ZFS storage pool on multiple cluster nodes will result in pool corruption, which might cause data integrity issues or cause a cluster node to panic.
    To avoid this problem, install Solaris 10 patch 139579-02 (for SPARC) or 139580-02 (for x86) immediately after you install 137137-09 or 137138-09 but before you reboot the cluster nodes.
    Alternatively, only on the Solaris 10 5/08 OS, remove the affected patch before any ZFS pools are simultaneously imported to multiple cluster nodes. You cannot remove patch 137137-09 or 137138-09 from the Solaris 10 10/08 OS, because these patches are preinstalled on that release.
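    To check whether the patches mentioned above are already present on a node before rebooting, the standard Solaris 10 patch listing can be used (the patch IDs are the ones from this note):
    # showrev -p | egrep '137137-09|137138-09|139579-02|139580-02'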

  • Simple two node Cluster Install - Hung after reboot of first node

    Hello,
    Over the past couple of days I have tried to install a simple two node cluster using two identical SunFire X4200s, firstly following the recipe in: http://www.sun.com/software/solaris/howtoguides/twonodecluster.jsp
    and when that failed referring to http://docs.sun.com/app/docs/doc/819-0912 and http://docs.sun.com/app/docs/doc/819-2970.
    I am trying to keep the install process as simple as possible, no switch, just back to back connections for the internal networking (node1 e1000g0 <--> node2 e1000g0, node1 e1000g1 <--> node2 e1000g1)
    I ran the installer on both X4200s with default answers. This went through smoothly without problems.
    I ran scinstall on node1, first time through, choosing "typical" as suggested in the how to guide. Everything goes OK (no errors) node2 reboots, but node1 just sits there waiting for node2, no errors, nothing....
    I also tried rerunning scinstall choosing "Custom", and then selecting the no switch option. Same thing happened.
    I must be doing something stupid, it's such a simple setup! Any ideas??
    Here's the final screen from node1 (dcmds0) in both cases:
    Cluster Creation
    Log file - /var/cluster/logs/install/scinstall.log.940
    Checking installation status ... done
    The Sun Cluster software is installed on "dcmds0".
    The Sun Cluster software is installed on "dcmds1".
    Started sccheck on "dcmds0".
    Started sccheck on "dcmds1".
    sccheck completed with no errors or warnings for "dcmds0".
    sccheck completed with no errors or warnings for "dcmds1".
    Configuring "dcmds1" ... done
    Rebooting "dcmds1" ...
    Output from scconf on node2 (dcmds1):
    bash-3.00# scconf -p
    Cluster name: dcmdscluster
    Cluster ID: 0x47538959
    Cluster install mode: enabled
    Cluster private net: 172.16.0.0
    Cluster private netmask: 255.255.248.0
    Cluster maximum nodes: 64
    Cluster maximum private networks: 10
    Cluster new node authentication: unix
    Cluster authorized-node list: dcmds0 dcmds1
    Cluster transport heart beat timeout: 10000
    Cluster transport heart beat quantum: 1000
    Round Robin Load Balancing UDP session timeout: 480
    Cluster nodes: dcmds1
    Cluster node name: dcmds1
    Node ID: 1
    Node enabled: yes
    Node private hostname: clusternode1-priv
    Node quorum vote count: 1
    Node reservation key: 0x4753895900000001
    Node zones: <NULL>
    CPU shares for global zone: 1
    Minimum CPU requested for global zone: 1
    Node transport adapters: e1000g0 e1000g1
    Node transport adapter: e1000g0
    Adapter enabled: no
    Adapter transport type: dlpi
    Adapter property: device_name=e1000g
    Adapter property: device_instance=0
    Adapter property: lazy_free=1
    Adapter property: dlpi_heartbeat_timeout=10000
    Adapter property: dlpi_heartbeat_quantum=1000
    Adapter property: nw_bandwidth=80
    Adapter property: bandwidth=70
    Adapter port names: <NULL>
    Node transport adapter: e1000g1
    Adapter enabled: no
    Adapter transport type: dlpi
    Adapter property: device_name=e1000g
    Adapter property: device_instance=1
    Adapter property: lazy_free=1
    Adapter property: dlpi_heartbeat_timeout=10000
    Adapter property: dlpi_heartbeat_quantum=1000
    Adapter property: nw_bandwidth=80
    Adapter property: bandwidth=70
    Adapter port names: <NULL>
    Cluster transport switches: <NULL>
    Cluster transport cables
    Endpoint Endpoint State
    Quorum devices: <NULL>
    Rob.

    I have found out why the install hung - this needs to be added into the install guide(s) at once!! - It's VERY frustrating when an install guide is incomplete!
    The solution is posted in the HA-Cluster OpenSolaris forums at:
    http://opensolaris.org/os/community/ha-clusters/ohac/Documentation/SCXdocs/relnotes/#bugs
    In particular, my problem was that I selected to make my Solaris install secure (A good idea, I thought!). Unfortunately, this stops Sun Cluster from working. To fix the problem you need to perform the following steps on each secured node:
    Problem Summary: During Solaris installation, the setting of a restricted network profile disables external access to network services that Sun Cluster functionality uses, i.e. the RPC communication service, which is required for cluster communication.
    Workaround: Restore external access to RPC communication.
    Perform the following commands to restore external access to RPC communication.
    # svccfg
    svc:> select network/rpc/bind
    svc:/network/rpc/bind> setprop config/local_only=false
    svc:/network/rpc/bind> quit
    # svcadm refresh network/rpc/bind:default
    # svcprop network/rpc/bind:default | grep local_only
    Once I applied these commands, the install process continued ... AT LAST!!!
    Rob.

  • Unit test fails after upgrading to Kodo 4.0.0 from 4.0.0-EA4

    I have a group of 6 unit tests failing after upgrading to the new Kodo
    4.0.0 (with BEA) from Kodo-4.0.0-EA4 (with Solarmetric). I'm getting
    exceptions like the one at the bottom of this email. It seems to be an
    interaction with the PostgreSQL driver, though I can't be sure. I
    haven't changed my JDO configuration or the related classes in months
    since I've been focusing on using the objects that have already been
    defined. The .jdo, .jdoquery, and .java code are below the exception,
    just in case there's something wrong in there. Does anyone have advice
    as to how I might debug this?
    Thanks,
    Mark
    Testsuite: edu.ucsc.whisper.test.integration.UserManagerQueryIntegrationTest
    Tests run: 15, Failures: 0, Errors: 6, Time elapsed: 23.308 sec
    Testcase: testGetAllUsersWithFirstName(edu.ucsc.whisper.test.integration.UserManagerQueryIntegrationTest): Caused an ERROR
    The column index is out of range: 2, number of columns: 1.
    <2|false|4.0.0> kodo.jdo.DataStoreException: The column index is out of range: 2, number of columns: 1.
    at kodo.jdbc.sql.DBDictionary.newStoreException(DBDictionary.java:4092)
    at kodo.jdbc.sql.SQLExceptions.getStore(SQLExceptions.java:82)
    at kodo.jdbc.sql.SQLExceptions.getStore(SQLExceptions.java:66)
    at kodo.jdbc.sql.SQLExceptions.getStore(SQLExceptions.java:46)
    at kodo.jdbc.kernel.SelectResultObjectProvider.handleCheckedException(SelectResultObjectProvider.java:176)
    at kodo.kernel.QueryImpl$PackingResultObjectProvider.handleCheckedException(QueryImpl.java:2460)
    at com.solarmetric.rop.EagerResultList.<init>(EagerResultList.java:32)
    at kodo.kernel.QueryImpl.toResult(QueryImpl.java:1445)
    at kodo.kernel.QueryImpl.execute(QueryImpl.java:1136)
    at kodo.kernel.QueryImpl.execute(QueryImpl.java:901)
    at kodo.kernel.QueryImpl.execute(QueryImpl.java:865)
    at kodo.kernel.DelegatingQuery.execute(DelegatingQuery.java:787)
    at kodo.jdo.QueryImpl.executeWithArray(QueryImpl.java:210)
    at kodo.jdo.QueryImpl.execute(QueryImpl.java:137)
    at edu.ucsc.whisper.core.dao.JdoUserDao.findAllUsersWithFirstName(JdoUserDao.java:232)
    at edu.ucsc.whisper.core.manager.DefaultUserManager.getAllUsersWithFirstName(DefaultUserManager.java:252)
    NestedThrowablesStackTrace:
    org.postgresql.util.PSQLException: The column index is out of range: 2, number of columns: 1.
    at org.postgresql.core.v3.SimpleParameterList.bind(SimpleParameterList.java:57)
    at org.postgresql.core.v3.SimpleParameterList.setLiteralParameter(SimpleParameterList.java:101)
    at org.postgresql.jdbc2.AbstractJdbc2Statement.bindLiteral(AbstractJdbc2Statement.java:2085)
    at org.postgresql.jdbc2.AbstractJdbc2Statement.setInt(AbstractJdbc2Statement.java:1133)
    at com.solarmetric.jdbc.DelegatingPreparedStatement.setInt(DelegatingPreparedStatement.java:390)
    at com.solarmetric.jdbc.PoolConnection$PoolPreparedStatement.setInt(PoolConnection.java:440)
    at com.solarmetric.jdbc.DelegatingPreparedStatement.setInt(DelegatingPreparedStatement.java:390)
    at com.solarmetric.jdbc.DelegatingPreparedStatement.setInt(DelegatingPreparedStatement.java:390)
    at com.solarmetric.jdbc.DelegatingPreparedStatement.setInt(DelegatingPreparedStatement.java:390)
    at com.solarmetric.jdbc.LoggingConnectionDecorator$LoggingConnection$LoggingPreparedStatement.setInt(LoggingConnectionDecorator.java:1257)
    at com.solarmetric.jdbc.DelegatingPreparedStatement.setInt(DelegatingPreparedStatement.java:390)
    at com.solarmetric.jdbc.DelegatingPreparedStatement.setInt(DelegatingPreparedStatement.java:390)
    at kodo.jdbc.sql.DBDictionary.setInt(DBDictionary.java:980)
    at kodo.jdbc.sql.DBDictionary.setUnknown(DBDictionary.java:1299)
    at kodo.jdbc.sql.SQLBuffer.setParameters(SQLBuffer.java:638)
    at kodo.jdbc.sql.SQLBuffer.prepareStatement(SQLBuffer.java:539)
    at kodo.jdbc.sql.SQLBuffer.prepareStatement(SQLBuffer.java:512)
    at kodo.jdbc.sql.SelectImpl.execute(SelectImpl.java:332)
    at kodo.jdbc.sql.SelectImpl.execute(SelectImpl.java:301)
    at kodo.jdbc.sql.Union$UnionSelect.execute(Union.java:642)
    at kodo.jdbc.sql.Union.execute(Union.java:326)
    at kodo.jdbc.sql.Union.execute(Union.java:313)
    at kodo.jdbc.kernel.SelectResultObjectProvider.open(SelectResultObjectProvider.java:98)
    at kodo.kernel.QueryImpl$PackingResultObjectProvider.open(QueryImpl.java:2405)
    at com.solarmetric.rop.EagerResultList.<init>(EagerResultList.java:22)
    at kodo.kernel.QueryImpl.toResult(QueryImpl.java:1445)
    at kodo.kernel.QueryImpl.execute(QueryImpl.java:1136)
    at kodo.kernel.QueryImpl.execute(QueryImpl.java:901)
    at kodo.kernel.QueryImpl.execute(QueryImpl.java:865)
    at kodo.kernel.DelegatingQuery.execute(DelegatingQuery.java:787)
    at kodo.jdo.QueryImpl.executeWithArray(QueryImpl.java:210)
    at kodo.jdo.QueryImpl.execute(QueryImpl.java:137)
    at edu.ucsc.whisper.core.dao.JdoUserDao.findAllUsersWithFirstName(JdoUserDao.java:232)
    --- DefaultUser.java -------------------------------------------------
    public class DefaultUser
    implements User {
        /** The account username. */
        private String username;
        /** The account password. */
        private String password;
        /** A flag indicating whether or not the account is enabled. */
        private boolean enabled;
        /** The authorities granted to this account. */
        private Set<Authority> authorities;
        /** Information about the user, including their name and text that describes them. */
        private UserInfo userInfo;
        /** The set of organizations where this user works. */
        private Set<Organization> organizations;
    }
    --- DefaultUser.jdo --------------------------------------------------
    <?xml version="1.0"?>
    <!DOCTYPE jdo PUBLIC
    "-//Sun Microsystems, Inc.//DTD Java Data Objects Metadata 2.0//EN"
    "http://java.sun.com/dtd/jdo_2_0.dtd">
    <jdo>
    <package name="edu.ucsc.whisper.core">
    <sequence name="user_id_seq"
    factory-class="native(Sequence=user_id_seq)"/>
    <class name="DefaultUser" detachable="true"
    table="whisper_user" identity-type="datastore">
    <datastore-identity sequence="user_id_seq" column="userId"/>
    <field name="username">
    <column name="username" length="80" jdbc-type="VARCHAR" />
    </field>
    <field name="password">
    <column name="password" length="40" jdbc-type="CHAR" />
    </field>
    <field name="enabled">
    <column name="enabled" />
    </field>
    <field name="userInfo" persistence-modifier="persistent"
    default-fetch-group="true" dependent="true">
    <extension vendor-name="jpox"
    key="implementation-classes"
    value="edu.ucsc.whisper.core.DefaultUserInfo" />
    <extension vendor-name="kodo"
    key="type"
    value="edu.ucsc.whisper.core.DefaultUserInfo" />
    </field>
    <field name="authorities" persistence-modifier="persistent"
    table="user_authorities"
    default-fetch-group="true">
    <collection
    element-type="edu.ucsc.whisper.core.DefaultAuthority" />
    <join column="userId" delete-action="cascade"/>
    <element column="authorityId" delete-action="cascade"/>
    </field>
    <field name="organizations" persistence-modifier="persistent"
    table="user_organizations" mapped-by="user"
    default-fetch-group="true" dependent="true">
    <collection
    element-type="edu.ucsc.whisper.core.DefaultOrganization"
    dependent-element="true"/>
    <join column="userId"/>
    <!--<element column="organizationId"/>-->
    </field>
    </class>
    </package>
    </jdo>
    --- DefaultUser.jdoquery ---------------------------------------------
    <?xml version="1.0"?>
    <!DOCTYPE jdo PUBLIC
    "-//Sun Microsystems, Inc.//DTD Java Data Objects Metadata 2.0//EN"
    "http://java.sun.com/dtd/jdo_2_0.dtd">
    <jdo>
    <package name="edu.ucsc.whisper.core">
    <class name="DefaultUser">
    <query name="UserByUsername"
    language="javax.jdo.query.JDOQL"><![CDATA[
    SELECT UNIQUE FROM edu.ucsc.whisper.core.DefaultUser
    WHERE username==searchName
    PARAMETERS java.lang.String searchName
    ]]></query>
    <query name="DisabledUsers"
    language="javax.jdo.query.JDOQL"><![CDATA[
    SELECT FROM edu.ucsc.whisper.core.DefaultUser WHERE
    enabled==false
    ]]></query>
    <query name="EnabledUsers"
    language="javax.jdo.query.JDOQL"><![CDATA[
    SELECT FROM edu.ucsc.whisper.core.DefaultUser WHERE
    enabled==true
    ]]></query>
    <query name="CountUsers"
    language="javax.jdo.query.JDOQL"><![CDATA[
    SELECT count( this ) FROM edu.ucsc.whisper.core.DefaultUser
    ]]></query>
    </class>
    </package>
    </jdo>

    I'm sorry, I have no idea. I suggest sending a test case that
    reproduces the problem to support.
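    One thing that sometimes helps narrow this kind of mismatch down is turning on SQL/JDBC tracing so you can see the exact statement and the parameters Kodo binds for the failing query. A hedged example for kodo.properties (channel names as used by Kodo 4; adjust the levels to taste):
    kodo.Log: DefaultLevel=WARN, Runtime=INFO, Tool=INFO, SQL=TRACE, JDBC=TRACE
    Comparing the traced SQL between 4.0.0-EA4 and 4.0.0 for the findAllUsersWithFirstName query should show whether the new version generates a statement with fewer parameter placeholders than it tries to bind.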

  • Hyper-V Guest Cluster Node Failing Regularly

    Hi,
    We currently have a 4-node Server 2012 R2 cluster which hosts, among other things, a 3-node guest cluster running a single clustered file service.
    Around once a week, the guest cluster node that is currently hosting the clustered file service will fail. It's as if the VM is blue screening. That in itself is fairly annoying, and I'll be doing all the updates and checking the event log for clues as to the cause.
    The problem then is that whichever physical cluster node is hosting the VM when it fails will not unlock some of the VM's files. The Virtual machine configuration lists as Online Pending. This means that the failed VM cannot be restarted on any other cluster node. The only fix is to drain the physical host it failed on, and reboot.
    Looking for suggestions on how to fix the following.
    1. Crashing guest file cluster node
    2. Failed VM with shared VHDX requiring physical host reboot.
    Event messages for the physical host that was hosting the failed vm in order that they occured.
    Hyper-V-Worker: Event ID 18590 - 'FS-03' has encountered a fatal error.  The guest operating system reported that it failed with the following error codes: ErrorCode0: 0x9E, ErrorCode1: 0x6C2A17C0, ErrorCode2: 0x3C, ErrorCode3: 0xA, ErrorCode4:
    0x0.  If the problem persists, contact Product Support for the guest operating system.  (Virtual machine ID 36166B47-D003-4E51-AFB5-7B967A3EFD2D)
    FailoverClustering: Event ID 1069 - Cluster resource 'Virtual Machine FS-03' of type 'Virtual Machine' in clustered role 'FS-03' failed.
    Hyper-V-High-Availability: Event ID 21128 - 'Virtual Machine FS-03' failed to shutdown the virtual machine during the resource termination. The virtual machine will be forcefully stopped.
    Hyper-V-High-Availability: Event ID 21110 - 'Virtual Machine FS-03' failed to terminate.
    Hyper-V-VMMS: Event ID 20108 - The Virtual Machine Management Service failed to start the virtual machine '36166B47-D003-4E51-AFB5-7B967A3EFD2D': The group or resource is not in the correct state to perform the requested operation. (0x8007139F).
    Hyper-V-High-Availability: Event ID 21107 - 'Virtual Machine FS-03' failed to start.
    FailoverClustering: Event ID 1205 - The Cluster service failed to bring clustered role 'FS-03' completely online or offline. One or more resources may be in a failed state. This may impact the availability of the clustered role.

    Hi,
    I haven't found a similar issue. Does your cluster pass cluster validation? Are all your Hyper-V hosts compatible with Server 2012 R2? Have you tried disabling all your AV software and firewall? Please rerun Storage validation on the cluster in non-production hours; the cluster validation report will quickly locate the issue.
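    If you want to run that validation from a script during a quiet window, a hedged PowerShell example on one of the 2012 R2 hosts (host names are placeholders):
    Test-Cluster -Node Host1,Host2,Host3,Host4 -Include "Storage","Inventory","Network","System Configuration"
    The generated report will call out shared-VHDX and storage issues, which can line up with the 0x9E (USER_MODE_HEALTH_MONITOR) bugcheck reported by the guest.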
    More information:
    Cluster
    http://technet.microsoft.com/en-us/library/dd581778(v=ws.10).aspx
    Hope this helps.

  • Node failed to join the cluster because it could not send and receive failure detection network messages

    One of my customers has a Windows Server 2008 R2 cluster for an Exchange 2010 Mailbox Database Availability Group.  Lately, they've been having problems with one of their nodes (the one node that is on a different subnet in a different datacenter) where
    their Exchange databases aren't replicating.  While looking into this issue it seems that the problem is the Network Manager isn't started because the cluster service is failing.  Since the issue seems to be with the cluster service, and not Exchange,
    I'm asking here. 
    When the cluster service starts, it appears to start working, but within a few minutes the following is logged in the system event log.
    FailoverClustering
    1572
    Critical
    Cluster Virtual Adapter
    Node 'nodename' failed to join the cluster because it could not send and receive failure detection network messages with other cluster nodes. ...
    It seems that the problem is with the 169.254 address on the cluster virtual adapter.  An entry in the cluster.log file says: Aborting connection because NetFT route to node nodename on virtual IP 169.254.1.44:~3343~ has failed to come up. 
    In my experience, you never have to mess with the cluster virtual adapter.  I'm not sure what happened here, but I doubt it has been modified.  I need the cluster to communicate with its other nodes on our routed 10. network.  I've never experienced
    this before and found little in my searches on the subject.  Any idea how I can fix this?
    Thanks,
    Joe
    Joseph M. Durnal MCM: Exchange 2010 MCITP: Enterprise Messaging Administrator, Exchange 2010 MCITP: Enterprise Messaging Administrator, MCITP: Enterprise Administrator

    Hi,
    I suspect an issue with communication on UDP port 3343. Please confirm that the firewall rules for port 3343 are set on all the nodes and that connections are allowed for all firewall profiles on all the nodes, or otherwise confirm the connectivity of every node.
    Use ipconfig /flushdns to refresh the DNS registration on every node, then confirm that the entry for each node in your DNS server is correct.
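    A quick way to check the rule on each node from an elevated prompt (standard Windows firewall commands; the rule name below is only an example if you need to add one):
    netsh advfirewall firewall show rule name=all verbose | findstr /i 3343
    netsh advfirewall firewall add rule name="Failover Cluster (UDP-In 3343)" dir=in action=allow protocol=UDP localport=3343
    Since the failing node is on a different subnet, it is also worth confirming that UDP 3343 is allowed across the routers and firewalls between the two datacenters, not just on the hosts.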
    The similar issue article:
    Exchange 2010 DAG - NetworkManager has not yet been initialized
    https://blogs.technet.com/b/dblanch/archive/2012/03/05/exchange-2010-dag-networkmanager-has-not-yet-been-initialized.aspx?Redirected=true
    Hope this helps.

  • [svn:bz-trunk] 7681: Fix bunch of failing config tests on BlazeDS/ trunk by removing javax.servlet. UnavailableException from the expected error string.

    Revision: 7681
    Author:   [email protected]
    Date:     2009-06-09 11:44:36 -0700 (Tue, 09 Jun 2009)
    Log Message:
    Fix bunch of failing config tests on BlazeDS/trunk by removing javax.servlet.UnavailableException from the expected error string.
    Modified Paths:
        blazeds/trunk/qa/apps/qa-regress/testsuites/config/tests/destination/IncorrectRootElementTest/error.txt
        blazeds/trunk/qa/apps/qa-regress/testsuites/config/tests/messagingService/AdaptiveServerToClient/AdaptiveFrequencyTest/error.txt
        blazeds/trunk/qa/apps/qa-regress/testsuites/config/tests/messagingService/AdaptiveServerToClient/FrequencyStepSizeTest/error.txt
        blazeds/trunk/qa/apps/qa-regress/testsuites/config/tests/messagingService/AdaptiveServerToClient/MaxQueueSizeTest/error.txt
        blazeds/trunk/qa/apps/qa-regress/testsuites/config/tests/messagingService/DestinationWithNoChannelTest/error.txt
        blazeds/trunk/qa/apps/qa-regress/testsuites/config/tests/messagingService/DestinationWithNoIDTest/error.txt
        blazeds/trunk/qa/apps/qa-regress/testsuites/config/tests/messagingService/jms/InvalidAcknowledgeModeTest/error.txt
        blazeds/trunk/qa/apps/qa-regress/testsuites/config/tests/messagingService/jms/InvalidDeliveryModeTest/error.txt
        blazeds/trunk/qa/apps/qa-regress/testsuites/config/tests/messagingService/jms/InvalidDestinationTypeTest/error.txt
        blazeds/trunk/qa/apps/qa-regress/testsuites/config/tests/messagingService/jms/InvalidMessageTypeTest/error.txt
        blazeds/trunk/qa/apps/qa-regress/testsuites/config/tests/messagingService/jms/NoConnectionFactoryTest/error.txt
        blazeds/trunk/qa/apps/qa-regress/testsuites/config/tests/messagingService/jms/NoJNDINameTest/error.txt
        blazeds/trunk/qa/apps/qa-regress/testsuites/config/tests/messagingService/throttle/throttleInbound/InvalidBufferPolicyTest/error.txt
        blazeds/trunk/qa/apps/qa-regress/testsuites/config/tests/messagingService/throttle/throttleInbound/InvalidConflatePolicyTest/error.txt
        blazeds/trunk/qa/apps/qa-regress/testsuites/config/tests/messagingService/throttle/throttleInbound/UnknownInboundPolicyTest/error.txt
        blazeds/trunk/qa/apps/qa-regress/testsuites/config/tests/messagingService/throttle/throttleInbound/frequencies/McfGreaterthanMfTest/error.txt
        blazeds/trunk/qa/apps/qa-regress/testsuites/config/tests/messagingService/throttle/throttleOutbound/InvalidBufferPolicyTest/error.txt
        blazeds/trunk/qa/apps/qa-regress/testsuites/config/tests/messagingService/throttle/throttleOutbound/InvalidConflatePolicyTest/error.txt
        blazeds/trunk/qa/apps/qa-regress/testsuites/config/tests/messagingService/throttle/throttleOutbound/InvalidErrorPolicyTest/error.txt
        blazeds/trunk/qa/apps/qa-regress/testsuites/config/tests/messagingService/throttle/throttleOutbound/UnknownOutboundPolicyTest/error.txt
        blazeds/trunk/qa/apps/qa-regress/testsuites/config/tests/messagingService/throttle/throttleOutbound/frequencies/McfGreaterthanMfTest/error.txt
        blazeds/trunk/qa/apps/qa-regress/testsuites/config/tests/messagingService/validation/nonExistingValidatorTest/error.txt
        blazeds/trunk/qa/apps/qa-regress/testsuites/config/tests/messagingService/validation/sameExplicitTypeValidatorTest/error.txt
        blazeds/trunk/qa/apps/qa-regress/testsuites/config/tests/messagingService/validation/sameTypeValidatorTest/error.txt
        blazeds/trunk/qa/apps/qa-regress/testsuites/config/tests/messagingService/validation/wrongTypeValidatorTest/error.txt

  • Cluster installation failed:OCR files are not shared across all nodes

    Hi gurus,
    I'm trying to install RAC 10g using VMware, following Hunter's document on installing Oracle 10g RAC.
    I'm using CentOS 4.6. I'm now stuck on installing the clusterware, which shows the error "OCR files are not shared across all nodes".
    Please advise.

    Review the release notes; there is a bug and a fix, since CRS fails on the last node.
    The bug is filed for EL 5, but it pretty much matches what I've found in the log files.
    I'm also on EL 4.6 (x64 in my case) and haven't tried the fix due to lack of time.
    Provide the log files from CRS_HOME/logs.

  • Help : Cluster Fail over Test - Could not establish a connection

    Hi All
              I'm trying to do a cluster failover test with two WebLogic 8.1 SP2 instances in a cluster.
              During that testing, I restart the instance that is handling my request, to make sure the session is replicated smoothly to the other instance so that I can continue accessing my application without any interruption. But when I restart the instance, I'm getting the following exception
              Error 500--Internal Server Error
              java.rmi.ConnectException: Could not establish a connection with 8909815174098071019S:dappsn03:[8201,8201,-1,-1,8201,-1,-1,0,0]:dappsn03-04:TNL:tnl1_81dappsn03, java.rmi.ConnectException: Destination unreachable; nested exception is:
                   java.net.ConnectException: Connection refused; No available router to destination
                   at weblogic.rjvm.RJVMImpl.getOutputStream(RJVMImpl.java:316)
                   at weblogic.rjvm.RJVMImpl.getRequestStream(RJVMImpl.java:488)
                   at weblogic.rjvm.RJVMImpl.getOutboundRequest(RJVMImpl.java:584)
                   at weblogic.rmi.internal.BasicRemoteRef.getOutboundRequest(BasicRemoteRef.java:91)
                   at weblogic.rmi.internal.activation.ActivatableRemoteRef.invoke(ActivatableRemoteRef.java:69)
                   at com.sns.pfk.ejb.PfkSessionBean_mz6mqm_EOImpl_812_WLStub.getPortalRecord(Unknown Source)
                   at com.sns.pfk.servlet.PfkMainServlet.getInfofromSB(Unknown Source)
                   at com.sns.pfk.servlet.PfkMainServlet.doActionDisplay(Unknown Source)
                   at com.sns.pfk.servlet.PfkMainServlet.doGet(Unknown Source)
                   at com.sns.pfk.servlet.PfkMainServlet.doPost(Unknown Source)
                   at javax.servlet.http.HttpServlet.service(HttpServlet.java:760)
                   at javax.servlet.http.HttpServlet.service(HttpServlet.java:853)
                   at weblogic.servlet.internal.ServletStubImpl$ServletInvocationAction.run(ServletStubImpl.java:971)
                   at weblogic.servlet.internal.ServletStubImpl.invokeServlet(ServletStubImpl.java:402)
                   at weblogic.servlet.internal.ServletStubImpl.invokeServlet(ServletStubImpl.java:305)
                   at weblogic.servlet.internal.RequestDispatcherImpl.include(RequestDispatcherImpl.java:607)
                   at weblogic.servlet.internal.RequestDispatcherImpl.include(RequestDispatcherImpl.java:400)
                   at com.sns.ana.ui.servlet.AuthorisationBaseServlet.service(AuthorisationBaseServlet.java:109)
                   at javax.servlet.http.HttpServlet.service(HttpServlet.java:853)
                   at weblogic.servlet.internal.ServletStubImpl$ServletInvocationAction.run(ServletStubImpl.java:971)
                   at weblogic.servlet.internal.ServletStubImpl.invokeServlet(ServletStubImpl.java:402)
                   at weblogic.servlet.internal.ServletStubImpl.invokeServlet(ServletStubImpl.java:305)
                   at weblogic.servlet.internal.WebAppServletContext$ServletInvocationAction.run(WebAppServletContext.java:6350)
                   at weblogic.security.acl.internal.AuthenticatedSubject.doAs(AuthenticatedSubject.java:317)
                   at weblogic.security.service.SecurityManager.runAs(SecurityManager.java:118)
                   at weblogic.servlet.internal.WebAppServletContext.invokeServlet(WebAppServletContext.java:3635)
                   at weblogic.servlet.internal.ServletRequestImpl.execute(ServletRequestImpl.java:2585)
                   at weblogic.kernel.ExecuteThread.execute(ExecuteThread.java:197)
                   at weblogic.kernel.ExecuteThread.run(ExecuteThread.java:170)
              Buddies, has anyone hit this issue before? Please shed some light to help escape this hiccup.
              With Regs
              -SHAN

    Hi,
              Thanks to everyone who spent time reading this thread. This problem was due to some missing entries in weblogic-ejb.xml. It got fixed once we got support from BEA.
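              For reference, in case others hit the same thing: the exact missing entries weren't posted, but in-memory replication for a stateful session bean in WebLogic 8.1 is normally enabled with descriptor entries along these lines in weblogic-ejb.xml (the bean name is taken from the stack trace and is only an example):
              <weblogic-enterprise-bean>
                <ejb-name>PfkSessionBean</ejb-name>
                <stateful-session-descriptor>
                  <stateful-session-clustering>
                    <home-is-clusterable>true</home-is-clusterable>
                    <replication-type>InMemory</replication-type>
                  </stateful-session-clustering>
                </stateful-session-descriptor>
              </weblogic-enterprise-bean>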
              With Regs
              -SHAN

  • Cluster node reboots after network failure

    hi all,
    The Sun Cluster 3.1 8/05 installation with 2 nodes (E2900) was working fine without any errors in sccheck.
    Yesterday one node rebooted due to a network failure; the errors in the messages file are
    Jan 17 08:00:36 PRD in.mpathd[221]: [ID 594170 daemon.error] NIC failure detected on ce0 of group sc_ipmp0
    Jan 17 08:00:36 PRD Cluster.PNM: [ID 890413 daemon.notice] sc_ipmp0: state transition from OK to DOWN.
    Jan 17 08:00:47 PRD Cluster.RGM.rgmd: [ID 784560 daemon.notice] resource PROD status on node PRD change to R_FM_DEGRADED
    Jan 17 08:00:47 PRD Cluster.RGM.rgmd: [ID 922363 daemon.notice] resource PROD status msg on node PRD change to <IPMP Failure.>
    Jan 17 08:00:50 PRD Cluster.RGM.rgmd: [ID 529407 daemon.notice] resource group CFS state on node PRD change to RG_PENDING_OFFLINE
    Jan 17 08:00:50 PRD Cluster.RGM.rgmd: [ID 443746 daemon.notice] resource PROD state on node PRD change to R_MON_STOPPING
    Jan 17 08:00:50 PRD Cluster.RGM.rgmd: [ID 707948 daemon.notice] launching method <hafoip_monitor_stop> for resource <PROD>, resource group <CFS>, timeout <300> seconds
    Jan 17 08:00:50 PRD Cluster.RGM.rgmd: [ID 736390 daemon.notice] method <hafoip_monitor_stop> completed successfully for resource <PROD>, resource group <CFS>, time used: 0% of timeout <300 seconds>
    Jan 17 08:00:50 PRD Cluster.RGM.rgmd: [ID 443746 daemon.notice] resource PROD state on node PRD change to R_ONLINE_UNMON
    Jan 17 08:00:50 PRD Cluster.RGM.rgmd: [ID 443746 daemon.notice] resource PROD state on node PRD change to R_STOPPING
    Jan 17 08:00:50 PRD Cluster.RGM.rgmd: [ID 707948 daemon.notice] launching method <hafoip_stop> for resource <PROD>, resource group <CFS>, timeout <300> seconds
    Jan 17 08:00:50 PRD Cluster.RGM.rgmd: [ID 784560 daemon.notice] resource PROD status on node PRD change to R_FM_UNKNOWN
    Jan 17 08:00:50 PRD Cluster.RGM.rgmd: [ID 922363 daemon.notice] resource PROD status msg on node PRD change to <Stopping>
    Jan 17 08:00:51 PRD ip: [ID 678092 kern.notice] TCP_IOC_ABORT_CONN: local = 172.016.005.025:0, remote = 000.000.000.000:0, start = -2, end = 6
    Jan 17 08:00:51 PRD ip: [ID 302654 kern.notice] TCP_IOC_ABORT_CONN: aborted 53 connections
    What can be the reason for rebooting?
    Is there any way to avoid this, with only a failover?
    rgds
    Message was edited by:
    suj

    What is in that resource group? The cause is probably something with Failover_mode=HARD set. Check the manual reference section for this. The option would be to set the Failover_mode=SOFT.
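    A hedged way to check and change that from the Sun Cluster 3.1 command line (the resource name PROD is taken from the messages above; verify the current setting before changing it):
    # scrgadm -pvv | grep -i failover_mode
    # scrgadm -c -j PROD -y Failover_mode=SOFT
    With Failover_mode=SOFT, a failed STOP of the resource should result in a failover of the resource group rather than a reboot of the node.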
    Tim
    ---

  • Cluster network name resource 'Cluster Name' failed registration of one or more associated DNS name(s) for the following reason: The handle is invalid.

    I'm stuck here trying to figure this error out.  
    2003 domain, 2012 hyper v core 3 nodes.  (I have two of these hyper V groups, hvclust2012 is the problem group, hvclust2008 is okay)
    In Failover Cluster Manager I see these errors, "Cluster network name resource 'Cluster Name' failed registration of one or more associated DNS name(s) for the following reason:  The handle is invalid."
    I restarted the host node that was listed as having the error, then another node started showing the errors.
    I tried to follow this site:  http://blog.subvertallmedia.com/2012/12/06/repairing-a-failover-cluster-in-windows-server-2012-live-migration-fails-dns-cluster-name-errors/
    Then this error shows up when doing the repair:  there was an error repairing the active directory object for 'Cluster Name'
    I looked at our domain controller and noticed I don't have access to local users and groups.  I can access our other hvclust2008 (both clusters are same version 2012).
    <image here>
    I came upon this thread:  http://social.technet.microsoft.com/Forums/en-US/85fc2ad5-b0c0-41f0-900e-df1db8625445/windows-2012-cluster-resource-name-fails-dns-registration-evt-1196?forum=winserverClustering
    Now, I'm stuck on adding a managed service account (MSA). I'm not sure if I'm way off track trying to fix this. Any advice? Thanks in advance!
    <image here>

    Thanks Elton,
    I restarted 3 hosts after applying the hotfix.  Then I did the steps below and got stuck on step 5.  That is when I get the error (image above).  There
    was an error repairing the active directory object for 'Cluster Name'.  For more data, see 'Information Details'.
    To reset the password on the affected name resource, perform the following steps:
    From Failover Cluster Manager, locate the name resource.
    Right-click on the resource, and click Properties.
    On the Policies tab, select If resource fails, do not restart, and then click OK.
    Right-click on the resource, click More Actions, and then click Simulate Failure.
    When the name resource shows "Failed," right-click on the resource, click More Actions, and then click Repair.
    After the name resource is online, right-click on the resource, and then click Properties.
    On the Policies tab, select If resource fails, attempt restart on current node, and then click OK.
    Thanks
