Automatic reboots in cluster nodes
Hi all,
I have installed Sun Cluster 3.3 on an Intel x86 machine in VMware. I have 2 nodes.
Both the nodes reboot automatically or hang after some time.
Can you please tell me the likely cause and how to troubleshoot it?
The memory assigned to each node in the VM is 1300 MB.
First, I should point out that this is not an officially supported configuration, which means there may be any number of issues with it. Having said that, I know that some people have made use of similar configurations.
To get a root cause, you need to look at the message logs (/var/adm/messages) for both nodes. See if there is anything to do with either loss of quorum or heartbeat tick timeouts. Both of those can lead to node panics. Once you have that information, it will be easier to search for a potential resolution.
Tim
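As a rough illustration of the log check suggested above, here is a minimal Python sketch that pulls out lines mentioning quorum, heartbeat or panic from a messages file. The keyword list is an assumption: the exact wording of the cluster messages varies by release, so adjust the patterns to what your own logs contain. The function names are made up for this example.

```python
import re

# Assumed keywords only; real Sun Cluster message text differs by release.
PATTERNS = [re.compile(word, re.IGNORECASE)
            for word in ("quorum", "heartbeat", "panic")]


def suspicious_lines(log_text):
    """Return (line_number, line) pairs that match any keyword pattern."""
    hits = []
    for number, line in enumerate(log_text.splitlines(), start=1):
        if any(pattern.search(line) for pattern in PATTERNS):
            hits.append((number, line))
    return hits


def scan_file(path="/var/adm/messages"):
    """Read a log file and return its suspicious lines."""
    with open(path, errors="replace") as handle:
        return suspicious_lines(handle.read())
```

Running scan_file() on each node and comparing the timestamps of the hits with the reboot times should narrow down which of the two causes, if either, applies.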
---
Similar Messages
-
Node-2 automatically reboots while installing DB Software after cluster Ins
I'm trying to configure RAC on my laptop on OEL-5 with 11g.
When I try to install the database software, my second node automatically reboots after 75% of the installation.
Help
Thanks
Bala
Refer to:
Time difference between the RAC nodes is out of sync
http://www.oracleracexpert.com/2009/12/time-difference-between-rac-nodes-is.html
Hope this helps,
Regards,
http://www.oracleracexpert.com
Remove Grid control agents or targets from repository
http://www.oracleracexpert.com/2010/06/remove-grid-control-agents-or-targets.html
Modify VIP Hostname in Oracle Cluster
http://www.oracleracexpert.com/2010/06/modifying-vip-address-or-vip-hostname.html
Edited by: Satishbabu Gunukula on Jul 7, 2010 11:15 AM -
DB Instance fails to automatically start on cluster reboot
Hello -
I cannot get the instance on node1 to automatically start on a reboot. It starts manually, and it starts automatically when the other node is rebooted. I also unregistered it, and re-registered the instance.
Here are the errors from $ORA_CRS_HOME/log/ctolinuxpoc01/crsd/crsd.log:
2009-02-03 15:52:08.496: [ CRSAPP][1494882624] StartResource error for ora.ractest.ractest1.inst error code = 1
2009-02-03 15:52:10.098: [ CRSRES][1494882624] Start of `ora.ractest.ractest1.inst` on member `ctolinuxpoc01` failed.
I do see these errors in the DB instance log:
<txt>ASMB (ospid: 4422): terminating the instance due to error 15064
</txt>
</msg>
<msg time='2009-02-03T15:43:25.399-05:00' org_id='oracle' comp_id='rdbms'
client_id='' type='UNKNOWN' level='16'
module='' pid='4149'>
<txt>Errors in file /opt/app/oracle/product/11.1.0/diag/rdbms/ractest/ractest1/trace/ractest1_diag_4149.trc:
ORA-27508: IPC error sending a message
ORA-27300: OS system dependent operation:sendmsg failed with status: 22
ORA-27301: OS failure message: Invalid argument
ORA-27302: failure occurred at: sskgxpsnd1
</txt>
</msg>
<msg time='2009-02-03T15:43:25.399-05:00' org_id='oracle' comp_id='rdbms'
client_id='' type='UNKNOWN' level='16'
module='' pid='4149'>
<txt>System state dump is made for local instance
</txt>
</msg>
<msg time='2009-02-03T15:43:25.399-05:00' org_id='oracle' comp_id='rdbms'
client_id='' type='UNKNOWN' level='16'
module='' pid='4149'>
<txt>System State dumped to trace file /opt/app/oracle/product/11.1.0/diag/rdbms/ractest/ractest1/trace/ractest1_diag_4149.trc
</txt>
</msg>
<msg time='2009-02-03T15:43:26.589-05:00' org_id='oracle' comp_id='rdbms'
client_id='' type='UNKNOWN' level='16'
module='' pid='4149'>
<txt>Trace dumping is performing id=[cdmp_20090203154325]
</txt>
</msg>
<msg time='2009-02-03T15:43:26.727-05:00' org_id='oracle' comp_id='rdbms'
client_id='' type='UNKNOWN' level='16'
module='' pid='4422'>
<txt>Instance terminated by ASMB, pid = 4422
</txt>
</msg>
The ASM instance was automatically started, and I am able to manually start the instance without any problems.
Any help is appreciated!
Thanks,
ASMB (ospid: 4422): terminating the instance due to error 15064
oerr ORA 15064
15064, 00000, "communication failure with ASM instance"
// *Cause: There was a failure to communicate with the ASM instance, most
// likely because the connection went down.
// *Action: Check the accompanying error messages for more information on the
// reason for the failure. Note that database instances will always
// return this error when the ASM instance is terminated abnormally.
ORA-27508: IPC error sending a message
ORA-27300: OS system dependent operation:sendmsg failed with status: 22
ORA-27301: OS failure message: Invalid argument
ORA-27302: failure occurred at: sskgxpsnd1
What is the output from the following command?
/sbin/sysctl -a|grep net.core|egrep 'wmem|rmem'
-
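For the net.core check requested in the DB instance thread above, a small Python sketch like the following could compare the sysctl output against whatever minimums your Oracle installation guide lists. It assumes the usual "key = value" layout of sysctl output; the function names are made up for this example, and the minimums must be supplied by the caller because none are given in the thread.

```python
def parse_socket_buffers(sysctl_output):
    """Collect net.core wmem/rmem settings from `sysctl -a` style text."""
    values = {}
    for line in sysctl_output.splitlines():
        key, separator, value = line.partition("=")
        if not separator:
            continue  # not a "key = value" line
        key = key.strip()
        if key.startswith("net.core.") and ("wmem" in key or "rmem" in key):
            values[key] = int(value.strip())
    return values


def below_minimum(values, minimums):
    """Return {key: (current, required)} for settings under the minimum.

    `minimums` is caller-supplied; take the numbers from the install
    guide for your release rather than from this sketch.
    """
    return {key: (values.get(key), required)
            for key, required in minimums.items()
            if values.get(key, 0) < required}
```

A setting that is missing from the output is reported with a current value of None.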
After reboot cluster node went into maintenance mode (CONTROL-D)
Hi there!
I have configured a 2-node cluster on 2 x Sun Enterprise 220R and a StorEdge D1000.
Each time I reboot any of the cluster nodes, I get the following error during boot:
The / file system (/dev/rdsk/c0t1d0s0) is being checked.
/dev/rdsk/c0t1d0s0: UNREF DIR I=35540 OWNER=root MODE=40755
/dev/rdsk/c0t1d0s0: SIZE=512 MTIME=Jun 5 15:02 2006 (CLEARED)
/dev/rdsk/c0t1d0s0: UNREF FILE I=1192311 OWNER=root MODE=100600
/dev/rdsk/c0t1d0s0: SIZE=96 MTIME=Jun 5 13:23 2006 (RECONNECTED)
/dev/rdsk/c0t1d0s0: LINK COUNT FILE I=1192311 OWNER=root MODE=100600
/dev/rdsk/c0t1d0s0: SIZE=96 MTIME=Jun 5 13:23 2006 COUNT 0 SHOULD BE 1
/dev/rdsk/c0t1d0s0: LINK COUNT INCREASING
/dev/rdsk/c0t1d0s0: UNEXPECTED INCONSISTENCY; RUN fsck MANUALLY.
In maintenance mode I run:
# fsck -y -F ufs /dev/rdsk/c0t1d0s0
and it manages to correct the problem ... but the problem occurs again after each reboot on each cluster node!
I have installed Sun Cluster 3.1 on Solaris 9 SPARC.
How can I get rid of it?
Any ideas?
Brgds,
Sergej
Hi, I get this:
112941-09 SunOS 5.9: sysidnet Utility Patch
116755-01 SunOS 5.9: usr/snadm/lib/libadmutil.so.2 Patch
113434-30 SunOS 5.9: /usr/snadm/lib Library and Differential Flash Patch
112951-13 SunOS 5.9: patchadd and patchrm Patch
114711-03 SunOS 5.9: usr/sadm/lib/diskmgr/VDiskMgr.jar Patch
118064-04 SunOS 5.9: Admin Install Project Manager Client Patch
113742-01 SunOS 5.9: smcpreconfig.sh Patch
113813-02 SunOS 5.9: Gnome Integration Patch
114501-01 SunOS 5.9: drmproviders.jar Patch
112943-09 SunOS 5.9: Volume Management Patch
113799-01 SunOS 5.9: solregis Patch
115697-02 SunOS 5.9: mtmalloc lib Patch
113029-06 SunOS 5.9: libaio.so.1 librt.so.1 and abi_libaio.so.1 Patch
113981-04 SunOS 5.9: devfsadm Patch
116478-01 SunOS 5.9: usr platform links Patch
112960-37 SunOS 5.9: patch libsldap ldap_cachemgr libldap
113332-07 SunOS 5.9: libc_psr.so.1 Patch
116500-01 SunOS 5.9: SVM auto-take disksets Patch
114349-04 SunOS 5.9: sbin/dhcpagent Patch
120441-03 SunOS 5.9: libsec patch
114344-19 SunOS 5.9: kernel/drv/arp Patch
114373-01 SunOS 5.9: UMEM - abi_libumem.so.1 patch
118558-27 SunOS 5.9: Kernel Patch
115675-01 SunOS 5.9: /usr/lib/liblgrp.so Patch
112958-04 SunOS 5.9: patch pci.so
113451-11 SunOS 5.9: IKE Patch
112920-02 SunOS 5.9: libipp Patch
114372-01 SunOS 5.9: UMEM - llib-lumem patch
116229-01 SunOS 5.9: libgen Patch
116178-01 SunOS 5.9: libcrypt Patch
117453-01 SunOS 5.9: libwrap Patch
114131-03 SunOS 5.9: multi-terabyte disk support - libadm.so.1 patch
118465-02 SunOS 5.9: rcm_daemon Patch
113490-04 SunOS 5.9: Audio Device Driver Patch
114926-02 SunOS 5.9: kernel/drv/audiocs Patch
113318-25 SunOS 5.9: patch /kernel/fs/nfs and /kernel/fs/sparcv9/nfs
113070-01 SunOS 5.9: ftp patch
114734-01 SunOS 5.9: /usr/ccs/bin/lorder Patch
114227-01 SunOS 5.9: yacc Patch
116546-07 SunOS 5.9: CDRW DVD-RW DVD+RW Patch
119494-01 SunOS 5.9: mkisofs patch
113471-09 SunOS 5.9: truss Patch
114718-05 SunOS 5.9: usr/kernel/fs/pcfs Patch
115545-01 SunOS 5.9: nss_files patch
115544-02 SunOS 5.9: nss_compat patch
118463-01 SunOS 5.9: du Patch
116016-03 SunOS 5.9: /usr/sbin/logadm patch
115542-02 SunOS 5.9: nss_user patch
116014-06 SunOS 5.9: /usr/sbin/usermod patch
116012-02 SunOS 5.9: ps utility patch
117433-02 SunOS 5.9: FSS FX RT Patch
117431-01 SunOS 5.9: nss_nis Patch
115537-01 SunOS 5.9: /kernel/strmod/ptem patch
115336-03 SunOS 5.9: /usr/bin/tar, /usr/sbin/static/tar Patch
117426-03 SunOS 5.9: ctsmc and sc_nct driver patch
121319-01 SunOS 5.9: devfsadmd_mod.so Patch
121316-01 SunOS 5.9: /kernel/sys/doorfs Patch
121314-01 SunOS 5.9: tl driver patch
116554-01 SunOS 5.9: semsys Patch
112968-01 SunOS 5.9: patch /usr/bin/renice
116552-01 SunOS 5.9: su Patch
120445-01 SunOS 5.9: Toshiba platform token links (TSBW,Ultra-3i)
112964-15 SunOS 5.9: /usr/bin/ksh Patch
112839-08 SunOS 5.9: patch libthread.so.1
115687-02 SunOS 5.9:/var/sadm/install/admin/default Patch
115685-01 SunOS 5.9: sbin/netstrategy Patch
115488-01 SunOS 5.9: patch /kernel/misc/busra
115681-01 SunOS 5.9: usr/lib/fm/libdiagcode.so.1 Patch
113032-03 SunOS 5.9: /usr/sbin/init Patch
113031-03 SunOS 5.9: /usr/bin/edit Patch
114259-02 SunOS 5.9: usr/sbin/psrinfo Patch
115878-01 SunOS 5.9: /usr/bin/logger Patch
116543-04 SunOS 5.9: vmstat Patch
113580-01 SunOS 5.9: mount Patch
115671-01 SunOS 5.9: mntinfo Patch
113977-01 SunOS 5.9: awk/sed pkgscripts Patch
122716-01 SunOS 5.9: kernel/fs/lofs patch
113973-01 SunOS 5.9: adb Patch
122713-01 SunOS 5.9: expr patch
117168-02 SunOS 5.9: mpstat Patch
116498-02 SunOS 5.9: bufmod Patch
113576-01 SunOS 5.9: /usr/bin/dd Patch
116495-03 SunOS 5.9: specfs Patch
117160-01 SunOS 5.9: /kernel/misc/krtld patch
118586-01 SunOS 5.9: cp/mv/ln Patch
120025-01 SunOS 5.9: ipsecconf Patch
116527-02 SunOS 5.9: timod Patch
117155-08 SunOS 5.9: pcipsy Patch
114235-01 SunOS 5.9: libsendfile.so.1 Patch
117152-01 SunOS 5.9: magic Patch
116486-03 SunOS 5.9: tsalarm Driver Patch
121998-01 SunOS 5.9: two-key mode fix for 3DES Patch
116484-01 SunOS 5.9: consconfig Patch
116482-02 SunOS 5.9: modload Utils Patch
117746-04 SunOS 5.9: patch platform/sun4u/kernel/drv/sparcv9/pic16f819
121992-01 SunOS 5.9: fgrep Patch
120768-01 SunOS 5.9: grpck patch
119438-01 SunOS 5.9: usr/bin/login Patch
114389-03 SunOS 5.9: devinfo Patch
116510-01 SunOS 5.9: wscons Patch
114224-05 SunOS 5.9: csh Patch
116670-04 SunOS 5.9: gld Patch
114383-03 SunOS 5.9: Enchilada/Stiletto - pca9556 driver
116506-02 SunOS 5.9: traceroute patch
112919-01 SunOS 5.9: netstat Patch
112918-01 SunOS 5.9: route Patch
112917-01 SunOS 5.9: ifrt Patch
117132-01 SunOS 5.9: cachefsstat Patch
114370-04 SunOS 5.9: libumem.so.1 patch
114010-02 SunOS 5.9: m4 Patch
117129-01 SunOS 5.9: adb Patch
117483-01 SunOS 5.9: ntwdt Patch
114369-01 SunOS 5.9: prtvtoc patch
117125-02 SunOS 5.9: procfs Patch
117480-01 SunOS 5.9: pkgadd Patch
112905-02 SunOS 5.9: ippctl Patch
117123-06 SunOS 5.9: wanboot Patch
115030-03 SunOS 5.9: Multiterabyte UFS - patch mount
114004-01 SunOS 5.9: sed Patch
113335-03 SunOS 5.9: devinfo Patch
113495-05 SunOS 5.9: cfgadm Library Patch
113494-01 SunOS 5.9: iostat Patch
113493-03 SunOS 5.9: libproc.so.1 Patch
113330-01 SunOS 5.9: rpcbind Patch
115028-02 SunOS 5.9: patch /usr/lib/fs/ufs/df
115024-01 SunOS 5.9: file system identification utilities
117471-02 SunOS 5.9: fifofs Patch
118897-01 SunOS 5.9: stc Patch
115022-03 SunOS 5.9: quota utilities
115020-01 SunOS 5.9: patch /usr/lib/adb/ml_odunit
113720-01 SunOS 5.9: rootnex Patch
114352-03 SunOS 5.9: /etc/inet/inetd.conf Patch
123056-01 SunOS 5.9: ldterm patch
116243-01 SunOS 5.9: umountall Patch
113323-01 SunOS 5.9: patch /usr/sbin/passmgmt
116049-01 SunOS 5.9: fdfs Patch
116241-01 SunOS 5.9: keysock Patch
113480-02 SunOS 5.9: usr/lib/security/pam_unix.so.1 Patch
115018-01 SunOS 5.9: patch /usr/lib/adb/dqblk
113277-44 SunOS 5.9: sd and ssd Patch
117457-01 SunOS 5.9: elfexec Patch
113110-01 SunOS 5.9: touch Patch
113077-17 SunOS 5.9: /platform/sun4u/kernal/drv/su Patch
115006-01 SunOS 5.9: kernel/strmod/kb patch
113072-07 SunOS 5.9: patch /usr/sbin/format
113071-01 SunOS 5.9: patch /usr/sbin/acctadm
116782-01 SunOS 5.9: tun Patch
114331-01 SunOS 5.9: power Patch
112835-01 SunOS 5.9: patch /usr/sbin/clinfo
114927-01 SunOS 5.9: usr/sbin/allocate Patch
119937-02 SunOS 5.9: inetboot patch
113467-01 SunOS 5.9: seg_drv & seg_mapdev Patch
114923-01 SunOS 5.9: /usr/kernel/drv/logindmux Patch
117443-01 SunOS 5.9: libkvm Patch
114329-01 SunOS 5.9: /usr/bin/pax Patch
119929-01 SunOS 5.9: /usr/bin/xargs patch
113459-04 SunOS 5.9: udp patch
113446-03 SunOS 5.9: dman Patch
116009-05 SunOS 5.9: sgcn & sgsbbc patch
116557-04 SunOS 5.9: sbd Patch
120241-01 SunOS 5.9: bge: Link & Speed LEDs flash constantly on V20z
113984-01 SunOS 5.9: iosram Patch
113220-01 SunOS 5.9: patch /platform/sun4u/kernel/drv/sparcv9/upa64s
113975-01 SunOS 5.9: ssm Patch
117165-01 SunOS 5.9: pmubus Patch
116530-01 SunOS 5.9: bge.conf Patch
116529-01 SunOS 5.9: smbus Patch
116488-03 SunOS 5.9: Lights Out Management (lom) patch
117131-01 SunOS 5.9: adm1031 Patch
117124-12 SunOS 5.9: platmod, drmach, dr, ngdr, & gptwocfg Patch
114003-01 SunOS 5.9: bbc driver Patch
118539-02 SunOS 5.9: schpc Patch
112837-10 SunOS 5.9: patch /usr/lib/inet/in.dhcpd
114975-01 SunOS 5.9: usr/lib/inet/dhcp/svcadm/dhcpcommon.jar Patch
117450-01 SunOS 5.9: ds_SUNWnisplus Patch
113076-02 SunOS 5.9: dhcpmgr.jar Patch
113572-01 SunOS 5.9: docbook-to-man.ts Patch
118472-01 SunOS 5.9: pargs Patch
122709-01 SunOS 5.9: /usr/bin/dc patch
113075-01 SunOS 5.9: pmap patch
113472-01 SunOS 5.9: madv & mpss lib Patch
115986-02 SunOS 5.9: ptree Patch
115693-01 SunOS 5.9: /usr/bin/last Patch
115259-03 SunOS 5.9: patch usr/lib/acct/acctcms
114564-09 SunOS 5.9: /usr/sbin/in.ftpd Patch
117441-01 SunOS 5.9: FSSdispadmin Patch
113046-01 SunOS 5.9: fcp Patch
118191-01 gtar patch
114818-06 GNOME 2.0.0: libpng Patch
117177-02 SunOS 5.9: lib/gss module Patch
116340-05 SunOS 5.9: gzip and Freeware info files patch
114339-01 SunOS 5.9: wrsm header files Patch
122673-01 SunOS 5.9: sockio.h header patch
116474-03 SunOS 5.9: libsmedia Patch
117138-01 SunOS 5.9: seg_spt.h
112838-11 SunOS 5.9: pcicfg Patch
117127-02 SunOS 5.9: header Patch
112929-01 SunOS 5.9: RIPv2 Header Patch
112927-01 SunOS 5.9: IPQos Header Patch
115992-01 SunOS 5.9: /usr/include/limits.h Patch
112924-01 SunOS 5.9: kdestroy kinit klist kpasswd Patch
116231-03 SunOS 5.9: llc2 Patch
116776-01 SunOS 5.9: mipagent patch
117420-02 SunOS 5.9: mdb Patch
117179-01 SunOS 5.9: nfs_dlboot Patch
121194-01 SunOS 5.9: usr/lib/nfs/statd Patch
116502-03 SunOS 5.9: mountd Patch
113331-01 SunOS 5.9: usr/lib/nfs/rquotad Patch
113281-01 SunOS 5.9: patch /usr/lib/netsvc/yp/ypbind
114736-01 SunOS 5.9: usr/sbin/nisrestore Patch
115695-01 SunOS 5.9: /usr/lib/netsvc/yp/yppush Patch
113321-06 SunOS 5.9: patch sf and socal
113049-01 SunOS 5.9: luxadm & liba5k.so.2 Patch
116663-01 SunOS 5.9: ntpdate Patch
117143-01 SunOS 5.9: xntpd Patch
113028-01 SunOS 5.9: patch /kernel/ipp/flowacct
113320-06 SunOS 5.9: patch se driver
114731-08 SunOS 5.9: kernel/drv/glm Patch
115667-03 SunOS 5.9: Chalupa platform support Patch
117428-01 SunOS 5.9: picl Patch
113327-03 SunOS 5.9: pppd Patch
114374-01 SunOS 5.9: Perl patch
115173-01 SunOS 5.9: /usr/bin/sparcv7/gcore /usr/bin/sparcv9/gcore Patch
114716-02 SunOS 5.9: usr/bin/rcp Patch
112915-04 SunOS 5.9: snoop Patch
116778-01 SunOS 5.9: in.ripngd patch
112916-01 SunOS 5.9: rtquery Patch
112928-03 SunOS 5.9: in.ndpd Patch
119447-01 SunOS 5.9: ses Patch
115354-01 SunOS 5.9: slpd Patch
116493-01 SunOS 5.9: ProtocolTO.java Patch
116780-02 SunOS 5.9: scmi2c Patch
112972-17 SunOS 5.9: patch /usr/lib/libssagent.so.1 /usr/lib/libssasnmp.so.1 mibiisa
116480-01 SunOS 5.9: IEEE 1394 Patch
122485-01 SunOS 5.9: 1394 mass storage driver patch
113716-02 SunOS 5.9: sar & sadc Patch
115651-02 SunOS 5.9: usr/lib/acct/runacct Patch
116490-01 SunOS 5.9: acctdusg Patch
117473-01 SunOS 5.9: fwtmp Patch
116180-01 SunOS 5.9: geniconvtbl Patch
114006-01 SunOS 5.9: tftp Patch
115646-01 SunOS 5.9: libtnfprobe shared library Patch
113334-03 SunOS 5.9: udfs Patch
115350-01 SunOS 5.9: ident_udfs.so.1 Patch
122484-01 SunOS 5.9: preen_md.so.1 patch
117134-01 SunOS 5.9: svm flasharchive patch
116472-02 SunOS 5.9: rmformat Patch
112966-05 SunOS 5.9: patch /usr/sbin/vold
114229-01 SunOS 5.9: action_filemgr.so.1 Patch
114335-02 SunOS 5.9: usr/sbin/rmmount Patch
120443-01 SunOS 5.9: sed core dumps on long lines
121588-01 SunOS 5.9: /usr/xpg4/bin/awk Patch
113470-02 SunOS 5.9: winlock Patch
119211-07 NSS_NSPR_JSS 3.11: NSPR 4.6.1 / NSS 3.11 / JSS 4.2
118666-05 J2SE 5.0: update 6 patch
118667-05 J2SE 5.0: update 6 patch, 64bit
114612-01 SunOS 5.9: ANSI-1251 encodings file errors
114276-02 SunOS 5.9: Extended Arabic support in UTF-8
117400-01 SunOS 5.9: ISO8859-6 and ISO8859-8 iconv symlinks
113584-16 SunOS 5.9: yesstr, nostr nl_langinfo() strings incorrect in S9
117256-01 SunOS 5.9: Remove old OW Xresources.ow files
112625-01 SunOS 5.9: Dcam1394 patch
114600-05 SunOS 5.9: vlan driver patch
117119-05 SunOS 5.9: Sun Gigabit Ethernet 3.0 driver patch
117593-04 SunOS 5.9: Manual Page updates for Solaris 9
112622-19 SunOS 5.9: M64 Graphics Patch
115953-06 Sun Cluster 3.1: Sun Cluster sccheck patch
117949-23 Sun Cluster 3.1: Core Patch for Solaris 9
115081-06 Sun Cluster 3.1: HA-Sun One Web Server Patch
118627-08 Sun Cluster 3.1: Manageability and Serviceability Agent
117985-03 SunOS 5.9: XIL 1.4.2 Loadable Pipeline Libraries
113896-06 SunOS 5.9: en_US.UTF-8 locale patch
114967-02 SunOS 5.9: FDL patch
114677-11 SunOS 5.9: International Components for Unicode Patch
112805-01 CDE 1.5: Help volume patch
113841-01 CDE 1.5: answerbook patch
113839-01 CDE 1.5: sdtwsinfo patch
115713-01 CDE 1.5: dtfile patch
112806-01 CDE 1.5: sdtaudiocontrol patch
112804-02 CDE 1.5: sdtname patch
113244-09 CDE 1.5: dtwm patch
114312-02 CDE1.5: GNOME/CDE Menu for Solaris 9
112809-02 CDE:1.5 Media Player (sdtjmplay) patch
113868-02 CDE 1.5: PDASync patch
119976-01 CDE 1.5: dtterm patch
112771-30 Motif 1.2.7 and 2.1.1: Runtime library patch for Solaris 9
114282-01 CDE 1.5: libDtWidget patch
113789-01 CDE 1.5: dtexec patch
117728-01 CDE1.5: dthello patch
113863-01 CDE 1.5: dtconfig patch
112812-01 CDE 1.5: dtlp patch
113861-04 CDE 1.5: dtksh patch
115972-03 CDE 1.5: dtterm libDtTerm patch
114654-02 CDE 1.5: SmartCard patch
117632-01 CDE1.5: sun_at patch for Solaris 9
113374-02 X11 6.6.1: xpr patch
118759-01 X11 6.6.1: Font Administration Tools patch
117577-03 X11 6.6.1: TrueType fonts patch
116084-01 X11 6.6.1: font patch
113098-04 X11 6.6.1: X RENDER extension patch
112787-01 X11 6.6.1: twm patch
117601-01 X11 6.6.1: libowconfig.so.0 patch
117663-02 X11 6.6.1: xwd patch
113764-04 X11 6.6.1: keyboard patch
113541-02 X11 6.6.1: XKB patch
114561-01 X11 6.6.1: X splash screen patch
113513-02 X11 6.6.1: platform support for new hardware
116121-01 X11 6.4.1: platform support for new hardware
114602-04 X11 6.6.1: libmpg_psr patch
Is there a bundle to install, or do I have to install each patch separately?
-
Oracle Cluster Node Reboots Abruptly
One of our RAC 11gR2 cluster nodes rebooted abruptly. We found the following error in the Grid home alert log file and the ocssd.log file.
[cssd(6014)]CRS-1611:Network communication with node mumchora12 (1) missing for 75% of timeout interval. Removal of this node from cluster in 6.190 seconds
We need to find the root cause of this node reboot. Kindly assist.
OS Version : RHEL 5.8
GRID : 11.2.0.2
Database : 11.2.0.2.10
Hi,
Looking at the logs, it seems to be a private interconnect problem. I would suggest you refer to this MetaLink document on the same issue:
Node reboot or eviction: How to check if your private interconnect CRS can transmit network heartbeats [ID 1445075.1]
Hope it will help you to identify the root cause of node eviction.
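To line up the heartbeat warnings with interconnect events, it can help to pull the fields out of the CRS message quoted in the question. The Python sketch below is built only from that one sample line, so the regular expression is an assumption about the message layout and may need adjusting for other releases; the function name is made up for this example.

```python
import re

# Pattern derived from the single sample message in this thread.
WARNING = re.compile(
    r"(CRS-\d+):Network communication with node (\S+) \((\d+)\) "
    r"missing for (\d+)% of timeout interval\.\s+"
    r"Removal of this node from cluster in ([\d.]+) seconds"
)


def parse_heartbeat_warning(line):
    """Return the fields of a missed-heartbeat warning, or None."""
    match = WARNING.search(line)
    if match is None:
        return None
    code, node, number, percent, seconds = match.groups()
    return {
        "code": code,
        "node": node,
        "node_number": int(number),
        "percent_missed": int(percent),
        "seconds_to_eviction": float(seconds),
    }
```

Applying this to every line of the alert log gives a list of which node was reported missing and how close it came to eviction each time.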
Thanks -
Simple two node Cluster Install - Hung after reboot of first node
Hello,
Over the past couple of days I have tried to install a simple two node cluster using two identical SunFire X4200s, firstly following the recipe in: http://www.sun.com/software/solaris/howtoguides/twonodecluster.jsp
and when that failed referring to http://docs.sun.com/app/docs/doc/819-0912 and http://docs.sun.com/app/docs/doc/819-2970.
I am trying to keep the install process as simple as possible, no switch, just back to back connections for the internal networking (node1 e1000g0 <--> node2 e1000g0, node1 e1000g1 <--> node2 e1000g1)
I ran the installer on both X4200s with default answers. This went through smoothly without problems.
I ran scinstall on node1, first time through, choosing "typical" as suggested in the how to guide. Everything goes OK (no errors) node2 reboots, but node1 just sits there waiting for node2, no errors, nothing....
I also tried rerunning scinstall choosing "Custom", and then selecting the no switch option. Same thing happened.
I must be doing something stupid, it's such a simple setup! Any ideas??
Here's the final screen from node1 (dcmds0) in both cases:
Cluster Creation
Log file - /var/cluster/logs/install/scinstall.log.940
Checking installation status ... done
The Sun Cluster software is installed on "dcmds0".
The Sun Cluster software is installed on "dcmds1".
Started sccheck on "dcmds0".
Started sccheck on "dcmds1".
sccheck completed with no errors or warnings for "dcmds0".
sccheck completed with no errors or warnings for "dcmds1".
Configuring "dcmds1" ... done
Rebooting "dcmds1" ...
Output from scconf on node2 (dcmds1):
bash-3.00# scconf -p
Cluster name: dcmdscluster
Cluster ID: 0x47538959
Cluster install mode: enabled
Cluster private net: 172.16.0.0
Cluster private netmask: 255.255.248.0
Cluster maximum nodes: 64
Cluster maximum private networks: 10
Cluster new node authentication: unix
Cluster authorized-node list: dcmds0 dcmds1
Cluster transport heart beat timeout: 10000
Cluster transport heart beat quantum: 1000
Round Robin Load Balancing UDP session timeout: 480
Cluster nodes: dcmds1
Cluster node name: dcmds1
Node ID: 1
Node enabled: yes
Node private hostname: clusternode1-priv
Node quorum vote count: 1
Node reservation key: 0x4753895900000001
Node zones: <NULL>
CPU shares for global zone: 1
Minimum CPU requested for global zone: 1
Node transport adapters: e1000g0 e1000g1
Node transport adapter: e1000g0
Adapter enabled: no
Adapter transport type: dlpi
Adapter property: device_name=e1000g
Adapter property: device_instance=0
Adapter property: lazy_free=1
Adapter property: dlpi_heartbeat_timeout=10000
Adapter property: dlpi_heartbeat_quantum=1000
Adapter property: nw_bandwidth=80
Adapter property: bandwidth=70
Adapter port names: <NULL>
Node transport adapter: e1000g1
Adapter enabled: no
Adapter transport type: dlpi
Adapter property: device_name=e1000g
Adapter property: device_instance=1
Adapter property: lazy_free=1
Adapter property: dlpi_heartbeat_timeout=10000
Adapter property: dlpi_heartbeat_quantum=1000
Adapter property: nw_bandwidth=80
Adapter property: bandwidth=70
Adapter port names: <NULL>
Cluster transport switches: <NULL>
Cluster transport cables
Endpoint Endpoint State
Quorum devices: <NULL>
Rob.
I have found out why the install hung - this needs to be added into the install guide(s) at once!! - It's VERY frustrating when an install guide is incomplete!
The solution is posted in the HA-Cluster OpenSolaris forums at:
http://opensolaris.org/os/community/ha-clusters/ohac/Documentation/SCXdocs/relnotes/#bugs
In particular, my problem was that I selected to make my Solaris install secure (A good idea, I thought!). Unfortunately, this stops Sun Cluster from working. To fix the problem you need to perform the following steps on each secured node:
Problem Summary: During Solaris installation, the setting of a restricted network profile disables external access to network services that Sun Cluster functionality uses, ie: The RPC communication service, which is required for cluster communication
Workaround: Restore external access to RPC communication.
Perform the following commands to restore external access to RPC communication.
# svccfg
svc:> select network/rpc/bind
svc:/network/rpc/bind> setprop config/local_only=false
svc:/network/rpc/bind> quit
# svcadm refresh network/rpc/bind:default
# svcprop network/rpc/bind:default | grep local_only
Once I applied these commands, the install process continued ... AT LAST!!!
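If you have several secured nodes to check, the output of the last svcprop command can be tested in a script. This Python sketch assumes svcprop prints each property as name, type and value separated by whitespace (for example "config/local_only boolean false"); verify that against your own output before relying on it. The function name is made up for this example.

```python
def rpc_bind_local_only(svcprop_output):
    """Return True/False for config/local_only, or None if not found.

    Assumes each line looks like: <property> <type> <value>
    """
    for line in svcprop_output.splitlines():
        fields = line.split()
        if len(fields) >= 3 and fields[0] == "config/local_only":
            return fields[2] == "true"
    return None
```

A result of False means external access to RPC is allowed, which is what the workaround above is meant to achieve.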
Rob. -
Microsoft Cluster node service failing automatically
Hello Expert,
We have NetWeaver 7.0 EHP 2 installed on Windows 2008 R2 for EP. It is installed in a cluster environment.
We have 2 cluster nodes, Host A and Host B. We also have 2 services: one for the database and another for SCS. During a failover these 2 services move to the other node.
My problem is that the SCS cluster service goes offline automatically, which brings my entire EP production server down. When it goes down, I manually start the cluster service first, then the app server, and my EP system starts.
Please suggest how I can find the root cause of the SCS service going offline, or how we can keep it always online.
Regards,
Hi Sunil,
I checked the dev_ms.old file and below is the log:
trc file: "dev_ms", trc level: 1, release: "720"
[Thr 7224] Fri Mar 21 14:05:02 2014
[Thr 7224] ms/http_max_clients = 500 -> 500
[Thr 7224] MsSSetTrcLog: trc logging active, max size = 52428800 bytes
systemid 562 (PC with Windows NT)
relno 7200
patchlevel 0
patchno 101
intno 20020600
make multithreaded, Unicode, 64 bit, optimized
pid 9488
[Thr 7224] ***LOG Q01=> MsSInit, MSStart (Msg Server 1 9488) [msxxserv.c 2274]
[Thr 7224] Fri Mar 21 14:05:03 2014
[Thr 7224] load acl file = \\EP1SAPGRP\sapmnt\EP1\SYS\global\ms_acl_info.DAT
[Thr 7224] MsGetOwnIpAddr: my host addresses are :
[Thr 7224] 1 : [IP] HOST (HOSTNAME)
[Thr 7224] 2 : [127.0.0.1] FQDN (LOCALHOST)
[Thr 7224] 3 : [IP] FQDN (NILIST)
[Thr 7224] 4 : [IP] EPCLUSTER (NILIST)
[Thr 7224] 5 : [IP] EP1SAPGRP (NILIST)
[Thr 7224] 6 : [IP] EP1ORAGRP (NILIST)
[Thr 7224] 7 : [IP] FQDN (NILIST)
[Thr 7224] 8 : [IP] FQDN (NILIST)
[Thr 7224] MsHttpInit: full qualified hostname = NODE A
[Thr 7224] HTTP logging is switch off
[Thr 7224] set HTTP state to LISTEN
[Thr 7224] *** HTTP port 8110 state LISTEN ***
[Thr 7224] *** I listen to internal port 3910 (3910) ***
[Thr 7224] *** HTTP port 8110 state LISTEN ***
[Thr 7224] CUSTOMER KEY: ><
[Thr 7224] build version=720.2011.05.04
[Thr 7224] MsJ2EE_CheckLoggedInNode: logged in list is not initialized -> reconnect ok
[Thr 7224] MsJ2EE_CheckDisconnectedNode: node [114836600] is not in the logged in list -> reconnect ok
[Thr 7224] MsJ2EE_AddLoggedInNode: add node [114836600] into logged in list
[Thr 7224] MsJ2EE_CheckLoggedInNode: node [128683700] isn't in the logged in list -> reconnect ok
[Thr 7224] MsJ2EE_CheckDisconnectedNode: node [128683700] is not in the logged in list -> reconnect ok
[Thr 7224] MsJ2EE_AddLoggedInNode: add node [128683700] into logged in list
[Thr 7224] MsJ2EE_CheckLoggedInNode: node [128683751] isn't in the logged in list -> reconnect ok
[Thr 7224] MsJ2EE_CheckDisconnectedNode: node [128683751] is not in the logged in list -> reconnect ok
[Thr 7224] MsJ2EE_AddLoggedInNode: add node [128683751] into logged in list
[Thr 7224] MsJ2EE_CheckLoggedInNode: node [139051900] isn't in the logged in list -> reconnect ok
[Thr 7224] MsJ2EE_CheckDisconnectedNode: node [139051900] is not in the logged in list -> reconnect ok
[Thr 7224] MsJ2EE_AddLoggedInNode: add node [139051900] into logged in list
[Thr 7224] MsJ2EE_CheckLoggedInNode: node [114836650] isn't in the logged in list -> reconnect ok
[Thr 7224] MsJ2EE_CheckDisconnectedNode: node [114836650] is not in the logged in list -> reconnect ok
[Thr 7224] MsJ2EE_AddLoggedInNode: add node [114836650] into logged in list
[Thr 7224] MsJ2EE_CheckLoggedInNode: node [139051951] isn't in the logged in list -> reconnect ok
[Thr 7224] MsJ2EE_CheckDisconnectedNode: node [139051951] is not in the logged in list -> reconnect ok
[Thr 7224] MsJ2EE_AddLoggedInNode: add node [139051951] into logged in list
[Thr 7224] MsJ2EE_CheckLoggedInNode: node [139051950] isn't in the logged in list -> reconnect ok
[Thr 7224] MsJ2EE_CheckDisconnectedNode: node [139051950] is not in the logged in list -> reconnect ok
[Thr 7224] MsJ2EE_AddLoggedInNode: add node [139051950] into logged in list
[Thr 7224] MsJ2EE_CheckLoggedInNode: node [128683750] isn't in the logged in list -> reconnect ok
[Thr 7224] MsJ2EE_CheckDisconnectedNode: node [128683750] is not in the logged in list -> reconnect ok
[Thr 7224] MsJ2EE_AddLoggedInNode: add node [128683750] into logged in list
-
Cluster node reboots repeatedly
We have a 2-node 10.1.0.3 cluster setup. We had a problem with an HBA card for the fibre channel to the SAN and, after replacing it, one of the cluster nodes keeps rebooting itself right after the cluster processes start up.
We have had this issue once before and Support suggested the following. However, the same solution is not working this time around. Any ideas?
Check output of the unix command hostname is node1
Please rename cssnorun file in /etc/oracle/scls_scr/node1/root directory. Please issue "touch /etc/oracle/scls_scr/node1/root/crsdboot" and also change the permission and ownership of the file to match that of the node 2. Please check if there is any differences in permission, ownership, and the group for any files or directory structure under /etc/oracle between two nodes.
Please reboot node 1 after this change and see if you run into the same problem.
Please check if there are any /tmp/crsctl* files.
Well, especially if you are on Linux RH4, the new controller card will have caused the device names to change. Check that out. It could be that you are no longer seeing your vote and CRS partitions. This can happen on other operating systems too, if the devices now have a new name because the controller card has changed.
For Linux, try the man pages on udev and search for udev on OTN.
Regards -
Hi all.
I have a 3-node cluster based on OES2 SP2a, fully patched. There are a couple of resources: Master_IP and an NSS volume.
The cluster is virtualized on ESXi 4.1 fully patched, and vmware-tools are installed and up to date.
If I do an "rcnetwork stop" on a node, it remains with no network for about 20 seconds, and then freezes. It does not reboot. It only freezes. The resource is balanced correctly, but the server remains hung.
This behaviour is the same on a server with a cluster resource on it and on a server with no cluster resource on it. It always hangs.
The correct behaviour should be a reboot, shouldn't it?
Any hints?
Thanks in advance.
The node does not reboot because ....
9.11 Preventing a Cluster Node Reboot after a Node Shutdown
If LAN connectivity is lost between a cluster node and the other nodes in the cluster, it is possible that the lost node will be automatically shut down by the other cluster nodes. This is normal cluster operating behavior, and it prevents the lost node from trying to load cluster resources because it cannot detect the other cluster nodes. By default, cluster nodes are configured to reboot after an automatic shutdown.
On certain occasions, you might want to prevent a downed cluster node from rebooting so you can troubleshoot problems.
Section 9.11.1, OES 2 SP2 with Patches and Later
Section 9.11.2, OES 2 SP2 Release Version and Earlier
9.11.1 OES 2 SP2 with Patches and Later
Beginning in the OES 2 SP2 Maintenance Patch for May 2010, the Novell Cluster Services reboot behavior conforms to the kernel panic setting for the Linux operating system. By default the kernel panic setting is set for no reboot after a node shutdown.
You can set the kernel panic behavior in the /etc/sysctl.conf file by adding a kernel.panic command line. Set the value to 0 for no reboot after a node shutdown. Set the value to a positive integer value to indicate that the server should be rebooted after waiting the specified number of seconds. For information about the Linux sysctl, see the Linux man pages on sysctl and sysctl.conf.
1. As the root user, open the /etc/sysctl.conf file in a text editor.
2. If the kernel.panic token is not present, add it.
kernel.panic = 0
3. Set the kernel.panic value to 0 or to a positive integer value, depending on the desired behavior.
No Reboot: To prevent an automatic cluster reboot after a node shutdown, set the kernel.panic token to a value of 0. This allows the administrator to determine what caused the kernel panic condition before manually rebooting the server. This is the recommended setting.
kernel.panic = 0
Reboot: To allow a cluster node to reboot automatically after a node shutdown, set the kernel.panic token to a positive integer value that represents the seconds to delay the reboot.
kernel.panic = <seconds>
For example, to wait 1 minute (60 seconds) before rebooting the server, specify the following:
kernel.panic = 60
4. Save your changes.
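The edit described in the steps above can be sketched in a few lines of Python. This is only an illustration, not part of the Novell documentation: the function name set_kernel_panic is made up for this example, and it works on the text of the file rather than on /etc/sysctl.conf itself, so you can try it without root access.

```python
import re

# Matches an active "kernel.panic = N" line, but not commented-out lines
# and not other keys such as kernel.panic_on_oops.
_PANIC_LINE = re.compile(r"^\s*kernel\.panic\s*=.*$")


def set_kernel_panic(conf_text: str, seconds: int) -> str:
    """Return sysctl.conf text with kernel.panic set to `seconds`.

    0 means "do not reboot after a panic"; a positive value is the
    delay in seconds before the automatic reboot.
    """
    if seconds < 0:
        raise ValueError("kernel.panic must be 0 or a positive integer")

    new_line = f"kernel.panic = {seconds}"
    out = []
    replaced = False
    for line in conf_text.splitlines():
        if _PANIC_LINE.match(line):
            if not replaced:
                out.append(new_line)  # update the first occurrence
                replaced = True
            # any later duplicate kernel.panic lines are dropped
        else:
            out.append(line)
    if not replaced:
        out.append(new_line)  # token was not present, so add it
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    sample = "kernel.panic=60\nkernel.panic_on_oops = 1\n"
    print(set_kernel_panic(sample, 0), end="")
```

Keep in mind that changing the file alone does not alter the running kernel; the setting is read at boot or when the file is reloaded, for example with sysctl -p.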
9.11.2 OES 2 SP2 Release Version and Earlier
In the OES 2 SP2 release version and earlier, you can modify the /opt/novell/ncs/bin/ldncs file for the cluster to trigger the server to not automatically reboot after a shutdown.
1. Open the /opt/novell/ncs/bin/ldncs file in a text editor.
2. Find the following line:
echo -n $TOLERANCE > /proc/sys/kernel/panic
3. Replace $TOLERANCE with a value of 0 to cause the server to not automatically reboot after a shutdown.
4. After editing the ldncs file, you must reboot the server to cause the change to take effect. -
SCVMM losing connection to cluster nodes
Hey guys'n girls, I hope this is the right forum for this question. I already opened a ticket at MS support as well because it's impacting our production environment indirectly, but even after a week there's been no contact. Losing faith in MS support there.
The problem we're having is that in SCVMM a host enters the 'needs attention' state, with a WinRM error 0x80338126. I guess it has something to do with the network or with Kerberos, and I've found some info on it, but I still haven't been able to solve it. Do you guys have any ideas?
Problem summary:
We are seeing an issue on our new hyper-v platform. The platform should have been in production last week, but this issue is delaying our project as we can't seem to get it stable.
The problem we are experiencing is that SCVMM loses the connection to some of the Hyper-V nodes, not one specific node. Last week it happened to two nodes, and today it happened to another node. I see issues with WinRM, and I suspect it has something to do with Kerberos. See the bottom of this post for background details and software versions.
The host gets the status 'needs attention', and if you look at the status of the machine, WinRM gives an error. The error is:
Error (2916)
VMM is unable to complete the request. The connection to the agent cc1-hyp-10.domaincloud1.local was lost.
WinRM: URL: [http://cc1-hyp-10.domaincloud1.local:5985], Verb: [ENUMERATE], Resource: [http://schemas.microsoft.com/wbem/wsman/1/wmi/root/cimv2/Win32_Service], Filter: [select * from Win32_Service where Name="WinRM"]
Unknown error (0x80338126)
Recommended Action
Ensure that the Windows Remote Management (WinRM) service and the VMM agent are installed and running and that a firewall is not blocking HTTP/HTTPS traffic. Ensure that VMM server is able to communicate with cc1-hyp-10.domaincloud1.local over WinRM by successfully running the following command:
winrm id -r:cc1-hyp-10.domaincloud1.local
This problem can also be caused by a Windows Management Instrumentation (WMI) service crash. If the server is running Windows Server 2008 R2, ensure that KB 982293 (http://support.microsoft.com/kb/982293) is installed on it.
If the error persists, restart cc1-hyp-10.domaincloud1.local and then try the operation again.
Refer to http://support.microsoft.com/kb/2742275 for more details.
Doing a simple test from the VMM server to the problematic cluster node shows this error:
PS C:\> hostname
CC1-VMM-01
PS C:\> winrm id -r:cc1-hyp-10.domaincloud1.local
WSManFault
Message = WinRM cannot complete the operation. Verify that the specified computer name is valid, that the computer is accessible over the network, and that a firewall exception for the WinRM service is enabled and allows access from this computer. By default, the WinRM firewall exception for public profiles limits access to remote computers within the same local subnet.
Error number: -2144108250 0x80338126
WinRM cannot complete the operation. Verify that the specified computer name is valid, that the computer is accessible over the network, and that a firewall exception for the WinRM service is enabled and allows access from this computer. By default, the WinRM firewall exception for public profiles limits access to remote computers within the same local subnet.
I CAN connect from other hosts to this problematic cluster node:
PS C:\> hostname
CC1-HYP-16
PS C:\> winrm id -r:cc1-hyp-10.domaincloud1.local
IdentifyResponse
ProtocolVersion = http://schemas.dmtf.org/wbem/wsman/1/wsman.xsd
ProductVendor = Microsoft Corporation
ProductVersion = OS: 6.3.9600 SP: 0.0 Stack: 3.0
SecurityProfiles
SecurityProfileName = http://schemas.dmtf.org/wbem/wsman/1/wsman/secprofile/http/spnego-kerberos
And I can connect from the vmm server to all other cluster nodes:
PS C:\> hostname
CC1-VMM-01
PS C:\> winrm id -r:cc1-hyp-11.domaincloud1.local
IdentifyResponse
ProtocolVersion = http://schemas.dmtf.org/wbem/wsman/1/wsman.xsd
ProductVendor = Microsoft Corporation
ProductVersion = OS: 6.3.9600 SP: 0.0 Stack: 3.0
SecurityProfiles
SecurityProfileName = http://schemas.dmtf.org/wbem/wsman/1/wsman/secprofile/http/spnego-kerberos
So at this point only the test from the cc1-vmm-01 to cc1-hyp-10 seems to be problematic.
I followed the steps on the page https://support.microsoft.com/kb/2742275 (which is referred to above). I tried the VMMCA, but I can't really get it working the way I want, or it seems to give outdated recommendations.
I tried checking for duplicate SPNs by running setspn -x on affected machines. No results (although I do not understand what an SPN is or how it works). I rebuilt the performance counters.
I tried setting 'sc config winrm type= own' as described in [http://blinditandnetworkadmin.blogspot.nl/2012/08/kb-how-to-troubleshoot-needs-attention.html].
If I reboot this cc1-hyp-10 machine, it will start working perfectly again. However, then I can't troubleshoot the issue, and it will happen again.
I want this problem to be solved, so vmm never loses connection to the hypervisors it's managing again!
Background information:
We've set up a platform with Hyper-V to run a VM workload. The platform consists of the following hardware:
2 Dell R620's with 32GB of RAM, running hyper-v to virtualize the cloud management layer (DC's, VMM, SQL). These machines are called cc1-hyp-01 and cc1-hyp-02. They run the management vm's like cc1-dc-01/02, cc1-sql-01, cc1-vmm-01, etc. The names are self-explanatory.
The VMM machine is NOT clustered.
8 Dell M620 blades with 320GB of RAM, running hyper-v to virtualize the customer workload. The machines are called cc1-hyp-10 through cc1-hyp-17. They are in a cluster.
2 Equallogic units form a SAN (premium storage), and we have a Dell R515 running iscsi target (budget storage).
We have Dell Force10 switches and Cisco C3750X switches to connect everything together (mostly 10 Gb links).
All hosts run Windows Server 2012 R2 Datacenter edition. The VMM server runs System Center Virtual Machine Manager 2012 R2.
All the latest Windows updates are installed on every host. There are no firewalls between any host (vmm and hypervisors) at this level. Windows firewalls are all disabled. No antivirus software is installed, no symantec software is installed.
The only non-standard software that is installed is the Dell Host Integration Tools 4.7.1, Dell Openmanage Server Administrator, and some small stuff like 7-zip, bginfo, net-snap, etc.
The SCVMM service is running under the domain account DOMAINCLOUD1\scvmm. This machine is in the local administrators group of each cluster node.
On top of this cloud layer we're running the tenant layer with a lot of vm's for a specific customer (although they are all off now).
I think I found the culprit: after an hour of analyzing Wireshark dumps I found that the VMM server had jumbo frames enabled on the management interface to the hosts (and the underlying infrastructure does not). Now my winrm commands work again.
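For anyone chasing a similar MTU mismatch, the arithmetic behind a "don't fragment" ping test can be sketched as below. This is a rough illustration under stated assumptions: plain IPv4 with no header options (20-byte IP header, 8-byte ICMP header), and 9000 bytes as the jumbo MTU. The poster did not say which MTU values were actually configured, and the function names are made up for this example.

```python
IPV4_HEADER_BYTES = 20  # IPv4 header without options (assumption)
ICMP_HEADER_BYTES = 8   # ICMP echo header


def max_ping_payload(mtu: int) -> int:
    """Largest ICMP echo payload that fits in one unfragmented IPv4 packet."""
    return mtu - IPV4_HEADER_BYTES - ICMP_HEADER_BYTES


def fits_without_fragmentation(payload: int, path_mtu: int) -> bool:
    """True if a ping with this payload can cross a path with `path_mtu`."""
    return payload <= max_ping_payload(path_mtu)


if __name__ == "__main__":
    standard_mtu = 1500  # typical Ethernet MTU
    jumbo_mtu = 9000     # assumed jumbo frame MTU

    print(max_ping_payload(standard_mtu))  # 1472
    print(max_ping_payload(jumbo_mtu))     # 8972
    # A sender that believes it may use jumbo frames will emit packets
    # that a standard-MTU path cannot carry in one piece.
    print(fits_without_fragmentation(max_ping_payload(jumbo_mtu), standard_mtu))  # False
```

On Windows, ping -f -l 1472 <host> sends a 1472-byte payload with the Don't Fragment flag set, so raising the size step by step shows where the path stops passing packets.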
-
Cluster with 2 hosts 2012 R2
Scheduled CAU fails with:
CAU run {4EFE116C-AB49-456D-8EED-F7EDC764DA49} on cluster Cluster1 failed. Error Message:One or more errors occurred while checking the status of Windows Firewall on the cluster nodes. Review the errors for more information on how to resolve the problems.
Error Code:-2146233088 Stack: at MS.Internal.ClusterAwareUpdating.Util.<CheckFirewallsAsync>d__3a.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at Microsoft.ClusterAwareUpdating.Commands.InvokeCauRunCommand.<_ProcessCluster>d__78.MoveNext()
If I run CAU "Analyze Readiness", everything comes back as PASS.
If I run CAU by hand on the same hosts with NO change to the system (not even a reboot), it finishes OK.
Anybody any ideas?
Thanks
Seb
Hi,
In some cases, if the inbound Windows Firewall rule for the "Cluster-Aware Updating" service is disabled, CAU cannot run.
More information:
Starting with Cluster-Aware Updating: Self-Updating
http://blogs.technet.com/b/filecab/archive/2012/05/17/starting-with-cluster-aware-updating-self-updating.aspx
What is Cluster Aware Updating in Windows Server 2012? (Part 1)
http://blogs.technet.com/b/mspfe/archive/2013/02/06/what-is-cluster-aware-updating-in-windows-server-2012.aspx
Cluster-Aware Updating Overview
http://technet.microsoft.com/en-us/library/hh831694.aspx
Hope this helps.
-
Question about cluster node NodeWeight property
Hi,
I have a three-node (A/B/C) Windows 2008 R2 SP1 cluster, testCluster, with KB2494036 installed on all three nodes. Suppose node A is the active node.
I configured node C's NodeWeight property to 0, and node A and node B keep the default (NodeWeight=1). I also added a shared disk Q for the cluster quorum.
So I want to know: if node C and node B are down, does testCluster go down due to loss of quorum, or does it stay up?
At first I thought testCluster should stay up, because the cluster has 2 votes (node A and the quorum disk), node B is down, and node C doesn't take part in voting. But after testing, testCluster went down due to loss of quorum.
Does anybody know the reason? Thanks.
Hello mark.gao,
Let me see if I understand your steps correctly. If you created your cluster with three nodes at the beginning, your quorum model should be "Node Majority", so you have three votes, one per node.
Then the vote for node "C" was removed and a disk was added to be the witness for the cluster quorum. At this point we have two out of three votes from the original "Node Majority" configuration.
Question:
Did you at some point change the quorum model to "Node and Disk Majority"?
Maybe this is the issue: you are stuck on "Node Majority", and when nodes "B" and "C" are down we have only one vote, from node "A", so there is no quorum to keep the service online.
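The vote counting above can be sketched as a small model. This is a simplified illustration of a fixed (non-dynamic) majority rule, not how the cluster service is implemented; the has_quorum function and the vote tables are invented for this example, and it assumes node C's NodeWeight of 0 removes it from the total while the disk witness only counts under "Node and Disk Majority".

```python
def has_quorum(votes: dict[str, int], online: set[str]) -> bool:
    """Strict majority check: online votes must exceed half of all votes."""
    total = sum(votes.values())
    present = sum(weight for name, weight in votes.items() if name in online)
    return 2 * present > total


# "Node Majority": only nodes vote, and node C has NodeWeight 0.
node_majority = {"A": 1, "B": 1, "C": 0}

# "Node and Disk Majority": the disk witness adds one more vote.
node_and_disk_majority = {"A": 1, "B": 1, "C": 0, "disk_Q": 1}

if __name__ == "__main__":
    # B and C are down; only A (and, where it counts, the disk) remain.
    print(has_quorum(node_majority, {"A"}))                     # False
    print(has_quorum(node_and_disk_majority, {"A", "disk_Q"}))  # True
```

Either way you count the original total (two or three votes), a single vote from node A is not a majority, which matches the behaviour seen in the test.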
On 2012 we have the awesome option to configure a Dynamic Quorum:
Dynamic quorum management
In Windows Server 2012, as an advanced quorum configuration option, you can choose to enable dynamic quorum management by cluster. When this option is enabled, the cluster dynamically manages the vote assignment to nodes, based on the state of each node. Votes are automatically removed from nodes that leave active cluster membership, and a vote is automatically assigned when a node rejoins the cluster. By default, dynamic quorum management is enabled.
Note
With dynamic quorum management, the cluster quorum majority is determined by the set of nodes that are active members of the cluster at any time. This is an important distinction from the cluster quorum in Windows Server 2008 R2, where the quorum majority is fixed, based on the initial cluster configuration.
With dynamic quorum management, it is also possible for a cluster to run on the last surviving cluster node. By dynamically adjusting the quorum majority requirement, the cluster can sustain sequential node shutdowns to a single node.
The cluster-assigned dynamic vote of a node can be verified with the DynamicWeight common property of the cluster node by using the Get-ClusterNode Windows PowerShell cmdlet. A value of 0 indicates that the node does not have a quorum vote. A value of 1 indicates that the node has a quorum vote.
The vote assignment for all cluster nodes can be verified by using the Validate Cluster Quorum validation test.
Additional considerations
Dynamic quorum management does not allow the cluster to sustain a simultaneous failure of a majority of voting members. To continue running, the cluster must always have a quorum majority at the time of a node shutdown or failure.
If you have explicitly removed the vote of a node, the cluster cannot dynamically add or remove that vote.
Configure and Manage the Quorum in a Windows Server 2012 Failover Cluster
https://technet.microsoft.com/en-us/library/jj612870.aspx#BKMK_dynamic
Hope this info helps you reach your goal. :D
5ALU2 ! -
Hyper-V Guest Cluster Node Failing Regularly
Hi,
We currently have a 4-node Server 2012 R2 cluster which hosts, among other things, a 3-node guest cluster running a single clustered file service.
Around once a week, the guest cluster node that is currently hosting the clustered file service will fail. It's as if the VM is blue screening. That in itself is fairly annoying, and I'll be doing all the updates and checking the event log for clues as to the cause.
The problem then is that whichever physical cluster node is hosting the VM when it fails will not unlock some of the VM's files. The virtual machine configuration lists as Online Pending. This means that the failed VM cannot be restarted on any other cluster node. The only fix is to drain the physical host it failed on, and reboot.
Looking for suggestions on how to fix the following.
1. Crashing guest file cluster node
2. Failed VM with shared VHDX requiring physical host reboot.
Event messages for the physical host that was hosting the failed VM, in the order that they occurred.
Hyper-V-Worker: Event ID 18590 - 'FS-03' has encountered a fatal error. The guest operating system reported that it failed with the following error codes: ErrorCode0: 0x9E, ErrorCode1: 0x6C2A17C0, ErrorCode2: 0x3C, ErrorCode3: 0xA, ErrorCode4: 0x0. If the problem persists, contact Product Support for the guest operating system. (Virtual machine ID 36166B47-D003-4E51-AFB5-7B967A3EFD2D)
FailoverClustering: Event ID 1069 - Cluster resource 'Virtual Machine FS-03' of type 'Virtual Machine' in clustered role 'FS-03' failed.
Hyper-V-High-Availability: Event ID 21128 - 'Virtual Machine FS-03' failed to shutdown the virtual machine during the resource termination. The virtual machine will be forcefully stopped.
Hyper-V-High-Availability: Event ID 21110 - 'Virtual Machine FS-03' failed to terminate.
Hyper-V-VMMS: Event ID 20108 - The Virtual Machine Management Service failed to start the virtual machine '36166B47-D003-4E51-AFB5-7B967A3EFD2D': The group or resource is not in the correct state to perform the requested operation. (0x8007139F).
Hyper-V-High-Availability: Event ID 21107 - 'Virtual Machine FS-03' failed to start.
FailoverClustering: Event ID 1205 - The Cluster service failed to bring clustered role 'FS-03' completely online or offline. One or more resources may be in a failed state. This may impact the availability of the clustered role.
Hi,
I haven't found a similar issue. Does your cluster pass cluster validation? Are all your Hyper-V hosts compatible with Server 2012 R2? Have you tried disabling all your AV software and firewalls? Please rerun Storage validation on the cluster during non-production hours; the cluster validation report can help locate the issue quickly.
More information:
Cluster
http://technet.microsoft.com/en-us/library/dd581778(v=ws.10).aspx
Hope this helps.
-
OrainstRoot.sh: Failure to promote local gpnp setup to other cluster nodes
I'm trying to build a 2-node cluster and everything appeared to be going swimmingly until the end of the first node's run of the orainstRoot.sh script.
The following is the end of the output:
Disk Group OCR_VOTE created successfully.
clscfg: -install mode specified
Successfully accumulated necessary OCR keys.
Creating OCR keys for user 'root', privgrp 'root'..
Operation successful.
CRS-4256: Updating the profile
Successful addition of voting disk 4e3f692529584f8bbf7f16146bd90346.
Successful addition of voting disk 728bed918cf54f6cbf904d37638c674b.
Successful addition of voting disk 8ac20793405d4fdcbfcafc7e311f877d.
Successfully replaced voting disk group with +OCR_VOTE.
CRS-4256: Updating the profile
CRS-4266: Voting file(s) successfully replaced
##  STATE    File Universal Id                   File Name       Disk group
 1. ONLINE   4e3f692529584f8bbf7f16146bd90346    (ORCL:VOTE01)   [OCR_VOTE]
 2. ONLINE   728bed918cf54f6cbf904d37638c674b    (ORCL:VOTE02)   [OCR_VOTE]
 3. ONLINE   8ac20793405d4fdcbfcafc7e311f877d    (ORCL:VOTE03)   [OCR_VOTE]
Located 3 voting disk(s).
Failed to rmtcopy "/tmp/fileLgKPGV" to "/u01/app/11.2.0/grid/gpnp/manifest.txt" for nodes {ilprevzedb01,ilprevzedb02}, rc=256
Failed to rmtcopy "/u01/app/11.2.0/grid/gpnp/ilprevzedb01/profiles/peer/profile.xml" to "/u01/app/11.2.0/grid/gpnp/profiles/peer/profile.xml" for nodes {ilprevzedb01,ilprevzedb02}, rc=256
rmtcopy aborted
Failed to promote local gpnp setup to other cluster nodes at /u01/app/11.2.0/grid/crs/install/crsconfig_lib.pm line 6504.
/u01/app/11.2.0/grid/perl/bin/perl -I/u01/app/11.2.0/grid/perl/lib -I/u01/app/11.2.0/grid/crs/install /u01/app/11.2.0/grid/crs/install/rootcrs.pl execution failed
Has anyone run into this problem and found a solution?
Thanks in advance!
Ok, for everyone out there, I resolved the issue. Hopefully this will help others encountering the same problem.
It turns out that when the OS was installed, iptables firewall was enabled. This will cause havoc with the installer scripts.
My first inkling should have been when the installer stalled at 65% trying to copy home directories between nodes, the first time I ran through the installer.
At that time, Googling around found that iptables might be the problem and indeed it was running, so I just did a 'service iptables stop' WITHOUT REBOOTING THE NODES and re-ran the installer.
Well, it looks as though NOT REBOOTING THE NODES doesn't quite cut it. I then did a 'chkconfig iptables off' and REBOOTED BOTH NODES.
Oracle support simply provided me with: How to Proceed from Failed 11gR2 Grid Infrastructure (CRS) Installation (Doc ID 942166.1), which didn't really work all that well, lots of failures, errors, etc. So I just deleted the 11.2.0 directory and tried running the installer again.
This time the install went through without problems.
Thanks! -
VMM Thinks Cluster Node is in Maintenance
I'm running VMM 2012 SP1 (version 3.1.6020.0). The cluster nodes in question run Windows Server 2012 Datacenter.
I performed maintenance on one of my Hyper-V failover clusters (installed the KBs in this article), and when I took one of the nodes out of maintenance I successfully migrated VMs between the two via the Failover Cluster Manager console. However, I noticed that VMM still had the exclamation mark on the cluster name. I didn't notice this until a couple of days later, and now I'm trying to do a cross-cluster migration and it's not allowing me because VMM thinks the node is in maintenance. I've tried rebooting the VMM server, refreshing the cluster, refreshing all the VMMs and no luck.
When I go into the Failover Cluster Manager on each of the cluster nodes, both nodes show in production (not in maintenance). Any ideas?
Note: I took the node out of maintenance via the Failover Cluster Manager console and NOT through the VMM console, as the VMM server was unavailable at the time.
It is interesting that VMM was unavailable at the time you were doing this. Are you able to refresh this particular host and see if anything changes? Is the option for "stop maintenance mode" available on this host from VMM?
Anyhow, the root cause here will be that the data in the VMM database is not consistent with your resources, so as a last attempt you could remove and re-add your cluster, just so that the database will perform a clean-up of the objects.
-kn
Kristian (Virtualization and some coffee: http://kristiannese.blogspot.com )