Oracle Cluster Node Reboots Abruptly
One of our RAC 11gR2 cluster nodes rebooted abruptly. We found the following error in the Grid home alert log file and the ocssd.log file.
[cssd(6014)]CRS-1611:Network communication with node mumchora12 (1) missing for 75% of timeout interval. Removal of this node from cluster in 6.190 seconds
We need to find the root cause for this node reboot. Kindly assist.
OS Version : RHEL 5.8
GRID : 11.2.0.2
Database : 11.2.0.2.10
Hi,
Looking at the logs, it seems to be a private interconnect problem. I would suggest you refer to a nice MOS (Metalink) note on this same issue:
Node reboot or eviction: How to check if your private interconnect CRS can transmit network heartbeats [ID 1445075.1]
Hope it helps you to identify the root cause of the node eviction.
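As a quick first check, you can confirm what Clusterware thinks the private interconnect is and how close the heartbeats are to the timeout. A minimal sketch (paths follow the standard 11.2 Grid layout; adjust GRID_HOME for your install):
# Which interfaces are registered as the cluster interconnect
$GRID_HOME/bin/oifcfg getif
# The CSS network heartbeat timeout (misscount, 30s by default on 11.2)
$GRID_HOME/bin/crsctl get css misscount
# Heartbeat warnings leading up to the eviction
grep -i "missing" $GRID_HOME/log/$(hostname -s)/cssd/ocssd.log | tail -20
If the 50%/75%/90% CRS-1611 warnings recur, test the interconnect itself (cabling, switch, NIC errors via ifconfig/netstat -i) as the note describes.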
Thanks
Similar Messages
-
There is a two-node cluster running an Oracle RAC DB. Yesterday both nodes rebooted at the same time (less than a few seconds apart). We don't know whether it was caused by Oracle CRS or the server itself.
Here is the log:
/var/log/messages in node 1
Dec 8 15:14:38 dc01locs01 kernel: 493 [RAIDarray.mpp]dcsgswsst6140:1:0:2 Cmnd failed-retry the same path. vcmnd SN 18469446 pdev H3:C0:T0:L2 0x02/0x04/0x01 0x08000002 mpp_status:1
Dec 8 15:14:38 dc01locs01 kernel: 493 [RAIDarray.mpp]dcsgswsst6140:1:0:2 Cmnd failed-retry the same path. vcmnd SN 18469448 pdev H3:C0:T0:L2 0x02/0x04/0x01 0x08000002 mpp_status:1
Dec 8 15:17:20 dc01locs01 syslogd 1.4.1: restart.
Dec 8 15:17:20 dc01locs01 kernel: klogd 1.4.1, log source = /proc/kmsg started.
Dec 8 15:17:20 dc01locs01 kernel: Linux version 2.6.18-128.7.1.0.1.el5 ([email protected]) (gcc version 4.1.2 20080704 (Red Hat 4.1.2-44)) #1 SMP Mon Aug 24 14:07:09 EDT 2009
Dec 8 15:17:20 dc01locs01 kernel: Command line: ro root=/dev/vg00/root rhgb quiet crashkernel=128M@16M
Dec 8 15:17:20 dc01locs01 kernel: BIOS-provided physical RAM map:
ocssd.log in node 1
[ CSSD]2009-12-08 15:14:33.467 [1134680384] >TRACE: clssgmDispatchCMXMSG: msg type(13) src(2) dest(1) size(123) tag(00000000) incarnation(148585637)
[ CSSD]2009-12-08 15:14:33.468 [1134680384] >TRACE: clssgmHandleDataInvalid: grock HB+ASM, member 2 node 2, birth 1
[ CSSD]2009-12-08 15:19:00.217 >USER: Copyright 2009, Oracle version 11.1.0.7.0
[ CSSD]2009-12-08 15:19:00.217 >USER: CSS daemon log for node dc01locs01, number 1, in cluster ocsprodrac
[ clsdmt]Listening to (ADDRESS=(PROTOCOL=ipc)(KEY=dc01locs01DBG_CSSD))
[ CSSD]2009-12-08 15:19:00.235 [1995774848] >TRACE: clssscmain: Cluster GUID is 79db6803afc7df32ffd952110f22702c
[ CSSD]2009-12-08 15:19:00.239 [1995774848] >TRACE: clssscmain: local-only set to false
/var/log/messages in node 2
Dec 8 15:14:38 dc01locs02 kernel: 493 [RAIDarray.mpp]dcsgswsst6140:1:0:2 Cmnd failed-retry the same path. vcmnd SN 18561465 pdev H3:C0:T0:L2 0x02/0x04/0x01 0x08000002 mpp_status:1
Dec 8 15:14:38 dc01locs02 kernel: 493 [RAIDarray.mpp]dcsgswsst6140:1:0:2 Cmnd failed-retry the same path. vcmnd SN 18561463 pdev H3:C0:T0:L2 0x02/0x04/0x01 0x08000002 mpp_status:1
Dec 8 15:17:14 dc01locs02 syslogd 1.4.1: restart.
Dec 8 15:17:14 dc01locs02 kernel: klogd 1.4.1, log source = /proc/kmsg started.
Dec 8 15:17:14 dc01locs02 kernel: Linux version 2.6.18-128.7.1.0.1.el5 ([email protected]) (gcc version 4.1.2 20080704 (Red Hat 4.1.2-44)) #1 SMP Mon Aug 24 14:07:09 EDT 2009
Dec 8 15:17:14 dc01locs02 kernel: Command line: ro root=/dev/vg00/root rhgb quiet crashkernel=128M@16M
Dec 8 15:17:14 dc01locs02 kernel: BIOS-provided physical RAM map:
ocssd.log in node 2
[ CSSD]2009-12-08 15:14:35.450 [1264081216] >TRACE: clssgmExecuteClientRequest: Received data update request from client (0x2aaaac065a00), type 1
[ CSSD]2009-12-08 15:14:36.909 [1127713088] >TRACE: clssgmDispatchCMXMSG: msg type(13) src(1) dest(1) size(123) tag(00000000) incarnation(148585637)
[ CSSD]2009-12-08 15:14:36.909 [1127713088] >TRACE: clssgmHandleDataInvalid: grock HB+ASM, member 1 node 1, birth 0
[ CSSD]2009-12-08 15:18:55.047 >USER: Copyright 2009, Oracle version 11.1.0.7.0
[ clsdmt]Listening to (ADDRESS=(PROTOCOL=ipc)(KEY=dc01locs02DBG_CSSD))
[ CSSD]2009-12-08 15:18:55.047 >USER: CSS daemon log for node dc01locs02, number 2, in cluster ocsprodrac
[ CSSD]2009-12-08 15:18:55.071 [3628915584] >TRACE: clssscmain: Cluster GUID is 79db6803afc7df32ffd952110f22702c
[ CSSD]2009-12-08 15:18:55.077 [3628915584] >TRACE: clssscmain: local-only set to false
Hi!
I suppose this one is easy: you have a device at '[RAIDarray.mpp]dcsgswsst6140:1:0:2' (a RAID array, perhaps?) which failed. Logically, all servers connected to this RAID went down at the same time.
This does not look like an Oracle problem. Good luck! -
Cluster node reboots repeatedly
We have a 2-node 10.1.0.3 cluster setup. We had a problem with an HBA card for the fibre channel to the SAN, and after replacing it, one of the cluster nodes keeps rebooting itself right after the cluster processes start up.
We have had this issue once before, and Support suggested the following. However, the same solution is not working this time around. Any ideas?
Check that the output of the Unix command hostname is node1.
Please rename the cssnorun file in the /etc/oracle/scls_scr/node1/root directory. Please issue "touch /etc/oracle/scls_scr/node1/root/crsdboot" and also change the permission and ownership of the file to match that of node 2. Please check whether there are any differences in permission, ownership, or group for any files or directory structure under /etc/oracle between the two nodes.
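Consolidated, the reset described above looks roughly like this (a sketch only; the ownership and mode values are placeholders that you must copy from node 2):
# On node1, as root: set aside the old state file and recreate crsdboot
mv /etc/oracle/scls_scr/node1/root/cssnorun /etc/oracle/scls_scr/node1/root/cssnorun.bak
touch /etc/oracle/scls_scr/node1/root/crsdboot
# Match owner/group/mode to what "ls -l" shows for the same file on node 2
chown root:oinstall /etc/oracle/scls_scr/node1/root/crsdboot   # example values
chmod 644 /etc/oracle/scls_scr/node1/root/crsdboot             # example values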
Please reboot node 1 after this change and see if you run into the same problem.
Please check if there are any /tmp/crsctl* files.
Well, especially if you are on Linux RH4, the new controller card will have caused the device names to change. Check that out. It could be that you are no longer seeing your voting and CRS partitions. This can happen on other operating systems too if the devices now have a new name because the controller card has changed.
For Linux, try the man pages on udev and search for udev on OTN.
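If the vote/CRS partitions did move, a persistent udev name keeps them stable across controller swaps. A minimal sketch using the RHEL 4/5 udev syntax (the WWID, device name, and ownership below are made-up placeholders; get the real WWID with /sbin/scsi_id -g -u -s /block/sdX):
# /etc/udev/rules.d/55-oracle-crs.rules (hypothetical example)
# Pin a stable name to the voting/CRS disk by its SCSI WWID
KERNEL=="sd*1", BUS=="scsi", PROGRAM=="/sbin/scsi_id -g -u -s %p", RESULT=="360a98000686f6959684a453333527742", NAME="ocrvote1", OWNER="oracle", GROUP="oinstall", MODE="0640"
After reloading udev, point the Clusterware configuration at the stable /dev/ocrvote1 name instead of the raw sd name.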
Regards -
If use MSSQ , when oracle rac node reboot, client get TPEOS error
Hi, all
In my Tuxedo application, if we use Single Server, Single Queue mode, then when we reboot any Oracle RAC node our application is fine and clients get correct results. If we use MSSQ (Multiple Servers, Single Queue) and the Oracle RAC nodes are up, the application is also fine. But if we reboot any Oracle RAC node, client programs can continue to run and get correct results, yet they always get a TPEOS error. In this situation the server receives the client request, but the client cannot get the server's reply, only a TPEOS error.
Our environment is:
Oracle RAC 10g 10.2.0.4, two instances (rac1, rac2), and two DTP services s1 and s2; TAF for s1 and s2 is set to BASIC
Tuxedo 10gR3, two nodes, working in MP mode, using XA to access the Oracle RAC database; services are both transactional and non-transactional
OS is Linux AS4 U5, 64-bit
Service programs use OCI
Has anyone else encountered this problem?
Hi, first, thank you.
In the ULOG file there is only failover information, no other error messages; on the client side there are also no other errors.
When we do not use MSSQ, the relevant part of the ubb file is:
*SERVERS
DEFAULT:
CLOPT="-A "
sinUpdate_server SRVGRP=GROUP11 SRVID=80 MIN=10 MAX=30 MAXGEN=10 GRACE=10 RESTART=Y
sinUpdate_server SRVGRP=GROUP12 SRVID=160 MIN=10 MAX=30 MAXGEN=10 GRACE=10 RESTART=Y
sinCount_server SRVGRP=GROUP11 SRVID=240 MIN=10 MAX=30 MAXGEN=10 GRACE=10 RESTART=Y
sinCount_server SRVGRP=GROUP12 SRVID=320 MIN=10 MAX=30 MAXGEN=10 GRACE=10 RESTART=Y
sinSelect_server SRVGRP=GROUP11 SRVID=360 MIN=10 MAX=30 MAXGEN=10 GRACE=10 RESTART=Y
sinSelect_server SRVGRP=GROUP12 SRVID=400 MIN=10 MAX=30 MAXGEN=10 GRACE=10 RESTART=Y
sinInsert_server SRVGRP=GROUP11 SRVID=520 MIN=10 MAX=30 MAXGEN=10 GRACE=10 RESTART=Y
sinInsert_server SRVGRP=GROUP12 SRVID=560 MIN=10 MAX=30 MAXGEN=10 GRACE=10 RESTART=Y
sinDelete_server SRVGRP=GROUP11 SRVID=600 MIN=10 MAX=30 MAXGEN=10 GRACE=10 RESTART=Y
sinDelete_server SRVGRP=GROUP12 SRVID=640 MIN=10 MAX=30 MAXGEN=10 GRACE=10 RESTART=Y
sinDdl_server SRVGRP=GROUP11 SRVID=700 MIN=5 MAX=30 MAXGEN=10 GRACE=10 RESTART=Y
sinDdl_server SRVGRP=GROUP12 SRVID=740 MIN=5 MAX=30 MAXGEN=10 GRACE=10 RESTART=Y
lockselect_server SRVGRP=GROUP11 SRVID=800 MIN=10 MAX=30 MAXGEN=10 GRACE=10 RESTART=Y
lockselect_server SRVGRP=GROUP12 SRVID=840 MIN=10 MAX=30 MAXGEN=10 GRACE=10 RESTART=Y
#mulup_server SRVGRP=GROUP11 SRVID=1 MIN=2 MAX=30 MAXGEN=10 GRACE=10 RESTART=Y
#mulup_server SRVGRP=GROUP12 SRVID=60 MIN=2 MAX=30 MAXGEN=10 GRACE=10 RESTART=Y
sinUpdate_server SRVGRP=GROUP13 SRVID=83 MIN=10 MAX=30 MAXGEN=10 GRACE=10 RESTART=Y
sinUpdate_server SRVGRP=GROUP14 SRVID=164 MIN=10 MAX=30 MAXGEN=10 GRACE=10 RESTART=Y
sinCount_server SRVGRP=GROUP13 SRVID=243 MIN=10 MAX=30 MAXGEN=10 GRACE=10 RESTART=Y
sinCount_server SRVGRP=GROUP14 SRVID=324 MIN=10 MAX=30 MAXGEN=10 GRACE=10 RESTART=Y
sinSelect_server SRVGRP=GROUP13 SRVID=363 MIN=10 MAX=30 MAXGEN=10 GRACE=10 RESTART=Y
sinSelect_server SRVGRP=GROUP14 SRVID=404 MIN=10 MAX=30 MAXGEN=10 GRACE=10 RESTART=Y
sinInsert_server SRVGRP=GROUP13 SRVID=523 MIN=10 MAX=30 MAXGEN=10 GRACE=10 RESTART=Y
sinInsert_server SRVGRP=GROUP14 SRVID=564 MIN=10 MAX=30 MAXGEN=10 GRACE=10 RESTART=Y
sinDelete_server SRVGRP=GROUP13 SRVID=603 MIN=10 MAX=30 MAXGEN=10 GRACE=10 RESTART=Y
sinDelete_server SRVGRP=GROUP14 SRVID=644 MIN=10 MAX=30 MAXGEN=10 GRACE=10 RESTART=Y
sinDdl_server SRVGRP=GROUP13 SRVID=703 MIN=5 MAX=30 MAXGEN=10 GRACE=10 RESTART=Y
sinDdl_server SRVGRP=GROUP14 SRVID=744 MIN=5 MAX=30 MAXGEN=10 GRACE=10 RESTART=Y
lockselect_server SRVGRP=GROUP13 SRVID=803 MIN=10 MAX=30 MAXGEN=10 GRACE=10 RESTART=Y
lockselect_server SRVGRP=GROUP14 SRVID=844 MIN=10 MAX=30 MAXGEN=10 GRACE=10 RESTART=Y
#mulup_server SRVGRP=GROUP13 SRVID=13 MIN=2 MAX=30 MAXGEN=10 GRACE=10 RESTART=Y
#mulup_server SRVGRP=GROUP14 SRVID=64 MIN=2 MAX=30 MAXGEN=10 GRACE=10 RESTART=Y
WSL SRVGRP=GROUP11 SRVID=1000
CLOPT="-A -- -n//120.3.8.237:7200 -I 60 -T 60 -w WSH -m 50 -M 100 -x 6 -N 3600"
WSL SRVGRP=GROUP12 SRVID=1001
CLOPT="-A -- -n//120.3.8.238:7200 -I 60 -T 60 -w WSH -m 50 -M 100 -x 6 -N 3600"
WSL SRVGRP=GROUP13 SRVID=1003
CLOPT="-A -- -n//120.3.8.237:7203 -I 60 -T 60 -w WSH -m 50 -M 100 -x 6 -N 3600"
WSL SRVGRP=GROUP14 SRVID=1004
CLOPT="-A -- -n//120.3.8.238:7204 -I 60 -T 60 -w WSH -m 50 -M 100 -x 6 -N 3600"
If we use MSSQ, the relevant part of the ubb file is:
*SERVERS
DEFAULT:
CLOPT="-A -p 1,60:1,30"
sinUpdate_server SRVGRP=GROUP11 SRVID=80 MIN=10 MAX=30 MAXGEN=10 GRACE=10 RESTART=Y RQADDR=sinUpdate11 REPLYQ=Y
sinUpdate_server SRVGRP=GROUP12 SRVID=160 MIN=10 MAX=30 MAXGEN=10 GRACE=10 RESTART=Y RQADDR=sinUpdate12 REPLYQ=Y
sinCount_server SRVGRP=GROUP11 SRVID=240 MIN=10 MAX=30 MAXGEN=10 GRACE=10 RESTART=Y RQADDR=sinCount11 REPLYQ=Y
sinCount_server SRVGRP=GROUP12 SRVID=320 MIN=10 MAX=30 MAXGEN=10 GRACE=10 RESTART=Y RQADDR=sinCount12 REPLYQ=Y
sinSelect_server SRVGRP=GROUP11 SRVID=360 MIN=10 MAX=30 MAXGEN=10 GRACE=10 RESTART=Y RQADDR=sinSelec11 REPLYQ=Y
sinSelect_server SRVGRP=GROUP12 SRVID=400 MIN=10 MAX=30 MAXGEN=10 GRACE=10 RESTART=Y RQADDR=sinSelect12 REPLYQ=Y
sinInsert_server SRVGRP=GROUP11 SRVID=520 MIN=10 MAX=30 MAXGEN=10 GRACE=10 RESTART=Y RQADDR=sinInsert11 REPLYQ=Y
sinInsert_server SRVGRP=GROUP12 SRVID=560 MIN=10 MAX=30 MAXGEN=10 GRACE=10 RESTART=Y RQADDR=sinInsert12 REPLYQ=Y
sinDelete_server SRVGRP=GROUP11 SRVID=600 MIN=10 MAX=30 MAXGEN=10 GRACE=10 RESTART=Y RQADDR=sinDelete11 REPLYQ=Y
sinDelete_server SRVGRP=GROUP12 SRVID=640 MIN=10 MAX=30 MAXGEN=10 GRACE=10 RESTART=Y RQADDR=sinDelete12 REPLYQ=Y
sinDdl_server SRVGRP=GROUP11 SRVID=700 MIN=5 MAX=30 MAXGEN=10 GRACE=10 RESTART=Y RQADDR=sinDdl11 REPLYQ=Y
sinDdl_server SRVGRP=GROUP12 SRVID=740 MIN=5 MAX=30 MAXGEN=10 GRACE=10 RESTART=Y RQADDR=sinDdl12 REPLYQ=Y
lockselect_server SRVGRP=GROUP11 SRVID=800 MIN=10 MAX=30 MAXGEN=10 GRACE=10 RESTART=Y RQADDR=lockselect11 REPLYQ=Y
lockselect_server SRVGRP=GROUP12 SRVID=840 MIN=10 MAX=30 MAXGEN=10 GRACE=10 RESTART=Y RQADDR=lockselect12 REPLYQ=Y
#mulup_server SRVGRP=GROUP11 SRVID=1 MIN=2 MAX=30 MAXGEN=10 GRACE=10 RESTART=Y RQADDR=mulup11 REPLYQ=Y
#mulup_server SRVGRP=GROUP12 SRVID=60 MIN=2 MAX=30 MAXGEN=10 GRACE=10 RESTART=Y RQADDR=mulup12 REPLYQ=Y
sinUpdate_server SRVGRP=GROUP13 SRVID=83 MIN=10 MAX=30 MAXGEN=10 GRACE=10 RESTART=Y RQADDR=sinUpdate13 REPLYQ=Y
sinUpdate_server SRVGRP=GROUP14 SRVID=164 MIN=10 MAX=30 MAXGEN=10 GRACE=10 RESTART=Y RQADDR=sinUpdate14 REPLYQ=Y
sinCount_server SRVGRP=GROUP13 SRVID=243 MIN=10 MAX=30 MAXGEN=10 GRACE=10 RESTART=Y RQADDR=sinCount13 REPLYQ=Y
sinCount_server SRVGRP=GROUP14 SRVID=324 MIN=10 MAX=30 MAXGEN=10 GRACE=10 RESTART=Y RQADDR=sinCount14 REPLYQ=Y
sinSelect_server SRVGRP=GROUP13 SRVID=363 MIN=10 MAX=30 MAXGEN=10 GRACE=10 RESTART=Y RQADDR=sinSelec13 REPLYQ=Y
sinSelect_server SRVGRP=GROUP14 SRVID=404 MIN=10 MAX=30 MAXGEN=10 GRACE=10 RESTART=Y RQADDR=sinSelect14 REPLYQ=Y
sinInsert_server SRVGRP=GROUP13 SRVID=523 MIN=10 MAX=30 MAXGEN=10 GRACE=10 RESTART=Y RQADDR=sinInsert13 REPLYQ=Y
sinInsert_server SRVGRP=GROUP14 SRVID=564 MIN=10 MAX=30 MAXGEN=10 GRACE=10 RESTART=Y RQADDR=sinInsert14 REPLYQ=Y
sinDelete_server SRVGRP=GROUP13 SRVID=603 MIN=10 MAX=30 MAXGEN=10 GRACE=10 RESTART=Y RQADDR=sinDelete13 REPLYQ=Y
sinDelete_server SRVGRP=GROUP14 SRVID=644 MIN=10 MAX=30 MAXGEN=10 GRACE=10 RESTART=Y RQADDR=sinDelete14 REPLYQ=Y
sinDdl_server SRVGRP=GROUP13 SRVID=703 MIN=5 MAX=30 MAXGEN=10 GRACE=10 RESTART=Y RQADDR=sinDdl13 REPLYQ=Y
sinDdl_server SRVGRP=GROUP14 SRVID=744 MIN=5 MAX=30 MAXGEN=10 GRACE=10 RESTART=Y RQADDR=sinDdl14 REPLYQ=Y
lockselect_server SRVGRP=GROUP13 SRVID=803 MIN=10 MAX=30 MAXGEN=10 GRACE=10 RESTART=Y RQADDR=lockselect13 REPLYQ=Y
lockselect_server SRVGRP=GROUP14 SRVID=844 MIN=10 MAX=30 MAXGEN=10 GRACE=10 RESTART=Y RQADDR=lockselect14 REPLYQ=Y
#mulup_server SRVGRP=GROUP13 SRVID=13 MIN=2 MAX=30 MAXGEN=10 GRACE=10 RESTART=Y RQADDR=mulup13 REPLYQ=Y
#mulup_server SRVGRP=GROUP14 SRVID=64 MIN=2 MAX=30 MAXGEN=10 GRACE=10 RESTART=Y RQADDR=mulup14 REPLYQ=Y
WSL SRVGRP=GROUP11 SRVID=1000
CLOPT="-A -- -n//120.3.8.237:7200 -I 60 -T 60 -w WSH -m 50 -M 100 -x 6 -N 3600"
WSL SRVGRP=GROUP12 SRVID=1001
CLOPT="-A -- -n//120.3.8.238:7200 -I 60 -T 60 -w WSH -m 50 -M 100 -x 6 -N 3600"
WSL SRVGRP=GROUP13 SRVID=1003
CLOPT="-A -- -n//120.3.8.237:7203 -I 60 -T 60 -w WSH -m 50 -M 100 -x 6 -N 3600"
WSL SRVGRP=GROUP14 SRVID=1004
CLOPT="-A -- -n//120.3.8.238:7204 -I 60 -T 60 -w WSH -m 50 -M 100 -x 6 -N 3600"
Is there any error in the above ubb file, or are we not using MSSQ correctly?
Looking forward to your answer, thanks. -
Cluster node reboots after network failure
hi all,
The Sun Cluster 3.1 8/05 setup with 2 nodes (E2900) was working fine without any errors in sccheck.
Yesterday one node rebooted reporting a network failure; the errors in the messages file are:
Jan 17 08:00:36 PRD in.mpathd[221]: [ID 594170 daemon.error] NIC failure detected on ce0 of group sc_ipmp0
Jan 17 08:00:36 PRD Cluster.PNM: [ID 890413 daemon.notice] sc_ipmp0: state transition from OK to DOWN.
Jan 17 08:00:47 PRD Cluster.RGM.rgmd: [ID 784560 daemon.notice] resource PROD status on node PRD change to R_FM_DEGRADED
Jan 17 08:00:47 PRD Cluster.RGM.rgmd: [ID 922363 daemon.notice] resource PROD status msg on node PRD change to <IPMP Failure.>
Jan 17 08:00:50 PRD Cluster.RGM.rgmd: [ID 529407 daemon.notice] resource group CFS state on node PRD change to RG_PENDING_OFFLINE
Jan 17 08:00:50 PRD Cluster.RGM.rgmd: [ID 443746 daemon.notice] resource PROD state on node PRD change to R_MON_STOPPING
Jan 17 08:00:50 PRD Cluster.RGM.rgmd: [ID 707948 daemon.notice] launching method <hafoip_monitor_stop> for resource <PROD>, resource group <CFS>, timeout <300> seconds
Jan 17 08:00:50 PRD Cluster.RGM.rgmd: [ID 736390 daemon.notice] method <hafoip_monitor_stop> completed successfully for resource <PROD>, resource group <CFS>, time used: 0% of timeout <300 seconds>
Jan 17 08:00:50 PRD Cluster.RGM.rgmd: [ID 443746 daemon.notice] resource PROD state on node PRD change to R_ONLINE_UNMON
Jan 17 08:00:50 PRD Cluster.RGM.rgmd: [ID 443746 daemon.notice] resource PROD state on node PRD change to R_STOPPING
Jan 17 08:00:50 PRD Cluster.RGM.rgmd: [ID 707948 daemon.notice] launching method <hafoip_stop> for resource <PROD>, resource group <CFS>, timeout <300> seconds
Jan 17 08:00:50 PRD Cluster.RGM.rgmd: [ID 784560 daemon.notice] resource PROD status on node PRD change to R_FM_UNKNOWN
Jan 17 08:00:50 PRD Cluster.RGM.rgmd: [ID 922363 daemon.notice] resource PROD status msg on node PRD change to <Stopping>
Jan 17 08:00:51 PRD ip: [ID 678092 kern.notice] TCP_IOC_ABORT_CONN: local = 172.016.005.025:0, remote = 000.000.000.000:0, start = -2, end = 6
Jan 17 08:00:51 PRD ip: [ID 302654 kern.notice] TCP_IOC_ABORT_CONN: aborted 53 connections
What can be the reason for the reboot?
Is there any way to avoid this, with only a failover?
rgds
Message was edited by:
suj
What is in that resource group? The cause is probably something with Failover_mode=HARD set. Check the manual reference section for this. The option would be to set Failover_mode=SOFT.
Tim
--- -
Cluster node reboot and Quick Migration of VMs instead of Live Migration...
Hi to all,
how can one configure a Windows Server 2012 multi-node failover cluster so that VMs are migrated via Live Migration and NOT via Quick Migration when one node of the failover cluster is rebooted?
Thanks in advance
Joerg
Hi Aidan,
only for the record:
We get the requested functionality - Live migrate all VMs on reboot without first pausing the cluster- when we do the following:
Change the value of HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\PreshutdownOrder
from the default
vmms
wuauserv
gpsvc
trustedinstall
to
clussvc
vmms
wuauserv
gpsvc
trustedinstall
Now the cluster service stops first if we trigger a reboot, and all VMs migrate as configured per the MoveTypeThreshold cluster setting.
Greetings
Joerg -
There is a 2-node RAC database and both nodes rebooted yesterday at the same time. We use OCFS2 for the OCRs and voting disks.
Here is the folder structure:
1. /opt/oracle/data/crs1/ - stores 1st OCR and 1st voting disk
2. /opt/oracle/data/crs2/ - stores 2nd OCR and 2nd voting disk
3. /opt/oracle/data/crs3/ - 3rd voting disk
The SA told us /opt/oracle/data/crs1/ and /opt/oracle/data/crs2/ disappeared for a few minutes due to a SAN problem, which means Oracle CRS could not access either OCR for a few minutes. Could that cause the cluster node reboot?
Version:
CRS - 11.1.0.7
ASM - 11.1.0.7
DB - 10.2.0.4
OS - Oracle Linux 2.6.18-128.7.1.0.1.el5
Thanks
Sorry, here is the log:
/var/log/messages in node 1
Dec 8 15:14:38 dc01locs01 kernel: 493 [RAIDarray.mpp]dcsgswsst6140:1:0:2 Cmnd failed-retry the same path. vcmnd SN 18469446 pdev H3:C0:T0:L2 0x02/0x04/0x01 0x08000002 mpp_status:1
Dec 8 15:14:38 dc01locs01 kernel: 493 [RAIDarray.mpp]dcsgswsst6140:1:0:2 Cmnd failed-retry the same path. vcmnd SN 18469448 pdev H3:C0:T0:L2 0x02/0x04/0x01 0x08000002 mpp_status:1
Dec 8 15:17:20 dc01locs01 syslogd 1.4.1: restart.
Dec 8 15:17:20 dc01locs01 kernel: klogd 1.4.1, log source = /proc/kmsg started.
Dec 8 15:17:20 dc01locs01 kernel: Linux version 2.6.18-128.7.1.0.1.el5 ([email protected]) (gcc version 4.1.2 20080704 (Red Hat 4.1.2-44)) #1 SMP Mon Aug 24 14:07:09 EDT 2009
Dec 8 15:17:20 dc01locs01 kernel: Command line: ro root=/dev/vg00/root rhgb quiet crashkernel=128M@16M
Dec 8 15:17:20 dc01locs01 kernel: BIOS-provided physical RAM map:
ocssd.log in node 1
[ CSSD]2009-12-08 15:14:33.467 [1134680384] >TRACE: clssgmDispatchCMXMSG: msg type(13) src(2) dest(1) size(123) tag(00000000) incarnation(148585637)
[ CSSD]2009-12-08 15:14:33.468 [1134680384] >TRACE: clssgmHandleDataInvalid: grock HB+ASM, member 2 node 2, birth 1
[ CSSD]2009-12-08 15:19:00.217 >USER: Copyright 2009, Oracle version 11.1.0.7.0
[ CSSD]2009-12-08 15:19:00.217 >USER: CSS daemon log for node dc01locs01, number 1, in cluster ocsprodrac
[ clsdmt]Listening to (ADDRESS=(PROTOCOL=ipc)(KEY=dc01locs01DBG_CSSD))
[ CSSD]2009-12-08 15:19:00.235 [1995774848] >TRACE: clssscmain: Cluster GUID is 79db6803afc7df32ffd952110f22702c
[ CSSD]2009-12-08 15:19:00.239 [1995774848] >TRACE: clssscmain: local-only set to false
/var/log/messages in node 2
Dec 8 15:14:38 dc01locs02 kernel: 493 [RAIDarray.mpp]dcsgswsst6140:1:0:2 Cmnd failed-retry the same path. vcmnd SN 18561465 pdev H3:C0:T0:L2 0x02/0x04/0x01 0x08000002 mpp_status:1
Dec 8 15:14:38 dc01locs02 kernel: 493 [RAIDarray.mpp]dcsgswsst6140:1:0:2 Cmnd failed-retry the same path. vcmnd SN 18561463 pdev H3:C0:T0:L2 0x02/0x04/0x01 0x08000002 mpp_status:1
Dec 8 15:17:14 dc01locs02 syslogd 1.4.1: restart.
Dec 8 15:17:14 dc01locs02 kernel: klogd 1.4.1, log source = /proc/kmsg started.
Dec 8 15:17:14 dc01locs02 kernel: Linux version 2.6.18-128.7.1.0.1.el5 ([email protected]) (gcc version 4.1.2 20080704 (Red Hat 4.1.2-44)) #1 SMP Mon Aug 24 14:07:09 EDT 2009
Dec 8 15:17:14 dc01locs02 kernel: Command line: ro root=/dev/vg00/root rhgb quiet crashkernel=128M@16M
Dec 8 15:17:14 dc01locs02 kernel: BIOS-provided physical RAM map:
ocssd.log in node 2
[ CSSD]2009-12-08 15:14:35.450 [1264081216] >TRACE: clssgmExecuteClientRequest: Received data update request from client (0x2aaaac065a00), type 1
[ CSSD]2009-12-08 15:14:36.909 [1127713088] >TRACE: clssgmDispatchCMXMSG: msg type(13) src(1) dest(1) size(123) tag(00000000) incarnation(148585637)
[ CSSD]2009-12-08 15:14:36.909 [1127713088] >TRACE: clssgmHandleDataInvalid: grock HB+ASM, member 1 node 1, birth 0
[ CSSD]2009-12-08 15:18:55.047 >USER: Copyright 2009, Oracle version 11.1.0.7.0
[ clsdmt]Listening to (ADDRESS=(PROTOCOL=ipc)(KEY=dc01locs02DBG_CSSD))
[ CSSD]2009-12-08 15:18:55.047 >USER: CSS daemon log for node dc01locs02, number 2, in cluster ocsprodrac
[ CSSD]2009-12-08 15:18:55.071 [3628915584] >TRACE: clssscmain: Cluster GUID is 79db6803afc7df32ffd952110f22702c
[ CSSD]2009-12-08 15:18:55.077 [3628915584] >TRACE: clssscmain: local-only set to false -
Changing Cluster node hostname
Dear all
Can I change the hostname of a box in a cluster environment?
Regards
DR
According to SysAdmin magazine (it's not on their site, but it is in the May 2006 edition) you can change the hostnames of cluster nodes by performing the following:
Reboot the cluster nodes into non-cluster mode (reboot -- -x)
Change the hostname of the system (nodenames, hosts, etc.)
Change the hostname on all nodes within the files under /etc/cluster/ccr
Regenerate the checksums for each changed file using ccradm -I /etc/cluster/ccr/FILENAME -0
Reboot every cluster node into the cluster.
I have no idea if this works, but if it does then let me know. -
Hi all.
I have a 3 node cluster based on OES2 SP2a fully patched. There are a coupe of resources: Master_IP and a NSS volume.
The cluster is virtualized on ESXi 4.1 fully patched, and vmware-tools are installed and up to date.
If I do an "rcnetwork stop" on a node, it remains with no network for about 20 seconds and then freezes. It does not reboot; it only freezes. The resource fails over correctly, but the server remains hung.
This behaviour is the same on a server with a cluster resource on it and on a server with no cluster resource on it: it always hangs.
The correct behaviour should be a reboot, shouldn't it?
Any hints?
Thanks in advance.
The node does not reboot because ....
9.11 Preventing a Cluster Node Reboot after a Node Shutdown
If LAN connectivity is lost between a cluster node and the other nodes in the cluster, it is possible that the lost node will be automatically shut down by the other cluster nodes. This is normal cluster operating behavior, and it prevents the lost node from trying to load cluster resources because it cannot detect the other cluster nodes. By default, cluster nodes are configured to reboot after an automatic shutdown.
On certain occasions, you might want to prevent a downed cluster node from rebooting so you can troubleshoot problems.
Section 9.11.1, OES 2 SP2 with Patches and Later
Section 9.11.2, OES 2 SP2 Release Version and Earlier
9.11.1 OES 2 SP2 with Patches and Later
Beginning in the OES 2 SP2 Maintenance Patch for May 2010, the Novell Cluster Services reboot behavior conforms to the kernel panic setting for the Linux operating system. By default the kernel panic setting is set for no reboot after a node shutdown.
You can set the kernel panic behavior in the /etc/sysctl.conf file by adding a kernel.panic command line. Set the value to 0 for no reboot after a node shutdown. Set the value to a positive integer value to indicate that the server should be rebooted after waiting the specified number of seconds. For information about the Linux sysctl, see the Linux man pages on sysctl and sysctl.conf.
1. As the root user, open the /etc/sysctl.conf file in a text editor.
2. If the kernel.panic token is not present, add it:
kernel.panic = 0
3. Set the kernel.panic value to 0 or to a positive integer value, depending on the desired behavior.
No Reboot: To prevent an automatic cluster reboot after a node shutdown, set the kernel.panic token to a value of 0. This allows the administrator to determine what caused the kernel panic condition before manually rebooting the server. This is the recommended setting.
kernel.panic = 0
Reboot: To allow a cluster node to reboot automatically after a node shutdown, set the kernel.panic token to a positive integer value that represents the number of seconds to delay the reboot.
kernel.panic = <seconds>
For example, to wait 1 minute (60 seconds) before rebooting the server, specify the following:
kernel.panic = 60
4. Save your changes.
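To make the new value take effect without a reboot (standard sysctl usage, nothing OES-specific):
# Re-read /etc/sysctl.conf so the kernel.panic change applies now
sysctl -p
# Verify the running value
sysctl kernel.panic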
9.11.2 OES 2 SP2 Release Version and Earlier
In the OES 2 SP2 release version and earlier, you can modify the /opt/novell/ncs/bin/ldncs file to keep the cluster from automatically rebooting the server after a shutdown.
1. Open the /opt/novell/ncs/bin/ldncs file in a text editor.
2. Find the following line:
echo -n $TOLERANCE > /proc/sys/kernel/panic
3. Replace $TOLERANCE with a value of 0 to cause the server to not automatically reboot after a shutdown.
4. After editing the ldncs file, you must reboot the server to cause the change to take effect.
Node does not join cluster upon reboot
Hi Guys,
I have two servers [Sun Fire X4170] clustered together using Solaris Cluster 3.3 for an Oracle database. They are connected to shared storage, a Dell EqualLogic [iSCSI] array. Lately I have run into a weird kind of problem: both nodes come up fine and join the cluster when rebooted together; however, when I reboot just one of the nodes, it does not join the cluster and shows the following errors.
This is happening on both nodes [if I reboot only one node at a time]. But if I reboot both nodes at the same time, they successfully join the cluster and everything runs fine.
Below is the output from the node I rebooted, which did not join the cluster and threw the following errors. The other node is running fine with all the services.
In order to get out of this situation, I have to reboot both the nodes together.
# dmesg output #
Apr 23 17:37:03 srvhqon11 ixgbe: [ID 611667 kern.info] NOTICE: ixgbe2: link down
Apr 23 17:37:12 srvhqon11 iscsi: [ID 933263 kern.notice] NOTICE: iscsi connection(5) unable to connect to target SENDTARGETS_DISCOVERY
Apr 23 17:37:12 srvhqon11 iscsi: [ID 114404 kern.notice] NOTICE: iscsi discovery failure - SendTargets (010.010.017.104)
Apr 23 17:37:13 srvhqon11 iscsi: [ID 240218 kern.notice] NOTICE: iscsi session(9) iqn.2001-05.com.equallogic:0-8a0906-96cf73708-ef30000005e50a1b-sblprdbk online
Apr 23 17:37:13 srvhqon11 scsi: [ID 583861 kern.info] sd11 at scsi_vhci0: unit-address g6090a0887073cf961b0ae505000030ef: g6090a0887073cf961b0ae505000030ef
Apr 23 17:37:13 srvhqon11 genunix: [ID 936769 kern.info] sd11 is /scsi_vhci/disk@g6090a0887073cf961b0ae505000030ef
Apr 23 17:37:13 srvhqon11 scsi: [ID 243001 kern.info] /scsi_vhci (scsi_vhci0):
Apr 23 17:37:13 srvhqon11 /scsi_vhci/disk@g6090a0887073cf961b0ae505000030ef (sd11): Command failed to complete (3) on path iscsi0/[email protected]:0-8a0906-96cf73708-ef30000005e50a1b-sblprdbk0001,0
Apr 23 17:46:54 srvhqon11 svc.startd[11]: [ID 122153 daemon.warning] svc:/network/iscsi/initiator:default: Method or service exit timed out. Killing contract 41.
Apr 23 17:46:54 srvhqon11 svc.startd[11]: [ID 636263 daemon.warning] svc:/network/iscsi/initiator:default: Method "/lib/svc/method/iscsid start" failed due to signal KILL.
Apr 23 17:46:54 srvhqon11 svc.startd[11]: [ID 748625 daemon.error] network/iscsi/initiator:default failed repeatedly: transitioned to maintenance (see 'svcs -xv' for details)
Apr 24 14:50:16 srvhqon11 svc.startd[11]: [ID 694882 daemon.notice] instance svc:/system/console-login:default exited with status 1
root@srvhqon11 # svcs -xv
svc:/system/cluster/loaddid:default (Oracle Solaris Cluster loaddid)
State: offline since Tue Apr 23 17:46:54 2013
Reason: Start method is running.
See: http://sun.com/msg/SMF-8000-C4
See: /var/svc/log/system-cluster-loaddid:default.log
Impact: 49 dependent services are not running:
svc:/system/cluster/bootcluster:default
svc:/system/cluster/cl_execd:default
svc:/system/cluster/zc_cmd_log_replay:default
svc:/system/cluster/sc_zc_member:default
svc:/system/cluster/sc_rtreg_server:default
svc:/system/cluster/sc_ifconfig_server:default
svc:/system/cluster/initdid:default
svc:/system/cluster/globaldevices:default
svc:/system/cluster/gdevsync:default
svc:/milestone/multi-user:default
svc:/system/boot-config:default
svc:/system/cluster/cl-svc-enable:default
svc:/milestone/multi-user-server:default
svc:/application/autoreg:default
svc:/system/basicreg:default
svc:/system/zones:default
svc:/system/cluster/sc_zones:default
svc:/system/cluster/scprivipd:default
svc:/system/cluster/cl-svc-cluster-milestone:default
svc:/system/cluster/sc_svtag:default
svc:/system/cluster/sckeysync:default
svc:/system/cluster/rpc-fed:default
svc:/system/cluster/rgm-starter:default
svc:/application/management/common-agent-container-1:default
svc:/system/cluster/scsymon-srv:default
svc:/system/cluster/sc_syncsa_server:default
svc:/system/cluster/scslmclean:default
svc:/system/cluster/cznetd:default
svc:/system/cluster/scdpm:default
svc:/system/cluster/rpc-pmf:default
svc:/system/cluster/pnm:default
svc:/system/cluster/sc_pnm_proxy_server:default
svc:/system/cluster/cl-event:default
svc:/system/cluster/cl-eventlog:default
svc:/system/cluster/cl-ccra:default
svc:/system/cluster/ql_upgrade:default
svc:/system/cluster/mountgfs:default
svc:/system/cluster/clusterdata:default
svc:/system/cluster/ql_rgm:default
svc:/system/cluster/scqdm:default
svc:/application/stosreg:default
svc:/application/sthwreg:default
svc:/application/graphical-login/cde-login:default
svc:/application/cde-printinfo:default
svc:/system/cluster/scvxinstall:default
svc:/system/cluster/sc_failfast:default
svc:/system/cluster/clexecd:default
svc:/system/cluster/sc_pmmd:default
svc:/system/cluster/clevent_listenerd:default
svc:/application/print/server:default (LP print server)
State: disabled since Tue Apr 23 17:36:44 2013
Reason: Disabled by an administrator.
See: http://sun.com/msg/SMF-8000-05
See: man -M /usr/share/man -s 1M lpsched
Impact: 2 dependent services are not running:
svc:/application/print/rfc1179:default
svc:/application/print/ipp-listener:default
svc:/network/iscsi/initiator:default (?)
State: maintenance since Tue Apr 23 17:46:54 2013
Reason: Restarting too quickly.
See: http://sun.com/msg/SMF-8000-L5
See: /var/svc/log/network-iscsi-initiator:default.log
Impact: This service is not running.
######## Cluster Status from working node ############
root@srvhqon10 # cluster status
=== Cluster Nodes ===
--- Node Status ---
Node Name Status
srvhqon10 Online
srvhqon11 Offline
=== Cluster Transport Paths ===
Endpoint1 Endpoint2 Status
srvhqon10:igb3 srvhqon11:igb3 faulted
srvhqon10:igb2 srvhqon11:igb2 faulted
=== Cluster Quorum ===
--- Quorum Votes Summary from (latest node reconfiguration) ---
Needed Present Possible
2 2 3
--- Quorum Votes by Node (current status) ---
Node Name Present Possible Status
srvhqon10 1 1 Online
srvhqon11 0 1 Offline
--- Quorum Votes by Device (current status) ---
Device Name Present Possible Status
d2 1 1 Online
=== Cluster Device Groups ===
--- Device Group Status ---
Device Group Name Primary Secondary Status
--- Spare, Inactive, and In Transition Nodes ---
Device Group Name Spare Nodes Inactive Nodes In Transistion Nodes
--- Multi-owner Device Group Status ---
Device Group Name Node Name Status
=== Cluster Resource Groups ===
Group Name Node Name Suspended State
ora-rg srvhqon10 No Online
srvhqon11 No Offline
nfs-rg srvhqon10 No Online
srvhqon11 No Offline
backup-rg srvhqon10 No Online
srvhqon11 No Offline
=== Cluster Resources ===
Resource Name Node Name State Status Message
ora-listener srvhqon10 Online Online
srvhqon11 Offline Offline
ora-server srvhqon10 Online Online
srvhqon11 Offline Offline
ora-stor srvhqon10 Online Online
srvhqon11 Offline Offline
ora-lh srvhqon10 Online Online - LogicalHostname online.
srvhqon11 Offline Offline
nfs-rs srvhqon10 Online Online - Service is online.
srvhqon11 Offline Offline
nfs-stor-rs srvhqon10 Online Online
srvhqon11 Offline Offline
nfs-lh-rs srvhqon10 Online Online - LogicalHostname online.
srvhqon11 Offline Offline
backup-stor srvhqon10 Online Online
srvhqon11 Offline Offline
cluster: (C383355) No response from daemon on node "srvhqon11".
=== Cluster DID Devices ===
Device Instance Node Status
/dev/did/rdsk/d1 srvhqon10 Ok
/dev/did/rdsk/d2 srvhqon10 Ok
srvhqon11 Unknown
/dev/did/rdsk/d3 srvhqon10 Ok
srvhqon11 Unknown
/dev/did/rdsk/d4 srvhqon10 Ok
/dev/did/rdsk/d5 srvhqon10 Fail
srvhqon11 Unknown
/dev/did/rdsk/d6 srvhqon11 Unknown
/dev/did/rdsk/d7 srvhqon11 Unknown
/dev/did/rdsk/d8 srvhqon10 Ok
srvhqon11 Unknown
/dev/did/rdsk/d9 srvhqon10 Ok
srvhqon11 Unknown
=== Zone Clusters ===
--- Zone Cluster Status ---
Name Node Name Zone HostName Status Zone Status
Regards.
Check if your global devices are mounted properly:
#cat /etc/mnttab | grep -i global
Check that the proper entries are present on both systems:
#cat /etc/vfstab | grep -i global
Give the output for the quorum devices:
#scstat -q
or
#clquorum list -v
Also check why your iSCSI initiator service is going into maintenance unexpectedly:
#vi /var/svc/log/network-iscsi-initiator:default.log -
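Once the underlying iSCSI problem is fixed, you can take the service out of maintenance by hand (standard SMF commands; the FMRI comes from the svcs -xv output above):
#svcadm clear svc:/network/iscsi/initiator:default
then confirm it came back online
#svcs -l svc:/network/iscsi/initiator:default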
2 node rac cluster - continuous reboot
I have semi-successfully installed Oracle Clusterware on 2 nodes; I had trouble with the last screen running root.sh and orainstRoot.sh.
Now what I have is a continuous reboot of the 2 nodes. I have tried restarting them at exactly the same time, but one node reboots as soon as you log in to it; the other node just reboots at some stage.
So, how to resolve this?
I am using Openfiler 2.2 and Enterprise Linux 5.0, installing Oracle 11.1.0.6.
I am following the instructions as posted on the otn.oracle.com website.
My last step was to deinstall the Clusterware software as it had issues during the final stage of starting ONS and GSD (or whatever). I had to reboot after removing the software, and that is when the continuous reboot cycle started.
Any help appreciated.....
THIS IS JUST A DEMO SYSTEM, but I would still like to get it working as quickly as possible.
I think your system reboots because CRS in the Oracle Clusterware stack had a problem... you should check the logs at $ORA_CRS_HOME/log/<nodename>/*/ (about the heartbeat).
If you hit the reboot problem while removing the software, you have to make sure you stop the RAC processes first:
ps -ef | grep init | grep crs
Anyway, if you don't want the RAC processes to start when you restart the server, you should comment them out in the /etc/inittab file:
#h1:35:respawn:/etc/init.d/init.evmd run >/dev/null 2>&1 </dev/null
#h2:35:respawn:/etc/init.d/init.cssd fatal >/dev/null 2>&1 </dev/null
#h3:35:respawn:/etc/init.d/init.crsd run >/dev/null 2>&1 </dev/null
Or disable CRS:
10gR1:
/etc/init.d/init.crs disable
/etc/init.d/init.crs stop
10gR2 and above:
$ORA_CRS_HOME/bin/crsctl disable crs
$ORA_CRS_HOME/bin/crsctl stop crs
About your system rebooting while in an Oracle cluster: you should check the logs for the heartbeats (interconnect, and disk - OCR file and voting file).
Anyway, contact Oracle Support.
Finally, once you have RAC installed, if you find the system rebooting you should disable CRS and then investigate the problem (check the logs under $ORA_CRS_HOME/log/<nodename>/crsd/*).
Good Luck -
Oracle RAC Nodes getting reboot in case of preferred controller failed
When we disconnect both fiber cables from the preferred Controller A, or pull the Controller A card out of the disk array (IBM DS4300), both servers reboot after 90 seconds.
During this time the complete RAC network is out of service for approx 5 minutes. After the reboot, both servers come back up with both instances, without any manual intervention.
It's a critical issue for us because we are losing high availability. Let us know how we can resolve this critical issue.
Details of the setup:
1. Software - Oracle 10g Release 2
2. OS - Red Hat Linux 3 (kernel version 2.4.21-27.ELsmp)
3. Shared storage - IBM DS4300
4. Multipathing driver - RDAC (rdac-LINUX-09.00 A5.13)
5. Nodes - IBM 346
6. Database on ASM
7. ASM, OCR & voting disk preferred controller is A
8. Hangcheck timer value is 210 seconds
9. Both servers have 2 HBA ports. One HBA port is connected to Controller A and the second HBA port is connected to Controller B of the SAN disk array.
As per my understanding, the voting disk resides in the disk array and Controller A is the preferred owner of the voting disk LUN. When I disconnect both fiber cables from preferred Controller A, both nodes' Clusterware tries to contact the voting disk; when they are unable to reach the voting disk within the specified time period, they reboot.
I tested controller failure with the Oracle RAC software as well as without Oracle. Without Oracle it works fine, and the reason is that the disk array waits approx 300 seconds before moving the preferred controller from A to B.
But with Oracle, the Clusterware software reboots both nodes before the controller can shift from A to B.
So to conclude, a tech who has a good understanding of Oracle Clusterware on Linux and of the IBM RDAC multipath driver should be able to help me.
When we install Oracle RAC on Linux, it is required to configure the hangcheck timer.
Oracle recommends 180 seconds.
That means if one node is hanging, the second node will wait for 180 seconds; if the situation is not resolved within 180 seconds, the hung node will be rebooted.
I think the hangcheck timer configuration is required only on Linux.
Configuration file:
cat >> /etc/rc.d/rc.local << EOF
modprobe hangcheck-timer hangcheck_tick=15 hangcheck_margin=60
EOF
Sorry, the hangcheck timer configuration should be:
cat >> /etc/rc.d/rc.local << EOF
modprobe hangcheck-timer hangcheck_tick=30 hangcheck_margin=180
EOF
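To confirm the module and its settings actually took effect (a quick check; the hangcheck-timer module logs its tick/margin values to the kernel log when it loads):
modprobe hangcheck-timer hangcheck_tick=30 hangcheck_margin=180
lsmod | grep hangcheck
grep -i hangcheck /var/log/messages | tail -5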
Does using the same oracle account on 2 cluster nodes cause a problem?
Does using the same oracle account on 2 cluster nodes cause a problem?
If I use the same oracle account on 2 cluster nodes running 2 databases, then when failover happens and both databases run on one node, do the 2 oracle accounts create a SHM (shared memory) conflict?
Or do I have to use an oracle01 account on node1 and an oracle02 account on node2? Can I not use the same account name?
Thanks.
I'm not 100% certain I understood the questions, so I'll rephrase them and answer them.
Q. If I have the same Oracle account on each cluster node, e.g. uid=100 (oracle) gid=100 (oinstall), groups dba=200, can I run two databases, one on each cluster node without problems?
A. Yes. Having multiple DBs on one node is not a problem and doesn't cause shared memory problems. Obviously each database needs a different database name and thus different SID.
Q. Can I have two different Oracle accounts on each cluster node e.g. uid=100 (oraclea) gid=100 (oinstall), groups dba=200 and e.g. uid=300 (oracleb) gid=100 (oinstall), groups dba=200, and run two databases, one for each Oracle user?
A. Yes. The different Oracle user names would need to be associated with different Oracle installations, i.e. Oracle HOMEs. So you might have /oracle/oracle/product/10.2.0/db_1 (oraclea) and /oracle/oracle/product/11.0.1.0/db_1 (oracleb). The ORACLE_HOME is then used to determine the Oracle user name by checking the owner of the Oracle binary in the ${ORACLE_HOME}/bin directory.
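For instance (a trivial illustration; the path is just an example), the owner of the oracle binary tells you which account runs that HOME:
ls -l /oracle/oracle/product/10.2.0/db_1/bin/oracle
-rwsr-s--x 1 oraclea oinstall ... oracle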
Tim
--- -
Automatic reboots in cluster nodes
Hi all,
I have installed Sun Cluster 3.3 on intel x86 machines in VMware. I have 2 nodes.
Both nodes reboot automatically or hang after some time.
Can you please tell me the cause and how to troubleshoot it?
The memory assigned to each node in the VM is 1300 MB.
So first I should point out that this is not an officially supported configuration, which means there may be any number of issues with it. Having said that, I know that some people have made use of similar sorts of configurations.
To get a root cause, you need to look at the message logs (/var/adm/messages) on both nodes. See if there is anything to do with either loss of quorum or heartbeat tick timeouts. Both of those can lead to node panics. Once you have that information, it will be easier to search for a potential resolution.
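Something like this narrows it down quickly (plain egrep; the patterns are just suggestions):
egrep -i 'quorum|heartbeat|timeout|panic' /var/adm/messages | tail -40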
Tim
--- -
After reboot cluster node went into maintenance mode (CONTROL-D)
Hi there!
I have configured a 2-node cluster on 2 x Sun Enterprise 220R and a StorEdge D1000.
Each time I reboot any of the cluster nodes, I get the following error during boot up:
The / file system (/dev/rdsk/c0t1d0s0) is being checked.
/dev/rdsk/c0t1d0s0: UNREF DIR I=35540 OWNER=root MODE=40755
/dev/rdsk/c0t1d0s0: SIZE=512 MTIME=Jun 5 15:02 2006 (CLEARED)
/dev/rdsk/c0t1d0s0: UNREF FILE I=1192311 OWNER=root MODE=100600
/dev/rdsk/c0t1d0s0: SIZE=96 MTIME=Jun 5 13:23 2006 (RECONNECTED)
/dev/rdsk/c0t1d0s0: LINK COUNT FILE I=1192311 OWNER=root MODE=100600
/dev/rdsk/c0t1d0s0: SIZE=96 MTIME=Jun 5 13:23 2006 COUNT 0 SHOULD BE 1
/dev/rdsk/c0t1d0s0: LINK COUNT INCREASING
/dev/rdsk/c0t1d0s0: UNEXPECTED INCONSISTENCY; RUN fsck MANUALLY.
In maintenance mode I do:
# fsck -y -F ufs /dev/rdsk/c0t1d0s0
and it manages to correct the problem ... but the problem occurs again after each reboot, on each cluster node!
I have installed Sun CLuster 3.1 on Solaris 9 SPARC
How can I get rid of it?
Any ideas?
Brgds,
Sergej
Hi, I get this:
112941-09 SunOS 5.9: sysidnet Utility Patch
116755-01 SunOS 5.9: usr/snadm/lib/libadmutil.so.2 Patch
113434-30 SunOS 5.9: /usr/snadm/lib Library and Differential Flash Patch
112951-13 SunOS 5.9: patchadd and patchrm Patch
114711-03 SunOS 5.9: usr/sadm/lib/diskmgr/VDiskMgr.jar Patch
118064-04 SunOS 5.9: Admin Install Project Manager Client Patch
113742-01 SunOS 5.9: smcpreconfig.sh Patch
113813-02 SunOS 5.9: Gnome Integration Patch
114501-01 SunOS 5.9: drmproviders.jar Patch
112943-09 SunOS 5.9: Volume Management Patch
113799-01 SunOS 5.9: solregis Patch
115697-02 SunOS 5.9: mtmalloc lib Patch
113029-06 SunOS 5.9: libaio.so.1 librt.so.1 and abi_libaio.so.1 Patch
113981-04 SunOS 5.9: devfsadm Patch
116478-01 SunOS 5.9: usr platform links Patch
112960-37 SunOS 5.9: patch libsldap ldap_cachemgr libldap
113332-07 SunOS 5.9: libc_psr.so.1 Patch
116500-01 SunOS 5.9: SVM auto-take disksets Patch
114349-04 SunOS 5.9: sbin/dhcpagent Patch
120441-03 SunOS 5.9: libsec patch
114344-19 SunOS 5.9: kernel/drv/arp Patch
114373-01 SunOS 5.9: UMEM - abi_libumem.so.1 patch
118558-27 SunOS 5.9: Kernel Patch
115675-01 SunOS 5.9: /usr/lib/liblgrp.so Patch
112958-04 SunOS 5.9: patch pci.so
113451-11 SunOS 5.9: IKE Patch
112920-02 SunOS 5.9: libipp Patch
114372-01 SunOS 5.9: UMEM - llib-lumem patch
116229-01 SunOS 5.9: libgen Patch
116178-01 SunOS 5.9: libcrypt Patch
117453-01 SunOS 5.9: libwrap Patch
114131-03 SunOS 5.9: multi-terabyte disk support - libadm.so.1 patch
118465-02 SunOS 5.9: rcm_daemon Patch
113490-04 SunOS 5.9: Audio Device Driver Patch
114926-02 SunOS 5.9: kernel/drv/audiocs Patch
113318-25 SunOS 5.9: patch /kernel/fs/nfs and /kernel/fs/sparcv9/nfs
113070-01 SunOS 5.9: ftp patch
114734-01 SunOS 5.9: /usr/ccs/bin/lorder Patch
114227-01 SunOS 5.9: yacc Patch
116546-07 SunOS 5.9: CDRW DVD-RW DVD+RW Patch
119494-01 SunOS 5.9: mkisofs patch
113471-09 SunOS 5.9: truss Patch
114718-05 SunOS 5.9: usr/kernel/fs/pcfs Patch
115545-01 SunOS 5.9: nss_files patch
115544-02 SunOS 5.9: nss_compat patch
118463-01 SunOS 5.9: du Patch
116016-03 SunOS 5.9: /usr/sbin/logadm patch
115542-02 SunOS 5.9: nss_user patch
116014-06 SunOS 5.9: /usr/sbin/usermod patch
116012-02 SunOS 5.9: ps utility patch
117433-02 SunOS 5.9: FSS FX RT Patch
117431-01 SunOS 5.9: nss_nis Patch
115537-01 SunOS 5.9: /kernel/strmod/ptem patch
115336-03 SunOS 5.9: /usr/bin/tar, /usr/sbin/static/tar Patch
117426-03 SunOS 5.9: ctsmc and sc_nct driver patch
121319-01 SunOS 5.9: devfsadmd_mod.so Patch
121316-01 SunOS 5.9: /kernel/sys/doorfs Patch
121314-01 SunOS 5.9: tl driver patch
116554-01 SunOS 5.9: semsys Patch
112968-01 SunOS 5.9: patch /usr/bin/renice
116552-01 SunOS 5.9: su Patch
120445-01 SunOS 5.9: Toshiba platform token links (TSBW,Ultra-3i)
112964-15 SunOS 5.9: /usr/bin/ksh Patch
112839-08 SunOS 5.9: patch libthread.so.1
115687-02 SunOS 5.9:/var/sadm/install/admin/default Patch
115685-01 SunOS 5.9: sbin/netstrategy Patch
115488-01 SunOS 5.9: patch /kernel/misc/busra
115681-01 SunOS 5.9: usr/lib/fm/libdiagcode.so.1 Patch
113032-03 SunOS 5.9: /usr/sbin/init Patch
113031-03 SunOS 5.9: /usr/bin/edit Patch
114259-02 SunOS 5.9: usr/sbin/psrinfo Patch
115878-01 SunOS 5.9: /usr/bin/logger Patch
116543-04 SunOS 5.9: vmstat Patch
113580-01 SunOS 5.9: mount Patch
115671-01 SunOS 5.9: mntinfo Patch
113977-01 SunOS 5.9: awk/sed pkgscripts Patch
122716-01 SunOS 5.9: kernel/fs/lofs patch
113973-01 SunOS 5.9: adb Patch
122713-01 SunOS 5.9: expr patch
117168-02 SunOS 5.9: mpstat Patch
116498-02 SunOS 5.9: bufmod Patch
113576-01 SunOS 5.9: /usr/bin/dd Patch
116495-03 SunOS 5.9: specfs Patch
117160-01 SunOS 5.9: /kernel/misc/krtld patch
118586-01 SunOS 5.9: cp/mv/ln Patch
120025-01 SunOS 5.9: ipsecconf Patch
116527-02 SunOS 5.9: timod Patch
117155-08 SunOS 5.9: pcipsy Patch
114235-01 SunOS 5.9: libsendfile.so.1 Patch
117152-01 SunOS 5.9: magic Patch
116486-03 SunOS 5.9: tsalarm Driver Patch
121998-01 SunOS 5.9: two-key mode fix for 3DES Patch
116484-01 SunOS 5.9: consconfig Patch
116482-02 SunOS 5.9: modload Utils Patch
117746-04 SunOS 5.9: patch platform/sun4u/kernel/drv/sparcv9/pic16f819
121992-01 SunOS 5.9: fgrep Patch
120768-01 SunOS 5.9: grpck patch
119438-01 SunOS 5.9: usr/bin/login Patch
114389-03 SunOS 5.9: devinfo Patch
116510-01 SunOS 5.9: wscons Patch
114224-05 SunOS 5.9: csh Patch
116670-04 SunOS 5.9: gld Patch
114383-03 SunOS 5.9: Enchilada/Stiletto - pca9556 driver
116506-02 SunOS 5.9: traceroute patch
112919-01 SunOS 5.9: netstat Patch
112918-01 SunOS 5.9: route Patch
112917-01 SunOS 5.9: ifrt Patch
117132-01 SunOS 5.9: cachefsstat Patch
114370-04 SunOS 5.9: libumem.so.1 patch
114010-02 SunOS 5.9: m4 Patch
117129-01 SunOS 5.9: adb Patch
117483-01 SunOS 5.9: ntwdt Patch
114369-01 SunOS 5.9: prtvtoc patch
117125-02 SunOS 5.9: procfs Patch
117480-01 SunOS 5.9: pkgadd Patch
112905-02 SunOS 5.9: ippctl Patch
117123-06 SunOS 5.9: wanboot Patch
115030-03 SunOS 5.9: Multiterabyte UFS - patch mount
114004-01 SunOS 5.9: sed Patch
113335-03 SunOS 5.9: devinfo Patch
113495-05 SunOS 5.9: cfgadm Library Patch
113494-01 SunOS 5.9: iostat Patch
113493-03 SunOS 5.9: libproc.so.1 Patch
113330-01 SunOS 5.9: rpcbind Patch
115028-02 SunOS 5.9: patch /usr/lib/fs/ufs/df
115024-01 SunOS 5.9: file system identification utilities
117471-02 SunOS 5.9: fifofs Patch
118897-01 SunOS 5.9: stc Patch
115022-03 SunOS 5.9: quota utilities
115020-01 SunOS 5.9: patch /usr/lib/adb/ml_odunit
113720-01 SunOS 5.9: rootnex Patch
114352-03 SunOS 5.9: /etc/inet/inetd.conf Patch
123056-01 SunOS 5.9: ldterm patch
116243-01 SunOS 5.9: umountall Patch
113323-01 SunOS 5.9: patch /usr/sbin/passmgmt
116049-01 SunOS 5.9: fdfs Patch
116241-01 SunOS 5.9: keysock Patch
113480-02 SunOS 5.9: usr/lib/security/pam_unix.so.1 Patch
115018-01 SunOS 5.9: patch /usr/lib/adb/dqblk
113277-44 SunOS 5.9: sd and ssd Patch
117457-01 SunOS 5.9: elfexec Patch
113110-01 SunOS 5.9: touch Patch
113077-17 SunOS 5.9: /platform/sun4u/kernal/drv/su Patch
115006-01 SunOS 5.9: kernel/strmod/kb patch
113072-07 SunOS 5.9: patch /usr/sbin/format
113071-01 SunOS 5.9: patch /usr/sbin/acctadm
116782-01 SunOS 5.9: tun Patch
114331-01 SunOS 5.9: power Patch
112835-01 SunOS 5.9: patch /usr/sbin/clinfo
114927-01 SunOS 5.9: usr/sbin/allocate Patch
119937-02 SunOS 5.9: inetboot patch
113467-01 SunOS 5.9: seg_drv & seg_mapdev Patch
114923-01 SunOS 5.9: /usr/kernel/drv/logindmux Patch
117443-01 SunOS 5.9: libkvm Patch
114329-01 SunOS 5.9: /usr/bin/pax Patch
119929-01 SunOS 5.9: /usr/bin/xargs patch
113459-04 SunOS 5.9: udp patch
113446-03 SunOS 5.9: dman Patch
116009-05 SunOS 5.9: sgcn & sgsbbc patch
116557-04 SunOS 5.9: sbd Patch
120241-01 SunOS 5.9: bge: Link & Speed LEDs flash constantly on V20z
113984-01 SunOS 5.9: iosram Patch
113220-01 SunOS 5.9: patch /platform/sun4u/kernel/drv/sparcv9/upa64s
113975-01 SunOS 5.9: ssm Patch
117165-01 SunOS 5.9: pmubus Patch
116530-01 SunOS 5.9: bge.conf Patch
116529-01 SunOS 5.9: smbus Patch
116488-03 SunOS 5.9: Lights Out Management (lom) patch
117131-01 SunOS 5.9: adm1031 Patch
117124-12 SunOS 5.9: platmod, drmach, dr, ngdr, & gptwocfg Patch
114003-01 SunOS 5.9: bbc driver Patch
118539-02 SunOS 5.9: schpc Patch
112837-10 SunOS 5.9: patch /usr/lib/inet/in.dhcpd
114975-01 SunOS 5.9: usr/lib/inet/dhcp/svcadm/dhcpcommon.jar Patch
117450-01 SunOS 5.9: ds_SUNWnisplus Patch
113076-02 SunOS 5.9: dhcpmgr.jar Patch
113572-01 SunOS 5.9: docbook-to-man.ts Patch
118472-01 SunOS 5.9: pargs Patch
122709-01 SunOS 5.9: /usr/bin/dc patch
113075-01 SunOS 5.9: pmap patch
113472-01 SunOS 5.9: madv & mpss lib Patch
115986-02 SunOS 5.9: ptree Patch
115693-01 SunOS 5.9: /usr/bin/last Patch
115259-03 SunOS 5.9: patch usr/lib/acct/acctcms
114564-09 SunOS 5.9: /usr/sbin/in.ftpd Patch
117441-01 SunOS 5.9: FSSdispadmin Patch
113046-01 SunOS 5.9: fcp Patch
118191-01 gtar patch
114818-06 GNOME 2.0.0: libpng Patch
117177-02 SunOS 5.9: lib/gss module Patch
116340-05 SunOS 5.9: gzip and Freeware info files patch
114339-01 SunOS 5.9: wrsm header files Patch
122673-01 SunOS 5.9: sockio.h header patch
116474-03 SunOS 5.9: libsmedia Patch
117138-01 SunOS 5.9: seg_spt.h
112838-11 SunOS 5.9: pcicfg Patch
117127-02 SunOS 5.9: header Patch
112929-01 SunOS 5.9: RIPv2 Header Patch
112927-01 SunOS 5.9: IPQos Header Patch
115992-01 SunOS 5.9: /usr/include/limits.h Patch
112924-01 SunOS 5.9: kdestroy kinit klist kpasswd Patch
116231-03 SunOS 5.9: llc2 Patch
116776-01 SunOS 5.9: mipagent patch
117420-02 SunOS 5.9: mdb Patch
117179-01 SunOS 5.9: nfs_dlboot Patch
121194-01 SunOS 5.9: usr/lib/nfs/statd Patch
116502-03 SunOS 5.9: mountd Patch
113331-01 SunOS 5.9: usr/lib/nfs/rquotad Patch
113281-01 SunOS 5.9: patch /usr/lib/netsvc/yp/ypbind
114736-01 SunOS 5.9: usr/sbin/nisrestore Patch
115695-01 SunOS 5.9: /usr/lib/netsvc/yp/yppush Patch
113321-06 SunOS 5.9: patch sf and socal
113049-01 SunOS 5.9: luxadm & liba5k.so.2 Patch
116663-01 SunOS 5.9: ntpdate Patch
117143-01 SunOS 5.9: xntpd Patch
113028-01 SunOS 5.9: patch /kernel/ipp/flowacct
113320-06 SunOS 5.9: patch se driver
114731-08 SunOS 5.9: kernel/drv/glm Patch
115667-03 SunOS 5.9: Chalupa platform support Patch
117428-01 SunOS 5.9: picl Patch
113327-03 SunOS 5.9: pppd Patch
114374-01 SunOS 5.9: Perl patch
115173-01 SunOS 5.9: /usr/bin/sparcv7/gcore /usr/bin/sparcv9/gcore Patch
114716-02 SunOS 5.9: usr/bin/rcp Patch
112915-04 SunOS 5.9: snoop Patch
116778-01 SunOS 5.9: in.ripngd patch
112916-01 SunOS 5.9: rtquery Patch
112928-03 SunOS 5.9: in.ndpd Patch
119447-01 SunOS 5.9: ses Patch
115354-01 SunOS 5.9: slpd Patch
116493-01 SunOS 5.9: ProtocolTO.java Patch
116780-02 SunOS 5.9: scmi2c Patch
112972-17 SunOS 5.9: patch /usr/lib/libssagent.so.1 /usr/lib/libssasnmp.so.1 mibiisa
116480-01 SunOS 5.9: IEEE 1394 Patch
122485-01 SunOS 5.9: 1394 mass storage driver patch
113716-02 SunOS 5.9: sar & sadc Patch
115651-02 SunOS 5.9: usr/lib/acct/runacct Patch
116490-01 SunOS 5.9: acctdusg Patch
117473-01 SunOS 5.9: fwtmp Patch
116180-01 SunOS 5.9: geniconvtbl Patch
114006-01 SunOS 5.9: tftp Patch
115646-01 SunOS 5.9: libtnfprobe shared library Patch
113334-03 SunOS 5.9: udfs Patch
115350-01 SunOS 5.9: ident_udfs.so.1 Patch
122484-01 SunOS 5.9: preen_md.so.1 patch
117134-01 SunOS 5.9: svm flasharchive patch
116472-02 SunOS 5.9: rmformat Patch
112966-05 SunOS 5.9: patch /usr/sbin/vold
114229-01 SunOS 5.9: action_filemgr.so.1 Patch
114335-02 SunOS 5.9: usr/sbin/rmmount Patch
120443-01 SunOS 5.9: sed core dumps on long lines
121588-01 SunOS 5.9: /usr/xpg4/bin/awk Patch
113470-02 SunOS 5.9: winlock Patch
119211-07 NSS_NSPR_JSS 3.11: NSPR 4.6.1 / NSS 3.11 / JSS 4.2
118666-05 J2SE 5.0: update 6 patch
118667-05 J2SE 5.0: update 6 patch, 64bit
114612-01 SunOS 5.9: ANSI-1251 encodings file errors
114276-02 SunOS 5.9: Extended Arabic support in UTF-8
117400-01 SunOS 5.9: ISO8859-6 and ISO8859-8 iconv symlinks
113584-16 SunOS 5.9: yesstr, nostr nl_langinfo() strings incorrect in S9
117256-01 SunOS 5.9: Remove old OW Xresources.ow files
112625-01 SunOS 5.9: Dcam1394 patch
114600-05 SunOS 5.9: vlan driver patch
117119-05 SunOS 5.9: Sun Gigabit Ethernet 3.0 driver patch
117593-04 SunOS 5.9: Manual Page updates for Solaris 9
112622-19 SunOS 5.9: M64 Graphics Patch
115953-06 Sun Cluster 3.1: Sun Cluster sccheck patch
117949-23 Sun Cluster 3.1: Core Patch for Solaris 9
115081-06 Sun Cluster 3.1: HA-Sun One Web Server Patch
118627-08 Sun Cluster 3.1: Manageability and Serviceability Agent
117985-03 SunOS 5.9: XIL 1.4.2 Loadable Pipeline Libraries
113896-06 SunOS 5.9: en_US.UTF-8 locale patch
114967-02 SunOS 5.9: FDL patch
114677-11 SunOS 5.9: International Components for Unicode Patch
112805-01 CDE 1.5: Help volume patch
113841-01 CDE 1.5: answerbook patch
113839-01 CDE 1.5: sdtwsinfo patch
115713-01 CDE 1.5: dtfile patch
112806-01 CDE 1.5: sdtaudiocontrol patch
112804-02 CDE 1.5: sdtname patch
113244-09 CDE 1.5: dtwm patch
114312-02 CDE1.5: GNOME/CDE Menu for Solaris 9
112809-02 CDE:1.5 Media Player (sdtjmplay) patch
113868-02 CDE 1.5: PDASync patch
119976-01 CDE 1.5: dtterm patch
112771-30 Motif 1.2.7 and 2.1.1: Runtime library patch for Solaris 9
114282-01 CDE 1.5: libDtWidget patch
113789-01 CDE 1.5: dtexec patch
117728-01 CDE1.5: dthello patch
113863-01 CDE 1.5: dtconfig patch
112812-01 CDE 1.5: dtlp patch
113861-04 CDE 1.5: dtksh patch
115972-03 CDE 1.5: dtterm libDtTerm patch
114654-02 CDE 1.5: SmartCard patch
117632-01 CDE1.5: sun_at patch for Solaris 9
113374-02 X11 6.6.1: xpr patch
118759-01 X11 6.6.1: Font Administration Tools patch
117577-03 X11 6.6.1: TrueType fonts patch
116084-01 X11 6.6.1: font patch
113098-04 X11 6.6.1: X RENDER extension patch
112787-01 X11 6.6.1: twm patch
117601-01 X11 6.6.1: libowconfig.so.0 patch
117663-02 X11 6.6.1: xwd patch
113764-04 X11 6.6.1: keyboard patch
113541-02 X11 6.6.1: XKB patch
114561-01 X11 6.6.1: X splash screen patch
113513-02 X11 6.6.1: platform support for new hardware
116121-01 X11 6.4.1: platform support for new hardware
114602-04 X11 6.6.1: libmpg_psr patch
Is there a bundle to install, or do I have to install each patch separately?