Question on Rebooting RAC Nodes

Hi, I heard that when rebooting all RAC nodes, one has to wait at least 5 minutes between each node reboot. So you would reboot node 1 at 0 time, node 2 5 minutes later etc.
However, I could not find any documentation on this, can someone please point me to the right place to look? Thanks.

I have not heard that before. I generally use srvctl to stop/start the databases, I do not think it waits 5 minutes between starting nodes as it does not take 15 minutes to stop/start the databases. As far as rebooting the hosts, the only time interval between machines was the time it took to send the reboot command to the host.

Similar Messages

  • Oracle RAC Nodes getting reboot in case of preferred controller failed

    When we are disconnecting both Fiber cable from preferred Controller A or plugging out Controller A card from Disk Array(IBM DS 4300), After 90 seconds both the servers are rebooting.
    In this time complete RAC network is going out of service for approx 5 minutes.After reboot both servers are coming with both instances without any manual intervention
    It’s a critical issue for us because we are loosing High Availability, Let us know how we can resolve this critical issue.
    Detail of Network:
    1. Software- Oracle 10g Release2
    2. OS- Redhat Linux 3 (Kernel Version-2.4.21-27.ELsmp)
    3. Shared Storage- IBM DS 4300.
    4. Multipathing Driver - RDAC (rdac-LINUX-09.00 A5.13)
    4. Nodes- IBM 346
    5. Databse on ASM
    6. ASM,OCR & Voting Disk Preferred controller is A.
    7. Hangcheck timer value is 210 seconds.
    8. Both Server available with 2 HBA port . I HBA port is connected with Controller A and Seconfd HBA port is connected with Controller B of SAN Disk Array.
    As per my understanding,
    Voting disk resides in Disk Array and Controller A is preferred owner of Voting Disk LUN.. When i am disconnecting both fiber cable from preferred controller A , then Both Nodes Clusterware software trying to contact with Voting Disk, When they are unable to contact with Voting disk in specfic time period, they are going for reboot.
    I tested Controller failure testing with Oracle RAC software as well without Oracle. Without Oracle its working fine and reason behind, in that time Disk Array is waiting for approx 300 seconds for changing preferred controlller from A to B.
    But With Oracle, Clusterware Software reboot both nodes before Controller can shift from A to B.
    So if i conclude,the tech who has good understanding of Oracle Clusterware on Linux OS & IBM RDAC multipath driver can help me.
    when we install Oracle RAC on Linux, it is required to configure hangcheck timer.
    Oracle recomends 180 second.
    It means if one of node is hanging, then second node will wait for 180 seconds, if within 180 seconds ,it is not able to resolve this situation then it will reboot hung node.
    I think Hangcheck timer configuration reuired only with Linux OS.
    Configuration File
    cat >> /etc/rc.d/rc.local << EOF
    modprobe hangcheck-timer hangcheck_tick=15 hangcheck_margin=60

    Sorry
    Hangcheck timer is
    Configuration File
    cat >> /etc/rc.d/rc.local << EOF
    modprobe hangcheck-timer hangcheck_tick=30 hangcheck_margin=180

  • RAC node rebooting frequently

    Hi all,
    I am woserking on two node rac environment.One of my rac node is rebooting so frequently.I am using oracle 10g database and clusterware also(10.2.0.1).
    Ihave checked os logs(linux AS 4),and rac related logs.Not able to find out anything.Posting all logs please suggest.

    Hi i am posting alert log,os log and ocssd logs....
    clusterware alert log....._
    [crsd(5649)]CRS-1201:CRSD started on node ctmisdb1.
    2012-03-21 09:50:38.188
    [cssd(7490)]CRS-1601:CSSD Reconfiguration complete. Active nodes are ctmisdb1 .
    2012-03-21 09:50:46.726
    [crsd(5649)]CRS-1204:Recovering CRS resources for node ctmisdb2.
    2012-03-21 09:55:21.760
    [cssd(7490)]CRS-1601:CSSD Reconfiguration complete. Active nodes are ctmisdb1 ctmisdb2 .
    2012-03-21 12:07:46.681
    [cssd(7426)]CRS-1605:CSSD voting file is online: /dev/raw/raw2. Details in /u01/app/oracle/product/crs/log/ctmisdb1/cssd/ocssd.log.
    2012-03-21 12:07:50.432
    [cssd(7426)]CRS-1601:CSSD Reconfiguration complete. Active nodes are ctmisdb1 ctmisdb2 .
    2012-03-21 12:07:50.893
    [crsd(5549)]CRS-1012:The OCR service started on node ctmisdb1.
    2012-03-21 12:07:50.942
    [evmd(7304)]CRS-1401:EVMD started on node ctmisdb1.
    2012-03-21 12:07:52.827
    [crsd(5549)]CRS-1201:CRSD started on node ctmisdb1.
    2012-03-21 12:48:41.908
    [cssd(7448)]CRS-1605:CSSD voting file is online: /dev/raw/raw2. Details in /u01/app/oracle/product/crs/log/ctmisdb1/cssd/ocssd.log.
    2012-03-21 12:48:45.741
    [cssd(7448)]CRS-1601:CSSD Reconfiguration complete. Active nodes are ctmisdb1 ctmisdb2 .
    2012-03-21 12:48:49.173
    [crsd(5546)]CRS-1012:The OCR service started on node ctmisdb1.
    2012-03-21 12:48:49.190
    [evmd(7328)]CRS-1401:EVMD started on node ctmisdb1.
    2012-03-21 12:48:50.818
    [crsd(5546)]CRS-1201:CRSD started on node ctmisdb1.
    2012-03-21 13:26:36.398
    [cssd(7343)]CRS-1605:CSSD voting file is online: /dev/raw/raw2. Details in /u01/app/oracle/product/crs/log/ctmisdb1/cssd/ocssd.log.
    2012-03-21 13:26:40.492
    [cssd(7343)]CRS-1601:CSSD Reconfiguration complete. Active nodes are ctmisdb1 ctmisdb2 .
    2012-03-21 13:26:40.939
    [crsd(5542)]CRS-1012:The OCR service started on node ctmisdb1.
    2012-03-21 13:26:40.977
    [evmd(7223)]CRS-1401:EVMD started on node ctmisdb1.
    2012-03-21 13:26:42.772
    [crsd(5542)]CRS-1201:CRSD started on node ctmisdb1.
    node os log....+
    Mar 21 12:06:35 ctmisdb1 rc: Starting readahead: succeeded
    Mar 21 12:06:35 ctmisdb1 messagebus: messagebus startup succeeded
    Mar 21 12:06:36 ctmisdb1 cups-config-daemon: cups-config-daemon startup succeeded
    Mar 21 12:06:36 ctmisdb1 haldaemon: haldaemon startup succeeded
    Mar 21 12:06:37 ctmisdb1 fstab-sync[6267]: removed all generated mount points
    Mar 21 12:06:37 ctmisdb1 fstab-sync[6378]: added mount point /media/cdrecorder for /dev/hde
    Mar 21 12:06:37 ctmisdb1 su(pam_unix)[6323]: session opened for user oracle by (uid=0)
    Mar 21 12:06:37 ctmisdb1 su(pam_unix)[6324]: session opened for user oracle by (uid=0)
    Mar 21 12:06:37 ctmisdb1 su(pam_unix)[6229]: session opened for user oracle by (uid=0)
    Mar 21 12:06:37 ctmisdb1 su(pam_unix)[6229]: session closed for user oracle
    Mar 21 12:06:37 ctmisdb1 su(pam_unix)[6644]: session opened for user oracle by (uid=0)
    Mar 21 12:06:37 ctmisdb1 kernel: matroxfb: cannot set xres to 800, rounded up to 832
    Mar 21 12:06:37 ctmisdb1 last message repeated 2 times
    Mar 21 12:06:41 ctmisdb1 su(pam_unix)[6323]: session closed for user oracle
    Mar 21 12:06:41 ctmisdb1 su(pam_unix)[6644]: session closed for user oracle
    Mar 21 12:06:41 ctmisdb1 su(pam_unix)[6324]: session closed for user oracle
    Mar 21 12:06:41 ctmisdb1 logger: Cluster Ready Services completed waiting on dependencies.
    Mar 21 12:06:41 ctmisdb1 last message repeated 2 times
    Mar 21 12:06:45 ctmisdb1 gdm(pam_unix)[6379]: session opened for user root by (uid=0)
    Mar 21 12:06:46 ctmisdb1 gconfd (root-7052): starting (version 2.8.1), pid 7052 user 'root'
    Mar 21 12:06:47 ctmisdb1 gconfd (root-7052): Resolved address "xml:readonly:/etc/gconf/gconf.xml.mandatory" to a read-only configuration source at position 0
    Mar 21 12:06:47 ctmisdb1 gconfd (root-7052): Resolved address "xml:readwrite:/root/.gconf" to a writable configuration source at position 1
    Mar 21 12:06:47 ctmisdb1 gconfd (root-7052): Resolved address "xml:readonly:/etc/gconf/gconf.xml.defaults" to a read-only configuration source at position 2
    Mar 21 12:06:55 ctmisdb1 gconfd (root-7052): Resolved address "xml:readwrite:/root/.gconf" to a writable configuration source at position 0
    Mar 21 12:07:41 ctmisdb1 su(pam_unix)[5547]: session opened for user oracle by (uid=0)
    Mar 21 12:07:41 ctmisdb1 logger: Running CRSD with TZ =
    Mar 21 12:07:43 ctmisdb1 su(pam_unix)[7399]: session opened for user oracle by (uid=0)
    Mar 21 12:12:49 ctmisdb1 sshd(pam_unix)[15323]: session opened for user root by root(uid=0)
    Mar 21 12:12:57 ctmisdb1 su(pam_unix)[15531]: session opened for user oracle by root(uid=0)
    Mar 21 12:47:05 ctmisdb1 syslogd 1.4.1: restart.
    ocssd log....
    [    CSSD]2012-03-21 11:24:41.045 [106332800] >TRACE: clssgmClientConnectMsg: Connect from con(0x800661f0c0) proc(0x8006622560) pid() proto(10:2:1:1)
    [    CSSD]2012-03-21 11:24:41.078 [106332800] >TRACE: clssgmClientConnectMsg: Connect from con(0x800660cfe0) proc(0x800662ba70) pid() proto(10:2:1:1)
    [    CSSD]2012-03-21 12:07:44.564 >USER: Oracle Database 10g CSS Release 10.2.0.1.0 Production Copyright 1996, 2004 Oracle. All rights reserved.
    [  clsdmt]Listening to (ADDRESS=(PROTOCOL=ipc)(KEY=ctmisdb1DBG_CSSD))
    [    CSSD]2012-03-21 12:07:44.564 >USER: CSS daemon log for node ctmisdb1, number 1, in cluster crs
    [    CSSD]2012-03-21 12:07:44.581 [28260544] >TRACE: clssscmain: local-only set to false
    [    CSSD]2012-03-21 12:07:44.603 [28260544] >TRACE: clssnmReadNodeInfo: added node 1 (ctmisdb1) to cluster
    [    CSSD]2012-03-21 12:07:44.621 [28260544] >TRACE: clssnmReadNodeInfo: added node 2 (ctmisdb2) to cluster
    [    CSSD]2012-03-21 12:07:44.627 [72925824] >TRACE: clssnm_skgxnmon: skgxn init failed, rc 1
    [    CSSD]2012-03-21 12:07:44.627 [28260544] >TRACE: clssnm_skgxnonline: Using vacuous skgxn monitor
    [    CSSD]2012-03-21 12:07:44.641 [28260544] >TRACE: clssnmInitNMInfo: misscount set to 60
    [    CSSD]2012-03-21 12:07:44.655 [28260544] >TRACE: clssnmDiskStateChange: state from 1 to 2 disk (0//dev/raw/raw2)
    [    CSSD]2012-03-21 12:07:46.661 [72925824] >TRACE: clssnmDiskStateChange: state from 2 to 4 disk (0//dev/raw/raw2)
    [    CSSD]2012-03-21 12:07:46.690 [72925824] >TRACE: clssnmReadDskHeartbeat: node(2) is down. rcfg(18) wrtcnt(7920) LATS(0) Disk lastSeqNo(7920)
    [    CSSD]2012-03-21 12:07:46.752 [28260544] >TRACE: clssnmFatalInit: fatal mode enabled
    [    CSSD]2012-03-21 12:07:46.752 [94777984] >TRACE: clssnmconnect: connecting to node 1, flags 0x0001, connector 1
    [    CSSD]2012-03-21 12:07:46.753 [94777984] >TRACE: clssnmconnect: connecting to node 0, flags 0x0000, connector 1
    [    CSSD]2012-03-21 12:07:46.753 [94777984] >TRACE: clssnmClusterListener: Probing node(2)
    [    CSSD]2012-03-21 12:07:46.755 [94777984] >TRACE: clssnmConnComplete: connected to node 2 (con 0x8006601040), state 3 birth 0, unique 1332303918/1332303918 prevConuni(0)
    [    CSSD]2012-03-21 12:07:46.756 [106332800] >TRACE: clssgmclientlsnr: listening on (ADDRESS=(PROTOCOL=ipc)(KEY=Oracle_CSS_LclLstnr_crs_1))
    [    CSSD]2012-03-21 12:07:46.756 [106332800] >TRACE: clssgmclientlsnr: listening on (ADDRESS=(PROTOCOL=ipc)(KEY=OCSSD_LL_ctmisdb1_crs))
    [    CSSD]2012-03-21 12:07:46.757 [151810688] >TRACE: clssnmPollingThread: Connection complete
    [    CSSD]2012-03-21 12:07:46.757 [162296448] >TRACE: clssnmSendingThread: Connection complete
    [    CSSD]2012-03-21 12:07:46.757 [172782208] >TRACE: clssnmRcfgMgrThread: Connection complete
    [    CSSD]2012-03-21 12:07:46.757 [172782208] >TRACE: clssnmRcfgMgrThread: Local Join
    [    CSSD]2012-03-21 12:07:46.757 [172782208] >WARNING: clssnmLocalJoinEvent: takeover aborted due to connected but inactive nodes
    [    CSSD]2012-03-21 12:07:47.339 [94777984] >TRACE: clssnmHandleSync: Acknowledging sync: src[2] srcName[ctmisdb2] seq[5] sync[18]
    [    CSSD]2012-03-21 12:07:47.759 [172782208] >TRACE: clssnmRcfgMgrThread: lastleader(2) unique(1332311864)
    [    CSSD]2012-03-21 12:07:48.341 [94777984] >TRACE: clssnmSendVoteInfo: node(2) syncSeqNo(18)
    [    CSSD]2012-03-21 12:07:50.346 [94777984] >TRACE: clssnmUpdateNodeState: node 0, state (0/0) unique (0/0) prevConuni(0) birth (0/0) (old/new)
    [    CSSD]2012-03-21 12:07:50.346 [94777984] >TRACE: clssnmDeactivateNode: node 0 () left cluster
    [    CSSD]2012-03-21 12:07:50.346 [94777984] >TRACE: clssnmUpdateNodeState: node 1, state (1/2) unique (1332311864/1332311864) prevConuni(0) birth (0/18) (old/new)
    [    CSSD]2012-03-21 12:07:50.346 [94777984] >TRACE: clssnmUpdateNodeState: node 2, state (4/3) unique (1332303918/1332303918) prevConuni(0) birth (0/16) (old/new)
    [    CSSD]2012-03-21 12:07:50.346 [94777984] >USER: clssnmHandleUpdate: SYNC(18) from node(2) completed
    [    CSSD]2012-03-21 12:07:50.346 [94777984] >USER: clssnmHandleUpdate: NODE 1 (ctmisdb1) IS ACTIVE MEMBER OF CLUSTER
    [    CSSD]2012-03-21 12:07:50.346 [94777984] >USER: clssnmHandleUpdate: NODE 2 (ctmisdb2) IS ACTIVE MEMBER OF CLUSTER
    [    CSSD]2012-03-21 12:07:50.429 [28260544] >USER: NMEVENT_SUSPEND [00][00][00][00]
    [    CSSD]2012-03-21 12:07:50.429 [183267968] >TRACE: clssgmReconfigThread: started for reconfig (18)
    [    CSSD]2012-03-21 12:07:50.429 [183267968] >USER: NMEVENT_RECONFIG [00][00][00][06]
    [    CSSD]2012-03-21 12:07:50.429 [183267968] >TRACE: clssgmEstablishConnections: 2 nodes in cluster incarn 18
    [    CSSD]2012-03-21 12:07:50.430 [140255872] >TRACE: clssgmInitialRecv: (0x102a0360) accepted a new connection from node 2 born at 16 active (2, 2), vers (10,3,1,2)
    [    CSSD]2012-03-21 12:07:50.430 [140255872] >TRACE: clssgmInitialRecv: conns done (2/2)
    [    CSSD]2012-03-21 12:07:50.430 [183267968] >TRACE: clssgmEstablishMasterNode: MASTER for 18 is node(2) birth(16)
    [    CSSD]2012-03-21 12:07:50.430 [183267968] >TRACE: clssgmChangeMasterNode: requeued 0 RPCs
    [    CSSD]2012-03-21 12:07:50.432 [140255872] >TRACE: clssgmHandleDBDone(): src/dest (2/65535) size(72) incarn 18
    [    CSSD]CLSS-3000: reconfiguration successful, incarnation 18 with 2 nodes
    [    CSSD]CLSS-3001: local node number 1, master node number 2
    [    CSSD]2012-03-21 12:07:50.433 [183267968] >TRACE: clssgmReconfigThread: completed for reconfig(18), with status(1)
    [    CSSD]2012-03-21 12:07:50.550 [106332800] >TRACE: clssgmClientConnectMsg: Connect from con(0x8006603bb0) proc(0x8006608b00) pid() proto(10:2:1:1)
    [    CSSD]2012-03-21 12:07:50.551 [106332800] >TRACE: clssgmClientConnectMsg: Connect from con(0x80066066f0) proc(0x8006608d70) pid() proto(10:2:1:1)
    [    CSSD]2012-03-21 12:07:53.569 [106332800] >TRACE: clssgmClientConnectMsg: Connect from con(0x800660ec70) proc(0x8006611260) pid() proto(10:2:1:1)
    [    CSSD]2012-03-21 12:08:00.829 [106332800] >TRACE: clssgmClientConnectMsg: Connect from con(0x8006610990) proc(0x800660de00) pid() proto(10:2:1:1)
    [    CSSD]2012-03-21 12:08:04.698 [106332800] >TRACE: clssgmClientConnectMsg: Connect from con(0x8006613030) proc(0x8006612930) pid(8115) proto(10:2:1:1)
    [    CSSD]2012-03-21 12:08:04.816 [106332800] >TRACE: clssgmClientConnectMsg: Connect from con(0x8006612950) proc(0x8006613c20) pid(8115) proto(10:2:1:1)
    [    CSSD]2012-03-21 12:08:04.832 [106332800] >TRACE: clssgmClientConnectMsg: Connect from con(0x8006612950) proc(0x8006613c20) pid(8115) proto(10:2:1:1)
    [    CSSD]2012-03-21 12:08:06.615 [106332800] >TRACE: clssgmClientConnectMsg: Connect from con(0x8006612950) proc(0x8006613c20) pid(8171) proto(10:2:1:1)
    [    CSSD]2012-03-21 12:08:07.114 [106332800] >TRACE: clssgmClientConnectMsg: Connect from con(0x8006615960) proc(0x8006616350) pid(8175) proto(10:2:1:1)
    [    CSSD]2012-03-21 12:08:11.373 [106332800] >TRACE: clssgmClientConnectMsg: Connect from con(0x80066192a0) proc(0x8006619470) pid(8302) proto(10:2:1:1)
    [    CSSD]2012-03-21 12:08:11.669 [106332800] >TRACE: clssgmClientConnectMsg: Connect from con(0x800661bf60) proc(0x800661ee20) pid() proto(10:2:1:1)
    [    CSSD]2012-03-21 12:08:17.135 [106332800] >TRACE: clssgmClientConnectMsg: Connect from con(0x800661bf60) proc(0x800661ee70) pid(8458) proto(10:2:1:1)
    [    CSSD]2012-03-21 12:08:17.268 [106332800] >TRACE: clssgmClientConnectMsg: Connect from con(0x800661fc00) proc(0x80066220d0) pid(8460) proto(10:2:1:1)
    [    CSSD]2012-03-21 12:08:17.305 [106332800] >TRACE: clssgmClientConnectMsg: Connect from con(0x80066223e0) proc(0x8006625250) pid(8462) proto(10:2:1:1)
    [    CSSD]2012-03-21 12:08:17.353 [106332800] >TRACE: clssgmClientConnectMsg: Connect from con(0x8006625560) proc(0x8006628430) pid(8464) proto(10:2:1:1)
    [    CSSD]2012-03-21 12:08:24.585 [106332800] >TRACE: clssgmClientConnectMsg: Connect from con(0x8006625560) proc(0x8006628430) pid(8645) proto(10:2:1:1)
    [    CSSD]2012-03-21 12:08:27.957 [106332800] >TRACE: clssgmClientConnectMsg: Connect from con(0x8006628740) proc(0x800662b610) pid(8722) proto(10:2:1:1)
    [    CSSD]2012-03-21 12:08:30.931 [106332800] >TRACE: clssgmClientConnectMsg: Connect from con(0x800662cce0) proc(0x800662c860) pid(8801) proto(10:2:1:1)
    [    CSSD]2012-03-21 12:08:36.400 [106332800] >TRACE: clssgmClientConnectMsg: Connect from con(0x800661c5f0) proc(0x800661eb50) pid() proto(10:2:1:1)
    [    CSSD]2012-03-21 12:08:37.863 [106332800] >TRACE: clssgmClientConnectMsg: Connect from con(0x800662f1c0) proc(0x800661eee0) pid() proto(10:2:1:1)
    [    CSSD]2012-03-21 12:08:38.537 [106332800] >TRACE: clssgmClientConnectMsg: Connect from con(0x800662f1c0) proc(0x800661d500) pid() proto(10:2:1:1)
    [    CSSD]2012-03-21 12:08:39.232 [106332800] >TRACE: clssgmClientConnectMsg: Connect from con(0x800661bf60) proc(0x800661d500) pid() proto(10:2:1:1)
    [    CSSD]2012-03-21 12:08:43.085 [106332800] >TRACE: clssgmClientConnectMsg: Connect from con(0x8006611630) proc(0x8006611210) pid() proto(10:2:1:1)
    [    CSSD]2012-03-21 12:08:58.971 [106332800] >TRACE: clssgmClientConnectMsg: Connect from con(0x800660b830) proc(0x80066112c0) pid() proto(10:2:1:1)
    [    CSSD]2012-03-21 12:09:59.290 [106332800] >TRACE: clssgmClientConnectMsg: Connect from con(0x8006611630) proc(0x800660b190) pid() proto(10:2:1:1)
    [    CSSD]2012-03-21 12:10:59.589 [106332800] >TRACE: clssgmClientConnectMsg: Connect from con(0x8006611630) proc(0x800660b190) pid() proto(10:2:1:1)
    [    CSSD]2012-03-21 12:11:59.904 [106332800] >TRACE: clssgmClientConnectMsg: Connect from con(0x8006611630) proc(0x800660b190) pid() proto(10:2:1:1)
    [    CSSD]2012-03-21 12:13:00.203 [106332800] >TRACE: clssgmClientConnectMsg: Connect from con(0x8006611630) proc(0x800660b190) pid() proto(10:2:1:1)
    [    CSSD]2012-03-21 12:13:14.029 [106332800] >TRACE: clssgmClientConnectMsg: Connect from con(0x800660b830) proc(0x800660b190) pid() proto(10:2:1:1)
    [    CSSD]2012-03-21 12:14:00.501 [106332800] >TRACE: clssgmClientConnectMsg: Connect from con(0x8006611630) proc(0x8006611210) pid() proto(10:2:1:1)
    [    CSSD]2012-03-21 12:15:00.809 [106332800] >TRACE: clssgmClientConnectMsg: Connect from con(0x8006611630) proc(0x8006628670) pid() proto(10:2:1:1)
    [    CSSD]2012-03-21 12:16:01.117 [106332800] >TRACE: clssgmClientConnectMsg: Connect from con(0x8006611630) proc(0x8006628670) pid() proto(10:2:1:1)
    [    CSSD]2012-03-21 12:17:01.447 [106332800] >TRACE: clssgmClientConnectMsg: Connect from con(0x8006611630) proc(0x800662f0f0) pid() proto(10:2:1:1)
    [    CSSD]2012-03-21 12:18:01.762 [106332800] >TRACE: clssgmClientConnectMsg: Connect from con(0x8006611630) proc(0x8006628670) pid() proto(10:2:1:1)
    [    CSSD]2012-03-21 12:18:39.841 [106332800] >TRACE: clssgmClientConnectMsg: Connect from con(0x8006611630) proc(0x8006628670) pid() proto(10:2:1:1)
    [    CSSD]2012-03-21 12:18:42.123 [106332800] >TRACE: clssgmClientConnectMsg: Connect from con(0x800660b830) proc(0x8006628670) pid() proto(10:2:1:1)
    [    CSSD]2012-03-21 12:18:42.316 [106332800] >TRACE: clssgmClientConnectMsg: Connect from con(0x8006611630) proc(0x8006628670) pid() proto(10:2:1:1)
    [    CSSD]2012-03-21 12:18:42.843 [106332800] >TRACE: clssgmClientConnectMsg: Connect from con(0x8006611630) proc(0x8006628670) pid() proto(10:2:1:1)
    [    CSSD]2012-03-21 12:18:42.963 [106332800] >TRACE: clssgmClientConnectMsg: Connect from con(0x800660b830) proc(0x8006628670) pid() proto(10:2:1:1)
    [    CSSD]2012-03-21 12:18:43.098 [106332800] >TRACE: clssgmClientConnectMsg: Connect from con(0x800660b260) proc(0x800662bd20) pid() proto(10:2:1:1)
    [    CSSD]2012-03-21 12:18:44.173 [106332800] >TRACE: clssgmClientConnectMsg: Connect from con(0x800660b830) proc(0x8006628670) pid() proto(10:2:1:1)
    [    CSSD]2012-03-21 12:18:44.368 [106332800] >TRACE: clssgmClientConnectMsg: Connect from con(0x800660b260) proc(0x800660b310) pid() proto(10:2:1:1)
    [    CSSD]2012-03-21 12:18:45.351 [106332800] >TRACE: clssgmClientConnectMsg: Connect from con(0x800660b830) proc(0x8006628670) pid() proto(10:2:1:1)
    [    CSSD]2012-03-21 12:18:46.236 [106332800] >TRACE: clssgmClientConnectMsg: Connect from con(0x800660b830) proc(0x800662f0f0) pid() proto(10:2:1:1)
    [    CSSD]2012-03-21 12:18:47.031 [106332800] >TRACE: clssgmClientConnectMsg: Connect from con(0x800660b830) proc(0x800662f0f0) pid() proto(10:2:1:1)
    [    CSSD]2012-03-21 12:18:47.694 [106332800] >TRACE: clssgmClientConnectMsg: Connect from con(0x800660b830) proc(0x800662f0f0) pid() proto(10:2:1:1)
    [    CSSD]2012-03-21 12:18:47.819 [106332800] >TRACE: clssgmClientConnectMsg: Connect from con(0x800660b260) proc(0x800660b310) pid() proto(10:2:1:1)
    [    CSSD]2012-03-21 12:18:48.103 [106332800] >TRACE: clssgmClientConnectMsg: Connect from con(0x800660b830) proc(0x800662f0f0) pid() proto(10:2:1:1)
    [    CSSD]2012-03-21 12:18:48.327 [106332800] >TRACE: clssgmClientConnectMsg: Connect from con(0x800660b260) proc(0x800660b310) pid() proto(10:2:1:1)
    [    CSSD]2012-03-21 12:18:48.484 [106332800] >TRACE: clssgmClientConnectMsg: Connect from con(0x800660b830) proc(0x8006611210) pid() proto(10:2:1:1)
    [    CSSD]2012-03-21 12:18:48.758 [106332800] >TRACE: clssgmClientConnectMsg: Connect from con(0x8006611630) proc(0x800662f0f0) pid() proto(10:2:1:1)
    [    CSSD]2012-03-21 12:18:49.529 [106332800] >TRACE: clssgmClientConnectMsg: Connect from con(0x800660b830) proc(0x800662f0f0) pid() proto(10:2:1:1)
    [    CSSD]2012-03-21 12:18:50.509 [106332800] >TRACE: clssgmClientConnectMsg: Connect from con(0x8006611630) proc(0x800662f0f0) pid() proto(10:2:1:1)
    [    CSSD]2012-03-21 12:18:51.060 [106332800] >TRACE: clssgmClientConnectMsg: Connect from con(0x800660b830) proc(0x800662f0f0) pid() proto(10:2:1:1)
    [    CSSD]2012-03-21 12:18:51.558 [106332800] >TRACE: clssgmClientConnectMsg: Connect from con(0x8006611630) proc(0x800662f0f0) pid() proto(10:2:1:1)
    [    CSSD]2012-03-21 12:48:39.836 >USER: Oracle Database 10g CSS Release 10.2.0.1.0 Production Copyright 1996, 2004 Oracle. All rights reserved.
    [  clsdmt]Listening to (ADDRESS=(PROTOCOL=ipc)(KEY=ctmisdb1DBG_CSSD))
    [    CSSD]2012-03-21 12:48:39.836 >USER: CSS daemon log for node ctmisdb1, number 1, in cluster crs
    [    CSSD]2012-03-21 12:48:39.849 [28260544] >TRACE: clssscmain: local-only set to false
    [    CSSD]2012-03-21 12:48:39.865 [28260544] >TRACE: clssnmReadNodeInfo: added node 1 (ctmisdb1) to cluster
    [    CSSD]2012-03-21 12:48:39.872 [28260544] >TRACE: clssnmReadNodeInfo: added node 2 (ctmisdb2) to cluster
    [    CSSD]2012-03-21 12:48:39.879 [72925824] >TRACE: clssnm_skgxnmon: skgxn init failed, rc 1
    [    CSSD]2012-03-21 12:48:39.879 [28260544] >TRACE: clssnm_skgxnonline: Using vacuous skgxn monitor
    [    CSSD]2012-03-21 12:48:39.881 [28260544] >TRACE: clssnmInitNMInfo: misscount set to 60
    [    CSSD]2012-03-21 12:48:39.888 [28260544] >TRACE: clssnmDiskStateChange: state from 1 to 2 disk (0//dev/raw/raw2)
    [    CSSD]2012-03-21 12:48:41.892 [72925824] >TRACE: clssnmDiskStateChange: state from 2 to 4 disk (0//dev/raw/raw2)
    [    CSSD]2012-03-21 12:48:41.915 [72925824] >TRACE: clssnmReadDskHeartbeat: node(2) is down. rcfg(20) wrtcnt(10367) LATS(0) Disk lastSeqNo(10367)
    [    CSSD]2012-03-21 12:48:41.959 [28260544] >TRACE: clssnmFatalInit: fatal mode enabled
    [    CSSD]2012-03-21 12:48:41.959 [94777984] >TRACE: clssnmconnect: connecting to node 1, flags 0x0001, connector 1
    [    CSSD]2012-03-21 12:48:41.959 [94777984] >TRACE: clssnmconnect: connecting to node 0, flags 0x0000, connector 1
    [    CSSD]2012-03-21 12:48:41.959 [94777984] >TRACE: clssnmClusterListener: Probing node(2)
    [    CSSD]2012-03-21 12:48:41.961 [94777984] >TRACE: clssnmConnComplete: connected to node 2 (con 0x8006702790), state 3 birth 0, unique 1332303918/1332303918 prevConuni(0)
    [    CSSD]2012-03-21 12:48:41.962 [106332800] >TRACE: clssgmclientlsnr: listening on (ADDRESS=(PROTOCOL=ipc)(KEY=Oracle_CSS_LclLstnr_crs_1))
    [    CSSD]2012-03-21 12:48:41.962 [106332800] >TRACE: clssgmclientlsnr: listening on (ADDRESS=(PROTOCOL=ipc)(KEY=OCSSD_LL_ctmisdb1_crs))
    [    CSSD]2012-03-21 12:48:41.963 [152330880] >TRACE: clssnmPollingThread: Connection complete
    [    CSSD]2012-03-21 12:48:41.963 [162816640] >TRACE: clssnmSendingThread: Connection complete
    [    CSSD]2012-03-21 12:48:41.963 [173302400] >TRACE: clssnmRcfgMgrThread: Connection complete
    [    CSSD]2012-03-21 12:48:41.963 [173302400] >TRACE: clssnmRcfgMgrThread: Local Join
    [    CSSD]2012-03-21 12:48:41.963 [173302400] >WARNING: clssnmLocalJoinEvent: takeover aborted due to connected but inactive nodes
    [    CSSD]2012-03-21 12:48:42.631 [94777984] >TRACE: clssnmHandleSync: Acknowledging sync: src[2] srcName[ctmisdb2] seq[13] sync[20]
    [    CSSD]2012-03-21 12:48:42.965 [173302400] >TRACE: clssnmRcfgMgrThread: lastleader(2) unique(1332314319)
    [    CSSD]2012-03-21 12:48:43.636 [94777984] >TRACE: clssnmSendVoteInfo: node(2) syncSeqNo(20)
    [    CSSD]2012-03-21 12:48:45.640 [94777984] >TRACE: clssnmUpdateNodeState: node 0, state (0/0) unique (0/0) prevConuni(0) birth (0/0) (old/new)
    [    CSSD]2012-03-21 12:48:45.640 [94777984] >TRACE: clssnmDeactivateNode: node 0 () left cluster
    [    CSSD]2012-03-21 12:48:45.640 [94777984] >TRACE: clssnmUpdateNodeState: node 1, state (1/2) unique (1332314319/1332314319) prevConuni(0) birth (0/20) (old/new)
    [    CSSD]2012-03-21 12:48:45.640 [94777984] >TRACE: clssnmUpdateNodeState: node 2, state (4/3) unique (1332303918/1332303918) prevConuni(0) birth (0/16) (old/new)
    [    CSSD]2012-03-21 12:48:45.640 [94777984] >USER: clssnmHandleUpdate: SYNC(20) from node(2) completed
    [    CSSD]2012-03-21 12:48:45.640 [94777984] >USER: clssnmHandleUpdate: NODE 1 (ctmisdb1) IS ACTIVE MEMBER OF CLUSTER
    [    CSSD]2012-03-21 12:48:45.640 [94777984] >USER: clssnmHandleUpdate: NODE 2 (ctmisdb2) IS ACTIVE MEMBER OF CLUSTER
    [    CSSD]2012-03-21 12:48:45.737 [28260544] >USER: NMEVENT_SUSPEND [00][00][00][00]
    [    CSSD]2012-03-21 12:48:45.738 [183788160] >TRACE: clssgmReconfigThread: started for reconfig (20)
    [    CSSD]2012-03-21 12:48:45.738 [183788160] >USER: NMEVENT_RECONFIG [00][00][00][06]
    [    CSSD]2012-03-21 12:48:45.738 [183788160] >TRACE: clssgmEstablishConnections: 2 nodes in cluster incarn 20
    [    CSSD]2012-03-21 12:48:45.739 [140776064] >TRACE: clssgmInitialRecv: (0x102a0370) accepted a new connection from node 2 born at 16 active (2, 2), vers (10,3,1,2)
    [    CSSD]2012-03-21 12:48:45.739 [140776064] >TRACE: clssgmInitialRecv: conns done (2/2)
    [    CSSD]2012-03-21 12:48:45.739 [183788160] >TRACE: clssgmEstablishMasterNode: MASTER for 20 is node(2) birth(16)
    [    CSSD]2012-03-21 12:48:45.739 [183788160] >TRACE: clssgmChangeMasterNode: requeued 0 RPCs
    [    CSSD]2012-03-21 12:48:45.741 [140776064] >TRACE: clssgmHandleDBDone(): src/dest (2/65535) size(72) incarn 20
    [    CSSD]CLSS-3000: reconfiguration successful, incarnation 20 with 2 nodes
    Plz check and help..........

  • RAC nodes rebooting

    I'm newbie, and trying to implement 11g RAC using openfiler on E-linux 5.3
    I have so far successfully configured openfiler, created volumes and configured the nodes, configured ocfs2 and ASM.
    When I rebooted the machines, I first started the openfiler server and external storage they start fine and all volumes(devices) comes up fine, but when I boot the nodes one after the other, they are rebooting after couple of minutes continuously one after other , I am clue less, how to figure out what is the problem, why is this happening, has any one else experienced similar situatio? , how can this be resolved?
    I would appreciate any advise or help
    Thanks

    what is difference in timings on your rac nodes...any thing > 45 secs can possibly cause reboots.
    check you disktimeouts.. and hangcheck timer settings
    hth

  • If use MSSQ , when oracle rac node reboot, client get TPEOS error

    Hi, all
    in my tuxedo applicaton, if we use Single Server, Single Queue mode , when reboot any Oracle RAC node, our application is ok, client can get correct result. but if we use MSSQ(Multi Server, Single Queue) , if Oracle RAC node is ok , our application also is ok. but if we reboot any Oracle RAC node, client program can continue run, get correct result, but always get TPEOS error , for this situation, server can get client request, but client can not get server reply, only get TPEOS error.
    our enviroment is :
    oracle RAC ,10g 10.2.0.4 , two instances ,rac1 rac2, and two DTP services s1 and s2, set s1 and s2 services TAF is basic
    tuxedo 10R3 , two nodes ,work in MP model ,use XA access oracle rac database,services have Transaction and not Transaction
    OS is linux AS4 U5, 64bits
    service program use OCI
    can any one encounter this problem ?

    Hi, first thanks you
    in ULOG file , only have failover information, not any other error message, in client side also has no other error.
    not use MSSQ, ubb file about MSSQ config
    SERVERS
    DEFAULT:
    CLOPT="-A "
    sinUpdate_server SRVGRP=GROUP11 SRVID=80 MIN=10 MAX=30 MAXGEN=10 GRACE=10 RESTART=Y
    sinUpdate_server SRVGRP=GROUP12 SRVID=160 MIN=10 MAX=30 MAXGEN=10 GRACE=10 RESTART=Y
    sinCount_server SRVGRP=GROUP11 SRVID=240 MIN=10 MAX=30 MAXGEN=10 GRACE=10 RESTART=Y
    sinCount_server SRVGRP=GROUP12 SRVID=320 MIN=10 MAX=30 MAXGEN=10 GRACE=10 RESTART=Y
    sinSelect_server SRVGRP=GROUP11 SRVID=360 MIN=10 MAX=30 MAXGEN=10 GRACE=10 RESTART=Y
    sinSelect_server SRVGRP=GROUP12 SRVID=400 MIN=10 MAX=30 MAXGEN=10 GRACE=10 RESTART=Y
    sinInsert_server SRVGRP=GROUP11 SRVID=520 MIN=10 MAX=30 MAXGEN=10 GRACE=10 RESTART=Y
    sinInsert_server SRVGRP=GROUP12 SRVID=560 MIN=10 MAX=30 MAXGEN=10 GRACE=10 RESTART=Y
    sinDelete_server SRVGRP=GROUP11 SRVID=600 MIN=10 MAX=30 MAXGEN=10 GRACE=10 RESTART=Y
    sinDelete_server SRVGRP=GROUP12 SRVID=640 MIN=10 MAX=30 MAXGEN=10 GRACE=10 RESTART=Y
    sinDdl_server SRVGRP=GROUP11 SRVID=700 MIN=5 MAX=30 MAXGEN=10 GRACE=10 RESTART=Y
    sinDdl_server SRVGRP=GROUP12 SRVID=740 MIN=5 MAX=30 MAXGEN=10 GRACE=10 RESTART=Y
    lockselect_server SRVGRP=GROUP11 SRVID=800 MIN=10 MAX=30 MAXGEN=10 GRACE=10 RESTART=Y
    lockselect_server SRVGRP=GROUP12 SRVID=840 MIN=10 MAX=30 MAXGEN=10 GRACE=10 RESTART=Y
    #mulup_server SRVGRP=GROUP11 SRVID=1 MIN=2 MAX=30 MAXGEN=10 GRACE=10 RESTART=Y
    #mulup_server SRVGRP=GROUP12 SRVID=60 MIN=2 MAX=30 MAXGEN=10 GRACE=10 RESTART=Y
    sinUpdate_server SRVGRP=GROUP13 SRVID=83 MIN=10 MAX=30 MAXGEN=10 GRACE=10 RESTART=Y
    sinUpdate_server SRVGRP=GROUP14 SRVID=164 MIN=10 MAX=30 MAXGEN=10 GRACE=10 RESTART=Y
    sinCount_server SRVGRP=GROUP13 SRVID=243 MIN=10 MAX=30 MAXGEN=10 GRACE=10 RESTART=Y
    sinCount_server SRVGRP=GROUP14 SRVID=324 MIN=10 MAX=30 MAXGEN=10 GRACE=10 RESTART=Y
    sinSelect_server SRVGRP=GROUP13 SRVID=363 MIN=10 MAX=30 MAXGEN=10 GRACE=10 RESTART=Y
    sinSelect_server SRVGRP=GROUP14 SRVID=404 MIN=10 MAX=30 MAXGEN=10 GRACE=10 RESTART=Y
    sinInsert_server SRVGRP=GROUP13 SRVID=523 MIN=10 MAX=30 MAXGEN=10 GRACE=10 RESTART=Y
    sinInsert_server SRVGRP=GROUP14 SRVID=564 MIN=10 MAX=30 MAXGEN=10 GRACE=10 RESTART=Y
    sinDelete_server SRVGRP=GROUP13 SRVID=603 MIN=10 MAX=30 MAXGEN=10 GRACE=10 RESTART=Y
    sinDelete_server SRVGRP=GROUP14 SRVID=644 MIN=10 MAX=30 MAXGEN=10 GRACE=10 RESTART=Y
    sinDdl_server SRVGRP=GROUP13 SRVID=703 MIN=5 MAX=30 MAXGEN=10 GRACE=10 RESTART=Y
    sinDdl_server SRVGRP=GROUP14 SRVID=744 MIN=5 MAX=30 MAXGEN=10 GRACE=10 RESTART=Y
    lockselect_server SRVGRP=GROUP13 SRVID=803 MIN=10 MAX=30 MAXGEN=10 GRACE=10 RESTART=Y
    lockselect_server SRVGRP=GROUP14 SRVID=844 MIN=10 MAX=30 MAXGEN=10 GRACE=10 RESTART=Y
    #mulup_server SRVGRP=GROUP13 SRVID=13 MIN=2 MAX=30 MAXGEN=10 GRACE=10 RESTART=Y
    #mulup_server SRVGRP=GROUP14 SRVID=64 MIN=2 MAX=30 MAXGEN=10 GRACE=10 RESTART=Y
    WSL SRVGRP=GROUP11 SRVID=1000
    CLOPT="-A -- -n//120.3.8.237:7200 -I 60 -T 60 -w WSH -m 50 -M 100 -x 6 -N 3600"
    WSL SRVGRP=GROUP12 SRVID=1001
    CLOPT="-A -- -n//120.3.8.238:7200 -I 60 -T 60 -w WSH -m 50 -M 100 -x 6 -N 3600"
    WSL SRVGRP=GROUP13 SRVID=1003
    CLOPT="-A -- -n//120.3.8.237:7203 -I 60 -T 60 -w WSH -m 50 -M 100 -x 6 -N 3600"
    WSL SRVGRP=GROUP14 SRVID=1004
    CLOPT="-A -- -n//120.3.8.238:7204 -I 60 -T 60 -w WSH -m 50 -M 100 -x 6 -N 3600"
    if we use MSSQ ,ubb file about MSSQ config is
    *SERVERS
    DEFAULT:
    CLOPT="-A -p 1,60:1,30"
    sinUpdate_server SRVGRP=GROUP11 SRVID=80 MIN=10 MAX=30 MAXGEN=10 GRACE=10 RESTART=Y RQADDR=sinUpdate11 REPLYQ=Y
    sinUpdate_server SRVGRP=GROUP12 SRVID=160 MIN=10 MAX=30 MAXGEN=10 GRACE=10 RESTART=Y RQADDR=sinUpdate12 REPLYQ=Y
    sinCount_server SRVGRP=GROUP11 SRVID=240 MIN=10 MAX=30 MAXGEN=10 GRACE=10 RESTART=Y RQADDR=sinCount11 REPLYQ=Y
    sinCount_server SRVGRP=GROUP12 SRVID=320 MIN=10 MAX=30 MAXGEN=10 GRACE=10 RESTART=Y RQADDR=sinCount12 REPLYQ=Y
    sinSelect_server SRVGRP=GROUP11 SRVID=360 MIN=10 MAX=30 MAXGEN=10 GRACE=10 RESTART=Y RQADDR=sinSelec11 REPLYQ=Y
    sinSelect_server SRVGRP=GROUP12 SRVID=400 MIN=10 MAX=30 MAXGEN=10 GRACE=10 RESTART=Y RQADDR=sinSelect12 REPLYQ=Y
    sinInsert_server SRVGRP=GROUP11 SRVID=520 MIN=10 MAX=30 MAXGEN=10 GRACE=10 RESTART=Y RQADDR=sinInsert11 REPLYQ=Y
    sinInsert_server SRVGRP=GROUP12 SRVID=560 MIN=10 MAX=30 MAXGEN=10 GRACE=10 RESTART=Y RQADDR=sinInsert12 REPLYQ=Y
    sinDelete_server SRVGRP=GROUP11 SRVID=600 MIN=10 MAX=30 MAXGEN=10 GRACE=10 RESTART=Y RQADDR=sinDelete11 REPLYQ=Y
    sinDelete_server SRVGRP=GROUP12 SRVID=640 MIN=10 MAX=30 MAXGEN=10 GRACE=10 RESTART=Y RQADDR=sinDelete12 REPLYQ=Y
    sinDdl_server SRVGRP=GROUP11 SRVID=700 MIN=5 MAX=30 MAXGEN=10 GRACE=10 RESTART=Y RQADDR=sinDdl11 REPLYQ=Y
    sinDdl_server SRVGRP=GROUP12 SRVID=740 MIN=5 MAX=30 MAXGEN=10 GRACE=10 RESTART=Y RQADDR=sinDdl12 REPLYQ=Y
    lockselect_server SRVGRP=GROUP11 SRVID=800 MIN=10 MAX=30 MAXGEN=10 GRACE=10 RESTART=Y RQADDR=lockselect11 REPLYQ=Y
    lockselect_server SRVGRP=GROUP12 SRVID=840 MIN=10 MAX=30 MAXGEN=10 GRACE=10 RESTART=Y RQADDR=lockselect12 REPLYQ=Y
    #mulup_server SRVGRP=GROUP11 SRVID=1 MIN=2 MAX=30 MAXGEN=10 GRACE=10 RESTART=Y RQADDR=mulup11 REPLYQ=Y
    #mulup_server SRVGRP=GROUP12 SRVID=60 MIN=2 MAX=30 MAXGEN=10 GRACE=10 RESTART=Y RQADDR=mulup12 REPLYQ=Y
    sinUpdate_server SRVGRP=GROUP13 SRVID=83 MIN=10 MAX=30 MAXGEN=10 GRACE=10 RESTART=Y RQADDR=sinUpdate13 REPLYQ=Y
    sinUpdate_server SRVGRP=GROUP14 SRVID=164 MIN=10 MAX=30 MAXGEN=10 GRACE=10 RESTART=Y RQADDR=sinUpdate14 REPLYQ=Y
    sinCount_server SRVGRP=GROUP13 SRVID=243 MIN=10 MAX=30 MAXGEN=10 GRACE=10 RESTART=Y RQADDR=sinCount13 REPLYQ=Y
    sinCount_server SRVGRP=GROUP14 SRVID=324 MIN=10 MAX=30 MAXGEN=10 GRACE=10 RESTART=Y RQADDR=sinCount14 REPLYQ=Y
    sinSelect_server SRVGRP=GROUP13 SRVID=363 MIN=10 MAX=30 MAXGEN=10 GRACE=10 RESTART=Y RQADDR=sinSelec13 REPLYQ=Y
    sinSelect_server SRVGRP=GROUP14 SRVID=404 MIN=10 MAX=30 MAXGEN=10 GRACE=10 RESTART=Y RQADDR=sinSelect14 REPLYQ=Y
    sinInsert_server SRVGRP=GROUP13 SRVID=523 MIN=10 MAX=30 MAXGEN=10 GRACE=10 RESTART=Y RQADDR=sinInsert13 REPLYQ=Y
    sinInsert_server SRVGRP=GROUP14 SRVID=564 MIN=10 MAX=30 MAXGEN=10 GRACE=10 RESTART=Y RQADDR=sinInsert14 REPLYQ=Y
    sinDelete_server SRVGRP=GROUP13 SRVID=603 MIN=10 MAX=30 MAXGEN=10 GRACE=10 RESTART=Y RQADDR=sinDelete13 REPLYQ=Y
    sinDelete_server SRVGRP=GROUP14 SRVID=644 MIN=10 MAX=30 MAXGEN=10 GRACE=10 RESTART=Y RQADDR=sinDelete14 REPLYQ=Y
    sinDdl_server SRVGRP=GROUP13 SRVID=703 MIN=5 MAX=30 MAXGEN=10 GRACE=10 RESTART=Y RQADDR=sinDdl13 REPLYQ=Y
    sinDdl_server SRVGRP=GROUP14 SRVID=744 MIN=5 MAX=30 MAXGEN=10 GRACE=10 RESTART=Y RQADDR=sinDdl14 REPLYQ=Y
    lockselect_server SRVGRP=GROUP13 SRVID=803 MIN=10 MAX=30 MAXGEN=10 GRACE=10 RESTART=Y RQADDR=lockselect13 REPLYQ=Y
    lockselect_server SRVGRP=GROUP14 SRVID=844 MIN=10 MAX=30 MAXGEN=10 GRACE=10 RESTART=Y RQADDR=lockselect14 REPLYQ=Y
    #mulup_server SRVGRP=GROUP13 SRVID=13 MIN=2 MAX=30 MAXGEN=10 GRACE=10 RESTART=Y RQADDR=mulup13 REPLYQ=Y
    #mulup_server SRVGRP=GROUP14 SRVID=64 MIN=2 MAX=30 MAXGEN=10 GRACE=10 RESTART=Y RQADDR=mulup14 REPLYQ=Y
    WSL SRVGRP=GROUP11 SRVID=1000
    CLOPT="-A -- -n//120.3.8.237:7200 -I 60 -T 60 -w WSH -m 50 -M 100 -x 6 -N 3600"
    WSL SRVGRP=GROUP12 SRVID=1001
    CLOPT="-A -- -n//120.3.8.238:7200 -I 60 -T 60 -w WSH -m 50 -M 100 -x 6 -N 3600"
    WSL SRVGRP=GROUP13 SRVID=1003
    CLOPT="-A -- -n//120.3.8.237:7203 -I 60 -T 60 -w WSH -m 50 -M 100 -x 6 -N 3600"
    WSL SRVGRP=GROUP14 SRVID=1004
    CLOPT="-A -- -n//120.3.8.238:7204 -I 60 -T 60 -w WSH -m 50 -M 100 -x 6 -N 3600"
    about above ubb file ,has any error ? or not correct use MSSQ
    look forward to you answer,thanks.

  • TAF Failover issue when RAC node shutdown

    Dear all,
    We have a two-node RAC database. We use sqlplus from a client laptop to test RAC TAF failover when one node is being shutdown. And there's a tnsnames.ora file with TAF settings in the client laptop.
    First we connect to RAC database via sqlplus, when we are under the "SQL>" command prompt, we type " select instance_name from v$instance; " and we can see what instance we truely connect to. Then we shutdown the node we truely connect to; At the meanwhile, if we type "select instance_name from v$instance;" again right away, sometimes the sqlplus hangs and with no response; but if we wait utill the VIP failover to another node then type "select instance_name from v$instance;" we can see it always show the other node's instance name and we know the session is successfully failover to the healthy node.
    My question is :
    Does RAC TAF failover can always and "no down time" failover the session to another healthy node? Or there are some circumstances that the session would hang and need to connect again?
    Any help would be appreciated.

    Hi, thanks for your help.
    There are many things you have to do but if you don't have the knowledge will be difficult.Right. The cluster was setup by consultants but we're still trying to pick up basic Oracle knowledge by self study...
    Found some messages about eviction in old cssd logs in $ORA_CRS_HOME/log/cssd/. Will further dig into it.
    Yes, we tried rebooting different nodes many times in the clusters before, without any problem.
    Thanks a lot.
    /ST Wong

  • Exception while failing over to 2nd RAC Node

    We are using Weblogic 10.3.4. Our setup is that we have a Web Application (A tapestry front end Web UI) and EJb 2.1 back-end talking to the Oracle database. The EJB’s are CMP. Our product always was just stand alone and it wasn’t until this release we needed to make it work with RAC. To get this to work we followed the model of having a Multidatasource with datasources pointing to our RAC nodes. We have two types of datasources that we use persistent and non-persistent. And we are using the Oracle thin driver – non-XA for RAC Service Instances, supporting global transactions.
    When we do failover to the 2nd node we get a nasty exception in our GUI but after logging out and logging back it we are fine.
    My question is that I assumed I shouldn't have to restart our web-application and it should have stayed up ?? Or is there something wrong with our setup ?
    Thanks,
    Ian

    Showing us the exception and/or the error messages at the server might help...
    Note that failing over does not save any ongoing connection or transaction that
    had been to the dead RAC node... Does your web-app get-use-close JDBC
    connections on a per-user-invoke basis, or does it hold onto connections?
    Joe

  • Private Interconnect: Should any nodes other than RAC nodes have one?

    The contractors that set up our four-node production 10g RAC (and a standalone development server) also assigned private interconnect addresses to 2 Apache/ApEx servers and a standalone development database server.
    There are service names in the tnsnames.ora on all servers in our infrastructure referencing these private interconnects- even the non-rac member servers. The nics on these servers are not bound for failover with the nics bound to the public/VIP addresses. These nics are isolated on their own switch.
    Could this configuration be related to lost heartbeats or voting disk errors? We experience rac node expulsions and even arbitrary bounces (reboots!) of all the rac nodes.

    I do not have access to the contractors. . . .can only look at what they have left behind and try to figure out their intention. . .
    I am reading the Ault/Tumha book Oracle 10g Grid and Real Application Clusters and looking through our own settings and config files and learning srvctl and crsctl commands from their examples. Also googling and OTN searching through the library full of documentation. . .
    I still have yet to figure out if the private interconnect spoken about so frequently in cluster configuration documents are the binding to the set of node.vip address specifications in the tnsnames.ora (bound the the first eth adaptor along with the public ip addresses for the nodes) or the binding on the second eth adaptor to the node.prv addresses not found in the local pfile, in the tnsnames.ora, or the listener.ora (but found at the operating system level in the ifconfig). If the node.prv addresses are not the private interconnect then can anyone tell me that they are for?

  • Huge number of idle connections from loopback ip on oracle RAC node

    Hi,
    We have a 2node 11gR2(11.2.0.3) oracle RAC node. We are seeing huge number of idle connection(more than 5000 in each node) on both the nodes and increasing day by day. All the idle connections are from VIP and loopback address(127.0.0.1.47971 )
    netstat -an |grep -i idle|more
    127.0.0.1.47971 Idle
    any insight will be helpful.
    The server is suffering memory issues occasionally (once in a month).
    ORA-27300: OS system dependent operation:fork failed with status: 11
    ORA-27301: OS failure message: Resource temporarily unavailable
    Thanks

    user12959884 wrote:
    Hi,
    We have a 2node 11gR2(11.2.0.3) oracle RAC node. We are seeing huge number of idle connection(more than 5000 in each node) on both the nodes and increasing day by day. All the idle connections are from VIP and loopback address(127.0.0.1.47971 )
    netstat -an |grep -i idle|more
    127.0.0.1.47971 Idle
    any insight will be helpful.
    The server is suffering memory issues occasionally (once in a month).
    ORA-27300: OS system dependent operation:fork failed with status: 11
    ORA-27301: OS failure message: Resource temporarily unavailable
    Thankswe can not control what occurs on your DB Server.
    How do I ask a question on the forums?
    SQL and PL/SQL FAQ
    post results from following SQL
    SELECT * FROM V$VERSION;

  • RAC node restarting!

    hi
    one of our RAC environment keep restarting.
    i've disable the init.cssd, init.crs, init.evmd in the /etc/inittab in order to check the logs.
    this is the situation:
    crsd.log:
    2009-02-04 00:09:00.118: [ COMMCRS][9]clsc_connect: (8000000100318640) no listener at (ADDRESS=(PROTOCOL=ipc)(KEY=OCSSD_LL_node1_loud))
    2009-02-04 00:09:00.132: [ CSSCLNT][1]clsssInitNative: connect failed, rc 9
    2009-02-04 00:09:00.134: [  CRSRTI][1]32CSS is not ready. Received status 3 from CSS. Waiting for good status ..
    2009-02-04 00:09:08.016: [    CRSD][1]32Daemon Version: 10.2.0.2.0 Active Version: 10.2.0.2.0
    2009-02-04 00:09:08.016: [    CRSD][1]32Active Version and Software Version are same
    2009-02-04 00:09:08.017: [ CRSMAIN][1]32Initializing OCR
    2009-02-04 00:09:08.037: [  OCRRAW][1]proprioo: for disk 0 (/dev/rdsk/ora_ocr_raw), id match (1), my id set (752560621,1028247821) total id sets (1), 1st set
    (752560621,1028247821), 2nd set (0,0) my votes (2), total votes (2)
    2009-02-04 00:09:08.140: [ CSSCLNT][24]clssgsGroupJoin: CSS has not reached fatal mode.Registration is not yet safe. Retrying
    ocssd.log:
    [    CSSD]2009-02-03 21:52:08.651 [9] >USER: clssnmHandleUpdate: NODE 1 (node1l) IS ACTIVE MEMBER OF CLUSTER
    [    CSSD]2009-02-03 21:52:08.651 [9] >TRACE: clssnmHandleUpdate: diskTimeout set to (200000)ms
    [    CSSD]2009-02-03 21:52:08.651 [16] >TRACE: clssnmWaitForAcks: done, msg type(15)
    [    CSSD]2009-02-03 21:52:08.651 [16] >TRACE: clssnmDoSyncUpdate: Sync Complete!
    [    CSSD]2009-02-03 21:52:08.722 [1] >USER: NMEVENT_SUSPEND [00][00][00][00]
    [    CSSD]2009-02-03 21:52:08.724 [17] >TRACE: clssgmReconfigThread: started for reconfig (1)
    [    CSSD]2009-02-03 21:52:08.749 [17] >USER: NMEVENT_RECONFIG [00][00][00][02]
    [    CSSD]2009-02-03 21:52:08.749 [17] >TRACE: clssgmEstablishConnections: 1 nodes in cluster incarn 1
    [    CSSD]2009-02-03 21:52:08.751 [13] >TRACE: clssgmPeerListener: connects done (1/1)
    [    CSSD]2009-02-03 21:52:08.752 [17] >TRACE: clssgmEstablishMasterNode: MASTER for 1 is node(1) birth(1)
    [    CSSD]2009-02-03 21:52:08.752 [17] >TRACE: clssgmChangeMasterNode: requeued 0 RPCs
    [    CSSD]2009-02-03 21:52:08.752 [17] >TRACE: clssgmMasterCMSync: Synchronizing group/lock status
    [    CSSD]2009-02-03 21:52:08.752 [17] >TRACE: clssgmMasterSendDBDone: group/lock status synchronization complete
    [    CSSD]CLSS-3000: reconfiguration successful, incarnation 1 with 1 nodes
    [    CSSD]CLSS-3001: local node number 1, master node number 1
    [    CSSD]2009-02-03 21:52:08.753 [17] >TRACE: clssgmReconfigThread: completed for reconfig(1), with status(1)
    [    CSSD]2009-02-03 21:52:08.863 [10] >TRACE: clssgmClientConnectMsg: Connect from con(80000001008fd2a0) proc(8000000100ae26a8) pid() proto(10:2:1:1)
    [    CSSD]2009-02-03 21:52:08.864 [10] >TRACE: clssgmClientConnectMsg: Connect from con(8000000100ae0128) proc(8000000100ae2a10) pid() proto(10:2:1:1) from con(8000000100aa32c0) proc(8000000100aa5b90) pid() proto(10:2:1:1)
    alertlog:
    [cssd(2535)]CRS-1601:CSSD Reconfiguration complete. Active nodes are node1 .
    2009-02-03 23:55:20.821
    [cssd(2575)]CRS-1605:CSSD voting file is online: /dev/rdsk/ora_voting_raw. Detai ls in /work/crs/product/10.2/crs/log/lourmel/cssd/ocssd.log.
    2009-02-03 23:55:28.376
    evmd.log:
    Oracle Database 10g CRS Release 10.2.0.2.0 Production Copyright 1996, 2004, Oracle. All rights reserved
    2009-02-04 00:08:58.331: [    EVMD][1]32EVMD waiting for CSS to be ready err = 3
    2009-02-04 00:08:59.939: [ COMMCRS][9]clsc_connect: (800000010007d658) no listener at (ADDRESS=(PROTOCOL=ipc)(KEY=OCSSD_LL_node1_loud))
    2009-02-04 00:08:59.946: [ CSSCLNT][1]clsssInitNative: connect failed, rc 9
    2009-02-04 00:08:59.948: [    EVMD][1]32EVMD waiting for CSS to be ready err = 3
    2009-02-04 00:09:07.596: [ CSSCLNT][1]clssgsGroupJoin: CSS has not reached fatal mode.Registration is not yet safe. Retrying
    syslog:
    Feb 4 00:08:41 lourmel syslog: Oracle Cluster Ready Services starting up automatically.
    Feb 4 00:08:45 lourmel sfd[2153]: starting the daemon.
    Feb 4 00:08:45 lourmel su: + tty?? root-orac
    Feb 4 00:08:45 lourmel krsd[2152]: Delay time is 300 seconds
    Feb 4 00:08:43 lourmel syslog: Oracle Cluster Ready Services starting up automatically.
    Feb 4 00:08:52 lourmel above message repeats 2 times
    Feb 4 00:08:52 lourmel syslog: Cluster Ready Services completed waiting on dependencies.
    Feb 4 00:08:53 lourmel syslog: Running CRSD with TZ =
    when i checked(befor the restart) the command crs_stat i got the message:
    ORA-0184: Cannot communicate wirh CRS
    crsctl check crs gives us:
    Failure 1 contacting CSS daemon
    Cannot communicate with CRS
    Cannot communicate with EVM
    as i said befor, the machine always restarting
    anyone have an idea?? please

    Dear All,
    I recently upgrade the Few RAC setups with Oracle 10g Patchset 3 (10.2.0.4) on Linux Servers
    In one of the RAC setup, found servers are rebooting daily. The same setup was working fine and problem started only after applying the Patchset. Checked all the logs and Found nothing relevant.
    Then i checked the things which added with this Patchset.
    The Most interesting found , Oracle Added a New Daemon- oprocd.
    # ps -efl | grep oprocd
    4 S root 6440 6063 0 -40 - - 2114 - Mar03 ? 00:00:00 /opt/oracle/product/10.2.0/crs/bin/oprocd.bin run -t 1000 -m 500 -hsi 5:10:50:75:90 -f
    These are Interesting Points about above line
    1.This Process is running by root user
    2. With Highest Priority -40
    3. Probing every Seconds (t 1000)
    4. waiting CPU response for 500 Milliseconds ( -m 500 means margin time is 500 Milli Seconds)
    5. Process status is Fatal (-f)
    Now I am concluding these points- This daemon will probe cpu every second and wait for response within 500 Mill seconds. If in the 500 Milli second not getting any response from the cpu, will assume the CPU is hang and try to Reboot the Machine. The OPERATING SYSTEM will not get enough time to write the system logs and server reboots.
    So the solution is increase the Margin time for 500 Milli second to 10 seconds.
    These are following steps to increase the Margin time.
    Please Remember- The Modification process need Downtime and You need to stop cluster service in all member nodes.
    1. Stop The CRS Process
    #crsctl stop crs
    #<CRS_HOME>/bin/oprocd stop
    2. Ensure that Clusterware stack is down and not running
    #ps -ef |egrep "crsd.bin|ocssd.bin|evmd.bin|oprocd"
    This should return no processes.
    3. From one node of the cluster, change the value of the "diagwait" parameter to 13 by issuing the command as root:
    #crsctl set css diagwait 13 -force
    4. Check if diagwait is successfully set.
    #crsctl get css diagwait
    5. Restart the Oracle Clusterware on all the nodes by executing:
    #crsctl start crs
    (Note- If facing any problem to restarting the CRS services, ASM and Database, You can reboot the Nodes.The Cluster and Database will come automatically due to init startup scripts.)
    6. The oprocd daemon process will show with -m 10000
    # ps -efl| grep oprocd
    # 4 S root 6440 6063 0 -40 - - 2114 - Feb02 ? 00:00:00 /opt/oracle/product/10.2.0/crs/bin/oprocd.bin run -t 1000 -m 10000 -hsi 5:10:50:75:90 -f
    Rollback Procedure-
    If You need to unset oprocd value due any reason
    #crsctl unset css diagwait
    I am confident, The abnormal RAC Node restart problem will solve with this workaround.
    Regards,
    Sumit
    Bangalore,India

  • RAC node is not starting

    I have 2 node RAC instance in 2 virtual machines. My node 2 ASM instance is not starting after the reboot but node 1 is working fine.
    I got below output from Node 2.
    srvctl status asm
    PRCR-1070 : Failed to check if resource ora.asm is registered
    Cannot communicate with crsd
    alertlog
    [/u01/app/11.2.0/grid/bin/cssdagent(3782)]CRS-5818:Aborted command 'start for resource: ora.cssd 1 1' for resource 'ora.cssd'. Details at (:CRSAGF00113:) in /u01/app/11.2.0/grid/log/ol5-112-rac2/agent/ohasd/oracssdagent_root/oracssdagent_root.log.
    2015-03-13 16:38:23.671
    [ohasd(3315)]CRS-2757:Command 'Start' timed out waiting for response from the resource 'ora.cssd'. Details at (:CRSPE00111:) in /u01/app/11.2.0/grid/log/ol5-112-rac2/ohasd/ohasd.log.
    2015-03-13 16:38:24.264
    [ohasd(3315)]CRS-2765:Resource 'ora.cssdmonitor' has failed on server 'ol5-112-rac2'.
    2015-03-13 16:38:35.915
    [cssd(4326)]CRS-1713:CSSD daemon is started in clustered mode
    2015-03-13 16:38:41.195
    [cssd(4326)]CRS-1707:Lease acquisition for node ol5-112-rac2 number 2 completed
    2015-03-13 16:38:41.259
    [cssd(4326)]CRS-1605:CSSD voting file is online: /dev/oracleasm/disks/DISK1; details in /u01/app/11.2.0/grid/log/ol5-112-rac2/cssd/ocssd.log.
    2015-03-13 16:38:41.562
    [crsd(4285)]CRS-1013:The OCR location in an ASM disk group is inaccessible. Details in /u01/app/11.2.0/grid/log/ol5-112-rac2/crsd/crsd.log.
    2015-03-13 16:38:42.217
    [ohasd(3315)]CRS-2765:Resource 'ora.crsd' has failed on server 'ol5-112-rac2'.
    2015-03-13 16:38:43.313
    [crsd(4357)]CRS-1013:The OCR location in an ASM disk group is inaccessible. Details in /u01/app/11.2.0/grid/log/ol5-112-rac2/crsd/crsd.log.
    2015-03-13 16:38:44.262
    [ohasd(3315)]CRS-2765:Resource 'ora.crsd' has failed on server 'ol5-112-rac2'.
    2015-03-13 16:38:44.716
    [ohasd(3315)]CRS-2765:Resource 'ora.diskmon' has failed on server 'ol5-112-rac2'.
    crsd log
    2015-03-13 16:39:02.140: [  OCRASM][1266976496]proprasmo: kgfoCheckMount returned [7]
    2015-03-13 16:39:02.140: [  OCRASM][1266976496]proprasmo: The ASM instance is down
    2015-03-13 16:39:02.140: [  OCRRAW][1266976496]proprioo: Failed to open [+DATA]. Returned proprasmo() with [26]. Marking location as UNAVAILABLE.
    2015-03-13 16:39:02.140: [  OCRRAW][1266976496]proprioo: No OCR/OLR devices are usable
    2015-03-13 16:39:02.140: [  OCRASM][1266976496]proprasmcl: asmhandle is NULL
    2015-03-13 16:39:02.140: [  OCRRAW][1266976496]proprinit: Could not open raw device
    2015-03-13 16:39:02.140: [  OCRASM][1266976496]proprasmcl: asmhandle is NULL
    2015-03-13 16:39:02.140: [  OCRAPI][1266976496]a_init:16!: Backend init unsuccessful : [26]
    2015-03-13 16:39:02.140: [  CRSOCR][1266976496] OCR context init failure.  Error: PROC-26: Error while accessing the physical storage ASM error [SLOS: cat=7, opn=kgfoAl06, dep=15077, loc=kgfokge
    ORA-15077: could not locate ASM instance serving a required diskgroup
    ] [7]
    2015-03-13 16:39:02.140: [    CRSD][1266976496][PANIC] CRSD exiting: Could not init OCR, code: 26
    2015-03-13 16:39:02.140: [    CRSD][1266976496] Done
    Pls help

    Hi Levi,
    I got below output from cssd.log
    2015-03-15 10:38:48.248: [ GIPCNET][1214789952]gipcmodNetworkProcessConnect: slos op  :  sgipcnTcpConnect
    2015-03-15 10:38:48.248: [ GIPCNET][1214789952]gipcmodNetworkProcessConnect: slos dep :  No route to host (113)
    2015-03-15 10:38:48.248: [ GIPCNET][1214789952]gipcmodNetworkProcessConnect: slos loc :  connect
    2015-03-15 10:38:48.248: [ GIPCNET][1214789952]gipcmodNetworkProcessConnect: slos info:  addr '192.168.1.101:42729'
    2015-03-15 10:38:48.248: [    CSSD][1214789952]clssscSelect: conn complete ctx 0x24d7aa0 endp 0x10d4
    2015-03-15 10:38:48.248: [    CSSD][1214789952]clssnmeventhndlr: node(1), endp(0x10d4) failed, probe((nil)) ninf->endp (0x1000010d4) CONNCOMPLETE
    2015-03-15 10:38:48.248: [    CSSD][1214789952]clssnmDiscHelper: ol5-112-rac1, node(1) connection failed, endp (0x10d4), probe(0x100000000), ninf->endp 0x10d4
    2015-03-15 10:38:48.248: [    CSSD][1214789952]clssnmDiscHelper: node 1 clean up, endp (0x10d4), init state 0, cur state 0
    2015-03-15 10:38:48.248: [GIPCXCPT][1214789952]gipcInternalDissociate: obj 0x27c8050 [00000000000010d4] { gipcEndpoint : localAddr 'gipc://ol5-112-rac2:de93-de83-5e0c-a373#192.168.1.102#50439', remoteAddr 'gipc://ol5-112-rac1:nm_ol5-112-scan#192.168.1.101#42729', numPend 0, numReady 0, numDone 0, numDead 0, numTransfer 0, objFlags 0x0, pidPeer 0, flags 0x8061a, usrFlags 0x0 } not associated with any container, ret gipcretFail (1)
    2015-03-15 10:38:48.248: [GIPCXCPT][1214789952]gipcDissociateF [clssnmDiscHelper : clssnm.c : 3215]: EXCEPTION[ ret gipcretFail (1) ]  failed to dissociate obj 0x27c8050 [00000000000010d4] { gipcEndpoint : localAddr 'gipc://ol5-112-rac2:de93-de83-5e0c-a373#192.168.1.102#50439', remoteAddr 'gipc://ol5-112-rac1:nm_ol5-112-scan#192.168.1.101#42729', numPend 0, numReady 0, numDone 0, numDead 0, numTransfer 0, objFlags 0x0, pidPeer 0, flags 0x8061a, usrFlags 0x0 }, flags 0x0
    2015-03-15 10:38:48.248: [    CSSD][1214789952]clssnmDiscEndp: gipcDestroy 0x10d4
    2015-03-15 10:38:48.414: [    CSSD][1147648320]clssnmvDHBValidateNCopy: node 1, ol5-112-rac1, has a disk HB, but no network HB, DHB has rcfg 320087493, wrtcnt, 207342, LATS 355744, lastSeqNo 207342, uniqueness 1426244111, timestamp 1426396128/31164744
    2015-03-15 10:38:48.414: [    CSSD][1214789952]clssnmconnect: connecting to addr gipc://ol5-112-rac1:nm_ol5-112-scan#192.168.1.101#42729
    2015-03-15 10:38:48.414: [    CSSD][1214789952]clssscConnect: endp 0x10e0 - cookie 0x24d7aa0 - addr gipc://ol5-112-rac1:nm_ol5-112-scan#192.168.1.101#42729
    2015-03-15 10:38:48.414: [    CSSD][1214789952]clssnmconnect: connecting to node(1), endp(0x10e0), flags 0x10002
    2015-03-15 10:38:48.710: [    CSSD][1181219136]clssgmWaitOnEventValue: after CmInfo State  val 3, eval 1 waited 0
    2015-03-15 10:38:49.209: [    CSSD][1206397248]clssnmRcfgMgrThread: Local Join
    2015-03-15 10:38:49.209: [    CSSD][1206397248]clssnmLocalJoinEvent: begin on node(2), waittime 193000
    2015-03-15 10:38:49.209: [    CSSD][1206397248]clssnmLocalJoinEvent: set curtime (356544) for my node
    2015-03-15 10:38:49.209: [    CSSD][1206397248]clssnmLocalJoinEvent: scanning 32 nodes
    2015-03-15 10:38:49.209: [    CSSD][1206397248]clssnmLocalJoinEvent: Node ol5-112-rac1, number 1, is in an existing cluster with disk state 3
    2015-03-15 10:38:49.210: [    CSSD][1206397248]clssnmLocalJoinEvent: takeover aborted due to cluster member node found on disk
    Thanks,
    Suranga

  • Solaris RAC nodes re-booting

    I have a pre-production 2-node cluster running on Solaris 10, Oracle 10.2.0.3 with the Oracle CRS, and using a NetApp filer as the shared storage.
    I also have a separate Solaris server running Grid Control 10.2.0.3, with the repository as one of the databases on the RAC (don't know if this is relevant to my problem).
    Periodically both RAC nodes reboot, with no trace of why (the GC server is fine). There is nothing logged in the Solaris logs (messages file), CRS logs, Oracle logs or the NetApp logs.
    All that is shown is the relevant service starting up following the shutdown.
    Has anyone any experience of this, or any thoughts on which component may cause such an issue?
    Thanks in advance
    Bob

    What type of Sun hardware are you using?
    Below is the Action Plan Oracle support sent me on my SR on this issue, not sure if any of this was provided to you or would be of help.
    ACTION PLAN
    ============
    1. there is nothing on the files at all that sheds any light on the issue
    agian 3 sperate sets of clusters all losing all nodes at the same tiem is a very strange occurance. Please be sure to have the admin look for
    anything in common wiht all custers.
    2. advice placing oswatcher on the systems Note.301137.1 Ext/Pub OS Watcher User Guide
    if we should have another occurances we will want the oswatcher logs for 1 hr before issue thru issue
    also see if the unix admin perhaps has any os stats from this occurance
    3. advice settign ntpd to run with -x option I do see that you are having negative time changes
    at times
    -x will give us a skew rather then an abbrupt time change
    4. advice setting this when you can
    Please do the following
    set the diagwait parameter:
    crsctl set css diagwait N [-force]
    Where N is the number of seconds to wait for a filesystem sync to
    complete (after this wait the node will reboot regardless of whether the
    sync has completed). This change must be made with the clusterware
    down, which will require the '-force', or with the stack up on just 1
    node, after which the stack on that node must be restarted before the
    stack starts up on any of the other nodes.
    N should be set to 25 (25 seconds)
    5. advice that you have with pcw mlr#6 Patch 5980915 on the systems as well
    but I do not believe that this was an oracle bug the reason for placing the patch on is for advanced diagnostics that is in that patchset
    6. the two issues sun is workking on
    Sun is working to resolve a time skew issue and a Solaris 10 kernel SIGALRM Sun#6292092 in addition to Sun#6595936.
    7. we do have a diagnostic oprocd that soem sites have used but on thier test systems. It stops reboots adn dumps information but I have
    been hesitant to place it on production boxes if you continue to have issues we may consider download the oprocd_skewfix_noreboot fro
    m Bug 6279879 but at this time I do not belvve that is warrented

  • Error during RAC Node Registration: WB_RT_SERVICE_MANAGEMENT missed

    hi to all,
    can you help me by OWB-Installation on RAC?
    I start on Unix-Server (Node1) with:
    Oracle Database 11g Enterprise Edition Release 11.1.0.7.0 - 64bit Production
    PL/SQL Release 11.1.0.7.0 - Production
    CORE     11.1.0.7.0     Production
    TNS for Linux: Version 11.1.0.7.0 - Production
    NLSRTL Version 11.1.0.7.0 - Production
    owb/bin/unix/reposinst.sh.
    Now in Repository Assistant -> <IP of HOST1>:1521:<Service Name of RAC> -> Register a RAC Instance -> Finish.
    And now during Registration an error appear:
    Error occurred during RAC Node Registration. Exception =java.lang.Exception: java.sql.SQLException: ORA-06550: line 7, column 14:
    PLS-00201: identifier 'WB_RT_SERVICE_MANAGEMENT.ADD_NODE' must be declared;
    I am going to select all objects in database:
    SQL> select * from dba_objects where lower(object_name) like '%wb_rt%';
    no rows selected
    All components are installed:
    COMP_ID COMP_NAME VERSION
    OWB OWB 11.1.0.7.0
    APEX Oracle Application Express 3.0.1.00.1
    EM Oracle Enterprise Manager 11.1.0.7.0
    WK Oracle Ultra Search 11.1.0.7.0
    AMD OLAP Catalog 11.1.0.7.0
    SDO Spatial 11.1.0.7.0
    ORDIM Oracle Multimedia 11.1.0.7.0
    XDB Oracle XML Database 11.1.0.7.0
    CONTEXT Oracle Text 11.1.0.7.0
    EXF Oracle Expression Filter 11.1.0.7.0
    RUL Oracle Rules Manager 11.1.0.7.0
    OWM Oracle Workspace Manager 11.1.0.7.0
    CATALOG Oracle Database Catalog Views 11.1.0.7.0
    CATPROC Oracle Database Packages and Types 11.1.0.7.0
    JAVAVM JServer JAVA Virtual Machine 11.1.0.7.0
    XML Oracle XDK 11.1.0.7.0
    CATJAVA Oracle Database Java Packages 11.1.0.7.0
    APS OLAP Analytic Workspace 11.1.0.7.0
    XOQ Oracle OLAP API 11.1.0.7.0
    RAC Oracle Real Application Clusters 11.1.0.7.0
    What i must and can do?
    Edited by: AndreyT on 26.04.2010 05:23

    No. Must I install a Repository with Server's Repository Administrator (on Unix - Node 1) or with Client's Repository Administrator (on Windows)? Wie configure i Control Center ?
    And yet one question. I have Oracle 11.1.0.7.0 auf Unix with OWB 11.1.0.7.0. There are two Versions of OWB Standalone Software on Oracle-Site to download: OWB 11.2.0.1.0 and 11.1.0.6.0. What one must i download to client computer to install? Work 11.2.0.1.0 on client with 11.1.0.7.0 on Server?
    Must
    Edited by: AndreyT on 27.04.2010 04:12
    Edited by: AndreyT on 27.04.2010 12:46

  • Is it possible to move some of the capture processes to another rac node?

    Hi All,
    Is it possible to move some of the ODI (Oracle Data Integrator) capture processes running on node1 to node2. Once moved does it work as usual or not? If its possible please provide me with steps.
    Appreciate your response
    Best Regards
    SK.

    Hi Cezar,
    Thanks for your post. I have a related question regarding this,
    Is it really necessary to have multiple capture and multiple apply processes? One for each schema in ODI? Because if set to automatic configuration, ODI seems to create a capture and a related apply process for each schema, which I guess leads to our specific performance problem (high cpu etc) I mentioned in my other post: Re: Is it possible to move some of the capture processes to another rac node?
    Is there way to use just one capture and one apply process for all of the schemas in ODI?
    Thanks a million.
    Edited by: oyigit on Nov 6, 2009 5:31 AM

  • What is best use of 1400 gb SGA (2 rac nodes 768gb each)

    currently using 11.2.0.3.0 on unix sun sever with 2 RAC nodes each 8 UltraSPARC-T1 cpus (came out in 2005) four threads each so oracle sees 32 CPUS very slow(1.2 gb).  Database is 4TB in size on regular SAN (10k speed).
    8gb SGA.
    New boss wants to update system to the max to get best performance possible  Money is a concern of course but budget is pretty high,  Our use case is 12-16 users at same time, running reports some small others very large (return single row or 10000s or rows).  reports take 5 sec to 5 minutes, Our job is get the fastest system possible,  We have total of 8 licenses available so we can have 16 cores.  We are also getting a 6tb all flash SSD array for database.  we can get any CPU we want but we cant use parallel query server due to all kinds of issues we have experienced (too many slaves, RAC interconnect saturation etc, whack-a-mole).  sparc has too many threads and without PS oracle runs query in single thread. 
    we have speced out the following system for each RAC node
    HP ProLiant DL380p Gen8 8 SFF server
    2 Intel Xeon E5-2637v2 3.5GHz/4-core cpus
    768 gb ram
    2 HP 300GB 6G SAS 15K drives for database software
    this will give us total of 4 Xeon E5-2637v2 cpus 16 cores total (,5 factor for 8 licenses) and 1536 ram (leaving ~1400 for sga).  this will guarantee an available core for each user.  we intend to create very very large keep pool around 300 gb for each node that will hold all our dimension tables.  this we hope will reduce reads from the SSD to just data from fact tables.,
    Are we doing a massive overkill here?  the budget for this was way less than what our boss expected.  will that big an sga be wasted will say a 256gb be fine.  or will oracle take advantage of it and be able to keep most blocks in there.
    will an sga that big cause oracle problems due to overhead of handling that much ram?

    Current System:
    ===========
    a. Version : 11.2.0.3
    b. Unix Sun
    c. CPU - 8 cpus with 4 threads => 32 logical cpus or cores
    d. database 4TB
    e. SAN - 10k speed disk drives
    f. 8gb SGA
    g. 1.2 gb ??
    h. Users --> 12-16 concurrent and run reports varying size
    i. reports elasped time 5 sec to 5 mins
    j. cpu license -->8
    Target System
    ===========
    a. Version: 11.2.0.3
    b. HP ProLiant DL380p Gen8 8 SFF server
    c. RAM --> 768 GB
    d. 2 HP 300GB 6G SAS 15K drives for database software
    e. large keep pool -->90 gb to  hold all dimension tables. 
    f.  SSD to just data from fact tables
    g. SGA -->256gb
    Reassessment of the performance issues of current system appears to be required.Good performance tuning expert is required to look into tuning issues of current application by analyzing awr performance metrics . If 8GB SGA is not enough,then reason behind so is that queries running in the system are not having good access path to select lesser data to avoid flushing out of recent buffers from different tables involved in the query. Until those issues are identified , wherever you go, performance issue wont be going away as table size increase in future , problem will reappear.Even if the queries are running with more FULL Scan , then re-platforming to Exadata might be right decision as Exadata has smart scan , cell offloading feature which works faster and might be right direction for best performance and best investment for future.Compression (compress for OLTP) could be one of the other feature to exploit to improve further efficiency while reading the lesser block in lesser read time.
    Investment in infrastructure will solve a few issue in short term but long term issue will again arise.
    Investment in identifying the performance issues of current system would be best investment in current scenario.

Maybe you are looking for

  • Partner profile is not getting updated in FSCM.

    Hi All, We are using FSCM for credit management. We have a issue wherein partner profile are not created for a customer. We created a customer and hwne checked we found that the partner profile is not there for the customer. We are checking the maste

  • HT1600 no longer mirrors with my devices

    airplay no longer works... it was working fine today and all of a sudden it stopped working.  I've restarted my apple tv and even restored to its original factory setting and nothing.  please help

  • [SOLVED] System hung while mdadm is stabalizing

    I've got an external 8 bay SATA RAID chassis (connected to my PC with 2 x eSATA) that I'm trying to expand.  It had 5 x 2TB drives (WD20EARS) configured as RAID5.   I added 3 more 2TB drives. I did an mdadm --grow /dev/md3 --number-devices=7 I'd like

  • 100% Discount in Free of Cost Invoice??

    Hi, There is a requirement of passing the total excise duty charged in an FOC invoice to an expenses G/L which is currently debiting the customer account. I created condition type R100 by copying K007 and also created an account key for assignment to

  • No default project!

    Hi, I have SunOS 5.9 machine and I cannot log into it even with the root account. When I supply a valid password, it displays: No default project! and prompt for login again. A QA who worked on this box told me that he did uninstall of a program and