Oracle RAC reboots

Hi,
I have a two-node cluster with raw devices as its storage. An ASM instance and three database instances run on these servers. The OS is Solaris 10 and the DB version is 10.2.0.3. The problem is that the servers reboot by themselves at intervals. I am not sure what to look at to fix this. The only thing I know is that the CSSD process is failing and causing the servers to reboot. This is a production environment. Any kind of help will be appreciated.
Thanks

The following is what I got from the Sun people:
The system was rebooted by the Oracle CSSD process. This appears to be because the system was CPU-bottlenecked: the CPUs (only 2 on the system) were overloaded, with about 50 runnable threads on each CPU dispatch queue. Most of the threads in the CPU dispatch queues belong to Oracle.
The customer needs to contact Oracle for further analysis and recommendations.
How can we reduce CPU load and find out whether CPU is actually the culprit?
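
One way to check the Sun diagnosis is to compare the run-queue length reported by `vmstat` with the number of CPUs. The sketch below is a hypothetical helper, not an Oracle or Sun tool. It assumes the Solaris `vmstat` layout, in which the first column (`r`) is the number of runnable kernel threads. The sample rows and the threshold of 4 runnable threads per CPU are made-up illustrations, not measurements from this system.

```python
def run_queue_per_cpu(vmstat_text, cpu_count):
    """Average the 'r' column (runnable threads) and divide by the CPU count."""
    samples = []
    for line in vmstat_text.splitlines():
        fields = line.split()
        # Data rows are all-numeric; the two header rows are skipped.
        if fields and all(f.isdigit() for f in fields):
            samples.append(int(fields[0]))
    if not samples:
        raise ValueError("no vmstat data rows found")
    return sum(samples) / len(samples) / cpu_count


def looks_cpu_bound(vmstat_text, cpu_count, threshold=4.0):
    """threshold is an assumed rule of thumb, not a vendor-defined limit."""
    return run_queue_per_cpu(vmstat_text, cpu_count) > threshold


# Hypothetical vmstat output, shaped like the situation Sun described.
SAMPLE_VMSTAT = """\
 kthr      memory            page            disk          faults      cpu
 r b w   swap  free  re  mf pi po fr de sr s0 s1 s2 s3   in   sy   cs us sy id
 98 0 0 4123456 812345 10 20 0 0 0 0 0 1 0 0 0 900 5000 3000 80 20 0
 102 0 0 4123000 811000 12 25 0 0 0 0 0 1 0 0 0 950 5200 3100 82 18 0
"""

print(run_queue_per_cpu(SAMPLE_VMSTAT, cpu_count=2))
```

With real data, the first data row of a `vmstat` run is an average since boot and is usually discarded before averaging. A sustained value far above 1 per CPU during the minutes before a reboot would support the CPU-starvation theory; a low value would point elsewhere.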

Similar Messages

  • Oracle RAC Nodes getting reboot in case of preferred controller failed

    When we disconnect both fiber cables from preferred Controller A, or pull the Controller A card out of the disk array (IBM DS 4300), both servers reboot after 90 seconds.
    During this time the complete RAC network is out of service for approx 5 minutes. After the reboot, both servers come back up with both instances without any manual intervention.
    It’s a critical issue for us because we are losing high availability. Let us know how we can resolve it.
    Detail of Network:
    1. Software- Oracle 10g Release2
    2. OS- Redhat Linux 3 (Kernel Version-2.4.21-27.ELsmp)
    3. Shared Storage- IBM DS 4300.
    4. Multipathing Driver - RDAC (rdac-LINUX-09.00 A5.13)
    5. Nodes- IBM 346
    6. Database on ASM
    7. ASM, OCR & Voting Disk preferred controller is A.
    8. Hangcheck timer value is 210 seconds.
    9. Both servers have 2 HBA ports. The first HBA port is connected to Controller A and the second HBA port is connected to Controller B of the SAN disk array.
    As per my understanding:
    The voting disk resides in the disk array, and Controller A is the preferred owner of the voting disk LUN. When I disconnect both fiber cables from preferred Controller A, the Clusterware software on both nodes tries to contact the voting disk. When the nodes cannot reach the voting disk within a specific time period, they reboot.
    I tested controller failure both with Oracle RAC and without Oracle. Without Oracle it works fine, because the disk array waits approx 300 seconds before changing the preferred controller from A to B.
    But with Oracle, the Clusterware software reboots both nodes before the controller can shift from A to B.
    So, to conclude, someone with a good understanding of Oracle Clusterware on Linux and the IBM RDAC multipath driver should be able to help me.
    When we install Oracle RAC on Linux, it is required to configure the hangcheck timer.
    Oracle recommends 180 seconds.
    It means that if one of the nodes hangs, the second node will wait for 180 seconds; if the situation is not resolved within 180 seconds, the hung node is rebooted.
    I think hangcheck timer configuration is required only on Linux.
    Configuration File
    cat >> /etc/rc.d/rc.local << EOF
    modprobe hangcheck-timer hangcheck_tick=15 hangcheck_margin=60
    EOF

    Sorry, the hangcheck timer setting is actually:
    Configuration File
    cat >> /etc/rc.d/rc.local << EOF
    modprobe hangcheck-timer hangcheck_tick=30 hangcheck_margin=180
    EOF
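
The timings quoted in this post can be lined up with a few lines of arithmetic. This is only a sketch: the function names are hypothetical, and it assumes the usual reading of the hangcheck-timer parameters, namely that the module resets the node it runs on when the kernel stalls for longer than `hangcheck_tick + hangcheck_margin` seconds. The 90-second reboot and the 300-second controller failover are the values reported in the post.

```python
def hangcheck_limit(tick_s, margin_s):
    """Longest stall tolerated before hangcheck-timer resets the node (assumed rule)."""
    return tick_s + margin_s


def first_to_fire(timeouts):
    """Return the (name, seconds) pair with the smallest timeout."""
    return min(timeouts.items(), key=lambda item: item[1])


timeouts = {
    "observed node reboot": 90,
    "hangcheck-timer (tick=30, margin=180)": hangcheck_limit(30, 180),
    "array controller failover": 300,
}

for name, seconds in sorted(timeouts.items(), key=lambda item: item[1]):
    print(f"{seconds:4d} s  {name}")
```

Under these assumptions the reboot at 90 seconds comes well before the 210-second hangcheck limit, so the hangcheck timer is unlikely to be what resets the nodes. That would point at the Clusterware's own voting-disk I/O timeout being shorter than the 300-second controller failover.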

  • Oracle RAC 10.2G reboots node every 45 minutes

    Hello:
    - We have installed Oracle RAC 10.2G for Solaris X86 ( 64 bit ).
    - On one node, there are no issues. But the other node ( I think )
    is being rebooted by CRS every 45 minutes or so.
    - Is this issue caused by some misconfiguration I did during the install ?
    - Or is there a patch available to fix this ?
    - Has anyone else encountered this problem ?
    Thanks
    jlem

    Hello:
    - I re-installed Oracle RAC. The nodes were only rebooted once so far.
    So, the second install may be ok. If not, I have provided answers to the first email reply.
    - Any help given is most welcome. In meantime, I will continue searching the oracle forums
    for solutions.
    - My environment is:
    - both nodes are running under vmware ESX server version 3.0.1
    - the shared storage for OCR and Voting Disk is a raw shared device under vmware
    - both nodes are using Solaris X86 5.10 update 5
    - Oracle version is: 10.2.0.3 ( patched from version 10.2.0.1 )
    - My public network configuration is:
    node 1:
    e1000g0: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 index 2
    inet 10.20.1.74 netmask ffff0000 broadcast 10.20.255.255
    ether 0:c:29:3a:45:a9
    e1000g0:1: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 index 2
    inet 10.20.1.77 netmask ffff0000 broadcast 10.20.255.255
    node 2:
    e1000g0: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 index 2
    inet 10.20.1.75 netmask ffff0000 broadcast 10.20.255.255
    ether 0:c:29:2b:db:90
    e1000g0:1: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 index 2
    inet 10.20.1.78 netmask ffff0000 broadcast 10.20.255.255
    - My private network configuration is:
    node 1:
    e1000g1: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 index 3
    inet 192.168.0.1 netmask ffffff00 broadcast 192.168.0.255
    ether 0:c:29:3a:45:b3
    node 2:
    e1000g1: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 index 3
    inet 192.168.0.2 netmask ffffff00 broadcast 192.168.0.255
    ether 0:c:29:2b:db:9a
    - My storage solution is:
    - 3 virtual shared SCSI hard disks ( each 500 MB in size )
    - My log files are:
    - /var/adm/messages
    - doesn't report much only the following:
    Nov 12 10:57:05 saucer nfs4cbd[328]: [ID 867284 daemon.notice] nfsv4 cannot determine local hostname binding for transport tcp6 - delegations will not be available on this transport
    Nov 12 10:57:21 saucer savecore: [ID 570001 auth.error] reboot after panic: forced crash dump initiated at user request
    Nov 12 10:57:21 saucer savecore: [ID 748169 auth.error] saving system crash dump in /var/crash/saucer/*.2
    Nov 12 10:57:41 saucer root: [ID 702911 user.error] Oracle Cluster Ready Services disabled by administrator.
    Nov 12 10:57:54 saucer rootnex: [ID 349649 kern.info] xsvc0 at root
    Nov 12 10:57:54 saucer genunix: [ID 936769 kern.info] xsvc0 is /xsvc
    - ocssd.log file for node1 indicates that node2 was evicted during an impending reconfig. Details are:
    [    CSSD]2008-11-12 10:55:43.700 [15] >TRACE: clssnmPollingThread: node saucer (2) is impending reconfig
    [    CSSD]2008-11-12 10:55:43.700 [15] >WARNING: clssnmPollingThread: node saucer (2) at 90% heartbeat fatal, eviction in 0.973 seconds
    [    CSSD]2008-11-12 10:55:44.679 [15] >TRACE: clssnmPollingThread: node saucer (2) is impending reconfig
    [    CSSD]2008-11-12 10:55:44.679 [15] >TRACE: clssnmPollingThread: Eviction started for node saucer (2), flags 0x000d, state 3, wt4c 0
    [    CSSD]2008-11-12 10:55:44.690 [17] >TRACE: clssnmDoSyncUpdate: Initiating sync 3
    [    CSSD]2008-11-12 10:55:44.690 [17] >TRACE: clssnmDoSyncUpdate: diskTimeout set to (27000)ms
    [    CSSD]2008-11-12 10:55:44.691 [17] >TRACE: clssnmSetupAckWait: Ack message type (11)
    [    CSSD]2008-11-12 10:55:44.691 [17] >TRACE: clssnmSetupAckWait: node(1) is ALIVE
    [    CSSD]2008-11-12 10:55:44.691 [17] >TRACE: clssnmSetupAckWait: node(2) is ALIVE
    [    CSSD]2008-11-12 10:55:44.691 [17] >TRACE: clssnmSendSync: syncSeqNo(3)
    - node2 ocssd.log does not indicate the problem. See below for details:
    [    CSSD]2008-11-12 10:52:34.731 [11] >TRACE: clssgmClientConnectMsg: Connect from con(da8410) proc(dab900) pid() proto(10:2:1:1)
    [    CSSD]2008-11-12 10:53:37.305 [11] >TRACE: clssgmClientConnectMsg: Connect from con(da8410) proc(dab900) pid() proto(10:2:1:1)
    [    CSSD]2008-11-12 10:54:40.515 [11] >TRACE: clssgmClientConnectMsg: Connect from con(da8410) proc(dab900) pid() proto(10:2:1:1)
    [    CSSD]2008-11-12 11:18:09.997 >USER: Oracle Database 10g CSS Release 10.2.0.3.0 Production Copyright 1996, 2004 Oracle. All rights reserved.
    [    CSSD]2008-11-12 11:18:09.997 >USER: CSS daemon log for node saucer, number 2, in cluster crs
    [  clsdmt]Listening to (ADDRESS=(PROTOCOL=ipc)(KEY=saucerDBG_CSSD))
    [    CSSD]2008-11-12 11:18:10.016 [1] >TRACE: clssscmain: local-only set to false
    [    CSSD]2008-11-12 11:18:10.031 [1] >TRACE: clssnmReadNodeInfo: added node 1 (flying) to cluster
    [    CSSD]2008-11-12 11:18:10.042 [1] >TRACE: clssnmReadNodeInfo: added node 2 (saucer) to cluster
    [    CSSD]2008-11-12 11:18:10.057 [5] >TRACE: clssnm_skgxnmon: skgxn init failed
    [    CSSD]2008-11-12 11:18:10.057 [1] >TRACE: clssnm_skgxnonline: Using vacuous skgxn monitor
    - ORACLE VERIFY: cluvfy was run on node2 resulting with the following:
    bash-3.00$ ./cluvfy comp ocr -n all -verbose
    Verifying OCR integrity
    Checking OCR integrity...
    Checking the absence of a non-clustered configuration...
    All nodes free of non-clustered, local-only configurations.
    Uniqueness check for OCR device passed.
    Checking the version of OCR...
    OCR of correct Version "2" exists.
    Checking data integrity of OCR...
    Data integrity check for OCR passed.
    OCR integrity check passed.
    Verification of OCR integrity was successful.
    bash-3.00$
    Thanks
    jlem
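
When eviction is suspected, it helps to pull only the heartbeat warnings out of ocssd.log and look at their timestamps. The following is a minimal sketch with a hypothetical function name. The pattern is modelled on the log lines quoted in this thread (both the "eviction in" and "removal in" wordings appear in the quoted logs) and may need adjusting for other Clusterware versions. It expects each log entry to be on a single line.

```python
import re

# Pattern derived from the ocssd.log excerpts quoted in this thread.
HEARTBEAT_RE = re.compile(
    r"(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}).*"
    r"node (?P<node>\S+) \((?P<num>\d+)\) at (?P<pct>\d+)% heartbeat fatal, "
    r"(?:eviction|removal) in (?P<secs>\d+\.\d+) seconds"
)


def heartbeat_warnings(log_text):
    """Return (timestamp, node, node_number, percent, seconds_left) tuples."""
    found = []
    for line in log_text.splitlines():
        match = HEARTBEAT_RE.search(line)
        if match:
            found.append((
                match["ts"],
                match["node"],
                int(match["num"]),
                int(match["pct"]),
                float(match["secs"]),
            ))
    return found


# One entry copied from the node1 log above, joined onto a single line.
SAMPLE_LOG = (
    "[    CSSD]2008-11-12 10:55:43.700 [15] >WARNING: clssnmPollingThread: "
    "node saucer (2) at 90% heartbeat fatal, eviction in 0.973 seconds"
)

for warning in heartbeat_warnings(SAMPLE_LOG):
    print(warning)
```

Comparing these timestamps with the private-interconnect and system logs of the evicted node for the same seconds is usually the next step, since the evicted node's own ocssd.log here stops without recording a cause.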

  • If use MSSQ , when oracle rac node reboot, client get TPEOS error

    Hi, all
    In my Tuxedo application, if we use Single Server, Single Queue mode, the application is fine when any Oracle RAC node is rebooted, and the client gets correct results. With MSSQ (Multi Server, Single Queue), the application is also fine as long as the Oracle RAC nodes are up. But if we reboot any Oracle RAC node, the client program continues to run, yet it always gets a TPEOS error. In this situation the server receives the client request, but the client cannot get the server reply; it only gets the TPEOS error.
    Our environment is:
    Oracle RAC 10g 10.2.0.4, two instances, rac1 and rac2, and two DTP services, s1 and s2; TAF for s1 and s2 is set to basic
    Tuxedo 10R3, two nodes, working in MP mode, using XA to access the Oracle RAC database; services are both transactional and non-transactional
    OS is Linux AS4 U5, 64-bit
    The service programs use OCI
    Has anyone encountered this problem?

    Hi, first, thank you.
    The ULOG file contains only failover information and no other error message; the client side shows no other error either.
    Without MSSQ, the SERVERS section of the ubb file is:
    *SERVERS
    DEFAULT:
    CLOPT="-A "
    sinUpdate_server SRVGRP=GROUP11 SRVID=80 MIN=10 MAX=30 MAXGEN=10 GRACE=10 RESTART=Y
    sinUpdate_server SRVGRP=GROUP12 SRVID=160 MIN=10 MAX=30 MAXGEN=10 GRACE=10 RESTART=Y
    sinCount_server SRVGRP=GROUP11 SRVID=240 MIN=10 MAX=30 MAXGEN=10 GRACE=10 RESTART=Y
    sinCount_server SRVGRP=GROUP12 SRVID=320 MIN=10 MAX=30 MAXGEN=10 GRACE=10 RESTART=Y
    sinSelect_server SRVGRP=GROUP11 SRVID=360 MIN=10 MAX=30 MAXGEN=10 GRACE=10 RESTART=Y
    sinSelect_server SRVGRP=GROUP12 SRVID=400 MIN=10 MAX=30 MAXGEN=10 GRACE=10 RESTART=Y
    sinInsert_server SRVGRP=GROUP11 SRVID=520 MIN=10 MAX=30 MAXGEN=10 GRACE=10 RESTART=Y
    sinInsert_server SRVGRP=GROUP12 SRVID=560 MIN=10 MAX=30 MAXGEN=10 GRACE=10 RESTART=Y
    sinDelete_server SRVGRP=GROUP11 SRVID=600 MIN=10 MAX=30 MAXGEN=10 GRACE=10 RESTART=Y
    sinDelete_server SRVGRP=GROUP12 SRVID=640 MIN=10 MAX=30 MAXGEN=10 GRACE=10 RESTART=Y
    sinDdl_server SRVGRP=GROUP11 SRVID=700 MIN=5 MAX=30 MAXGEN=10 GRACE=10 RESTART=Y
    sinDdl_server SRVGRP=GROUP12 SRVID=740 MIN=5 MAX=30 MAXGEN=10 GRACE=10 RESTART=Y
    lockselect_server SRVGRP=GROUP11 SRVID=800 MIN=10 MAX=30 MAXGEN=10 GRACE=10 RESTART=Y
    lockselect_server SRVGRP=GROUP12 SRVID=840 MIN=10 MAX=30 MAXGEN=10 GRACE=10 RESTART=Y
    #mulup_server SRVGRP=GROUP11 SRVID=1 MIN=2 MAX=30 MAXGEN=10 GRACE=10 RESTART=Y
    #mulup_server SRVGRP=GROUP12 SRVID=60 MIN=2 MAX=30 MAXGEN=10 GRACE=10 RESTART=Y
    sinUpdate_server SRVGRP=GROUP13 SRVID=83 MIN=10 MAX=30 MAXGEN=10 GRACE=10 RESTART=Y
    sinUpdate_server SRVGRP=GROUP14 SRVID=164 MIN=10 MAX=30 MAXGEN=10 GRACE=10 RESTART=Y
    sinCount_server SRVGRP=GROUP13 SRVID=243 MIN=10 MAX=30 MAXGEN=10 GRACE=10 RESTART=Y
    sinCount_server SRVGRP=GROUP14 SRVID=324 MIN=10 MAX=30 MAXGEN=10 GRACE=10 RESTART=Y
    sinSelect_server SRVGRP=GROUP13 SRVID=363 MIN=10 MAX=30 MAXGEN=10 GRACE=10 RESTART=Y
    sinSelect_server SRVGRP=GROUP14 SRVID=404 MIN=10 MAX=30 MAXGEN=10 GRACE=10 RESTART=Y
    sinInsert_server SRVGRP=GROUP13 SRVID=523 MIN=10 MAX=30 MAXGEN=10 GRACE=10 RESTART=Y
    sinInsert_server SRVGRP=GROUP14 SRVID=564 MIN=10 MAX=30 MAXGEN=10 GRACE=10 RESTART=Y
    sinDelete_server SRVGRP=GROUP13 SRVID=603 MIN=10 MAX=30 MAXGEN=10 GRACE=10 RESTART=Y
    sinDelete_server SRVGRP=GROUP14 SRVID=644 MIN=10 MAX=30 MAXGEN=10 GRACE=10 RESTART=Y
    sinDdl_server SRVGRP=GROUP13 SRVID=703 MIN=5 MAX=30 MAXGEN=10 GRACE=10 RESTART=Y
    sinDdl_server SRVGRP=GROUP14 SRVID=744 MIN=5 MAX=30 MAXGEN=10 GRACE=10 RESTART=Y
    lockselect_server SRVGRP=GROUP13 SRVID=803 MIN=10 MAX=30 MAXGEN=10 GRACE=10 RESTART=Y
    lockselect_server SRVGRP=GROUP14 SRVID=844 MIN=10 MAX=30 MAXGEN=10 GRACE=10 RESTART=Y
    #mulup_server SRVGRP=GROUP13 SRVID=13 MIN=2 MAX=30 MAXGEN=10 GRACE=10 RESTART=Y
    #mulup_server SRVGRP=GROUP14 SRVID=64 MIN=2 MAX=30 MAXGEN=10 GRACE=10 RESTART=Y
    WSL SRVGRP=GROUP11 SRVID=1000
    CLOPT="-A -- -n//120.3.8.237:7200 -I 60 -T 60 -w WSH -m 50 -M 100 -x 6 -N 3600"
    WSL SRVGRP=GROUP12 SRVID=1001
    CLOPT="-A -- -n//120.3.8.238:7200 -I 60 -T 60 -w WSH -m 50 -M 100 -x 6 -N 3600"
    WSL SRVGRP=GROUP13 SRVID=1003
    CLOPT="-A -- -n//120.3.8.237:7203 -I 60 -T 60 -w WSH -m 50 -M 100 -x 6 -N 3600"
    WSL SRVGRP=GROUP14 SRVID=1004
    CLOPT="-A -- -n//120.3.8.238:7204 -I 60 -T 60 -w WSH -m 50 -M 100 -x 6 -N 3600"
    With MSSQ, the SERVERS section of the ubb file is:
    *SERVERS
    DEFAULT:
    CLOPT="-A -p 1,60:1,30"
    sinUpdate_server SRVGRP=GROUP11 SRVID=80 MIN=10 MAX=30 MAXGEN=10 GRACE=10 RESTART=Y RQADDR=sinUpdate11 REPLYQ=Y
    sinUpdate_server SRVGRP=GROUP12 SRVID=160 MIN=10 MAX=30 MAXGEN=10 GRACE=10 RESTART=Y RQADDR=sinUpdate12 REPLYQ=Y
    sinCount_server SRVGRP=GROUP11 SRVID=240 MIN=10 MAX=30 MAXGEN=10 GRACE=10 RESTART=Y RQADDR=sinCount11 REPLYQ=Y
    sinCount_server SRVGRP=GROUP12 SRVID=320 MIN=10 MAX=30 MAXGEN=10 GRACE=10 RESTART=Y RQADDR=sinCount12 REPLYQ=Y
    sinSelect_server SRVGRP=GROUP11 SRVID=360 MIN=10 MAX=30 MAXGEN=10 GRACE=10 RESTART=Y RQADDR=sinSelec11 REPLYQ=Y
    sinSelect_server SRVGRP=GROUP12 SRVID=400 MIN=10 MAX=30 MAXGEN=10 GRACE=10 RESTART=Y RQADDR=sinSelect12 REPLYQ=Y
    sinInsert_server SRVGRP=GROUP11 SRVID=520 MIN=10 MAX=30 MAXGEN=10 GRACE=10 RESTART=Y RQADDR=sinInsert11 REPLYQ=Y
    sinInsert_server SRVGRP=GROUP12 SRVID=560 MIN=10 MAX=30 MAXGEN=10 GRACE=10 RESTART=Y RQADDR=sinInsert12 REPLYQ=Y
    sinDelete_server SRVGRP=GROUP11 SRVID=600 MIN=10 MAX=30 MAXGEN=10 GRACE=10 RESTART=Y RQADDR=sinDelete11 REPLYQ=Y
    sinDelete_server SRVGRP=GROUP12 SRVID=640 MIN=10 MAX=30 MAXGEN=10 GRACE=10 RESTART=Y RQADDR=sinDelete12 REPLYQ=Y
    sinDdl_server SRVGRP=GROUP11 SRVID=700 MIN=5 MAX=30 MAXGEN=10 GRACE=10 RESTART=Y RQADDR=sinDdl11 REPLYQ=Y
    sinDdl_server SRVGRP=GROUP12 SRVID=740 MIN=5 MAX=30 MAXGEN=10 GRACE=10 RESTART=Y RQADDR=sinDdl12 REPLYQ=Y
    lockselect_server SRVGRP=GROUP11 SRVID=800 MIN=10 MAX=30 MAXGEN=10 GRACE=10 RESTART=Y RQADDR=lockselect11 REPLYQ=Y
    lockselect_server SRVGRP=GROUP12 SRVID=840 MIN=10 MAX=30 MAXGEN=10 GRACE=10 RESTART=Y RQADDR=lockselect12 REPLYQ=Y
    #mulup_server SRVGRP=GROUP11 SRVID=1 MIN=2 MAX=30 MAXGEN=10 GRACE=10 RESTART=Y RQADDR=mulup11 REPLYQ=Y
    #mulup_server SRVGRP=GROUP12 SRVID=60 MIN=2 MAX=30 MAXGEN=10 GRACE=10 RESTART=Y RQADDR=mulup12 REPLYQ=Y
    sinUpdate_server SRVGRP=GROUP13 SRVID=83 MIN=10 MAX=30 MAXGEN=10 GRACE=10 RESTART=Y RQADDR=sinUpdate13 REPLYQ=Y
    sinUpdate_server SRVGRP=GROUP14 SRVID=164 MIN=10 MAX=30 MAXGEN=10 GRACE=10 RESTART=Y RQADDR=sinUpdate14 REPLYQ=Y
    sinCount_server SRVGRP=GROUP13 SRVID=243 MIN=10 MAX=30 MAXGEN=10 GRACE=10 RESTART=Y RQADDR=sinCount13 REPLYQ=Y
    sinCount_server SRVGRP=GROUP14 SRVID=324 MIN=10 MAX=30 MAXGEN=10 GRACE=10 RESTART=Y RQADDR=sinCount14 REPLYQ=Y
    sinSelect_server SRVGRP=GROUP13 SRVID=363 MIN=10 MAX=30 MAXGEN=10 GRACE=10 RESTART=Y RQADDR=sinSelec13 REPLYQ=Y
    sinSelect_server SRVGRP=GROUP14 SRVID=404 MIN=10 MAX=30 MAXGEN=10 GRACE=10 RESTART=Y RQADDR=sinSelect14 REPLYQ=Y
    sinInsert_server SRVGRP=GROUP13 SRVID=523 MIN=10 MAX=30 MAXGEN=10 GRACE=10 RESTART=Y RQADDR=sinInsert13 REPLYQ=Y
    sinInsert_server SRVGRP=GROUP14 SRVID=564 MIN=10 MAX=30 MAXGEN=10 GRACE=10 RESTART=Y RQADDR=sinInsert14 REPLYQ=Y
    sinDelete_server SRVGRP=GROUP13 SRVID=603 MIN=10 MAX=30 MAXGEN=10 GRACE=10 RESTART=Y RQADDR=sinDelete13 REPLYQ=Y
    sinDelete_server SRVGRP=GROUP14 SRVID=644 MIN=10 MAX=30 MAXGEN=10 GRACE=10 RESTART=Y RQADDR=sinDelete14 REPLYQ=Y
    sinDdl_server SRVGRP=GROUP13 SRVID=703 MIN=5 MAX=30 MAXGEN=10 GRACE=10 RESTART=Y RQADDR=sinDdl13 REPLYQ=Y
    sinDdl_server SRVGRP=GROUP14 SRVID=744 MIN=5 MAX=30 MAXGEN=10 GRACE=10 RESTART=Y RQADDR=sinDdl14 REPLYQ=Y
    lockselect_server SRVGRP=GROUP13 SRVID=803 MIN=10 MAX=30 MAXGEN=10 GRACE=10 RESTART=Y RQADDR=lockselect13 REPLYQ=Y
    lockselect_server SRVGRP=GROUP14 SRVID=844 MIN=10 MAX=30 MAXGEN=10 GRACE=10 RESTART=Y RQADDR=lockselect14 REPLYQ=Y
    #mulup_server SRVGRP=GROUP13 SRVID=13 MIN=2 MAX=30 MAXGEN=10 GRACE=10 RESTART=Y RQADDR=mulup13 REPLYQ=Y
    #mulup_server SRVGRP=GROUP14 SRVID=64 MIN=2 MAX=30 MAXGEN=10 GRACE=10 RESTART=Y RQADDR=mulup14 REPLYQ=Y
    WSL SRVGRP=GROUP11 SRVID=1000
    CLOPT="-A -- -n//120.3.8.237:7200 -I 60 -T 60 -w WSH -m 50 -M 100 -x 6 -N 3600"
    WSL SRVGRP=GROUP12 SRVID=1001
    CLOPT="-A -- -n//120.3.8.238:7200 -I 60 -T 60 -w WSH -m 50 -M 100 -x 6 -N 3600"
    WSL SRVGRP=GROUP13 SRVID=1003
    CLOPT="-A -- -n//120.3.8.237:7203 -I 60 -T 60 -w WSH -m 50 -M 100 -x 6 -N 3600"
    WSL SRVGRP=GROUP14 SRVID=1004
    CLOPT="-A -- -n//120.3.8.238:7204 -I 60 -T 60 -w WSH -m 50 -M 100 -x 6 -N 3600"
    Is there any error in the above ubb file, or am I using MSSQ incorrectly?
    Looking forward to your answer, thanks.

  • Encountered ora-29701 during Sun Cluster for Oracle RAC 9.2.0.7 startup (UR

    Hi all,
    I need some help from everyone out there.
    In our Sun Cluster 3.1 Data Service for Oracle RAC 9.2.0.7 (Solaris 9) configuration, my team encountered
    ORA-29701: unable to connect to Cluster Manager
    during the startup of the Oracle RAC database instances on the Oracle RAC Server resources.
    We tried the workaround from Oracle shown below. It worked the first time, but it no longer works after the server is rebooted.
    Has anyone encountered the same problem and been able to resolve it? Thanks.
    Bug No.: 4262155
    Filed: 25-MAR-2005    Updated: 11-APR-2005
    Product: Oracle Server - Enterprise Edition    Product Version: 9.2.0.6.0
    Platform: Linux x86
    Platform Version: 2.4.21-9.0.1
    Database Version: 9.2.0.6.0
    Affects Platforms: Port-Specific
    Severity: Severe Loss of Service
    Status: Not a Bug. To Filer
    Base Bug: N/A
    Fixed in Product Version: No Data
    Problem statement:
    ORA-29701 DURING DATABASE CREATION AFTER APPLYING 9.2.0.6 PATCHSET
    *** 03/25/05 07:32 am ***
    TAR:
    PROBLEM:
    Customer applied 9.2.0.6 patchset over 9.2.0.4 patchset.
    While creating the database, customer receives following error:
         ORA-29701: unable to connect to Cluster Manager
    However, if customer goes from 9.2.0.4 -> 9.2.0.5 -> 9.2.0.6, the problem does not occur.
    DIAGNOSTIC ANALYSIS:
    It seems that the problem is with libskgxn9.so shared library.
    For 9.2.0.4 -> 9.2.0.5 -> 9.2.0.6, the install log shows the following:
    installActions2005-03-22_03-44-42PM.log:,
    [libskgxn9.so->%ORACLE_HOME%/lib/libskgxn9.so 7933 plats=1=>[46]langs=1=> en,fr,ar,bn,pt_BR,bg,fr_CA,ca,hr,cs,da,nl,ar_EG,en_GB,et,fi,de,el,iw,hu,is,in, it,ja,ko,es,lv,lt,ms,es_MX,no,pl,pt,ro,ru,zh_CN,sk,sl,es_ES,sv,th,zh_TW, tr,uk,vi]]
    installActions2005-03-22_04-13-03PM.log:, [libcmdll.so ->%ORACLE_HOME%/lib/libskgxn9.so 64274 plats=1=>[46] langs=-554696704=>[en]]
    For 9.2.0.4 -> 9.2.0.6, install log shows:
    installActions2005-03-22_04-13-03PM.log:, [libcmdll.so ->%ORACLE_HOME%/lib/libskgxn9.so 64274 plats=1=>[46] langs=-554696704=>[en]] does not exist.
    This means that while patching from 9.2.0.4 -> 9.2.0.5, Installer copies the libcmdll.so library into libskgxn9.so, while patching from 9.2.0.4 -> 9.2.0.6 does not.
    ORACM is located in /app/oracle/ORACM which is different than ORACLE_HOME in customer's environment.
    WORKAROUND:
    Customer is using the following workaround:
    cd $ORACLE_HOME/rdbms/lib
    make -f ins_rdbms.mk rac_on ioracle ipc_udp
    RELATED BUGS:
    Bug 4169291

    Check if following MOS note helps.
    Series of ORA-7445 Errors After Applying 9.2.0.7.0 Patchset to 9.2.0.6.0 Database (Doc ID 373375.1)

  • Failover not happening the Oracle RAC 10g

    Hi All,
    I am new to RAC.
    I have installed Oracle RAC 10g on Red Hat Linux 4.0. Until yesterday failover was happening: when I stopped one instance on node01, the VIP of node01 was transferred to node02, as shown by ifconfig -a. But now that is not happening, and I don't know what has changed. Can you please help me out?
    The information is given below:
    [oracle@node01 ~]$ crs_stat -t
    Name            Type         Target   State    Host
    ora.hitesh.db   application  ONLINE   ONLINE   node02
    ora....h1.inst  application  ONLINE   ONLINE   node01
    ora....h2.inst  application  OFFLINE  OFFLINE
    ora....SM1.asm  application  ONLINE   ONLINE   node01
    ora....01.lsnr  application  ONLINE   ONLINE   node01
    ora.node01.gsd  application  ONLINE   ONLINE   node01
    ora.node01.ons  application  ONLINE   ONLINE   node01
    ora.node01.vip  application  ONLINE   ONLINE   node01
    ora....SM2.asm  application  ONLINE   ONLINE   node02
    ora....02.lsnr  application  ONLINE   ONLINE   node02
    ora.node02.gsd  application  ONLINE   ONLINE   node02
    ora.node02.ons  application  ONLINE   ONLINE   node02
    ora.node02.vip  application  ONLINE   ONLINE   node02
    Listener status on node01:
    [oracle@node01 ~]$ lsnrctl status
    LSNRCTL for Linux: Version 10.2.0.1.0 - Production on 06-APR-2013 12:59:29
    Copyright (c) 1991, 2005, Oracle. All rights reserved.
    Connecting to (ADDRESS=(PROTOCOL=tcp)(HOST=)(PORT=1521))
    STATUS of the LISTENER
    Alias LISTENER_NODE01
    Version TNSLSNR for Linux: Version 10.2.0.1.0 - Production
    Start Date 06-APR-2013 11:59:03
    Uptime 0 days 1 hr. 0 min. 25 sec
    Trace Level off
    Security ON: Local OS Authentication
    SNMP OFF
    Listener Parameter File /home/oracle/oracle/product/10.2.0/db_1/network/admin/listener.ora
    Listener Log File /home/oracle/oracle/product/10.2.0/db_1/network/log/listener_node01.log
    Listening Endpoints Summary...
    (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=192.168.1.131)(PORT=1521)))
    (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=127.0.0.1)(PORT=1521)))
    Services Summary...
    Service "+ASM" has 1 instance(s).
    Instance "+ASM1", status BLOCKED, has 1 handler(s) for this service...
    Service "+ASM_XPT" has 1 instance(s).
    Instance "+ASM1", status BLOCKED, has 1 handler(s) for this service...
    Service "PLSExtProc" has 1 instance(s).
    Instance "PLSExtProc", status UNKNOWN, has 1 handler(s) for this service...
    Service "hitesh" has 2 instance(s).
    Instance "hitesh1", status READY, has 2 handler(s) for this service...
    Instance "hitesh2", status READY, has 1 handler(s) for this service...
    Service "hiteshXDB" has 2 instance(s).
    Instance "hitesh1", status READY, has 1 handler(s) for this service...
    Instance "hitesh2", status READY, has 1 handler(s) for this service...
    Service "hitesh_XPT" has 2 instance(s).
    Instance "hitesh1", status READY, has 2 handler(s) for this service...
    Instance "hitesh2", status READY, has 1 handler(s) for this service...
    The command completed successfully
    [root@node01 oracle]# crsctl check crs
    CSS appears healthy
    CRS appears healthy
    EVM appears healthy
    [root@node01 oracle]# ps -ef | grep lmon
    oracle 5741 1 0 12:07 ? 00:00:03 ora_lmon_hitesh1
    root 22582 20805 0 13:01 pts/2 00:00:00 grep lmon
    oracle 23643 1 0 11:58 ? 00:00:01 asm_lmon_+ASM1
    Please let me know what other information is required.
    Edited by: user12924280 on Apr 6, 2013 12:36 AM

    Since you didn't say "thank you", I assumed my time was of no value to you.
    However, I shall try again.
    There is no relationship between instance failure and VIP failover. How can there be? What if you are running ten instances on each node, and one fails? Would you want the VIP to relocate? And I've already told you how to test it: kill the node. Just reboot it.
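
The `crs_stat -t` listing in the question already fits this reply: both VIPs are ONLINE on their home nodes, and the only resource that is down is the second instance. A small sketch for picking such rows out of a listing is shown below. The function name is hypothetical, and it assumes every resource row has `application` in the Type column, as in the listing above.

```python
def offline_resources(crs_stat_text):
    """Return (name, target, state) for rows whose State column is not ONLINE."""
    result = []
    for line in crs_stat_text.splitlines():
        fields = line.split()
        # Expected row: Name Type Target State [Host]; Host is blank when offline.
        if len(fields) >= 4 and fields[1] == "application":
            name, _type, target, state = fields[:4]
            if state != "ONLINE":
                result.append((name, target, state))
    return result


# Rows copied from the crs_stat -t output in the question.
SAMPLE_CRS_STAT = """\
Name Type Target State Host
ora.hitesh.db application ONLINE ONLINE node02
ora....h1.inst application ONLINE ONLINE node01
ora....h2.inst application OFFLINE OFFLINE
ora.node01.vip application ONLINE ONLINE node01
ora.node02.vip application ONLINE ONLINE node02
"""

print(offline_resources(SAMPLE_CRS_STAT))
```

A VIP that had actually failed over would appear as ONLINE with the other node in the Host column, which is not what this listing shows.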

  • Oracle RAC 11g R1 Release Connection Failover Problem

    Hi All,
    In our architecture we are using Oracle RAC 11g R1. Below is the JDBC URL:
    JDBCURL = jdbc:oracle:thin:@(DESCRIPTION =(ADDRESS = (PROTOCOL = TCP)(HOST = Host1-vip)(PORT = 1521))(ADDRESS = (PROTOCOL = TCP)(HOST = Host2-vip)(PORT = 1521))(LOAD_BALANCE = ON)(FAILOVER=ON)(CONNECT_DATA =(SERVER = DEDICATED)(SERVICE_NAME = <Service_name>)))
    We are using a two-node RAC. The problem is that whenever we reboot a node and it rejoins the cluster, the application servers do not recognize it.
    Suppose we have node1 and node2. I take down node1 (freeze the cluster), then reboot node1 and bring it back up so that it joins the cluster. At this point, my application servers do not recognize that a DB server (node1) has rejoined the cluster until I restart them.
    Please suggest a solution. Thanks a lot to everyone in advance.
    Edited by: 877010 on Aug 4, 2011 2:00 PM
    Edited by: 877010 on Aug 8, 2011 10:19 AM

    Please try using this:
    JDBCURL = jdbc:oracle:thin:@(DESCRIPTION =(ADDRESS = (PROTOCOL = TCP)(HOST = Host1-vip)(PORT = 1521))(ADDRESS = (PROTOCOL = TCP)(HOST = Host2-vip)(PORT = 1521))(LOAD_BALANCE = YES)(FAILOVER=YES)(CONNECT_DATA =(SERVER = DEDICATED)(SERVICE_NAME = <Service_name>)))
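
Long connect descriptors like this are easy to break when they are wrapped or edited by hand, so it can help to generate them. The sketch below is a hypothetical helper that only assembles the same descriptor shape as the URL in the reply; the host names come from the post, and the service name `my_service` is a placeholder for the elided `<Service_name>`.

```python
def rac_jdbc_url(vip_hosts, service_name, port=1521):
    """Build a thin-driver URL with one ADDRESS entry per VIP host."""
    addresses = "".join(
        f"(ADDRESS=(PROTOCOL=TCP)(HOST={host})(PORT={port}))" for host in vip_hosts
    )
    return (
        "jdbc:oracle:thin:@(DESCRIPTION="
        f"{addresses}(LOAD_BALANCE=YES)(FAILOVER=YES)"
        f"(CONNECT_DATA=(SERVER=DEDICATED)(SERVICE_NAME={service_name})))"
    )


# "my_service" is a placeholder, not the poster's real service name.
url = rac_jdbc_url(["Host1-vip", "Host2-vip"], "my_service")
print(url)
```

Generating the string does not by itself make an application server notice a rejoined node; it only rules out a malformed descriptor as the cause.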

  • Oracle rac o2cb

    I am setting up Oracle RAC on 2 nodes.
    Red Hat 5.5 x64 on vSphere 4, ESX 4.0, FC disks.
    I have 1 OCR disk, 1 voting disk and 3 ASM disks, all on VMware raw device mapping.
    OCR and voting are on OCFS2 block devices. I installed Grid Infrastructure and the cluster is online and loaded.
    But I cannot stop the o2cb service.
    [root@rac1 ~]# /etc/init.d/o2cb stop
    Stopping O2CB cluster ocfs2: Failed
    Unable to stop cluster as heartbeat region still active
    [root@rac2 /]# /etc/init.d/o2cb stop
    Stopping O2CB cluster ocfs2: Failed
    Unable to stop cluster as heartbeat region still active
    Because of that I cannot power off my server. I execute poweroff, but when the OCFS2 cluster fails to stop, the machine reboots instead.
    The question is: why can't I stop the o2cb service?

    [root@rac1 ~]# umount /ocr
    umount: /ocr: device is busy
    umount: /ocr: device is busy
    [root@rac1 ~]# umount /voting/
    umount: /voting: device is busy
    umount: /voting: device is busy
    [root@rac1 ~]# /etc/init.d/o2cb offline ocfs2
    Stopping O2CB cluster ocfs2: Failed
    Unable to stop cluster as heartbeat region still active
    [root@rac1 ~]# /etc/init.d/o2cb force-offline ocfs2
    Stopping O2CB cluster ocfs2: Failed
    Unable to stop cluster as heartbeat region still active
    [root@rac1 ~]# mounted.ocfs2 -d
    Device FS Stack UUID Label
    /dev/sdc1 ocfs2 o2cb DBB0548511E540649EF06D375F2B128B /OCR
    /dev/sdd1 ocfs2 o2cb 6955A6CFC0AB4926B85A6830C70344F0 /VOTING
    Still nothing.

  • Oracle RAC with QFS shared storage going down when one disk fails

    Hello,
    I have an oracle RAC on my testing environment. The configuration follows
    nodes: V210
    Shared Storage: A5200
    #clrg status
    Group Name           Node Name  Suspended  Status
    rac-framework-rg     host1      No         Online
                         host2      No         Online
    scal-racdg-rg        host1      No         Online
                         host2      No         Online
    scal-racfs-rg        host1      No         Online
                         host2      No         Online
    qfs-meta-rg          host1      No         Online
                         host2      No         Offline
    rac_server_proxy-rg  host1      No         Online
                         host2      No         Online
    #metastat -s racdg
    racdg/d200: Concat/Stripe
    Size: 143237376 blocks (68 GB)
    Stripe 0:
    Device Start Block Dbase Reloc
    d3s0 0 No No
    racdg/d100: Concat/Stripe
    Size: 143237376 blocks (68 GB)
    Stripe 0:
    Device Start Block Dbase Reloc
    d2s0 0 No No
    #more /etc/opt/SUNWsamfs/mcf
    racfs 10 ma racfs - shared
    /dev/md/racdg/dsk/d100 11 mm racfs -
    /dev/md/racdg/dsk/d200 12 mr racfs -
    When the disk /dev/did/dsk/d2 failed (I failed it by removing it from the array), Oracle RAC went offline on both nodes, and then both nodes panicked and rebooted. Now #clrg status shows the output below.
    Group Name           Node Name  Suspended  Status
    rac-framework-rg     host1      No         Pending online blocked
                         host2      No         Pending online blocked
    scal-racdg-rg        host1      No         Online
                         host2      No         Online
    scal-racfs-rg        host1      No         Online
                         host2      No         Pending online blocked
    qfs-meta-rg          host1      No         Offline
                         host2      No         Offline
    rac_server_proxy-rg  host1      No         Pending online blocked
                         host2      No         Pending online blocked
    CRS is not started on either node. I would like to know whether anybody has faced this kind of problem when using QFS on a disk group. When one disk fails, Oracle is not supposed to go offline, since the other disk is still working, and my QFS configuration is meant to mirror these two disks!
    Many thanks in advance
    Ushas Symon

    I'm not sure why you say QFS is mirroring these disks!?!? Shared QFS has no inherent mirroring capability. It relies on the underlying volume manager (VM) or array to do that for it. If you need to mirror your storage, you do it at the VM level by creating a mirrored metadevice.
    Tim
    ---
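
The mcf file quoted in the question can be read the same way as this reply. The sketch below is a hypothetical helper that labels each mcf row by its equipment type. It assumes the usual QFS meanings of `ma` (file system with separate metadata), `mm` (metadata device) and `mr` (data device); the mcf text is copied from the post.

```python
# Assumed meanings of the QFS equipment types that appear in the quoted mcf.
ROLES = {
    "ma": "file system with separate metadata",
    "mm": "metadata device",
    "mr": "data device",
}


def mcf_devices(mcf_text):
    """Return (identifier, equipment_type, role) for each mcf entry."""
    rows = []
    for line in mcf_text.splitlines():
        fields = line.split()
        if len(fields) < 4 or fields[0].startswith("#"):
            continue  # skip blank, short, and comment lines
        identifier, _ordinal, eq_type, _family = fields[:4]
        rows.append((identifier, eq_type, ROLES.get(eq_type, "unknown")))
    return rows


# Copied from the /etc/opt/SUNWsamfs/mcf listing in the question.
MCF = """\
racfs 10 ma racfs - shared
/dev/md/racdg/dsk/d100 11 mm racfs -
/dev/md/racdg/dsk/d200 12 mr racfs -
"""

for row in mcf_devices(MCF):
    print(row)
```

Read this way, d100 holds the metadata and d200 holds the data, so neither is a copy of the other. The metastat output shows d100 built on d2s0 alone, which would explain why pulling d2 took the whole file system down.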

  • Oracle RAC 2 node architecture-- Node -2 always gets evicted

    Hi,
    I have an Oracle RAC DB with a simple 2-node architecture (host RHEL 5.5 x86_64). The problem we are facing is that whenever there is a network failure on either node, node-2 always gets evicted (rebooted). We do not see any abnormal errors in the alert.log file on either node.
    The steps followed and results are:
    Node-1# service network restart
    Result: Node-2 evicted
    Node-2# service network restart
    Result: Node-2 evicted
    I would like to know why node-1 never gets evicted, even if the network is down or restarted on node-1 itself. Is this normal?
    Regards,
    Raj

    Hi,
    Please find the output below:
    2011-06-03 16:36:02.817: [    CSSD][1216194880]clssnmPollingThread: node prddbs02 (2) at 50% heartbeat fatal, removal in 14.120 seconds
    2011-06-03 16:36:02.817: [    CSSD][1216194880]clssnmPollingThread: node prddbs02 (2) is impending reconfig, flag 132108, misstime 15880
    2011-06-03 16:36:02.817: [    CSSD][1216194880]clssnmPollingThread: local diskTimeout set to 27000 ms, remote disk timeout set to 27000, impending reconfig status(1)
    2011-06-03 16:36:05.994: [    CSSD][1132276032]clssnmvSchedDiskThreads: DiskPingMonitorThread sched delay 760 > margin 750 cur_ms 1480138014 lastalive 1480137254
    2011-06-03 16:36:07.493: [    CSSD][1226684736]clssnmSendingThread: sending status msg to all nodes
    2011-06-03 16:36:07.493: [    CSSD][1226684736]clssnmSendingThread: sent 5 status msgs to all nodes
    2011-06-03 16:36:08.084: [    CSSD][1132276032]clssnmvSchedDiskThreads: DiskPingMonitorThread sched delay 850 > margin 750 cur_ms 1480140104 lastalive 1480139254
    2011-06-03 16:36:09.831: [    CSSD][1216194880]clssnmPollingThread: node prddbs02 (2) at 75% heartbeat fatal, removal in 7.110 seconds
    2011-06-03 16:36:10.122: [    CSSD][1132276032]clssnmvSchedDiskThreads: DiskPingMonitorThread sched delay 880 > margin 750 cur_ms 1480142134 lastalive 1480141254
    2011-06-03 16:36:11.112: [    CSSD][1132276032]clssnmvSchedDiskThreads: DiskPingMonitorThread sched delay 860 > margin 750 cur_ms 1480143124 lastalive 1480142264
    2011-06-03 16:36:12.212: [    CSSD][1132276032]clssnmvSchedDiskThreads: DiskPingMonitorThread sched delay 950 > margin 750 cur_ms 1480144224 lastalive 1480143274
    2011-06-03 16:36:12.487: [    CSSD][1226684736]clssnmSendingThread: sending status msg to all nodes
    2011-06-03 16:36:12.487: [    CSSD][1226684736]clssnmSendingThread: sent 5 status msgs to all nodes
    2011-06-03 16:36:13.840: [    CSSD][1216194880]clssnmPollingThread: local diskTimeout set to 200000 ms, remote disk timeout set to 200000, impending reconfig status(0)
    2011-06-03 16:36:14.881: [    CSSD][1205705024]clssgmTagize: version(1), type(13), tagizer(0x494dfe)
    2011-06-03 16:36:14.881: [    CSSD][1205705024]clssgmHandleDataInvalid: grock HB+ASM, member 2 node 2, birth 21
    2011-06-03 16:36:17.487: [    CSSD][1226684736]clssnmSendingThread: sending status msg to all nodes
    2011-06-03 16:36:17.487: [    CSSD][1226684736]clssnmSendingThread: sent 5 status msgs to all nodes
    2011-06-03 16:36:22.486: [    CSSD][1226684736]clssnmSendingThread: sending status msg to all nodes
    2011-06-03 16:36:22.486: [    CSSD][1226684736]clssnmSendingThread: sent 5 status msgs to all nodes
    2011-06-03 16:36:23.162: [ GIPCNET][1205705024]gipcmodNetworkProcessRecv: [network] failed recv attempt endp 0x2eb80c0 [0000000001fed69c] { gipcEndpoint : localAddr 'gipc://prddbs01:80b3-6853-187b-4d2e#192.168.7.1#33842', remoteAddr 'gipc://prddbs02:gm_prddbs-cluster#192.168.7.2#60074', numPend 4, numReady 1, numDone 0, numDead 0, numTransfer 0, objFlags 0x1e10, pidPeer 0, flags 0x2616, usrFlags 0x0 }, req 0x2aaaac308bb0 [0000000001ff4b7d] { gipcReceiveRequest : peerName '', data 0x2aaaac2e3cd8, len 10240, olen 0, off 0, parentEndp 0x2eb80c0, ret gipc
    2011-06-03 16:36:23.162: [ GIPCNET][1205705024]gipcmodNetworkProcessRecv: slos op : sgipcnTcpRecv
    2011-06-03 16:36:23.162: [ GIPCNET][1205705024]gipcmodNetworkProcessRecv: slos dep : Connection reset by peer (104)
    2011-06-03 16:36:23.162: [ GIPCNET][1205705024]gipcmodNetworkProcessRecv: slos loc : recv
    2011-06-03 16:36:23.162: [ GIPCNET][1205705024]gipcmodNetworkProcessRecv: slos info: dwRet 4294967295, cookie 0x2aaaac308bb0
    2011-06-03 16:36:23.162: [    CSSD][1205705024]clssgmeventhndlr: Disconnecting endp 0x1fed69c ninf 0x2aaab0000f90
    2011-06-03 16:36:23.162: [    CSSD][1205705024]clssgmPeerDeactivate: node 2 (prddbs02), death 0, state 0x80000001 connstate 0x1e
    2011-06-03 16:36:23.162: [GIPCXCPT][1205705024]gipcInternalDissociate: obj 0x2eb80c0 [0000000001fed69c] { gipcEndpoint : localAddr 'gipc://prddbs01:80b3-6853-187b-4d2e#192.168.7.1#33842', remoteAddr 'gipc://prddbs02:gm_prddbs-cluster#192.168.7.2#60074', numPend 0, numReady 0, numDone 0, numDead 0, numTransfer 0, objFlags 0x1e10, pidPeer 0, flags 0x261e, usrFlags 0x0 } not associated with any container, ret gipcretFail (1)
    2011-06-03 16:36:32.494: [    CSSD][1226684736]clssnmSendingThread: sent 5 status msgs to all nodes
    2011-06-03 16:36:37.493: [    CSSD][1226684736]clssnmSendingThread: sending status msg to all nodes
    2011-06-03 16:36:37.494: [    CSSD][1226684736]clssnmSendingThread: sent 5 status msgs to all nodes
    2011-06-03 16:36:40.598: [    CSSD][1216194880]clssnmPollingThread: node prddbs02 (2) at 90% heartbeat fatal, removal in 2.870 seconds, seedhbimpd 1
    2011-06-03 16:36:42.497: [    CSSD][1226684736]clssnmSendingThread: sending status msg to all nodes
    2011-06-03 16:36:42.497: [    CSSD][1226684736]clssnmSendingThread: sent 5 status msgs to all nodes
    2011-06-03 16:36:43.476: [    CSSD][1216194880]clssnmPollingThread: Removal started for node prddbs02 (2), flags 0x20000, state 3, wt4c 0
    2011-06-03 16:36:43.476: [    CSSD][1237174592]clssnmDoSyncUpdate: Initiating sync 178830908
    2011-06-03 16:36:43.476: [    CSSD][1237174592]clssscUpdateEventValue: NMReconfigInProgress val 1, changes 57
    2011-06-03 16:36:43.476: [    CSSD][1237174592]clssnmDoSyncUpdate: local disk timeout set to 27000 ms, remote disk timeout set to 27000
    2011-06-03 16:36:43.476: [    CSSD][1237174592]clssnmDoSyncUpdate: new values for local disk timeout and remote disk timeout will take effect when the sync is completed.
    2011-06-03 16:36:43.476: [    CSSD][1237174592]clssnmDoSyncUpdate: Starting cluster reconfig with incarnation 178830908
    2011-06-03 16:36:43.476: [    CSSD][1237174592]clssnmSetupAckWait: Ack message type (11)
    2011-06-03 16:36:43.476: [    CSSD][1237174592]clssnmSetupAckWait: node(1) is ALIVE
    2011-06-03 16:36:43.476: [    CSSD][1237174592]clssnmSendSync: syncSeqNo(178830908), indicating EXADATA fence initialization complete
    2011-06-03 16:36:43.476: [    CSSD][1237174592]List of nodes that have ACKed my sync: NULL
    2011-06-03 16:36:43.476: [    CSSD][1237174592]clssnmSendSync: syncSeqNo(178830908)
    2011-06-03 16:36:43.476: [    CSSD][1237174592]clssnmWaitForAcks: Ack message type(11), ackCount(1)
    2011-06-03 16:36:43.476: [    CSSD][1247664448]clssnmHandleSync: Node prddbs01, number 1, is EXADATA fence capable
    2011-06-03 16:36:43.476: [    CSSD][1247664448]clssscUpdateEventValue: NMReconfigInProgress val 1, changes 58
    2011-06-03 16:36:43.476: [    CSSD][1247664448]clssnmHandleSync: local disk timeout set to 27000 ms, remote disk timeout set t:
    2011-06-03 16:36:43.476: [    CSSD][1247664448]clssnmQueueClientEvent: Sending Event(2), type 2, incarn 178830907
    2011-06-03 16:36:43.476: [    CSSD][1247664448]clssnmQueueClientEvent: Node[1] state = 3, birth = 178830889, unique = 1305623432
    2011-06-03 16:36:43.476: [    CSSD][1247664448]clssnmQueueClientEvent: Node[2] state = 5, birth = 178830907, unique = 1307103307
    2011-06-03 16:36:43.476: [    CSSD][1247664448]clssnmHandleSync: Acknowledging sync: src[1] srcName[prddbs01] seq[73] sync[178830908]
    2011-06-03 16:36:43.476: [    CSSD][1247664448]clssnmSendAck: node 1, prddbs01, syncSeqNo(178830908) type(11)
    2011-06-03 16:36:43.476: [    CSSD][1240850064]clssgmStartNMMon: node 1 active, birth 178830889
    2011-06-03 16:36:43.476: [    CSSD][1247664448]clssnmHandleAck: src[1] dest[1] dom[0] seq[0] sync[178830908] type[11] ackCount(0)
    2011-06-03 16:36:43.476: [    CSSD][1240850064]clssgmStartNMMon: node 2 active, birth 178830907
    2011-06-03 16:36:43.476: [    CSSD][1240850064]NMEVENT_SUSPEND [00][00][00][06]
    2011-06-03 16:36:43.476: [    CSSD][1237174592]clssnmSendSync: syncSeqNo(178830908), indicating EXADATA fence initialization complete
    2011-06-03 16:36:43.476: [    CSSD][1240850064]clssgmUpdateEventValue: CmInfo State val 5, changes 190
    2011-06-03 16:36:43.476: [    CSSD][1237174592]List of nodes that have ACKed my sync: 1
    2011-06-03 16:36:43.476: [    CSSD][1240850064]clssgmSuspendAllGrocks: Issue SUSPEND
    2011-06-03 16:36:43.476: [    CSSD][1237174592]clssnmWaitForAcks: done, msg type(11)
    2011-06-03 16:36:43.476: [    CSSD][1237174592]clssnmSetMinMaxVersion:node1 product/protocol (11.2/1.4)
    2011-06-03 16:36:43.476: [    CSSD][1237174592]clssnmSetMinMaxVersion: properties common to all nodes: 1,2,3,4,5,6,7,8,9,10,11,12,13,14
    2011-06-03 16:36:43.476: [    CSSD][1237174592]clssnmSetMinMaxVersion: min product/protocol (11.2/1.4)
    2011-06-03 16:36:43.476: [    CSSD][1240850064]clssgmQueueGrockEvent: groupName(IG+ASMSYS$USERS) count(2) master(1) event(2), incarn 22, mbrc 2, to member 1, events 0x0, state 0x0
    2011-06-03 16:36:43.477: [    CSSD][1237174592]clssnmSetMinMaxVersion: max product/protocol (11.2/1.4)
    2011-06-03 16:36:43.477: [    CSSD][1237174592]clssnmNeedConfReq: No configuration to change
    etc.etc....
    Let me know if any other log file is required. There are no unusual messages in /var/log/messages.
    Regards,
    Raj
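
    When reading a long ocssd.log like the one above, it can help to pull out only the heartbeat countdown lines and see how the eviction of a node builds up. The following is a minimal sketch, assuming the line layout shown in the post; the function name `cssd_heartbeat_warnings` and the log path in the usage comment are assumptions, not something taken from the thread.

    ```shell
    # Read a CSSD log on stdin and print one line per "heartbeat fatal" warning:
    #   <date> <time> <node> <percent>% <seconds-to-removal>s
    # Assumes the clssnmPollingThread line layout shown in the post above.
    cssd_heartbeat_warnings() {
        sed -n 's/^ *\([0-9-]* [0-9:.]*\): .*clssnmPollingThread: node \([^ ]*\) .* at \([0-9]*\)% heartbeat fatal, removal in \([0-9.]*\) seconds.*/\1 \2 \3% \4s/p'
    }

    # Example (hypothetical path; adjust to your Grid home and host name):
    # cssd_heartbeat_warnings < "$GRID_HOME/log/$(hostname -s)/cssd/ocssd.log"
    ```

    Running it against the log on each node and comparing the timestamps with the time of the network restart shows which node stopped seeing heartbeats first.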

  • Oracle Rac services on second node

    Hi all
    I have an Oracle RAC database installed on a 2-node cluster. I am not able to bring the two services below online on the second node. Any idea?
    ora.racdb.racdb2.inst
    ora.racdb.racdb_taf.racdb2.srv
    Name            Type          Target   State     Host
    ora....b2.inst  application   ONLINE   OFFLINE
    ora....db2.srv  application   ONLINE   OFFLINE
    I checked the +ASM2 instance on the second node, and the asm_diskgroups parameter is not set. Is that the cause of the problem I am encountering?
    Thanks all
    OS: Oracle Enterprise Linux 5 x86
    Oracle: Oracle 10g x86

    You should be able to find some more information in the instance's alert log file. But not having asm_diskgroups set is a very likely cause. Your diskgroups won't get mounted and that leads to the instance not being able to start (because it won't even find the spfile if it is stored on ASM). You could try to mount them manually like this:
    export ORACLE_SID=+ASM2
    sqlplus sys/ as sysdba
    alter diskgroup MYDISKGROUP mount;
    and then start the instance with 'srvctl start instance -d myracdb -i myinstance2'.
    Or, of course, you could edit the ASM instance's pfile (or spfile) to include the asm_diskgroups parameter and restart the server (or just ASM).
    Bjoern
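
    The two options in the reply above (mount now, and record the disk group in asm_diskgroups) can be combined. This is only a sketch: the helper name `asm_mount_sql` is hypothetical, the disk group and SID are placeholders, and `SCOPE=SPFILE` assumes the ASM instance was started from an spfile. Note that the `ALTER SYSTEM` statement replaces any existing asm_diskgroups list with the single name given.

    ```shell
    # Print the SQL needed to mount a disk group now and to record it in
    # asm_diskgroups so that it is mounted on the next ASM start.
    #   $1 = disk group name, $2 = ASM instance SID (both placeholders)
    asm_mount_sql() {
        printf "ALTER DISKGROUP %s MOUNT;\n" "$1"
        printf "ALTER SYSTEM SET asm_diskgroups='%s' SCOPE=SPFILE SID='%s';\n" "$1" "$2"
    }

    # Review the generated statements first:
    asm_mount_sql MYDISKGROUP +ASM2

    # Then, on node 2, feed them to SQL*Plus (commented out on purpose):
    # export ORACLE_SID=+ASM2
    # asm_mount_sql MYDISKGROUP +ASM2 | sqlplus -S "/ as sysdba"
    ```

    Printing the statements before running them keeps the change reviewable, which matters on a production cluster.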

  • CC&B 2.1 on Oracle RAC 10.2.0 - any issues,tips,gotcha's

    I am installing CC&B 2.1 on Oracle RAC 10.2.0.3. Does anyone in the forum have experience with this and want to share or compare notes? The architecture is 2 Sun M4000 servers (32 GB RAM) for RAC, with Data Guard physical and logical standbys. The application tier is also Unix, with 4 T5440s, each with 32 GB RAM, where I will run a minimum of 3 instances per machine. The setup is in test at the moment and I have noticed only one issue so far: the threadpoolworker keeps crashing. I have also had both nodes reboot at the same time as this issue and will raise an SR anyhow.
    Any comments , tips , discussion welcome
    cheers
    Sam

    Refer to Note 887848.1. Although it is for CC&B 2.2 and for a different DLL, it should work in your case.
    ID: 887848.1
    The web server of CC&B 2.2 relies on the 32-bit Java platform to function.
    The solution is to uninstall the JDK and re-install the 32-bit version.
    On another note, on x86_64 architecture the supported platform for Windows is Win2K8 Server SP2 (64-bit); I'd be reluctant to install CC&B on Vista.

  • Regarding Hangcheck timer configuration in Oracle RAC 10g r2 installation

    Hi,
    Is it necessary to configure the hangcheck timer in an Oracle RAC 10g R2 installation?
    Can somebody advise when we should install the hangcheck timer on Linux for Oracle 10g R2?
    Best Regards
    Gupteswar Prasad Mishra
    Edited by: Gupteswar on Jan 25, 2010 8:42 PM

    Yes, it is recommended to configure the hangcheck timer in a RAC configuration.
    Configure the Hangcheck Timer
    The hangcheck-timer module is loaded into the Linux kernel and checks whether the system hangs. It sets a timer and checks it again after a certain amount of time. There is a configurable threshold that, if exceeded, makes the module reboot the machine. Although the hangcheck-timer module is not required for Oracle Clusterware (Cluster Manager) operation, it is highly recommended by Oracle.
    cat /etc/rc.local
    modprobe hangcheck-timer hangcheck_tick=1 hangcheck_margin=10 hangcheck_reboot=1
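
    The two module parameters add up to the hang duration after which the node is reset: a hang longer than hangcheck_tick + hangcheck_margin seconds triggers the reboot. A commonly cited guideline, stated here as an assumption rather than something from this thread, is that this sum should stay below the CSS misscount. The function names below are hypothetical and the misscount of 60 is a placeholder.

    ```shell
    # Seconds a node may hang before hangcheck-timer resets it:
    # hangcheck_tick + hangcheck_margin.
    #   $1 = hangcheck_tick, $2 = hangcheck_margin (seconds)
    hangcheck_window() {
        echo $(( $1 + $2 ))
    }

    # Succeeds when the hangcheck window is shorter than the CSS misscount.
    #   $1 = hangcheck_tick, $2 = hangcheck_margin, $3 = misscount (seconds)
    hangcheck_fits_misscount() {
        [ "$(hangcheck_window "$1" "$2")" -lt "$3" ]
    }

    # With the values from the reply above the window is 1 + 10 = 11 seconds.
    # The misscount of 60 is a placeholder; read the real value with
    # "crsctl get css misscount".
    if hangcheck_fits_misscount 1 10 60; then
        echo "hangcheck window is below misscount"
    else
        echo "hangcheck window is not below misscount"
    fi
    ```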

  • Oracle RAC on different SunServer

    Hi forum,
    I have one question regarding Oracle RAC requirements.
    As far as I know base requirements are:
    OS should be similar
    OS Patch Level should be similar
    kernel parameters should be similar
    And servers architecture should be the same
    Can I config Oracle RAC on the following servers:
    Sun SPARC M4000 (2.4GHz SPARC64 VII)
    Sun Fire V490 (1.8GHz UltraSPARC IV+)
    If I am not mistaken the architecture is the same, SPARC V9

    Even though this will work from a technical point of view, you should be careful about the implications if this is not just for upgrading.
    Oracle's server-side load balancing can direct most of the work to the M4000, but think about the consequences:
    What happens if the M4000 fails, along with all of its sessions? It is very likely that all sessions will switch to the V490.
    The load on the V490 will then be very, very high. Under stress this could lead to node starvation, resulting in a reboot of the V490.
    Regards
    Sebastian

  • Oracle RAC Storage Migration - HP EVA to HP EVA

    Hi,
    We are in the process of migrating from an old HP EVA4000 to a new HP EVA6400. We have migrated most of the systems that had storage presented from the EVA4000 with the exception of our Oracle RAC databases and wanted to check to see if anyone knew of any gotchas before we proceeded.
    We are using CA to replicate the LUNs that are presented to the Oracle cluster. My plan is to shut down the database and all associated RAC services, disable CRS then halt the servers. I will then present the CA replicated LUNs to the servers and initiate a failover. I will then restart the servers and check that the LUNs have appeared from the 6400 before manually restarting CRS and the databases. I will then re-enable CRS before rebooting again to check that everything comes up automatically.
    Does this sound correct? Has anyone done this before?
    Thanks in advance.
    Phil

    Seems reasonable, but you can't be certain unless you test. Make sure you back up the OCR, the voting disk, and the database before you change the storage allocation.
    How big is the database? How long will your downtime be?
    It may be cleaner to just reinstall CRS and restore the database from backup.
    Are you using ASM? If so, you could just add the new devices to ASM, migrate the data, then remove the old devices.
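
    The shutdown and restart sequence from the question can be written down as a dry run before anything is executed. This is a sketch only: the function name `migration_plan`, the database name, node names, backup directory, and the use of ssh are all placeholders, the crsctl and ocrconfig commands need root, and how the voting disk is backed up depends on the Clusterware version, so that step is left out.

    ```shell
    # Dry run only: print the sequence so it can be reviewed; nothing is executed.
    #   $1 = database name, $2 = backup directory, remaining args = node names
    migration_plan() {
        db=$1; backup_dir=$2; shift 2
        echo "ocrconfig -export $backup_dir/ocr_before_migration.exp"
        echo "srvctl stop database -d $db"
        for node in "$@"; do
            echo "ssh $node crsctl stop crs"
            echo "ssh $node crsctl disable crs"
        done
        echo "# present the replicated LUNs, fail over, boot the servers, verify the LUNs"
        for node in "$@"; do
            echo "ssh $node crsctl start crs"
        done
        echo "srvctl start database -d $db"
        for node in "$@"; do
            echo "ssh $node crsctl enable crs"
        done
    }

    migration_plan racdb /backup node1 node2
    ```

    Keeping CRS disabled until the LUNs have been verified matches the plan in the question: the stack is started by hand once, and autostart is re-enabled only afterwards.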
