TAF failover issue when a RAC node is shut down

Dear all,
We have a two-node RAC database. We use sqlplus on a client laptop to test RAC TAF failover when one node is shut down; the laptop's tnsnames.ora contains the TAF settings.
First we connect to the RAC database with sqlplus. At the "SQL>" prompt we run "select instance_name from v$instance;" to see which instance we are actually connected to. Then we shut down that node. If we immediately run "select instance_name from v$instance;" again, sqlplus sometimes hangs with no response. But if we wait until the VIP has failed over to the other node and then run the query, it always shows the other node's instance name, so we know the session has successfully failed over to the healthy node.
My question is:
Can RAC TAF always fail a session over to another healthy node with no downtime? Or are there circumstances in which the session hangs and has to reconnect?
Any help would be appreciated.
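For reference, a client-side TAF entry of the kind described usually lists the node VIPs in an ADDRESS_LIST and adds a FAILOVER_MODE clause under CONNECT_DATA. The Python sketch below assembles such an entry; the VIP host names, the service name and the RETRIES/DELAY values are hypothetical, not the poster's settings. It also computes RETRIES x DELAY, a rough upper bound on how long TAF keeps retrying the reconnect. That bound does not include the time before the client notices its connection is dead, which is one place a hang like the one described may come from.

```python
# Sketch only: builds a tnsnames.ora-style TAF descriptor for a two-node RAC.
# Host names, the service name and the RETRIES/DELAY values are hypothetical.


def build_taf_descriptor(vip_hosts, service_name, retries=20, delay=15, port=1521):
    """Return a connect descriptor with client-side TAF (FAILOVER_MODE) enabled."""
    addresses = "\n".join(
        f"      (ADDRESS = (PROTOCOL = TCP)(HOST = {host})(PORT = {port}))"
        for host in vip_hosts
    )
    return (
        "(DESCRIPTION =\n"
        "    (ADDRESS_LIST =\n"
        "      (LOAD_BALANCE = ON)\n"
        "      (FAILOVER = ON)\n"  # try the next address if a connect attempt fails
        f"{addresses}\n"
        "    )\n"
        "    (CONNECT_DATA =\n"
        f"      (SERVICE_NAME = {service_name})\n"
        "      (FAILOVER_MODE =\n"
        "        (TYPE = SELECT)\n"  # reconnect and let open SELECT cursors keep fetching
        "        (METHOD = BASIC)\n"  # connect to the surviving instance only at failover time
        f"        (RETRIES = {retries})\n"
        f"        (DELAY = {delay})\n"
        "      )\n"
        "    )\n"
        "  )"
    )


def max_retry_window(retries, delay):
    """Rough upper bound, in seconds, on TAF's reconnect retries (RETRIES x DELAY).

    It does not include the time the client needs to notice that the
    original connection is dead in the first place.
    """
    return retries * delay


if __name__ == "__main__":
    print("RACDB_TAF =\n  " + build_taf_descriptor(["node1-vip", "node2-vip"], "racdb"))
    print("retry window:", max_retry_window(20, 15), "seconds")
```

Even with TYPE = SELECT, TAF does not replay uncommitted DML; an in-flight transaction still has to be rolled back by the application after failover.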

Hi, thanks for your help.
> There are many things you have to do but if you don't have the knowledge it will be difficult.
Right. The cluster was set up by consultants, but we're still trying to pick up basic Oracle knowledge through self-study...
We found some messages about eviction in old cssd logs in $ORA_CRS_HOME/log/cssd/ and will dig into them further.
Yes, we have rebooted different nodes in the cluster many times before without any problem.
Thanks a lot.
/ST Wong
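To follow up on the eviction messages, a small script can collect every log line that mentions eviction. This Python sketch walks a directory (by default the $ORA_CRS_HOME/log/cssd path mentioned above; on some installations the CSS logs sit in a per-host subdirectory under $ORA_CRS_HOME/log, so adjust the path) and prints file:line matches for a case-insensitive "evict". It assumes nothing about the log format beyond that word appearing on the relevant lines.

```python
# Sketch: list lines that mention eviction in CSS daemon logs.
# The default directory follows the path mentioned in the post; adjust as needed.
import os
import re

EVICT_PATTERN = re.compile(r"evict", re.IGNORECASE)


def find_eviction_lines(log_dir):
    """Yield (path, line_number, text) for every line mentioning 'evict'."""
    for root, _dirs, files in os.walk(log_dir):
        for name in sorted(files):
            path = os.path.join(root, name)
            with open(path, "r", errors="replace") as fh:
                for lineno, line in enumerate(fh, start=1):
                    if EVICT_PATTERN.search(line):
                        yield path, lineno, line.rstrip("\n")


if __name__ == "__main__":
    crs_home = os.environ.get("ORA_CRS_HOME", "")
    target = os.path.join(crs_home, "log", "cssd")
    for path, lineno, text in find_eviction_lines(target):
        print(f"{path}:{lineno}: {text}")
```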

Similar Messages

  • Failover doesn't work when a node is physically shut down

              Cluster configuration:
              * Unix Compaq Tru64 4.0G
              * WLS version 6.0 SP2 RP3
              1 machine with 1 admin server and 2 managed servers
              1 machine with 2 managed servers
              * Proxy Plugin for Apache "mod_wl.so"
              Problem:
              In production, when a node goes physically down, the plug-in
              does not fail over.
              In "wl_proxy.log" we can see requests being proxied to the
              failed node endlessly.
              Closing the browser (IE 6.0) and clearing the cache and cookies do not change this behaviour.
              The same happens with another browser on a different workstation.
              It seems that the dynamic server list is not updated.
              When a single managed server is killed (kill -9), everything works well.
              We have checked the configuration many times and everything seems OK.
              Could there be a problem with the multicast configuration on the
              network?
              All advice is welcome.
              Regards,
              Adriano Villani
              

    Don't run a multicast test using a multicast address that WebLogic is currently using.
              Peace,
              Cameron Purdy
              Tangosol, Inc.
              http://www.tangosol.com/coherence.jsp
              Tangosol Coherence: Clustered Replicated Cache for Weblogic
              "Adriano Villani" wrote:
              > Thanks for your answers, Kumar.
              >
              > I only want to add that when we replicate the environment exactly on
              > other machines in another network, everything works very well!
              >
              > So the problem seems to be the network configuration.
              > In production there's a VLAN; could multicast have some problem with the VLAN?
              > I don't know how to start an investigation to isolate the problem.
              >
              > During testing, the only strange thing is that with the MulticastTest
              > utility I see packets being sent and received, but also some strange
              > ASCII garbage characters that come from the managed server
              > (I see "heartbeat" .... garbage .... "IP alias of the managed servers" ...
              > garbage .. and so on).
              >
              > Adriano Villani
              >
              > "Kumar Allamraju" wrote:
              >
              > > This should not happen. The plug-in should get the updated list
              > > as soon as it contacts one of the available clustered nodes.
              > > The plug-in should not endlessly route requests to a dead
              > > server. However, I'm not sure whether this is a known issue in 6.0
              > > SP2 RP3. I would suggest you contact support, get the
              > > latest Apache plug-in, and see if that helps.
              > >
              > > --
              > > Kumar
              > >
              >
              >
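Cameron's advice, together with the heartbeat and IP-alias fragments Adriano saw during MulticastTest, suggests the test may have been sharing the cluster's own multicast address. The Python sketch below encodes the rule for picking a test address: it must be a valid multicast address and must not be one the cluster already uses. The addresses are hypothetical examples; the real cluster address comes from the WebLogic configuration.

```python
# Sketch: choose a multicast address for a standalone multicast test that is
# a valid multicast address and NOT one the WebLogic cluster is already using.
# All addresses below are hypothetical examples.
import ipaddress


def pick_test_multicast_address(in_use, candidates):
    """Return the first candidate that is multicast (224.0.0.0/4 for IPv4) and unused."""
    used = {ipaddress.ip_address(a) for a in in_use}
    for cand in candidates:
        addr = ipaddress.ip_address(cand)
        if addr.is_multicast and addr not in used:
            return str(addr)
    raise ValueError("no unused multicast address among the candidates")


if __name__ == "__main__":
    cluster_addresses = ["237.0.0.1"]  # hypothetical: the cluster's configured address
    print(pick_test_multicast_address(cluster_addresses, ["237.0.0.1", "10.1.1.1", "237.0.0.2"]))
```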
              

  • One node RAC pause/hang/block on other node shutdown

    Hi,
    We have a Java application running on Linux servers that connects to a 10.2.0.1 RAC cluster, also on Linux. When the application starts, it opens a pool of connections to the database, and these are used throughout the lifetime of the application. Each application server connects to one RAC node.
    AppA - DBA
    AppB - DBB
    When we shut down one node, the application connected to that node stops, which is what we would expect in this configuration.
    What is strange is that the other application blocks for 63 seconds and then continues. It is as if the database, or the database connections, are blocking.
    We are not using TAF, FAN, FCN, LB, VIPs or any other special features, just simple lightweight JDBC from one server to one database. In fact, I do not think we are unwittingly using any of these features; we have them switched off.
    john

    user1788323 wrote:
    > What is strange is that the other application blocks for 63 seconds and then continues. So it is like the database is blocking, or the database connections are blocking.
    How have you determined/diagnosed the 63s blocking? (More detail on this may shed some light on the problem.)
    Assuming that the block is on the server side, two basic reasons come to mind.
    Networking issue. The CRS on the surviving node has to perform certain functions, such as moving the VIP of the node that left the cluster to a surviving cluster node. The listener may need to re-register services. A local firewall may need to be dynamically reconfigured to support the failed-over VIP. And so on.
    Any of these could cause the kind of delay or issue in the network layer that you are seeing from the client side.
    Infrastructure issue. If the client request via JDBC actually reaches the server process and the server is slow to respond, then that is not a network issue; instead, some underlying service or software layer that the server process needs in order to perform the request is busy for those 63s.
    This could be related to the interconnect, the shared I/O storage layer or something along those lines. For example, how do the interconnect and/or SAN switch react when a server node is powered down or rebooted?
    There's not really enough information to make anything but guesses. You will need to isolate the problem with further testing.
    I have seen similar problems with 10.1.0.3 CRS and RAC when a node is evicted from the cluster. In that case the "hung" period was in excess of 15 minutes, and only for new connections (the listener was unable to hand off to dedicated servers or dispatchers). Existing connections, however, worked fine and were unaware of any problems. Part of the issue in that case was a poor (outdated) driver layer; it was also the last time we used proprietary binary drivers (kernel modules) from third-party vendors, which result in a tainted (and very fixed and rigid) Linux kernel. Today we stick with an open-source driver layer only for Linux.
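One way to answer "how have you determined the 63s blocking?" with a reproducible number is to time a trivial, periodic probe and report gaps much longer than the probe interval. In the Python sketch below, `run_probe` is a hypothetical callable (for example, one that executes "select 1 from dual" on one of the pooled connections); the stall detection itself is plain timestamp arithmetic. Running the same probe from a session on the surviving database server as well as from the application server may help separate the network and infrastructure explanations given above.

```python
# Sketch: measure a "blocking" period from the client side by timing a trivial,
# periodic probe and reporting gaps that are much longer than the probe interval.
# run_probe is a hypothetical callable supplied by the caller.
import time


def record_probe_times(run_probe, count, interval):
    """Call run_probe `count` times, `interval` seconds apart; return completion times."""
    times = []
    for _ in range(count):
        run_probe()
        times.append(time.monotonic())
        time.sleep(interval)
    return times


def find_stalls(timestamps, expected_interval, tolerance=2.0):
    """Return (start_time, gap) for consecutive completions further apart than
    expected_interval * tolerance."""
    stalls = []
    for prev, cur in zip(timestamps, timestamps[1:]):
        gap = cur - prev
        if gap > expected_interval * tolerance:
            stalls.append((prev, gap))
    return stalls


if __name__ == "__main__":
    # Synthetic completion times: one probe every second, with one 63-second stall.
    sample = [0, 1, 2, 65, 66]
    print(find_stalls(sample, expected_interval=1.0))  # [(2, 63)]
```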

  • With MSSQ, the client gets a TPEOS error when an Oracle RAC node reboots

    Hi all,
    In my Tuxedo application, if we use Single Server, Single Queue mode, rebooting any Oracle RAC node causes no problem: the client gets the correct result. If we use MSSQ (Multiple Server, Single Queue), the application also works while all Oracle RAC nodes are up. But if we reboot any Oracle RAC node, the client program keeps running and gets correct results, yet it always gets a TPEOS error. In this situation the server receives the client request, but the client does not get the server reply, only the TPEOS error.
    Our environment is:
    Oracle RAC 10g 10.2.0.4, two instances (rac1, rac2) and two DTP services, s1 and s2, both with TAF set to BASIC
    Tuxedo 10R3, two nodes, working in MP mode, using XA to access the Oracle RAC database; there are both transactional and non-transactional services
    OS is Linux AS4 U5, 64-bit
    Service programs use OCI
    Has anyone encountered this problem?

    Hi, first, thank you.
    The ULOG file contains only failover information and no other error messages; there is no other error on the client side either.
    Without MSSQ, the UBB SERVERS section is:
    *SERVERS
    DEFAULT:
    CLOPT="-A "
    sinUpdate_server SRVGRP=GROUP11 SRVID=80 MIN=10 MAX=30 MAXGEN=10 GRACE=10 RESTART=Y
    sinUpdate_server SRVGRP=GROUP12 SRVID=160 MIN=10 MAX=30 MAXGEN=10 GRACE=10 RESTART=Y
    sinCount_server SRVGRP=GROUP11 SRVID=240 MIN=10 MAX=30 MAXGEN=10 GRACE=10 RESTART=Y
    sinCount_server SRVGRP=GROUP12 SRVID=320 MIN=10 MAX=30 MAXGEN=10 GRACE=10 RESTART=Y
    sinSelect_server SRVGRP=GROUP11 SRVID=360 MIN=10 MAX=30 MAXGEN=10 GRACE=10 RESTART=Y
    sinSelect_server SRVGRP=GROUP12 SRVID=400 MIN=10 MAX=30 MAXGEN=10 GRACE=10 RESTART=Y
    sinInsert_server SRVGRP=GROUP11 SRVID=520 MIN=10 MAX=30 MAXGEN=10 GRACE=10 RESTART=Y
    sinInsert_server SRVGRP=GROUP12 SRVID=560 MIN=10 MAX=30 MAXGEN=10 GRACE=10 RESTART=Y
    sinDelete_server SRVGRP=GROUP11 SRVID=600 MIN=10 MAX=30 MAXGEN=10 GRACE=10 RESTART=Y
    sinDelete_server SRVGRP=GROUP12 SRVID=640 MIN=10 MAX=30 MAXGEN=10 GRACE=10 RESTART=Y
    sinDdl_server SRVGRP=GROUP11 SRVID=700 MIN=5 MAX=30 MAXGEN=10 GRACE=10 RESTART=Y
    sinDdl_server SRVGRP=GROUP12 SRVID=740 MIN=5 MAX=30 MAXGEN=10 GRACE=10 RESTART=Y
    lockselect_server SRVGRP=GROUP11 SRVID=800 MIN=10 MAX=30 MAXGEN=10 GRACE=10 RESTART=Y
    lockselect_server SRVGRP=GROUP12 SRVID=840 MIN=10 MAX=30 MAXGEN=10 GRACE=10 RESTART=Y
    #mulup_server SRVGRP=GROUP11 SRVID=1 MIN=2 MAX=30 MAXGEN=10 GRACE=10 RESTART=Y
    #mulup_server SRVGRP=GROUP12 SRVID=60 MIN=2 MAX=30 MAXGEN=10 GRACE=10 RESTART=Y
    sinUpdate_server SRVGRP=GROUP13 SRVID=83 MIN=10 MAX=30 MAXGEN=10 GRACE=10 RESTART=Y
    sinUpdate_server SRVGRP=GROUP14 SRVID=164 MIN=10 MAX=30 MAXGEN=10 GRACE=10 RESTART=Y
    sinCount_server SRVGRP=GROUP13 SRVID=243 MIN=10 MAX=30 MAXGEN=10 GRACE=10 RESTART=Y
    sinCount_server SRVGRP=GROUP14 SRVID=324 MIN=10 MAX=30 MAXGEN=10 GRACE=10 RESTART=Y
    sinSelect_server SRVGRP=GROUP13 SRVID=363 MIN=10 MAX=30 MAXGEN=10 GRACE=10 RESTART=Y
    sinSelect_server SRVGRP=GROUP14 SRVID=404 MIN=10 MAX=30 MAXGEN=10 GRACE=10 RESTART=Y
    sinInsert_server SRVGRP=GROUP13 SRVID=523 MIN=10 MAX=30 MAXGEN=10 GRACE=10 RESTART=Y
    sinInsert_server SRVGRP=GROUP14 SRVID=564 MIN=10 MAX=30 MAXGEN=10 GRACE=10 RESTART=Y
    sinDelete_server SRVGRP=GROUP13 SRVID=603 MIN=10 MAX=30 MAXGEN=10 GRACE=10 RESTART=Y
    sinDelete_server SRVGRP=GROUP14 SRVID=644 MIN=10 MAX=30 MAXGEN=10 GRACE=10 RESTART=Y
    sinDdl_server SRVGRP=GROUP13 SRVID=703 MIN=5 MAX=30 MAXGEN=10 GRACE=10 RESTART=Y
    sinDdl_server SRVGRP=GROUP14 SRVID=744 MIN=5 MAX=30 MAXGEN=10 GRACE=10 RESTART=Y
    lockselect_server SRVGRP=GROUP13 SRVID=803 MIN=10 MAX=30 MAXGEN=10 GRACE=10 RESTART=Y
    lockselect_server SRVGRP=GROUP14 SRVID=844 MIN=10 MAX=30 MAXGEN=10 GRACE=10 RESTART=Y
    #mulup_server SRVGRP=GROUP13 SRVID=13 MIN=2 MAX=30 MAXGEN=10 GRACE=10 RESTART=Y
    #mulup_server SRVGRP=GROUP14 SRVID=64 MIN=2 MAX=30 MAXGEN=10 GRACE=10 RESTART=Y
    WSL SRVGRP=GROUP11 SRVID=1000
    CLOPT="-A -- -n//120.3.8.237:7200 -I 60 -T 60 -w WSH -m 50 -M 100 -x 6 -N 3600"
    WSL SRVGRP=GROUP12 SRVID=1001
    CLOPT="-A -- -n//120.3.8.238:7200 -I 60 -T 60 -w WSH -m 50 -M 100 -x 6 -N 3600"
    WSL SRVGRP=GROUP13 SRVID=1003
    CLOPT="-A -- -n//120.3.8.237:7203 -I 60 -T 60 -w WSH -m 50 -M 100 -x 6 -N 3600"
    WSL SRVGRP=GROUP14 SRVID=1004
    CLOPT="-A -- -n//120.3.8.238:7204 -I 60 -T 60 -w WSH -m 50 -M 100 -x 6 -N 3600"
    With MSSQ, the UBB SERVERS section is:
    *SERVERS
    DEFAULT:
    CLOPT="-A -p 1,60:1,30"
    sinUpdate_server SRVGRP=GROUP11 SRVID=80 MIN=10 MAX=30 MAXGEN=10 GRACE=10 RESTART=Y RQADDR=sinUpdate11 REPLYQ=Y
    sinUpdate_server SRVGRP=GROUP12 SRVID=160 MIN=10 MAX=30 MAXGEN=10 GRACE=10 RESTART=Y RQADDR=sinUpdate12 REPLYQ=Y
    sinCount_server SRVGRP=GROUP11 SRVID=240 MIN=10 MAX=30 MAXGEN=10 GRACE=10 RESTART=Y RQADDR=sinCount11 REPLYQ=Y
    sinCount_server SRVGRP=GROUP12 SRVID=320 MIN=10 MAX=30 MAXGEN=10 GRACE=10 RESTART=Y RQADDR=sinCount12 REPLYQ=Y
    sinSelect_server SRVGRP=GROUP11 SRVID=360 MIN=10 MAX=30 MAXGEN=10 GRACE=10 RESTART=Y RQADDR=sinSelec11 REPLYQ=Y
    sinSelect_server SRVGRP=GROUP12 SRVID=400 MIN=10 MAX=30 MAXGEN=10 GRACE=10 RESTART=Y RQADDR=sinSelect12 REPLYQ=Y
    sinInsert_server SRVGRP=GROUP11 SRVID=520 MIN=10 MAX=30 MAXGEN=10 GRACE=10 RESTART=Y RQADDR=sinInsert11 REPLYQ=Y
    sinInsert_server SRVGRP=GROUP12 SRVID=560 MIN=10 MAX=30 MAXGEN=10 GRACE=10 RESTART=Y RQADDR=sinInsert12 REPLYQ=Y
    sinDelete_server SRVGRP=GROUP11 SRVID=600 MIN=10 MAX=30 MAXGEN=10 GRACE=10 RESTART=Y RQADDR=sinDelete11 REPLYQ=Y
    sinDelete_server SRVGRP=GROUP12 SRVID=640 MIN=10 MAX=30 MAXGEN=10 GRACE=10 RESTART=Y RQADDR=sinDelete12 REPLYQ=Y
    sinDdl_server SRVGRP=GROUP11 SRVID=700 MIN=5 MAX=30 MAXGEN=10 GRACE=10 RESTART=Y RQADDR=sinDdl11 REPLYQ=Y
    sinDdl_server SRVGRP=GROUP12 SRVID=740 MIN=5 MAX=30 MAXGEN=10 GRACE=10 RESTART=Y RQADDR=sinDdl12 REPLYQ=Y
    lockselect_server SRVGRP=GROUP11 SRVID=800 MIN=10 MAX=30 MAXGEN=10 GRACE=10 RESTART=Y RQADDR=lockselect11 REPLYQ=Y
    lockselect_server SRVGRP=GROUP12 SRVID=840 MIN=10 MAX=30 MAXGEN=10 GRACE=10 RESTART=Y RQADDR=lockselect12 REPLYQ=Y
    #mulup_server SRVGRP=GROUP11 SRVID=1 MIN=2 MAX=30 MAXGEN=10 GRACE=10 RESTART=Y RQADDR=mulup11 REPLYQ=Y
    #mulup_server SRVGRP=GROUP12 SRVID=60 MIN=2 MAX=30 MAXGEN=10 GRACE=10 RESTART=Y RQADDR=mulup12 REPLYQ=Y
    sinUpdate_server SRVGRP=GROUP13 SRVID=83 MIN=10 MAX=30 MAXGEN=10 GRACE=10 RESTART=Y RQADDR=sinUpdate13 REPLYQ=Y
    sinUpdate_server SRVGRP=GROUP14 SRVID=164 MIN=10 MAX=30 MAXGEN=10 GRACE=10 RESTART=Y RQADDR=sinUpdate14 REPLYQ=Y
    sinCount_server SRVGRP=GROUP13 SRVID=243 MIN=10 MAX=30 MAXGEN=10 GRACE=10 RESTART=Y RQADDR=sinCount13 REPLYQ=Y
    sinCount_server SRVGRP=GROUP14 SRVID=324 MIN=10 MAX=30 MAXGEN=10 GRACE=10 RESTART=Y RQADDR=sinCount14 REPLYQ=Y
    sinSelect_server SRVGRP=GROUP13 SRVID=363 MIN=10 MAX=30 MAXGEN=10 GRACE=10 RESTART=Y RQADDR=sinSelec13 REPLYQ=Y
    sinSelect_server SRVGRP=GROUP14 SRVID=404 MIN=10 MAX=30 MAXGEN=10 GRACE=10 RESTART=Y RQADDR=sinSelect14 REPLYQ=Y
    sinInsert_server SRVGRP=GROUP13 SRVID=523 MIN=10 MAX=30 MAXGEN=10 GRACE=10 RESTART=Y RQADDR=sinInsert13 REPLYQ=Y
    sinInsert_server SRVGRP=GROUP14 SRVID=564 MIN=10 MAX=30 MAXGEN=10 GRACE=10 RESTART=Y RQADDR=sinInsert14 REPLYQ=Y
    sinDelete_server SRVGRP=GROUP13 SRVID=603 MIN=10 MAX=30 MAXGEN=10 GRACE=10 RESTART=Y RQADDR=sinDelete13 REPLYQ=Y
    sinDelete_server SRVGRP=GROUP14 SRVID=644 MIN=10 MAX=30 MAXGEN=10 GRACE=10 RESTART=Y RQADDR=sinDelete14 REPLYQ=Y
    sinDdl_server SRVGRP=GROUP13 SRVID=703 MIN=5 MAX=30 MAXGEN=10 GRACE=10 RESTART=Y RQADDR=sinDdl13 REPLYQ=Y
    sinDdl_server SRVGRP=GROUP14 SRVID=744 MIN=5 MAX=30 MAXGEN=10 GRACE=10 RESTART=Y RQADDR=sinDdl14 REPLYQ=Y
    lockselect_server SRVGRP=GROUP13 SRVID=803 MIN=10 MAX=30 MAXGEN=10 GRACE=10 RESTART=Y RQADDR=lockselect13 REPLYQ=Y
    lockselect_server SRVGRP=GROUP14 SRVID=844 MIN=10 MAX=30 MAXGEN=10 GRACE=10 RESTART=Y RQADDR=lockselect14 REPLYQ=Y
    #mulup_server SRVGRP=GROUP13 SRVID=13 MIN=2 MAX=30 MAXGEN=10 GRACE=10 RESTART=Y RQADDR=mulup13 REPLYQ=Y
    #mulup_server SRVGRP=GROUP14 SRVID=64 MIN=2 MAX=30 MAXGEN=10 GRACE=10 RESTART=Y RQADDR=mulup14 REPLYQ=Y
    WSL SRVGRP=GROUP11 SRVID=1000
    CLOPT="-A -- -n//120.3.8.237:7200 -I 60 -T 60 -w WSH -m 50 -M 100 -x 6 -N 3600"
    WSL SRVGRP=GROUP12 SRVID=1001
    CLOPT="-A -- -n//120.3.8.238:7200 -I 60 -T 60 -w WSH -m 50 -M 100 -x 6 -N 3600"
    WSL SRVGRP=GROUP13 SRVID=1003
    CLOPT="-A -- -n//120.3.8.237:7203 -I 60 -T 60 -w WSH -m 50 -M 100 -x 6 -N 3600"
    WSL SRVGRP=GROUP14 SRVID=1004
    CLOPT="-A -- -n//120.3.8.238:7204 -I 60 -T 60 -w WSH -m 50 -M 100 -x 6 -N 3600"
    Is there any error in the UBB file above, or are we using MSSQ incorrectly?
    I look forward to your answer, thanks.
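The MSSQ section above can be checked mechanically against two rules commonly cited in the Tuxedo documentation for MSSQ sets: servers that share a request queue (RQADDR) must offer the same services, and servers in an MSSQ set that need to receive replies should have their own reply queue (REPLYQ=Y). The Python sketch below uses "same executable name" as a rough stand-in for "same services", and its parser only handles the one-entry-per-line KEY=VALUE layout used in this post; it is not a full UBBCONFIG grammar. Applied to the configuration above, neither check fires, so these two rules alone do not appear to explain the TPEOS error.

```python
# Sketch: sanity-check MSSQ entries in a UBBCONFIG *SERVERS section.
# Deliberately simple parser: one entry per line, KEY=VALUE pairs.
import re
from collections import defaultdict

PARAM = re.compile(r'(\w+)=("[^"]*"|\S+)')


def parse_server_entries(text):
    """Return (executable, params) for every uncommented line that has SRVGRP=."""
    entries = []
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith(("#", "*")) or "SRVGRP=" not in stripped:
            continue
        parts = stripped.split(None, 1)
        if len(parts) < 2:
            continue
        name, rest = parts
        entries.append((name, dict(PARAM.findall(rest))))
    return entries


def check_mssq(entries):
    """Flag MSSQ entries lacking REPLYQ=Y, or RQADDRs shared by different executables."""
    problems = []
    executables_by_queue = defaultdict(set)
    for name, params in entries:
        rqaddr = params.get("RQADDR")
        if rqaddr is None:
            continue  # not an MSSQ entry
        executables_by_queue[rqaddr].add(name)
        if params.get("REPLYQ", "N").upper() != "Y":
            problems.append(f"{name} SRVID={params.get('SRVID')}: RQADDR={rqaddr} without REPLYQ=Y")
    for rqaddr, names in sorted(executables_by_queue.items()):
        if len(names) > 1:
            problems.append(f"RQADDR={rqaddr} is shared by different executables: {sorted(names)}")
    return problems


if __name__ == "__main__":
    sample = """*SERVERS
DEFAULT:
CLOPT="-A -p 1,60:1,30"
sinUpdate_server SRVGRP=GROUP11 SRVID=80 MIN=10 MAX=30 RQADDR=sinUpdate11 REPLYQ=Y
sinUpdate_server SRVGRP=GROUP12 SRVID=160 MIN=10 MAX=30 RQADDR=sinUpdate12 REPLYQ=Y
WSL SRVGRP=GROUP11 SRVID=1000
"""
    print(check_mssq(parse_server_entries(sample)) or "no MSSQ problems found")
```

To check a real file, pass its text to `parse_server_entries` and the result to `check_mssq`.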

  • Oracle ORA-16000 when trying to add a standby instance to an existing RAC node

    I attempted to use DBCA to add a new standby instance to an existing cluster. The cluster has 4 nodes running Linux RHEL 5.3 and Oracle 11.1.0.7, and also uses ASM, ASMLib, OCFS2 and shared block devices.
    ASM instances are up and functional on all nodes. The current configuration appears to be running normally and correctly.
    I have a 4-instance database running on the cluster. I also have 3 physical standby Active Data Guard instances running on 3 of the nodes. I wanted to add a new ADG instance on the 4th node.
    While running DBCA I received ORA-00604 and ORA-16000.
    The Active Data Guard database was open (read only) and redo apply was on. I am using Data Guard broker as well, but not Grid Control.
    Does anyone have a procedure for adding an instance in this environment? Does the standby need to be in MOUNT state? If DBCA won't work, does anyone have a manual procedure for adding a new instance?
    Thanks

    zulo
    Let's say you are adding node nusclust160## to your existing cluster and DBCA is a pain to use.
    Extend Clusterware to the nusclust160## server.
    re: Page 64 of Oracle® Clusterware Administration and Deployment Guide 11g Release 1 (11.1)
    1a.
    Add an undo tablespace to support the additional node.
    Re-check space for DATA1 on nusclust16007 and /dbdata/ORADB on sun16109.
    As of Thursday, May 21, 2009 the DATA1 asm group has 53,584M free.
    As of Thursday, May 21, 2009 the /dbdata/ORADB has 77G free.
    In a separate terminal window on nusclust16007, run the following in sqlplus:
    CREATE UNDO TABLESPACE UNDOTBS4 datafile '+DATA1' SIZE 13300M AUTOEXTEND ON ;
    Creating this tablespace takes a long time. Minimize the window after submitting the DDL and move on to the next step.
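The space check in step 1a is simple subtraction, but it is easy to script so it can be rerun just before the CREATE UNDO TABLESPACE. The Python sketch below uses the DATA1 figure and the UNDOTBS4 initial size given above. Two caveats: AUTOEXTEND ON lets the tablespace grow beyond its initial 13300M, and if DATA1 uses ASM normal or high redundancy, V$ASM_DISKGROUP.USABLE_FILE_MB is a better number to compare against than raw free space.

```python
# Sketch: check that a disk group can hold a new datafile, using the figures from step 1a.


def headroom_after(free_mb, planned_mb):
    """Return the free MB left after allocating planned_mb; raise if it does not fit."""
    if planned_mb > free_mb:
        raise ValueError(f"need {planned_mb} MB but only {free_mb} MB is free")
    return free_mb - planned_mb


if __name__ == "__main__":
    # DATA1 free space (53,584M) and the UNDOTBS4 initial size (13300M) from step 1a.
    print(headroom_after(53_584, 13_300), "MB left in DATA1")  # 40284 MB left in DATA1
```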
    1b.
    Ensure that .bash_profile on nusclust160## looks like this:
    vi .bash_profile
    export ORACLE_HOSTNAME=nusclust160##
    export ORACLE_SID=ORADB4
    export ORA_CRS_BASE=/apps/ocr/oracle
    export ORACLE_BASE=/apps/dbs/oracle
    export PATH=/usr/ccs/bin:/usr/X/bin:/usr/bin:/usr/sfw/bin:/usr/sbin:/usr/local/bin
    export server=`uname -n`
    export PS1="$ORACLE_SID@$HOSTNAME >"
    alias cls='clear'
    alias More='more'
    alias ll='ls -lt | more'
    Gather the IP addresses for the fourth node from /etc/hosts:
    222.65.125.### nusclust160##
    222.65.125.### nusclust160##-vip
    10.333.248.### nusclust160##-priv
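Gathering the three addresses for the new node (public, -vip and -priv) can be scripted. The Python sketch below parses /etc/hosts-style text and returns the address for each name, failing if one is missing. The node name and addresses in the usage example are hypothetical. Text after # is treated as a comment, as in /etc/hosts itself, so the literal ## placeholder used in this write-up must be replaced with the real host name.

```python
# Sketch: pull the public, VIP and private-interconnect addresses of one node
# out of /etc/hosts-style text. Names follow the <node>, <node>-vip, <node>-priv pattern above.


def node_addresses(hosts_text, node):
    """Return {'public': ip, 'vip': ip, 'priv': ip} for node; raise KeyError if any is missing."""
    wanted = {node: "public", f"{node}-vip": "vip", f"{node}-priv": "priv"}
    found = {}
    for line in hosts_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments, as /etc/hosts does
        if not line:
            continue
        ip, *names = line.split()
        for name in names:
            if name in wanted:
                found[wanted[name]] = ip
    missing = set(wanted.values()) - set(found)
    if missing:
        raise KeyError(f"{node}: no /etc/hosts entry for {sorted(missing)}")
    return found


if __name__ == "__main__":
    hosts = """127.0.0.1     localhost
192.0.2.14    racnode4
192.0.2.24    racnode4-vip
10.0.0.4      racnode4-priv
"""
    print(node_addresses(hosts, "racnode4"))
```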
    2. Start Oracle Universal Installer:
    Go to CRS_home/oui/bin and run the addNode.sh script on one of the existing
    nodes. Oracle Universal Installer runs in add node mode.
    The Oracle inventory on nusclust16007, nusclust16008, and nusclust16036 is found under:
    /home/oracle/oraInventory
    Use an X Windows-enabled session (the OUI takes 33 minutes).
    cd /apps/ocr/oracle/product/11.1.0/crs/oui/bin
    ./addNode.sh
    a. In the first screen, specify the new node as:
    Public Node Name:          nusclust160##
    Private Node Name:     nusclust160##-priv
    Virtual Host Name:     nusclust160##-vip
    If you receive the error:
    " tar. ./bin/racgvip.orig: Permission denied"
    Do the following:
    cd /apps/ocr/oracle/product/11.1.0/crs/bin
    ls -al racgvip.orig
    paste here:
    chown root:oinstall racgvip.orig
    chmod 771 racgvip.orig
    It should now show:
    -rwxrwx--x 1 root oinstall 19213 Feb 11 08:36 racgvip.orig
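The racgvip.orig fix relies on chmod 771 producing the -rwxrwx--x listing shown: 7 (rwx) for the owner, 7 (rwx) for the group and 1 (--x) for others. Python's standard `stat.filemode` renders a mode the way `ls -l` does, which makes the octal-to-symbolic mapping easy to check in a short sketch.

```python
# Sketch: show how an octal permission mode maps to the ls -l style string.
import stat


def ls_mode(octal_mode):
    """Render a regular file's permission bits the way ls -l prints them."""
    return stat.filemode(stat.S_IFREG | octal_mode)


if __name__ == "__main__":
    print(ls_mode(0o771))  # -rwxrwx--x
```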
    As root:
    a.
    On nusclust160##:
    cd /home/oracle/oraInventory
    ./orainstRoot.sh
    b.
    On nusclust16007:
    cd /apps/ocr/oracle/product/11.1.0/crs/install
    ./rootaddnode.sh
    clscfg: EXISTING configuration version 4 detected.
    clscfg: version 4 is 11 Release 1.
    Attempting to add 1 new nodes to the configuration
    Using ports: CSS=49895 CRS=49896 EVMC=49898 and EVMR=49897.
    node <nodenumber>: <nodename> <private interconnect name> <hostname>
    node 4: nusclust160## nusclust160##-priv nusclust160##
    Creating OCR keys for user 'root', privgrp 'root'..
    Operation successful.
    /apps/ocr/oracle/product/11.1.0/crs/bin/srvctl add nodeapps -n nusclust160## -A nusclust160##-vip/255.255.255.224/bge0
    c.
    On nusclust160##:
    cd /apps/ocr/oracle/product/11.1.0/crs/
    ./root.sh
    WARNING: directory '/apps/ocr/oracle/product/11.1.0' is not owned by root
    WARNING: directory '/apps/ocr/oracle/product' is not owned by root
    WARNING: directory '/apps/ocr/oracle' is not owned by root
    Checking to see if Oracle CRS stack is already configured
    OCR LOCATIONS = /raw/ocr/ocrconf1,/raw/ocr/ocrconf2
    OCR backup directory '/apps/ocr/oracle/product/11.1.0/crs/cdata/rac_cluster' does not exist. Creating now
    Setting the permissions on OCR backup directory
    Setting up Network socket directories
    Oracle Cluster Registry configuration upgraded successfully
    The directory '/apps/ocr/oracle/product/11.1.0' is not owned by root. Changing owner to root
    The directory '/apps/ocr/oracle/product' is not owned by root. Changing owner to root
    The directory '/apps/ocr/oracle' is not owned by root. Changing owner to root
    clscfg: EXISTING configuration version 4 detected.
    clscfg: version 4 is 11 Release 1.
    Successfully accumulated necessary OCR keys.
    Using ports: CSS=49895 CRS=49896 EVMC=49898 and EVMR=49897.
    node <nodenumber>: <nodename> <private interconnect name> <hostname>
    node 1: nusclust16007 nusclust16007-priv nusclust16007
    node 2: nusclust16008 nusclust16008-priv nusclust16008
    node 3: nusclust16036 nusclust16036-priv nusclust16036
    clscfg: Arguments check out successfully.
    NO KEYS WERE WRITTEN. Supply -force parameter to override.
    -force is destructive and will destroy any previous cluster
    configuration.
    Oracle Cluster Registry for cluster has already been initialized
    Startup will be queued to init within 30 seconds.
    Adding daemons to inittab
    Expecting the CRS daemons to be up within 600 seconds.
    Cluster Synchronization Services is active on these nodes.
    nusclust16007
    nusclust16008
    nusclust16036
    nusclust160##
    Cluster Synchronization Services is active on all the nodes.
    Waiting for the Oracle CRSD and EVMD to start
    Oracle CRS stack installed and running under init(1M)
    4. After this is done, crs_stat -t will show nusclust160## in CRS. I see:
    Name Type Target State Host
    ora....160##.gsd application ONLINE ONLINE sun...160##
    ora....160##.ons application ONLINE OFFLINE
    ora....160##.vip application ONLINE ONLINE sun...160##
    Do not be concerned about ora.nusclust160##.ons being OFFLINE, as that will be fixed shortly in a step that follows this one.
    5. As oracle:
    On nusclust16007:
    cd /apps/ocr/oracle/product/11.1.0/crs/bin
    ./racgons add_config nusclust160##:6251
    This should take about one second to run.
    If it says that it has already been added to the OCR you are fine.
    If it hangs, you may need to reboot all servers to clear this issue.
    6. Ensure the new node is properly added to the OCR by running the following
    On nusclust16007:
    ocrdump
    Check for the entries that show:
    [DATABASE.ONS_HOSTS.nusclust160##.PORT]
    ORATEXT : 6251
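The step 6 check can be automated by scanning the ocrdump output for the key and value shown above. If no file name is given, ocrdump writes its output to a file named OCRDUMPFILE in the current directory; pass that file's text to the function. The Python sketch below assumes only the layout shown in step 6 (the key on one line, "ORATEXT : <port>" on the next non-empty line); the node name in the usage example is hypothetical.

```python
# Sketch: confirm an ONS port entry in ocrdump output, using the key/value
# layout shown in step 6.


def ons_port_registered(dump_text, node, port):
    """True if [DATABASE.ONS_HOSTS.<node>.PORT] is followed by 'ORATEXT : <port>'."""
    key = f"[DATABASE.ONS_HOSTS.{node}.PORT]"
    lines = [ln.strip() for ln in dump_text.splitlines()]
    for i, line in enumerate(lines):
        if line != key:
            continue
        for following in lines[i + 1:]:
            if following:  # the first non-empty line after the key holds its value
                return following.replace(" ", "") == f"ORATEXT:{port}"
        return False
    return False


if __name__ == "__main__":
    sample = "[DATABASE.ONS_HOSTS.racnode4.PORT]\nORATEXT : 6251\n"
    print(ons_port_registered(sample, "racnode4", 6251))  # True
```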
    7. Check that your cluster is integrated and that the cluster is not divided into
    partitions by completing the following operations:
    On nusclust16007:
    cd /apps/ocr/oracle/product/11.1.0/crs/bin
    ./cluvfy comp clumgr -n all -verbose
    You should see: Verification of cluster manager integrity was successful.
    8.
    Use the following command to perform an integrated validation of the Oracle
    Clusterware setup on all of the configured nodes, both the preexisting nodes
    and the nodes that you have added:
    As oracle on nusclust16007:
    cluvfy stage -post crsinst -n all -verbose
    You should see: Post-check for cluster services setup was successful.
    9.
    On nusclust160## as oracle run the following:
    cd /apps/ocr/oracle/product/11.1.0/crs/bin
    ./crs_stat -t | grep OFFLINE
    If you see this:
    ora.nusclust160##.ons application ONLINE OFFLINE
    then run this:
    ./crs_start -all
    After:
    ./crs_stat -t
    ora.nusclust160##.ons application ONLINE ONLINE nusclust160##
    If you see the above, you can move on to the next step.
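The OFFLINE check in steps 4 and 9 can be done by parsing `crs_stat -t` output (columns Name, Type, Target, State, Host). The Python sketch below reports resources whose Target is ONLINE but whose State is not; names are returned as printed, so long names stay abbreviated the way crs_stat -t shows them. Feed it the captured output of `crs_stat -t`; if anything is listed, step 9 above starts the resources with crs_start -all.

```python
# Sketch: list CRS resources whose Target is ONLINE but whose State is not,
# parsed from `crs_stat -t` output (columns: Name Type Target State Host).


def offline_resources(crs_stat_output):
    """Return resource names (as printed, possibly abbreviated) that are not ONLINE."""
    bad = []
    for line in crs_stat_output.splitlines():
        fields = line.split()
        if len(fields) < 4 or fields[0] == "Name":
            continue  # header, separator line or blank line
        name, _rtype, target, state = fields[:4]
        if target == "ONLINE" and state != "ONLINE":
            bad.append(name)
    return bad


if __name__ == "__main__":
    sample = """Name           Type           Target    State     Host
------------------------------------------------------------
ora....160##.gsd application    ONLINE    ONLINE    racnode4
ora....160##.ons application    ONLINE    OFFLINE
ora....160##.vip application    ONLINE    ONLINE    racnode4
"""
    print(offline_resources(sample))  # ['ora....160##.ons']
```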
    Adding database binaries to the nusclust160## server and setting up the listener.
    1.
    From nusclust16007:
    Open an X window (the OUI part takes 13 minutes).
    cd /apps/dbs/oracle/product/11.1.0/db_1/oui/bin
    ./runInstaller -addNode ORACLE_HOME=/apps/dbs/oracle/product/11.1.0/db_1 $*
    You should get a prompt to specify a new node; in this case you should see nusclust160##, which you need to check.
    2.
    from nusclust160##:
    Eventually you will be prompted to run the following as root on the new node
    On nusclust160##
    cd /apps/dbs/oracle/product/11.1.0/db_1
    ./root.sh
    Running Oracle 11g root.sh script...
    The following environment variables are set as:
    ORACLE_OWNER= oracle
    ORACLE_HOME= /apps/dbs/oracle/product/11.1.0/db_1
    Enter the full pathname of the local bin directory: [usr/local/bin]:
    Copying dbhome to /usr/local/bin ...
    Copying oraenv to /usr/local/bin ...
    Copying coraenv to /usr/local/bin ...
    Creating /var/opt/oracle/oratab file...
    Entries will be added to the /var/opt/oracle/oratab file as needed by
    Database Configuration Assistant when a database is created
    Finished running generic part of root.sh script.
    Now product-specific root actions will be performed.
    Finished product-specific root actions.
    3. Verification
    Now set up .bash_profile and the .asm profile on nusclust160## to support the new ORADB4 and +ASM4 instances for the oracle user.
    On nusclust160##:
    cp .bash_profile .bash_profile.bak
    On nusclust16007:
    sftp nusclust160##
    put .bash_profile
    On nusclust160##:
    vi .bash_profile
    Change ORACLE_SID to ORADB4.
    cp .bash_profile .asm
    vi .asm
    Change ORACLE_SID to +ASM4 in the .asm file.
    which sqlplus
    This should show the path below if the $PATH environment variable is set correctly.
    /apps/dbs/oracle/product/11.1.0/db_1/bin/sqlplus
    On nusclust160##:
    oifcfg getif
    This should show:
    ce4 10.333.248.192 global cluster_interconnect
    ce5 222.65.125.128 global public
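The `oifcfg getif` output above has one line per interface: interface name, subnet, scope and role. The Python sketch below groups those lines by role and reports whether the public and cluster_interconnect roles are both present; it makes no attempt to validate the subnets themselves. Feed it the captured output of `oifcfg getif` on the new node.

```python
# Sketch: check `oifcfg getif` output for a public and a cluster_interconnect
# interface. Column layout (interface, subnet, scope, role) follows the output above.


def interfaces_by_role(getif_output):
    """Map each role (e.g. 'public') to a list of (interface, subnet, scope) tuples."""
    roles = {}
    for line in getif_output.splitlines():
        fields = line.split()
        if len(fields) != 4:
            continue  # skip blank or unexpected lines
        iface, subnet, scope, role = fields
        roles.setdefault(role, []).append((iface, subnet, scope))
    return roles


def missing_roles(getif_output, required=("public", "cluster_interconnect")):
    """Return the required roles that have no interface assigned."""
    roles = interfaces_by_role(getif_output)
    return [role for role in required if role not in roles]


if __name__ == "__main__":
    sample = "ce4 10.333.248.192 global cluster_interconnect\nce5 222.65.125.128 global public\n"
    print(missing_roles(sample) or "public and cluster_interconnect both present")
```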
    4.
    Run the NetBackup Oracle agent link script.
    As oracle, make sure ORACLE_HOME is defined.
    env | grep ORACLE_HOME
    then
    cd /usr/openv/netbackup/bin/
    ./oracle_link
    ls -al $ORACLE_HOME/lib/libobk.so
    should show:
    /apps/dbs/oracle/product/11.1.0/db_1/lib/libobk.so -> /usr/openv/netbackup/bin/libobk.so64.1
    5.
    On the target node, run the Net Configuration Assistant (NETCA) to add a
    listener. Add a listener to the target node by running NETCA from the target node and
    selecting only the target node on the Node Selection page.
    I shall do the following on nusclust160## using X Windows
    Before doing this, I see:
    crs_stat -t
    ora.nusclust160##.gsd application ONLINE ONLINE nusclust160##
    ora.nusclust160##.ons application ONLINE ONLINE nusclust160##
    ora.nusclust160##.vip application ONLINE ONLINE nusclust160##
    Connect to nusclust160## and open an X Windows session.
    netca
    Choose Cluster configuration.
    Select nusclust160## as the node to configure.
    Choose Listener configuration, then Add.
    When it prompts you for a listener name, choose LISTENER; NETCA appends _NUSCLUST160## (the server name) to make the complete listener name.
    At this point you will have a listener supporting the new node in CRS.
    now
    crs_stat -t
    will show:
    ora....0#.lsnr application ONLINE ONLINE nusclust160##
    ora.nusclust160##.gsd application ONLINE ONLINE nusclust160##
    ora.nusclust160##.ons application ONLINE ONLINE nusclust160##
    ora.nusclust160##.vip application ONLINE ONLINE nusclust160##
    At this point the necessary CRS entries for gsd, ons, vip, and the listener on nusclust160## are in place; all we need now is to add the ORADB4 and +ASM4 instances.
    III. 7/11/2009 7:40 AM Sat [120 min] NTTA DBA
    Use a non-DBCA method to create the additional instances on the nusclust160## server. This involves a complete shutdown of all RAC instances.
    1.
    Undo tablespace creation was taken care of in Step I,1. Check the progress of the UNDOTBS4 creation in the minimized window. You should see the tablespace on the primary and physical standby databases.
    2. First, set up the +ASM4 instance on nusclust160## and add it to the cluster.
    On nusclust160##
    cd $ORACLE_HOME/dbs
    vi init+ASM4.ora
    # Copyright (c) 1991, 2001, 2002 by Oracle Corporation
    # Cluster Database
    cluster_database=true
    cluster_database_instances=6
    # Miscellaneous
    diagnostic_dest=/apps/dbs/oracle
    instance_type=asm
    # Pools
    large_pool_size=12M
    asm_diskgroups='DATA1','ARCH','REDO1','REDO2'
    asm_diskstring='/raw/asm'
    +ASM1.instance_number=1
    +ASM2.instance_number=2
    +ASM3.instance_number=3
    +ASM4.instance_number=4
    3.
    On nusclust16007
    cd $ORACLE_HOME/dbs
    sftp nusclust160##
    put orapw+ASM1 /apps/dbs/oracle/product/11.1.0/db_1/dbs
    put orapwORADB1 /apps/dbs/oracle/product/11.1.0/db_1/dbs
    4.
    On nusclust160##
    cd $ORACLE_HOME/dbs
    cp orapw+ASM1 orapw+ASM4
    cp orapwORADB1 orapwORADB4
    5. On nusclust160##
    cd $HOME
    . ./.asm
    sqlplus '/ as sysasm'
    startup
    create spfile from pfile='/apps/dbs/oracle/product/11.1.0/db_1/dbs/init+ASM4.ora' ;
    shutdown immediate ;
    startup
    show parameters spfile
    6. Now that we have a running ASM instance, add it to the cluster.
    On nusclust160##
    srvctl add asm -n nusclust160## -i +ASM4 -o /apps/dbs/oracle/product/11.1.0/db_1
    srvctl enable asm -n nusclust160## -i +ASM4
    7. Now that we have an ASM instance, let's set up a database instance.
    On nusclust16007/ORADB1:
    alter system set cluster_database_instances=6 scope=spfile ;
    alter system set instance_name=ORADB4 scope=spfile sid='ORADB4' ;
    alter system set instance_number=4 scope=spfile sid='ORADB4' ;
    alter system set local_listener=LISTENER_NUSCLUST160## scope=both sid='ORADB4' ;
    alter system set thread=4 scope=both sid='ORADB4' ;
    alter system set undo_tablespace=UNDOTBS4 scope=both sid='ORADB4' ;
    alter database add logfile thread 4 group 28 ('+REDO1', '+REDO2' ) size 100M ;
    alter database add logfile thread 4 group 29 ('+REDO1', '+REDO2' ) size 100M ;
    alter database add logfile thread 4 group 30 ('+REDO1', '+REDO2' ) size 100M ;
    alter database add logfile thread 4 group 31 ('+REDO1', '+REDO2' ) size 100M ;
    alter database enable public thread 4;
    Five standby redo log groups also need to be added to support the standby (see step 11c).
    So in total 900M will be added to REDO1 (29,577M free) and 900M will be added to REDO2 (29,577M free).
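    The 900M figure follows from nine 100M groups (four online groups, 28-31, plus five standby groups, 32-36), each with one member in +REDO1 and one in +REDO2. A small Python check of that arithmetic, assuming both disk groups use external redundancy (with ASM normal redundancy each member would consume roughly twice the space):

```python
GROUP_SIZE_MB = 100
online_groups = range(28, 32)   # groups 28-31, added in step 7
standby_groups = range(32, 37)  # groups 32-36, added in step 11c

# Each group has one member in +REDO1 and one in +REDO2, so every group
# consumes GROUP_SIZE_MB in each disk group (external redundancy assumed).
per_diskgroup_mb = (len(online_groups) + len(standby_groups)) * GROUP_SIZE_MB
free_before_mb = 29_577

print(per_diskgroup_mb)                   # 900
print(free_before_mb - per_diskgroup_mb)  # 28677
```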
    8. Set up init.ora, listener.ora, and tnsnames.ora for ORADB4 on nusclust160##.
    a. init.ora setup:
    cd $ORACLE_HOME/dbs
    vi initORADB4.ora
    SPFILE='+DATA1/ORADB/spfileORADB.ora'
    b. Add entries to tnsnames.ora:
    ORADB4 =
      (DESCRIPTION =
        (ADDRESS = (PROTOCOL = TCP)(HOST = nusclust160##-vip)(PORT = 1521))
        (CONNECT_DATA =
          (SERVER = DEDICATED)
          (SERVICE_NAME = ORADB)
          (INSTANCE_NAME = ORADB4)
        )
      )
    ORADB =
      (DESCRIPTION =
        (ADDRESS = (PROTOCOL = TCP)(HOST = nusclust16007-vip)(PORT = 1521))
        (ADDRESS = (PROTOCOL = TCP)(HOST = nusclust16008-vip)(PORT = 1521))
        (ADDRESS = (PROTOCOL = TCP)(HOST = nusclust16036-vip)(PORT = 1521))
        (ADDRESS = (PROTOCOL = TCP)(HOST = nusclust160##-vip)(PORT = 1521))
        (LOAD_BALANCE = yes)
        (CONNECT_DATA =
          (SERVER = DEDICATED)
          (SERVICE_NAME = ORADB)
        )
      )
    LISTENERS_ORADB =
      (ADDRESS_LIST =
        (ADDRESS = (PROTOCOL = TCP)(HOST = nusclust16007-vip)(PORT = 1521))
        (ADDRESS = (PROTOCOL = TCP)(HOST = nusclust16008-vip)(PORT = 1521))
        (ADDRESS = (PROTOCOL = TCP)(HOST = nusclust16036-vip)(PORT = 1521))
        (ADDRESS = (PROTOCOL = TCP)(HOST = nusclust160##-vip)(PORT = 1521))
      )
    LISTENER_NUSCLUST160## =
      (ADDRESS = (PROTOCOL = TCP)(HOST = nusclust160##-vip)(PORT = 1521))
    ORADB_PRIM =
      (DESCRIPTION =
        (ADDRESS = (PROTOCOL = TCP)(HOST = nusclust16007-vip)(PORT = 1521))
        (ADDRESS = (PROTOCOL = TCP)(HOST = nusclust16008-vip)(PORT = 1521))
        (ADDRESS = (PROTOCOL = TCP)(HOST = nusclust16036-vip)(PORT = 1521))
        (ADDRESS = (PROTOCOL = TCP)(HOST = nusclust160##-vip)(PORT = 1521))
        (LOAD_BALANCE = yes)
        (CONNECT_DATA =
          (SERVER = DEDICATED)
          (SERVICE_NAME = ORADB)
        )
      )
    c. Add entries to listener.ora. Most of this file should already be in place; just ensure that the modifications below are made.
    SID_LIST_LISTENER_NUSCLUST160## =
      (SID_LIST =
        (SID_DESC =
          (SID_NAME = PLSExtProc)
          (ORACLE_HOME = /apps/dbs/oracle/product/11.1.0/db_1)
          (PROGRAM = extproc)
        )
        (SID_DESC =
          (GLOBAL_DBNAME = ORADB)
          (ORACLE_HOME = /apps/dbs/oracle/product/11.1.0/db_1)
          (SID_NAME = ORADB4)
        )
      )
    LISTENER_NUSCLUST160## =
      (DESCRIPTION_LIST =
        (DESCRIPTION =
          (ADDRESS = (PROTOCOL = TCP)(HOST = NUSCLUST160##-vip)(PORT = 1521)(IP = FIRST))
          (ADDRESS = (PROTOCOL = TCP)(HOST = 222.65.125.###)(PORT = 1521)(IP = FIRST))
        )
      )
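    A single missing closing parenthesis is enough to break an alias in tnsnames.ora or listener.ora, so it can help to check the edited files before reloading the listener. A minimal Python sketch, assuming each alias starts on its own "NAME =" line; it only counts parentheses per alias and does not validate keywords. unbalanced_entries is a hypothetical helper name.

```python
import re

# An alias line looks like "ORADB4 =" with nothing after the equals sign.
# '#' is allowed in names because this procedure masks hosts as nusclust160##.
ALIAS_RE = re.compile(r"^\s*([A-Za-z0-9_.#]+)\s*=\s*$")


def unbalanced_entries(ora_text: str) -> list[str]:
    """Return aliases whose opening and closing parentheses do not match.

    Comment lines (starting with '#') are ignored.
    """
    depth: dict[str, int] = {}
    current = None
    for line in ora_text.splitlines():
        if line.strip().startswith("#"):
            continue
        match = ALIAS_RE.match(line)
        if match:
            current = match.group(1)
            depth[current] = 0
            continue
        if current is not None:
            depth[current] += line.count("(") - line.count(")")
    return [alias for alias, d in depth.items() if d != 0]


# ORADB4 below is missing its closing parentheses; the listener alias is fine.
broken = """\
ORADB4 =
(DESCRIPTION =
(ADDRESS = (PROTOCOL = TCP)(HOST = nusclust160##-vip)(PORT = 1521))
(CONNECT_DATA =
(SERVICE_NAME = ORADB)
LISTENER_NUSCLUST160## =
(ADDRESS = (PROTOCOL = TCP)(HOST = nusclust160##-vip)(PORT = 1521))
"""
print(unbalanced_entries(broken))
```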
    9. Reload the listener.
    lsnrctl
    set current_listener LISTENER_NUSCLUST160##
    reload
    exit
    10. Check the audit trail directory, add the instance to the cluster, and start the database instance.
    a. Check that the audit directory exists:
    /apps/dbs/oracle/product/11.1.0/db_1/rdbms/audit
    If this audit trail directory does not exist, create it.
    b. Add and enable the instance in the cluster:
    srvctl add instance -d ORADB -i ORADB4 -n nusclust160##
    srvctl modify instance -d ORADB -i ORADB4 -s +ASM4
    srvctl enable instance -d ORADB -i ORADB4
    The last command will probably show: PRKP-1017 : Instance ORADB4 already enabled.
    c. Start the instance:
    sqlplus '/ as sysdba'
    startup
    **Because a change to cluster_database_instances only takes effect after all instances in the cluster have been shut down, you may get an error when starting the new instance. If you receive an error, run:
    srvctl stop database -d oradb
    sqlplus '/ as sysdba'
    startup
    shutdown
    srvctl start database -d oradb
    shutdown
    srvctl start instance -d ORADB -i ORADB4 -o open
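    The audit directory check in step 10a can be scripted. A minimal Python sketch, assuming it runs as the Oracle software owner (so the directory gets the right ownership); ensure_audit_dir is a hypothetical helper, and the demonstration uses a temporary directory instead of the real ORACLE_HOME.

```python
import os
import tempfile

# Path copied from step 10a; adjust for a different ORACLE_HOME.
AUDIT_DIR = "/apps/dbs/oracle/product/11.1.0/db_1/rdbms/audit"


def ensure_audit_dir(path: str = AUDIT_DIR) -> bool:
    """Create the audit trail directory if it is missing.

    Returns True if the directory had to be created, False if it existed.
    """
    if os.path.isdir(path):
        return False
    os.makedirs(path, exist_ok=True)
    return True


# Demonstration against a throwaway directory instead of the real home.
demo_root = tempfile.mkdtemp()
demo_path = os.path.join(demo_root, "rdbms", "audit")
print(ensure_audit_dir(demo_path))  # True: created
print(ensure_audit_dir(demo_path))  # False: already present
```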
    11. a. Modify the spfiles of +ASM1, +ASM2, and +ASM3.
    On nusclust16007
    . ./.asm
    sqlplus '/ as sysasm'
    alter system set instance_number=4 scope=spfile sid='+ASM4' ;
    On nusclust16008
    . ./.asm
    sqlplus '/ as sysasm'
    alter system set instance_number=4 scope=spfile sid='+ASM4' ;
    On nusclust16036
    . ./.asm
    sqlplus '/ as sysasm'
    alter system set instance_number=4 scope=spfile sid='+ASM4' ;
    b. Modify tnsnames.ora on nusclust16007, nusclust16008, and nusclust16036.
    On nusclust16007
    ORADB4 =
      (DESCRIPTION =
        (ADDRESS = (PROTOCOL = TCP)(HOST = nusclust160##-vip)(PORT = 1521))
        (CONNECT_DATA =
          (SERVER = DEDICATED)
          (SERVICE_NAME = ORADB)
          (INSTANCE_NAME = ORADB4)
        )
      )
    Add the following line to the ORADB alias:
    (ADDRESS = (PROTOCOL = TCP)(HOST = nusclust160##-vip)(PORT = 1521))
    Add the following line to the LISTENERS_ORADB alias:
    (ADDRESS = (PROTOCOL = TCP)(HOST = nusclust160##-vip)(PORT = 1521))
    Add the following line to the ORADB_PRIM alias:
    (ADDRESS = (PROTOCOL = TCP)(HOST = nusclust160##-vip)(PORT = 1521))
    On nusclust16008
    ORADB4 =
      (DESCRIPTION =
        (ADDRESS = (PROTOCOL = TCP)(HOST = nusclust160##-vip)(PORT = 1521))
        (CONNECT_DATA =
          (SERVER = DEDICATED)
          (SERVICE_NAME = ORADB)
          (INSTANCE_NAME = ORADB4)
        )
      )
    Add the following line to the ORADB alias:
    (ADDRESS = (PROTOCOL = TCP)(HOST = nusclust160##-vip)(PORT = 1521))
    Add the following line to the LISTENERS_ORADB alias:
    (ADDRESS = (PROTOCOL = TCP)(HOST = nusclust160##-vip)(PORT = 1521))
    Add the following line to the ORADB_PRIM alias:
    (ADDRESS = (PROTOCOL = TCP)(HOST = nusclust160##-vip)(PORT = 1521))
    On nusclust16036
    ORADB4 =
      (DESCRIPTION =
        (ADDRESS = (PROTOCOL = TCP)(HOST = nusclust160##-vip)(PORT = 1521))
        (CONNECT_DATA =
          (SERVER = DEDICATED)
          (SERVICE_NAME = ORADB)
          (INSTANCE_NAME = ORADB4)
        )
      )
    Add the following line to the ORADB alias:
    (ADDRESS = (PROTOCOL = TCP)(HOST = nusclust160##-vip)(PORT = 1521))
    Add the following line to the LISTENERS_ORADB alias:
    (ADDRESS = (PROTOCOL = TCP)(HOST = nusclust160##-vip)(PORT = 1521))
    Add the following line to the ORADB_PRIM alias:
    (ADDRESS = (PROTOCOL = TCP)(HOST = nusclust160##-vip)(PORT = 1521))
    c. Add standby redo logs on the primary to support the 4th node.
    alter database add standby logfile thread 4 group 32 ('+REDO1', '+REDO2' ) size 100M ;
    alter database add standby logfile thread 4 group 33 ('+REDO1', '+REDO2' ) size 100M ;
    alter database add standby logfile thread 4 group 34 ('+REDO1', '+REDO2' ) size 100M ;
    alter database add standby logfile thread 4 group 35 ('+REDO1', '+REDO2' ) size 100M ;
    alter database add standby logfile thread 4 group 36 ('+REDO1', '+REDO2' ) size 100M ;
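    The five standby groups match the usual Data Guard guideline of one more standby redo log group per thread than there are online groups (four online groups, 28-31, were added in step 7). If further threads are added later, the statements can be generated. A minimal Python sketch; standby_log_ddl is a hypothetical helper, and the disk groups and 100M size are copied from this procedure.

```python
def standby_log_ddl(thread: int, first_group: int, count: int,
                    diskgroups=("+REDO1", "+REDO2"), size="100M") -> list[str]:
    """Build ALTER DATABASE ADD STANDBY LOGFILE statements.

    One statement per group, numbered consecutively from first_group,
    with one member in each listed disk group.
    """
    members = ", ".join(f"'{dg}'" for dg in diskgroups)
    return [
        f"alter database add standby logfile thread {thread} "
        f"group {group} ({members}) size {size};"
        for group in range(first_group, first_group + count)
    ]


# Four online groups per thread -> 4 + 1 = 5 standby groups (32-36).
for stmt in standby_log_ddl(thread=4, first_group=32, count=5):
    print(stmt)
```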
    12. Test the cluster to make sure everything is set up correctly.
    a. Shut down resources.
    On nusclust16007:
    emctl stop dbconsole
    ps -ef | grep perl
    ps -ef | grep agent
    ps -ef | grep java
    On nusclust16008:
    emctl stop dbconsole
    On nusclust16036:
    emctl stop dbconsole
    On nusclust16008:
    cd $HOME
    . ./.rman
    cd scripts
    ./go
    shutdown immediate
    cd $HOME
    . ./.bash_profile
    srvctl stop database -d oradb
    crs_stop -all
    crs_stat -t
    b. Start up resources.
    On nusclust16007:
    cd $HOME
    . ./.bash_profile
    crs_start -all
    crs_stat -t
    The command above should show everything up and running.
    ocrcheck
    On nusclust16008:
    cd $HOME
    . ./.rman
    cd scripts
    ./go
    startup
    On nusclust16007:
    emctl start dbconsole
    On nusclust16008:
    emctl start dbconsole
    On nusclust16036:
    emctl start dbconsole
    How does that work for you?
    -JR jr

  • Failover did not happen when one node went down!!! PLEASE HELP

    Hi gurus,
    Yesterday a disaster struck my RAC database. We have a two-node cluster on 10.2.0.2, with the nodes located at different sites. Yesterday the power suddenly went down and one of the network switches failed and was destroyed. Node 1 of the RAC database was connected to that switch, but its sessions did not fail over to node 2, as they should when one node goes down: the surviving node should remain available for all of node 1's sessions/connections.
    When I tried to ping/telnet node 1, it failed because the switch was down, so the network team connected the cables to another available switch. When I connected to node 1, it showed the message "Oracle is not available".
    When I tried the other node, it was the same, but I did not see any error in the alert log file. Then my TL restarted both nodes and the database became available.
    I am very confused about why the failover did not happen and how the database went down. Please suggest how to identify what happened. Thanks & Regards

    Thanks for your reply,
    After the network switch was replaced, we connected to both nodes and found that the instances were down, with no reason given in the alert log file. We just restarted both instances; the database came up and clients connected to both instances, with equal numbers of sessions on each. I want to know whether failover is done on the application side or on the database side, i.e. in the tnsnames.ora file with the required parameters. In our case there is no failover configuration in the tnsnames.ora file.
    Thanks & Regards
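    For reference, client-side TAF is normally enabled by a FAILOVER_MODE clause inside the CONNECT_DATA section of the alias the application uses. The Python sketch below only renders such an alias as text; the alias PROD_TAF, service prod, VIP host names and the RETRIES/DELAY values are illustrative assumptions, not settings from this thread. With TYPE = SELECT an open query can resume on the surviving instance, while uncommitted transactions are rolled back and must be reissued by the application.

```python
def taf_alias(alias: str, service: str, vip_hosts: list[str], port: int = 1521,
              taf_type: str = "SELECT", method: str = "BASIC",
              retries: int = 20, delay: int = 5) -> str:
    """Render a tnsnames.ora alias with client-side TAF settings."""
    addresses = "\n".join(
        f"      (ADDRESS = (PROTOCOL = TCP)(HOST = {host})(PORT = {port}))"
        for host in vip_hosts
    )
    return (
        f"{alias} =\n"
        "  (DESCRIPTION =\n"
        "    (ADDRESS_LIST =\n"
        "      (LOAD_BALANCE = yes)\n"
        "      (FAILOVER = on)\n"
        f"{addresses}\n"
        "    )\n"
        "    (CONNECT_DATA =\n"
        f"      (SERVICE_NAME = {service})\n"
        # RETRIES and DELAY bound how long the client keeps trying to reconnect.
        f"      (FAILOVER_MODE = (TYPE = {taf_type})(METHOD = {method})"
        f"(RETRIES = {retries})(DELAY = {delay}))\n"
        "    )\n"
        "  )\n"
    )


print(taf_alias("PROD_TAF", "prod", ["node1-vip", "node2-vip"]))
```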

  • RAC node upgrade issue

    We have our company's database on an Oracle Real Application Clusters database consisting of two RAC nodes. We would like to perform some hardware upgrades on both RAC nodes. Could anyone please tell me whether it is OK to shut down one instance at a time and remove all the network/interconnect cables from it while the other RAC node keeps working? After one node is upgraded and all the network/interconnect cables are connected back to it, will everything be just as before, or are there certain things to be cautious about?
    Thanks in advance

    I think you need to do a clean shutdown so that no instance recovery is required when you reopen the database:
    SQL> SHUTDOWN TRANSACTIONAL
    This allows all current transactions to complete and then shuts down the instance.

  • What would happen when one RAC node's public NIC is down?

    Dear all,
    There's a two-node RAC in my office. From my observation, when one RAC node's public NIC is down, the "crs_stat -t" command doesn't show any resource as OFFLINE. So at that moment, if I use sqlplus to connect to my RAC database, it will still choose the node1 or node2 instance randomly, right? And if I am assigned to the instance on the node whose public NIC is down, my sqlplus connection would fail?
    So, can we say that Oracle RAC can't provide HA if one node's public NIC is down? Or is there another solution to this issue? Any suggestion would be appreciated.

    If one node's public NIC is down, only the other node (node 2 in your case) can take the load. CRS may fail to register the status change, and in that case some connections to the affected node may fail.

  • RAC issues with second node

    We have a two-node RAC setup and are facing issues with the second node. If both nodes are up, the application is not able to connect to the database; if the second node is down, the application is able to connect.
    What is the reason for this abnormal behaviour?
    Any suggestions?
    We have Oracle 10g RAC (database version 10.2.0.4), CRS version 10.2.0.3.
    Please help.

    oracle@ora1-oam # crsstat
    HA Resource Target State
    ora.cms.cms1.inst ONLINE ONLINE on ora1-oam
    ora.cms.cms2.inst ONLINE ONLINE on ora2-oam
    ora.cms.db ONLINE ONLINE on ora1-oam
    ora.cms.db.cms.com.cms1.srv ONLINE ONLINE on ora1-oam
    ora.cms.db.cms.com.cms2.srv ONLINE ONLINE on ora2-oam
    ora.cms.db.cms.com.cs ONLINE ONLINE on ora1-oam
    ora.myrio.db ONLINE ONLINE on ora1-oam
    ora.myrio.db.myrio.com.cs ONLINE ONLINE on ora1-oam
    ora.myrio.db.myrio.com.myrio1.srv ONLINE ONLINE on ora1-oam
    ora.myrio.db.myrio.com.myrio2.srv ONLINE ONLINE on ora2-oam
    ora.myrio.myrio1.inst ONLINE ONLINE on ora1-oam
    ora.myrio.myrio2.inst ONLINE ONLINE on ora2-oam
    ora.ora1-oam.ASM1.asm ONLINE ONLINE on ora1-oam
    ora.ora1-oam.LISTENER_ORA1-OAM.lsnr ONLINE ONLINE on ora1-oam
    ora.ora1-oam.gsd ONLINE ONLINE on ora1-oam
    ora.ora1-oam.ons ONLINE ONLINE on ora1-oam
    ora.ora1-oam.vip ONLINE ONLINE on ora1-oam
    ora.ora2-oam.ASM2.asm ONLINE ONLINE on ora2-oam
    ora.ora2-oam.LISTENER_ORA2-OAM.lsnr ONLINE ONLINE on ora2-oam
    ora.ora2-oam.gsd ONLINE ONLINE on ora2-oam
    ora.ora2-oam.ons ONLINE ONLINE on ora2-oam
    ora.ora2-oam.vip ONLINE ONLINE on ora2-oam
    ora.rms.db ONLINE ONLINE on ora1-oam
    ora.rms.db.rms.com.cs ONLINE ONLINE on ora1-oam
    ora.rms.db.rms.com.rms1.srv ONLINE ONLINE on ora1-oam
    ora.rms.db.rms.com.rms2.srv ONLINE ONLINE on ora2-oam
    ora.rms.rms1.inst ONLINE ONLINE on ora1-oam
    ora.rms.rms2.inst ONLINE ONLINE on ora2-oam
    ora.tmpl.db ONLINE ONLINE on ora1-oam
    ora.tmpl.tmpl1.inst ONLINE ONLINE on ora1-oam
    ora.tmpl.tmpl2.inst ONLINE ONLINE on ora2-oam
    ora.vcas.db ONLINE ONLINE on ora1-oam
    ora.vcas.db.vcas.com.cs ONLINE ONLINE on ora1-oam
    ora.vcas.db.vcas.com.vcas1.srv ONLINE ONLINE on ora1-oam
    ora.vcas.db.vcas.com.vcas2.srv ONLINE ONLINE on ora2-oam
    ora.vcas.vcas1.inst ONLINE ONLINE on ora1-oam
    ora.vcas.vcas2.inst ONLINE ONLINE on ora2-oam
    ora.vmxcsmdb.db ONLINE ONLINE on ora1-oam
    ora.vmxcsmdb.vmxcsmdb1.inst ONLINE ONLINE on ora1-oam
    ora.vmxcsmdb.vmxcsmdb2.inst ONLINE ONLINE on ora2-oam
    This is the status when both nodes were up.

  • SSH issue when converting stand alone to RAC -- manually

    Hi All,
    I am facing an issue with SSH when trying to convert a standalone database to a 2-node RAC. I am doing it the manual way.
    Plan:
    Migrate the filesystem to ASM
    Install the cluster
    Create another ASM instance on node 2
    Create another DB instance on node 2
    Register the new ASM and DB instance with cluster manually
    I have successfully completed the filesystem conversion to ASM.
    While performing the prerequisites for the cluster installation, I am facing a problem configuring SSH.
    Node 1 has the existing standalone database.
    I am unable to SSH from node 1 to node 1 (self) or from node 2 to node 1.
    However, I am able to SSH from node 2 to node 1 and from node 2 to node 2 (self).
    Steps I have already tried:
    1) Created another user (test) on both nodes and tried to establish SSH, to see whether the configuration problem is specific to the current user (RAC1). SSH was configured successfully with user test.
    2) Copied the profile of the RAC1 user on node 2 to node 1 and retried the SSH configuration, but it failed.
    3) Dropped the user RAC1 on node 1 and deleted all the hidden files in its home directory, so that they would be created fresh when RAC1 was recreated. Tried the SSH configuration again and it failed.
    Can you please help me fix this issue and identify its root cause?
    Please let me know if you have any questions.

    user 777111 wrote:
    I face an issue with SSH when trying to convert a stand alone database to RAC (2 node). I am trying the manual way.
    What operating system?
    To create automated SSH connectivity between two accounts (whether on the same platform or different platforms does not matter):
    1. Generate an SSH key for the current account (command: ssh-keygen -t rsa).
    2. Create the file $HOME/.ssh/authorized_keys and copy into it the RSA public keys of all trusted accounts (this includes the current account's own public key). The public key file is $HOME/.ssh/id_rsa.pub.
    3. Manually test connectivity from the trusted account to the other accounts, as the signatures of those servers need to be accepted and stored in $HOME/.ssh/known_hosts.
    When setting this up for RAC, first generate the RSA keys on every node, and then do the above on one node: create $HOME/.ssh/authorized_keys containing all RSA public keys, and build $HOME/.ssh/known_hosts by ssh'ing from that node to the other nodes (you do not need a successful logon, only to accept the servers' signatures).
    When done, copy both these files to the $HOME/.ssh directory on all other RAC nodes.
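    Step 2 above (collecting every node's id_rsa.pub into one authorized_keys) can be sketched in Python as below. It assumes plain "type key comment" lines with no leading key options; merge_authorized_keys is a hypothetical helper, and the key strings in the example are fake placeholders.

```python
def merge_authorized_keys(*public_key_texts: str) -> str:
    """Combine public keys from several accounts, dropping duplicates.

    Each argument is the text of an id_rsa.pub (or an existing
    authorized_keys file). Blank lines and '#' comments are skipped,
    and the order of first appearance is preserved.
    """
    seen = set()
    merged = []
    for text in public_key_texts:
        for line in text.splitlines():
            line = line.strip()
            if not line or line.startswith("#"):
                continue
            # Compare on key type + key body only, so the same key with a
            # different trailing comment (user@host) is not added twice.
            ident = tuple(line.split()[:2])
            if ident in seen:
                continue
            seen.add(ident)
            merged.append(line)
    return "\n".join(merged) + "\n"


node1_pub = "ssh-rsa AAAAfake1 oracle@node1\n"
node2_pub = "ssh-rsa AAAAfake2 oracle@node2\nssh-rsa AAAAfake1 copied-again\n"
print(merge_authorized_keys(node1_pub, node2_pub), end="")
```

    When copying the result into place, note that sshd normally ignores authorized_keys if $HOME/.ssh or the file itself is group- or world-writable, so permissions of 700 on the directory and 600 on the file are the usual safe choice.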

  • Issues while Add / Delete RAC Node in Oracle 10g R2

    Hi,
    I have a requirement to add a new node to an existing 2-node production RAC, where one node is active and the other is passive due to licensing; we cannot keep both nodes active. Because of performance issues (memory, CPU cores, etc.) we are adding a new node.
    Right now we are planning to add a 3rd database node, make the new node active and the currently active one passive (a swap), and later, after final observation, delete and decommission the current passive node.
    This activity has been tested on the Dev database with the same infrastructure (OS, memory, etc.), but we want to check the best approach and the challenges we may face during RAC node addition/deletion.
    RAC DB Version : 10.2.0.4
    OS Version : RHEL 5.8
    (1) Is the approach right: first add the node and later delete one?
    (2) If the approach is correct, how would the 3rd node behave in terms of active or passive?
    (3) We have taken RMAN, OS, CRS, ORACLE_HOME, and ASM_HOME backups, as well as OCR and voting disk (VD) backups.
    (4) Could you please give detailed steps for adding/deleting a node in 10g R2?
    (5) Are there any known bugs with this DB release or OS when performing this activity?
    Since this is a production machine we want to be more proactive. Please correct me or add anything I am missing.
    With Thanks,
    Rakesh

    Hello Rakesh,
    Please follow these steps.
    Node Addition Steps
    1. Install and configure OS and hardware for new node.
    2. Add Oracle Clusterware to the new node.
    3. Configure ONS for the new node.
    4. Add ASM home to the new node.
    5. Add Database home to the new node.
    6. Add a listener to the new node.
    7. Add ASM instance to the New Node.
    8. Add a database instance to the new node.
    Details of steps
    1. Run cluvfy to verify whether the new node is ready for addition:
         $ cluvfy stage -pre crsinst -n node2
    2. From node1, execute:
              $ /u01/app/crs11g/oui/bin/addNode.sh
    3. Specify the node2 VIP address and follow the instructions.
    4. At the end of the installation it may throw a warning and ask you to click YES. Click YES.
    5. From node1:
              /u01/app/crs11g/bin/racgons add_config node2:6200
    6. From node1, set ORACLE_HOME=ASM_HOME, then execute addNode.sh from $ASM_HOME/oui/bin and follow the instructions.
    7. From node1, set ORACLE_HOME=DB_HOME and then run
         /u01/app/oracle/product/11.1.0/db_1/oui/bin/addNode.sh
         and follow the instructions.
    8. From node2, start NETCA and configure a listener for the new node. While configuring the listener, select the name of the new node.
    9. From node1, start DBCA from the ASM home to configure the ASM instance for the new node.
    10. Again from node1, start DBCA from the DB home to add the DB instance.
    Node deletion Steps
    1. Delete the Database instance on the node to be deleted.
    2. Clean up the ASM instance.
    3. Remove the listener from the node to be deleted.
    4. Remove the node from the database.
    5. Remove the node from ASM.
    6. Remove ONS configuration from the node to be deleted.
    7. Remove the node from the clusterware
    Details of Steps
    1. Remove the database instance of node2:
         dbca -> Instance Management -> Delete Instance -> password for sys -> select node -> Finish.
    2. Stop ASM for node2 from any node:
         $ srvctl stop asm -n node2
    3. Remove ASM for node2:
         $ srvctl remove asm -n node2
    4. Remove the listener from node2 using NETCA.
    5. From node2:
              ./runInstaller -updateNodeList ORACLE_HOME=$ORACLE_HOME "CLUSTER_NODES=node2" -local
    6. From node2, start runInstaller from Oracle_DB_Home/oui/bin and remove DB_HOME:
         $ ./runInstaller
         On the Welcome screen -> Deinstall Products -> select the DB home name (OraDb10g_Home1) -> Remove
    7. From node1:
              ./runInstaller -updateNodeList ORACLE_HOME=$ORACLE_HOME "CLUSTER_NODES=node1"
    8. From node2, set ORACLE_HOME to asm_1 and then run:
              ./runInstaller -updateNodeList ORACLE_HOME=$ORACLE_HOME "CLUSTER_NODES=node2" -local
    9. From node2, start OUI and deinstall the ASM home.
    10. From node1, set ORACLE_HOME=/u01/app/oracle/product/11.1.0/asm_1
    11. From node1, in /u01/app/oracle/product/11.1.0/asm_1/oui/bin, run:
              ./runInstaller -updateNodeList ORACLE_HOME=$ORACLE_HOME "CLUSTER_NODES=node1"
    12. From node2, as root, execute rootdelete.sh from /u01/app/crs11g/install:
         # /u01/app/crs11g/install/rootdelete.sh
    13. From node1, first find out the node numbers:
         # /u01/app/crs11g/bin/olsnodes -n
         output: node1 1
              node2 2
    14. From node1, as root:
         # /u01/app/crs11g/install/rootdeletenode.sh node2[Node_Name] 2[node_no]
         output:
              CRS nodeapps are deleted successfully
              clscfg: EXISTING configuration version 4 detected.
              clscfg: version 4 is 11 Release 1.
              Node deletion operation successful.
              'node2' deleted successfully
    15. From node2, set ORACLE_HOME=CRS_HOME and then execute:
         $ $ORACLE_HOME/oui/bin/runInstaller -updateNodeList ORACLE_HOME=$ORACLE_HOME "CLUSTER_NODES=node2" CRS=TRUE -local
    16. Run ./runInstaller and remove CRS_HOME.
    17. From node1:
         $ /u01/app/crs11g/oui/bin/runInstaller -updateNodeList ORACLE_HOME=$ORACLE_HOME "CLUSTER_NODES=node1" CRS=TRUE
    18. Check that the node has been deleted with ./crs_stat -t.
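    Step 14 needs the node number that olsnodes -n reports for the node being removed (step 13). If this is scripted, the output can be parsed as in the minimal Python sketch below; it assumes one "name number" pair per line, and node_numbers is a hypothetical helper. The exact argument format rootdeletenode.sh expects should be taken from the documentation for your release, not from this sketch.

```python
def node_numbers(olsnodes_output: str) -> dict[str, int]:
    """Map node name -> node number from `olsnodes -n` output.

    Lines whose second field is not an integer are ignored.
    """
    result = {}
    for line in olsnodes_output.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[1].isdigit():
            result[fields[0]] = int(fields[1])
    return result


sample_output = "node1 1\nnode2 2\n"
print(node_numbers(sample_output)["node2"])  # 2
```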

  • Mid 2009 17" MacBook Pro battery shutdown issue when waking from sleep.

    My unibody 17" MacBook Pro occasionally shuts down on waking from sleep when it isn't plugged in.
    When I open the lid to wake it I can hear the hard drive and fans starting to spool, and then the next second it's off and silent. The laptop doesn't mind being turned on again after it does this and might go a week without doing it again, equally it might do it next time I try and wake it from sleep. It does this regardless of battery charge level.
    coconutBattery says the battery still has 86% of its designed capacity, and I haven't noticed anything else untoward regarding battery charging or performance when in use.
    So my question is two fold:
    1) Does it sound like the battery's internal processor is malfunctioning, or could it be something else?
    2) If it is the battery's internal processor, will replacing the battery replace the processor, or are they fitted separately (in deference to the 'internal' part of its name)?
    Cheers,
    James
    NB: just in case it makes any difference, the laptop is a 17" Mid-2009 MacBook Pro, 2.8GHz Core 2 Duo with 4GB RAM running OS X 10.7.3.

    James
    I have had issues when I press the space bar to wake it from sleep: sometimes it wakes, but other times I'm forced to power it on with the power button.
    Is this similar to anything you've experienced?
    It's very inconsistent these days, but I'm also connected to an external screen.

  • RAC-DATA FILE ACCESSING ISSUE FROM ONE NODE

    Dear All,
    We have a two-node RAC (10.2.0.3) running on HP-UX. Since yesterday, accessing data in a specific data file from one instance shows the error below, whereas accessing the same data file from the other node works properly.
    Errors in file /oracle/product/admin/tap3plus/bdump/tap3plus4_dbw0_24950.trc:
    ORA-01157: cannot identify/lock data file 75 - see DBWR trace file
    ORA-01110: data file 75: '/dev/vg_rac/rraw_tap3plus_temp_live05'
    ORA-27041: unable to open file
    HPUX-ia64 Error: 19: No such device
    Additional information: 2
    Tue Jan 31 08:52:09 2012
    Errors in file /oracle/product/admin/tap3plus/bdump/tap3plus4_dbw0_24950.trc:
    ORA-01186: file 75 failed verification tests
    ORA-01157: cannot identify/lock data file 75 - see DBWR trace file
    ORA-01110: data file 75: '/dev/vg_rac/rraw_tap3plus_temp_live05'
    Tue Jan 31 08:52:09 2012
    File 75 not verified due to error ORA-01157
    Tue Jan 31 08:52:09 2012
    Thanks in Advance

    user585870 wrote:
    We have a two node RAC (10.2.0.3)running on Hp Unix. From yesterday onwards, from one instance accessing data from a specific data file showing the below error, whereas accessing from other node to the same datafile is working properly.That would be due to some kind of failure in the shared storage layer.
    RAC needs the very same storage layer to be visible and available on each RAC node - thus this needs to be some form of shared cluster storage.
    Should a piece of it fails on one node, that node would not be able to access the RAC database files on that shared storage layer - and will throw the type of errors you are seeing.
    So how does this shared storage layer look like? Fibre channels (HBAs) connected to a Fibre Channel Switch and SAN - making SAN LUNs available as shared storage devices?
    Typically a shared storage failure would throw errors in the kernel log. This is because the error is not an Oracle error, but a kernel error. As it is in your case. The bottom error on the error stack points to the root cause:
    ORA-01157: cannot identify/lock data file 75 - see DBWR trace file
    ORA-01110: data file 75: '/dev/vg_rac/rraw_tap3plus_temp_live05'
    ORA-27041: unable to open file
    HPUX-ia64 Error: 19: No such device
    So HP-UX on that node is not seeing a specific shared storage device.
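    The rule of thumb used here (read the bottom of the error stack, and check whether it is an OS error rather than an ORA- error) can be expressed as a minimal Python sketch. The regular expression is an assumption covering ORA-nnnnn lines and OS lines of the "Platform Error: n" form shown above; bottom_error and is_os_level are hypothetical helper names.

```python
import re

# Oracle errors (ORA-nnnnn) and OS errors such as
# "HPUX-ia64 Error: 19: No such device" or "Linux-x86_64 Error: 2: ...".
ERROR_RE = re.compile(r"^(ORA-\d{5}|[A-Za-z0-9_-]+ Error: \d+)")


def bottom_error(stack: str):
    """Return the last (innermost) error line of an error stack, or None."""
    errors = [line.strip() for line in stack.splitlines()
              if ERROR_RE.match(line.strip())]
    return errors[-1] if errors else None


def is_os_level(error_line: str) -> bool:
    """True when the error line comes from the operating system, not Oracle."""
    return not error_line.startswith("ORA-")


stack = """\
ORA-01157: cannot identify/lock data file 75 - see DBWR trace file
ORA-01110: data file 75: '/dev/vg_rac/rraw_tap3plus_temp_live05'
ORA-27041: unable to open file
HPUX-ia64 Error: 19: No such device
Additional information: 2
"""
root = bottom_error(stack)
print(root, is_os_level(root))
```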

  • How to force write-behind store on cache node shutdown?

    Hi,
    I built a small pilot project based on Coherence and am now testing it for failover. I found replication issues with the distributed cache in the following scenario:
    - start cache node 1 (JVM instance 1);
    - connect Extend client to it and get 1 object from cache (only 1 object in the cache - loaded by CacheStore from DB);
    - change the object and put it back (I use EntryProcessor for this);
    - start cache node 2 (JVM instance 2);
    - stop cache instance 1 (write-behind store wasn't invoked yet: write-delay = 2m);
    - load/change the same object on node 2; all changes done on node 1 are lost.
    My expectation was that the cache would replicate its data between nodes when a new member joins the cache cluster. The backup count is 1 by default, right?
    What should I do in order to prevent such behavior? Is it possible to force write-behind store on cache node shutdown event?
    Thanks, Denis.
    My cache-config, just in case:
    <cache-config>
      <caching-scheme-mapping>
        <cache-mapping>
          <cache-name>AccountCache</cache-name>
          <scheme-name>account-distributed</scheme-name>
        </cache-mapping>
      </caching-scheme-mapping>
      <caching-schemes>
        <distributed-scheme>
          <scheme-name>account-distributed</scheme-name>
          <service-name>DistributedCache</service-name>
          <serializer>
            <class-name>com.tangosol.io.pof.ConfigurablePofContext</class-name>
            <init-params>
              <init-param>
                <param-type>String</param-type>
                <param-value>account-pof-config.xml</param-value>
              </init-param>
            </init-params>
          </serializer>
          <backing-map-scheme>
            <read-write-backing-map-scheme>
              <scheme-name>AccountDatabaseScheme</scheme-name>
              <internal-cache-scheme>
                <local-scheme>
                  <!--scheme-ref>default-eviction</scheme-ref-->
                  <eviction-policy>LRU</eviction-policy>
                  <high-units>0</high-units>
                  <expiry-delay>30m</expiry-delay>
                </local-scheme>
              </internal-cache-scheme>
              <cachestore-scheme>
                <class-scheme>
                  <class-name>com.roox.bss.cache.store.AccountCacheStore</class-name>
                  <init-params>
                    <init-param>
                      <param-type>java.lang.String</param-type>
                      <param-value>dburl_</param-value>
                    </init-param>
                    <init-param>
                      <param-type>java.lang.String</param-type>
                      <param-value>user</param-value>
                    </init-param>
                    <init-param>
                      <param-type>java.lang.String</param-type>
                      <param-value>password</param-value>
                    </init-param>
                  </init-params>
                </class-scheme>
              </cachestore-scheme>
              <write-delay>2m</write-delay>
              <write-batch-factor>.5</write-batch-factor>
            </read-write-backing-map-scheme>
          </backing-map-scheme>
        </distributed-scheme>
        <proxy-scheme>
          <service-name>ExtendTcpProxyService</service-name>
          <thread-count>10</thread-count>
          <acceptor-config>
            <tcp-acceptor>
              <local-address>
                <address>localhost</address>
                <port>9098</port>
                <reuse-address>true</reuse-address>
                <reusable>true</reusable>
              </local-address>
            </tcp-acceptor>
            <serializer>
              <class-name>com.tangosol.io.pof.ConfigurablePofContext</class-name>
              <init-params>
                <init-param>
                  <param-type>String</param-type>
                  <param-value>account-pof-config.xml</param-value>
                </init-param>
              </init-params>
            </serializer>
          </acceptor-config>
          <autostart>true</autostart>
        </proxy-scheme>
      </caching-schemes>
    </cache-config>

    solved with autostart=true
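    For context on why the store had not run yet in this scenario: with write-delay = 2m, an entry only becomes "ripe" for the CacheStore after waiting two minutes, and write-batch-factor = .5 lets entries that have waited at least (1 - 0.5) x 2m = 1m ("soft-ripe") join the same batch. The Python sketch below is a simplified model of that rule for illustration only, not Coherence's implementation; entries_to_store is a hypothetical name, and it assumes a batch is triggered only when at least one entry is ripe.

```python
def entries_to_store(queue_ages_s, write_delay_s=120.0, write_batch_factor=0.5):
    """Return the queue ages (seconds) a write-behind flush would include.

    Simplified model:
      * an entry is 'ripe' once it has waited >= write_delay_s;
      * it is 'soft-ripe' once it has waited >= (1 - factor) * write_delay_s;
      * a batch runs only if some entry is ripe, and then also takes
        every soft-ripe entry.
    """
    if not 0.0 <= write_batch_factor <= 1.0:
        raise ValueError("write-batch-factor must be between 0 and 1")
    soft_ripe_s = (1.0 - write_batch_factor) * write_delay_s
    if not any(age >= write_delay_s for age in queue_ages_s):
        return []
    return [age for age in queue_ages_s if age >= soft_ripe_s]


# 2m delay, factor .5: the 130s entry is ripe, the 70s entry is soft-ripe.
print(entries_to_store([130, 70, 30]))  # [130, 70]
print(entries_to_store([90, 30]))       # []: nothing ripe yet
```

    In the model, an entry changed less than two minutes before its node is stopped has not been written to the database, which matches the sequence described in the question.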

  • RAC Node hang and unexpected reboot

    Hello friends,
    We are facing an intermittent issue of node hangs and unexpected node shutdowns. This is a 2-node RAC 10.2.0.3 running on Windows 2003. Here's the crsd.log:
    2009-07-16 17:24:03.058: [ OCRMSG][5252]prom_rpc: CLSC recv failure..ret code 7
    2009-07-16 17:24:03.058: [ OCRMSG][5252]prom_rpc: possible OCR retry scenario
    2009-07-16 17:24:03.058: [ COMMCRS][5616]clscsendx: (0000000002AF5C60) Physical connection (0000000003892080) not active
    2009-07-16 17:24:03.058: [ OCRMSG][5616]prom_rpc: CLSC send failure..ret code 11
    2009-07-16 17:24:03.058: [ OCRMSG][5616]prom_rpc: possible OCR retry scenario
    2009-07-16 17:24:03.105: [ COMMCRS][5252]clscsendx: (0000000002AF5C60) Connection not active
    2009-07-16 17:24:03.105: [ OCRMSG][5252]prom_rpc: CLSC send failure..ret code 6
    2009-07-16 17:24:03.105: [ OCRMSG][5252]prom_rpc: possible OCR retry scenario
    2009-07-16 17:24:03.105: [ COMMCRS][5616]clscsendx: (0000000002AF5C60) Connection not active
    2009-07-16 17:24:03.105: [ OCRMSG][5616]prom_rpc: CLSC send failure..ret code 6
    2009-07-16 17:24:03.105: [ OCRMSG][5616]prom_rpc: possible OCR retry scenario
    2009-07-16 17:24:03.152: [ COMMCRS][5252]clscsendx: (0000000002AF5C60) Connection not active
    2009-07-16 17:24:03.152: [ OCRMSG][5252]prom_rpc: CLSC send failure..ret code 6
    2009-07-16 17:24:03.152: [ OCRMSG][5252]prom_rpc: possible OCR retry scenario
    2009-07-16 17:24:03.168: [ COMMCRS][5616]clscsendx: (0000000002AF5C60) Connection not active
    2009-07-16 17:24:03.168: [ OCRMSG][5616]prom_rpc: CLSC send failure..ret code 6
    2009-07-16 17:24:03.168: [ OCRMSG][5616]prom_rpc: possible OCR retry scenario
    2009-07-16 17:24:03.215: [ COMMCRS][5252]clscsendx: (0000000002AF5C60) Connection not active
    2009-07-16 17:24:03.215: [ OCRMSG][5252]prom_rpc: CLSC send failure..ret code 6
    2009-07-16 17:24:03.215: [ OCRMSG][5252]prom_rpc: possible OCR retry scenario
    2009-07-16 17:24:03.215: [ COMMCRS][5616]clscsendx: (0000000002AF5C60) Connection not active
    2009-07-16 17:24:03.215: [ OCRMSG][5616]prom_rpc: CLSC send failure..ret code 6
    2009-07-16 17:24:03.215: [ OCRMSG][5616]prom_rpc: possible OCR retry scenario
    2009-07-16 17:24:03.261: [ COMMCRS][5616]clscsendx: (0000000002AF5C60) Connection not active
    2009-07-16 17:24:03.261: [ OCRMSG][5616]prom_rpc: CLSC send failure..ret code 6
    2009-07-16 17:24:03.261: [ OCRMSG][5616]prom_rpc: possible OCR retry scenario
    Please shed some light on what the issue may be.

    I suggest you install IPD/OS (http://www.oracle.com/technology/products/database/clustering/ipd_download_homepage.html) on your cluster. It will give you all the relevant OS statistics, so when a node reboot happens you can figure out what state the nodes were in at that time and then fix the problem. The hang is often caused by something other than Oracle RAC.
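    To condense a long crsd.log excerpt like the one above before comparing it with OS statistics, the failures can be tallied by direction (send/recv) and return code. A minimal Python sketch; the regular expression is written against the line layout shown in this post and is an assumption for other versions, summarize is a hypothetical helper, and the sketch does not interpret what the return codes mean.

```python
import re
from collections import Counter

# Layout assumed from the excerpt: "<timestamp>: [ COMPONENT][thread]message".
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}): "
    r"\[\s*(?P<component>\w+)\]\[(?P<thread>\d+)\](?P<msg>.*)$"
)
RET_CODE_RE = re.compile(r"(send|recv) failure\.\.ret code (\d+)")


def summarize(log_text: str):
    """Count CLSC send/recv failures by return code; report the time span."""
    failures = Counter()
    first = last = None
    for line in log_text.splitlines():
        match = LINE_RE.match(line.strip())
        if not match:
            continue
        code = RET_CODE_RE.search(match["msg"])
        if code:
            failures[(code.group(1), int(code.group(2)))] += 1
        first = first or match["ts"]
        last = match["ts"]
    return failures, first, last


excerpt = """\
2009-07-16 17:24:03.058: [ OCRMSG][5252]prom_rpc: CLSC recv failure..ret code 7
2009-07-16 17:24:03.058: [ OCRMSG][5616]prom_rpc: CLSC send failure..ret code 11
2009-07-16 17:24:03.105: [ OCRMSG][5252]prom_rpc: CLSC send failure..ret code 6
2009-07-16 17:24:03.261: [ COMMCRS][5616]clscsendx: (0000000002AF5C60) Connection not active
"""
print(summarize(excerpt))
```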
