VIP failover time

I have configured a critical service (ap-kal-pinglist) for redundant VIP failover. The default freq, maxfail and retry freq are 5, 3 and 5, so I expected the failover time to be 5 + 5*3*2 = 35 s. But the virtual router's state changed from "master" to "backup" about 5 seconds after the connection was lost.
Can anyone help me understand this?

Service sw1-up-down connects to the e2 interface, going down in 15 sec
Service sw2-up-down connects to the e3 interface, going down in 4 sec?
JAN 14 02:38:41 5/1 3857 NETMAN-2: Generic:LINK DOWN for e2
JAN 14 02:39:57 5/1 3858 NETMAN-2: Generic:LINK DOWN for e3
JAN 14 02:39:57 5/1 3859 VRRP-0: VrrpTx: Failed on Ipv4FindInterface
JAN 14 02:40:11 5/1 3860 NETMAN-2: Enterprise:Service Transition:sw2-up-down -> down
JAN 14 02:40:11 5/1 3861 NETMAN-2: Enterprise:Service Transition:sw1-up-down -> down
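
To compare the estimate with what the log actually shows, here is a minimal Python sketch. It reproduces the poster's own 35 s formula (not a vendor-documented one) and measures the gaps between the logged events. The variable and function names are illustrative assumptions, and only the time of day is parsed because all events fall on the same date.

```python
from datetime import datetime

# Poster's own estimate; the formula is theirs, not taken from vendor documentation.
freq, maxfail, retry_freq = 5, 3, 5
estimated = freq + retry_freq * maxfail * 2  # 5 + 5*3*2

# Timestamps copied from the log lines above.
events = {
    "e2 link down": "JAN 14 02:38:41",
    "e3 link down": "JAN 14 02:39:57",
    "services down": "JAN 14 02:40:11",
}

def seconds_between(start, end):
    """Elapsed seconds between two same-day log stamps (time of day only)."""
    fmt = "%H:%M:%S"
    t0 = datetime.strptime(start.split()[-1], fmt)
    t1 = datetime.strptime(end.split()[-1], fmt)
    return int((t1 - t0).total_seconds())

e2_to_down = seconds_between(events["e2 link down"], events["services down"])
e3_to_down = seconds_between(events["e3 link down"], events["services down"])

print(f"estimated by formula: {estimated} s")
print(f"e2 link down -> services down: {e2_to_down} s")
print(f"e3 link down -> services down: {e3_to_down} s")
```

With these log lines, both service transitions are logged 14 s after the e3 link went down and 90 s after the e2 link went down, so neither gap matches the 35 s estimate.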

Similar Messages

  • VIP failover in Oracle RAC

    Dear all,
    I am using Oracle Rac 10gR2 running on top of Sun Cluster 3.2u3.
    I ran a test to check the failover ability of the VIP in Oracle RAC, but the result was not what I expected.
    The test scenario was:
    - Turn on the 2 nodes and wait for all services, including both Sun Cluster and Oracle RAC, to come online.
    - Use SQL Navigator to connect to the database using the VIP on node1 (VIP1).
    - Shut down node1.
    - All services and resources on node2 were still online; however, after a long time (about 10 mins), I did not see VIP1 fail over to the surviving node.
    - The "crs_stat -t" command did not show VIP1 online on node2 (the surviving node).
    - SQL Navigator could no longer establish a connection to the database using VIP1.
    The output of "crs_stat -t" command before shutting down the node1:
    oracle@t5120-02 $ crs_stat -t
    Name Type Target State Host
    ora.orcl.db application ONLINE ONLINE t5120-02
    ora....l1.inst application ONLINE ONLINE t5120-01
    ora....l2.inst application ONLINE ONLINE t5120-02
    ora....01.lsnr application ONLINE ONLINE t5120-01
    ora....-01.gsd application ONLINE ONLINE t5120-01
    ora....-01.ons application ONLINE ONLINE t5120-01
    ora....-01.vip application ONLINE ONLINE t5120-01
    ora....02.lsnr application ONLINE ONLINE t5120-02
    ora....-02.gsd application ONLINE ONLINE t5120-02
    ora....-02.ons application ONLINE ONLINE t5120-02
    ora....-02.vip application ONLINE ONLINE t5120-02
    The output of "crs_stat -t" command after shutting down the node1:
    oracle@t5120-02 $ crs_stat -t
    Name Type Target State Host
    ora.orcl.db application ONLINE ONLINE t5120-02
    ora....l1.inst application OFFLINE OFFLINE
    ora....l2.inst application ONLINE ONLINE t5120-02
    ora....01.lsnr application OFFLINE OFFLINE
    ora....-01.gsd application OFFLINE OFFLINE
    ora....-01.ons application OFFLINE OFFLINE
    ora....-01.vip application OFFLINE OFFLINE
    ora....02.lsnr application ONLINE ONLINE t5120-02
    ora....-02.gsd application ONLINE ONLINE t5120-02
    ora....-02.ons application ONLINE ONLINE t5120-02
    ora....-02.vip application ONLINE ONLINE t5120-02
    So my questions are:
    - Was my test scenario correct to check the failover ability of VIP in Oracle RAC?
    - Is there any additional configuration needed to perform on the system to achieve the VIP failover?
    Please help me in this case as I am new to Oracle RAC.
    Thanks.
    HuyNQ.

    Dear Rajesh,
    Sorry for late reply.
    I have already tested 2 cases: shutting down a node and crashing a node. Below is the output of the log files in the 2 test cases.
    Once again, when shutting down a node, the VIP did not fail over, although the CRS on that node was shut down before all other Sun Cluster services and resources.
    Please help check the log files and advise me if you see anything abnormal.
    Thanks.
    * In case of shutting down the node 1: (at about 09:05 Sep 17)
    Shutdown node 1:
    root@t5120-01 # shutdown -y -g0 -i0
    Shutdown started. Fri Sep 17 09:04:55 ICT 2010
    Changing to init state 0 - please wait
    Broadcast Message from root (console) on t5120-01 Fri Sep 17 09:04:55...
    THE SYSTEM t5120-01 IS BEING SHUT DOWN NOW ! ! !
    Log off now or risk your files being damaged
    crsd.log file on node 2:
    root@t5120-02 # more /u01/app/oracle/10.2.0/crs/log/t5120-02/crsd/crsd.log
    2010-09-16 16:35:56.281: [  CRSRES][1326] t5120-02 : CRS-1019: Resource ora.t5120-01.gsd (application) cannot run on t5120-02
    2010-09-16 16:35:56.320: [  CRSRES][1325] t5120-02 : CRS-1019: Resource ora.t5120-01.LISTENER_T5120-01.lsnr (application) cannot run on t5120-02
    2010-09-16 16:35:56.346: [  CRSRES][1327] t5120-02 : CRS-1019: Resource ora.t5120-01.ons (application) cannot run on t5120-02
    2010-09-16 17:06:10.202: [  CRSRES][1520] StopResource: setting CLI values
    2010-09-17 09:06:10.567: [ CRSCOMM][5709] CLEANUP: Searching for connections to failed node t5120-01
    2010-09-17 09:06:10.577: [  CRSEVT][5709] Processing member leave for t5120-01, incarnation: 11
    2010-09-17 09:06:10.665: [    CRSD][5709] SM: recovery in process: 8
    2010-09-17 09:06:10.665: [  CRSEVT][5709] Do failover for: t5120-01
    2010-09-17 09:06:10.826: [  CRSEVT][5709] Post recovery done evmd event for: t5120-01
    2010-09-17 09:06:10.898: [    CRSD][5709] SM: recoveryDone: 0
    2010-09-17 09:06:10.918: [  CRSEVT][5710] Processing RecoveryDone
    crs_stat -t on node 2:
    oracle@t5120-02 $ crs_stat -t
    Name Type Target State Host
    ora.orcl.db application ONLINE ONLINE t5120-02
    ora....l1.inst application OFFLINE OFFLINE
    ora....l2.inst application ONLINE ONLINE t5120-02
    ora....01.lsnr application OFFLINE OFFLINE
    ora....-01.gsd application OFFLINE OFFLINE
    ora....-01.ons application OFFLINE OFFLINE
    ora....-01.vip application OFFLINE OFFLINE
    ora....02.lsnr application ONLINE ONLINE t5120-02
    ora....-02.gsd application ONLINE ONLINE t5120-02
    ora....-02.ons application ONLINE ONLINE t5120-02
    ora....-02.vip application ONLINE ONLINE t5120-02
    * In case of crashing the node 1: (at about 09:32 Sep 17)
    Crash the node 1:
    root@t5120-01 # Sep 17 09:31:16 t5120-01 Cluster.CCR: pmmd: fsync_core_files: could not get any core file paths: pcorefile error Invalid argument, gcorefile error Invalid argument, zcorefile error Invalid argument
    Sep 17 09:31:16 t5120-01 Cluster.CCR: [ID 408757 daemon.alert] pmmd: fsync_core_files: could not get any core file paths: pcorefile error Invalid argument, gcorefile error Invalid argument, zcorefile error Invalid argument
    Notifying cluster that this node is panicking
    crsd.log file on node 2:
    root@t5120-02 # tail -30 /u01/app/oracle/10.2.0/crs/log/t5120-02/crsd/crsd.log
    2010-09-16 16:35:56.281: [  CRSRES][1326] t5120-02 : CRS-1019: Resource ora.t5120-01.gsd (application) cannot run on t5120-02
    2010-09-16 16:35:56.320: [  CRSRES][1325] t5120-02 : CRS-1019: Resource ora.t5120-01.LISTENER_T5120-01.lsnr (application) cannot run on t5120-02
    2010-09-16 16:35:56.346: [  CRSRES][1327] t5120-02 : CRS-1019: Resource ora.t5120-01.ons (application) cannot run on t5120-02
    2010-09-16 17:06:10.202: [  CRSRES][1520] StopResource: setting CLI values
    2010-09-17 09:06:10.567: [ CRSCOMM][5709] CLEANUP: Searching for connections to failed node t5120-01
    2010-09-17 09:06:10.577: [  CRSEVT][5709] Processing member leave for t5120-01, incarnation: 11
    2010-09-17 09:06:10.665: [    CRSD][5709] SM: recovery in process: 8
    2010-09-17 09:06:10.665: [  CRSEVT][5709] Do failover for: t5120-01
    2010-09-17 09:06:10.826: [  CRSEVT][5709] Post recovery done evmd event for: t5120-01
    2010-09-17 09:06:10.898: [    CRSD][5709] SM: recoveryDone: 0
    2010-09-17 09:06:10.918: [  CRSEVT][5710] Processing RecoveryDone
    2010-09-17 09:32:08.810: [ CRSCOMM][5837] CLEANUP: Searching for connections to failed node t5120-01
    2010-09-17 09:32:08.811: [  CRSEVT][5837] Processing member leave for t5120-01, incarnation: 13
    2010-09-17 09:32:08.824: [    CRSD][5837] SM: recovery in process: 8
    2010-09-17 09:32:08.824: [  CRSEVT][5837] Do failover for: t5120-01
    2010-09-17 09:32:09.036: [  CRSRES][5837] startup = 0
    2010-09-17 09:32:09.075: [  CRSRES][5837] startup = 0
    2010-09-17 09:32:09.106: [  CRSRES][5837] startup = 0
    2010-09-17 09:32:09.132: [  CRSRES][5837] startup = 0
    2010-09-17 09:32:09.153: [  CRSRES][5837] startup = 0
    2010-09-17 09:32:09.565: [  CRSRES][5839] startRunnable: setting CLI values
    2010-09-17 09:32:09.575: [  CRSRES][5839] Attempting to start `ora.t5120-01.vip` on member `t5120-02`
    2010-09-17 09:32:16.276: [  CRSRES][5839] Start of `ora.t5120-01.vip` on member `t5120-02` succeeded.
    2010-09-17 09:32:16.340: [  CRSEVT][5837] Post recovery done evmd event for: t5120-01
    2010-09-17 09:32:16.342: [    CRSD][5837] SM: recoveryDone: 0
    2010-09-17 09:32:16.348: [  CRSEVT][5846] Processing RecoveryDone
    crs_stat -t on node 2:
    oracle@t5120-02 $ crs_stat -t
    Name Type Target State Host
    ora.orcl.db application ONLINE ONLINE t5120-02
    ora....l1.inst application ONLINE OFFLINE
    ora....l2.inst application ONLINE ONLINE t5120-02
    ora....01.lsnr application ONLINE OFFLINE
    ora....-01.gsd application ONLINE OFFLINE
    ora....-01.ons application ONLINE OFFLINE
    ora....-01.vip application ONLINE ONLINE t5120-02
    ora....02.lsnr application ONLINE ONLINE t5120-02
    ora....-02.gsd application ONLINE ONLINE t5120-02
    ora....-02.ons application ONLINE ONLINE t5120-02
    ora....-02.vip application ONLINE ONLINE t5120-02
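
As a rough way to read the crash-test log above, the following Python sketch measures how long the VIP start took on the surviving node. The two log lines are copied from the crsd.log output. The helper name `stamp` is an illustrative assumption.

```python
from datetime import datetime

# Two lines copied verbatim from the crash-test crsd.log above.
log_lines = [
    "2010-09-17 09:32:09.575: [  CRSRES][5839] Attempting to start `ora.t5120-01.vip` on member `t5120-02`",
    "2010-09-17 09:32:16.276: [  CRSRES][5839] Start of `ora.t5120-01.vip` on member `t5120-02` succeeded.",
]

def stamp(line):
    """Parse the leading 23-character timestamp: YYYY-MM-DD HH:MM:SS.mmm."""
    return datetime.strptime(line[:23], "%Y-%m-%d %H:%M:%S.%f")

vip_start_seconds = (stamp(log_lines[1]) - stamp(log_lines[0])).total_seconds()
print(f"VIP start took {vip_start_seconds:.3f} s")
```

In the shutdown test, by contrast, the log goes straight from "Do failover for: t5120-01" to "Post recovery done" with no "Attempting to start" line for the VIP, which matches the crs_stat output showing it OFFLINE.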

  • Oracle 11g clusterware VIP failover failed

    I installed Oracle 11g Clusterware successfully, without any errors, following this link:
    http://www.oracle-base.com/articles/11g/OracleDB11gR1RACInstallationOnRHEL5UsingVMwareESXAndNFS.php
    After that, I did a VIP failover test.
    I rebooted node-2.
    Before reboot,
    [root@advansrac1 bin]# ./crs_stat -t
    Name Type Target State Host
    ora....ac1.gsd application ONLINE ONLINE rac1
    ora....ac1.ons application ONLINE ONLINE rac1
    ora....ac1.vip application ONLINE ONLINE rac1
    ora....ac2.gsd application ONLINE ONLINE rac2
    ora....ac2.ons application ONLINE ONLINE rac2
    ora....ac2.vip application ONLINE ONLINE rac2
    After reboot,
    Name Type Target State Host
    ora....ac1.gsd application ONLINE ONLINE rac1
    ora....ac1.ons application ONLINE ONLINE rac1
    ora....ac1.vip application ONLINE ONLINE rac1
    ora....ac2.gsd application ONLINE ONLINE rac2
    ora....ac2.ons application ONLINE ONLINE rac2
    ora....ac2.vip application ONLINE UNKNOWN rac1
    [root@rac1 bin]# ./crs_stop ora.rac2.vip
    Attempting to stop `ora.rac2.vip` on member `advansrac1`
    Stop of `ora.rac2.vip` on member `advansrac1` succeeded.
    [root@rac1 bin]# ./crs_stat -t
    Name Type Target State Host
    ora....ac1.gsd application ONLINE ONLINE rac1
    ora....ac1.ons application ONLINE ONLINE rac1
    ora....ac1.vip application ONLINE ONLINE rac1
    ora....ac2.gsd application ONLINE ONLINE rac2
    ora....ac2.ons application ONLINE ONLINE rac2
    ora....ac2.vip application OFFLINE OFFLINE
    [root@rac1 bin]# ./crs_start ora.rac2.vip
    Attempting to start `ora.rac2.vip` on member `rac2`
    Start of `ora.rac2.vip` on member `rac2` succeeded.
    [root@rac1 bin]# ./crs_stat -t
    Name Type Target State Host
    ora....ac1.gsd application ONLINE ONLINE rac1
    ora....ac1.ons application ONLINE ONLINE rac1
    ora....ac1.vip application ONLINE ONLINE rac1
    ora....ac2.gsd application ONLINE ONLINE rac2
    ora....ac2.ons application ONLINE ONLINE rac2
    ora....ac2.vip application ONLINE ONLINE rac2
    I have only 1.5G on each node.
    The issues are:
    1. Actual result: during failover, ora.rac2.vip shows the UNKNOWN state on member rac1. Why?
    Expected result: during failover, ora.rac2.vip should be in the ONLINE state on member rac1.
    2. I have to start ora.rac2.vip manually when node-2 is up. I want the VIP to fail back automatically when node-2 returns to its normal online state.
    Please help me with this issue.

    VMware is unsupported but that is likely not your issue.
    1. Run cluster verify and report the results
    2. Did you create a failover service? How?
    3. Post your TNSNAMES.ORA

  • Failover time.

    hi,
    I have a problem with failover time.
    My environment:
    One cluster: two WebLogic 5.1 SP4 servers running on Sun Solaris. The cluster uses in-memory replication.
    The web server is Apache running on Sun Solaris. The Apache bridge is set up with a weblogic.conf that reads:
    WeblogicCluster 10.2.2.20:7001,10.2.2.21:7001
    ConnectTimeoutSecs 10
    ConnectRetrySecs 5
    StatPath true
    HungServerRecoverSecs 30:100:120
    Everything starts fine. Both WebLogic servers report that they joined the cluster, and the application works fine. When one WebLogic server is forced to shut down, failover takes place fine.
    The problem occurs when the machine running the WebLogic server that is the first entry in weblogic.conf (10.2.2.20) is unplugged from the network: failover takes place only after three minutes.
    Could someone help me reduce this time? Is there any property that has to be set in weblogic.conf or in the weblogic.properties file?
    Thanks in advance
    Arun

    HungServerRecoverSecs seconds
    This implementation takes care of hung or unresponsive servers in the cluster. The plug-in waits HungServerRecoverSecs for the server to respond and then declares that server dead, failing over to the next server. The minimum value for this setting is 10 and the maximum value is 600. The default is 300. It should be set to a very large value. If it is less than the time the servlets take to process, you will see unexpected results.
    Try reducing HungServerRecoverSecs. But remember that if your application processing takes a long time you will be in trouble, since the plug-in will fail over to other servers in the cluster and you will be thrashing the servers.
    Cheers
    - Prasad

  • What are typical failover times for application X on Sun Cluster

    Our company does not yet have any hands-on experience with clustering anything on Solaris, although we do with Veritas and Microsoft. My experience with MS is that it is as close to seamless (instantaneous) as possible. Veritas clustering takes a little longer to activate the standbys. A new application we are bringing in house soon runs on Sun Cluster (it is some BEA Tuxedo/WebLogic/Oracle monster). They claim the time it takes to flip from the active node to the standby node is ~30 minutes. This seems a bit insane to us, since they are calling this "HA". Is this type of failover time typical in Sun land? Thanks for any numbers or references.

    This is a hard question to answer because it depends on the cluster agent/application.
    On one hand you may have a simple Sun Cluster application that fails over in seconds because it has to do a limited amount of work (umount here, mount there, plumb network interface, etc) to actually failover.
    On the other hand these operations may, depending on the application, take longer than another application due to the very nature of that application.
    An Apache web server failover may take 10-15 seconds but an Oracle failover may take longer. There are many variables that control what happens from the time that a node failure is detected to the time that an application appears on another cluster node.
    If the failover time is 30 minutes I would ask your vendor why that is exactly.
    Not in a confrontational way but a 'I don't get how this is high availability' since the assumption is that up to 30 minutes could elapse from the time that your application goes down to it coming back on another node.
    A better solution might be a different application vendor (I know, I know) or a scalable application that can run on more than one cluster node at a time.
    The logic with the scalable approach is that if a failover takes 30 minutes or so to complete it (failover) becomes an expensive operation so I would rather that my application can use multiple nodes at once rather than eat a 30 minute failover if one node dies in a two node cluster:
    serverA > 30 minute failover > serverB
    seems to be less desirable than
    serverA, serverB, serverC, etc concurrently providing access to the application so that failover only happens when we get down to a handful of nodes
    Either one is probably more desirable than having an application outage(?)

  • VIP Failover at the web server level??

    Oracle10gR2
    RHEL 4 AS 64bit
    Hi,
    I wanted to know whether VIP failover also applies at the web server level. For example, we are running Apex, which uses the Apache/HTTP web server. If that were to go down on one node, would it fail over to the other node? Or does failover not apply at the web server level?
    Thank you.

    Yes, thank you for the documentation.
    However, I had one question about an action script that is in the following documentation:
    http://www.oracle.com/technology/products/database/clustering/pdf/Using_Oracle_Clusterware_to_protect_Oracle_Application_Server.pdf
    In there in APPENDIX B is a script called webcache_action.scr. I modified this script to use in our environment to start and stop the http_server process. We have been having some problems with it...mainly when it fails over, it shuts down the http server, then brings it back up, then down again. This is happening in a production system so it's a big issue. My question is can you explain to me why that is happening and maybe also explain what the script is doing? Maybe I'm missing something. Do I even need to have the stop part in the script? All we need to do is when it fails over to startup the http server on the node, that's it! Any help would be appreciated.
    #!/bin/bash
    # Action script invoked by CRS when managing the application associated
    # with this script.
    # Argument: $1 - start | stop | check
    # Returns:  0 - successful start, stop, or check
    #           1 - error
    SCRIPT=$0
    ACTION=$1                                            # Action (start, stop or check)
    ORA_OWNER=oracle                                     # ORACLE installation owner
    ORA_HTTP_HOME=/opt/app/oracle/product/10.2.0/http_1  # ORACLE_HOME of HTTP Server
    RET1=1                                               # Internal return value (do not change)
    RETVAL=1                                             # Script return value
    case "$ACTION" in
    'start')
        # Start section - start the process and report results
        ulimit -n 65536
        ulimit -u unlimited
        echo "DATE: $(date)" >> /tmp/e
        echo "ulimit: $(ulimit -n)" >> /tmp/e
        echo "ulimit: $(ulimit -u)" >> /tmp/e
        # A) START - HTTP Server:
        "$ORA_HTTP_HOME/opmn/bin/opmnctl" startproc ias-component=HTTP_Server 1>/dev/null 2>&1
        RET1=$?
        # Prepare return values:
        if [ "${RET1:-0}" -eq 0 ]; then
            RETVAL=0
        else
            RETVAL=1
        fi
        ;;
    'stop')
        # Stop section - stop the process and report results
        # A) STOP - HTTP Server:
        "$ORA_HTTP_HOME/opmn/bin/opmnctl" stopproc ias-component=HTTP_Server 1>/dev/null 2>&1
        RET1=$?
        # Prepare return values:
        if [ "${RET1:-0}" -eq 0 ]; then
            RETVAL=0
        else
            RETVAL=1
        fi
        ;;
    *)
        # Any other argument, including 'check', lands here and leaves RETVAL=1.
        echo "usage: $0 {start|stop}"
        ;;
    esac
    echo "RETURN: $RETVAL" >> /tmp/e
    # Return value to CRS daemon:
    echo "RETVAL: $RETVAL" >> /tmp/e
    if [ "$RETVAL" -eq 0 ]; then
        exit 0
    else
        exit 1
    fi

  • Failover time using BFD

    Hi Champs,
    We have configured BFD in a multihoming scenario with the BGP routing protocol. The timer configuration is bfd interval 100 min_rx 100 multiplier 5.
    Failover from the first ISP to the second ISP takes 30 sec, and the reverse, from the second ISP to the first, takes more than 1 min. Can you suggest a reason for the different failover times, and how can I get equal failover times for both ISPs? How is convergence time calculated in a BGP + BFD scenario?
    Regards
    V

    Vicky,
    A simple topology diagram would help us better understand the scenario. Are both ISPs terminated on the same router or on different routers?
    How many prefixes are you learning? The full Internet table or a few prefixes?
    Depending on that, you can consider BGP PIC or best external to speed up convergence.
    -Nagendra
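
For reference, BFD detection time is roughly the negotiated interval times the multiplier (per RFC 5880). The Python sketch below applies that to the timers in the question. It assumes both peers are configured with the same timers, which the thread does not confirm, and the variable names are illustrative.

```python
# Values from the post: "bfd interval 100 min_rx 100 multiplier 5".
interval_ms = 100
min_rx_ms = 100
multiplier = 5

# Assumption: the peer uses the same timers, so the negotiated interval
# is the larger of the transmit interval and the minimum receive interval.
negotiated_ms = max(interval_ms, min_rx_ms)
detection_ms = negotiated_ms * multiplier

# Observed failover time reported for the first ISP -> second ISP direction.
observed_failover_s = 30
remaining_s = observed_failover_s - detection_ms / 1000

print(f"BFD detection time: {detection_ms} ms")
print(f"time not explained by BFD detection: {remaining_s} s")
```

Under these assumptions BFD should detect the failure in about half a second, so most of the reported 30 s (or 1 min) would be spent after detection, for example in BGP reconvergence. That would fit the question above about how many prefixes are being learned.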

  • 2540 / RDAC path failover time

    Hi,
    I have a RHEL 5.3 server with two single port HBAs. These connect to a Brocade 300 switch and are zoned to two controllers on a 2540. Each HBA is zoned to see each controller. RDAC is used as the multipathing driver.
    When testing the solution, if I pull the cable from the active path between the HBA and the switch, it takes 60 seconds before the path fails over to the second HBA. No controller failover is taking place on the array - the path already exists through the brocade between the preferred array controller and the second HBA. After 60 seconds disk I/O continues to the original controller.
    Is this normal ? Is there a way of reducing the failover time ? I had a look at the /etc/mpp.conf variables but there is nothing obvious there that is causing this delay.
    Thanks

    Thanks Hugh,
    I forgot to mention that we were using Qlogic HBAs so our issue was a bit different...
    To resolve our problem: since we had 2x2FC HBA cards in each server, we needed to configure zoning on the Brocade switch so that each HBA port saw only one of the two array controllers (previously both controllers were visible to each HBA port, which was breaking some RDAC rule). We also upgraded the QLogic drivers using qlinstall -i before installing RDAC (the QLogic drivers that come with RHEL 5.3 are pretty old, it seems).
    Anyway, after these changes path failovers were working as expected and our timeout value of 60sec for Oracle ocfs2 cluster was not exceeded.
    We actually ended up having to increase the ocfs2 timeout from 60 to 120 seconds because another test case failed - it was taking more than 60sec for a controller to failover (simulated by placing active controller offline from the service advisor). We are not sure if this time is expected or not... anyway have a service request open for this.
    Thanks again,
    Trev

  • Oracle11g r2 Grid/RAC  VIP failover instead of SCAN VIP failover

    Dear Experts and Gurus
    Our platform: 2-node Oracle 11g R2 RAC/Grid 11.2.0.1.0
    Red Hat Enterprise Linux 5.3, 64-bit
    We do not have a DNS server available for the SCAN feature of Oracle 11g R2 Grid/RAC.
    We have successfully deployed the setup at our production site using a scan-vip entry in /etc/hosts.
    We want to use Oracle 11g R2 Grid/RAC the way we used Oracle 10g R2 RAC / Oracle 11g R1 RAC (VIP failover).
    Please find the default configuration of my setup below.
    cat /etc/hosts
    #public
    xxx.xxx.0.1 xyz-ch-aaadb-01
    xxx.xxx.0.2 xyzl-ch-aaadb-02
    #Virtual
    xxx.xxx.0.3 xyz-ch-aaadb-01-vip
    xxx.xxx.0.4 xyz-ch-aaadb-02-vip
    #Private
    10.10.0.1 xyz-ch-aaadb-01-priv
    10.10.0.2 xyz-ch-aaadb-01-priv
    #Scan
    xxx.xxx.0.5 rac-scan
    cat listener.ora
    listener.ora in both the RAC nodes
    LISTENER=(DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(PROTOCOL=IPC)(KEY=LISTENER)))) # line added by Agent
    LISTENER_SCAN1=(DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(PROTOCOL=IPC)(KEY=LISTENER_SCAN1)))) # line added by Agent
    ENABLE_GLOBAL_DYNAMIC_ENDPOINT_LISTENER_SCAN1=ON # line added by Agent
    ENABLE_GLOBAL_DYNAMIC_ENDPOINT_LISTENER=ON # line added by Agent
    cat tnsnames.ora.
    AAADB =
      (DESCRIPTION =
        (ADDRESS = (PROTOCOL = TCP)(HOST = rac-scan)(PORT = 1521))
        (CONNECT_DATA =
          (SERVER = DEDICATED)
          (SERVICE_NAME = aaadb)
        )
      )
    aaadb2 =
      (DESCRIPTION =
        (ADDRESS = (PROTOCOL = TCP)(HOST = xyz-ch-aaadb-02-vip)(PORT = 1521))
        (CONNECT_DATA =
          (SERVER = DEDICATED)
          (SERVICE_NAME = aaadb)
          (INSTANCE_NAME = aaadb2)
        )
      )
    aaadb1 =
      (DESCRIPTION =
        (ADDRESS = (PROTOCOL = TCP)(HOST = xyz-ch-aaadb-01-vip)(PORT = 1521))
        (CONNECT_DATA =
          (SERVER = DEDICATED)
          (SERVICE_NAME = aaadb)
          (INSTANCE_NAME = aaadb1)
        )
      )
    listener parameters
    RAC-NODE1
    SQL> show parameter listener
    NAME TYPE VALUE
    listener_networks string
    local_listener string (DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(PROTOCOL=TCP)(HOST=xyz-ch-aaadb-01-vip)(PORT=1521))))
    remote_listener string rac-scan:1521
    RAC-NODE2
    SQL> show parameter listener
    NAME TYPE VALUE
    listener_networks string
    local_listener string (DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(PROTOCOL=TCP)(HOST=xyz-ch-aaadb-02-vip)(PORT=1521))))
    remote_listener string rac-scan:1521
    listener status
    RAC Node-1
    [oracle@aaarac1 ~]$ lsnrctl status
    LSNRCTL for Linux: Version 11.2.0.1.0 - Production on 30-AUG-2011 23:43:50
    Copyright (c) 1991, 2009, Oracle. All rights reserved.
    Connecting to (ADDRESS=(PROTOCOL=tcp)(HOST=)(PORT=1521))
    STATUS of the LISTENER
    Alias LISTENER
    Version TNSLSNR for Linux: Version 11.2.0.1.0 - Production
    Start Date 30-AUG-2011 22:31:34
    Uptime 0 days 1 hr. 12 min. 15 sec
    Trace Level off
    Security ON: Local OS Authentication
    SNMP OFF
    Listener Parameter File /u01/app/11.2.0/grid/network/admin/listener.ora
    Listener Log File /u01/app/oracle/diag/tnslsnr/aaarac1/listener/alert/log.xml
    Listening Endpoints Summary...
    (DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(KEY=LISTENER)))
    (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=xyz-ch-aaadb-01-vip)(PORT=1521)))
    (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=xxx.xxx.0.1)(PORT=1521)))
    Services Summary...
    Service "+ASM" has 1 instance(s).
    Instance "+ASM1", status READY, has 1 handler(s) for this service...
    Service "aaadb" has 1 instance(s).
    Instance "aaadb1", status READY, has 1 handler(s) for this service...
    Service "aaadbXDB" has 1 instance(s).
    Instance "aaadb1", status READY, has 1 handler(s) for this service...
    The command completed successfully
    RAC Node-2
    [oracle@aaarac2 ~]$ lsnrctl status
    LSNRCTL for Linux: Version 11.2.0.1.0 - Production on 30-AUG-2011 23:44:27
    Copyright (c) 1991, 2009, Oracle. All rights reserved.
    Connecting to (ADDRESS=(PROTOCOL=tcp)(HOST=)(PORT=1521))
    STATUS of the LISTENER
    Alias LISTENER
    Version TNSLSNR for Linux: Version 11.2.0.1.0 - Production
    Start Date 30-AUG-2011 22:08:45
    Uptime 0 days 1 hr. 35 min. 42 sec
    Trace Level off
    Security ON: Local OS Authentication
    SNMP OFF
    Listener Parameter File /u01/app/11.2.0/grid/network/admin/listener.ora
    Listener Log File /u01/app/oracle/diag/tnslsnr/aaarac2/listener/alert/log.xml
    Listening Endpoints Summary...
    (DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(KEY=LISTENER)))
    (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=xyz-ch-aaadb-02-vip)(PORT=1521)))
    (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=xxx.xxx.0.2)(PORT=1521)))
    Services Summary...
    Service "+ASM" has 1 instance(s).
    Instance "+ASM2", status READY, has 1 handler(s) for this service...
    Service "aaadb" has 1 instance(s).
    Instance "aaadb2", status READY, has 1 handler(s) for this service...
    Service "aaadbXDB" has 1 instance(s).
    Instance "aaadb2", status READY, has 1 handler(s) for this service...
    The command completed successfully
    Please suggest the steps to configure listener.ora and tnsnames.ora so that Oracle 11g R2 Grid/RAC uses VIP failover instead of SCAN-VIP failover.
    Regards
    Hitgon
    Edited by: hitgon on Aug 31, 2011 12:14 AM

    Hi,
    Have a read of http://download.oracle.com/docs/cd/E11882_01/network.112/e10836/advcfg.htm#NETAG348
    Hope it helps
    Cheers
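
As one possible illustration of a client entry that lists both node VIPs rather than the SCAN name, here is a small Python sketch that renders such a tnsnames.ora entry. The helper `tns_entry` and the alias `AAADB_VIP` are hypothetical. The host and service names come from the post above. Whether this is the right setup for this 11gR2 cluster is not confirmed anywhere in the thread.

```python
def tns_entry(alias, hosts, service_name, port=1521):
    """Render a tnsnames.ora entry with one ADDRESS per host and
    connect-time failover and load balancing across the address list."""
    addresses = "\n".join(
        f"      (ADDRESS = (PROTOCOL = TCP)(HOST = {h})(PORT = {port}))"
        for h in hosts
    )
    return (
        f"{alias} =\n"
        "  (DESCRIPTION =\n"
        "    (ADDRESS_LIST =\n"
        "      (FAILOVER = ON)\n"
        "      (LOAD_BALANCE = ON)\n"
        f"{addresses}\n"
        "    )\n"
        "    (CONNECT_DATA =\n"
        "      (SERVER = DEDICATED)\n"
        f"      (SERVICE_NAME = {service_name})\n"
        "    )\n"
        "  )\n"
    )

# Hypothetical alias; host and service names are taken from the post.
entry = tns_entry(
    "AAADB_VIP",
    ["xyz-ch-aaadb-01-vip", "xyz-ch-aaadb-02-vip"],
    "aaadb",
)
print(entry)
```

With an entry like this, a client tries the listed VIP addresses in turn at connect time, so a new connection can still succeed when one node is down.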

  • Optimize rac failover time?

    I have a 2-node RAC and failover is taking 4 minutes. Please suggest some tips, documents or links that show how to optimize the RAC failover time.

    Hi
    Could you provide some more information about what you are trying to achieve? I assume you are talking about the time it takes for clients to start connecting to the available instance on the second node. Could you clarify this?
    There are SQL*Net parameters that can be set. You can also make shadow connections with the preconnect parameter in the failover section of your tnsnames.ora on the clients.
    Have you set both of your hosts as preferred in the service configuration on the RAC cluster? The impact of a failure will then be smaller, as approximately half of your connections will be unaffected when an instance fails.
    Cheers
    Peter

  • VIP Failover Testing

    Hello,
    I am new to Oracle RAC. We have a 2-node 11gR2 cluster and we are in the process of doing some failover testing. For database deployments we use an internal third-party tool called the deployer, which has tokens for DB configuration. The DBHost token in the deployer holds the hostname of either Node 1 or Node 2. This way we are not actually using the HA feature, because the connection goes to either Node 1 or Node 2, and if something happens to that node the deployment cannot connect to the database on it. In effect the cluster is treated as a single node.
    Instead of pointing the DBHost value at the physical hostname of a server in the cluster, I was thinking of using the VIP address, i.e. ipaddress-VIP, of either node. After making the changes I would like to do some failover testing manually, and this is where I am stuck. How do I go about the testing scenarios?
    For example: if the DBHOST token value is the VIP for Node 2 and connections are coming in to Node 2 via the deployer, how do I proceed with the testing?
    Should I bring down Node 2? If I reboot it, how can I see whether the VIP failed over to the surviving node?
    Any help/suggestions much appreciated.
    Thanks!

    What you describe is having a RAC cluster, possibly working possibly not, and no actual use of the value of the licensing you paid for.
    My first advice to you is to read the docs and learn what RAC is, how it works, how to define and use services, and how a properly configured LISTENER.ORA and TNSNAMES.ORA should be constructed so you can compare that to what you have. With 11gR2 you should connect to the SCAN not the VIP.
    Here's how I would test RAC:
    1. Walk up to one of the servers while half the users are connected to each instance and do a SHUTDOWN ABORT. See what happens. Restart the killed node. Try it with the other node.
    2. With everything running properly and load on both machines disconnect the switch that provides the cache fusion interconnect or pull one of the cables out of the server. When you reestablish the connection what happens?
    3. Repeat #2 but this time with the connection to storage.
    The above should get you started.
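
    To see how long clients are actually cut off during tests like these, a small probe can be left running on a client machine. The following is only a sketch under assumptions: the function names are made up for illustration, and the host and port you pass in should be whatever address the clients really use (for example the SCAN listener). It reports TCP reachability of the listener only, not the state of existing sessions.

```python
import socket
import time


def port_is_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, timed out, unreachable or unresolvable: treat as down.
        return False


def transitions(samples):
    """Collapse (timestamp, is_up) samples into the points where state changed."""
    changes = []
    previous = None
    for timestamp, is_up in samples:
        if is_up != previous:
            changes.append((timestamp, "UP" if is_up else "DOWN"))
            previous = is_up
    return changes


def watch(host, port, duration=120.0, interval=1.0):
    """Probe host:port every `interval` seconds and return the state changes."""
    samples = []
    deadline = time.monotonic() + duration
    while time.monotonic() < deadline:
        samples.append((time.strftime("%H:%M:%S"), port_is_open(host, port)))
        time.sleep(interval)
    return transitions(samples)
```

    Calling watch() with the client-facing address just before killing a node, and reading the DOWN and UP timestamps afterwards, gives an approximate outage window at the resolution of the probe interval.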

  • FAILED_OVER for single sessions. No VIP failover

    Hi all,
    On a 10.2.0.3.0 2-node RAC (AIX 5L) I am seeing from time to time a single session or a few sessions marked in gv$session with FAILED_OVER='YES', although no service failover occurred. Those sessions are still connected to the preferred instance; that is, their tnsnames.ora refers to a service running on, let's say, instance 2, and they ARE running on instance 2.
    I am wondering if this can be due to some kind of connectivity issue, that is, the client can no longer ping the VIP address (and vice versa) and the session is then marked as FAILED_OVER.
    Has anyone seen something similar?
    Thanks for any feedback,
    Riccardo

    With Oracle 10g you get connect-time failover by listing the addresses of all nodes in the client's tnsnames.ora file. Check the following link:
    http://docs.oracle.com/cd/B19306_01/network.102/b14213/tnsnames.htm#i477297
    If the client's tnsnames.ora file is correct, the client tries the addresses one by one, so it does not matter if one server of the cluster is down. Of course, when you shut down a database instance its connections will be dropped, although you can have SELECT statements fail over with Transparent Application Failover (TAF).
    So you can run a rolling update without shutting down the whole RAC database as long as the clients' tnsnames.ora is configured correctly. The dropped connections still need to be handled at the application level.
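
    The "tries the addresses one by one" behaviour can be pictured with a plain TCP sketch. This is not Oracle Net's implementation, just an illustration of the idea; the function name connect_first_available is hypothetical and the addresses are whatever host and port pairs the caller supplies.

```python
import socket


def connect_first_available(addresses, timeout=3.0):
    """Try each (host, port) in order and return the first socket that connects.

    Raises ConnectionError if no address accepts the connection.
    """
    errors = []
    for host, port in addresses:
        try:
            return socket.create_connection((host, port), timeout=timeout)
        except OSError as exc:
            # This address is down or unreachable; remember why and move on.
            errors.append(f"{host}:{port} -> {exc}")
    raise ConnectionError("no address reachable: " + "; ".join(errors))
```

    Called with something like [("node1-vip", 1521), ("node2-vip", 1521)] (placeholder names), it returns a connection to the second node when the first refuses. That only covers new connections; sessions that were already open on the failed node are still lost.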

  • Vip Failover and rolling patch

    Hi,
    For the purpose of implementing security features on the 2-node RAC DB (EE 10.2.0.5) [ID 1340831.1], I want to apply the patch for bug 12880299, which can be applied in rolling fashion.
    My question is: if I do the whole procedure on node 1 (creating wallets and self-signed certificates, stopping/starting instances on the node), node 2 should continue to accept incoming connection requests, right?
    Put another way, I have run failover tests by crashing node 1, and the VIP failed over to node 2, but I have no idea how the VIP will behave if the DB, LISTENER, CRS, etc. are deliberately stopped on node 1. Will it still move to node 2 automatically?
    Also, is it necessary to install PSE 12880299 before implementing COST in this version of the DB?
    Thank you for your useful inputs.
    Regards

    With Oracle 10g you get connect-time failover by listing the addresses of all nodes in the client's tnsnames.ora file. Check the following link:
    http://docs.oracle.com/cd/B19306_01/network.102/b14213/tnsnames.htm#i477297
    If the client's tnsnames.ora file is correct, the client tries the addresses one by one, so it does not matter if one server of the cluster is down. Of course, when you shut down a database instance its connections will be dropped, although you can have SELECT statements fail over with Transparent Application Failover (TAF).
    So you can run a rolling update without shutting down the whole RAC database as long as the clients' tnsnames.ora is configured correctly. The dropped connections still need to be handled at the application level.

  • HANA High Availability System Vs Storage Vs VIP failover

    Dear Experts,
    Hope you are all doing great. I would like to seek your expertise on HANA high availability best practice. We have decided to use TDI for BW on HANA. The next big question for us is how to make it available at least 99.99% of the time.
    I have gone through multiple documents, SDN forums, etc., but would like to see what the experts are doing in practice.
    My view -
    Virtual IP failover is a common HA practice which has been used to fail over CI / DB hosts depending on failure or maintenance. In this case, both nodes can be used to run app servers.
    System replication - HANA-based; requires a secondary standby node, which doesn't accept user requests but replicates the database from the primary using logs after an initial data snapshot, either synchronously or asynchronously. (Can be used for HA or DR if the servers are in different data centers.)
    Storage replication - HANA-based; requires a secondary standby node, with the SAN replicated to it for HA/DR.
    Could you please share the method you followed for HANA HA, and the pros, cons and challenges that you have faced or are facing?
    Thanks
    Yoga

    Thanks forbrich
    Do you know any specific doc that describes the installation and configuration steps of 10g RAC on NAS? If possible, can you provide some link that I could use to perform this task?
    I have done RAC installations on SAN without any problems and it's something I'm fairly experienced with. With NAS I am not really comfortable, since I can't seem to find any documentation that describes a step-by-step installation procedure, or guidelines for that matter.
    Thank you for your input
    Best Regards
    Abbas

  • Fwsm failover times in real crash

    Hi,
    I have two Cat6k switches in VSS and two FWSM service modules.
    How fast will the FWSM switch over to the backup firewall after the active firewall crashes or loses power?

    Hi,
    The initial 15-second detection time can be reduced to 3 seconds by tuning the failover polltime and holdtime as follows:
    "failover polltime unit 1 holdtime 3"
    Also keep in mind that after a switchover the new active unit has to establish a neighbor relationship with the neighboring router. The standby unit never participates in the OSPF process, so in short the new active unit has to re-establish adjacencies.
    Hope that helps.
    Thanks,
    Varun
