VIP failover question

We recently had a node failure in our two-node Solaris 10 / Oracle 10.2.0.3 cluster. When the failed node was restored and came back online, load balancing in the cluster was still off: the node that had not failed kept taking all the requests while the restored node sat idle.
The final resolution was to restart CRS on the node that had not failed, which balanced the cluster again. At first I thought it was related to the listener and how it handles load balancing, but Oracle and I now believe it may be due to the VIP not failing back.
Has anyone else seen this? Is there a way to manually fail a VIP over from one node to another? I am still waiting to hear more from Oracle but was hoping someone in the forums might have some insight.
Thanks

fjlee, the ML note you are referring to says:
"*What happens when the Public adapter comes back?*
The VIP will be relocated back to its host node, this will happen the next time Oracle Clusterware checks the instance. This check is carried out, by default, every 600 seconds so this is the maximum amount of time that could pass before the relocate is triggered. The VIP can manually be relocated back to its home node using the srvctl start nodeapps -n <nodename> command."
So if the relocation has not happened after 600 seconds, there is an issue and an SR should be opened.
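
For anyone who wants to check this from the command line, here is a minimal sketch. It assumes the long-form crs_stat output (NAME= and STATE= lines, with STATE reading like "ONLINE on <host>") and VIP resources named ora.<node>.vip; the function name vips_off_home is made up for illustration. It only prints the srvctl command quoted in the note and does not run it.

```shell
#!/bin/bash
# Hypothetical helper: reads long-form `crs_stat` output on stdin and reports
# every VIP that is ONLINE on a node other than its home node.
# Assumes resource names of the form ora.<node>.vip and STATE lines such as
# "STATE=ONLINE on rac1".
vips_off_home() {
  awk -F= '
    $1 == "NAME" { name = $2 }
    $1 == "STATE" && name ~ /^ora\..*\.vip$/ {
      n = split($2, st, " ")
      home = name
      sub(/^ora\./, "", home)
      sub(/\.vip$/, "", home)
      if (n == 3 && st[1] == "ONLINE" && st[3] != home)
        printf "%s is on %s; to fail back: srvctl start nodeapps -n %s\n", name, st[3], home
    }'
}
```

Typical use would be "crs_stat | vips_off_home" on a surviving node once the failed node is back.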

Similar Messages

VIP Failover at the web server level??

Oracle10gR2
RHEL 4 AS 64bit
Hi,
I wanted to know whether VIP failover also works at the web server level. For example, we are running Apex, which uses the Apache/HTTP web server; if that were to go down on one node, would it fail over to the other node? Or does failover not happen at the web server level?
Thank you.

Yes, thank you for the documentation.
However, I had one question about an action script that is in the following documentation:
http://www.oracle.com/technology/products/database/clustering/pdf/Using_Oracle_Clusterware_to_protect_Oracle_Application_Server.pdf
Appendix B of that document contains a script called webcache_action.scr. I modified this script for our environment to start and stop the http_server process. We have been having some problems with it: mainly, when it fails over, it shuts down the HTTP server, brings it back up, then takes it down again. This is happening in a production system, so it's a big issue. Can you explain why that is happening, and maybe also explain what the script is doing? Maybe I'm missing something. Do I even need the stop part in the script? All we need is for the HTTP server to be started on the node when it fails over, that's it! Any help would be appreciated.
#!/bin/bash
SCRIPT=$0
ACTION=$1          # Action (start, stop or check)
ORA_OWNER=oracle   # ORACLE installation owner
ORA_HTTP_HOME=/opt/app/oracle/product/10.2.0/http_1   # ORACLE_HOME of HTTP Server
RET1=1             # Internal return values ( do not change )
RETVAL=1           # Script return value

# Main section of Action Script - starts, stops, or checks an application
# This script is invoked by CRS when managing the application associated
# with this script.
# Argument: $1 - start | stop | check
# Returns:  0 - successful start, stop, or check
#           1 - error
case $1 in
# Start section - start the process and report results
'start')
    ulimit -n 65536
    ulimit -u unlimited
    echo "DATE: `date`" >> /tmp/e
    echo "ulimit: `ulimit -n`" >> /tmp/e
    echo "ulimit: `ulimit -u`" >> /tmp/e
    # A) START - HTTP Server:
    $ORA_HTTP_HOME/opmn/bin/opmnctl startproc ias-component=HTTP_Server 1>/dev/null 2>&1
    RET1=$?
    # Prepare return values:
    if [ ${RET1:-0} -eq 0 ]; then
        RETVAL=0
    else
        RETVAL=1
    fi
    ;;
# Stop section - stop the process and report results
'stop')
    # A) STOP - HTTP Server:
    $ORA_HTTP_HOME/opmn/bin/opmnctl stopproc ias-component=HTTP_Server 1>/dev/null 2>&1
    RET1=$?
    # Prepare return values:
    if [ ${RET1:-0} -eq 0 ]; then
        RETVAL=0
    else
        RETVAL=1
    fi
    ;;
# Anything else (including check) only prints the usage line; RETVAL stays 1
*)
    echo "usage: $0 {start stop}"
    ;;
esac
echo "RETURN: $RETVAL" >> /tmp/e
# Return value to CRS daemon:
echo "RETVAL: $RETVAL" >> /tmp/e
if [ $RETVAL -eq 0 ]; then
    exit 0
else
    exit 1
fi
#exit 0
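
One thing worth checking, offered as a guess rather than a diagnosis: the script's own header says CRS calls it with check as well as start and stop, yet there is no check branch, so a check call ends with RETVAL=1 and CRS may conclude the resource is down and try to recover it. A minimal sketch of a check helper follows. It assumes opmnctl status prints pipe-separated rows whose last column is the process status (for example Alive); the helper name http_server_alive is hypothetical.

```shell
#!/bin/bash
# Hypothetical check helper for a CRS action script.
# $1: text in the format assumed for `opmnctl status` output, i.e. rows like
#     "HTTP_Server | HTTP_Server | 12345 | Alive"
# Returns 0 if an HTTP_Server row reports Alive, 1 otherwise.
http_server_alive() {
  printf '%s\n' "$1" | awk -F'|' '
    $1 ~ /HTTP_Server/ {
      status = $NF
      gsub(/[ \t]/, "", status)
      if (status == "Alive") ok = 1
    }
    END { exit !ok }'
}

# Inside the action script, a check branch could then look like:
#   'check')
#       if http_server_alive "$($ORA_HTTP_HOME/opmn/bin/opmnctl status)"; then
#           RETVAL=0
#       else
#           RETVAL=1
#       fi
#       ;;
```

The stop branch is still needed so that CRS can take the resource down cleanly on the node it is leaving.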

VIP failover in Oracle RAC

Dear all,
I am using Oracle RAC 10gR2 running on top of Sun Cluster 3.2u3.
I ran a test to check the failover ability of the VIP in Oracle RAC, but the result was not what I expected.
The test scenario was:
- Power on both nodes and wait for all services, both Sun Cluster and Oracle RAC, to come online.
- Use SQL Navigator to connect to the database through the VIP on node1 (VIP1).
- Shut down node1.
- All services and resources on node2 stayed online; however, even after a long time (about 10 minutes), VIP1 had not failed over to the surviving node.
- The "crs_stat -t" command did not show VIP1 online on node2 (the surviving node).
- SQL Navigator could no longer establish a connection to the database through VIP1.
The output of "crs_stat -t" command before shutting down the node1:
oracle@t5120-02 $ crs_stat -t
Name Type Target State Host
ora.orcl.db application ONLINE ONLINE t5120-02
ora....l1.inst application ONLINE ONLINE t5120-01
ora....l2.inst application ONLINE ONLINE t5120-02
ora....01.lsnr application ONLINE ONLINE t5120-01
ora....-01.gsd application ONLINE ONLINE t5120-01
ora....-01.ons application ONLINE ONLINE t5120-01
ora....-01.vip application ONLINE ONLINE t5120-01
ora....02.lsnr application ONLINE ONLINE t5120-02
ora....-02.gsd application ONLINE ONLINE t5120-02
ora....-02.ons application ONLINE ONLINE t5120-02
ora....-02.vip application ONLINE ONLINE t5120-02
The output of "crs_stat -t" command after shutting down the node1:
oracle@t5120-02 $ crs_stat -t
Name Type Target State Host
ora.orcl.db application ONLINE ONLINE t5120-02
ora....l1.inst application OFFLINE OFFLINE
ora....l2.inst application ONLINE ONLINE t5120-02
ora....01.lsnr application OFFLINE OFFLINE
ora....-01.gsd application OFFLINE OFFLINE
ora....-01.ons application OFFLINE OFFLINE
ora....-01.vip application OFFLINE OFFLINE
ora....02.lsnr application ONLINE ONLINE t5120-02
ora....-02.gsd application ONLINE ONLINE t5120-02
ora....-02.ons application ONLINE ONLINE t5120-02
ora....-02.vip application ONLINE ONLINE t5120-02
So my questions are:
- Was my test scenario a correct way to check the failover ability of the VIP in Oracle RAC?
- Is any additional configuration needed on the system to achieve VIP failover?
Please help me with this, as I am new to Oracle RAC.
Thanks.
HuyNQ.

Dear Rajesh,
Sorry for the late reply.
I have now tested two cases: shutting down a node and crashing a node. Below is the output of the log files in both test cases.
Once again, when shutting down a node, the VIP did not fail over, although CRS on that node was shut down before all the other Sun Cluster services and resources.
Please help check the log files and advise me if you see anything abnormal.
Thanks.
* In case of shutting down the node 1: (at about 09:05 Sep 17)
Shutdown node 1:
root@t5120-01 # shutdown -y -g0 -i0
Shutdown started. Fri Sep 17 09:04:55 ICT 2010
Changing to init state 0 - please wait
Broadcast Message from root (console) on t5120-01 Fri Sep 17 09:04:55...
THE SYSTEM t5120-01 IS BEING SHUT DOWN NOW ! ! !
Log off now or risk your files being damaged
crsd.log file on node 2:
root@t5120-02 # more /u01/app/oracle/10.2.0/crs/log/t5120-02/crsd/crsd.log
2010-09-16 16:35:56.281: [ CRSRES][1326] t5120-02 : CRS-1019: Resource ora.t5120-01.gsd (application) cannot run on t5120-02
2010-09-16 16:35:56.320: [ CRSRES][1325] t5120-02 : CRS-1019: Resource ora.t5120-01.LISTENER_T5120-01.lsnr (application) cannot run on t5120-02
2010-09-16 16:35:56.346: [ CRSRES][1327] t5120-02 : CRS-1019: Resource ora.t5120-01.ons (application) cannot run on t5120-02
2010-09-16 17:06:10.202: [ CRSRES][1520] StopResource: setting CLI values
2010-09-17 09:06:10.567: [ CRSCOMM][5709] CLEANUP: Searching for connections to failed node t5120-01
2010-09-17 09:06:10.577: [ CRSEVT][5709] Processing member leave for t5120-01, incarnation: 11
2010-09-17 09:06:10.665: [    CRSD][5709] SM: recovery in process: 8
2010-09-17 09:06:10.665: [ CRSEVT][5709] Do failover for: t5120-01
2010-09-17 09:06:10.826: [ CRSEVT][5709] Post recovery done evmd event for: t5120-01
2010-09-17 09:06:10.898: [    CRSD][5709] SM: recoveryDone: 0
2010-09-17 09:06:10.918: [ CRSEVT][5710] Processing RecoveryDone
crs_stat -t on node 2:
oracle@t5120-02 $ crs_stat -t
Name Type Target State Host
ora.orcl.db application ONLINE ONLINE t5120-02
ora....l1.inst application OFFLINE OFFLINE
ora....l2.inst application ONLINE ONLINE t5120-02
ora....01.lsnr application OFFLINE OFFLINE
ora....-01.gsd application OFFLINE OFFLINE
ora....-01.ons application OFFLINE OFFLINE
ora....-01.vip application OFFLINE OFFLINE
ora....02.lsnr application ONLINE ONLINE t5120-02
ora....-02.gsd application ONLINE ONLINE t5120-02
ora....-02.ons application ONLINE ONLINE t5120-02
ora....-02.vip application ONLINE ONLINE t5120-02
* In case of crashing the node 1: (at about 09:32 Sep 17)
Crash the node 1:
root@t5120-01 # Sep 17 09:31:16 t5120-01 Cluster.CCR: pmmd: fsync_core_files: could not get any core file paths: pcorefile error Invalid argument, gcorefile error Invalid argument, zcorefile error Invalid argument
Sep 17 09:31:16 t5120-01 Cluster.CCR: [ID 408757 daemon.alert] pmmd: fsync_core_files: could not get any core file paths: pcorefile error Invalid argument, gcorefile error Invalid argument, zcorefile error Invalid argument
Notifying cluster that this node is panicking
crsd.log file on node 2:
root@t5120-02 # tail -30 /u01/app/oracle/10.2.0/crs/log/t5120-02/crsd/crsd.log
2010-09-16 16:35:56.281: [ CRSRES][1326] t5120-02 : CRS-1019: Resource ora.t5120-01.gsd (application) cannot run on t5120-02
2010-09-16 16:35:56.320: [ CRSRES][1325] t5120-02 : CRS-1019: Resource ora.t5120-01.LISTENER_T5120-01.lsnr (application) cannot run on t5120-02
2010-09-16 16:35:56.346: [ CRSRES][1327] t5120-02 : CRS-1019: Resource ora.t5120-01.ons (application) cannot run on t5120-02
2010-09-16 17:06:10.202: [ CRSRES][1520] StopResource: setting CLI values
2010-09-17 09:06:10.567: [ CRSCOMM][5709] CLEANUP: Searching for connections to failed node t5120-01
2010-09-17 09:06:10.577: [ CRSEVT][5709] Processing member leave for t5120-01, incarnation: 11
2010-09-17 09:06:10.665: [    CRSD][5709] SM: recovery in process: 8
2010-09-17 09:06:10.665: [ CRSEVT][5709] Do failover for: t5120-01
2010-09-17 09:06:10.826: [ CRSEVT][5709] Post recovery done evmd event for: t5120-01
2010-09-17 09:06:10.898: [    CRSD][5709] SM: recoveryDone: 0
2010-09-17 09:06:10.918: [ CRSEVT][5710] Processing RecoveryDone
2010-09-17 09:32:08.810: [ CRSCOMM][5837] CLEANUP: Searching for connections to failed node t5120-01
2010-09-17 09:32:08.811: [ CRSEVT][5837] Processing member leave for t5120-01, incarnation: 13
2010-09-17 09:32:08.824: [    CRSD][5837] SM: recovery in process: 8
2010-09-17 09:32:08.824: [ CRSEVT][5837] Do failover for: t5120-01
2010-09-17 09:32:09.036: [ CRSRES][5837] startup = 0
2010-09-17 09:32:09.075: [ CRSRES][5837] startup = 0
2010-09-17 09:32:09.106: [ CRSRES][5837] startup = 0
2010-09-17 09:32:09.132: [ CRSRES][5837] startup = 0
2010-09-17 09:32:09.153: [ CRSRES][5837] startup = 0
2010-09-17 09:32:09.565: [ CRSRES][5839] startRunnable: setting CLI values
2010-09-17 09:32:09.575: [ CRSRES][5839] Attempting to start `ora.t5120-01.vip` on member `t5120-02`
2010-09-17 09:32:16.276: [ CRSRES][5839] Start of `ora.t5120-01.vip` on member `t5120-02` succeeded.
2010-09-17 09:32:16.340: [ CRSEVT][5837] Post recovery done evmd event for: t5120-01
2010-09-17 09:32:16.342: [    CRSD][5837] SM: recoveryDone: 0
2010-09-17 09:32:16.348: [ CRSEVT][5846] Processing RecoveryDone
crs_stat -t on node 2:
oracle@t5120-02 $ crs_stat -t
Name Type Target State Host
ora.orcl.db application ONLINE ONLINE t5120-02
ora....l1.inst application ONLINE OFFLINE
ora....l2.inst application ONLINE ONLINE t5120-02
ora....01.lsnr application ONLINE OFFLINE
ora....-01.gsd application ONLINE OFFLINE
ora....-01.ons application ONLINE OFFLINE
ora....-01.vip application ONLINE ONLINE t5120-02
ora....02.lsnr application ONLINE ONLINE t5120-02
ora....-02.gsd application ONLINE ONLINE t5120-02
ora....-02.ons application ONLINE ONLINE t5120-02
ora....-02.vip application ONLINE ONLINE t5120-02
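
The difference between the two cases shows up in crsd.log: only the crash case contains the "Attempting to start ... vip" and "Start of ... vip ... succeeded" lines. Below is a small sketch for pulling those lines out of a log, assuming the message wording shown in the excerpts above; the function name vip_failover_events is made up for illustration.

```shell
#!/bin/bash
# Hypothetical helper: reads crsd.log text on stdin and prints only the lines
# that record a VIP resource being started on a cluster member.
# Prints nothing (and still returns 0) when no such line exists.
vip_failover_events() {
  grep -E '(Attempting to start|Start of) `ora\..*\.vip` on member' || true
}
```

For example: vip_failover_events < /u01/app/oracle/10.2.0/crs/log/t5120-02/crsd/crsd.log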

Oracle 11g clusterware VIP failover failed

I installed Oracle 11g Clusterware successfully, without any errors, following this link:
http://www.oracle-base.com/articles/11g/OracleDB11gR1RACInstallationOnRHEL5UsingVMwareESXAndNFS.php
After that, I did a VIP failover test.
I rebooted node 2.
Before reboot,
[root@advansrac1 bin]# ./crs_stat -t
Name Type Target State Host
ora....ac1.gsd application ONLINE ONLINE rac1
ora....ac1.ons application ONLINE ONLINE rac1
ora....ac1.vip application ONLINE ONLINE rac1
ora....ac2.gsd application ONLINE ONLINE rac2
ora....ac2.ons application ONLINE ONLINE rac2
ora....ac2.vip application ONLINE ONLINE rac2
After reboot,
Name Type Target State Host
ora....ac1.gsd application ONLINE ONLINE rac1
ora....ac1.ons application ONLINE ONLINE rac1
ora....ac1.vip application ONLINE ONLINE rac1
ora....ac2.gsd application ONLINE ONLINE rac2
ora....ac2.ons application ONLINE ONLINE rac2
ora....ac2.vip application ONLINE UNKNOWN rac1
[root@rac1 bin]# ./crs_stop ora.rac2.vip
Attempting to stop `ora.rac2.vip` on member `advansrac1`
Stop of `ora.rac2.vip` on member `advansrac1` succeeded.
[root@rac1 bin]# ./crs_stat -t
Name Type Target State Host
ora....ac1.gsd application ONLINE ONLINE rac1
ora....ac1.ons application ONLINE ONLINE rac1
ora....ac1.vip application ONLINE ONLINE rac1
ora....ac2.gsd application ONLINE ONLINE rac2
ora....ac2.ons application ONLINE ONLINE rac2
ora....ac2.vip application OFFLINE OFFLINE
[root@rac1 bin]# ./crs_start ora.rac2.vip
Attempting to start `ora.rac2.vip` on member `rac2`
Start of `ora.rac2.vip` on member `rac2` succeeded.
[root@rac1 bin]# ./crs_stat -t
Name Type Target State Host
ora....ac1.gsd application ONLINE ONLINE rac1
ora....ac1.ons application ONLINE ONLINE rac1
ora....ac1.vip application ONLINE ONLINE rac1
ora....ac2.gsd application ONLINE ONLINE rac2
ora....ac2.ons application ONLINE ONLINE rac2
ora....ac2.vip application ONLINE ONLINE rac2
I have only 1.5G on each node
Here are the issues:
1. Actual result: during failover, ora.rac2.vip shows state UNKNOWN on member rac1.
Expected result: during failover, ora.rac2.vip should be in state ONLINE on member rac1.
2. I have to start ora.rac2.vip manually once node 2 is up. I want the VIP to fail back automatically when node 2 returns to its normal online state.
Please help me with this issue.

VMware is unsupported but that is likely not your issue.
1. Run cluster verify and report the results
2. Did you create a failover service? How?
3. Post your TNSNAMES.ORA

Oracle11g r2 Grid/RAC VIP failover instead of SCAN VIP failover

Dear Experts and Gurus
Our platform: 2-node Oracle 11g R2 RAC/Grid 11.2.0.1.0
Red Hat Enterprise Linux 5.3, 64-bit
We do not have a DNS server available for the SCAN feature of Oracle 11g R2 Grid/RAC.
We have successfully deployed the setup at our production site using a scan-vip entry in /etc/hosts.
We want to use Oracle 11g R2 Grid/RAC the way Oracle 10g R2 RAC / Oracle 11g R1 RAC works (VIP failover).
Please find the default configuration of my setup below.
cat /etc/hosts
#public
xxx.xxx.0.1 xyz-ch-aaadb-01
xxx.xxx.0.2 xyzl-ch-aaadb-02
#Virtual
xxx.xxx.0.3 xyz-ch-aaadb-01-vip
xxx.xxx.0.4 xyz-ch-aaadb-02-vip
#Private
10.10.0.1 xyz-ch-aaadb-01-priv
10.10.0.2 xyz-ch-aaadb-01-priv
#Scan
xxx.xxx.0.5 rac-scan
cat listener.ora
listener.ora in both the RAC nodes
LISTENER=(DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(PROTOCOL=IPC)(KEY=LISTENER)))) # line added by Agent
LISTENER_SCAN1=(DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(PROTOCOL=IPC)(KEY=LISTENER_SCAN1)))) # line added by Agent
ENABLE_GLOBAL_DYNAMIC_ENDPOINT_LISTENER_SCAN1=ON # line added by Agent
ENABLE_GLOBAL_DYNAMIC_ENDPOINT_LISTENER=ON # line added by Agent
cat tnsnames.ora.
AAADB =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = rac-scan)(PORT = 1521))
    (CONNECT_DATA =
      (SERVER = DEDICATED)
      (SERVICE_NAME = aaadb)
    )
  )
aaadb2 =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = xyz-ch-aaadb-02-vip)(PORT = 1521))
    (CONNECT_DATA =
      (SERVER = DEDICATED)
      (SERVICE_NAME = aaadb)
      (INSTANCE_NAME = aaadb2)
    )
  )
aaadb1 =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = xyz-ch-aaadb-01-vip)(PORT = 1521))
    (CONNECT_DATA =
      (SERVER = DEDICATED)
      (SERVICE_NAME = aaadb)
      (INSTANCE_NAME = aaadb1)
    )
  )
listener parameters
RAC-NODE1
SQL> show parameter listener
NAME TYPE VALUE
listener_networks string
local_listener string (DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(PROTOCOL=TCP)(HOST=xyz-ch-aaadb-01-vip)(PORT=1521))))
remote_listener string rac-scan:1521
RAC-NODE2
SQL> show parameter listener
NAME TYPE VALUE
listener_networks string
local_listener string (DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(PROTOCOL=TCP)(HOST=xyz-ch-aaadb-02-vip)(PORT=1521))))
remote_listener string rac-scan:1521
listener status
RAC Node-1
[oracle@aaarac1 ~]$ lsnrctl status
LSNRCTL for Linux: Version 11.2.0.1.0 - Production on 30-AUG-2011 23:43:50
Copyright (c) 1991, 2009, Oracle. All rights reserved.
Connecting to (ADDRESS=(PROTOCOL=tcp)(HOST=)(PORT=1521))
STATUS of the LISTENER
Alias LISTENER
Version TNSLSNR for Linux: Version 11.2.0.1.0 - Production
Start Date 30-AUG-2011 22:31:34
Uptime 0 days 1 hr. 12 min. 15 sec
Trace Level off
Security ON: Local OS Authentication
SNMP OFF
Listener Parameter File /u01/app/11.2.0/grid/network/admin/listener.ora
Listener Log File /u01/app/oracle/diag/tnslsnr/aaarac1/listener/alert/log.xml
Listening Endpoints Summary...
(DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(KEY=LISTENER)))
(DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=xyz-ch-aaadb-01-vip)(PORT=1521)))
(DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=xxx.xxx.0.1)(PORT=1521)))
Services Summary...
Service "+ASM" has 1 instance(s).
Instance "+ASM1", status READY, has 1 handler(s) for this service...
Service "aaadb" has 1 instance(s).
Instance "aaadb1", status READY, has 1 handler(s) for this service...
Service "aaadbXDB" has 1 instance(s).
Instance "aaadb1", status READY, has 1 handler(s) for this service...
The command completed successfully
RAC Node-2
[oracle@aaarac2 ~]$ lsnrctl status
LSNRCTL for Linux: Version 11.2.0.1.0 - Production on 30-AUG-2011 23:44:27
Copyright (c) 1991, 2009, Oracle. All rights reserved.
Connecting to (ADDRESS=(PROTOCOL=tcp)(HOST=)(PORT=1521))
STATUS of the LISTENER
Alias LISTENER
Version TNSLSNR for Linux: Version 11.2.0.1.0 - Production
Start Date 30-AUG-2011 22:08:45
Uptime 0 days 1 hr. 35 min. 42 sec
Trace Level off
Security ON: Local OS Authentication
SNMP OFF
Listener Parameter File /u01/app/11.2.0/grid/network/admin/listener.ora
Listener Log File /u01/app/oracle/diag/tnslsnr/aaarac2/listener/alert/log.xml
Listening Endpoints Summary...
(DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(KEY=LISTENER)))
(DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=xyz-ch-aaadb-02-vip)(PORT=1521)))
(DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=xxx.xxx.0.2)(PORT=1521)))
Services Summary...
Service "+ASM" has 1 instance(s).
Instance "+ASM2", status READY, has 1 handler(s) for this service...
Service "aaadb" has 1 instance(s).
Instance "aaadb2", status READY, has 1 handler(s) for this service...
Service "aaadbXDB" has 1 instance(s).
Instance "aaadb2", status READY, has 1 handler(s) for this service...
The command completed successfully
Please suggest the steps to configure listener.ora and tnsnames.ora so that Oracle 11g R2 Grid/RAC uses
VIP failover instead of SCAN VIP failover.
Regards
Hitgon
Edited by: hitgon on Aug 31, 2011 12:14 AM

hitgon wrote:
Dear Experts and Gurus
Please suggest the steps to configure listener.ora and tnsnames.ora so that Oracle 11g R2 Grid/RAC uses
VIP failover instead of SCAN VIP failover.
Regards
Hitgon
Hi,
Have a read of http://download.oracle.com/docs/cd/E11882_01/network.112/e10836/advcfg.htm#NETAG348
Hope it helps
Cheers
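
As a starting point, a client entry that lists both node VIPs instead of the SCAN name might look like what the sketch below prints. It uses the VIP host names and service name from the post; the alias AAADB_VIP and the function name are made up, and LOAD_BALANCE and FAILOVER are the usual Oracle Net address-list parameters. Treat it as a sketch to compare against the linked documentation, not a tested configuration.

```shell
#!/bin/bash
# Hypothetical generator for a tnsnames.ora entry that uses the node VIPs.
# $1: entry alias, $2: service name, $3...: VIP host names (port 1521 assumed)
make_vip_tns_entry() {
  local entry=$1 service=$2 host
  shift 2
  printf '%s =\n  (DESCRIPTION =\n    (ADDRESS_LIST =\n' "$entry"
  printf '      (LOAD_BALANCE = ON)\n      (FAILOVER = ON)\n'
  for host in "$@"; do
    printf '      (ADDRESS = (PROTOCOL = TCP)(HOST = %s)(PORT = 1521))\n' "$host"
  done
  printf '    )\n    (CONNECT_DATA =\n      (SERVICE_NAME = %s)\n    )\n  )\n' "$service"
}

make_vip_tns_entry AAADB_VIP aaadb xyz-ch-aaadb-01-vip xyz-ch-aaadb-02-vip
```

With an entry like this the client works through the listed VIP addresses itself, so it does not depend on the rac-scan name resolving.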

Vip Failover and rolling patch

Hi,
For the purpose of implementing security features in our 2-node RAC DB (EE 10.2.0.5) [ID 1340831.1], I want to apply the patch for bug 12880299, which is rolling installable.
My question is: if I do the whole procedure on node 1 (creating wallets and self-signed certificates, stopping/starting instances on the node), the DB on node 2 should continue to accept incoming connection requests, right?
Put another way, I have run failover tests by crashing node 1, and the VIP failed over to node 2, but I have no idea how the VIP will behave if the DB, listener, CRS, etc. are cleanly stopped on node 1. Will it still move to node 2 automatically?
Also, is it necessary to install PSE 12880299 before implementing COST in this version of the DB?
Thank you for your useful inputs.
Regards

With Oracle 10g you get connect-time failover by listing the addresses of all nodes in the client's tnsnames.ora file. See the following link:
http://docs.oracle.com/cd/B19306_01/network.102/b14213/tnsnames.htm#i477297
If the client's tnsnames.ora file is set up correctly, the client tries the addresses one by one, so it does not matter if one server of the cluster is down. Of course, when you shut down a database instance, its existing connections are dropped, although you can have SELECT statements fail over with Transparent Application Failover (TAF).
So you can run a rolling update without shutting down the whole RAC database, as long as your clients' tnsnames.ora is configured correctly. The dropped connections still need to be handled at the application level.
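
The "tries the addresses one by one" behaviour can be pictured with a few lines of shell. This only illustrates the ordering and is not how Oracle Net is implemented; first_reachable and the probe callback are hypothetical names.

```shell
#!/bin/bash
# Illustration of connect-time failover ordering (not Oracle Net itself).
# $1: name of a probe command/function that returns 0 if an address answers
# $2...: addresses in the order they appear in the address list
# Prints the first address that answers; returns 1 if none does.
first_reachable() {
  local probe=$1 addr
  shift
  for addr in "$@"; do
    if "$probe" "$addr"; then
      printf '%s\n' "$addr"
      return 0
    fi
  done
  return 1
}

# Example (hypothetical probe): first_reachable my_probe node1-vip node2-vip
```

With node 1 down, the probe fails for node1-vip and the loop simply moves on to node2-vip, which is why new connections keep working during a rolling patch.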

ACE and VIP failover

Hi,
There is another one:-)
On the CSS I could define a critical service and take the VIP down if the critical service is down. The CSS also used something like VRRP to define the active VIP per CSS.
So the question is, can I do the same thing on two ACE modules? That is, one is active for the VIP, and if the service associated with that VIP fails, the active VIP moves to the other ACE module?
Can this be accomplished with contexts, FT VLAN, etc.? It is not the same as VRRP VIP failover on the CSS, but I could use it. Can I use the FT VLAN over L2 devices or an MPLS backbone, or do I have to use a dedicated link?

On ACE, failover is context based (not VIP based).
ACE can be configured to track and detect failures in the following items in the
Admin context and any user context:
• Gateways or hosts
• Interfaces
• Hot Standby Router Protocol (HSRP) groups
You need to configure a tracking priority for each tracking event.
from ACE Admin guide
"Suppose that on ACE 1 you configure the active FT group member
with a priority of 100 and on ACE 2 you configure the standby FT group member
with a priority of 70. Further, assume that you configure the FT group to track
three critical interfaces, each with a unit priority of 15. To trigger a switchover,
all three interfaces must fail so that the priority of the active member is less than
the priority of the standby member (100 - 45 = 55)."
Please read ACE Admin guide for more details
Syed
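
The arithmetic in the quoted example, as a quick sketch. The function name is made up; the numbers are the ones from the Admin guide quote, and the strict "less than" comparison follows the wording of that quote.

```shell
#!/bin/bash
# Effective priority of the active FT group member after tracked failures.
# $1: configured priority, $2: priority per tracked item, $3: failed items
effective_priority() {
  echo $(( $1 - $2 * $3 ))
}

active=100; standby=70; unit=15
for failed in 1 2 3; do
  p=$(effective_priority "$active" "$unit" "$failed")
  if [ "$p" -lt "$standby" ]; then
    echo "$failed failed: priority $p < $standby, switchover"
  else
    echo "$failed failed: priority $p >= $standby, no switchover"
  fi
done
```

With one or two interfaces down the active member stays at 85 or 70, which is not below 70, so only the third failure (55) triggers the switchover. Raising the unit priority is what makes a single failure enough.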

RHI failover question

Hi,
I have a pair of 6509's with Sup-720's and ACE modules. IOS is SXI3, ACE software is 3.0.
When I do a manual ACE failover from one chassis to another I see this behaviour (from debug ip routing)
switch1:
Jan 14 09:13:32.210 CET: RT(test-context): del 10.0.2.1/32 via 10.0.0.1, static metric [77/0]
switch2:
Jan 14 09:13:34.211 CET: RT(test-context): add 10.0.2.1/32 via 10.0.0.1, static metric [77/0]
My question is this - why two seconds to install the static route on the second switch? The VIP is there pretty much instantly but the static route is lagging behind by two seconds.
Is there any way to speed up the process or is it hardwired? (I've tried other versions of IOS and other versions of ACE code - no difference)
thanks,
Andrew.

sqlnet client failover is defined in either tnsnames.ora or a thin JDBC connection string on the client.

HANA High Availability System Vs Storage Vs VIP failover

Dear Experts,
Hope you are all doing great. I would like to seek your expertise on HANA high availability best practice. We have decided to use TDI for BW on HANA. The next big question for us is how to make it available at least 99.99% of the time.
I have gone through multiple documents, SDN forums, etc., but would like to hear how the experts are doing this in practice.
My view:
Virtual IP failover is a common HA practice that has been used to fail over CI/DB hosts in case of failure or maintenance. In this case, both nodes can be used to run app servers.
System replication: HANA based, requires a secondary standby node, which does not accept user requests but replicates the database from the primary using logs after an initial data snapshot, either synchronously or asynchronously. (Can be used for HA or DR, if the servers are in different data centers.)
Storage replication: HANA based, requires a secondary standby node, which replicates the SAN for HA/DR.
Could you please share the method you followed for HANA HA, and the pros, cons and challenges you have faced or are facing?
Thanks
Yoga

Thanks forbrich
Do you know any specific doc that describes the installation and configuration steps of 10g RAC on NAS? If possible, can you provide some link that I could use to perform this task?
I have done RAC installations on SAN without any problems, and it's something I'm fairly experienced with. With NAS I am not really comfortable, since I can't seem to find any documentation that describes a step-by-step installation procedure, or even guidelines.
Thank you for your input
Best Regards
Abbas

VIP Failover Testing

Hello,
I am new to Oracle RAC. We have a 2-node 11gR2 cluster and we are in the process of doing some failover testing. For database deployments we use an internal third-party tool called the deployer, which has tokens for DB configuration; the DBHost token in the deployer holds the hostname of either node 1 or node 2. This way we are not actually using the HA feature, because the connection goes to either node 1 or node 2, and if something happens to that node the deployment cannot connect to the database. In effect each node is treated as a single node instead of part of a cluster.
Instead of pointing the DBHost value at the physical hostname of a server in the cluster, I was wondering whether I can use the VIP address (i.e. ipaddress-VIP) of either node. After making that change I would like to do some manual failover testing, and this is where I am stuck. How do I go about the testing scenarios?
For example: if the DBHOST token value is the VIP for node 2 and connections are coming in to node 2 via the deployer, how do I proceed with the testing?
Should I bring down node 2? If I reboot it, how can I see whether or not the VIP failed over to the surviving node?
Any help/suggestions much appreciated.
Thanks!

What you describe is having a RAC cluster, possibly working possibly not, and no actual use of the value of the licensing you paid for.
My first advice to you is to read the docs and learn what RAC is, how it works, how to define and use services, and how a properly configured LISTENER.ORA and TNSNAMES.ORA should be constructed so you can compare that to what you have. With 11gR2 you should connect to the SCAN not the VIP.
Here's how I would test RAC:
1. Walk up to one of the servers while half the users are connected to each instance and do a SHUTDOWN ABORT. See what happens. Restart the killed node. Try it with the other node.
2. With everything running properly and load on both machines disconnect the switch that provides the cache fusion interconnect or pull one of the cables out of the server. When you reestablish the connection what happens?
3. Repeat #2 but this time with the connection to storage.
The above should get you started.
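
On the "how can I see if it failed over" part: on 11gR2, srvctl status vip -n <node> reports where a node's VIP is currently running. Here is a minimal sketch for extracting the node name, assuming the output contains a line of the form "VIP <name> is running on node: <node>"; the function name vip_running_node is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: reads `srvctl status vip -n <node>` output on stdin and
# prints the node the VIP is currently running on (nothing if not running).
# Assumes a status line like "VIP node2-vip is running on node: node1".
vip_running_node() {
  sed -n 's/^VIP .* is running on node: *\(.*\)$/\1/p'
}

# Example: srvctl status vip -n node2 | vip_running_node
```

If the command run for node 2 prints node1 while node 2 is down, the VIP has failed over; once node 2 is back it should print node2 again.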

WLC Failover Question

Hi All
Can anyone give me a definitive answer to this question please?
If you are using a pair of wireless LAN controllers, with primary and secondary controllers configured for the access points, and the primary controller fails, do the access points reboot before associating to the secondary controller? I can't see why they would need to, but the documentation suggests they do.
Additionally, has anyone significantly reduced the failover time? If so, what is the lowest practical failover time. I know the actual failover time can be reduced to 3 seconds but I think that is likely to cause other problems.
Thanks guys.
Regards
Roger

As far as I know, in this case the AP does not reboot; it only changes its LWAPP status to discovery and begins the discovery process.
You can check on the AP whether it restarted: once it has registered with the second WLC, go to the Wireless tab and select the affected AP. Normally the first tab shows, at the bottom right, the AP uptime and the AP association time; if the AP has rebooted, the uptime will be close to 00:00.
Normally I set the AP heartbeat timeout to 5 seconds. I don't know whether that is the best value, and my failover time is longer than yours. I don't know how critical your network is, but I prefer a higher heartbeat timeout to avoid unnecessary AP moves, which cost more time.
Best Regards.

FAILED_OVER for single sessions. No VIP failover

Hi all,
On a 10.2.0.3.0 2-node RAC (AIX 5L) I see from time to time a single session or a few sessions marked in gv$session with FAILED_OVER='YES', although no service failover occurred. Those sessions are still connected to the preferred instance; that is, in their tnsnames.ora they refer to a service running on, let's say, instance 2, and they ARE running on instance 2.
I am wondering if this can be due to some kind of connectivity issue, i.e. the client can no longer ping the VIP address (and vice versa) and the session is then marked as FAILED_OVER.
Has anyone seen something similar?
Thanks for any feedback,
Riccardo

VIP failover time

I have configured a critical service (ap-kal-pinglist) for redundant VIP failover. The default frequency, maxfailure and retry frequency are 5, 3 and 5, so I think the failover time should be 5+5*3*2=35s. But the virtual router's state changed from "master" to "backup" about 5 seconds after the connection was lost.
Can anyone help me understand this?

Service sw1-up-down connect to e2 interface,going down in 15sec
Service sw2-up-down connect to e3 interface,going down in 4sec?
JAN 14 02:38:41 5/1 3857 NETMAN-2: Generic:LINK DOWN for e2
JAN 14 02:39:57 5/1 3858 NETMAN-2: Generic:LINK DOWN for e3
JAN 14 02:39:57 5/1 3859 VRRP-0: VrrpTx: Failed on Ipv4FindInterface
JAN 14 02:40:11 5/1 3860 NETMAN-2: Enterprise:Service Transition:sw2-up-down -> down
JAN 14 02:40:11 5/1 3861 NETMAN-2: Enterprise:Service Transition:sw1-up-down -> down

Load balancing with failover questions

If we install 2 multi-role Exchange servers in our building and a 3rd multi-role server in our remote data center, what is the best way to load balance them? Do we need two load balancers or is there some way to span a single load balancer across the WAN?
What about using Windows NLB as an alternative to using round robin internally?
Can a load balancer keep our interoffice Exchange CAS traffic from leaving our LAN and only fail over to the 3rd CAS/mailbox server for internal users if both internal Exchange servers are offline?
We would also like remote users to "prefer" to use the data center CAS unless it is down. Right now we point our smart host directly to a CAS, but if we had a load balancer there, we could point the smart host to the IP of the load balancer and the load balancer could normally send it to the data center CAS if it's up and forward it to one of the servers in the office otherwise.
Is it possible to do all this without a very complicated and expensive solution?

Depends... what is the connectivity speed between the two sites? Is it good enough?
You can put a load balancer in front of all three CAS servers if your inter-site connectivity is very good.
What about using Windows NLB as an alternative to using round robin internally? WNLB and round robin are different. You can use DNS round robin if you want to, or WNLB for all three CAS servers, or a hardware load balancer for all three CAS servers.
Can a load balancer keep our interoffice Exchange CAS traffic from leaving our LAN and only fail over to the 3rd CAS/mailbox server for internal users if both internal Exchange servers are offline? If you use a load balancer then you don't need to fail them over one by one. Again, you can use DNS round robin so that requests go to each CAS server in turn, or use a hardware load balancer.
We would also like remote users to "prefer" to use the data center CAS unless it is down. Right now we point our smart host directly to a CAS, but if we had a load balancer there, we could point the smart host to the IP of the load balancer and the load balancer could normally send it to the data center CAS if it's up and forward it to one of the servers in the office otherwise.
Use the DNS server and point the A record to the load-balanced CAS in the primary data center instead of using an IP address or hosts file.
Hope that helps

Two Wireless controllers load balance and failover question

I have two 4404 controllers and each can take 100 APs. I have 140 APs in total. With the default settings (no master controller, no primary or secondary controller configured on the APs), each controller will take 70 APs, right?
Then I will need to configure each AP with an IP address, name, etc. My question is: when one controller fails, its 70 APs will try to associate with the other controller, right? However, only 30 of them can, because the other controller can manage at most 100 APs. In this case, will the APs that cannot join lose their static IP addresses and names? When the failed controller comes back online, will the 70 APs automatically go back to it and get their IP and name configuration back?
Thanks!

With the default settings you have no control over how many APs go to which WLC. It doesn't matter, because you will need to specify the primary and secondary anyway. You might as well stage all the APs you want on one WLC first with that WLC set to master; when you have finished that, set the other WLC to master and have the remaining APs join it, so that it becomes the primary for those APs.
A WLC can only support 100 APs, so, depending on what code you run, the 40 APs that are not able to join will just keep trying. If you run 5.2 (which I think is buggy) you can set a priority on the APs, so that the APs you set up with a higher priority will be able to join and the others will again sit there until the other WLC comes back up. Static IP addresses will not disappear just because the WLC doesn't accept any more connections. Once both WLCs are up, the APs will go back to their primary WLC as long as AP fallback is enabled and mobility is configured correctly.
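
A quick check of the numbers in this thread, as a sketch: with 140 APs in total and a limit of 100 APs per controller, 40 APs have nowhere to go while one controller is down, however the APs were split beforehand. The function name is made up for illustration.

```shell
#!/bin/bash
# Number of APs left without a controller when only one WLC survives.
# $1: total APs, $2: AP capacity of the surviving controller
stranded_aps() {
  local n=$(( $1 - $2 ))
  if [ "$n" -lt 0 ]; then
    n=0
  fi
  echo "$n"
}

stranded_aps 140 100
```

If the stranded APs matter, either keep the total at or below the capacity of a single controller or use the AP priority feature mentioned above to choose which ones get to join.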
