Transparent failover of CSS services

I need help understanding service failure and the action taken by the CSS to rebalance the traffic.
I am working on a project to replace a non-Cisco load balancer that works by receiving each packet and forwarding it to all of the web servers. The web servers are stateful at the application layer. This is an on-line brokerage application.
Currently, any server can be taken down hard and the user does not experience an outage since another server takes over and is already receiving the traffic (at least this is my understanding). I would like to try to replicate this functionality as closely as possible. Suspending a service is not desirable since a user can stay active for extended periods and the application team needs to fix the server/application and get it back in production quickly.
What is the recommended CSS implementation for this application?
Currently, SSL termination is performed by the server. We are also investigating the SSL module in the CSS.
Option 1: SSL on the server
- When one service fails, can the CSS initiate a TCP connection to another service and update the flow table without the client being affected?
Option 2: SSL Module in CSS
- Is there anything in the CSS that makes this easier or more difficult to implement?
Any help would be appreciated.
Thanks,
Rob

Hi Rob,
First of all, I do not know of any implementation that does stateful SSL failover on any hardware box. I have heard rumors that some servers can do this, but as I understand it this is quite hard to implement because the servers have to replicate all key pairs used in every SSL connection.
There is an easy way to take servers out of service: define a testing URL. As soon as this URL stops responding (e.g. because you renamed the testing file), the server is taken out of load balancing and its sessions are load balanced to the remaining servers.
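The test-URL idea can be sketched in a few lines of Python. This is only an illustration of the logic, not CSS code: the `Keepalive` class, the `eligible` helper and the threshold of three consecutive failures are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Keepalive:
    """Tracks one server's health from successive test-URL probes (illustrative)."""
    max_failures: int = 3   # assumed threshold before the server is marked down
    failures: int = 0
    alive: bool = True

    def record(self, http_status: int) -> bool:
        """Feed one probe result; return whether the server stays in rotation."""
        if http_status == 200:
            # The testing file is present again: reset and bring the server back.
            self.failures = 0
            self.alive = True
        else:
            # e.g. 404 after the testing file was renamed
            self.failures += 1
            if self.failures >= self.max_failures:
                self.alive = False
        return self.alive


def eligible(servers: dict) -> list:
    """Names of the servers that new sessions may still be balanced to."""
    return sorted(name for name, ka in servers.items() if ka.alive)
```

Renaming the testing file makes every probe return 404, so after the assumed three failed probes the server drops out of `eligible(...)`; restoring the file brings it back on the next successful probe.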
As for the SSL module, it will help reduce the CPU usage of your servers (only true if they do not have separate SSL hardware), but be aware that you might have to configure back-end SSL since this is a banking environment. I am also fairly sure that two SSL modules cannot do stateful failover, but this knowledge may be outdated; perhaps someone else can shed some light on this.
Hope that helped.
Kind Regards,
Joerg

Similar Messages

Load balancing and transparent failover in RAC 11g

Hi,
The only way to configure load balancing and transparent failover in RAC 11g R2 is to use the LOAD_BALANCE and FAILOVER clauses in the tnsnames.ora of the application server. Is that correct?

If the database is admin-managed, I suggest that, rather than playing with the remote_listener parameter, you set one service as preferred on one instance. For the other service, make it preferred on both nodes. The service with just one preferred instance won't be able to use load balancing, as only one instance is available to it.
HTH
Aman....

Transparent failover for forms

Kindly let me know: if Real Application Clusters is implemented in a Windows environment, does it give transparent failover if we use only Forms and Reports in our applications? That is, if an end user is working in a form and that node goes down, would he not notice, with control switching to another node so that he continues his work?

TAF implementation does not have much to do with RAC. It's a client-side feature, and it works equally well with RAC, Data Guard, single-instance and cold-failover databases, assuming the service is back when the reconnect is issued.
However, Transparent Application Failover is not transparent at all for transactions. Read-only activities are pretty much OK and, depending on whether you use the SESSION or SELECT based method, the user might not see the impact at all.
However, if the user session was in the middle of a transaction when the failure happened, the session will not reconnect until it issues a rollback. This is because an instance failure in the middle of a transaction causes Oracle to roll back the changes, and Oracle needs to make sure that the user (application) is aware that the earlier part of the transaction was rolled back. It does so by requiring the application to issue a rollback.
So the application DOES need to handle these situations. Imagine a transaction that spans several forms or user interactions and commits at the end. Say the user is already at the last step and most of the transaction has been applied to the database, but nothing is committed yet. The failure happens and Oracle rolls everything back to preserve transactional integrity. The user/application must notice this, to avoid the situation where the user presses the final "DONE" button and sends just a simple commit while losing all the changes. The way to handle it is to roll back and then either retry the transaction from the beginning (which might be a problem in Forms) or inform the user that the transaction failed and must be retried.
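The rollback-and-retry handling described above might look roughly like this in application code. It is a minimal sketch under assumptions: `TransactionLost` is a hypothetical stand-in for whatever error the database driver raises after a failover, and `conn` is any object with `commit()` and `rollback()` methods.

```python
class TransactionLost(Exception):
    """Hypothetical stand-in for the driver error raised after a failover."""


def run_with_retry(conn, steps, max_attempts=3):
    """Run all steps as one transaction; on failover, roll back and start over.

    Returns the attempt number that finally committed.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            for step in steps:
                step(conn)          # each step does part of the transaction
            conn.commit()
            return attempt
        except TransactionLost:
            # Acknowledge that the earlier work is gone before doing anything else.
            conn.rollback()
    # Out of retries: the caller should tell the user the transaction failed.
    raise RuntimeError("transaction failed after retries; ask the user to retry")
```

The point of the design is that the whole transaction is replayed from its first step, never just the final commit, since the earlier steps no longer exist in the database after the failure.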
Reports should be OK, assuming they only run selects, which can probably be safely assumed for Oracle Reports.
Forms might have an issue unless something is already implemented to handle this, but I can't recall (not that I'm a Forms expert).

Solaris SMF configuration for Oracle CSS service

Below is the manifest that creates the Oracle CSS service with Solaris SMF. It creates the SMF service with the instance name "default", whereas I need it to be "css":
svc:/application/oracle/css:default
I want to change the service instance name to
svc:/application/oracle/css:css
I don't know SMF or XML. Can somebody help me change the CSS service instance name?
<?xml version="1.0"?>
<!DOCTYPE service_bundle SYSTEM '/usr/share/lib/xml/dtd/service_bundle.dtd.1'>

<service_bundle type='manifest' name='oracle-database-css'>
<service
name='application/oracle/css'
type='service'
version='1'>
<create_default_instance enabled='false' />

<dependency
   name='multi-user'
   grouping='require_all'
   restart_on='error'
   type='service'>
   <service_fmri
    value='svc:/milestone/multi-user:default' />
</dependency>
<exec_method
   type='method'
   name='start'
   exec='$ORACLE_HOME/bin/ocssd'
   timeout_seconds='30' >
   <method_context>
    <method_credential
     user='oracle'
     group='dba'
     supp_groups=':default'
     privileges=':default'
     limit_privileges=':default' />
    <method_environment>
     <envvar name='ORACLE_HOME' value='/u01/app/oracle/product/10.2.0.1/asm_1' />
    </method_environment>
   </method_context>
</exec_method>
<exec_method
   type='method'
   name='stop'
   exec=':kill'
   timeout_seconds='60' >
   <method_context>
    <method_credential
     user='oracle'
     group='dba'
     supp_groups=':default'
     privileges=':default'
     limit_privileges=':default' />
    <method_environment>
     <envvar name='ORACLE_HOME' value='/u01/app/oracle/product/10.2.0.1/asm_1' />
    </method_environment>
   </method_context>
</exec_method>

<property_group name='startd' type='framework'>
   <propval name='duration' type='astring' value='child' />
   <propval name='modify_authorization' type='astring'
    value='solaris.smf.manage.oracle.database' />
</property_group>
<property_group name='general' type='framework'>
   <propval name='modify_authorization' type='astring'
    value='solaris.smf.manage.oracle.database' />
   <propval name='action_authorization' type='astring'
    value='solaris.smf.manage.oracle.database' />
</property_group>
<stability value='Unstable' />
<template>
   <common_name>
    <loctext xml:lang='C'>
     Oracle Cluster Synchronization Services (CSS)
    </loctext>
   </common_name>
   <documentation>
    <doc_link
     name='Intro to Oracle Clusterware and Oracle Real Application Clusters'
     uri='http://download-east.oracle.com/docs'/>
   </documentation>
</template>
</service>
</service_bundle>
Edited by: sachinonnet on Jan 12, 2010 2:01 AM

Hi,
I found the solution, as below.
I removed the following line from the service XML:
<create_default_instance enabled='false' />
Then I created another XML file for the instance, in which I specified the instance name:
<?xml version="1.0"?>
<!DOCTYPE service_bundle SYSTEM '/usr/share/lib/xml/dtd/service_bundle.dtd.1'>

<service_bundle type='manifest' name='oracle-database-instance'>
<service
name='application/oracle/database'
type='service'
version='1'>
<dependency name='oracle-asm' grouping='require_all' restart_on='none' type='service'>
    <service_fmri value='svc:/application/oracle/database:ASM' />
</dependency>

<instance name='PDB' enabled='false'>
   <method_context
    working_directory='/u01/app/oracle/product/10.2.0.1/db'
    project=':default'
    resource_pool=':default'>
    
    <method_credential
     user='oracle'
     group='dba'
     supp_groups=':default'
     privileges=':default'
     limit_privileges=':default'/>
    <method_environment>
     <envvar name='ORACLE_SID' value='pdb' />
     <envvar name='ORACLE_HOME' value='/u01/app/oracle/product/10.2.0.1/db' />
    </method_environment>
   </method_context>

<property_group name='options' type='application'>
   <stability value='External' />
   <propval name='instance_type' type='astring' value='RDBMS' />
   <propval name='modify_authorization' type='astring'
    value='solaris.smf.manage.oracle.database' />
</property_group>
</instance>
</service>
</service_bundle>
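
The same change (dropping create_default_instance and declaring a named instance) can also be scripted. The following Python is a rough sketch, not a tested SMF procedure: the function name is made up, and placing the instance element just before stability/template is my reading of the service_bundle DTD ordering, so validate the result with your own tools before importing it.

```python
import xml.etree.ElementTree as ET


def rename_default_instance(manifest_xml: str, instance_name: str) -> str:
    """Replace <create_default_instance/> with <instance name=.../> (sketch)."""
    root = ET.fromstring(manifest_xml)
    service = root.find("service")

    # Carry over the enabled flag from the default instance, if there is one.
    enabled = "false"
    default = service.find("create_default_instance")
    if default is not None:
        enabled = default.get("enabled", "false")
        service.remove(default)

    instance = ET.Element("instance", {"name": instance_name, "enabled": enabled})

    # Assumed DTD order: the instance goes after property groups and
    # before <stability>/<template>.
    children = list(service)
    index = len(children)
    for i, child in enumerate(children):
        if child.tag in ("stability", "template"):
            index = i
            break
    service.insert(index, instance)
    return ET.tostring(root, encoding="unicode")
```

Called as rename_default_instance(manifest_text, "css"), this should yield a manifest whose service would be addressed as svc:/application/oracle/css:css, assuming the rest of the manifest is unchanged.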

Windows Server 2012: SMB share with transparent failover

Have a nice day to all!
I have 2 HP Proliant DL380P Gen8 servers containing 8 x 1TB disks (with P420i HP Smart Array RAID Controller) in each server.
So, there are 2 arrays on every server:
1. 2 x 1TB in RAID1 (+1 disk for hot swap) - system volume
2. 5 x 1TB in RAID5 (+1 disk for hot swap) - data volume
And I installed Windows Server 2012 Standard on each server.
Then I created a two-node failover cluster.
And now I want to create an SMB share with transparent failover for the whole second (data) volume (about 3.3TB in the RAID5 array). How can I achieve this? I'm going to use it for Hyper-V VMs in the future, so the main requirement is that the VMs stay powered on and working even if one node of the SMB share cluster fails.
I wasn't able to see my volumes in Failover Cluster Manager. I tried creating iSCSI targets, storage pools, virtual disks, etc., but no luck. Failover Cluster Manager can't see the storage to create an SMB share on!
Can anyone advise me?
Thanks in advance!

The storage you want to export needs to be shared storage visible to your cluster (part of CSV). Then you configure failover file shares using content accessible from both cluster nodes. Refer to this manual for diagrams (ignore StarWind and logically replace it with the existing shared storage you used to create your cluster):
http://www.starwindsoftware.com/configuring-ha-file-server-on-windows-server-2012-for-smb-nas
Also see these manuals from MS on how to create failover file server:
http://technet.microsoft.com/en-us/library/cc753969.aspx
http://technet.microsoft.com/en-us/library/cc731844(v=ws.10).aspx
http://blogs.technet.com/b/askcore/archive/2010/08/19/working-with-file-shares-in-windows-server-2008-r2-failover-clusters.aspx
However, if you want to use existing storage located on both nodes, you're out of luck. Microsoft does not provide anything that presents local DAS to the cluster nodes. If you want to use existing DAS, you'll have to use a third-party product like StarWind, SteelEye or DataCore.
That gives you a configuration with only two nodes, no physical shared hardware (SAS JBOD, FC or iSCSI) and a vSAN. Refer to this manual:
http://www.starwindsoftware.com/ns-configuring-ha-file-server-for-smb-nas
Hope this helped :)
StarWind iSCSI SAN & NAS

Error 1067 while starting oracle css service in XP

Hi all,
I have installed Oracle10g in windows xp ...
I am unable to connect from my application. Looking at the services, oracleCSService has not started even though it is set to Automatic. When I start it manually, it shows an error:
"Could not start the oracleCSService on Local Computer
Error 1067: Process terminated Unexpectedly...."
I hope somebody can help me on this !!
Thanks

Web interface to read CSS service status

I'd like to write a web page to read the service summary. I'm using Content Switch SW Version 5.00. It looks like the XML interface can write but not read. Is there any method to read the CSS service status from an Active Server Page?

Yes. This URL will help configure SNMP:
http://www.cisco.com/univercd/cc/td/doc/product/webscale/css/css_500/bsccfggd/snmp.htm
Also, if you are running SNMP on your CSS you should be mindful of the vulnerability documented in bug CSCdw64236. This is fixed in 5.0 build 37s downloadable from the below URL:
http://www.cisco.com/cgi-bin/tablebuild.pl/webns
Cheers,
Perry.

CSS Service not coming up after reboot

I installed Oracle 10.2.0.2 (x86_64) on SLES 9 (x86_64), non-RAC (single instance with ASM), and the CSS service does not come up after reboot. If I run localconfig reset it starts, but it fails again after every reboot. Here is the data for debugging:
This is the tail of my $ORACLE_HOME/log/servername/alertservername.log
2008-03-06 22:16:32.587
[client(13472)]CRS-1006:The OCR location /u01/app/epsora/product/10.2.0/cdata/localhost/local.ocr is inaccessible. Details in /u01/app/epsora/product/10.2.0/log/pune-srv-eps-02/client/clscfg_13472.log.
2008-03-06 22:16:32.588
[client(13472)]CRS-1006:The OCR location /u01/app/epsora/product/10.2.0/cdata/localhost/local.ocr is inaccessible. Details in /u01/app/epsora/product/10.2.0/log/pune-srv-eps-02/client/clscfg_13472.log.
2008-03-06 22:16:33.217
[client(13472)]CRS-1001:The OCR was formatted using version 2.
2008-03-06 22:17:54.615
[cssd(13670)]CRS-1601:CSSD Reconfiguration complete. Active nodes are pune-srv-eps-02 .
/u01/app/epsora/product/10.2.0/cdata/localhost/local.ocr is accessible to the oracle user:
$ls -l /u01/app/epsora/product/10.2.0/cdata/localhost/local.ocr
-rw-r--r-- 1 epsora epsdba 364544 2008-03-06 22:16 /u01/app/epsora/product/10.2.0/cdata/localhost/local.ocr
This is the log file mentioned in the alert log above:
Oracle Database 10g CRS Release 10.2.0.1.0 Production Copyright 1996, 2005 Oracle. All rights reserved.
2008-03-06 22:16:32.579: [ OCROSD][2546869600]utread:3: problem reading buffer 5db000 buflen 512 retval 0 phy_offset 102400 retry 0
2008-03-06 22:16:32.580: [ OCROSD][2546869600]utread:4: problem reading the buffer errno 2 errstring No such file or directory
2008-03-06 22:16:32.580: [ OCROSD][2546869600]utread:3: problem reading buffer 5dd000 buflen 4096 retval 0 phy_offset 102400 retry 0
2008-03-06 22:16:32.580: [ OCROSD][2546869600]utread:4: problem reading the buffer errno 2 errstring No such file or directory
2008-03-06 22:16:32.580: [ OCRRAW][2546869600]propriogid:1: INVALID FORMAT
2008-03-06 22:16:32.580: [ OCROSD][2546869600]utread:3: problem reading buffer 5dd000 buflen 4096 retval 0 phy_offset 102400 retry 0
2008-03-06 22:16:32.580: [ OCROSD][2546869600]utread:4: problem reading the buffer errno 2 errstring No such file or directory
2008-03-06 22:16:32.588: [ OCRRAW][2546869600]ibctx:1:ERROR: INVALID FORMAT
2008-03-06 22:16:32.588: [ OCRRAW][2546869600]proprinit:problem reading the bootblock or superbloc 22
2008-03-06 22:16:32.588: [ default][2546869600]a_init:7!: Backend init unsuccessful : [22]
2008-03-06 22:16:32.588: [ OCROSD][2546869600]utread:3: problem reading buffer 5dd000 buflen 512 retval 0 phy_offset 102400 retry 0
2008-03-06 22:16:32.588: [ OCROSD][2546869600]utread:4: problem reading the buffer errno 2 errstring No such file or directory
2008-03-06 22:16:32.588: [ OCROSD][2546869600]utread:3: problem reading buffer 5dd000 buflen 4096 retval 0 phy_offset 102400 retry 0
2008-03-06 22:16:32.588: [ OCROSD][2546869600]utread:4: problem reading the buffer errno 2 errstring No such file or directory
2008-03-06 22:16:32.588: [ OCRRAW][2546869600]propriogid:1: INVALID FORMAT
2008-03-06 22:16:32.588: [ OCROSD][2546869600]utread:3: problem reading buffer 5dd000 buflen 4096 retval 0 phy_offset 102400 retry 0
2008-03-06 22:16:32.588: [ OCROSD][2546869600]utread:4: problem reading the buffer errno 2 errstring No such file or directory
2008-03-06 22:16:32.588: [ OCRRAW][2546869600]ibctx:1:ERROR: INVALID FORMAT
2008-03-06 22:16:32.588: [ OCRRAW][2546869600]proprinit:problem reading the bootblock or superbloc 22
2008-03-06 22:16:32.588: [ OCROSD][2546869600]utread:3: problem reading buffer 5dd000 buflen 512 retval 0 phy_offset 102400 retry 0
2008-03-06 22:16:32.588: [ OCROSD][2546869600]utread:4: problem reading the buffer errno 2 errstring No such file or directory
2008-03-06 22:16:32.588: [ OCROSD][2546869600]utread:3: problem reading buffer 5dd000 buflen 4096 retval 0 phy_offset 102400 retry 0
2008-03-06 22:16:32.588: [ OCROSD][2546869600]utread:4: problem reading the buffer errno 2 errstring No such file or directory
2008-03-06 22:16:32.588: [ OCRRAW][2546869600]propriogid:1: INVALID FORMAT
2008-03-06 22:16:32.676: [ OCRRAW][2546869600]propriowv: Vote information on disk 0 [u01/app/epsora/product/10.2.0/cdata/localhost/local.ocr] is adjusted from [0/0] to [2/2]
2008-03-06 22:16:33.217: [ OCRRAW][2546869600]propriniconfig:No 92 configuration
2008-03-06 22:16:33.217: [ OCRAPI][2546869600]a_init:6a: Backend init successful
This is tail of my ocssd.log
[    CSSD]2008-03-06 22:17:55.955 [98309] >TRACE: clssgmClientConnectMsg: Connect from con(0x7a1b10) proc(0x7a4e70) pid(13754) proto(10:2:1:1)
[    CSSD]2008-03-06 22:17:56.455 [98309] >TRACE: clssgmClientConnectMsg: Connect from con(0x7a3740) proc(0x7a4e70) pid(13773) proto(10:2:1:1)
[    CSSD]2008-03-06 22:17:56.460 [98309] >TRACE: clssgmClientConnectMsg: Connect from con(0x7a5b10) proc(0x7a7f50) pid(13772) proto(10:2:1:1)
[    CSSD]2008-03-06 22:18:08.380 [98309] >TRACE: clssgmClientConnectMsg: Connect from con(0x7a9770) proc(0x7a1f50) pid(13839) proto(10:2:1:1)
[    CSSD]2008-03-06 22:18:08.383 [98309] >TRACE: clssgmClientConnectMsg: Connect from con(0x7a9770) proc(0x7a1f50) pid(13839) proto(10:2:1:1)
[    CSSD]2008-03-06 22:18:11.556 [98309] >TRACE: clssgmClientConnectMsg: Connect from con(0x7aa3c0) proc(0x7ac7d0) pid(13845) proto(10:2:1:1)
[    CSSD]2008-03-06 22:18:17.873 [98309] >TRACE: clssgmClientConnectMsg: Connect from con(0x7ad070) proc(0x7af480) pid(13828) proto(10:2:1:1)
[    CSSD]2008-03-06 22:44:22.700 [98309] >TRACE: clssgmClientConnectMsg: Connect from con(0x7a9850) proc(0x7a1e50) pid(14060) proto(10:2:1:1)
[    CSSD]2008-03-06 22:44:22.703 [98309] >TRACE: clssgmClientConnectMsg: Connect from con(0x7a22b0) proc(0x7a9850) pid(14060) proto(10:2:1:1)
[    CSSD]2008-03-06 22:44:25.875 [98309] >TRACE: clssgmClientConnectMsg: Connect from con(0x7a32b0) proc(0x7a1800) pid(14077) proto(10:2:1:1)
And this is my css.log
Oracle Database 10g CRS Release 10.2.0.1.0 Production Copyright 1996, 2005 Oracle. All rights reserved.
2008-03-06 22:17:47.404: [ CSSCLNT][2549892096]clsssInitNative: connect failed, rc 9
I checked on MetaLink and found that this error can occur due to old Oracle references or changes in the uid of users, but this is a fresh install; no Oracle was there before, and we didn't change the OS user's uid either.
Please let me know what else to check, or is an SR the only option?
Thanks
Daljit Singh

Do you mean CRS rather than CSS?
I note that your database is patched to 10.2.0.2 but your CRS is still at 10.2.0.1. Bring both to 10.2.0.3 and if that doesn't resolve the issue open an SR at metalink.
Do not install the January CPU ... it appears to have issues with RAC that we are currently investigating.

How long do I have to wait for Stateful Failover on the CSS 11154?

Does anybody know when the next WebNS release is expected to implement TCP stateful failover on the CSS in a VIP redundancy configuration?
At the beginning of the year, the product manager said it would be available in WebNS V6.
For information: Alteon WebOS v8 released this feature more than a year ago.
What is Cisco doing?

Is Adaptive Session Redundancy what you are looking for?
http://www.cisco.com/univercd/cc/td/doc/product/webscale/css/css_510/advcfggd/vipredun.htm#xtocid24

CSS service Down even though responding to icmp probe

Hi,
A server responds to pings, but when it is configured as a service on the CSS, the service stays in the "Down" state. I have tried adding services on the CSS for other valid destinations, and none of them become Alive.
CSS01#
ping 10.10.101.98
Pinging 10.10.101.98 1 time(s)...
Working(-) 1/1
100% Success.
CSS01#
service test_service
ip address 10.10.101.98
keepalive type icmp
active
show keepalive AUTO_test_service
Name: AUTO_test_service Index: 66 State: Down
Description: Auto generated for service test_service
Address: 10.10.101.98 Port: Any
Type:            ICMP
Frequency:        5
Max Failures:     3
Retry Frequency: 5
Dependent Services:
    test_service
Is there anything to check on the CSS that might indicate what the issue is? system-resource gives no indication that memory or CPU is exhausted.
Regards

I am currently having this issue myself, which is very annoying. No new services will show "Alive", no matter which keepalive is configured.
I am currently thinking a reload will "fix all", but I would rather not reload my production CSS. Is there a reason/explanation and a solution?
CSS11503(debug)# show uptime ${output}
Uptime:
CSS5-SCM-2GE G0        : 1281 days 23:59:08
CSS5-IOM-2GE E0        : 1281 days 23:59:05
CSS5-SSL-K9 G0         : 1281 days 23:59:05
CSS503-SM-INT          : 1281 days 23:59:05
CSS11503(debug)# echo "show disk" ${output}
show disk
CSS11503(debug)# show disk ${output}
PCMCIA Slot: 0
          total # of clusters: 62544
            bytes per cluster: 16384
                free clusters: 57668
                 bad clusters: 0
                   free bytes: 944832512 (944 MB)
    max contiguous free bytes: 876724224 (876 MB)
                        files: 707
                      folders: 40
         total bytes in files: 71821754
                  lost chains: 0
   total bytes in lost chains: 0
CSS11503(debug)# echo "show running-config" ${output}
show running-config
CSS11503(debug)# show running-config ${output}
!Generated on 03/18/2010 14:24:02
!Active version: sg0810106

ASM : Css service is starting

Hi
I'm trying to start ASM on a server:
# ./localconfig reset
Successfully accumulated necessary OCR keys.
Creating OCR keys for user 'root', privgrp 'system'..
Operation successful.
Configuration for local CSS has been initialized
Adding to inittab
Startup will be queued to init within 30 seconds.
Checking the status of new Oracle init process...
Expecting the CRS daemons to be up within 600 seconds.
Giving up: Oracle CSS stack appears NOT to be running.
Oracle CSS service would not start as installed
Automatic Storage Management(ASM) cannot be used until Oracle CSS service is started
I'm new to ASM.

Hi,
Unfortunately, your question is missing a lot of information that is crucial to helping you:
a.) What version of Oracle software have you installed?
b.) If 11.2 have you installed Grid Infrastructure?
c.) Do you run a cluster or stand alone ASM instance / have you installed Oracle Clusterware/GI for single instance or not?
d.) What O/S do you run?
e.) Output of the Error stack of the css logfile?
Regards
Sebastian

Failover - How to achieve a transparent failover using SQLPlus

AIX 5.3 Oracle Clusterware 10.2.0.4.0 Oracle Enterprise Edition 10.2.0.4.0
This is the behavior I see from an Oracle Client session which is to be expected if I read RAC: Frequently Asked Questions [ID 220970.1] see below
(1) SQLPlus session connected to NodeA
(2) NodeA - Clusterware services stopped
(3) NodeA-vip has failed over to Node B
(4) SQLPlus session receives an error
(5) SQLPlus establish new connection to NodeA-vip
My question is how is a transparent SQLPLus session failover achieved as illustrated in [ID 339107.1] see below
*** Dedicated Connections to a Migrated VIP Can Lose their Connection after the VIP is Switched Back [ID 339107.1] ***
SQL> select instance_name from v$instance;
INSTANCE_NAME
rac11g1
$ crsctl stop crs
SQL> /
INSTANCE_NAME
rac11g2
RAC: Frequently Asked Questions [ID 220970.1]
*** Why do we have a Virtual IP (VIP) in Oracle RAC 10g or 11g? Why does it just return a dead connection when its primary node fails? ***
The goal is application availability.
When a node fails, the VIP associated with it is automatically failed over to some other node. When this occurs, the following things happen.
(1) VIP detects public network failure which generates a FAN event.
(2) the new node re-arps the world indicating a new MAC address for the IP.
(3) connected clients subscribing to FAN immediately receive ORA-3113 error or equivalent. Those not subscribing to FAN will eventually time out.
(4) New connection requests rapidly traverse the tnsnames.ora address list skipping over the dead nodes, instead of having to wait on TCP-IP timeouts
Without using VIPs or FAN, clients connected to a node that died will often wait for a TCP timeout period (which can be up to 10 min) before getting an error.
As a result, you don't really have a good HA solution without using VIPs and FAN. The easiest way to use FAN is to use an integrated client with Fast Connection Failover (FCF) such as JDBC, OCI, or ODP.NET.
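To make step (4) concrete, here is a small Python sketch of what walking an address list with failover and load balancing amounts to. It only models the idea and is not how Oracle Net is implemented; `try_connect` is a hypothetical callback that raises ConnectionError for an unreachable address.

```python
import random


def connect_via_address_list(addresses, try_connect, load_balance=True, failover=True):
    """Try addresses until one accepts the connection (illustrative model only)."""
    order = list(addresses)
    if load_balance:
        random.shuffle(order)      # spread new connections across the nodes
    if not failover:
        order = order[:1]          # without failover only one address is tried
    errors = []
    for address in order:
        try:
            return address, try_connect(address)
        except ConnectionError as exc:
            # With a VIP the refusal comes back at once instead of after a TCP timeout.
            errors.append((address, exc))
    raise ConnectionError(f"no address reachable: {errors}")
```

The benefit of the VIP shows up inside the loop: a dead node's address fails immediately, so the next entry in the list is tried without waiting.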
*** What do the VIP resources do once they detect a node has failed/gone down? Are the VIPs automatically acquired, and published, or is manual intervention required? Are VIPs mandatory? ***
With Oracle RAC 10g or higher, each node requires a VIP. With Oracle RAC 11g Release 2, 3 additional SCAN vips are required for the cluster. When a node fails, the VIP associated with the failed node is automatically failed over to one of the other nodes in the cluster. When this occurs, two things happen:
1. The new node re-arps the world indicating a new MAC address for this IP address. For directly connected clients, this usually causes them to see errors on their connections to the old address;
2. Subsequent packets sent to the VIP go to the new node, which will send error RST packets back to the clients. This results in the clients getting errors immediately.
In the case of existing SQL connections, errors will typically be in the form of ORA-3113 errors, while a new connection using an address list will select the next entry in the list. Without using VIPs, clients connected to a node that died will often wait for a TCP/IP timeout period before getting an error. This can be as long as 10 minutes or more. As a result, you don't really have a good HA solution without using VIPs.
With Oracle RAC 11g Release 2, you can delegate the management of the VIPs to the cluster. If you do this, the Grid Naming Service (part of the Oracle Clusterware) will automatically allocate and manage all VIPs in the cluster. This requires a DHCP service on the public network.
Thank you
Steve

Answer = Follow MetaLink 377100.1

WWW service is not able to start via Microsoft Failover Cluster generic service resource

Environment
Cluster Nodes = two
Cluster Nodes OS = Windows 2008R2
Application = IIS
Query
I created generic service resources for many Windows services under Microsoft Failover Cluster, and they fail over successfully. But when I create a generic service resource for WWW, the WWW service cannot be brought online via Microsoft Failover Cluster. It gets stuck in Online Pending.
I have noticed two things.
1.) If the WWW service is set to Manual and is started on the passive node, and I manually restart the active node, then the WWW service successfully switches over to the standby/passive node. But if the WWW service is set to Manual and is not started on the standby/passive node, then the WWW service does not fail over.
2.) If I kill the WWW service manually (as a test case) on the active node with this command (taskkill /f /pid XXXX), the WWW service fails and does not fail over to the standby/passive node.
Any comment will be appreciated. Thanks. Zahid Haseeb.

The problem is resolved. I feel that it will be helpful to other people who may face the same problem, so I wrote a blog post on "How to configure IIS Web Site and Application Pool in Microsoft Failover Cluster" and described almost everything I did. Kindly see the resolution under the section "Configure some changes in Cluster Configuration" at the link below:
http://zahidhaseeb.wordpress.com/2014/02/12/how-to-configure-iis-web-site-and-application-pool-in-microsoft-failover-cluster/
Any comment will be appreciated. Thanks. Zahid Haseeb.

CSS Service Down - Why?

I am setting up a basic service on my CSS but it is showing as DOWN even though I can reach it fine directly and from the CSS.
I'd appreciate any hints on why. I am running sg0810106 (08.10.1.06) on a CSS 11501. Other services on the same backend subnet are working fine.
Here is the service definition and status:
CSS11501# sh run service vmi-82-100-8000
!************************** SERVICE **************************
service vmi-82-100-8000
ip address 10.10.82.100
protocol tcp
port 8000
keepalive type tcp
keepalive port 8000
active
CSS11501# sh service vmi-82-100-8000
Name: vmi-82-100-8000   Index: 37
Type: Local            State: Down
Rule ( 10.10.82.100 TCP 8000 )
Session Redundancy: Disabled
Redirect Domain:
Redirect String:
Keepalive: (TCP-8000   5   3   5 )
Keepalive Encryption:      Disabled
Last Clearing of Stats Counters: 10/21/2010 07:24:59
Mtu:                       1500        State Transitions:            0
Total Local Connections:   0           Total Backup Connections:     0
Current Local Connections: 0           Current Backup Connections:   0
Total Connections:         0           Max Connections:              65534
Total Reused Conns:        0
Weight:                    1           Load:                         255
Weight Reporting:          None
CSS11501#
Debugging in the CSS seems to indicate the service should be up:
CSS11501(debug)# icp probe service vmi-82-100-8000
Probing 10.10.82.100:8000(-) KeepAlive probe..
IP Address:       10.10.82.100
Port:             8000
URL:              /
HTTP Version:     1.1
Server Model:     Apache/2.2.9 (Unix) mod_ssl/2.2.9 OpenSSL/0.9.8h
Server Date:      Thu, 21 Oct 2010 16:11:37 GMT
HEAD Response:    404 Not Found
HEAD Support:     Yes
Persistence:      Yes
Keep-Alive:       Yes
Request Depth:    100
TBR:              5
Connect Time:     1 ms
Rqst/Rsp Time:    4 ms
Pipeline:         No
SSL:              No
CSS11501(debug)# icp probe host 10.10.82.100
Probing 10.10.82.100:80(-) KeepAlive probe..
IP Address:       10.10.82.100
Port:             80
URL:              /
HTTP Version:     1.1
Server Model:     Apache/2.2.3 (Unix) mod_ssl/2.2.3 OpenSSL/0.9.8h
Server Date:      Thu, 21 Oct 2010 16:12:34 GMT
HEAD Response:    403 Forbidden
HEAD Support:     Yes
Persistence:      Yes
Keep-Alive:       Yes
Request Depth:    100
TBR:              5
Connect Time:     1 ms
Rqst/Rsp Time:    13 ms
Pipeline:         No
SSL:              No

Hi,
When you say "I can reach it fine directly and from the CSS", I assume you mean it is reachable via ICMP, correct?
From your PC, when you do C:> telnet 10.10.82.100 8000, what do you get?
ICP probe is not a good test here, as it tries to reach the site with an HTTP HEAD and gets an error response back, as shown in the output:
HEAD Response:    404 Not Found
HEAD Response:    403 Forbidden
Is this an HTTP service that you're trying to probe?
Are you able to connect using the command:
CSS11503(config) socket connect host 10.10.82.100 port 8000 tcp
Regards
Pablo
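
If telnet is not handy, the same plain TCP check can be done with a few lines of Python. This is just a sketch of a connect test using the standard socket module; the function name and the 3-second timeout are arbitrary choices, and it says nothing about how the CSS keepalive itself is implemented.

```python
import socket


def tcp_port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    try:
        # Completing the three-way handshake is all a TCP keepalive needs.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False
```

For the service in this thread the call would be tcp_port_open("10.10.82.100", 8000), run from a host on the same subnet as the CSS if possible, so that firewalls in between do not skew the result.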

CSS & Service Port Command

I am trying to fix a problem in our network that I believe is caused by ephemeral ports originating on the CSS (tcp 6000-6063). My question is as follows: what exactly does the "(config-service)port" command do? I am trying to avoid using the above-mentioned ports as destination port numbers (I think?!). Would the following command accomplish this?
(config-service)port 6064 range 65535
If you have any questions or need further clarification just let me know. Thanks for the help guys.
bc

Gilles,
I'm attaching a diagram and config file to help explain what is happening.
In step 5 of the diagram, when the web servers respond to the request for content, is where we encounter the issue. When the web boxes respond to the CSS with content, they respond with incrementing source ports. These ports range from approx. 2000-65500. I think the CSS doesn't really care what the actual source port of the internet user is and assigns a source port from the incrementing range I described above. When the Check Point FW sees ports in the 6000-6063 range, it recognizes them as X11 traffic and denies them because this is considered a security risk (or at least that's what I assume). When these packets are denied, we lose access to those web servers for about 2 minutes until the ports cycle out of the X11 range. I've also attached a screenshot of some of the logs so that you can see the incrementing port numbers.
I have two possible solutions for this problem. The first is to add an extra rule in the FW, and the second is to somehow exclude the 6000-6063 range in the CSS. Let me know if you have any further questions. Thanks.
bc
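
To illustrate what the second option (excluding 6000-6063) would mean, here is a tiny Python model of an incrementing port allocator that skips the X11 range. It is purely illustrative: the function, the 2000-65500 bounds taken from the description above, and the skip logic are assumptions, and nothing here implies the CSS offers such a setting.

```python
X11_RANGE = range(6000, 6064)   # ports 6000-6063, treated as X11 by the firewall


def next_source_port(current, low=2000, high=65500, excluded=X11_RANGE):
    """Return the next port after `current`, wrapping around and skipping `excluded`."""
    port = current
    while True:
        # Increment, wrapping back to the bottom of the pool at the top.
        port = port + 1 if port < high else low
        if port not in excluded:
            return port
```

With this model the allocator jumps straight from 5999 to 6064, so the firewall never sees a port in the X11 range and the roughly two-minute outage window disappears.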
