Automatic Site Fail Over

Hi,
Here is my setup:
Two Mailbox servers in a cluster with one DAG in Toronto, and one CAS server in Toronto.
My question is: if I set up another Mailbox/CAS server in Vancouver, can I achieve automatic site failover to Vancouver if Toronto goes down?
Is it recommended to do so, or what would be a better site failover strategy?

In simple words: automatic failover will not happen if Toronto goes down.
First, let us read about Exchange 2010 DAG misconceptions here.
For a multi-site DAG you need to enable DAC (Datacenter Activation Coordination) mode. With DAC enabled, a starting DAG member must contact the other DAG members and get permission before it can mount databases. Since Toronto is down (or there is a network problem), the remote site would not be able to mount databases on its own.
Failover within the local site is always automatic (preferred), and failover to the remote site is manual (again by design), because Exchange has no knowledge of your infrastructure and cannot make a smart decision about whether it is safe to mount databases there.
Exchange does support activating databases in the remote site, but it requires careful steps for failover and failback, as explained below:
http://smtpport25.wordpress.com/2010/12/10/exchange-2010-dag-local-and-site-drfailover-and-fail-back/
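For reference, a minimal Exchange Management Shell sketch (the DAG name "DAG1" and the site names are placeholders for your Toronto/Vancouver setup); DAC mode is a one-time setting, while the datacenter switchover itself remains a manual, admin-driven step:
# Enable Datacenter Activation Coordination mode on the DAG
Set-DatabaseAvailabilityGroup -Identity "DAG1" -DatacenterActivationMode DagOnly
# During an actual datacenter switchover (primary site down), activation is still manual:
Stop-DatabaseAvailabilityGroup -Identity "DAG1" -ActiveDirectorySite "Toronto" -ConfigurationOnly
Restore-DatabaseAvailabilityGroup -Identity "DAG1" -ActiveDirectorySite "Vancouver"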
Hope this helps.
Regards,
Sarvesh Goel
MCP, MCITP, MCTS, MCSA - Directory Services and Microsoft Exchange
OK, but what is the reason automatic site failover will not occur?
The reason I'm asking is that another admin believes that if Vancouver is added to the DAG, then the databases should mount there automatically when the other nodes fail.

Similar Messages

  • Automatic windows agent fail over in SCOM 2012 / 2012 R2

    Hi All,
    I have a question about SCOM 2012 / 2012 R2 Windows agent failover.
    For example, I have 400 Windows agents and 2 management servers (MS) in my environment, so 200 Windows agents are managed by MS1 and the other 200 are managed by MS2.
    What I want to know is: if either MS (MS1 or MS2) shuts down, restarts, or otherwise fails, do the agents automatically fail over to the other MS? I have not configured anything for this, as I did not know how.
    So, is there any configuration to be done for this after both management servers are deployed and before discovering the agents, or does the surviving MS automatically take over the other 200 agents when it sees that its peer is down?
    Gautam.75801

    This happens automagically for Windows agents within a domain that are assigned to a Management Server. For other scenarios, such as Gateways, the Gateways should be configured for failover between Management Servers, and the agents attached to those Gateways should be configured to fail over between the Gateways. Cross-platform agents report to a resource pool, so that happens automagically as well.
    To confirm your scenario, simply run the Get-SCOMAgent cmdlet.
    To test, you can do the following, assuming the first agent in the array isn't a Management Server/Gateway :)
    # Inspect an agent's current primary and failover management servers
    $SCOMAgents = Get-SCOMAgent
    $SCOMAgents[0].PrimaryManagementServerName
    $SCOMAgents[0].GetFailoverManagementServers()
    Supporting article:
    http://blogs.technet.com/b/jimmyharper/archive/2010/07/23/powershell-commands-to-configure-gateway-server-agent-failover.aspx
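    For reference, a minimal sketch of what the linked article automates (the server and agent names below are placeholders); Set-SCOMParentManagementServer assigns an explicit primary and failover management server to an agent:
    Import-Module OperationsManager
    $primary  = Get-SCOMManagementServer -Name "MS1.contoso.com"
    $failover = Get-SCOMManagementServer -Name "MS2.contoso.com"
    $agent    = Get-SCOMAgent -DNSHostName "server01.contoso.com"
    # Set the explicit primary management server, then the failover server, for this agent
    Set-SCOMParentManagementServer -Agent $agent -PrimaryServer $primary
    Set-SCOMParentManagementServer -Agent $agent -FailoverServer $failover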

  • Firefox Proxy Fail-over is not working correctly

    I am in a corporate environment where we must use a complex auto-proxy, configured as an automatic proxy configuration of http://proxyconf/proxy.pac. I am seeing an intermittent failure with Firefox 3.6.13, where the same site will load (after a delay) in IE (e.g. it works for half an hour, then fails for a while, etc.).
    By using Wireshark and tracing the packets, I have identified that a proxy server is intermittently failing, and Firefox is failing to try the second proxy. The auto proxy rule that is being invoked is:
    if (!isResolvable(host)) return "PROXY 172.16.39.201:8080; PROXY 10.241.32.28:8080";
    The problem is that Firefox is never failing over - it tries the 172 address 6 times in a row, then gives up and displays the "The proxy server is refusing connections" "Firefox is configured to use a proxy server that is refusing connections." "* Check the proxy settings to make sure that they are correct." "* Contact your network administrator to make sure the proxy server is working." error message. It continues with this behavior regardless of how many attempts, reloads, restarts are tried.
    IE on the other hand will try and fail with the 172 address, and then start using the 10. address (which works correctly). Several other applications also work correctly, such as IRC clients.
    Obviously the corporate proxy that is failing must be fixed; however, Firefox is failing to utilize the 2nd proxy after the first one fails.
    Seems like a bug.
    Is there some easy way for me to replace the proxy file with my own file? E.g. replace http://proxyconf/proxy.pac with file://c:\..., or use some add-on?
    It must be an autoproxy script, as there is no single proxy that I can use for all addresses.

    You can correct this issue by forcing the file blocklist.xml to update, or wait until Firefox updates the file.
    That update will remove the severity="0" flags in the file that cause the problem.
    See:
    * [/questions/832793?page=2#answer-198407]
    * http://forums.mozillazine.org/viewtopic.php?p=10899869#p10899869
    * [https://bugzilla.mozilla.org/show_bug.cgi?id=663722 Bug 663722] - The blocklist output is including severity="0" where it shouldn't be

  • Failed over to an Async Replica and now the previous primary replica (now secondary) is in NOT SYNC state

    Hello All,
    Here is my situation:
    3 nodes in an AG configuration, and it's a multi-site cluster. Synchronous commit between 2 nodes in one DC and asynchronous commit to a node in the DR DC.
    The AG is failed over to the async replica at the DR site, all the databases come up fine, and the application can also connect using the listener.
    When I checked the state of the secondary databases, they were in NOT SYNC mode; data movement was suspended automatically. I can resume data movement to fix the problem, but I was curious why they end up in NOT SYNC mode?
    Thanks in advance.
    Thank you,
    Anup
    Anup | Database Consultant | Blog: www.sqlsailor.com | Twitter: @AnupWarrier

    Hello Anup,
    The reason this happens is the forced failover that has to be used when moving to an async replica. It causes all other replicas to become suspended, because it is never known whether data loss has occurred.
    It might not make sense right now, but think about a situation where the databases are not synchronized and failover is forced (the mechanism has to work in all situations). There may be a good bit of data on the old primary replica that has not yet made it (or has only partially made it) to the async secondary. It wouldn't make sense to negotiate the old primary back down (after all, the new primary is the async one) and undo valid transactions. Suspending also allows a database snapshot or another method to be used on the old sync primary, which could be used for DR purposes to get those valid transactions and data out.
    BOL Doc:
    http://msdn.microsoft.com/en-us/library/hh213151.aspx#ForcedFailover
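    If it helps, a minimal sketch of resuming data movement after such a forced failover (the instance, availability group, and database names are placeholders; the T-SQL equivalent is ALTER DATABASE ... SET HADR RESUME):
    Import-Module SQLPS -DisableNameChecking
    # Resume the suspended database on the old primary (now secondary) replica
    Resume-SqlAvailabilityDatabase -Path "SQLSERVER:\SQL\OLDPRIMARY\DEFAULT\AvailabilityGroups\MyAG\AvailabilityDatabases\MyDB"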
    Sean Gallardy | Blog |
    Twitter

  • Is there a way to config WLS to fail over from a primary RAC cluster to a DR RAC cluster?

    Here's the situation:
    We have two Oracle RAC clusters, one in a primary site, and the other in a DR site
    Although they run active/active using some sort of replication (Oracle Streams? not sure), we are being asked to use only the one currently being used as the primary to prevent latency & conflict issues
    We are using this only for read-only queries.
    We are not concerned with XA
    We're using WebLogic 10.3.5 with MultiDatasources, using the Oracle Thin driver (non-XA for this use case) for instances
    I know how to set up MultiDatasources for an individual RAC cluster, and I have been doing that for years.
    Question:
    Is there a way to configure MultiDatasources (mDS) in WebLogic to allow for automatic failover between the two clusters, or does the app have to be coded to failover from an mDS that's not working to one that's working (with preference to a currently labelled "primary" site).
    Note:
    We still want to have load balancing across the current "primary" cluster's members
    Is there a "best practice" here?

    Hi Steve,
    There are 2 ways to connect WLS to an Oracle RAC:
    1. Use the Oracle RAC service URL, which contains the details of all the RAC nodes with their respective IP addresses and DNS names.
    2. Connect to the primary cluster as you are currently doing and use an MDS to load-balance/fail over between multiple nodes in the primary RAC (if applicable).
        In case the primary RAC nodes fail and you switch to the DR RAC nodes, use WLST scripts to change the connection URL and restart the application to remove any old connections.
        Such DB failover tests can be conducted in a test/reference environment to set up the required log monitoring and the subsequent steps to measure the timelines.
    Thanks,
    Souvik.

  • ISE fail over

    Hi, I have 2 ISE 3315 appliances working in standalone mode.
    I have 2 sites:
    ISE_1 is installed on site 1 and manages user group_1
    ISE_2 is installed on site 2 and manages user group_2
    I am planning to use the 2 ISE nodes in failover.
    I would like to configure:
    1. ISE_1 to be primary for user group_1 and secondary (backup) for user group_2
    2. ISE_2 to be primary for user group_2 and secondary (backup) for user group_1
    How can I configure this, please?
    Which modifications would I need to make on the switch, WLC and ISE?
    Thanks in advance for your help

    Hello,
    In this case, you can use a simple 2-node deployment scenario. In this scenario you will have ISE-1 as primary admin, secondary monitoring, and PSN, and ISE-2 as secondary admin, primary monitoring, and PSN.
    Be aware of these points:
    1- If ISE-1 goes down, you have to access the ISE-2 GUI and promote it manually.
    2- If ISE-2 fails, no problem: the monitoring persona failover happens automatically.
    3- To load-balance the users you are talking about, you have to do this based on NADs. For example, if you have 4 switches, do the following:
    A. Make SW1 and SW2 point to ISE-1 and ISE-2 as the RADIUS servers, but give higher priority to ISE-1.
    B. Make SW3 and SW4 point to ISE-1 and ISE-2 as the RADIUS servers, but give higher priority to ISE-2.
    This way you have divided the job between the two nodes; if one is down, the other will handle all the communications with the NADs.
    Check this document for all the info you may need regarding distributed deployments (and yes, the connection speed between the two nodes should be 1 Gbps):
    http://www.cisco.com/en/US/solutions/collateral/ns340/ns414/ns742/ns744/docs/howto_50_ise_deployment_tg.pdf
    Message was edited by: Ahmed AboRahal to add the document link.

  • SQL Server 2014 Always on HA takes 8-14 seconds to fail over. Application side timeouts occur

    Hi All,
    I have a very similar post in the SQL Server 2014 forums too (https://social.technet.microsoft.com/Forums/sqlserver/en-US/adb5e338-907e-4405-aa62-d3ea93c7a98a/sql-server-2014-always-on-ha-takes-814-seconds-to-fail-over-application-side-timeouts-occur?forum=sqldisasterrecovery) -
    advice in the end was to post a question here.
    SQL Server Nodes, 2014 (12.0.2480.0)
    1 Share witness (on separate subnet)
    1 Cluster
    1 Listener
    I have been testing the response time of failovers – both manual (right-click, fail over in SSMS) and automatic (shut down the primary host). The way I am testing the response is to have an SSMS query on my desktop, connected to the listener and querying a small table, and hit execute.
    The query response time, from execute to receiving the result, has been between 8 and 14 seconds in my testing. My previous experience (in a separate environment) showed around 2-second failover times in a very similar configuration.
    The availability DB is 200 MB and is not actively used. The nodes are synchronized.
    SQL Server hosts: Windows 2012, 2 CPU, 8 GB RAM.
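    For what it's worth, the same listener test can be scripted outside SSMS; a rough PowerShell sketch (the listener and database names are placeholders) that times a trivial query in a loop while the failover happens:
    while ($true) {
        $sw = [System.Diagnostics.Stopwatch]::StartNew()
        try {
            $conn = New-Object System.Data.SqlClient.SqlConnection("Server=AGListener1;Database=TestDB;Integrated Security=True;Connect Timeout=30")
            $conn.Open()
            (New-Object System.Data.SqlClient.SqlCommand("SELECT 1", $conn)).ExecuteScalar() | Out-Null
            $conn.Close()
            "{0:s}  OK    {1} ms" -f (Get-Date), $sw.ElapsedMilliseconds
        }
        catch {
            "{0:s}  FAIL  {1} ms  {2}" -f (Get-Date), $sw.ElapsedMilliseconds, $_.Exception.Message
        }
        Start-Sleep -Seconds 1
    }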
    Questions:
    1: It’s a big question, but what should I expect for a ‘normal’ failover time? Keep in mind this scenario is about as simple as it gets.
    2: As it stands, an 8 to 14 second ‘outage’ could cause some applications to time out. Or am I being unreasonable? I am seeing the very simple query in SSMS time out with this:
    Msg 983, Level 14, State 1, Line 2
    Unable to access availability database 'DATABASE' because the database replica is not in the PRIMARY or SECONDARY role. Connections to
    an availability database is permitted only when the database replica is in the PRIMARY or SECONDARY role. Try the operation again later.
    Cluster logs are long - this section accounts for 8 seconds of the 11 second outage I experienced. I can supply the full log if required. Also this log is just the 2 cluster nodes, I removed the witness share to make sure it was as simple as possible.
    00001090.00002128::2015/02/25-03:05:08.255 INFO  [GEM] Node 2: Deleting [1:65 , 1:71] (both included) as it has been ack'd by every node
    00001ee4.00002130::2015/02/25-03:05:10.107 INFO  [RES] Network Name: Agent: Sending request Netname/RecheckConfig to NN:5b81e7bd-58fe-4be9-a68a-c48ba2aa552b:Netbios
    00001090.00002128::2015/02/25-03:05:11.888 INFO  [GEM] Node 2: Deleting [1:72 , 1:73] (both included) as it has been ack'd by every node
    00001090.00002698::2015/02/25-03:05:11.889 INFO  [GUM] Node 2: Processing RequestLock 2:49
    00001090.00002128::2015/02/25-03:05:11.890 INFO  [GUM] Node 2: Processing GrantLock to 2 (sent by 1 gumid: 67)
    00001090.00002698::2015/02/25-03:05:11.890 INFO  [GUM] Node 2: executing request locally, gumId:68, my action: /dm/update, # of updates: 1
    00001090.00002128::2015/02/25-03:05:12.890 INFO  [GEM] Node 2: Deleting [1:74 , 1:74] (both included) as it has been ack'd by every node
    00001ee4.00002130::2015/02/25-03:05:15.107 INFO  [RES] Network Name: Agent: Sending request Netname/RecheckConfig to NN:5b81e7bd-58fe-4be9-a68a-c48ba2aa552b:Netbios
    00001090.00002128::2015/02/25-03:05:16.988 INFO  [GUM] Node 2: Processing RequestLock 1:28
    Thanks in advance.
    Keegan

    Hi Keegan,
    From this cluster log, what I can see is that the "Sending request Netname" steps are where the time was spent.
    Could you please tell us the network configuration of the cluster nodes?
    If I recall correctly, it is recommended to leave only the TCP/IP protocol enabled and to disable NetBIOS over TCP/IP for the "Private Network", and also not to configure DNS/WINS or a default gateway for the "Private Network":
    https://support.microsoft.com/kb/258750?wa=wsignin1.0
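    If it helps, a rough PowerShell sketch for disabling NetBIOS over TCP/IP on the private/heartbeat adapter (the adapter match below is a placeholder; adjust it to your NIC naming):
    # TcpipNetbiosOptions: 0 = use DHCP setting, 1 = enable, 2 = disable NetBIOS over TCP/IP
    $adapter = Get-CimInstance Win32_NetworkAdapterConfiguration -Filter "IPEnabled = TRUE" |
        Where-Object { $_.Description -like "*Private*" } | Select-Object -First 1
    Invoke-CimMethod -InputObject $adapter -MethodName SetTcpipNetbios -Arguments @{ TcpipNetbiosOptions = 2 }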
    After that, please test again.
    Best Regards,
    Elton JI

  • Front End pool failed over

    Hi all,
    1. I set up a pool with three Front End servers (the FQDN of the pool is pool.site1.sip96x2.com, and it points to the IP addresses of the three Front End servers). Everything works fine, but when I disable the network interface on FE1 and FE2, the Lync clients are disconnected. I haven't clearly understood how Lync clients fail over within a pool; please clarify this for me.
    2. I have two central sites (a Root site and a Primary site; they have different domains, sip96x2.com and site1.sip96x2.com). The dialin simple URL points to the Front End server at the Root site. So if the link between the Root site and the Primary site is down, how can users at the Primary site connect to the dialin URL?
    3. In building the topology for the Front End pool, I checked "Override FQDN internal web service" and set the FQDN to "poolint.site1.sip96x2.com". I created three A records for "poolint.site1.sip96x2.com" pointing to the three IP addresses of the Front End servers. Is that right?
    Thanks so much!

    Ah, OK. First thing, if I am reading this correctly: pool pairing Standard with Enterprise is not supported. You should only pair Standard with Standard and Enterprise with Enterprise (even though Topology Builder won't stop you). Take a look here for the supported scenarios: http://technet.microsoft.com/en-us/library/jj204697.aspx
    To deal with the simple URLs in the event of a failover, you need to add them using PowerShell. Take a look at this article, which explains and gives an example: http://blogs.perficient.com/microsoft/2012/01/configuring-simple-urls-for-multiple-lync-pools/
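    For reference, a minimal sketch of the pattern that article describes (the URLs and site identity below are placeholders), using the simple URL cmdlets:
    $urlEntry  = New-CsSimpleUrlEntry -Url "https://dialin.sip96x2.com"
    $simpleUrl = New-CsSimpleUrl -Component "dialin" -Domain "*" -SimpleUrl $urlEntry -ActiveUrl "https://dialin.sip96x2.com"
    # Add the site-level dialin simple URL, then re-run Enable-CsComputer on the Front End servers and update DNS
    Set-CsSimpleUrlConfiguration -Identity "Site:Site1" -SimpleUrl @{Add=$simpleUrl}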
    Georg Thomas | Lync MVP
    Blog www.lynced.com.au | Twitter
    @georgathomas
    Lync Edge Port Check (Beta)

  • OCR and voting disks on ASM, problems in case of fail-over instances

    Hi everybody
    in case at your site you:
    - have an 11.2 fail-over cluster using Grid Infrastructure (CRS, OCR, voting disks),
    where you have yourself created additional CRS resources to handle single-node db instances,
    their listeners, their disks and so on (which are started only on one node at a time,
    can fail from that node and restart on another);
    - have put OCR and voting disks into an ASM diskgroup (as strongly suggested by Oracle);
    then you might have problems (as we had), because you might:
    - reach the max number of diskgroups handled by an ASM instance (only 63, above which you get ORA-15068);
    - experience delays (especially in case of multipath), find fake CRS resources, etc.
    whenever you dismount disks from one node and mount them on another.
    So (if both conditions are true) you might be interested in this story;
    please keep reading on for the boring details.
    One step backward (I'll try to keep it simple).
    Oracle Grid Infrastructure is mainly used by RAC db instances,
    which means that any db you create usually has one instance started on each node,
    and all instances access read / write the same disks from each node.
    So, ASM instance on each node will mount diskgroups in Shared Mode,
    because the same diskgroups are mounted also by other ASM instances on the other nodes.
    ASM instances have a spfile parameter CLUSTER_DATABASE=true (and this parameter implies
    that every diskgroup is mounted in Shared Mode, among other things).
    In this context, it is quite obvious that Oracle strongly recommends putting OCR and voting disks
    inside ASM: this diskgroup (usually called CRS_DATA) will become diskgroup number 1,
    and ASM instances will mount it before CRS starts.
    Then, additional diskgroups will be added by users, for DATA, REDO, FRA etc. of each RAC db,
    and will be mounted later when a RAC db instance starts on the specific node.
    In the case of a fail-over cluster, where instances are not RAC type and there is
    only one instance running (on one of the nodes) at any time for each db, it is different.
    The diskgroups of the db instances don't need to be mounted in Shared Mode,
    because they are used by only one instance at a time
    (on the contrary, they should be mounted in Exclusive Mode).
    Yet, if you follow Oracle's advice and put OCR and voting disks inside ASM, then:
    - at installation, OUI will start the ASM instance on each node with CLUSTER_DATABASE=true;
    - the first diskgroup, which contains OCR and voting disks, will be mounted in Shared Mode;
    - all other diskgroups, used by each db instance, will be mounted in Shared Mode too,
    even if you take care that they are mounted by only one ASM instance at a time.
    At our site, for our three-node cluster, this fact has two consequences.
    One consequence is that we hit the ORA-15068 limit (max 63 diskgroups) earlier than expected:
    - none of the instances on this cluster are Production (only Test, Dev, etc.);
    - we planned to have about 10 instances on each node, each of them with 3 diskgroups (DATA, REDO, FRA),
    so 30 diskgroups per node, for a total of 90 diskgroups (30 instances) on the cluster;
    - in case one node failed, the surviving two should get the resources of the failing node,
    in the worst case: one node with 60 diskgroups (20 instances), the other with 30 diskgroups (10 instances);
    - in case two nodes failed, the only surviving node would not be able to mount the additional diskgroups
    (because of the limit of max 63 diskgroups mounted by an ASM instance), so all the others would remain unmounted
    and their db instances stopped (they are not Production instances).
    But it didn't work out that way: since ASM has the parameter CLUSTER_DATABASE=true, you cannot mount 90 diskgroups;
    you can mount 62 globally (once a diskgroup is mounted on one node, it is given a number between 2 and 63,
    and diskgroups mounted on other nodes cannot reuse that number).
    So as a matter of fact we can mount only about 21 diskgroups (about 7 instances) on each node.
    The second consequence is that every time our handmade CRS scripts dismount diskgroups
    from one node and mount them on another, there are delays in the range of seconds (especially with multipath).
    Also, we found in the CRS log that whenever we mounted diskgroups (on one node only),
    additional fake resources of type ora*.dg were created on the fly behind the scenes,
    maybe to accommodate the fact that on the other nodes those diskgroups were left unmounted
    (once again, instances are single-node here, not RAC type).
    That's all.
    Did anyone run into similar problems?
    We opened an SR with Oracle asking what options we have here, and we are disappointed by their answer.
    Regards
    Oscar

    Hi Klaas-Jan
    - Best practices require that online redo log files also be in a separate diskgroup, in case of ASM logical corruption (we are a little bit paranoid): in case the DATA dg gets corrupted, you can restore the full backup plus archived redo logs plus online redo logs (otherwise you will stop at the latest archived log).
    So we have 3 diskgroups for each db instance: DATA, REDO, FRA.
    - In case of a fail-over cluster (active-passive), Oracle provides some templates of CRS scripts (in $CRS_HOME/crs/crs/public) that you edit and change at will; you might also create additional scripts for any additional resources you need (Oracle Agents, backup agents, file systems, monitoring tools, etc.).
    About our problem, the only solution is to move OCR and voting disks out of ASM and change the pfile of all ASM instances (parameter CLUSTER_DATABASE from true to false).
    Oracle's answers were a little bit odd:
    - first they told us to use Grid Standalone (without CRS, OCR, voting disks at all), but we told them that we needed a fail-over solution;
    - then they told us to use RAC Single Node, which actually has some better features; in case of a planned fail-over it might be able to migrate client sessions without causing a reconnect (for SELECTs only, not in the case of a running transaction), but we already have a few fail-over clusters and we cannot change them all.
    So we plan to move OCR and voting disks onto block devices (we think the other solution, which needs a shared file system, would take longer).
    Thanks Marko for pointing us to the OCFS2 pros / cons.
    We asked Oracle for confirmation that it is supported; they said yes, but it is discouraged (and also doesn't work with OUI or ASMCA).
    Anyway, that's the simplest approach; this is a non-Prod cluster, so we'll start here, and if everything is fine, after a while we'll do it also on the Prod ones.
    - Note 605828.1, paragraph 5, Configuring non-raw multipath devices for Oracle Clusterware 11g (11.1.0, 11.2.0) on RHEL5/OL5
    - Note 428681.1: OCR / Vote disk Maintenance Operations (ADD/REMOVE/REPLACE/MOVE)
    - "Grid Infrastructure Install on Linux", paragraph 3.1.6, Table 3-2
    Oscar

  • Which role do I need, DFS or File Server, on a failover cluster on Server 2012 R2?

    What I want to achieve is to share all my user data files in a central location and have them highly available all the time, whether it's a general share or folder redirection data. BUT I'm a bit confused: I have a failover cluster set up on Server 2012, and now I would like to add DFS as a role, but there is another role called File Server that virtually does the same thing as DFS, meaning it creates a namespace share that can be accessed even if one of the nodes goes down. My thinking is that DFS does replication between two physical locations, while the failover cluster works slightly differently; with File Server it pretty much does the same thing except for replicating data from one drive to another. What do you suggest I do, or did I get the concept wrong like a noob?

    DFS and Failover Clustering for file shares provide a similar end result for file access, but they are significantly different implementations.
    Clustering provides high availability to files by presenting shared access to a set of files served from a cluster. With 2012 R2, Microsoft added the ability to create a Scale-Out File Server that even allows all nodes of the cluster to serve access to the files, for a higher level of performance and other great things. The bottom line with Failover Clusters for files is that there is a single copy of the file presented from the cluster.
    DFS, on the other hand, provides high availability to files by presenting multiple copies of the file: it makes a copy in two or more locations and presents a namespace that allows access to the file through any of the network paths. DFS works very well for files that are primarily read-only. When you get into a situation where there is a lot of updating of the shared files, DFS is not a very good solution. There are ways to implement DFS for read/write files, but it generally requires a good knowledge of how the files are used and how you want to manage them.
    The key to answering your question comes in your first sentence: "I want to share all my user data files in a central location and to be highly available all the time". My initial reaction to this is that "central location" means a Failover Cluster - there is only a single copy of the file. However, "all the time" can be compromised by network failures to the central site: remote sites would not have access if they can't reach the central site. DFS provides the ability to have copies remotely, but then if you allow updating at multiple sites, you have to manage the merging of the changes, among other things.
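    For the single-copy (clustered) approach, a minimal sketch (the role name, cluster disk, path, and accounts are placeholders) of creating a highly available General Use file server and a continuously available share:
    Import-Module FailoverClusters
    # Create the clustered file server role on an existing failover cluster
    Add-ClusterFileServerRole -Name "FS-HA" -Storage "Cluster Disk 2" -StaticAddress 10.0.0.50
    # On the node that currently owns the clustered disk, create a continuously available SMB share
    New-SmbShare -Name "UserData" -Path "E:\Shares\UserData" -ContinuouslyAvailable $true -FullAccess "CONTOSO\Domain Users"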
    . : | : . : | : . tim

  • Are replica-aware stubs stuck in an infinite loop when failing over?

    Hi,
    Any help on this is appreciated.
    Consider this scenario: four WebLogic instances run in the cluster, and a replica-aware stub (for a stateless bean with idempotent methods) finds that a particular method fails on one server, so it redirects the request to another server. But if the same method fails on all the servers, what is going to happen? Is it going to throw some exception, or is it going to stay in a loop, redirecting the method request to all servers round-robin?
    Regards
    Aruna
              

    Aruna,
    A stateless session bean whose methods have been declared idempotent will automatically retry on another service provider in a fail-over situation. When a fail-over situation occurs, the stub refreshes its list of service providers. Note: just because your method call fails doesn't mean it's a fail-over situation.
    Jane
              "Aruna" <[email protected]> wrote:
              >
              >Hi
              >
              > Any help on this Appreciated
              >
              > See in this senario, where there is four weblogic instance runs in
              >the cluster
              >and a replica aware stub(stateless bean with idempodent methods) finds a
              >particular
              >method fails on a server and it redircets the request to another one server
              >but the
              >same method fails on all the server, then what is goin to happen?? is it
              >going to
              >throw some exception or gonna be in a loop to keep on redirecting the method
              >request
              >to all servers in Round???
              >
              >
              >Regards
              >Aruna
              

  • How do the application servers connect to the new database after failing over from the primary DB to the standby DB

    How do the application servers connect to the new database after failing over from the primary DB to the standby DB?
    We have set up a DR environment with a standalone primary server and a standalone physical standby server on RHEL Linux 6.4. Now our application team would like to know:
    When the primary DB server crashes, the standby DB server will take over the role of primary DB through Data Guard fast failover. The applications were previously connected using the primary DB's IP, and the physical standby uses a different IP and listener. If this happens, they would need to stop their application servers and re-configure their connections so that they connect to the new DB server; they cannot tolerate this workaround.
    Does Oracle have a better solution for this, so that the application can automatically detect the role transition and switch to the new IP without re-configuring any connection or shutting down the application?
    Oracle support provides us the answer as following:
    ==================================================================
    Applications connected to a primary database can transparently failover to the new primary database upon an Oracle Data Guard role transition. Integration with Fast Application Notification (FAN) provides fast failover for integrated clients.
    After a failover, the broker publishes Fast Application Notification (FAN) events. These FAN events can be used in the following ways:
    Applications can use FAN without programmatic changes if they use one of these Oracle integrated database clients: Oracle Database JDBC, Oracle Database Oracle Call Interface (OCI), and Oracle Data Provider for .NET ( ODP.NET). These clients can be configured for Fast Connection Failover (FCF) to automatically connect to a new primary database after a failover.
    JAVA applications can use FAN programmatically by using the JDBC FAN application programming interface to subscribe to FAN events and to execute event handling actions upon the receipt of an event.
    FAN server-side callouts can be configured on the database tier.
    FAN events are published using Oracle Notification Services (ONS) and Oracle Streams Advanced Queuing (AQ).
    =======================================================================================
    Does anyone have experience with this, related documentation, or other solutions? We are not familiar with the FAN concept.
    Thanks very much in advance.

    Hi mesbeg,
    Thanks a lot.
    For example, there is an application JBoss server connecting to the DB; we just added the standby IP as another connection URL in the datasource configuration file, rather than adding a service on the DB side, as follows:
            <subsystem xmlns="urn:jboss:domain:datasources:1.0">
            <datasources>
                    <datasource jta="false" jndi-name="java:/jdbc/idserverDatasource" pool-name="IDServerDataSource" enabled="true" use-java-context="true">
                        <connection-url>jdbc:oracle:thin:@<primary DB IP>:1521:testdb</connection-url>
                        <connection-url>jdbc:oracle:thin:@<standby DB IP>:1521:testdb</connection-url>
                        <driver>oracle</driver>
                        <pool>
                            <min-pool-size>2</min-pool-size>
                            <max-pool-size>10</max-pool-size>
                            <prefill>true</prefill>
                        </pool>
                        <security>
                            <user-name>TEST_USER</user-name>
                            <password>Password1</password>
                        </security>
                        <validation>
                            <valid-connection-checker class-name="org.jboss.jca.adapters.jdbc.extensions.oracle.OracleValidConnectionChecker"/>
                            <validate-on-match>false</validate-on-match>
                            <background-validation>false</background-validation>
                            <use-fast-fail>false</use-fast-fail>
                            <stale-connection-checker class-name="org.jboss.jca.adapters.jdbc.extensions.oracle.OracleStaleConnectionChecker"/>
                            <exception-sorter class-name="org.jboss.jca.adapters.jdbc.extensions.oracle.OracleExceptionSorter"/>
                        </validation>
                    </datasource>
                    <drivers>
                        <driver name="oracle" module="com.oracle.jdbc">
                            <xa-datasource-class>oracle.jdbc.OracleDriver</xa-datasource-class>
                        </driver>
                    </drivers>
                </datasources>
            </subsystem>
    If a failover occurs, JBoss will automatically be pointed to the standby DB. No additional actions are needed.

  • Physical standby database fail-over

    Hi,
    I am working on Oracle 10.2.0.3 on Solaris SPARC 64-bit.
    I have a Data Guard configuration with a single physical standby database that uses real-time apply. We had a major application upgrade yesterday, and before the start of the upgrade we cancelled the media recovery and disabled log_archive_dest_n so that it would not ship the archive logs to the standby site. We left the Data Guard configuration in this mode in case of a rollback.
    Primary:
    alter system set log_archive_dest_state_2='DEFER';
    alter system switch logfile;
    Standby:
    alter database recover managed standby database cancel;
    Due to problems induced by the application upgrade, we had to fail over to the physical standby, which had not been in sync with the primary since yesterday. I used the following method to fail over, since I did not want to apply any redo from yesterday.
    Standby:
    alter database activate physical standby database;
    alter database open;
    shutdown immediate;
    startup
    So, after this step, the database was a standalone database which doesn't have any standby databases yet (it still has the log_archive_config and log_archive_dest_n parameters set, but I have set the log_archive_dest_state_n entries pointing to the old primary to 'DEFER'). I have even changed the archivelog deletion policy to NONE:
    RMAN> configure archivelog deletion policy to none;
    After the failover was completed, the log sequence started from sequence 1. We cleared the FRA to make space for the new archive logs and started a FULL database backup (backup incremental level 0 database plus archivelog delete input). The backup succeeded, but we got these alerts in the backup log saying that RMAN cannot delete the archive logs:
    RMAN-08137: WARNING: archive log not deleted as it is still needed
    My questions here are:
    1) Even though I have disabled (deferred) the log_archive_dest_n parameters, why is RMAN not able to delete the archive logs after the backup when there is no standby database for this failed-over database?
    2) Are all the old backups marked unusable after a failover is performed?
    FYI, flashback database was not used in this case as it did not serve our purpose.
    Any information or documentation links would be greatly appreciated.
    Thanks,
    Harris.

    Thanks for the reply.
    FINISH FORCE works in some cases, but if there is an archive gap (though none was reported in our case) it might not work (DOC ID: 846087.1). So we followed the switchover & failover best practices, which mention this "ACTIVATE PHYSICAL STANDBY" approach for a failover when you intend not to apply any archive logs. The process we followed is the right one.
    Anyhow, we got the issue resolved. Below is the resolution path.
    1) Even if you DEFER the LOG_ARCHIVE_DEST_STATE_N parameters on the primary, there are some situations where the primary database in a Data Guard configuration will not delete the archive logs due to SCN issues. This issue may or may not arise in all failover scenarios. If it does, then do the following checks:
    Follow DOC ID: 803635.1, which describes a PL/SQL procedure to check for problematic SCNs in a Data Guard configuration even though the physical standby databases are no longer available (i.e., the Data Guard parameters log_archive_config and log_archive_dest_n='SERVICE=...' are still set, even though the corresponding LOG_ARCHIVE_DEST_STATE_N parameters are DEFERRED).
    If this procedure returns any rows, then the primary database is not able to delete the archive logs because it still thinks there is a standby database and is trying to keep the archive logs because of the SCN conflict.
    So, the best thing to do is remove the DG-related parameters from the spfile (log_archive_config and the log_archive_dest_n parameters).
    After I made these changes, I ran a test backup using "backup archivelog all delete input", and the archive logs were deleted after the backup without any issues.
    Thanks,
    Harris.
    Edited by: user11971589 on Nov 18, 2010 2:55 PM

  • ISE admin , PSN and monitoring node fail-over and fall back scenario

    Hi Experts,
    I have a question about ISE failover.
    I have two ISE appliances in two different locations. I am trying to understand the failover and fallback scenarios.
    I have gone through the documentation as well, but it is still not clear.
    My primary ISE server would have the primary admin and primary monitoring roles, and the secondary ISE would have the secondary admin and secondary monitoring roles.
    In case of a primary ISE appliance failure, I will have to log in to the secondary ISE node and make its admin role primary, but what about when the primary ISE comes back? What would the scenario be?
    During the primary failure, will there be any impact on user authentication? As long as a PSN is available on the secondary, it should work, right?
    And what is the actual method to promote the secondary ISE admin node to primary? Do I even have to make the monitoring node role changes manually?
    Will I have to reboot the secondary ISE after promoting its admin role to primary?

    We have the same setup across an OTV link and have tested this scenario multiple times. You don't have to do anything if communication is broken between the primary and secondary nodes: the secondary will automatically start authenticating the devices it is in contact with. If you promote the secondary to primary after the link is broken, it will keep the primary role when the link is restored and force the former primary node to secondary.

  • Thin Client connection not failing over

    I'm using the following thin client connection and the sessions do not fail over. Testing with SQL*Plus, the sessions do fail over. One difference I see between the two connections is that the thin connection shows NONE for failover_method and failover_type, but the SQL*Plus connection shows BASIC for failover_method and SELECT for failover_type.
    Are there any issues with the thin client? The version is 10.2.0.3.
    jdbc:oracle:thin:@(description=(address_list=(load_balance=YES)(address=(protocol=tcp)(host=crpu306-vip.wm.com)(port=1521))(address=(protocol=tcp)(host=crpu307-vip.wm.com)(port=1521)))(connect_data=(service_name=ocsqat02)(failover_mode=(type=select)(method=basic)(DELAY=5)(RETRIES=180))))

    You have to use (FAILOVER=on) as well in the JDBC URL.
    http://download.oracle.com/docs/cd/B19306_01/network.102/b14212/advcfg.htm#sthref1292
    Example: TAF with Connect-Time Failover and Client Load Balancing
    Implement TAF with connect-time failover and client load balancing for multiple addresses. In the following example, Oracle Net connects randomly to one of the protocol addresses on sales1-server or sales2-server. If the instance fails after the connection, the TAF application fails over to the other node's listener, preserving any SELECT statements in progress.
    sales.us.acme.com=
    (DESCRIPTION=
    (LOAD_BALANCE=on)
    (FAILOVER=on)
    (ADDRESS=
    (PROTOCOL=tcp)
    (HOST=sales1-server)
    (PORT=1521))
    (ADDRESS=
    (PROTOCOL=tcp)
    (HOST=sales2-server)
    (PORT=1521))
    (CONNECT_DATA=
    (SERVICE_NAME=sales.us.acme.com)
    (FAILOVER_MODE=
    (TYPE=select)
    (METHOD=basic))))
    Example: TAF Retrying a Connection
    TAF also provides the ability to automatically retry connecting, if the first connection attempt fails, with the RETRIES and DELAY parameters. In the following example, Oracle Net tries to reconnect to the listener on sales1-server. If the failover connection fails, Oracle Net waits 15 seconds before trying to reconnect again. Oracle Net attempts to reconnect up to 20 times.
    sales.us.acme.com=
    (DESCRIPTION=
    (ADDRESS=
    (PROTOCOL=tcp)
    (HOST=sales1-server)
    (PORT=1521))
    (CONNECT_DATA=
    (SERVICE_NAME=sales.us.acme.com)
    (FAILOVER_MODE=
    (TYPE=select)
    (METHOD=basic)
    (RETRIES=20)
    (DELAY=15))))
