Failover (reasons)

I was looking for documentation that explains the failover reasons, but the only document I found (the command reference) does not explain the reasons, only the states.
http://www.cisco.com/en/US/docs/security/asa/asa82/command/reference/s3.html#wp1473355
•No Error
•Set by the CI config cmd
•Failover state check
•Failover interface become OK
•HELLO not heard from mate
•Other unit has different software version
•Other unit operating mode is different
•Other unit license is different
•Other unit chassis configuration is different
•Other unit card configuration is different
•Other unit want me Active
•Other unit want me Standby
•Other unit reports that I am failed
•Other unit reports that it is failed
•Configuration mismatch
•Detected an Active mate
•No Active unit found
•Configuration synchronization done
•Recovered from communication failure
•Other unit has different set of vlans configured
•Unable to verify vlan configuration
•Incomplete configuration synchronization
•Configuration synchronization failed
•Interface check
•My communication failed
•ACK not received for failover message
•Other unit got stuck in learn state after sync
•No power detected from peer
•No failover cable
•HA state progression failed
•Detect service card failure
•Service card in other unit has failed
•My service card is as good as peer
•LAN Interface become un-configured
•Peer unit just reloaded
•Switch from Serial Cable to LAN-Based fover
•Unable to verify state of config sync
•Auto-update request
•Unknown reason

Re "interface check" - it's pretty straightforward. The active unit queries the monitored interfaces on the standby for state (line up, protocol up) and, when a standby IP is configured, reachability.
If it fails any of those, the standby unit is marked as not ready due to interface check failing.
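The check described above can be modeled as a small function. This is a hypothetical Python sketch, not Cisco's implementation: the `InterfaceStatus` fields and the `failed_interfaces`/`standby_ready` names are illustrative assumptions, and the `reachable` flag stands in for whatever reachability test the active unit actually performs when a standby IP is configured.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class InterfaceStatus:
    """Hypothetical snapshot of one monitored interface on the standby unit."""
    name: str
    line_up: bool
    protocol_up: bool
    standby_ip: Optional[str] = None  # None when no standby IP is configured
    reachable: bool = False           # only consulted when standby_ip is set

def failed_interfaces(interfaces: List[InterfaceStatus]) -> List[str]:
    """Names of monitored interfaces that fail the interface check."""
    failed = []
    for iface in interfaces:
        ok = iface.line_up and iface.protocol_up
        if iface.standby_ip is not None:
            # Reachability is only tested when a standby IP is configured.
            ok = ok and iface.reachable
        if not ok:
            failed.append(iface.name)
    return failed

def standby_ready(interfaces: List[InterfaceStatus]) -> bool:
    """The standby counts as ready only if every monitored interface passes."""
    return not failed_interfaces(interfaces)

if __name__ == "__main__":
    monitored = [
        InterfaceStatus("outside", True, True, standby_ip="192.0.2.2", reachable=True),
        InterfaceStatus("inside", True, False),  # line up, protocol down
    ]
    print(standby_ready(monitored), failed_interfaces(monitored))  # -> False ['inside']
```

A single failing interface is enough to mark the standby as not ready, which matches the "any of those" wording above.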


Oracle Database down / failover causes?

Hi,
I am collecting information about the causes of Oracle database outages and failovers. I would appreciate it if you could share your experience with the reasons Oracle databases go down or fail over.
Thanks in advance.
Regards

Hi,
Are you looking for notes or scenarios?
If you are looking for notes, see the following link:
http://pavandba.wordpress.com/category/dataguard/
Thanks,
Rafi.

Database switches over automatically

Team,
We are observing the database automatically switching over to the passive server. Please help me with troubleshooting steps to determine the root cause (RCA). Why is this happening?

Hi ,
1. Check that the database copy on the server where you want the database mounted has the lowest activation preference number and is in a healthy state, including its content index files.
2. What about the other databases mounted on that server? Do they also fail over to the other node?
3. What is the output of the following command on the server where you want the database to stay mounted permanently?
Test-ReplicationHealth
4. Make sure the network adapters on the DAG members are configured correctly and are not in a misconfigured state.
5. To review the event logs for the reason behind the database failover, look under the following path:
Applications and Services Logs > Microsoft > Exchange > HighAvailability > Operational
6. Another way to find the database failover reason is to run the following script:
CollectOverMetrics.ps1
7. If all of the above checks out and you still cannot find the cause, the final option is to let the database fail over to the passive node and then reseed a fresh copy on the node where you want to mount the database.
Note: Before reseeding, delete the problematic copy, including the .edb file and the log files, so that you start from a fresh copy.
Once all of the above is done, try to mount the new copy as the active database copy.
Thanks & Regards S.Nithyanandham

Database Availability Group - Network Problem

Hi all, I have a question and hope you can help me.
My domain has two sites, and each site has three mailbox servers.
This morning at 1 AM, the active mailbox database moved from the HQ site to the DR site because of a network issue.
Then at 3 AM, the active mailbox database automatically moved back from the DR site to the HQ site.
This is the first time I have noticed this behavior.
Can somebody explain what happened in my environment?
Why would the system move the active mailbox database from the HQ site to the DR site and back again?
Is it related to the network issue?
Thanks for your help!
Best Regards,
Henry Stefanus

Hi Henry Stefanus,
I am not familiar with Exchange Server and cannot advise you on how to check the Exchange logs, but in my experience an unstable connection between Exchange and Active Directory can also cause a failover, so confirm that your network is working properly.
On the cluster side, you can collect the cluster log and look for the failover reason there.
Run Get-ClusterLog -Destination C:\temp in PowerShell on one Exchange 2013 server in the DAG. By default, it collects the cluster logs from all DAG members.
If you need more information, please refer to following article:
http://technet.microsoft.com/en-us/library/hh847315.aspx
I’m glad to be of help to you!

Reason for firewall failover

Hi,
Below are the logs. Please let me know what caused the firewall to fail over from the primary firewall to the secondary firewall.
Pix logs
08/11/2007 17:12:06 Local4 Alert 192.168.1.1 Nov 08 2011 17:06:14 pix-firewall : %PIX-1-105036: (Secondary) LAN failover dropped a cmd msg: FREQARP, seq = 871125
08/11/2007 17:12:06 Local4 Alert 192.168.1.1 Nov 08 2011 17:06:14 pix-firewall : %PIX-1-105036: (Secondary) LAN failover dropped a cmd msg: FHELLO, seq = 871126
08/11/2007 17:12:06 Local4 Alert 192.168.1.1 Nov 08 2011 17:06:14 pix-firewall : %PIX-1-105036: (Secondary) LAN failover dropped a cmd msg: FTRAFFIC, seq = 871127
08/11/2007 17:12:05 Local4 Alert 192.168.1.1 Nov 08 2011 17:06:14 pix-firewall : %PIX-1-105003: (Secondary) Monitoring on interface 0 waiting
08/11/2007 17:12:05 Local4 Alert 192.168.1.1 Nov 08 2011 17:06:14 pix-firewall : %PIX-1-105003: (Secondary) Monitoring on interface 1 waiting
08/11/2007 17:12:05 Local4 Alert 192.168.1.1 Nov 08 2011 17:06:14 pix-firewall : %PIX-1-105003: (Secondary) Monitoring on interface 2 waiting
08/11/2007 17:12:05 Local4 Alert 192.168.1.1 Nov 08 2011 17:06:14 pix-firewall : %PIX-1-105003: (Secondary) Monitoring on interface 3 waiting
08/11/2007 17:12:05 Local4 Alert 192.168.1.1 Nov 08 2011 17:06:14 pix-firewall : %PIX-1-105003: (Secondary) Monitoring on interface 4 waiting
08/11/2007 17:12:05 Local4 Alert 192.168.1.1 Nov 08 2011 17:06:14 pix-firewall : %PIX-1-105003: (Secondary) Monitoring on interface 5 waiting
08/11/2007 17:12:05 Local4 Alert 192.168.1.1 Nov 08 2011 17:06:14 pix-firewall : %PIX-1-104001: (Secondary) Switching to ACTIVE - no response from mate.
08/11/2007 17:12:05 Local4 Alert 192.168.1.1 Nov 08 2011 17:06:14 pix-firewall : %PIX-1-103001: (Secondary) No response from other firewall (reason code = 1).
Thanks
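When reading logs like these, it can help to pull out the severity and message ID of each entry before looking them up. This is a hypothetical Python sketch that assumes the `%PIX-<severity>-<message id>: <text>` layout visible in the lines above; `parse_pix_line` and `count_by_msg_id` are illustrative names, and the meaning of each message ID still has to come from Cisco's syslog message reference.

```python
import re
from collections import Counter
from typing import Iterable, Optional, Tuple

# Matches the "%PIX-<severity>-<message id>: <text>" portion of a syslog line.
# ASA uses the same layout with an "%ASA-" prefix, so both are accepted.
PIX_MSG = re.compile(
    r"%(?P<platform>PIX|ASA)-(?P<severity>\d)-(?P<msg_id>\d{6}): (?P<text>.*)$"
)

def parse_pix_line(line: str) -> Optional[Tuple[int, str, str]]:
    """Return (severity, message ID, message text), or None if the line does not match."""
    m = PIX_MSG.search(line)
    if m is None:
        return None
    return int(m.group("severity")), m.group("msg_id"), m.group("text")

def count_by_msg_id(lines: Iterable[str]) -> Counter:
    """Count how often each message ID appears, ignoring non-matching lines."""
    parsed = (parse_pix_line(line) for line in lines)
    return Counter(p[1] for p in parsed if p is not None)

if __name__ == "__main__":
    sample = (
        "08/11/2007 17:12:05 Local4 Alert 192.168.1.1 Nov 08 2011 17:06:14 "
        "pix-firewall : %PIX-1-103001: (Secondary) No response from other "
        "firewall (reason code = 1)."
    )
    print(parse_pix_line(sample))
    # -> (1, '103001', '(Secondary) No response from other firewall (reason code = 1).')
```

Counting by message ID makes it easy to see that the burst above is dominated by 105003 (interface monitoring waiting) and 105036 (dropped failover messages), followed by the 104001/103001 switchover.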

Hello Kunal,
As the logs show, the secondary unit is monitoring all interfaces but is not receiving any hello packets from its mate. That is why the interfaces are in the Waiting state, and this is what triggered the failover.
If a PIX/ASA does not receive hello packets on the interfaces being monitored, it assumes its mate is dead and becomes active.
Hope this helps! If not, let me know and I will do my best to help you with this.
Julio
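The behavior Julio describes is a hello/holdtime scheme: the unit polls for hellos from its mate, and if nothing is heard for the holdtime, it declares the mate failed and takes over. Below is a hypothetical Python sketch of that decision only (not the PIX/ASA implementation); `PeerMonitor` is an illustrative name, and the 15-second default is borrowed from the "Unit Poll frequency 1 seconds, holdtime 15 seconds" lines that `show failover` prints elsewhere on this page.

```python
from dataclasses import dataclass

@dataclass
class PeerMonitor:
    """Hypothetical model of a standby unit deciding whether its mate is alive."""
    holdtime: float = 15.0   # seconds of silence before the mate is declared failed
    last_hello: float = 0.0  # time (in seconds) of the most recent hello from the mate
    role: str = "Standby"

    def hello_received(self, now: float) -> None:
        """Record a hello from the mate."""
        self.last_hello = now

    def poll(self, now: float) -> str:
        """Run once per poll interval; take over if the holdtime has expired."""
        if self.role == "Standby" and now - self.last_hello >= self.holdtime:
            # Corresponds to "Switching to ACTIVE - no response from mate" (104001).
            self.role = "Active"
        return self.role

if __name__ == "__main__":
    unit = PeerMonitor()
    unit.hello_received(0.0)
    for t in range(1, 16):  # one poll per second, no further hellos arrive
        print(t, unit.poll(float(t)))
```

Any hello that arrives before the holdtime expires resets the timer, so short drops are tolerated and only a sustained silence causes a switchover.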

Advice Requested - High Availability WITHOUT Failover Clustering

We're creating an entirely new Hyper-V virtualized environment on Server 2012 R2. My question is: Can we accomplish high availability WITHOUT using failover clustering?
So, I don't really have anything AGAINST failover clustering, and we will happily use it if it's the right solution for us, but to be honest, we really don't want ANYTHING to happen automatically when it comes to failover. Here's what I mean:
In this new environment, we have architected 2 identical, very capable Hyper-V physical hosts, each of which will run several VMs comprising the equivalent of a scaled-back version of our entire environment. In other words, there is at least a domain controller, multiple web servers, and a (mirrored/HA/AlwaysOn) SQL Server 2012 VM running on each host, along with a few other miscellaneous one-off worker-bee VMs doing things like system monitoring. The SQL Server VM on each host has about 75% of the physical memory resources dedicated to it (for performance reasons). We need pretty much the full horsepower of both machines up and going at all times under normal conditions.
So now, to high availability. The standard approach is to use failover clustering, but I am concerned that if these hosts are clustered, we'll have the equivalent of just 50% hardware capacity going at all times, with full failover in place of course (we are using an iSCSI SAN for storage).
BUT, if these hosts are NOT clustered, and one of them is suddenly switched off, experiences some kind of catastrophic failure, or simply needs to be rebooted while applying WSUS patches, the SQL Server HA will fail over (so all databases will remain up and going on the surviving VM), and the environment would continue functioning at somewhat reduced capacity until the failed host is restarted. With this approach, it seems to me that we would be running at 100% for the most part, and running at 50% or so only in the event of a major failure, rather than running at 50% ALL the time.
Of course, in the event of a catastrophic failure, I'm also thinking that the one-off worker-bee VMs could be replicated to the alternate host so they could be started on the surviving host if needed during a long-term outage.
So basically, I am very interested in the thoughts of others with experience regarding taking this approach to Hyper-V architecture, as it seems as if failover clustering is almost a given when it comes to best practices and high availability. I guess I'm looking for validation on my thinking.
So what do you think? What am I missing or forgetting? What will we LOSE if we go with a NON-clustered high-availability environment as I've described it?
Thanks in advance for your thoughts!

Udo -
Yes, your responses are very helpful.
Can we use the built-in Server 2012 iSCSI Target Server role to convert the local RAID disks into an iSCSI LUN that the VMs could access? Or can that not run on the same physical box as the Hyper-V host? I guess if the physical box goes down, the LUN would go down anyway, huh? Or can I cluster that role (iSCSI target) as well? If not, do you have any other specific product suggestions I can research, or do I just end up wasting this 12 TB of local disk storage?
- Morgan
That's a bad idea. First of all, the Microsoft iSCSI target is slow (it does no caching on the server side). So if you have really decided to use dedicated hardware for storage (maybe you have a reason I don't know about) and you are fine with your storage being a single point of failure (OK, maybe your RTOs and RPOs allow for it), then at least use an SMB share. SMB caches I/O on both the client and server sides, and you can use Storage Spaces (non-clustered) as its back end, which effectively gives you a write-back flash cache for cheap. See:
What's new in iSCSI target with Windows Server 2012 R2
http://technet.microsoft.com/en-us/library/dn305893.aspx
Improved optimization to allow disk-level caching
Updated
iSCSI Target Server now sets the disk cache bypass flag on a hosting disk I/O, through Force Unit Access (FUA), only when the issuing initiator explicitly requests it. This change can potentially improve performance.
Previously, iSCSI Target Server would always set the disk cache bypass flag on all I/O’s. System cache bypass functionality remains unchanged in iSCSI Target Server; for instance, the file system cache on the target server is always bypassed.
Yes, you can cluster the Microsoft iSCSI target, but (a) it would be SLOW, because there is only an active-passive I/O model (no real benefit from MPIO across multiple hosts), and (b) it would require shared storage for the Windows cluster. What for? The scenario made sense when (a) there was no virtual Fibre Channel, so a guest VM cluster could not use FC LUNs, and (b) there was no shared VHDX, so SAS could not be used for a guest VM cluster either. Now both exist, so the scenario is pointless: just export your existing shared storage without any Microsoft iSCSI target and you'll be happy. For references, see:
MSFT iSCSI Target in HA mode
http://technet.microsoft.com/en-us/library/gg232621(v=ws.10).aspx
Cluster MSFT iSCSI Target with SAS back end
http://techontip.wordpress.com/2011/05/03/microsoft-iscsi-target-cluster-building-walkthrough/
Guest VM Cluster Storage Options
http://technet.microsoft.com/en-us/library/dn440540.aspx
Storage options
The following table lists the storage types that you can use to provide shared storage for a guest cluster.
Storage Type: Description
Shared virtual hard disk: New in Windows Server 2012 R2, you can configure multiple virtual machines to connect to and use a single virtual hard disk (.vhdx) file. Each virtual machine can access the virtual hard disk just like servers would connect to the same LUN in a storage area network (SAN). For more information, see Deploy a Guest Cluster Using a Shared Virtual Hard Disk.
Virtual Fibre Channel: Introduced in Windows Server 2012, virtual Fibre Channel enables you to connect virtual machines to LUNs on a Fibre Channel SAN. For more information, see Hyper-V Virtual Fibre Channel Overview.
iSCSI: The iSCSI initiator inside a virtual machine enables you to connect over the network to an iSCSI target. For more information, see iSCSI Target Block Storage Overview and the blog post Introduction of iSCSI Target in Windows Server 2012.
Storage requirements depend on the clustered roles that run on the cluster. Most clustered roles use clustered storage, where the storage is available on any cluster node that runs a clustered role. Examples of clustered storage include Physical Disk resources and Cluster Shared Volumes (CSV). Some roles do not require storage that is managed by the cluster. For example, you can configure Microsoft SQL Server to use availability groups that replicate the data between nodes. Other clustered roles may use Server Message Block (SMB) shares or Network File System (NFS) shares as data stores that any cluster node can access.
Sure, you can use third-party software to replicate your 12 TB of storage between just a pair of nodes and create a fully fault-tolerant cluster. See (there's also a free offering):
StarWind VSAN [Virtual SAN] for Hyper-V
http://www.starwindsoftware.com/native-san-for-hyper-v-free-edition
The product is similar to what VMware has just released for ESXi, except it has been on sale for about 2 years, so it is mature :)
Other vendors do this too, for example DataCore (more focused on Windows-based FC) and SteelEye (more focused on geo-clustering and replication). You may want to give them a try as well.
Hope this helped a bit :)

Cisco ASA 5505 failover issue

Hi,
I have two firewalls (Cisco ASA 5505) configured in active/standby mode. They ran smoothly for more than a year, but last week the secondary firewall failed and took my whole network down. I disconnected the secondary firewall and ran only the primary. When I logged in through the console, I found that failover had been disabled, so I connected the unit to the network again and enabled the firewall. A couple of days later the same issue happened. This time I took the secondary firewall down, erased its flash, reloaded the ASA software image, configured failover, and connected it to the primary so the configuration would replicate. It found the active mate, replicated the configuration, and synced. But after the sync the same thing happened: the whole network went down. Again I removed the secondary firewall, and the network came back up. I suspect something is wrong with failover, but I couldn't find out what :( . The firewalls are in routed mode.

Please find the logs below.
Secondary firewall during sync:
cisco-asa(config)# sh failover
Failover On
Failover unit Secondary
Failover LAN Interface: e0/7 Vlan3 (up)
Unit Poll frequency 1 seconds, holdtime 15 seconds
Interface Poll frequency 5 seconds, holdtime 25 seconds
Interface Policy 1
Monitored Interfaces 4 of 23 maximum
Version: Ours 8.2(5), Mate 8.2(5)
Last Failover at: 06:01:10 GMT Apr 29 2015
This host: Secondary - Sync Config
Active time: 55 (sec)
slot 0: ASA5505 hw/sw rev (1.0/8.2(5)) status (Up Sys)
Interface outside (27.251.167.246): No Link (Waiting)
Interface inside (10.11.0.20): No Link (Waiting)
Interface mgmt (10.11.200.21): No Link (Waiting)
slot 1: empty
Other host: Primary - Active
Active time: 177303 (sec)
slot 0: ASA5505 hw/sw rev (1.0/8.2(5)) status (Up Sys)
Interface outside (27.251.167.247): Unknown (Waiting)
Interface inside (10.11.0.21): Unknown (Waiting)
Interface mgmt (10.11.200.22): Unknown (Waiting)
slot 1: empty
=======================================================================================
Secondary firewall just after sync, now Active (the primary firewall rebooted):
cisco-asa# sh failover
Failover On
Failover unit Secondary
Failover LAN Interface: e0/7 Vlan3 (up)
Unit Poll frequency 1 seconds, holdtime 15 seconds
Interface Poll frequency 5 seconds, holdtime 25 seconds
Interface Policy 1
Monitored Interfaces 4 of 23 maximum
Version: Ours 8.2(5), Mate Unknown
Last Failover at: 06:06:12 GMT Apr 29 2015
This host: Secondary - Active
Active time: 44 (sec)
slot 0: ASA5505 hw/sw rev (1.0/8.2(5)) status (Up Sys)
Interface outside (27.251.167.246): Normal (Waiting)
Interface inside (10.11.0.20): No Link (Waiting)
Interface mgmt (10.11.200.21): No Link (Waiting)
slot 1: empty
Other host: Primary - Not Detected
Active time: 0 (sec)
slot 0: empty
Interface outside (27.251.167.247): Unknown (Waiting)
Interface inside (10.11.0.21): Unknown (Waiting)
Interface mgmt (10.11.200.22): Unknown (Waiting)
slot 1: empty
==========================================================================================
After the active firewall rebooted, failover was off on the secondary and the whole network went down:
cisco-asa# sh failover
Failover Off
Failover unit Secondary
Failover LAN Interface: e0/7 Vlan3 (up)
Unit Poll frequency 1 seconds, holdtime 15 seconds
Interface Poll frequency 5 seconds, holdtime 25 seconds
Interface Policy 1
Monitored Interfaces 4 of 23 maximum
===========================================================================================
Primary firewall after rebooting:
cisco-asa# sh failover
Failover On
Failover unit Primary
Failover LAN Interface: e0/7 Vlan3 (Failed - No Switchover)
Unit Poll frequency 1 seconds, holdtime 15 seconds
Interface Poll frequency 5 seconds, holdtime 25 seconds
Interface Policy 1
Monitored Interfaces 4 of 23 maximum
Version: Ours 8.2(5), Mate Unknown
Last Failover at: 06:17:29 GMT Apr 29 2015
This host: Primary - Active
Active time: 24707 (sec)
slot 0: ASA5505 hw/sw rev (1.0/8.2(5)) status (Up Sys)
Interface outside (27.251.167.246): Normal (Waiting)
Interface inside (10.11.0.20): Normal (Waiting)
Interface mgmt (10.11.200.21): Normal (Waiting)
slot 1: empty
Other host: Secondary - Failed
Active time: 0 (sec)
slot 0: empty
Interface outside (27.251.167.247): Unknown (Waiting)
Interface inside (10.11.0.21): Unknown (Waiting)
Interface mgmt (10.11.200.22): Unknown (Waiting)
slot 1: empty
cisco-asa# sh failover history
==========================================================================
From State To State Reason
==========================================================================
06:16:43 GMT Apr 29 2015
Not Detected Negotiation No Error
06:17:29 GMT Apr 29 2015
Negotiation Just Active No Active unit found
06:17:29 GMT Apr 29 2015
Just Active Active Drain No Active unit found
06:17:29 GMT Apr 29 2015
Active Drain Active Applying Config No Active unit found
06:17:29 GMT Apr 29 2015
Active Applying Config Active Config Applied No Active unit found
06:17:29 GMT Apr 29 2015
Active Config Applied Active No Active unit found
==========================================================================
cisco-asa#
cisco-asa# sh failover state
State Last Failure Reason Date/Time
This host - Primary
Active None
Other host - Secondary
Failed Comm Failure 06:17:43 GMT Apr 29 2015
====Configuration State===
====Communication State===
==================================================================================
Secondary Firewall
cisc-asa# sh failover h
==========================================================================
From State To State Reason
==========================================================================
06:16:32 GMT Apr 29 2015
Not Detected Negotiation No Error
06:17:05 GMT Apr 29 2015
Negotiation Disabled Set by the config command
==========================================================================
cisco-asa# sh failover
Failover Off
Failover unit Secondary
Failover LAN Interface: e0/7 Vlan3 (down)
Unit Poll frequency 1 seconds, holdtime 15 seconds
Interface Poll frequency 5 seconds, holdtime 25 seconds
Interface Policy 1
Monitored Interfaces 4 of 23 maximum
cisco-asa# sh failover state
State Last Failure Reason Date/Time
This host - Secondary
Disabled None
Other host - Primary
Not Detected None
====Configuration State===
====Communication State===
Thanks...
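When comparing the two units' histories, it can help to extract the transitions programmatically. This is a hypothetical Python sketch that parses `show failover history` output as it appears in this post (whitespace collapsed, timestamp on its own line, then "from state, to state, reason" on the next line). `KNOWN_STATES` contains only the state names visible in the outputs above; it is an assumption, not a complete list from Cisco, so extend it for other outputs.

```python
import re
from typing import List, NamedTuple, Optional, Tuple

# State names seen in the "show failover history" outputs in this post (not exhaustive).
KNOWN_STATES = [
    "Not Detected", "Negotiation", "Just Active", "Active Drain",
    "Active Applying Config", "Active Config Applied", "Active", "Disabled",
]

# e.g. "06:17:29 GMT Apr 29 2015"
TIMESTAMP = re.compile(r"^\d{2}:\d{2}:\d{2} \S+ \w{3} \d{1,2} \d{4}$")

class Transition(NamedTuple):
    when: str
    from_state: str
    to_state: str
    reason: str

def _take_state(text: str) -> Tuple[Optional[str], str]:
    """Strip the longest known state name from the start of text."""
    for state in sorted(KNOWN_STATES, key=len, reverse=True):
        if text == state or text.startswith(state + " "):
            return state, text[len(state):].strip()
    return None, text

def parse_history(output: str) -> List[Transition]:
    """Pair each timestamp line with the line after it and split that line."""
    lines = [ln.strip() for ln in output.splitlines() if ln.strip()]
    transitions = []
    for stamp, body in zip(lines, lines[1:]):
        if not TIMESTAMP.match(stamp):
            continue  # skip headers, separators and state lines
        src, rest = _take_state(body)
        dst, reason = _take_state(rest)
        if src and dst:
            transitions.append(Transition(stamp, src, dst, reason))
    return transitions
```

Matching the longest state name first matters because names overlap ("Active" is a prefix of "Active Drain"). On the secondary's history above, this yields Not Detected -> Negotiation ("No Error") and then Negotiation -> Disabled ("Set by the config command"), i.e. the secondary left negotiation because of a configuration command rather than a detected fault.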

Performance degradation factor 1000 on failover???

          Hi,
          we are getting our first experience with WLS 5.1 EBF 8 clustering on
          NT4 SP 6 workstation.
          We have two servers in the cluster, both on the same machine but with
          different IP addresses (as required)!
          In general it seems to work: we have a test client that connects to
          one of the servers and uses a stateless test EJB that does nothing
          but write to weblogic.log.
          When this server fails, the other server takes over the client
          requests, BUT VERY VERY VERY SLOWLY!!!
          I should repeat VERY a thousand times, because a normal client
          request takes about 10-30 ms, and after failure/failover it takes
          10-15 SECONDS!!!
          Naive as I am, I want to know: IS THIS NORMAL?
          After the server is back, performance also returns to normal,
          but we were expecting a much smaller performance degradation.
          So I think we are doing something totally wrong!
          Do we need some network solution to make failover performance better?
          Or should we look more closely at the deployment descriptors or at the
          weblogic.system.executeThreadCount
          or weblogic.system.percentSocketReaders settings?
          Thanks in advance for any help!
          Fleming


See http://www.weblogic.com/docs51/cluster/setup.html#680201
          Basically, the rule of thumb is to set the number of execute threads ON
          THE CLIENT to 2 times the number of servers in the cluster and the
          percent socket readers to 50%. In your case with 8 WLS instances in the
          cluster, add the following to the java command line used to start your
          client:
          -Dweblogic.system.executeThreadCount=16
          -Dweblogic.system.percentSocketReaders=50
          Hope this helps,
          Robert
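The rule of thumb above is simple arithmetic. The hypothetical Python helper below (`client_jvm_flags` is not part of WebLogic) emits the two system properties for a given cluster size, assuming the rule is applied exactly as stated: 2 execute threads per clustered server and 50% socket readers.

```python
from typing import List

def client_jvm_flags(cluster_size: int, percent_socket_readers: int = 50) -> List[str]:
    """Apply the rule of thumb: 2 execute threads per clustered server, 50% socket readers.

    Returns the -D options to add to the client's java command line.
    """
    if cluster_size < 1:
        raise ValueError("cluster_size must be at least 1")
    return [
        f"-Dweblogic.system.executeThreadCount={2 * cluster_size}",
        f"-Dweblogic.system.percentSocketReaders={percent_socket_readers}",
    ]

if __name__ == "__main__":
    print(" ".join(client_jvm_flags(8)))
    # -> -Dweblogic.system.executeThreadCount=16 -Dweblogic.system.percentSocketReaders=50
```

For the two-server cluster described in the original question, the same rule gives an execute thread count of 4.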
          Fleming Frese wrote:
          > Hi Mike,
          >
          > thanks for your reply.
          >
          > We do not have HTTP clients or servlets, just EJBs and clients
          > in the same LAN, and the failover should be handled by the
          > replica-aware stubs. So we thought we did not need a proxy
          > solution for failover. Maybe we need DNS to handle failover,
          > if that would improve our performance?
          >
          > The timeout clue sounds reasonable, but I would expect the
          > stub to time out once and then switch to the other server for
          > subsequent requests. There should be a refresh (after 3 minutes?)
          > when the stub gets new information about the servers in the
          > cluster, so it could check then whether the server is back.
          > This works perfectly with load balancing: if a new server joins
          > the cluster, it automatically receives requests after a while.
          >
          > Fleming
          >
          > "Mike Reiche" <[email protected]> wrote:
          > >
          > >It sounds like every request is first timing out its
          > >connection attempt (10 seconds, perhaps?) on the 'down'
          > >instance before trying the second instance. How do requests
          > >fail over? Do you have Netscape, Apache, or IIS with a
          > >wlproxy module? Or do you simply have a DNS that takes care
          > >of that?
          > >
          > >Mike
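The difference between Mike's diagnosis (every call pays the connect timeout on the dead instance) and Fleming's expectation (time out once, then use the surviving server until a refresh) can be illustrated with a small model. This is a hypothetical Python sketch, not how WebLogic replica-aware stubs are implemented: `FailoverClient`, the injected `call` function, and the 180-second `retry_after` (taken from the "after 3 minutes?" guess) are all assumptions.

```python
import time
from typing import Callable, Dict, List

class FailoverClient:
    """Hypothetical client that avoids servers it has recently seen fail."""

    def __init__(self, servers: List[str], call: Callable[[str], str],
                 retry_after: float = 180.0,
                 clock: Callable[[], float] = time.monotonic):
        self.servers = servers
        self.call = call                # performs one request; raises ConnectionError on failure
        self.retry_after = retry_after  # seconds before a dead server is tried first again
        self.clock = clock
        self.dead_until: Dict[str, float] = {}

    def request(self) -> str:
        now = self.clock()
        # Live servers first; servers still inside their back-off window go last.
        ordered = sorted(self.servers, key=lambda s: self.dead_until.get(s, 0.0) > now)
        last_error = None
        for server in ordered:
            try:
                result = self.call(server)
                self.dead_until.pop(server, None)
                return result
            except ConnectionError as exc:
                # Pay the timeout once, then deprioritize this server until retry_after elapses.
                self.dead_until[server] = now + self.retry_after
                last_error = exc
        raise ConnectionError("all servers failed") from last_error
```

With this design, only the first request after a failure (and one request per `retry_after` window) waits on the dead server; without the `dead_until` bookkeeping, every request would try the dead server first and pay the full timeout, which matches the 10-15 second symptom.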

RE: Hard Failures, KeepAlive, and Failover --Follow-up

Hi,
It's a really challenging question. However, what do you want to do after
the network crash? Fail over, or just stop the service? Should we assume
that when the network is down, your name service is down too?
One idea is to use externalconnection to "listen" for your external non-Forte
alarm, and then do "whatever" you need after you receive the alarm, instead of
letting the "logical connection" time out or hang.
Regards,
Peter Sham.
-----Original Message-----
From: Michael Lee [SMTP:[email protected]]
Sent: Wednesday, June 16, 1999 12:44 AM
To: [email protected]
Subject: Hard Failures, KeepAlive, and Failover -- Follow-up
I've gotten a handful of responses to my original post, and the suggested
solutions are all variations on the same theme: periodically ping remote
nodes/partitions and then react when a node/partition goes down. In
other circumstances this would work, but unless I'm missing something, this
solution doesn't solve the problem I'm running into.
Some background...
When a connection is set up between partitions on two different nodes,
Forte effectively establishes two connections: a "physical connection"
over TCP/IP between two ports and a "logical connection" between the two
partitions (running on top of the physical connection). Once a connection
is established between two partitions, Forte assumes the logical connection
is valid until one of two things happens:
1) The logical connection is broken (by shutting down a partition from
Econsole/Escript, by killing a node manager, by terminating the ftexec,
etc.)
2) Forte detects that the physical connection is broken (via its KeepAlive
functionality).
If a physical connection is broken (via a cut cable or power-off
condition), and Forte has not yet detected the situation (via a KeepAlive
failure), the logical connection is still valid and Forte will still allow
method calls on the remote partition. In effect, Forte thinks the remote
partition is still up and running. In this situation, any method calls
made after the physical connection has been broken will simply hang. No
exceptions are generated and failover does not occur.
However, once a KeepAlive failure is detected, all is made right.
Unfortunately, the lower bound on KeepAlive detection latency is more than one
second, and we need to detect and react to hard failures in the 250-500 ms
range. Using technology outside of Forte, we are able to detect the hard
failures within the required times, but we haven't been able to get Forte
to react to this "outside" knowledge. Here's why:
Since Forte has not yet detected a KeepAlive failure, the logical
connection to the remote partition is still "valid". Although there are a
number of mechanisms that would allow a logical connection to be broken,
they all assume a valid physical connection -- which, of course, we don't
have!
It appears I'm in a "Catch-22" situation: in order to break a logical
connection between partitions, I need a valid physical connection. But the
reason I'm trying to break the logical connection in the first place is
that I know (but Forte doesn't yet know) that the physical connection has
been broken.
If anyone knows a way around this Catch-22, please let me know.
Mike
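Peter's suggestion amounts to not letting the caller block on the logical connection alone: wait on the remote call and on an outside failure signal at the same time. Below is a hypothetical Python sketch of that pattern. It is not Forte TOOL code, and `call_unless_peer_down` and `PeerDown` are illustrative names. The remote call runs in a worker thread, and the caller gives up as soon as an external detector (such as the 250-500 ms mechanism described above) sets an event. It does not repair Forte's logical connection; it only stops the caller from hanging.

```python
import concurrent.futures
import threading
import time
from typing import Callable, TypeVar

T = TypeVar("T")

class PeerDown(Exception):
    """Raised when an external detector reports that the remote node is down."""

def call_unless_peer_down(pool: concurrent.futures.Executor, fn: Callable[[], T],
                          peer_down: threading.Event, poll_interval: float = 0.05) -> T:
    """Run fn() in pool and wait for it, but give up once peer_down is set.

    The abandoned worker thread is not killed; it stays blocked until the
    underlying call returns or fails on its own (e.g. on a KeepAlive timeout).
    """
    future = pool.submit(fn)
    while True:
        done, _ = concurrent.futures.wait([future], timeout=poll_interval)
        if done:
            return future.result()  # re-raises fn's own exception, if any
        if peer_down.is_set():
            raise PeerDown("external detector reported the peer as down")

if __name__ == "__main__":
    release = threading.Event()    # stands in for a call that hangs on a cut cable
    peer_down = threading.Event()  # set by the outside failure detector
    with concurrent.futures.ThreadPoolExecutor(max_workers=1) as pool:
        threading.Timer(0.3, peer_down.set).start()  # simulated 300 ms detection
        start = time.monotonic()
        try:
            call_unless_peer_down(pool, lambda: release.wait(10), peer_down)
        except PeerDown:
            print(f"gave up after {time.monotonic() - start:.1f} s")
        release.set()  # let the stuck worker finish so the pool can shut down
```

The reaction time is bounded by the detector's latency plus `poll_interval`, independent of the transport's own KeepAlive timing.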
To unsubscribe, email '[email protected]' with
'unsubscribe forte-users' as the body of the message.
Searchable thread archive <URL:http://pinehurst.sageit.com/listarchive/>-

Make sure you chose the right format, and as far as partitioning is concerned, you have to select at least one partition, which will be the entire drive.

Cluster network name resource 'Cluster Name' failed registration of one or more associated DNS name(s) for the following reason: The handle is invalid.

I'm stuck here trying to figure this error out.
Windows Server 2003 domain, Windows Server 2012 Hyper-V (Core) cluster with 3 nodes. (I have two of these Hyper-V clusters: hvclust2012 is the problem cluster, and hvclust2008 is fine.)
In Failover Cluster Manager I see these errors, "Cluster network name resource 'Cluster Name' failed registration of one or more associated DNS name(s) for the following reason: The handle is invalid."
I restarted the host node that was listed as having the error, and then another node started showing the errors.
I tried to follow this site: http://blog.subvertallmedia.com/2012/12/06/repairing-a-failover-cluster-in-windows-server-2012-live-migration-fails-dns-cluster-name-errors/
Then this error shows up when doing the repair: "There was an error repairing the Active Directory object for 'Cluster Name'."
I looked at our domain controller and noticed I don't have access to local users and groups. I can access our other cluster, hvclust2008 (both clusters run the same version, 2012).
<image here>
I came upon this thread: http://social.technet.microsoft.com/Forums/en-US/85fc2ad5-b0c0-41f0-900e-df1db8625445/windows-2012-cluster-resource-name-fails-dns-registration-evt-1196?forum=winserverClustering
Now I'm stuck on adding a managed service account (MSA). I'm not sure if I'm way off track in trying to fix this. Any advice? Thanks in advance!
<image here>

Thanks Elton,
I restarted the three hosts after applying the hotfix. Then I did the steps below and got stuck on step 5. That is when I get the error (image above): "There was an error repairing the active directory object for 'Cluster Name'. For more data, see 'Information Details'."
To reset the password on the affected name resource, perform the following steps:
1. From Failover Cluster Manager, locate the name resource.
2. Right-click on the resource, and click Properties.
3. On the Policies tab, select If resource fails, do not restart, and then click OK.
4. Right-click on the resource, click More Actions, and then click Simulate Failure.
5. When the name resource shows "Failed," right-click on the resource, click More Actions, and then click Repair.
6. After the name resource is online, right-click on the resource, and then click Properties.
7. On the Policies tab, select If resource fails, attempt restart on current node, and then click OK.
Thanks

Cannot add multiple members of a failover cluster to a DFSR replication group

Server 2012 RTM. I have two physical servers in two separate data centers 35 miles apart, with a GbE link over metro fibre between them. Both have large (10 TB+) local RAID storage arrays, but given the physical separation there is no physical shared storage.
The hosts need to be in a Windows failover cluster (WSFC) so that I can run high-availability VMs and SQL Availability Groups across these two hosts for HA and DR. VM and SQL app data storage uses a SOFS (Scale-Out File Server) network share on separate servers.
I need to be able to use DFSR to replicate multi-TB user data file folders between the two local storage arrays on these two hosts for HA and DR. But when I try to add the second server to a DFSR replication group, I get the error:
The specified member is part of a failover cluster that is already a member of the replication group. You cannot add multiple members for the same cluster to a replication group.
I'm not clear why this has to be a restriction. I need to be able to replicate files somehow for HA and DR of the 10 TB+ of file storage. I can't use a clustered file server for file storage, as I don't have any shared storage on these two servers. Likewise, I can't run a single highly available DFSR target for the same reason (no shared storage), and in any case, that doesn't solve the problem of replicating files between the two hosts for HA and DR. DFSR is the solution for replicating file storage across servers with non-shared storage.
Why would there be a restriction against using DFSR between multiple hosts in a cluster, so long as you are not trying to replicate folders in a shared storage target accessible to both hosts (which would obviously be a problem)? So long as you are not replicating folders in c:\ClusterStorage, there should be no conflict.
Is there a workaround or alternative solution?

Yes, I read that series. But it doesn't address the issue. The article is about making a DFSR target highly available. That won't help me here.
I need to be able to use DFSR to replicate files between two different servers, with those servers being in a WSFC for the purpose of providing other clustered services (Hyper-V, SQL availability groups, etc.). DFSR should not interfere with this, but it is being blocked between nodes in the same WSFC for a reason that is not clear to me.
This is a valid use case, and I can't see an alternative solution when you only have two physical servers. Windows needs to be able to provide HA, DR, and replication of everything: VMs, SQL, and file folders. But this artificial barrier forces us to choose between clustered services and DFSR between nodes, and I can't see any rationale for blocking DFSR between cluster nodes, especially those without shared storage.
Perhaps this blanket block should be changed to a more selective block at the DFSR folder level, not the node level.

Selecting VHDx as storage for File Server Role (Failover Cluster 2012 R2)

Is it possible to select an already existing (offline) VHD or VHDX as storage when creating the "File Server" role? The reason I want to do that is that I already have a file server set up as a virtual machine, it is causing issues, and so my company decided to move to a File Server role.
Thank you
David

Hi David,
Do you mean you configured it as a file server failover cluster via the "High Availability Wizard"?
I think you need to choose a shared volume between the two nodes to achieve high availability.
Please refer to the following link:
http://technet.microsoft.com/en-us/library/cc731844(v=WS.10).aspx
If you do not select a shared volume, I think there is no difference from sharing a mounted VHDX file on a standalone file server.
I would suggest copying these files to a CSV and sharing them.
Hope it helps
Best Regards
Elton Ji

DSC, SQL Server 2012 Enterprise sp2 x64, SQL Server Failover Cluster Install not succeeding

Summary: DSC fails to fully install the SQL Server 2012 failover cluster, but the identical code snippet below, run in PowerShell ISE with administrator credentials, works perfectly, as does running the SQL Server installation interface.
In order to develop DSC configurations, I have set up a Windows Server 2012 R2 failover cluster in VMware Workstation v10 consisting of 3 nodes. All have the same Windows Server 2012 version and have been fully patched via Microsoft Updates.
The cluster fails over properly on command and passes validation. PowerShell 4.0, as installed with Windows, is being used.
PDC
Node1
Node2
The DSC script builds up the parameters to setup.exe for SQL Server. Here is the command that gets built:
$cmd2 = "C:\SOFTWARE\SQL\Setup.exe /Q /ACTION=InstallFailoverCluster /INSTANCENAME=MSSQLSERVER /INSTANCEID=MSSQLSERVER /IACCEPTSQLSERVERLICENSETERMS /UpdateEnabled=false /IndicateProgress=false /FEATURES=SQLEngine,FullText,SSMS,ADV_SSMS,BIDS,IS,BC,CONN,BOL /SECURITYMODE=SQL /SAPWD=password#1 /SQLSVCACCOUNT=SAASLAB1\sql_services /SQLSVCPASSWORD=password#1 /SQLSYSADMINACCOUNTS=`"SAASLAB1\sql_admin`" `"SAASLAB1\sql_services`" `"SAASLAB1\cubara01`" /AGTSVCACCOUNT=SAASLAB1\sql_services /AGTSVCPASSWORD=password#1 /ISSVCACCOUNT=SAASLAB1\sql_services /ISSVCPASSWORD=password#1 /ISSVCSTARTUPTYPE=Automatic /FAILOVERCLUSTERDISKS=MountRoot /FAILOVERCLUSTERGROUP='SQL Server (MSSQLSERVER)' /FAILOVERCLUSTERNETWORKNAME=SQLClusterLab1 /FAILOVERCLUSTERIPADDRESSES=`"IPv4;192.168.100.15;LAN;255.255.255.0`" /INSTALLSQLDATADIR=M:\SAN\SQLData\MSSQLSERVER /SQLUSERDBDIR=M:\SAN\SQLData\MSSQLSERVER /SQLUSERDBLOGDIR=M:\SAN\SQLLogs\MSSQLSERVER /SQLTEMPDBDIR=M:\SAN\SQLTempDB\MSSQLSERVER /SQLTEMPDBLOGDIR=M:\SAN\SQLTempDB\MSSQLSERVER /SQLBACKUPDIR=M:\SAN\Backups\MSSQLSERVER > C:\Logs\sqlInstall-log.txt "
Invoke-Expression $cmd2
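The command above is assembled as one long string that mixes single quotes, backtick-escaped double quotes, and output redirection before it is handed to Invoke-Expression, which makes it hard to see exactly which arguments setup.exe receives in each context. One way to make such a command easier to inspect and compare is to build it from a parameter table. The Python sketch below shows the idea; `build_setup_cmdline` and its quoting rules are hypothetical, it covers only a few of the parameters, and it is not known to fix the DSC failure.

```python
def build_setup_cmdline(exe, params):
    """Build a setup.exe command line from an ordered parameter table.

    Hypothetical helper for illustration only:
      * True          -> bare switch, e.g. /Q
      * list of str   -> /KEY="a" "b" (each item double-quoted)
      * anything else -> /KEY=value, double-quoted if it contains a space
    """
    parts = [exe]
    for key, value in params.items():
        if value is True:
            parts.append(f"/{key}")
        elif isinstance(value, list):
            parts.append(f"/{key}=" + " ".join(f'"{item}"' for item in value))
        else:
            text = str(value)
            if " " in text:
                text = f'"{text}"'
            parts.append(f"/{key}={text}")
    return " ".join(parts)


params = {
    "Q": True,
    "ACTION": "InstallFailoverCluster",
    "INSTANCENAME": "MSSQLSERVER",
    "FAILOVERCLUSTERGROUP": "SQL Server (MSSQLSERVER)",
    "SQLSYSADMINACCOUNTS": [r"SAASLAB1\sql_admin", r"SAASLAB1\sql_services"],
}
print(build_setup_cmdline(r"C:\SOFTWARE\SQL\Setup.exe", params))
```

Printing the result in both the ISE run and the DSC resource gives a single string to diff, instead of reasoning about how each host parses the nested quotes.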
When I run this specific command in Powershell ISE running as administrator, logged in as domain account that is in the Node1's administrators group and has domain administrative authority, it works perfectly fine and sets up the initial node properly.
When I paste the EXACT SAME code above into my custom DSC resource, as a test with a known-successful install, and run it as the same user as above, it does NOT install the cluster properly. It still installs 17 applications related to SQL Server and seems to configure everything correctly except the cluster. Failover Cluster Manager shows that the SQL Server role will not come online, and the SQL Server Agent role is not created.
The code is run on Node1 so the setup folder is local to Node1.
The ConfigurationFile.ini files for the two types of installs are identical.
Summary.txt does show errors:
Feature:                       Database Engine Services
Status:                        Failed: see logs for details
Reason for failure:            An error occurred during the setup process of the feature.
Next Step:                     Use the following information to resolve the error, uninstall this feature, and then run the setup process again.
Component name:                SQL Server Database Engine Services Instance Features
Component error code:          0x86D8003A
Error description:             The cluster resource 'SQL Server' could not be brought online. Error: There was a failure to call cluster code from a provider. Exception message: Generic failure . Status code: 5023. Description: The group or resource is not in the correct state to perform the requested operation. .
It feels like this is a security issue with DSC or an issue with the SQL Server setup, but please note that I have granted Administrators group and domain administrator authority. The nodes were built with the same login. Windows Firewall is completely disabled.
Please let me know if any more detail is required.

Hi Lydia,
Thanks for your interest and help.
I tried "Option 3 (recommended)" and that did not help.
The issue I encounter with the failover cluster only occurs when trying to install with DSC!
Using the SQL Server Installation Wizard, the command prompt, and even PowerShell to invoke setup.exe all work perfectly.
So, to reiterate, this issue only occurs while running in the context of DSC.
I am using the same domain login with Domain Admin security, and locally the account is a member of the Administrators group. The SQL Server service account is also a member of the Administrators group.

Failover is not working in clustering

We installed the infrastructure on one system and added two instances, app1.mycompany.com and app2.mycompany.com, to it.
For load balancing we are using Web Cache.
We configured origin servers, site definitions, and site-server mappings.
Both instances show up in the cluster, as we can see in the health monitor (the Up/Down* parameter) of the Web Cache Administrator console.
We deployed the same EAR to both instances.
But when I bring one instance down, say app1.mycompany.com, the health monitor does not show DOWN for host app1.mycompany.com, and the same happens for UP.
The changes do not show up immediately when I am testing failover.
Is Web Cache load balancing round-robin based?
When I bring one of the instances down, session replication does not work properly; sometimes "session expired" appears.
When both instances are up, all of a user's requests go to one instance, and if I bring that instance down, "session expired" appears.
I think failover is not working in the cluster.
I checked the replication properties and added the <distributable> tag in both instances.
On the Web Cache console page, what does session binding do? I have not configured anything.

Why are you using Web Cache?
Web Cache will certainly work, but its more common role is to act as a simple load balancer over HTTP servers, not OC4J instances.
What I'd do is to simplify your situation to verify you have the server setup correctly.
That means using the Oracle HTTP Server (OHS), which will be part of your cluster, as the common routing point. OHS and mod_oc4j are session-state aware and know about all the OC4J instances. If an OC4J instance dies for some reason, mod_oc4j knows which other OC4J instance(s) the request can be routed to in order to pick up the replicated session state.
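The session-aware routing described above can be sketched in Python to show why it matters for the "session expired" symptom. This is a hypothetical model, not how mod_oc4j or Web Cache are actually implemented; the class and method names are made up. New sessions are spread round-robin, existing sessions stick to their instance, and when that instance dies the request goes to a live instance holding a replica. With no live replica, the session is lost.

```python
from itertools import cycle


class SessionRouter:
    """Hypothetical session-aware router; not mod_oc4j's actual algorithm."""

    def __init__(self, instances):
        self.instances = list(instances)
        self.up = set(self.instances)     # instances currently alive
        self.owner = {}                   # session id -> instance serving it
        self.replicas = {}                # session id -> instances holding a copy
        self._rr = cycle(self.instances)  # round-robin for new sessions

    def replicate(self, session_id, instance):
        # Record that `instance` holds a replicated copy of the session state.
        self.replicas.setdefault(session_id, set()).add(instance)

    def route(self, session_id):
        owner = self.owner.get(session_id)
        if owner in self.up:
            return owner  # sticky: keep the session on its current instance
        if owner is not None:
            # Owner died: fail over to a live instance that has a replica.
            live = sorted(self.replicas.get(session_id, set()) & self.up)
            if not live:
                raise LookupError(f"session {session_id} expired: no live replica")
            self.owner[session_id] = live[0]
            return live[0]
        # New session: pick the next live instance round-robin.
        for _ in range(len(self.instances)):
            candidate = next(self._rr)
            if candidate in self.up:
                self.owner[session_id] = candidate
                return candidate
        raise LookupError("no live instances")


router = SessionRouter(["app1", "app2"])
print(router.route("s1"))       # app1 (new session, round-robin)
router.replicate("s1", "app2")  # app2 holds a copy of s1's state
router.up.discard("app1")       # app1 goes down
print(router.route("s1"))       # app2 (failover to the replica holder)
```

A router that sticks sessions to an instance but does not know where replicas live would reach the `LookupError` branch, which corresponds to the "session expired" behaviour in the question.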
Once you have verified that the failover is working on the backend, you can then configure another OHS instance and position webcache in front of them to act as a request router and failover handler for when the OHS instances are inactive.
The Enterprise Deployment Guide offers some guidance on typical architectures and is well worth a read.
cheers
-steve-

Automatic failover doesn't fail back to the first server if the second server is lost.

Hi Everybody,
We use database mirroring a lot in our product solutions, and we have recently seen strange behaviour in our failover tests with SQL 2008 R2.
We have 2 servers running Windows 2008 R2 Standard and SQL 2008 R2 Standard SP2 (let's call them DB1 and DB2).
We also have a witness workstation running SQL 2008 Express on Windows 7.
A database from DB1 is mirrored to DB2 in "safety full" mode, with witness. At this stage, the database is principal on DB1 and mirror on DB2
To test the automatic failover, we first restart the DB1 server which has the database in principal mode
After a few seconds, the database on DB2 becomes principal, which is normal; that's exactly what we want.
After a few minutes, DB1 comes back online and its database takes the mirror role (still OK). At this stage, the database is principal on DB2 and mirror on DB1.
When the monitoring application shows that the mirror is synchronized and that both servers are connected to the witness, we restart DB2 to trigger an automatic failover to DB1.
What we see is that DB1 never takes the principal role and the database stays in mirror.
In the DB1 ERRORLOG, I see only these two lines when DB2 disappears, and no other message related to the mirroring session:
2014-01-22 08:57:26.91 spid43s Starting up database 'Test123'.
2014-01-22 08:57:26.95 spid43s Bypassing recovery for database 'Test123' because it is marked as a mirror database, which cannot be recovered. This is an informational message only. No user action is required.
When DB2 comes back online, the database on DB2 keeps its principal status and the database on DB1 stays mirror.
And what is really, really strange is that if I restart DB2 once again directly after that, DB1 fails over normally and the database on DB1 takes the principal role after a few seconds, without any configuration changes between the two restarts.
The DB1 ERRORLOG then shows:
2014-01-22 09:00:37.53 spid29s Error: 1474, Severity: 16, State: 1.
2014-01-22 09:00:37.53 spid29s Database mirroring connection error 4 'An error occurred while receiving data: '64(The specified network name is no longer available.)'.' for 'TCP://DB2:5022'.
2014-01-22 09:00:37.53 spid18s Database mirroring is inactive for database 'Test123'. This is an informational message only. No user action is required.
2014-01-22 09:00:42.37 spid32s The mirrored database "Test123" is changing roles from "MIRROR" to "PRINCIPAL" due to Auto Failover.
2014-01-22 09:00:42.39 spid32s Recovery is writing a checkpoint in database 'Test123' (7). This is an informational message only. No user action is required.
2014-01-22 09:00:42.39 spid32s Recovery completed for database Test123 (database ID 7) in 78 second(s) (analysis 0 ms, redo 0 ms, undo 7 ms.) This is an informational message only. No user action is required.
So, if I summarize,
- a first failover from DB1 to DB2 always works
- then, a restart of DB2 never fails over to DB1
- a second restart of DB2 always fails over to DB1
This is pretty much systematic on one of our server pairs.
Any explanation for this, or any idea where I can look to find the reason for this strange behavior?
Thanks a lot for your help
Seb

Thank you Tom
But I have already checked that and reported the Errorlog abstracts in my original post.
When DB01 disappears for the first time, there is nothing in the DB01 ERRORLOG (it is restarting :-) )
AND no particular error message in the DB02 ERRORLOG (nothing related to the fact that DB01 is not reachable anymore!!!)
Only these two lines:
2014-01-22 08:57:26.91 spid43s Starting up database 'Test123'.
2014-01-22 08:57:26.95 spid43s Bypassing recovery for database 'Test123' because it is marked as a mirror database, which cannot be recovered. This is an informational message only. No user action is required.
So my main question remains: why doesn't DB02 detect that DB01 disappears (the first time only), and why doesn't the failover mechanism trigger the failover?
Thank you
Seb
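One way to narrow this down is to check each documented precondition for automatic failover at the moment the principal goes away. SQL Server's database mirroring documentation lists them as: the session runs in high-safety mode with a witness, the mirror database is already synchronized, and the mirror and witness retain quorum after both lose the principal. The Python sketch below encodes only those rules. The field names are hypothetical and nothing here queries a real server, although the corresponding states can be read from the `mirroring_state_desc` and `mirroring_witness_state_desc` columns of `sys.database_mirroring`.

```python
from dataclasses import dataclass


@dataclass
class MirrorView:
    """What the mirror server knows when the principal disappears.

    Hypothetical field names modelling the documented preconditions for
    automatic failover in SQL Server database mirroring.
    """
    safety_full: bool             # session runs in high-safety (SAFETY FULL) mode
    has_witness: bool             # a witness is configured for the session
    synchronized: bool            # mirror was SYNCHRONIZED before the loss
    witness_connected: bool       # mirror still has a connection to the witness
    witness_sees_principal: bool  # witness can still reach the principal


def can_auto_failover(v: MirrorView) -> bool:
    # Mirror and witness must form quorum without the principal.
    quorum_without_principal = v.witness_connected and not v.witness_sees_principal
    return v.safety_full and v.has_witness and v.synchronized and quorum_without_principal


# Expected case: mirror and witness agree the principal is gone.
print(can_auto_failover(MirrorView(True, True, True, True, False)))   # True
# Mirror has not (re)connected to the witness: it stays MIRROR.
print(can_auto_failover(MirrorView(True, True, True, False, False)))  # False
```

If capturing these states on DB1 just before the failing DB2 restart shows any condition as false, such as the mirror-to-witness connection not being fully re-established after DB1 came back, the mirror would be expected to stay in the MIRROR role.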
