Failover Failure.

First, let me apologize for my ignorance; please assume I know nothing! I am managing a Hyper-V environment that someone else configured. We are experiencing an issue that I will try to describe to the best of my ability.
We have two physical locations that each contain a SAN and three Hyper-V hosts. We have three CSVs (high, medium, and low priority) as well as a witness disk. One location is our primary location (PL) and the other is our disaster recovery location (DRL).
If the VMs and the CSVs reside on the hosts at the PL and all the hosts at the PL go down, everything fails over to our DRL and comes back up with no problems. However, if our DRL goes down, EVEN IF EVERYTHING IS RUNNING AT THE PRIMARY LOCATION, everything goes down! All of the VMs attempt to fail over to the DRL that went offline! I am very confused by this! It has caused major problems a couple of times and I have no idea what to do!
Any help would be greatly appreciated.
Thanks!

Although you don't mention it, I assume there is some storage replication involved here?
If so, you should engage with the vendor to pinpoint the configuration that is causing this.
Using the Microsoft stack in the same scenario, you would use Hyper-V Replica with Azure Site Recovery and System Center, which would control and orchestrate the disaster recovery scenarios for you.
Also note that I am referring to disaster recovery, not high availability, because the MS solutions are DR and not HA across sites.
-kn
Kristian (Virtualization and some coffee: http://kristiannese.blogspot.com )

Similar Messages

IPMP failover Failure

I have one question about IPMP under Solaris 9 9/04 SPARC 64-bit.
My OS: with EIS 3.1.1 patches
Clusterware: Sun Cluster 3.1u4 with EIS 3.1.1 patches
My IPMP group contains two NICs: ce0 & ce3.
Both NICs are connected to a Cisco 4506.
The IPMP configuration files are as follows:
/etc/hostname.ce0
lamp-test2 netmask + broadcast + group ipmp1 deprecated -failover up
/etc/hostname.ce3
lamp netmask + broadcast + group ipmp1 up \
addif lamp-test1 netmask + broadcast + deprecated -failover up
I always use the default in.mpathd configuration file.
But as soon as I pull the cable from a ceN NIC, the IPMP group complains:
Mar 18 18:06:34 lamp in.mpathd[2770]: [ID 215189 daemon.error] The link has gone down on ce0
Mar 18 18:06:34 lamp in.mpathd[2770]: [ID 594170 daemon.error] NIC failure detected on ce0 of group ipmp1
Mar 18 18:06:34 lamp ip: [ID 903730 kern.warning] WARNING: IP: Hardware address '00:03:ba:b0:5d:54' trying to be our address 192.168.217.020!
Mar 18 18:06:34 lamp in.mpathd[2770]: [ID 832587 daemon.error] Successfully failed over from NIC ge0 to NIC ce0
Mar 18 18:06:34 lamp ip: [ID 903730 kern.warning] WARNING: IP: Hardware address '00:03:ba:b0:5d:54' trying to be our address 192.168.217.020!
Why does Solaris report a hardware address conflict?
I am sure these IPMP configuration files work fine with a Cisco 2950 and a D-Link mini switch.
By the way, there are no duplicate MAC addresses on the LAN.
Should I modify some Cisco parameters?
Your advice is much appreciated!

lo0: flags=1000849<UP,LOOPBACK,RUNNING,MULTICAST,IPv4> mtu 8232 index 1
     inet 127.0.0.1 netmask ff000000
ce0: flags=9040843<UP,BROADCAST,RUNNING,MULTICAST,DEPRECATED,IPv4,NOFAILOVER> mtu 1500 index 2
     inet 192.168.217.6 netmask ffffff00 broadcast 192.168.217.255
     groupname ipmp1
     ether 0:3:ba:b0:5d:54
ce3: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 index 4
     inet 192.168.217.20 netmask ffffff00 broadcast 192.168.217.255
     groupname ipmp1
     ether 0:3:ba:95:5d:6e
ce3:1: flags=9040843<UP,BROADCAST,RUNNING,MULTICAST,DEPRECATED,IPv4,NOFAILOVER> mtu 1500 index 4
     inet 192.168.217.4 netmask ffffff00 broadcast 192.168.217.255
Generally speaking:
When I switch the floating IP from ce0 to ce3, IPMP reports that the ce0 MAC is "trying to be our address ....", then the ce0 test IP fails and the floating IP does not fail over.
When I switch the floating IP from ce3 to ce0, IPMP reports that the ce3 MAC is "trying to be our address ....", then the ce0 test IP fails and the floating IP does not fail over.
My guess is that the floating NIC's MAC and address information is cached in the Cisco device's RAM and not released in time.

ACE 4710 FT failover failure

Hello,
I am running redundant ACE 4710 appliances on A3(2.7). I have five FT groups configured along with FT tracking, and when the VLANs fail because physical links are down, the contexts do not fail over. If one of the ACE boxes fails completely, failover works fine. I have included the FT config from one of the contexts below. I have a case open with TAC, and the engineer is suggesting the use of a query interface in addition to FT tracking. We have had two incidents on separate contexts where we lost a physical interface on the primary ACE, one during maintenance of the core switch and the other from a cable disconnect, and we are unable to understand why the individual context didn't fail over. Any ideas would be much appreciated. Let me know if more info or configs are needed.
Dave
ft interface vlan 900
ip address 10.10.10.1 255.255.255.0
peer ip address 10.10.10.2 255.255.255.0
no shutdown
ft peer 1
heartbeat interval 300
heartbeat count 20
ft-interface vlan 900
ft group 3
peer 1
no preempt
priority 210
peer priority 120
associate-context XYZ
inservice
FT Group                     : 3
No. of Contexts             : 1
Context Name                 : XYZ
Context Id                   : 2
Configured Status           : in-service
Maintenance mode             : MAINT_MODE_OFF
My State                   : FSM_FT_STATE_ACTIVE
My Config Priority           : 210
My Net Priority             : 210
My Preempt                   : Disabled
Peer State                   : FSM_FT_STATE_STANDBY_HOT
Peer Config Priority         : 120
Peer Net Priority           : 120
Peer Preempt                 : Disabled
Peer Id                     : 1
Last State Change time       : Wed Jan 11 13:14:16 2012
Running cfg sync enabled     : Enabled
Running cfg sync status     : Running configuration sync has completed
Startup cfg sync enabled     : Enabled
Startup cfg sync status     : Startup configuration sync has completed
Bulk sync done for ARP: 0
Bulk sync done for LB: 0
Bulk sync done for ICM: 0
show int
vlan424 is up, VLAN up on the physical port
Hardware type is VLAN
MAC address is 00:1e:68:1e:ba:b7
Virtual MAC address is 00:0b:fc:fe:1b:03
Mode : routed
IP address is 10.104.224.6 netmask is 255.255.255.0
FT status is active
Description:"New Server VIP and real"
MTU: 1500 bytes
Last cleared: never
Last Changed: Sun Mar 11 01:13:12 2012
No of transitions: 3
Alias IP address is 10.104.224.5 netmask is 255.255.255.0
Peer IP address is 10.104.224.7 Peer IP netmask is 255.255.255.0
Assigned on the physical port, up on the physical port
Previous State: Sun Mar 11 00:04:57 2012, VLAN not up on the physical port
Previous State: Sun Sep 18 10:21:15 2011, administratively up
     3991888419 unicast packets input, 23734607976687 bytes
     20246934 multicast, 174801 broadcast
     0 input errors, 0 unknown, 0 ignored, 0 unicast RPF drops
     1609345958 unicast packets output, 23690663385228 bytes
     7 multicast, 55807 broadcast
     0 output errors, 0 ignored

Dave,
For tracking to work, you need to have preempt enabled. Can you try enabling preempt under the FT group and test your tracking again? Another potential issue is that your tracking may not lower the priority enough when it fails. The difference between the active and standby priorities in your config is 90 (210 versus 120). If tracking does not decrement the priority by more than this value, then even with preempt enabled the priority will not drop far enough to force the failover. If tracking still does not work as expected after you enable preempt on this group, send your whole config for us to look at.
Regarding the query interface: this is not a bad idea. It will help prevent an active-active situation if there is a problem with the FT link between the two modules.
Thanks
Jim
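To make the priority arithmetic in the reply above concrete, here is a minimal sketch in Python. It is a simplified model, not ACE code: the function names and the decrement values (50 and 110) are hypothetical, and it assumes, as the reply states, that a tracking-driven switchover needs preempt enabled and that the active member's net priority must fall below the standby's.

```python
def net_priority(configured: int, tracking_decrement: int) -> int:
    """Configured priority minus the penalty applied by failed tracked objects."""
    return max(configured - tracking_decrement, 0)


def tracking_triggers_failover(active_priority: int, standby_priority: int,
                               tracking_decrement: int, preempt: bool) -> bool:
    """Simplified model: failover needs preempt and a net priority below the peer's."""
    if not preempt:
        return False
    return net_priority(active_priority, tracking_decrement) < standby_priority


# Priorities taken from the posted FT group 3 config: 210 (local) and 120 (peer).
ACTIVE, STANDBY = 210, 120

# Hypothetical decrement of 50: net priority 160 is still above 120.
print(tracking_triggers_failover(ACTIVE, STANDBY, 50, preempt=True))
# Hypothetical decrement of 110: net priority 100 is below 120.
print(tracking_triggers_failover(ACTIVE, STANDBY, 110, preempt=True))
# Same decrement, but preempt disabled as in the posted config.
print(tracking_triggers_failover(ACTIVE, STANDBY, 110, preempt=False))
```

In this model only the second case results in a failover, which matches the two conditions named in the reply: preempt on, and a decrement large enough to cross the peer's priority.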

Storage issues during live clone in Server 2012 R2

We just set up a new 2012 R2 cluster and we are having troubles with live cloning. I have seen it work in our environment but usually it fails.
We get this error in our cluster events:
Cluster Shared Volume 'Volume8' ('Volume8') has entered a paused state because of '(c0000435)'. All I/O will temporarily be queued until a path to the volume is reestablished.
and the clone fails with:
Error (2916)
VMM is unable to complete the request. The connection to the agent ServerName was lost.
WinRM: URL: [http://ServerPath], Verb: [INVOKE], Method: [GetError], Resource: [http://schemas.microsoft.com/wbem/wsman/1/wmi/root/microsoft/bits/BitsClientJob?JobId={DC6D9530-26F1-4F95-A1AB-0197B1406F98}]
Not found (404) (0x80190194)
The storage is a Dell EqualLogic PS6500, and it is connected to two Force10 S55 48-port switches. The client side of the network is connected to two Cisco 2960G 48-port switches.
I can't find any information on error c0000435. Has anyone heard of this issue?

Hi,
I am Chetan Savade from Symantec Technical Support Team.
A few issues have been reported with older versions of SEP. I recommend testing with the latest version of SEP; SEP 12.1 RU4 MP1b is the latest version.
Reported issues:
Cluster environment does not fail over
Fix ID: 2731793
Symptom: A cluster environment does not fail over when Symantec Endpoint Protection client is installed due to inability to unload drivers.
Solution: Modified a driver to properly detach from a volume when the volume dismounts
Reference:
http://www.symantec.com/docs/TECH199676
Cluster is unable to fail over with AutoProtect enabled
Fix ID: 3246552
Symptom: With AutoProtect enabled, an active cluster node cannot fail over and hangs.
Solution: Corrected a delay in the AutoProtect volume dismount that resulted in cluster failover failures
http://www.symantec.com/docs/TECH211972
Best Regards,
Chetan

RE: Hard Failures, KeepAlive, and Failover --Follow-up

Hi,
It's a really challenging question. However, what do you want to do after
the network crash: fail over, or just stop the service? Should we assume
that when the network is down, your name service is down as well?
One idea is to use externalconnection to "listen" for your external non-Forte
alarm and do "whatever" when you receive the alarm, instead of letting the
"logical connection" time out or hang.
Regards,
Peter Sham.
-----Original Message-----
From: Michael Lee [SMTP:[email protected]]
Sent: Wednesday, June 16, 1999 12:44 AM
To: [email protected]
Subject: Hard Failures, KeepAlive, and Failover -- Follow-up
I've gotten a handful of responses to my original post, and the suggested
solutions are all variations on the same theme -- periodically ping remote
nodes/partitions and then react when the node/partition goes down. In
other circumstances this would work, but unless I'm missing something this
solution doesn't solve the problem I'm running into.
Some background...
When a connection is set up between partitions on two different nodes,
Forte is effectively establishing two connections: a "physical connection"
over TCP/IP between two ports and a "logical connection" between the two
partitions (running on top of the physical connection). Once a connection
is established between two partitions Forte assumes the logical connection
is valid until one of two things happens:
1) The logical connection is broken (by shutting down a partition from
Econsole/Escript, by killing a node manager, by terminating the ftexec,
etc.)
2) Forte detects that the physical connection is broken (via its KeepAlive
functionality).
If a physical connection is broken (via a cut cable or power-off
condition), and Forte has not yet detected the situation (via a KeepAlive
failure), the logical connection is still valid and Forte will still allow
method calls on the remote partition. In effect, Forte thinks the remote
partition is still up and running. In this situation, any method calls
made after the physical connection has been broken will simply hang. No
exceptions are generated and failover does not occur.
However, once a KeepAlive failure is detected all is made right.
Unfortunately, the lowest-bound latency of KeepAlive is greater than one
second, and we need to detect and react to hard failures in the 250-500ms
range. Using technology outside of Forte we are able to detect the hard
failures within the required times, but we haven't been able to get Forte
to react to this "outside" knowledge. Here's why:
Since Forte has not yet detected a KeepAlive failure, the logical
connection to the remote partition is still "valid". Although there are a
number of mechanisms that would allow a logical connection to be broken,
they all assume a valid physical connection -- which, of course, we don't
have!
It appears I'm in a "Catch-22" situation: In order to break a logical
connection between partitions, I need a valid physical connection. But the
reason I'm trying to break the logical connection in the first place is
that I know (but Forte doesn't yet know) that the physical connection has
been broken.
If anyone knows a way around this Catch-22, please let me know.
Mike
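The Catch-22 described above comes down to who decides that a peer is dead. As a language-neutral illustration, here is a minimal Python sketch of an application-level failure detector with a sub-second deadline that callers consult before making a remote call. None of this is Forte API: `FailureDetector`, `call_remote`, and the 0.3 s timeout are hypothetical names and values, and the clock is injected so the behaviour can be shown without real networking.

```python
import time
from typing import Callable


class FailureDetector:
    """Declares a peer dead when no heartbeat has arrived within timeout_s."""

    def __init__(self, timeout_s: float,
                 clock: Callable[[], float] = time.monotonic):
        self.timeout_s = timeout_s
        self.clock = clock
        self.last_heartbeat = clock()

    def heartbeat(self) -> None:
        """Record that the peer was heard from just now."""
        self.last_heartbeat = self.clock()

    def is_alive(self) -> bool:
        return (self.clock() - self.last_heartbeat) <= self.timeout_s


def call_remote(detector: FailureDetector,
                primary: Callable[[], str],
                backup: Callable[[], str]) -> str:
    """Choose the target before calling, so a dead primary is never invoked."""
    if detector.is_alive():
        return primary()
    return backup()


# Simulated clock so the example runs instantly and deterministically.
now = [0.0]
detector = FailureDetector(timeout_s=0.3, clock=lambda: now[0])

# Heartbeat is fresh, so the primary is used.
print(call_remote(detector, lambda: "primary", lambda: "backup"))

# 500 ms pass with no heartbeat; the detector has timed out, so the backup is used.
now[0] += 0.5
print(call_remote(detector, lambda: "primary", lambda: "backup"))
```

The sketch sidesteps the logical connection entirely by choosing the target before the call is made. Whether Forte exposes a hook for doing that is not something this thread establishes, and a call that is already blocked on the broken connection is not helped by it.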
To unsubscribe, email '[email protected]' with
'unsubscribe forte-users' as the body of the message.
Searchable thread archive <URL:http://pinehurst.sageit.com/listarchive/>-

Make sure you choose the right format, and as far as partitioning is concerned, you have to select at least one partition, which will be the entire drive.

NAC Server and Manager Failure with out failover

Hi, I'm working on a NAC L2 OOB wired design with one CAM and one CAS. I have not included failover in the design for the obvious financial reasons, and I want to figure out the effect a failure would have on the network.
1.) What would users experience in the event of a CAS failure, both currently online users and new users?
2.) What would users experience in the event of a CAM failure, both currently online users and new users?
3.) Are there any ideas on how to minimize the effect on users in the event of a failure without adding a failover bundle?
Many thanks for your valuable input in advance.
Din

If you are OOB, then a CAS failure would not affect logged-in, remediated users. Anyone not logged in would be stuck, because when the CAS fails, connectivity to the CAM is lost.
If the CAM fails, you will not be able to log in, do remediation, or anything else. VLAN settings on switches will be frozen where they are at the moment of the CAM failure. Note that you could easily connect to the switches and change VLANs to allow users onto the LAN, and the CAM would accept that passively when restarted, but if you use the Agent it will probably want to log in again, which is not a huge issue if you use AD SSO.
Dan Sichel
Dan S.

Disk Witness Failover with simulated network failure

Hello everyone,
I am running two Windows Server 2012 R2 machines clustered with a disk witness. The disk witness is a LUN created on our SAN and presented to both machines. The SAN is connected through two Fibre Channel links per server, and both servers are networked on split-out network connections teamed together. On a hard server fault or simulated power failure (i.e. pulling the power cords from a server), the disk witness fails over to the second node and the cluster survives. But when simulating a network card failure (disconnecting the Cat5 cable) on the node hosting the disk witness, I see the cluster attempt to move hosting to the node that is still up, but the disk witness will not come online. I think the issue is that the server is technically still running and the disk witness has not really failed on the primary host, so it never releases ownership. I am new to clustering and could use a little guidance here. Is there any way to make the disk witness
Thank you

Hi,
Please try modifying the network settings for the failover cluster:
1. In the Failover Cluster Manager snap-in, if the cluster that you want to configure is not displayed, in the console tree, right-click Failover Cluster Manager, click Manage a Cluster, and then select or specify the cluster that you want.
2. If the console tree is collapsed, expand the tree under the cluster that you want to configure.
3. Expand Networks.
4. Right-click the network that you want to modify settings for, and then click Properties.
5. If needed, change the name of the network.
6. Select one of the following options:
- Allow cluster network communication on this network
If you select this option and you want the network to be used by the nodes only (not clients), clear Allow clients to connect through this network. Otherwise, make sure it is selected.
- Do not allow cluster network communication on this network
Select this option if you are using a network only for iSCSI (communication with storage) or only for backup. (These are among the most common reasons for selecting this option.)
Quote from:
Modify Network Settings for a Failover Cluster
http://technet.microsoft.com/en-us/library/cc725775.aspx
Hope this helps.

Does Weblogic12c support Application Failover ? If yes, then how does Weblogic12c detect an application failure (OutOfMemoryException)?

Hi all,
I need help setting up high availability at my workplace. Can somebody please tell me whether WebLogic 12c supports application failover?
If yes, then how does Weblogic12c detect an application failure (OutOfMemoryException)?

Hi there user,
you can achieve HA at different levels:
1. On a single machine: here you need to set up Node Manager. When a server is started by Node Manager, any server failure will be detected and Node Manager will try to restart it. OOME (java.lang.OutOfMemoryError) is thrown by the JVM, and the server state should go to FAILED at some point, after which Node Manager will try to restart it. Node Manager is the simplest HA solution you can, and must, implement for a production environment.
2. On redundant machines: you can configure WLS clustering, but you will need a more complex environment, i.e. a load balancer in front of the cluster to reverse-proxy the requests. This scenario can also use Node Manager to control the WLS instances on each machine.
3. Cluster with server/service migration: the most complex scenario, where in case of machine failure the WLS cluster can "relocate" resources (services and whole servers) to spare machines.
In your case, an OOME should cause the JVM, and therefore WLS, to become unresponsive, so Node Manager will detect this at some point and try to restart WLS.
Hope this helps,
A.
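The restart behaviour described in point 1 can be pictured as a small supervision loop. The Python sketch below is a toy model only; it does not use or mirror Node Manager's actual configuration or API. `ManagedServer`, `supervise`, and the `max_restarts` limit are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class ManagedServer:
    name: str
    state: str = "RUNNING"   # RUNNING or FAILED in this toy model
    restarts: int = 0

    def fail(self) -> None:
        """Simulate the server going to FAILED, e.g. after an OutOfMemoryError."""
        self.state = "FAILED"

    def start(self) -> None:
        """Bring the server back and count the restart."""
        self.state = "RUNNING"
        self.restarts += 1


def supervise(server: ManagedServer, max_restarts: int) -> str:
    """One monitoring pass: restart a FAILED server unless the limit is reached."""
    if server.state != "FAILED":
        return "healthy"
    if server.restarts >= max_restarts:
        return "gave up"
    server.start()
    return "restarted"


server = ManagedServer("managed1")
print(supervise(server, max_restarts=2))  # nothing to do while RUNNING

server.fail()
print(supervise(server, max_restarts=2))  # first restart

server.fail()
print(supervise(server, max_restarts=2))  # second restart

server.fail()
print(supervise(server, max_restarts=2))  # limit reached, server stays FAILED
```

A restart cap like this is one common way to stop a supervisor from looping forever on a server that fails immediately after every start, for example when the heap is simply too small.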

Data Guard Failover after primary site network failure or disconnect.

Hello Experts:
I'll try to be clear and specific with my issue:
Environment:
Two nodes with NO shared storage (I don't have an Observer running).
Veritas Cluster Server (VCS) with the Data Guard agent. (I don't use the Broker. The Data Guard agent "takes care" of the switchover and failover.)
Two single instance databases, one per node. NO RAC.
What I am able to perform with no issues:
Manual switchover of the primary database by running the VCS command "hagrp -switch oraDG_group -to standby_node"
Automatic failover when the primary node is rebooted with "reboot" or "init"
Automatic failover when the primary node is shut down with "shutdown".
What I am NOT able to perform:
Automatic failover when I manually unplug the network cables from the primary site (the entire network, not only the link between the primary and standby nodes, so it is like unplugging the server from its power source).
The same situation happens if I manually disconnect the server from the power.
These are the alert logs I have:
This is the portion of the alert log at Standby site when Real Time Replication is working fine:
Recovery of Online Redo Log: Thread 1 Group 4 Seq 7 Reading mem 0
Mem# 0: /u02/oracle/fast_recovery_area/standby_db/onlinelog/o1_mf_4_9c3tk3dy_.log
At this moment, node1 (primary) is completely disconnected from the network. Note at the end that the standby database (which should be converted to PRIMARY) is not getting all the archived logs from the primary because of the abnormal disconnect from the network:
Identified End-Of-Redo (failover) for thread 1 sequence 7 at SCN 0xffff.ffffffff
Incomplete Recovery applied until change 15922544 time 12/23/2013 17:12:48
Media Recovery Complete (primary_db)
Terminal Recovery: successful completion
Forcing ARSCN to IRSCN for TR 0:15922544
Mon Dec 23 17:13:22 2013
ARCH: Archival stopped, error occurred. Will continue retrying
ORACLE Instance primary_db - Archival ErrorAttempt to set limbo arscn 0:15922544 irscn 0:15922544
ORA-16014: log 4 sequence# 7 not archived, no available destinations
ORA-00312: online log 4 thread 1: '/u02/oracle/fast_recovery_area/standby_db/onlinelog/o1_mf_4_9c3tk3dy_.log'
Resetting standby activation ID 2071848820 (0x7b7de774)
Completed: ALTER DATABASE RECOVER MANAGED STANDBY DATABASE FINISH
Mon Dec 23 17:13:33 2013
ALTER DATABASE RECOVER MANAGED STANDBY DATABASE FINISH
Terminal Recovery: applying standby redo logs.
Terminal Recovery: thread 1 seq# 7 redo required
Terminal Recovery:
Recovery of Online Redo Log: Thread 1 Group 4 Seq 7 Reading mem 0
Mem# 0: /u02/oracle/fast_recovery_area/standby_db/onlinelog/o1_mf_4_9c3tk3dy_.log
Identified End-Of-Redo (failover) for thread 1 sequence 7 at SCN 0xffff.ffffffff
Incomplete Recovery applied until change 15922544 time 12/23/2013 17:12:48
Media Recovery Complete (primary_db)
Terminal Recovery: successful completion
Forcing ARSCN to IRSCN for TR 0:15922544
Mon Dec 23 17:13:22 2013
ARCH: Archival stopped, error occurred. Will continue retrying
ORACLE Instance primary_db - Archival ErrorAttempt to set limbo arscn 0:15922544 irscn 0:15922544
ORA-16014: log 4 sequence# 7 not archived, no available destinations
ORA-00312: online log 4 thread 1: '/u02/oracle/fast_recovery_area/standby_db/onlinelog/o1_mf_4_9c3tk3dy_.log'
Resetting standby activation ID 2071848820 (0x7b7de774)
Completed: ALTER DATABASE RECOVER MANAGED STANDBY DATABASE FINISH
Mon Dec 23 17:13:33 2013
ALTER DATABASE RECOVER MANAGED STANDBY DATABASE FINISH
Attempt to do a Terminal Recovery (primary_db)
Media Recovery Start: Managed Standby Recovery (primary_db)
started logmerger process
Mon Dec 23 17:13:33 2013
Managed Standby Recovery not using Real Time Apply
Media Recovery failed with error 16157
Recovery Slave PR00 previously exited with exception 283
ORA-283 signalled during: ALTER DATABASE RECOVER MANAGED STANDBY DATABASE FINISH...
Mon Dec 23 17:13:34 2013
Shutting down instance (immediate)
Shutting down instance: further logons disabled
Stopping background process MMNL
Stopping background process MMON
License high water mark = 38
All dispatchers and shared servers shutdown
ALTER DATABASE CLOSE NORMAL
ORA-1109 signalled during: ALTER DATABASE CLOSE NORMAL...
ALTER DATABASE DISMOUNT
Shutting down archive processes
Archiving is disabled
Mon Dec 23 17:13:38 2013
Mon Dec 23 17:13:38 2013
Mon Dec 23 17:13:38 2013
ARCH shutting downARCH shutting down
ARCH shutting down
ARC0: Relinquishing active heartbeat ARCH role
ARC2: Archival stopped
ARC0: Archival stopped
ARC1: Archival stopped
Completed: ALTER DATABASE DISMOUNT
ARCH: Archival disabled due to shutdown: 1089
Shutting down archive processes
Archiving is disabled
Mon Dec 23 17:13:40 2013
Stopping background process VKTM
ARCH: Archival disabled due to shutdown: 1089
Shutting down archive processes
Archiving is disabled
Mon Dec 23 17:13:43 2013
Instance shutdown complete
Mon Dec 23 17:13:44 2013
Adjusting the default value of parameter parallel_max_servers
from 1280 to 470 due to the value of parameter processes (500)
Starting ORACLE instance (normal)
************************ Large Pages Information *******************
Per process system memlock (soft) limit = 64 KB
Total Shared Global Region in Large Pages = 0 KB (0%)
Large Pages used by this instance: 0 (0 KB)
Large Pages unused system wide = 0 (0 KB)
Large Pages configured system wide = 0 (0 KB)
Large Page size = 2048 KB
RECOMMENDATION:
Total System Global Area size is 3762 MB. For optimal performance,
prior to the next instance restart:
1. Increase the number of unused large pages by
at least 1881 (page size 2048 KB, total size 3762 MB) system wide to
get 100% of the System Global Area allocated with large pages
2. Large pages are automatically locked into physical memory.
Increase the per process memlock (soft) limit to at least 3770 MB to lock
100% System Global Area's large pages into physical memory
LICENSE_MAX_SESSION = 0
LICENSE_SESSIONS_WARNING = 0
Initial number of CPU is 32
Number of processor cores in the system is 16
Number of processor sockets in the system is 2
CELL communication is configured to use 0 interface(s):
CELL IP affinity details:
    NUMA status: NUMA system w/ 2 process groups
    cellaffinity.ora status: cannot find affinity map at '/etc/oracle/cell/network-config/cellaffinity.ora' (see trace file for details)
CELL communication will use 1 IP group(s):
    Grp 0:
Picked latch-free SCN scheme 3
Autotune of undo retention is turned on.
IMODE=BR
ILAT =88
LICENSE_MAX_USERS = 0
SYS auditing is disabled
NUMA system with 2 nodes detected
Starting up:
Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options.
ORACLE_HOME = /u01/oracle/product/11.2.0.4
System name:    Linux
Node name:      node2.localdomain
Release:        2.6.32-131.0.15.el6.x86_64
Version:        #1 SMP Tue May 10 15:42:40 EDT 2011
Machine:        x86_64
Using parameter settings in server-side spfile /u01/oracle/product/11.2.0.4/dbs/spfileprimary_db.ora
System parameters with non-default values:
processes                = 500
sga_target               = 3760M
control_files            = "/u02/oracle/orafiles/primary_db/control01.ctl"
control_files            = "/u01/oracle/fast_recovery_area/primary_db/control02.ctl"
db_file_name_convert     = "standby_db"
db_file_name_convert     = "primary_db"
log_file_name_convert    = "standby_db"
log_file_name_convert    = "primary_db"
control_file_record_keep_time= 40
db_block_size            = 8192
compatible               = "11.2.0.4.0"
log_archive_dest_1       = "location=/u02/oracle/archivelogs/primary_db"
log_archive_dest_2       = "SERVICE=primary_db ASYNC VALID_FOR=(ONLINE_LOGFILES,PRIMARY_ROLE) DB_UNIQUE_NAME=primary_db"
log_archive_dest_state_2 = "ENABLE"
log_archive_min_succeed_dest= 1
fal_server               = "primary_db"
log_archive_trace        = 0
log_archive_config       = "DG_CONFIG=(primary_db,standby_db)"
log_archive_format       = "%t_%s_%r.dbf"
log_archive_max_processes= 3
db_recovery_file_dest    = "/u02/oracle/fast_recovery_area"
db_recovery_file_dest_size= 30G
standby_file_management = "AUTO"
db_flashback_retention_target= 1440
undo_tablespace          = "UNDOTBS1"
remote_login_passwordfile= "EXCLUSIVE"
db_domain                = ""
dispatchers              = "(PROTOCOL=TCP) (SERVICE=primary_dbXDB)"
job_queue_processes      = 0
audit_file_dest          = "/u01/oracle/admin/primary_db/adump"
audit_trail              = "DB"
db_name                  = "primary_db"
db_unique_name           = "standby_db"
open_cursors             = 300
pga_aggregate_target     = 1250M
dg_broker_start          = FALSE
diagnostic_dest          = "/u01/oracle"
Mon Dec 23 17:13:45 2013
PMON started with pid=2, OS id=29108
Mon Dec 23 17:13:45 2013
PSP0 started with pid=3, OS id=29110
Mon Dec 23 17:13:46 2013
VKTM started with pid=4, OS id=29125 at elevated priority
VKTM running at (1)millisec precision with DBRM quantum (100)ms
Mon Dec 23 17:13:46 2013
GEN0 started with pid=5, OS id=29129
Mon Dec 23 17:13:46 2013
DIAG started with pid=6, OS id=29131
Mon Dec 23 17:13:46 2013
DBRM started with pid=7, OS id=29133
Mon Dec 23 17:13:46 2013
DIA0 started with pid=8, OS id=29135
Mon Dec 23 17:13:46 2013
MMAN started with pid=9, OS id=29137
Mon Dec 23 17:13:46 2013
DBW0 started with pid=10, OS id=29139
Mon Dec 23 17:13:46 2013
DBW1 started with pid=11, OS id=29141
Mon Dec 23 17:13:46 2013
DBW2 started with pid=12, OS id=29143
Mon Dec 23 17:13:46 2013
DBW3 started with pid=13, OS id=29145
Mon Dec 23 17:13:46 2013
LGWR started with pid=14, OS id=29147
Mon Dec 23 17:13:46 2013
CKPT started with pid=15, OS id=29149
Mon Dec 23 17:13:46 2013
SMON started with pid=16, OS id=29151
Mon Dec 23 17:13:46 2013
RECO started with pid=17, OS id=29153
Mon Dec 23 17:13:46 2013
MMON started with pid=18, OS id=29155
Mon Dec 23 17:13:46 2013
MMNL started with pid=19, OS id=29157
starting up 1 dispatcher(s) for network address '(ADDRESS=(PARTIAL=YES)(PROTOCOL=TCP))'...
starting up 1 shared server(s) ...
ORACLE_BASE from environment = /u01/oracle
Mon Dec 23 17:13:46 2013
ALTER DATABASE   MOUNT
ARCH: STARTING ARCH PROCESSES
Mon Dec 23 17:13:50 2013
ARC0 started with pid=23, OS id=29210
ARC0: Archival started
ARCH: STARTING ARCH PROCESSES COMPLETE
ARC0: STARTING ARCH PROCESSES
Successful mount of redo thread 1, with mount id 2071851082
Mon Dec 23 17:13:51 2013
ARC1 started with pid=24, OS id=29212
Allocated 15937344 bytes in shared pool for flashback generation buffer
Mon Dec 23 17:13:51 2013
ARC2 started with pid=25, OS id=29214
Starting background process RVWR
ARC1: Archival started
ARC1: Becoming the 'no FAL' ARCH
ARC1: Becoming the 'no SRL' ARCH
Mon Dec 23 17:13:51 2013
RVWR started with pid=26, OS id=29216
Physical Standby Database mounted.
Lost write protection disabled
Completed: ALTER DATABASE   MOUNT
Mon Dec 23 17:13:51 2013
ALTER DATABASE RECOVER MANAGED STANDBY DATABASE
         USING CURRENT LOGFILE DISCONNECT FROM SESSION
Attempt to start background Managed Standby Recovery process (primary_db)
Mon Dec 23 17:13:51 2013
MRP0 started with pid=27, OS id=29219
MRP0: Background Managed Standby Recovery process started (primary_db)
ARC2: Archival started
ARC0: STARTING ARCH PROCESSES COMPLETE
ARC2: Becoming the heartbeat ARCH
ARC2: Becoming the active heartbeat ARCH
ARCH: Archival stopped, error occurred. Will continue retrying
ORACLE Instance primary_db - Archival Error
ORA-16014: log 4 sequence# 7 not archived, no available destinations
ORA-00312: online log 4 thread 1: '/u02/oracle/fast_recovery_area/standby_db/onlinelog/o1_mf_4_9c3tk3dy_.log'
At this moment, I have lost service, and I have to wait until the primary server comes back up to receive the missing log.
This is the rest of the log:
Fatal NI connect error 12543, connecting to:
(DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(PROTOCOL=TCP)(HOST=node1)(PORT=1521)))(CONNECT_DATA=(SERVICE_NAME=primary_db)(CID=(PROGRAM=oracle)(HOST=node2.localdomain)(USER=oracle))))
VERSION INFORMATION:
        TNS for Linux: Version 11.2.0.4.0 - Production
        TCP/IP NT Protocol Adapter for Linux: Version 11.2.0.4.0 - Production
Time: 23-DEC-2013 17:13:52
Tracing not turned on.
Tns error struct:
    ns main err code: 12543
TNS-12543: TNS:destination host unreachable
    ns secondary err code: 12560
    nt main err code: 513
TNS-00513: Destination host unreachable
    nt secondary err code: 113
    nt OS err code: 0
Fatal NI connect error 12543, connecting to:
(DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(PROTOCOL=TCP)(HOST=node1)(PORT=1521)))(CONNECT_DATA=(SERVICE_NAME=primary_db)(CID=(PROGRAM=oracle)(HOST=node2.localdomain)(USER=oracle))))
VERSION INFORMATION:
        TNS for Linux: Version 11.2.0.4.0 - Production
        TCP/IP NT Protocol Adapter for Linux: Version 11.2.0.4.0 - Production
Time: 23-DEC-2013 17:13:55
Tracing not turned on.
Tns error struct:
    ns main err code: 12543
TNS-12543: TNS:destination host unreachable
    ns secondary err code: 12560
    nt main err code: 513
TNS-00513: Destination host unreachable
    nt secondary err code: 113
    nt OS err code: 0
started logmerger process
Mon Dec 23 17:13:56 2013
Managed Standby Recovery starting Real Time Apply
MRP0: Background Media Recovery terminated with error 16157
Errors in file /u01/oracle/diag/rdbms/standby_db/primary_db/trace/primary_db_pr00_29230.trc:
ORA-16157: media recovery not allowed following successful FINISH recovery
Managed Standby Recovery not using Real Time Apply
Completed: ALTER DATABASE RECOVER MANAGED STANDBY DATABASE
         USING CURRENT LOGFILE DISCONNECT FROM SESSION
Recovery Slave PR00 previously exited with exception 16157
MRP0: Background Media Recovery process shutdown (primary_db)
Fatal NI connect error 12543, connecting to:
(DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(PROTOCOL=TCP)(HOST=node1)(PORT=1521)))(CONNECT_DATA=(SERVICE_NAME=primary_db)(CID=(PROGRAM=oracle)(HOST=node2.localdomain)(USER=oracle))))
VERSION INFORMATION:
        TNS for Linux: Version 11.2.0.4.0 - Production
        TCP/IP NT Protocol Adapter for Linux: Version 11.2.0.4.0 - Production
Time: 23-DEC-2013 17:13:58
Tracing not turned on.
Tns error struct:
    ns main err code: 12543
TNS-12543: TNS:destination host unreachable
    ns secondary err code: 12560
    nt main err code: 513
TNS-00513: Destination host unreachable
    nt secondary err code: 113
    nt OS err code: 0
Mon Dec 23 17:14:01 2013
Fatal NI connect error 12543, connecting to:
(DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(PROTOCOL=TCP)(HOST=node1)(PORT=1521)))(CONNECT_DATA=(SERVICE_NAME=primary_db)(CID=(PROGRAM=oracle)(HOST=node2.localdomain)(USER=oracle))))
VERSION INFORMATION:
        TNS for Linux: Version 11.2.0.4.0 - Production
        TCP/IP NT Protocol Adapter for Linux: Version 11.2.0.4.0 - Production
Time: 23-DEC-2013 17:14:01
Tracing not turned on.
Tns error struct:
    ns main err code: 12543
TNS-12543: TNS:destination host unreachable
    ns secondary err code: 12560
    nt main err code: 513
TNS-00513: Destination host unreachable
    nt secondary err code: 113
    nt OS err code: 0
Error 12543 received logging on to the standby
FAL[client, ARC0]: Error 12543 connecting to primary_db for fetching gap sequence
Archiver process freed from errors. No longer stopped
Mon Dec 23 17:15:07 2013
Using STANDBY_ARCHIVE_DEST parameter default value as /u02/oracle/archivelogs/primary_db
Mon Dec 23 17:19:51 2013
ARCH: Archival stopped, error occurred. Will continue retrying
ORACLE Instance primary_db - Archival Error
ORA-16014: log 4 sequence# 7 not archived, no available destinations
ORA-00312: online log 4 thread 1: '/u02/oracle/fast_recovery_area/standby_db/onlinelog/o1_mf_4_9c3tk3dy_.log'
Mon Dec 23 17:26:18 2013
RFS[1]: Assigned to RFS process 31456
RFS[1]: No connections allowed during/after terminal recovery.
Mon Dec 23 17:26:47 2013
flashback database to scn 15921680
ORA-16157 signalled during: flashback database to scn 15921680...
Mon Dec 23 17:27:05 2013
alter database recover managed standby database using current logfile disconnect
Attempt to start background Managed Standby Recovery process (primary_db)
Mon Dec 23 17:27:05 2013
MRP0 started with pid=28, OS id=31481
MRP0: Background Managed Standby Recovery process started (primary_db)
started logmerger process
Mon Dec 23 17:27:10 2013
Managed Standby Recovery starting Real Time Apply
MRP0: Background Media Recovery terminated with error 16157
Errors in file /u01/oracle/diag/rdbms/standby_db/primary_db/trace/primary_db_pr00_31486.trc:
ORA-16157: media recovery not allowed following successful FINISH recovery
Managed Standby Recovery not using Real Time Apply
Completed: alter database recover managed standby database using current logfile disconnect
Recovery Slave PR00 previously exited with exception 16157
MRP0: Background Media Recovery process shutdown (primary_db)
Mon Dec 23 17:27:18 2013
RFS[2]: Assigned to RFS process 31492
RFS[2]: No connections allowed during/after terminal recovery.
Mon Dec 23 17:28:18 2013
RFS[3]: Assigned to RFS process 31614
RFS[3]: No connections allowed during/after terminal recovery.
Do you have any advice?
Thanks!
Alex.

Hello;
What's not clear to me in your question at this point:
"What I'm NOT being able to perform:
If I manually unplug the network cables from the primary site (all the network, not only the link between primary and standby node so, it's like a server unplug from the energy source).
Same situation happens if I manually disconnect the server from the power.
This is the alert logs I have:"
Are you trying a failover to the Standby?
Please advise.
Is it possible that the VALID_FOR attribute of your LOG_ARCHIVE_DEST_n setting is incorrect?
I would also review this note:
ORA-16014 and ORA-00312 Messages in Alert.log of Physical Standby
Best Regards
mseberg
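
When reading a long alert log like the one in the question, it can help to count which error codes recur before digging into any single one. The following is only a minimal sketch, assuming the alert log is available as plain text; the function name `count_error_codes` and the `SAMPLE` excerpt (lines copied from the log above) are illustrative and not part of any Oracle tool.

```python
import re
from collections import Counter

# Matches Oracle-style error codes such as ORA-16157 or TNS-12543.
ERROR_CODE = re.compile(r"\b(?:ORA|TNS)-\d{5}\b")


def count_error_codes(log_text: str) -> Counter:
    """Count how often each ORA-/TNS- code appears in an alert log excerpt."""
    return Counter(ERROR_CODE.findall(log_text))


# Short excerpt copied from the alert log in the question.
SAMPLE = """\
ORA-16014: log 4 sequence# 7 not archived, no available destinations
ORA-00312: online log 4 thread 1: '/u02/oracle/fast_recovery_area/standby_db/onlinelog/o1_mf_4_9c3tk3dy_.log'
TNS-12543: TNS:destination host unreachable
TNS-00513: Destination host unreachable
ORA-16157: media recovery not allowed following successful FINISH recovery
ORA-16157 signalled during: flashback database to scn 15921680...
"""

if __name__ == "__main__":
    for code, hits in count_error_codes(SAMPLE).most_common():
        print(code, hits)
```

In the log above, the recurring ORA-16157 and the RFS messages "No connections allowed during/after terminal recovery" both point at terminal (FINISH) recovery having already completed on the standby, which is why the question about whether a failover was attempted matters.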

Client failure failover/switchover standby configuration...

I have created a standby database, and it is in sync with the primary. After a switchover, the primary is now the standby and the standby is now the primary, but the clients are unable to connect to the new primary database.
The tnsnames.ora file on the client side:
prim.world=
(DESCRIPTION_LIST=
(FAILOVER=true)
(LOAD_BALANCE=no)
(DESCRIPTION=
(ADDRESS=
(PROTOCOL=TCP)
(HOST=test9)
(PORT=1521))
(CONNECT_DATA=
(SERVER=dedicated)
(SERVICE_NAME=primary)))
(DESCRIPTION=
(ADDRESS=
(PROTOCOL=TCP)
(HOST=standby)
(PORT=1521))
(CONNECT_DATA=
(SERVER=dedicated)
(SERVICE_NAME=standby))))
SQL> conn iq/[email protected]
ERROR:
ORA-01033: ORACLE initialization or shutdown in progress
SQL> conn sys/[email protected] as sysdba
Connected.
SQL> select open_mode from v$database;
OPEN_MODE
MOUNTED
SQL> select database_role from v$database;
DATABASE_ROLE
PHYSICAL STANDBY
SQL>
This is Oracle version 10.2.0.1.0.

oracleRaj
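
When a nested tnsnames.ora entry is edited by hand, it is easy to drop or misplace a closing parenthesis. The sketch below is one way to sanity-check an entry before deploying it; `paren_balance` is a hypothetical helper, and the two descriptor strings are illustrative values built from the host and service names in the post. It checks syntax only: the ORA-01033 shown above appears to come from the client reaching a database that is mounted as a physical standby, which a syntax check cannot detect.

```python
def paren_balance(entry: str) -> int:
    """Return the number of unclosed '(' in a connect descriptor.

    Raises ValueError if a ')' appears before a matching '('.
    """
    depth = 0
    for position, char in enumerate(entry):
        if char == "(":
            depth += 1
        elif char == ")":
            depth -= 1
            if depth < 0:
                raise ValueError(f"unexpected ')' at offset {position}")
    return depth


# Illustrative descriptors (host and service names taken from the post).
BALANCED = (
    "(DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=test9)(PORT=1521))"
    "(CONNECT_DATA=(SERVER=dedicated)(SERVICE_NAME=primary)))"
)
TRUNCATED = "(DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=test9)(PORT=1521)"

if __name__ == "__main__":
    print("balanced entry, unclosed:", paren_balance(BALANCED))
    print("truncated entry, unclosed:", paren_balance(TRUNCATED))
```

A result of zero means every opening parenthesis has a partner; any other value is the number of closing parentheses still missing.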

CUCM failover to subscriber failure!

Hi everyone!
I have a CUCM cluster of one publisher and one subscriber, active version 7.0 and inactive version 5.1.3
The publisher failed due to a power failure, the Cisco DB wasn't starting at all, and there was no DRS backup. Anyway, I did an upgrade from 5.1 to 7.0 again on the publisher while the telephony was operating normally on the subscriber node.
After the upgrade, I uploaded the Publisher's and Subscriber's licenses, I added back all the changes done to the CUCM between the databases of 5 and 7 (manually added by comparing to the subscriber's) and I replicated to the subscriber when I was done and the replication state was good '2'. And I took an immediate DRS backup.
However, the problem appeared when I restarted the CUCM's publisher node and none of the phones registered with the subscriber node. I thought it was a network problem or server going slow. I turned the publisher off for around 20 mins and nothing changed.
The configuration of the Call Manager group is correct, the licenses are correct, and everything seems to be OK. When the phones are registered with the publisher, I can see them registered from the subscriber's phone page, but when I stop the publisher, they turn to 'unknown'.
Does anyone have any clue why this is happening? Do I have to upgrade the subscriber again from 5 to 7? I just had this clue in mind, it doesn't make sense to me since replication is working fine between the servers.
One more thing, I use FreeSshd to do the DRS backup on an XP machine, it wasn't connecting on my laptop with Windows 7. What are you guys using on Windows 7? Tried a search result on Google but nothing worked.
Thank you for reading and for tips!
Regards,
Mazen

Hi Mazen,
I think you are on the right track with your thought of rebuilding the Subscriber.
You would be hitting this CUCM 5.x restriction:
Replacing the Publisher Node
Complete the following tasks to replace the Cisco Unified CallManager publisher server. If you are replacing a single server that is not part of a cluster, follow this procedure to replace your server.
Caution: If you are replacing a publisher node in a cluster, you must also reinstall all the subscriber nodes and dedicated TFTP servers in the cluster after replacing the publisher node. For instructions on reinstalling these other node types, see the "Replacing a Subscriber or Dedicated TFTP Server Node" section.
Follow the references in the For More Information column to get more information about a step.
Table 4: Replacing the Publisher Node Process Overview
Step | Description | For More Information
Step 1 | Perform the tasks in the "Server or Cluster Replacement Preparation Checklist" section. | "Server or Cluster Replacement Preparation Checklist" section
Step 2 | Gather the necessary information about the old publisher server. | "Gathering System Configuration Information to Replace or Reinstall a Server" section
Step 3 | Back up the publisher server to a remote SFTP server by using the Disaster Recovery System (DRS) and verify that you have a good backup. | "Creating a Backup File" section
Step 4 | Get the new license and verify it before system replacement. You only need a new license if you are replacing the publisher node. | See the "Obtaining a License File" section.
Step 5 | Shut down and turn off the old server. | -
Step 6 | Connect the new server. | -
Step 7 | Install the same Cisco Unified CallManager release on the new server that was installed on the old server, including any Engineering Special releases. Configure the server as the publisher server for the cluster. | "Installing Cisco Unified CallManager on the New Publisher Server" section
Step 8 | Upload the new license file to the publisher server. | "Uploading a License File" section
Step 9 | Restore backed up data to the publisher server by using DRS. | "Restoring a Backup File" section
Step 10 | Reboot the publisher server. | -
Step 11 | Perform the post-replacement tasks in the "Post-Replacement Checklist" section. | -
http://www.cisco.com/en/US/docs/voice_ip_comm/cucm/install/5_1/clstr513.html#wp87717
Cheers!
Rob

SQL SERVER Failover Cluster switch failure because the passive node automatically reassign drive letter

I moved the SQL Server resource group to the standby node. When the disk resource was being brought online on the passive node, an exception occurred: the disk resource that SQL Server depends on originally had the drive letter 'K:', but when the disk came online it was automatically reassigned a new drive letter, 'H:', so the SQL Server resource could not be brought online. After I manually changed the drive letter back to 'K:' on the passive node, it worked. So my question is: why did it not use the original drive letter, and why was a new one assigned? What could cause this? A mount point? Part of the log follows:
00001cbc.000004e0::2015/03/12-14:41:11.377 WARN [RES] Physical Disk <FltLowestPrice_K>: OnlineThread: Failed to set volguid \??\Volume{e32c13d5-02e6-4924-a2d9-59a6fae1a1be}. Error: 183.
00001cbc.000004e0::2015/03/12-14:41:11.377 INFO [RES] Physical Disk <FltLowestPrice_K>: Found 2 mount points for device \Device\Harddisk8\Partition2
00001cbc.00001cdc::2015/03/12-14:41:11.377 INFO [RES] Physical Disk: PNP: Update volume exit, status 1168
00001cbc.00001cdc::2015/03/12-14:41:11.377 INFO [RES] Physical Disk: PNP: Updating volume \\?\STORAGE#Volume#{1a8ddb8e-fe43-11e2-b7c5-6c3be5a5cdca}#0000000008100000#{53f5630d-b6bf-11d0-94f2-00a0c91efb8b}
00001cbc.00001cdc::2015/03/12-14:41:11.377 INFO [RES] Physical Disk: PNP: Update volume exit, status 5023
00001cbc.000004e0::2015/03/12-14:41:11.377 ERR [RES] Physical Disk: Failed to get volname for drive H:\, status 2
00001cbc.000004e0::2015/03/12-14:41:11.377 INFO [RES] Physical Disk <FltLowestPrice_K>: VolumeIsNtfs: Volume \\?\GLOBALROOT\Device\Harddisk8\Partition2\ has FS type NTFS
00001cbc.000004e0::2015/03/12-14:41:11.377 INFO [RES] Physical Disk: Volume \\?\GLOBALROOT\Device\Harddisk8\Partition2\ has FS type NTFS
00001cbc.000004e0::2015/03/12-14:41:11.377 INFO [RES] Physical Disk: MountPoint H:\ points to volume \\?\Volume{e32c13d5-02e6-4924-a2d9-59a6fae1a1be}\

Sounds like you have a cluster hive that is out of date or bad, or some registry settings that are incorrect. You'll want to have this question transferred to the Windows forum, as that's really what you're asking about.
-Sean
The views, opinions, and posts do not reflect those of my company and are solely my own. No warranty, service, or results are expressed or implied.
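
For anyone taking a log like this to the Windows forum, it may help to first filter it down to the warnings and errors. The sketch below assumes every entry has the layout seen in the excerpt above (`<pid>.<tid>::<date>-<time> <LEVEL> <message>`); the layout is inferred from those lines only, and `parse_cluster_log_line` and `problems` are hypothetical helper names, not part of any Microsoft tool.

```python
import re
from typing import List, NamedTuple, Optional


class ClusterLogEntry(NamedTuple):
    process_id: str
    thread_id: str
    timestamp: str
    level: str
    message: str


# Layout inferred from the excerpt: <pid>.<tid>::<date>-<time> <LEVEL> <message>
ENTRY = re.compile(
    r"^(?P<pid>[0-9a-f]{8})\.(?P<tid>[0-9a-f]{8})::"
    r"(?P<ts>\d{4}/\d{2}/\d{2}-\d{2}:\d{2}:\d{2}\.\d{3})\s+"
    r"(?P<level>[A-Z]+)\s+(?P<msg>.*)$"
)


def parse_cluster_log_line(line: str) -> Optional[ClusterLogEntry]:
    """Parse one log line; return None for continuation or unrelated lines."""
    match = ENTRY.match(line.strip())
    if match is None:
        return None
    return ClusterLogEntry(
        match["pid"], match["tid"], match["ts"], match["level"], match["msg"]
    )


def problems(lines: List[str]) -> List[ClusterLogEntry]:
    """Keep only the WARN and ERR entries."""
    entries = (parse_cluster_log_line(line) for line in lines)
    return [e for e in entries if e is not None and e.level in {"WARN", "ERR"}]


# Three lines copied from the log in the question (raw strings keep backslashes).
SAMPLE = [
    r"00001cbc.000004e0::2015/03/12-14:41:11.377 WARN [RES] Physical Disk <FltLowestPrice_K>: OnlineThread: Failed to set volguid \??\Volume{e32c13d5-02e6-4924-a2d9-59a6fae1a1be}. Error: 183.",
    r"00001cbc.000004e0::2015/03/12-14:41:11.377 INFO [RES] Physical Disk <FltLowestPrice_K>: Found 2 mount points for device \Device\Harddisk8\Partition2",
    r"00001cbc.000004e0::2015/03/12-14:41:11.377 ERR [RES] Physical Disk: Failed to get volname for drive H:\, status 2",
]

if __name__ == "__main__":
    for entry in problems(SAMPLE):
        print(entry.timestamp, entry.level, entry.message)
```

On this excerpt the filter leaves the "Failed to set volguid ... Error: 183" warning and the "Failed to get volname for drive H:\" error, which are the two lines most worth quoting when asking about the drive letter change.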

Reporting Services as a generic service in a failover cluster group?

There is some confusion on whether or not Microsoft will support a Reporting Services deployment on a failover cluster using scale-out, and adding the Reporting Services service as a generic service in a cluster group to achieve active-passive high availability.
A deployment like this is described by Lukasz Pawlowski (Program Manager on the Reporting Services team) in this blog article
http://blogs.msdn.com/b/lukaszp/archive/2009/10/28/high-availability-frequently-asked-questions-about-failover-clustering-and-reporting-services.aspx. There it is stated that it can be done, and what needs to be considered when doing such a deployment.
This article (http://technet.microsoft.com/en-us/library/bb630402.aspx) on the other hand states: "Failover clustering is supported only for the report server database; you cannot run the Report Server service as part of a failover cluster."
This is somewhat confusing to me. Can I expect to receive support from Microsoft for a setup like this?
Best Regards,
Peter Wretmo

Hi Peter,
Thanks for your posting.
As Lukasz said in the blog, failover clustering with SSRS is possible. However, during the failover there is some time during which users will receive errors when accessing SSRS, since the network names will resolve to a computer where the SSRS service is in the process of starting.
Besides, there are several considerations and manual steps involved on your part before configuring failover clustering with the SSRS service:
- Impact on other applications that share the SQL Server. One common idea is to put SSRS in the same cluster group as SQL Server. If SQL Server is hosting multiple application databases, other than just the SSRS databases, a failure in SSRS may cause a significant failover impact to the entire environment.
- SSRS fails over independently of SQL Server.
- If SSRS is running, it is going to do work on behalf of the overall deployment, so it will be Active. To make SSRS Passive is to stop the SSRS service on all passive cluster nodes.
So, SSRS is designed to achieve High Availability through the Scale-Out deployment. Though a failover clustered SSRS deployment is achievable, it is not the best option for achieving High Availability with Reporting Services.
Regards,
Mike Yin
TechNet Community Support

Advice Requested - High Availability WITHOUT Failover Clustering

We're creating an entirely new Hyper-V virtualized environment on Server 2012 R2. My question is: Can we accomplish high availability WITHOUT using failover clustering?
So, I don't really have anything AGAINST failover clustering, and we will happily use it if it's the right solution for us, but to be honest, we really don't want ANYTHING to happen automatically when it comes to failover. Here's what I mean:
In this new environment, we have architected 2 identical, very capable Hyper-V physical hosts, each of which will run several VMs comprising the equivalent of a scaled-back version of our entire environment. In other words, there is at least a domain controller, multiple web servers, and a (mirrored/HA/AlwaysOn) SQL Server 2012 VM running on each host, along with a few other miscellaneous one-off worker-bee VMs doing things like system monitoring. The SQL Server VM on each host has about 75% of the physical memory resources dedicated to it (for performance reasons). We need pretty much the full horsepower of both machines up and going at all times under normal conditions.
So now, to high availability. The standard approach is to use failover clustering, but I am concerned that if these hosts are clustered, we'll have the equivalent of just 50% hardware capacity going at all times, with full failover in place of course (we are using an iSCSI SAN for storage).
BUT, if these hosts are NOT clustered, and one of them is suddenly switched off, experiences some kind of catastrophic failure, or simply needs to be rebooted while applying WSUS patches, the SQL Server HA will fail over (so all databases will remain up and going on the surviving VM), and the environment would continue functioning at somewhat reduced capacity until the failed host is restarted. With this approach, it seems to me that we would be running at 100% for the most part, and running at 50% or so only in the event of a major failure, rather than running at 50% ALL the time.
Of course, in the event of a catastrophic failure, I'm also thinking that the one-off worker-bee VMs could be replicated to the alternate host so they could be started on the surviving host if needed during a long-term outage.
So basically, I am very interested in the thoughts of others with experience regarding taking this approach to Hyper-V architecture, as it seems as if failover clustering is almost a given when it comes to best practices and high availability. I guess I'm looking for validation on my thinking.
So what do you think? What am I missing or forgetting? What will we LOSE if we go with a NON-clustered high-availability environment as I've described it?
Thanks in advance for your thoughts!
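
The capacity concern can be checked with some quick arithmetic: a VM can only restart on the surviving host if its memory fits next to what is already running there. The sketch below is a rough illustration only. The 75% share for the SQL Server VM comes from the post; the 256 GB host size, the sizes of the smaller VMs, and the helper name `fits_on_survivor` are made-up assumptions, and hypervisor overhead is ignored.

```python
from typing import List


def fits_on_survivor(
    host_memory_gb: float,
    survivor_vms_gb: List[float],
    failed_vms_gb: List[float],
) -> bool:
    """True if the surviving host can also run the VMs from the failed host."""
    return sum(survivor_vms_gb) + sum(failed_vms_gb) <= host_memory_gb


HOST_MEMORY_GB = 256.0               # hypothetical host size
SQL_VM_GB = 0.75 * HOST_MEMORY_GB    # 75% share, as stated in the post
OTHER_VMS_GB = [16.0, 8.0, 4.0]      # hypothetical DC / web / monitoring VMs

host_a = [SQL_VM_GB, *OTHER_VMS_GB]
host_b = [SQL_VM_GB, *OTHER_VMS_GB]

if __name__ == "__main__":
    # Everything from host B restarting on host A.
    print("full failover fits:", fits_on_survivor(HOST_MEMORY_GB, host_a, host_b))
    # Only the smaller VMs from host B restarting on host A.
    print("small VMs only fit:", fits_on_survivor(HOST_MEMORY_GB, host_a, OTHER_VMS_GB))
```

Under these assumed numbers, a second SQL Server VM would never fit on the survivor whether or not the hosts are clustered, while the smaller VMs would. Seen that way, the question is less about giving up 50% of the hardware and more about which VMs should be allowed to move at all.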

Udo -
Yes your responses are very helpful.
Can we use the built-in Server 2012 iSCSI Target Server role to convert the local RAID disks into an iSCSI LUN that the VMs could access? Or can that not run on the same physical box as the Hyper-V host? I guess if the physical box goes down the LUN would go down anyway, huh? Or can I cluster that role (iSCSI target) as well? If not, do you have any other specific product suggestions I can research, or do I just end up wasting this 12TB of local disk storage?
- Morgan
That's a bad idea. First of all, the Microsoft iSCSI target is slow (it does not cache on the server side). So if you have really decided to use dedicated hardware for storage (maybe you have a reason I don't know about...) and you're fine with your storage being a single point of failure (OK, maybe your RTOs and RPOs allow it), then at least use an SMB share. SMB at least caches I/O on both the client and server sides, and you can also use Storage Spaces (non-clustered) as its back end, so read that as "write-back flash cache for cheap". See:
What's new in iSCSI target with Windows Server 2012 R2
http://technet.microsoft.com/en-us/library/dn305893.aspx
Improved optimization to allow disk-level caching (Updated)
iSCSI Target Server now sets the disk cache bypass flag on a hosting disk I/O, through Force Unit Access (FUA), only when the issuing initiator explicitly requests it. This change can potentially improve performance.
Previously, iSCSI Target Server would always set the disk cache bypass flag on all I/O’s. System cache bypass functionality remains unchanged in iSCSI Target Server; for instance, the file system cache on the target server is always bypassed.
Yes, you can cluster the Microsoft iSCSI target, but a) it would be SLOW, as the I/O model would only be active-passive (no real benefit from MPIO between multiple hosts), and b) it would require shared storage for the Windows cluster. What for? The scenario was useful when a) there was no virtual FC, so a guest VM cluster could not use FC LUNs, and b) there was no shared VHDX, so SAS could not be used for a guest VM cluster either. Now both are available, so the scenario is useless: just export your existing shared storage without any Microsoft iSCSI target and you'll be happy. For references see:
MSFT iSCSI Target in HA mode
http://technet.microsoft.com/en-us/library/gg232621(v=ws.10).aspx
Cluster MSFT iSCSI Target with SAS back end
http://techontip.wordpress.com/2011/05/03/microsoft-iscsi-target-cluster-building-walkthrough/
Guest VM Cluster Storage Options
http://technet.microsoft.com/en-us/library/dn440540.aspx
Storage options
The following table lists the storage types that you can use to provide shared storage for a guest cluster.
Storage Type | Description
Shared virtual hard disk | New in Windows Server 2012 R2, you can configure multiple virtual machines to connect to and use a single virtual hard disk (.vhdx) file. Each virtual machine can access the virtual hard disk just like servers would connect to the same LUN in a storage area network (SAN). For more information, see Deploy a Guest Cluster Using a Shared Virtual Hard Disk.
Virtual Fibre Channel | Introduced in Windows Server 2012, virtual Fibre Channel enables you to connect virtual machines to LUNs on a Fibre Channel SAN. For more information, see Hyper-V Virtual Fibre Channel Overview.
iSCSI | The iSCSI initiator inside a virtual machine enables you to connect over the network to an iSCSI target. For more information, see iSCSI Target Block Storage Overview and the blog post Introduction of iSCSI Target in Windows Server 2012.
Storage requirements depend on the clustered roles that run on the cluster. Most clustered roles use clustered storage, where the storage is available on any cluster node that runs a clustered role. Examples of clustered storage include Physical Disk resources and Cluster Shared Volumes (CSV). Some roles do not require storage that is managed by the cluster. For example, you can configure Microsoft SQL Server to use availability groups that replicate the data between nodes. Other clustered roles may use Server Message Block (SMB) shares or Network File System (NFS) shares as data stores that any cluster node can access.
Sure, you can use third-party software to replicate your 12TB of storage between just a pair of nodes to create a fully fault-tolerant cluster. See (there's also a free offering):
StarWind VSAN [Virtual SAN] for Hyper-V
http://www.starwindsoftware.com/native-san-for-hyper-v-free-edition
The product is similar to what VMware has just released for ESXi, except that it has been on sale for ~2 years, so it is mature :)
There are others doing this, say DataCore (aimed more at Windows-based FC) and SteelEye (more about geo-clustering and replication). You may want to give them a try too.
Hope this helped a bit :)
StarWind VSAN [Virtual SAN] clusters Hyper-V without SAS, Fibre Channel, SMB 3.0 or iSCSI, uses Ethernet to mirror internally mounted SATA disks between hosts.

Cisco ASA 5505 Failover issue..

Hi,
I have two firewalls (Cisco ASA 5505) configured in active/standby mode. They ran smoothly for more than a year, but last week the secondary firewall failed and took my whole network down. I disconnected the secondary firewall and ran only the primary one. When I logged in on the console, I found that failover had been disabled. So I reconnected it to the network and enabled it again. After a couple of days the same issue happened. This time I took the secondary firewall down, erased the flash, reloaded the IOS image, configured failover, and connected it to the primary to replicate the configuration. It detected the active mate, replicated the configuration, and synced. But after the sync the same thing happened: the whole network went down. I did the same thing again and removed the secondary firewall, and the network came back up. I think the problem is related to failover, but I couldn't find it :( . The firewalls are in routed mode.

Please find the logs below.
Secondary firewall while syncing:
cisco-asa(config)# sh failover
Failover On
Failover unit Secondary
Failover LAN Interface: e0/7 Vlan3 (up)
Unit Poll frequency 1 seconds, holdtime 15 seconds
Interface Poll frequency 5 seconds, holdtime 25 seconds
Interface Policy 1
Monitored Interfaces 4 of 23 maximum
Version: Ours 8.2(5), Mate 8.2(5)
Last Failover at: 06:01:10 GMT Apr 29 2015
This host: Secondary - Sync Config
Active time: 55 (sec)
slot 0: ASA5505 hw/sw rev (1.0/8.2(5)) status (Up Sys)
Interface outside (27.251.167.246): No Link (Waiting)
Interface inside (10.11.0.20): No Link (Waiting)
Interface mgmt (10.11.200.21): No Link (Waiting)
slot 1: empty
Other host: Primary - Active
Active time: 177303 (sec)
slot 0: ASA5505 hw/sw rev (1.0/8.2(5)) status (Up Sys)
Interface outside (27.251.167.247): Unknown (Waiting)
Interface inside (10.11.0.21): Unknown (Waiting)
Interface mgmt (10.11.200.22): Unknown (Waiting)
slot 1: empty
=======================================================================================
Secondary firewall just after sync, now active (the primary firewall rebooted):
cisco-asa# sh failover
Failover On
Failover unit Secondary
Failover LAN Interface: e0/7 Vlan3 (up)
Unit Poll frequency 1 seconds, holdtime 15 seconds
Interface Poll frequency 5 seconds, holdtime 25 seconds
Interface Policy 1
Monitored Interfaces 4 of 23 maximum
Version: Ours 8.2(5), Mate Unknown
Last Failover at: 06:06:12 GMT Apr 29 2015
This host: Secondary - Active
Active time: 44 (sec)
slot 0: ASA5505 hw/sw rev (1.0/8.2(5)) status (Up Sys)
Interface outside (27.251.167.246): Normal (Waiting)
Interface inside (10.11.0.20): No Link (Waiting)
Interface mgmt (10.11.200.21): No Link (Waiting)
slot 1: empty
Other host: Primary - Not Detected
Active time: 0 (sec)
slot 0: empty
Interface outside (27.251.167.247): Unknown (Waiting)
Interface inside (10.11.0.21): Unknown (Waiting)
Interface mgmt (10.11.200.22): Unknown (Waiting)
slot 1: empty
==========================================================================================
After the active firewall rebooted, failover is off and the whole network went down:
cisco-asa# sh failover
Failover Off
Failover unit Secondary
Failover LAN Interface: e0/7 Vlan3 (up)
Unit Poll frequency 1 seconds, holdtime 15 seconds
Interface Poll frequency 5 seconds, holdtime 25 seconds
Interface Policy 1
Monitored Interfaces 4 of 23 maximum
===========================================================================================
Primary firewall after rebooting:
cisco-asa# sh failover
Failover On
Failover unit Primary
Failover LAN Interface: e0/7 Vlan3 (Failed - No Switchover)
Unit Poll frequency 1 seconds, holdtime 15 seconds
Interface Poll frequency 5 seconds, holdtime 25 seconds
Interface Policy 1
Monitored Interfaces 4 of 23 maximum
Version: Ours 8.2(5), Mate Unknown
Last Failover at: 06:17:29 GMT Apr 29 2015
This host: Primary - Active
Active time: 24707 (sec)
slot 0: ASA5505 hw/sw rev (1.0/8.2(5)) status (Up Sys)
Interface outside (27.251.167.246): Normal (Waiting)
Interface inside (10.11.0.20): Normal (Waiting)
Interface mgmt (10.11.200.21): Normal (Waiting)
slot 1: empty
Other host: Secondary - Failed
Active time: 0 (sec)
slot 0: empty
Interface outside (27.251.167.247): Unknown (Waiting)
Interface inside (10.11.0.21): Unknown (Waiting)
Interface mgmt (10.11.200.22): Unknown (Waiting)
slot 1: empty
cisco-asa# sh failover history
==========================================================================
From State To State Reason
==========================================================================
06:16:43 GMT Apr 29 2015
Not Detected Negotiation No Error
06:17:29 GMT Apr 29 2015
Negotiation Just Active No Active unit found
06:17:29 GMT Apr 29 2015
Just Active Active Drain No Active unit found
06:17:29 GMT Apr 29 2015
Active Drain Active Applying Config No Active unit found
06:17:29 GMT Apr 29 2015
Active Applying Config Active Config Applied No Active unit found
06:17:29 GMT Apr 29 2015
Active Config Applied Active No Active unit found
==========================================================================
cisco-asa#
cisco-asa# sh failover state
State Last Failure Reason Date/Time
This host - Primary
Active None
Other host - Secondary
Failed Comm Failure 06:17:43 GMT Apr 29 2015
====Configuration State===
====Communication State===
==================================================================================
Secondary Firewall
cisc-asa# sh failover h
==========================================================================
From State To State Reason
==========================================================================
06:16:32 GMT Apr 29 2015
Not Detected Negotiation No Error
06:17:05 GMT Apr 29 2015
Negotiation Disabled Set by the config command
==========================================================================
cisco-asa# sh failover
Failover Off
Failover unit Secondary
Failover LAN Interface: e0/7 Vlan3 (down)
Unit Poll frequency 1 seconds, holdtime 15 seconds
Interface Poll frequency 5 seconds, holdtime 25 seconds
Interface Policy 1
Monitored Interfaces 4 of 23 maximum
ecs-pune-fw-01# sh failover h
==========================================================================
From State To State Reason
==========================================================================
06:16:32 GMT Apr 29 2015
Not Detected Negotiation No Error
06:17:05 GMT Apr 29 2015
Negotiation Disabled Set by the config command
==========================================================================
cisco-asa# sh failover state
State Last Failure Reason Date/Time
This host - Secondary
Disabled None
Other host - Primary
Not Detected None
====Configuration State===
====Communication State===
Thanks...
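
With this many `sh failover` captures, it can be easier to compare the units after reducing each capture to a few key fields. The sketch below assumes the output layout shown in the post; `summarize_failover` is a hypothetical helper, not a Cisco tool, and the two sample strings are shortened copies of the "after rebooting" captures above.

```python
import re
from typing import Dict, Optional

# One pattern per field of interest; each captures the value in group 1.
PATTERNS = {
    "failover": r"^\s*Failover (On|Off)\s*$",
    "unit": r"^\s*Failover unit (\w+)",
    "lan_interface_status": r"^\s*Failover LAN Interface: .*\((.+)\)\s*$",
    "this_host": r"^\s*This host: \w+ - (.+?)\s*$",
    "other_host": r"^\s*Other host: \w+ - (.+?)\s*$",
}


def summarize_failover(output: str) -> Dict[str, Optional[str]]:
    """Reduce one 'sh failover' capture to a few fields (None if absent)."""
    summary = {}
    for key, pattern in PATTERNS.items():
        match = re.search(pattern, output, flags=re.MULTILINE)
        summary[key] = match.group(1) if match else None
    return summary


# Shortened copies of the captures taken after the reboot.
PRIMARY_AFTER_REBOOT = """\
Failover On
Failover unit Primary
Failover LAN Interface: e0/7 Vlan3 (Failed - No Switchover)
This host: Primary - Active
Other host: Secondary - Failed
"""

SECONDARY_AFTER_REBOOT = """\
Failover Off
Failover unit Secondary
Failover LAN Interface: e0/7 Vlan3 (down)
"""

if __name__ == "__main__":
    for name, capture in (
        ("primary", PRIMARY_AFTER_REBOOT),
        ("secondary", SECONDARY_AFTER_REBOOT),
    ):
        print(name, summarize_failover(capture))
```

Side by side, the primary reports the failover link as "Failed - No Switchover" while the secondary reports failover as off with the link down, and its history shows the state going to Disabled with the reason "Set by the config command". That makes the failover link and the secondary's failover configuration reasonable places to look first.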
