Failover Failure.

First, let me apologize for my ignorance, and please assume I know nothing! I am managing a Hyper-V environment that someone else configured. We are experiencing an issue that I will try to describe to the best of my ability.
We have two physical locations that each contain a SAN and three Hyper-V hosts. We have 3 CSVs (high, medium, and low priority) as well as a witness disk. One location is our primary location (PL) and the other is our disaster recovery location (DRL).
If the VMs (and the CSVs) reside on the hosts at the PL and all the hosts at the PL go down, everything fails over to our DRL and comes back up with no problems. However, if our DRL goes down, EVEN IF EVERYTHING IS RUNNING AT THE primary location, everything goes down! All of the VMs attempt to fail over to the DRL that went offline! I am very confused by this! It has caused major problems a couple of times and I have no idea what to do!
Any help would be greatly appreciated.
Thanks!

Although you don't mention it, I assume there's some storage replication involved here?
If so, you should engage with the vendor to pinpoint the configuration that is causing this.
With the Microsoft stack in the same scenario, you would use Hyper-V Replica with Azure Site Recovery and System Center, which would control and orchestrate the disaster recovery scenarios for you.
Also note that I am referring to disaster recovery, not high availability, because the MS solutions are DR and not HA across sites.
-kn
Kristian (Virtualization and some coffee: http://kristiannese.blogspot.com )
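
One thing that may be worth ruling out, in addition to the replication configuration, is where the witness vote ends up when a site is lost. The sketch below is only a back-of-the-envelope majority count under stated assumptions: one vote per host, one vote for the witness disk, and no dynamic quorum adjustment. Nothing in the thread confirms this is the cause; the function names and the idea that the witness is "located" at one site are illustrative assumptions.

```python
def has_quorum(votes_online, total_votes):
    """Plain majority rule: more than half of all votes must be reachable."""
    return votes_online > total_votes // 2


def surviving_votes(nodes_per_site, witness_site, failed_site):
    """Count the votes left after one whole site is lost.

    Assumption: the witness disk contributes one vote and is lost
    together with the site whose storage currently hosts it.
    """
    votes = sum(n for site, n in nodes_per_site.items() if site != failed_site)
    if witness_site != failed_site:
        votes += 1
    return votes


# Three hosts per site, as described in the question.
nodes = {"PL": 3, "DRL": 3}
total = sum(nodes.values()) + 1  # six host votes plus the witness

for witness_site in ("PL", "DRL"):
    left = surviving_votes(nodes, witness_site, failed_site="DRL")
    print(f"witness at {witness_site}: {left}/{total} votes, "
          f"quorum={has_quorum(left, total)}")
```

Under these assumptions, losing the DRL while the witness lives there leaves 3 of 7 votes, which is not a majority, whereas losing the PL in the same layout leaves 4 of 7. That asymmetry resembles the behaviour described, but it should be checked against the real cluster and SAN configuration before drawing conclusions.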

Similar Messages

  • IPMP failover Failure

    I have one question about IPMP under Solaris 9 9/04 SPARC 64-bit.
    My OS: with EIS 3.1.1 patches
    Clusterware: Sun Cluster 3.1u4 with EIS 3.1.1 patches
    My IPMP group contains two NICs: ce0 & ce3.
    Two NICs are linked to CISCO 4506
    The IPMP configuration files are as follows:
    /etc/hostname.ce0
    lamp-test2 netmask + broadcast + group ipmp1 deprecated -failover up
    /etc/hostname.ce3
    lamp netmask + broadcast + group ipmp1 up \
    addif lamp-test1 netmask + broadcast + deprecated -failover up
    I always use the default in.mpathd configuration file.
    But once I pull out a ceN NIC's cable, my IPMP group complains:
    Mar 18 18:06:34 lamp in.mpathd[2770]: [ID 215189 daemon.error] The link has gone down on ce0
    Mar 18 18:06:34 lamp in.mpathd[2770]: [ID 594170 daemon.error] NIC failure detected on ce0 of group ipmp1
    Mar 18 18:06:34 lamp ip: [ID 903730 kern.warning] WARNING: IP: Hardware address '00:03:ba:b0:5d:54' trying to be our address 192.168.217.020!
    Mar 18 18:06:34 lamp in.mpathd[2770]: [ID 832587 daemon.error] Successfully failed over from NIC ge0 to NIC ce0
    Mar 18 18:06:34 lamp ip: [ID 903730 kern.warning] WARNING: IP: Hardware address '00:03:ba:b0:5d:54' trying to be our address 192.168.217.020!
    Why does Solaris report a hardware address conflict?
    I'm sure these IPMP configuration files work fine with a Cisco 2950 and a D-Link mini switch.
    By the way, there are no duplicate MACs on the LAN.
    Should I modify some Cisco parameters?
    Your advice is much appreciated!

    lo0: flags=1000849<UP,LOOPBACK,RUNNING,MULTICAST,IPv4> mtu 8232 index 1
         inet 127.0.0.1 netmask ff000000
    ce0: flags=9040843<UP,BROADCAST,RUNNING,MULTICAST,DEPRECATED,IPv4,NOFAILOVER> mtu 1500 index 2
         inet 192.168.217.6 netmask ffffff00 broadcast 192.168.217.255
         groupname ipmp1
         ether 0:3:ba:b0:5d:54
    ce3: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 index 4
         inet 192.168.217.20 netmask ffffff00 broadcast 192.168.217.255
         groupname ipmp1
         ether 0:3:ba:95:5d:6e
    ce3:1: flags=9040843<UP,BROADCAST,RUNNING,MULTICAST,DEPRECATED,IPv4,NOFAILOVER> mtu 1500 index 4
         inet 192.168.217.4 netmask ffffff00 broadcast 192.168.217.255
    Generally speaking:
    When I switch the floating IP from ce0 to ce3, IPMP says the ce0 MAC is "trying to be our address ....", then the ce0 test IP fails and the floating IP doesn't fail over.
    When I switch the floating IP from ce3 to ce0, IPMP says the ce3 MAC is "trying to be our address ....", then the ce0 test IP fails and the floating IP doesn't fail over.
    My guess is that the floating NIC's MAC and address information is cached in the Cisco device's RAM and not released in time.

  • ACE 4710 FT failover failure

    Hello,
    I am running redundant ACE 4710 appliances running A3(2.7). I have five FT groups configured along with FT tracking, and when the VLANs fail due to physical links being down, the contexts do not fail over. If one of the ACE boxes fails completely, failover works fine. I have included the FT config from one of the contexts below. I have a case open with TAC, and the engineer is suggesting the use of a query interface in addition to FT tracking. We have had two incidents on separate contexts where we lost a physical interface on the primary ACE, one during maintenance of the core switch and the other a cable disconnect, and we are unable to understand why the individual context didn't fail over. Any ideas would be much appreciated. Let me know if more info/configs are needed.
    Dave
    ft interface vlan 900
      ip address 10.10.10.1 255.255.255.0
      peer ip address 10.10.10.2 255.255.255.0
      no shutdown
    ft peer 1
      heartbeat interval 300
      heartbeat count 20
      ft-interface vlan 900
    ft group 3
      peer 1
      no preempt
      priority 210
      peer priority 120
      associate-context XYZ
      inservice
    FT Group                     : 3
    No. of Contexts             : 1
    Context Name                 : XYZ
    Context Id                   : 2
    Configured Status           : in-service
    Maintenance mode             : MAINT_MODE_OFF
    My State                   : FSM_FT_STATE_ACTIVE
    My Config Priority           : 210
    My Net Priority             : 210
    My Preempt                   : Disabled
    Peer State                   : FSM_FT_STATE_STANDBY_HOT
    Peer Config Priority         : 120
    Peer Net Priority           : 120
    Peer Preempt                 : Disabled
    Peer Id                     : 1
    Last State Change time       : Wed Jan 11 13:14:16 2012
    Running cfg sync enabled     : Enabled
    Running cfg sync status     : Running configuration sync has completed
    Startup cfg sync enabled     : Enabled
    Startup cfg sync status     : Startup configuration sync has completed
    Bulk sync done for ARP: 0
    Bulk sync done for LB: 0
    Bulk sync done for ICM: 0
    show int
    vlan424 is up, VLAN up on the physical port
    Hardware type is VLAN
    MAC address is 00:1e:68:1e:ba:b7
    Virtual MAC address is 00:0b:fc:fe:1b:03
    Mode : routed
    IP address is 10.104.224.6 netmask is 255.255.255.0
    FT status is active
    Description:"New Server VIP and real"
    MTU: 1500 bytes
    Last cleared: never
    Last Changed: Sun Mar 11 01:13:12 2012
    No of transitions: 3
    Alias IP address is 10.104.224.5 netmask is 255.255.255.0
    Peer IP address is 10.104.224.7 Peer IP netmask is 255.255.255.0
    Assigned on the physical port, up on the physical port
    Previous State: Sun Mar 11 00:04:57 2012, VLAN not up on the physical port
    Previous State: Sun Sep 18 10:21:15 2011, administratively up
         3991888419 unicast packets input, 23734607976687 bytes
         20246934 multicast, 174801 broadcast
         0 input errors, 0 unknown, 0 ignored, 0 unicast RPF drops
         1609345958 unicast packets output, 23690663385228 bytes
         7 multicast, 55807 broadcast
         0 output errors, 0 ignored

    Dave,
    For tracking to work you need to have preempt enabled. Can you try enabling preempt under the ft group and test your tracking again? Another potential issue is that your tracking may not lower the priority enough when it fails. In the config you posted, the gap between the active and standby priorities is 90 (210 versus 120). If tracking does not decrement the priority by more than this gap, then even with preempt enabled it will not lower it enough to force the failover. If tracking still does not work as expected after you enable preempt on this group, send your whole config for us to look at.
    Regarding the query interface: this is not a bad idea. It will help prevent an active-active situation if there is a problem with the FT link between the two modules.
    Thanks
    Jim
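
The two conditions in the reply above (preempt enabled, and a tracking decrement large enough to drop the active unit below its peer) can be sketched as simple arithmetic. This is a simplified model of the reasoning in the reply, not the ACE's actual election logic. The priorities 210 and 120 come from the posted config; the decrement values and function names are hypothetical.

```python
def net_priority(config_priority, failed_track_decrements):
    """Configured priority minus the decrements of every failed tracked object."""
    return config_priority - sum(failed_track_decrements)


def should_fail_over(active_cfg, standby_cfg, failed_track_decrements, preempt):
    """Simplified rule from the reply: the standby takes over only when
    preempt is enabled and its priority is strictly higher than the
    active unit's net priority."""
    return preempt and standby_cfg > net_priority(active_cfg, failed_track_decrements)


# Priorities from the posted FT group; decrements are made-up examples.
print(should_fail_over(210, 120, [50], preempt=True))    # 160 vs 120
print(should_fail_over(210, 120, [100], preempt=True))   # 110 vs 120
print(should_fail_over(210, 120, [100], preempt=False))  # preempt off
```

With these numbers, only the second case would trigger a failover in this model: the decrement exceeds the 90-point gap and preempt is on.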

  • Storage issues during live clone in Server 2012 R2

    We just set up a new 2012 R2 cluster and we are having trouble with live cloning. I have seen it work in our environment, but it usually fails.
    We get this error in our cluster events:
    Cluster Shared Volume 'Volume8' ('Volume8') has entered a paused state because of '(c0000435)'. All I/O will temporarily be queued until a path to the volume is reestablished.
    and the clone fails with:
    Error (2916)
    VMM is unable to complete the request. The connection to the agent ServerName was lost.
    WinRM: URL: [http://ServerPath], Verb: [INVOKE], Method: [GetError], Resource: [http://schemas.microsoft.com/wbem/wsman/1/wmi/root/microsoft/bits/BitsClientJob?JobId={DC6D9530-26F1-4F95-A1AB-0197B1406F98}]
    Not found (404) (0x80190194)
    The storage is a Dell EqualLogic PS6500 and it's connected to two S55 Force 10 48-port switches. The client side of the network is hooked to two Cisco 2960G 48-port switches.
    I can't find any information on error c0000435. Has anyone heard of this issue?

    Hi,
    I am Chetan Savade from Symantec Technical Support Team.
    A few issues have been reported with older versions of SEP. I would recommend testing with the latest version of SEP; SEP 12.1 RU4 MP1b is the latest version.
    Reported issues:
    Cluster environment does not fail over
    Fix ID: 2731793
    Symptom: A cluster environment does not fail over when Symantec Endpoint Protection client is installed due to inability to unload drivers.
    Solution: Modified a driver to properly detach from a volume when the volume dismounts
    Reference:
    http://www.symantec.com/docs/TECH199676
    Cluster is unable to fail over with AutoProtect enabled
    Fix ID: 3246552
    Symptom:  With AutoProtect enabled, an active cluster node cannot fail over and hangs.
    Solution: Corrected a delay in the AutoProtect volume dismount that resulted in cluster failover failures
    http://www.symantec.com/docs/TECH211972
    Best Regards,
    Chetan

  • RE: Hard Failures, KeepAlive, and Failover --Follow-up

    Hi,
    It's a really challenging question. However, what do you want to do after
    the network crash? Fail over or just stop the service? Should we assume
    that when the network is down, so is your name service?
    One idea is to use externalconnection to "listen" to your external non-Forte
    alarm, and then do "whatever" after you receive the alarm instead of letting the
    "logical connection" time out or hang.
    Regards,
    Peter Sham.
    -----Original Message-----
    From: Michael Lee [SMTP:[email protected]]
    Sent: Wednesday, June 16, 1999 12:44 AM
    To: [email protected]
    Subject: Hard Failures, KeepAlive, and Failover -- Follow-up
    I've gotten a handful of responses to my original post, and the suggested
    solutions are all variations on the same theme -- periodically ping remote
    nodes/partitions and then react when the node/partition goes down. In
    other circumstances this would work, but unless I'm missing something this
    solution doesn't solve the problem I'm running into.
    Some background...
    When a connection is set up between partitions on two different nodes,
    Forte is effectively establishing two connections: a "physical connection"
    over TCP/IP between two ports and a "logical connection" between the two
    partitions (running on top of the physical connection). Once a connection
    is established between two partitions Forte assumes the logical connection
    is valid until one of two things happens:
    1) The logical connection is broken (by shutting down a partition from
    Econsole/Escript, by killing a node manager, by terminating the ftexec,
    etc.)
    2) Forte detects that the physical connection is broken (via its KeepAlive
    functionality).
    If a physical connection is broken (via a cut cable or power-off
    condition), and Forte has not yet detected the situation (via a KeepAlive
    failure), the logical connection is still valid and Forte will still allow
    method calls on the remote partition. In effect, Forte thinks the remote
    partition is still up and running. In this situation, any method calls
    made after the physical connection has been broken will simply hang. No
    exceptions are generated and failover does not occur.
    However, once a KeepAlive failure is detected all is made right.
    Unfortunately, the lowest-bound latency of KeepAlive is greater than one
    second, and we need to detect and react to hard failures in the 250-500ms
    range. Using technology outside of Forte we are able to detect the hard
    failures within the required times, but we haven't been able to get Forte
    to react to this "outside" knowledge. Here's why:
    Since Forte has not yet detected a KeepAlive failure, the logical
    connection to the remote partition is still "valid". Although there are a
    number of mechanisms that would allow a logical connection to be broken,
    they all assume a valid physical connection -- which, of course, we don't
    have!
    It appears I'm in a "Catch-22" situation: In order to break a logical
    connection between partitions, I need a valid physical connection. But the
    reason I'm trying to break the logical connection in the first place is
    that I know (but Forte doesn't yet know) that the physical connection has
    been broken.
    If anyone knows a way around this Catch-22, please let me know.
    Mike
    To unsubscribe, email '[email protected]' with
    'unsubscribe forte-users' as the body of the message.
    Searchable thread archive <URL:http://pinehurst.sageit.com/listarchive/>-
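
The hang described in the message (a call on a connection that is still considered valid but will never answer) is often handled on the caller's side by giving each remote call its own deadline. The sketch below shows that generic pattern in Python; it is not Forte code and does not use any Forte API. The names `call_with_deadline`, `invoke_with_failover` and `RemoteCallTimeout` are hypothetical, and the 250 ms default simply mirrors the latency range mentioned in the message.

```python
import threading
import time


class RemoteCallTimeout(Exception):
    """Raised when the wrapped call does not return within the deadline."""


def call_with_deadline(func, deadline_s, *args, **kwargs):
    """Run func in a daemon thread and stop waiting after deadline_s seconds.

    The worker thread is abandoned, not killed, if the deadline passes.
    """
    result = {}
    done = threading.Event()

    def runner():
        try:
            result["value"] = func(*args, **kwargs)
        except Exception as exc:  # hand the callee's error back to the caller
            result["error"] = exc
        finally:
            done.set()

    threading.Thread(target=runner, daemon=True).start()
    if not done.wait(deadline_s):
        raise RemoteCallTimeout(f"no reply within {deadline_s:.3f}s")
    if "error" in result:
        raise result["error"]
    return result["value"]


def invoke_with_failover(primary, backup, deadline_s=0.25):
    """Try the primary target first; switch to the backup if it times out."""
    try:
        return call_with_deadline(primary, deadline_s)
    except RemoteCallTimeout:
        return call_with_deadline(backup, deadline_s)
```

A caller would pass two callables that wrap the same operation on two different partitions. Whether a comparable wrapper can be built around Forte method calls is exactly the open question in the message, so this only illustrates the shape of a possible workaround.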

    Make sure you chose the right format, and as far as partitioning is concerned, you have to select at least one partition, which will be the entire drive.

  • NAC Server and Manager Failure with out failover

    Hi, I'm working on a NAC L2 OOB wired design with 1 CAM and 1 CAS. I've not included failover in the design for the obvious financial reasons, and I want to figure out the effect a failure would have on the network.
    1.) What would the users experience in the event of a CAS failure? Both currently online users and new users.
    2.) What would the users experience in the event of a CAM failure? Both currently online users and new users.
    3.) Are there any ideas on how to minimize the effect on the users in the event of a failure, without adding the failover bundle?
    Many thanks for your valuable input in advance.
    Din

    If you are OOB, then a CAS failure would not affect logged-in, remediated users. Anyone not logged in would be stuck, because when the CAS fails, its connectivity to the CAM is lost.
    If the CAM fails, you will not be able to log in, do remediation, or anything else. VLAN settings on switches will be frozen where they are at the moment of the CAM failure. Note that you could easily connect to the switches and change VLANs to allow users onto the LAN, and the CAM would accept that passively when restarted, but if you use the Agent it will probably want to log in again, which is not a huge issue if you use AD SSO.
    Dan Sichel
    Dan S.

  • Disk Witness Failover with simulated network failure

    Hello everyone, 
    I am running two Windows Server 2012 R2 machines clustered with a disk witness. The disk witness is a LUN created on our SAN and presented to both machines. The SAN is connected through two Fibre Channel connections per server, and both servers are networked on split-out network connections teamed together. On a hard server fault or simulated power failure (i.e. pulling the power cords from a server), the disk witness fails over to the second node and the cluster survives. But when simulating a network card failure (disconnecting the Cat5 cable) on the node hosting the disk witness, I see the cluster attempt to move hosting to the node that is still up, yet the disk witness will not come online. I think the issue is that the server is technically still running and the disk witness has not really failed on the primary host, so it never releases ownership. I am new to clustering and could use a little guidance here. Is there any way to make the disk witness
    Thank you 

    Hi,
    Please try to modify the network settings for the failover cluster:
    1. In the Failover Cluster Manager snap-in, if the cluster that you want to configure is not displayed, in the console tree, right-click Failover Cluster Manager, click Manage a Cluster, and then select or specify the cluster that you want.
    2. If the console tree is collapsed, expand the tree under the cluster that you want to configure.
    3. Expand Networks.
    4. Right-click the network that you want to modify settings for, and then click Properties.
    5. If needed, change the name of the network.
    6. Select one of the following options:
    - Allow cluster network communication on this network
      If you select this option and you want the network to be used by the nodes only (not clients), clear Allow clients to connect through this network. Otherwise, make sure it is selected.
    - Do not allow cluster network communication on this network
      Select this option if you are using a network only for iSCSI (communication with storage) or only for backup. (These are among the most common reasons for selecting this option.)
    Quote from:
    Modify Network Settings for a Failover Cluster
    http://technet.microsoft.com/en-us/library/cc725775.aspx
    Hope this helps.

  • Does Weblogic12c support Application Failover ? If yes, then how does Weblogic12c detect an application failure (OutOfMemoryException)?

    Hi all,
    I need help setting up high availability at my workplace. Can somebody please tell me whether WebLogic 12c supports application failover?
    If yes, then how does Weblogic12c detect an application failure (OutOfMemoryException)?

    Hi there user,
    you can achieve HA in different levels:
    1. On a single machine: here you need to set up Node Manager. When a server is started by Node Manager, server failures are detected and Node Manager tries to restart the server. OOME is an error thrown by the JVM; the server state should go to FAILED at some point, and then Node Manager will try to restart it. Node Manager is the simplest HA solution you can, and must, implement for a production environment.
    2. On redundant machines: you can configure WLS clustering, but you will need a more complex environment, i.e. a load balancer in front of the cluster to reverse proxy the requests. This scenario can also use Node Manager to control the WLS instances on each machine.
    3. Cluster with server/service migration: the most complex scenario, where in case of machine failure the WLS cluster can "relocate" resources (services and whole servers) to spare machines.
    In your case an OOME should cause the JVM, and therefore WLS, to become unresponsive, so Node Manager will detect this at some point and try to restart WLS.
    Hope this helps,
    A.
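
To make the first option more concrete, the restart-on-failure idea can be sketched as a tiny process supervisor. This is a generic illustration in Python, not Node Manager itself and not a WebLogic API: the `supervise` function, its `max_restarts` parameter and the example command are all hypothetical, and a real Node Manager also monitors server health state, which this sketch does not.

```python
import subprocess
import sys


def supervise(cmd, max_restarts=3):
    """Run cmd and restart it while it exits with a non-zero status.

    Returns (number_of_runs, last_exit_code). Gives up after the
    initial run plus max_restarts restarts.
    """
    runs = 0
    while True:
        code = subprocess.run(cmd).returncode
        runs += 1
        if code == 0 or runs > max_restarts:
            return runs, code


if __name__ == "__main__":
    # Example: a child process that always fails, standing in for a crashed server.
    failing = [sys.executable, "-c", "import sys; sys.exit(1)"]
    print(supervise(failing, max_restarts=2))
```

The sketch only reacts to a process that actually exits. A JVM that is out of memory may stay alive but unresponsive, which is why health-state monitoring matters in the real product.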

  • Data Guard Failover after primary site network failure or disconnect.

    Hello Experts:
    I'll try to be clear and specific with my issue:
    Environment:
    Two nodes with NO shared storage (I don't have an Observer running).
    Veritas Cluster Server (VCS) with the Data Guard agent. (I don't use the Broker. The Data Guard agent "takes care" of switchover and failover.)
    Two single instance databases, one per node. NO RAC.
    What I can do with no issues:
    Manual switchover of the primary database by running the VCS command "hagrp -switch oraDG_group -to standby_node".
    Automatic failover when the primary node is rebooted with "reboot" or "init".
    Automatic failover when the primary node is shut down with "shutdown".
    What I'm NOT able to do:
    Automatic failover when I manually unplug the network cables from the primary site (the entire network, not only the link between the primary and standby nodes, so it's like unplugging the server from its power source).
    The same happens if I manually disconnect the server from the power.
    These are the alert logs I have:
    This is the portion of the alert log at the standby site when real-time replication is working fine:
    Recovery of Online Redo Log: Thread 1 Group 4 Seq 7 Reading mem 0
      Mem# 0: /u02/oracle/fast_recovery_area/standby_db/onlinelog/o1_mf_4_9c3tk3dy_.log
    At this moment, node1 (primary) is completely disconnected from the network. Note at the end that the standby database (which should be converted to PRIMARY) does not get all the archived logs from the primary because of the abnormal disconnect from the network:
    ALTER DATABASE RECOVER MANAGED STANDBY DATABASE FINISH
    Terminal Recovery: applying standby redo logs.
    Terminal Recovery: thread 1 seq# 7 redo required
    Terminal Recovery:
    Recovery of Online Redo Log: Thread 1 Group 4 Seq 7 Reading mem 0
      Mem# 0: /u02/oracle/fast_recovery_area/standby_db/onlinelog/o1_mf_4_9c3tk3dy_.log
    Identified End-Of-Redo (failover) for thread 1 sequence 7 at SCN 0xffff.ffffffff
    Incomplete Recovery applied until change 15922544 time 12/23/2013 17:12:48
    Media Recovery Complete (primary_db)
    Terminal Recovery: successful completion
    Forcing ARSCN to IRSCN for TR 0:15922544
    Mon Dec 23 17:13:22 2013
    ARCH: Archival stopped, error occurred. Will continue retrying
    ORACLE Instance primary_db - Archival ErrorAttempt to set limbo arscn 0:15922544 irscn 0:15922544
    ORA-16014: log 4 sequence# 7 not archived, no available destinations
    ORA-00312: online log 4 thread 1: '/u02/oracle/fast_recovery_area/standby_db/onlinelog/o1_mf_4_9c3tk3dy_.log'
    Resetting standby activation ID 2071848820 (0x7b7de774)
    Completed:  ALTER DATABASE RECOVER MANAGED STANDBY DATABASE FINISH
    Mon Dec 23 17:13:33 2013
    ALTER DATABASE RECOVER MANAGED STANDBY DATABASE FINISH
    Attempt to do a Terminal Recovery (primary_db)
    Media Recovery Start: Managed Standby Recovery (primary_db)
    started logmerger process
    Mon Dec 23 17:13:33 2013
    Managed Standby Recovery not using Real Time Apply
    Media Recovery failed with error 16157
    Recovery Slave PR00 previously exited with exception 283
    ORA-283 signalled during:  ALTER DATABASE RECOVER MANAGED STANDBY DATABASE FINISH...
    Mon Dec 23 17:13:34 2013
    Shutting down instance (immediate)
    Shutting down instance: further logons disabled
    Stopping background process MMNL
    Stopping background process MMON
    License high water mark = 38
    All dispatchers and shared servers shutdown
    ALTER DATABASE CLOSE NORMAL
    ORA-1109 signalled during: ALTER DATABASE CLOSE NORMAL...
    ALTER DATABASE DISMOUNT
    Shutting down archive processes
    Archiving is disabled
    Mon Dec 23 17:13:38 2013
    Mon Dec 23 17:13:38 2013
    Mon Dec 23 17:13:38 2013
    ARCH shutting downARCH shutting down
    ARCH shutting down
    ARC0: Relinquishing active heartbeat ARCH role
    ARC2: Archival stopped
    ARC0: Archival stopped
    ARC1: Archival stopped
    Completed: ALTER DATABASE DISMOUNT
    ARCH: Archival disabled due to shutdown: 1089
    Shutting down archive processes
    Archiving is disabled
    Mon Dec 23 17:13:40 2013
    Stopping background process VKTM
    ARCH: Archival disabled due to shutdown: 1089
    Shutting down archive processes
    Archiving is disabled
    Mon Dec 23 17:13:43 2013
    Instance shutdown complete
    Mon Dec 23 17:13:44 2013
    Adjusting the default value of parameter parallel_max_servers
    from 1280 to 470 due to the value of parameter processes (500)
    Starting ORACLE instance (normal)
    ************************ Large Pages Information *******************
    Per process system memlock (soft) limit = 64 KB
    Total Shared Global Region in Large Pages = 0 KB (0%)
    Large Pages used by this instance: 0 (0 KB)
    Large Pages unused system wide = 0 (0 KB)
    Large Pages configured system wide = 0 (0 KB)
    Large Page size = 2048 KB
    RECOMMENDATION:
      Total System Global Area size is 3762 MB. For optimal performance,
      prior to the next instance restart:
      1. Increase the number of unused large pages by
    at least 1881 (page size 2048 KB, total size 3762 MB) system wide to
      get 100% of the System Global Area allocated with large pages
      2. Large pages are automatically locked into physical memory.
    Increase the per process memlock (soft) limit to at least 3770 MB to lock
    100% System Global Area's large pages into physical memory
    LICENSE_MAX_SESSION = 0
    LICENSE_SESSIONS_WARNING = 0
    Initial number of CPU is 32
    Number of processor cores in the system is 16
    Number of processor sockets in the system is 2
    CELL communication is configured to use 0 interface(s):
    CELL IP affinity details:
        NUMA status: NUMA system w/ 2 process groups
        cellaffinity.ora status: cannot find affinity map at '/etc/oracle/cell/network-config/cellaffinity.ora' (see trace file for details)
    CELL communication will use 1 IP group(s):
        Grp 0:
    Picked latch-free SCN scheme 3
    Autotune of undo retention is turned on.
    IMODE=BR
    ILAT =88
    LICENSE_MAX_USERS = 0
    SYS auditing is disabled
    NUMA system with 2 nodes detected
    Starting up:
    Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
    With the Partitioning, OLAP, Data Mining and Real Application Testing options.
    ORACLE_HOME = /u01/oracle/product/11.2.0.4
    System name:    Linux
    Node name:      node2.localdomain
    Release:        2.6.32-131.0.15.el6.x86_64
    Version:        #1 SMP Tue May 10 15:42:40 EDT 2011
    Machine:        x86_64
    Using parameter settings in server-side spfile /u01/oracle/product/11.2.0.4/dbs/spfileprimary_db.ora
    System parameters with non-default values:
      processes                = 500
      sga_target               = 3760M
      control_files            = "/u02/oracle/orafiles/primary_db/control01.ctl"
      control_files            = "/u01/oracle/fast_recovery_area/primary_db/control02.ctl"
      db_file_name_convert     = "standby_db"
      db_file_name_convert     = "primary_db"
      log_file_name_convert    = "standby_db"
      log_file_name_convert    = "primary_db"
      control_file_record_keep_time= 40
      db_block_size            = 8192
      compatible               = "11.2.0.4.0"
      log_archive_dest_1       = "location=/u02/oracle/archivelogs/primary_db"
      log_archive_dest_2       = "SERVICE=primary_db ASYNC VALID_FOR=(ONLINE_LOGFILES,PRIMARY_ROLE) DB_UNIQUE_NAME=primary_db"
      log_archive_dest_state_2 = "ENABLE"
      log_archive_min_succeed_dest= 1
      fal_server               = "primary_db"
      log_archive_trace        = 0
      log_archive_config       = "DG_CONFIG=(primary_db,standby_db)"
      log_archive_format       = "%t_%s_%r.dbf"
      log_archive_max_processes= 3
      db_recovery_file_dest    = "/u02/oracle/fast_recovery_area"
      db_recovery_file_dest_size= 30G
      standby_file_management  = "AUTO"
      db_flashback_retention_target= 1440
      undo_tablespace          = "UNDOTBS1"
      remote_login_passwordfile= "EXCLUSIVE"
      db_domain                = ""
      dispatchers              = "(PROTOCOL=TCP) (SERVICE=primary_dbXDB)"
      job_queue_processes      = 0
      audit_file_dest          = "/u01/oracle/admin/primary_db/adump"
      audit_trail              = "DB"
      db_name                  = "primary_db"
      db_unique_name           = "standby_db"
      open_cursors             = 300
      pga_aggregate_target     = 1250M
      dg_broker_start          = FALSE
      diagnostic_dest          = "/u01/oracle"
    Mon Dec 23 17:13:45 2013
    PMON started with pid=2, OS id=29108
    Mon Dec 23 17:13:45 2013
    PSP0 started with pid=3, OS id=29110
    Mon Dec 23 17:13:46 2013
    VKTM started with pid=4, OS id=29125 at elevated priority
    VKTM running at (1)millisec precision with DBRM quantum (100)ms
    Mon Dec 23 17:13:46 2013
    GEN0 started with pid=5, OS id=29129
    Mon Dec 23 17:13:46 2013
    DIAG started with pid=6, OS id=29131
    Mon Dec 23 17:13:46 2013
    DBRM started with pid=7, OS id=29133
    Mon Dec 23 17:13:46 2013
    DIA0 started with pid=8, OS id=29135
    Mon Dec 23 17:13:46 2013
    MMAN started with pid=9, OS id=29137
    Mon Dec 23 17:13:46 2013
    DBW0 started with pid=10, OS id=29139
    Mon Dec 23 17:13:46 2013
    DBW1 started with pid=11, OS id=29141
    Mon Dec 23 17:13:46 2013
    DBW2 started with pid=12, OS id=29143
    Mon Dec 23 17:13:46 2013
    DBW3 started with pid=13, OS id=29145
    Mon Dec 23 17:13:46 2013
    LGWR started with pid=14, OS id=29147
    Mon Dec 23 17:13:46 2013
    CKPT started with pid=15, OS id=29149
    Mon Dec 23 17:13:46 2013
    SMON started with pid=16, OS id=29151
    Mon Dec 23 17:13:46 2013
    RECO started with pid=17, OS id=29153
    Mon Dec 23 17:13:46 2013
    MMON started with pid=18, OS id=29155
    Mon Dec 23 17:13:46 2013
    MMNL started with pid=19, OS id=29157
    starting up 1 dispatcher(s) for network address '(ADDRESS=(PARTIAL=YES)(PROTOCOL=TCP))'...
    starting up 1 shared server(s) ...
    ORACLE_BASE from environment = /u01/oracle
    Mon Dec 23 17:13:46 2013
    ALTER DATABASE   MOUNT
    ARCH: STARTING ARCH PROCESSES
    Mon Dec 23 17:13:50 2013
    ARC0 started with pid=23, OS id=29210
    ARC0: Archival started
    ARCH: STARTING ARCH PROCESSES COMPLETE
    ARC0: STARTING ARCH PROCESSES
    Successful mount of redo thread 1, with mount id 2071851082
    Mon Dec 23 17:13:51 2013
    ARC1 started with pid=24, OS id=29212
    Allocated 15937344 bytes in shared pool for flashback generation buffer
    Mon Dec 23 17:13:51 2013
    ARC2 started with pid=25, OS id=29214
    Starting background process RVWR
    ARC1: Archival started
    ARC1: Becoming the 'no FAL' ARCH
    ARC1: Becoming the 'no SRL' ARCH
    Mon Dec 23 17:13:51 2013
    RVWR started with pid=26, OS id=29216
    Physical Standby Database mounted.
    Lost write protection disabled
    Completed: ALTER DATABASE   MOUNT
    Mon Dec 23 17:13:51 2013
    ALTER DATABASE RECOVER MANAGED STANDBY DATABASE
             USING CURRENT LOGFILE DISCONNECT FROM SESSION
    Attempt to start background Managed Standby Recovery process (primary_db)
    Mon Dec 23 17:13:51 2013
    MRP0 started with pid=27, OS id=29219
    MRP0: Background Managed Standby Recovery process started (primary_db)
    ARC2: Archival started
    ARC0: STARTING ARCH PROCESSES COMPLETE
    ARC2: Becoming the heartbeat ARCH
    ARC2: Becoming the active heartbeat ARCH
    ARCH: Archival stopped, error occurred. Will continue retrying
    ORACLE Instance primary_db - Archival Error
    ORA-16014: log 4 sequence# 7 not archived, no available destinations
    ORA-00312: online log 4 thread 1: '/u02/oracle/fast_recovery_area/standby_db/onlinelog/o1_mf_4_9c3tk3dy_.log'
    At this moment, I've lost service and I have to wait until the primary server comes back up to receive the missing log.
    This is the rest of the log:
    Fatal NI connect error 12543, connecting to:
    (DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(PROTOCOL=TCP)(HOST=node1)(PORT=1521)))(CONNECT_DATA=(SERVICE_NAME=primary_db)(CID=(PROGRAM=oracle)(HOST=node2.localdomain)(USER=oracle))))
      VERSION INFORMATION:
            TNS for Linux: Version 11.2.0.4.0 - Production
            TCP/IP NT Protocol Adapter for Linux: Version 11.2.0.4.0 - Production
      Time: 23-DEC-2013 17:13:52
      Tracing not turned on.
      Tns error struct:
        ns main err code: 12543
    TNS-12543: TNS:destination host unreachable
        ns secondary err code: 12560
        nt main err code: 513
    TNS-00513: Destination host unreachable
        nt secondary err code: 113
        nt OS err code: 0
    Fatal NI connect error 12543, connecting to:
    (DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(PROTOCOL=TCP)(HOST=node1)(PORT=1521)))(CONNECT_DATA=(SERVICE_NAME=primary_db)(CID=(PROGRAM=oracle)(HOST=node2.localdomain)(USER=oracle))))
      VERSION INFORMATION:
            TNS for Linux: Version 11.2.0.4.0 - Production
            TCP/IP NT Protocol Adapter for Linux: Version 11.2.0.4.0 - Production
      Time: 23-DEC-2013 17:13:55
      Tracing not turned on.
      Tns error struct:
        ns main err code: 12543
    TNS-12543: TNS:destination host unreachable
        ns secondary err code: 12560
        nt main err code: 513
    TNS-00513: Destination host unreachable
        nt secondary err code: 113
        nt OS err code: 0
    started logmerger process
    Mon Dec 23 17:13:56 2013
    Managed Standby Recovery starting Real Time Apply
    MRP0: Background Media Recovery terminated with error 16157
    Errors in file /u01/oracle/diag/rdbms/standby_db/primary_db/trace/primary_db_pr00_29230.trc:
    ORA-16157: media recovery not allowed following successful FINISH recovery
    Managed Standby Recovery not using Real Time Apply
    Completed: ALTER DATABASE RECOVER MANAGED STANDBY DATABASE
             USING CURRENT LOGFILE DISCONNECT FROM SESSION
    Recovery Slave PR00 previously exited with exception 16157
    MRP0: Background Media Recovery process shutdown (primary_db)
    Fatal NI connect error 12543, connecting to:
    (DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(PROTOCOL=TCP)(HOST=node1)(PORT=1521)))(CONNECT_DATA=(SERVICE_NAME=primary_db)(CID=(PROGRAM=oracle)(HOST=node2.localdomain)(USER=oracle))))
      VERSION INFORMATION:
            TNS for Linux: Version 11.2.0.4.0 - Production
            TCP/IP NT Protocol Adapter for Linux: Version 11.2.0.4.0 - Production
      Time: 23-DEC-2013 17:13:58
      Tracing not turned on.
      Tns error struct:
        ns main err code: 12543
    TNS-12543: TNS:destination host unreachable
        ns secondary err code: 12560
        nt main err code: 513
    TNS-00513: Destination host unreachable
        nt secondary err code: 113
        nt OS err code: 0
    Mon Dec 23 17:14:01 2013
    Fatal NI connect error 12543, connecting to:
    (DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(PROTOCOL=TCP)(HOST=node1)(PORT=1521)))(CONNECT_DATA=(SERVICE_NAME=primary_db)(CID=(PROGRAM=oracle)(HOST=node2.localdomain)(USER=oracle))))
      VERSION INFORMATION:
            TNS for Linux: Version 11.2.0.4.0 - Production
            TCP/IP NT Protocol Adapter for Linux: Version 11.2.0.4.0 - Production
      Time: 23-DEC-2013 17:14:01
      Tracing not turned on.
      Tns error struct:
        ns main err code: 12543
    TNS-12543: TNS:destination host unreachable
        ns secondary err code: 12560
        nt main err code: 513
    TNS-00513: Destination host unreachable
        nt secondary err code: 113
        nt OS err code: 0
    Error 12543 received logging on to the standby
    FAL[client, ARC0]: Error 12543 connecting to primary_db for fetching gap sequence
    Archiver process freed from errors. No longer stopped
    Mon Dec 23 17:15:07 2013
    Using STANDBY_ARCHIVE_DEST parameter default value as /u02/oracle/archivelogs/primary_db
    Mon Dec 23 17:19:51 2013
    ARCH: Archival stopped, error occurred. Will continue retrying
    ORACLE Instance primary_db - Archival Error
    ORA-16014: log 4 sequence# 7 not archived, no available destinations
    ORA-00312: online log 4 thread 1: '/u02/oracle/fast_recovery_area/standby_db/onlinelog/o1_mf_4_9c3tk3dy_.log'
    Mon Dec 23 17:26:18 2013
    RFS[1]: Assigned to RFS process 31456
    RFS[1]: No connections allowed during/after terminal recovery.
    Mon Dec 23 17:26:47 2013
    flashback database to scn 15921680
    ORA-16157 signalled during: flashback database to scn 15921680...
    Mon Dec 23 17:27:05 2013
    alter database recover managed standby database using current logfile disconnect
    Attempt to start background Managed Standby Recovery process (primary_db)
    Mon Dec 23 17:27:05 2013
    MRP0 started with pid=28, OS id=31481
    MRP0: Background Managed Standby Recovery process started (primary_db)
    started logmerger process
    Mon Dec 23 17:27:10 2013
    Managed Standby Recovery starting Real Time Apply
    MRP0: Background Media Recovery terminated with error 16157
    Errors in file /u01/oracle/diag/rdbms/standby_db/primary_db/trace/primary_db_pr00_31486.trc:
    ORA-16157: media recovery not allowed following successful FINISH recovery
    Managed Standby Recovery not using Real Time Apply
    Completed: alter database recover managed standby database using current logfile disconnect
    Recovery Slave PR00 previously exited with exception 16157
    MRP0: Background Media Recovery process shutdown (primary_db)
    Mon Dec 23 17:27:18 2013
    RFS[2]: Assigned to RFS process 31492
    RFS[2]: No connections allowed during/after terminal recovery.
    Mon Dec 23 17:28:18 2013
    RFS[3]: Assigned to RFS process 31614
    RFS[3]: No connections allowed during/after terminal recovery.
    Do you have any advice?
    Thanks!
    Alex.
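When reading an alert log as long as the one above, it can help to tally the error codes first. The following is a minimal Python sketch, not an Oracle tool: the function name `count_error_codes` is made up for illustration, and the sample text is a handful of lines copied from the log in this post. In that log, ORA-16157 ("media recovery not allowed following successful FINISH recovery") keeps recurring alongside the RFS message "No connections allowed during/after terminal recovery", so a tally like this makes the repeating error easy to spot.

```python
import re
from collections import Counter

# Matches Oracle-style error codes such as ORA-16157 or TNS-12543.
ERROR_CODE = re.compile(r"\b(ORA|TNS)-(\d{5})\b")


def count_error_codes(log_text):
    """Return a Counter mapping each error code to its number of occurrences."""
    return Counter(
        f"{prefix}-{number}" for prefix, number in ERROR_CODE.findall(log_text)
    )


# A few lines copied from the alert log above, used as sample input.
sample = """\
ORA-16014: log 4 sequence# 7 not archived, no available destinations
TNS-12543: TNS:destination host unreachable
TNS-00513: Destination host unreachable
ORA-16157: media recovery not allowed following successful FINISH recovery
ORA-16157 signalled during: flashback database to scn 15921680...
"""

counts = count_error_codes(sample)
for code, n in counts.most_common():
    print(code, n)
```

To run it against a real file, read the alert log into a string and pass it to `count_error_codes` instead of `sample`.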

    Hello;
    What's not clear to me in your question at this point:
    "What I'm NOT being able to perform:
    If I manually unplug the network cables from the primary site (all the network, not only the link between primary and standby node so, it's like a server unplug from the energy source).
    Same situation happens if I manually disconnect the server from the power.
    This is the alert logs I have:"
    Are you trying a failover to the Standby?
    Please advise.
    Is it possible your "valid_for clause" is set incorrectly?
    Would also review this:
    ORA-16014 and ORA-00312 Messages in Alert.log of Physical Standby
    Best Regards
    mseberg

  • Client failure failover/switchover standby configuration...

    I have created a standby database, and the standby database is in sync with the primary... After a switchover, the old primary is now the standby and the old standby is now the primary, but the clients are unable to connect to the new primary database.
    TNSNAMES.ora file at client side...
    prim.world=
    (DESCRIPTION_LIST=
    (FAILOVER=true)
    (LOAD_BALANCE=no)
    (DESCRIPTION=
    (ADDRESS=
    (PROTOCOL=TCP)
    (HOST=test9)
    (PORT=1521))
    (CONNECT_DATA=
    (SERVER=dedicated)
    (SERVICE_NAME=primary)))
    (DESCRIPTION=
    (ADDRESS=
    (PROTOCOL=TCP)
    (HOST=standby)
    (PORT=1521))
    (CONNECT_DATA=
    (SERVER=dedicated)
    (SERVICE_NAME=standby))))
    SQL> conn iq/[email protected]
    ERROR:
    ORA-01033: ORACLE initialization or shutdown in progress
    SQL> conn sys/[email protected] as sysdba
    Connected.
    SQL> select open_mode from v$database;
    OPEN_MODE
    MOUNTED
    SQL> select database_role from v$database;
    DATABASE_ROLE
    PHYSICAL STANDBY
    SQL>
    It's Oracle 10.2.0.1.0 version....

    oracleRaj
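Nested tnsnames.ora entries such as the alias above are easy to leave unbalanced when they are typed or copied by hand. The following is a small Python sketch for checking that the parentheses of an entry pair up before it goes into a client's file. The function name `paren_balance` and both sample strings are hypothetical test inputs modeled on the alias above. It checks nesting only, not whether the keywords are valid Oracle Net syntax.

```python
def paren_balance(entry):
    """Return (balanced, depth); depth is opens minus closes where the scan stopped.

    balanced is False if a ')' appears before its '(' or if depth is not 0
    at the end of the text.
    """
    depth = 0
    for ch in entry:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                return False, depth
    return depth == 0, depth


# Hypothetical input: a DESCRIPTION whose ADDRESS, CONNECT_DATA and
# DESCRIPTION groups are never closed.
unclosed = """(DESCRIPTION=
(ADDRESS=
(PROTOCOL=TCP)
(HOST=test9)
(PORT=1521)
(CONNECT_DATA=
(SERVER=dedicated)
(SERVICE_NAME=primary)"""

# Hypothetical input: the same DESCRIPTION with every group closed.
balanced = """(DESCRIPTION=
(ADDRESS=(PROTOCOL=TCP)(HOST=test9)(PORT=1521))
(CONNECT_DATA=(SERVER=dedicated)(SERVICE_NAME=primary)))"""

print(paren_balance(unclosed))
print(paren_balance(balanced))
```

A non-zero depth tells you how many closing parentheses are still missing at the end of the entry.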

  • CUCM failover to subscriber failure!

    Hi everyone!
    I have a CUCM cluster of one publisher and one subscriber, active version 7.0 and inactive version 5.1.3.
    The publisher failed due to a power failure; the Cisco DB wasn't starting at all, and there was no DRS backup. Anyway, I did an upgrade from 5.1 to 7.0 again on the publisher while telephony was operating normally on the subscriber node.
    After the upgrade, I uploaded the publisher's and subscriber's licenses, added back all the changes made to the CUCM between the 5 and 7 databases (added manually by comparing with the subscriber's), and replicated to the subscriber when I was done; the replication state was good ('2'). I then took an immediate DRS backup.
    However, the problem appeared when I restarted the CUCM publisher node and none of the phones registered with the subscriber node. I thought it was a network problem or the server being slow. I turned the publisher off for around 20 minutes and nothing changed.
    The configuration of the Call Manager group is correct, the licenses are correct, and everything seems to be OK. When the phones are registered with the publisher, I can see them as registered from the subscriber's phone page, but when I stop the publisher, they turn to 'unknown'.
    Does anyone have any clue why this is happening? Do I have to upgrade the subscriber again from 5 to 7? That is the only idea I have, but it doesn't make sense to me, since replication is working fine between the servers.
    One more thing: I use FreeSshd to do the DRS backup on an XP machine; it wasn't connecting on my laptop with Windows 7. What are you guys using on Windows 7? I tried a search result on Google but nothing worked.
    Thank you for reading and for tips!
    Regards,
    Mazen

    Hi Mazen,
    I think you are on the right track with your thought of rebuilding the Subscriber.
    You would be hitting this CUCM 5.x restriction:
    Replacing the Publisher Node
    Complete the following tasks to replace the Cisco Unified CallManager publisher server. If you are replacing a single server that is not part of a cluster, follow this procedure to replace your server.
    Caution: If you are replacing a publisher node in a cluster, you must also reinstall all the subscriber nodes and dedicated TFTP servers in the cluster after replacing the publisher node. For instructions on reinstalling these other node types, see the "Replacing a Subscriber or Dedicated TFTP Server Node" section.
    Follow the references in the For More Information column to get more information about a step.
    Table 4: Replacing the Publisher Node Process Overview
    Step 1: Perform the tasks in the "Server or Cluster Replacement Preparation Checklist" section.
        For more information: "Server or Cluster Replacement Preparation Checklist" section
    Step 2: Gather the necessary information about the old publisher server.
        For more information: "Gathering System Configuration Information to Replace or Reinstall a Server" section
    Step 3: Back up the publisher server to a remote SFTP server by using the Disaster Recovery System (DRS) and verify that you have a good backup.
        For more information: "Creating a Backup File" section
    Step 4: Get the new license and verify it before system replacement. You only need a new license if you are replacing the publisher node.
        For more information: See the "Obtaining a License File" section.
    Step 5: Shut down and turn off the old server.
    Step 6: Connect the new server.
    Step 7: Install the same Cisco Unified CallManager release on the new server that was installed on the old server, including any Engineering Special releases. Configure the server as the publisher server for the cluster.
        For more information: "Installing Cisco Unified CallManager on the New Publisher Server" section
    Step 8: Upload the new license file to the publisher server.
        For more information: "Uploading a License File" section
    Step 9: Restore backed up data to the publisher server by using DRS.
        For more information: "Restoring a Backup File" section
    Step 10: Reboot the publisher server.
    Step 11: Perform the post-replacement tasks in the "Post-Replacement Checklist" section.
    http://www.cisco.com/en/US/docs/voice_ip_comm/cucm/install/5_1/clstr513.html#wp87717
    Cheers!
    Rob

  • SQL SERVER Failover Cluster switch failure because the passive node automatically reassign drive letter

    I moved the SQL Server resource group to the standby node. When the disk resource was about to come online on the passive node, an exception occurred: the disk resource that SQL Server depends on originally had the drive letter 'K:', but when the disk came online it was
    automatically reassigned a new drive letter, 'H:', so the SQL Server resource could not come online. After I manually changed the drive letter back to 'K:' on the passive node, it worked. So my question is why the original drive letter was not used
    and a new one was assigned. What could cause this? A mount point? Some log output follows:
    00001cbc.000004e0::2015/03/12-14:41:11.377 WARN  [RES] Physical Disk <FltLowestPrice_K>: OnlineThread: Failed to set volguid \??\Volume{e32c13d5-02e6-4924-a2d9-59a6fae1a1be}. Error: 183.
    00001cbc.000004e0::2015/03/12-14:41:11.377 INFO  [RES] Physical Disk <FltLowestPrice_K>: Found 2 mount points for device \Device\Harddisk8\Partition2
    00001cbc.00001cdc::2015/03/12-14:41:11.377 INFO  [RES] Physical Disk: PNP: Update volume exit, status 1168
    00001cbc.00001cdc::2015/03/12-14:41:11.377 INFO  [RES] Physical Disk: PNP: Updating volume \\?\STORAGE#Volume#{1a8ddb8e-fe43-11e2-b7c5-6c3be5a5cdca}#0000000008100000#{53f5630d-b6bf-11d0-94f2-00a0c91efb8b}
    00001cbc.00001cdc::2015/03/12-14:41:11.377 INFO  [RES] Physical Disk: PNP: Update volume exit, status 5023
    00001cbc.000004e0::2015/03/12-14:41:11.377 ERR   [RES] Physical Disk: Failed to get volname for drive H:\, status 2
    00001cbc.000004e0::2015/03/12-14:41:11.377 INFO  [RES] Physical Disk <FltLowestPrice_K>: VolumeIsNtfs: Volume \\?\GLOBALROOT\Device\Harddisk8\Partition2\ has FS type NTFS
    00001cbc.000004e0::2015/03/12-14:41:11.377 INFO  [RES] Physical Disk: Volume \\?\GLOBALROOT\Device\Harddisk8\Partition2\ has FS type NTFS
    00001cbc.000004e0::2015/03/12-14:41:11.377 INFO  [RES] Physical Disk: MountPoint H:\ points to volume \\?\Volume{e32c13d5-02e6-4924-a2d9-59a6fae1a1be}\

    Sounds like you have a cluster hive that is out of date or bad, or some registry settings that are incorrect. You'll want to have this question transferred to the Windows forum, as that's really what you're asking about.
    -Sean
    The views, opinions, and posts do not reflect those of my company and are solely my own. No warranty, service, or results are expressed or implied.
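To narrow down a cluster log like the excerpt above, one option is to filter it to the WARN and ERR entries only. The following is a minimal Python sketch, assuming each entry sits on a single line in the layout seen in the excerpt (process.thread ids, timestamp, level, component in brackets, message). The regular expression and the function name `problems` are illustrative and not part of any Microsoft tool.

```python
import re

# Layout seen in the excerpt: pid.tid::date-time LEVEL [component] message
LINE = re.compile(
    r"^(?P<pid>[0-9a-f]+)\.(?P<tid>[0-9a-f]+)::"
    r"(?P<ts>\S+)\s+(?P<level>[A-Z]+)\s+\[(?P<component>[^\]]+)\]\s+(?P<msg>.*)$"
)


def problems(lines, levels=("WARN", "ERR")):
    """Yield (timestamp, level, message) for entries at the given levels."""
    for line in lines:
        m = LINE.match(line.strip())
        if m and m.group("level") in levels:
            yield m.group("ts"), m.group("level"), m.group("msg")


# Three entries copied from the log excerpt in the question.
sample = [
    r"00001cbc.000004e0::2015/03/12-14:41:11.377 WARN  [RES] Physical Disk <FltLowestPrice_K>: OnlineThread: Failed to set volguid \??\Volume{e32c13d5-02e6-4924-a2d9-59a6fae1a1be}. Error: 183.",
    r"00001cbc.000004e0::2015/03/12-14:41:11.377 INFO  [RES] Physical Disk <FltLowestPrice_K>: Found 2 mount points for device \Device\Harddisk8\Partition2",
    r"00001cbc.000004e0::2015/03/12-14:41:11.377 ERR   [RES] Physical Disk: Failed to get volname for drive H:\, status 2",
]

found = list(problems(sample))
for ts, level, msg in found:
    print(level, ts, msg)
```

On the sample, only the "Failed to set volguid" warning and the "Failed to get volname for drive H:\" error remain, which are the two entries that mention the volume and drive-letter handling.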

  • Reporting Services as a generic service in a failover cluster group?

    There is some confusion on whether or not Microsoft will support a Reporting Services deployment on a failover cluster using scale-out, and adding the Reporting Services service as a generic service in a cluster group to achieve active-passive high
    availability.
    A deployment like this is described by Lukasz Pawlowski (Program Manager on the Reporting Services team) in this blog article
    http://blogs.msdn.com/b/lukaszp/archive/2009/10/28/high-availability-frequently-asked-questions-about-failover-clustering-and-reporting-services.aspx. There it is stated that it can be done, and what needs to be considered when doing such a deployment.
    This article (http://technet.microsoft.com/en-us/library/bb630402.aspx) on the other hand states: "Failover clustering is supported only for the report server database; you
    cannot run the Report Server service as part of a failover cluster."
    This is somewhat confusing to me. Can I expect to receive support from Microsoft for a setup like this?
    Best Regards,
    Peter Wretmo

    Hi Peter,
    Thanks for your posting.
    As Lukasz said in the blog, failover clustering with SSRS is possible. However, during the failover there is some time during which users will receive errors when accessing SSRS since the network names will resolve to a computer where the SSRS service is in the process of starting.
    Besides, there are several considerations and manual steps involved on your part before configuring the failover clustering with SSRS service:
    Impact on other applications that share the SQL Server. One common idea is to put SSRS in the same cluster group as SQL Server.  If SQL Server is hosting multiple application databases, other than just the SSRS databases, a failure in SSRS may cause
    a significant failover impact to the entire environment.
    SSRS fails over independently of SQL Server.
    If SSRS is running, it is going to do work on behalf of the overall deployment so it will be Active. To make SSRS Passive is to stop the SSRS service on all passive cluster nodes.
    So, SSRS is designed to achieve High Availability through the Scale-Out deployment. Though a failover clustered SSRS deployment is achievable, it is not the best option for achieving High Availability with Reporting Services.
    Regards,
    Mike Yin
    TechNet Community Support

  • Advice Requested - High Availability WITHOUT Failover Clustering

    We're creating an entirely new Hyper-V virtualized environment on Server 2012 R2.  My question is:  Can we accomplish high availability WITHOUT using failover clustering?
    So, I don't really have anything AGAINST failover clustering, and we will happily use it if it's the right solution for us, but to be honest, we really don't want ANYTHING to happen automatically when it comes to failover.  Here's what I mean:
    In this new environment, we have architected 2 identical, very capable Hyper-V physical hosts, each of which will run several VMs comprising the equivalent of a scaled-back version of our entire environment.  In other words, there is at least a domain
    controller, multiple web servers, and a (mirrored/HA/AlwaysOn) SQL Server 2012 VM running on each host, along with a few other miscellaneous one-off worker-bee VMs doing things like system monitoring.  The SQL Server VM on each host has about 75% of the
    physical memory resources dedicated to it (for performance reasons).  We need pretty much the full horsepower of both machines up and going at all times under normal conditions.
    So now, to high availability.  The standard approach is to use failover clustering, but I am concerned that if these hosts are clustered, we'll have the equivalent of just 50% hardware capacity going at all times, with full failover in place of course
    (we are using an iSCSI SAN for storage).
    BUT, if these hosts are NOT clustered, and one of them is suddenly switched off, experiences some kind of catastrophic failure, or simply needs to be rebooted while applying WSUS patches, the SQL Server HA will fail over (so all databases will remain up
    and going on the surviving VM), and the environment would continue functioning at somewhat reduced capacity until the failed host is restarted.  With this approach, it seems to me that we would be running at 100% for the most part, and running at 50%
    or so only in the event of a major failure, rather than running at 50% ALL the time.
    Of course, in the event of a catastrophic failure, I'm also thinking that the one-off worker-bee VMs could be replicated to the alternate host so they could be started on the surviving host if needed during a long-term outage.
    So basically, I am very interested in the thoughts of others with experience regarding taking this approach to Hyper-V architecture, as it seems as if failover clustering is almost a given when it comes to best practices and high availability.  I guess
    I'm looking for validation on my thinking.
    So what do you think?  What am I missing or forgetting?  What will we LOSE if we go with a NON-clustered high-availability environment as I've described it?
    Thanks in advance for your thoughts!
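The capacity question in this post comes down to simple arithmetic: if each host's SQL Server VM already holds about 75% of that host's memory, one surviving host cannot hold both hosts' VMs at full size. The following is a rough Python sketch of that check. The 256 GB host size and the 15% share for the other VMs are made-up numbers, only the 75% figure comes from the post, and the sketch assumes memory is not overcommitted.

```python
def fits_on_one_host(host_memory_gb, vm_memory_gb):
    """Return True if all listed VMs fit in one host's memory at the same time."""
    return sum(vm_memory_gb) <= host_memory_gb


# Hypothetical sizing: two identical hosts with 256 GB each (not from the post).
HOST_GB = 256
sql_vm_gb = 0.75 * HOST_GB      # the post gives about 75% per SQL Server VM
other_vms_gb = 0.15 * HOST_GB   # assumed share for the remaining VMs on a host

# Normal operation: each host carries its own SQL VM plus its other VMs.
normal = fits_on_one_host(HOST_GB, [sql_vm_gb, other_vms_gb])

# Full failover: one host would have to carry both hosts' VMs at once.
after_failover = fits_on_one_host(
    HOST_GB, [sql_vm_gb, other_vms_gb, sql_vm_gb, other_vms_gb]
)

print(normal, after_failover)
```

With these assumed numbers the normal layout fits and the full-failover layout does not, so under these assumptions a clustered design would need either smaller VMs or a rule about which VMs are allowed to restart on the surviving host.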

    Udo -
    Yes your responses are very helpful.
    Can we use the built-in Server 2012 iSCSI Target Server role to convert the local RAID disks into an iSCSI LUN that the VMs could access?  Or can that not run on the same physical box as the Hyper-V host?  I guess if the physical box goes down
    the LUN would go down anyway, huh?  Or can I cluster that role (iSCSI target) as well?  If not, do you have any other specific product suggestions I can research, or do I just end up wasting this 12TB of local disk storage?
    - Morgan
    That's a bad idea. First of all, the Microsoft iSCSI target is slow (it's not cached on the server side). So if you have really decided to use dedicated hardware for storage (maybe you do have a reason I don't know...) and if you're fine with your storage being a single
    point of failure (OK, maybe your RTOs and RPOs are fair enough), then at least use an SMB share. SMB at least caches I/O on both the client and server sides, and you can also use Storage Spaces as its back end (non-clustered), so read "write-back flash cache
    for cheap". See:
    What's new in iSCSI target with Windows Server 2012 R2
    http://technet.microsoft.com/en-us/library/dn305893.aspx
    Improved optimization to allow disk-level caching (Updated)
    iSCSI Target Server now sets the disk cache bypass flag on a hosting disk I/O, through Force Unit Access (FUA), only when the issuing initiator explicitly requests it. This change can potentially improve performance.
    Previously, iSCSI Target Server would always set the disk cache bypass flag on all I/O’s. System cache bypass functionality remains unchanged in iSCSI Target Server; for instance, the file system cache on the target server is always bypassed.
    Yes, you can cluster the Microsoft iSCSI target, but a) it would be SLOW, as there would be only an active-passive I/O model (no real use from MPIO between multiple hosts), and b) that would require shared storage for the Windows cluster. What for? The scenario was
    useful when a) there was no virtual FC, so a guest VM cluster could not use FC LUs, and b) there was no shared VHDX, so SAS could not be used for a guest VM cluster either. Now both are present, so the scenario is useless: just export your existing shared storage without
    any Microsoft iSCSI target and you'll be happy. For references see:
    MSFT iSCSI Target in HA mode
    http://technet.microsoft.com/en-us/library/gg232621(v=ws.10).aspx
    Cluster MSFT iSCSI Target with SAS back end
    http://techontip.wordpress.com/2011/05/03/microsoft-iscsi-target-cluster-building-walkthrough/
    Guest VM Cluster Storage Options
    http://technet.microsoft.com/en-us/library/dn440540.aspx
    Storage options
    The following table lists the storage types that you can use to provide shared storage for a guest cluster.
    Storage Type: Description
    Shared virtual hard disk: New in Windows Server 2012 R2, you can configure multiple virtual machines to connect to and use a single virtual hard disk (.vhdx) file. Each virtual machine can access the virtual hard disk just like servers would connect to the same LUN in a storage area network (SAN). For more information, see Deploy a Guest Cluster Using a Shared Virtual Hard Disk.
    Virtual Fibre Channel: Introduced in Windows Server 2012, virtual Fibre Channel enables you to connect virtual machines to LUNs on a Fibre Channel SAN. For more information, see Hyper-V Virtual Fibre Channel Overview.
    iSCSI: The iSCSI initiator inside a virtual machine enables you to connect over the network to an iSCSI target. For more information, see iSCSI Target Block Storage Overview and the blog post Introduction of iSCSI Target in Windows Server 2012.
    Storage requirements depend on the clustered roles that run on the cluster. Most clustered roles use clustered storage, where the storage is available on any cluster node that runs a clustered
    role. Examples of clustered storage include Physical Disk resources and Cluster Shared Volumes (CSV). Some roles do not require storage that is managed by the cluster. For example, you can configure Microsoft SQL Server to use availability groups that replicate
    the data between nodes. Other clustered roles may use Server Message Block (SMB) shares or Network File System (NFS) shares as data stores that any cluster node can access.
    Sure you can use third-party software to replicate 12TB of your storage between just a pair of nodes to create a fully fault-tolerant cluster. See (there's also a free offering):
    StarWind VSAN [Virtual SAN] for Hyper-V
    http://www.starwindsoftware.com/native-san-for-hyper-v-free-edition
    The product is similar to what VMware has just released for ESXi, except it has been on sale for ~2 years, so it is mature :)
    There are others doing this, such as DataCore (more focused on Windows-based FC) and SteelEye (more about geo-clusters and replication). You may want to give them a try too.
    Hope this helped a bit :) 
    StarWind VSAN [Virtual SAN] clusters Hyper-V without SAS, Fibre Channel, SMB 3.0 or iSCSI, uses Ethernet to mirror internally mounted SATA disks between hosts.

  • Cisco ASA 5505 Failover issue..

    Hi,
    I have two firewalls (Cisco ASA 5505) configured in active/standby mode. They ran smoothly for more than a year, but last week the secondary firewall failed and took my whole network down. I disconnected the secondary firewall and ran only the primary one. When I logged in via the console, I found that failover had been disabled. So I reconnected it to the network and enabled the firewall again. After a couple of days the same issue happened. This time I took down the secondary firewall, erased the flash, reloaded the IOS image, configured failover, and connected it to the primary to replicate the configs. It found the active mate, replicated the configs, and got synced... But after the sync the same thing happened: the whole network went down. I did the same thing again and removed the secondary firewall, and the network came back up. I feel there is something wrong with failover, but I couldn't find out what :( . The firewalls are in routed mode.

    Please find the logs...
    Secondary firewall while syncing:
    cisco-asa(config)# sh failover 
    Failover On 
    Failover unit Secondary
    Failover LAN Interface: e0/7 Vlan3 (up)
    Unit Poll frequency 1 seconds, holdtime 15 seconds
    Interface Poll frequency 5 seconds, holdtime 25 seconds
    Interface Policy 1
    Monitored Interfaces 4 of 23 maximum
    Version: Ours 8.2(5), Mate 8.2(5)
    Last Failover at: 06:01:10 GMT Apr 29 2015
    This host: Secondary - Sync Config 
    Active time: 55 (sec)
    slot 0: ASA5505 hw/sw rev (1.0/8.2(5)) status (Up Sys)
     Interface outside (27.251.167.246): No Link (Waiting)
     Interface inside (10.11.0.20): No Link (Waiting)
     Interface mgmt (10.11.200.21): No Link (Waiting)
    slot 1: empty
    Other host: Primary - Active 
    Active time: 177303 (sec)
    slot 0: ASA5505 hw/sw rev (1.0/8.2(5)) status (Up Sys)
     Interface outside (27.251.167.247): Unknown (Waiting)
     Interface inside (10.11.0.21): Unknown (Waiting)
     Interface mgmt (10.11.200.22): Unknown (Waiting)
    slot 1: empty
    =======================================================================================
    Secondary firewall just after sync, now active (the primary firewall rebooted)
    cisco-asa# sh failover 
    Failover On 
    Failover unit Secondary
    Failover LAN Interface: e0/7 Vlan3 (up)
    Unit Poll frequency 1 seconds, holdtime 15 seconds
    Interface Poll frequency 5 seconds, holdtime 25 seconds
    Interface Policy 1
    Monitored Interfaces 4 of 23 maximum
    Version: Ours 8.2(5), Mate Unknown
    Last Failover at: 06:06:12 GMT Apr 29 2015
    This host: Secondary - Active 
    Active time: 44 (sec)
    slot 0: ASA5505 hw/sw rev (1.0/8.2(5)) status (Up Sys)
     Interface outside (27.251.167.246): Normal (Waiting)
     Interface inside (10.11.0.20): No Link (Waiting)
     Interface mgmt (10.11.200.21): No Link (Waiting)
    slot 1: empty
    Other host: Primary - Not Detected 
    Active time: 0 (sec)
    slot 0: empty
     Interface outside (27.251.167.247): Unknown (Waiting)
     Interface inside (10.11.0.21): Unknown (Waiting)
     Interface mgmt (10.11.200.22): Unknown (Waiting)
    slot 1: empty
    ==========================================================================================
    After the active firewall rebooted, failover is off and the whole network went down.
    cisco-asa# sh failover 
    Failover Off 
    Failover unit Secondary
    Failover LAN Interface: e0/7 Vlan3 (up)
    Unit Poll frequency 1 seconds, holdtime 15 seconds
    Interface Poll frequency 5 seconds, holdtime 25 seconds
    Interface Policy 1
    Monitored Interfaces 4 of 23 maximum
    ===========================================================================================
    Primary firewall after rebooting
    cisco-asa# sh failover
    Failover On
    Failover unit Primary
    Failover LAN Interface: e0/7 Vlan3 (Failed - No Switchover)
    Unit Poll frequency 1 seconds, holdtime 15 seconds
    Interface Poll frequency 5 seconds, holdtime 25 seconds
    Interface Policy 1
    Monitored Interfaces 4 of 23 maximum
    Version: Ours 8.2(5), Mate Unknown
    Last Failover at: 06:17:29 GMT Apr 29 2015
            This host: Primary - Active
                    Active time: 24707 (sec)
                    slot 0: ASA5505 hw/sw rev (1.0/8.2(5)) status (Up Sys)
                      Interface outside (27.251.167.246): Normal (Waiting)
                      Interface inside (10.11.0.20): Normal (Waiting)
                      Interface mgmt (10.11.200.21): Normal (Waiting)
                    slot 1: empty
            Other host: Secondary - Failed
                    Active time: 0 (sec)
                    slot 0: empty
                      Interface outside (27.251.167.247): Unknown (Waiting)
                      Interface inside (10.11.0.21): Unknown (Waiting)
                      Interface mgmt (10.11.200.22): Unknown (Waiting)
                    slot 1: empty
    cisco-asa# sh failover history
    ==========================================================================
    From State                 To State                   Reason
    ==========================================================================
    06:16:43 GMT Apr 29 2015
    Not Detected               Negotiation                No Error
    06:17:29 GMT Apr 29 2015
    Negotiation                Just Active                No Active unit found
    06:17:29 GMT Apr 29 2015
    Just Active                Active Drain               No Active unit found
    06:17:29 GMT Apr 29 2015
    Active Drain               Active Applying Config     No Active unit found
    06:17:29 GMT Apr 29 2015
    Active Applying Config     Active Config Applied      No Active unit found
    06:17:29 GMT Apr 29 2015
    Active Config Applied      Active                     No Active unit found
    ==========================================================================
    cisco-asa#
    cisco-asa# sh failover state
                   State          Last Failure Reason      Date/Time
    This host  -   Primary
                   Active         None
    Other host -   Secondary
                   Failed         Comm Failure             06:17:43 GMT Apr 29 2015
    ====Configuration State===
    ====Communication State===
    ==================================================================================
    Secondary Firewall
    cisco-asa# sh failover h
    ==========================================================================
    From State                 To State                   Reason
    ==========================================================================
    06:16:32 GMT Apr 29 2015
    Not Detected               Negotiation                No Error
    06:17:05 GMT Apr 29 2015
    Negotiation                Disabled                   Set by the config command
    ==========================================================================
    cisco-asa# sh failover
    Failover Off
    Failover unit Secondary
    Failover LAN Interface: e0/7 Vlan3 (down)
    Unit Poll frequency 1 seconds, holdtime 15 seconds
    Interface Poll frequency 5 seconds, holdtime 25 seconds
    Interface Policy 1
    Monitored Interfaces 4 of 23 maximum
    cisco-asa# sh failover state
                   State          Last Failure Reason      Date/Time
    This host  -   Secondary
                   Disabled       None
    Other host -   Primary
                   Not Detected   None
    ====Configuration State===
    ====Communication State===
    Thanks...
