Failover cluster fails validation after a single node restart

I had a lab environment set up that worked great: it passed validation and could do live migrations without issue. But as soon as I restarted one of the nodes, the node that stayed up became the only node able to access the storage backend. What's weird is that the restarted node can still access the CSV storage and run VMs off of it, but the validation report is unable to list the actual disks.
My cluster consists of 2 nodes. I have an iSCSI-backed shared storage server, and I can see that both nodes are connected to the iSCSI targets successfully, but the node I restarted first no longer lists any disks/volumes in Disk Management, and the previously available MPIO menus are disabled in the iSCSI control panel.
I also tried restarting the second node after the first node came back. Although the first node was up and running and had VMs on it, restarting the second node brought the entire cluster down. I see event IDs 1177, 1573, and 1069 in the Cluster Events log. When the second node came back up, the cluster came back with it, but not the storage. Both nodes now behave the same way: neither can access the storage backend.
I was able to get both nodes connected to the storage backend again by opening iscsicpl, disconnecting all current connections to the iSCSI backend, and adding them back. Repeating the test after bringing the storage back up resulted in the same behavior, and this time redoing the iSCSI connections did not help.
I think the issue is that the first node I restarted cannot see any disks/volumes from the storage backend only after it has joined the cluster and been restarted. Before joining the cluster I rebooted both nodes, and both were able to connect to the iSCSI backend without issue. It wasn't until after joining the cluster that node 1 became unable to access the storage backend after reboots.
The validation report fails with "No disks were found on which to perform cluster validation tests. To correct this, review the following possible causes: ..." although none of the suggestions seem applicable, and the validation report was successful right before the node was restarted.
Does anyone have suggestions on how to further troubleshoot or resolve this issue?
I am using Hyper-V Server 2012 R2 on both nodes and they are joined to the same domain.

Hi,
I haven't found a similar issue. Please check that your storage is compatible with Server 2012 R2, update the network card drivers and firmware on both nodes, temporarily disable your AV software and firewall, and install the recommended hotfixes and updates for Windows Server 2012 R2-based failover clusters.
The Recommended hotfixes and updates for Windows Server 2012 R2-based failover clusters
http://support.microsoft.com/kb/2920151/en-us
Hope this helps.
We are trying to better understand customer views on social support experience, so your participation in this interview project would be greatly appreciated if you have time.
Thanks for helping make community forums a great place.

Similar Messages

  • Failover cluster failed due to mysterious IP conflict ?

    I'm having a mysterious problem with my Failover cluster,
    Cluster name: PrintCluster01.domain.com
    Members: PrintServer01.domain.com and PrintServer02.domain.com
    In Failover Cluster Management, under Cluster Events, I received critical error messages 1135 and 1177:
    Log Name: System
    Source: Microsoft-Windows-FailoverClustering
    Date: 15/06/2011 9:07:49 PM
    Event ID: 1177
    Task Category: None
    Level: Critical
    Keywords:
    User: SYSTEM
    Computer: PrintServer01.domain.com
    Description:
    The Cluster service is shutting down because quorum was lost. This could be due to the loss of network connectivity between some or all nodes in the cluster, or a failover of the witness disk.
    Run the Validate a Configuration wizard to check your network configuration. If the condition persists, check for hardware or software errors related to the network adapter. Also check for failures in any other network components to which the node is
    connected such as hubs, switches, or bridges.
    Log Name: System
    Source: Microsoft-Windows-FailoverClustering
    Date: 15/06/2011 9:07:28 PM
    Event ID: 1135
    Task Category: None
    Level: Critical
    Keywords:
    User: SYSTEM
    Computer: PrintServer01.domain.com
    Description:
    Cluster node 'PrintServer02' was removed from the active failover cluster membership. The Cluster service on this node may have stopped. This could also be due to the node having lost communication with other active nodes in the failover cluster. Run
    the Validate a Configuration wizard to check your network configuration. If the condition persists, check for hardware or software errors related to the network adapters on this node. Also check for failures in any other network components to which the node
    is connected such as hubs, switches, or bridges.
    After further investigation, I found an interesting error. This is the very first critical error message logged in the Event Viewer on PrintServer02:
    Log Name: System
    Source: Tcpip
    Date: 15/06/2011 9:07:29 PM
    Event ID: 4199
    Task Category: None
    Level: Error
    Keywords: Classic
    User: N/A
    Computer: PrintServer02-VM.domain.com
    Description:
    The system detected an address conflict for IP address 192.168.127.142 with the system having network hardware address 00-50-56-AE-29-23. Network operations on this system may be disrupted as a result.
    192.168.127.142 --> secondary IP of PrintServer01
    How is it possible that the conflict is with one of the PrintServer01 node's own addresses? The details are below:
    **From PrintServer01**
    Ethernet adapter Local Area Connection* 8:
    Connection-specific DNS Suffix . :
    Description . . . . . . . . . . . : Microsoft Failover Cluster Virtual Adapter
    Physical Address. . . . . . . . . : 02-50-56-AE-29-23
    DHCP Enabled. . . . . . . . . . . : No
    Autoconfiguration Enabled . . . . : Yes
    IPv4 Address. . . . . . . . . . . : 169.254.1.183(Preferred)
    Subnet Mask . . . . . . . . . . . : 255.255.0.0
    Default Gateway . . . . . . . . . :
    NetBIOS over Tcpip. . . . . . . . : Enabled
    I have double-checked on all of the cluster members that all IP addresses are now unique.
    I'm also sure that the IPs are static and not assigned by DHCP, as the IPCONFIG results below show:
    From **PrintServer01** (the Active Node)
    Windows IP Configuration
    Host Name . . . . . . . . . . . . : PrintServer01
    Primary Dns Suffix . . . . . . . : domain.com
    Node Type . . . . . . . . . . . . : Hybrid
    IP Routing Enabled. . . . . . . . : No
    WINS Proxy Enabled. . . . . . . . : No
    DNS Suffix Search List. . . . . . : domain.com
    domain.com.au
    Ethernet adapter Local Area Connection* 8:
    Connection-specific DNS Suffix . :
    Description . . . . . . . . . . . : Microsoft Failover Cluster Virtual Adapter
    Physical Address. . . . . . . . . : 02-50-56-AE-29-23
    DHCP Enabled. . . . . . . . . . . : No
    Autoconfiguration Enabled . . . . : Yes
    IPv4 Address. . . . . . . . . . . : 169.254.1.183(Preferred)
    Subnet Mask . . . . . . . . . . . : 255.255.0.0
    Default Gateway . . . . . . . . . :
    NetBIOS over Tcpip. . . . . . . . : Enabled
    Ethernet adapter Cluster Public Network:
    Connection-specific DNS Suffix . :
    Description . . . . . . . . . . . : Intel® PRO/1000 MT Network Connection
    Physical Address. . . . . . . . . : 00-50-56-AE-29-23
    DHCP Enabled. . . . . . . . . . . : No
    Autoconfiguration Enabled . . . . : Yes
    IPv4 Address. . . . . . . . . . . : 192.168.127.155(Preferred)
    Subnet Mask . . . . . . . . . . . : 255.255.255.0
    IPv4 Address. . . . . . . . . . . : 192.168.127.88(Preferred)
    Subnet Mask . . . . . . . . . . . : 255.255.255.0
    IPv4 Address. . . . . . . . . . . : 192.168.127.142(Preferred)
    Subnet Mask . . . . . . . . . . . : 255.255.255.0
    IPv4 Address. . . . . . . . . . . : 192.168.127.143(Preferred)
    Subnet Mask . . . . . . . . . . . : 255.255.255.0
    IPv4 Address. . . . . . . . . . . : 192.168.127.144(Preferred)
    Subnet Mask . . . . . . . . . . . : 255.255.255.0
    Default Gateway . . . . . . . . . : 192.168.127.254
    DNS Servers . . . . . . . . . . . : 192.168.127.10
    192.168.127.11
    Primary WINS Server . . . . . . . : 192.168.127.10
    Secondary WINS Server . . . . . . : 192.168.127.11
    NetBIOS over Tcpip. . . . . . . . : Enabled
    Ethernet adapter Cluster Private Network:
    Connection-specific DNS Suffix . :
    Description . . . . . . . . . . . : Intel® PRO/1000 MT Network Connection #2
    Physical Address. . . . . . . . . : 00-50-56-AE-43-EC
    DHCP Enabled. . . . . . . . . . . : No
    Autoconfiguration Enabled . . . . : Yes
    IPv4 Address. . . . . . . . . . . : 10.184.2.2(Preferred)
    Subnet Mask . . . . . . . . . . . : 255.255.255.0
    Default Gateway . . . . . . . . . :
    NetBIOS over Tcpip. . . . . . . . : Disabled
    From **PrintServer02**
    Windows IP Configuration
    Host Name . . . . . . . . . . . . : PrintServer02
    Primary Dns Suffix . . . . . . . : domain.com
    Node Type . . . . . . . . . . . . : Hybrid
    IP Routing Enabled. . . . . . . . : No
    WINS Proxy Enabled. . . . . . . . : No
    DNS Suffix Search List. . . . . . : domain.com
    domain.com.au
    Ethernet adapter Local Area Connection* 8:
    Connection-specific DNS Suffix . :
    Description . . . . . . . . . . . : Microsoft Failover Cluster Virtual Adapter
    Physical Address. . . . . . . . . : 02-50-56-AE-5F-E5
    DHCP Enabled. . . . . . . . . . . : No
    Autoconfiguration Enabled . . . . : Yes
    IPv4 Address. . . . . . . . . . . : 169.254.2.86(Preferred)
    Subnet Mask . . . . . . . . . . . : 255.255.0.0
    Default Gateway . . . . . . . . . :
    NetBIOS over Tcpip. . . . . . . . : Enabled
    Ethernet adapter Cluster Public Network:
    Connection-specific DNS Suffix . :
    Description . . . . . . . . . . . : Intel® PRO/1000 MT Network Connection
    Physical Address. . . . . . . . . : 00-50-56-AE-79-FA
    DHCP Enabled. . . . . . . . . . . : No
    Autoconfiguration Enabled . . . . : Yes
    IPv4 Address. . . . . . . . . . . : 192.168.127.172(Preferred)
    Subnet Mask . . . . . . . . . . . : 255.255.255.0
    IPv4 Address. . . . . . . . . . . : 192.168.127.119(Preferred)
    Subnet Mask . . . . . . . . . . . : 255.255.255.0
    Default Gateway . . . . . . . . . : 192.168.127.254
    DNS Servers . . . . . . . . . . . : 192.168.127.10
    192.168.127.11
    Primary WINS Server . . . . . . . : 192.168.127.11
    Secondary WINS Server . . . . . . : 192.168.127.10
    NetBIOS over Tcpip. . . . . . . . : Enabled
    Ethernet adapter Cluster Private Network:
    Connection-specific DNS Suffix . :
    Description . . . . . . . . . . . : Intel® PRO/1000 MT Network Connection #2
    Physical Address. . . . . . . . . : 00-50-56-AE-77-8D
    DHCP Enabled. . . . . . . . . . . : No
    Autoconfiguration Enabled . . . . : Yes
    IPv4 Address. . . . . . . . . . . : 10.184.2.3(Preferred)
    Subnet Mask . . . . . . . . . . . : 255.255.255.0
    Default Gateway . . . . . . . . . :
    NetBIOS over Tcpip. . . . . . . . : Disabled
    Any help would be greatly appreciated.
    Thanks,
    AWT
    /* Server Support Specialist */
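Since the Tcpip event names a hardware address rather than a host, one way to read the output above is to look up which adapter owns that MAC and which IPv4 addresses it currently holds. The following is only a minimal Python sketch under stated assumptions: `adapters_by_mac` and `SAMPLE` are hypothetical names, the sample is a trimmed excerpt of the PrintServer01 output quoted above, and only "Ethernet adapter" sections of English-language `ipconfig /all` text are recognized.

```python
import re

# Trimmed excerpt of the PrintServer01 ipconfig output quoted in the post
# (illustrative subset only, not the full output).
SAMPLE = """\
Ethernet adapter Local Area Connection* 8:
Physical Address. . . . . . . . . : 02-50-56-AE-29-23
IPv4 Address. . . . . . . . . . . : 169.254.1.183(Preferred)
Ethernet adapter Cluster Public Network:
Physical Address. . . . . . . . . : 00-50-56-AE-29-23
IPv4 Address. . . . . . . . . . . : 192.168.127.155(Preferred)
IPv4 Address. . . . . . . . . . . : 192.168.127.142(Preferred)
Ethernet adapter Cluster Private Network:
Physical Address. . . . . . . . . : 00-50-56-AE-43-EC
IPv4 Address. . . . . . . . . . . : 10.184.2.2(Preferred)
"""

HEADER = re.compile(r"^Ethernet adapter (.+):$")
MAC = re.compile(r"^Physical Address[ .]*: ([0-9A-Fa-f-]{17})$")
IPV4 = re.compile(r"^IPv4 Address[ .]*: (\d+\.\d+\.\d+\.\d+)")


def adapters_by_mac(ipconfig_text):
    """Map each physical address to (adapter name, list of IPv4 addresses)."""
    adapters = {}
    name, mac = None, None
    for raw in ipconfig_text.splitlines():
        line = raw.strip()
        header = HEADER.match(line)
        if header:
            # A new adapter section starts; forget the previous MAC.
            name, mac = header.group(1), None
            continue
        found_mac = MAC.match(line)
        if found_mac and name:
            mac = found_mac.group(1).upper()
            adapters[mac] = (name, [])
            continue
        found_ip = IPV4.match(line)
        if found_ip and mac:
            adapters[mac][1].append(found_ip.group(1))
    return adapters


# Look up the hardware address reported in the address-conflict event.
owner_name, owner_ips = adapters_by_mac(SAMPLE)["00-50-56-AE-29-23"]
print(owner_name, owner_ips)
```

Run against the excerpt, the lookup points at the "Cluster Public Network" adapter on PrintServer01, which is the adapter that also lists 192.168.127.142. That matches the poster's note that the conflicting address is a secondary IP of PrintServer01; it does not by itself explain why the other node saw a conflict.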

    I am facing the same scenario as the original poster. This is on Server 2008 R2 SP1.
    Windows event log entries follow the same pattern. The MAC address listed in connection with the duplicate IP belonged to the passive node.
    Interestingly, the Cluster.log begins to explode with activity a few milliseconds before the first Windows event is logged.
    2012/07/11-15:20:59.517 INFO  [CHANNEL fe80::8145:f2b9:898e:784e%37:~3343~] graceful close, status (of previous failure, may not indicate problem) ERROR_IO_PENDING(997)
    2012/07/11-15:20:59.517 WARN  [PULLER SQLTESTSQLB] ReadObject failed with GracefulClose(1226)' because of 'channel to remote endpoint fe80::8145:f2b9:898e:784e%37:~3343~ is closed'
    2012/07/11-15:20:59.517 ERR   [NODE] Node 1: Connection to Node 2 is broken. Reason GracefulClose(1226)' because of 'channel to remote endpoint fe80::8145:f2b9:898e:784e%37:~3343~ is closed'
    2012/07/11-15:20:59.517 WARN  [RGP] Node 1: only local suspects are missing (2). moving to the next stage (shortcut compensation time 05.000)
    2012/07/11-15:20:59.548 WARN  [NETFTAPI] Failed to query parameters for fe80::5efe:169.254.1.79 (status 80070490)
    2012/07/11-15:20:59.548 WARN  [NETFTAPI] Failed to query parameters for fe80::5efe:169.254.1.79 (status 80070490)
    2012/07/11-15:20:59.579 INFO  [CHANNEL 192.168.3.22:~3343~] graceful close, status (of previous failure, may not indicate problem) ERROR_SUCCESS(0)
    2012/07/11-15:20:59.579 WARN  cxl::ConnectWorker::operator (): GracefulClose(1226)' because of 'channel to remote endpoint 192.168.3.22:~3343~ is closed'
    2012/07/11-15:20:59.829 INFO  [GEM] Node 1: EnterRepairStage1: Gem agent for node 1
    2012/07/11-15:21:00.141 INFO  [GEM] Node 1: EnterRepairStage2: Gem agent for node 1
    2012/07/11-15:21:00.499 WARN  [RCM] Moving orphaned group Available Storage from downed node SQLTESTSQLB to node SQLTESTSQLA.
    2012/07/11-15:21:00.499 WARN  [RES] IP Address <Cluster IP Address>: WorkerThread: NetInterface ef150d1a-f4a1-4f4f-a5c7-6e7cb2bfacab changed to state 3.
    2012/07/11-15:21:00.499 WARN  [RCM] Moving orphaned group MSSTEST from downed node SQLTESTSQLB to node SQLTESTSQLA.
    2012/07/11-15:21:00.546 WARN  [RES] IP Address <SQL IP Address 1 (DEVSQL)>: Failed to delete IP interface 2003B882, status 87.
    2012/07/11-15:21:00.562 WARN  [RES] Physical Disk <Cluster Disk 2>: PR reserve failed, status 170
    2012/07/11-15:21:00.577 WARN  [RES] Physical Disk <Cluster Disk 1>: PR reserve failed, status 170
    2012/07/11-15:21:00.593 WARN  [RES] Physical Disk <Cluster Disk 3>: PR reserve failed, status 170
    2012/07/11-15:21:02.215 WARN  [NETFTAPI] Failed to query parameters for 192.168.3.32 (status 80070490)
    2012/07/11-15:21:02.215 WARN  [NETFTAPI] Failed to query parameters for 192.168.3.32 (status 80070490)
    2012/07/11-15:21:05.864 DBG   [NETFTAPI] received NsiParameterNotification  for fe80::5cd:8cc2:186:f5cb (IpDadStatePreferred )
    2012/07/11-15:21:06.565 ERR   [RES] Physical Disk <Cluster Disk 2>: Failed to preempt reservation, status 170
    2012/07/11-15:21:06.581 ERR   [RES] Physical Disk <Cluster Disk 2>: OnlineThread: Unable to arbitrate for the disk. Error: 170.
    2012/07/11-15:21:06.581 ERR   [RES] Physical Disk <Cluster Disk 2>: OnlineThread: Error 170 bringing resource online.
    2012/07/11-15:21:06.581 ERR   [RHS] Online for resource Cluster Disk 2 failed.
    2012/07/11-15:21:06.581 WARN  [RCM] HandleMonitorReply: ONLINERESOURCE for 'Cluster Disk 2', gen(0) result 5018.
    2012/07/11-15:21:06.581 ERR   [RCM] rcm::RcmResource::HandleFailure: (Cluster Disk 2)
    2012/07/11-15:21:06.581 WARN  [RES] Physical Disk <Cluster Disk 2>: Terminate: Failed to open device \Device\Harddisk5\Partition1, Error 2
    2012/07/11-15:21:06.581 ERR   [RES] Physical Disk <Cluster Disk 1>: Failed to preempt reservation, status 170
    2012/07/11-15:21:06.581 ERR   [RES] Physical Disk <Cluster Disk 1>: OnlineThread: Unable to arbitrate for the disk. Error: 170.
    2012/07/11-15:21:06.581 ERR   [RES] Physical Disk <Cluster Disk 1>: OnlineThread: Error 170 bringing resource online.
    Full cluster log here:
    https://skydrive.live.com/redir?resid=A694FDEBF02727CD!133&authkey=!ADQMxHShdeDvXVc
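When a Cluster.log "explodes" like this, it can help to reduce it to a count of ERR lines per resource before reading it line by line. The sketch below is a hypothetical helper in Python, not a Microsoft tool: `errors_per_resource` and `SAMPLE` are names made up for illustration, the sample lines are copied from the excerpt above, and the line layout (timestamp, level, `[RES]` tag, resource name in angle brackets) is assumed from that excerpt only.

```python
import re
from collections import Counter

# A few lines copied from the Cluster.log excerpt above.
SAMPLE = """\
2012/07/11-15:21:00.562 WARN  [RES] Physical Disk <Cluster Disk 2>: PR reserve failed, status 170
2012/07/11-15:21:06.565 ERR   [RES] Physical Disk <Cluster Disk 2>: Failed to preempt reservation, status 170
2012/07/11-15:21:06.581 ERR   [RES] Physical Disk <Cluster Disk 2>: OnlineThread: Unable to arbitrate for the disk. Error: 170.
2012/07/11-15:21:06.581 ERR   [RHS] Online for resource Cluster Disk 2 failed.
2012/07/11-15:21:06.581 ERR   [RES] Physical Disk <Cluster Disk 1>: Failed to preempt reservation, status 170
"""

# Optional "pid.tid::" prefix, then timestamp, level and the rest of the line.
LINE = re.compile(
    r"^(?:[0-9a-fA-F]+\.[0-9a-fA-F]+::)?"
    r"(?P<ts>\d{4}/\d{2}/\d{2}-\d{2}:\d{2}:\d{2}\.\d{3}) "
    r"(?P<level>[A-Z]+)\s+(?P<msg>.*)$"
)
# Resource-DLL messages look like "[RES] <type> <resource name>: ...".
RESOURCE = re.compile(r"\[RES\] [^<]*<(?P<name>[^>]+)>")


def errors_per_resource(log_text):
    """Count ERR-level [RES] lines per resource name."""
    counts = Counter()
    for raw in log_text.splitlines():
        parsed = LINE.match(raw.strip())
        if not parsed or parsed.group("level") != "ERR":
            continue
        resource = RESOURCE.search(parsed.group("msg"))
        if resource:
            counts[resource.group("name")] += 1
    return counts


for name, count in errors_per_resource(SAMPLE).most_common():
    print(name, count)
```

In the excerpt the ERR lines cluster on the physical disk resources and all carry status 170, which as far as I know is the Win32 code ERROR_BUSY ("the requested resource is in use"). Lines that were wrapped by the forum software would need to be re-joined before a helper like this could parse them.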

  • Cluster fails validation

    An error occurred while executing the test. There was an error getting information about the SAS controllers installed on the nodes. There was an error retrieving information
    about the SAS host bus adapters from node  Invalid class 
    Successfully put PR reserve on cluster disk 0 from node C while it should have failed
    Cluster Disk 0 does not support Persistent Reservations. Some storage devices require specific firmware versions or settings to function properly with failover clusters.
    Please contact your storage administrator or storage vendor to check the configuration of the storage to allow it to function properly with failover clusters.
    Cluster Disk 1 does not support Persistent Reservations. Some storage devices require specific firmware versions or settings to function properly with failover clusters.
    Please contact your storage administrator or storage vendor to check the configuration of the storage to allow it to function properly with failover clusters.

    Hi,
    It seems your storage is not compatible with Windows Server Failover Clustering. In this situation the cluster will most likely work, but if you intend to run it in a production environment you may need to do a few things.
    All storage vendors and almost all currently shipping models support Failover Clustering, but many require firmware updates or configuration settings. Therefore please contact your storage vendor to confirm whether any changes are needed.
    Second, if you are building a failover cluster in a VMware® virtualization environment, please refer to the VMware® article:
    Configuring Microsoft Cluster Service fails with the error: Validate SCSI-3 Persistent Reservation (1030632)
    http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1030632
    More information:
    Windows 2008 Failover Cluster Validation Fails on ‘Validate SCSI-3 Persistent Reservation’
    http://blogs.technet.com/b/askcore/archive/2009/04/15/windows-2008-failover-cluster-validation-fails-on-validate-scsi-3-persistent-reservation.aspx
    Hope this helps.
    *** This response contains a reference to a third party World Wide Web site. Microsoft is providing this information as a convenience to you. Microsoft does not control
    these sites and has not tested any software or information found on these sites; therefore, Microsoft cannot make any representations regarding the quality, safety, or suitability of any software or information found there. There are inherent dangers in the
    use of any software found on the Internet, and Microsoft cautions you to make sure that you completely understand the risk before retrieving any software from the Internet. ***

  • SQL 2012 installation for Failover Cluster failed

    During installation of SQL 2012 on a failover cluster (FOC), validation fails on the "Database Engine Configuration" page with the following error:
    The volume that contains SQL Server data directory g:\MSSQL11.MSSQLSERVER\MSSQL\DATA does not belong to the cluster group.
    I want to know how the SQL installation wizard queries the volumes configured in the failover cluster. Does it:
    - enumerate "Physical Disk" resources in the FOC,
    - enumerate all Storage Class resources in the FOC to get the volume list,
    - or depend on WMI (Win32_Volume) to get the volumes?
    The wizard correctly discovers volume g:\ in its FOC group on the "Cluster Resource Group" and "Cluster Disk Selection" pages, but gives the error on the Database Engine Configuration page.
    Any help in this would be appreciated.
    Thanks in advance
    Rakesh
    Rakesh Agrawal

    Can you please check whether any disk in the cluster is not in an online state? Please run the script below, following these steps.
    1. Save the script as "Disk.vbs" and use CSCRIPT to run it.
    2. Syntax: CSCRIPT <Disk.vbs> <Windows Cluster Name>
    <Script>
    Option Explicit
    Public objArgs, objCluster
    Public Function Connect()
        ' Opens a global cluster object. Using Windows Script Host syntax,
        ' the cluster name or "" must be passed as the first argument.
        Set objArgs = WScript.Arguments
        If objArgs.Count = 0 Then
            WScript.Echo "Usage Cscript  <script file name>  <Windows Cluster Name> "
            WScript.Quit
        End If
        Set objCluster = CreateObject("MSCluster.Cluster")
        objCluster.Open objArgs(0)
    End Function
    Public Function Disconnect()
        ' Dereferences global objects.  Used with Connect.
        Set objCluster = Nothing
        Set objArgs = Nothing
    End Function
    Connect
    Dim objEnum
    For Each objEnum In objCluster.Resources
        ' ClassInfo 1 is the storage resource class, so only disk resources are listed.
        If objEnum.ClassInfo = 1 Then
            WScript.Echo objEnum.Name
            Dim objDisk
            Dim objPartition
            ' Continue past errors so one unreadable disk does not stop the listing.
            On Error Resume Next
            Set objDisk = objEnum.Disk
            If Err.Number <> 0 Then
                WScript.Echo "Unable to retrieve the disk: " & Err
            Else
                ' Print the device name of every partition on the disk.
                For Each objPartition In objDisk.Partitions
                    WScript.Echo objPartition.DeviceName
                Next
            End If
        End If
    Next
    Disconnect
    </Script>

  • Hyper-V Failover Cluster Networking Configuration After Install

    Hello All,
    Is it possible to install Hyper-V and Failover Clustering, in other words create a Hyper-V failover cluster, and then configure the networking part of the solution later? As I am still coming to terms with the networking part, I would like to do it after the install. Is that possible?
    By later configuration I mean creation of NIC teams, virtual NICs, VLAN tagging, etc.

    Hi,
    Failover cluster deployment requires network connectivity between cluster nodes. You can't create a cluster without properly configured TCP/IP on the cluster nodes.
    http://OpsMgr.ru/

  • SQL SERVER Failover Cluster switch failure because the passive node automatically reassign drive letter

    I moved the SQL Server resource group to the standby node. When the disk resource was about to come online on the passive node, an exception occurred: the disk resource that SQL Server depends on originally had the drive letter 'K:', but when the disk was brought online it was automatically reassigned a new drive letter, 'H:', so the SQL Server resource could not come online. After I manually changed the drive letter back to 'K:' on the passive node, it worked. So my question is: why did it not use the original drive letter and instead assign a new one? What could cause this? A mount point? Some of the log follows:
    00001cbc.000004e0::2015/03/12-14:41:11.377 WARN  [RES] Physical Disk <FltLowestPrice_K>: OnlineThread: Failed to set volguid \??\Volume{e32c13d5-02e6-4924-a2d9-59a6fae1a1be}. Error: 183.
    00001cbc.000004e0::2015/03/12-14:41:11.377 INFO  [RES] Physical Disk <FltLowestPrice_K>: Found 2 mount points for device \Device\Harddisk8\Partition2
    00001cbc.00001cdc::2015/03/12-14:41:11.377 INFO  [RES] Physical Disk: PNP: Update volume exit, status 1168
    00001cbc.00001cdc::2015/03/12-14:41:11.377 INFO  [RES] Physical Disk: PNP: Updating volume \\?\STORAGE#Volume#{1a8ddb8e-fe43-11e2-b7c5-6c3be5a5cdca}#0000000008100000#{53f5630d-b6bf-11d0-94f2-00a0c91efb8b}
    00001cbc.00001cdc::2015/03/12-14:41:11.377 INFO  [RES] Physical Disk: PNP: Update volume exit, status 5023
    00001cbc.000004e0::2015/03/12-14:41:11.377 ERR   [RES] Physical Disk: Failed to get volname for drive H:\, status 2
    00001cbc.000004e0::2015/03/12-14:41:11.377 INFO  [RES] Physical Disk <FltLowestPrice_K>: VolumeIsNtfs: Volume \\?\GLOBALROOT\Device\Harddisk8\Partition2\ has FS type NTFS
    00001cbc.000004e0::2015/03/12-14:41:11.377 INFO  [RES] Physical Disk: Volume \\?\GLOBALROOT\Device\Harddisk8\Partition2\ has FS type NTFS
    00001cbc.000004e0::2015/03/12-14:41:11.377 INFO  [RES] Physical Disk: MountPoint H:\ points to volume \\?\Volume{e32c13d5-02e6-4924-a2d9-59a6fae1a1be}\

    Sounds like you have a cluster hive that is out of date/bad, or some registry settings that are incorrect. You'll want to have this question transferred to the Windows forum, as that's really what you're asking about.
    -Sean
    The views, opinions, and posts do not reflect those of my company and are solely my own. No warranty, service, or results are expressed or implied.

  • Server 2012 Failover cluster. Make two VMs stay on the same node

    We have a unique situation where I need two machines to stay on the same node. It's a 4-node cluster with 30+ resources, but I want to make sure two boxes are ALWAYS on the same node. If one migrates to another node, the second needs to follow. Is there a way to do this?

    How can this KB help keep the two VMs on the same node?
    With all due respect @justinv, how did this help with your problem? Your question was "We have a unique situation where I need two machines to stay on the same node. It's a 4-node cluster with 30+ resources, but I want to make sure two boxes are ALWAYS on the same node",
    and the KB that Elden showed you is for: "Failover clusters that are running inside of virtual machines (sometimes referred to as “guest clusters”) may have problems with nodes joining the cluster."
    @justinv, could you tell us more about this? Did I misunderstand your question?
    Greetings, Robert Smit Follow me @clustermvp http://robertsmit.wordpress.com/
    I explained in one of my replies that my underlying issue was exactly what the KB fixed: a guest cluster failing when moved to different nodes. That's the only reason I wanted them on the same node to begin with. While this post didn't answer my original question, it solved what my real problem was.

  • 2 node failover cluster power down

    I have a 2-node failover cluster. When I power down the node that has the SQL Server instance and resources, all the resources and services fail over to the other node. When I see that all the resources and services report "online" I then power that node. I am being told that this is improper because failover may not have completed. Is that correct?
    Also, in our 2-node failover cluster, is there a proper sequence for restarting the powered-down nodes?

    Hi,
    The cluster group containing SQL Server can be configured for automatic failback to the primary node when it becomes available again. By default, this is set to off.
    To Configure:
    Right-click the group containing SQL Server in Cluster Administrator, select 'Properties', then the 'Failback' tab.
    To prevent automatic failback, select 'Prevent Failback'. To allow it, select 'Allow Failback' and then one of the following options:
    Immediately: not recommended, as it can disrupt clients.
    Failback between n and n1 hours: allows a controlled failback to a preferred node (if it's online) during a certain period.
    The related article:
    Windows Failover Clustering Overview
    http://blogs.technet.com/b/rob/archive/2008/05/07/failover-clustering.aspx
    Hope this helps.

  • Failover cluster not cleanly shutting down service

    I've got a two node 2008 R2 failover cluster.  I have a single service being managed by it that I configured just as a generic service.  The failover works perfectly when the service is stopped, or when one of the machines goes down, and the immediate
    failback I have configured works perfectly in both scenarios as well.
    However, there's an issue when I take the networking down on the preferred owner of the service.  As far as I can tell (this is the first time I've tried failover clustering, so I'm learning), when I take the networking down, the cluster service shuts
    down, and in turn shuts down the service I've told it to manage.  At this point, when the services aren't running, the service fails over to the secondary as intended.  The problem shows up when I turn the networking back on.  The service tries
    and fails to start on the primary (as many times as I've configured it to try), and then eventually gives up and goes back to the secondary.
    The reason for this, examining logs for the service, is that the required port is already in use.  I checked some more, and sure enough, when I take the networking offline the service gets shut down, but the executable is still running.  This is
    repeatable every time.  When I just stop the service, though, the executables go away.  So it's something to do specifically with how the managed service gets shut down *when it's shut down due to the cluster service stopping*.  For some reason
    it's not cleaning up that associated executable.
    Any ideas as to why this is happening and how to fix/work around it would be extremely welcome.  Thank you!

    Try generating the cluster log using cluster log /g /copy:<path to a local folder>. You might need to bump up log verbosity using cluster /prop ClusterLogLevel=5 (you can check the current level using cluster /prop).
    You also can look at the SCM diagnostic channel in the event viewer. Start eventvwr. Wait for the clock icon on the Application and Services Logs to go away. Once the clock icon is gone select this entry and in the menu check Show Analytic and Debug Logs.
    Now expand to the SCM provider located at
    Application and Services Logs\Microsoft\Service Control Manager Performance Diagnostic Provider\Diagnostic.
    or Microsoft-Windows-Services/Diagnostic
    Enable the log, run repro, disable the log. After that you should see events from the SCM showing you your service state transitions.
    The terminate parameters do not seem to be configurable. I can think of two ways of fixing the issue:
    - Write your own cluster resource DLL where you can implement your own policies. This would be a place to start: http://blogs.msdn.com/b/clustering/archive/2010/08/24/10053405.aspx.
    - This option assumes you cannot change the source code of the service to kill orphaned child processes on startup, so you have to clean up using some other means. Create another service and make your service dependent on this new service. The new service must be much faster in responding to SCM commands. On start of this service, use PSAPI to enumerate all processes running on the machine and kill the orphaned child processes. You should probably be able to achieve something similar using a GenScript resource + a VB script that does the cleanup.
    Regards, Vladimir Petter, Microsoft Corporation

  • Two VM's in one role - Failover cluster

    Hello,
    In my 2 node Hyper-V 2012R2 cluster I had 2 VM's, DC01 and APP01.
    Today, I only saw DC01 in the Roles list at the Failover Cluster Manager. After a while I found APP01 under the Resources for DC01. What is happening here, and how can I revert it?

    APP01 is down.
    You should have 1 VM per Clustered Role. 
    In Server 2012 R2, it's not supported to have more than 1 VM per Clustered Role.
    Sam Boutros, Senior Consultant, Software Logic, KOP, PA http://superwidgets.wordpress.com
    Powershell: Learn it before it's an emergency http://technet.microsoft.com/en-us/scriptcenter/powershell.aspx http://technet.microsoft.com/en-us/scriptcenter/dd793612.aspx

  • Can't remove Failover Cluster feature on Windows 2008 R2

    Hello
    When I try to remove the Failover Clustering feature, I get the following message:
    Cannot remove Failover Clustering
    This server is an active node in a failover cluster. Uninstalling the Failover Clustering feature on this node may impact the availability of clustered service and applications. It is recommended that you first evict the server from cluster membership. This can be done through the Failover Cluster Management snap-in by expanding the console tree under Nodes, selecting the node, clicking More Actions, and then clicking Evict.
    I'm sure there is no cluster formed, so how can I remove it?
    Thanks !

    Hey, I have the same problem.
    Somehow the Failover Clustering feature got installed on one node running Windows 2008 R2, but nothing shows up in Failover Cluster Manager and the Cluster service is
    not running.
    When I try to remove the Failover Clustering feature, it says:
    "This server is an active node in a failover cluster. Uninstalling the Failover Clustering feature on this node may impact the availability of clustered services and applications. It is recommended that you first evict the server from cluster membership. This can
    be done through the Failover Cluster Management snap-in by expanding the console tree under Nodes, selecting the node, clicking More Actions, and then clicking Evict."
    But there is no cluster at all, and I am not sure how to remove the feature.
    So please let me know: will the PowerShell command "Clear-ClusterNode" fix my problem?
    And do I need to run it in a normal PowerShell command line or in the Failover Cluster PowerShell command line?

  • Can I upgrade Server 2012R2 Standard to 2012R2 Datacenter while in a failover cluster?

    I would like to know if it is supported to upgrade my Server 2012R2 Standard to 2012R2 Datacenter while in a failover cluster. I have a 3 node cluster all running Server 2012R2 standard with only a few VM's running on them at this time.
    -Jim

    Should be no problem. There are no technical differences between Standard and Datacenter. The difference is in the licensing.
    I would evict a node, upgrade it, and add it back in.  Rinse and repeat.
    . : | : . : | : . tim

  • Failover cluster server - File Server role is clustered - Shadow copies do not seem to travel to other node when failing over

    Hi,
    New to 2012 and implementing a clustered environment for our File Services role. I have got to a point where I have successfully configured the Shadow Copy settings.
    Have a large (15 TB) disk, S:
    Have a VSS drive (volume shadow copy drive), V:
    Have successfully configured the Shadow Copy settings through Windows Explorer.
    Created dependencies in the Failover Cluster Manager console whereby S: depends on V:
    However, when I fail over the resource and browse the Client Access Point share, there are no entries under the "Previous Versions" tab.
    When I visit the S: drive in Windows Explorer and open the Shadow Copies dialog box, there are entries showing the times and dates of the shadow copies that were run on the original node. So the disk knows about the shadow copies that were run on the
    original node, but the "Previous Versions" tab has no entries to display.
    This is on Server 2012 (NOT the R2 version).
    Can anyone explain what might be the reason? Do I have an "issue" or is this by design?
    All help appreciated!
    Kathy
    Kathleen Hayhurst Senior IT Support Analyst

    Hi,
    Please first check the requirements in following article:
    Using Shadow Copies of Shared Folders in a server cluster
    http://technet.microsoft.com/en-us/library/cc779378(v=ws.10).aspx
    Cluster-managed shadow copies can only be created in a single quorum device cluster on a disk with a Physical Disk resource. In a single node cluster or majority node set cluster without a shared cluster disk, shadow copies can only be created and managed
    locally.
    You cannot enable Shadow Copies of Shared Folders for the quorum resource, although you can enable Shadow Copies of Shared Folders for a File Share resource.
    The recurring scheduled task that generates volume shadow copies must run on the same node that currently owns the storage volume.
    The cluster resource that manages the scheduled task must be able to fail over with the Physical Disk resource that manages the storage volume.

  • SQL Server Agent fails to connect to DB after enabling mirror on failover cluster

    Hello:
    We have multiple databases running in a Failover Cluster instance: SQL 2012SP1 on Server 2008 R2 failover cluster (NOT AlwaysOn). We are trying to add a high-performance mirror in a standalone instance for DR. My understanding is that should be a perfectly
    normal, supported configuration.
    The mirroring is working properly; however, the clustered SQL Server agent is unable to run jobs that run in the mirrored databases.
    We get the following in the job log: Unable to connect to SQL Server 'VIRTUALSERVERNAME\INSTANCE'.  The step failed.
    There is a partner message in the agent log: [165] ODBC Error: 0, Connecting to a mirrored SQL Server instance using the MultiSubnetFailover connection option is not supported. [SQLSTATE IMH01]
    The cluster is not a multisubnet cluster. All hosts are connected to the same subnets and there is no storage replication. I cannot find any place where I can adjust the connection string options for SQL Agent.
    Any guidance or suggestions on how to resolve this would be appreciated.
    ~joe

    SQL Team - MSFT:
    Thank you for taking the time to research and provide a clear answer.
    This seems very much a workaround and very unsatisfactory.
    You are correct, there is an IP dependency with OR condition. Moving to an AND condition is not viable for us. The whole point is to provide network redundancy. With an AND condition, if EITHER network interface fails, the service will go offline or fail
    to come online without manual intervention. This is arguably worse for uptime than having a single interface available.
    We are in the process of rewriting all our SQL jobs to start in tempdb before transitioning to the appropriate target database. If this works for all of our jobs, I will mark the above response as the answer.
    Again, thank you for the answer.
    Regards,
    Joe M.
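
    The trade-off described above between an OR and an AND dependency on the IP Address resources can be illustrated with a small truth table. This is a simplified model only, not how the cluster service evaluates dependencies internally; the function name name_resource_online is hypothetical.

```python
from itertools import product


def name_resource_online(ip_states, mode):
    """Evaluate a dependency on several IP Address resources.

    ip_states: iterable of booleans, True when that IP resource is online.
    mode: "OR" (any one IP is enough) or "AND" (every IP is required).
    """
    states = list(ip_states)
    if mode == "OR":
        return any(states)
    if mode == "AND":
        return all(states)
    raise ValueError(f"unknown dependency mode: {mode!r}")


# Print every combination of two networks being up or down.
for net_a, net_b in product([True, False], repeat=2):
    print(
        f"net A up={net_a!s:5} net B up={net_b!s:5} "
        f"OR -> {name_resource_online([net_a, net_b], 'OR')!s:5} "
        f"AND -> {name_resource_online([net_a, net_b], 'AND')}"
    )
```

    With OR, the dependent resource stays online as long as one interface is up. With AND, losing either interface takes it offline, which is the redundancy concern raised in the post.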

  • GI installation on a single-node cluster error.

    Hello, I am trying to install GI on a single-node cluster (Solaris 10 / SPARC), but the root.sh script fails with the following error (this is not a GI installation for a Standalone Server):
    root@selvac./dev/ASM/OCRVTD_DG # /app/oracle/grid/11.2/root.sh
    Running Oracle 11g root script...
    The following environment variables are set as:
    ORACLE_OWNER= grid
    ORACLE_HOME= /app/oracle/grid/11.2
    Enter the full pathname of the local bin directory: [usr/local/bin]:
    Copying dbhome to /usr/local/bin ...
    Copying oraenv to /usr/local/bin ...
    Copying coraenv to /usr/local/bin ...
    Creating /var/opt/oracle/oratab file...
    Entries will be added to the /var/opt/oracle/oratab file as needed by
    Database Configuration Assistant when a database is created
    Finished running generic part of root script.
    Now product-specific root actions will be performed.
    Using configuration parameter file: /app/oracle/grid/11.2/crs/install/crsconfig_params
    Creating trace directory
    LOCAL ADD MODE
    Creating OCR keys for user 'root', privgrp 'root'..
    Operation successful.
    OLR initialization - successful
    root wallet
    root wallet cert
    root cert export
    peer wallet
    profile reader wallet
    pa wallet
    peer wallet keys
    pa wallet keys
    peer cert request
    pa cert request
    peer cert
    pa cert
    peer root cert TP
    profile reader root cert TP
    pa root cert TP
    peer pa cert TP
    pa peer cert TP
    profile reader pa cert TP
    profile reader peer cert TP
    peer user cert
    pa user cert
    Adding daemon to inittab
    ACFS-9200: Supported
    ACFS-9300: ADVM/ACFS distribution files found.
    ACFS-9312: Existing ADVM/ACFS installation detected.
    ACFS-9314: Removing previous ADVM/ACFS installation.
    ACFS-9315: Previous ADVM/ACFS components successfully removed.
    ACFS-9307: Installing requested ADVM/ACFS software.
    ACFS-9308: Loading installed ADVM/ACFS drivers.
    ACFS-9327: Verifying ADVM/ACFS devices.
    ACFS-9309: ADVM/ACFS installation correctness verified.
    CRS-2672: Attempting to start 'ora.mdnsd' on 'selvac'
    CRS-2676: Start of 'ora.mdnsd' on 'selvac' succeeded
    CRS-2672: Attempting to start 'ora.gpnpd' on 'selvac'
    CRS-2676: Start of 'ora.gpnpd' on 'selvac' succeeded
    CRS-2672: Attempting to start 'ora.cssdmonitor' on 'selvac'
    CRS-2672: Attempting to start 'ora.gipcd' on 'selvac'
    CRS-2676: Start of 'ora.cssdmonitor' on 'selvac' succeeded
    CRS-2676: Start of 'ora.gipcd' on 'selvac' succeeded
    CRS-2672: Attempting to start 'ora.cssd' on 'selvac'
    CRS-2672: Attempting to start 'ora.diskmon' on 'selvac'
    CRS-2676: Start of 'ora.diskmon' on 'selvac' succeeded
    CRS-2676: Start of 'ora.cssd' on 'selvac' succeeded
    ASM created and started successfully.
    Disk Group OCRVTD_DG created successfully.
    The ora.asm resource is not ONLINE
    Did not succssfully configure and start ASM at /app/oracle/grid/11.2/crs/install/crsconfig_lib.pm line 6465.
    /app/oracle/grid/11.2/perl/bin/perl -I/app/oracle/grid/11.2/perl/lib -I/app/oracle/grid/11.2/crs/install /app/oracle/grid/11.2/crs/install/rootcrs.pl execution failed
    I also found the "PRVF-5150: Path OCRL:DISK1 is not a valid path on all nodes" error, but as I have read that it is a bug, I ignored it. But...
    I think my ASM disk group for OCR and voting is OK, accessible by the grid user with mode 660. It seems ASM does not start, or does not start in time.
    Any help is welcome.
    Thanks in advance.

    Thanks a lot for the hint. I had already checked this doc, but I think it is not the problem. Actually, the error "ora.asm is not ONLINE" is not correct. After root.sh fails, ora.asm is ONLINE:
    root@selvac./app/oracle/grid/11.2/bin # ./crsctl check resource ora.asm -init
    root@selvac./app/oracle/grid/11.2/bin # ./crsctl stat resource ora.asm -init
    NAME=ora.asm
    TYPE=ora.asm.type
    TARGET=ONLINE
    STATE=ONLINE on selvac
    The last part of the /app/oracle/grid/11.2/cfgtoollogs/crsconfig/rootcrs_selvac.log file reads :
    >
    ASM created and started successfully.
    Disk Group OCRVTD_DG created successfully.
    End Command output
    2011-04-14 13:24:16: Executing cmd: /app/oracle/grid/11.2/bin/crsctl check resource ora.asm -init
    2011-04-14 13:24:17: Executing cmd: /app/oracle/grid/11.2/bin/crsctl status resource ora.asm -init
    2011-04-14 13:24:17: Command output:
    NAME=ora.asm
    TYPE=ora.asm.type
    TARGET=ONLINE
    STATE=OFFLINE
    End Command output
    2011-04-14 13:24:17: Checking the status of ora.asm
    2011-04-14 13:24:22: Executing cmd: /app/oracle/grid/11.2/bin/crsctl status resource ora.asm -init
    2011-04-14 13:24:22: Command output:
    NAME=ora.asm
    TYPE=ora.asm.type
    TARGET=ONLINE
    STATE=OFFLINE
    End Command output
    2011-04-14 13:24:22: Checking the status of ora.asm
    2011-04-14 13:24:27: Executing cmd: /app/oracle/grid/11.2/bin/crsctl status resource ora.asm -init
    2011-04-14 13:24:28: Command output:
    NAME=ora.asm
    TYPE=ora.asm.type
    TARGET=ONLINE
    STATE=OFFLINE
    End Command output
    2011-04-14 13:24:28: Checking the status of ora.asm
    2011-04-14 13:24:33: Executing cmd: /app/oracle/grid/11.2/bin/crsctl status resource ora.asm -init
    2011-04-14 13:24:33: Command output:
    NAME=ora.asm
    TYPE=ora.asm.type
    TARGET=ONLINE
    STATE=OFFLINE
    End Command output
    2011-04-14 13:24:33: Checking the status of ora.asm
    2011-04-14 13:24:38: Executing cmd: /app/oracle/grid/11.2/bin/crsctl status resource ora.asm -init
    2011-04-14 13:24:38: Command output:
    NAME=ora.asm
    TYPE=ora.asm.type
    TARGET=ONLINE
    STATE=OFFLINE
    End Command output
    2011-04-14 13:24:38: Checking the status of ora.asm
    2011-04-14 13:24:43: Executing cmd: /app/oracle/grid/11.2/bin/crsctl status resource ora.asm -init
    2011-04-14 13:24:43: Command output:
    NAME=ora.asm
    TYPE=ora.asm.type
    TARGET=ONLINE
    STATE=OFFLINE
    End Command output
    2011-04-14 13:24:43: Checking the status of ora.asm
    2011-04-14 13:24:48: Executing cmd: /app/oracle/grid/11.2/bin/crsctl status resource ora.asm -init
    2011-04-14 13:24:49: Command output:
    NAME=ora.asm
    TYPE=ora.asm.type
    TARGET=ONLINE
    STATE=OFFLINE
    End Command output
    2011-04-14 13:24:49: Checking the status of ora.asm
    2011-04-14 13:24:54: Executing cmd: /app/oracle/grid/11.2/bin/crsctl status resource ora.asm -init
    2011-04-14 13:24:54: Command output:
    NAME=ora.asm
    TYPE=ora.asm.type
    TARGET=ONLINE
    STATE=OFFLINE
    End Command output
    2011-04-14 13:24:54: Checking the status of ora.asm
    2011-04-14 13:24:59: Executing cmd: /app/oracle/grid/11.2/bin/crsctl status resource ora.asm -init
    2011-04-14 13:24:59: Command output:
    NAME=ora.asm
    TYPE=ora.asm.type
    TARGET=ONLINE
    STATE=OFFLINE
    End Command output
    2011-04-14 13:24:59: Checking the status of ora.asm
    2011-04-14 13:25:04: Executing cmd: /app/oracle/grid/11.2/bin/crsctl status resource ora.asm -init
    2011-04-14 13:25:04: Command output:
    NAME=ora.asm
    TYPE=ora.asm.type
    TARGET=ONLINE
    STATE=OFFLINE
    End Command output
    2011-04-14 13:25:04: Checking the status of ora.asm
    2011-04-14 13:25:09: The ora.asm resource is not ONLINE
    2011-04-14 13:25:09: Running as user grid: /app/oracle/grid/11.2/bin/cluutil -ckpt -oraclebase /app/grid -writeckpt -name ROOTCRS_BOOTCFG -state FAIL
    2011-04-14 13:25:09: s_run_as_user2: Running /bin/su grid -c ' /app/oracle/grid/11.2/bin/cluutil -ckpt -oraclebase /app/grid -writeckpt -name ROOTCRS_BOOTCFG -state FAIL '
    2011-04-14 13:25:10: Removing file /var/tmp/mbahSaGPn
    2011-04-14 13:25:10: Successfully removed file: /var/tmp/mbahSaGPn
    2011-04-14 13:25:10: /bin/su successfully executed
    2011-04-14 13:25:10: Succeeded in writing the checkpoint:'ROOTCRS_BOOTCFG' with status:FAIL
    2011-04-14 13:25:10: ###### Begin DIE Stack Trace ######
    2011-04-14 13:25:10: Package File Line Calling
    2011-04-14 13:25:10: --------------- -------------------- ---- ----------
    2011-04-14 13:25:10: 1: main rootcrs.pl 322 crsconfig_lib::dietrap
    2011-04-14 13:25:10: 2: crsconfig_lib crsconfig_lib.pm 6465 main::__ANON__
    2011-04-14 13:25:10: 3: crsconfig_lib crsconfig_lib.pm 6390 crsconfig_lib::perform_initial_config
    2011-04-14 13:25:10: 4: main rootcrs.pl 671 crsconfig_lib::perform_init_config
    2011-04-14 13:25:10: ####### End DIE Stack Trace #######
    2011-04-14 13:25:10: 'ROOTCRS_BOOTCFG' checkpoint has failed
    So this must be a bug. During root.sh execution ora.asm is OFFLINE, but after root.sh fails it is ONLINE. It might be a question of waiting, retrying, or a timeout: the "Checking the status of ora.asm" step is repeated several times during root.sh, but perhaps not often enough. Now root.sh has failed and the installation is halted, but ASM is ONLINE.
    Any other ideas?
    Thanks again.
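
    To put a number on the "not enough waiting" suspicion, the log timestamps can be used to measure how long root.sh polled before giving up. The sketch below is a rough helper, not part of any Oracle tooling: the function name polling_summary is hypothetical, the sample is an abbreviated subset of the log excerpt, and the timestamp format is assumed from that excerpt.

```python
import re
from datetime import datetime

# Matches "YYYY-MM-DD HH:MM:SS: message" anywhere in a line, so it also
# tolerates a timestamp that is fused onto the end of the previous entry.
STAMPED = re.compile(r"(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}): ([^\n]*)")


def polling_summary(log_text):
    """Return (number_of_status_checks, seconds_from_first_check_to_failure).

    Returns None if the log has no status checks or no failure entry.
    """
    checks = []
    gave_up = None
    for stamp, message in STAMPED.findall(log_text):
        when = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S")
        if message.startswith("Checking the status of ora.asm"):
            checks.append(when)
        elif message.startswith("The ora.asm resource is not ONLINE"):
            gave_up = when
    if not checks or gave_up is None:
        return None
    return len(checks), (gave_up - checks[0]).total_seconds()


# Abbreviated sample taken from the log excerpt (most entries omitted).
sample = """\
2011-04-14 13:24:17: Checking the status of ora.asm
2011-04-14 13:24:22: Executing cmd: /app/oracle/grid/11.2/bin/crsctl status resource ora.asm -init
2011-04-14 13:24:22: Checking the status of ora.asm
2011-04-14 13:25:04: Checking the status of ora.asm
2011-04-14 13:25:09: The ora.asm resource is not ONLINE
"""

print(polling_summary(sample))
```

    In the full excerpt, the "Checking the status of ora.asm" entry appears ten times between 13:24:17 and 13:25:09, so the script gave ASM roughly 52 seconds to come online before writing the FAIL checkpoint.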
