Failover cluster failed due to a mysterious IP conflict?

I'm having a mysterious problem with my failover cluster.
Cluster name: PrintCluster01.domain.com
Members: PrintServer01.domain.com and PrintServer02.domain.com
In Failover Cluster Management, under Cluster Events, I received critical errors 1135 and 1177:
Log Name: System
Source: Microsoft-Windows-FailoverClustering
Date: 15/06/2011 9:07:49 PM
Event ID: 1177
Task Category: None
Level: Critical
Keywords:
User: SYSTEM
Computer: PrintServer01.domain.com
Description:
The Cluster service is shutting down because quorum was lost. This could be due to the loss of network connectivity between some or all nodes in the cluster, or a failover of the witness disk. Run the Validate a Configuration wizard to check your network configuration. If the condition persists, check for hardware or software errors related to the network adapter. Also check for failures in any other network components to which the node is connected such as hubs, switches, or bridges.
Log Name: System
Source: Microsoft-Windows-FailoverClustering
Date: 15/06/2011 9:07:28 PM
Event ID: 1135
Task Category: None
Level: Critical
Keywords:
User: SYSTEM
Computer: PrintServer01.domain.com
Description:
Cluster node 'PrintServer02' was removed from the active failover cluster membership. The Cluster service on this node may have stopped. This could also be due to the node having lost communication with other active nodes in the failover cluster. Run the Validate a Configuration wizard to check your network configuration. If the condition persists, check for hardware or software errors related to the network adapters on this node. Also check for failures in any other network components to which the node is connected such as hubs, switches, or bridges.
After further investigation, I found an interesting error. This was the very first error logged in Event Viewer on PrintServer02:
Log Name: System
Source: Tcpip
Date: 15/06/2011 9:07:29 PM
Event ID: 4199
Task Category: None
Level: Error
Keywords: Classic
User: N/A
Computer: PrintServer02-VM.domain.com
Description:
The system detected an address conflict for IP address 192.168.127.142 with the system having network hardware address 00-50-56-AE-29-23. Network operations on this system may be disrupted as a result.
192.168.127.142 --> secondary IP of PrintServer01
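Putting the three events quoted so far on a single timeline helps separate cause from effect. The sketch below simply sorts them by the timestamps shown in the post. It assumes both nodes' clocks were in sync (domain members normally are, but that is an assumption here), and `EVENTS` and `timeline` are illustrative names. The sorted order places the IP conflict one second after PrintServer01 removed PrintServer02 from membership. One possible reading, not confirmed in this thread, is that the conflict is a consequence of the split (PrintServer02 trying to bring 192.168.127.142 online while PrintServer01 still held it) rather than its trigger.

```python
from datetime import datetime

# (timestamp as shown in Event Viewer, computer, event ID, summary), copied from the post.
EVENTS = [
    ("15/06/2011 9:07:49 PM", "PrintServer01", 1177, "Cluster service shutting down, quorum lost"),
    ("15/06/2011 9:07:28 PM", "PrintServer01", 1135, "PrintServer02 removed from membership"),
    ("15/06/2011 9:07:29 PM", "PrintServer02", 4199, "Address conflict for 192.168.127.142"),
]

# Day/month/year with a 12-hour clock, as in the event log excerpts.
TIME_FORMAT = "%d/%m/%Y %I:%M:%S %p"


def timeline(events):
    """Parse each timestamp and return the events sorted oldest first."""
    parsed = [
        (datetime.strptime(stamp, TIME_FORMAT), computer, event_id, summary)
        for stamp, computer, event_id, summary in events
    ]
    return sorted(parsed, key=lambda event: event[0])


if __name__ == "__main__":
    for when, computer, event_id, summary in timeline(EVENTS):
        print(f"{when:%H:%M:%S}  {computer:<14} {event_id:>5}  {summary}")
```

Sorting only establishes order. It does not show causation, so the cluster log for the same second is still the better evidence.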
How can this conflict come from PrintServer01, one of the cluster's own nodes? The details are below:
I have double-checked all of the cluster members, and all IP addresses are now unique.
However, I'm sure the IP addresses are static, not assigned by DHCP, as the ipconfig results below show:
From **PrintServer01** (the Active Node)
Windows IP Configuration
Host Name . . . . . . . . . . . . : PrintServer01
Primary Dns Suffix . . . . . . . : domain.com
Node Type . . . . . . . . . . . . : Hybrid
IP Routing Enabled. . . . . . . . : No
WINS Proxy Enabled. . . . . . . . : No
DNS Suffix Search List. . . . . . : domain.com
                                    domain.com.au
Ethernet adapter Local Area Connection* 8:
Connection-specific DNS Suffix . :
Description . . . . . . . . . . . : Microsoft Failover Cluster Virtual Adapter
Physical Address. . . . . . . . . : 02-50-56-AE-29-23
DHCP Enabled. . . . . . . . . . . : No
Autoconfiguration Enabled . . . . : Yes
IPv4 Address. . . . . . . . . . . : 169.254.1.183(Preferred)
Subnet Mask . . . . . . . . . . . : 255.255.0.0
Default Gateway . . . . . . . . . :
NetBIOS over Tcpip. . . . . . . . : Enabled
Ethernet adapter Cluster Public Network:
Connection-specific DNS Suffix . :
Description . . . . . . . . . . . : Intel® PRO/1000 MT Network Connection
Physical Address. . . . . . . . . : 00-50-56-AE-29-23
DHCP Enabled. . . . . . . . . . . : No
Autoconfiguration Enabled . . . . : Yes
IPv4 Address. . . . . . . . . . . : 192.168.127.155(Preferred)
Subnet Mask . . . . . . . . . . . : 255.255.255.0
IPv4 Address. . . . . . . . . . . : 192.168.127.88(Preferred)
Subnet Mask . . . . . . . . . . . : 255.255.255.0
IPv4 Address. . . . . . . . . . . : 192.168.127.142(Preferred)
Subnet Mask . . . . . . . . . . . : 255.255.255.0
IPv4 Address. . . . . . . . . . . : 192.168.127.143(Preferred)
Subnet Mask . . . . . . . . . . . : 255.255.255.0
IPv4 Address. . . . . . . . . . . : 192.168.127.144(Preferred)
Subnet Mask . . . . . . . . . . . : 255.255.255.0
Default Gateway . . . . . . . . . : 192.168.127.254
DNS Servers . . . . . . . . . . . : 192.168.127.10
                                    192.168.127.11
Primary WINS Server . . . . . . . : 192.168.127.10
Secondary WINS Server . . . . . . : 192.168.127.11
NetBIOS over Tcpip. . . . . . . . : Enabled
Ethernet adapter Cluster Private Network:
Connection-specific DNS Suffix . :
Description . . . . . . . . . . . : Intel® PRO/1000 MT Network Connection #2
Physical Address. . . . . . . . . : 00-50-56-AE-43-EC
DHCP Enabled. . . . . . . . . . . : No
Autoconfiguration Enabled . . . . : Yes
IPv4 Address. . . . . . . . . . . : 10.184.2.2(Preferred)
Subnet Mask . . . . . . . . . . . : 255.255.255.0
Default Gateway . . . . . . . . . :
NetBIOS over Tcpip. . . . . . . . : Disabled
From **PrintServer02**
Windows IP Configuration
Host Name . . . . . . . . . . . . : PrintServer02
Primary Dns Suffix . . . . . . . : domain.com
Node Type . . . . . . . . . . . . : Hybrid
IP Routing Enabled. . . . . . . . : No
WINS Proxy Enabled. . . . . . . . : No
DNS Suffix Search List. . . . . . : domain.com
                                    domain.com.au
Ethernet adapter Local Area Connection* 8:
Connection-specific DNS Suffix . :
Description . . . . . . . . . . . : Microsoft Failover Cluster Virtual Adapter
Physical Address. . . . . . . . . : 02-50-56-AE-5F-E5
DHCP Enabled. . . . . . . . . . . : No
Autoconfiguration Enabled . . . . : Yes
IPv4 Address. . . . . . . . . . . : 169.254.2.86(Preferred)
Subnet Mask . . . . . . . . . . . : 255.255.0.0
Default Gateway . . . . . . . . . :
NetBIOS over Tcpip. . . . . . . . : Enabled
Ethernet adapter Cluster Public Network:
Connection-specific DNS Suffix . :
Description . . . . . . . . . . . : Intel® PRO/1000 MT Network Connection
Physical Address. . . . . . . . . : 00-50-56-AE-79-FA
DHCP Enabled. . . . . . . . . . . : No
Autoconfiguration Enabled . . . . : Yes
IPv4 Address. . . . . . . . . . . : 192.168.127.172(Preferred)
Subnet Mask . . . . . . . . . . . : 255.255.255.0
IPv4 Address. . . . . . . . . . . : 192.168.127.119(Preferred)
Subnet Mask . . . . . . . . . . . : 255.255.255.0
Default Gateway . . . . . . . . . : 192.168.127.254
DNS Servers . . . . . . . . . . . : 192.168.127.10
                                    192.168.127.11
Primary WINS Server . . . . . . . : 192.168.127.11
Secondary WINS Server . . . . . . : 192.168.127.10
NetBIOS over Tcpip. . . . . . . . : Enabled
Ethernet adapter Cluster Private Network:
Connection-specific DNS Suffix . :
Description . . . . . . . . . . . : Intel® PRO/1000 MT Network Connection #2
Physical Address. . . . . . . . . : 00-50-56-AE-77-8D
DHCP Enabled. . . . . . . . . . . : No
Autoconfiguration Enabled . . . . : Yes
IPv4 Address. . . . . . . . . . . : 10.184.2.3(Preferred)
Subnet Mask . . . . . . . . . . . : 255.255.255.0
Default Gateway . . . . . . . . . :
NetBIOS over Tcpip. . . . . . . . : Disabled
Any help would be greatly appreciated.
Thanks,
AWT
/* Server Support Specialist */
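A quick way to answer "whose MAC address is that?" is to index the `ipconfig /all` output by physical address. The sketch below is a hypothetical helper: `parse_ipconfig` and `adapter_for_mac` are names introduced here, not part of any Windows tool. It assumes English-language ipconfig output in the format pasted above. Run against PrintServer01's output, it shows that 00-50-56-AE-29-23, the MAC address reported in event 4199, belongs to the Cluster Public Network NIC that carries 192.168.127.142. It is not the Microsoft Failover Cluster Virtual Adapter (02-50-56-AE-29-23).

```python
import re

# Patterns for English-language `ipconfig /all` output like the dumps above.
ADAPTER_RE = re.compile(r"^\s*(?:Ethernet|Wireless LAN|Tunnel) adapter (.+):\s*$")
MAC_RE = re.compile(r"Physical Address[ .]*:\s*([0-9A-Fa-f]{2}(?:-[0-9A-Fa-f]{2}){5})")
IPV4_RE = re.compile(r"IPv4 Address[ .]*:\s*(\d{1,3}(?:\.\d{1,3}){3})")


def parse_ipconfig(text):
    """Return {adapter name: {"mac": str or None, "ipv4": [str, ...]}}."""
    adapters = {}
    current = None
    for line in text.splitlines():
        header = ADAPTER_RE.match(line)
        if header:
            current = header.group(1)
            adapters[current] = {"mac": None, "ipv4": []}
        elif current is not None:  # skip the "Windows IP Configuration" preamble
            mac = MAC_RE.search(line)
            if mac:
                adapters[current]["mac"] = mac.group(1).upper()
            ip = IPV4_RE.search(line)
            if ip:
                adapters[current]["ipv4"].append(ip.group(1))
    return adapters


def adapter_for_mac(adapters, mac):
    """Return the name of the adapter whose physical address is `mac`, else None."""
    wanted = mac.upper()
    for name, info in adapters.items():
        if info["mac"] == wanted:
            return name
    return None


# Trimmed copy of PrintServer01's output from the post.
PRINTSERVER01 = """\
Ethernet adapter Local Area Connection* 8:
Description . . . . . . . . . . . : Microsoft Failover Cluster Virtual Adapter
Physical Address. . . . . . . . . : 02-50-56-AE-29-23
IPv4 Address. . . . . . . . . . . : 169.254.1.183(Preferred)
Ethernet adapter Cluster Public Network:
Physical Address. . . . . . . . . : 00-50-56-AE-29-23
IPv4 Address. . . . . . . . . . . : 192.168.127.155(Preferred)
IPv4 Address. . . . . . . . . . . : 192.168.127.142(Preferred)
"""

if __name__ == "__main__":
    adapters = parse_ipconfig(PRINTSERVER01)
    owner = adapter_for_mac(adapters, "00-50-56-AE-29-23")  # MAC address from event 4199
    print(owner, adapters[owner]["ipv4"])
```

Pasting PrintServer02's output in the same way finds no adapter with that MAC address. This is consistent with the conflicting host being PrintServer01 itself rather than a third machine on the network.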

I am facing the same scenario as the original poster. This is on Server 2008 R2 SP1.
The Windows event log entries follow the same pattern. The MAC address listed with the duplicate IP belonged to the passive node.
Interestingly, Cluster.log shows a burst of activity a few milliseconds before the first Windows event is logged:
2012/07/11-15:20:59.517 INFO  [CHANNEL fe80::8145:f2b9:898e:784e%37:~3343~] graceful close, status (of previous failure, may not indicate problem) ERROR_IO_PENDING(997)
2012/07/11-15:20:59.517 WARN  [PULLER SQLTESTSQLB] ReadObject failed with GracefulClose(1226)' because of 'channel to remote endpoint fe80::8145:f2b9:898e:784e%37:~3343~ is closed'
2012/07/11-15:20:59.517 ERR   [NODE] Node 1: Connection to Node 2 is broken. Reason GracefulClose(1226)' because of 'channel to remote endpoint fe80::8145:f2b9:898e:784e%37:~3343~ is closed'
2012/07/11-15:20:59.517 WARN  [RGP] Node 1: only local suspects are missing (2). moving to the next stage (shortcut compensation time 05.000)
2012/07/11-15:20:59.548 WARN  [NETFTAPI] Failed to query parameters for fe80::5efe:169.254.1.79 (status 80070490)
2012/07/11-15:20:59.548 WARN  [NETFTAPI] Failed to query parameters for fe80::5efe:169.254.1.79 (status 80070490)
2012/07/11-15:20:59.579 INFO  [CHANNEL 192.168.3.22:~3343~] graceful close, status (of previous failure, may not indicate problem) ERROR_SUCCESS(0)
2012/07/11-15:20:59.579 WARN  cxl::ConnectWorker::operator (): GracefulClose(1226)' because of 'channel to remote endpoint 192.168.3.22:~3343~ is closed'
2012/07/11-15:20:59.829 INFO  [GEM] Node 1: EnterRepairStage1: Gem agent for node 1
2012/07/11-15:21:00.141 INFO  [GEM] Node 1: EnterRepairStage2: Gem agent for node 1
2012/07/11-15:21:00.499 WARN  [RCM] Moving orphaned group Available Storage from downed node SQLTESTSQLB to node SQLTESTSQLA.
2012/07/11-15:21:00.499 WARN  [RES] IP Address <Cluster IP Address>: WorkerThread: NetInterface ef150d1a-f4a1-4f4f-a5c7-6e7cb2bfacab changed to state 3.
2012/07/11-15:21:00.499 WARN  [RCM] Moving orphaned group MSSTEST from downed node SQLTESTSQLB to node SQLTESTSQLA.
2012/07/11-15:21:00.546 WARN  [RES] IP Address <SQL IP Address 1 (DEVSQL)>: Failed to delete IP interface 2003B882, status 87.
2012/07/11-15:21:00.562 WARN  [RES] Physical Disk <Cluster Disk 2>: PR reserve failed, status 170
2012/07/11-15:21:00.577 WARN  [RES] Physical Disk <Cluster Disk 1>: PR reserve failed, status 170
2012/07/11-15:21:00.593 WARN  [RES] Physical Disk <Cluster Disk 3>: PR reserve failed, status 170
2012/07/11-15:21:02.215 WARN  [NETFTAPI] Failed to query parameters for 192.168.3.32 (status 80070490)
2012/07/11-15:21:02.215 WARN  [NETFTAPI] Failed to query parameters for 192.168.3.32 (status 80070490)
2012/07/11-15:21:05.864 DBG   [NETFTAPI] received NsiParameterNotification  for fe80::5cd:8cc2:186:f5cb (IpDadStatePreferred )
2012/07/11-15:21:06.565 ERR   [RES] Physical Disk <Cluster Disk 2>: Failed to preempt reservation, status 170
2012/07/11-15:21:06.581 ERR   [RES] Physical Disk <Cluster Disk 2>: OnlineThread: Unable to arbitrate for the disk. Error: 170.
2012/07/11-15:21:06.581 ERR   [RES] Physical Disk <Cluster Disk 2>: OnlineThread: Error 170 bringing resource online.
2012/07/11-15:21:06.581 ERR   [RHS] Online for resource Cluster Disk 2 failed.
2012/07/11-15:21:06.581 WARN  [RCM] HandleMonitorReply: ONLINERESOURCE for 'Cluster Disk 2', gen(0) result 5018.
2012/07/11-15:21:06.581 ERR   [RCM] rcm::RcmResource::HandleFailure: (Cluster Disk 2)
2012/07/11-15:21:06.581 WARN  [RES] Physical Disk <Cluster Disk 2>: Terminate: Failed to open device \Device\Harddisk5\Partition1, Error 2
2012/07/11-15:21:06.581 ERR   [RES] Physical Disk <Cluster Disk 1>: Failed to preempt reservation, status 170
2012/07/11-15:21:06.581 ERR   [RES] Physical Disk <Cluster Disk 1>: OnlineThread: Unable to arbitrate for the disk. Error: 170.
2012/07/11-15:21:06.581 ERR   [RES] Physical Disk <Cluster Disk 1>: OnlineThread: Error 170 bringing resource online.
Full cluster log here:
https://skydrive.live.com/redir?resid=A694FDEBF02727CD!133&authkey=!ADQMxHShdeDvXVc
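When the full log is much longer than this excerpt, it helps to pull out only the ERR lines and see which component failed first. The sketch below is a hypothetical helper (`parse_cluster_log` and `summarize` are illustrative names). It assumes the line layout shown in the excerpt: timestamp, level, an optional [COMPONENT] tag, then the message. Cluster.log files produced by `cluster log /g` or `Get-ClusterLog` may carry extra process/thread ID prefixes, in which case the regular expression would need adjusting. In the excerpt above, the first ERR entry is the [NODE] connection break at 15:20:59.517, several seconds before the disk reservation failures (status 170).

```python
import re
from collections import Counter

# Layout assumed from the excerpt above: timestamp, level, optional [COMPONENT], message.
LINE_RE = re.compile(
    r"^(?P<time>\d{4}/\d{2}/\d{2}-\d{2}:\d{2}:\d{2}\.\d{3})\s+"
    r"(?P<level>INFO|WARN|ERR|DBG)\s+"
    r"(?:\[(?P<component>[^\]]+)\]\s+)?"
    r"(?P<message>.*)$"
)


def parse_cluster_log(text):
    """Parse log lines into dicts; non-matching lines are treated as wrapped continuations."""
    entries = []
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if match:
            entries.append(match.groupdict())
        elif entries and line.strip():
            entries[-1]["message"] += " " + line.strip()
    return entries


def summarize(entries):
    """Count entries per level and return the first ERR entry, or None if there is none."""
    counts = Counter(entry["level"] for entry in entries)
    first_error = next((entry for entry in entries if entry["level"] == "ERR"), None)
    return counts, first_error


SAMPLE = """\
2012/07/11-15:20:59.517 WARN  [PULLER SQLTESTSQLB] ReadObject failed with GracefulClose(1226)' because of 'channel to remote endpoint fe80::8145:f2b9:898e:784e%37:~3343~
is closed'
2012/07/11-15:20:59.517 ERR   [NODE] Node 1: Connection to Node 2 is broken. Reason GracefulClose(1226)
2012/07/11-15:20:59.579 WARN  cxl::ConnectWorker::operator (): GracefulClose(1226)
"""

if __name__ == "__main__":
    counts, first_error = summarize(parse_cluster_log(SAMPLE))
    print(dict(counts))
    print(first_error["time"], first_error["component"], first_error["message"])
```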

Similar Messages

  • SQL 2012 installation for Failover Cluster failed

    While installing SQL Server 2012 on a failover cluster (FOC), validation fails on the "Database Engine Configuration" page with the following error:
    The volume that contains SQL Server data directory g:\MSSQL11.MSSQLSERVER\MSSQL\DATA does not belong to the cluster group.
    I want to know how the SQL Server installation wizard queries the volumes configured in the failover cluster. Does it:
    - Enumerate the "Physical Disk" resources in the FOC,
    - Enumerate all storage-class resources in the FOC to get the volume list, or
    - Rely on WMI (Win32_Volume) to get the volumes?
    The wizard correctly discovers volume g:\ in its FOC group on the "Cluster Resource Group" and "Cluster Disk Selection" pages, but gives the error on the "Database Engine Configuration" page.
    Any help in this would be appreciated.
    Thanks in advance
    Rakesh
    Rakesh Agrawal

    Can you please check whether any disk in the cluster is not in the Online state? Run the script below as follows:
    1. Save the script as "Disk.vbs" and use CSCRIPT to run it.
    2. Syntax: CSCRIPT Disk.vbs <Windows Cluster Name>
    ' Disk.vbs: lists each storage-class cluster resource and its partitions.
    ' Usage: CSCRIPT Disk.vbs <Windows Cluster Name>
    Option Explicit
    Public objArgs, objCluster

    Public Sub Connect()
     ' Opens a global cluster object. Using Windows Script Host syntax,
     ' the cluster name or "" must be passed as the first argument.
     Set objArgs = WScript.Arguments
     If objArgs.Count = 0 Then
      WScript.Echo "Usage: CSCRIPT <script file name> <Windows Cluster Name>"
      WScript.Quit 1
     End If
     Set objCluster = CreateObject("MSCluster.Cluster")
     objCluster.Open objArgs(0)
    End Sub

    Public Sub Disconnect()
     ' Dereferences the global objects. Used with Connect.
     Set objCluster = Nothing
     Set objArgs = Nothing
    End Sub

    Dim objResource, objDisk, objPartition
    Connect
    For Each objResource In objCluster.Resources
     ' ClassInfo = 1 (CLUS_RESCLASS_STORAGE) selects storage resources such as Physical Disk.
     If objResource.ClassInfo = 1 Then
      WScript.Echo objResource.Name
      On Error Resume Next
      Set objDisk = objResource.Disk
      If Err.Number <> 0 Then
       ' The disk could not be read; check whether this resource is online.
       WScript.Echo "Unable to retrieve the disk: " & Err.Number & " " & Err.Description
       Err.Clear ' reset the error before checking the next resource
      Else
       For Each objPartition In objDisk.Partitions
        WScript.Echo "  " & objPartition.DeviceName
       Next
      End If
      On Error GoTo 0 ' stop suppressing errors outside the disk lookup
     End If
    Next
    Disconnect

  • Failover cluster fails validation after a single node restart

    I have a lab environment set up that works great: it passes validation and can do live migrations without issue. But as soon as I restarted one of the nodes, the node that stayed up became the only node able to access the storage back end. What's weird is that the restarted node can still access the CSV storage and run VMs from it, but the validation report is unable to list the actual disks.
    My cluster consists of 2 nodes. I have an iSCSI-backed shared storage server, and I can see that both nodes are connected to the iSCSI targets successfully. However, the node I restarted first no longer lists any disks/volumes in Disk Management, and the previously available MPIO menus are disabled in the iSCSI control panel. I also tried restarting the second node after the first node came back. Although the first node was up and running and had VMs on it, restarting the second node brought the entire cluster down. I see event IDs 1177, 1573, and 1069 in the Cluster Events log. When the second node came back up, the cluster came back with it, but the storage did not. Now neither node can access the storage back end. I was able to get both nodes reconnected to the storage back end by opening iscsicpl, disconnecting all current connections to the iSCSI back end, and adding them back. Repeating the test after bringing the storage back up produced the same behavior, and this time redoing the iSCSI connections does not help.
    I think the issue is that the first node I restarted cannot see any disks/volumes from the storage back end, but only after joining the cluster and then restarting. Before joining the cluster, I rebooted both nodes and both could connect to the iSCSI back end without issue. Only after joining the cluster did node 1 lose access to the storage back end after reboots. The validation report fails with "No disks were found on which to perform cluster validation tests. To correct this, review the following possible causes: ...", although none of the suggestions seem applicable, and validation succeeded right before the node was restarted.
    Does anyone have suggestions on how to further troubleshoot or resolve this issue?
    I am using Hyper-V Server 2012 R2 on both nodes, and they are joined to the same domain.

    Hi,
    I haven't found a similar issue. Please confirm that your storage is compatible with Windows Server 2012 R2, update the network card drivers and firmware on both nodes, temporarily disable your antivirus software and firewall, and install the recommended hotfixes and updates for Windows Server 2012 R2-based failover clusters.
    The Recommended hotfixes and updates for Windows Server 2012 R2-based failover clusters
    http://support.microsoft.com/kb/2920151/en-us
    Hope this helps.
    We are trying to better understand customer views on social support experience, so your participation in this interview project would be greatly appreciated if you have time.
    Thanks for helping make community forums a great place.

  • Network DR test causes Exchange DAG network to fail (Failover Cluster Manager reports comms errors)

    We have a DAG configured between 2 Mailbox servers, one in each of our main data centres. Our comms team recently performed a DR test between our 2 data centres, switching from the main production link to the backup link. During this outage, Failover Cluster Manager reported errors, with each Mailbox server reporting the other as uncontactable. The events logged include the following:
    Isatap interface isatap.{02ADE20A-D5D4-437F-AD00-E6601F7E7A9D} is no longer active. (EventID 4201)
    Cluster node 'MAILBOX_SERVER' was removed from the active failover cluster membership. The Cluster service on this node may have stopped. This could also be due to the node having lost communication with other active nodes in the failover cluster. Run the Validate a Configuration wizard to check your network configuration. If the condition persists, check for hardware or software errors related to the network adapters on this node. Also check for failures in any other network components to which the node is connected such as hubs, switches, or bridges. (EventID 1135)
    File share witness resource 'File Share Witness (\\WITNESS_SERVER\SHARE_NAME)' failed to arbitrate for the file share '\\WITNESS_SERVER\SHARE_NAME'. Please ensure that file share '\\WITNESS_SERVER\SHARE_NAME' exists and is accessible by the cluster. (EventID 1564)
    Cluster resource 'File Share Witness (\\WITNESS_SERVER\SHARE_NAME)' in clustered service or application 'Cluster Group' failed. (EventID 1069)
    The Cluster service is shutting down because quorum was lost. This could be due to the loss of network connectivity between some or all nodes in the cluster, or a failover of the witness disk. Run the Validate a Configuration wizard to check your network configuration. If the condition persists, check for hardware or software errors related to the network adapter. Also check for failures in any other network components to which the node is connected such as hubs, switches, or bridges. (EventID 1177)
    The Cluster Service service terminated with service-specific error A quorum of cluster nodes was not present to form a cluster. (EventID 7024)
    The Microsoft Exchange Information Store service terminated unexpectedly.  It has done this 1 time(s).  The following corrective action will be taken in 5000 milliseconds: Restart the service. (EventID 7031)
    Looking at the Cluster Events in the Failover Cluster Manager snap-in, I see a heap of Event ID 47 entries (cannot activate the DAG databases because the server is not up according to the Windows Failover Cluster service) and:
    Node status could not be recorded. This could prevent some network failure logic from functioning correctly. NodeStatus:IsHealthy=True,HasADAccess=True,ClusterErrorOverrideFalse,LastUpdate=5/2/2011 8:25:42 AMUTC Failure:An Active Manager operation failed. Error: An error occurred while attempting a cluster operation. Error: Cluster API '"ClusterRegSetValue() failed with 0x6be. Error: The remote procedure call failed"' failed.. (EventID 184)
    Forcefully dismounting all the locally mounted databases on server 'BACKUP_MAILBOX_SERVER. (EventID 307).
    Our comms team doesn't believe it is a comms issue, as they did not log any network communication errors (using ICMP) between the servers in the two sites. So if it is not a comms issue, how can I configure the failover cluster to be resilient to this type of network failover event?
    Thanks
    Dan
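The 1564 → 1177 → 7024 sequence in the question follows from simple vote counting. This sketch assumes the Node and File Share Majority model, with one vote per node and one vote for the witness. It also assumes fixed votes as in Windows Server 2008 R2 (Windows Server 2012 and later can adjust votes dynamically) and that a partition keeps running only with more than half of the total votes. `has_quorum` is a name introduced here for illustration.

```python
def has_quorum(votes_present: int, total_votes: int) -> bool:
    """A partition keeps quorum only with a strict majority of all configured votes."""
    return 2 * votes_present > total_votes


if __name__ == "__main__":
    # Node and File Share Majority: two Mailbox servers plus one file share witness.
    total = 3
    # With the inter-site link down, each server can count its own vote, plus
    # the witness vote only if it wins arbitration for the file share.
    print("server that locked the witness:", has_quorum(1 + 1, total))  # True
    print("server that failed to arbitrate:", has_quorum(1, total))     # False
    # Two nodes and no witness: losing either node loses quorum.
    print("2 nodes, no witness, 1 left:", has_quorum(1, 2))              # False
```

Under these assumptions, a server whose witness arbitration fails (event 1564) holds 1 of 3 votes. Its Cluster service therefore stops (1177, 7024) even though the server itself is healthy. ICMP reachability between the sites does not change this outcome.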

    Isn't it also true that in a stretched DAG with an even number of nodes, the PAM needs to be in the same site as the active DAG node? If the connection between the two nodes goes down and the PAM is in the "passive" site, the primary node will dismount its databases, since it can't check with the PAM that it is safe to stay up.
    In a stretched DAG with an even number of nodes, the PAM moves to the DR/passive site every time a failover occurs, but it doesn't automatically move back when you reactivate the primary node.

  • The Cluster Service function call 'ClusterResourceControl' failed with error code '1008(An attempt was made to reference a token that does not exist.)' while verifying the file path. Verify that your failover cluster is configured properly.

    I am experiencing this error in one of our cluster environments. Can anyone help me with this issue?
    The Cluster Service function call 'ClusterResourceControl' failed with error code '1008(An attempt was made to reference a token that does not exist.)' while verifying the file path. Verify that your failover cluster is configured properly.
    Thanks,
    Venu S.
    Venugopal S

    Hi Venu S,
    Based on my research, you may be hitting a known issue. Please try the hotfix in this KB article:
    http://support.microsoft.com/kb/928385
    Meanwhile, since there is little information about this issue, please provide the following before we investigate further:
    The version of Windows Server you are using
    The result of SELECT @@VERSION
    The scenario when you get this error
    If anything is unclear, please let me know.
    Regards,
    Tom Li

  • SQL Server Agent fails to connect to DB after enabling mirror on failover cluster

    Hello:
    We have multiple databases running in a failover cluster instance: SQL Server 2012 SP1 on a Windows Server 2008 R2 failover cluster (NOT AlwaysOn). We are trying to add a high-performance mirror on a standalone instance for DR. My understanding is that this should be a perfectly normal, supported configuration.
    The mirroring is working properly; however, the clustered SQL Server agent is unable to run jobs that run in the mirrored databases.
    We get the following in the job log: Unable to connect to SQL Server 'VIRTUALSERVERNAME\INSTANCE'.  The step failed.
    There is a partner message in the agent log: [165] ODBC Error: 0, Connecting to a mirrored SQL Server instance using the MultiSubnetFailover connection option is not supported. [SQLSTATE IMH01]
    The cluster is not a multi-subnet cluster. All hosts are connected to the same subnets, and there is no storage replication. I cannot find anywhere to adjust the connection string options for SQL Server Agent.
    Any guidance or suggestions on how to resolve this would be appreciated.
    ~joe

    SQL Team - MSFT:
    Thank you for taking the time to research and provide a clear answer.
    This seems very much like a workaround, and a very unsatisfactory one.
    You are correct: there is an IP dependency with an OR condition. Moving to an AND condition is not viable for us, because the whole point is to provide network redundancy. With an AND condition, if EITHER network interface fails, the service will go offline or fail to come online without manual intervention. This is arguably worse for uptime than having a single interface.
    We are in the process of rewriting all our SQL jobs to start in tempdb before switching to the appropriate target database. If this works for all of our jobs, I will mark the above response as the answer.
    Again, thank you for the answer.
    Regards,
    Joe M.
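Joe's point about AND versus OR dependencies can be made concrete with a back-of-the-envelope calculation. This sketch assumes that each network interface (with its IP address resource) is available with the same probability p and fails independently of the other. Both are simplifications, and the function names are illustrative.

```python
def availability_single(p: float) -> float:
    """One IP address resource on one network interface."""
    return p


def availability_or(p: float, n: int = 2) -> float:
    """OR dependency: the network name stays up if ANY of n IP resources is up."""
    return 1 - (1 - p) ** n


def availability_and(p: float, n: int = 2) -> float:
    """AND dependency: the network name needs ALL n IP resources up."""
    return p ** n


if __name__ == "__main__":
    p = 0.99  # illustrative per-interface availability, not a measured value
    print(f"single: {availability_single(p):.4f}")
    print(f"OR:     {availability_or(p):.4f}")
    print(f"AND:    {availability_and(p):.4f}")
```

With p = 0.99 this prints 0.9900, 0.9999 and 0.9801. Under these assumptions, an AND dependency on two interfaces is less available than a single interface, which is the trade-off described above.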

  • Failover Cluster Network Name Failed and Can't be Repaired

    I have an issue that seems to be different from anything others have encountered.
    I've scoured everything I can find and nothing has fixed my problem.
    The problem starts with a common one: the cluster network name failing on my 2-node Windows Server 2012 file server cluster. The computer object was still in AD and appeared to be fine, so it was not the common problem of the object somehow getting deleted. At the time, there was no other object with that name in the Recycle Bin, so I don't think it was mistakenly deleted and quickly recreated to cover any tracks, so to speak.
    Following one guide, I tried to find the registry key that corresponded with the GUID of the object, but neither node in the cluster had it in its registry (which may be part of the problem).
    Since it was in the failed state, I tried to repair the object, to no avail.
    We run a "locked down" DC environment, so all computer objects have to be pre-provisioned. They were all pre-provisioned and assigned successfully during cluster creation. The cluster ran with no issues for a month or so before this problem came up.
    When I do a repair on the object while taking diagnostic logs the following 4609 error appears:
    The action 'Repair' did not complete. - System.ApplicationException: An error occurred resetting the password for 'Cluster Name'. ---> System.ComponentModel.Win32Exception: Unknown error (0x80005000)
    There appears to be a corresponding event 4771 with failure code 0x18 in the DC's security log, stating that Kerberos pre-authentication failed for the cluster network name object (Domain\Clustername$).
    I believe this is what is causing the repair failure. Everything I found about security event 4771 pointed either to bad credentials for a user account or to rejoining the computer to the domain as the fix. I can't find a way to do that for the cluster network name. If there is a way, please let me know.
    I've tried a number of things, like resetting the object, disabling it, deleting and creating a new object with the same name, deleting that new object and recovering the original, etc...
    Can anyone shed some light on what is going on and, hopefully, how to fix it without rebuilding the cluster? I'm quite close to tearing it down and building it back up, but I'm hesitant because this cluster is currently in production...
    Any help would be appreciated

    Hi,
    I haven't found an issue similar to yours. Based on my experience, the 4096 error is often caused by a CSV disk issue, and the 0x80005000 error is sometimes caused by a repetitive (duplicate) computer object in the OU. Please check those areas, or run the validation test and post the error information.
    Although I do have a CSV, there doesn't seem to be any problem with it, and it was running just fine for a month or so before the problem started. I double-checked and there are no duplicate computer objects. Maybe I don't understand what you mean by repetitive; could you explain further?
    The cluster validates successfully with a few warnings:
    Validating cluster resource Name: DT-FileCluster.
    This resource is marked with a state of 'Failed' instead of 'Online'. This failed state indicates that the resource had a problem either coming online or had a failure while it was online. The event logs and cluster logs may have information that is helpful in identifying the cause of the failure.
    - This is because the cluster name is in the failed state
    Validating the service principal names for Name: DT-FileCluster.
    The network name Name: DT-FileCluster does not have a valid value for the read-only property 'ObjectGUID'. To validate the service principal name the read-only private property 'ObjectGuid' must have a valid value. To correct this issue make sure that the network name has been brought online at least once. If this does not correct this issue you will need to delete the network name and re-create it.
    - This is definitely related to the problem and the GUID probably got removed when we attempted a fix by resetting the object and trying the repair from the failover cluster manager.
    The user running validate, does not have permissions to create computer objects in the 'ad.unlv.edu' domain.
    - This is correct; we run a restricted domain. I have a delegated OU in which I can pre-provision accounts. The account was pre-provisioned successfully and was at one point set up and working just fine.
    There are no other errors nor warnings.

  • Windows server 2012 failover cluster error: Cluster resource 'Virtual Machine Configuration ... of type 'Virtual machine configuration in clustered role ... failed.

    I have two Windows Server 2012 host servers clustered using the Windows Failover Clustering feature. Each server hosts four VMs. When migrating from Host2 to Host1, the migration failed with the following error:
    Cluster resource 'Virtual Machine Configuration SCPCSQLSRV01' of type 'Virtual Machine Configuration' in clustered role 'SCPCSQLSRV01' failed. The error code was '0x569' ('Logon failure: the user has not been granted the requested logon type at this computer.').
    When this happens, the VM that I was migrating can no longer be started even on the original host. The only remedy is to restart the host server.
    Any suggestion on resolving this problem?
    Thanks
    Ikad

    Thanks. The article referred to above gives the solution to my issue. A Group Policy is applied to the OU where the host servers were placed. Running gpupdate /force temporarily removes the problem. Unfortunately, the NT Virtual Machine\Virtual Machines account is a special account that cannot be added like other accounts and granted the "Log on as a service" right. The thread
    http://social.technet.microsoft.com/Forums/en-US/winserverhyperv/thread/d56f2eae-726e-409a-8813-670a406593e8 explains how to add it: create a local group (for example with net localgroup VMTest /add) and run the command
    net localgroup VMTest "NT Virtual Machine\Virtual Machines" /add
    to add the account to the local group VMTest. VMTest is then assigned the "Log on as a service" right.
    Ikad

  • Failover Cluster disk failed on a VM.

    Hello,
    I tried to create a failover cluster node on a VM (with a fixed-size virtual disk).
    All the steps succeed except adding disk storage.
    When I try to add a disk, the following error appears:
    Status : Failed.
    Information : Incorrect function.
    Error Code : 0X80070001.
    Does anyone know whether it's possible to add a Hyper-V virtual disk to a cluster node?
    Thanks,
    Regards,
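Error codes of the form 0x8007xxxx, like the 0X80070001 above, are Win32 errors wrapped in an HRESULT (facility 7, FACILITY_WIN32). The low 16 bits hold the Win32 code, and 0x0001 is ERROR_INVALID_FUNCTION, whose message is exactly the "Incorrect function." shown. The sketch below decodes such values. Its lookup table contains only a handful of winerror.h codes that appear in this thread, such as 80070490 and status 170 in the cluster log excerpt, 0x569 in the Hyper-V migration error, and 0x6be in the DAG event. `decode_hresult`, `describe` and `hresult_from_win32` are illustrative names, not a Windows API. For codes outside the table, `certutil -error <code>` or `net helpmsg <decimal code>` on a Windows machine will look up the message.

```python
FACILITY_WIN32 = 7

# A handful of winerror.h codes seen in this thread: code -> (name, message).
WIN32_ERRORS = {
    1: ("ERROR_INVALID_FUNCTION", "Incorrect function."),
    87: ("ERROR_INVALID_PARAMETER", "The parameter is incorrect."),
    170: ("ERROR_BUSY", "The requested resource is in use."),
    1168: ("ERROR_NOT_FOUND", "Element not found."),
    1385: ("ERROR_LOGON_TYPE_NOT_GRANTED",
           "Logon failure: the user has not been granted the requested logon type at this computer."),
    1726: ("RPC_S_CALL_FAILED", "The remote procedure call failed."),
}


def hresult_from_win32(code: int) -> int:
    """Wrap a Win32 error code in an HRESULT, like the HRESULT_FROM_WIN32 macro."""
    if code <= 0:
        return code
    return (code & 0xFFFF) | (FACILITY_WIN32 << 16) | 0x80000000


def decode_hresult(hresult: int) -> tuple:
    """Split an HRESULT into (is_failure, facility, code), mirroring the winerror.h macros."""
    is_failure = bool(hresult & 0x80000000)
    facility = (hresult >> 16) & 0x1FFF
    code = hresult & 0xFFFF
    return is_failure, facility, code


def describe(hresult: int) -> str:
    """Human-readable description, using the small table above when possible."""
    is_failure, facility, code = decode_hresult(hresult)
    if facility == FACILITY_WIN32 and code in WIN32_ERRORS:
        name, message = WIN32_ERRORS[code]
        return f"0x{hresult:08X}: Win32 error {code} ({name}): {message}"
    status = "failure" if is_failure else "success"
    return f"0x{hresult:08X}: {status}, facility {facility}, code {code} (not in table)"


if __name__ == "__main__":
    for value in (0x80070001, 0x80070490, hresult_from_win32(170), hresult_from_win32(0x569)):
        print(describe(value))
```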

    What version of Hyper-V are you running?
    It sounds like you are trying to use shared VHDX. Note that the shared disk MUST be VHDX.
    http://technet.microsoft.com/en-us/library/dn265980.aspx
    http://technet.microsoft.com/en-us/library/dn281956.aspx
    . : | : . : | : . tim

  • Failed VM in Failover Cluster Manager

    We have a problem that seems to be caused by 3 VMs in our cluster that have failed.
    The cluster is a Windows Server 2012 R2 cluster.
    The VMs no longer exist in Hyper-V, but they still appear in Failover Cluster Manager. We have tried removing each VM and receive the following error message: "The file cannot be opened because it is in the process of being deleted."
    We have tried moving the failed VMs to another server in the cluster but receive the same message.
    Is anyone aware of a way to manually delete a specific VM from a cluster, or does anyone know a solution to our problem?
    Thanks

    Hi,
    I just want to confirm the current situation.
    Please feel free to let us know if you need further assistance.
    Regards.
    We are trying to better understand customer views on social support experience, so your participation in this interview project would be greatly appreciated if you have time.
    Thanks for helping make community forums a great place.

  • Hyper-v Replica: Planned Failover from Cluster to another Cluster Failed

    Hi Guys,
    I have an issue that I cannot find a way to troubleshoot.
    I have one primary site running an 8-node Hyper-V failover cluster with the Hyper-V Replica Broker configured.
    I have one replica site running an 8-node Hyper-V failover cluster with the Hyper-V Replica Broker.
    Replica Broker configuration:
    https://onedrive.live.com/redir?resid=45C59B28F67FC71C%212271
    Both sides can ping each other and name resolution is OK.
    All ports are open.
    Virtual machine replication itself works.
    The planned failover cannot complete and shows only this error: "failed".
    There are no messages in Event Viewer.
    Error:
    https://onedrive.live.com/redir?resid=45C59B28F67FC71C%212270
    Planned Failover from Cluster (Primary site) to another Cluster (Replica site) with Replica
    Start Hyper-V Manager or the Failover Cluster Manager console on the primary server and choose a virtual machine to fail over.
    1. Turn off the virtual machine that you want to fail over.
    2. Right-click the virtual machine, point to Replication, and then point to Planned Failover.
    3. Click Fail Over to actually transfer operations to the virtual machine on the Replica server.
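    To double-check the "all ports are open" statement from the replica side, a plain TCP connect test against the Replica Broker's client access point can help. This is a minimal Python sketch under stated assumptions: the broker name in the usage comment is a hypothetical placeholder, and the ports are the Hyper-V Replica defaults (80 for Kerberos over HTTP, 443 for certificate-based HTTPS), which may have been changed in your replication configuration. A successful connect only shows that a listener is reachable, not that authentication or the failover itself will succeed.

```python
import socket

# Assumed Hyper-V Replica default listener ports: 80 (Kerberos/HTTP) and
# 443 (certificate-based/HTTPS). Both are configurable, so adjust as needed.
DEFAULT_REPLICA_PORTS = (80, 443)


def tcp_port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False


def check_ports(host: str, ports=DEFAULT_REPLICA_PORTS) -> dict:
    """Map each port to whether it accepted a TCP connection."""
    return {port: tcp_port_open(host, port) for port in ports}


# Usage (hypothetical broker name, replace with your own):
# print(check_ports("replicabroker.example.com"))
```

    Running it from a node in each site, in both directions, separates a blocked listener from a failure higher up in the planned-failover workflow.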

    Hi Samuel,
    In the original post:
    Test 1: Planned failover from cluster to standalone Hyper-V is OK.
    Test 2: Planned failover from standalone Hyper-V to cluster is OK.
    Test 3: Planned failover from cluster (primary site) to another cluster (replica site) with Replica.
    There is no test of standalone Hyper-V to standalone Hyper-V between the two sites. I suggest running that test as well; if planned failover still fails after it, you may want to focus on the network configuration.
    Best Regards
    Elton Ji

  • DFS-R folder resources Failed in Failover Cluster on Server 2008 R2

    About a week ago, 7 of our DFS-R folder failover cluster resources lost their configuration data. In the resource list, they just show up as "()".
    I've seen suggestions online that disabling the replicated folder will remove it from the failover cluster, and that re-enabling it will automatically add it back to the cluster. So I disabled it via the Connections tab, but that does nothing. The other option is to disable it under the Memberships tab, but that has consequences, and I would rather not do that if it isn't necessary.
    I'm also now seeing event ID 5012 errors in the DFS Replication log every hour or so:
    The DFS Replication service failed to communicate with partner etc. The partner did not recognize the connection or the replication group configuration.
    Error: 9026 (The connection is invalid)
    Both "dfsrdiag pollad /verbose" and "dfsrdiag pollad /mem:<dc name>" come back as successful. A DFS diagnostic report shows nothing apparently relevant to this issue, other than one error concerning the same event ID 5012s.
    I've also verified that it's not a DNS issue: pinging the home office DFS server from the site server resolves correctly, and vice versa. Windows Firewall is disabled on both servers. It's POSSIBLE that there is a network issue. Replication works fine between the site server and the 2nd clustered server service, just not between the site server and the 1st clustered server service. However, via netstat I see active connections between the site server and the cluster server, as well as the active cluster host (although just a single connection to the clustered DFS server, but ~6 to the cluster host itself).
    So, I'm completely at a loss here. Any recommendations?

    Hi,
    This error usually occurs when one partner attempts to establish an RPC connection with another member, but is unable to.
    You could refer to the thread below to troubleshoot the issue:
    DFSR Event ID 5012 when other DFS folder working
    https://social.technet.microsoft.com/Forums/en-US/9748cb08-858d-454e-93cd-233c98cb2ee8/dfsr-event-id-5012-when-other-dfs-folder-working?forum=winserverfiles
    Best Regards,
    Mandy

  • Failover Cluster 2012 Network Name fails to come online.

    Hi,
    I created a new one-node Windows Server 2012 failover cluster. The cluster was created successfully, but configuring the Client Access Point finished without creating the VCO (virtual computer object) in Active Directory. I then created the computer object manually in the same OU as the cluster, following this article:
    http://technet.microsoft.com/en-us/library/cc732035(v=ws.10).aspx
    Permissions on the OU and on the VCO for the cluster account were raised to Full Control, DNS A and PTR records were created, and the quota for creating computer objects was increased, but nothing has fixed my problem: events 1194 and 1069 are generated whenever I try to bring the network name online. Following one guide, I tried to find the registry key for the network name resource that corresponds to the GUID of the VCO, but it was not there.
    I've investigated everything I could find and found no solution.
    Any help would be appreciated.
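    Since the post mentions that DNS A and PTR records were created by hand, a quick forward-and-reverse lookup from the cluster node can confirm that they resolve consistently. This is a minimal Python sketch using only the standard library; the network name in the usage comment is a hypothetical placeholder, and the check only reflects what the node's resolver returns. It says nothing about Active Directory permissions on the VCO.

```python
import socket


def resolve_forward(name: str) -> list:
    """Return the distinct IP addresses an A/AAAA lookup yields for name."""
    try:
        infos = socket.getaddrinfo(name, None)
    except OSError:
        # socket.gaierror (name not found) is a subclass of OSError.
        return []
    # getaddrinfo returns one tuple per (family, socktype, proto); dedupe.
    return sorted({info[4][0] for info in infos})


def resolve_reverse(ip: str):
    """Return the host name a PTR lookup yields for ip, or None."""
    try:
        return socket.gethostbyaddr(ip)[0]
    except OSError:
        # socket.herror / socket.gaierror: no PTR record or lookup failure.
        return None


def check_name(name: str) -> dict:
    """Pair each forward-resolved address with its reverse (PTR) name."""
    return {ip: resolve_reverse(ip) for ip in resolve_forward(name)}


# Usage (hypothetical client access point name, replace with your own):
# print(check_name("cluster-cap.example.com"))
```

    An empty result means the A record is not visible to the node; an address mapped to None, or to a different host name, points at a missing or mismatched PTR record.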

    Hi,
    Please install the updates listed in "Recommended hotfixes and updates for Windows Server 2012-based failover clusters" first, then try again.
    Recommended hotfixes and updates for Windows Server 2012-based failover clusters
    http://support.microsoft.com/kb/2784261
    Hope this helps.

  • Target fieldpoint connection failed due to version conflicts

    I have set up a small network consisting of a FieldPoint controller (FP-2015) and a laptop connected over Ethernet. I successfully configured the FP-2015 with MAX and downloaded the .iak file to it. Now I want to target LabVIEW RT on the FP-2015. However, the connection fails with the message "Version Conflicts between host labview and RT Engine". I'm using LabVIEW 7.1, released in August.

    It sounds like the FP-2015 that you have has an older version of software on it (probably LabVIEW 7.0). To upgrade the software on the FP-2015, first launch Measurement & Automation Explorer. Locate the FP-2015 under Remote Systems. Right-click the Software item and select Install Software. This will allow you to install the new software image and upgrade the controller to LabVIEW 7.1. Note: You will need FieldPoint Explorer 4.1, which shipped with LabVIEW 7.1. If you have an older version of FieldPoint Explorer, it will not have the 7.1 image for the FieldPoint controller.
    Regards,
    Aaron

  • Failover Cluster - GHOST VMS / ROLES

    I mean "ghost" as in a mysterious, non-existent machine, not the old Norton program. I've periodically had random cluster crashes, mainly due to my own negligence. 99% of the time everything comes back up normally. However, occasionally a machine shows very strange symptoms that I'm unable to resolve. The only resolution I've found is to create a new VM and attach the old VHD. A description of the machines with this issue:
    Shown in the Failover Cluster roles as Running, but I cannot connect, turn off, shut down, etc.
    When I log in to the VM's host machine and open Hyper-V Manager, the machine does not exist. The only place this machine seems to exist is in Failover Cluster Manager.
    No details are available on the Summary tab; the machine doesn't actually appear to be running despite what the console says.
    Under the Resources tab for that machine, it shows the VM as running, but the VM configuration as failed.
    Unable to bring the configuration back online. The error is "The group or resource is not in the correct state to perform the requested operation."
    I've seen other vague references to null context pointers or something along those lines. I've tried those users' methods without success. How can I fix these, or at least remove them once I've recreated the machine?

    Hi,
    Unfortunately, the available information is not enough to get a clear view of the behavior. Could you provide more information about your environment, such as the server version where the problem occurs and the System log entries recorded when it occurs? Screenshots would be the most helpful information.
    If you are using a Windows Server 2012 R2 failover cluster, please install the following update:
    Recommended hotfixes and updates for Windows Server 2012 R2-based failover clusters
    http://support.microsoft.com/kb/2920151
    More information:
    Event Logs
    http://technet.microsoft.com/en-us/library/cc722404.aspx
    Thanks.
