Hyper-V Failover Cluster - some VMs lost network access, twice

I run a 4-node Hyper-V failover cluster. Twice now in months of operation, out of the blue, a portion of the VMs on a node have lost network access (this has happened on two different nodes). I can pause the node so that everything migrates off, and then the VMs are back up and running. After some time I can unpause the node and move them back. I am looking for ideas on what could be causing this.
The servers are Dell PowerEdge 620s with the latest MS patches and Dell drivers and firmware. On the public side I have a 2-NIC team using MS software teaming.

What are the NICs?
If they are Broadcom NetXtreme, disable VMQ, as there's a known issue with them that results in network loss.
They are broadcom :/ 
Thanks for the tip.
I will google around but do you have any links for this information?
*EDIT*
http://support2.microsoft.com/kb/2986895

Similar Messages

  • Adding more RAM to all 3 nodes in a hyperV failover cluster, re-validate config?

    No issues with 2012 R2 also. Just add the ram and you will be fine.

    hey spiceheads,
    I have a 3 server node hyper-v failover cluster running hyper-v server 2012 R2.
    two of the servers have 96gb and the other 120gb.  Going to even all three servers to 128gb.
    Once this is done do I need to re-validate?  If so, re-validation would take my cluster completely off-line, correct?
    Thanks,
    ceez
    This topic first appeared in the Spiceworks Community

  • In Failover Cluster 2008 mailbox2 server network status Unavailable (down).

    Hi,
    I am new here and hope I can get some help with the issue below.
    Let me keep the scenario simple.
    I have 2 mailbox servers (mbx01 & mbx02) and
    2 CAS/Hub servers (cshb1 & cshb2), all running on Hyper-V. Last week, due to some maintenance in the data center, I had to bring down the whole production environment. Once things were fixed in the DC, I started all the servers and found that the databases on my Exchange node 2 had failed.
    In the Exchange 2010 failover cluster, node 2 is down with network status Unavailable, and because of this the Exchange node 2 databases have failed.
    I also cannot access the file share witness directory from any of the above servers.
    Can anybody assist me with this, please?
    Thank you

    Make sure both DAG members have access to the witness directory. Check the location and make sure you have access. Open the DAG properties from Node2 and try to update the witness server and witness directory. Check the firewall/antivirus on the witness server, and try disabling the antivirus and firewall. If the witness server is not online, your databases will go offline. Please check this:
    Get-DatabaseAvailabilityGroup -Identity DAG -Status | fl name,servers,witnessserver,witnessdirectory,alternatewitnessserver,alternatewitnessdirectory,operationalservers,primaryactivemanager,witnessshareinuse
    Thanks, MAS

  • Hyper-V Failover Cluster Networking Configuration After Install

    Hello All,
    Is it possible to install Hyper-V and failover clustering, in other words create a Hyper-V failover cluster, and then configure the networking part of the solution later?  As I am still coming to terms with the networking part, I wanted to do it after the install.  Is that possible?
    By later configuration I mean creation of the NIC team, virtual NICs, VLAN tagging, etc.

    Hi,
    Failover cluster deployment requires network connectivity between cluster nodes. You can't create a cluster without properly configured TCP/IP on the cluster nodes.
    http://OpsMgr.ru/

  • Heartbeat Network inside Hyper-V Failover Cluster

    Dear All,
    I need to build a SQL Server 2012 SP1 two-node cluster inside a Hyper-V failover cluster.  There is only one virtual switch in the Hyper-V failover cluster environment, which is used for communication with the outside world using VLAN tagging.  SQL Server 2012 failover clustering would require a heartbeat network as well. Although it is possible to assign a VLAN tag to the heartbeat adapter and use that for the heartbeat, there would not be any gateway on the heartbeat network, which would render the VLAN tagging useless. So is the following plan good enough?
    1.  Create an Internal Virtual Switch on all nodes of the cluster with the same name
    2.  Link the heartbeat adapter of the virtual machines to the Internal Virtual Switch
    Is it good enough, or is there a better way?
    Thanks in advance.   

    I am not an expert on this, but here is what I would do.
    Create the VLAN and use that, to keep the network setup as easy to understand as possible. Our Hyper-V cluster is only using IPv6 on the heartbeat network, and that is working like a charm. I would do the same inside the Hyper-V hosts if I needed to build a virtual cluster.
    What is the reason for deploying a failover cluster for SQL inside Hyper-V? Wouldn't a log shipping "cluster" provide a more secure solution for your SQL?
    /Martin
    Exchange is a passion not just a collaboration software.

  • HyperV 2012 R2 Failover cluster, HV problem, all VMs restart

    Hello, I have a 2-node failover cluster, Hyper-V 2012, with multipath SAS storage (MSA2000). There is a hardware problem with one node (node2): it shuts down unexpectedly. When that happens, node1 restarts all VMs. Is that normal? The cluster was configured using the cluster validation tool. There is no witness. I don't clearly understand what happens if one node crashes. KR.

    As Eric has said, it will start the VMs in a crash-consistent state on the non-crashed host.
    But from your example I take it you're seeing the guests on the non-crashed host restart. If this is the case, I would say yes! I have seen this happen before. It can happen if you're not using a quorum witness, because only one node has a vote. I would recommend you create a witness: on your MSA 2000, carve out 1 GB and create a disk witness. Or, if you have a server that is not in the VM cluster, you could use a file share witness; file share is my preferred option. Once you have a witness in play, you will see all of your hosts having a vote. Look in the cluster manager at the nodes section. You should see a vote column. Currently it will say 1/0; once the witness is created it will show 1/1.
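
    The vote arithmetic behind this advice can be sketched roughly as follows. This is only an illustration in Python, not a cluster API: the function name has_quorum and the vote counts are hypothetical, and it assumes a plain strict-majority rule. Real clusters (2012 R2 in particular, with dynamic quorum) adjust votes at runtime, so treat it as the basic idea only.

```python
def has_quorum(votes_online: int, total_votes: int) -> bool:
    """Return True when the votes still online form a strict majority."""
    return 2 * votes_online > total_votes


# Two nodes, each with a vote, no witness: the survivor holds 1 of 2 votes.
print(has_quorum(votes_online=1, total_votes=2))  # False

# Assumption: only one of the two nodes currently holds a vote.
# The outcome then depends on which node crashes.
print(has_quorum(votes_online=1, total_votes=1))  # True  (voting node survived)
print(has_quorum(votes_online=0, total_votes=1))  # False (voting node crashed)

# Two nodes plus a witness: survivor and witness hold 2 of 3 votes.
print(has_quorum(votes_online=2, total_votes=3))  # True
```

    With a witness in place, either node can fail and the survivor plus the witness still hold a majority, which is why the reply recommends adding one.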

  • Cannot migrate VM in VMM but can in Failover Cluster Manager network adapters network optimization warning

    I have a 4 node Server 2012 R2 Hyper-V Cluster and manage it with VMM 2012 R2.  I just upgraded the cluster from 2012 RTM to 2012 R2 last week which meant pulling 2 nodes out of the existing cluster, creating the new R2 cluster, running the copy
    cluster roles wizard since the VHDs are stored on CSVs, and then added the other 2 nodes after installing R2 on them, back into the cluster.  After upgrading the cluster I am unable to migrate some VMs from one node to another.  When trying to do
    a live migration, I get the following notifications under the Rating Explanation tab:
    Warning: There currently are not network adapters with network optimization available on host Node7. 
    Error: Configuration issues related to the virtual machine VM1 prevent deployment and must be resolved before deployment can continue. 
    I get this error for 3 out of the 4 nodes in the cluster.  I do not get this error for Node10 and I can live migrate to that node in VMM.  It has a green check for Network optimization.  The others do not.  These errors only affect
    VMM. In the Failover Cluster Manager, I can live migrate any VM to any node in the cluster without any issues.  In the old 2012 RTM cluster I used to get the warning but I could still migrate the VMs anywhere I wanted to.  I've checked the network
    adapter settings in VMM on VM1 and they are the same as VM2, which can migrate to any host in VMM.  I then checked the network adapter settings of the VMs from the Failover Cluster Manager, and VM1 under Hardware Acceleration has "Enable virtual machine queue" and "Enable IPsec task offloading" checked.  I unchecked those 2 boxes, refreshed the VMs, refreshed the cluster, rebooted the VM and refreshed again, but I still could not live migrate VM1.  Why is this an issue now but it wasn't before
    running on the new cluster?  How do I resolve the issue?  VMM is useless if I can't migrate all my VMs with it.

    I checked the settings on the physical nics on each node and here is what I found:
    Node7: Virtual machine queue is not listed (Cannot live migrate problem VM's to this node in VMM)
    Node8: Virtual machine queue is not listed (Cannot live migrate problem VM's to this node in VMM)
    Node9: Virtual machine queue is listed and enabled (Cannot live migrate problem VM's to this node in VMM)
    Node10: Virtual machine queue is listed and enabled (Live Migration works on all VMs in VMM)
    From Hyper-V or the Failover Cluster manager I can see in the network adapter settings of the VMs under Hardware Acceleration that these two settings are checked: "Enable virtual machine queue" and "Enable IPsec task offloading".  I unchecked those
    2 boxes, refreshed the VMs, refreshed the cluster, rebooted the VM and refreshed again but I still cannot live migrate the problem VMs.
    It seems to me that if I could adjust those VM settings from VMM that it might fix the problem.  Why isn't that an option to do so in VMM? 
    Do I have to rebuild the VMM server with a new DB and then before adding the Hyper-V cluster uncheck those two settings on the VM's from Hyper-V manager?  That would be a lot of unnecessary work but I don't know what else to do at this point.

  • Server 2008 R2 Failover cluster network configuration

    Hi
    We have a customer with a Server 2008 R2 Hyper-V failover cluster. They have 2 cluster networks, "Cluster Network 1" and "Cluster Network 2".
    "Cluster Network 1": NIC team on 172.16.1.0/24 for private cluster network communication
    "Cluster Network 2": NIC team on 192.168.1.0/24 for production network communication
    I can see that "Cluster Network 1" is configured to "Allow cluster network communication on this network" and "Allow clients to connect through this network".
    If "Cluster Network 1" is ONLY for communication between the two cluster nodes, then I assume the selection in "Allow clients to connect through this network" should be removed?
    /Lasse

    It will cause a lost network connection for any client that is accessing through that network.  Those clients would need to reconnect.
    Did you configure both IPs on the cluster resource name that clients are accessing?  If you only configured the one you want, there should be no issue.  If you configured both, then it is possible some clients might be connected via the private
    network.
    Another thing you should do (and if you have already done this you will most likely not have issues at all) is disable DNS registration on any network you do not want client access coming through.  If the clients can only find the resource
    through the DNS name registered, that is the way they will be coming in.  In my clusters, which often have 7 or more NICs, there is only one with a published DNS record.
    . : | : . : | : . tim

  • Network DR test causes Exchange DAG network to fail (Failover Cluster Manager reports comms errors)

    We have a DAG configured between 2 mailbox servers, one in each of our main data centres. Our comms team recently performed a DR test between our 2 data centres, switching from the main production link to the backup link. During this outage the Failover
    Cluster Manager reported errors, with each mailbox server reporting the other as uncontactable. The Events that were logged include the following:
    Isatap interface isatap.{02ADE20A-D5D4-437F-AD00-E6601F7E7A9D} is no longer active. (EventID 4201)
    Cluster node 'MAILBOX_SERVER' was removed from the active failover cluster membership. The Cluster service on this node may have stopped. This could also be due to the node having lost communication with other active nodes in the failover cluster. Run the
    Validate a Configuration wizard to check your network configuration. If the condition persists, check for hardware or software errors related to the network adapters on this node. Also check for failures in any other network components to which the node is
    connected such as hubs, switches, or bridges. (EventID 1135)
    File share witness resource 'File Share Witness (\\WITNESS_SERVER\SHARE_NAME)' failed to arbitrate for the file share '\\WITNESS_SERVER\SHARE_NAME'. Please ensure that file share '\\WITNESS_SERVER\SHARE_NAME' exists and is accessible by the cluster. (EventID
    1564)
    Cluster resource 'File Share Witness (\\\WITNESS_SERVER\SHARE_NAME)' in clustered service or application 'Cluster Group' failed. (EventID 1069)
    The Cluster service is shutting down because quorum was lost. This could be due to the loss of network connectivity between some or all nodes in the cluster, or a failover of the witness disk. Run the Validate a Configuration wizard to check your network
    configuration. If the condition persists, check for hardware or software errors related to the network adapter. Also check for failures in any other network components to which the node is connected such as hubs, switches, or bridges. (EventID 1177)
    The Cluster Service service terminated with service-specific error A quorum of cluster nodes was not present to form a cluster. (EventID 7024)
    The Microsoft Exchange Information Store service terminated unexpectedly.  It has done this 1 time(s).  The following corrective action will be taken in 5000 milliseconds: Restart the service. (EventID 7031)
    Looking at the Cluster Events in the Failover Cluster Manager Snap-In i see a heap of Event ID 47 (cannot activate the DAG databases as the server is not up according to Windows Failover Cluster Service) and:
    Node status could not be recorded. This could prevent some network failure logic from functioning correctly. NodeStatus:IsHealthy=True,HasADAccess=True,ClusterErrorOverrideFalse,LastUpdate=5/2/2011 8:25:42 AMUTC Failure:An Active Manager operation failed.
    Error: An error occurred while attempting a cluster operation. Error: Cluster API '"ClusterRegSetValue() failed with 0x6be. Error: The remote procedure call failed"' failed.. (EventID 184)
    Forcefully dismounting all the locally mounted databases on server 'BACKUP_MAILBOX_SERVER. (EventID 307).
    Our comms team doesn't believe it is a comms issue, as they did not log any network communication errors between the servers in the two sites (using ICMP). So if it is not a comms issue, how can I configure the failover cluster to be resilient to
    this type of network failover event?
    Thanks
    Dan

    Isn't it also true that in a stretched DAG with an even number of nodes, the PAM needs to be in the same site as the active DAG node?  If the connection between both nodes goes down, and the PAM is in the "passive" site, the primary node will
    dismount the databases since it can't check with the PAM to make sure it's safe for it to be up.
    In an even-numbered node stretched DAG, the PAM changes to the DR/passive site every time a failover occurs, but doesn't automatically switch back when you reactivate the primary node.

  • Create failover cluster to host Windows 2012 DC, Exchange 2013 and SQL as VMs

    One of our clients has been running Windows Essential 2012, SQL and Exchange 2007 as VMs on VMware for 4 years without major issues. However, the physical server is getting old and has had some hardware issues recently. They have the budget to buy two Dell servers, an EqualLogic
    SAN, Windows Server 2012 Datacenter and Exchange 2013. Is it possible for them to create a failover cluster to host a Windows 2012 DC, Exchange 2013 and SQL as VMs?
    Bob Lin, MCSE & CNE Networking, Internet, Routing, VPN Troubleshooting on
    http://www.ChicagoTech.net
    How to Setup Windows, Network, VPN & Remote Access on
    http://www.howtonetworking.com

    We will move all VMs from VMware to Hyper-V. Thank you.

  • Hyper-V internal (1 of 2, NOT Cluster) network unavailable in Failover Cluster Manager 2008R2

    Hi all,
    I had a very strange situation in my 2-node Hyper-V cluster:
    I have one network for heartbeat only (10.0.0.0/24) and a second for Hyper-V internal networking for virtual machines (marked "Do not allow cluster network communication" in its properties).
    The machines were working properly, and so were migrations.
    One day, my second node, HyperV2, was marked red in the Failover Cluster Manager MMC. I discovered that the Hyper-V LAN was unavailable on this second node. BUT everything was working properly: the HyperV2 node was on the internet, communicated with the AD domain, and could even run any
    virtual machine...
    I checked the configuration several times and also checked the TMG configuration, wondering whether a network access rule was set wrong. I tried restarting the host, with no result; the network was still unavailable.
    After about an hour I found the resolution:
    On my second Hyper-V node, disable and re-enable the Local Area Connection network adapter connected to the Hyper-V LAN, in the Network Connections control panel!
    Hope this will help somebody ;)
    Marian, just trying to help you

    Resolution:
    On the affected Hyper-V node, disable and re-enable the Local Area Connection network adapter connected to the Hyper-V LAN, in the Network Connections control panel.
    I guess this flushes something in the network configuration, possibly in combination with the network adapter driver.
    Marian, just trying to help you

  • Very Strange Network Issue With Two Guests on 2012 R2 Hyper-V Failover Cluster

    Hi all.  We're having an odd issue with two guests on our 2012 R2 failover cluster.
    In a nutshell, if we shut down a particular server (I'll call it Server A), another totally different server (Server B) on the same node loses its network connectivity to the domain. If we start server A back up, network connectivity returns on server B.
    At first I thought server A might be running a service that was somehow linked to server B, so I decided to disable server A's NIC.  Interestingly, that had no effect on server B's connectivity.
    The next step I tried was pausing server A, and again there was no adverse effect on server B's connectivity.
    The next step was to live migrate server A to another node.  This action did cause server B to lose its network connection.
    One other clue is that if I ping server B from either of the Hyper-V hosts in the cluster, I never lose network connection to server B.
    So I would suspect this is some network issue on the cluster, but I'm kind of at a loss where to go from here.  
    Has anyone seen this behavior before or does anyone have any troubleshooting suggestions I can try?
    Thanks! 
    George Moore

    Hi Sir,
    I've never seen this before.
    >> Next step was to live migrate server A to another node.  This action did cause server B to lose its network connection.
    Are they connected to the same virtual switch?
    First, please run cluster validation to check whether there are any errors.
    If it is OK, please try the following troubleshooting steps:
    1. Shut down server A and server B.
    2. Add another virtual NIC to server B.
    3. Start server B and check whether the issue happens on both the "old" and the "new" virtual NIC.
    In addition, you can live migrate both A and B to another node, then try to live migrate A back to the original node.
    If the issue persists, I would suggest removing that virtual switch on both nodes and then re-creating it.
    Best Regards,
    Elton Ji

  • Failover Cluster Network Name Failed and Can't be Repaired

    I have an issue that seems to be different from any that others have encountered.
    I've scoured everything I can find and nothing has fixed my problem.
    The problem starts with the common problem of the cluster network name failing on my 2 node server 2012 file server cluster.  The computer object was still in AD and appeared to be fine so it was not the common problem of the object
    getting deleted somehow.  At the time, there was no other object with that name in the recycling bin, so I don't think it was mistakenly deleted and quickly recreated to cover any tracks, so to speak.
    Following one guide, I tried to find the registry key that corresponded with the GUID of the object, but neither node in the cluster had it in its registry (which may be part of the problem).
    Since it was in the failed state, I tried to do the repair on the object to no avail.
    We run a "locked down" DC environment so all computer objects have to be pre-provisioned.  They were all pre-provisioned successfully and successfully assigned during cluster creation.  The cluster was running with no issues for a month
    or so before this problem came up.
    When I do a repair on the object while taking diagnostic logs the following 4609 error appears:
    The action 'Repair' did not complete. - System.ApplicationException: An error occurred resetting the password for 'Cluster Name'. ---> System.ComponentModel.Win32Exception: Unknown error (0x80005000)
    There appears to be a corresponding 4771 error with a failure code 0x18 that comes from the security log of the DC that states there was a Kerberos pre-authentication failure for the cluster network name object (Domain\Clustername$)
    I believe this is what is causing the repair failure.  All the information I found related to security error 4771 was either a bad credentials given for a user account or the fix was to reconnect the computer to the domain.  I can't seem to find
    a way to do this with the cluster network name.  If there's a way please let me know.
    I've tried a number of things, like resetting the object, disabling it, deleting and creating a new object with the same name, deleting that new object and recovering the original, etc...
    Can anyone shed some light on what is going on and hopefully how to fix it other than rebuilding the cluster?  I'm quite close to just tearing it down and building it back up, but am hesitant because this cluster is currently in production...
    Any help would be appreciated

    Hi,
    I haven't found an issue similar to yours. Based on my experience, the 4096 error is often caused by a CSV disk issue, and the 0x80005000 error is sometimes caused by a repetitive computer object in the OU. Please check those, or run the validation test and then post the error information.

    Although I do have a CSV, there don't seem to be any problems with it, and it was running just fine for a month or so before the problem started.  I double-checked and there are no duplicate computer objects. Maybe I don't understand what you mean by
    repetitive; could you explain further?
    The cluster validates successfully with a few warnings:
    Validating cluster resource Name: DT-FileCluster.
    This resource is marked with a state of 'Failed' instead of
    'Online'. This failed state indicates that the resource had a problem either
    coming online or had a failure while it was online. The event logs and cluster
    logs may have information that is helpful in identifying the cause of the
    failure.
    - This is because the cluster name is in the failed state
    Validating the service principal names for Name:
    DT-FileCluster.
    The network name Name: DT-FileCluster does not have a valid
    value for the read-only property 'ObjectGUID'. To validate the service principal
    name the read-only private property 'ObjectGuid' must have a valid value. To
    correct this issue make sure that the network name has been brought online at
    least once. If this does not correct this issue you will need to delete the
    network name and re-create it.
    - This is definitely related to the problem and the GUID probably got removed when we attempted a fix by resetting the object and trying the repair from the failover cluster manager.
    The user running validate, does not have permissions to create
    computer objects in the 'ad.unlv.edu' domain.
    - This is correct, we run a restricted domain.  I have a delegated OU that I can pre-provision accounts in.  The account was pre-provisioned successfully and was at one point set up and working just fine.
    There are no other errors nor warnings.

  • Event Logs of VMs Migration in Failover Cluster of Hyper-V Hosts

    Hello All,
    We're running Failover Cluster of Hyper-V hosts of Windows Server 2012 R2. Using SCVMM 2012 R2 with UR5 for management.
    If a host goes down unexpectedly (for any reason: power, bugcheck, hardware failure, or whatever), the VMs on that host, of course, get migrated (either quick or live) to some other host within the cluster.
    I want to have logs/events of this VM migration. I want to know which VMs were residing on that host at the time of failure. Of course, we can't get this info from the Cluster Events in Failover Cluster Manager. I am unable to find this info
    anywhere. I have searched in Event Viewer --> Administrative Roles --> Hyper-V. I have searched a lot in SCVMM, but with no success.
    Please help me find the exact location of these logs/events. I would also like to know whether the VM was quick migrated or live migrated, and to which host it was migrated.
    I'd be highly grateful.
    Thanks in anticipation.
    Regards,
    Hasan Bin Hasib

    This post was cross-posted in the clustering forum.  As noted in that forum, a failure of a host does not initiate a quick or live migration.  Migration requires both the source and destination nodes be operational during the entire migration
    process.  Should a host fail, it is impossible for that host to participate in a migration.  In the case of a host failure, the VM is restarted on another node of the cluster.  You can still use the information provided by Elton for viewing
    events in the event log.  If you want to see the exact sequence of log entries, perform quick/live migrations in a lab and notice the changes in the event log.  You can also fail a host and see the sequence of log entries.
    . : | : . : | : . tim

  • Hyper-V Failover Cluster - Inconsistent Network Availability

    We've got a small cluster with 7 hosts and a dozen or two VMs.  For some reason I'm getting inconsistent availability with the cluster networks.  The hosts seem to function fine on their own, but there are all types of issues using migration, which
    I'm assuming is because certain hosts think other hosts are unavailable. For example:
    Cluster Network 1 - From Host 8
    Cluster Network 1 - From Host 10
    As far as I can tell, all of the networks are up. I can ping all hosts on all interfaces.  What criteria go into determining host availability?

    Hi,
    Maybe you forgot to configure the Live migration Settings,
    In the Failover cluster manager->right click networks-> select the livemigration network.
    Hope this helps.
