Host server live migration causes guest cluster node to go down

Hi,
I have a two-node Hyper-V host cluster. I'm using a converged network for host management, live migration and the cluster network, and separate NICs for iSCSI multipathing. When I live migrate a guest node from one host to another, the node goes down within the guest cluster. I have increased the cluster threshold and cluster delay values. The guest nodes connect to the iSCSI network directly, using the iSCSI initiator on Server 2012.
The converged networks for management, cluster and live migration are built on top of a NIC team in switch-independent mode with Hyper-V Port load balancing.
I have VMQ enabled on the converged fabric and jumbo frames enabled on iSCSI.
Can anyone guess why live migration would cause a failure on the guest node?
thanks
mumtaz 

Repost here: http://social.technet.microsoft.com/Forums/en-US/winserverhyperv/threads
in the Hyper-V forum.  You'll get a lot more help there.
This forum is for Virtual Server 2005.
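
For readers comparing heartbeat settings, here is a minimal sketch. It assumes the "cluster threshold" and "cluster delay" values in the question refer to the failover cluster heartbeat properties SameSubnetThreshold and SameSubnetDelay. Under that assumption a node is treated as down after roughly the delay multiplied by the threshold, so raising either value only helps if the resulting window is longer than the pause the guest sees during live migration. The class name and every number below are hypothetical, not measured values.

```java
// Hypothetical helper: relates heartbeat delay and threshold to the longest
// network silence a cluster node can have before its peers declare it down.
final class HeartbeatToleranceSketch {

    // Heartbeats are sent every delayMillis; the node is considered down
    // after `threshold` consecutive misses.
    static long toleratedSilenceMillis(long delayMillis, int threshold) {
        return delayMillis * threshold;
    }

    // True if a pause of pauseMillis fits inside the tolerated window.
    static boolean survivesPause(long pauseMillis, long delayMillis, int threshold) {
        return pauseMillis < toleratedSilenceMillis(delayMillis, threshold);
    }

    public static void main(String[] args) {
        long assumedPauseMillis = 7000; // assumption: guest unreachable for 7 s
        System.out.println("1000 ms x 5:  " + survivesPause(assumedPauseMillis, 1000, 5));
        System.out.println("2000 ms x 10: " + survivesPause(assumedPauseMillis, 2000, 10));
    }
}
```

If the node still drops with a generous window, the heartbeat path itself (for example the teamed converged network) is the more likely place to look.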

Similar Messages

  • Hyper-V Guest Cluster Node Failing Regularly

    Hi,
    We currently have a 4-node Server 2012 R2 cluster which hosts, among other things, a 3-node guest cluster running a single clustered file service.
    Around once a week, the guest cluster node that is currently hosting the clustered file service fails. It's as if the VM is blue screening. That in itself is fairly annoying, and I'll be doing all the updates and checking the event log for clues as to the cause.
    The problem then is that whichever physical cluster node is hosting the VM when it fails will not unlock some of the VM's files. The virtual machine configuration lists as Online Pending. This means that the failed VM cannot be restarted on any other cluster node. The only fix is to drain the physical host it failed on and reboot it.
    Looking for suggestions on how to fix the following:
    1. Crashing guest file cluster node
    2. Failed VM with shared VHDX requiring physical host reboot
    Event messages for the physical host that was hosting the failed VM, in the order that they occurred:
    Hyper-V-Worker: Event ID 18590 - 'FS-03' has encountered a fatal error.  The guest operating system reported that it failed with the following error codes: ErrorCode0: 0x9E, ErrorCode1: 0x6C2A17C0, ErrorCode2: 0x3C, ErrorCode3: 0xA, ErrorCode4: 0x0.  If the problem persists, contact Product Support for the guest operating system.  (Virtual machine ID 36166B47-D003-4E51-AFB5-7B967A3EFD2D)
    FailoverClustering: Event ID 1069 - Cluster resource 'Virtual Machine FS-03' of type 'Virtual Machine' in clustered role 'FS-03' failed.
    Hyper-V-High-Availability: Event ID 21128 - 'Virtual Machine FS-03' failed to shutdown the virtual machine during the resource termination. The virtual machine will be forcefully stopped.
    Hyper-V-High-Availability: Event ID 21110 - 'Virtual Machine FS-03' failed to terminate.
    Hyper-V-VMMS: Event ID 20108 - The Virtual Machine Management Service failed to start the virtual machine '36166B47-D003-4E51-AFB5-7B967A3EFD2D': The group or resource is not in the correct state to perform the requested operation. (0x8007139F).
    Hyper-V-High-Availability: Event ID 21107 - 'Virtual Machine FS-03' failed to start.
    FailoverClustering: Event ID 1205 - The Cluster service failed to bring clustered role 'FS-03' completely online or offline. One or more resources may be in a failed state. This may impact the availability of the clustered role.

    Hi,
    I haven't found a similar issue. Does your cluster pass cluster validation? Are all your Hyper-V hosts compatible with Server 2012 R2? Have you tried disabling all your antivirus software and firewalls? Please rerun storage validation on the cluster during non-production hours; the cluster validation report should quickly locate the issue.
    More information:
    Cluster
    http://technet.microsoft.com/en-us/library/dd581778(v=ws.10).aspx
    Hope this helps.

  • Live Migration to Best possible node

    Hi,
    I have a 20-node cluster with the virtual machine role.
    I would like to know the equivalent PowerShell command for migrating VMs to the best possible node.
    When I right-click a VM in Failover Cluster Manager, I see the option below for live migration to the best possible node. I would like to achieve the same thing through PowerShell.
    Thanks in advance.
    Thanks, Krishna

    Well, you're asking the cluster to make its best determination on where the VMs should go, so I don't really know that I can second-guess its behavior.
    You could ensure that all highly available VMs are moved off a specific node by using
    Suspend-ClusterNode:
    Suspend-ClusterNode -Name "node1" -Drain
    Then when you want to put the roles back, use
    Resume-ClusterNode:
    Resume-ClusterNode -Name "node1" -Failback Immediate
    You can enter multiple node names at once, if you want.
    But if stress testing a network device is your aim, I would look at actual test tools, like
    IOMeter.
    Eric Siron Altaro Hyper-V Blog

  • Reports server 10.1.2.0.2 keeps going down !

    Hi Guys,
    The reports server 10.1.2.0.2 (in-process, NOT the standalone one) works all day, but when we come to work the next day, we find that the reports server is down and giving the following error:
    REP-501: Unable to connect to the specified database.
    We have to restart the reports server to get it working again. The reports server connects to a cluster database, so we don't shut the database down at night for backup, which would otherwise cause the reports server connections to fail.
    I am thinking it may be related to the server being idle and somehow losing connections.
    http://servername:7778/reports/rwservlet?server=rep_servername&envid=PIBIS&report=SWRANBT&destype=CACHE&desformat=PDF&paramform=YES&userid=
    Operating System: Sun Solaris SPARC 64-bit
    Oracle Application Server: 10.1.2.0.2
    Oracle DB: 10.1.0.4
    Please advise,
    Cheers,
    Feras

    Hello,
    Please check the engine (rwEng) trace file. Is it showing an error similar to the following?
    REP-0501: Unable to connect to the specified database.
    ORA-24323: value not allowed
    "therefore, we dont shutdown the database during night for backup"
    That has also been stated as a reason/cause in the doc:
    It is possible that the connection has been lost simply because a scheduled (e.g. overnight) backup and restart of the Oracle Server has taken place while the Report Server has remained up and running.
    Please review the following and see if it is helpful:
    REP-501 on Initial Run After The Server's been Idle. Cannot Recover until Engine Restarts: Doc ID: Note:357652.1
    https://metalink.oracle.com/metalink/plsql/f?p=130:14:804350786344799034::::p14_database_id,p14_docid,p14_show_header,p14_show_help,p14_black_frame,p14_font:NOT,357652.1,1,1,1,helvetica
    Adith

  • Live Upgrade fails on cluster node with zfs root zones

    We are having issues using Live Upgrade in the following environment:
    -UFS root
    -ZFS zone root
    -Zones are not under cluster control
    -System is fully up to date for patching
    We also use Live Upgrade with the exact same system configuration on other nodes, except that the zones are UFS root, and Live Upgrade works fine.
    Here is the output of a Live Upgrade:
    bash-3.2# lucreate -n sol10-20110505 -m /:/dev/md/dsk/d302:ufs,mirror -m /:/dev/md/dsk/d320:detach,attach,preserve -m /var:/dev/md/dsk/d303:ufs,mirror -m /var:/dev/md/dsk/d323:detach,attach,preserve
    Determining types of file systems supported
    Validating file system requests
    The device name </dev/md/dsk/d302> expands to device path </dev/md/dsk/d302>
    The device name </dev/md/dsk/d303> expands to device path </dev/md/dsk/d303>
    Preparing logical storage devices
    Preparing physical storage devices
    Configuring physical storage devices
    Configuring logical storage devices
    Analyzing system configuration.
    Comparing source boot environment <sol10> file systems with the file
    system(s) you specified for the new boot environment. Determining which
    file systems should be in the new boot environment.
    Updating boot environment description database on all BEs.
    Updating system configuration files.
    The device </dev/dsk/c0t1d0s0> is not a root device for any boot environment; cannot get BE ID.
    Creating configuration for boot environment <sol10-20110505>.
    Source boot environment is <sol10>.
    Creating boot environment <sol10-20110505>.
    Creating file systems on boot environment <sol10-20110505>.
    Preserving <ufs> file system for </> on </dev/md/dsk/d302>.
    Preserving <ufs> file system for </var> on </dev/md/dsk/d303>.
    Mounting file systems for boot environment <sol10-20110505>.
    Calculating required sizes of file systems for boot environment <sol10-20110505>.
    Populating file systems on boot environment <sol10-20110505>.
    Checking selection integrity.
    Integrity check OK.
    Preserving contents of mount point </>.
    Preserving contents of mount point </var>.
    Copying file systems that have not been preserved.
    Creating shared file system mount points.
    Creating snapshot for <data/zones/img1> on <data/zones/img1@sol10-20110505>.
    Creating clone for <data/zones/img1@sol10-20110505> on <data/zones/img1-sol10-20110505>.
    Creating snapshot for <data/zones/jdb3> on <data/zones/jdb3@sol10-20110505>.
    Creating clone for <data/zones/jdb3@sol10-20110505> on <data/zones/jdb3-sol10-20110505>.
    Creating snapshot for <data/zones/posdb5> on <data/zones/posdb5@sol10-20110505>.
    Creating clone for <data/zones/posdb5@sol10-20110505> on <data/zones/posdb5-sol10-20110505>.
    Creating snapshot for <data/zones/geodb3> on <data/zones/geodb3@sol10-20110505>.
    Creating clone for <data/zones/geodb3@sol10-20110505> on <data/zones/geodb3-sol10-20110505>.
    Creating snapshot for <data/zones/dbs9> on <data/zones/dbs9@sol10-20110505>.
    Creating clone for <data/zones/dbs9@sol10-20110505> on <data/zones/dbs9-sol10-20110505>.
    Creating snapshot for <data/zones/dbs17> on <data/zones/dbs17@sol10-20110505>.
    Creating clone for <data/zones/dbs17@sol10-20110505> on <data/zones/dbs17-sol10-20110505>.
    WARNING: The file </tmp/.liveupgrade.4474.7726/.lucopy.errors> contains a
    list of <2> potential problems (issues) that were encountered while
    populating boot environment <sol10-20110505>.
    INFORMATION: You must review the issues listed in
    </tmp/.liveupgrade.4474.7726/.lucopy.errors> and determine if any must be
    resolved. In general, you can ignore warnings about files that were
    skipped because they did not exist or could not be opened. You cannot
    ignore errors such as directories or files that could not be created, or
    file systems running out of disk space. You must manually resolve any such
    problems before you activate boot environment <sol10-20110505>.
    Creating compare databases for boot environment <sol10-20110505>.
    Creating compare database for file system </var>.
    Creating compare database for file system </>.
    Updating compare databases on boot environment <sol10-20110505>.
    Making boot environment <sol10-20110505> bootable.
    ERROR: unable to mount zones:
    WARNING: zone jdb3 is installed, but its zonepath /.alt.tmp.b-tWc.mnt/zoneroot/jdb3-sol10-20110505 does not exist.
    WARNING: zone posdb5 is installed, but its zonepath /.alt.tmp.b-tWc.mnt/zoneroot/posdb5-sol10-20110505 does not exist.
    WARNING: zone geodb3 is installed, but its zonepath /.alt.tmp.b-tWc.mnt/zoneroot/geodb3-sol10-20110505 does not exist.
    WARNING: zone dbs9 is installed, but its zonepath /.alt.tmp.b-tWc.mnt/zoneroot/dbs9-sol10-20110505 does not exist.
    WARNING: zone dbs17 is installed, but its zonepath /.alt.tmp.b-tWc.mnt/zoneroot/dbs17-sol10-20110505 does not exist.
    zoneadm: zone 'img1': "/usr/lib/fs/lofs/mount /.alt.tmp.b-tWc.mnt/global/backups/backups/img1 /.alt.tmp.b-tWc.mnt/zoneroot/img1-sol10-20110505/lu/a/backups" failed with exit code 111
    zoneadm: zone 'img1': call to zoneadmd failed
    ERROR: unable to mount zone <img1> in </.alt.tmp.b-tWc.mnt>
    ERROR: unmounting partially mounted boot environment file systems
    ERROR: cannot mount boot environment by icf file </etc/lu/ICF.2>
    ERROR: Unable to remount ABE <sol10-20110505>: cannot make ABE bootable
    ERROR: no boot environment is mounted on root device </dev/md/dsk/d302>
    Making the ABE <sol10-20110505> bootable FAILED.
    ERROR: Unable to make boot environment <sol10-20110505> bootable.
    ERROR: Unable to populate file systems on boot environment <sol10-20110505>.
    ERROR: Cannot make file systems for boot environment <sol10-20110505>.
    Any ideas why it can't mount that "backups" lofs filesystem into /.alt? I am going to try removing the lofs from the zone configuration and running it again. But if that works, I still need to find a way to use LOFS filesystems in the zones while using Live Upgrade.
    Thanks

    I was able to successfully do a Live Upgrade with Zones with a ZFS root in Solaris 10 update 9.
    When attempting to do a "lumount s10u9c33zfs", it gave the following error:
    ERROR: unable to mount zones:
    zoneadm: zone 'edd313': "/usr/lib/fs/lofs/mount -o rw,nodevices /.alt.s10u9c33zfs/global/ora_export/stage /zonepool/edd313-s10u9c33zfs/lu/a/u04" failed with exit code 111
    zoneadm: zone 'edd313': call to zoneadmd failed
    ERROR: unable to mount zone <edd313> in </.alt.s10u9c33zfs>
    ERROR: unmounting partially mounted boot environment file systems
    ERROR: No such file or directory: error unmounting <rpool1/ROOT/s10u9c33zfs>
    ERROR: cannot mount boot environment by name <s10u9c33zfs>
    The solution in this case was:
    zonecfg -z edd313
    info ;# display current setting
    remove fs dir=/u05 ;#remove filesystem linked to a "/global/" filesystem in the GLOBAL zone
    verify ;# check change
    commit ;# commit change
    exit

  • JMS Uniform Distributed Queue Unit Of Order, problem when one node goes down

    Hi ,
    I have the following code, which posts a message (with Unit of Order set) to a Uniform Distributed Queue in a cluster with two member servers (server1 and server2).
    --The UDQ is targeted to a subdeployment that is mapped to two JMS servers, one on each member server
    --The connection factory uses default targeting (I also tried mapping it to the subdeployment)
    try {
        // Look up the connection factory and open a non-transacted, auto-acknowledge session
        javax.naming.InitialContext serverContext = new javax.naming.InitialContext();
        javax.jms.QueueConnectionFactory qConnFactory = (javax.jms.QueueConnectionFactory)serverContext.lookup(jmsQConnFactoryName);
        javax.jms.QueueConnection qConn = (javax.jms.QueueConnection)qConnFactory.createConnection();
        javax.jms.QueueSession qSession = qConn.createQueueSession(false, javax.jms.Session.AUTO_ACKNOWLEDGE);
        javax.jms.Queue q = (javax.jms.Queue)serverContext.lookup(jmsQName);
        // The WebLogic producer extension is what exposes setUnitOfOrder
        weblogic.jms.extensions.WLMessageProducer qSender = (weblogic.jms.extensions.WLMessageProducer) qSession.createProducer(q);
        qSender.setUnitOfOrder("MyUnitOfOrder");
        javax.jms.ObjectMessage message = qSession.createObjectMessage();
        java.util.HashMap<String, Object> map = new java.util.HashMap<String, Object>();
        map.put("something", "SomeObject");
        message.setObject(map);
        qSender.send(message);
    } catch (Exception e) {
        // handler body was not included in the original post
    }
    Steps followed:
    1. Post a message from "server1"
    2. Message picked up by "server2"
    3. Everything fine
    4. Shutdown "server2"
    5. Post a message from "server1"
    6. ERROR: "hashed member of MyAppJMSModule!MyDistributedQ is MyAppJMSModule!MyJMSServer-2@MyDistributedQ which is not available"
    WebLogic version : 10.3.5
    Is there a way (other than configuring Path Service ) to make this code work "with unit of order" for a UDQ even if some member servers go down ?
    Thanks very much for your time.

    If you want to avoid use of the Path Service, then the alternative is to make the destination members highly available. This will help ensure that the host member for a particular UOO is up.
    One approach to HA is to configure "service migration". For more information see the Automatic Service Migration white-paper at
    http://www.oracle.com/technology/products/weblogic/pdf/weblogic-automatic-service-migration-whitepaper.pdf
    In addition, I recommend referencing Best Practices for JMS Beginners and Advanced Users
    http://docs.oracle.com/cd/E17904_01/web.1111/e13738/best_practice.htm#JMSAD455 to help with WL configuration in general.
    Hope this helps,
    Tom
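
To make the error in step 6 easier to reason about, here is a minimal sketch of hash-based routing. It is an illustration only and not WebLogic's actual algorithm: the class, the method names and the use of String.hashCode() are all assumptions. It shows the idea that a Unit of Order name is mapped to one fixed member out of all configured members, so that when that member is down the send is rejected rather than rerouted, which would break ordering.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical model of hash-based Unit-of-Order routing (not WebLogic code).
final class UnitOfOrderRoutingSketch {

    // The member is chosen from ALL configured members, running or not,
    // so the same unit-of-order name always maps to the same member.
    static String hashedMember(String unitOfOrder, List<String> members) {
        int index = Math.floorMod(unitOfOrder.hashCode(), members.size());
        return members.get(index);
    }

    // Rejects the send if the hashed member is down instead of rerouting.
    static String route(String unitOfOrder, List<String> members, Set<String> available) {
        String member = hashedMember(unitOfOrder, members);
        if (!available.contains(member)) {
            throw new IllegalStateException("hashed member " + member + " is not available");
        }
        return member;
    }

    public static void main(String[] args) {
        List<String> members = Arrays.asList("MyJMSServer-1", "MyJMSServer-2");
        Set<String> available = new HashSet<>(members);
        String target = route("MyUnitOfOrder", members, available);
        System.out.println("routed to " + target);

        available.remove(target); // simulate shutting that member down
        try {
            route("MyUnitOfOrder", members, available);
        } catch (IllegalStateException e) {
            System.out.println("send rejected: " + e.getMessage());
        }
    }
}
```

Seen this way, the two options in the reply are either a path service that remembers where each unit of order lives, or keeping every member reachable through service migration.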

  • JDBC read stuck if RAC node goes down

    We did several tests with Java applications against our RAC DB and face a hanging application if we power off the RAC node that executes the current (long) running query.
    We can see that the application receives HA-events via UCP:
    2015-01-22 13:02:11 | r-thread-1 | WARN  | o.ucp.jdbc.oracle.ONSDatabaseFailoverEvent    | NO timezone in HA event
    However, the application started a query before and the query is not aborted with an exception. A Thread dump after about 7 minutes shows that the application is hanging in a socket read call:
    "pool-1-thread-1" #32 prio=5 os_prio=0 tid=0x00007fedf45b2000 nid=0xbc4 runnable [0x00007fee00cd3000]
       java.lang.Thread.State: RUNNABLE
        at java.net.SocketInputStream.socketRead0(Native Method)
        at java.net.SocketInputStream.read(SocketInputStream.java:150)
        at java.net.SocketInputStream.read(SocketInputStream.java:121)
        at oracle.net.ns.Packet.receive(Packet.java:283)
        at oracle.net.ns.DataPacket.receive(DataPacket.java:103)
        at oracle.net.ns.NetInputStream.getNextPacket(NetInputStream.java:230)
        at oracle.net.ns.NetInputStream.read(NetInputStream.java:175)
        at oracle.net.ns.NetInputStream.read(NetInputStream.java:100)
        at oracle.net.ns.NetInputStream.read(NetInputStream.java:85)
        at oracle.jdbc.driver.T4CSocketInputStreamWrapper.readNextPacket(T4CSocketInputStreamWrapper.java:123)
        at oracle.jdbc.driver.T4CSocketInputStreamWrapper.read(T4CSocketInputStreamWrapper.java:79)
        at oracle.jdbc.driver.T4CMAREngine.unmarshalUB1(T4CMAREngine.java:1122)
        at oracle.jdbc.driver.T4CMAREngine.unmarshalSB1(T4CMAREngine.java:1099)
        at oracle.jdbc.driver.T4CTTIfun.receive(T4CTTIfun.java:288)
        at oracle.jdbc.driver.T4CTTIfun.doRPC(T4CTTIfun.java:191)
        at oracle.jdbc.driver.T4C8Oall.doOALL(T4C8Oall.java:523)
        at oracle.jdbc.driver.T4CPreparedStatement.doOall8(T4CPreparedStatement.java:207)
        at oracle.jdbc.driver.T4CPreparedStatement.executeForDescribe(T4CPreparedStatement.java:863)
        at oracle.jdbc.driver.OracleStatement.executeMaybeDescribe(OracleStatement.java:1153)
        at oracle.jdbc.driver.OracleStatement.doExecuteWithTimeout(OracleStatement.java:1275)
        at oracle.jdbc.driver.OraclePreparedStatement.executeInternal(OraclePreparedStatement.java:3576)
        at oracle.jdbc.driver.OraclePreparedStatement.executeQuery(OraclePreparedStatement.java:3620)
        - locked <0x00000000c0ddcb20> (a oracle.jdbc.driver.T4CConnection)
        at oracle.jdbc.driver.OraclePreparedStatementWrapper.executeQuery(OraclePreparedStatementWrapper.java:1491)
        at org.springframework.jdbc.core.JdbcTemplate$1.doInPreparedStatement(JdbcTemplate.java:703)
    The expected behaviour would be that a running query is aborted with an exception. (BTW: this is what happens if the service is taken down with "shutdown immediate"; all is OK in that case.)
    We are considering implementing custom ONS listeners [1], but we actually expected that UCP would handle such situations or let us register strategies/callbacks for certain events.
    Our config:
    Oracle Enterprise 11.2.0.4.0 with RAC
    ons.jar 12.1.0.1
    ojdbc6.jar 11.2.0.2
    ucp.jar 12.1.0.1
    Server JRE 1.8.0_25
    Any hints appreciated.
    [1] http://docs.oracle.com/cd/E11882_01/java.112/e16548/apxracfan.htm#JJDBC28945

    Your concept isn't right:
    http://docs.oracle.com/cd/E11882_01/server.112/e25494/restart.htm#ADMIN13178
    Overview of Fast Application Notification
    FAN is a notification mechanism that Oracle Restart can use to notify other processes about configuration changes that include service status changes, such as UP or DOWN events. FAN provides the ability to immediately terminate inflight transaction when an instance or server fails. Integrated Oracle clients receive the events and respond. Applications can respond either by propagating the error to the user or by resubmitting the transactions and masking the error from the application user. When a DOWN event occurs, integrated clients immediately clean up connections to the terminated database. When an UP event occurs, the clients create new connections to the new primary database instance.
    Also, take a look at these docs: http://docs.oracle.com/cd/E11882_01/java.112/e12265/rac.htm#JJUCP08100 ; and https://support.oracle.com/epmos/faces/DocumentDisplay?_afrLoop=890204623685515&id=566573.1&_afrWindowMode=0&_adf.ctrl-s…
    Also run a test: execute a query that takes about 1 minute and, once it is running, power down the node where it is executing to see whether it still retrieves the results.
    Regards.
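
The thread dump in the question shows the thread blocked in a socket read. A node that is powered off sends nothing further, so a read without a timeout can wait for a long time. The sketch below uses only plain JDK sockets, not the Oracle driver, to show how a read timeout turns such a wait into an exception. The class and method names are hypothetical. The Oracle thin driver documents a connection property named oracle.jdbc.ReadTimeout for a similar purpose; whether it fits this particular setup is untested here.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

// Hypothetical demo: a peer that accepts a connection but never answers.
final class ReadTimeoutSketch {

    // Returns true if the blocking read gave up with a SocketTimeoutException.
    static boolean readTimesOut(int timeoutMillis) throws IOException {
        try (ServerSocket silentPeer = new ServerSocket(0, 1, InetAddress.getLoopbackAddress());
             Socket client = new Socket(silentPeer.getInetAddress(), silentPeer.getLocalPort());
             Socket accepted = silentPeer.accept()) {
            // Without this call, the read below would block indefinitely.
            client.setSoTimeout(timeoutMillis);
            InputStream in = client.getInputStream();
            try {
                in.read();
                return false; // the peer sent data or closed the connection
            } catch (SocketTimeoutException expected) {
                return true;
            }
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("read timed out: " + readTimesOut(200));
    }
}
```

A timeout like this bounds how long a caller waits, but it also applies to legitimately slow queries, so the value has to be longer than the slowest expected statement.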

  • Unable to live migrate VM (error 21502)

    Hi,
    I have a four-node Hyper-V cluster built on Windows Server 2012. I found an issue where one virtual machine cannot be live migrated to another cluster node and fails with the following error:
    Live migration of 'Virtual Machine VM' failed.
    Virtual machine migration operation for 'VM' failed at migration destination 'HYPERV2'. (Virtual machine ID EB7708F3-6D0B-4F7E-9EC9-EA7EE718A134)
    'VM' Microsoft Emulated IDE Controller (Instance ID 83F8638B-8DCA-4152-9EDA-2CA8B33039B4): Failed to restore with Error 'The process cannot access the file because another process has locked a portion of the file.' (0x80070021). (Virtual machine ID EB7708F3-6D0B-4F7E-9EC9-EA7EE718A134)
    'VM': Failed to open attachment 'C:\ClusterStorage\Volume1\VM\VM.vhdx'. Error: 'The process cannot access the file because another process has locked a portion of the file.' (0x80070021). (Virtual machine ID EB7708F3-6D0B-4F7E-9EC9-EA7EE718A134)
    'VM': Failed to open attachment 'C:\ClusterStorage\Volume1\VM\VM.vhdx'. Error: 'The process cannot access the file because another process has locked a portion of the file.' (0x80070021). (Virtual machine ID EB7708F3-6D0B-4F7E-9EC9-EA7EE718A134)
    It's possible to migrate the VM in the Stopped state, but then the VM cannot start on the new host and fails with the following error:
    'Virtual Machine VM' failed to start.
    'VM' failed to start. (Virtual machine ID EB7708F3-6D0B-4F7E-9EC9-EA7EE718A134)
    'VM' Microsoft Emulated IDE Controller (Instance ID 83F8638B-8DCA-4152-9EDA-2CA8B33039B4): Failed to Power on with Error 'The process cannot access the file because another process has locked a portion of the file.' (0x80070021). (Virtual machine ID EB7708F3-6D0B-4F7E-9EC9-EA7EE718A134)
    'VM': Failed to open attachment 'C:\ClusterStorage\Volume1\VM\VM.vhdx'. Error: 'The process cannot access the file because another process has locked a portion of the file.' (0x80070021). (Virtual machine ID EB7708F3-6D0B-4F7E-9EC9-EA7EE718A134)
    'VM': Failed to open attachment 'C:\ClusterStorage\Volume1\VM\VM.vhdx'. Error: 'The process cannot access the file because another process has locked a portion of the file.' (0x80070021). (Virtual machine ID EB7708F3-6D0B-4F7E-9EC9-EA7EE718A134)
    Live storage migration works fine. When I migrate the VM back to the original node, it starts correctly.
    Thanks for any response.

    Hi, Daniel,
    Sometimes live migration fails because the VMSwitches are named differently. So, the first thing to do is to make sure that the VMSwitches on both hosts have the same name.
    Also, you can try taking the cluster offline and running the repair procedure, which appears to fix the mysterious issue causing live migrations of VMs to fail (open Failover Cluster Manager -> select the cluster name -> Take Offline -> More Actions -> Repair).
    Otherwise, if you're short on time and want to migrate the VM as soon as possible, you can perform a one-time backup/restore operation using one of the free backup utilities available on the market (VeeamZIP or similar). In many ways this tool acts as a zip utility for VMs. It helped us a lot when migration failed for whatever reason and we didn't have enough time to find the root cause.
    Kind regards, Leonardo.

  • MDM Cluster Node 2 rebuild

    Hi ,
    We are using the SAP MDM 5.5 application installed in a Microsoft cluster.
    Unfortunately, one of our cluster nodes went down, and according to the System Management team we have to rebuild node 2 from scratch.
    While looking for a resolution I found the MS link below, which describes a similar situation and its resolution:
    http://technet.microsoft.com/en-us/library/cc786625(v=ws.10).aspx
    Scenario 6: Single Cluster Node Corruption or Failure.
    While the System Management team is working on this, I want to check what other options we have and, if we have to rebuild the server from scratch, what the process will be.
    I am assuming the process below.
    1.     The Windows team rebuilds the server (OS and cluster configuration).
    2.     We install Oracle DB and the MDM application from the installation media.
    3.     We add node 2 to the existing cluster configuration (on node 1).
    But I am not sure about this process and have some doubts: on node 2, do we have to perform a fresh installation of the apps and DB as we did when installing the cluster the first time, or is there a different process in this case, since the apps and DB are working fine on node 1?
    Please help me if anyone has ever faced this kind of issue.
    Thanks and Regards
    Alok
    Edited by: Alok Jain on Mar 6, 2012 7:47 AM

    Hi buddy,
    What a pity!!! :(
    I wish you the best with this recovery!!!
    About your questions:
    "Am I being too paranoid with this and wasting too much time on a mock environment while running on risky hardware?" I don't think so. As you've never done it before, I guess it's safer to test it first. It can become worse if you do the wrong thing :)
    "Is the recovery of this node really as straightforward as it seems: delete the node, add the node back?" Yes. As you have to rebuild the node, you'll have to rebuild CRS too. You have to remove and add the node again. Don't forget about the instance, listeners, services, etc. The procedure in the documentation is really clean.
    "Can I add the node back as the same named node, or will the cluster freak out due to some lingering previous config?" You can add the node back with the same name.
    "Are there any other 'gotchas' I may not be thinking about that some of you may have experienced?" As you said this is a very crucial component of your production system, if I were you, I would work with Oracle support instead of doing everything by myself.
    Good Luck!
    Cerreia

  • Server 2012 cluster - virtual machine live migration does not work

    Hi,
    We have a Hyper-V cluster with two nodes running Windows Server 2012. All the configurations are identical.
    When I try to live migrate from one node to the other, I get an error message saying:
    Live migration of 'Virtual Machine XXXXXX' failed.
    I get no other error messages, not even in Event Viewer. The same happens with all of our virtual machines.
    A normal quick migration works just fine for all of the virtual machines, so network configuration should not be an issue.
    The above error message does not provide much information.

    Hi,
    Please check whether your configuration meets the live migration requirements:
    Two (or more) servers running Hyper-V that:
    Support hardware virtualization.
    Yes they support virtualization. 
    Are using processors from the same manufacturer (for example, all AMD or all Intel).
    Both Servers are identical and brand new Fujitsu-Siemens RX300S7 with the same kind of processor (Xeon E5-2620).
    Belong to either the same Active Directory domain, or to domains that trust each other.
    Both nodes are in the same domain.
    Virtual machines must be configured to use virtual hard disks or virtual Fibre Channel disks (no physical disks).
    All of the virtual machines have virtual hard disks.
    Use of a private network is recommended for live migration network traffic.
    I have tried this, but it does not help.
    Requirements for live migration in a cluster:
    Windows Failover Clustering is enabled and configured.
    Yes
    Cluster Shared Volume (CSV) storage in the cluster is enabled.
    Yes
    Requirements for live migration using shared storage:
    All files that comprise a virtual machine (for example, virtual hard disks, snapshots, and configuration) are stored on an SMB share.
    They are all on the same CSV.
    Permissions on the SMB share have been configured to grant access to the computer accounts of all servers running Hyper-V.
    Requirements for live migration with no shared infrastructure:
    No extra requirements exist.
    Also, please refer to these articles to check whether you have finished all the preparation work for live migration:
    Virtual Machine Live Migration Overview
    http://technet.microsoft.com/en-us/library/hh831435.aspx
    Hyper-V: Using Live Migration with Cluster Shared Volumes in Windows Server 2008 R2
    http://technet.microsoft.com/en-us/library/dd446679(v=WS.10).aspx
    Configure and Use Live Migration on Non-clustered Virtual Machines
    http://technet.microsoft.com/en-us/library/jj134199.aspx
    Hope this helps!
    Lawrence
    TechNet Community Support
    I have also read all of the technet articles but can't find anything that could help.

  • JNDI Lookup for multiple server instances with multiple cluster nodes

    Hi Experts,
    I need help with retrieving log files for multiple server instances with multiple cluster nodes. The system is NetWeaver 7.01.
    There are 3 server instances, each with 3 cluster nodes.
    EJB session beans are deployed on them to retrieve the log information for each server node.
    In the session bean there is a method:
    public List getServers() {
      List servers = new ArrayList();
      ClassLoader saveLoader = Thread.currentThread().getContextClassLoader();
      try {
       Properties prop = new Properties();
       prop.setProperty(Context.INITIAL_CONTEXT_FACTORY, "com.sap.engine.services.jndi.InitialContextFactoryImpl");
       prop.put(Context.SECURITY_AUTHENTICATION, "none");
       Thread.currentThread().setContextClassLoader((com.sap.engine.services.adminadapter.interfaces.RemoteAdminInterface.class).getClassLoader());
       InitialContext mInitialContext = new InitialContext(prop);
       RemoteAdminInterface rai = (RemoteAdminInterface) mInitialContext.lookup("adminadapter");
       ClusterAdministrator cadm = rai.getClusterAdministrator();
       ConvenienceEngineAdministrator cea = rai.getConvenienceEngineAdministrator();
       int nodeId[] = cea.getClusterNodeIds();
       int dispatcherId = 0;
       String dispatcherIP = null;
       String p4Port = null;
       for (int i = 0; i < nodeId.length; i++) {
        if (cea.getClusterNodeType(nodeId[i]) != 1)
         continue;
        Properties dispatcherProp = cadm.getNodeInfo(nodeId[i]);
        dispatcherIP = dispatcherProp.getProperty("Host", "localhost");
        p4Port = cea.getServiceProperty(nodeId[i], "p4", "port");
        String[] loc = new String[3];
        loc[0] = dispatcherIP;
        loc[1] = p4Port;
        loc[2] = null;
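        servers.add(loc);
       }
       mInitialContext.close();
      } catch (NamingException e) {
       // lookup failed; whatever was collected so far is returned
      } catch (RemoteException e) {
       // remote call failed; whatever was collected so far is returned
      } finally {
       Thread.currentThread().setContextClassLoader(saveLoader);
      }
      return servers;
    }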
    The retrieved server information is then used in another class:
    public void run() {
      ReadLogsSession readLogsSession;
      int total = servers.size();
      for (Iterator iter = servers.iterator(); iter.hasNext();) {
       if (keepAlive) {
        try {
         Thread.sleep(500);
        } catch (InterruptedException e) {
         status = status + e.getMessage();
         System.err.println("LogReader Thread Exception" + e.toString());
         e.printStackTrace();
        }
        String[] serverLocs = (String[]) iter.next();
        searchFilter.setDetails("[" + serverLocs[1] + "]");
        Properties prop = new Properties();
        prop.put(Context.INITIAL_CONTEXT_FACTORY, "com.sap.engine.services.jndi.InitialContextFactoryImpl");
        prop.put(Context.PROVIDER_URL, serverLocs[0] + ":" + serverLocs[1]);
        System.err.println("LogReader run [" + serverLocs[0] + ":" + serverLocs[1] + "]");
        status = " Reading :[" + serverLocs[0] + ":" + serverLocs[1] + "] servers :[" + currentIndex + "/" + total + " ] ";
        prop.put("force_remote", "true");
        prop.put(Context.SECURITY_AUTHENTICATION, "none");
        try {
         Context ctx = new InitialContext(prop);
         Object ob = ctx.lookup("com.xom.sia.ReadLogsSession");
         ReadLogsSessionHome readLogsSessionHome = (ReadLogsSessionHome) PortableRemoteObject.narrow(ob, ReadLogsSessionHome.class);
         status = status + "Found ReadLogsSessionHome ["+readLogsSessionHome+"]";
         readLogsSession = readLogsSessionHome.create();
         if(readLogsSession!=null){
          status = status + " Created  ["+readLogsSession+"]";
          List l = readLogsSession.getAuditLogs(searchFilter);
          serverLocs[2] = String.valueOf(l.size());
          status = status + serverLocs[2];
          allRecords.addAll(l);
         }else{
          status = status + " unable to create  readLogsSession ";
         }
         ctx.close();
        } catch (NamingException e) {
         status = status + e.getMessage();
         System.err.println(e.getMessage());
         e.printStackTrace();
        } catch (CreateException e) {
         status = status + e.getMessage();
         System.err.println(e.getMessage());
         e.printStackTrace();
        } catch (IOException e) {
         status = status + e.getMessage();
         System.err.println(e.getMessage());
         e.printStackTrace();
        } catch (Exception e) {
         status = status + e.getMessage();
         System.err.println(e.getMessage());
         e.printStackTrace();
        }
       }
       currentIndex++;
      }
      jobComplete = true;
    }
    The application works for multiple server instances with a single cluster node, but not in a multi-node clustered environment.
    Does anybody know what should be changed to handle more cluster nodes?
    Thanks,
    Gergely

    Thanks for the response.
    I was afraid that it would be something like that, although I was hoping for something closer to the application pools we use with IIS to isolate sites and limit the impact one badly behaving one can have on another.
    mmr
    "Ian Skinner" <[email protected]> wrote in message
    news:fe5u5v$pue$[email protected]..
    > Run CF with one instance. Look at your processes and see how much memory the "JRun" process is using, multiply this by number of other CF instances.
    >
    > You are most likely going to end up implementing a "handful" of instances versus "dozens" of instances on all but the beefiest of servers.
    >
    > This can be affected by how much memory each instance uses. An application that puts major amounts of data into persistent scopes such as application and|or session will have a larger footprint than a leaner application that does not put much data into memory and|or leave it there for a very long time.
    >
    > I know the first time we made use of CF in its multi-home flavor, we went a bit overboard and created way too many. After nearly bringing a moderate server to its knees, we consolidated until we had three or four or so IIRC. A couple dedicated to each of our largest and most critical applications and a couple of general instances that ran many smaller applications each.
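
    The sizing rule of thumb in the quoted reply (memory of one instance times the number of instances) can be sketched as a quick calculation. This is only an illustration in Java: the class name, method names and the figures (600 MB per instance, a 4096 MB budget) are assumptions, not measurements and not part of ColdFusion or JRun.

```java
// Sketch only: the "per-instance memory x instance count" rule of thumb.
// InstanceSizing and its methods are hypothetical names used for illustration.
final class InstanceSizing {

    /** Estimated total memory (MB) if every instance uses about as much as the measured one. */
    static long estimatedTotalMb(long perInstanceMb, int instanceCount) {
        if (perInstanceMb < 0 || instanceCount < 0) {
            throw new IllegalArgumentException("values must not be negative");
        }
        return perInstanceMb * instanceCount;
    }

    /** Largest instance count whose estimate still fits in the memory budget. */
    static int maxInstances(long budgetMb, long perInstanceMb) {
        if (budgetMb < 0 || perInstanceMb <= 0) {
            throw new IllegalArgumentException("budget must be >= 0 and per-instance size > 0");
        }
        return (int) (budgetMb / perInstanceMb);
    }

    public static void main(String[] args) {
        long perInstanceMb = 600; // assumed: observed size of one "JRun" process
        long budgetMb = 4096;     // assumed: memory set aside for all CF instances
        System.out.println("12 instances need about " + estimatedTotalMb(perInstanceMb, 12) + " MB");
        System.out.println("Instances that fit in the budget: " + maxInstances(budgetMb, perInstanceMb));
    }
}
```

    An estimate like this only gives an upper bound on the instance count; as the reply notes, applications that keep a lot of data in persistent scopes will need more memory per instance than the measured baseline.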

  • VM live migration during OVM server upgrade

    Hi Guys,
    I'm planning to upgrade OVM 3.1.1 to 3.2.7.
    There are 4 OVM Servers in the server pool and all use the same CPU family, which means live migration is possible.
    I'm just wondering: if I upgrade one OVM server to 3.2.7 first, is it still possible to live migrate VMs from the 3.1.1 servers to the new 3.2.7 server?
    Thanks in advance.
    Jay

    Hi Jay,
    I'd do the following:
    - free up one OVS by migrating all guests to the remaining OVS
    - upgrade OVM Manager straight to 3.2.8
    - upgrade the idle OVS to 3.2.8
    - live migrate your guests from one 3.1.1 OVS to the new, idle 3.2.8 OVS - if not using OVMM, then using xm
    - round robin upgrade your remaining OVS
    I've done that a couple of times…
    Cheers,
    budy

  • Guest Cluster error in Hyper-V Cluster

    Hello everybody,
    In my environment I have an issue with failover clusters (Exchange, file server) when live migrating one virtual cluster node: the cluster group goes offline.
    The environment is the following:
    2x Hyper-V Clusters: Hyper-V-Cluster1 and Hyper-V-Cluster2 (Windows Server 2012 R2) with 5 Nodes per Cluster
    1x Scaleout Fileserver (Windows Server 2012 R2) with 2 Nodes
    1x Exchange Cluster (Windows Server 2012 R2) with EX01 VM running on Hyper-V-Cluster1 and EX02 VM running on Hyper-V-Cluster2
    1x Fileserver Failover Cluster (Windows Server 2012 R2) with FS01 VM running on Hyper-V-Cluster1 and FS02 VM running on Hyper-V-Cluster2
    The physical networks on the Hyper-V Nodes are redundant with 2x 10Gb/s uplinks to 2x physical switches for VMs in a LBFO Team:
    New-NetLbfoTeam -Name 10Gbit_TEAM -TeamMembers 10Gbit_01,10Gbit_02 -TeamingMode SwitchIndependent -LoadBalancingAlgorithm HyperVPort
    The SMB 3 traffic runs on 2x 10Gb/s NIC without NIC-Teaming (SMB-Multichannel).
    SMB is used for livemigrations.
    The VMs for clustering were installed according to the technet guideline:
    http://technet.microsoft.com/en-us/library/dn265980.aspx
    Because my Hyper-V uplinks are already redundant, I am using one NIC inside the VM.
    As I understand it, there is no advantage to using two NICs inside the VM as long as they are connected to the same vSwitch.
    Now, when I want to perform hardware maintenance, I have to live migrate the EX01 VM from Hyper-V-Cluster1-Node-1 to Hyper-V-Cluster1-Node-2.
    EX02 VM still runs untouched on Hyper-V-Cluster2-Node-1.
    At the end of the live migration I see error 1135 (source: FailoverClustering) on the EX01 VM, which says that the EX02 VM was removed from the failover cluster and that I should check my network.
    The Exchange cluster group is offline after that event and I have to bring it online again manually.
    Any ideas what can cause this behavior?
    Thanks.
    Greetings,
    torsten

    Hello again,
    I found the cause and the solution :-)
    In the article here: http://technet.microsoft.com/en-us/library/dn440540.aspx
    is the description of my cluster failure:
    ########## relevant part from article #######################
    Protect against short-term network interruptions
    Failover cluster nodes use the network to send heartbeat packets to other nodes of the cluster. If a node does not receive a response from another node for a specified period of time, the cluster removes the node from cluster membership. By default, a guest cluster node is considered down if it does not respond within 5 seconds. Other nodes that are members of the cluster will take over any clustered roles that were running on the removed node.
    Typically, during the live migration of a virtual machine there is a fast final transition when the virtual machine is stopped on the source node and is running on the destination node. However, if something causes the final transition to take longer than the configured heartbeat threshold settings, the guest cluster considers the node to be down even though the live migration eventually succeeds. If the live migration final transition is completed within the TCP time-out interval (typically around 20 seconds), clients that are connected through the network to the virtual machine seamlessly reconnect.
    To make the cluster heartbeat time-out more consistent with the TCP time-out interval, you can change the SameSubnetThreshold and CrossSubnetThreshold cluster properties from the default of 5 seconds to 20 seconds. By default, the cluster sends a heartbeat every 1 second. The threshold specifies how many heartbeats to miss in succession before the cluster considers the cluster node to be down.
    After changing both parameters on the failover cluster as described, the error is gone.
    Greetings,
    torsten
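
    The arithmetic in the quoted article can be checked with a small sketch. It is written in Java only because that is the language of the other code on this page; it models the numbers and does not call any cluster API. The class and method names are hypothetical, the 8-second blackout is an assumed value for illustration, and treating the tolerated silence as "heartbeat interval times missed-heartbeat threshold" is a simplification of how the cluster service actually decides.

```java
// Sketch only: models the heartbeat arithmetic from the quoted article.
// HeartbeatTolerance is a hypothetical name; nothing here talks to a real cluster.
final class HeartbeatTolerance {

    /** Longest silence (ms) tolerated before a node is removed: interval x threshold. */
    static long toleratedMillis(long heartbeatIntervalMillis, int missedHeartbeatThreshold) {
        if (heartbeatIntervalMillis <= 0 || missedHeartbeatThreshold <= 0) {
            throw new IllegalArgumentException("interval and threshold must be positive");
        }
        return heartbeatIntervalMillis * missedHeartbeatThreshold;
    }

    /** True if a blackout of the given length reaches or exceeds the tolerated silence. */
    static boolean nodeConsideredDown(long blackoutMillis, long heartbeatIntervalMillis, int threshold) {
        return blackoutMillis >= toleratedMillis(heartbeatIntervalMillis, threshold);
    }

    public static void main(String[] args) {
        long interval = 1000; // one heartbeat per second, as stated in the article
        long blackout = 8000; // assumed length of the final live migration transition
        System.out.println("threshold 5:  node down = " + nodeConsideredDown(blackout, interval, 5));
        System.out.println("threshold 20: node down = " + nodeConsideredDown(blackout, interval, 20));
    }
}
```

    With a one-second heartbeat, a threshold of 5 gives up after 5 seconds of silence, while a threshold of 20 rides out the same 8-second blackout, which matches the behaviour described in the post.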

  • Hyper-V live migration failed

    There is Hyper-V cluster with 2 nodes. Windows Server 2012 R2 is used as operating system.
    When I try to live migrate a test VM from node 1 to node 2, I get error 21502:
    Live migration of 'Virtual Machine test' failed.
    'Virtual Machine test' failed to fixup network settings. Verify VM settings and update them as necessary.
    The VM has a network adapter connected to a virtual switch whose connection type is Private network.
    If I set the virtual switch to "Not connected" in the VM's network adapter settings, the migration succeeds.
    All VMs that are not connected to any private networks (virtual switches with the Private network connection type) can be live migrated without any issues.
    Is there any official reference on Hyper-V live migration of VMs that use the "private network" connection type?

    I can Live Migrate virtual machines with adapters on private switches without error. Aside from having the wrong name, the only way I can get it to fail is if I make the switch on one host use a different QoS minimum mode than the other and
    enable QoS on the virtual adapter. Even then I get a different message than what you're getting. I only get that one with differently named switches.
    There is a PowerShell cmdlet available to see why a guest won't run on another host.
    Here's an example of its usage.
    There's a way to use it to get it to Live Migrate.
    But there is no way to truly Live Migrate three virtual machines in perfect lockstep. Even if you figure out whatever is preventing you from migrating these machines, there will still be periods during Live Migration where they can't communicate across that
    private network. You also can't guarantee that all these guests will always be running on the same host without preventing Live Migration in the first place. This is why there really isn't anyone doing what you're trying to do. I suggest you consider another
    isolation solution, like VLANs.
    Eric Siron Altaro Hyper-V Blog
    I am an independent blog contributor, not an Altaro employee. I am solely responsible for the content of my posts.
    "Every relationship you have is in worse shape than you think."

  • Live Migrating Virtual Machines with Shared VHDx

    I am facing problems when live migrating a virtual machine that uses a Shared VHDX. The virtual machine gets migrated (that is, its configuration is migrated), but it fails to start, and starting it manually fails too.
    What is the method to live migrate virtual machines that use a Shared VHDX? Thanks in advance.

    Another couple of gotchas:
    You cannot do host-level backups of the guest cluster.  This is the same as it always was.  You will have to install backup agents in the guest cluster nodes and back them up as if they were physical machines.
    You cannot perform a hot-resize of the shared VHDX.  But you can hot-add more shared VHDX files to the clustered VMs.
    You cannot Storage Live Migrate the shared VHDX file.  You can move the other VM files and perform normal Live Migration.
    As long as your shared VHDX is on an SMB3 share, you can also run the nodes of the guest cluster on different Hyper-V hosts.

Maybe you are looking for