OVM 3.1.1 - Live migration not completed
Hi,
I'm facing an interesting case with the VM live migration.
If I issue a migration from the Manager, the VM is effectively moved to the new server, but the job stays "In Progress" (0% complete) and the OVM servers remain locked until I abort the job.
Once the job is aborted, everything returns to normal and the VM is running on the target server.
Any idea what's wrong?
Thanks for the help.
Below is the log of the job:
Job Construction Phase
begin()
Appended operation 'Bridge Configure Operation' to object '0004fb000020000051ceed7ebd6f2ad9 (network.BondPort (1) in oracle55)'.
Appended operation 'Bridge Configure Operation' to object '0004fb000020000004cb599206575194 (network.EthernetPort (3) in oracle55)'.
Appended operation 'Virtual Machine Migrate' to object '0004fb0000060000a4a1035c270b5f7b (RH63_PVM_XDC_Node2)'.
commit()
Completed Step: COMMIT
Objects and Operations
Object (IN_USE): [Server] 36:34:31:30:31:36:43:5a:33:32:32:32:4b:42:4b:35 (oracle55)
Object (IN_USE): [EthernetPort] 0004fb000020000004cb599206575194 (network.EthernetPort (3) in oracle55)
Operation: Bridge Configure Operation
Object (IN_USE): [Server] 36:34:31:30:31:36:43:5a:33:32:32:32:4b:42:4b:33 (oracle54)
Object (IN_USE): [VirtualMachine] 0004fb0000060000a4a1035c270b5f7b (RH63_PVM_XDC_Node2)
Operation: Virtual Machine Migrate
Object (IN_USE): [BondPort] 0004fb000020000051ceed7ebd6f2ad9 (network.BondPort (1) in oracle55)
Operation: Bridge Configure Operation
Job Running Phase at 14:02 on Thu, Jan 10, 2013
Job Participants: [36:34:31:30:31:36:43:5a:33:32:32:32:4b:42:4b:33 (oracle54)]
Actioner
Starting operation 'Bridge Configure Operation' on object '0004fb000020000004cb599206575194 (network.EthernetPort (3) in oracle55)'
Bridge [0004fb001054934] already exists (and should exist) on interface [eth2] on server [oracle55]; skipping bridge creation
Completed operation 'Bridge Configure Operation' completed with direction ==> DONE
Starting operation 'Virtual Machine Migrate' on object '0004fb0000060000a4a1035c270b5f7b (RH63_PVM_XDC_Node2)'
Completed operation 'Virtual Machine Migrate' completed with direction ==> LATER
Starting operation 'Bridge Configure Operation' on object '0004fb000020000051ceed7ebd6f2ad9 (network.BondPort (1) in oracle55)'
Bridge [15.136.24.0] already exists (and should exist) on interface [bond0] on server [oracle55]; skipping bridge creation
Completed operation 'Bridge Configure Operation' completed with direction ==> DONE
Starting operation 'Virtual Machine Migrate' on object '0004fb0000060000a4a1035c270b5f7b
Some other log info from the ovs-agent.log file:
[2013-01-10 17:29:02 7647] DEBUG (notification:291) Connected to manager.
[2013-01-10 17:29:17 7655] ERROR (notification:64) Unable to send notification: (2, 'No such file or directory')
[2013-01-10 17:29:18 7647] ERROR (notification:333) Error in NotificationServer process: 'Invalid URL Request (receive) http://15.136.28.56:7001/ovm/core/OVMManagerCoreServlet'
Traceback (most recent call last):
File "/usr/lib64/python2.4/site-packages/agent/notification.py", line 308, in serve_forever
foundry = cm.getFoundryContext()
File "/usr/lib/python2.4/site-packages/com/oracle/ovm/mgr/api/manager/OvmManager.py", line 38, in getFoundryContext
self.foundry = self.getModelManager().getFoundryContext()
File "/usr/lib/python2.4/site-packages/com/oracle/ovm/mgr/api/manager/OvmManager.py", line 31, in getModelManager
if self.modelMgr == None:
File "/usr/lib/python2.4/site-packages/com/oracle/ovm/mgr/api/manager/ModelManager.py", line 364, in __cmp__
return self.compareTo(obj)
File "/usr/lib/python2.4/site-packages/com/oracle/ovm/mgr/api/manager/ModelManager.py", line 250, in compareTo
return self.exchange.invokeMethodByName(self.identifier,"compareTo","java.lang.Object",args,5,False)
File "/usr/lib/python2.4/site-packages/com/oracle/odof/OdofExchange.py", line 68, in invokeMethodByName
return self._send_(InvokeMethodByNameCommand(identifier, method, params, args, access))
File "/usr/lib/python2.4/site-packages/com/oracle/odof/OdofExchange.py", line 164, in send
return self._sendGivenConnection_(connection, command, timeout)
File "/usr/lib/python2.4/site-packages/com/oracle/odof/OdofExchange.py", line 170, in sendGivenConnection
result = connection.receive(command, timeout)
File "/usr/lib/python2.4/site-packages/com/oracle/odof/io/ServletConnection.py", line 88, in receive
raise OdofException("Invalid URL Request (receive) %s" % self.url, sys.exc_info()[1])
OdofException: 'Invalid URL Request (receive) http://15.136.28.56:7001/ovm/core/OVMManagerCoreServlet'
[2013-01-10 17:29:38 7655] ERROR (notification:64) Unable to send notification: (2, 'No such file or directory')
[2013-01-10 17:29:54 7647] DEBUG (notification:289) Trying to connect to manager.
[2013-01-10 17:29:58 7655] ERROR (notification:64) Unable to send notification: (2, 'No such file or directory')
[2013-01-10 17:30:19 7655] ERROR (notification:64) Unable to send notification: (2, 'No such file or directory')
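The repeated "Invalid URL Request (receive)" and "Unable to send notification" errors suggest the ovs-agent cannot reliably reach the Manager's core servlet on port 7001. A quick, generic way to test basic TCP reachability from the OVM server (a sketch only; the host and port are taken from the log above):

```python
import socket

def port_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# From an OVM server, against the Manager endpoint seen in the log:
# port_reachable("15.136.28.56", 7001)
```

If this returns False from the OVM server, check firewalls and that the Manager's WebLogic server is actually listening on port 7001.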
Similar Messages
-
Hyper-v Live Migration not completing when using VM with large RAM
hi,
I have a two-node Server 2012 R2 Hyper-V cluster which uses a 100 GB CSV, and each host has 128 GB RAM across 2 physical CPUs (approx. 7.1 GB used when the VM is not booted). There is 1 VM running Windows 7 with 64 GB RAM assigned; the VHD size is around 21 GB and the BIN file is 64 GB (by the way, do we have to have that, or can we get rid of the BIN file?).
NUMA is enabled on both servers. When I attempt to live migrate I get event 1155 in the cluster events; the LM starts and gets to sixty-something percent but then fails. The event details are "The pending move for the role 'New Virtual Machine' did not complete."
However, when I lower the amount of RAM assigned to the VM to around 56 GB (56 + 7 = 63 GB), the LM seems to work; any amount of RAM below this allows LM to succeed. But it seems that if the total RAM used on the physical server (including that used for the VMs) is 64 GB or above, the LM fails... a coincidence, since the server has 64 GB per CPU?
why would this be?
many thanks
Steve
Hi,
I turned NUMA spanning off on both servers in the cluster. I assigned 62 GB, 64 GB and 88 GB, and each time the VM started up with no problems. With 62 GB the LM completed, but I can't get LM to complete with 64 GB+.
My server is an HP DL380 G8 with the latest BIOS (I just updated it today as it was a couple of months behind). I can't see any settings in the BIOS relating to NUMA, so I'm guessing it is enabled and can't be changed.
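One reading of the numbers above: with NUMA spanning disabled, the VM's assigned RAM has to fit within a single 64 GB NUMA node, which would match 62 GB succeeding while 64 GB and above fails. A rough sketch of that check (the 64-GB-per-node figure comes from the hardware listed below; exactly how much per-node slack Hyper-V needs is an assumption here):

```python
NODE_GB = 64  # memory per socket/NUMA node, per the hardware listed below

def fits_single_numa_node(vm_gb, node_gb=NODE_GB):
    """With NUMA spanning off, a VM's memory must fit on one node.
    Any per-node reservation Hyper-V makes is ignored in this sketch."""
    return vm_gb < node_gb

for gb in (56, 62, 64, 88):
    print(gb, fits_single_numa_node(gb))
```

If this model holds, any VM sized at or above one node's memory will only start or migrate when NUMA spanning is allowed.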
If I run the cmdlet as admin I get ProcessorsAvailability : {0, 0, 0, 0...}
If I run it as a standard user I get ProcessorsAvailability
My memory and CPU config are as follows. Hyper-threading is enabled for the CPU, but I don't think that would make a difference?
Processor 1 1 Good, In Use Yes 713756-081 DIMM DDR3 16384 MB 1600 MHz 1.35 V 2 Synchronous
Processor 1 4 Good, In Use Yes 713756-081 DIMM DDR3 16384 MB 1600 MHz 1.35 V 2 Synchronous
Processor 1 9 Good, In Use Yes 713756-081 DIMM DDR3 16384 MB 1600 MHz 1.35 V 2 Synchronous
Processor 1 12 Good, In Use Yes 713756-081 DIMM DDR3 16384 MB 1600 MHz 1.35 V 2 Synchronous
Processor 2 1 Good, In Use Yes 713756-081 DIMM DDR3 16384 MB 1600 MHz 1.35 V 2 Synchronous
Processor 2 4 Good, In Use Yes 713756-081 DIMM DDR3 16384 MB 1600 MHz 1.35 V 2 Synchronous
Processor 2 9 Good, In Use Yes 713756-081 DIMM DDR3 16384 MB 1600 MHz 1.35 V 2 Synchronous
Processor 2 12 Good, In Use Yes 713756-081 DIMM DDR3 16384 MB 1600 MHz 1.35 V 2 Synchronous
Processor 1
Processor Name
Intel(R) Xeon(R) CPU E5-2695 v2 @ 2.40GHz
Processor Status
OK
Processor Speed
2400 MHz
Execution Technology
12/12 cores; 24 threads
Memory Technology
64-bit Capable
Internal L1 cache
384 KB
Internal L2 cache
3072 KB
Internal L3 cache
30720 KB
Processor 2
Processor Name
Intel(R) Xeon(R) CPU E5-2695 v2 @ 2.40GHz
Processor Status
OK
Processor Speed
2400 MHz
Execution Technology
12/12 cores; 24 threads
Memory Technology
64-bit Capable
Internal L1 cache
384 KB
Internal L2 cache
3072 KB
Internal L3 cache
30720 KB
thanks
Steve -
Hi Guys,
a few issues, all relating to AD/Exchange setup and causing migration problems.
quick overview:
1. Installed a new server (Server 2012 R2) and installed Exchange 2013. The install completed properly, but there were issues logging into ECP and OWA; SSL certs weren't created, and creating them manually didn't solve it. Scrapped that server and deleted it from the Exchange administrative servers in ADSI
(time spent troubleshooting was 5 days).
2. Fixed the CA issues, re-installed the server and installed Exchange 2013, ran /PrepareAD; the system mailboxes and migration mailbox were created, and I created a mailbox DB. The system mailboxes are there when checked with get-mailbox -Arbitration.
Tried to migrate: migration mailbox not found in the organization (the accounts were there; I made sure 'user must change password at next logon' was not ticked, 'password never expires' was ticked, and the account was disabled as it should be, so no go).
Deleted all system accounts from AD, recreated them with /PrepareAD, and enabled them according to these articles:
http://technet.microsoft.com/en-us/library/gg588318%28v=exchg.150%29.aspx
http://www.telnetport25.com/2013/01/quick-tiprecovering-from-a-missing-migration-mailbox-in-exchange-2013/
http://social.technet.microsoft.com/wiki/contents/articles/5317.recreate-and-enable-missing-arbitration-user-accounts-and-mailboxes-in-exchange-server-2010.aspx
http://social.technet.microsoft.com/wiki/contents/articles/6874.how-to-recreate-system-mailbox-federatedemail-discoverysearchmailbox-in-exchange-2010.aspx
get-mailbox -Arbitration found all mailboxes; checked without -Arbitration and the discovery mailbox was there too.
3. Tried the migration again: no errors in ECP, but the migration does not start, and I encountered the following error IDs.
Error 2002 - MSExchange Migration
error info ('Migration.MigrationMailboxNotFoundException|The migration mailbox for the organization is either missing or invalid.
Error 1006 - MSExchange Mailbox Replication (ps: no dag's setup yet either)
-> error info ('The Microsoft Exchange Mailbox Replication service was unable to process jobs in a mailbox database.
Database: CL1-3_Main-DB
same error for each db on the server with different mailboxes
Error: Couldn't find system mailbox '6d5f301b-7323-47c1-85eb-a76a01b9b3c0' in Active Directory.')
Error: Couldn't find system mailbox 'f6ba51f8-c076-42e9-b00f-014a1c6b3634' in Active Directory
Error: Couldn't find system mailbox 'SystemMailbox{eae9aad5-0984-440d-b37c-7a399961e171}' in Active Directory.
Error: Couldn't find system mailbox '8c6d0335-af32-41e3-86b1-e4c15fd14a23' in Active Directory.
SystemMailbox{eae9aad5-0984-440d-b37c-7a399961e171} is in Active Directory and mail-enabled according to the above articles.
I also received this error, ID 1006 - MSExchangeDiagnostics:
error info ('The performance counter '\\CL1-3\LogicalDisk(HarddiskVolume17)\Free Megabytes' sustained a value of '207.00', for the '15' minute(s) interval starting at '8/22/2014 7:12:00 AM'. Additional information: None. Trigger Name:DatabaseDriveSpaceTrigger.
Instance:harddiskvolume17')
I don't want to reinstall again, as I feel that would just add fuel to the fire... can anyone help diagnose this?
I've been trying to fix this myself for over a month.
Thanks!!
Hi,
Even if you find re-installation tedious, I still suggest performing it, because fixing this issue in place is also tedious.
Remove the Exchange server and use ADSIEdit to clean it up.
Also remove the Windows Server 2012 machine from AD.
Then re-install Windows Server 2012 and Exchange 2013 following articles from TechNet.
Thanks -
Hello,
we are upgrading the servers with the latest OVS update. After we upgraded one server (call it server A) from 2.6.39-200.1.1.el5uek to 2.6.39-200.1.9.el5uek using the certified yum repository from Oracle, live migration no longer works correctly.
If I migrate a guest from another server to server A, or vice versa, the result is the same: 3%-10% of packets are dropped. Is this normal behaviour if the kernels differ, or is this kernel/driver/Xen combination buggy?
Obviously the OVS version is still 3.1.1 and Oracle VM Manager is 3.1.1 build 478, and live migration always worked well previously. No errors are visible and the job completes fine.
Kind Regards
Edited by: user10717184 on Oct 29, 2012 12:46 AM
I tried to migrate with the xm command, but the problem did not disappear.
The xm command does not give any error code; it finishes correctly, yet we either lose 3-10% of packets or pinging stops altogether.
[root@******** ~]# xm migrate -l ****UUID*** ****SERVER_OVS_NAME***
[root@******** ~]# echo $?
0
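To put a number on the 3-10% packet loss described above, one can parse the summary line that ping prints at the end of a run. A small self-contained sketch (the sample line is illustrative, not from these servers):

```python
import re

def packet_loss_pct(ping_summary):
    """Extract the packet-loss percentage from ping's summary line,
    or return None if no such figure is present."""
    m = re.search(r"([\d.]+)% packet loss", ping_summary)
    return float(m.group(1)) if m else None

line = "100 packets transmitted, 93 received, 7% packet loss, time 99123ms"
print(packet_loss_pct(line))  # 7.0
```

Running a long ping against the guest before, during, and after a migration, and feeding the summary through this, makes it easy to compare loss across kernel versions.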
Now both servers have the new kernel, but the problem persists. The strange thing is that if the guest returns to the previous OVS server, pinging sometimes resumes. -
Live Migration in OVM 3.0.2 should have an interruption or not?
I mean: I have 2 OVM 3.0.2 servers and Fibre Channel storage.
I installed an Oracle Linux 5.6 x64 guest in paravirtualized mode.
When I do a live migration, the virtual machine changes to the other server in seconds, showing the lock icon. Meanwhile I ping the machine and I'm inside its command line.
Communication is interrupted for about 10 seconds, or sometimes more, and the command line does not respond for the same time.
Is that correct?
Greetings
Alex Dávila
alex davila wrote:
Right now I am testing connectivity, and when I do a live migration the interruption is minimal, just 1 lost ping.
I don't know why it delayed several seconds yesterday.
You might want to talk to your networking guys to make sure that PORTFAST is enabled (if you have Cisco switches) or that you have Rapid STP configured. Keep in mind that we switch the MAC address of the guest from one physical server to another. The delay you saw was your network noticing and re-routing packets to the new location. -
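For reference, the switch-side settings suggested above look roughly like this on Cisco IOS (the interface name is hypothetical; verify the exact commands, and whether your server-facing ports are access or trunk ports, against your own switch model and IOS version):

```
! enable rapid spanning tree globally
spanning-tree mode rapid-pvst
! on the ports facing the OVM servers
interface GigabitEthernet0/1
 description OVM server uplink
 spanning-tree portfast
```

With PortFast (or RSTP edge ports), the port skips the listening/learning delay after the guest's MAC moves, which is what shortens the ping outage during migration.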
VM live migration during OVM server upgrade
Hi Guys,
I'm planning to upgrade OVM 3.1.1 to 3.2.7.
There are 4 OVM servers in the server pool, and all use the same CPU family, which means live migration is possible.
I'm just wondering: if I upgrade one OVM server to 3.2.7 first, is it still possible to live migrate VMs from the 3.1.1 servers to the new 3.2.7 server?
Thanks in advance.
Jay
Hi Jay,
I'd do the following:
- free up one OVS by migrating all guests to the remaining OVS
- upgrade OVM Manager straight to 3.2.8
- upgrade the idle OVS to 3.2.8
- live migrate your guests from one 3.1.1 OVS to the new, idle 3.2.8 OVS - if not using OVMM, then using xm
- round robin upgrade your remaining OVS
I've done that a couple of times…
Cheers,
budy -
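The xm-based evacuation step above can be sketched by pulling guest names out of `xm list` output and feeding each one to `xm migrate -l <guest> <target>`. A minimal sketch of the parsing half (the sample output is illustrative, not from these servers):

```python
def guests_from_xm_list(output):
    """Return guest domain names from `xm list` output, skipping the
    header row and the Domain-0 control domain."""
    names = []
    for line in output.strip().splitlines()[1:]:
        name = line.split()[0]
        if name != "Domain-0":
            names.append(name)
    return names

sample = """Name            ID   Mem VCPUs      State   Time(s)
Domain-0         0  1024     4     r-----   1234.5
guest-a          3  4096     2     -b----     42.0
guest-b          5  8192     4     -b----     10.1"""
print(guests_from_xm_list(sample))
```

On a real OVS host you would capture `xm list` output and then run `xm migrate -l <name> <target-ovs>` for each returned name.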
Live Migration in OVM 3.0.2
When I try to live migrate a guest VM, I don't get any target server options; here is the output I'm seeing in the AdminServer.log file:
...<Default CPU Parameters- en: true, threshold: 1.00>
...<Default Net Parameters- net: 2801-HF, en: true, threshold: 1.00>
...<Default Net Parameters- net: 10.2.6.0, en: true, threshold: 1.00>
...<Default Net Parameters- net: 10.2.7.0, en: true, threshold: 1.00>
...<Server: server01.test.com, cpu utilization: 0.88%>
...<Server: server01.test.com, network: 2801-HF, rx mBytes: 0, tx mBytes: 0>
...<Server: server01.test.com, network: 10.2.7.0, rx mBytes: 0, tx mBytes: 0>
...<VM/Server: vm-test/server02.test.com, speed 2666.8, cpus 1, util -1.0%>
However, if I remove the VNIC from the guest VM, I am able to live migrate, and this is what I see in AdminServer.log:
...<Default CPU Parameters- en: true, threshold: 1.00>
...<Default Net Parameters- net: 2801-HF, en: true, threshold: 1.00>
...<Default Net Parameters- net: 10.2.6.0, en: true, threshold: 1.00>
...<Default Net Parameters- net: 10.2.7.0, en: true, threshold: 1.00>
...<Server: server01.test.com, cpu utilization: 0.87%>
...<Server: server01.test.com, network: 2801-HF, rx mBytes: 0, tx mBytes: 0>
...<Server: server01.test.com, network: 10.2.7.0, rx mBytes: 0, tx mBytes: 0>
...<VM/Server: vm-test/server02.test.com, speed 2666.8, cpus 1, util -1.0%>
...<CompatibleServers for VM vm-test: [server01.test.com]>
...<Server: server01.test.com- simCpuUsed 371.5, VM: vm-test- cpuUsed -26.7>
...<Server: server01.test.com- serverCpuLimit 36268.5, newServerCPU 344.8>
...<Server/Network server01.test.com/2801-HF: serverNetLimit 38250.0, newServerRx 0, newServerTx 0>
...<Server/Network server01.test.com/10.2.7.0: serverNetLimit 38250.0, newServerRx 0, newServerTx 0>
...<Server/Network server01.test.com/10.2.6.0: serverNetLimit 38250.0, newServerRx 0, newServerTx 0>
...<Fit map for VM: vm-test, on server: server01.test.com- cpuFit:true, memFit:true, netFit:true, fitFactor:0.931>
I've seen the log throw compatibility errors when the 2801-HF network wasn't present on each server; however, I've fixed that issue. Since I can live migrate without a VNIC present, I thought it might be another network issue, yet I'm not even seeing "com.oracle.ovm.mgr.business.CompatibilityChecker" being run when a guest has a VNIC.
Both servers have the same bridge device configured (checked by running ifconfig -a on each server), if that's what you mean by defining a VNIC. While the guest VM is configured with a VNIC, I am able to offline the guest VM, migrate it, and then online it again on the second server without any issues. I was assuming any type of network difference (like a VNIC not being defined on both servers) would have shown up in the AdminServer.log file under the execution of "com.oracle.ovm.mgr.business.CompatibilityChecker".
Thanks for the quick response.
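One thing worth ruling out in a case like this is a bridge that exists on one server but not the other. A quick way to diff the bridge names captured from each host's `brctl show` output (the sample text is illustrative; bridge names follow the style of the job log above):

```python
def bridges(brctl_output):
    """Bridge names from `brctl show` output: the first column of
    non-indented rows, skipping the header line."""
    found = set()
    for line in brctl_output.splitlines()[1:]:
        if line and not line[0].isspace():
            found.add(line.split()[0])
    return found

server01 = """bridge name\tbridge id\t\tSTP enabled\tinterfaces
0004fb001054934\t8000.0021f6aabbcc\tno\t\teth2
15.136.24.0\t8000.0021f6ddeeff\tno\t\tbond0"""
server02 = """bridge name\tbridge id\t\tSTP enabled\tinterfaces
15.136.24.0\t8000.0021f6001122\tno\t\tbond0"""
print(bridges(server01) - bridges(server02))  # bridges missing on server02
```

An empty set in both directions means the bridge names at least match; the Manager may still reject a target for other compatibility reasons.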
Server 2012 cluster - virtual machine live migration does not work
Hi,
We have a hyper-v cluster with two nodes running Windows Server 2012. All the configurations are identical.
When I try to make a Live migration from one node to the other I get an error message saying:
Live migration of 'Virtual Machine XXXXXX' failed.
I get no other error messages, not even in event viewer. This same happens with all of our virtual machines.
A normal quick migration works just fine for all of the virtual machines, so network configuration should not be an issue.
The above error message does not provide much information.
Hi,
Please check whether your configuration meet live migration requirement:
Two (or more) servers running Hyper-V that:
Support hardware virtualization.
Yes they support virtualization.
Are using processors from the same manufacturer (for example, all AMD or all Intel).
Both Servers are identical and brand new Fujitsu-Siemens RX300S7 with the same kind of processor (Xeon E5-2620).
Belong to either the same Active Directory domain, or to domains that trust each other.
Both nodes are in the same domain.
Virtual machines must be configured to use virtual hard disks or virtual Fibre Channel disks (no physical disks).
All of the vitual machines have virtual hard disks.
Use of a private network is recommended for live migration network traffic.
Have tried this, but does not help.
Requirements for live migration in a cluster:
Windows Failover Clustering is enabled and configured.
Yes
Cluster Shared Volume (CSV) storage in the cluster is enabled.
Yes
Requirements for live migration using shared storage:
All files that comprise a virtual machine (for example, virtual hard disks, snapshots, and configuration) are stored on an SMB share. They are all on the same CSV
Permissions on the SMB share have been configured to grant access to the computer accounts of all servers running Hyper-V.
Requirements for live migration with no shared infrastructure:
No extra requirements exist.
Also please refer to this article to check whether you have finished all preparation works for live migration:
Virtual Machine Live Migration Overview
http://technet.microsoft.com/en-us/library/hh831435.aspx
Hyper-V: Using Live Migration with Cluster Shared Volumes in Windows Server 2008 R2
http://technet.microsoft.com/en-us/library/dd446679(v=WS.10).aspx
Configure and Use Live Migration on Non-clustered Virtual Machines
http://technet.microsoft.com/en-us/library/jj134199.aspx
Hope this helps!
TechNet Subscriber Support
If you are a TechNet Subscription user and have any feedback on our support quality, please send your feedback here.
Lawrence
TechNet Community Support
I have also read all of the technet articles but can't find anything that could help. -
Error 10698 Virtual machine could not be live migrated to virtual machine host
Hi all,
I am running a failover cluster of:
Host:
2 x WS2008 R2 Data Centre
managed by VMM:
VMM 2008 R2
Virtual Host:
1x windows 2003 64bit guest host/virtual machine
I have attempted a live migration through VMM 2008 R2 and I'm presented with the following error:
Error (10698)
Virtual machine XXXXX could not be live migrated to virtual machine host xxx-Host01 using this cluster configuration.
(Unspecified error (0x80004005))
What i have found when running the cluster validation:
1 out of the 2 hosts has an error with RPC related to network configuration:
An error occurred while executing the test.
Failed to connect to the service manager on 'xxx-Host02'.
The RPC server is unavailable
However, there are no errors or events on host02 that show any problems at all.
In fact, the validation report goes on to show the rest of the configuration information of both cluster hosts as OK.
See below:
List BIOS Information
List BIOS information from each node.
xxx-Host01
Gathering BIOS Information for xxx-Host01
Item Value
Name Phoenix ROM BIOS PLUS Version 1.10 1.1.6
Manufacturer Dell Inc.
SMBios Present True
SMBios Version 1.1.6
SMBios Major Version 2
SMBios Minor Version 5
Current Language en|US|iso8859-1
Release Date 3/23/2008 9:00:00 AM
Primary BIOS True
xxx-Host02
Gathering BIOS Information for xxx-Host02
Item Value
Name Phoenix ROM BIOS PLUS Version 1.10 1.1.6
Manufacturer Dell Inc.
SMBios Present True
SMBios Version 1.1.6
SMBios Major Version 2
SMBios Minor Version 5
Current Language en|US|iso8859-1
Release Date 3/23/2008 9:00:00 AM
Primary BIOS True
Back to Summary
Back to Top
List Cluster Core Groups
List information about the available storage group and the core group in the cluster.
Summary
Cluster Name: xxx-Cluster01
Total Groups: 2
Group Status Type
Cluster Group Online Core Cluster
Available Storage Offline Available Storage
Cluster Group
Description:
Status: Online
Current Owner: xxx-Host01
Preferred Owners: None
Failback Policy: No failback policy defined.
Resource Type Status Possible Owners
Cluster Disk 1 Physical Disk Online All Nodes
IP Address: 10.10.0.60 IP Address Online All Nodes
Name: xxx-Cluster01 Network Name Online All Nodes
Available Storage
Description:
Status: Offline
Current Owner: Per-Host02
Preferred Owners: None
Failback Policy: No failback policy defined.
Cluster Shared Volumes
Resource Type Status Possible Owners
Data Cluster Shared Volume Online All Nodes
Snapshots Cluster Shared Volume Online All Nodes
System Cluster Shared Volume Online All Nodes
Back to Summary
Back to Top
List Cluster Network Information
List cluster-specific network settings that are stored in the cluster configuration.
Network: Cluster Network 1
DHCP Enabled: False
Network Role: Internal and client use
Metric: 10000
Prefix Prefix Length
10.10.0.0 20
Network: Cluster Network 2
DHCP Enabled: False
Network Role: Internal use
Metric: 1000
Prefix Prefix Length
10.13.0.0 24
Subnet Delay
CrossSubnetDelay 1000
CrossSubnetThreshold 5
SameSubnetDelay 1000
SameSubnetThreshold 5
Validating that Network Load Balancing is not configured on node xxx-Host01.
Validating that Network Load Balancing is not configured on node xxx-Host02.
An error occurred while executing the test.
Failed to connect to the service manager on 'xxx-Host02'.
The RPC server is unavailable
Back to Summary
Back to Top
If it were an RPC connection issue, then I shouldn't be able to mstsc or browse shares to host02. Yet I can access them, which makes the report above a bit misleading.
I have also checked the RPC service, and it has started.
If anyone can shed some light or advise me of any other options for troubleshooting this, that would be greatly appreciated.
Kind regards,
Chucky
Raja. B -
When setting up converged network in VMM cluster and live migration virtual nics not working
Hello Everyone,
I am having issues setting up a converged network in VMM. I have been working with MS engineers to no avail. I am very surprised at the expertise of the MS engineers; they had no idea what a converged network even was. I had way more experience than these guys, and they said there was no escalation track, so I am posting here in hopes of getting some assistance.
Everyone, including our consultants, says my setup is correct.
What I want to do:
I have servers with 5 NICs and want to use 3 of the NICs for a team, and then configure cluster, live migration and host management as virtual network adapters. I have created all my logical networks, and a port profile with the uplink defined as the team and the networks selected. I created a logical switch and associated the port profile. When I deploy the logical switch and create the virtual network adapters, the logical switch works for VMs and my management NIC works as well. The problem is that the cluster and live migration virtual NICs do not work. The correct VLANs get pulled in for the corresponding networks, and if I run get-vmnetworkadaptervlan it shows cluster and live migration in VLANs 14 and 15, which is correct. However, the NICs do not work at all.
I finally decided to do this via the host in PowerShell and everything works fine, which means this is definitely an issue with VMM. I then imported the host into VMM again, but now I cannot use any of the objects I created in VMM and have to use a standard switch.
I am really losing faith in VMM fast.
Hosts are 2012 R2 and VMM is 2012 R2, all fresh builds with the latest drivers.
Thanks
Have you checked our whitepaper http://gallery.technet.microsoft.com/Hybrid-Cloud-with-NVGRE-aa6e1e9a for how to configure this through VMM?
Are you using static IP address assignment for those vNICs?
Are you sure you are teaming the correct physical adapters, where the VLANs are trunked through the connected ports?
Note: if you create the teaming configuration outside of VMM and then import the hosts into VMM, VMM will not recognize the configuration.
The details should be all in this whitepaper.
-kn
Kristian (Virtualization and some coffee: http://kristiannese.blogspot.com ) -
I am unable to live migrate via SCVMM 2012 R2 to one host in our 5-node cluster. The job fails with the errors below.
Error (10698)
The virtual machine () could not be live migrated to the virtual machine host () using this cluster configuration.
Recommended Action
Check the cluster configuration and then try the operation again.
Information (11037)
There currently are no network adapters with network optimization available on host.
The host properties indicate network optimization is available, as shown in the screenshot below.
Any guidance on things to check is appreciated.
Thanks,
Glenn
Here is a snippet of the cluster log from the current VM owner node of the failed migration:
00000e50.000025c0::2014/02/03-13:16:07.495 INFO [RHS] Resource Virtual Machine Configuration VMNameHere called SetResourceLockedMode. LockedModeEnabled0, LockedModeReason0.
00000b6c.00001a9c::2014/02/03-13:16:07.495 INFO [RCM] HandleMonitorReply: LOCKEDMODE for 'Virtual Machine Configuration VMNameHere', gen(0) result 0/0.
00000e50.000025c0::2014/02/03-13:16:07.495 INFO [RHS] Resource Virtual Machine VMNameHere called SetResourceLockedMode. LockedModeEnabled0, LockedModeReason0.
00000b6c.00001a9c::2014/02/03-13:16:07.495 INFO [RCM] HandleMonitorReply: LOCKEDMODE for 'Virtual Machine VMNameHere', gen(0) result 0/0.
00000b6c.00001a9c::2014/02/03-13:16:07.495 INFO [RCM] HandleMonitorReply: INMEMORY_NODELOCAL_PROPERTIES for 'Virtual Machine VMNameHere', gen(0) result 0/0.
00000b6c.000020ec::2014/02/03-13:16:07.495 INFO [GEM] Node 3: Sending 1 messages as a batched GEM message
00000e50.000025c0::2014/02/03-13:16:07.495 INFO [RES] Virtual Machine Configuration <Virtual Machine Configuration VMNameHere>: Current state 'MigrationSrcWaitForOffline', event 'MigrationSrcCompleted', result 0x8007274d
00000e50.000025c0::2014/02/03-13:16:07.495 INFO [RES] Virtual Machine Configuration <Virtual Machine Configuration VMNameHere>: State change 'MigrationSrcWaitForOffline' -> 'Online'
00000e50.000025c0::2014/02/03-13:16:07.495 INFO [RES] Virtual Machine <Virtual Machine VMNameHere>: Current state 'MigrationSrcOfflinePending', event 'MigrationSrcCompleted', result 0x8007274d
00000e50.000025c0::2014/02/03-13:16:07.495 INFO [RES] Virtual Machine <Virtual Machine VMNameHere>: State change 'MigrationSrcOfflinePending' -> 'Online'
00000e50.00002080::2014/02/03-13:16:07.510 ERR [RES] Virtual Machine <Virtual Machine VMNameHere>: Live migration of 'Virtual Machine VMNameHere' failed.
Virtual machine migration operation for 'VMNameHere' failed at migration source 'SourceHostNameHere'. (Virtual machine ID 6901D5F8-B759-4557-8A28-E36173A14443)
The Virtual Machine Management Service failed to establish a connection for a Virtual Machine migration with host 'DestinationHostNameHere': No connection could be made because the tar
00000e50.00002080::2014/02/03-13:16:07.510 ERR [RHS] Resource Virtual Machine VMNameHere has cancelled offline with error code 10061.
00000b6c.000020ec::2014/02/03-13:16:07.510 INFO [RCM] HandleMonitorReply: OFFLINERESOURCE for 'Virtual Machine VMNameHere', gen(0) result 0/10061.
00000b6c.000020ec::2014/02/03-13:16:07.510 INFO [RCM] Res Virtual Machine VMNameHere: OfflinePending -> Online( StateUnknown )
00000b6c.000020ec::2014/02/03-13:16:07.510 INFO [RCM] TransitionToState(Virtual Machine VMNameHere) OfflinePending-->Online.
00000b6c.00001a9c::2014/02/03-13:16:07.510 INFO [GEM] Node 3: Sending 1 messages as a batched GEM message
00000b6c.000020ec::2014/02/03-13:16:07.510 INFO [RCM] rcm::QueuedMovesHolder::VetoOffline: (VMNameHere with flags 0)
00000b6c.000020ec::2014/02/03-13:16:07.510 INFO [RCM] rcm::QueuedMovesHolder::RemoveGroup: (VMNameHere) GroupBeingMoved: false AllowMoveCancel: true NotifyMoveFailure: true
00000b6c.000020ec::2014/02/03-13:16:07.510 INFO [RCM] VMNameHere: Removed Flags 4 from StatusInformation. New StatusInformation 0
00000b6c.000020ec::2014/02/03-13:16:07.510 INFO [RCM] rcm::RcmGroup::CancelClusterGroupOperation: (VMNameHere)
00000b6c.00001a9c::2014/02/03-13:16:07.510 INFO [GEM] Node 3: Sending 1 messages as a batched GEM message
00000b6c.000021a8::2014/02/03-13:16:07.510 INFO [GUM] Node 3: executing request locally, gumId:3951, my action: /dm/update, # of updates: 1
00000b6c.000021a8::2014/02/03-13:16:07.510 INFO [GEM] Node 3: Sending 1 messages as a batched GEM message
00000b6c.00001a9c::2014/02/03-13:16:07.510 INFO [GEM] Node 3: Sending 1 messages as a batched GEM message
00000b6c.000022a0::2014/02/03-13:16:07.510 INFO [RCM] moved 0 tasks from staging set to task set. TaskSetSize=0
00000b6c.000022a0::2014/02/03-13:16:07.510 INFO [RCM] rcm::RcmPriorityManager::StartGroups: [RCM] done, executed 0 tasks
00000b6c.00000dd8::2014/02/03-13:16:07.510 INFO [RCM] ignored non-local state Online for group VMNameHere
00000b6c.000021a8::2014/02/03-13:16:07.526 INFO [GUM] Node 3: executing request locally, gumId:3952, my action: /dm/update, # of updates: 1
00000b6c.000021a8::2014/02/03-13:16:07.526 INFO [GEM] Node 3: Sending 1 messages as a batched GEM message
00000b6c.000018e4::2014/02/03-13:16:07.526 INFO [RCM] HandleMonitorReply: INMEMORY_NODELOCAL_PROPERTIES for 'Virtual Machine VMNameHere', gen(0) result 0/0.
No entry is made on the cluster log of the destination node.
To me this means the nodes cannot talk to each other, but I don’t know why.
They are on the same domain, their server names resolve properly, and they can ping each other both by name and by IP. -
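Following up on the 10061 symptom above (Winsock 10061 is "connection refused"): since name resolution and ping both work, the next thing to rule out is TCP reachability of the live-migration listener itself. A minimal probe, written as a sketch — the host name is the placeholder from the log, and 6600 is the usual default Hyper-V live-migration port, which you should verify against your own configuration:

```python
import socket

def tcp_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Example (placeholder host; verify your configured migration port first):
# tcp_reachable("DestinationHostNameHere", 6600)
```

If this returns False while ping works, a firewall rule or a listener that is not running on the destination is the likely culprit rather than name resolution.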
Live Migration and private network
Is it a best practice to set up a private network between the nodes in a pool (reserving a few network cards and switch ports for it), so there is a dedicated network for the traffic generated e.g. by live migration and/or the OCFS2 heartbeat? I was wondering why such a setup is generally recommended in other virtualization solutions but apparently isn't considered strictly necessary in OVM... Why? Are there any docs regarding this? I couldn't find any.
Thanks!
Hi Roynor,
regarding the physical separation between management+hypervisor and the guest VMs, it's now implemented and working...
The next doubt on my list of doubts :-) at this point is:
I could easily set up ONE MORE dedicated bond, create a bridge with a private IP on it on each server (e.g. 10.xxx.xxx.xxx), and then create a private VLAN completely isolated from the rest of the world.
I'd put the physical switch ports that the private bonds/bridges connect to on the same VLAN ID.
But:
- How can I be sure that this network WILL actually be used by the relevant traffic? If I'm not wrong, when you set up e.g. a physical RAC cluster, at a certain point you are prompted to choose which network to use for the heartbeat (it will be marked as PRIVATE) and which network will be used by client traffic (PUBLIC).
In Oracle VM such a setting does not exist... Neither during installation, nor in VM Manager, nor anywhere else.
- Apart from security, I suspect that problems could arise during heavy VM migrations: if the network gets saturated, there is a chance the OCFS2 heartbeat would somehow be "lost", messing up HA etc. This is, at least, the reason why a private network is highly recommended in a RAC setup.
- I finally found that doc you mention from IBM (thanks for pointing it out!), but my opinion is that THEIR INTENTION was to separate the traffic in the same way I'd like to; there is simply NO PROOF that such a setup would work... They do not mention where you can specify which traffic you want on which network...
This is a very important point... I'm wondering why there is this lack of information.
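To put rough numbers on the saturation concern raised above: copying a VM's RAM during live migration can occupy a shared link for minutes, during which a co-located OCFS2 heartbeat competes for the same bandwidth. A back-of-the-envelope sketch — the 70% link efficiency and the 32 GiB guest size are illustrative assumptions, not measurements, and real migrations also re-copy dirtied pages, so this is a lower bound:

```python
def migration_seconds(vm_ram_gib, link_gbps, efficiency=0.7):
    """Rough floor on the time to copy a VM's RAM over a network link.

    efficiency is the assumed fraction of line rate actually achieved.
    """
    bytes_to_copy = vm_ram_gib * 2**30
    bytes_per_second = link_gbps * 1e9 / 8 * efficiency
    return bytes_to_copy / bytes_per_second

# A 32 GiB guest over a shared 1 GbE link keeps the link busy for
# several minutes; over 10 GbE it drops to well under a minute.
print(round(migration_seconds(32, 1)))
```

Several minutes of a near-saturated shared link is long enough to matter for heartbeat timing, which is the usual argument for a dedicated migration/heartbeat network.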
Thanks for your feedback, btw
Edited by: rlomba on Dec 17, 2009 6:16 AM -
Live migration suddenly won't work on 3.1.1
I've done many live migrations over the last few months, with no problems. Suddenly, they don't work any more. Here's what I see:
I start the migration from the Manager.
The VM immediately disappears from the list of VMs on the source server, and appears on the destination server.
The job shows "in progress", and it NEVER completes.
The "% complete" for the job never says anything but ZERO.
If I look at the 'details' on the 'in progress' migration job, it says:
Job Construction Phase
begin()
Appended operation 'Bridge Configure Operation' to object '0004fb00002000005c945b4212271249 (network.BondPort (2) in oravm3.acbl.net)'.
Appended operation 'Virtual Machine Migrate' to object '0004fb000006000066c8e49bc5ab54b0 (jiplcm01)'.
commit()
Completed Step: COMMIT
Objects and Operations
Object (IN_USE): [Server] e2:a3:70:c6:67:89:e1:11:bb:8e:e4:1f:13:eb:92:b2 (oravm3.acbl.net)
Object (IN_USE): [BondPort] 0004fb00002000005c945b4212271249 (network.BondPort (2) in oravm3.acbl.net)
Operation: Bridge Configure Operation
Object (IN_USE): [Server] 92:0f:60:b4:84:91:e1:11:aa:cb:e4:1f:13:eb:d2:3a (oravm2.acbl.net)
Object (IN_USE): [VirtualMachine] 0004fb000006000066c8e49bc5ab54b0 (jiplcm01)
Operation: Virtual Machine Migrate
Job Running Phase at 13:10 on Wed, Jan 2, 2013
Job Participants: [92:0f:60:b4:84:91:e1:11:aa:cb:e4:1f:13:eb:d2:3a (oravm2.acbl.net)]
Actioner
Starting operation 'Bridge Configure Operation' on object '0004fb00002000005c945b4212271249 (network.BondPort (2) in oravm3.acbl.net)'
Bridge [0004fb001018c4c] already exists (and should exist) on interface [bond1] on server [oravm3.acbl.net]; skipping bridge creation
Completed operation 'Bridge Configure Operation' completed with direction ==> DONE
Starting operation 'Virtual Machine Migrate' on object '0004fb000006000066c8e49bc5ab54b0 (jiplcm01)'
Job failed commit (internal) due to Caught during invoke method: java.net.SocketException: Socket closed
Wed Jan 02 13:11:36 EST 2013
com.oracle.odof.exception.InternalException: Caught during invoke method: java.net.SocketException: Socket closed
Wed Jan 02 13:11:36 EST 2013
at com.oracle.odof.OdofExchange.invokeMethod(OdofExchange.java:956)
at com.oracle.ovm.mgr.api.job.InternalJobProxy.objectCommitter(Unknown Source)
at com.oracle.ovm.mgr.api.job.JobImpl.internalJobCommit(JobImpl.java:281)
at com.oracle.ovm.mgr.api.job.JobImpl.commit(JobImpl.java:651)
at com.oracle.ovm.mgr.faces.model.JobEO$CommitWork.run(JobEO.java:233)
at weblogic.work.j2ee.J2EEWorkManager$WorkWithListener.run(J2EEWorkManager.java:183)
at weblogic.work.ExecuteThread.execute(ExecuteThread.java:209)
at weblogic.work.ExecuteThread.run(ExecuteThread.java:178)
Caused by: java.net.SocketException: Socket closed
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.read(SocketInputStream.java:129)
at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
at java.io.BufferedInputStream.read(BufferedInputStream.java:237)
at java.io.ObjectInputStream$PeekInputStream.peek(ObjectInputStream.java:2248)
at java.io.ObjectInputStream$BlockDataInputStream.peek(ObjectInputStream.java:2541)
at java.io.ObjectInputStream$BlockDataInputStream.peekByte(ObjectInputStream.java:2551)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1296)
at java.io.ObjectInputStream.readObject(ObjectInputStream.java:350)
at com.oracle.odof.io.AbstractSocket.receive(AbstractSocket.java:220)
at com.oracle.odof.io.AbstractSocket.receive(AbstractSocket.java:173)
at com.oracle.odof.OdofExchange.send(OdofExchange.java:473)
at com.oracle.odof.OdofExchange.send(OdofExchange.java:427)
at com.oracle.odof.OdofExchange.invokeMethod(OdofExchange.java:938)
... 7 more
Anyone have any idea what the problem is? What can I do to gather useful information?
Job failed commit (internal) due to Caught during invoke method: java.net.SocketException: Socket closed
at com.oracle.ovm.mgr.api.job.InternalJobProxy.objectCommitter(Unknown Source)
It looks to me like either the target server does not have access to everything needed to complete the migration (access to the shared pool, access to the shared storage, etc.), or the target server is having an issue communicating with the VM Manager.
I wish such errors were more descriptive, but I believe the "unknown source" and "socket closed" indicate such a problem. -
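One quick triage step for stuck jobs like the ones quoted above: the job log pairs each "Starting operation" line with a "Completed operation" line, so diffing the two sets shows which operation never finished. A small sketch — the regexes simply match the log-line formats quoted in these threads:

```python
import re
from collections import Counter

def pending_operations(log_text):
    """Return (operation, object) pairs that started but never completed."""
    started = re.findall(r"Starting operation '([^']+)' on object '([^']+)'",
                         log_text)
    done = Counter(re.findall(r"Completed operation '([^']+)'", log_text))
    pending = []
    for name, obj in started:
        if done[name] > 0:
            done[name] -= 1  # this start has a matching completion
        else:
            pending.append((name, obj))
    return pending
```

Run against the stuck job's log above, this reports 'Virtual Machine Migrate' as the operation that never completed, which narrows the investigation to the migrate step on the participant server rather than the bridge configuration.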
Hyper-V Live Migration Compatibility with Hyper-V Replica/Hyper-V Recovery Manager
Hi,
Is Hyper-V Live Migration compatible with Hyper-V Replica/Hyper-V Recovery
Manager?
I have 2 Hyper-V clusters in my datacenter, both using CSVs on Fibre Channel arrays. The clusters were created and are managed using the same "System Center 2012 R2 VMM" installation. My goal is to eventually move one of these clusters to a remote DR site. Both sites are connected/will be connected to each other through dark fibre.
I manually configured Hyper-V Replica in Failover Cluster Manager on both clusters and started replicating some VMs using Hyper-V Replica.
Now, every time I attempt to use SCVMM to do a live migration of a VM that is protected by Hyper-V Replica to another host within the same cluster, the Migrate VM Wizard gives me the following "Rating Explanation" error:
"The virtual machine virtual machine name which requires Hyper-V Recovery Manager protection is going to be moved using the type "Live". This could break the recovery protection status of the virtual machine."
When I ignore the error and do the live migration anyway, it completes successfully with the info above. There doesn't seem to be any impact on the VM or its replication.
When a host shuts down or is put into maintenance, the VM migrates successfully, again with no noticeable impact on users or replication.
When I stop replication of the VM, the error goes away.
Initially, I thought this error occurred because I had manually configured the replication between both clusters using Hyper-V Replica in Failover Cluster Manager (instead of using Hyper-V Recovery Manager).
However, even after configuring and using Hyper-V Recovery Manager, I still get the same error. The error does not seem to have any impact on the high availability of my VM or on its replication. Live migrations still complete successfully and replication seems to carry on without any issues.
However, it now has me concerned that a live migration may one day occur and break replication of my VMs between both clusters.
I have searched and searched and searched, and I cannot find any mention in official or unofficial Microsoft channels of the compatibility of these two features.
I know VMware vSphere Replication and vMotion are compatible with each other: http://pubs.vmware.com/vsphere-55/index.jsp?topic=%2Fcom.vmware.vsphere.replication_admin.doc%2FGUID-8006BF58-6FA8-4F02-AFB9-A6AC5CD73021.html.
Please confirm: are Hyper-V Live Migration and Hyper-V Replica compatible with each other?
If they are, any link to further documentation on configuring these services so that they work in a fully supported manner will be highly appreciated.
D
This can be considered a minor GUI bug.
Let me explain. Live Migration and Hyper-V Replica are supported together on both Windows Server 2012 and 2012 R2 Hyper-V.
This is because we have the Hyper-V Replica Broker role (in a cluster), which is able to detect, receive and keep track of the VMs and their synchronizations. The replication configuration follows the VM itself.
If you try to live migrate a VM within Failover Cluster Manager, you will not get any message at all. But VMM will (as you can see) give you an error, though it should rather be an informative message instead.
Intelligent Placement (in VMM) is responsible for putting everything in your environment together to give you tips about where the VM can best run, and that is why we are seeing this message here.
I have personally reported this as a bug. I will check on this one and get back to this thread.
Update: I just spoke to one of the PMs of HRM, and they confirmed that live migration is supported and should work in this context.
Please see this thread as well: http://social.msdn.microsoft.com/Forums/windowsazure/en-US/29163570-22a6-4da4-b309-21878aeb8ff8/hyperv-live-migration-compatibility-with-hyperv-replicahyperv-recovery-manager?forum=hypervrecovmgr
-kn
Kristian (Virtualization and some coffee: http://kristiannese.blogspot.com ) -
2012 R2 Cluster and Live Migration
6-node cluster on Server 2012 R2; all VMs are Server 2012 R2
4 Fibre Channel SANs
Moving and live migration worked fine in Failover Cluster Manager.
But this time we were trying to do it with SCVMM 2012 R2, just moving one VM (Gen2).
Of course, it failed at 99%:
Error (12711)
VMM cannot complete the WMI operation on the server (whatever) because of an error: [MSCluster_Resource.Name="SCVMM VMHost"] The cluster resource could not be found.
The cluster resource could not be found (0x138F)
Recommended Action
Resolve the issue and then try the operation again.
How do I fix this? The VM is still running. The two VHDX files it was moving are smaller than the originals, but it changed the configuration file to point to the new ones, which are bad.
It says I can repair it... Redo or Undo... and of course neither of those options works.
Wait for the object to be updated automatically by the next periodic Live migrate storage of virtual machine vmhost from whatever to whatever job.
ID: 1708
The cluster has no errors, the SANs have no errors, the CSVs have no errors. The machine running SCVMM is a VM running on the cluster.
How did you create this VM? If it was created outside of VMM, I recommend doing a manual refresh of the VM first to ensure that VMM can read its attributes. Then retry the operation.
By the way, are the VMs using differencing disks? Any checkpoints associated with them?
Kristian (Virtualization and some coffee: http://kristiannese.blogspot.com )