One cluster node shows error
Hi,
This is an FTP sender channel, and in RWB I see one cluster node as red (error) and the other as green.
The error is an unknown host exception. Files are being pulled by the other node, but the first node shows the error.
Any guess why this would happen?
Thanks!
>>>Files are being pulled by other node but first node shows error.
This is fine. If you observe the error timestamp, it may not be recent, unlike the timestamp on the node that is processing.
AFAIK - At a given point your sender channel will point to only one instance.
So whenever (random behavior) the channel points to other instance, this should go away.
If you notice a different behavior then you should configure the advanced mode parameter "clusterSyncMode".
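Since the error on the red node is an unknown host exception, one thing worth ruling out is whether that node's host can resolve the FTP server name at all. A minimal Python sketch, assuming it is run on the host of each cluster node; the function name `can_resolve` and the host `ftp.example.com` are placeholders, not values from this thread:

```python
import socket

def can_resolve(host, resolver=socket.gethostbyname):
    """Return the resolved IPv4 address, or None if the name cannot be resolved.

    `resolver` is injectable only so the function can be exercised
    without touching the network.
    """
    try:
        return resolver(host)
    except socket.gaierror:
        # Raised when name resolution fails on this machine.
        return None
```

Calling `can_resolve("ftp.example.com")` on both nodes and comparing the results would show whether name resolution differs between them. It does not say anything about why the channel picks one node or the other.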
Similar Messages
-
ASM disk busy 99% only on one cluster node
Hello,
We have a three-node Oracle RAC cluster. Our DBAs called us and said they are getting OEM critical alerts for an ASM disk on one node only. I checked, and the SAN-attached drive does not show the same high utilization on either of the other two nodes. I checked the hardware and it seems fine. If the issue were with the SAN-attached disk, we would be seeing the same errors on all three nodes, since they share the same disks. The system crashed last week (alert dump in the +asm directories), and the disk has been busy ever since. I asked whether the DBA had reviewed the ADDM reports; he said he had, and that there were no suspicious-looking entries that would lead us to the root cause. CPU utilization is fine. I am not sure where to look at this point, and any help pointing me in the right direction would be appreciated. They do use RMAN; could there be a backup running against those disks on only one node? Has anyone ever seen this before?
Thank you,
Benita Ulisano
Unix/SAN Team
Chicago Public Schools
Hi Harish,
Thank you for responding. To answer your question, yes, the disks are all of the same spec and are shared among the three cluster nodes. The ASM disk sdw1 is the one with the issue.
Problem Node: coefsdb02
three nodes in RAC cluster
coefsdb01, coefsdb02, coefsdb03
iostat results for all three nodes - same disk
Node       Device  rrqm/s  wrqm/s  r/s   w/s   rsec/s  wsec/s  avgrq-sz  avgqu-sz  await  svctm   %util
coefsdb01  sdw1    0.00    1.71    0.12  0.58  1.27    18.78   28.63     0.01      13.38  1.75    0.12
coefsdb02  sdw1    0.11    0.02    4.00  0.62  305.84  21.72   70.93     2.96      12.58  211.95  97.88
coefsdb03  sdw1    0.21    0.01    4.70  0.33  224.05  13.52   47.22     0.05      10.11  6.15    3.09
The dba(s) run RMAN backups, but only on coefsdb01.
Benita -
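To make the comparison between nodes explicit, here is a small Python sketch that flags the node whose %util for sdw1 stands out. The svctm and %util figures are copied from the iostat output in the post; the 90% threshold and the function name `busy_nodes` are arbitrary assumptions for illustration.

```python
# svctm and %util for sdw1, copied from the iostat output in the post.
SDW1_STATS = {
    "coefsdb01": {"svctm": 1.75, "util": 0.12},
    "coefsdb02": {"svctm": 211.95, "util": 97.88},
    "coefsdb03": {"svctm": 6.15, "util": 3.09},
}

def busy_nodes(stats, util_threshold=90.0):
    """Return the nodes whose %util is at or above the (assumed) threshold."""
    return sorted(node for node, s in stats.items() if s["util"] >= util_threshold)

print(busy_nodes(SDW1_STATS))  # ['coefsdb02']
```

This only restates what the numbers already show: the device is busy on coefsdb02 alone. It does not identify which process on that node is generating the I/O.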
Easiest way getting one cluster node "clean" (without messages) in conv clu
hi *,
i would like to know if any of you have (production) experience with MQ when it comes to troubleshooting...
for the case something really bad happens to one of your brokers like his file store is getting bigger and bigger (no matter why or e.g. [something similar like this|http://forums.sun.com/thread.jspa?threadID=5334175&tstart=0] )
how do you deal with this normally?
we until now do not have any clustered JMS server in our production system since our JMS clients are always configured to access single points (single JMS servers) we nowadays do it like this:
turn off all producers to the JMS server
wait some time till all messages have been consumed
stop JMS server
delete his file store
boot up JMS server again (clean)
start producers again
since we are planning to roll out a conv cluster with 15 nodes (brokers) and with several cross-referencing clients (clients are configured to use up to 3-5 servers to ensure they are always served), we do not know how we would do the same for one cluster node of this cluster.
we can not do the exact thing as i mentioned above since i do not know which clients are at this time bound to which server.
any idea?
regards chris
hi linda,
thanks for your feedback.
we will implement it like you suggested. by scripting a drain scenario with imqcmd.
one thing i do not understand about your last post is:
when you say: "We'll look @ adding something that removes all consumers on a service in the next release (to make this easier) "
how would this feature help me to drain a broker?
ideally for me it would be like that:
1) quiesce a broker
2) kill producers (force them to failover)
3) wait till all messages are gone
4) kill consumers (force them to failover)
5) stop broker.
so my "feature request" would be killing connections according to what they are (consumers / producers / maybe all).
do i have to log an enhancement request for this to make your life easier?
regards chris -
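The drain order described in the post above (quiesce, fail over producers, wait for the queue to empty, fail over consumers, stop) can be sketched as a small Python driver. Everything here is hypothetical: `broker` is an adapter object you would write yourself around imqcmd or an admin API, and none of its method names are real MQ functions.

```python
import time

def drain_broker(broker, poll_seconds=5.0, timeout_seconds=600.0, sleep=time.sleep):
    """Run the drain sequence from the post against a broker-like object.

    `broker` is a hypothetical adapter; its methods are placeholders for
    whatever commands you script, not an existing MQ API.
    """
    broker.quiesce()                       # 1) quiesce the broker
    broker.disconnect_producers()          # 2) force producers to fail over
    waited = 0.0
    while broker.pending_messages() > 0:   # 3) wait until all messages are gone
        if waited >= timeout_seconds:
            raise TimeoutError("broker still has pending messages")
        sleep(poll_seconds)
        waited += poll_seconds
    broker.disconnect_consumers()          # 4) force consumers to fail over
    broker.stop()                          # 5) stop the broker
```

The timeout is there so a stuck consumer does not block the script forever; whether to abort or to stop the broker anyway at that point is a policy decision the sketch leaves open.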
We have an Oracle 11gR1 database on RAC nodes with Oracle ASM. Recently our database server crashed and we are trying to restore services.
Using snapshot technology (Business Copy), we synced all our disks at the storage level. After this, when we try to start the ASM instance on node 1, it comes up and shows all diskgroups, but on the other node it throws an error about a missing diskgroup.
ORA-15032: not all alterations performed
ORA-15040: diskgroup is incomplete
ORA-15042: ASM disk "5" is missing
Expert please share your views.
Thanks,
Tushar
The I/O fabric layer on the other node failed to mount all storage LUNs - resulting in ASM being unable to mount a diskgroup as there are missing disks in that group.
Rebooting is exactly what could be needed to reset the h/w and infrastructure used by that node, in order for it to see all the storage disks again. As node 1 sees all storage disks (and is working), the disk itself on the storage system is intact and usable.
What is the o/s? What is the fabric layer? What is used on o/s for dealing with the I/O fabric layer? -
SQL LOG Backup failed in one Cluster Node
I have a 2-node SQL failover cluster, NODE 01 and NODE 02, and have configured a SQL log backup job via SQL log shipping.
When the SQL service is running on node 02, the backup job works without any issues. Once it moves to node 01, the job fails with the error below:
Executed as user: <domain>\administrator. The process could not be created for step 1 of job 0xAC90A0F3623AE44285089E9EF53B12C7 (reason: The system cannot find the file specified). The step failed.
Does anyone have a fix for this?
Thanx
Do the SQL Server Agent services on both nodes run under the same domain account?
Are you sure that the path location is correct?
Best Regards, Uri Dimant, SQL Server MVP -
SCVMM losing connection to cluster nodes
Hey guys'n girls, I hope this is the right forum for this question. I already opened a ticket at MS support as well because it's impacting our production environment indirectly, but even after a week there's been no contact. Losing faith in MS support there
The problem we're having is that in SCVMM a host enters the 'needs attention' state, with WinRM error 0x80338126. I guess it has something to do with the network or with Kerberos, and I've found some info on it, but I still haven't been able to solve it. Do you guys have any ideas?
Problem summary:
We are seeing an issue on our new hyper-v platform. The platform should have been in production last week, but this issue is delaying our project as we can't seem to get it stable.
The problem we are experiencing is that SCVMM loses the connection to some of the Hyper-V nodes, not one specific node. Last week it happened to two nodes, and today it happened to another node. I see issues with WinRM, and I suspect it has something to do with Kerberos. See the bottom of this post for background details and software versions.
The host gets the status 'needs attention', and if you look at the status of the machine, WinRM gives an error. The error is:
Error (2916)
VMM is unable to complete the request. The connection to the agent cc1-hyp-10.domaincloud1.local was lost.
WinRM: URL: [http://cc1-hyp-10.domaincloud1.local:5985], Verb: [ENUMERATE], Resource: [http://schemas.microsoft.com/wbem/wsman/1/wmi/root/cimv2/Win32_Service], Filter: [select * from Win32_Service where Name="WinRM"]
Unknown error (0x80338126)
Recommended Action
Ensure that the Windows Remote Management (WinRM) service and the VMM agent are installed and running and that a firewall is not blocking HTTP/HTTPS traffic. Ensure that VMM server is able to communicate with cc1-hyp-10.domaincloud1.local over WinRM by successfully running the following command:
winrm id -r:cc1-hyp-10.domaincloud1.local
This problem can also be caused by a Windows Management Instrumentation (WMI) service crash. If the server is running Windows Server 2008 R2, ensure that KB 982293 (http://support.microsoft.com/kb/982293) is installed on it.
If the error persists, restart cc1-hyp-10.domaincloud1.local and then try the operation again.
Refer to http://support.microsoft.com/kb/2742275 for more details.
Doing a simple test from the VMM server to the problematic cluster node shows this error:
PS C:\> hostname
CC1-VMM-01
PS C:\> winrm id -r:cc1-hyp-10.domaincloud1.local
WSManFault
Message = WinRM cannot complete the operation. Verify that the specified computer name is valid, that the computer is accessible over the network, and that a firewall exception for the WinRM service is enabled and allows access from this
computer. By default, the WinRM firewall exception for public profiles limits access to remote computers within the same local subnet.
Error number: -2144108250 0x80338126
WinRM cannot complete the operation. Verify that the specified computer name is valid, that the computer is accessible over the network, and that a firewall exception for the WinRM service is enabled and allows access from this computer. By default, the WinRM
firewall exception for public profiles limits access to remote computers within the same local subnet.
I CAN connect from other hosts to this problematic cluster node:
PS C:\> hostname
CC1-HYP-16
PS C:\> winrm id -r:cc1-hyp-10.domaincloud1.local
IdentifyResponse
ProtocolVersion =
http://schemas.dmtf.org/wbem/wsman/1/wsman.xsd
ProductVendor = Microsoft Corporation
ProductVersion = OS: 6.3.9600 SP: 0.0 Stack: 3.0
SecurityProfiles
SecurityProfileName =
http://schemas.dmtf.org/wbem/wsman/1/wsman/secprofile/http/spnego-kerberos
And I can connect from the vmm server to all other cluster nodes:
PS C:\> hostname
CC1-VMM-01
PS C:\> winrm id -r:cc1-hyp-11.domaincloud1.local
IdentifyResponse
ProtocolVersion =
http://schemas.dmtf.org/wbem/wsman/1/wsman.xsd
ProductVendor = Microsoft Corporation
ProductVersion = OS: 6.3.9600 SP: 0.0 Stack: 3.0
SecurityProfiles
SecurityProfileName =
http://schemas.dmtf.org/wbem/wsman/1/wsman/secprofile/http/spnego-kerberos
So at this point only the test from the cc1-vmm-01 to cc1-hyp-10 seems to be problematic.
I followed the steps on the page https://support.microsoft.com/kb/2742275 (which is referred to above). I tried the VMMCA, but I can't really get it working the way I want, or it seems to give outdated recommendations.
I tried checking for duplicate SPNs by running setspn -x on the affected machines. No results (although I do not understand what an SPN is or how it works). I rebuilt the performance counters.
I tried setting 'sc config winrm type= own' as described in [http://blinditandnetworkadmin.blogspot.nl/2012/08/kb-how-to-troubleshoot-needs-attention.html].
If I reboot this cc1-hyp-10 machine, it will start working perfectly again. However, then I can't troubleshoot the issue, and it will happen again.
I want this problem to be solved, so vmm never loses connection to the hypervisors it's managing again!
Background information:
We've set up a platform with Hyper-V to run a VM workload. The platform consists of the following hardware:
2 Dell R620's with 32GB of RAM, running hyper-v to virtualize the cloud management layer (DC's, VMM, SQL). These machines are called cc1-hyp-01 and cc1-hyp-02. They run the management vm's like cc1-dc-01/02, cc1-sql-01, cc1-vmm-01, etc. The names are self-explanatory.
The VMM machine is NOT clustered.
8 Dell M620 blades with 320GB of RAM, running hyper-v to virtualize the customer workload. The machines are called cc1-hyp-10 through cc1-hyp-17. They are in a cluster.
2 Equallogic units form a SAN (premium storage), and we have a Dell R515 running iscsi target (budget storage).
We have Dell Force10 switches and Cisco C3750X switches to connect everything together (mostly 10GB links).
All hosts run Windows Server 2012 R2 Datacenter edition. The VMM server runs System Center Virtual Machine Manager 2012 R2.
All the latest Windows updates are installed on every host. There are no firewalls between any host (vmm and hypervisors) at this level. Windows firewalls are all disabled. No antivirus software is installed, no symantec software is installed.
The only non-standard software that is installed is the Dell Host Integration Tools 4.7.1, Dell Openmanage Server Administrator, and some small stuff like 7-zip, bginfo, net-snap, etc.
The SCVMM service is running under the domain account DOMAINCLOUD1\scvmm. This machine is in the local administrators group of each cluster node.
On top of this cloud layer we're running the tenant layer with a lot of vm's for a specific customer (although they are all off now).
I think I found the culprit: after an hour of analyzing Wireshark dumps, I found that the VMM server had jumbo frames enabled on the management interface to the hosts (and the underlying infrastructure does not). Now my winrm commands work again.
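An MTU mismatch like the one found here can usually be confirmed with a ping that has the Don't Fragment flag set and a payload sized to the MTU being tested. The Python sketch below computes that payload size and builds the corresponding Windows ping arguments (`-f` for Don't Fragment, `-l` for payload size, `-n` for count). The 9000-byte jumbo MTU is an assumption, since the post does not state the value that was configured, and the function names are illustrative.

```python
IP_HEADER_BYTES = 20    # IPv4 header without options
ICMP_HEADER_BYTES = 8   # ICMP echo header

def max_ping_payload(mtu):
    """Largest ICMP echo payload that fits in one unfragmented IPv4 packet."""
    return mtu - IP_HEADER_BYTES - ICMP_HEADER_BYTES

def windows_ping_args(host, mtu):
    """Arguments for one Windows ping with Don't Fragment set and a fixed payload."""
    return ["ping", "-f", "-l", str(max_ping_payload(mtu)), "-n", "1", host]

print(max_ping_payload(1500))  # 1472
print(max_ping_payload(9000))  # 8972 (9000 is an assumed jumbo MTU)
print(" ".join(windows_ping_args("cc1-hyp-10.domaincloud1.local", 9000)))
```

If the 1472-byte ping succeeds and the larger one fails, something on the path does not pass jumbo frames. That would fit the behaviour described, where small requests from other hosts worked while the VMM server's traffic did not.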
-
Processing in Multiple Cluster Nodes
Hi All,
In our PI system we have 2 Java nodes due to some requirement. When the communication channel runs and we check the message log, in one Cluster node we have a successful message. In other Cluster Node we have an error message that says "File not found".
The file processing is completing successfully on one cluster node, but I wanted to know if there is any way to suppress the processing of the same file by the same channel on the other node, for example some setting in administration or in IB.
Is there any way to get this done by some setting?
Thanks,
Rashmi.
Hello!
As per note #801926, please set the clusterSyncMode parameter to the value LOCK on the Advanced tab of the communication channel.
And also check the entries 4 and 48 of the FAQ note #821267:
4. FTP Sender File Processing in Cluster Environment
48. File System(NFS) File Sender Processing in Cluster Environment
Best regards,
Lucas -
Show the company I work for that we should have one cluster
The company I work for has several projects where we are going to use RAC. The issue I'm having is that they want each database to be in its own hardware cluster. I'm trying to show them that we should cluster all the hardware together and run small RAC databases in the one cluster.
Does anyone know of a good document that explains why you should have one cluster versus multiple clusters?
Here is an example of what they are thinking of doing. Note that this is all separate hardware: cluster1 will not be clustered with cluster2. I'm trying to show them that if we cluster it all together, we get more power and may not need so much hardware:
Project 1
cluster1_dev
2 nodes
cluster2_qa
4 nodes
cluster3_prd
4 nodes
Project 2
cluster2_dev
2 nodes
cluster2_qa
4 nodes
cluster2_prd
4 nodes
What I’m proposing is the following setup:
Cluster_dev
3 nodes
Project1 instance cluster on all 3 nodes
Project2 instance cluster on all 3 nodes
Cluster_qa
6 nodes
Project1 instance cluster on all 6 nodes
Project2 instance cluster on all 6 nodes
Cluster_prd
6 nodes
Project1 instance cluster on all 6 nodes
Project2 instance cluster on all 6 nodes
I am always impressed when rac is chosen without stating which types of failure are to be covered.
maybe you really don't need rac ...
moreover, rac is imho not really oriented toward 2 nodes but toward a grid.
a lot of small servers in the same cluster is for me the way to go, or the way to think.
Services features were introduced to allow one rac db to be shared by multiple applications. -
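For the hardware argument, the node counts in the two layouts from the question can simply be added up. A short Python sketch using only the numbers listed in the post (variable names are illustrative):

```python
# Node counts per environment, taken from the lists in the post.
separate = {
    "Project 1": {"dev": 2, "qa": 4, "prd": 4},
    "Project 2": {"dev": 2, "qa": 4, "prd": 4},
}
shared = {"dev": 3, "qa": 6, "prd": 6}

separate_total = sum(sum(envs.values()) for envs in separate.values())
shared_total = sum(shared.values())

print(separate_total, shared_total, separate_total - shared_total)  # 20 15 5
```

So the proposed layout uses 15 nodes instead of 20. Whether 15 shared nodes actually carry both workloads depends on sizing, which the node count alone does not show.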
Hi All,
I am facing the below error while installing Oracle RAC in Silent Mode.
SEVERE: There are no common subnets represented by network interfaces across all cluster nodes.
SEVERE: [FATAL] [INS-40925] One or more nodes have interfaces not configured with a subnet that is common across all cluster nodes.
CAUSE: Not all nodes have network interfaces that are configured on subnets that are common to all nodes in the cluster.
ACTION: Ensure all cluster nodes have a public interface defined with the same subnet accessible by all nodes in the cluster.
My /etc/hosts is given below.
127.0.0.1 localhost localhost.localdomain
#Public
192.168.1.101 rac1 rac1.localdomain
192.168.1.102 rac2 rac2.localdomain
#Private
192.168.2.101 rac1-priv rac1-priv.localdomain
192.168.2.102 rac2-priv rac2-priv.localdomain
#Virtual
192.168.1.103 rac1-vip rac1-vip.localdomain
192.168.1.104 rac2-vip rac2-vip.localdomain
#SCAN
192.168.1.105 rac-scan rac-scan.localdomain
Could you please help me to get rid of the error INS-40925? Any ideas?
Hi Ramesh,
Please find the result of ifconfig -a from both nodes RAC1 & RAC2.
ifconfig -a in RAC1
[oracle@rac1 Desktop]$ ifconfig -a
eth0 Link encap:Ethernet HWaddr 08:00:27:17:7A:D5
inet addr:192.168.1.101 Bcast:192.168.1.255 Mask:255.255.255.0
inet6 addr: fe80::a00:27ff:fe17:7ad5/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:102 errors:0 dropped:0 overruns:0 frame:0
TX packets:48 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:25472 (24.8 KiB) TX bytes:3322 (3.2 KiB)
Interrupt:19 Base address:0xd020
eth1 Link encap:Ethernet HWaddr 08:00:27:C0:AC:DB
inet addr:192.168.2.101 Bcast:192.168.2.255 Mask:255.255.255.0
inet6 addr: fe80::a00:27ff:fec0:acdb/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:4 errors:0 dropped:0 overruns:0 frame:0
TX packets:12 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:240 (240.0 b) TX bytes:816 (816.0 b)
Interrupt:16 Base address:0xd240
lo Link encap:Local Loopback
inet addr:127.0.0.1 Mask:255.0.0.0
inet6 addr: ::1/128 Scope:Host
UP LOOPBACK RUNNING MTU:16436 Metric:1
RX packets:56 errors:0 dropped:0 overruns:0 frame:0
TX packets:56 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:6394 (6.2 KiB) TX bytes:6394 (6.2 KiB)
virbr0 Link encap:Ethernet HWaddr 52:54:00:CC:BD:FB
inet addr:192.168.122.1 Bcast:192.168.122.255 Mask:255.255.255.0
UP BROADCAST MULTICAST MTU:1500 Metric:1
RX packets:0 errors:0 dropped:0 overruns:0 frame:0
TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:0 (0.0 b) TX bytes:0 (0.0 b)
virbr0-nic Link encap:Ethernet HWaddr 52:54:00:CC:BD:FB
BROADCAST MULTICAST MTU:1500 Metric:1
RX packets:0 errors:0 dropped:0 overruns:0 frame:0
TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:500
RX bytes:0 (0.0 b) TX bytes:0 (0.0 b)
ifconfig -a in RAC2
[oracle@rac2 Desktop]$ ifconfig -a
eth0 Link encap:Ethernet HWaddr 08:00:27:C9:38:82
inet addr:192.168.1.102 Bcast:192.168.1.255 Mask:255.255.255.0
inet6 addr: fe80::a00:27ff:fec9:3882/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:122 errors:0 dropped:0 overruns:0 frame:0
TX packets:59 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:32617 (31.8 KiB) TX bytes:5157 (5.0 KiB)
Interrupt:19 Base address:0xd020
eth1 Link encap:Ethernet HWaddr 08:00:27:90:B5:A0
inet addr:192.168.2.102 Bcast:192.168.2.255 Mask:255.255.255.0
inet6 addr: fe80::a00:27ff:fe90:b5a0/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:4 errors:0 dropped:0 overruns:0 frame:0
TX packets:11 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:240 (240.0 b) TX bytes:746 (746.0 b)
Interrupt:16 Base address:0xd240
lo Link encap:Local Loopback
inet addr:127.0.0.1 Mask:255.0.0.0
inet6 addr: ::1/128 Scope:Host
UP LOOPBACK RUNNING MTU:16436 Metric:1
RX packets:56 errors:0 dropped:0 overruns:0 frame:0
TX packets:56 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:6390 (6.2 KiB) TX bytes:6390 (6.2 KiB)
virbr0 Link encap:Ethernet HWaddr 52:54:00:CC:BD:FB
inet addr:192.168.122.1 Bcast:192.168.122.255 Mask:255.255.255.0
UP BROADCAST MULTICAST MTU:1500 Metric:1
RX packets:0 errors:0 dropped:0 overruns:0 frame:0
TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:0 (0.0 b) TX bytes:0 (0.0 b)
virbr0-nic Link encap:Ethernet HWaddr 52:54:00:CC:BD:FB
BROADCAST MULTICAST MTU:1500 Metric:1
RX packets:0 errors:0 dropped:0 overruns:0 frame:0
TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:500
RX bytes:0 (0.0 b) TX bytes:0 (0.0 b) -
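As a quick cross-check of the addressing, the subnets that both nodes have an interface on can be computed with Python's standard `ipaddress` module. The address and netmask pairs are copied from the ifconfig output above (loopback omitted); the helper names are illustrative. This only checks what ifconfig reports, not what the installer was told about the interfaces.

```python
import ipaddress

# IPv4 address/netmask pairs copied from the ifconfig output above (loopback omitted).
NODE_INTERFACES = {
    "rac1": {"eth0": "192.168.1.101/255.255.255.0",
             "eth1": "192.168.2.101/255.255.255.0",
             "virbr0": "192.168.122.1/255.255.255.0"},
    "rac2": {"eth0": "192.168.1.102/255.255.255.0",
             "eth1": "192.168.2.102/255.255.255.0",
             "virbr0": "192.168.122.1/255.255.255.0"},
}

def subnets(interfaces):
    """Set of networks that one node's interfaces belong to."""
    return {ipaddress.ip_interface(addr).network for addr in interfaces.values()}

def common_subnets(nodes):
    """Networks on which every node has at least one interface."""
    per_node = [subnets(ifaces) for ifaces in nodes.values()]
    return sorted(set.intersection(*per_node))

for net in common_subnets(NODE_INTERFACES):
    print(net)
```

By this check the two nodes do share the 192.168.1.0/24 and 192.168.2.0/24 subnets, so the ifconfig output by itself does not account for INS-40925.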
Cluster with 2 hosts 2012 R2
Scheduled CAU fails with:
CAU run {4EFE116C-AB49-456D-8EED-F7EDC764DA49} on cluster Cluster1 failed. Error Message:One or more errors occurred while checking the status of Windows Firewall on the cluster nodes. Review the errors for more information on how to resolve the problems.
Error Code:-2146233088 Stack: at MS.Internal.ClusterAwareUpdating.Util.<CheckFirewallsAsync>d__3a.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at Microsoft.ClusterAwareUpdating.Commands.InvokeCauRunCommand.<_ProcessCluster>d__78.MoveNext()
If I run CAU "Analyze Readiness" ALL comes as PASS
If I run CAU by hand on the same hosts with NO change to the system (not even a reboot) it finishes OK
Anybody any ideas?
Thanks
Seb
Hi,
In some cases, if you have disabled the inbound Windows Firewall rule for the "Cluster-Aware Updating" service, CAU cannot be used.
More information:
Starting with Cluster-Aware Updating: Self-Updating
http://blogs.technet.com/b/filecab/archive/2012/05/17/starting-with-cluster-aware-updating-self-updating.aspx
What is Cluster Aware Updating in Windows Server 2012? (Part 1)
http://blogs.technet.com/b/mspfe/archive/2013/02/06/what-is-cluster-aware-updating-in-windows-server-2012.aspx
Cluster-Aware Updating Overview
http://technet.microsoft.com/en-us/library/hh831694.aspx
Hope this helps.
-
Cluster with one 2 Node RAC and a Single Instance using ASM
Hi there,
i am not sure about one planned installation and want to ask whether i am on the right track.
Some Facts:
Clusterware 11g
ASM 11g
Database 10gR2
AIX 5.3
3 Machines
2 Storages DS4700
My Plan
On Node 1 and Node 2 we install a RAC Database for an ERP Software
On Node 3 we install a single Instance Database for a Logistic Software
So i will install Clusterware on all three nodes and a 3-instance ASM cluster
I create 2 Diskgroups, one for the FRA and one for the Data, both on Luns on the DS4700
The RAC-Database and the Logistic-Database are using the same Diskgroups.
Is this the way to go in these circumstances?
The alternative is, as far as i can see:
Clusterware on all 3 servers
One 2-node ASM for the ERP software
One single-node ASM for the logistics software
4 diskgroups, because of the 2 ASM setups: 2 for the RAC and 2 for the single instance.
Please give me some hints as to which way i should prefer.
My tendency is toward the first alternative. I like the idea of sharing the diskgroups across more than one database because of the easier administration.
The loads of the 2 databases are completely different; the logistics software will do almost nothing compared to the ERP software, so this shouldn't be a problem.
But maybe i am overlooking something, so please do not hesitate to tell me if i am completely wrong ;)
Thanks a lot
Jörg
Chris Slattery wrote:
why clusterware on 3rd machine ?
I'd have separate DGs but that's just me.
If you wish to install ASM, you need OCS installed on the machine, even if it is just a single node.
It is a kind of dependency: no OCS, no ASM
cu
Jörg -
VMM Thinks Cluster Node is in Maintenance
I'm running VMM 2012 SP1 (version 3.1.6020.0). The cluster in question are Windows Server 2012 Datacenter.
I performed maintenance on one of my Hyper-V failover clusters (installed the KBs in this article), and when I took one of the nodes out of maintenance I successfully migrated VMs between the two via the Failover Cluster Manager console. However, I noticed that VMM still had the exclamation mark on the cluster name. I didn't notice this until a couple of days later, and now I'm trying to do a cross-cluster migration and it's not allowing me because VMM thinks the node is in maintenance. I've tried rebooting the VMM server, refreshing the cluster, and refreshing all the VMs, with no luck.
When I go into the Failover Cluster Manager on each of the cluster nodes, both nodes show in production (not in maintenance). Any ideas?
Note: I took the node out of maintenance via the Failover Cluster Manager console and NOT through the VMM console, as the VMM server was unavailable at the time.
It is interesting that VMM was unavailable at the time you were doing this. Are you able to refresh this particular host and see if anything changes? Is the "stop maintenance mode" option available on this host from VMM?
Anyhow, the root cause here will be that the data in the VMM database is not consistent with your resources, so as a last resort you could remove and re-add your cluster, so that the database performs a clean-up of the objects.
-kn
Kristian (Virtualization and some coffee: http://kristiannese.blogspot.com ) -
Cluster node does not shutdown after "received shutdown"
Hi,
We put together an automated restart process that restarts cluster nodes across multiple servers. To shutdown a node, we use the Coherence MBeanConnector and invoke stop on object: name=Management,nodeId=<member id>. This works for most cases where member's log output shows "received shutdown", partition transfer messages and after the last primary partitions have been transferred the VM exits.
For one node however, the VM did not exit. From looking at the log file for this particular node, the primary partitions were transferred, the distributedCache thread stops showing output, but the Cluster thread continues to show activity.
Note that this node was the last VM to stop on the given server.
Has anyone seen this before or ideas on why this particular node did not exit after receiving the shutdown message?
Thanks!
Marcel.
Hi Marcel -
Please take a thread dump (via "kill -3" or "ctrl-break") on the VM that does not stop correctly. Coherence does not shut the VM down; it simply shuts itself down. If a non-daemon thread is running on the VM, then it may not exit. However, we won't know that until we see the thread dump.
Peace,
Cameron Purdy | Oracle Coherence -
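Once a thread dump is available, the non-daemon threads can be picked out roughly with a few lines of Python. This is only a heuristic sketch and assumes HotSpot-style header lines of the form `"name" #id [daemon] prio=N ...`; other JVMs and older versions format these lines differently, and the function name is illustrative.

```python
import re

# Matches HotSpot-style thread header lines such as:
#   "main" #1 prio=5 os_prio=0 tid=0x... nid=0x... runnable
#   "Worker" #12 daemon prio=5 os_prio=0 tid=0x... waiting on condition
HEADER = re.compile(r'^"(?P<name>[^"]+)"(?P<rest>.*)$')

def non_daemon_threads(dump_text):
    """Names of Java threads in the dump that are not marked as daemon."""
    names = []
    for line in dump_text.splitlines():
        match = HEADER.match(line)
        if not match:
            continue
        rest = match.group("rest")
        # Lines without " prio=" are treated as JVM-internal threads and skipped.
        if " prio=" in rest and " daemon " not in rest:
            names.append(match.group("name"))
    return names
```

Any application thread that shows up in the result other than the JVM's own housekeeping threads is a candidate for what is keeping the VM alive after Coherence has shut itself down.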
Cluster node has exceeded it's failover threshold
I am trying to create the Availability Group listener for a 2-node cluster, and the cluster node events show a failure due to "Cluster node has exceeded it's failover threshold" after I make one attempt that has failed for a variety of reasons, usually permissions. How do I set the threshold higher? All the information I find tells me to open processes that are not listed in the Failover Cluster Manager. I haven't seen code that works in PowerShell. How can I set the failover threshold higher than one?
Hi,
Please try to install the recommended hotfixes and updates for Windows Server 2012-based failover cluster then monitor it again.
The related hotfixes.
Recommended hotfixes and updates for Windows Server 2012-based failover clusters
http://support.microsoft.com/kb/2784261/en-us
Hope this helps.
-
Unable to failover the services in active-active cluster node
Hi,
I am applying the SP2 patch for SQL Server 2008 R2 in an active-active cluster. We have 3 services in the cluster: node 1 is the preferred owner of 2 and node 2 is the preferred owner of 1. When I try to move the service from node 2 to node 1, I get the errors below:
DCOM was unable to communicate with the computer XXXXXXXXX using any of the configured protocols.
The Kerberos client received a KRB_AP_ERR_MODIFIED error from the server XXXXXXXXX. The target name used was RPCSS/XXXXXX. This indicates that the target server failed to decrypt the ticket provided by the client. This can occur when the target server principal
name (SPN) is registered on an account other than the account the target service is using. Please ensure that the target SPN is registered on, and only registered on, the account used by the server. This error can also happen when the target service is using
a different password for the target service account than what the Kerberos Key Distribution Center (KDC) has for the target service account. Please ensure that the service on the server and the KDC are both updated to use the current password. If the server
name is not fully qualified, and the target domain (XXXXXX) is different from the client domain (XXXXXXX), check if there are identically named server accounts in these two domains, or use the fully-qualified name to identify the server.
The Cluster service failed to bring clustered service or application 'CHCROCHC045' completely online or offline. One or more resources may be in a failed state. This may impact the availability of the clustered service or application.
Cluster resource 'SQL Server (CHCROCHC045)' in clustered service or application 'CHCROCHC045' failed.
Any input to resolve this issue is appreciated, as I cannot proceed with patching.
BR
PGR
Hi PGR,
As the issue is more related to Windows Server, I would recommend posting it in the Windows Server forums for better support.
In addition, below are some articles about troubleshooting the error "DCOM was unable to communicate with the computer XXXXXXXXX using any of the configured protocols" for your reference.
Event ID 10009 — COM Remote Service Availability
How to troubleshoot DCOM 10009 error logged in system event?
Thanks,
Lydia Zhang
TechNet Community Support